Opened 8 years ago

Last modified 8 years ago

#4690 assigned enhancement

page breaks: paragraph/headings handling etc.

Reported by: izzy
Owned by: Nickolas Grigoriadis
Priority: normal
Component: TracWikiToPdfPlugin
Severity: normal
Keywords:
Cc:
Trac Release: 0.11


If possible, content could be handled a bit better at page breaks. At the moment it can (and does) happen that a heading is the last line on a page while the corresponding paragraph starts on the next page. In those cases, the heading should go to the next page together with its paragraph. This applies to:

  • Headings: no heading on the last line of a page; move it to the next page
  • Tables: do not end a page with the first line of a table; move it to the next page. There could be a configuration item for how many table rows to keep together, but obviously the table has to be split at least when it is longer than a page.
  • {{{ Code }}}: do not end a page with its first line; move it to the next page
  • Lists: the same (it simply does not look nice otherwise)
  • Paragraphs: an optional setting for orphans (single line(s) at the end of a page) and widows (single line(s) at the start of a page), which could also apply to tables, code and lists

Change History (9)

comment:1 Changed 8 years ago by Nickolas Grigoriadis

I know that you can add special (HTMLDoc-specific) formatting hints into the HTML, and HTMLDoc does honour them. So I'm assuming it would merely be a series of regex search-and-modify statements.

For example, placing a comment like this before an HTML element:

<!-- NEED 12cm -->

would mean that at that point at least 12cm must remain to the bottom of the printable page; otherwise HTMLDoc must immediately skip to the next page.

Would a mechanism like that be able to improve the formatting?

comment:2 Changed 8 years ago by izzy

If you mean that this needs to be added to the wiki text, that is not practicable. You cannot expect all users to always keep it in mind, and a wiki is community-driven. So in practice, those HTML comments need to be added by the plugin when it processes the content. In that context: yes, this is what I meant. The plugin could add these comments and let HTMLDoc decide, or it could make the decisions itself and insert appropriate page breaks. But these comments should not be a mandatory part of the wiki content itself (though they could of course be present optionally, in which case the plugin would need to be aware of them).
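A minimal Python sketch of the "plugin adds the comments, but respects ones already present" idea. It assumes the plugin post-processes the rendered HTML as a plain string before handing it to HTMLDoc, and that a hand-written NEED comment survives Trac's wiki rendering (not verified here). The name `add_h1_need_hints` is hypothetical; the 12cm value is the example from comment:1, and only the `<!-- NEED ... -->` syntax is HTMLDoc's.

```python
import re

# An optional NEED comment already present in the content (hypothetically
# written by a wiki author), followed by an opening <h1> tag.
H1_WITH_OPTIONAL_HINT = re.compile(
    r"(<!--\s*NEED\b[^>]*-->\s*)?(<h1\b)", re.IGNORECASE)

def add_h1_need_hints(html, space="12cm"):
    """Put '<!-- NEED <space> -->' before each <h1>, unless a NEED comment
    already sits directly in front of it."""
    def repl(match):
        if match.group(1):           # author already supplied a hint: keep it
            return match.group(0)
        return "<!-- NEED %s -->%s" % (space, match.group(2))
    return H1_WITH_OPTIONAL_HINT.sub(repl, html)

print(add_h1_need_hints("<p>intro</p><h1>Setup</h1>"))
# <p>intro</p><!-- NEED 12cm --><h1>Setup</h1>
```

Closing tags such as `</h1>` are not matched, because the pattern requires `<` to be followed directly by `h1`.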

comment:3 Changed 8 years ago by Nickolas Grigoriadis

I meant that the plugin adds these comments implicitly (no user intervention). But for that to work, we need to know the minimum space needed for each type of HTML tag, for example:


  • <H1> - Needs 12cm
  • <H2> - Needs 6cm
  • <H3> - Needs 4cm

We also need to know which other types of tags we should be concerned about. (Sorry, I'm being lazy here.)

comment:4 Changed 8 years ago by izzy

Ah, OK. Generally speaking: yes. But how much space does <H1> need? That always depends on how much text it contains, so you'd need to consider the page width, the font size, the line spacing and how many lines the heading wraps to. For example, assume a page width of 160px (to keep the example easy to follow) and an <H1> in a monospaced font with a size of 16px (giving 10 characters per line). A heading of a) 5 characters then needs one line of 16px, while one of b) 15 characters needs two lines of 16px each plus the space between them.

Not trivial, I know - but should be possible.
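The arithmetic of that example can be written as a small estimate. This is a hypothetical helper, not plugin code, and it keeps the comment's simplifications: every character is exactly `font_px` wide (so 160px / 16px gives 10 characters per line), text wraps at any character rather than at word boundaries, and the gap between lines is a parameter because the example leaves it unspecified (the 4px below is an arbitrary illustration).

```python
import math

def heading_height_px(chars, page_width_px=160, font_px=16, line_gap_px=0):
    """Estimate the height a heading needs under the simplifications of the
    example: a monospaced font whose characters are font_px wide, wrapping at
    any character (real word wrapping can only add lines, never remove them)."""
    chars_per_line = page_width_px // font_px          # 160 // 16 = 10
    lines = max(1, math.ceil(chars / chars_per_line))   # an empty heading still takes a line
    return lines * font_px + (lines - 1) * line_gap_px

print(heading_height_px(5))                  # 16  (case a: one line)
print(heading_height_px(15, line_gap_px=4))  # 36  (case b: two lines plus one assumed 4px gap)
```

As the next comment points out, the real font and page size are configuration-dependent, so an estimate like this could at best feed a generic hint.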

comment:5 Changed 8 years ago by Nickolas Grigoriadis

Hmmm. I think the fancy stuff is really the job of the layout engine (HTMLDoc). I was thinking of adding some generic "hints": since the result depends on the configured system font and page size, we can never really know the exact space needed, so let's not over-engineer this.

Your initial requirements asked for coverage of orphaned lines. So maybe we should simply require enough space for more than 2 lines?

e.g. just before (at the start of):

  • 4cm : <H1>
  • 2cm : <H2>, <H3>, 'table', '{{{ }}}' (code blocks), <UL>, <OL>
  • 1cm : 'new paragraph'

So we just litter the data with hints that we want at least that much space. I think the result will be satisfactory.

comment:6 Changed 8 years ago by izzy

Better than nothing :) So what does "cm" translate to in the PDF: 75dpi, i.e. ~30px per cm? And what are the default font sizes: 12px for standard text up to 16px for H1? Then we could average 16px per row (12px plus spacing for standard text), i.e. ~2 lines per cm, which sounds realistic. In that case I'd agree with your suggestions for the first two points, but suggest increasing the third to 1.5cm (3 lines).
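Taking the guesses in this comment at face value (75dpi and 16px per row, neither confirmed as HTMLDoc's actual values), the conversion between centimetres and text lines works out as follows; the function names are illustrative only.

```python
CM_PER_INCH = 2.54

def px_per_cm(dpi=75):
    """Pixels per centimetre at the given resolution (75dpi is the guess above)."""
    return dpi / CM_PER_INCH

def lines_per_cm(dpi=75, row_px=16):
    """Text rows per centimetre, assuming each row takes row_px pixels
    (12px text plus spacing, as estimated above)."""
    return px_per_cm(dpi) / row_px

print(round(px_per_cm(), 1))           # 29.5  (~30px per cm)
print(round(lines_per_cm(), 2))        # 1.85  (~2 lines per cm)
print(round(1.5 * lines_per_cm(), 2))  # 2.77  (1.5cm is roughly 3 lines)
```

So the suggested 1.5cm corresponds to about 2.8 rows, which rounds to the intended 3 lines.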

That leaves one question: with a 4-line P of which 3 lines fit at the bottom of a page, the next page would start with a single line (a widow), which is also not that nice. Could that be handled as well?

comment:7 Changed 8 years ago by Nickolas Grigoriadis

Even LaTeX doesn't get this perfect. Also, my preferred default font size (for printing, and therefore for PDF) is 10pt. I wonder if HTMLDoc supports the em unit?

Found a solution (from the HTMLDoc documentation):

<!-- NEED length -->
    Break if there is less than length units left on the current page. The length value defaults to lines of text but can be suffixed by in, mm, or cm to convert from the corresponding units. 

So we can just specify a minimum number of lines, with no need for cm. I still need to determine whether these are post-formatted or pre-formatted lines. If they are post-formatted, we just state 'NEED 3' for everything; otherwise we need to handle H1, H2 and H3 differently (6, 4 and 3 lines respectively).
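A sketch of how the line-based hints could be inserted in one pass, covering both cases described above. Only the `<!-- NEED n -->` syntax comes from the HTMLDoc documentation; `add_need_hints`, the `per_tag` switch and the tag list are hypothetical, and mapping `{{{ }}}` code blocks to `<pre>` is an assumption about Trac's rendered HTML.

```python
import re

# Pre-formatted case: H1/H2/H3 get 6/4/3 lines; everything else, and every
# tag in the post-formatted case, gets the flat 'NEED 3'.
NEED_LINES = {"h1": 6, "h2": 4, "h3": 3}
DEFAULT_NEED = 3

# Hypothetical choice of block elements from the ticket: headings, tables,
# code blocks (<pre>), lists and paragraphs.
BLOCK_TAGS = ("h1", "h2", "h3", "table", "pre", "ul", "ol", "p")
OPEN_TAG = re.compile(r"<(%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)

def add_need_hints(html, per_tag=True):
    """Insert '<!-- NEED n -->' (n in lines of text) before each opening block tag.

    per_tag=True  -> use NEED_LINES (if HTMLDoc counts pre-formatted lines)
    per_tag=False -> 'NEED 3' everywhere (if it counts post-formatted lines)
    """
    def hint(match):
        tag = match.group(1).lower()
        lines = NEED_LINES.get(tag, DEFAULT_NEED) if per_tag else DEFAULT_NEED
        return "<!-- NEED %d -->%s" % (lines, match.group(0))
    return OPEN_TAG.sub(hint, html)

print(add_need_hints("<h1>Title</h1><p>Text</p>"))
# <!-- NEED 6 --><h1>Title</h1><!-- NEED 3 --><p>Text</p>
```

Applied as-is, this also hints every `<p>` inside table cells and list items; a real implementation would probably restrict the paragraph rule to top-level paragraphs.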

Regarding single lines at the top of a page (widows), I haven't found any HTMLDoc hint that helps with that :|

comment:8 Changed 8 years ago by izzy

Sounds good so far, and it solves more than half of the trouble, I'd say. The widows only look strange if followed immediately by another Hx or a table, so handling them can be added later if we find something. At least the orphans can be handled the way you described, which is already a huge improvement!

And if I had to guess, HTMLDoc means "post-formatted" lines, since that is the only interpretation that makes sense. In the HTML source (i.e. pre-formatted), a line could easily reach several hundred (or even thousands of) characters, so the result would be unpredictable for dynamic content (database-driven, etc.). I'd say give it a try and assume I'm right. We'll see from the results whether I really was ;)

comment:9 Changed 8 years ago by Nickolas Grigoriadis

Owner: changed from Diorgenes Felipe Grzesiuk to Nickolas Grigoriadis
Status: changed from new to assigned
