Opened 8 years ago

# page breaks: paragraph/headings handling etc.

Reported by: izzy · Owned by: grigi · normal · TracWikiToPdfPlugin · normal · 0.11

### Description

If possible, content could be handled a bit better at page breaks. At the moment it can (and does) happen that a heading is the last line on a page and the corresponding paragraph starts on the next page - in those cases, the heading should move to the next page as well. This applies to:

• Headings: No heading in the last line of a page - move it to the next
• Tables: Do not end a page with the first line of a table - move it to the next. There could be a configuration item for how many table rows to keep together, but obviously the table has to be split at least when it is longer than a page.
• {{{ Code }}}: Do not end the page with the first line of it - move it to the next
• lists: the same (it simply does not look nice otherwise)
• paragraphs: optional setting for orphans (single line(s) at page end) and widows (single line(s) at page start), which also could apply to tables, code, and lists

### comment:1 Changed 8 years ago by grigi

I know that you can add special (htmldoc-specific) formatting hints to the HTML, and htmldoc does honour them. So I'm assuming it would merely be a series of regex search & modify statements.

Placing a comment like this before an HTML element:

<!-- NEED 12cm -->


would mean that at that point it needs 12cm of space before the bottom of the printable page; otherwise it must skip to the next page immediately.

Would a mechanism like that be able to improve the formatting?
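
As a rough illustration of the regex approach, the sketch below puts a `NEED` comment in front of every `<h1>` opening tag. The function name `add_need_hint` and the 12cm default are placeholders for illustration only; they are not taken from the plugin.

```python
import re

# Matches an opening <h1> tag, with or without attributes, in any letter case.
H1_OPEN = re.compile(r"<h1\b[^>]*>", re.IGNORECASE)


def add_need_hint(html, length="12cm"):
    """Prefix every <h1> opening tag with an htmldoc NEED comment.

    `add_need_hint` and the default length are illustrative assumptions.
    """
    hint = "<!-- NEED %s -->" % length
    # A function replacement keeps the matched tag exactly as written.
    return H1_OPEN.sub(lambda m: hint + m.group(0), html)


print(add_need_hint("<p>intro</p><H1 id='a'>Title</H1>"))
```

A real implementation would probably need one such rule per element type rather than a single hard-coded tag.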

### comment:2 Changed 8 years ago by izzy

If you mean this needs to be added in the wiki text, it is not practicable. You cannot expect all users to always keep that in mind - and a wiki is community driven. So practically, those HTML comments need to be added by the plugin when processing the content. And in that context: Yes, this is what I meant - the plugin could add these comments to let htmldoc decide, or it could itself make the decisions and place adequate page breaks. But these comments should not be a mandatory part of the wiki content itself (though they could of course be there optionally, in which case the plugin would need to be aware of them).

### comment:3 Changed 8 years ago by grigi

I meant that the plugin adds this implicitly (No user intervention). But for that to work, we need to know what minimum space would be needed for what type of HTML tag.

e.g.:

• <H1> - Needs 12cm
• <H2> - Needs 6cm
• <h3> - Needs 4cm

and whatever other types of tags we need to be concerned about. (Sorry, I'm being lazy here.)

### comment:4 Changed 8 years ago by izzy

Ah - OK, generally speaking: Yes. But how much space does <H1> need? That always depends on how much text it contains. So you'd need to consider the page width, the font size used, the spacing between the lines, and how many lines there will be. To give an example: Assuming a page width of 160px (to make the example easier to follow) and an <H1> using a monospaced font with a size of 16px (resulting in 10 characters per line), a heading containing a) 5 or b) 15 characters would mean a) one line of 16px or b) two lines of 16px each plus the space between the lines.

Not trivial, I know - but should be possible.
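
The arithmetic in that example can be sketched as follows. It assumes, as the example does, a monospaced font whose glyphs are as wide as the font size; `heading_height` and its `line_gap` parameter are hypothetical names, and real font metrics would differ.

```python
def heading_height(n_chars, page_width=160, font_size=16, line_gap=0):
    """Estimate (lines, height in px) for a heading of n_chars characters.

    Assumes each glyph is font_size pixels wide (a simplification) and that
    line_gap extra pixels separate consecutive lines.
    """
    chars_per_line = page_width // font_size  # 160 // 16 = 10
    # Integer ceiling division; an empty heading still occupies one line.
    lines = max(1, (n_chars + chars_per_line - 1) // chars_per_line)
    height = lines * font_size + (lines - 1) * line_gap
    return lines, height


print(heading_height(5))               # case a)
print(heading_height(15, line_gap=4))  # case b), with an assumed 4px gap
```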

### comment:5 Changed 8 years ago by grigi

Hmmm. I think it is actually the job of the layout engine (HTMLDoc) to do the fancy stuff. I was thinking of adding some generic "hints". Since the result depends on the configured system font and page size, we can never really know it, so let's not over-engineer.

Your initial requirements asked for coverage of orphaned lines. So maybe we should just limit the requirements to covering slightly more than 2 lines?

e.g. Just before (at start) of:

• 4cm : <H1>
• 2cm : <H2>, <H3>, 'table', ' ', <UL>, <OL>
• 1cm : 'new paragraph'

So we just litter the data with hints that we want at least that much space. I think the result will be satisfactory.
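
A table-driven version of this could look like the sketch below. The lengths come from the list above; using `p` for 'new paragraph', the names `NEED_BY_TAG` and `add_hints`, and leaving out the unnamed list entry are assumptions made for the illustration.

```python
import re

# Minimum space per element, from the list above. "p" stands in for
# 'new paragraph'; the unnamed entry in that list is not covered here.
NEED_BY_TAG = {
    "h1": "4cm",
    "h2": "2cm", "h3": "2cm", "table": "2cm", "ul": "2cm", "ol": "2cm",
    "p": "1cm",
}

# One pattern for all opening tags; \b stops "p" from matching "<pre>".
OPEN_TAG = re.compile(r"<(%s)\b[^>]*>" % "|".join(NEED_BY_TAG), re.IGNORECASE)


def add_hints(html):
    """Insert an htmldoc NEED comment before each listed opening tag."""
    def repl(match):
        length = NEED_BY_TAG[match.group(1).lower()]
        return "<!-- NEED %s -->%s" % (length, match.group(0))
    return OPEN_TAG.sub(repl, html)


print(add_hints("<h1>Title</h1><p>Text</p><ul><li>item</li></ul>"))
```

Note that this simple pattern also adds hints to nested elements (for example a `<ul>` inside a list item or a `<p>` inside a table cell), which may or may not be wanted.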

### comment:6 Changed 8 years ago by izzy

Better than nothing :) So what we are talking about with "cm" when it comes to PDF: 75dpi? i.e. ~30px per cm? And what are the default font sizes: 12px standard .. 16px for H1? So we could average on 16px per row (12px + spacing for standard), i.e. ~2 lines per cm - sounds realistic. In that case I'd agree to your suggestions for the first two points - but suggest to increase the 3rd to 1.5 cm (3 lines).
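
The estimate can be checked with a few lines of arithmetic. The 75dpi resolution and the 16px row height are the assumptions stated in this comment, not confirmed htmldoc defaults, and the function names are made up for the sketch.

```python
DPI = 75             # assumed output resolution
CM_PER_INCH = 2.54
ROW_HEIGHT_PX = 16   # assumed: 12px text plus line spacing


def px_per_cm(dpi=DPI):
    """Pixels per centimetre at the given resolution."""
    return dpi / CM_PER_INCH


def lines_per_cm(dpi=DPI, row_height=ROW_HEIGHT_PX):
    """Rows of standard text that fit into one centimetre."""
    return px_per_cm(dpi) / row_height


print(round(px_per_cm(), 1))           # about 29.5 px per cm
print(round(lines_per_cm(), 2))        # about 1.85 lines per cm
print(round(1.5 * lines_per_cm(), 1))  # 1.5cm is roughly 2.8 lines
```

Under these assumptions, 1.5cm comes out slightly below three full lines, so the "3 lines" figure is a rounded estimate.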

The question remains: with a P of 4 lines, the new page would start with a single line (widow) - which is also not that nice. Could that be handled as well?

### comment:7 Changed 8 years ago by grigi

Even LaTeX doesn't get it perfect. And my preferred default font-size (for printing, and therefore for pdf) is 10pt. I wonder if HTMLDoc supports the em unit?

Found solution: (From HTMLDoc documentation)

<!-- NEED length -->
Break if there is less than length units left on the current page. The length value defaults to lines of text but can be suffixed by in, mm, or cm to convert from the corresponding units.


So we can just specify a minimum number of lines, with no need for cm. I still need to determine whether "lines" means post-formatted or pre-formatted lines. If it is post-formatted, we just state 'NEED 3' for everything; otherwise we need to handle H1, H2 & H3 differently (6, 4, 3 respectively).

Regarding orphaned lines at the top of a page, I haven't found any hints in HTMLDoc that help with that :|
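
The two cases could be captured in a small helper like the one below. `need_comment` and its `post_formatted` flag are hypothetical; the flag merely encodes the open question of how htmldoc counts lines, and the fallback of 3 for non-heading tags is an assumption.

```python
# Per-heading line counts for the case where a uniform 'NEED 3' is not enough.
HEADING_LINES = {"h1": 6, "h2": 4, "h3": 3}


def need_comment(tag, post_formatted=True):
    """Build a line-based NEED comment for the given tag name.

    post_formatted=True:  htmldoc is assumed to count rendered lines,
                          so 3 lines are requested for every element.
    post_formatted=False: headings get their own counts; other tags
                          fall back to 3 (an assumption of this sketch).
    """
    if post_formatted:
        lines = 3
    else:
        lines = HEADING_LINES.get(tag.lower(), 3)
    return "<!-- NEED %d -->" % lines


print(need_comment("h1"))
print(need_comment("H1", post_formatted=False))
```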

### comment:8 Changed 8 years ago by izzy

Sounds good so far - and it solves at least more than half of the trouble, I'd say. The widows only look strange if followed immediately by another Hx or table, so this can be added later if we find something. At least the orphans can be handled the way you described - which is already a huge improvement, I'd say!

And if I should guess, htmldoc means the "post-formatted" lines - since this is the only thing making sense. In HTML source (i.e. pre-formatted), a line could easily reach several hundreds (or even thousands) of characters - so in that case the result is unpredictable for dynamic content (database driven, etc.). I'd say give it a try and assume I'm right. We will see from the results if I really was ;)

### comment:9 Changed 8 years ago by grigi

• Owner changed from diorgenes to grigi
• Status changed from new to assigned