Opened 7 years ago

Last modified 2 years ago

#1535 reopened enhancement

Export Wiki pages to HTML

Reported by: tkachenko.igor@…
Owned by: anybody
Priority: low
Component: Request-a-Hack
Severity: minor
Keywords:
Cc:
Trac Release: 0.10

Description

Ability to export a set of Wiki pages to HTML files linked to each other. This is necessary when we create documentation for our project and then want to include it in the project distribution, or when we want to build a CHM file from a set of Wiki pages. So it would be more practical to have a script that provides this functionality.

Also it would be great to have some additional functionality:

  • export to a single document (preserving internal links as in-document jumps)
  • export of a whole page tree: we specify only one page as an input parameter of the script, and it exports every page linked from that page, recursing in the same way into the linked pages.

Attachments (0)

Change History (15)

comment:1 Changed 7 years ago by coderanger

  • Resolution set to worksforme
  • Status changed from new to closed

You mean like the CombineWikiPlugin?

comment:2 Changed 7 years ago by ThurnerRupert

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Does CombineWikiPlugin allow HTML output? I only saw PDF ...

comment:3 Changed 7 years ago by ThurnerRupert

see also #t3332

comment:4 Changed 7 years ago by coderanger

There is quite clearly an HTML output format (in addition to PostScript, PDF, and TiddlyWiki). If you do not see it, please send me a screenshot of the admin page.

comment:5 Changed 7 years ago by anonymous

  • Priority changed from normal to highest

comment:6 Changed 7 years ago by anonymous

CombineWikiPlugin has a lot of bugs, it does not support exporting the sub-linked pages, and it does not support translating hyperlinks (to local ones).

comment:7 in reply to: ↑ description ; follow-ups: Changed 7 years ago by anonymous

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

comment:8 in reply to: ↑ 7 Changed 7 years ago by anonymous

  • Priority changed from highest to low
  • Severity changed from blocker to minor

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

This is just a quick attempt to make a whole Trac viewable off-line with proper CSS, logos, and even the change history preserved (of course you should change the URL to whatever fits you):

wget -m -k -K -E http://www.example.com/cgi-bin/trac.cgi
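
Here, -m mirrors the site (recursive retrieval with timestamping), -k converts links so the copy can be browsed off-line, -K keeps each original file with a .orig suffix before its links are rewritten, and -E saves pages with an .html extension.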

Wise use of wget options (man wget) will export the site for static view on a different web server, etc.

Since there is a proper way to get the requested functionality, I'll reorder priority and severity to "low".

comment:9 in reply to: ↑ 7 Changed 6 years ago by datenimperator

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

While wget certainly does a good job grabbing stuff from an HTTP server, it's not particularly well suited for this task. We've had that issue before, because we needed to convert a range of wiki pages to MS Word to include them in documentation sent to external customers.

  • Just copying/pasting the formatted text doesn't work well, because Word tries to import all formatting in terms of fonts, colors, indents and whatnot. That surely breaks your layout, or will be a nightmare to clean up.
  • Copying/pasting it as "text only" loses all structured elements like headers and lists. Again, hard to clean up.

We found that working with the XMLRPC plugin gave us a means to extract pure HTML versions of any wiki page, using just plain HTML structure elements like H1, OL or LI. No formatting, no CSS, no navigation bar, no scripts.

We opened those files in MS Word, which gave us a barely formatted but well structured Word document that pasted nicely into templates containing elaborate style sheets. Even embedded images were still in place, because they use relative URLs and we had inserted BASE tags in the resulting HTML files.
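
For illustration, here is a minimal sketch of that extraction step in Python, assuming the XmlRpcPlugin is installed and exposes the usual wiki.getPageHTML call; the URL, credentials and page name below are placeholders, and the RPC endpoint path may vary between installations:

# Fetch one wiki page as bare HTML via Trac's XML-RPC interface and save it
# with a <base> tag so relative image/attachment links keep working locally.
import xmlrpc.client

TRAC_RPC_URL = "https://user:password@trac.example.com/login/rpc"  # placeholder
BASE_URL = "https://trac.example.com/"                             # placeholder

server = xmlrpc.client.ServerProxy(TRAC_RPC_URL)

page_name = "WikiStart"
body = server.wiki.getPageHTML(page_name)  # structural HTML only, no Trac chrome

html = (
    "<html><head>\n"
    '<meta charset="utf-8">\n'
    f'<base href="{BASE_URL}">\n'
    f"<title>{page_name}</title>\n"
    "</head><body>\n"
    f"{body}\n"
    "</body></html>\n"
)

with open(f"{page_name}.html", "w", encoding="utf-8") as f:
    f.write(html)

The resulting file opens cleanly in a browser or in Word, which matches the workflow described above.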

comment:10 follow-up: Changed 6 years ago by anonymous

is there any code you could share?

comment:11 in reply to: ↑ 10 Changed 6 years ago by anonymous

Replying to anonymous:

is there any code you could share?

We needed a Win32 GUI version of that tool, since the people using it weren't familiar with command lines in general. That's why I hacked up something using the Realbasic IDE. Unfortunately, I don't have an RB license available anymore, so I can't compile a working version (I might still have the source code, but I'm not sure).

Then again, it's not hard at all to do it again. XMLRPC is pretty basic and well supported, so it shouldn't be too much work to cook up a version using Java or something else.

Then again, is it really necessary? What people need is a ZIP archive containing a number of simple HTML files. It should be possible to do this as a Trac plugin and a standard export, which (optionally) follows all wiki links up to a given depth. So under each wiki page you would have some "Export to HTML archive" link; clicking it would generate and download a ZIP file containing those files. I'd love to see that.
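
As a rough client-side approximation of that idea (no plugin, no depth handling), here is a sketch in Python that zips up HTML exports of every wiki page via the same XML-RPC interface; again, the URL and credentials are placeholders:

# Dump every wiki page as a simple HTML file into one ZIP archive.
# Note: links between pages still point at the live server; rewriting them
# to relative .html paths is left out of this sketch.
import xmlrpc.client
import zipfile

TRAC_RPC_URL = "https://user:password@trac.example.com/login/rpc"  # placeholder

server = xmlrpc.client.ServerProxy(TRAC_RPC_URL)

with zipfile.ZipFile("wiki-export.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    for name in server.wiki.getAllPages():
        body = server.wiki.getPageHTML(name)  # bare structural HTML
        html = (
            '<html><head><meta charset="utf-8">'
            f"<title>{name}</title></head><body>{body}</body></html>"
        )
        # Page names may contain "/", which zipfile stores as folders --
        # conveniently mirroring the wiki hierarchy.
        archive.writestr(f"{name}.html", html)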

comment:12 Changed 6 years ago by anonymous

Following the advice of "Gui-" on #trac, I started messing around with httrack to do this. The command I came up with that does an excellent job is this:

httrack "http://test:XXXXXXXX@trac.mydomain.com/login" -O TracDump "-*" "+trac.mydomain.com/wiki/*" "+trac.mydomain.com/raw-attachment/wiki/*" "+trac.mydomain.com/attachment/wiki/*" "+trac.mydomain.com/ticket/*" "+trac.mydomain.com/raw-attachment/ticket/*" "+trac.mydomain.com/attachment/ticket/*" "+trac.mydomain.com/chrome/*" "-*?*" -v

What this does:

  • Starts at the login page and uses HTTPAUTH to gain access as a "test" user
  • Puts the output in a TracDump folder
  • Downloads all wiki pages and tickets, and their attachments
  • Avoids all other pages (notably the "logout" page)
  • Avoids any page with a ? in it (such as older versions, edit pages, diffs, etc.)
  • Rewrites all of their internal links to be relative

Problems:

  • It downloads wiki pages that are linked but don't exist (not a huge deal, IMO)

comment:13 follow-up: Changed 5 years ago by rjollos

See WikiToPdfPlugin

comment:14 in reply to: ↑ 13 Changed 5 years ago by rjollos

Replying to rjollos:

See WikiToPdfPlugin

Correction, TracWikiToPdfPlugin.

comment:15 Changed 5 years ago by AdrianFritz

There is also PageToOdtPlugin. With OpenOffice you can convert the output to an MS Office format, edit it further, or export it to PDF.
