Ticket #1535 (reopened enhancement)

Opened 6 years ago

Last modified 10 months ago

Export Wiki pages to HTML

Reported by: tkachenko.igor@gmail.com
Assigned to: anybody
Priority: low
Component: Request-a-Hack
Severity: minor
Keywords:
Cc:
Trac Release: 0.10

Description

Ability to export a set of Wiki pages to HTML files linked to each other. This is necessary when we create documentation for our project and then want to include it in the project distribution, or when we want to make a CHM file from the Wiki page set. So it would be practical to have a script that provides this functionality.

Also it would be great to have some additional functionality:

  • export to a single document (preserving link jumps)
  • export of the whole page tree. We specify only one page in the input parameters of the script, and it exports all the pages linked from that page, then does the same recursively for those pages.

Change History

05/12/07 23:43:57 changed by coderanger

  • status changed from new to closed.
  • resolution set to worksforme.

You mean like the CombineWikiPlugin?

05/14/07 21:13:53 changed by ThurnerRupert

  • status changed from closed to reopened.
  • resolution deleted.

Does CombineWikiPlugin allow HTML output? I just saw PDF ...

05/14/07 21:18:26 changed by ThurnerRupert

see also #t3332

05/14/07 21:24:57 changed by coderanger

There is quite clearly an HTML output format (in addition to PostScript, PDF, and TiddlyWiki). If you do not see it, please send me a screenshot of the admin page.

09/13/07 12:44:55 changed by anonymous

  • priority changed from normal to highest.

09/13/07 12:46:53 changed by anonymous

CombineWikiPlugin has a lot of bugs, it does not support exporting sub-linked pages, and it does not support translating hyperlinks to local ones.

(in reply to: ↑ description ; follow-ups: ↓ 8 ↓ 9 ) 01/25/08 01:53:20 changed by anonymous

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

(in reply to: ↑ 7 ) 03/14/08 16:03:49 changed by anonymous

  • priority changed from highest to low.
  • severity changed from blocker to minor.

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

This is just a quick attempt at making a whole Trac viewable offline, with proper CSS and logos; even the change history is preserved (of course, you should change the URL to whatever fits you):

wget -m -k -K -E http://www.example.com/cgi-bin/trac.cgi

Wise use of wget options (man wget) will export the site for static viewing on a different web server, etc.

Since there is a proper way to get the requested functionality, I'll lower priority and severity to "low".

(in reply to: ↑ 7 ) 08/27/08 13:41:05 changed by datenimperator

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

While wget certainly does a good job grabbing stuff from an HTTP server, it's not particularly well suited for this task. We've had that issue before, because we needed to convert a range of wiki pages to MS Word to include them in documentation sent to external customers.

  • Just copying and pasting the formatted text doesn't work well, because Word tries to import all formatting in terms of fonts, colors, indents and whatnot. It surely breaks your layout, or will be a nightmare to clean up.
  • Pasting it as "text only" will lose all structural elements like headers and lists. Again, hard to clean up.

We found that working with the XMLRPC plugin gave us a means to extract pure HTML versions of any wiki page, using just plain HTML structure elements like H1, OL or LI. No formatting, no CSS, no navigation bar, no scripts.

We opened those files in MS Word, which gave us a barely formatted but well structured Word document which played well when pasted into templates containing elaborate style sheets. Even embedded images were still in place, because they use relative URLs and we've inserted BASEURL tags in the resulting HTML files.
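For anyone wanting to try the XML-RPC route described above, here is a minimal Python sketch. It assumes the XML-RPC plugin exposes the WikiRPC call `wiki.getPageHTML`, and that the "BASEURL tags" mentioned above correspond to an HTML `<base href="...">` element. The function names, the example host and the handling of wrapped versus bare HTML are illustrative assumptions, not the code we actually used.

```python
import html


def add_base_tag(page_html, base_url):
    """Return page_html as a document whose <base> points at base_url.

    Assumption: the input is either a bare fragment or a simple
    <html>...</html> document without its own <head>.
    """
    head = '<head><base href="%s"></head>' % html.escape(base_url, quote=True)
    start = page_html.lower().find("<html>")
    if start != -1:
        # Already wrapped: insert the head right after the opening <html>.
        cut = start + len("<html>")
        return page_html[:cut] + head + page_html[cut:]
    # Bare fragment: wrap it in a minimal document.
    return "<html>%s<body>\n%s\n</body></html>\n" % (head, page_html)


def export_page_html(server, page_name, base_url):
    """Fetch one rendered wiki page over XML-RPC and add the base URL.

    server is expected to behave like xmlrpc.client.ServerProxy, i.e.
    server.wiki.getPageHTML(name) performs the remote call.
    """
    page_html = server.wiki.getPageHTML(page_name)
    return add_base_tag(page_html, base_url)
```

With a real installation, `server` would be something like `xmlrpc.client.ServerProxy(url)` pointed at the Trac XML-RPC endpoint; the exact URL and the authentication setup depend on the site. Writing the returned string to a `.html` file gives a plain document that can be opened in Word, with relative image URLs resolved against `base_url`.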

(follow-up: ↓ 11 ) 08/27/08 13:44:05 changed by anonymous

is there any code you could share?

(in reply to: ↑ 10 ) 08/27/08 13:53:42 changed by anonymous

Replying to anonymous:

is there any code you could share?

We needed a Win32 GUI version of that tool, since the people using it weren't familiar with command lines in general. That's why I hacked up something using the Realbasic IDE. Unfortunately, I don't have a license of RB available anymore, so I can't compile a working version (I might have the source code, but I'm not sure).

Then again, it's not hard at all to do it again. XMLRPC is pretty basic and well supported, so it shouldn't be too much work to cook up a version using Java or something else.

Then again, is it really necessary? What people need is a ZIP archive containing a number of simple HTML files. It should be possible to do it as a trac plugin and a standard export, which (optionally) follows all wiki links up to a given depth. So under each wiki page you'd have an "Export to HTML archive" link; clicking it would generate and download a ZIP file containing the files. I'd love to see that.
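To make the idea concrete, here is a rough Python sketch of the "follow wiki links up to a given depth and pack the result into a ZIP" part. It is not a Trac plugin. `fetch_html` is a hypothetical callable that returns the HTML of a page by name (for example via XML-RPC), and the link pattern assumes internal links look like `href=".../wiki/PageName"`, which may not hold for every installation.

```python
import io
import re
import zipfile
from collections import deque

# Assumption: internal wiki links look like href=".../wiki/PageName".
# Links carrying a query string or an anchor are deliberately not matched.
WIKI_HREF = re.compile(r'href="[^"#?]*/wiki/([^"#?]+)"')


def local_name(page):
    """Map a wiki page name to a flat file name inside the archive."""
    return page.replace("/", "_") + ".html"


def crawl(fetch_html, start_page, max_depth):
    """Breadth-first collection of start_page and pages linked from it."""
    pages = {}
    seen = {start_page}
    queue = deque([(start_page, 0)])
    while queue:
        page, depth = queue.popleft()
        pages[page] = fetch_html(page)
        if depth >= max_depth:
            continue  # keep the page, but do not follow its links
        for target in WIKI_HREF.findall(pages[page]):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages


def build_archive(pages):
    """Return ZIP bytes with one HTML file per collected page."""
    def relink(match):
        target = match.group(1)
        # Only rewrite links whose target is actually in the archive.
        if target in pages:
            return 'href="%s"' % local_name(target)
        return match.group(0)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, page_html in pages.items():
            archive.writestr(local_name(name), WIKI_HREF.sub(relink, page_html))
    return buffer.getvalue()
```

Calling `build_archive(crawl(fetch_html, "WikiStart", 2))` would give the bytes of a ZIP file that a plugin could send as a download. Links to pages beyond the depth limit are left pointing at the live wiki rather than at missing local files.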

10/24/08 18:19:51 changed by anonymous

Following the advice of "Gui-" on #trac, I started messing around with httrack to do this. The command I came up with that does an excellent job is this:

httrack "http://test:XXXXXXXX@trac.mydomain.com/login" -O TracDump "-*" "+trac.mydomain.com/wiki/*" "+trac.mydomain.com/raw-attachment/wiki/*" "+trac.mydomain.com/attachment/wiki/*" "+trac.mydomain.com/ticket/*" "+trac.mydomain.com/raw-attachment/ticket/*" "+trac.mydomain.com/attachment/ticket/*" "+trac.mydomain.com/chrome/*" "-*?*" -v

What this does:

  • Starts at the login page and uses HTTPAUTH to gain access as a "test" user
  • Puts the output in a TracDump folder
  • Downloads all wiki pages and tickets, and their attachments
  • Avoids all other pages (notably the "logout" page)
  • Avoids any page with a ? in it (such as older versions, edit pages, diffs, etc.)
  • Rewrites all of their internal links to be relative

Problems:

  • It downloads wiki pages that are linked but don't exist (not a huge deal, IMO)

(follow-up: ↓ 14 ) 09/06/09 13:31:42 changed by rjollos

(in reply to: ↑ 13 ) 09/06/09 13:32:36 changed by rjollos

Replying to rjollos:

See WikiToPdfPlugin

Correction, TracWikiToPdfPlugin.

01/24/10 19:13:07 changed by AdrianFritz

There is also PageToOdtPlugin. With OpenOffice you can convert the result to MS Office format, edit it, or even export it to PDF.

