
Opened 10 years ago

Last modified 5 years ago

#1535 reopened enhancement

Export Wiki pages to HTML

Reported by: Igor
Owned by: anybody
Priority: low
Component: Request-a-Hack
Severity: minor
Keywords:
Cc:
Trac Release: 0.10

Description

The ability to export a set of Wiki pages to HTML files linked to each other. This is necessary when we create documentation for our project and want to include it in the project distribution, or when we want to build a CHM file from a set of Wiki pages. So it would be more practical to have a script that provides this functionality.

Also it would be great to have some additional functionality:

  • export to a single document (preserving link jumps)
  • export of a whole page tree: we specify only one page in the script's input parameters, and it exports all pages linked from that page, repeating this recursively for the linked pages.

Attachments (0)

Change History (15)

comment:1 Changed 10 years ago by Noah Kantrowitz

Resolution: set to worksforme
Status: new → closed

You mean like the CombineWikiPlugin?

comment:2 Changed 10 years ago by rupert thurner

Resolution: worksforme deleted
Status: closed → reopened

Does CombineWikiPlugin allow HTML output? I only saw PDF ...

comment:3 Changed 10 years ago by rupert thurner

See also #t3332.

comment:4 Changed 10 years ago by Noah Kantrowitz

There is quite clearly an HTML output format (in addition to PostScript, PDF, and TiddlyWiki). If you do not see it, please send me a screenshot of the admin page.

comment:5 Changed 9 years ago by anonymous

Priority: normal → highest

comment:6 Changed 9 years ago by anonymous

CombineWikiPlugin has a lot of bugs, it does not support exporting sub-linked pages, and it does not translate hyperlinks to local links.

comment:7 in reply to:  description ; Changed 9 years ago by anonymous

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.
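For illustration only (the URL, depth and output directory below are placeholder assumptions), such a crawl could look roughly like this with GNU wget:

# Follow links from one start page, up to 3 levels deep, staying below /wiki/,
# rewriting links for local viewing and adding .html extensions
# (--adjust-extension is called --html-extension on older wget releases).
wget --recursive --level=3 --no-parent \
     --convert-links --adjust-extension \
     --directory-prefix=wiki-export \
     http://trac.example.com/project/wiki/WikiStart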

comment:8 in reply to:  7 Changed 9 years ago by anonymous

Priority: highest → low
Severity: blocker → minor

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

This is just a quick attempt that lets a whole Trac be viewed off-line with proper CSS, logos, and even the change history preserved (of course you should change the URL to whatever fits you):

wget -m -k -K -E http://www.example.com/cgi-bin/trac.cgi
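For reference, the options used here mean the following (GNU wget):

# -m   mirror the site: recursive download with timestamping and unlimited depth
# -k   convert links in the downloaded pages so they work for local viewing
# -K   keep a .orig backup of each file before its links are converted
# -E   append an .html extension to pages served as text/html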

Wise use of wget options (man wget) will export the site for static view on a different web server, etc.

Since there is a proper way to get the requested functionality, I'll lower the priority and severity.

comment:9 in reply to:  7 Changed 8 years ago by Christian Aust

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

While wget certainly does a good job grabbing stuff from an HTTP server, it's not particularly well suited for this task. We ran into that issue before, because we needed to convert a range of wiki pages to MS Word to include them in documentation sent to external customers.

  • Just copying and pasting the formatted text doesn't work well, because Word tries to import all formatting in terms of fonts, colors, indents and whatnot. That will surely break your layout, or be a nightmare to clean up.
  • Copying and pasting it as "text only" will lose all structured elements like headers and lists. Again, hard to clean up.

We found that working with the XMLRPC plugin gave us a means to extract pure HTML versions of any wiki page, using only plain HTML structural elements like H1, OL or LI. No formatting, no CSS, no navigation bar, no scripts.

We opened those files in MS Word, which gave us a barely formatted but well structured Word document that pasted well into templates containing elaborate style sheets. Even embedded images were still in place, because they use relative URLs and we had inserted BASE tags in the resulting HTML files.
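A minimal sketch of that approach, assuming the Trac XmlRpcPlugin is installed and the account has XML_RPC permission; the host, credentials and page name are placeholders, and older plugin versions expose /login/xmlrpc instead of /login/rpc:

import xmlrpc.client

# Placeholder URL; HTTP basic auth credentials can be embedded in it.
proxy = xmlrpc.client.ServerProxy(
    "https://user:password@trac.example.com/project/login/rpc")

# wiki.getPageHTML renders the latest version of a page as a bare HTML fragment:
# plain H1/OL/LI markup, no CSS, no navigation, no scripts.
fragment = proxy.wiki.getPageHTML("WikiStart")

# Wrap the fragment in a minimal document; a <base> tag keeps relative image
# and attachment URLs working, as described above.
with open("WikiStart.html", "w", encoding="utf-8") as out:
    out.write(
        "<html><head>"
        '<base href="https://trac.example.com/project/">'
        "</head><body>\n" + fragment + "\n</body></html>"
    )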

comment:10 Changed 8 years ago by anonymous

Is there any code you could share?

comment:11 in reply to:  10 Changed 8 years ago by anonymous

Replying to anonymous:

Is there any code you could share?

We needed a Win32 GUI version of that tool, since the people using it weren't familiar with command lines in general. That's why I hacked something up using the REALbasic IDE. Unfortunately, I don't have an RB license available anymore, so I can't compile a working version (I might still have the source code, but I'm not sure).

Then again, it's not hard at all to do it again. XMLRPC is pretty basic and well supported; it shouldn't be too much work to cook up a version in Java or something else.

Then again, is it really necessary? What people need is a ZIP archive containing a number of simple HTML files. It should be possible to do this as a Trac plugin with a standard export, which (optionally) follows all wiki links up to a given depth. Under each wiki page you'd then have some "Export to HTML archive" link; clicking it would generate and download a ZIP file containing the files. I'd love to see that.
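A rough sketch of that batch export idea, again assuming the XmlRpcPlugin; it simply dumps every wiki page into one archive rather than following links to a given depth, and the URL and file names are placeholders:

import xmlrpc.client
import zipfile

proxy = xmlrpc.client.ServerProxy(
    "https://user:password@trac.example.com/project/login/rpc")

# Write each wiki page as a standalone HTML file inside one ZIP archive.
with zipfile.ZipFile("wiki-export.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    for name in proxy.wiki.getAllPages():
        html = proxy.wiki.getPageHTML(name)
        # Sub-pages such as "Dev/Setup" become nested paths inside the archive.
        archive.writestr(name + ".html", html)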

comment:12 Changed 8 years ago by anonymous

Following the advice of "Gui-" on #trac, I started messing around with httrack to do this. The command I came up with that does an excellent job is this:

httrack "http://test:XXXXXXXX@trac.mydomain.com/login" -O TracDump "-*" "+trac.mydomain.com/wiki/*" "+trac.mydomain.com/raw-attachment/wiki/*" "+trac.mydomain.com/attachment/wiki/*" "+trac.mydomain.com/ticket/*" "+trac.mydomain.com/raw-attachment/ticket/*" "+trac.mydomain.com/attachment/ticket/*" "+trac.mydomain.com/chrome/*" "-*?*" -v

What this does:

  • Starts at the login page and uses HTTPAUTH to gain access as a "test" user
  • Puts the output in a TracDump folder
  • Downloads all wiki pages and tickets, and their attachments
  • Avoids all other pages (notably the "logout" page)
  • Avoids any page with a ? in it (such as older versions, edit pages, diffs, etc.)
  • Rewrites all of their internal links to be relative

Problems:

  • It downloads wiki pages that are linked but don't exist (not a huge deal, IMO)

comment:13 Changed 7 years ago by Ryan J Ollos

See WikiToPdfPlugin

comment:14 in reply to:  13 Changed 7 years ago by Ryan J Ollos

Replying to rjollos:

See WikiToPdfPlugin

Correction, TracWikiToPdfPlugin.

comment:15 Changed 7 years ago by Adrian Fritz

There is also PageToOdtPlugin. With OpenOffice you can convert the result to MS Office format, edit it further, or export it to PDF.
