
Opened 17 years ago

Closed 7 years ago

#1535 closed enhancement (worksforme)

Export Wiki pages to HTML

Reported by: Igor
Owned by: anybody
Priority: low
Component: Request-a-Hack
Severity: minor
Keywords:
Cc:
Trac Release: 0.10

Description

Ability to export a set of Wiki pages to HTML files linked to each other. This is needed when we create documentation for our project and then want to include it in the project's distribution, or when we want to build a CHM file from a set of Wiki pages. So it would be practical to have a script that provides this functionality.

Also it would be great to have some additional functionality:

  • export to a single document (preserving link jumps)
  • export of a whole page tree: we specify only one page as an input parameter to the script, and it exports all pages linked from that page, recursing into the linked pages in the same way.

Attachments (0)

Change History (17)

comment:1 Changed 17 years ago by Noah Kantrowitz

Resolution: worksforme
Status: new → closed

You mean like the CombineWikiPlugin?

comment:2 Changed 17 years ago by rupert thurner

Resolution: worksforme
Status: closed → reopened

Does CombineWikiPlugin allow HTML output? I only saw PDF ...

comment:3 Changed 17 years ago by rupert thurner

see also #t3332

comment:4 Changed 17 years ago by Noah Kantrowitz

There is quite clearly an HTML output format (in addition to PostScript, PDF, and TiddlyWiki). If you do not see it, please send me a screenshot of the admin page.

comment:5 Changed 17 years ago by anonymous

Priority: normal → highest

comment:6 Changed 17 years ago by anonymous

CombineWikiPlugin has a lot of bugs, it does not support exporting the sub-linked pages, and it does not translate hyperlinks (to local ones).

comment:7 in reply to:  description ; Changed 16 years ago by anonymous

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

comment:8 in reply to:  7 Changed 16 years ago by anonymous

Priority: highest → low
Severity: blocker → minor

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

This is just a quick attempt to make a whole Trac viewable off-line with proper CSS, logos, and even change history preserved (of course you should change the URL to whatever fits your setup):

wget -m -k -K -E http://www.example.com/cgi-bin/trac.cgi

Judicious use of wget options (see man wget) will export the site for static viewing on a different web server, etc.

Since there is a proper way to get the requested functionality, I'll lower priority and severity to "low".

comment:9 in reply to:  7 Changed 16 years ago by Christian Aust

Replying to anonymous:

Replying to tkachenko.igor@gmail.com:

Ability to export set of Wiki pages to HTML files linked to each other.

Isn't this exactly what wget has been doing for ages?

  • You can point it to a single URL and it will follow every link within that page
    • ...up to some defined tree depth
    • ...following up or not through external servers
    • ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
    • ...and much more.

While wget certainly does a good job grabbing stuff from an HTTP server, it's not particularly well suited for this task. We've run into this issue before, because we needed to convert a range of wiki pages to MS Word to include them in documentation sent to external customers.

  • Just copying/pasting the formatted text doesn't work well, because Word tries to import all formatting in terms of fonts, colors, indents and whatnot. It surely breaks your layout, or will be a nightmare to clean up.
  • Copying/pasting it as "text only" loses all structured elements like headers and lists. Again, hard to clean up.

We found that working with the XMLRPC plugin gave us a means to extract pure HTML versions of any wiki page, using only plain HTML structure elements like H1, OL or LI. No formatting, no CSS, no navigation bar, no scripts.

We opened those files in MS Word, which gave us a barely formatted but well structured Word document that pasted cleanly into templates containing elaborate style sheets. Even embedded images were still in place, because they used relative URLs and we inserted BASE tags in the resulting HTML files.
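
A minimal sketch of that XML-RPC extraction step, for illustration only. It assumes the XmlRpcPlugin is installed, that its WikiRPC method wiki.getPageHTML is exposed at <project URL>/login/xmlrpc, and that the host, project path and credentials below are placeholders:

#!/usr/bin/env python3
# Fetch the rendered HTML body of a single wiki page over Trac's XML-RPC interface.
import xmlrpc.client

# Placeholder URL: adjust host, project path and credentials; the
# /login/xmlrpc endpoint is the usual XmlRpcPlugin location but may differ.
TRAC_URL = "https://user:password@trac.example.com/project/login/xmlrpc"

server = xmlrpc.client.ServerProxy(TRAC_URL)

page_name = "WikiStart"
html = server.wiki.getPageHTML(page_name)  # rendered body only, no site chrome

with open(page_name + ".html", "w", encoding="utf-8") as out:
    out.write(html)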

comment:10 Changed 16 years ago by anonymous

is there any code you could share?

comment:11 in reply to:  10 Changed 16 years ago by anonymous

Replying to anonymous:

is there any code you could share?

We needed a Win32 GUI version of that tool, since the people using it weren't familiar with command lines in general. That's why I hacked up something using the Realbasic IDE. Unfortunately, I don't have a license of RB available anymore, so I can't compile a working version (I might still have the source code, but I'm not sure).

Then again, it's not hard at all to do it again. XMLRPC is pretty basic and well supported; it shouldn't be too much work to cook up a version using Java or something else.

Then again, is it really necessary? What people need is a ZIP archive containing a number of simple HTML files. It should be possible to do this as a Trac plugin with a standard export which (optionally) follows all wiki links up to a given depth. Then under each wiki page you'd have some "Export to HTML archive" link; clicking it would generate and download a ZIP file containing the pages. I'd love to see that.
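
A rough sketch of that idea as a standalone script rather than a plugin, again for illustration only. It assumes the same XmlRpcPlugin endpoint and wiki.getPageHTML method as in the earlier sketch; the endpoint URL, the /project base path in the link pattern, and the depth limit are placeholders, and links inside the exported HTML are not rewritten to point at the local files:

#!/usr/bin/env python3
# Starting from one wiki page, follow wiki links up to a given depth and
# pack the rendered pages into a ZIP archive.
import re
import xmlrpc.client
import zipfile
from urllib.parse import unquote

# Placeholders: adjust host, project path, credentials, start page and depth.
TRAC_URL = "https://user:password@trac.example.com/project/login/xmlrpc"
START_PAGE = "WikiStart"
MAX_DEPTH = 2

server = xmlrpc.client.ServerProxy(TRAC_URL)
# Links to other wiki pages in the rendered HTML; adjust the base path.
wiki_link = re.compile(r'href="/project/wiki/([^"#?]+)"')

def export(start, depth, archive):
    seen = set()
    queue = [(start, 0)]
    while queue:
        name, level = queue.pop(0)
        if name in seen or level > depth:
            continue
        seen.add(name)
        try:
            html = server.wiki.getPageHTML(name)
        except xmlrpc.client.Fault:
            continue  # linked page does not exist
        archive.writestr(name.replace("/", "_") + ".html", html)
        for linked in wiki_link.findall(html):
            queue.append((unquote(linked), level + 1))

with zipfile.ZipFile("wiki-export.zip", "w") as zf:
    export(START_PAGE, MAX_DEPTH, zf)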

comment:12 Changed 15 years ago by anonymous

Following the advice of "Gui-" on #trac, I started messing around with httrack to do this. The command I came up with that does an excellent job is this:

httrack "http://test:XXXXXXXX@trac.mydomain.com/login" -O TracDump "-*" "+trac.mydomain.com/wiki/*" "+trac.mydomain.com/raw-attachment/wiki/*" "+trac.mydomain.com/attachment/wiki/*" "+trac.mydomain.com/ticket/*" "+trac.mydomain.com/raw-attachment/ticket/*" "+trac.mydomain.com/attachment/ticket/*" "+trac.mydomain.com/chrome/*" "-*?*" -v

What this does:

  • Starts at the login page and uses HTTPAUTH to gain access as a "test" user
  • Puts the output in a TracDump folder
  • Downloads all wiki pages and tickets, and their attachments
  • Avoids all other pages (notably the "logout" page)
  • Avoids any page with a ? in it (such as older versions, edit pages, diffs, etc.)
  • Rewrites all of their internal links to be relative

Problems:

  • It downloads wiki pages that are linked but don't exist (not a huge deal, IMO)

comment:13 Changed 15 years ago by Ryan J Ollos

See WikiToPdfPlugin

comment:14 in reply to:  13 Changed 15 years ago by Ryan J Ollos

Replying to rjollos:

See WikiToPdfPlugin

Correction, TracWikiToPdfPlugin.

comment:15 Changed 14 years ago by Adrian Fritz

There is also PageToOdtPlugin. With OpenOffice you can convert the output to MS Office format, edit it further, or export it to PDF.

comment:16 Changed 7 years ago by anonymous

just use wget

I did try using wget. I used these options, as they're more informative than the single-character shorthand:

wget --mirror --convert-links --adjust-extension https://trac-dev.yougov.net --http-user=myusername --http-password=$(read -s -p password: password; echo $password)

I didn't pass --backup-converted, as I wasn't concerned with overwrite issues since I was extracting to a clean directory.

And while this technique does download the pages, it doesn't come close to what we need - which is to download the content page-by-page, such that it might be imported into another system. Using wget causes all of the header/footer content to be included, which is undesirable when trying to export the pages.

I'll be working on an RPC-based solution and will publish it in this SO question.
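
A sketch of what such a page-by-page RPC export could look like, for illustration only. It assumes the XmlRpcPlugin is installed on the target Trac, that its methods wiki.getAllPages, wiki.getPage and wiki.getPageHTML are exposed at <project URL>/login/xmlrpc, and that the URL and credentials below are placeholders:

#!/usr/bin/env python3
# Dump every wiki page's raw markup and rendered HTML to local files so the
# content can be imported into another system, without any site chrome.
import os
import xmlrpc.client

# Placeholders: adjust host, project path and credentials.
TRAC_URL = "https://myusername:password@trac.example.com/login/xmlrpc"
OUT_DIR = "trac-export"

server = xmlrpc.client.ServerProxy(TRAC_URL)
os.makedirs(OUT_DIR, exist_ok=True)

for name in server.wiki.getAllPages():
    safe = name.replace("/", "_")
    with open(os.path.join(OUT_DIR, safe + ".txt"), "w", encoding="utf-8") as f:
        f.write(server.wiki.getPage(name))      # raw wiki markup
    with open(os.path.join(OUT_DIR, safe + ".html"), "w", encoding="utf-8") as f:
        f.write(server.wiki.getPageHTML(name))  # rendered body only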

comment:17 Changed 7 years ago by Ryan J Ollos

Resolution: worksforme
Status: reopened → closed
