Opened 18 years ago
Closed 7 years ago
#1535 closed enhancement (worksforme)
Export Wiki pages to HTML
Reported by: | Igor | Owned by: | anybody |
---|---|---|---|
Priority: | low | Component: | Request-a-Hack |
Severity: | minor | Keywords: | |
Cc: | | Trac Release: | 0.10 |
Description
Ability to export a set of Wiki pages to HTML files linked to each other. This is needed when we create documentation for our project and want to include it in the project distribution, or when we want to build a CHM file from a set of Wiki pages. So it would be more practical to have a script that provides this functionality.
Also it would be great to have some additional functionality:
- export to a single document (preserving link jumps)
- export of a whole page tree: we specify only one page as an input parameter, and the script exports all pages linked from that page, recursing into the linked pages as well.
Attachments (0)
Change History (17)
comment:1 Changed 18 years ago by
Resolution: | → worksforme |
---|---|
Status: | new → closed |
comment:2 Changed 18 years ago by
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Does CombineWikiPlugin allow HTML output? I only saw PDF ...
comment:4 Changed 18 years ago by
There is quite clearly an HTML output format (in addition to PostScript, PDF, and TiddlyWiki). If you do not see it, please send me a screenshot of the admin page.
comment:5 Changed 17 years ago by
Priority: | normal → highest |
---|
comment:6 Changed 17 years ago by
CombineWikiPlugin has a lot of bugs, it does not support exporting the sub-linked pages, and it does not translate hyperlinks (to local ones).
comment:7 follow-ups: 8 9 Changed 17 years ago by
Replying to tkachenko.igor@gmail.com:
Ability to export set of Wiki pages to HTML files linked to each other.
Isn't this exactly what wget has been doing for ages?
- You can point it to a single URL and it will follow every link within that page
- ...up to some defined tree depth
- ...following up or not through external servers
- ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
- ...and much more.
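For illustration, a minimal sketch of those options with a placeholder URL and depth (add -H if you also want to follow links to external hosts; see man wget for exact semantics):
wget -r -l 3 -np -nd -k -E http://www.example.com/wiki/WikiStart
Here -r/-l control the recursive crawl and its depth, -np stays below the starting page, -nd collapses everything into one directory, and -k/-E rewrite links and file extensions for local viewing.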
comment:8 Changed 17 years ago by
Priority: | highest → low |
---|---|
Severity: | blocker → minor |
Replying to anonymous:
Replying to tkachenko.igor@gmail.com:
Ability to export set of Wiki pages to HTML files linked to each other.
Isn't this exactly what wget has been doing for ages?
This is just a quick attempt to make a whole Trac viewable off-line with proper CSS and logos; even the change history is preserved (of course you should change the URL to whatever fits your setup):
wget -m -k -K -E http://www.example.com/cgi-bin/trac.cgi
Wise use of wget options (man wget) will export the site for static view on a different web server, etc.
Since there is a proper way to get the requested functionality, I'll lower the priority and severity to "low".
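For readability, the same command with long options (in wget 1.12 and later -E spells --adjust-extension; older releases call it --html-extension):
wget --mirror --convert-links --backup-converted --adjust-extension http://www.example.com/cgi-bin/trac.cgi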
comment:9 Changed 16 years ago by
Replying to anonymous:
Replying to tkachenko.igor@gmail.com:
Ability to export set of Wiki pages to HTML files linked to each other.
Isn't this exactly what wget has been doing for ages?
- You can point it to a single URL and it will follow every link within that page
- ...up to some defined tree depth
- ...following up or not through external servers
- ...retaining directory structure or collapsing it to a single directory and rewriting page links accordingly
- ...and much more.
While wget certainly does a good job grabbing stuff from an HTTP server, it's not particularly well suited for this task. We've had that issue before, because we needed to convert a range of wiki pages to MS Word to include them in documentation sent to external customers.
- Just copy/pasting the formatted text doesn't work well, because Word tries to import all formatting in terms of fonts, colors, indents and whatnot. It surely breaks your layout, or will be a nightmare to clean up.
- Copy/pasting it as "text only" will lose all structured elements like headers and lists. Again, hard to clean up.
We found that working with the XMLRPC plugin gave us a means to extract pure HTML versions of any wiki page, using only plain HTML structural elements like H1, OL or LI. No formatting, no CSS, no navigation bar, no scripts.
We opened those files in MS Word, which gave us a barely formatted but well structured Word document that played well when pasted into templates containing elaborate style sheets. Even embedded images were still in place, because they use relative URLs and we had inserted BASE tags in the resulting HTML files.
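A minimal sketch of that extraction step, assuming Trac's XmlRpcPlugin is enabled and exposed at /login/xmlrpc; the host name, credentials, and output directory below are placeholders, not the tool described above:

# Sketch: fetch body-only HTML renderings of all wiki pages via the XmlRpcPlugin.
# Endpoint path, host, and credentials are assumptions -- adjust to your Trac.
import os
import xmlrpc.client

endpoint = "https://user:secret@trac.example.com/login/xmlrpc"  # placeholder
server = xmlrpc.client.ServerProxy(endpoint)

os.makedirs("export", exist_ok=True)
for name in server.wiki.getAllPages():
    html = server.wiki.getPageHTML(name)  # plain structural HTML: H1, OL, LI, ...
    out = os.path.join("export", name.replace("/", "_") + ".html")
    with open(out, "w", encoding="utf-8") as f:
        # A <base> tag keeps relative image/attachment URLs resolvable,
        # mirroring the BASE trick described above.
        f.write('<html><head><base href="https://trac.example.com/"></head><body>\n')
        f.write(html)
        f.write("\n</body></html>\n")

The resulting files open cleanly in Word and carry no Trac chrome, only document structure.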
comment:11 Changed 16 years ago by
Replying to anonymous:
is there any code you could share?
We needed a Win32 GUI version of that tool, since the people using it weren't familiar with command lines in general. That's why I hacked up something using the Realbasic IDE. Unfortunately, I don't have a license for RB available anymore, so I can't compile a working version (I might still have the source code, but I'm not sure).
Then again, it's not hard at all to do it again. XML-RPC is pretty basic and well supported; it shouldn't be too much work to cook up a version in Java or another language.
Then again, is it really necessary? What people need is a ZIP archive containing a number of simple HTML files. It should be possible to do this as a Trac plugin with a standard export that (optionally) follows all wiki links up to a given depth. Then under each wiki page you'd have some "Export to HTML archive" link; clicking it would generate and download a ZIP file containing the pages. I'd love to see that.
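A rough sketch of that idea as a standalone script rather than a plugin, assuming the same XmlRpcPlugin endpoint as above; the host URL and the "/wiki/" link prefix used to detect internal links are assumptions:

# Sketch: follow wiki links up to a given depth over XML-RPC and write the
# rendered pages into a ZIP archive (the proposed "export to HTML archive").
import re
import xmlrpc.client
import zipfile
from urllib.parse import unquote

server = xmlrpc.client.ServerProxy("https://trac.example.com/login/xmlrpc")  # placeholder
WIKI_LINK = re.compile(r'href="/wiki/([^"#?]+)"')  # assumed internal-link prefix

def export_tree(start_page, max_depth, zip_path):
    seen, queue = set(), [(start_page, 0)]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        while queue:
            name, depth = queue.pop(0)
            if name in seen:
                continue
            seen.add(name)
            try:
                html = server.wiki.getPageHTML(name)  # body-only HTML
            except xmlrpc.client.Fault:
                continue  # linked page does not exist
            zf.writestr(name.replace("/", "_") + ".html", html)
            if depth < max_depth:
                for linked in WIKI_LINK.findall(html):
                    queue.append((unquote(linked), depth + 1))

export_tree("WikiStart", max_depth=2, zip_path="wiki-export.zip")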
comment:12 Changed 16 years ago by
Following the advice of "Gui-" on #trac, I started messing around with httrack
to do this. The command I came up with that does an excellent job is this:
httrack "http://test:XXXXXXXX@trac.mydomain.com/login" -O TracDump "-*" "+trac.mydomain.com/wiki/*" "+trac.mydomain.com/raw-attachment/wiki/*" "+trac.mydomain.com/attachment/wiki/*" "+trac.mydomain.com/ticket/*" "+trac.mydomain.com/raw-attachment/ticket/*" "+trac.mydomain.com/attachment/ticket/*" "+trac.mydomain.com/chrome/*" "-*?*" -v
What this does:
- Starts at the login page and uses HTTPAUTH to gain access as a "test" user
- Puts the output in a TracDump folder
- Downloads all wiki pages and tickets, and their attachments
- Avoids all other pages (notably the "logout" page)
- Avoids any page with a ? in it (such as older versions, edit pages, diffs, etc.)
- Rewrites all of their internal links to be relative
Problems:
- It downloads wiki pages that are linked but don't exist (not a huge deal, IMO)
comment:14 Changed 15 years ago by
comment:15 Changed 15 years ago by
There is also PageToOdtPlugin. With OpenOffice you can convert the result to an MS Office format, edit it further, or export it to PDF.
comment:16 Changed 7 years ago by
just use wget
I did try using wget. I used these options, as they're more informative than the single-character shorthand:
wget --mirror --convert-links --adjust-extension https://trac-dev.yougov.net --http-user=myusername --http-password=$(read -s -p password: password; echo $password)
I didn't pass --backup-converted, as I didn't want to deal with overwrite issues since I was extracting to a clean directory.
And while this technique does download the pages, it doesn't come close to what we need - which is to download the content page-by-page, such that it might be imported into another system. Using wget causes all of the header/footer content to be included, which is undesirable when trying to export the pages.
I'll be working on an RPC-based solution and will publish it in this SO question.
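A minimal sketch of such an RPC approach, assuming the XmlRpcPlugin is installed and exposed at /login/xmlrpc (host and credentials below are placeholders); wiki.getPage returns the raw wiki markup, so none of the header/footer chrome comes along:

# Sketch: page-by-page export of raw wiki markup, suitable as input for
# conversion or import into another system. Endpoint URL is a placeholder.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("https://user:secret@trac.example.com/login/xmlrpc")
for name in server.wiki.getAllPages():
    text = server.wiki.getPage(name)  # raw wiki markup, no Trac chrome
    with open(name.replace("/", "_") + ".wiki", "w", encoding="utf-8") as f:
        f.write(text)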
comment:17 Changed 7 years ago by
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
You mean like the CombineWikiPlugin?