Modify

Opened 16 years ago

Last modified 5 years ago

#4743 new enhancement

Patch: intertrac support, include external html, remote authentication

Reported by: jholg Owned by:
Priority: normal Component: IncludeMacro
Severity: normal Keywords: intertrac
Cc: Trac Release: 0.11

Description

Hi,

I finally got round to working on IncludeMacro.

Attached proposal code for IncludeMacro supports:

  • intertrac support (also requested in ticket:1196 )
  • support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).
  • the possibility to select part of the included html result by using a genshi transformer xpath expression
  • if installed, BeautifulSoup will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

This should also fix ticket:2474 (non-standard http port).

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to

HTMLSanitizer.SAFE_TAGS|set(["html", "body"])

which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

I'm attaching this as a full source file as the changes to the original source are by now quite significant.

I'd love to see getting these capabilities into the official IncludeMacro. Let me know if I can help.

Holger

Attachments (7)

macros.py (15.2 KB) - added by jholg 16 years ago.
proposal includemacro code for intertrac support / external html inclusion
macros.py.patch (3.8 KB) - added by John Hanks 16 years ago.
Patch to enable include of authenticated websites.
macros.2.py (16.9 KB) - added by jholg 12 years ago.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration
macros.3.py (18.4 KB) - added by jholg 12 years ago.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument
macros.4.py (18.6 KB) - added by jholg 12 years ago.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument, plus using http header encoding ("charset") information
macros.4.py.patch (271 bytes) - added by morgana 9 years ago.
HTMLParser seems to need unicode in Trac 1.0.12
macros.4.py.2.patch (2.0 KB) - added by morgana 9 years ago.
HTMLParser seems to need unicode in Trac 1.0.12. Added support voor InterTrac links.

Download all attachments as: .zip

Change History (18)

Changed 16 years ago by jholg

Attachment: macros.py added

proposal includemacro code for intertrac support / external html inclusion

Changed 16 years ago by John Hanks

Attachment: macros.py.patch added

Patch to enable include of authenticated websites.

comment:1 Changed 16 years ago by John Hanks

I have attached a patch to your macro.py which enables support for authenticated websites. Although I may have broken processing the xpath argument (I don't use that so I didn't test it.)

griznog

comment:2 Changed 12 years ago by Steffen Hoffmann

Owner: changed from Noah Kantrowitz to Ryan J Ollos
Priority: highnormal
Summary: Patch: intertrac support, include external htmlPatch: intertrac support, include external html, remote authentication
Type: defectenhancement

#1196 has been closed as a duplicate of this ticket.

Other changes are on order to ensure current maintainers attention. Ultimately, let him choose priority for this feature.

comment:3 Changed 12 years ago by Steffen Hoffmann

#6879 refers to this ticket, because it has been created exclusively for adding remote authentication to this plugin.

comment:4 Changed 12 years ago by jholg

Just spotted the activity here so I'm attaching a newer version of my original patch. This basically adopts the basic auth support as provided by griznog plus options to control the HTMLSanitizers' "safe tags" and "safe attributes" in the macro config section.

Note that my own use case doesn't really expand to using the auth support so this part isn't well tested (by me).

This is of course still based on an ancient 0.11 revision of the original code; I've been using this for years now in our Trac intranet.

Holger

Changed 12 years ago by jholg

Attachment: macros.2.py added

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration

Changed 12 years ago by jholg

Attachment: macros.3.py added

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument

comment:5 Changed 12 years ago by jholg

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

Changed 12 years ago by jholg

Attachment: macros.4.py added

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument, plus using http header encoding ("charset") information

comment:6 in reply to:  5 Changed 12 years ago by jholg

Replying to jholg:

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

To better scratch my current itch this now hands the include source http header encoding information from the content-type charset to BeautifulSoup, if no explicit from_encoding is set (attachment:macros.4.py).

comment:7 Changed 12 years ago by Ryan J Ollos

I'll probably get around to reviewing this more quickly if you attach it as a unified diff. Thanks for the patch!

comment:8 in reply to:  description Changed 12 years ago by jholg

Hi,

please see the ticket description - I originally attached this as a full source file as the changes were quite substantial by the time I finished the functionality I needed.

That being said I can work on a patch against the latest trunk (or released version) if there are chances that this goes into the official version, and if there's agreement on the target functionality.

In particular:

Attached proposal code for IncludeMacro supports:

I can't remember anything that might be controversial about this.

  • support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).

Note that to actually include external html without setting render_unsafe_content you need to modify the "safe tags" settings of HTMLSanitizer. More explanations below.

  • the possibility to select part of the included html result by using a genshi transformer xpath expression

I think this is a nice optional add-on and doesn't add any security-relevant concerns in itself.

  • if installed, BeautifulSoup will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

While this introduces an (optional) dependency on BeautifulSoup I found that more often than not included external (x)html is pretty broken and needs fixup. Also, sometimes Trac plugins break xhtml conformance. This is an easy way to mostly cover such issues.

This should also fix ticket:2474 (non-standard http port).

Probably not worth discussing.

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to

HTMLSanitizer.SAFE_TAGS|set(["html", "body"])

which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

As mentioned above I guess this is a point that warrants discussion. As I see things there's basically 2 things that can be done if you want to include external html pages:

  1. Set render_unsafe_content to True. Undesirable as a general setting imho.
  2. Do what I propose here. I am neither a Trac nor an http expert but I fail to see any real security concerns; the original macro author unfortunately never commented on that.

Holger

comment:9 Changed 11 years ago by Ryan J Ollos

Status: newassigned

Changed 9 years ago by morgana

Attachment: macros.4.py.patch added

HTMLParser seems to need unicode in Trac 1.0.12

comment:10 Changed 9 years ago by Ryan J Ollos

Please submit patches in unified diff format. See trac:TracDev/SubmittingPatches for more info.

Changed 9 years ago by morgana

Attachment: macros.4.py.2.patch added

HTMLParser seems to need unicode in Trac 1.0.12. Added support voor InterTrac links.

comment:11 Changed 5 years ago by Ryan J Ollos

Owner: Ryan J Ollos deleted
Status: assignednew

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.