Ticket #4743 (new enhancement)

Opened 4 years ago

Last modified 5 months ago

Patch: intertrac support, include external html, remote authentication

Reported by: jholg Assigned to: rjollos
Priority: normal Component: IncludeMacro
Severity: normal Keywords: intertrac
Cc: Trac Release: 0.11

Description

Hi,

I finally got round to working on IncludeMacro.

Attached proposal code for IncludeMacro supports:

  • intertrac support (also requested in ticket:1196 )
  • support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).
  • the possibility to select part of the included html result by using a genshi transformer xpath expression
  • if installed, BeautifulSoup? will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

This should also fix ticket:2474 (non-standard http port).

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to

HTMLSanitizer.SAFE_TAGS|set(["html", "body"])

which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

I'm attaching this as a full source file as the changes to the original source are by now quite significant.

I'd love to see getting these capabilities into the official IncludeMacro. Let me know if I can help.

Holger

Attachments

macros.py (15.2 kB) - added by jholg on 03/04/09 18:17:45.
proposal includemacro code for intertrac support / external html inclusion
macros.py.patch (3.8 kB) - added by griznog on 04/29/09 18:03:24.
Patch to enable include of authenticated websites.
macros.2.py (16.9 kB) - added by jholg on 12/20/12 10:58:00.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration
macros.3.py (18.4 kB) - added by jholg on 12/20/12 15:06:50.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument
macros.4.py (18.6 kB) - added by jholg on 12/21/12 13:22:23.
Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument, plus using http header encoding ("charset") information

Change History

03/04/09 18:17:45 changed by jholg

  • attachment macros.py added.

proposal includemacro code for intertrac support / external html inclusion

04/29/09 18:03:24 changed by griznog

  • attachment macros.py.patch added.

Patch to enable include of authenticated websites.

04/29/09 18:04:06 changed by griznog

I have attached a patch to your macro.py which enables support for authenticated websites. Although I may have broken processing the xpath argument (I don't use that so I didn't test it.)

griznog

12/13/12 20:57:23 changed by hasienda

  • priority changed from high to normal.
  • owner changed from coderanger to rjollos.
  • type changed from defect to enhancement.
  • summary changed from Patch: intertrac support, include external html to Patch: intertrac support, include external html, remote authentication.

#1196 has been closed as a duplicate of this ticket.

Other changes are on order to ensure current maintainers attention. Ultimately, let him choose priority for this feature.

12/13/12 21:39:52 changed by hasienda

#6879 refers to this ticket, because it has been created exclusively for adding remote authentication to this plugin.

12/20/12 10:56:08 changed by jholg

Just spotted the activity here so I'm attaching a newer version of my original patch. This basically adopts the basic auth support as provided by griznog plus options to control the HTMLSanitizers' "safe tags" and "safe attributes" in the macro config section.

Note that my own use case doesn't really expand to using the auth support so this part isn't well tested (by me).

This is of course still based on an ancient 0.11 revision of the original code; I've been using this for years now in our Trac intranet.

Holger

12/20/12 10:58:00 changed by jholg

  • attachment macros.2.py added.

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration

12/20/12 15:06:50 changed by jholg

  • attachment macros.3.py added.

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument

(follow-up: ↓ 6 ) 12/20/12 15:14:02 changed by jholg

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

12/21/12 13:22:23 changed by jholg

  • attachment macros.4.py added.

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument, plus using http header encoding ("charset") information

(in reply to: ↑ 5 ) 12/21/12 13:30:35 changed by jholg

Replying to jholg:

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

To better scratch my current itch this now hands the include source http header encoding information from the content-type charset to BeautifulSoup?, if no explicit from_encoding is set (attachment:macros.4.py).

12/26/12 11:01:02 changed by rjollos

I'll probably get around to reviewing this more quickly if you attach it as a unified diff. Thanks for the patch!

(in reply to: ↑ description ) 12/26/12 14:12:08 changed by jholg

Hi,

please see the ticket description - I originally attached this as a full source file as the changes were quite substantial by the time I finished the functionality I needed.

That being said I can work on a patch against the latest trunk (or released version) if there are chances that this goes into the official version, and if there's agreement on the target functionality.

In particular:

Attached proposal code for IncludeMacro supports: * intertrac support (also requested in ticket:1196)

I can't remember anything that might be controversial about this.

* support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).

Note that to actually include external html without setting render_unsafe_content you need to modify the "safe tags" settings of HTMLSanitizer. More explanations below.

* the possibility to select part of the included html result by using a genshi transformer xpath expression

I think this is a nice optional add-on and doesn't add any security-relevant concerns in itself.

* if installed, BeautifulSoup will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

While this introduces an (optional) dependency on BeautifulSoup I found that more often than not included external (x)html is pretty broken and needs fixup. Also, sometimes Trac plugins break xhtml conformance. This is an easy way to mostly cover such issues.

This should also fix ticket:2474 (non-standard http port).

Probably not worth discussing.

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to {{{ HTMLSanitizer.SAFE_TAGS|set(["html", "body"]) }}} which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

As mentioned above I guess this is a point that warrants discussion. As I see things there's basically 2 things that can be done if you want to include external html pages:

  1. Set render_unsafe_content to True. Undesirable as a general setting imho.
  2. Do what I propose here. I am neither a Trac nor an http expert but I fail to see any real security concerns; the original macro author unfortunately never commented on that.

Holger


Add/Change #4743 (Patch: intertrac support, include external html, remote authentication)




Change Properties
Action