# Ticket #4743 (new enhancement)

Opened 4 years ago

## Patch: intertrac support, include external html, remote authentication

Reported by: Assigned to: jholg rjollos normal IncludeMacro normal intertrac 0.11

### Description

Hi,

I finally got round to working on IncludeMacro.

Attached proposal code for IncludeMacro supports:

• intertrac support (also requested in ticket:1196 )
• support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).
• the possibility to select part of the included html result by using a genshi transformer xpath expression
• if installed, BeautifulSoup? will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

This should also fix ticket:2474 (non-standard http port).

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to

HTMLSanitizer.SAFE_TAGS|set(["html", "body"])


which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

I'm attaching this as a full source file as the changes to the original source are by now quite significant.

I'd love to see getting these capabilities into the official IncludeMacro. Let me know if I can help.

Holger

## Change History

### 03/04/09 18:17:45 changed by jholg

proposal includemacro code for intertrac support / external html inclusion

### 04/29/09 18:03:24 changed by griznog

Patch to enable include of authenticated websites.

### 04/29/09 18:04:06 changed by griznog

I have attached a patch to your macro.py which enables support for authenticated websites. Although I may have broken processing the xpath argument (I don't use that so I didn't test it.)

### 12/13/12 20:57:23 changed by hasienda

• priority changed from high to normal.
• owner changed from coderanger to rjollos.
• type changed from defect to enhancement.
• summary changed from Patch: intertrac support, include external html to Patch: intertrac support, include external html, remote authentication.

#1196 has been closed as a duplicate of this ticket.

Other changes are on order to ensure current maintainers attention. Ultimately, let him choose priority for this feature.

### 12/13/12 21:39:52 changed by hasienda

#6879 refers to this ticket, because it has been created exclusively for adding remote authentication to this plugin.

### 12/20/12 10:56:08 changed by jholg

Just spotted the activity here so I'm attaching a newer version of my original patch. This basically adopts the basic auth support as provided by griznog plus options to control the HTMLSanitizers' "safe tags" and "safe attributes" in the macro config section.

Note that my own use case doesn't really expand to using the auth support so this part isn't well tested (by me).

This is of course still based on an ancient 0.11 revision of the original code; I've been using this for years now in our Trac intranet.

Holger

### 12/20/12 10:58:00 changed by jholg

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration

### 12/20/12 15:06:50 changed by jholg

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument

### (follow-up: ↓ 6 ) 12/20/12 15:14:02 changed by jholg

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

### 12/21/12 13:22:23 changed by jholg

Updated version of (intertrac- and external html supporting) macros.py, with basic auth and config control over HTMLSanitizer configuration, plus optional encoding hint argument, plus using http header encoding ("charset") information

### (in reply to: ↑ 5 ) 12/21/12 13:30:35 changed by jholg

And while I'm at it attachment:macros.3.py adds a parameter to manually set an encoding hint for the BeautifulSoup HTML cleanup, for situations when the default detection of the included source encoding doesn't work properly.

To better scratch my current itch this now hands the include source http header encoding information from the content-type charset to BeautifulSoup?, if no explicit from_encoding is set (attachment:macros.4.py).

### 12/26/12 11:01:02 changed by rjollos

I'll probably get around to reviewing this more quickly if you attach it as a unified diff. Thanks for the patch!

### (in reply to: ↑ description ) 12/26/12 14:12:08 changed by jholg

Hi,

please see the ticket description - I originally attached this as a full source file as the changes were quite substantial by the time I finished the functionality I needed.

That being said I can work on a patch against the latest trunk (or released version) if there are chances that this goes into the official version, and if there's agreement on the target functionality.

In particular:

Attached proposal code for IncludeMacro supports: * intertrac support (also requested in ticket:1196)

* support of actual inclusion of remote html pages, with conversion of relative or server-local absolute links to full absolute links (fixes ticket:3979).

Note that to actually include external html without setting render_unsafe_content you need to modify the "safe tags" settings of HTMLSanitizer. More explanations below.

* the possibility to select part of the included html result by using a genshi transformer xpath expression

I think this is a nice optional add-on and doesn't add any security-relevant concerns in itself.

* if installed, BeautifulSoup will be used to clean up messy included html. This is regularly the case for external html, but it also often happens that Trac macros/plugins break Trac's xhtml-validity.

While this introduces an (optional) dependency on BeautifulSoup I found that more often than not included external (x)html is pretty broken and needs fixup. Also, sometimes Trac plugins break xhtml conformance. This is an easy way to mostly cover such issues.

This should also fix ticket:2474 (non-standard http port).

Probably not worth discussing.

Regarding the 2nd point: The actual inclusion of remote sources is not possible in the original IncludeMacro, only html snippets can be used (unless render_unsafe_content is set to true). My version therefore changes the HTMLSanitizer instantiation to {{{ HTMLSanitizer.SAFE_TAGS|set(["html", "body"]) }}} which allows inclusion of "full" external html. Without allowing html & body, no (x)html documents will ever be rendered if render_unsafe_content is off, only (x)html snippets. Does it hurt to allow body to be parsed? I don't think so, as already non xhtml-valid inclusion may render due to the parser not yielding events in document order. All offensive stuff inside body will be filtered out nonetheless.

As mentioned above I guess this is a point that warrants discussion. As I see things there's basically 2 things that can be done if you want to include external html pages:

1. Set render_unsafe_content to True. Undesirable as a general setting imho.
2. Do what I propose here. I am neither a Trac nor an http expert but I fail to see any real security concerns; the original macro author unfortunately never commented on that.

Holger

### Add/Change #4743 (Patch: intertrac support, include external html, remote authentication)

Change Properties