Opened 7 years ago
Closed 7 years ago
#13243 closed defect (fixed)
Some inline images / attachments can not be viewed due to %0D%0A in URL
Reported by: | anonymous | Owned by: | lucid |
---|---|---|---|
Priority: | normal | Component: | MailArchivePlugin |
Severity: | normal | Keywords: | |
Cc: | Trac Release: |
Description
One email had an inline image (in HTML source <img src="cid:part1.6D8E6A6C.5AFB627A@gmail.com" alt="">
).
It is shown in the plugin as an attachment (part1.6D8E6A6C.5AFB627A@gmail.com.png; name=nkdjpgmpnpoepckd.png (52.4 KB) - added by (none) 7 minutes ago.
).
But clicking the link (https://.../trac/attachment/mailarchive/881/part1.6D8E6A6C.5AFB627A%40gmail.com.png%3B%0D%0A%20name%3Dnkdjpgmpnpoepckd.png
) leads to an error:
Bad Request - Invalid URL HTTP Error 400. The request URL is invalid.
The file is on disk in the tracenv/files/...
folder.
The problem seems to be the %0D%0A
part in the URL. (Maybe it's IIS which does not allow that?)
Attachments (0)
Change History (9)
comment:1 Changed 7 years ago by
comment:2 Changed 7 years ago by
Or better use something like this:
def parse_mime_type(mime_type): """Parse a mime-type into its parts. Example: 'image/png; name="a.png"' becomes: ('image/png', {'name': '"a.png"'}) """ parts = mime_type.split(';') type = parts[0].strip() params = dict([tuple([s.strip() for s in param.split('=', 1)]) for param in parts[1:]]) return (type, params)
and then:
mimetype, mimetype_params = parse_mime_type(mimetype) if "name" in mimetype_params: filename = mimetype_params["name"] else: # ...
comment:4 Changed 7 years ago by
We could simply use part.get_content_type()
rather than mimetype = part.get('Content-Type')
and parsing.
>>> from email import message_from_string >>> m = message_from_string("""From: <x> ... Content-Type: image/png;\n name="nkdjpgmpnpoepckd.png" ... ... body ... """) >>> m.get('Content-Type') 'image/png;\n name="nkdjpgmpnpoepckd.png"' >>> m.get_content_type() 'image/png'
Also, filename of attachment cannot have control characters. Those characters would be trimmed by trac.attachment
module.
comment:5 follow-up: 6 Changed 7 years ago by
Thanks, I was not aware. So we can use this to get the intended filename (nkdjpgmpnpoepckd.png
):
filename = part.get_param('name') if not filename: mimetype = part.get_content_type() ext = EXT_MAP.get(mimetype) # if ... else ... filename = cid + '.' + ext
Also, filename of attachment cannot have control characters. Those characters would be trimmed by
trac.attachment
module.
Are you sure? Unfortunately it seems to me filename is not trimmed automatically, but used as-is, leading to this problem.
comment:6 Changed 7 years ago by
Replying to anonymous:
Thanks, I was not aware. So we can use this to get the intended filename (
nkdjpgmpnpoepckd.png
):filename = part.get_param('name') if not filename: mimetype = part.get_content_type() ext = EXT_MAP.get(mimetype) # if ... else ... filename = cid + '.' + ext
We could use get_filename() to retrieve attached filename.
>>> m1 = message_from_string("""\ ... Content-Type: image/png; name="file-1.png" ... Content-Transfer-Encoding: base64 ... Content-ID: <image-0.png> ... Content-Disposition: inline; filename="file-1.png" ... """) >>> m1.get_filename() 'file-1.png' >>> >>> m2 = message_from_string("""\ ... Content-Transfer-Encoding: base64 ... Content-ID: <image-0.png> ... Content-Disposition: inline; filename="file-2.png" ... """) >>> m2.get_filename() 'file-2.png' >>> >>> m3 = message_from_string("""\ ... Content-Type: image/png; name="file-3.png" ... Content-Transfer-Encoding: base64 ... Content-ID: <image-0.png> ... """) >>> m3.get_filename() 'file-3.png' >>> >>> m4 = message_from_string("""\ ... Content-Type: image/png ... Content-Transfer-Encoding: base64 ... Content-ID: <image-0.png> ... """) >>> m4.get_filename() is None True
Also, filename of attachment cannot have control characters. Those characters would be trimmed by
trac.attachment
module.Are you sure? Unfortunately it seems to me filename is not trimmed automatically, but used as-is, leading to this problem.
Sorry for my misunderstanding. Adding an attachment via trac-admin and web ui, Trac replaces control characters with white spaces since 1.2. See trac:r13603.
Root cause of unable to view attachment named with LF characters is that .
in regular expression doesn't match LF in req.path_info
.
match = re.match(r'/(raw-|zip-)?attachment/([^/]+)(?:/(.*))?$', req.path_info)
See also:
- https://groups.google.com/d/topic/trac-dev/fl-36eCmuYQ/discussion
- trac:source:/tags/trac-1.2.2/trac/attachment.py@:164-165#L163
I think this plugin should normalize filename with the same logic in Trac. See trac:source:/tags/trac-1.2.2/trac/attachment.py@:1127-1141#L1127.
comment:7 Changed 7 years ago by
Thanks for all the research! So maybe something like this:
-
mailarchive/model.py
diff -r 7c0856815e45 mailarchive/model.py
a b 5 5 from email.header import decode_header 6 6 from email.utils import parsedate_tz, mktime_tz 7 7 from tempfile import TemporaryFile 8 import unicodedata 9 import re 8 10 9 11 from trac.attachment import Attachment 10 12 from trac.db import Table, Column, Index 11 13 from trac.mimeview.api import KNOWN_MIME_TYPES 12 14 from trac.resource import Resource 13 15 from trac.util.datefmt import from_utimestamp, to_utimestamp, utc 16 from trac.util.text import stripws 14 17 15 18 16 19 SCHEMA = [ … … 34 37 EXT_MAP['image/tiff'] = 'tiff' 35 38 EXT_MAP['image/svg+xml'] = 'svg' 36 39 40 DELETE_CHARS_RE = re.compile( 41 '[' + 42 ''.join(filter(lambda c: unicodedata.category(c) == 'Cc', 43 map(unichr, xrange(0x10000)))) + 44 '\\/:*?"<>|' + 45 ']') 46 47 def normalized_filename(filename): 48 filename = DELETE_CHARS_RE.sub(' ', filename) 49 filename = stripws(filename) 50 return filename 51 37 52 def terms_to_clauses(terms): 38 53 """Split list of search terms and the 'or' keyword into list of lists of search terms.""" 39 54 clauses = [[]] … … 125 140 @classmethod 126 141 def storeattachments(cls, env, mail, msg): 127 142 def add_attachment(payload, filename): 143 filename = normalized_filename(filename) 128 144 with TemporaryFile('w+b') as file: 129 145 file.write(payload) 130 146 size = file.tell() … … 143 159 add_attachment('Invalid attachment: message/rfc822 parts can not be base64 encoded!', part.get_filename()) 144 160 continue 145 161 add_attachment(part.get_payload(decode=True), part.get_filename()) 162 continue 146 163 147 164 cid = part.get('Content-ID') 148 165 if cid: 149 mimetype = part.get('Content-Type') 150 ext = EXT_MAP.get(mimetype) if mimetype in EXT_MAP else \ 151 mimetype[mimetype.rindex('/')+1:] if '/' in mimetype else \ 152 mimetype 153 filename = cid + '.' + ext 154 deletechars = '\/:*?"<>|' 155 filename = filename.translate(None, deletechars) 166 filename = part.get_filename() 167 if not filename: 168 mimetype = part.get_content_type() 169 ext = EXT_MAP.get(mimetype) if mimetype in EXT_MAP else \ 170 mimetype[mimetype.rindex('/')+1:] if '/' in mimetype else \ 171 mimetype 172 filename = cid + '.' + ext 173 156 174 add_attachment(part.get_payload(decode=True), filename) 157 175 158 176 @classmethod
comment:9 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks.
Use trac-admin <path/to/tracenv/> mailarchive fix-attachment-filenames
to fix old filenames somewhat.
(See trac:ticket:12870 for renaming attachment API.)
Looks like we are here with the following variable values:
cid
<part1.6D8E6A6C.5AFB627A@gmail.com>
mimetype
image/png;\n name="nkdjpgmpnpoepckd.png"
ext
png;\n name="nkdjpgmpnpoepckd.png"
filename
(beforetranslate
)<part1.6D8E6A6C.5AFB627A@gmail.com>.png;\n name="nkdjpgmpnpoepckd.png"
So something like this could help: