Modify

Opened 7 years ago

Closed 7 years ago

#13243 closed defect (fixed)

Some inline images / attachments can not be viewed due to %0D%0A in URL

Reported by: anonymous Owned by: lucid
Priority: normal Component: MailArchivePlugin
Severity: normal Keywords:
Cc: Trac Release:

Description

One email had an inline image (in HTML source <img src="cid:part1.6D8E6A6C.5AFB627A@gmail.com" alt="">).

It is shown in the plugin as an attachment (part1.6D8E6A6C.5AFB627A@gmail.com.png; name=nkdjpgmpnpoepckd.png​ (52.4 KB) - added by (none) 7 minutes ago.).

But clicking the link (https://.../trac/attachment/mailarchive/881/part1.6D8E6A6C.5AFB627A%40gmail.com.png%3B%0D%0A%20name%3Dnkdjpgmpnpoepckd.png) leads to an error:

Bad Request - Invalid URL

HTTP Error 400. The request URL is invalid.

The file is on disk in the tracenv/files/... folder.

The problem seems to be the %0D%0A part in the URL. (Maybe it's IIS which does not allow that?)

Attachments (0)

Change History (9)

comment:1 Changed 7 years ago by anonymous

Looks like we are here with the following variable values:

Variable Value
cid <part1.6D8E6A6C.5AFB627A@gmail.com>
mimetype image/png;\n name="nkdjpgmpnpoepckd.png"
ext png;\n name="nkdjpgmpnpoepckd.png"
filename (before translate) <part1.6D8E6A6C.5AFB627A@gmail.com>.png;\n name="nkdjpgmpnpoepckd.png"

So something like this could help:

import re
# ...
m = re.match("^[\w/]+;\r\n name=\"([\w\d\.]+)\"$", mimetype)
if m:
    filename = str(m.group(1))
else:
    # ...

comment:2 Changed 7 years ago by anonymous

Or better use something like this:

def parse_mime_type(mime_type):
    """Parse a mime-type into its parts.

    Example: 'image/png; name="a.png"'
    becomes: ('image/png', {'name': '"a.png"'})
    """
    parts = mime_type.split(';')
    type = parts[0].strip()
    params = dict([tuple([s.strip()
                          for s in param.split('=', 1)])
                   for param in parts[1:]])
    return (type, params)

and then:

mimetype, mimetype_params = parse_mime_type(mimetype)
if "name" in mimetype_params:
    filename = mimetype_params["name"]
else:
    # ...

comment:3 Changed 7 years ago by anonymous

And add % to the filtered characters?

comment:4 Changed 7 years ago by Jun Omae

We could simply use part.get_content_type() rather than mimetype = part.get('Content-Type') and parsing.

>>> from email import message_from_string
>>> m = message_from_string("""From: <x>
... Content-Type: image/png;\n name="nkdjpgmpnpoepckd.png"
...
... body
... """)
>>> m.get('Content-Type')
'image/png;\n name="nkdjpgmpnpoepckd.png"'
>>> m.get_content_type()
'image/png'

Also, filename of attachment cannot have control characters. Those characters would be trimmed by trac.attachment module.

comment:5 Changed 7 years ago by anonymous

Thanks, I was not aware. So we can use this to get the intended filename (nkdjpgmpnpoepckd.png):

filename = part.get_param('name')
if not filename:
    mimetype = part.get_content_type()
    ext = EXT_MAP.get(mimetype) # if ... else ...
    filename = cid + '.' + ext

Also, filename of attachment cannot have control characters. Those characters would be trimmed by trac.attachment module.

Are you sure? Unfortunately it seems to me filename is not trimmed automatically, but used as-is, leading to this problem.

comment:6 in reply to:  5 Changed 7 years ago by Jun Omae

Replying to anonymous:

Thanks, I was not aware. So we can use this to get the intended filename (nkdjpgmpnpoepckd.png):

filename = part.get_param('name')
if not filename:
    mimetype = part.get_content_type()
    ext = EXT_MAP.get(mimetype) # if ... else ...
    filename = cid + '.' + ext

We could use get_filename() to retrieve attached filename.

>>> m1 = message_from_string("""\
... Content-Type: image/png; name="file-1.png"
... Content-Transfer-Encoding: base64
... Content-ID: <image-0.png>
... Content-Disposition: inline; filename="file-1.png"
... """)
>>> m1.get_filename()
'file-1.png'
>>>
>>> m2 = message_from_string("""\
... Content-Transfer-Encoding: base64
... Content-ID: <image-0.png>
... Content-Disposition: inline; filename="file-2.png"
... """)
>>> m2.get_filename()
'file-2.png'
>>>
>>> m3 = message_from_string("""\
... Content-Type: image/png; name="file-3.png"
... Content-Transfer-Encoding: base64
... Content-ID: <image-0.png>
... """)
>>> m3.get_filename()
'file-3.png'
>>>
>>> m4 = message_from_string("""\
... Content-Type: image/png
... Content-Transfer-Encoding: base64
... Content-ID: <image-0.png>
... """)
>>> m4.get_filename() is None
True

Also, filename of attachment cannot have control characters. Those characters would be trimmed by trac.attachment module.

Are you sure? Unfortunately it seems to me filename is not trimmed automatically, but used as-is, leading to this problem.

Sorry for my misunderstanding. Adding an attachment via trac-admin and web ui, Trac replaces control characters with white spaces since 1.2. See trac:r13603.

Root cause of unable to view attachment named with LF characters is that . in regular expression doesn't match LF in req.path_info.

        match = re.match(r'/(raw-|zip-)?attachment/([^/]+)(?:/(.*))?$',
                         req.path_info)

See also:

I think this plugin should normalize filename with the same logic in Trac. See trac:source:/tags/trac-1.2.2/trac/attachment.py@:1127-1141#L1127.

comment:7 Changed 7 years ago by anonymous

Thanks for all the research! So maybe something like this:

  • mailarchive/model.py

    diff -r 7c0856815e45 mailarchive/model.py
    a b  
    55from email.header import decode_header
    66from email.utils import parsedate_tz, mktime_tz
    77from tempfile import TemporaryFile
     8import unicodedata
     9import re
    810
    911from trac.attachment import Attachment
    1012from trac.db import Table, Column, Index
    1113from trac.mimeview.api import KNOWN_MIME_TYPES
    1214from trac.resource import Resource
    1315from trac.util.datefmt import from_utimestamp, to_utimestamp, utc
     16from trac.util.text import stripws
    1417
    1518
    1619SCHEMA = [
     
    3437EXT_MAP['image/tiff'] = 'tiff'
    3538EXT_MAP['image/svg+xml'] = 'svg'
    3639
     40DELETE_CHARS_RE = re.compile(
     41    '[' +
     42    ''.join(filter(lambda c: unicodedata.category(c) == 'Cc',
     43                   map(unichr, xrange(0x10000)))) +
     44    '\\/:*?"<>|' +
     45    ']')
     46
     47def normalized_filename(filename):
     48    filename = DELETE_CHARS_RE.sub(' ', filename)
     49    filename = stripws(filename)
     50    return filename
     51
    3752def terms_to_clauses(terms):
    3853    """Split list of search terms and the 'or' keyword into list of lists of search terms."""
    3954    clauses = [[]]
     
    125140    @classmethod
    126141    def storeattachments(cls, env, mail, msg):
    127142        def add_attachment(payload, filename):
     143            filename = normalized_filename(filename)
    128144            with TemporaryFile('w+b') as file:
    129145                file.write(payload)
    130146                size = file.tell()
     
    143159                        add_attachment('Invalid attachment: message/rfc822 parts can not be base64 encoded!', part.get_filename())
    144160                        continue
    145161                    add_attachment(part.get_payload(decode=True), part.get_filename())
     162                    continue
    146163
    147164            cid = part.get('Content-ID')
    148165            if cid:
    149                 mimetype = part.get('Content-Type')
    150                 ext = EXT_MAP.get(mimetype) if mimetype in EXT_MAP else \
    151                     mimetype[mimetype.rindex('/')+1:] if '/' in mimetype else \
    152                     mimetype
    153                 filename = cid + '.' + ext
    154                 deletechars = '\/:*?"<>|'
    155                 filename = filename.translate(None, deletechars)
     166                filename = part.get_filename()
     167                if not filename:
     168                    mimetype = part.get_content_type()
     169                    ext = EXT_MAP.get(mimetype) if mimetype in EXT_MAP else \
     170                        mimetype[mimetype.rindex('/')+1:] if '/' in mimetype else \
     171                        mimetype
     172                    filename = cid + '.' + ext
     173               
    156174                add_attachment(part.get_payload(decode=True), filename)
    157175
    158176    @classmethod

comment:8 Changed 7 years ago by lucid

In 16717:

MailArchivePlugin: Fix attachment filenames.
(see #13243)

comment:9 Changed 7 years ago by lucid

Resolution: fixed
Status: newclosed

Thanks.

Use trac-admin <path/to/tracenv/> mailarchive fix-attachment-filenames to fix old filenames somewhat.

(See trac:ticket:12870 for renaming attachment API.)

Modify Ticket

Change Properties
Set your email in Preferences
Action
as closed The owner will remain lucid.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.