Opened 5 years ago

Closed 3 years ago

# sending unicode email fails

Reported by: anonymous
Owned by: Ryan J Ollos
Priority: normal
Component: AnnouncerPlugin
Severity: normal
Cc: Stephan Geulette, Dmitri
Release: 1.0

### Description

The AnnouncerPlugin fails to send unicode emails for me. I get the following traceback:

Traceback (most recent call last):
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/api.py", line 584, in _real_send
evt)
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/distributors/mail.py", line 330, in distribute
self._do_send(transport, event, k, v, fmtdict[k])
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/distributors/mail.py", line 488, in _do_send
msgText = MIMEText(output, msg_format)
File "/usr/lib/python2.7/email/mime/text.py", line 30, in __init__
File "/usr/lib/python2.7/email/message.py", line 226, in set_payload
self.set_charset(charset)
File "/usr/lib/python2.7/email/message.py", line 262, in set_charset
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 32: ordinal not in range(128)


I'm using mime_encoding = base64 and the plain-text email format (but it seems to happen for HTML email as well). I'm on Python 2.7.3 and Trac 1.0.1.
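The last line of the traceback can be reproduced outside Trac. The following is a minimal sketch, assuming only that the body contains u'\xfc' (ü) at position 32 as the log reports; the sample string is made up for illustration and is not the reporter's actual text.

```python
# Hypothetical body: 32 ASCII characters followed by u'\xfc' (ü).
body = u"x" * 32 + u"\xfc"

try:
    # Encoding with the ASCII codec fails for any code point above 127,
    # which is the error the traceback ends with.
    body.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first character that cannot be encoded
else:
    failed_at = None

# UTF-8 can represent the same text; u'\xfc' becomes two bytes.
encoded = body.encode("utf-8")
print(failed_at, len(encoded))
```

The point of the sketch is that the failure depends only on the text containing a non-ASCII character, not on anything specific to the Trac setup.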

### comment:1 Changed 5 years ago by Ryan J Ollos

I'm a bit surprised by this. Could you paste the [announcer] section from your trac.ini file, just so we can double-check?

### comment:2 follow-up:  3 Changed 5 years ago by Ryan J Ollos

Hi, would you mind trying out this (untested) patch?

• ## announcerplugin/trunk/announcer/distributors/mail.py

diff --git a/announcerplugin/trunk/announcer/distributors/mail.py b/announcerplu
index b994502..bd79d1f 100644
@@ ... @@ class EmailDistributor(Component):
             rootMessage.attach(parentMessage)
             alt_msg_format = 'html' in alternate_style and 'html' or 'plain'
-            msgText = MIMEText(alternate_output, alt_msg_format)
-            msgText.set_charset(self._charset)
+            msgText = MIMEText(alternate_output, alt_msg_format, self._charset)
             parentMessage.attach(msgText)
         else:
             parentMessage = rootMessage
         msg_format = 'html' in format and 'html' or 'plain'
-        msgText = MIMEText(output, msg_format)
+        msgText = MIMEText(output, msg_format, self._charset)
         del msgText['Content-Transfer-Encoding']
-        msgText.set_charset(self._charset)
         # According to RFC 2046, the last part of a multipart message is best
         #   and preferred.
         parentMessage.attach(msgText)

Based on documentation for MIMEText, we may need to specify the charset in the constructor.

### comment:3 in reply to:  2 ; follow-up:  4 Changed 5 years ago by anonymous

Thanks for your quick response. I'm fine with testing your suggestion. Here's the result:

Traceback (most recent call last):
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/api.py", line 584, in _real_send
evt)
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/distributors/mail.py", line 330, in distribute
self._do_send(transport, event, k, v, fmtdict[k])
File "/var/lib/trac/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r12503-py2.7.egg/announcer/distributors/mail.py", line 490, in _do_send
msgText = MIMEText(output, msg_format, self._charset)
File "/usr/lib/python2.7/email/mime/text.py", line 29, in __init__
**{'charset': _charset})
File "/usr/lib/python2.7/email/mime/base.py", line 25, in __init__
parts.append(_formatparam(k.replace('_', '-'), v))
File "/usr/lib/python2.7/email/message.py", line 45, in _formatparam
if value is not None and len(value) > 0:


(Regarding the line numbers, I should add that I didn't remove the other lines but commented them out.)

I did some further debugging, which is hopefully useful to you: self._charset is an instance of email.charset.Charset with the following attributes (extracted by printing self._charset.__dict__ to the log file): {'input_codec': 'utf-8', 'body_encoding': 2, 'input_charset': 'utf-8', 'header_encoding': 2, 'output_charset': 'utf-8', 'output_codec': 'utf-8'}. As the Python documentation of the MIMEText constructor says that the third parameter defaults to us-ascii, I tried passing the string utf-8 instead, which however creates a broken email like this:

Return-path: <trac@wobsta.de>
Envelope-to: contact@wobsta.de
Delivery-date: Wed, 27 Mar 2013 14:27:38 +0100
by h2032560.stratoserver.net with esmtp (Exim 4.76)
(envelope-from <trac@wobsta.de>)
id 1UKqO9-0008MK-Tl
for contact@wobsta.de; Wed, 27 Mar 2013 14:27:37 +0100
Content-Type: multipart/related;
boundary="===============2509084458330105465=="
MIME-Version: 1.0
Date: Wed, 27 Mar 2013 13:27:37 -0000
To: "undisclosed-recipients:"
Message-ID: <041.feac52a1e43c7efa8e99991f7bd1ff00@wobsta.de>
From: "wobsta.de" <trac@wobsta.de>
Subject: Blog: rundfunkgebuehren2 comment deleted
Auto-Submitted: auto-generated
Precedence: bulk
X-Announcer-Version: 1.0dev-r12503
X-Mailer: AnnouncerPlugin v1.0dev-r12503 on Trac v1.0.1
X-Trac-Announcement-Realm: blog
X-Trac-Project: wobsta.de
X-Trac-Version: 1.0.1
X-SA-Exim-Connect-IP: 127.0.0.1
X-SA-Exim-Mail-From: trac@wobsta.de
X-SA-Exim-Scanned: No (on h2032560.stratoserver.net); SAEximRunCond expanded to false

This is a multi-part message in MIME format.
--===============2509084458330105465==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

STNKMWJtUm1kVzVyWjJWaWRXVm9jbVZ1TWpvZ1VuVnVaR1oxYm10blpXTER2R2h5Wlc0Z1pzTzhj
aUJEYjIxd2RYUmxjaUJwYmlCbAphVzVsYlNCSVpXbHRZc084Y204c0lGWmxjbWhoYm1Sc2RXNW5J
SFZ1WkNCRmJuUnpZMmhsYVdSMWJtY2dZbVZwYlNCQ2RXNWtaWE4yClpYSjNZV3gwZFc1bmMyZGxj
bWxqYUhRS0Nnb0sK

--===============2509084458330105465==--


which displays as some random garbage (looks like base64 or so) in the mailer.
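One way to check the "looks like base64" impression is to decode the start of the body shown in the message. This is a minimal sketch using only the standard base64 module; the 16-character prefix is copied from the first body line, and the reading that the text was base64-encoded twice is an inference from the output, not something stated in the ticket.

```python
import base64

# First 16 characters of the base64 body in the message dump.
prefix = "STNKMWJtUm1kVzVy"

once = base64.b64decode(prefix)   # still looks like base64 text
twice = base64.b64decode(once)    # readable text after a second decode

print(once)
print(twice)
```

The second decode yields `#rundfunk`, which matches the blog post name in the Subject header. That would be consistent with the payload being encoded once by the MIMEText constructor and again by a later set_charset call, though this is an assumption about the cause.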

However, if I encode output using utf-8, i.e. use msgText = MIMEText(output.encode('utf-8'), msg_format), everything looks fine. Maybe the correct code should thus be msgText = MIMEText(output.encode(self._charset.input_codec), msg_format), but I'm not sure.

For the curious, as you asked me, this is the announcer section of my trac.ini:

[announcer]
default_email_format = text/plain
email_enabled = true
email_from = trac@wobsta.de
email_from_name =
email_sender = SmtpEmailSender
email_subject_prefix = __default__
email_to = undisclosed-recipients:
mime_encoding = base64
use_public_cc = false


### comment:4 in reply to:  3 Changed 5 years ago by Ryan J Ollos

> ... However, if I encode output using utf-8, i.e. use msgText = MIMEText(output.encode('utf-8'), msg_format), everything looks fine. Maybe the correct code should thus be msgText = MIMEText(output.encode(self._charset.input_codec), msg_format), but I'm not sure.

That seems like a good workaround for now. It seems like it should be possible to specify the encoding when constructing the MIMEText object, but the documentation was unclear, and I made a bit of an assumption when deciding to pass a Charset object as the third parameter.

Thank you for debugging and providing all of this info. I'll do some more testing in the next day or so, but if all else fails, we can just encode the output that is passed to the MIMEText constructor like you did.

### comment:5 Changed 5 years ago by anonymous

I'm fine with my patched version for now, but I really want to emphasize that this needs fixing upstream. The reason is that it's a really odd failure if you don't monitor your system in detail. Let me explain: if you just set up Trac with the announcer plugin and test it with some simple (ASCII-only) message, it works like a charm. (So first of all, thanks for the great plugin, it's awesome.)

I did just that as well, but I was lucky to test it with some real data, which happened to contain non-ASCII characters (as will be rather common in some of my use cases). The problem is that everything still looks OK, except that the email is not sent at all. If you turn on logging in Trac, you will find the traceback in the logs, but for the person who triggered the email everything looks fine: you do not receive an error, and the item is updated online properly. So at the end of the day, people will just not receive their notification emails, without anybody understanding why. I guess you agree that this is a really serious issue.

I was just poking around to see whether my solution is correct. I agree that the Python documentation is not really useful here. (It's a shame, actually.) However, I found a blog post about it: http://mg.pov.lt/blog/unicode-emails-in-python ... which supports my solution. On the other hand, people have been discussing the problem elsewhere as well: see ht tp://bugs.python.org/issue1368247 ... but it looks like the fix has still not been applied to Python 2.7. On my system, Python still does not contain the suggested patch (ht tp://bugs.python.org/file12190/mimetext-unicode.patch). I'm on Ubuntu 12.04 LTS using the Python from the distribution, which happens to be Python 2.7.3. I was just lucky to try such an encoding myself, using self._charset.input_codec, which happens to be utf-8 in my case as well. Probably self._charset.output_charset is the right thing (although I wonder why it should be the output charset, as the output in the announcer code is kind of the input to the email system, but I might just be confused about the terminology).
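The Charset attributes discussed in this comment can be inspected directly. The following is a minimal sketch with the standard library's email.charset module; setting header_encoding by hand is an assumption made only to mirror the dump posted earlier in the thread, not a statement about how the plugin configures it.

```python
from email.charset import BASE64, Charset

# A Charset for UTF-8 from the standard library.
cs = Charset("utf-8")

# Mirror the reporter's dump, where header_encoding was 2 as well
# (assumption: the plugin derives this from mime_encoding = base64).
cs.header_encoding = BASE64

print(BASE64)             # numeric constant behind the "2" in the dump
print(cs.input_codec)     # codec for converting input_charset to unicode
print(cs.output_codec)    # codec for converting unicode to output_charset
print(cs.output_charset)  # charset name the output is converted to
print(str(cs))            # plain string form of the charset name
```

For UTF-8 all of these names coincide, which may be why either attribute appears to work. As far as the module's documented attribute descriptions go, output_codec is the one meant for turning unicode text into bytes, and str(cs) gives a plain charset name where a string rather than a Charset object is expected.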

### comment:6 Changed 5 years ago by anonymous

PS: I'm sorry for posting broken links. The spam filter complained about my reply being spam due to too many external links, so I slightly broke the link format. I'm sorry for that workaround (instead of creating an account for myself, which probably would have worked around the spam filter as well).

### comment:7 Changed 5 years ago by Ryan J Ollos

Cc: Stephan Geulette added; anonymous removed

#11227 closed as a duplicate. We should consider applying the patch in this ticket.

### comment:8 Changed 5 years ago by bormotov@…

I use AnnouncerPlugin from trunk (r13373) and still get this error:

2013-09-02 23:45:45,425 Trac[api] ERROR: AnnouncementSystem failed.
Traceback (most recent call last):
File "/usr/lib/python2.5/site-packages/announcer/api.py", line 584, in _real_send
evt)
File "/usr/lib/python2.5/site-packages/announcer/distributors/mail.py", line 330, in distribute
self._do_send(transport, event, k, v, fmtdict[k])
File "/usr/lib/python2.5/site-packages/announcer/distributors/mail.py", line 490, in _do_send
msgText.set_charset(self._charset)
File "/usr/lib/python2.5/email/message.py", line 262, in set_charset
File "/usr/lib/python2.5/email/charset.py", line 384, in body_encode
return email.base64mime.body_encode(s)
File "/usr/lib/python2.5/email/base64mime.py", line 148, in encode
enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-13: ordinal not in range(128)


Trac 1.0.1

The workaround with MIMEText(output.encode('utf-8'), msg_format) helps.

### comment:9 Changed 4 years ago by anonymous

Looks like Trac's notification system fixed this problem in trac:changeset:10176 by letting Genshi encode the body: trac:source:trunk/trac/notification.py@12085:416,474#L409

Announcer could do this in each formatter: source:announcerplugin/trunk/announcer/formatters.py@12359:152,248,318#L248

### comment:10 follow-up:  11 Changed 4 years ago by Ryan J Ollos

Issue was raised again on the mailing list.

### comment:11 in reply to:  10 Changed 4 years ago by roger@…

> Issue was raised again on the mailing list.

And suddenly it appears again. No changes to the system. But e-mail stops. I get this in the Trac log:

2013-12-03 11:52:47,761 Trac[api] ERROR: AnnouncementSystem failed.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r0-py2.7.egg/announcer/api.py", line 584, in _real_send
evt)
File "/usr/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r0-py2.7.egg/announcer/distributors/mail.py", line 330, in distribute
self._do_send(transport, event, k, v, fmtdict[k])
File "/usr/lib/python2.7/site-packages/TracAnnouncer-1.0dev_r0-py2.7.egg/announcer/distributors/mail.py", line 481, in _do_send
msgText = MIMEText(alternate_output, alt_msg_format)
File "/usr/lib/python2.7/email/mime/text.py", line 30, in __init__
File "/usr/lib/python2.7/email/message.py", line 226, in set_payload
self.set_charset(charset)
File "/usr/lib/python2.7/email/message.py", line 262, in set_charset
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 472: ordinal not in range(128)


I have made the edit suggested in comment:8. Odd that it seemed to solve the problem for a bit, and then the problem returned.

### comment:12 Changed 4 years ago by Jun Omae

#11354 was closed as duplicate.

### Changed 4 years ago by patrick

patch implied by comment #9

### comment:13 Changed 4 years ago by patrick

I ran into exactly the same problem.

It occurred out of the blue after no issues whatsoever for approx. 30 tickets with various comments and modifications along the way. Then all of a sudden this issue showed up. My error message concerned a zero-width space:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 4639: ordinal not in range(128)


The fact that it occurred after some time makes me wonder if this is somehow tied to a counter that is being incremented with each ticket creation, modification, comment etc...

At any rate I applied the patch implied in the comment above and it corrected the problem for me. At least for the time being. See: attachment:formatters_patch.diff. And I apologize that the description of the attachment is linking to an incorrect ticket.

### Changed 4 years ago by Dmitri

changed patch from comment:3 which worked for me

### comment:15 follow-up:  16 Changed 4 years ago by Dmitri

I tried the patch from comment:3 (with output.encode(self._charset.input_codec)): exception disappeared but now all parts have charset="us-ascii" in Content-Type header. It seems that lines with msgText.set_charset(self._charset) should be retained.

attachment:formatters_patch.diff didn't help at all.

attachment:sending-unicode-email-fails.patch works for me.
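To check which charset each part ends up declaring, the Content-Type parameter can be read back from the message object. This is a minimal sketch for Python 3, where MIMEText accepts text directly; the ticket itself concerns Python 2.7, whose code path differs, and the sample strings are made up for illustration.

```python
from email.mime.text import MIMEText

# Python 3 sketch: MIMEText takes text directly here, unlike the
# Python 2.7 code path discussed in this ticket.
ascii_part = MIMEText("plain ASCII body", "plain")
utf8_part = MIMEText("Rundfunkgeb\u00fchren", "plain", "utf-8")

for part in (ascii_part, utf8_part):
    # get_content_charset() reads the charset parameter of Content-Type.
    print(part.get_content_charset(), part["Content-Transfer-Encoding"])

# Decoding the payload should round-trip the original text.
decoded = utf8_part.get_payload(decode=True).decode("utf-8")
print(decoded)
```

A part that reports us-ascii while its payload holds UTF-8 bytes would be the mismatch described in this comment.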

### comment:16 in reply to:  15 ; follow-up:  17 Changed 3 years ago by dskrzypczak

> I tried the patch from comment:3 (with output.encode(self._charset.input_codec)): exception disappeared but now all parts have charset="us-ascii" in Content-Type header. It seems that lines with msgText.set_charset(self._charset) should be retained.
>
> attachment:formatters_patch.diff didn't help at all.
>
> attachment:sending-unicode-email-fails.patch works for me.

I had a similar problem, and attachment:sending-unicode-email-fails.patch also works for me.

Could you push it to trunk?

Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/announcer/api.py", line 584, in _real_send
evt)
File "build/bdist.linux-x86_64/egg/announcer/distributors/mail.py", line 330, in distribute
self._do_send(transport, event, k, v, fmtdict[k])
File "build/bdist.linux-x86_64/egg/announcer/distributors/mail.py", line 488, in _do_send
msgText = MIMEText(output, msg_format)
File "/usr/lib/python2.7/email/mime/text.py", line 30, in __init__
File "/usr/lib/python2.7/email/message.py", line 226, in set_payload
self.set_charset(charset)
File "/usr/lib/python2.7/email/message.py", line 262, in set_charset
UnicodeEncodeError: 'ascii' codec can't encode character u'\u015b' in position 168: ordinal not in range(128)


### comment:17 in reply to:  16 Changed 3 years ago by Steffen Hoffmann

> I had a similar problem, and attachment:sending-unicode-email-fails.patch also works for me.
>
> Could you push it to trunk?

Thanks for the positive test feedback. Will do.

### comment:18 Changed 3 years ago by Robert Becker

The issue still remains in trunk - the patch is not yet applied.

### comment:19 Changed 3 years ago by Ryan J Ollos

Owner: changed from Steffen Hoffmann to Ryan J Ollos
Status: new → accepted

### comment:20 Changed 3 years ago by Ryan J Ollos

Resolution: → fixed
Status: accepted → closed

In 14855:

1.0dev: Encode unicode output to UTF-8. Fixes #10974.

Nice, thanks!

### comment:22 Changed 3 years ago by Robert Becker

The changeset contains an error in line 481:

if isinstance(alternative_output, unicode):

should be:

if isinstance(alternate_output, unicode):
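The changeset itself is not shown in this ticket; only the isinstance line is quoted. The following is a minimal sketch of the general idea (encode text to UTF-8 before it reaches the MIME layer, leave byte strings alone). The helper name to_utf8 and the text_type alias are hypothetical and are not taken from the plugin's code.

```python
import sys

# The text type differs between Python 2 (unicode) and Python 3 (str).
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_utf8(output):
    """Return output as UTF-8 bytes if it is text, otherwise unchanged."""
    if isinstance(output, text_type):
        return output.encode("utf-8")
    return output


# Hypothetical sample values standing in for the formatter output.
alternate_output = u"Geb\u00fchren"
print(repr(to_utf8(alternate_output)))
print(repr(to_utf8(b"already bytes")))
```

Guarding with isinstance keeps the call safe for output that is already a byte string, which is presumably why the fix checks the type before encoding.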

### comment:23 Changed 3 years ago by Ryan J Ollos

In 14867:

1.0dev: Fixed typo in [14855]. Refs #10974.
