Opened 11 years ago

Closed 10 years ago

# Non-ASCII characters (byte values ≥ 128) fail - internationalization no longer works

| Reported by | Owned by | Priority | Component | Severity |
|---|---|---|---|---|
| anonymous | athomas | normal | TocMacro | normal |

### Description

After upgrading from Trac 0.8 to 0.9, the TOC macro no longer supports international characters:

```
Error: Macro TOC(None) failed
'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
```
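The error above is Python's standard complaint when UTF-8 bytes are decoded with the ASCII codec (Python 2 uses ASCII as its default encoding when it implicitly mixes byte strings and Unicode strings). A minimal Python 3 reproduction, assuming a hypothetical heading "Grüße" since the ticket does not say which heading failed: in UTF-8, "ü" is the two bytes 0xc3 0xbc, so the first non-ASCII byte sits at position 2. The helper `ascii_decode_error` is hypothetical and exists only for illustration.

```python
def ascii_decode_error(raw: bytes) -> str:
    """Return the message raised when decoding raw bytes as ASCII, or "" if it succeeds."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return ""


# Hypothetical heading; the ticket does not include the real one.
heading = "Grüße".encode("utf-8")  # b'Gr\xc3\xbc\xc3\x9fe'
print(ascii_decode_error(heading))
```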


### comment:1 Changed 11 years ago by athomas

• Status changed from new to assigned

Can you try this version (`download:tocmacro-r382`)? It predates a change that supposedly fixed #92.

If that doesn't work, please enable Trac logging and add the full traceback to this ticket, along with an example heading that triggers the exception.

### comment:2 Changed 11 years ago by anonymous

This version (r382) works well, and the internationalization problem is gone.

### comment:3 Changed 11 years ago by athomas

Hmm, okay. Well, I'm really not sure what the correct solution is. Apparently r382 doesn't work with Japanese character sets, but the fix breaks other international characters :\

Perhaps I need to decode using the character set from the HTTP header...
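A minimal sketch of that idea, assuming the raw bytes and the response's `Content-Type` header value are both available (how a Trac 0.9 macro would obtain them is not shown here). It uses the standard library's `email.message.Message` to parse the `charset` parameter and falls back to UTF-8 when none is given; `decode_with_header_charset` is a hypothetical helper, not part of Trac.

```python
from email.message import Message


def decode_with_header_charset(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode raw bytes with the charset named in a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or `default` if none is set.
    charset = msg.get_content_charset(default)
    return raw.decode(charset)


print(decode_with_header_charset("日本語".encode("shift_jis"), "text/html; charset=Shift_JIS"))
```

When the header names UTF-8, or names no charset at all, this takes the same path as a plain `decode('utf-8')`.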

### comment:4 Changed 10 years ago by devil

The version of the TOC macro offered for download (http://trac-hacks.org/download/tocmacro.zip) does indeed have some difficulties with international characters. But I've tried a version directly from the repository, source:tocmacro/0.9/TOC.py#382, and it is OK.

### comment:5 Changed 10 years ago by marc.zonzon@…

• Cc marc.zonzon@… added; anonymous removed

OK, removing this `decode('utf-8')` solves the problem. Just take the latest release and apply the patch below, or take revision 320 (http://trac-hacks.org/browser/tocmacro/0.9/TOC.py?rev=320), though you lose the docstring!

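I suppose it fails because you have already written Unicode strings before this UTF-8 one, and StringIO cannot handle both, as stated in the StringIO module documentation (http://docs.python.org/lib/module-StringIO.html): if both kinds are used, 8-bit strings that are not 7-bit ASCII raise a UnicodeError when getvalue() is called.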

I don't think it is an HTTP header problem:

• I'm using UTF-8, so you would end up with the same code that triggers this bug.
• The problem is not a bad 8-bit encoding; it is that your StringIO does not accept 8-bit strings.
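Python 3's `io.StringIO` makes this distinction strict: it accepts only `str` and raises `TypeError` for bytes at `write()` time, whereas Python 2's `StringIO.StringIO` accepted both and only failed at `getvalue()`. A minimal sketch of the usual remedy, decoding every 8-bit value once with a single assumed encoding (UTF-8 here) before it reaches the buffer; `to_text` and `render_items` are hypothetical helpers, not TOC macro code.

```python
import io


def to_text(value, encoding="utf-8"):
    """Return value as str, decoding bytes with the assumed encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def render_items(items):
    """Write a mix of str and bytes into one text buffer, normalizing each item first."""
    out = io.StringIO()
    for item in items:
        out.write(to_text(item))
    return out.getvalue()


# Writing bytes straight into a text buffer raises TypeError in Python 3.
try:
    io.StringIO().write("Grüße".encode("utf-8"))
except TypeError as exc:
    print("rejected:", exc)

print(render_items(["<li>", "Grüße".encode("utf-8"), "</li>"]))
```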

```diff
--- TOC.py.bak  2006-03-17 13:38:26.000000000 +0100
+++ TOC.py      2006-03-17 15:09:52.000000000 +0100
@@ -87,7 +87,7 @@
                     out.write('<a href="%s">%s</a> : %s</li>\n' % (env.href.wiki(page), page, formatted_header))
                     break
             else:
-                default_anchor = anchor = Formatter._anchor_re.sub("", header.decode('utf-8'))
+                default_anchor = anchor = Formatter._anchor_re.sub("", header)
                 anchor_n = 1
                 while anchor in seen_anchors:
                     anchor = default_anchor + str(anchor_n)
```
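The patched lines derive a heading's anchor from its text and append a counter until the anchor is unique. A Python 3 sketch of that logic working entirely on `str`, so non-ASCII headings need no decode step. The pattern is an assumption modelled on Trac's `Formatter._anchor_re` (keep word characters, `:`, `.` and `-`), `make_anchor` is a hypothetical name, and the `anchor_n += 1` increment is assumed because the diff context ends before it.

```python
import re

# Assumed stand-in for Formatter._anchor_re; not copied from Trac 0.9.
ANCHOR_RE = re.compile(r"[^\w:.-]+")


def make_anchor(header: str, seen_anchors: set) -> str:
    """Return a unique anchor for a heading, mirroring the patched TOC loop."""
    default_anchor = anchor = ANCHOR_RE.sub("", header)
    anchor_n = 1
    while anchor in seen_anchors:
        anchor = default_anchor + str(anchor_n)
        anchor_n += 1  # assumed: the diff context stops before this line
    seen_anchors.add(anchor)
    return anchor


seen = set()
print(make_anchor("Grüße Welt", seen))
print(make_anchor("Grüße Welt", seen))
```

Because Python 3 `str` patterns treat `\w` as Unicode by default, letters such as "ü" and "ß" survive in the anchor.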


### comment:6 Changed 10 years ago by marc.zonzon@…

I'm new to this Trac ticket system, and when I filled in the Cc field, I didn't expect my email address to appear. How can I erase it? Please save me from a new load of spam!

### comment:7 Changed 10 years ago by athomas

• Cc anonymous added; marc.zonzon@… removed

All you need to do is clear the Cc field...

### comment:8 Changed 10 years ago by athomas

Regarding your previous comment: the thing is, the decode() was necessary to make TOC work for #92. So I have conflicting reports, one where decode() fixes a problem and one where it causes one.

Unfortunately I know very little about Python's locale support, so I can't really be of much help fixing it. As usual, patches welcome.

### comment:9 Changed 10 years ago by marc.zonzon@…

How this decode can work for Japanese is quite mysterious.

• If you decode UTF-8 you obtain a Unicode string, so once you have written that Unicode string, you can no longer write an 8-bit string.
• If you need to decode the header, why not decode the formatted_header too?

I'm quite new to Trac, so I don't know how text is stored in the database, but I cannot see why the header should be decoded to Unicode here.

### comment:10 Changed 10 years ago by athomas

• Resolution set to fixed
• Status changed from assigned to closed

Fixed in r574.