
Opened 12 years ago

Closed 11 years ago

Last modified 9 years ago

#146 closed defect (fixed)

Non-ASCII characters (byte value > 127) fail: internationalization no longer works

Reported by: anonymous
Owned by: Alec Thomas
Priority: normal
Component: TocMacro
Severity: normal
Keywords:
Cc:
Trac Release:

Description

After upgrading from Trac 0.8 to 0.9, the TOC macro no longer supports international characters:

 Error: Macro TOC(None) failed
'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

Attachments (0)

Change History (10)

comment:1 Changed 12 years ago by Alec Thomas

Status: new → assigned

Can you try this version (download:tocmacro-r382)? It predates a change that supposedly fixed #92.

If that doesn't work, please enable Trac logging and add the full traceback to this ticket, along with an example heading that triggers the exception.

comment:2 Changed 12 years ago by anonymous

This version (r382) works great, and the internationalization problem is gone.

comment:3 Changed 12 years ago by Alec Thomas

Hmm, okay. Well I'm really not sure what the correct solution is. Apparently r382 doesn't work with Japanese character sets, but the fix breaks other international characters :\

Perhaps I need to decode using the character set from the HTTP header...
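A minimal sketch of that idea, assuming the raw heading is available as bytes and the `Content-Type` header value as a string. `decode_with_header_charset` is a hypothetical helper, not part of Trac or the TOC macro. It uses the standard library's `email.message.Message` to read the `charset` parameter and falls back to UTF-8 when none is given.

```python
from email.message import Message


def decode_with_header_charset(raw, content_type, default="utf-8"):
    """Hypothetical helper: decode raw bytes using the charset named in a
    Content-Type header value, falling back to `default` if none is given."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lower-cased,
    # or the fallback when the header carries no charset parameter.
    charset = msg.get_content_charset(default)
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_with_header_charset("Château".encode("utf-8"),
                                     "text/html; charset=UTF-8"))
    print(decode_with_header_charset("Château".encode("latin-1"),
                                     "text/plain; charset=iso-8859-1"))
```

Note that when the site already serves UTF-8, this yields the same `decode('utf-8')` call the macro already makes.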

comment:4 Changed 11 years ago by devil

The version of the TOC macro linked for download (http://trac-hacks.org/download/tocmacro.zip) does indeed have some difficulties with international characters. But I've tried a version directly from the repository, source:tocmacro/0.9/TOC.py#382, and it is OK.

comment:5 Changed 11 years ago by marc.zonzon@…

Cc: marc.zonzon@… added; anonymous removed

OK, getting rid of this decode('utf-8') solves the problem. Just take the latest release and apply the patch below, or take revision 320 (http://trac-hacks.org/browser/tocmacro/0.9/TOC.py?rev=320), though you lose the docstring!

I suppose it fails because you have already written Unicode strings before this UTF-8 byte string, and StringIO cannot handle a mix of both, as specified in the module-StringIO documentation (http://docs.python.org/lib/module-StringIO.html).

I don't think it is a problem with the HTTP header:

  • I'm using UTF-8, so you would end up with the same code that triggers this bug;
  • The problem is not a bad 8-bit code; it is that your StringIO does not accept 8-bit strings.
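The failure described above can be reproduced in Python 3, where `bytes` and `str` are never mixed implicitly. In Python 2, `StringIO.getvalue()` joined its buffer, and joining 8-bit strings with a Unicode string decoded each 8-bit string with the ASCII codec. The sketch below emulates that coercion with a hypothetical `py2_style_join` helper (not a real library function). The heading "Château" is an illustrative example, not the one from this ticket.

```python
def py2_style_join(parts):
    """Hypothetical emulation of Python 2's ''.join() over mixed str/unicode.

    If any part is text, every byte string is implicitly decoded with the
    ASCII codec, as Python 2 did when StringIO.getvalue() joined its buffer.
    """
    if any(isinstance(p, str) for p in parts):
        return "".join(p if isinstance(p, str) else p.decode("ascii")
                       for p in parts)
    return b"".join(parts)


heading_bytes = "Château".encode("utf-8")    # 8-bit string: b'Ch\xc3\xa2teau'
anchor_text = heading_bytes.decode("utf-8")  # the decode('utf-8') in TOC.py

# Mixed: a decoded (text) anchor next to the raw UTF-8 heading bytes.
mixed = [b'<a href="#', anchor_text, b'">', heading_bytes, b"</a>"]
try:
    py2_style_join(mixed)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

# Consistent: everything stays 8-bit, as in the patched version below.
print(py2_style_join([b'<a href="#', heading_bytes, b'">', heading_bytes, b"</a>"]))
```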

--- TOC.py.bak  2006-03-17 13:38:26.000000000 +0100
+++ TOC.py      2006-03-17 15:09:52.000000000 +0100
@@ -87,7 +87,7 @@        
                 out.write('<a href="%s">%s</a> : %s</li>\n' % (env.href.wiki(page), page, formatted_header))
                 break                                                         
             else:                                                             
-                default_anchor = anchor = Formatter._anchor_re.sub("", header.decode('utf-8'))                                                 
+                default_anchor = anchor = Formatter._anchor_re.sub("", header)
                 anchor_n = 1     
                 while anchor in seen_anchors:  
                     anchor = default_anchor + str(anchor_n)
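The patched hunk derives an anchor from the heading and appends a counter until the anchor is unique. Below is a minimal Python 3 sketch of that loop that works on text throughout. The ticket does not show the actual pattern of Trac's `Formatter._anchor_re`, so `ANCHOR_RE` is a hypothetical stand-in, and `make_anchor` is a hypothetical function. The context lines also do not show how `seen_anchors` is updated, so adding each result to a set is an assumption.

```python
import re

# Hypothetical stand-in for Formatter._anchor_re: strips every non-word
# character (str patterns are Unicode-aware in Python 3, so 'â' is kept).
ANCHOR_RE = re.compile(r"\W+")


def make_anchor(header, seen_anchors):
    """Return a unique anchor for `header` and record it in `seen_anchors`."""
    default_anchor = anchor = ANCHOR_RE.sub("", header)
    anchor_n = 1
    # On a collision, try the base anchor with 1, 2, 3, ... appended.
    while anchor in seen_anchors:
        anchor = default_anchor + str(anchor_n)
        anchor_n += 1
    seen_anchors.add(anchor)
    return anchor


seen = set()
print(make_anchor("Château Overview", seen))  # ChâteauOverview
print(make_anchor("Château Overview", seen))  # ChâteauOverview1
print(make_anchor("Château-Overview", seen))  # ChâteauOverview2
```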

comment:6 Changed 11 years ago by marc.zonzon@…

I'm new to this Trac ticket system, and when I entered this Cc field I didn't think my email address would appear. How can I erase it? Please save me from a new load of spam!

comment:7 Changed 11 years ago by Alec Thomas

Cc: anonymous added; marc.zonzon@… removed

All you need to do is clear the Cc field...

comment:8 Changed 11 years ago by Alec Thomas

Regarding your previous comment: the decode() was necessary to make TOC work for #92. So I have conflicting reports, one where decode() fixes a problem and one where it causes a problem.

Unfortunately I know very little about Python's locale support, so I can't really be of much help fixing it. As usual, patches welcome.

comment:9 Changed 11 years ago by marc.zonzon@…

How this decode can work for Japanese is quite mysterious.

  • If you decode UTF-8 you obtain a Unicode string, so once you have written this Unicode string, you can no longer write an 8-bit string.
  • If you need to decode the header, why not decode the formatted_header too?
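One way to follow that suggestion is to decode every piece once, where bytes enter the macro, and write only text afterwards. In Python 3, `io.StringIO` enforces this by rejecting `bytes` outright. The sketch assumes UTF-8 input. `toc_entry` and its parameters are hypothetical and only loosely modeled on the `out.write(...)` line in the patch above.

```python
import io


def toc_entry(out, anchor, formatted_header, encoding="utf-8"):
    """Hypothetical helper: decode both byte strings once, then write text only."""
    anchor_text = anchor.decode(encoding)
    header_text = formatted_header.decode(encoding)
    out.write('<li><a href="#%s">%s</a></li>\n' % (anchor_text, header_text))


out = io.StringIO()
toc_entry(out, "Château".encode("utf-8"), "Château".encode("utf-8"))
print(out.getvalue().encode("utf-8"))  # encode once more on the way out

# Raw bytes are refused at write() time instead of failing later in getvalue().
try:
    out.write("Château".encode("utf-8"))
except TypeError as exc:
    print(exc)
```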

I'm quite new to Trac, so I don't know how text is stored in the database, but I cannot see why the header is decoded to Unicode here.

comment:10 Changed 11 years ago by Alec Thomas

Resolution: fixed
Status: assigned → closed

Fixed in r574.
