Opened 9 years ago

Closed 9 years ago

Last modified 6 years ago

#146 closed defect (fixed)

Non-ASCII characters (byte values >= 128) fail - internationalization no longer works

Reported by: anonymous
Owned by: athomas
Priority: normal
Component: TocMacro
Severity: normal
Keywords:
Cc:
Trac Release:

Description

After upgrading from Trac 0.8 to 0.9, the TOC macro no longer supports international characters:

 Error: Macro TOC(None) failed
'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
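
For readers unfamiliar with this message, here is a minimal sketch that reproduces the same error text. It is written for Python 3, where the ASCII decode has to be requested explicitly; on the Python 2 that Trac 0.9 ran on, the same failure is typically triggered implicitly when a byte string holding non-ASCII data is combined with a unicode string. The sample heading `Idée` and the helper name `ascii_decode_error` are assumptions chosen for illustration, not taken from the reporter's wiki or from TOC.py.

```python
# A UTF-8 encoded heading: "é" is stored as the two bytes 0xc3 0xa9,
# so the first non-ASCII byte sits at index 2.
heading_bytes = "Idée".encode("utf-8")


def ascii_decode_error(data):
    """Return the error message raised by decoding `data` as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(heading_bytes)
print(message)
```

The byte 0xc3 is the lead byte of many two-byte UTF-8 sequences, which is why accented Latin letters commonly surface in this error.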

Attachments (0)

Change History (10)

comment:1 Changed 9 years ago by athomas

  • Status changed from new to assigned

Can you try this version? This was prior to a change which supposedly fixed #92.

If that doesn't work, please enable Trac logging and add the full traceback to this ticket, along with an example heading that triggers the exception.

comment:2 Changed 9 years ago by anonymous

This version (r382) works great and there is no more internationalization problem.

comment:3 Changed 9 years ago by athomas

Hmm, okay. Well I'm really not sure what the correct solution is. Apparently r382 doesn't work with Japanese character sets, but the fix breaks other international characters :\

Perhaps I need to decode using the character set from the HTTP header...
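
As a rough illustration of that idea, the sketch below reads the `charset` parameter from a Content-Type header value and uses it to decode raw bytes. It is only a sketch in Python 3 using the standard library's `email.message.Message`; the function names, the fallback to UTF-8, and the sample header are assumptions, and nothing here reflects how Trac or TOC.py actually obtains its encoding.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns `default`
    # when the header carries no charset parameter.
    return msg.get_content_charset(default)


def decode_heading(raw, content_type):
    """Decode raw heading bytes using the charset declared in the header."""
    return raw.decode(charset_from_content_type(content_type))


raw = "見出し".encode("shift_jis")
text = decode_heading(raw, "text/html; charset=Shift_JIS")
print(text)
```

This only helps if the stored bytes really are in the declared charset; it does nothing about mixing decoded and undecoded strings in one output buffer.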

comment:4 Changed 9 years ago by devil

The version of the TOC macro offered for download, http://trac-hacks.org/download/tocmacro.zip, does indeed have difficulties with international characters. But I tried the version directly from the repository,
source:tocmacro/0.9/TOC.py#382, and it is OK.

comment:5 Changed 9 years ago by Marc dot Zonzon at univ-rennes1.fr

  • Cc marc.zonzon@… added

OK, getting rid of this decode('utf-8') solves the problem. Just take the latest release and apply the patch below, or take revision 320 (http://trac-hacks.org/browser/tocmacro/0.9/TOC.py?rev=320), but then you lose the docstring!

I suppose it fails because you have already written unicode (16-bit) strings before this UTF-8 one, and StringIO cannot handle both, as specified in the module-StringIO documentation (http://docs.python.org/lib/module-StringIO.html).

I don't think the HTTP header is the problem:

  • I'm using UTF-8, so you would end up with the same code that triggers this bug.
  • The problem is not a bad 8-bit encoding; it is that your StringIO does not accept 8-bit strings.

--- TOC.py.bak  2006-03-17 13:38:26.000000000 +0100
+++ TOC.py      2006-03-17 15:09:52.000000000 +0100
@@ -87,7 +87,7 @@        
                 out.write('<a href="%s">%s</a> : %s</li>\n' % (env.href.wiki(page), page, formatted_header))
                 break                                                         
             else:                                                             
-                default_anchor = anchor = Formatter._anchor_re.sub("", header.decode('utf-8'))                                                 
+                default_anchor = anchor = Formatter._anchor_re.sub("", header)
                 anchor_n = 1     
                 while anchor in seen_anchors:  
                     anchor = default_anchor + str(anchor_n)
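
The mixing problem described in this comment can be sketched as follows. This is Python 3, not the Python 2 code in TOC.py: in Python 2, StringIO accepted both kinds of string and only failed later on non-ASCII 8-bit data, whereas Python 3's `io.StringIO` refuses bytes immediately. The function names and sample items are assumptions for illustration. The point is the same in both versions: pick one string type for the buffer and convert everything else on the way in.

```python
import io


def render_items(items, encoding="utf-8"):
    """Write every item into one text buffer, decoding bytes on the way in."""
    out = io.StringIO()
    for item in items:
        if isinstance(item, bytes):
            item = item.decode(encoding)  # normalise at the boundary
        out.write("<li>%s</li>\n" % item)
    return out.getvalue()


def write_raw_bytes():
    """Show that a text buffer rejects an undecoded byte string."""
    try:
        io.StringIO().write("é".encode("utf-8"))
    except TypeError:
        return "rejected"
    return "accepted"


html = render_items(["Idée", "Résumé".encode("utf-8")])
print(html)
print(write_raw_bytes())
```

Whether the buffer holds text or bytes matters less than keeping it consistent: the patch above achieves that by leaving `header` as an 8-bit string like everything else written to `out`.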

comment:6 Changed 9 years ago by Marc dot Zonzon at univ-rennes1.fr

I'm new to this Trac ticket system, and when I filled in the Cc field I didn't expect my email address to appear. How can I erase it? Please save me from a new load of spam.

comment:7 Changed 9 years ago by athomas

  • Cc marc.zonzon@… removed

All you do is clear the CC field...

comment:8 Changed 9 years ago by athomas

Regarding your previous comment, the thing is, the decode() was necessary to make TOC work for #92. So I have conflicting reports, one where decode() fixes a problem and one where it causes a problem.

Unfortunately I know very little about Python's locale support, so I can't really be of much help fixing it. As usual, patches welcome.

comment:9 Changed 9 years ago by Marc dot Zonzon at univ-rennes1.fr

How this decode can work with Japanese is quite mysterious.

  • If you decode UTF-8 you obtain a unicode string, so once you write this unicode string you can no longer write an 8-bit string.
  • If you need to decode the header, why not decode the formatted_header?

I'm quite new to Trac, so I don't know how text is stored in the database, but I cannot see why the header should be decoded to unicode here.
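
One way to make the handling consistent is to convert every incoming string to a single type before building anchors, so that the header and the formatted header are always treated alike. The Python 3 sketch below does this and reuses the anchor de-duplication loop visible in the patch. `ANCHOR_RE` is a hypothetical stand-in for Trac's `Formatter._anchor_re`, whose real pattern is not shown in this ticket, and `to_text` and `make_anchor` are likewise illustrative names. This is not the fix that was committed.

```python
import re

# Hypothetical stand-in for Formatter._anchor_re: strip every character
# that is not a (Unicode-aware) word character.
ANCHOR_RE = re.compile(r"[^\w]+")


def to_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it only if it is a byte string."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def make_anchor(header, seen_anchors):
    """Build a unique anchor for `header`, numbering repeated headings."""
    default_anchor = anchor = ANCHOR_RE.sub("", to_text(header))
    anchor_n = 1
    while anchor in seen_anchors:
        anchor = default_anchor + str(anchor_n)
        anchor_n += 1
    seen_anchors.add(anchor)
    return anchor


seen = set()
headers = ["Idée fixe", "Idée fixe".encode("utf-8"), "見出し"]
anchors = [make_anchor(h, seen) for h in headers]
print(anchors)
```

Because the conversion happens in one place, accented Latin headings and Japanese headings go through the same code path regardless of whether they arrive as bytes or as text.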

comment:10 Changed 9 years ago by athomas

  • Resolution set to fixed
  • Status changed from assigned to closed

Fixed in r574.
