Opened 11 years ago

Closed 11 years ago

Last modified 6 years ago

#897 closed defect (wontfix)

two incorrect characters instead of one correct in PDF output

Reported by: anonymous Owned by: Alec Thomas
Priority: normal Component: PageToPdfPlugin
Severity: normal Keywords: UTF-8
Cc: Trac Release: 0.10

Description

Hi

I've checked out this plugin from the Subversion repository and it can't handle UTF-8 encoded pages: it generates two incorrect characters in place of each correct one. I've read previous posts on this topic and saw that it had been fixed, but it does not work for me. Thanks.

Attachments (0)

Change History (12)

comment:1 Changed 11 years ago by Noah Kantrowitz

What is your default_charset in trac.ini?

comment:2 in reply to:  1 Changed 11 years ago by anonymous

Replying to coderanger:

What is your default_charset in trac.ini?

Hi. My trac.ini contains:

[trac]
default_charset = UTF-8

[pagetopdf]
charset = UTF-8

comment:3 Changed 11 years ago by Noah Kantrowitz

I think that should be utf-8 (note the lower case).

comment:4 in reply to:  3 Changed 11 years ago by anonymous

Replying to coderanger:

I think that should be utf-8 (note the lower case).

Unfortunately, it doesn't work with lowercase, either.

Environment:

  • CentOS 4.3 linux
  • htmldoc 1.8.27
  • trac-0.10
  • Python 2.3.4

The text is in Hungarian with accented characters. Trac wiki works ok.

comment:5 Changed 11 years ago by Noah Kantrowitz

What encoding are you actually using for the text?

comment:6 in reply to:  5 Changed 11 years ago by anonymous

Replying to coderanger:

What encoding are you actually using for the text?

I'm not sure I understand your question correctly. What do you mean? I use utf-8 as default_charset in trac.ini, and utf-8 is the default on my Linux box. Wiki pages are stored as utf-8 text in Trac:

[root@dev tmp]# trac-admin /opt/trac/dia wiki export TestPage test
[root@dev tmp]# file test
test: UTF-8 Unicode text, with CRLF line terminators
[root@dev tmp]#

comment:7 Changed 11 years ago by Noah Kantrowitz

Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.

comment:8 in reply to:  7 ; Changed 11 years ago by anonymous

Replying to coderanger:

Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.

utf-8 is default on linux boxes. htmldoc converts HTML to PDF, trac - I think - creates HTML page from wiki and gives it to htmldoc. The client's charset doesn't affect this process, as far as I know.

pagetopdf.py fragment:

        hfile, hfilename = mkstemp('tracpdf')
        codepage = self.env.config.get('trac', 'default_charset', 0)
        page = wiki_to_html(source, self.env, req).encode(codepage)
        page = re.sub('<img src="(?!\w+://)', '<img src="%s://%s:%d' % (req.scheme,
                            req.server_name, req.server_port), page)
        os.write(hfile, '<html><body>' + page + '</body></html>')
        os.close(hfile)

Trac logs this:

2006-11-12 16:47:04,174 Trac[pagetopdf] DEBUG: --right 1.5cm --bottom 1.5cm --webpage  --top 1.5cm --format pdf14 --size A4 --charset utf-8 --left 1.5cm

comment:9 in reply to:  8 Changed 11 years ago by anonymous

utf-8 is default on linux boxes. htmldoc converts HTML to PDF, trac - I think - creates HTML page from wiki and gives it to htmldoc. The client's charset doesn't affect this process, as far as I know.

I changed the code to test other encoding (ISO-8859-2):

         page = wiki_to_html(source, self.env, req).encode('iso-8859-2') 

and

        htmldoc_args = { 'webpage': None, 'format': 'pdf14', 'left': '1.5cm',
                         'right': '1.5cm', 'top': '1.5cm', 'bottom': '1.5cm',
                         'charset': '8859-2'}

I left default_charset as utf-8, since I want UTF-8 on my wiki. Only the PDF generation uses the Latin-2 encoding.

This way the ISO Latin-2 accented characters come out correctly (utf-8 would be better, but this will do for now). It seems HTMLDOC can't handle UTF-8 (but then how does it work for anyone else?).

So this is a workaround, though for HTMLDOC rather than for Trac.
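
One caveat with encoding to a single-byte charset is that any character outside that charset makes a strict encode raise an error. A minimal Python 3 sketch of one way to guard against that, assuming a hypothetical helper named encode_for_htmldoc (it is not part of the plugin) and assuming numeric character references are an acceptable fallback; whether HTMLDOC renders such references for characters outside the selected charset is not verified here:

```python
def encode_for_htmldoc(html, charset="iso-8859-2"):
    """Encode HTML text to a single-byte charset.

    Hypothetical helper for illustration only. Characters missing from
    the charset are written as numeric character references instead of
    raising UnicodeEncodeError.
    """
    return html.encode(charset, errors="xmlcharrefreplace")


# Hungarian accented letters exist in ISO-8859-2, so each becomes one byte.
hungarian = encode_for_htmldoc("\u00e1rv\u00edzt\u0171r\u0151")

# The euro sign is not in ISO-8859-2, so it falls back to a reference.
euro = encode_for_htmldoc("\u20ac")

print(len(hungarian), euro)
```

The charset name passed to Python ('iso-8859-2') and the one passed to htmldoc ('8859-2' in the arguments above) are spelled differently, so the two would need to be mapped to each other rather than shared as one string.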

comment:10 Changed 11 years ago by Alec Thomas

Resolution: wontfix
Status: new → closed

UTF-8 is not supported by htmldoc. You must use one of the supported encodings.
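
The reported symptom (two wrong characters per accented letter) is what one would expect if the UTF-8 bytes are read as a single-byte charset. A minimal Python 3 sketch of that effect, assuming ISO-8859-1 as the charset the bytes get read in (the ticket does not say which charset htmldoc actually falls back to); misread_utf8 is a hypothetical name used only for illustration:

```python
def misread_utf8(text, assumed_charset="iso-8859-1"):
    """Encode text as UTF-8, then decode the bytes as a single-byte charset.

    This mimics a consumer that does not understand UTF-8 and treats
    every byte as one character.
    """
    return text.encode("utf-8").decode(assumed_charset)


original = "\u00e1"               # LATIN SMALL LETTER A WITH ACUTE
garbled = misread_utf8(original)  # two characters, one per UTF-8 byte

print(len(original), len(garbled))  # 1 2
```

Every accented Hungarian letter takes two bytes in UTF-8, so each one turns into two characters when read this way, while plain ASCII text is unaffected.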

comment:11 Changed 11 years ago by Noah Kantrowitz

#980 has been marked as a duplicate.

comment:12 Changed 6 years ago by anonymous

Keywords: UTF-8 added; utf8 removed
