Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#897 closed defect (wontfix)

two incorrect characters instead of one correct in PDF output

Reported by: anonymous Owned by: athomas
Priority: normal Component: PageToPdfPlugin
Severity: normal Keywords: UTF-8
Cc: Trac Release: 0.10

Description

Hi

I've checked out this plugin from the Subversion repository and it can't handle UTF-8 encoded pages: it generates two incorrect characters in place of each accented one. I've read previous posts on this topic and saw that it had been fixed, but it does not work for me. Thanks.

Attachments (0)

Change History (12)

comment:1 follow-up: Changed 8 years ago by coderanger

What is your default_charset in trac.ini?

comment:2 in reply to: ↑ 1 Changed 8 years ago by anonymous

Replying to coderanger:

What is your default_charset in trac.ini?

Hi. My trac.ini contains:

[trac]
default_charset = UTF-8

[pagetopdf]
charset = UTF-8

comment:3 follow-up: Changed 8 years ago by coderanger

I think that should be utf-8 (note the lower case).

comment:4 in reply to: ↑ 3 Changed 8 years ago by anonymous

Replying to coderanger:

I think that should be utf-8 (note the lower case).

Unfortunately, it doesn't work with lowercase, either.

Environment:

  • CentOS 4.3 linux
  • htmldoc 1.8.27
  • trac-0.10
  • Python 2.3.4

The text is in Hungarian with accented characters. Trac wiki works ok.

comment:5 follow-up: Changed 8 years ago by coderanger

What encoding are you actually using for the text?

comment:6 in reply to: ↑ 5 Changed 8 years ago by anonymous

Replying to coderanger:

What encoding are you actually using for the text?

I'm not sure I understand your question correctly. What do you mean? I use utf-8 as default_charset in trac.ini, and utf-8 is the default on my Linux box. Wiki pages are stored as UTF-8 text in Trac:

[root@dev tmp]# trac-admin /opt/trac/dia wiki export TestPage test
[root@dev tmp]# file test
test: UTF-8 Unicode text, with CRLF line terminators
[root@dev tmp]#

comment:7 follow-up: Changed 8 years ago by coderanger

Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 8 years ago by anonymous

Replying to coderanger:

Trac uses Unicode strings internally, but this doesn't mean your browser is actually sending UTF8. Not sure how you check this on a Linux box, though I would hope it takes the system charset.

UTF-8 is the default on Linux boxes. htmldoc converts HTML to PDF; Trac, I think, creates an HTML page from the wiki source and hands it to htmldoc. The client's charset doesn't affect this process, as far as I know.

pagetopdf.py fragment:

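        # Write the rendered wiki page to a temporary HTML file for htmldoc.
        hfile, hfilename = mkstemp('tracpdf')
        codepage = self.env.config.get('trac', 'default_charset', 0)
        # Encode the Unicode HTML into the configured charset.
        page = wiki_to_html(source, self.env, req).encode(codepage)
        # Prefix image URLs that lack a scheme with this server's scheme, host and port.
        page = re.sub(r'<img src="(?!\w+://)',
                      '<img src="%s://%s:%d' % (req.scheme, req.server_name,
                                                req.server_port), page)
        os.write(hfile, '<html><body>' + page + '</body></html>')
        os.close(hfile)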

Trac logs this:

2006-11-12 16:47:04,174 Trac[pagetopdf] DEBUG: --right 1.5cm --bottom 1.5cm --webpage  --top 1.5cm --format pdf14 --size A4 --charset utf-8 --left 1.5cm

comment:9 in reply to: ↑ 8 Changed 8 years ago by anonymous

UTF-8 is the default on Linux boxes. htmldoc converts HTML to PDF; Trac, I think, creates an HTML page from the wiki source and hands it to htmldoc. The client's charset doesn't affect this process, as far as I know.

I changed the code to test other encoding (ISO-8859-2):

        page = wiki_to_html(source, self.env, req).encode('iso-8859-2')

and

        htmldoc_args = { 'webpage': None, 'format': 'pdf14', 'left': '1.5cm',
                         'right': '1.5cm', 'top': '1.5cm', 'bottom': '1.5cm',
                         'charset': '8859-2'}

I left default_charset as utf-8, since I want UTF-8 on my wiki. Only the PDF generation uses Latin-2 encoding.

This way it works for ISO Latin-2 accented characters (UTF-8 would be better, but this will do for now). It seems HTMLDOC can't handle UTF-8 (but then how does it work for anyone else?).

So this is a workaround for a limitation in HTMLDOC, not in Trac.
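
The encoding step of this workaround can be tried outside Trac with a short sketch. This is an illustration in Python 3, not the plugin's code: `encode_for_htmldoc` and the sample string are hypothetical, and the `xmlcharrefreplace` error handler is an added assumption for characters that Latin-2 lacks (whether HTMLDOC renders such numeric references was not tested in this ticket).

```python
# Hypothetical helper, not part of PageToPdfPlugin.
def encode_for_htmldoc(html, codec="iso-8859-2"):
    """Encode a Unicode HTML string into a single-byte charset.

    Characters missing from the target charset become numeric
    character references instead of raising UnicodeEncodeError.
    """
    return html.encode(codec, "xmlcharrefreplace")


sample = "<p>árvíztűrő tükörfúrógép</p>"  # Hungarian test phrase
latin2 = encode_for_htmldoc(sample)

# Every accented letter in the sample fits in one byte in ISO-8859-2.
assert len(latin2) == len(sample)

# The same text does not fit in ISO-8859-1, which lacks ő and ű.
try:
    sample.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("not representable in Latin-1:", exc.object[exc.start])
```

The resulting bytes correspond to what the plugin writes to its temporary file, and they only make sense together with the matching `'charset': '8859-2'` entry shown above.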

comment:10 Changed 8 years ago by athomas

  • Resolution set to wontfix
  • Status changed from new to closed

UTF-8 is not supported by htmldoc. You must use one of the supported encodings.
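
The "two characters instead of one" symptom is consistent with this: each accented letter takes two bytes in UTF-8, and a tool that reads the file as a single-byte charset shows each byte as its own character. A minimal Python 3 illustration, assuming Latin-1 as the charset used for the misreading (the ticket does not say which charset HTMLDOC actually fell back to):

```python
text = "é"                          # one character
utf8_bytes = text.encode("utf-8")   # two bytes: 0xC3 0xA9

# Misread the UTF-8 bytes one byte at a time, as a Latin-1 reader would.
misread = utf8_bytes.decode("iso-8859-1")

print(len(text), len(utf8_bytes), len(misread))
print(misread)
```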

comment:11 Changed 8 years ago by coderanger

#980 has been marked as a duplicate.

comment:12 Changed 4 years ago by anonymous

  • Keywords UTF-8 added; utf8 removed
