Edgewall Software

Ticket #3058 (closed defect: fixed)

Opened 2 years ago

Last modified 2 years ago

Bug of the wiki compiler

Reported by: m.petretta@… Owned by: cboos
Priority: highest Milestone: 0.9.6
Component: general Version: 0.9.5
Severity: normal Keywords: unicode
Cc:

Description (last modified by cboos)

When compiling a page in one of my projects, the wiki compiler crashes for no apparent reason. After several tests, I isolated the problem: it happens when I add the following string to the page: " === Indice di Priorità ==="

The Python traceback follows:

Traceback (most recent call last):
  File "C:\Python24\lib\site-packages\trac\web\standalone.py", line 303, in _do_trac_req
    dispatch_request(path_info, req, env)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 139, in dispatch_request
    dispatcher.dispatch(req)
  File "C:\Python24\lib\site-packages\trac\web\main.py", line 107, in dispatch
    resp = chosen_handler.process_request(req)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 92, in process_request
    self._render_editor(req, db, page, preview=True)
  File "C:\Python24\lib\site-packages\trac\wiki\web_ui.py", line 311, in _render_editor
    info['page_html'] = wiki_to_html(page.text, self.env, req, db)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 744, in wiki_to_html
    Formatter(env, req, absurls, db).format(wikitext, out, escape_newlines)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 599, in format
    result = re.sub(self.rules, self.replace, line)
  File "C:\Python24\lib\sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 221, in replace
    return getattr(self, '_' + itype + '_formatter')(match, fullmatch)
  File "C:\Python24\lib\site-packages\trac\wiki\formatter.py", line 389, in _heading_formatter
    anchor = self._anchor_re.sub('', sans_markup.decode('utf-8'))
  File "C:\Python24\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 17: unexpected end of data
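
The last frame can be reproduced in isolation. This is a minimal sketch, assuming Python 3 (the traceback above comes from Python 2.4) and assuming the heading text reached the decoder with its final byte missing, which is what "byte 0xc3 in position 17" suggests:

```python
# UTF-8 encodes "à" (U+00E0) as the two bytes 0xC3 0xA0.
full = "Indice di Priorità".encode("utf-8")
assert full.endswith(b"\xc3\xa0")

# Hypothetical truncated input: the trailing 0xA0 byte is gone,
# so the lead byte 0xC3 is left without its continuation byte.
truncated = full[:-1]

try:
    truncated.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Offset of the offending byte and the codec's explanation.
print(error.start, error.reason)  # 17 unexpected end of data
```

The 17 ASCII characters of "Indice di Priorit" occupy offsets 0 to 16, so the orphaned 0xC3 sits at offset 17, matching the error message.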

Change History

Changed 2 years ago by cboos

  • keywords unicode added
  • owner changed from jonas to cboos
  • priority changed from normal to highest
  • description modified
  • milestone set to 0.9.6

Right, I can reproduce this.

Changed 2 years ago by cboos

Does anybody have an idea why, on the command line, I have this:

>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'

while in Trac, the same strip() operation, on the same input, returns 'Indice di Priorit\xc3' ?

i.e.

  • formatter.py

     
@@ -657,5 +660,8 @@
         self.out = out
         self._open_tags = []
 
+        print 'oneliner', type(text), `text`
+        print 'oneliner.strip', `text.strip()`
+
         # Simplify code blocks
         in_code_block = 0

shows:

oneliner <type 'str'> 'Indice di Priorit\xc3\xa0'
oneliner.strip 'Indice di Priorit\xc3'

?

Changed 2 years ago by cboos

Answering my own question:

>>> s = 'Indice di Priorit\xc3\xa0'
>>> s.strip()
'Indice di Priorit\xc3\xa0'
>>> import locale
>>> locale.getlocale()
(None, None)
>>> locale.setlocale(locale.LC_ALL, 'en')
'English_United States.1252'
>>> s.strip()
'Indice di Priorit\xc3'

Changed 2 years ago by cboos

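... and in cp1252, we have: A0 = U+00A0 : NO-BREAK SPACE, so under that locale strip() treats the trailing \xa0 byte as whitespace and removes it, leaving an incomplete UTF-8 sequence.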
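
The byte coincidence can be checked directly. This is a sketch, assuming Python 3, where bytes.strip() only removes ASCII whitespace regardless of locale. The locale-dependent strip seen above is therefore imitated with an explicit rstrip(b"\xa0"), which is an illustration and not Trac code:

```python
data = "Indice di Priorità".encode("utf-8")  # ends with b"\xc3\xa0"

# In cp1252 the single byte 0xA0 decodes to NO-BREAK SPACE,
# and that character is classified as whitespace.
nbsp = b"\xa0".decode("cp1252")
print(repr(nbsp), nbsp.isspace())

# Imitation of the locale-aware byte strip (an assumption for illustration):
# the second byte of the UTF-8 encoded "à" is mistaken for trailing
# whitespace and dropped, leaving a dangling 0xC3 lead byte.
broken = data.rstrip(b"\xa0")
print(broken)
```

The result is the same truncated byte string that the debug output in Trac showed.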

Yet another example of why using unicode internally is so important (0.10).

In the meantime, for this issue, a temporary conversion to unicode could do the trick:

Index: trac/wiki/formatter.py
===================================================================
--- trac/wiki/formatter.py	(revision 3213)
+++ trac/wiki/formatter.py	(working copy)
@@ -21,6 +21,7 @@
 import re
 import os
 import urllib
+import StringIO as pyStringIO
 
 try:
     from cStringIO import StringIO
@@ -660,7 +661,9 @@
         # Simplify code blocks
         in_code_block = 0
         processor = None
-        buf = StringIO()
+        buf = pyStringIO.StringIO()
+        text = unicode(text, 'utf-8', 'replace')
+
         for line in text.strip().splitlines():
             if line.strip() == '{{{':
                 in_code_block += 1
@@ -678,6 +681,7 @@
             else:
                 print>>buf, line
         result = buf.getvalue()[:-1]
+        result = result.encode('utf-8')
 
         if shorten:
             result = util.shorten_line(result)
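
The shape of this workaround (decode, process as text, encode back) can be sketched on its own. This is a rough Python 3 illustration, not the patched Trac code: strip_lines_utf8 is a hypothetical helper, and io.StringIO stands in for the pure-Python StringIO that the patch imports, presumably because cStringIO cannot hold non-ASCII unicode objects.

```python
import io


def strip_lines_utf8(raw):
    """Decode UTF-8 bytes, strip and split them as text, then re-encode.

    Hypothetical helper that mirrors the shape of the patch; not Trac's API.
    """
    # Work on characters, not bytes; undecodable bytes become U+FFFD.
    text = raw.decode("utf-8", "replace")
    buf = io.StringIO()
    for line in text.strip().splitlines():
        buf.write(line + "\n")
    # Drop the final newline, as the patch does with [:-1].
    result = buf.getvalue()[:-1]
    return result.encode("utf-8")


print(strip_lines_utf8(b"  Indice di Priorit\xc3\xa0  \n"))
```

Because the strip happens on decoded text, the two bytes of "à" are treated as one character and never split, whatever the locale is.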

Opinions?

Changed 2 years ago by cboos

  • status changed from new to closed
  • resolution set to fixed

Fixed in r3236.
