Edgewall Software

Ticket #2868 (closed defect: duplicate)

Opened 3 years ago

Last modified 3 years ago

garbled unicode chars in inline diff view

Reported by: Andrew Stromnov
Owned by: jonas
Priority: normal
Milestone:
Component: version control/changeset view
Version: devel
Severity: minor
Keywords: diff unicode
Cc:

Description

Unicode characters are garbled in the highlighted inline diff view.

Example: a diff between str1="АШИПКА" and str2="ОШИБКА", which differ in their first and fourth characters.

  1. str1 and str2 are passed to markup_intraline_changes as raw byte strings (not unicode objects): '\xd0\x90\xd0\xa8\xd0\x98\xd0\x9f\xd0\x9a\xd0\x90' and '\xd0\x9e\xd0\xa8\xd0\x98\xd0\x91\xd0\x9a\xd0\x90' respectively.
  2. str1 and str2 are then passed to _get_change_extent, which compares them byte by byte. Because the first characters А ('\xd0\x90') and О ('\xd0\x9e') share the lead byte '\xd0', the extent starts at offset 1, i.e. at the second octet of a two-byte UTF-8 character ('\x90' in str1, '\x9e' in str2).
  3. After the tag substitution the results are '\xd0<del>\x90\xd0\xa8\xd0\x98\xd0\x9f</del>\xd0\x9a\xd0\x90' and '\xd0<add>\x9e\xd0\xa8\xd0\x98\xd0\x91</add>\xd0\x9a\xd0\x90'. The tags split the first UTF-8 character of each string, so that character is broken.
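The byte-level failure in steps 1 to 3 can be reproduced outside Trac. The sketch below is a hypothetical re-implementation of the common-prefix/common-suffix idea behind _get_change_extent, not Trac's actual code. It is written in Python 3, where bytes plays the role of the old raw str and str plays the role of unicode.

```python
def get_change_extent(a, b):
    """Return (start, end): the length of the common prefix of a and b, and
    the negated length of their common suffix (0 if there is none).

    Illustrative re-implementation (assumption); works on bytes and str alike.
    """
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1
    end = 0
    # The suffix may not overlap the prefix that was already matched.
    while -end < limit - start and a[end - 1] == b[end - 1]:
        end -= 1
    return start, end


str1, str2 = "АШИПКА", "ОШИБКА"
raw1, raw2 = str1.encode("utf-8"), str2.encode("utf-8")

# Byte-wise: the shared lead byte b'\xd0' is counted as common prefix.
print(get_change_extent(raw1, raw2))  # (1, -4)
# Character-wise: the first characters already differ.
print(get_change_extent(str1, str2))  # (0, -2)

# Inserting tags at the byte offsets splits the first two-byte character.
start, end = get_change_extent(raw1, raw2)
stop = len(raw1) + end
marked = raw1[:start] + b"<del>" + raw1[start:stop] + b"</del>" + raw1[stop:]
try:
    marked.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The marked byte string is exactly the first result quoted in step 3, and it no longer decodes as UTF-8.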

Possible fix: use unicode strings for the extent calculation, so that it operates on characters rather than on bytes.
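The proposed fix can be sketched as follows, assuming the lines are UTF-8 encoded: decode both lines, compute the extent on characters, insert the same '\0'/'\1' placeholders that the diff code uses, and encode the result again. The function name markup_intraline_pair is hypothetical and Python 3 is assumed. Unlike the branches in Trac's code, this simplified sketch always inserts the placeholders, even when the lines share no prefix or suffix.

```python
def markup_intraline_pair(fromline, toline, encoding="utf-8"):
    # Hypothetical helper; assumes both byte lines are valid in `encoding`.
    a, b = fromline.decode(encoding), toline.decode(encoding)

    # Common prefix length, counted in characters rather than bytes.
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1

    # Common suffix length, never overlapping the prefix.
    suffix = 0
    while suffix < limit - start and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1

    def mark(s):
        # Wrap the changed span in '\0' ... '\1', then re-encode to bytes.
        stop = len(s) - suffix
        return (s[:start] + "\0" + s[start:stop] + "\1" + s[stop:]).encode(encoding)

    return mark(a), mark(b)


old, new = markup_intraline_pair("АШИПКА".encode("utf-8"), "ОШИБКА".encode("utf-8"))
# Both results decode cleanly: the markers sit between whole characters.
print(repr(old.decode("utf-8")))
print(repr(new.decode("utf-8")))
```

For the example strings, the placeholders now enclose АШИП and ОШИБ, so the highlighted span covers whole characters and the rest of each line stays valid UTF-8.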

Quick (and dirty) hack:

--- diff.py.orig	Mon Mar 13 13:43:21 2006
+++ diff.py	Mon Mar 13 15:26:11 2006
@@ -148,6 +148,8 @@
             if tag == 'replace' and i2 - i1 == j2 - j1:
                 for i in range(i2 - i1):
                     fromline, toline = fromlines[i1 + i], tolines[j1 + i]
+                    # Compute the extent on characters, not on UTF-8 bytes
+                    fromline, toline = fromline.decode('utf8'), toline.decode('utf8')
                     (start, end) = _get_change_extent(fromline, toline)
 
                     if start == 0 and end < 0:
@@ -170,6 +172,11 @@
                         tolines[j1 + i] = toline[:start] + '\0' + \
                                           toline[start:end] + '\1' + \
                                           toline[end:]
+                    # Re-encode only the lines that were marked up (now unicode)
+                    if isinstance(fromlines[i1 + i], unicode):
+                        fromlines[i1 + i] = fromlines[i1 + i].encode('utf8')
+                    if isinstance(tolines[j1 + i], unicode):
+                        tolines[j1 + i] = tolines[j1 + i].encode('utf8')
             yield tag, i1, i2, j1, j2
 
     changes = []

Change History

Changed 3 years ago by cboos

  • keywords unicode added
  • milestone set to 0.11

Changed 3 years ago by cboos

  • status changed from new to closed
  • resolution set to duplicate
  • milestone 0.11 deleted

Duplicate of #2363
