Ticket #2868 (closed defect: duplicate)
garbled unicode chars in inline diff view
| Reported by: | Andrew Stromnov | Owned by: | jonas |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | version control/changeset view | Version: | devel |
| Severity: | minor | Keywords: | diff unicode |
| Cc: |
Description
Unicode characters are garbled in the highlighted inline diff view.
Example: diff between str1="АШИПКА" and str2="ОШИБКА"
- str1 and str2 are passed to markup_intraline_changes as raw byte strings (not unicode objects): '\xd0\x90\xd0\xa8\xd0\x98\xd0\x9f\xd0\x9a\xd0\x90' and '\xd0\x9e\xd0\xa8\xd0\x98\xd0\x91\xd0\x9a\xd0\x90' respectively.
- str1 and str2 are then passed to _get_change_extent. Because these are raw strings, the extent is calculated byte by byte: the lead octet '\xd0' is common to both strings, so the extent starts at the second octet of the first UTF-8 character ('\x90' in str1, '\x9e' in str2).
- Results after tag substitution: '\xd0<del>\x90\xd0\xa8\xd0\x98\xd0\x9f</del>\xd0\x9a\xd0\x90' and '\xd0<add>\x9e\xd0\xa8\xd0\x98\xd0\x91</add>\xd0\x9a\xd0\x90'. The opening tag lands between the two octets of the first UTF-8 character in each string, so that character is broken.
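The steps above can be reproduced outside Trac with a small sketch. Note that change_extent and mark here are hypothetical stand-ins written for this illustration (a plain common-prefix/common-suffix scan), not Trac's actual _get_change_extent, and the code is written for Python 3, where bytes play the role of the raw strings described above.

```python
def change_extent(a, b):
    """Return (start, end) around the differing middle of a and b.

    start is the length of the common prefix; end is a non-positive
    offset from the end of the sequence where the common suffix begins.
    """
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1
    end = 0
    limit -= start
    while -end < limit and a[end - 1] == b[end - 1]:
        end -= 1
    return start, end


def mark(s, start, end, open_tag, close_tag):
    """Wrap the differing span of s in open_tag/close_tag."""
    last = len(s) + end
    return s[:start] + open_tag + s[start:last] + close_tag + s[last:]


# The two strings from the example, as UTF-8 encoded bytes.
old = "АШИПКА".encode("utf-8")
new = "ОШИБКА".encode("utf-8")

start, end = change_extent(old, new)
print(start, end)  # 1 -4: the extent starts inside the first character

marked = mark(old, start, end, b"<del>", b"</del>")
print(marked)  # b'\xd0<del>\x90\xd0\xa8\xd0\x98\xd0\x9f</del>\xd0\x9a\xd0\x90'

try:
    marked.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    # The lead octet 0xd0 is now followed by '<', which is not valid UTF-8.
    decodable = False
print(decodable)  # False
```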
Possible fix: use unicode strings for extent calculation.
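A minimal sketch of that idea, assuming the input lines are UTF-8 encoded: decode first, compute the extent per character, insert the markers, then encode again. mark_change is a hypothetical helper written for this illustration (Python 3), not a Trac function; it uses the same '\0' and '\1' markers that diff.py inserts.

```python
def mark_change(old_bytes, new_bytes, encoding="utf-8"):
    """Mark the differing span of two encoded lines with '\\0' ... '\\1'.

    The comparison runs on decoded text, so a marker can never fall
    between the octets of a multi-byte character.
    """
    old = old_bytes.decode(encoding)
    new = new_bytes.decode(encoding)

    limit = min(len(old), len(new))
    start = 0  # length of the common prefix, in characters
    while start < limit and old[start] == new[start]:
        start += 1
    tail = 0  # length of the common suffix, in characters
    while tail < limit - start and old[-1 - tail] == new[-1 - tail]:
        tail += 1

    def wrap(s):
        last = len(s) - tail
        marked = s[:start] + "\0" + s[start:last] + "\1" + s[last:]
        return marked.encode(encoding)

    return wrap(old), wrap(new)


old_marked, new_marked = mark_change("АШИПКА".encode("utf-8"),
                                     "ОШИБКА".encode("utf-8"))

# Both results are still valid UTF-8 and decode cleanly.
print(repr(old_marked.decode("utf-8")))  # '\x00АШИП\x01КА'
print(repr(new_marked.decode("utf-8")))  # '\x00ОШИБ\x01КА'
```

Lines that are not valid UTF-8 would raise UnicodeDecodeError in this sketch, so a real fix would need to know the actual encoding of the content or fall back gracefully.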
Quick (and dirty) hack:
--- diff.py.orig Mon Mar 13 13:43:21 2006
+++ diff.py Mon Mar 13 15:26:11 2006
@@ -148,6 +147,11 @@
             if tag == 'replace' and i2 - i1 == j2 - j1:
                 for i in range(i2 - i1):
                     fromline, toline = fromlines[i1 + i], tolines[j1 + i]
+
+                    fromline, toline = fromline.decode('utf8'), toline.decode('utf8')
+
                     (start, end) = _get_change_extent(fromline, toline)
                     if start == 0 and end < 0:
@@ -170,6 +174,12 @@
                         tolines[j1 + i] = toline[:start] + '\0' + \
                                           toline[start:end] + '\1' + \
                                           toline[end:]
+
+                    fromlines[i1 + i] = fromlines[i1 + i].encode('utf8')
+                    tolines[j1 + i] = tolines[j1 + i].encode('utf8')
+
             yield tag, i1, i2, j1, j2
     changes = []
changes = []


