In a previous post I was discussing PyRXP. Now I'm playing with PyRXPU, which aims to be PyRXP with Unicode support. But I'm not sure I understand the crap strings it returns:
>>> import pyRXPU
>>> pyRXPU.version
'1.05'
>>> doc = pyRXPU.Parser().parse('<foo>Bar</foo>')
>>> doc
(u'\U006f0066o\x10', None, [u'\U00610042r\x10'], None)
It looks like UTF-16, but I couldn't manage to print such strings:
>>> print doc[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\U006f0066' in position 0: ordinal not in range(256)
>>> print doc[0].encode('utf16')
ÿþÃfÃo
Where's my foo? :)
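A guess, which I haven't checked against the pyRXPU source: the odd code point 0x006f0066 looks like two 16-bit UTF-16 units (0x0066 'f' and 0x006f 'o') packed into one 32-bit character, as if a UTF-16 buffer were being read by a Python built with 4-byte Unicode characters. The little sketch below only tests that arithmetic; `unpack_code_point` is a name I made up for the illustration, not part of pyRXPU.

```python
import struct

def unpack_code_point(code_point):
    """Reinterpret one 32-bit value as two little-endian UTF-16 code units.

    Hypothetical helper for illustration only; it assumes the value really
    is two 16-bit units packed together, which is just a guess.
    """
    raw = struct.pack('<I', code_point)   # the 4 bytes, little-endian
    return raw.decode('utf-16-le')        # read back as two 16-bit characters

print(unpack_code_point(0x006f0066))  # first character of doc[0]
print(unpack_code_point(0x00610042))  # first character of doc[2][0]
```

If the guess is right, the first call gives 'fo' and the second 'Ba', so together with the plain 'o' and 'r' that follow them, "foo" and "Bar" are in there after all, and only the trailing '\x10' is real garbage.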