5. Ah, here’s the fix. Instead of taking the last element of the byte array, use list slicing to create a new byte array containing just the last element. That is, start with the last element and continue the slice until the end of the byte array. Now mLastChar is a byte array of length 1.
6. Concatenating a byte array of length 1 with a byte array of length 3 returns a new byte array of length 4.
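The difference between indexing and slicing described in these notes can be checked directly in Python 3. This is only a minimal sketch: the sample bytes and the names last_as_int, combined, and concat_failed are made up for illustration.

```python
# Sample data, made up for illustration: a byte array of length 3.
aBuf = b'\xef\xbb\xbf'

# Indexing a bytes object returns an int, not a bytes object.
last_as_int = aBuf[-1]

# Slicing returns a new bytes object, here of length 1.
mLastChar = aBuf[-1:]

# bytes + bytes works and yields a new bytes object of length 4.
combined = mLastChar + aBuf

# int + bytes is not supported and raises TypeError.
try:
    last_as_int + aBuf
    concat_failed = False
except TypeError:
    concat_failed = True
```

The slice form also works on an empty byte array (it returns an empty byte array), whereas indexing an empty byte array raises an IndexError.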
So, to ensure that the feed() method in universaldetector.py continues to work no matter how often it’s called, you need to initialize self._mLastChar as a 0-length byte array, then make sure it stays a byte array.
self._escDetector.search(self._mLastChar + aBuf):
self._mInputState = eEscAscii
- self._mLastChar = aBuf[-1]
+ self._mLastChar = aBuf[-1:]
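To see why both halves of the fix matter, here is a rough sketch of a detector whose feed() method is called repeatedly. The class TinyDetector, its seen_escape attribute, and the escape pattern are hypothetical stand-ins, not chardet's actual code. Only the names _mLastChar, _escDetector, and aBuf come from the diff above.

```python
import re


class TinyDetector:
    """Hypothetical stand-in for the real detector class; illustration only."""

    def __init__(self):
        # Start with a 0-length bytes object so the very first
        # concatenation in feed() is bytes + bytes.
        self._mLastChar = b''
        self.seen_escape = False
        # Assumed pattern: an ESC byte, or the two-byte sequence "~{".
        self._escDetector = re.compile(rb'\x1b|~\{')

    def feed(self, aBuf):
        # Prepend the last byte of the previous chunk so a sequence
        # split across two chunks is still found.
        if self._escDetector.search(self._mLastChar + aBuf):
            self.seen_escape = True
        # Slice rather than index, so _mLastChar stays a bytes object.
        self._mLastChar = aBuf[-1:]


detector = TinyDetector()
detector.feed(b'abc~')   # chunk ends in the middle of "~{"
detector.feed(b'{def')   # the match is completed by the next chunk
```

With aBuf[-1] in place of aBuf[-1:], the second call to feed() would fail, because it would try to concatenate an int with a bytes object.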
15.6.7. ord() EXPECTED STRING OF LENGTH 1, BUT int FOUND
Tired yet? You’re almost there…
C:\home\chardet> python test.py tests\*\*
tests\ascii\howto.diveintomark.org.xml ascii with confidence 1.0
tests\Big5\0804.blogspot.com.xml
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    u.feed(line)
  File "C:\home\chardet\chardet\universaldetector.py", line 116, in feed
    if prober.feed(aBuf) == constants.eFoundIt:
  File "C:\home\chardet\chardet\charsetgroupprober.py", line 60, in feed
    st = prober.feed(aBuf)
  File "C:\home\chardet\chardet\utf8prober.py", line 53, in feed
    codingState = self._mCodingSM.next_state(c)
  File "C:\home\chardet\chardet\codingstatemachine.py", line 43, in next_state
    byteCls = self._mModel['classTable'][ord(c)]
TypeError: ord() expected string of length 1, but int found