functions. Memoizing, or automatic caching, may help with performance if the function is called
repeatedly, since it avoids redoing the preparation of the format for the struct unpacking on every
call. See also Recipe 17.8.
In a purely Python context, the point of this recipe is to remind you that struct.unpack is
often viable, and sometimes preferable, as an alternative to string slicing. This is not quite as
often the case as unpack versus substr in Perl, since Python's struct has no counterpart to Perl's
* count (as in A*, which takes the rest of the string), but it is often enough to be worth keeping
in mind.
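As a small illustration of that equivalence, the sketch below extracts the same fields from one fixed-width record twice, once with slices and once with a struct format. The record contents are made up, and it assumes Python 3, where struct.unpack works on bytes rather than str.

```python
import struct

# A made-up fixed-width record: 5 bytes, 3 bytes to skip, then two 8-byte fields.
theline = b"ABCDE...field001field002"

# Slicing: every offset is spelled out by hand.
by_slicing = (theline[0:5], theline[8:16], theline[16:24])

# struct: the same layout expressed as field widths; "3x" skips 3 pad bytes.
by_struct = struct.unpack("5s 3x 8s 8s", theline)

assert by_slicing == by_struct
print(by_struct)  # (b'ABCDE', b'field001', b'field002')
```

With struct, the layout lives in one format string of widths, so changing a field's size does not force you to recompute every later offset.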
Each of the snippets in the solution is, of course, best encapsulated in a function. Among other
advantages, encapsulation means we don't have to work out the length of the last field on each and
every use. This function is the equivalent of the first snippet in the solution:
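import struct

def fields(baseformat, theline, lastfield=None):
    # number of bytes left over after the fields that baseformat describes
    numremain = len(theline)-struct.calcsize(baseformat)
    # keep the leftover bytes as a final string field ("s") or skip them ("x")
    format = "%s %d%s" % (baseformat, numremain, lastfield and "s" or "x")
    return struct.unpack(format, theline)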
If this function is called in a loop, caching the computed format under the key
(baseformat, len(theline), lastfield) may offer an easy speed-up.
The function equivalent of the second snippet in the solution is:
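def split_by(theline, n, lastfield=None):
    numblocks, therest = divmod(len(theline), n)
    # every complete block becomes an n-byte string field
    baseblock = "%ds" % n
    # the short leftover block is kept ("s") or skipped ("x")
    format = "%s %d%s" % (baseblock*numblocks, therest, lastfield and "s" or "x")
    return struct.unpack(format, theline)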
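To check that the struct-based split into fixed-size blocks agrees with plain slicing, here is a sketch with two hypothetical functions, split_by_struct and split_by_slicing. It assumes bytes input (Python 3) and, as a design choice not taken from the text, skips the tail field whenever it is empty so that no b"" piece appears.

```python
import struct

def split_by_struct(theline, n, lastfield=False):
    # Full n-byte blocks become "<n>s" fields; the short tail is kept or skipped.
    numblocks, therest = divmod(len(theline), n)
    # Keep the tail only when asked to and when it is non-empty (avoids a b"" field).
    tail = "%d%s" % (therest, "s" if lastfield and therest else "x")
    fmt = ("%ds" % n) * numblocks + " " + tail
    return struct.unpack(fmt, theline)

def split_by_slicing(theline, n, lastfield=False):
    # The same split done with slices, for comparison.
    pieces = [theline[k:k + n] for k in range(0, len(theline), n)]
    if pieces and len(pieces[-1]) < n and not lastfield:
        pieces.pop()
    return tuple(pieces)

line = b"abcdefghijklm"  # 13 bytes: two 5-byte blocks plus a 3-byte tail
print(split_by_struct(line, 5))                  # (b'abcde', b'fghij')
print(split_by_struct(line, 5, lastfield=True))  # (b'abcde', b'fghij', b'klm')
assert split_by_struct(line, 5) == split_by_slicing(line, 5)
```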
And for the third snippet:
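def split_at(theline, cuts, lastfield=None):
    # one piece between each pair of consecutive cut positions, starting from 0
    pieces = [ theline[i:j] for i, j in zip([0]+cuts, cuts) ]
    if lastfield:
        # also keep everything after the last cut
        pieces.append(theline[cuts[-1]:])
    return pieces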
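The cut-position approach can also be expressed with struct by turning the cuts into field widths. The sketch below assumes a non-empty, ascending list of cuts no larger than the line length, bytes input, and a made-up sample record; cuts_to_format is a hypothetical helper, not part of the recipe.

```python
import struct

def cuts_to_format(cuts, linelen, lastfield=False):
    # Convert ascending cut positions into the widths of consecutive fields.
    widths = [j - i for i, j in zip([0] + cuts, cuts)]
    fmt = " ".join("%ds" % w for w in widths)
    # Whatever follows the last cut is kept ("s") or skipped ("x").
    return "%s %d%s" % (fmt, linelen - cuts[-1], "s" if lastfield else "x")

line = b"20230115ALPHA BETA  rest"
cuts = [8, 14, 20]
fmt = cuts_to_format(cuts, len(line))
print(fmt)                       # 8s 6s 6s 4x
print(struct.unpack(fmt, line))  # (b'20230115', b'ALPHA ', b'BETA  ')
```

Building the format once and reusing it for many lines of the same length is where this variant would most plausibly beat repeated slicing.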
In each of these functions, a decision worth noticing (and, perhaps, worth criticizing) is that of
having a lastfield=None optional parameter. This reflects the observation that while we often want
to skip the last, unknown-length subfield, sometimes we want to retain it instead. The use of
lastfield in the expression lastfield and "s" or "x" (equivalent to C's lastfield?'s':'x') saves
an if/else, but it's unclear whether the saving is worth it. "sx"[not lastfield] and other similar
alternatives are roughly equivalent in this respect; see Recipe 17.6. When lastfield is false,
applying struct.unpack to just a prefix of theline (specifically,
theline[:struct.calcsize(baseformat)]) is an alternative, but it's not easy to merge with the case
of lastfield being true, when the format does need a supplementary field for
len(theline)-struct.calcsize(baseformat).
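The short sketch below checks both points on made-up data (bytes, Python 3). First, the and/or expression, "sx"[not lastfield], and a conditional expression pick the same code for a range of true and false values. Second, for lastfield false, unpacking a prefix gives the same fields as appending an explicit "x" skip field. The third variant, struct.unpack_from, is a standard-library alternative not mentioned in the text; it accepts a buffer longer than the format requires.

```python
import struct

# The and/or trick is safe here only because "s" is itself a true value.
for lastfield in (None, False, 0, "", True, 1, "yes"):
    assert (lastfield and "s" or "x") == "sx"[not lastfield] == ("s" if lastfield else "x")

baseformat = "5s 3x 8s"
theline = b"ABCDE...field001trailing junk"
size = struct.calcsize(baseformat)  # 16 bytes covered by baseformat

# Option 1: unpack only the prefix that baseformat describes.
by_prefix = struct.unpack(baseformat, theline[:size])
# Option 2: append an explicit skip field covering the remainder.
by_skip = struct.unpack("%s %dx" % (baseformat, len(theline) - size), theline)
# Option 3 (swapped-in alternative): unpack_from tolerates a longer buffer.
by_unpack_from = struct.unpack_from(baseformat, theline)

print(by_prefix)  # (b'ABCDE', b'field001')
assert by_prefix == by_skip == by_unpack_from
```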
3.11.4 See Also
Recipe 17.6 and Recipe 17.8; Perl Cookbook Recipe 1.1.