Describing binary bitload fields¶
Each bitload is a sequence of simple Python values (numbers, bytestrings, booleans) in binary format. The slots into which the values fit are called ‘fields’. Each field is associated with a name, a data type, and, in most cases, the length of the field in bits.
Let’s take a look at a simple example:
>>> message_format = (
... ('id', 'hex', 128),
... ('count', 'int', 4)
... )
This example defines two fields, ‘id’ and ‘count’. The ‘id’ field is a hex field and is 128 bits long. The ‘count’ field is an integer and is 4 bits long.
The following data types are supported:
- ‘str’: string
- ‘bytes’: raw bytestring
- ‘int’: unsigned integer number
- ‘hex’: hexadecimal representation of a number in string format
- ‘bool’: boolean value
- ‘pad’: padding bits
Here are some more examples:
>>> another_message = (
... ('done', 'bool'),
... (None, 'pad', 2),
... ('note', 'str', 256),
... )
Note
The ‘bool’ data type does not need a length, since it is always 1 bit.
Note
The name of a field with the ‘pad’ data type is ignored. By convention, we use None so it stands out, but you can use names like '*** PADDING ***' for a more dramatic effect.
The order in which the fields are added to bitloads is the order in which they appear in the tuple/iterable.
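As a rough illustration of what this ordering implies, the sketch below computes where each field would start and how many bits a message occupies. It assumes that fields are packed back to back with no implicit alignment, which this section suggests but does not state outright. The helper name `field_layout` is hypothetical and is not part of the library.

```python
def field_layout(message_format):
    """Return ([(name, offset, length), ...], total_bits) in declaration order.

    Assumes fields are packed consecutively, with no implicit alignment.
    """
    layout = []
    offset = 0
    for field in message_format:
        name, data_type = field[0], field[1]
        # 'bool' fields carry no explicit length; they are always 1 bit.
        length = 1 if data_type == 'bool' else field[2]
        layout.append((name, offset, length))
        offset += length
    return layout, offset


another_message = (
    ('done', 'bool'),
    (None, 'pad', 2),
    ('note', 'str', 256),
)

layout, total_bits = field_layout(another_message)
for name, offset, length in layout:
    print(name, offset, length)
print('total bits:', total_bits)
```

Under that assumption, the ‘note’ field starts at bit 3, after the 1-bit ‘done’ flag and the 2 padding bits.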
It is important to understand the limits of your data. There are no checks to make sure the source data will fit the bitload field, so you may get unexpected results if you are not careful (e.g., inserting a 10-bit integer into a 4-bit field will yield the wrong value after deserialization).
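Since the library does not validate values, one option is to check them yourself before serializing. The helpers below are a minimal sketch of such checks; `int_fits` and `text_fits` are hypothetical names, not library functions. The string check assumes UTF-8 encoding, which is how unicode strings are stored.

```python
def int_fits(value, bits):
    """Return True if a non-negative integer can be stored in `bits` bits."""
    return 0 <= value < (1 << bits)


def text_fits(text, bits):
    """Return True if the UTF-8 encoding of `text` fits into `bits` bits."""
    return len(text.encode('utf-8')) * 8 <= bits


print(int_fits(15, 4))      # 15 is the largest value a 4-bit field can hold
print(int_fits(1000, 4))    # 1000 needs 10 bits
print(text_fits('hello', 256))
```

A check like this can be run against each value before it goes into a bitload, raising an error instead of silently losing bits.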
About the built-in types¶
The built-in types have conversion functions in the utils module. The functions use names that follow the '{type}_to_bita' and 'bita_to_{type}' format. The following table gives an overview of possible input (serializable) and output (deserialized) values:
type | inputs | outputs |
---|---|---|
str | bytes, str/unicode | str/unicode |
bytes | bytes | bytes |
int | int (unsigned long long) | int (unsigned long long) |
hex | bytes, str/unicode (hex number as a string) | bytes (hex number as a string) |
bool | any value (coerced using bool()) | bool |
pad | n/a | n/a |
Note
All unicode strings are stored as UTF-8.
Dealing with other types of data¶
In order to deal with values that aren’t directly supported by one of the standard types, there are two possible strategies we can employ.
One strategy is to adapt the Python values. For example, datetime objects can be represented as unsigned integers. Floats can also be represented as a product of an integer and a negative power of 10, and we can therefore store only the integer and restore the float by multiplying by the same negative power of 10 after deserializing it. Signed integers can be represented by offsetting them so that 0 represents the smallest negative value.
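The adaptations described above can be sketched in plain Python as follows. All function names here are hypothetical, and the specific choices (Unix timestamps in whole seconds, a fixed number of decimal places, an offset of half the field's range) are assumptions made for illustration rather than anything the library prescribes.

```python
from datetime import datetime, timezone


def datetime_to_uint(dt):
    """Represent an aware datetime as whole seconds since the Unix epoch."""
    return int(dt.timestamp())


def uint_to_datetime(seconds):
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def float_to_uint(value, decimals):
    """Keep `decimals` decimal places by scaling to an integer."""
    return round(value * 10 ** decimals)


def uint_to_float(number, decimals):
    # Dividing by 10**decimals is the same as multiplying by 10**-decimals.
    return number / 10 ** decimals


def signed_to_uint(value, bits):
    """Offset a signed integer so that 0 is the smallest negative value."""
    return value + (1 << (bits - 1))


def uint_to_signed(number, bits):
    return number - (1 << (bits - 1))


moment = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(datetime_to_uint(moment))
print(float_to_uint(12.34, 2))
print(signed_to_uint(-128, 8), signed_to_uint(127, 8))
```

Each pair is applied around the bitload: convert before serializing into an ‘int’ field, and convert back after deserializing. The field must still be wide enough for the adapted value.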
Another strategy is to use a custom type. This data type allows one to add completely new types with relative ease. The tuple for this data type looks like this:
>>> ('myfield', 'user', 24, serializer, deserializer)
Note
The use of the 'user' type name is just an example. Any type name that is not one of the types listed in this section can be used (i.e., any type other than ‘str’, ‘bytes’, ‘int’, ‘hex’, ‘bool’, and ‘pad’).
The two additional elements are the serializer and deserializer functions.
The serializer function takes a Python value and is expected to return a bitarray instance. The length of the output is not important, because it is adjusted to the field length during serialization by padding with 0 or trimming off surplus bits. Keep in mind, though, that trimming discards data, which may not be what you want.
The deserializer function takes a bitarray instance and is expected to return a Python value. There are no restrictions on the return value.
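As an illustration, here is one possible serializer/deserializer pair for a 24-bit signed integer, built on the offset idea mentioned earlier. This is only a sketch: the field name 'temperature' and the type name 'sint24' are made up for the example. It relies on two parts of the bitarray package: constructing a bitarray from a string of '0' and '1' characters, and the to01() method. The serializer always emits exactly 24 bits, so it does not depend on padding or trimming.

```python
from bitarray import bitarray

BITS = 24
OFFSET = 1 << (BITS - 1)  # shifts the signed range so the minimum maps to 0


def serializer(value):
    """Turn a signed integer into a bitarray of exactly BITS bits."""
    # Render the offset value as a zero-padded, fixed-width binary string.
    return bitarray(format(value + OFFSET, '0{}b'.format(BITS)))


def deserializer(bits):
    """Turn a bitarray back into the original signed integer."""
    return int(bits.to01(), 2) - OFFSET


# Hypothetical field definition using a custom type name.
signed_field = ('temperature', 'sint24', BITS, serializer, deserializer)

print(deserializer(serializer(-42)))
```

Values outside the 24-bit signed range would produce more than 24 bits (or fail to format), so a range check before serializing is still advisable.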
Note
The bitarray documentation can be found on GitHub.