Describing binary bitload fields

Each bitload is a sequence of simple Python values (numbers, bytestrings, booleans) in binary format. The slots into which the values fit are called ‘fields’. Each field has a name, a data type, and, in most cases, a length in bits.

Let’s take a look at a simple example:

>>> message_format = (
...    ('id', 'hex', 128),
...    ('count', 'int', 4)
... )

This example defines two fields, ‘id’ and ‘count’. The ‘id’ field is a hex field and is 128 bits long. The ‘count’ field is an integer and is 4 bits long.

The following data types are supported:

  • ‘str’: string
  • ‘bytes’: raw bytestring
  • ‘int’: unsigned integer number
  • ‘hex’: hexadecimal representation of a number in string format
  • ‘bool’: boolean value
  • ‘pad’: padding bits

Here are some more examples:

>>> another_message = (
...     ('done', 'bool'),
...     (None, 'pad', 2),
...     ('note', 'str', 256),
... )

Note

The ‘bool’ data type does not need a length, since it is always 1 bit.

Note

The name of a field with ‘pad’ data type is ignored. By convention, we use None so it stands out, but you can use names like '*** PADDING ***' for a more dramatic effect.

The order in which the fields are added to bitloads is the order in which they appear in the tuple/iterable.
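As a rough illustration of how field order and length determine the layout, the sketch below computes the bit offset of each field in a format tuple. The helper name field_layout is hypothetical and not part of the library, and the sketch assumes that fields are packed back to back with no implicit alignment.

```python
def field_layout(fields):
    """Return (name, type, offset, length) for each field, in order.

    Hypothetical helper: assumes fields are packed contiguously.
    """
    layout = []
    offset = 0
    for field in fields:
        name, dtype = field[0], field[1]
        # 'bool' fields are always 1 bit and carry no explicit length.
        length = 1 if dtype == 'bool' else field[2]
        layout.append((name, dtype, offset, length))
        offset += length
    return layout


another_message = (
    ('done', 'bool'),
    (None, 'pad', 2),
    ('note', 'str', 256),
)

layout = field_layout(another_message)
total_bits = sum(length for _, _, _, length in layout)
```

Under these assumptions, ‘done’ occupies bit 0, the padding occupies bits 1 and 2, and ‘note’ starts at bit 3.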

It is important to understand the limits of your data. There are no checks to make sure the source data will fit the bitload field, so you may get unexpected results if you are not careful (e.g., inserting a 10-bit integer into a 4-bit field will yield the wrong value after deserialization).
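Since nothing checks the size of the data for you, it can help to validate values before serializing them. The following is a minimal sketch of such checks. The function names are hypothetical and not part of the library, and the string check relies on the fact that unicode strings are stored as UTF-8.

```python
def fits_unsigned(value, bits):
    """Return True if a non-negative integer can be stored in `bits` bits."""
    return value >= 0 and value.bit_length() <= bits


def fits_text(text, bits):
    """Return True if the UTF-8 encoding of `text` needs at most `bits` bits."""
    return len(text.encode('utf-8')) * 8 <= bits


small_ok = fits_unsigned(15, 4)      # 15 is 0b1111, exactly 4 bits
large_ok = fits_unsigned(1000, 4)    # 1000 needs 10 bits
note_ok = fits_text('hello', 256)    # 5 bytes, 40 bits
```

Here small_ok and note_ok are True, while large_ok is False, which flags the 10-bit value before it is written into a 4-bit field.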

About the built-in types

The built-in types have conversion functions in the utils module. The function names follow the '{type}_to_bita' and 'bita_to_{type}' format. The following table gives an overview of possible input (serializable) and output (deserialized) values:

type   inputs                                       outputs
str    bytes, str/unicode                           str/unicode
bytes  bytes                                        bytes
int    int (unsigned long long)                     int (unsigned long long)
hex    bytes, str/unicode (hex number as a string)  bytes (hex number as a string)
bool   any value (coerced using bool())             bool
pad    n/a                                          n/a

Note

All unicode strings are stored as UTF-8.

Dealing with other types of data

In order to deal with values that aren’t directly supported by one of the standard types, there are two possible strategies we can employ.

One strategy is to adapt the Python values. For example, datetime objects can be represented as unsigned integers. Floats can be represented as the product of an integer and a negative power of 10, so we can store only the integer and restore the float by multiplying by the same negative power of 10 after deserializing it. Signed integers can be represented by shifting them so that 0 represents the smallest negative value.
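The adaptations described above could look roughly like this. It is only a sketch: the function names are hypothetical, the datetime conversion assumes whole seconds since the Unix epoch are precise enough, and the number of decimals and the bit width are choices you would make for your own data.

```python
from datetime import datetime, timezone


def datetime_to_uint(dt):
    """Whole seconds since the Unix epoch (sub-second precision is dropped)."""
    return int(dt.timestamp())


def uint_to_datetime(n):
    return datetime.fromtimestamp(n, tz=timezone.utc)


def float_to_uint(x, decimals=2):
    """Scale a non-negative float to an integer, keeping `decimals` digits."""
    return round(x * 10 ** decimals)


def uint_to_float(n, decimals=2):
    # Dividing by 10**decimals is equivalent to multiplying by 10**-decimals.
    return n / 10 ** decimals


def signed_to_uint(n, bits):
    """Shift so that 0 represents the smallest value, -2**(bits - 1)."""
    return n + (1 << (bits - 1))


def uint_to_signed(u, bits):
    return u - (1 << (bits - 1))


moment = datetime(2020, 1, 1, tzinfo=timezone.utc)
restored_moment = uint_to_datetime(datetime_to_uint(moment))
restored_price = uint_to_float(float_to_uint(12.34))
restored_delta = uint_to_signed(signed_to_uint(-100, 8), 8)
```

The resulting unsigned integers can then be stored in ordinary ‘int’ fields, provided the field is long enough for the largest value you expect.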

Another strategy is to use a custom type. This data type allows one to add completely new types with relative ease. The tuple for this data type looks like this:

>>> ('myfield', 'user', 24, serializer, deserializer)

Note

The use of 'user' type name is just an example. Any type that is not one of the types listed in this section can be used (i.e., any type other than ‘str’, ‘bytes’, ‘int’, ‘hex’, ‘bool’, and ‘pad’).

Two additional elements are the serializer and deserializer functions.

The serializer function takes a Python value and is expected to return a bitarray instance. The output does not have to match the field length exactly: during serialization it is padded with 0 bits if it is too short and trimmed if it is too long. Keep in mind that trimming discards the surplus bits, which may not be what you want.

The deserializer function takes a bitarray instance and is expected to return a Python value. There are no restrictions on the return value.
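As an illustration, here is a sketch of the conversion logic for a hypothetical signed-integer custom type that uses two's complement. To keep the sketch free of dependencies, the helpers work on strings of '0' and '1' characters. The comments show how they could be wrapped into a serializer and deserializer using the bitarray constructor and its to01() method. All names here, including the 'sint' type name, are assumptions made for the example.

```python
def signed_to_bits(value, width=24):
    """Two's-complement bit string of exactly `width` characters."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError('value does not fit in %d bits' % width)
    # Masking maps negative values onto their two's-complement form.
    return format(value & ((1 << width) - 1), '0%db' % width)


def bits_to_signed(bits):
    """Inverse of signed_to_bits for a string of '0' and '1' characters."""
    value = int(bits, 2)
    if bits[0] == '1':
        # A leading 1 marks a negative number in two's complement.
        value -= 1 << len(bits)
    return value


# With the bitarray package installed, the pair could be wrapped like this
# (left as comments so the sketch runs without the dependency):
#
#   from bitarray import bitarray
#   serializer = lambda value: bitarray(signed_to_bits(value, 24))
#   deserializer = lambda bits: bits_to_signed(bits.to01())
#   field = ('offset', 'sint', 24, serializer, deserializer)

encoded = signed_to_bits(-5, 8)
decoded = bits_to_signed(encoded)
```

Producing exactly as many bits as the field is long means the serializer does not depend on the padding and trimming step to get the value right.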

Note

The bitarray documentation can be found on GitHub.