Know your Python container types
This is the last of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Python contains multitudes
There are a lot of container types available in the Python standard library, and it can be confusing sometimes to keep track of them all. So since it’s Christmas Eve and time to wrap any last-minute gifts, I’d like to wrap up this “Advent calendar” series with a guide to the most common types of data containers and what kinds of things you might want to wrap in them.
- `list` is a mutable data type, often used to store multiple objects of the same type, though unlike, say, arrays in languages like C, there’s no requirement that all values in a `list` be of the same type. Just note that many static type checkers for Python will default to assuming a list’s contents are all of the same type (i.e., if a checker sees you put an `int` in a given list, it will assume a type of `list[int]` and report an error if you then add a value of another type).
- `tuple` is an immutable/heterogeneous data type, closer in purpose to the “record” types or structs of other languages. Very often you’ll see code which generates lots of tuples that each have the same “structure”; for example, a color library might represent color values as 3-tuples of `int`.
- `collections.namedtuple` and `typing.NamedTuple` are two different ways to write the same thing: tuple subclasses whose fields can be accessed by name as well as by numeric index, and which can be instantiated using keyword-argument syntax. The key difference is that the `typing.NamedTuple` version supports a type-hint-based declarative syntax. I like using named tuples as a way to define tuple types that will be reused a lot (in the example above, it would probably make sense to define an `RGBColor` named tuple with `red`, `green`, and `blue` fields).
- `set` is a container which enforces uniqueness of its elements. No matter how many times you add the same value to a set, it still ends up with only one copy.
- `dict` (short for “dictionary”) is a hash table, mapping keys to values; there is no requirement that all keys or values be the same type, but type checkers will generally still assume such a requirement. There’s also `typing.TypedDict` for explicitly type-hinting the expected structure of a dictionary.
- `dataclasses.dataclass` is not really a “container” at all, though it sometimes gets used as one. The `dataclass` decorator is primarily a shortcut for declaring a class with a set of attributes (using type-hint syntax) and having it auto-derive a constructor for you which will expect arguments for those attributes and set them appropriately (though it will not do runtime type-checking of the values of those arguments).
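Here’s a minimal sketch of the types described above, side by side. `RGBColor` follows the named-tuple suggestion from the list; `LegacyRGBColor`, `Gift`, and all of the sample values are hypothetical names made up purely for illustration.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import NamedTuple

# list: mutable, and nothing at runtime stops you from mixing types
# (a static type checker may well complain about this, though).
mixed = [1, "two", 3.0]
mixed.append(None)

# tuple: immutable; here, a color as a plain 3-tuple of int.
plain_color = (255, 128, 0)


# typing.NamedTuple: the same tuple, with fields declared via type hints.
class RGBColor(NamedTuple):
    red: int
    green: int
    blue: int


orange = RGBColor(red=255, green=128, blue=0)
red, green, blue = orange  # still unpacks and indexes like any tuple

# collections.namedtuple: the same idea without the type hints.
# (LegacyRGBColor is a hypothetical name.)
LegacyRGBColor = namedtuple("LegacyRGBColor", ["red", "green", "blue"])
legacy_orange = LegacyRGBColor(255, 128, 0)

# set: duplicates collapse to a single copy.
unique = {1, 1, 2, 2, 2}

# dict: a key-value mapping.
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41


# dataclass: the constructor is generated from the annotated attributes.
# (Gift is a hypothetical class.)
@dataclass
class Gift:
    name: str
    price: int


# No runtime type-checking: this is accepted even though price is not an int.
gift = Gift(name="socks", price="cheap")
```

Note that `orange`, `legacy_orange`, and `plain_color` all compare equal, because named tuples are still tuples underneath.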
There are also other container types in the standard library (the `collections` module and the `array` module, for example, provide some specialized container types like `Counter`, which acts as a histogram structure, and `array.array`, which works like a numeric-type array in C), but they’re more rarely used.
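To give a feel for those two specialized types, here’s a short sketch. The word list and the numbers are made-up sample data, and `"i"` is the `array` type code for a signed C int.

```python
from array import array
from collections import Counter

# Counter: tallies how many times each value appears.
words = ["gift", "tree", "gift", "star", "gift", "tree"]
counts = Counter(words)
top_two = counts.most_common(2)  # the two most frequent values with their counts
missing = counts["reindeer"]     # values never seen count as 0, no KeyError

# array.array: a compact sequence restricted to a single numeric type.
samples = array("i", [1, 2, 3])
samples.append(4)

# Unlike a list, the array rejects values of the wrong type at runtime.
try:
    samples.append(2.5)
    rejected = False
except TypeError:
    rejected = True
```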
In general, my advice is:
- Use a `list` for most cases where you just want an iterable/indexable sequence.
- Use a `tuple` as a struct-like type where multiple instances will have the same structure, but consider using named tuples to make that structure clearer (a lot of people don’t like named tuples because they support iteration and numeric indexing as well as named field access, but I don’t personally mind this).
- Use `dict` for key-value mappings.
- Use `set` when uniqueness matters, though this is not super common; most of the value of sets is in the union/intersection/etc. operations they support.
- Don’t use `dataclass` as a “super-tuple”; if what you really want is just a plain data container with named field access, just use a named tuple. Use `dataclass` when you also want the result to be an ordinary mutable Python class (tuples are immutable) with potentially extra behavior attached via methods.
- Avoid most of the other container types unless you know they’re the right thing for your specific use case. And if you’re not sure whether they’re right, they aren’t; you generally will just know when one of them is the right fit (for example, `collections.Counter` is very useful for the exact specific thing it does, and not really useful at all for anything else).
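Two of those points are easier to see in code: the set operations, and the named tuple versus `dataclass` choice. This is only a rough sketch; `Size`, `Box`, `scale`, and the guest names are all hypothetical, invented for the example.

```python
from dataclasses import dataclass
from typing import NamedTuple

# Sets: the union/intersection/difference operations are where they shine.
invited = {"alice", "bob", "carol"}
replied = {"bob", "carol", "dave"}
coming = invited & replied       # intersection: invited and replied
no_reply = invited - replied     # difference: invited but never replied
everyone = invited | replied     # union: anyone in either set


# Plain data with named fields: a named tuple is enough.
class Size(NamedTuple):
    width: int
    height: int


# Mutable state plus behavior: this is where a dataclass earns its keep.
@dataclass
class Box:
    width: int
    height: int

    def scale(self, factor: int) -> None:
        """Resize the box in place."""
        self.width *= factor
        self.height *= factor


size = Size(width=2, height=3)
try:
    size.width = 5               # named tuples are immutable
    mutated = True
except AttributeError:
    mutated = False

box = Box(width=2, height=3)
box.scale(2)                     # the dataclass instance changes in place
```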