Know your Python container types
This is the last of a series of posts I’m doing as a sort of Python/Django Advent calendar, offering a small tip or piece of information each day from the first Sunday of Advent through Christmas Eve. See the first post for an introduction.
Python contains multitudes
There are a lot of container types available in the Python standard library, and it can be confusing sometimes to keep track of them all. So since it’s Christmas Eve and time to wrap any last-minute gifts, I’d like to wrap up this “Advent calendar” series with a guide to the most common types of data containers and what kinds of things you might want to wrap in them.
list is a mutable data type, often used to store multiple objects of the same type. Unlike arrays in languages like C, though, there’s no requirement that all values in a list be of the same type. Just note that many static type checkers for Python will default to assuming a list’s contents are all of the same type (i.e., if a checker sees you put an int in a given list, it will infer a type of list[int] and report an error if you then add a value of another type).
tuple is an immutable, heterogeneous data type, closer in purpose to the “record” types or structs of other languages. Very often you’ll see code which generates lots of tuples which each have the same “structure”; for example, a color library might represent color values as 3-tuples of int.
collections.namedtuple and typing.NamedTuple are two different ways to write the same thing: tuple subclasses whose fields can be accessed by name as well as by numeric index, and which can be instantiated using keyword arguments. The key difference is that the typing.NamedTuple version supports a declarative, type-hint-based syntax. I like using named tuples as a way to define tuple types that will be reused a lot (in the color example above, it would probably make sense to define an RGBColor named tuple with red, green, and blue fields).
set is a container which enforces uniqueness of its elements. No matter how many times you add the same value to a set, it still ends up with only one copy.
dict (short for “dictionary”) is a hash table, mapping keys to values. Keys must be hashable, but beyond that there is no requirement that all keys or values be the same type, though type checkers will generally still assume such a requirement. There’s also typing.TypedDict for explicitly type-hinting the expected structure of a dictionary.
dataclasses.dataclass is not really a “container” at all, though it sometimes gets used as one.
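Here’s a minimal sketch of each of the types described above in action. The RGBColor, RGBColorTuple, and Movie names are hypothetical illustrations, not part of any library.

```python
from collections import namedtuple
from typing import NamedTuple, TypedDict

# list: mutable and indexable. A type checker would infer list[int] here.
scores = [90, 85, 77]
scores.append(100)
# scores.append("n/a") would run fine, but a static type checker would flag it.

# tuple: immutable, used here as a plain 3-value "record" for a color.
orange = (255, 165, 0)

# Two equivalent ways to define a named tuple (hypothetical color types).
RGBColorTuple = namedtuple("RGBColorTuple", ["red", "green", "blue"])


class RGBColor(NamedTuple):
    red: int
    green: int
    blue: int


navy = RGBColorTuple(0, 0, 128)              # positional construction
teal = RGBColor(red=0, green=128, blue=128)  # keyword-argument construction
assert teal.green == teal[1] == 128          # access by name or by index
r, g, b = teal                               # still a tuple, so unpacking works

# set: adding a duplicate value has no effect.
tags = {"python", "django"}
tags.add("python")

# dict: maps (hashable) keys to values.
counts = {"apples": 3, "pears": 5}


# TypedDict: documents the expected keys for a type checker;
# at runtime the value is still an ordinary dict.
class Movie(TypedDict):
    title: str
    year: int


movie: Movie = {"title": "It's a Wonderful Life", "year": 1946}
```

One design point worth knowing: named tuples compare like plain tuples, so RGBColorTuple(0, 0, 128) == RGBColor(0, 0, 128) is True even though they are different classes.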
The dataclass decorator is primarily a shortcut for declaring a class with a set of attributes (using type-hint syntax) and having it auto-derive a constructor which expects arguments for those attributes and sets them appropriately (though it does not do runtime type-checking of those argument values). By default it also generates __repr__ and __eq__ methods for you.
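A small sketch of what the decorator derives for you; the Gift class is a hypothetical example. Note especially the last line: the annotations are not enforced at runtime.

```python
from dataclasses import dataclass


@dataclass
class Gift:
    recipient: str
    description: str
    wrapped: bool = False

    # Ordinary methods can be attached, as with any class.
    def wrap(self) -> None:
        self.wrapped = True


gift = Gift("Alice", "scarf")  # __init__ generated from the annotated fields
print(gift)  # Gift(recipient='Alice', description='scarf', wrapped=False)
gift.wrap()  # instances are mutable by default

# No runtime type checking: this is accepted even though
# recipient is annotated as str.
odd = Gift(recipient=42, description="socks")
```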
There are also other container types in the standard library — the collections module and the array module, for example, provide some specialized container types like Counter, which acts as a histogram structure, or array.array which works like a numeric-type array in C — but they’re more rarely used.
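As a rough illustration of those two specialized types (the sample data is made up):

```python
from array import array
from collections import Counter

# Counter: a dict subclass that tallies how often each item appears.
letters = Counter("abracadabra")
print(letters.most_common(1))  # [('a', 5)]
print(letters["z"])            # 0: missing items count as zero, no KeyError

# array.array: a compact sequence restricted to one C numeric type.
# "d" is the type code for a C double.
samples = array("d", [1.0, 2.5, 4.0])
samples.append(8.0)
# samples.append("oops") would raise TypeError.
```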
In general, my advice is:
- Use a list for most cases where you just want an iterable/indexable sequence.
- Use a tuple as a struct-like type where multiple instances will have the same structure, but consider using named tuples to make that structure clearer (a lot of people don’t like named tuples because they support iteration and numeric indexing as well as named field access, but I don’t personally mind this).
- Use dict for key-value mappings.
- Use set when uniqueness matters, though this is not super common; most of the value of sets is in the union, intersection, difference, and similar operations they support.
- Don’t use dataclass as a “super-tuple”; if what you really want is just a plain data container with named field access, just use a named tuple. Use dataclass when you also want the result to be an ordinary mutable Python class (tuples are immutable) with potentially extra behavior attached via methods.
- Avoid most of the other container types unless you know they’re the right thing for your specific use case. And if you’re not sure whether they’re right, they aren’t; you generally will just know when one of them is the right fit (for example, collections.Counter is very useful for the exact specific thing it does, and not really useful at all for anything else).
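A quick sketch of two points from that advice: sets earning their keep through set algebra, and a named tuple’s immutability (you make a modified copy rather than assigning to a field). The names and the GiftTag type are hypothetical.

```python
from typing import NamedTuple

# Set algebra is where sets really shine.
naughty = {"Alice", "Bob"}
nice = {"Bob", "Carol", "Dave"}
both = naughty & nice        # intersection
everyone = naughty | nice    # union
only_nice = nice - naughty   # difference


# A named tuple is enough for plain, immutable data with named fields.
class GiftTag(NamedTuple):
    to: str
    sender: str


tag = GiftTag(to="Alice", sender="Bob")
# tag.to = "Carol" would raise AttributeError, since tuples are immutable,
# so build a modified copy instead.
retagged = tag._replace(to="Carol")
```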