Typed Python: Choose Sequence over List
What am I talking about?
I’ve been working with type hints in Python for a few years now. Over time I’ve noticed certain patterns evolving in my code. This will be a short post on one of those patterns. It’s a small pattern where I try to be more precise about what I require or accept as a function input. More specifically, it’s why I try to default to writing the following:
from typing import Sequence
def do_a_thing(items: Sequence[float]):
    ...
instead of:
def do_a_thing(items: list[float]):
    ...
It’s not a hard rule. There are plenty of situations where list is fine (or even required), but defaulting to Sequence has a number of benefits.
Soft immutability (via a type checker)
If I try to write this code:
def calculate_sum_and_add_ten(items: Sequence[float]):
    items.append(10)
    return sum(items)
then I will get an error from mypy (or my IDE or any other type checker I may be running):
error: "Sequence[float]" has no attribute "append" [attr-defined]
This means I won’t accidentally mutate a list I pass into the function. If I expect the function to mutate the list, then I can communicate this by changing the type hint to list. This helps make my intent clear: I took a list because I wanted a list (with all its mutability).
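A minimal sketch of how the function above could be rewritten so it no longer mutates its input. The name calculate_sum_plus_ten is hypothetical and only used for illustration. Since Sequence has no append, the extra 10 is added to the result instead of to the caller's list:

```python
from typing import Sequence


def calculate_sum_plus_ten(items: Sequence[float]) -> float:
    # Sequence exposes no append(), so add the 10 to the total
    # rather than to the caller's list.
    return sum(items) + 10


readings = [1.5, 2.5]
total = calculate_sum_plus_ten(readings)
# The caller's list still holds exactly the two original values.
```

The check is "soft" because it only exists at type-checking time. At runtime a list passed in is still a plain, mutable list.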
Covariance
Most of the time an int can be treated as a float. Your code will treat 5 as effectively being 5.0. So a function which accepts a float can be passed an int without any issues. This breaks down, though, once you have a function taking a list of floats. If you try to write the following code:
def double_then_sum(items: list[float]):
    return sum(item * 2 for item in items)
my_integers: list[int] = [2, 4]
my_doubled_total = double_then_sum(my_integers)
then mypy (or another type checker) will give the error:
error: Argument 1 to "double_then_sum" has incompatible type "list[int]"; expected "list[float]" [arg-type]
note: "List" is invariant -- see https://mypy.readthedocs.io/en/stable/common_issues.html#variance
note: Consider using "Sequence" instead, which is covariant
You can read a thorough writeup of what’s going on in the page linked by the error, but effectively it’s because the function signature double_then_sum(items: list[float]) means “I accept a list of floats and may add a float to it”.
You can see why this wouldn’t work with this contrived example:
def add_5_point_0(items: list[float]):
    items.append(5.0)
my_integers: list[int] = [2, 4]
add_5_point_0(my_integers)
print(my_integers)
# [2, 4, 5.0]
#        ^ this is clearly not an integer
Sequence does not have the same problem because it cannot be appended to. So if I pass the function a list of integers, the type checker can ensure that it stays a list of integers.
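As a sketch of the fix that mypy's note suggests, here is double_then_sum with its hint changed from list[float] to Sequence[float]. The return annotation is my addition. Everything else is the same code as above:

```python
from typing import Sequence


def double_then_sum(items: Sequence[float]) -> float:
    # The function only reads from items, so accepting a list[int]
    # cannot smuggle a float into the caller's list.
    return sum(item * 2 for item in items)


my_integers: list[int] = [2, 4]
my_doubled_total = double_then_sum(my_integers)
```

With this signature a type checker should accept the call, because Sequence is covariant in its element type and int is accepted wherever float is expected.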
Accepts a wider variety of inputs
Another benefit of Sequence is that it can accept a much wider variety of types (including custom classes written by you). This makes it much easier to write functions that are reusable and compose well together. Consider my earlier double_then_sum function, still hinted with list. But this time I’ve got an input that’s a tuple. This seems like a perfectly valid use case. There’s no reason why I should have to convert this to a list.
my_integers = (2, 4, 5)
my_doubled_total = double_then_sum(my_integers)
However, mypy says the following:
error: Argument 1 to "double_then_sum" has incompatible type "tuple[int, int, int]"; expected "list[int]" [arg-type]
A tuple is not a list. But a tuple is a Sequence.
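A short sketch of the same function hinted with Sequence, called with a tuple and with a range (a range is also a Sequence). The variable names from_tuple and from_range are just for illustration:

```python
from typing import Sequence


def double_then_sum(items: Sequence[float]) -> float:
    return sum(item * 2 for item in items)


my_integers = (2, 4, 5)
from_tuple = double_then_sum(my_integers)  # no list() conversion needed
from_range = double_then_sum(range(3))     # range(3) yields 0, 1, 2
```

Neither call needs the caller to build a list first, which is what makes the function easier to reuse.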
Bonus - I would also consider Collection instead of Sequence
I also often go a step further with specifying the type to indicate exactly what I require. If the order of the items doesn’t really matter to my function, then I can hint as a Collection instead.
My function double_then_sum, now hinted with Sequence, should probably work with sets too:
some_set_of_numbers = set([2, 3, 4])
my_doubled_total = double_then_sum(some_set_of_numbers)
but mypy says:
Argument 1 to "double_then_sum" has incompatible type "set[int]"; expected "Sequence[int]" [arg-type]
I can fix this by swapping Sequence for Collection:
from typing import Collection
def double_then_sum(items: Collection[int]):
    return sum(item * 2 for item in items)
some_set_of_numbers = set([2, 3, 4])
my_doubled_total = double_then_sum(some_set_of_numbers)
The type checker is then happy with this, as a set is an instance of a Collection.
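To make the difference concrete, here is a small sketch that checks at runtime which abstract types a set satisfies, and then calls a Collection-hinted version of the function with a few unordered inputs. I have used float for the element type here (int values are still accepted), and the result variable names are hypothetical:

```python
from typing import Collection, Sequence


def double_then_sum(items: Collection[float]) -> float:
    # Collection only promises len(), iteration and "in" checks;
    # it says nothing about ordering or indexing.
    return sum(item * 2 for item in items)


numbers = {2, 3, 4}
set_is_sequence = isinstance(numbers, Sequence)      # sets have no order or index
set_is_collection = isinstance(numbers, Collection)

from_set = double_then_sum(numbers)
from_frozenset = double_then_sum(frozenset(numbers))
from_dict_keys = double_then_sum({2: "a", 3: "b"}.keys())
```

Lists and tuples are Collections too, so nothing that worked with the Sequence hint stops working. The trade-off is that the function body can no longer index or slice items without the type checker complaining.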
Wrapping up
This is a fairly specific example, but generally what I’ve been trying to do is be more intentional about what data types a function actually requires. I’ve found the upside is code with fewer bugs and more flexibility. It requires a little more thought on my side, as I won’t always just reach for a list or a dict, but I’m quite happy with the results.