Pure Functions in Python: A Complete Beginner Guide
If you take nothing else away from this article, please take this: pure functions are fantastic. A pure function has exactly two properties: it always returns the same value given the same arguments (it's deterministic), and running it causes no side effects. In short: a pure function doesn't depend on or change anything that exists outside of its own scope.
All the content from our Boot.dev courses is available for free here on the blog. This one is the "Pure Functions" chapter of Learn Functional Programming in Python. If you want to try the far more immersive version of the course, do check it out!
What Is a Pure Function in Python?
Here's a pure function. It only reads its argument and returns a new value:
def find_max(nums: list[int]) -> float:
    max_val: float = float("-inf")
    for num in nums:
        if max_val < num:
            max_val = num
    return max_val
And here's an impure version. Instead of returning a value, this function modifies a global variable:
global_max: float = float("-inf")

def find_max(nums: list[int]) -> None:
    global global_max
    for num in nums:
        if global_max < num:
            global_max = num
The global keyword tells Python that assignments to global_max inside the function should rebind the outer, module-level variable instead of creating a new local one.
Why Are Pure Functions Better?
Pure functions have a lot of benefits. Whenever possible, good developers try to use pure functions instead of impure functions. Because they don't change the external state of the program (for example, they don't change any variables outside of their scope) and don't perform any I/O operations, they're easier to test, debug, and think about.
One of the biggest differences between good and great developers is how often they incorporate pure functions into their code. Even if you're working in an imperative language like Python, you can (and should) write pure functions whenever reasonable. There's nothing worse than trying to debug a program where the order of function calls needs to be juuuuust right because they all read and modify the same global variable.
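To make the call-order problem concrete, here is a minimal sketch that puts the two styles side by side. The names find_max_impure and find_max_pure are made up for this comparison (the article calls both versions find_max), and the sample lists are arbitrary.

```python
global_max: float = float("-inf")


def find_max_impure(nums: list[int]) -> None:
    """Impure: the result leaks into module-level state."""
    global global_max
    for num in nums:
        if global_max < num:
            global_max = num


def find_max_pure(nums: list[int]) -> float:
    """Pure: the result depends only on the argument."""
    max_val: float = float("-inf")
    for num in nums:
        if max_val < num:
            max_val = num
    return max_val


# The impure version remembers earlier calls
find_max_impure([10, 20, 30])
find_max_impure([1, 2, 3])
impure_result = global_max
# impure_result = 30, left over from the first call

# The pure version gives the same answer no matter what ran before it
pure_result = find_max_pure([1, 2, 3])
# pure_result = 3
```

To get the right answer out of the impure version, the caller has to remember to reset global_max before every call. The pure version has nothing to reset.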
What's the Difference Between Reference and Value?
When you pass a value into a function as an argument, one of two things can happen:
- It's passed by reference: The function has access to the original value and can change it.
- It's passed by value: The function only has access to a copy. Changes to the copy within the function don't affect the original.
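In Python, lists, dictionaries, and sets behave as if they're passed by reference. Integers, floats, strings, booleans, and tuples behave as if they're passed by value. Strictly speaking, Python always hands the function a reference to the same object; the practical difference is mutability. Most container types are mutable, so a function can change the original (except for tuples!), while most basic types are immutable, so any "change" just creates a new object inside the function.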
Lists are passed by reference and are mutable:
def modify_list(inner_lst: list[int]) -> None:
    inner_lst.append(4)
    # the original "outer_lst" is updated
    # because inner_lst is a reference to the original

outer_lst: list[int] = [1, 2, 3]
modify_list(outer_lst)
# outer_lst = [1, 2, 3, 4]
Integers behave as if they're passed by value because they are immutable:
def attempt_to_modify(inner_num: int) -> None:
    inner_num += 1
    # the original "outer_num" is not updated
    # because += rebinds inner_num to a new int object

outer_num: int = 1
attempt_to_modify(outer_num)
# outer_num = 1
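One detail worth seeing in code: what matters is whether the function mutates the object it was given or merely reassigns its local name. The sketch below uses two hypothetical helpers, rebind_list and mutate_list, that are not part of the original lesson.

```python
def rebind_list(inner_lst: list[int]) -> None:
    # Assignment points the local name at a brand-new list;
    # the caller's list is untouched
    inner_lst = inner_lst + [4]


def mutate_list(inner_lst: list[int]) -> None:
    # append changes the one list object that both names refer to
    inner_lst.append(4)


first: list[int] = [1, 2, 3]
rebind_list(first)
# first = [1, 2, 3]

second: list[int] = [1, 2, 3]
mutate_list(second)
# second = [1, 2, 3, 4]
```

Both functions receive a list, but only the one that calls a mutating method has a side effect the caller can see.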
How Do You Keep Function Inputs Immutable?
Because certain types in Python are passed by reference, we can mutate values that we didn't intend to. This is a form of function impurity! Remember, a pure function has no side effects. It shouldn't modify anything outside of its scope, including its inputs. It should return new copies of inputs instead of changing them.
This impure version mutates the dictionary the caller handed in:
def remove_format(default_formats: dict[str, bool], old_format: str) -> dict[str, bool]:
    default_formats[old_format] = False
    return default_formats
The pure version uses the .copy() method to create a new copy of the dictionary, leaving the original untouched:
def remove_format(default_formats: dict[str, bool], old_format: str) -> dict[str, bool]:
    new_formats: dict[str, bool] = default_formats.copy()
    new_formats[old_format] = False
    return new_formats
Simply assigning a new variable to an existing dictionary doesn't copy that dictionary; it points to the same dictionary. That's why we use the .copy() method instead.
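Here is a small sketch of the difference between assigning and copying. The variable names and sample data are illustrative only. The second half goes one step beyond the lesson and is offered as a caveat: .copy() is a shallow copy, so containers nested inside the dictionary are still shared unless you reach for copy.deepcopy from the standard library.

```python
import copy

formats: dict[str, bool] = {"docx": True, "pdf": True}

alias = formats          # same dictionary, second name
clone = formats.copy()   # new dictionary with the same keys and values

alias["docx"] = False    # also changes formats
clone["pdf"] = False     # formats is unaffected
# formats = {"docx": False, "pdf": True}

# .copy() is shallow: nested containers are still shared
nested: dict[str, list[str]] = {"formats": ["docx", "pdf"]}
shallow = nested.copy()
deep = copy.deepcopy(nested)

shallow["formats"].append("txt")  # leaks into nested
deep["formats"].append("md")      # stays inside deep
# nested = {"formats": ["docx", "pdf", "txt"]}
```

For a flat dictionary like the one in remove_format, .copy() is enough. If the values are themselves lists or dictionaries, a deep copy is the safer way to keep the function pure.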
What Is I/O and Why Is It Impure?
The term "I/O" stands for input/output. In the context of writing programs, I/O refers to anything in our code that interacts with the "outside world." And "outside world" just means anything that's not stored in our application's memory (like variables). Examples include:
- Reading from or writing to a file on the hard drive
- Accessing the internet
- Reading from or writing to a database
- Even simply printing to the console!
All I/O is a form of "side effect." Even the print function (technically) has a side effect! It returns None, but it does print text to the console, which is a form of I/O.
When Should a Function Do I/O?
A program that doesn't do any I/O is pretty useless. What's the point of computing something if you can't see the results?
In functional programming, I/O is viewed as dirty but necessary. We know we can't eliminate I/O from our code, so we just contain it as much as possible. There should be a clear place in your project that does nasty I/O stuff, and the rest of your code can be pure. For example, a Python program might:
- Read a file from the hard drive as the program starts
- Run a bunch of pure functions to analyze the data
- Write the results of the analysis to another file on the hard drive at the end
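The three steps above can be sketched roughly like this. Everything here is an assumption made for illustration: the one-number-per-line file format, the function names parse_scores, summarize, and main, and the shape of the report. The point is only that the file access is confined to main while the analysis stays pure.

```python
from pathlib import Path


def parse_scores(text: str) -> list[int]:
    """Pure: turn file contents into numbers, skipping blank lines."""
    return [int(line) for line in text.splitlines() if line.strip()]


def summarize(scores: list[int]) -> str:
    """Pure: build the report text from the numbers."""
    if not scores:
        return "no scores\n"
    average = sum(scores) / len(scores)
    return f"count={len(scores)} max={max(scores)} average={average:.1f}\n"


def main(in_path: Path, out_path: Path) -> None:
    """Impure shell: all file I/O lives here."""
    text = in_path.read_text()               # I/O: read at the start
    report = summarize(parse_scores(text))   # pure analysis in the middle
    out_path.write_text(report)              # I/O: write at the end
```

With this layout, parse_scores and summarize can be tested with plain strings and lists, and only main needs real files (for example, main(Path("scores.txt"), Path("report.txt")), where both file names are hypothetical).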
What Is a No-Op and How Do You Avoid It?
A no-op is an operation that does... nothing. If a function doesn't return anything, it's probably impure. Apart from returning a value, the only reason for a function to exist is to perform a side effect. Otherwise it would be a no-op, right?
This function performs a useless computation, since it doesn't return anything or perform a side effect. It's a no-op:
def square(x: int) -> None:
    x * x
This function lacks a return statement but performs a side effect: it changes the value of the y variable that is outside of its scope. It's impure:
y: int = 5

def add_to_y(x: int) -> None:
    global y
    y += x

add_to_y(3)
# y = 8
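For contrast, one possible pure rewrite takes the current value as an argument and returns the new one, leaving the decision to reassign up to the caller. The name add_to and the variable new_y are hypothetical.

```python
def add_to(total: int, x: int) -> int:
    # No global state: the result is handed back to the caller
    return total + x


y: int = 5
new_y = add_to(y, 3)
# y = 5, new_y = 8
```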
What Is Memoization in Python?
Memoization is a technical term that basically means caching (storing a copy of) the result of a computation so that we don't have to compute it again in the future. For example, take this simple function:
def add(x: int, y: int) -> int:
    return x + y
A call of add(5, 7) will always evaluate to 12. If you think about it, once we know that add(5, 7) can be replaced with 12, we can just store 12 in memory as the result value. Then, the next time we need to add(5, 7), we can look up the value instead of repeating a (potentially expensive) CPU operation.
The slower and more complex the function, the more memoization can help speed things up.
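As a rough illustration of a cache filling in while a recursive function runs, here is a hand-rolled sketch. The memoize helper and the Fibonacci example are assumptions chosen for the demo, not code from the course.

```python
from typing import Callable


def memoize(func: Callable[[int], int]) -> Callable[[int], int]:
    """Wrap a pure one-argument function with a dictionary cache."""
    cache: dict[int, int] = {}

    def wrapper(n: int) -> int:
        if n not in cache:
            cache[n] = func(n)   # compute once
        return cache[n]          # look it up every time after that

    return wrapper


@memoize
def fib(n: int) -> int:
    if n < 2:
        return n
    # The recursive calls go through the memoized name "fib",
    # so each subproblem is only computed a single time
    return fib(n - 1) + fib(n - 2)


result = fib(30)
# result = 832040
```

Without the cache, the naive recursion recomputes the same subproblems over and over. With it, each value of n is computed once and then read back from the dictionary.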
It's pronounced "memOization," not "memORization." This confused me for quite a while in college. I thought my professor just didn't speak goodly...
Why Can You Only Memoize Pure Functions?
Pure functions are always referentially transparent. "Referential transparency" is a fancy way of saying that a function call can be replaced by its would-be return value because it's the same every time. Referentially transparent functions can be safely memoized. For example, add(2, 3) can be replaced by the value 5.
The great thing about pure functions is that it's always safe to memoize them. Impure functions often can't be memoized because they might perform a side effect in addition to returning a static value, or they might return different values given the same arguments.
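Here is a small sketch of what goes wrong when an impure function is cached anyway. The next_ticket function is hypothetical; functools.lru_cache and itertools.count are standard-library tools.

```python
import functools
import itertools

counter = itertools.count(1)


def next_ticket(prefix: str) -> str:
    """Impure: same argument, different result on every call."""
    return f"{prefix}-{next(counter)}"


# Wrap the impure function in a cache anyway
cached_next_ticket = functools.lru_cache(maxsize=None)(next_ticket)

fresh = [next_ticket("A"), next_ticket("A")]
# fresh = ["A-1", "A-2"]

stale = [cached_next_ticket("A"), cached_next_ticket("A")]
# stale = ["A-3", "A-3"]
```

The cached version hands out the same ticket twice because the second call never runs the function body. The side effect the program depended on was silently skipped.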
Should you always memoize? No! Memoization is a tradeoff between memory and speed. If your function is fast to execute, it's probably not worth memoizing, because the amount of memory your program will need to store the results will go way up. It's also a bunch of extra code to write, so you should only do it if you have a good reason.
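If you do decide memoization is worth it, the standard library's functools.lru_cache can keep the extra code small, and its maxsize argument puts a ceiling on the memory side of the tradeoff. The slow_square function below is a stand-in for something genuinely expensive, and the maxsize of 128 is an arbitrary choice.

```python
from functools import lru_cache


@lru_cache(maxsize=128)  # keep at most 128 results in memory
def slow_square(x: int) -> int:
    # Stand-in for an expensive pure computation
    return x * x


slow_square(12)  # miss: computed and stored
slow_square(12)  # hit: read back from the cache

info = slow_square.cache_info()
# info.hits = 1, info.misses = 1, info.currsize = 1
```

Checking cache_info() is a quick way to see whether a cache is actually earning its memory: lots of misses and few hits suggests the function isn't being called with repeated arguments.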
Pure functions are a part of functional programming, and the same ideas carry over to other languages like pure functions in Go. If you're still new to the language itself, our Learn Python course covers the fundamentals first.
Frequently Asked Questions
What is a pure function in Python?
A pure function is one that always returns the same value when given the same arguments and causes no side effects. It only reads its inputs and returns a new value, without touching any state outside of its own scope.
What is a side effect in Python?
A side effect is anything a function does that changes state outside its own scope, like modifying a global variable, mutating an input, or performing I/O such as writing to a file or printing to the console. Pure functions have no side effects.
Why are pure functions better than impure functions?
Pure functions are easier to test, debug, and reason about because their output depends only on their inputs. You never have to worry about hidden state or the order of function calls, and they can be safely cached through memoization.
Can you memoize an impure function in Python?
Usually no. Memoization relies on a function returning the same value for the same arguments. Impure functions may return different values or perform side effects, so caching their results would skip behavior the program depends on.
Are Python lists passed by reference or by value?
Python always passes a reference to the object, so mutable types like lists, dictionaries, and sets behave as if passed by reference: a function can mutate the original. Integers, floats, strings, booleans, and tuples are immutable, so they are effectively passed by value. To stay pure, copy mutable types before modifying them.
