7 Writing Python code

Note

This guidebook is written following the diátaxis “how-to guide” style. And because this document reflects how we work in the Seedcase Project, it is living and constantly evolving. It won’t ever be in a state of “done”.

What Python code to write in your Python package is based on the design you’ve created for the package and isn’t something that a guide can (easily) help with. However, how you write Python code is something that a guide can help with. This page is a guide to some common and recommended practices for writing Python code within the context of developing a Python package, including lessons we’ve learned.

7.1 General coding style

When writing Python code, you should follow a common and well-established style guide. The official style guide for Python is called PEP 8, and it is a comprehensive guide on different aspects of writing Python code. So, you should start by reading the PEP 8 guide to learn some basics on the style of Python code.

However, it doesn’t cover everything. For additional style guidelines, there are several other guides you can refer to. The one we recommend and use is the Google Style Guide. This is the second resource you should use to learn about how to write effective and well-styled Python code.

Thankfully, there are many aspects of both PEP 8 and the Google Style Guide that can be automatically checked and fixed. The tools we use and recommend are Ruff for general styling and mypy for type checking.

To use Ruff and mypy, you need to be in a Python project. If you have created a Python project with our template-python-package, then running Ruff can be done through our justfile. In your terminal, you can run this:

Terminal

just format-python

Which will run Ruff and format your Python code. Then, to check if your code needs manual fixes using Ruff and mypy, you can check the code with:

Terminal

just check-python

Tip

If you didn’t use our template nor do you want to add Ruff to your project’s dependencies, you can use Ruff through uv:

Terminal

uvx ruff format
# Tries to fix the linted issues, a bit different from `format`
uvx ruff check --fix
uvx ruff check
uvx mypy .

Aside from the above tools, IDEs usually have extensions or support for using formatters and linters. Our template includes settings and extension recommendations for VS Code, which will allow you to write code that will be automatically formatted and checked as you write.

7.2 File and folder structure when writing code

Before knowing which files to write code in and what the structure is for your Python package, you need to know in general what code you want to write and specifically what code the user(s) of your package will use. An effective software design principle is to have few public-facing functions, methods, and classes in a package, but internally have many more functions, methods, and classes. This internal code is used to keep the code for the public-facing functionality smaller and clearer in intent and purpose.

Unfortunately, Python does not have any process for making functionality completely private or hidden from the user. That means that even internal functionality is exposed to the user. However, Python does have a convention of using an underscore (_) before the name of a function, method, or class to communicate that it is internal. So, a best practice is to use these underscores to communicate whether an object is internal or not.

Once you know (generally) what code you want exposed to the user, you can decide where to write the code. Python is very flexible in this aspect, as how and where you can write code isn’t strictly set. This can be a good thing, but often makes it difficult to have a consistent and standardised way of writing code.

We’ve found these following guidelines to help keep things organised and consistent:

An individual public-facing functionality has its own Python file. Usually this means that each public function or class has its own file, but if the functionality is similar enough, multiple functions or classes could be grouped together in a single file.
If some internal functionality is only used by public-facing functionality located in a single file, keep the internal functionality in that same file.
If some internal functionality is used by multiple public-facing functionalities, then you should write it in a separate file in a folder called internals/.
If internal functionality is kept in the internals/ folder, then group the internal functions into files by their common action.

Let’s look at an example of how this works in practice. Let’s say you have a package called health_data_checks. In this package there are two public functions called check_blood_pressure() and check_heart_rate(). Both of these functions use an internal function called _is_data_object() while only check_blood_pressure() uses a function _has_two_measures() (since blood pressure is often measured twice). The file structure might look like:

health-data-checks/
└── src/
    └── health_data_checks/
        ├── __init__.py
        ├── check_blood_pressure.py
        ├── check_heart_rate.py
        └── internals/
            ├── __init__.py
            └── is.py

Since _is_data_object() has an action is, it is put into the is.py file in the internals/ folder. However, since _has_two_measures() is only used in check_blood_pressure(), it is kept in that file. By naming the file after the action, you can add other internal functions with the same action to this file as well. This keeps the internal code organised around actions and also allows you to use from internals import FUNCTION when needing to use the internal functionality.

7.3 How to write code

As in nearly all programming languages, how you write code is up to you as the developer. The exact details depend heavily on the problem your software is trying to solve, the design, the constraints, and the requirements.

However, there are some good general practices to follow that will help you write more effective and maintainable Python code.

7.3.1 Avoiding “magic”

Code that is “magic” does something unexpected or unpredictable. Usually this is because it has “side effects” or conditionally changes its action in fundamental ways that are not always obvious.

The way you avoid magic is by writing code that is simple, predictable, and consistent. This means that writing code in a more functional programming style tends to produce more maintainable and more predictable code. Strategies for writing more functional code include:

Using clear inputs and one type of output for functions and methods. There is more explanation on this in the type hints section below.
Avoiding side effects—described more below—by only outputting the result of the function or method. This includes not modifying the input data or variables by treating them as immutable (not changeable).
Minimizing internal conditional statements that change the action or behaviour of the code.
Preferring functionals, also called higher-order functions, like map(), filter(), and reduce() over loops or list comprehensions, as they tend to be more predictable and easier to read and write. A major benefit to writing more functional code is that it is much easier to create unit tests for it and it is easier for the user to understand what the code does.

Note

Sometimes it’s impossible to write in a purely functional style. For example, when you need to write to a file or database, this action is inherently a side effect—when an action is done that isn’t part of the returned output. And that’s why the most successful programming languages, like Python, are those that are flexible in the type of style you can use to write code. However, Python’s functional programming capabilities are not as strong as those of some other languages, which can sometimes make it quite difficult to write functional code.

7.3.2 Type hints

For keeping the input and output consistent and clear, it helps to write type hints. Type hints are a way to formally declare the type of object that is the input or output by a function or method. For example, for the function check_heart_rate(), you might write:

src/health_data_checks/check_heart_rate.py

def check_heart_rate(data: dict[str, Any]) -> bool:
    # Code ...
    return boolean_output

Where data: dict[str, Any] tells Python that the input to the function must be a dict object with str or Any as internal data types and the -> bool tells Python that the output of the function has to be a boolean value (True or False). So if you look at the documentation for the function or method, you will be able to read what object types the code needs and what it outputs.

Note

If you’ve used our template-python-package, then mypy will be setup and type hints will be required. Otherwise mypy will give an error. Unlike some other programming languages, Python does not enforce type hints when the code is run, as it is dynamically type-checked. This means that you can use any type of object as input to the function, even if it is not the type you declared in the type hint. While this is a feature of Python, it does make it more difficult to write code that is predictable and consistent.

7.3.3 Side effects

Side effects happen when the code does something that isn’t obvious from the name of the function or method, when it performs actions without returning a value, or when it does something in addition to returning a value.

Some side effects are intentional and can’t be avoided, like writing to a file or database, which are inherently side effects. But in general, you want to avoid side effects as they introduce unpredictability in your code.

For example, if you use a function called check_heart_rate(), you probably expect it to output a boolean value that tells you if the heart rate is normal or not. However, if it also writes to a file or modifies the data in some way, then it has a side effect. A side effect for check_heart_rate() might look like:

src/health_data_checks/check_heart_rate.py

import json

def check_heart_rate(data: dict[str, Any]) -> bool:
    # Side effect: Writing
    with open('file.txt', 'w') as file:
        file.write(json.dumps(data))
    # Pseudocode of a hypothetical check
    normal = data["heart_rate"] > 30 or data["heart_rate"] < 180
    return normal

But without a side effect, it would be:

src/health_data_checks/check_heart_rate.py

def check_heart_rate(data: dict) -> bool:
    # Pseudocode of a hypothetical check
    normal = data["heart_rate"] > 30 or data["heart_rate"] < 180
    return normal

Other side effects include modifying objects in the global scope, such as modifying the input arguments. Using the same check_heart_rate() function, a side effect that modifies the input might look like this:

src/health_data_checks/check_heart_rate.py

# Fake data
data = {
    "heart_rate": 30,
}


def check_heart_rate(data: dict) -> bool:
    # Side effect: Modifying the input in the global scope.
    normal = data["heart_rate"] > 30 or data["heart_rate"] < 180
    data.update(check=normal)
    return normal


# Original data
print(data)
print(check_heart_rate(data))
# Modified data
print(data)

A function should never modify objects in the global scope or modify the input arguments, as it makes it much harder to understand, predict, and test.

7.3.4 Internal code

Rarely does a public-facing function or method do everything within itself. That would end up making the code very long and difficult to read. Instead, you create smaller, well-defined internal functions, methods, or classes that do very specific actions and use them to build up the larger public-facing functionality.

How you write this internal functionality is basically no different from writing public-facing functionality. The only difference is that the name of internal objects should start with an underscore (_) and if shared by multiple public-facing functions or methods, they should go in the internals/ folder (as described in file and folder structure above).

To easily export and use the functions in the internals module, create an internals/__init__.py file and add the function or method to it by writing something like:

src/health_data_check/internals/__init__.py

# Import the `is.py` script by using the relative
# `.` import syntax.
from .is import _is_data_object

# Use `__all__` to explicitly define what is exported
# from this module.
__all__ = [
    "_is_data_object",
]

This way, when you import the internal functionality in other scripts, you can use either:

from internals import _is_data_object

# Using the function.
_is_data_object(data)

Or:

import internals

# Using the function
internals._is_data_object(data)

We tend to prefer the first option, as it explicitly states the function used which makes it easier to know what the script it is from in the internals/ module.

7.4 Writing code documentation

Documentation is one of the most important parts of writing code. Without good documentation, code is nearly unusable or at least extremely difficult to use. Documenting at the code level is done with “docstrings” in Python.

At the very least, you should write a docstring for every public-facing function, method, and class. This helps the user understand what the code does, what it needs as input, and what it outputs. You might think this sounds similar to the type hints described above, but docstrings are different: They explain what the code does and expects in natural language, while type hints communicate in a much more technical language what data types the code needs and outputs. Together, they enhance both code clarity and reliability. For internal code, docstrings are not strictly required, but we still recommend writing them (see guideline below about this).

As with writing code, you should follow a style guide for writing docstrings. The most common style guides for writing docstrings are the PEP 257 and Google style guides. We use and recommend the Google style guide, as it is more comprehensive. In addition to the Google style guide, other guidelines you should follow are:

Use the """ triple quotes for docstrings.
Use sentence case for describing things and end with a full stop. So use path: The path to the file. rather than path: the path to the file.
Include example code in the Quarto code chunk format.
Write simple one-line docstrings for:
- Very small functions or methods (for example, <5 lines)
- Functions or methods with <2 arguments
- Functions or methods with simple and clear input and output
Write longer and more detailed docstrings for:
- Larger and/or more complicated internal functions, classes, or methods, such as those with non-standard input or output
- User-facing functions, classes, or methods

Tip

When using VS Code and adding docstrings, you can use the autoDocstring VS Code extension to automatically generate a skeleton structure for the docstrings.

Following the Google style for functions or methods, the docstring for the check_heart_rate() function looks like this:

src/health_data_checks/check_heart_rate.py

def check_heart_rate(data: dict[str, Any]) -> bool:
    """Check if the heart rate is within a normal range.

    Args:
        data: The data object as a dict containing the
            "heart_rate" variable.

    Returns:
        bool: True if the heart rate is not normal,
            False otherwise.

    Example:
        ```{python}
        # Fake data
        data = {
            "heart_rate": 30,
        }
        print(check_heart_rate(data))
        ```
    """
    # Pseudocode of a hypothetical check
    normal = data["heart_rate"] > 30 or data["heart_rate"] < 180
    return normal