This post gives an overview of the folder and file structure, PEP 440 versioning, and PEP 8 style guide for a Python application that is ready for deployment.
Folder Structure
The folder hierarchy is a balance between too many nested folders and too many files crowded into one folder. Python files are split up by feature; feature files are grouped into modules (folders), and the modules come together to form a package. The overall structure is described here using foobar as the name of the Python package. Inside the root directory, setup files are used by setuptools to package the application: setup.py is used to install the package, while setup.cfg holds the installation configuration details. The inner foobar directory is where the actual package files are kept. A separate documentation folder called docs holds files written in reStructuredText format, which the Sphinx Python library processes to generate documentation in formats such as HTML, LaTeX, and ePub. Other folders that may be included in the root directory are etc for sample configuration files, bin for binary files, and tools for shell scripts and any other miscellaneous tools needed for your package.
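As a rough sketch, the layout can be reproduced with a short script. The list below contains only paths named in this post; the real project would hold more files, and the helper name create_skeleton is hypothetical.

```python
import tempfile
from pathlib import Path

# Paths relative to the project root; a trailing slash marks a directory.
# Only names mentioned in this post are listed (an assumption, not a full tree).
LAYOUT = (
    "setup.py",
    "setup.cfg",
    "docs/",
    "etc/",
    "bin/",
    "tools/",
    "foobar/__init__.py",
    "foobar/storage.py",
    "foobar/tests/test_storage.py",
)


def create_skeleton(root):
    """Create empty files and folders for LAYOUT under root."""
    root = Path(root)
    for entry in LAYOUT:
        target = root / entry
        if entry.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
    # Return every created path, relative to root, in sorted order.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


with tempfile.TemporaryDirectory() as tmp:
    for path in create_skeleton(tmp):
        print(path)
```

Running it in a temporary directory keeps the experiment from touching a real project.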
In each folder of the inner foobar package directory, an __init__.py file marks the folder as a module. The __init__.py file is run only once, when the module is first loaded into memory. Any subsequent import of the same module reuses the cached module object from sys.modules, so imported modules behave like singletons. This __init__.py file will usually be empty, but it can be used to run setup or initialization logic when the module is first imported. An example is a Google module that interacts with different Google APIs. The __init__.py file can import the Python Google package and set up authentication, since this is required and only needs to be run once. It is not recommended to put all of your logic in the __init__.py file.
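The run-once behaviour can be checked with a small experiment. This is a minimal sketch: the package name init_demo_pkg, the log file, and the helper count_init_runs are all made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source for a throwaway __init__.py that logs every time it is executed.
INIT_SOURCE = '''
from pathlib import Path
with open(Path(__file__).with_name("init_runs.log"), "a") as log:
    log.write("ran\\n")
'''


def count_init_runs(package_name="init_demo_pkg"):
    """Import a temporary package twice; report how often __init__.py ran."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(INIT_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = importlib.import_module(package_name)
            second = importlib.import_module(package_name)  # served from cache
            same_object = first is second
            log_text = (pkg_dir / "init_runs.log").read_text()
            runs = len(log_text.splitlines())
        finally:
            # Undo the changes so the demo leaves the interpreter clean.
            sys.path.remove(tmp)
            sys.modules.pop(package_name, None)
    return runs, same_object


print(count_init_runs())  # (1, True)
```

The second import returns the same module object without executing __init__.py again, which is why one-time setup such as authentication fits there.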
Each folder or Python module in the package should be broken up into features. A good example is the Google module mentioned earlier. Each file in this Google module can represent a different Google product such as Gmail, AdSense, and YouTube. Code should not be split by functionality, such as all error checking in one file, all API calls in another, and data processing in a third. That forces a developer who is working on, say, the Gmail portion to keep three separate files open to handle all the logic involved in getting emails. Instead, all logic for retrieving, processing, and error handling for the Gmail API should be located together in one ‘feature’ file (Gmail is the ‘feature’ you are adding to your application). Another thing to note is that not every module needs its own folder. If all you need is one feature file, then a folder containing only an __init__.py and that one feature file is unnecessary clutter. Instead, just move the feature file up to the package directory. In this example, the storage.py feature is not in its own storage module because it is just one file.
The final note about package folder structure concerns tests. Tests should live close to the code they test. Having a separate testing folder in the root directory of your package that mirrors the folder structure of the actual package leads to unnecessary maintenance. Not only do you need to keep the tests and inner foobar folder trees in sync, but any time you need to open a deeply nested feature and its corresponding tests, you have to dig for both of them separately. Keeping them together lets a developer locate one folder and have everything they need right there. If you need to move a module for whatever reason, it’s as easy as moving the folder along with its tests, instead of moving folders twice to keep the two hierarchies in sync. In the example layout, the storage.py module tests are located in ./tests/test_storage.py. As a side note, every test file name begins with test_ followed by the name of the module it’s testing. This is the default pytest discovery convention, but it can be changed.
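The naming convention above can be written down as a tiny helper. This is only a sketch: colocated_test_path is a hypothetical name, and it assumes the tests folder sits beside the feature file as described in this post.

```python
from pathlib import PurePosixPath


def colocated_test_path(module_path):
    """Return the test file path that sits next to a feature file."""
    module = PurePosixPath(module_path)
    # tests/ lives in the same folder as the feature file,
    # and the test file gets a test_ prefix.
    return module.parent / "tests" / f"test_{module.name}"


print(colocated_test_path("foobar/storage.py"))
# foobar/tests/test_storage.py
```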
Versioning
PEP 440 defines the rules for package versioning. The scheme can be condensed into the pattern N[.N]+[{a|b|c|rc}N][.postN][.devN], where each N is a non-negative integer (not limited to a single digit). Let’s walk through each part to see what this means. The first N is the major version, followed by the minor version and any further release segments, [.N]+. Versions that match N[.N]+ are considered final releases. Trailing zeros are optional, so 1.1.0 is the same as 1.1. The next section, [{a|b|c|rc}N], is for pre-releases. a is for an alpha release, such as 1.1a1, which denotes the first alpha release of version 1.1. b is for beta releases, and c or rc is for release candidates; PEP 440 treats c as an alternate spelling and normalizes it to rc. An example of a release candidate would be 1.1rc1. Other, less common suffixes such as .postN for post-releases and .devN for developmental releases can also be used. An example of a developmental release would be 1.1.dev1 (the spelling 1.1dev1 is accepted and normalized to this form). A final note: semantic versioning, although similar, is not fully compatible with PEP 440. Date-based versions such as 2019.03.21 are permitted by PEP 440, although leading zeros are dropped on normalization, giving 2019.3.21.
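The pattern can be turned into a real regular expression. The sketch below is deliberately simplified and is not the full PEP 440 grammar: it ignores epochs, local versions, and most alternate spellings, and the names SIMPLE_PEP440 and parse_version are invented for this example.

```python
import re

# Simplified version pattern: release, optional pre-release, post, and dev.
SIMPLE_PEP440 = re.compile(
    r"""
    ^(?P<release>\d+(?:\.\d+)*)                    # N followed by any .N
    (?:(?P<pre_kind>a|b|c|rc)(?P<pre_num>\d+))?    # a1, b2, c1, rc1
    (?:\.?post(?P<post>\d+))?                      # .postN
    (?:\.?dev(?P<dev>\d+))?                        # .devN
    $
    """,
    re.VERBOSE,
)


def parse_version(text):
    """Parse a version string; return None if this sketch cannot read it."""
    match = SIMPLE_PEP440.match(text)
    if match is None:
        return None
    release = tuple(int(part) for part in match.group("release").split("."))
    # Trailing zeros do not change the version, so 1.1.0 equals 1.1.
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    kind = match.group("pre_kind")
    if kind == "c":
        kind = "rc"  # c is an alternate spelling of rc
    pre = (kind, int(match.group("pre_num"))) if kind else None
    post = match.group("post")
    dev = match.group("dev")
    return {
        "release": release,
        "pre": pre,
        "post": int(post) if post is not None else None,
        "dev": int(dev) if dev is not None else None,
    }


print(parse_version("1.1.0") == parse_version("1.1"))  # True
print(parse_version("1.1rc1")["pre"])  # ('rc', 1)
```

For real projects, a maintained parser is a safer choice than a hand-written regex; this one only illustrates how the pieces of the pattern fit together.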
Coding Styles
PEP 8 defines an extensive list of rules on how Python code should be formatted. You may disagree with some of these rules, and linters such as pylint or flake8 (which combines pyflakes and pycodestyle, formerly named pep8) let you configure which ones are enforced. We can’t go over every rule, but here are a few basic ones:
- Use four spaces per indentation level. Tabs versus spaces, or two versus four spaces, can be a touchy debate, but PEP 8 itself settles on four spaces.
- Encode Python source files in UTF-8, the default source encoding in Python 3; plain ASCII is a subset of it.
- Limit all lines to a maximum of 79 characters. This may seem like a silly rule given today’s monitor sizes. The rule originated with IBM punch cards, which had a limit of 80 columns, and it carried over because that’s what people were comfortable reading. Although the rule doesn’t have to be strictly followed today, a maximum line length still benefits readability.
- Separate top-level function and class definitions with two blank lines. When writing class docstrings, a blank line is also required afterwards.
- Use one module import per import statement and per line. If you are importing names from a single module, such as from datetime, import everything you need on the same line.
- Place import statements at the top of the file, after comments and docstrings. Group imports by standard library first, then third party, and finally local library imports. PEP 8 does not mandate an order within a group, but keeping each group in alphabetical order by import path is a common convention.
- Do not use extraneous whitespace immediately inside parentheses, square brackets, or braces, or before commas.
- Class names are written in CapWords (upper camel case). Exception classes that represent errors are suffixed with Error, and class functions (methods) are written in lowercase snake case. A leading underscore can be used for private attributes and methods.
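Several of these rules are easier to see in a short module. The example below is a sketch only: StorageError and RecordStore are hypothetical names, chosen to show import layout, blank lines, and naming.

```python
"""Hypothetical module illustrating the PEP 8 rules listed above."""

import json
from datetime import datetime, timedelta


class StorageError(Exception):
    """Raised when a record cannot be stored."""


class RecordStore:
    """In-memory store; the class and its methods are illustrative only."""

    def __init__(self):
        self._records = {}  # leading underscore marks a private attribute

    def add_record(self, key, created, ttl_days=1):
        """Store a record that expires ttl_days after its creation time."""
        if key in self._records:
            raise StorageError(f"duplicate key: {key}")
        self._records[key] = created + timedelta(days=ttl_days)

    def to_json(self):
        """Serialize expiry dates as ISO 8601 strings, sorted by key."""
        expiries = {k: v.isoformat() for k, v in self._records.items()}
        return json.dumps(expiries, sort_keys=True)


store = RecordStore()
store.add_record("report", datetime(2019, 3, 21))
print(store.to_json())  # {"report": "2019-03-22T00:00:00"}
```

Both imports come from the standard library, so they form a single group; third-party and local imports would each follow in their own group, separated by a blank line.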
The PEP 8 style guide can be very strict, but if you’re planning to contribute to Python applications written by many developers, it helps to know what these rules are. If you are writing C code for Python, PEP 7 has the styling rules for that.
Conclusion
Standardization helps developers know exactly what to look for when they see a project for the first time. When all Python projects follow the same folder structure, versioning scheme, and style guide, developers can get up to speed on an application and dive into development more easily. Seeing the same coding patterns over and over also trains you to know exactly where to look when reviewing code, which makes code reviews more efficient and more effective.