
Notes on pre-commit

2023-01-12 #development #open-source #python

pre-commit is a tool to configure git pre-commit hooks. It allows defining hooks in a simple syntax and runs each one only if files of a relevant type have changed (e.g.: run mypy only if *.py files have changed).
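
To make the file-type filtering concrete, here’s a rough Python sketch of the idea. It isn’t how pre-commit implements it: the EXTENSION_TAGS table and the tags_for and files_for_hook helpers are made up for illustration, and pre-commit actually classifies files with the identify library, which looks at more than just the file extension.

```python
from pathlib import PurePath

# Hypothetical, heavily simplified tag table: extension -> type tags.
# (pre-commit itself uses the identify library for this.)
EXTENSION_TAGS = {
    ".py": {"python"},
    ".pyi": {"pyi"},
    ".js": {"javascript"},
    ".md": {"markdown"},
}


def tags_for(filename: str) -> set[str]:
    """Return the type tags for a file, based only on its extension."""
    return EXTENSION_TAGS.get(PurePath(filename).suffix, set())


def files_for_hook(types_or: list[str], changed_files: list[str]) -> list[str]:
    """Return the changed files matching any of the hook's types.

    An empty result means there is nothing to check, so the hook is skipped.
    """
    wanted = set(types_or)
    return [name for name in changed_files if tags_for(name) & wanted]
```

With changed files README.md, todoman/cli.py and stubs/foo.pyi, a hook with types_or [python, pyi] would receive only the last two, while a javascript-only hook would get an empty list and be skipped.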

Additionally, it’ll stash unstaged changes when running, so any changes that are not being committed won’t interfere with the hooks. This makes hooks fail if you forgot to add a file to the commit, and prevents hooks from failing if some uncommitted file has broken code.

All of pre-commit’s configuration is defined in .pre-commit-config.yaml.

Pre-commit plugins

Pre-commit has a concept of plugins. Each plugin defines how to install a checker and how to run it.

For example, the mirrors-mypy plugin installs mypy (in a virtualenv, as with all Python-based plugins). If a project requires mypy to run with additional dependencies installed (such as type stubs), these need to be listed in the configuration file too.

Here’s an example snippet of the mypy configuration used in todoman:

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: "v0.991"
    hooks:
      - id: mypy
        additional_dependencies:
          - types-atomicwrites
          - types-tabulate
          - types-freezegun
          - types-pytz
          - types-python-dateutil

Duplication of definitions

After using pre-commit for many months, I started to get frustrated with the way plugin dependencies need to be kept in sync with the project’s own dependency files. mypy and its dependencies are already specified in requirements-dev.txt, which developers and contributors use to bootstrap their development environment. The same is true for most other plugins (flake8, for example).

The same happens with Node.js-based plugins, whose dependencies constantly need to be kept in sync between .pre-commit-config.yaml and package.json (the latter being used by IDEs and developer tools during development).

Duplication of downloads and installations

A nice feature of pre-commit is that it takes care of downloading and installing all tools required by plugins into isolated environments. However, this comes at a price: each time any of these change (or each time a developer commits in a new project), all plugins are installed into their own environments. This isn’t terribly slow, but the problem is that the development environment will usually already have these tools installed.

If I am working on a project that uses flake8 for checking errors, I’ll be using a development environment with flake8 installed, along with the right set of flake8 plugins for that project. So having all of these re-installed is an annoyance of questionable value.

I would very much like to see a pre-commit plugin that merely checks that the environment being used matches the one defined in requirements-dev.txt (or whatever convention the project uses). While pre-commit’s approach does ensure that the right version is used when running hooks, it does so by installing its own copy, and doesn’t even hint to the developer that they might be using the wrong version during development. This also implies that their IDE would be using the wrong version and showing bogus diagnostics.
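
A minimal sketch of such a check, assuming requirements-dev.txt uses exact name==version pins. Everything here (parse_pins, find_mismatches, main, and the idea of saving it as a check_env.py script) is hypothetical; it relies only on importlib.metadata from the standard library and compares versions as plain strings rather than parsing them properly.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Hypothetical convention: only exact "name==version" pins are checked;
# ranges, extras, URLs and comment lines are silently skipped.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(lines: list[str]) -> dict[str, str]:
    """Map package name to pinned version for every exact pin in `lines`."""
    pins = {}
    for line in lines:
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def installed_version(name: str) -> Optional[str]:
    """Version of the distribution installed in this environment, or None."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    pins: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> list[str]:
    """Describe each pin whose installed version differs (plain string comparison)."""
    problems = []
    for name, wanted in sorted(pins.items()):
        found = lookup(name)
        if found != wanted:
            problems.append(f"{name}: wanted {wanted}, found {found or 'nothing'}")
    return problems


def main(paths: list[str]) -> int:
    """Check every requirements file given as an argument; return an exit code."""
    problems = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            problems += find_mismatches(parse_pins(fh.readlines()))
    for problem in problems:
        print(problem)
    return 1 if problems else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

It could presumably be wired up as a repo: local hook with language: system, entry: python check_env.py, pass_filenames: false, always_run: true and args: [requirements-dev.txt], so that it runs on every commit rather than only when requirements-dev.txt itself changes.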

Finally, some plugins share dependencies with the main project itself, and these dependencies need to be kept in sync. In my experience, this has always been addressed by leaving a comment like # make sure these are updated in sync, with developers manually updating the pinned versions in both places.
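
A check for this kind of drift could replace the comment. Here’s a sketch, assuming the config has already been loaded into a dict (for example with PyYAML’s yaml.safe_load, which isn’t in the standard library) and that the project’s own pins are available as a name-to-version dict. out_of_sync is a hypothetical helper; it only compares exact == pins and doesn’t normalise package names.

```python
import re

# Only exact pins such as "types-pytz==1.0" are compared in this sketch.
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)$")


def out_of_sync(config: dict, requirements: dict[str, str]) -> list[str]:
    """List additional_dependencies pins that disagree with the project's pins.

    `config` is the parsed .pre-commit-config.yaml; `requirements` maps
    package names to the versions pinned by the project itself.
    """
    problems = []
    for repo in config.get("repos", []):
        for hook in repo.get("hooks", []):
            for dep in hook.get("additional_dependencies", []):
                match = PIN.match(dep)
                if match is None:
                    continue  # unpinned or non-exact entries are not compared
                name, pinned = match.groups()
                expected = requirements.get(name)
                if expected is not None and expected != pinned:
                    problems.append(
                        f"{hook['id']}: {name}=={pinned}, but requirements pin {expected}"
                    )
    return problems
```

With made-up versions, a mypy hook listing types-pytz==1.0 while requirements-dev.txt pins types-pytz==1.1 would be reported as “mypy: types-pytz==1.0, but requirements pin 1.1”.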

Using the system plugin language

Pre-commit can handle installing tools and their dependencies in many languages (e.g.: it can create virtualenvs for Python tools). Additionally, there’s a system language which means “expect this tool to be installed on the host system”. This can be used to avoid re-installing dependencies that are expected to be available in a development setup.

For example, the above mypy configuration can be rewritten as:

  - repo: local
    hooks:
      - id: mypy
        name: mypy
        language: system
        entry: mypy
        types_or: [python, pyi]

This will use whichever mypy is available in $PATH, which should be the correct one if the developer has set up their system correctly, and will be the right one on CI (assuming CI is configured to install all development tools at the correct versions). Again, an extra hook that confirms this would be pretty valuable.

This approach works very well on CI too: assuming the pipeline installs requirements-dev.txt before the CI run, pre-commit run ... will use those versions of each tool.
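
The extra hook suggested above could start as small as checking that each tool found on $PATH lives inside the active Python environment. A sketch, assuming a virtualenv is activated (so sys.prefix points at it); resolves_inside, check_tool and main are hypothetical names, and the check is purely path-based, so it says nothing about versions.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Optional


def resolves_inside(tool_path: Optional[str], prefix: str) -> bool:
    """Return True if tool_path is not None and sits under prefix.

    Paths are made absolute but symlinks are deliberately not resolved,
    since a virtualenv's bin/python is typically a symlink to the base
    interpreter. Path.is_relative_to needs Python 3.9+.
    """
    if tool_path is None:
        return False
    return Path(os.path.abspath(tool_path)).is_relative_to(os.path.abspath(prefix))


def check_tool(tool: str) -> list[str]:
    """Return a problem description if `tool` is missing or outside sys.prefix."""
    found = shutil.which(tool)
    if resolves_inside(found, sys.prefix):
        return []
    return [f"{tool}: found {found or 'nothing'} on $PATH, expected it under {sys.prefix}"]


def main(tools: list[str]) -> int:
    """Check every tool named on the command line; return a process exit code."""
    problems = [problem for tool in tools for problem in check_tool(tool)]
    for problem in problems:
        print(problem)
    return 1 if problems else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

Run as, say, python check_path.py mypy flake8, it prints one line per tool that is missing or resolves outside the environment, and exits with status 1 if there are any.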

pre-commit.ci

pre-commit.ci is a service which runs pre-commit hooks as CI pipelines. This is a proprietary service which runs only on proprietary forges (and, as far as I know, has no public API).

It seems to cache plugins locally based on their definition, which makes it run very quickly. The general idea seems pretty solid, but it regrettably requires fully depending on proprietary tools and services for development. Running pre-commit directly on any regular pipeline works well enough.

Final thoughts

All in all, I think pre-commit does a good job of what it intends to do. I think the “install tools into an isolated environment” feature is a bit of overkill that adds more complexity than it’s worth (though it does play well with its proprietary CI implementation).

Its ability to only run hooks if relevant files have changed and its stashing of unstaged changes are both of great value on top of git’s default hooks, and using hooks with language: system seems to work around its major shortcomings.

— § —