pre-commit is a tool to configure git pre-commit hooks. It allows defining
hooks in a simple syntax and runs them only if files of a specific type change
(e.g.: run mypy only if *.py files have changed).
Additionally, it’ll stash unstaged changes when running, so any files that are not being committed won’t interfere with the hooks. This makes hooks fail if you forgot to include a file in the commit, and prevents hooks from failing if some uncommitted file has broken code.
All of pre-commit’s configuration is defined in .pre-commit-config.yaml.
Pre-commit plugins
Pre-commit has a concept of plugins. These include a definition of how to install a checker, and how to run it.
For example, this mypy plugin installs mypy (in a virtualenv, as with all Python-based plugins). If a project requires mypy to run with additional dependencies installed, these need to be defined in the configuration file too.
Here’s an example snippet of the mypy configuration for todoman:
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: "v0.991"
  hooks:
    - id: mypy
      additional_dependencies:
        - types-atomicwrites
        - types-tabulate
        - types-freezegun
        - types-pytz
        - types-python-dateutil
Duplication of definitions
After using pre-commit for many months, I started to get frustrated at the
way dependencies for plugins need to be kept in sync. mypy and its
dependencies are already specified in requirements-dev.txt, which is used by
developers and contributors to bootstrap their development environment. The same
is true for most other plugins (flake8, for example).
The situation is the same for nodejs-based plugins (where the dependencies
constantly need to be kept in sync between .pre-commit-config.yaml and
package.json; the latter being used by IDEs and developer tools during
development).
Duplication of downloads and installations
A nice feature of pre-commit is that it takes care of downloading and
installing all tools required by plugins into isolated environments. However,
this comes at a price; each time any of these change (or each time a developer
commits in a new project) all plugins are installed into their own environment.
This isn’t terribly slow, but the problem is that the development environment
will usually already have these tools installed.
If I am working on a project that uses flake8 for checking errors, I’ll be
using a development environment with flake8 installed, along with the right
set of flake8 plugins for that project. So having all these get re-installed
is an annoyance of arguable value.
I would very much like to see a pre-commit plugin that merely checks that the
environment being used matches the one defined in requirements-dev.txt (or
whatever convention this project is using). While pre-commit’s approach does
ensure that the right version is used when running hooks, it does so by
installing its own copy, and doesn’t even hint to the developer that they might
be using the wrong version for development (which also implies that their IDE
would be using the wrong version and showing them bogus diagnostics).
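As a rough sketch of what such a check could look like (this is not an existing pre-commit plugin, and the names parse_pins, find_mismatches and main are made up for illustration), the following uses Python’s standard importlib.metadata to compare installed versions against the pins in a requirements file. It assumes the file uses plain name==version lines; anything else (extras, ranges, -r includes) is silently skipped, and versions are compared as plain strings.

```python
import re
from importlib import metadata

# Assumption: only simple "name==version" pins are recognised.
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(lines):
    """Return {normalized_name: version} for each 'name==version' line."""
    pins = {}
    for line in lines:
        match = PIN.match(line)
        if match:
            # Normalize names so "Types_PyTZ" and "types-pytz" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return (name, wanted, found) tuples for packages that differ.

    `found` is None when the package is not installed at all.
    """
    problems = []
    for name, wanted in sorted(pins.items()):
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems.append((name, wanted, found))
    return problems


def main(path="requirements-dev.txt"):
    """Print every mismatch and return an exit status (0 means all good)."""
    with open(path, encoding="utf-8") as handle:
        problems = find_mismatches(parse_pins(handle))
    for name, wanted, found in problems:
        print(f"{name}: expected {wanted}, found {found or 'nothing'}")
    return 1 if problems else 0
```

Calling sys.exit(main()) from a small script run inside the development environment would give a non-zero exit status whenever something is missing or at the wrong version, which is all a hook needs in order to fail.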
Finally, some plugins share dependencies with the main project itself, and
these dependencies need to be kept in sync. In my experience, I have always
seen this addressed by leaving a comment # make sure these are updated in sync,
and developers manually updating the pinned versions in both places.
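That manual step could in principle be automated. The sketch below is only an illustration under a few assumptions: the function names pinned and drift are hypothetical, both inputs are already available as lists of name==version strings (reading additional_dependencies out of the YAML file is left out, since that needs a YAML parser which is not in Python’s standard library), and the versions in the usage example are made up.

```python
import re


def pinned(requirements):
    """Map normalized package name to version for 'name==version' items."""
    pins = {}
    for item in requirements:
        name, sep, version = item.partition("==")
        if sep:  # skip anything that is not an exact pin
            key = re.sub(r"[-_.]+", "-", name.strip()).lower()
            pins[key] = version.strip()
    return pins


def drift(hook_deps, project_reqs):
    """Return {name: (hook_version, project_version)} for pins that disagree.

    Only packages pinned in both lists are compared.
    """
    hook, project = pinned(hook_deps), pinned(project_reqs)
    return {
        name: (hook[name], project[name])
        for name in sorted(hook.keys() & project.keys())
        if hook[name] != project[name]
    }


# Made-up example data: one shared pin has drifted, one still matches.
hook_deps = ["click==8.1.3", "types-pytz==2022.7"]
project_reqs = ["Click==8.1.7", "types_pytz==2022.7", "humanize==4.4.0"]
print(drift(hook_deps, project_reqs))
```

An empty result means the two lists agree on every package they both pin.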
Using the system plugin language
Pre-commit can handle installing tools and their dependencies in many languages
(e.g.: it can create virtualenvs for tools in Python, etc). Additionally,
there’s a system language which means “expect this tool to be installed in
the host system”. This can be used to avoid re-installing dependencies that are
expected to be available on a development setup.
For example, the above mypy example can be rewritten like:
- repo: local
  hooks:
    - id: mypy
      name: mypy
      language: system
      entry: mypy
      types_or: [python, pyi]
This will use whichever mypy is available in $PATH, which should be the
correct one if the developer has set up their system correctly, and will be the
right one on CI (assuming CI is configured to install all development tools
at the correct version). Again, an extra hook that confirms this would be
pretty valuable.
This approach works very well on CI too; assuming that a pipeline installs
requirements-dev.txt before the CI run, then using pre-commit run ... will
use those versions of each tool.
pre-commit.ci
pre-commit.ci is a service which runs pre-commit hooks as CI pipelines.
This is a proprietary service which runs only on proprietary forges (and, as
far as I know, has no public API).
It seems to cache plugins locally based on their definition, which makes it run
very quickly. The general idea seems pretty solid, but regrettably requires
fully buying into depending on proprietary tools and services for development.
Using pre-commit itself directly on any regular pipeline works well enough.
Final thoughts
All in all, I think pre-commit does a good job at what it intends to do. I think the “install tools into an isolated environment” feature is a bit of overkill that adds more complexity than it’s worth (though it does play well with its proprietary CI implementation).
Its ability to only run hooks if relevant files have changed and to stash
unstaged changes are both of great value on top of git’s default hooks, and
using hooks with language: system seems to work around its major
shortcomings.