Settling on a configuration format for the upcoming vdirsyncer v2 has taken longer than I anticipated. This is a summary of my journey, my considerations and the current state.
The previous format
My first approach was to retain the existing configuration format. I’ll call this one the “legacy” format, to keep language simple. I wrote a parser for it, but it was far from trivial and, honestly, turned into extremely complicated code. The configuration format itself is a bespoke format designed for vdirsyncer. The general structure could be parsed as an ini file, while settings that took multiple values look more like JSON.
So the parser I wrote reads the file as ini and then deserialises some fields as JSON. At this point, I ended up with a key-value map of settings, from which I needed to extract the data itself into the “real” types that the application will use.
It was a lot of code that did very little[1].
On top of this, the legacy configuration format doesn’t quite contain all the information needed. In particular, the filesystem storage might be one of two types: filesystem/icalendar or filesystem/vcard. Due to the dynamic nature of the Python implementation, treating both as the same works fine, but that is not the case for the new implementation.
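To illustrate why the distinction matters in a statically typed implementation, here is a minimal sketch of how the storage type string could map onto an enum. The names StorageKind and its variants are hypothetical and not taken from vdirsyncer's code, and rejecting the bare legacy value "filesystem" is my assumption about how the ambiguity might be surfaced.

```rust
use std::str::FromStr;

/// Hypothetical enum for the `type` field of a storage.
/// Names are illustrative only, not vdirsyncer's actual types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum StorageKind {
    FilesystemICalendar,
    FilesystemVCard,
    CalDav,
}

impl FromStr for StorageKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "filesystem/icalendar" => Ok(StorageKind::FilesystemICalendar),
            "filesystem/vcard" => Ok(StorageKind::FilesystemVCard),
            "caldav" => Ok(StorageKind::CalDav),
            // Assumption: the legacy value does not say which of the two
            // filesystem types is meant, so it is rejected instead of guessed.
            "filesystem" => Err(String::from(
                "ambiguous type \"filesystem\": use \"filesystem/icalendar\" or \"filesystem/vcard\"",
            )),
            other => Err(format!("unknown storage type: {}", other)),
        }
    }
}
```

With this in place, "filesystem/vcard".parse::<StorageKind>() yields a concrete variant, while the legacy "filesystem" value produces an error that a migration tool could report to the user.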
So changes needed to be made, even though quite minor. With this in mind, and considering that this is a new development, keeping the legacy format felt a bit like opting into technical debt.
scfg vs toml
While considering new configuration formats, I narrowed my options down to the following two:
- scfg: a simple format, designed to be easy for humans to write and simple for machines to parse.
- toml: a somewhat complex format. Parsing it is non-trivial, but implementations exist in most mainstream languages.
An upside of toml is that it is very similar to the legacy configuration format, so it’s possible to document the subtle differences clearly and make migration easy for users.
scfg is quite different. While not too hard to understand, it does imply that users need to learn a rather different format when migrating. On the other hand, it’s very easy to understand for new users.
For reference, here is what a portion of my configuration file looks like in the current implementation:
[pair calendars]
a = "calendars_local"
b = "calendars_fastmail"
collections = ["from b"]
metadata = ["color", "displayname"]
conflict_resolution = ["command", "nvim", "-d"]
[storage calendars_local]
type = "filesystem"
path = "~/.local/share/calendars/"
fileext = ".ics"
[storage calendars_fastmail]
type = "caldav"
url = "https://caldav.fastmail.com/"
username = "hugo@whynothugo.nl"
password.fetch = ["command", "hiq", "-dFpassword", "proto=caldavs", "username=whynothugo@fastmail.com"]
The same thing in toml would look almost identical:
[pair.calendars]
a = "calendars_local"
b = "calendars_fastmail"
collections = ["from b"]
metadata = ["color", "displayname"]
conflict_resolution = ["command", "nvim", "-d"]
[storage.calendars_local]
type = "filesystem/icalendar"
path = "~/.local/share/calendars/"
fileext = ".ics"
[storage.calendars_fastmail]
type = "caldav"
url = "https://caldav.fastmail.com/"
username = "hugo@whynothugo.nl"
password.fetch = ["command", "hiq", "-dFpassword", "proto=caldavs", "username=whynothugo@fastmail.com"]
Meanwhile, an scfg variation would look something like this:
pair calendars {
a calendars_local
b calendars_fastmail
collections from b
metadata color displayname
conflict_resolution {
command nvim -d
}
}
storage calendars_local {
type filesystem/icalendar
path ~/.local/share/calendars/
fileext .ics
}
storage calendars_fastmail {
type caldav
url https://caldav.fastmail.com/
username hugo@whynothugo.nl
password {
command hiq -dFpassword proto=caldavs username=whynothugo@fastmail.com
}
}
scfg
Note how the scfg variant gets rid of quoting. While it’s still possible to quote values with spaces, it’s not necessary. Honestly, this looks like the most human-friendly option.
So this is the variant that I tried implementing first. I used scfg-rs, a Rust library for parsing scfg files.
This kind of worked. This library parses the file into an in-memory type that feels like a HashMap with Vecs. Extracting the information from this intermediate type into a proper Config type requires additional code and complexity, of which I’m not a fan.
In the end, I felt that this library is a good fit for simpler usages, but not so much for this case. A serde-based approach would likely be a great fit. While I did consider implementing such a thing, it’s just a huge scope creep for this project.
toml
Before deciding between scfg and toml, I wanted to be sure that I had tried out both properly. The toml implementation is substantially simpler; I mostly just declared a few idiomatic types to represent my configuration, and added #[derive(Deserialize, Debug)] to have serde deserialise this from the toml file.
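As a rough idea of what such types might look like, here is a sketch whose shape mirrors the [pair.*] and [storage.*] tables from the toml example earlier. The struct and field names are hypothetical, and the Deserialize derive is left out so that the snippet compiles with the standard library alone; with serde and a toml crate available, the derive would be what populates these types. The missing_storages check is only an illustration of how pleasant such types are to work with, not something the real implementation is known to do.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of a `[pair.NAME]` table.
#[derive(Debug)]
pub struct Pair {
    pub a: String,
    pub b: String,
}

/// Hypothetical mirror of a `[storage.NAME]` table (only the `type` field).
#[derive(Debug)]
pub struct Storage {
    pub kind: String,
}

/// Hypothetical top-level configuration: tables keyed by their name.
#[derive(Debug, Default)]
pub struct Config {
    pub pair: HashMap<String, Pair>,
    pub storage: HashMap<String, Storage>,
}

impl Config {
    /// Returns the sorted names of storages that some pair refers to
    /// but that are not defined in the configuration.
    pub fn missing_storages(&self) -> Vec<String> {
        let mut missing: Vec<String> = self
            .pair
            .values()
            .flat_map(|p| [&p.a, &p.b])
            .filter(|name| !self.storage.contains_key(*name))
            .cloned()
            .collect();
        missing.sort();
        missing.dedup();
        missing
    }
}
```

Compared to walking a generic map of strings, a check like this is a handful of iterator calls over named fields.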
I feel a lot more comfortable moving forward with this for now. In particular, I can move on to the next milestone, which is writing the command line interface itself, instead of writing more code to unwrap a configuration file.
Note that some version far in the future might end up supporting scfg as well. For now, the focus is on moving forward and not on picking the one true ultimate configuration format.
Specifying collections
The config file needs to specify which collections to sync. This can take a few shapes:
- null: this is a special value which will no longer be supported.
- ["from a"] syncs all collections that exist on storage a.
- ["from b"] syncs all collections that exist on storage b.
- ["from a", "from b"] syncs all collections that exist on either side.
- ["work"] syncs a collection with name “work”.
- ["from"] syncs a collection with name “from”.
toml doesn’t even support null, but this use case is being dropped entirely (more on that later), so it’s not a problem at all.
I introduced a new option here, which is equivalent to ["from a", "from b"]:
collections = ["all"]
An issue with this is that it’s not possible to specify a collection named “all”. It was also impossible to specify a collection named “from a”, although I don’t think this has ever realistically been a problem.
The "from b" variant remains the same:
collections = ["from b"]
However, specifying collections by their id now has an entirely different format:
collections = [
{ id = "italki" }
]
Note that the following is also valid[2]:
collections = [
"from a",
{ id = "work" }
]
The id part is to disambiguate exactly what the string itself means, which is especially important due to a new addition:
collections = [
{ href = "/work" }
]
The id syntax looks for a collection with a matching id (the “id” generally being the name of the directory itself or the last component in a URL). The href approach works in situations where discovery is not an option.
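The shapes described so far can be summarised as a small enum. This is only a sketch: CollectionSpec, from_legacy and to_toml are hypothetical names, and the conversion assumes that any legacy string other than "from a" or "from b" names a collection by its id, as the list above describes.

```rust
/// Hypothetical representation of one entry in `collections`.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum CollectionSpec {
    /// `"all"`: equivalent to `"from a"` plus `"from b"`.
    All,
    /// `"from a"`: every collection that exists on storage a.
    FromA,
    /// `"from b"`: every collection that exists on storage b.
    FromB,
    /// `{ id = "..." }`: a collection found via discovery by its id.
    Id(String),
    /// `{ href = "..." }`: a collection addressed directly, without discovery.
    Href(String),
}

/// Converts one legacy string entry into the new, explicit form.
/// Anything that is not `"from a"` or `"from b"` was a collection name.
pub fn from_legacy(entry: &str) -> CollectionSpec {
    match entry {
        "from a" => CollectionSpec::FromA,
        "from b" => CollectionSpec::FromB,
        name => CollectionSpec::Id(name.to_string()),
    }
}

/// Renders an entry in the new toml syntax.
/// Sketch only: quotes inside ids or hrefs are not escaped.
pub fn to_toml(spec: &CollectionSpec) -> String {
    match spec {
        CollectionSpec::All => "\"all\"".to_string(),
        CollectionSpec::FromA => "\"from a\"".to_string(),
        CollectionSpec::FromB => "\"from b\"".to_string(),
        CollectionSpec::Id(id) => format!("{{ id = \"{}\" }}", id),
        CollectionSpec::Href(href) => format!("{{ href = \"{}\" }}", href),
    }
}
```

For instance, the legacy entry "work" becomes { id = "work" }, while "from b" is carried over unchanged.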
Finally, the legacy configuration format supported mapped synchronisation: specifying a different collection on each side to be synchronised with each other. The legacy format was:
collections = [["bar", "bar_a", "bar_b"], "foo"]
The new format is a bit more verbose. In my honest opinion, it’s not necessarily simpler to write:
collections = [
  { mapped = [ "work", { id = "work" }, { href = "/path/to/work" } ]}
  #            ^^^^^^ this is an alias used only for logging.
]
For reference, this is the scfg version of the above:
collections {
  mapped work {
    #    ^^^^ this is the same alias as above.
    id work
    href /path/to/work
  }
}
This maps the collection on storage a with id work to a collection on storage b with href /path/to/work. The legacy format did not allow specifying collections by href at all, which can be an issue in some niche cases. Given that the new sync algorithm supports it entirely, it is important that the configuration file allows making use of this feature.
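A mapped entry can be thought of as an alias plus one selector per side. The sketch below converts a legacy three-element entry into that shape and renders it in the new toml syntax. All names here (Side, Mapped, mapped_from_legacy, render) are hypothetical, and it assumes that the first element of the legacy triple plays the same role as the alias in the new format. Since the legacy format had no href support, both sides become id selectors.

```rust
/// Hypothetical selector for the collection on one side of a mapping.
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Side {
    Id(String),
    Href(String),
}

/// Hypothetical mapped entry: an alias plus one selector per storage.
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Mapped {
    pub alias: String,
    pub a: Side,
    pub b: Side,
}

/// Converts a legacy `["alias", "name_on_a", "name_on_b"]` entry.
/// Returns `None` when the entry does not have exactly three elements.
pub fn mapped_from_legacy(entry: &[&str]) -> Option<Mapped> {
    match entry {
        [alias, a, b] => Some(Mapped {
            alias: alias.to_string(),
            a: Side::Id(a.to_string()),
            b: Side::Id(b.to_string()),
        }),
        _ => None,
    }
}

fn render_side(side: &Side) -> String {
    match side {
        Side::Id(id) => format!("{{ id = \"{}\" }}", id),
        Side::Href(href) => format!("{{ href = \"{}\" }}", href),
    }
}

/// Renders the entry in the new toml syntax (sketch: no escaping of quotes).
pub fn render(m: &Mapped) -> String {
    format!(
        "{{ mapped = [ \"{}\", {}, {} ] }}",
        m.alias,
        render_side(&m.a),
        render_side(&m.b)
    )
}
```

Applied to the legacy entry ["bar", "bar_a", "bar_b"], this produces { mapped = [ "bar", { id = "bar_a" }, { id = "bar_b" } ] }.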
The null collection
As I mentioned before, previously one could specify a null collection, and this indicated that the configuration for a given storage points directly to a collection and not to a storage with multiple collections.
Using a storage with a single collection is still possible, but the approach has changed. Rather than specifying null as a collection, the href syntax should be used instead to point directly to the collection. This works even in situations where discovery doesn’t work (it might simply be unsupported server side).
The new approach keeps some abstractions in place for all scenarios, which makes a lot of the under-the-hood logic much simpler.
Other fields
Other fields will remain largely the same. I’ve focused on the basic ones first (mostly to allow simple usages), and will continue addressing others at a later stage. In particular, custom TLS configuration is likely to come around after an initial alpha version of the command line interface.
Current state
As I’ve mentioned above, I’ve defined some idiomatic types that represent the configuration itself, and the parser simply creates instances of those, which are rather easy to operate with.
At this point, I need to translate these into the actual Storage and StoragePair instances and trigger a synchronisation. It shouldn’t be too long before I have a working alpha version of vdirsyncer2.
That’s mostly it for this month’s update.
[1] I’ve retained this code in the legacy-config branch. It will likely be useful in future to write a tool to auto-migrate configuration formats.
[2] The initial version of this example was missing a trailing comma on the first line, which made this example invalid toml. It’s easy for programmers to deal with, but I fear it has too many quirks for everyone else.