Settling on a configuration format for the upcoming vdirsyncer v2 has taken longer than I anticipated. This is a summary of my journey, my considerations and the current state.

The previous format

My first approach was to retain the existing configuration format. I’ll call this one the “legacy” format, to keep language simple. I wrote a parser for it, but doing so was far from trivial and, honestly, the resulting code was extremely complicated. The configuration format itself is a bespoke format designed for vdirsyncer. The general structure can be parsed as an ini file, while settings that take multiple values look more like JSON.

So the parser I wrote reads the file as an ini file and then deserialises individual values as JSON. At this point, I ended up with a key-value map of settings, from which I still needed to extract the data into the “real” types that the application will use.

It was a lot of code that did very little¹.
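To make the two-stage parse concrete, here is a minimal Python sketch of the idea: read the file as ini, then decode each value as JSON. It is not the actual parser; the function name `parse_legacy` and the trimmed sample are made up for illustration, and Python is used only because its standard library covers both steps.

```python
import configparser
import json

# Trimmed sample in the legacy format (taken from the example in this post).
LEGACY = """
[pair calendars]
a = "calendars_local"
b = "calendars_fastmail"
collections = ["from b"]

[storage calendars_local]
type = "filesystem"
path = "~/.local/share/calendars/"
fileext = ".ics"
"""


def parse_legacy(text):
    """Stage 1: ini structure. Stage 2: decode every value as JSON."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        # Section headers look like "pair calendars" or "storage calendars_local".
        kind, _, name = section.partition(" ")
        result[(kind, name)] = {
            key: json.loads(value) for key, value in parser.items(section)
        }
    return result


config = parse_legacy(LEGACY)
```

Even after both stages, `config` is still only a dictionary of loosely typed values. Validating it and converting it into the application's own types is a separate step, which is where most of the extra code comes from.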

On top of this, the legacy configuration format doesn’t quite contain all the information needed. In particular, the filesystem storage might be one of two types: filesystem/icalendar or filesystem/vcard. Due to the dynamic nature of the Python implementation, treating both as the same works fine, but that is not the case for the new implementation.

So changes needed to be made, even if quite minor ones. With this in mind, and considering that this is a new development, keeping the legacy format felt a bit like opting into technical debt.

scfg vs toml

While considering new configuration formats, I narrowed my options down to the following two:

scfg: a simple format, designed to be easy for humans to write and simple for machines to parse.
toml: a somewhat complex format. Parsing it is non-trivial, but implementations exist in most mainstream languages.

An upside of toml is that it is very similar to the legacy configuration format, so it’s possible to document the subtle differences clearly and make migration easy for users.

scfg is quite different. While not too hard to understand, it does imply that users need to learn a rather different format when migrating. On the other hand, it’s very easy to understand for new users.

For reference, here is what a portion of my configuration file looks like in the current implementation:

[pair calendars]
a = "calendars_local"
b = "calendars_fastmail"
collections = ["from b"]
metadata = ["color", "displayname"]
conflict_resolution = ["command", "nvim", "-d"]

[storage calendars_local]
type = "filesystem"
path = "~/.local/share/calendars/"
fileext = ".ics"

[storage calendars_fastmail]
type = "caldav"
url = "https://caldav.fastmail.com/"
username = "hugo@whynothugo.nl"
password.fetch = ["command", "hiq", "-dFpassword", "proto=caldavs", "username=whynothugo@fastmail.com"]

The same thing in toml would look almost identical:

[pair.calendars]
a = "calendars_local"
b = "calendars_fastmail"
collections = ["from b"]
metadata = ["color", "displayname"]
conflict_resolution = ["command", "nvim", "-d"]

[storage.calendars_local]
type = "filesystem/icalendar"
path = "~/.local/share/calendars/"
fileext = ".ics"

[storage.calendars_fastmail]
type = "caldav"
url = "https://caldav.fastmail.com/"
username = "hugo@whynothugo.nl"
password.fetch = ["command", "hiq", "-dFpassword", "proto=caldavs", "username=whynothugo@fastmail.com"]

Meanwhile, an scfg variation would look something like this:

pair calendars {
  a calendars_local
  b calendars_fastmail
  collections from b
  metadata color displayname
  conflict_resolution {
    command nvim -d
  }
}

storage calendars_local {
  type filesystem/icalendar
  path ~/.local/share/calendars/
  fileext .ics
}

storage calendars_fastmail {
  type caldav
  url https://caldav.fastmail.com/
  username hugo@whynothugo.nl
  password {
    command hiq -dFpassword proto=caldavs username=whynothugo@fastmail.com
  }
}

`scfg`

Note how the scfg variant gets rid of quoting. While it’s still possible to quote values with spaces, it’s not necessary. Honestly, this looks like the most human-friendly option.

So this is the variant that I tried implementing first. I used scfg-rs, a rust library for parsing scfg files.

This kind of worked. The library parses the file into an in-memory type that feels like a HashMap with Vecs. Extracting the information from this intermediate type into a proper Config format requires additional code and complexity, of which I’m not a fan.
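The following Python sketch illustrates why such an intermediate representation needs extra unwrapping. It is a deliberately simplified, hypothetical scfg reader written for this illustration (it does not mirror the scfg-rs API and ignores edge cases such as a quoted `{`): every directive comes back as a name, a list of string parameters and a list of children.

```python
import shlex

SCFG = """
storage calendars_fastmail {
  type caldav
  url https://caldav.fastmail.com/
  password {
    command hiq -dFpassword proto=caldavs
  }
}
"""


def parse_scfg(text):
    """Parse a small subset of scfg into nested name/params/children dicts."""
    root = []
    stack = [root]  # stack[-1] is the list that new directives are added to
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "}":
            stack.pop()
            continue
        words = shlex.split(line)  # handles optional quoting of parameters
        opens_block = words[-1] == "{"
        if opens_block:
            words = words[:-1]
        directive = {"name": words[0], "params": words[1:], "children": []}
        stack[-1].append(directive)
        if opens_block:
            stack.append(directive["children"])
    return root


storage = parse_scfg(SCFG)[0]
# Everything is a list of strings: the caller has to check lengths,
# pick out single values and convert them by hand.
fields = {d["name"]: d["params"] for d in storage["children"]}
url = fields["url"][0]
```

Each setting has to be looked up by name, checked for the right number of parameters and converted manually, which is the kind of boilerplate a derive-based approach avoids.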

In the end, I felt that this library is a good fit for simpler usages, but not so much for this case. A serde-based approach would likely be a great fit. While I did consider implementing such a thing, it would be huge scope creep for this project.

toml

Before deciding between scfg and toml, I wanted to be sure that I had tried out both properly. The toml implementation is substantially simpler; I mostly just declared a few idiomatic types to represent my configuration, and added #[derive(Deserialize, Debug)] to have serde deserialise this from the toml file.

I feel a lot more comfortable moving forward with this for now. In particular, I can move on to the next milestone, which is writing the command line interface itself, instead of writing more code to unwrap a configuration file.

Note that some version far in the future might end up supporting scfg as well. For now, the focus is on moving forward and not on picking the one true ultimate configuration format.

Specifying collections

The config file needs to specify which collections to sync. This can take a few shapes:

null: this is a special value which will no longer be supported.
["from a"] syncs all collections that exist on storage a.
["from b"] syncs all collections that exist on storage b.
["from a", "from b"] syncs all collections that exist on either side.
["work"] syncs a collection with name “work”.
["from"] syncs a collection with name “from”.

toml doesn’t even support null, but this use case is being dropped entirely (more on that later), so it’s not a problem at all.

I introduced a new option here, which is equivalent to ["from a", "from b"]:

collections = ["all"]

An issue with this is that it’s not possible to specify a collection named “all”. It was also impossible to specify a collection named “from a”, although I don’t think this has ever realistically been a problem.

The "from b" variant remains the same:

collections = ["from b"]

However, specifying collections by their id now has an entirely different format:

collections = [
  { id = "italki" }
]

Note that the following is also valid²:

collections = [
  "from a",
  { id = "work" }
]

The id part is to disambiguate exactly what the string itself means, which is especially important due to a new addition:

collections = [
  { href = "/work" }
]

The id syntax looks for a collection with a matching id (the “id” generally being the name of the directory itself or the last component in a URL). The href approach works in situations where discovery is not an option.

Finally, the legacy configuration format supported mapped synchronisation: specifying a different collection on each side to be synchronised with each other. The legacy format was:

collections = [["bar", "bar_a", "bar_b"], "foo"]

The new format is a bit more verbose. In my honest opinion, it’s not necessarily simpler to write:

collections = [
  { mapped = [ "work", { id = "work" }, { href = "/path/to/work" } ]}
  #            ^^^^^^ this is an alias used only for logging.
]

For reference, this is the scfg version of the above:

collections {
    mapped work {
        #  ^^^^ this is the same alias as above.
        id work
        href /path/to/work
    }
}

This maps the collection on storage a with id work to the collection on storage b with href /path/to/work. The legacy format did not allow specifying collections by href at all, which can be an issue in some niche cases. Given that the new sync algorithm supports it entirely, it is important that the configuration file allows making use of this feature.

The `null` collection

As I mentioned before, previously one could specify a null collection, and this indicated that the configuration for a given storage points directly to a collection and not to a storage with multiple collections.

Using a storage with a single collection is still possible, but the approach has changed. Rather than specifying null as a collection, the href syntax should be used instead to point directly to the collection. This works even in situations where discovery doesn’t work (it might simply be unsupported server side).

The new approach keeps some abstractions in place for all scenarios, which makes a lot of the under-the-hood logic much simpler.

Other fields

Other fields will remain largely the same. I’ve focused on the basic ones first (mostly to allow simple usages), and will continue addressing others at a later stage. In particular, custom TLS configuration is likely to come around after an initial alpha version of the command line interface.

Current state

As I’ve mentioned above, I’ve defined some idiomatic types that represent the configuration itself, and the parser simply creates instances of those, which are rather easy to operate with.

At this point, I need to translate these into the actual Storage and StoragePair instances and trigger a synchronisation. It shouldn’t be too long before I have a working alpha version of vdirsyncer2.

That’s mostly it for this month’s update.

¹ I’ve retained this code in the legacy-config branch. It will likely be useful in future to write a tool to auto-migrate configuration formats.
² The initial version of this example was missing a trailing comma on the first line, which made this example invalid toml. It’s easy for programmers to deal with, but I fear it has too many quirks for everyone else.
