
vdirsyncer status update, August 2023

2023-08-26 #open-source #status-update #vdirsyncer

I had family visiting during this past month, so I’ve taken some time off to spend with them. Progress has therefore been slower than usual.

Sub-tasks missing from planning

As I mentioned a few months ago, the NLnet foundation is currently funding my work on rewriting vdirsyncer. As part of this process, I’ve shared a full plan with the tasks that need to be done and an estimate for each of them.

While I had a pretty solid plan on how to approach re-writing vdirsyncer, some requirements did fly under my radar. For example, the original plan included the work necessary for the caldav client, but I didn’t really take into account the DNS-based service discovery. The DNS-based discovery didn’t consume a huge amount of time – but it was still a divergence from my original estimations.

When looking into the coming milestones, it has become clear that I’ve missed out on estimating several more tasks related to intermediate work. I have been doing a lot of re-planning to make sure that I have a good plan and estimation for this work.

Some items that I had overlooked are:

Coordinating with Pimalaya

I’ve also been coordinating further with @soywod, who is working on the Pimalaya project. The goal of Pimalaya is to provide a suite of libraries and applications for email, calendar and other personal information management tools. We’re obviously working in closely related fields, so coordinating to avoid duplicating work is key for both of us.

As I’ve mentioned above, a low-level icalendar parser is in scope for me, and the intent is for it to be fully re-usable. Tentatively, it seems that it might be usable for Pimalaya in future, although nothing is set in stone yet. My plans after vdirsyncer involve other calendar-related tools, so we do aspire to converge in our general direction.

Email synchronisation

A topic that has come up many times is the idea of extending vdirsyncer to also synchronise email (e.g.: between IMAP, Maildir, and potentially even JMAP). The general idea of “synchronise two storages while keeping a state file to resolve mismatches” applies well to both email and calendar. In fact, vdirsyncer’s original algorithm is inspired by the one used in offlineimap.
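To make the “state file” idea concrete, here is a minimal sketch of how a per-item decision could be made. It is not vdirsyncer’s or offlineimap’s actual algorithm: the `Action` enum and the `decide` function are hypothetical names, and it assumes that each item has a version marker (for example a content hash) that is comparable across both storages and the saved state.

```rust
/// What a synchroniser could do for a single item (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    CopyToA,
    CopyToB,
    DeleteFromA,
    DeleteFromB,
    Conflict,
    Nothing,
}

/// Decide what to do for one item, given its version in storage A, in
/// storage B, and the version recorded in the state after the last sync.
/// `None` means "the item does not exist there".
pub fn decide(a: Option<&str>, b: Option<&str>, state: Option<&str>) -> Action {
    match (a, b) {
        // Gone from both sides: nothing to transfer.
        (None, None) => Action::Nothing,
        // Only on A: either new on A, or deleted from B since the last sync.
        (Some(_), None) => match state {
            None => Action::CopyToB,
            s if s == a => Action::DeleteFromA,
            // Changed on A while deleted on B.
            _ => Action::Conflict,
        },
        // Only on B: mirror image of the case above.
        (None, Some(_)) => match state {
            None => Action::CopyToA,
            s if s == b => Action::DeleteFromB,
            _ => Action::Conflict,
        },
        // Same version on both sides: already in sync.
        (Some(x), Some(y)) if x == y => Action::Nothing,
        // Different versions: the side that still matches the state is stale.
        (Some(_), Some(_)) => {
            if state == a {
                Action::CopyToA
            } else if state == b {
                Action::CopyToB
            } else {
                Action::Conflict
            }
        }
    }
}
```

Without the saved state, an item that exists only on one side would be ambiguous: it could be new on that side or deleted on the other. The state is what resolves that mismatch.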

However, while some aspects of vdirsyncer are obviously reusable in this context, email has a few nuances that make it quite different too. In particular, emails have “flags” (e.g.: seen, flagged, draft, etc.), whereas calendar events have no equivalent. Additionally, flags change very often, but messages are immutable.

I’m still trying to wrap my head around a design that makes sense. Currently, the Storage type is parametrised with the type of content it holds (e.g.: IcalendarItem, which implements the trait Item). I’m considering adding a flags attribute to Item, which can be empty for IcalendarItem but have actual flags for EmailItem. An important detail in this approach is that whether an item type has flags is determined at compile-time, so the “check and synchronise flags” part of the synchronisation algorithm could potentially compile down to a no-op for IcalendarItem.
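As a rough sketch of that idea (not the actual trait from the rewrite), flags could be modelled with an associated type and an associated constant. The trait shape, `HAS_FLAGS`, `EmailFlags`, the struct fields and `flags_need_sync` are all assumptions made up for this illustration.

```rust
/// Hypothetical shape for the `Item` trait; the real one may differ.
pub trait Item {
    /// Flags carried by this item type; `()` for types without flags.
    type Flags: PartialEq;
    /// Known at compile time for each item type.
    const HAS_FLAGS: bool;
    fn flags(&self) -> Self::Flags;
}

pub struct IcalendarItem {
    pub raw: String,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EmailFlags {
    pub seen: bool,
    pub flagged: bool,
}

pub struct EmailItem {
    pub raw: Vec<u8>,
    pub flags: EmailFlags,
}

impl Item for IcalendarItem {
    type Flags = ();
    const HAS_FLAGS: bool = false;
    fn flags(&self) -> Self::Flags {}
}

impl Item for EmailItem {
    type Flags = EmailFlags;
    const HAS_FLAGS: bool = true;
    fn flags(&self) -> Self::Flags {
        self.flags
    }
}

/// The "check flags" step of a synchronisation. For `IcalendarItem` the
/// condition is a compile-time constant, so after monomorphisation the
/// optimiser has everything it needs to reduce this to `false`.
pub fn flags_need_sync<I: Item>(a: &I, b: &I) -> bool {
    if !I::HAS_FLAGS {
        return false;
    }
    a.flags() != b.flags()
}
```

Because `flags_need_sync` is generic, the compiler generates a separate copy for each item type, which is what makes the no-op plausible for calendar items.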

Another approach I’ve considered is having multiple “layers” for each Item type. IcalendarItem would only have the data layer, but EmailItem can have a message and a flags layer, where one can mutate without the other. Keep in mind that it’s important to avoid re-synchronising an email just because its flags have changed – we wouldn’t want to re-download a 25MB email just because it was marked as read.
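A very rough sketch of the layered variant, assuming that each layer has its own version marker recorded in the state. `Layer`, `LayerVersions` and `changed_layers` are hypothetical names used only for this illustration.

```rust
/// The independent parts of an item that can change (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layer {
    Data,
    Flags,
}

/// One version marker per layer. Item types without flags use `None`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LayerVersions {
    pub data: String,
    pub flags: Option<String>,
}

/// Compare the versions recorded at the last sync with the current ones and
/// report only the layers that need to be transferred again.
pub fn changed_layers(known: &LayerVersions, current: &LayerVersions) -> Vec<Layer> {
    let mut changed = Vec::new();
    if known.data != current.data {
        changed.push(Layer::Data);
    }
    if known.flags != current.flags {
        changed.push(Layer::Flags);
    }
    changed
}
```

With this shape, marking a message as read only reports `Layer::Flags`, so the message body itself would not need to be downloaded again.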

However, these are still vague ideas in my head, which I’ll continue refining. They are not in scope yet, and there is still no solid plan to implement things exactly this way. If this does happen, it will be after a stable release of the rewrite has been made. I definitely don’t want to block a stable release with this milestone.

Configuration parsing and API

A configuration might specify something like “sync all collections from storage A”, but a StoragePair type (which contains all the rules for creating a plan and eventually executing a synchronisation) needs explicitly enumerated collections and their respective mappings.

A bit of glue code is needed between reading the configuration and creating the StoragePair instance. With the builder API for the StoragePair type, I don’t expect any major obstacles in this area.
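The kind of glue code described here could look roughly like the sketch below. It does not use the real StoragePair builder: `ConfiguredCollections`, `Mapping` and `enumerate_mappings` are hypothetical, and it assumes that a collection maps to one with the same name on the other side.

```rust
/// What the configuration asks for (hypothetical type).
pub enum ConfiguredCollections {
    /// "Sync all collections from storage A."
    All,
    /// An explicit list of collection names.
    Named(Vec<String>),
}

/// An explicit mapping of one collection in A to one collection in B.
#[derive(Debug, PartialEq, Eq)]
pub struct Mapping {
    pub in_a: String,
    pub in_b: String,
}

/// Turn the configuration into explicitly enumerated mappings, using the
/// collections that were discovered in storage A when "all" is requested.
pub fn enumerate_mappings(
    config: &ConfiguredCollections,
    discovered_in_a: &[String],
) -> Vec<Mapping> {
    let names: Vec<String> = match config {
        ConfiguredCollections::All => discovered_in_a.to_vec(),
        ConfiguredCollections::Named(names) => names.clone(),
    };
    names
        .into_iter()
        .map(|name| Mapping {
            in_a: name.clone(),
            in_b: name,
        })
        .collect()
}
```

The resulting list is the sort of explicit input that could then be fed, one mapping at a time, into whatever builder creates the pair.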

Given that the configuration format is so decoupled from the application logic itself, I expect that trying out different configuration formats in future should not require huge changes.

The CollectionId type

The vstorage crate has a few functions that take a “collection id” as a parameter. A collection id is just a string with a few restrictions (no slashes, can’t be .. or .).

My first instinct was to add a quick validation: a small function that takes the string and returns whether it is valid as a collection id. This works, but I need to remember to call it in every single function that takes a collection id as input. It also means that any value that is passed around multiple times needs to be validated each time it is received.

After some hesitation and experimenting back and forth, I eventually decided to use a custom type for this:

pub struct CollectionId {
    inner: String,
}

impl FromStr for CollectionId {
    type Err = CollectionIdError;

    fn from_str(s: &str) -> Result<CollectionId, CollectionIdError> {
        // ...
    }
}

impl TryFrom<String> for CollectionId {
    type Error = CollectionIdError;

    fn try_from(value: String) -> std::result::Result<Self, Self::Error> {
        // ...
    }
}

// ... A few other conversion helpers...

Note that the inner field is private, so creating instances of CollectionId is only possible if the data has been properly validated.

To be honest, as I implemented this type and used it everywhere, I kept wondering whether this was overkill. After completing the changes, it’s clear that it wasn’t.

Now functions that take a collection id no longer take a String parameter; they take a CollectionId parameter. This makes the compiler enforce that the data is always validated before the function call. It also moves the responsibility of validating data to the caller, so it simplifies the returned error types (since they no longer need to account for the “invalid collection id” variants).
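For illustration, here is a self-contained sketch of the pattern with the elided bodies filled in. The validation only covers the restrictions mentioned above (no slashes, not `.` or `..`); the error variants, the `validate` helper, the `AsRef<str>` impl and the `collection_path` function are assumptions for this example and not the actual vstorage code.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

/// Hypothetical error variants; the real type may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum CollectionIdError {
    ContainsSlash,
    ReservedName,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CollectionId {
    // Private: instances can only be created through validated conversions.
    inner: String,
}

/// Shared check used by all conversions (assumed helper).
fn validate(s: &str) -> Result<(), CollectionIdError> {
    if s.contains('/') {
        return Err(CollectionIdError::ContainsSlash);
    }
    if s == "." || s == ".." {
        return Err(CollectionIdError::ReservedName);
    }
    Ok(())
}

impl FromStr for CollectionId {
    type Err = CollectionIdError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        validate(s)?;
        Ok(CollectionId { inner: s.to_owned() })
    }
}

impl TryFrom<String> for CollectionId {
    type Error = CollectionIdError;

    fn try_from(value: String) -> Result<Self, Self::Error> {
        validate(&value)?;
        // Reuse the allocation instead of copying the string.
        Ok(CollectionId { inner: value })
    }
}

impl AsRef<str> for CollectionId {
    fn as_ref(&self) -> &str {
        &self.inner
    }
}

/// A hypothetical consumer: it cannot receive an unvalidated id, so it has
/// no "invalid collection id" error case of its own.
pub fn collection_path(base: &str, id: &CollectionId) -> String {
    format!("{}/{}", base, id.as_ref())
}
```

A caller would parse once, for example with `"personal".parse::<CollectionId>()`, handle the error at that point, and then pass the resulting value around freely.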

This item was a very small piece of development (it took less than a couple of hours to get to its current version), but I’m very happy with the ergonomics that it provides.

Have comments or want to discuss this topic?
Send an email to my public inbox: ~whynothugo/public-inbox@lists.sr.ht.
Or feel free to reply privately by email: hugo@whynothugo.nl.

— § —