I had family visiting during this past month, so I’ve taken some time off to spend with them. Progress has therefore been slower than usual.
Sub-tasks missing from planning
As I mentioned a few months ago, the NLnet foundation is currently funding my work rewriting vdirsyncer. As part of this process, I’ve shared a full plan with the tasks that need to be done and an estimate for each.
While I had a pretty solid plan for how to approach re-writing vdirsyncer, some requirements did fly under my radar. For example, the original plan included the work necessary for the caldav client, but I didn’t take into account the DNS-based service discovery. The DNS-based discovery didn’t consume a huge amount of time – but it was still a divergence from my original estimates.
When looking into the coming milestones, it has become clear that I missed estimating several more intermediate tasks. I have been doing a lot of re-planning to make sure that I have a solid plan and estimate for this work.
Some items that I had overlooked are:
- Creating an API to declare sync targets. I currently have code that can synchronise two storages, but the API for defining these storages (as well as which collections inside of them should be synchronised) was missing. The rewrite has a strong separation between the library that does the work and the command line interface, so an API is quite necessary, even at this stage. I actually ended up refactoring almost all of the synchronisation code for this (twice), and finally settled on a simple builder API, which is also usable by other frontends, including potentially a GUI application. A rough sketch of what this could look like follows this list.
- Defining and documenting a configuration format: My next milestone is to publish a command line tool for syncing (e.g.: an alpha vdirsyncer). This obviously requires reading a configuration file (which specifies which storages to sync). Defining and documenting this configuration format was not something that I had planned for in scope. To be fair, I initially intended to use the existing format, but it eventually turned out not to be a great fit. I’ve shared some notes on this topic in the past and intend to address it in more detail in future.
- Icalendar and JSCalendar parsing. JMAP support is in scope as one of the final milestones. This requires converting Icalendar to/from JSCalendar, which in turn requires a low-level parser for both formats. These parsers were not part of the original scope, and none of the existing implementations fit the bill. I have now added them to my list of pending milestones.
- Functionality to repair non-compliant icalendar items: This is a feature available in the previous vdirsyncer that’s actually quite useful and slipped under my radar (mostly because it’s usually used as a one-time thing). It mostly adds missing mandatory fields (notably: UID) and fixes similar irregularities.
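To illustrate that builder API, here is a rough sketch of what declaring a sync target could look like. StoragePair is a real type from the rewrite, but the builder, its methods and the storage variables below are assumptions made purely for illustration, not the actual vstorage API:

// Hypothetical sketch of a builder-style API for declaring a sync target.
// Only StoragePair is a real name; everything else is illustrative.
let pair: StoragePair = StoragePairBuilder::new(caldav_storage, local_storage)
    // Explicitly declare which collections should be synchronised, and how
    // they map to each other on either side.
    .with_collection_mapping("work", "work")
    .with_collection_mapping("personal", "calendar")
    .build()?;

// Any frontend (the CLI, or potentially a GUI) can then use the pair to
// plan and execute a synchronisation.
pair.synchronise()?;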
Coordinating with Pimalaya
I’ve also been coordinating further with @soywod, who is working on the Pimalaya project. The goal of Pimalaya is to provide a suite of libraries and applications for email, calendar and other personal information management tools. We’re obviously working in very related fields, so coordinating to avoid duplicating work is key for both of us.
As I’ve mentioned above, a low-level icalendar parser is in scope for me, and the intent is for it to be fully re-usable. Tentatively, it seems that it might be usable for Pimalaya in future, although nothing is set in stone yet. My plans after vdirsyncer involve other calendar-related tools, so we do aspire to try and converge in our general direction.
Email synchronisation
A topic that has come up many times is the idea of extending vdirsyncer to also synchronise email (e.g.: between IMAP, Maildir, and potentially even JMAP). The general idea of “synchronise two storages while keeping a state file to resolve mismatches” applies well to both email and calendar. In fact, vdirsyncer’s original algorithm is inspired by the one used in offlineimap.
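As a rough illustration of that idea (and emphatically not vdirsyncer’s actual code), the decision for each item boils down to comparing its presence on each side against the saved state from the previous sync:

// Simplified sketch of state-based synchronisation; the names and exact
// rules here are illustrative only.
enum Action {
    CopyToB,     // new on A since the last sync
    CopyToA,     // new on B since the last sync
    DeleteFromA, // deleted on B since the last sync
    DeleteFromB, // deleted on A since the last sync
    Compare,     // present on both; compare contents to detect edits
    ForgetState, // gone from both; drop the stale state entry
    Conflict,    // appeared on both sides independently
}

fn plan(in_a: bool, in_b: bool, in_state: bool) -> Option<Action> {
    match (in_a, in_b, in_state) {
        (true, true, true) => Some(Action::Compare),
        (true, true, false) => Some(Action::Conflict),
        (true, false, false) => Some(Action::CopyToB),
        (false, true, false) => Some(Action::CopyToA),
        (true, false, true) => Some(Action::DeleteFromA),
        (false, true, true) => Some(Action::DeleteFromB),
        (false, false, true) => Some(Action::ForgetState),
        (false, false, false) => None,
    }
}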
However, while some aspects of vdirsyncer are obviously reusable in this context, email has a few nuances that make it quite different too. In particular, emails have “flags” (e.g.: seen, flagged, draft, etc), whereas calendar events have no equivalent. Additionally, flags change very often, but messages themselves are immutable.
I’m still trying to wrap my head around a design that makes sense. Currently, the Storage type is parametrised with the type of content it holds (e.g.: IcalendarItem, which implements the trait Item). I’m considering adding a flags attribute to Item, which can be empty for IcalendarItem but have actual flags for EmailItem. An important detail in this approach is that whether an item type has flags is determined at compile-time, so the “check and synchronise flags” part of the synchronisation algorithm could potentially compile down to a no-op for IcalendarItem.
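A minimal sketch of how this could look (the names below are hypothetical, not the real vstorage trait): an associated Flags type on Item lets IcalendarItem use the unit type, so the flag-comparison step has nothing to do and can be optimised away entirely.

// Hypothetical sketch; the real trait looks different, but this conveys
// the compile-time idea.
trait Item {
    type Flags: PartialEq;

    fn flags(&self) -> Self::Flags;
}

struct IcalendarItem { /* ... */ }

impl Item for IcalendarItem {
    // Calendar items have no flags; the unit type makes any "compare and
    // synchronise flags" step trivially empty.
    type Flags = ();

    fn flags(&self) -> Self::Flags {}
}

#[derive(Default, PartialEq)]
struct EmailFlags {
    seen: bool,
    flagged: bool,
    draft: bool,
}

struct EmailItem { /* ... */ }

impl Item for EmailItem {
    type Flags = EmailFlags;

    fn flags(&self) -> Self::Flags {
        // A real implementation would read the message's actual flags.
        EmailFlags::default()
    }
}

// Only does work when flags can actually differ; for IcalendarItem both
// sides are always (), so this can compile down to a no-op.
fn sync_flags<I: Item>(a: &I, b: &I) {
    if a.flags() != b.flags() {
        // ... propagate the flag change to the other side ...
    }
}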
Another approach I’ve considered is having multiple “layers” for each Item type. IcalendarItem would only have the data layer, but EmailItem can have a message and a flags layer, where one can mutate without the other. Keep in mind that it’s important to avoid re-synchronising an email just because its flags have changed – we wouldn’t want to re-download a 25MB email just because it was marked as read.
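Again purely as a hypothetical sketch (all names are illustrative), the layered approach could be shaped roughly like this:

// Each layer is tracked and synchronised separately, so a change to the
// flags layer never forces re-transferring the message layer.
struct MessageLayer {
    raw: Vec<u8>, // the full message; immutable once written
}

struct FlagsLayer {
    seen: bool,
    flagged: bool,
    draft: bool,
}

struct EmailItem {
    message: MessageLayer,
    flags: FlagsLayer,
}

// An IcalendarItem would only have the equivalent of the message/data layer.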
However, these are still vague ideas in my head, which I’ll continue refining. None of this is in scope yet, and there is no solid plan to implement things exactly this way. If this does happen, it will be after a stable release of the rewrite has been made. I definitely don’t want to block a stable release with this milestone.
Configuration parsing and API
A configuration might specify something like “sync all collections from storage A”, but a StoragePair type (which contains all the rules for creating a plan and eventually executing a synchronisation) needs explicitly enumerated collections and their respective mappings. A bit of glue code is needed between reading the configuration and creating the StoragePair instance. With the builder API for the StoragePair type, I don’t expect any major obstacles in this area.
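As a sketch, that glue roughly amounts to expanding “all collections” into explicit mappings before handing them to the builder. The discovery and builder method names here are assumptions for illustration, not the real vstorage API:

// List the collections available on one side...
let collections = storage_a.list_collections()?;

// ...and turn them into the explicit mappings that StoragePair requires.
let mut builder = StoragePairBuilder::new(storage_a, storage_b);
for collection in &collections {
    // Map each discovered collection to one with the same id on the other side.
    builder = builder.with_collection_mapping(collection.id(), collection.id());
}
let pair: StoragePair = builder.build()?;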
Given that the configuration format is so decoupled from the application logic itself, I expect that trying out different configuration formats in future should not require huge changes.
The CollectionId type
The vstorage crate has a few functions that take a “collection id” as a parameter. A collection id is just a string with a few restrictions (no slashes, can’t be .. or .).
My first instinct was to add a quick validation: a small function that takes the string and returns whether it is valid as a collection id or not. This works, but I would have to remember to call it in every single function that takes a collection id as an input. It also means that any value provided multiple times needs to be validated each time it is received.
After some hesitation and experimenting back and forth, I eventually decided to use a custom type for this:
pub struct CollectionId {
    inner: String,
}

impl FromStr for CollectionId {
    type Err = CollectionIdError;

    fn from_str(s: &str) -> Result<CollectionId, CollectionIdError> {
        // ...
    }
}

impl TryFrom<String> for CollectionId {
    type Error = CollectionIdError;

    fn try_from(value: String) -> std::result::Result<Self, Self::Error> {
        // ...
    }
}

// ... A few other conversion helpers...
Note that the inner field is private, so creating instances of CollectionId is only possible if the data has been properly validated.
To be honest, as I implemented this type and used it everywhere, I kept wondering whether it wasn’t overkill. After completing the changes, it’s clear that it wasn’t.
Now functions that take a collection id no longer take a String parameter; they take a CollectionId parameter. This makes the compiler enforce that the data is always validated before the function call. It also moves the responsibility of validating data to the caller, so it simplifies the returned error types (since they no longer need to account for the “invalid collection id” variants).
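For example, a caller now parses and validates the id exactly once, up front (open_collection below is a hypothetical stand-in for any vstorage function that takes a collection id):

// Hypothetical caller-side usage of the conversions shown above.
fn open_collection(id: &CollectionId) {
    // ...
}

fn sync_one(raw: &str) -> Result<(), CollectionIdError> {
    // Validation happens once, at the boundary where the raw string comes
    // in; an invalid id can never reach open_collection.
    let id: CollectionId = raw.parse()?;
    open_collection(&id);
    Ok(())
}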
This item constitutes a very small development (it took less than a couple of hours to get to its current version), but I’m very happy with the ergonomics that it provides.