
vdirsyncer status update, June 2023

2023-06-30 #open-source #status-update #vdirsyncer

This month started out rather slow, with me taking a week off on vacation and then coming back home only to be sick in bed for another week. But it ended up being rather productive after all.

Configuration parsing

My initial goal since last month was to implement a small binary that can work as a drop-in replacement for vdirsyncer. With some basic sync scenarios working [1], the next step was to parse the configuration file.

The configuration format has a few quirks. First, the names of storages and pairs are part of the section titles, which means section titles aren’t a fixed set. Second, property values are treated as JSON, so I need a second parser for those.

I’m using rust-ini to parse the configuration file itself (it’s been a perfect fit so far) and the famous serde_json to parse the property values.
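To make the two quirks concrete, here is a minimal sketch using only the standard library (so not the rust-ini and serde_json combination mentioned above). The names `Section`, `parse_section_title` and `split_property` are made up for illustration and are not taken from the actual code. It assumes section titles of the form `general`, `storage NAME` and `pair NAME`, and it leaves the property value as raw JSON text for a real JSON parser to decode.

```rust
/// The kinds of section that can appear in the configuration file.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Section {
    General,
    Storage(String),
    Pair(String),
}

/// Parse a section title (the text between the square brackets),
/// such as `storage my_calendar`, into a `Section`.
fn parse_section_title(title: &str) -> Result<Section, String> {
    let mut parts = title.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some("general"), None, None) => Ok(Section::General),
        (Some("storage"), Some(name), None) => Ok(Section::Storage(name.to_string())),
        (Some("pair"), Some(name), None) => Ok(Section::Pair(name.to_string())),
        _ => Err(format!("unrecognised section title: {title:?}")),
    }
}

/// Split a `key = value` line into the key and the still-encoded JSON value.
/// Decoding the value is left to a JSON parser.
fn split_property(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

With this, `parse_section_title("storage my_calendar")` yields a `Section::Storage` carrying the name, and `split_property` on a line like `collections = ["a", "b"]` returns the key plus the JSON array as text.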

I’ve gotten to a point where most of my real configuration file parses fine, and creates a Config object. I say “most” because one key detail is missing: the CardDav storage.

So I’ve temporarily paused development of this binary until I catch up with the CardDav implementation.

CardDav

So I set out to advance on the CardDav (i.e.: address books) implementation. Everything so far has mostly focused on CalDav (i.e.: calendars), while always leaving the API definitions in place to make sure that CardDav would fit.

Implementing CardDav is a bit repetitive. It’s very similar to CalDav but with little differences all over the place. I’m often tempted to try and abstract the common parts, but it’s often far harder than just having two very similar looking functions.

However, on the XML parsing side of things (reminder: both CalDav and CardDav are XML-based), it did bother me how little could be reused from the CalDav implementation. In fact, XML parsing was already a huge mess in itself. Parsing worked (and pretty well, I might say), but the code was hard to understand, its parts were hard to reuse, and parsing any new structure (which was required for CardDav) was quite non-trivial.

I ended up rewriting all the XML parsing parts of libdav. And then again.

The first rewrite improved code re-usability, but I’m not sure if it was really easier to understand. This approach used the low-level parser from quick-xml. Using the low-level parser was nice, because I could extract only the necessary bits from the XML data that I received, and discard all the noise. However, the price being paid for this (in terms of complexity) wasn’t worth it. Theoretically, performance could be stupendous with this approach, but that would require investing a lot more time into this than what is realistically ever going to happen.

After I was done, I was not happy with the result. It was a lot more code, and a lot more complexity, and I had the impression that some bits did almost the same work, but it was really hard to consolidate them into one thing.

I took a while to sit back and consider my options. It seemed like parsing the whole XML structure into an actual in-memory tree (with pointers to the original data, no copying!) would be a superb approach. Converting data from this tree into the final data type should be pretty simple. I could definitely do this, but that’s not what I signed up for: I’m here to write a calendar (and address book) synchronisation tool, not an XML tree generator.

Obviously, such a thing already existed. Indeed roxmltree seemed to fit the bill perfectly and it also seems to be the best option around in terms of performance. Great!

I set out to do this whole rewrite again, which took an entire day. And by “entire day”, I mean I started at 09:00 and finished at around 23:00, just in time for bed (don’t worry, I did eat and stretch).

I’m pretty happy with the result. It’s pretty easy to extract data from an XML tree, and instead of a huge mess of intertwined Parser and Node implementations, I just have a few parse_ functions (basically, one for each query type). These functions are smaller in scope and even have a few nice individual tests each, yay.

Something that’s left for later is parsing non-UTF-8 data. So far, all the test servers use UTF-8, but I’m sure that some exception exists out there.

CardDav progress

With the XML parsing bits much improved, I’ve turned my attention back to CardDav itself. It all looks to be in pretty good shape, and I’m currently writing tests that exercise the functionality against live servers (currently: xandikos, radicale, baikal, nextcloud and cyrus-imap). These are showing good results so far, so I expect to close this phase pretty soon.

Some notes on configuration parsing

My intent is for the configuration format to remain unchanged, but there are some small subtleties that will have to change. I intend to clearly document these, and have a clear migration guide in place before retiring the previous tool.

Notes on synchronising email

I’ve synced up with @soywood, who is working on pimalaya (open source tools and libraries to manage emails), about synchronising IMAP using the vstorage library and reusing my synchronisation algorithm for emails.

A vstorage is generic on its item type and doesn’t really care what type of data it’s handling or synchronising. It only cares that the data is a bunch of bytes and that it has an API to extract a unique ID for each item.
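As a rough illustration of what “generic on its item type” can look like, here is a minimal sketch. The `Item` trait, the `IcsItem` type and the `uids` helper are hypothetical names invented for this example and are not the vstorage API. The UID extraction is deliberately naive (it ignores line folding and property parameters).

```rust
/// Hypothetical trait: the only things generic code needs from an item.
trait Item {
    /// The raw bytes that get stored and synchronised.
    fn as_bytes(&self) -> &[u8];
    /// A unique ID used to match the same item across two storages.
    fn uid(&self) -> Option<String>;
}

/// A calendar item kept as its raw iCalendar text (illustrative only).
struct IcsItem {
    raw: String,
}

impl Item for IcsItem {
    fn as_bytes(&self) -> &[u8] {
        self.raw.as_bytes()
    }

    fn uid(&self) -> Option<String> {
        // Naive: take the first line that starts with `UID:`.
        self.raw
            .lines()
            .find_map(|line| line.strip_prefix("UID:"))
            .map(|uid| uid.trim().to_string())
    }
}

/// Generic code relies only on the trait, never on the concrete item type.
fn uids<I: Item>(items: &[I]) -> Vec<String> {
    items.iter().filter_map(|item| item.uid()).collect()
}
```

An address book or email item type would implement the same trait with its own way of finding an ID, and `uids` would work on it unchanged.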

In the case of IMAP, however, something else is relevant: flags. Flags are per-message and might change while the message does not (flags are things like “seen”, “starred”, etc). Flags are not part of the message blob itself. And re-synchronising emails just because flags change can be expensive: imagine having to re-download 10 emails with 5MiB attachments just because they’re now marked as “read”.

vstorage does have a concept for flags: properties (previously called metadata). However, these are only applicable to collections, and not items. IMAP flags apply only to items and not to collections.

It’s not clear to me how to make this work nicely. One potential option is to have a new concept of ItemProperties and shove flags in there, while defining ItemProperties as “nothing” for CalDav and CardDav. This doesn’t feel quite right, so I intend to keep thinking on this before making any decision.
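One way to picture the ItemProperties idea is as an associated type on the storage, set to the unit type `()` where there is nothing to track. The sketch below is purely hypothetical: `Storage`, `ImapFlags`, `ItemState`, `Change` and `detect_change` are names invented here, and the `content_hash` field is an assumption about how unchanged content might be recognised. It is not a design decision, just an illustration of the option described above.

```rust
/// Hypothetical storage trait with per-item properties as an associated type.
trait Storage {
    type ItemProperties: Default + PartialEq;
}

struct CalDavStorage;
struct ImapStorage;

/// A couple of IMAP-style flags (illustrative subset).
#[derive(Debug, Default, PartialEq)]
struct ImapFlags {
    seen: bool,
    flagged: bool,
}

impl Storage for CalDavStorage {
    // "Nothing": calendar items carry no per-item properties.
    type ItemProperties = ();
}

impl Storage for ImapStorage {
    type ItemProperties = ImapFlags;
}

/// What is known about one item: a hash of its content plus its properties.
struct ItemState<P> {
    content_hash: u64,
    properties: P,
}

/// The item state type for a given storage.
type StateFor<S> = ItemState<<S as Storage>::ItemProperties>;

#[derive(Debug, PartialEq)]
enum Change {
    None,
    PropertiesOnly,
    Content,
}

/// Tell a properties-only change apart from a content change.
fn detect_change<P: PartialEq>(old: &ItemState<P>, new: &ItemState<P>) -> Change {
    if old.content_hash != new.content_hash {
        Change::Content
    } else if old.properties != new.properties {
        Change::PropertiesOnly
    } else {
        Change::None
    }
}
```

In this sketch, a message that only went from unread to read would come out as `Change::PropertiesOnly`, which is the case where re-downloading the message body could be skipped. For CalDav and CardDav the properties are always `()`, so that branch can never be taken.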

My main thought on the topic is: I intend to implement synchronising via JMAP one day. JMAP can handle calendars, address books and emails, so it just seems logical to synchronise emails at some point too. We’ll see how that goes.


  1. Basically, CalDav with HTTPS and Basic Auth works. None of the custom TLS settings are implemented yet, nor are things like Digest Auth or OAuth.

Have comments or want to discuss this topic?
Send an email to ~whynothugo/public-inbox@lists.sr.ht (mailing list etiquette)
