
vdirsyncer status update, October 2023

2023-11-01 #open-source #status-update #vdirsyncer

After having an initial version of the configuration parser, I’ve moved on to working on the actual command line for vdirsyncer itself.

Replacing Box<dyn Storage> with Arc<dyn Storage>

This section is pretty technical and requires some understanding of Rust.

In my synchronisation algorithm, SyncPair (the type representing the pair to be synchronised) kept references to the Storage instances, and these shared references had started to become a pain point due to Rust’s lifetime rules.

In particular, my App struct can’t hold both the Storage and SyncPair instances, because one has references to the other (and Rust doesn’t allow one field of a struct to hold references to another field of the same struct).

It became clear that handling shared references was a pain, both for development of vdirsyncer, and likely for other consumers of the vstorage library.

I switched to using Arc<dyn Storage> in most places, and it quickly became clear that my previous approach wasn’t the right one. An Arc allows sharing ownership of data by using atomic reference counting. Cloning an Arc is extremely cheap, and Storage instances are only created once at start-up, so the near-unmeasurable cost pays off instantly.
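As a rough illustration of the change, here is a minimal sketch of how shared ownership removes the self-referential struct problem. The trait, MemoryStorage, the field names and build_app are all hypothetical stand-ins for this example; the real types in vstorage and vdirsyncer have a different, larger interface.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the storage trait (not the real vstorage API).
pub trait Storage: Send + Sync {
    fn name(&self) -> String;
}

/// Hypothetical in-memory storage, used only to make the sketch compile.
pub struct MemoryStorage {
    pub label: String,
}

impl Storage for MemoryStorage {
    fn name(&self) -> String {
        self.label.clone()
    }
}

/// Owns shared handles instead of borrowing, so it needs no lifetime parameter.
pub struct SyncPair {
    pub a: Arc<dyn Storage>,
    pub b: Arc<dyn Storage>,
}

/// Can hold both the storages and the pairs that point at them.
pub struct App {
    pub storages: Vec<Arc<dyn Storage>>,
    pub pairs: Vec<SyncPair>,
}

pub fn build_app() -> App {
    let local: Arc<dyn Storage> = Arc::new(MemoryStorage { label: "local".into() });
    let remote: Arc<dyn Storage> = Arc::new(MemoryStorage { label: "remote".into() });
    let pair = SyncPair {
        // Arc::clone only increments an atomic counter; the storage is not copied.
        a: Arc::clone(&local),
        b: Arc::clone(&remote),
    };
    App {
        storages: vec![local, remote],
        pairs: vec![pair],
    }
}
```

With borrowed references, App would need SyncPair<'a> to borrow from its own storages field, which the borrow checker rejects. With Arc, each SyncPair simply owns another handle to the same storage.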

Testing the command line tool

The initial version of the command line just reads the configuration file and synchronises once. It is extremely minimal and my current goal is to iron out all the little bugs lurking around.

I had already adapted my configuration file to the new format. The next step was to make a full backup of all my calendars and contacts before proceeding.

During my first full sync, the vast majority of my items were in conflict. This is because they existed on both sides with non-semantic differences. The previous implementation handles these fine, but I intended to handle conflicts further down the roadmap. When synchronising from a server into an empty storage, conflicts are not an issue. But they do occur when synchronising a storage with pre-existing data on both sides. This is my case personally, and is also going to be the case for anyone switching over from the previous implementation.

I’ll note that while most of the synchronisation is in place, I’ve left one thing pending: conflict resolution. I’ve been thinking about this as an edge case that I can iron out after the initial alpha version, but that is actually not the case.

The first point of conflict is newlines: while the iCalendar specification mandates that all lines end in \r\n, the majority of the files in my calendar end in \n. Generally, tools are lenient about this type of issue, and this should be no exception. I adapted my code to normalise all items as early as possible.
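A minimal sketch of this kind of normalisation follows. It assumes that items are normalised to \r\n (the form the specification mandates); the function name normalise_newlines is made up for this example and is not taken from the vdirsyncer code.

```rust
/// Rewrites every line ending to CRLF. Accepts LF, CRLF and lone CR as input.
pub fn normalise_newlines(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 16);
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\r' => {
                // Swallow the LF of an existing CRLF so that it is not doubled.
                if chars.peek() == Some(&'\n') {
                    chars.next();
                }
                out.push_str("\r\n");
            }
            '\n' => out.push_str("\r\n"),
            other => out.push(other),
        }
    }
    out
}
```

The function is idempotent, so it is safe to apply at every entry point (reading from disk, fetching from a server) without tracking whether an item was already normalised.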

After the above tweak, the second attempt at synchronising didn’t go well either. All my items exist on both sides (i.e. on my local system and on the remote server). Because they are “new” items (as this is the first synchronisation ever), they are not assumed to be identical, and the result is a conflict: new items with the same UID on both sides.

I made some small changes so as to compare the hash of the items when there is a conflict, and execute a no-op if the hashes match. Thanks to the previously mentioned newline normalisation, the hashes match, and now almost all items are detected as “no change, no action required”.
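The idea can be sketched roughly as below. The Action enum and both function names are hypothetical, and the sketch uses the standard library's DefaultHasher only to stay self-contained; the hash function that vdirsyncer actually uses is not stated here and is likely a different one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical outcome for an item that is new on both sides.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// Both sides hold the same content: nothing to do.
    NoOp,
    /// The contents differ: this needs real conflict resolution.
    Conflict,
}

/// Hashes an item that has already been normalised (e.g. its newlines).
pub fn item_hash(normalised: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    normalised.hash(&mut hasher);
    hasher.finish()
}

/// Decides what to do when the same UID shows up as new on both sides.
pub fn resolve_new_on_both_sides(a: &str, b: &str) -> Action {
    if item_hash(a) == item_hash(b) {
        Action::NoOp
    } else {
        Action::Conflict
    }
}
```

Comparing hashes rather than full contents means the stored state only needs to keep a small fixed-size value per item, at the cost of requiring that normalisation happens before hashing.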

At this point, about ninety percent of my calendars are properly recognised as in sync. However, many items are still failing. The following is a portion of an item that was in conflict:

-(de baño)
+(de baño)

This looks like an encoding issue. While handling different encodings is in scope, I didn’t expect to hit these so early (since both my local system and server use UTF-8).

I wrote some tests trying to replicate this, but failed. Finally, after a lot of debugging (and not the fun type), I narrowed down the issue to a bug in my XML parsing library of choice, roxmltree.

I managed to temporarily work around this by pinning the exact version of the parser that I use. I was previously following the master branch due to other recent fixes that I needed, but the latest stable version includes all the fixes that I need and excludes this regression, so it works perfectly.

At this point, synchronisation works, but some items (10 out of 3298 calendar items) are still in conflict. Some inspection indicates that fields like DTSTAMP are the ones responsible for the conflict. The DTSTAMP field indicates the last time that an item was edited.

I’m already ignoring conflicts in the PRODID field. This field indicates the last program that edited a calendar component, and some servers change it to their own name. The result is that uploading a file and then fetching immediately might yield a different PRODID, so ignoring changes to it is the safest thing to do.

The same needs to be done for DTSTAMP. This is especially true for some WebCal servers, which generate a new DTSTAMP each time that an item is requested.

Indeed, there is a whole list of properties that need to be ignored when comparing whether two events are the same or not. I intended to address these later during development, since I only expected them to become an issue later on.

An iCalendar parser

When I wrote the code to ignore the PRODID, I simply wrote some code that finds it and removes that line, but doesn’t parse the whole iCalendar component. It works, but extending it to other properties would start to become too much of an ugly hack.
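For illustration, a line-based filter of this kind might look like the sketch below, extended to DTSTAMP. This is an assumption about the general shape of such code, not the actual vdirsyncer implementation, and the names IGNORED, strip_ignored and semantically_equal are made up.

```rust
/// Properties whose values are ignored when comparing items (illustrative list).
const IGNORED: [&str; 2] = ["PRODID", "DTSTAMP"];

/// Returns true if the line starts with one of the ignored property names.
fn is_ignored(line: &str) -> bool {
    // The property name ends at the first ':' (value) or ';' (parameters).
    let name = line
        .split(|c: char| c == ':' || c == ';')
        .next()
        .unwrap_or("");
    IGNORED.iter().any(|p| name.eq_ignore_ascii_case(p))
}

/// Drops ignored lines. Note that this does NOT handle folded (continued) lines.
pub fn strip_ignored(item: &str) -> String {
    item.lines()
        .filter(|line| !is_ignored(line))
        .collect::<Vec<_>>()
        .join("\r\n")
}

/// Compares two items while disregarding the ignored properties.
pub fn semantically_equal(a: &str, b: &str) -> bool {
    strip_ignored(a) == strip_ignored(b)
}
```

The weak spot is visible in the comment: a property whose value is folded across several lines would only have its first line removed, which is one reason a filter like this does not scale well to a longer list of properties.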

What needs to be done here is some basic parsing of the iCalendar component, so as to ignore those fields when comparing two items for semantic equality. I already have a low-level iCalendar parser in my roadmap. It seems best to first write the proper parser, and then use that. I could use the same approach that I used for PRODID, but the result would likely be some pretty ugly code, hard to properly test and debug, and would ideally be replaced by the parser in future anyway.
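To give an idea of what “basic parsing” involves, here is a small sketch of the first two steps: unfolding continuation lines and splitting each content line at the end of its property name. It is only an illustration under simplifying assumptions (parameters and values are left unparsed), and ContentLine, unfold, parse_line and parse are hypothetical names, not the parser being written for vdirsyncer.

```rust
/// One unfolded content line, split into its name and everything after it.
#[derive(Debug, PartialEq)]
pub struct ContentLine {
    /// Property name, upper-cased (e.g. "DTSTAMP").
    pub name: String,
    /// Parameters and value, left unparsed (e.g. ";TZID=Europe/Madrid:20231001T100000").
    pub rest: String,
}

/// Joins folded lines: a line starting with a space or tab continues the previous one.
pub fn unfold(input: &str) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    for raw in input.lines() {
        match raw.strip_prefix(' ').or_else(|| raw.strip_prefix('\t')) {
            Some(continuation) if !lines.is_empty() => {
                let last = lines.len() - 1;
                lines[last].push_str(continuation);
            }
            _ => lines.push(raw.to_string()),
        }
    }
    lines
}

/// Splits a single unfolded line at the end of the property name.
pub fn parse_line(line: &str) -> Option<ContentLine> {
    // The name ends at the first ';' (parameters follow) or ':' (value follows).
    let end = line.find(|c: char| c == ':' || c == ';')?;
    if end == 0 {
        return None;
    }
    Some(ContentLine {
        name: line[..end].to_ascii_uppercase(),
        rest: line[end..].to_string(),
    })
}

/// Unfolds the input and parses every line that has a property name.
pub fn parse(input: &str) -> Vec<ContentLine> {
    unfold(input).iter().filter_map(|l| parse_line(l)).collect()
}
```

Once items are available as a list of named content lines, ignoring a property during comparison becomes a filter on name rather than string surgery on raw text, and folded values are handled in one place.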

I am currently focused on this low-level parser. I have shifted the “initial command line” milestone to immediately after the parser is done.

Configuration parsing

I have committed all the work done for the configuration parsing. This also includes all TLS-related settings, including authentication based on client certificates, using a custom CA, and self-signed certificates by validating only the signature.

Some of these scenarios need more testing, and at some point I would really like a second pair of eyes to have a look at the implementation, especially all the security-related code. I believe everything is correct and safe, but a single pair of eyes is never enough for this kind of code.

Upcoming month

This upcoming month I will be travelling to visit extended family in China. I intend to work on and off every other week.

Have comments or want to discuss this topic?
Send an email to my public inbox: ~whynothugo/public-inbox@lists.sr.ht.
Or feel free to reply privately by email: hugo@whynothugo.nl.

— § —