After having an initial version of the configuration parser, I’ve moved on to working on the actual command line for vdirsyncer itself.
Replacing Box<dyn Storage> with Arc<dyn Storage>
This section is pretty technical and requires some understanding of Rust.
In my synchronisation algorithm, the type representing the pair to be
synchronised, SyncPair, kept references to the Storage instances, and these
shared references had started to become a pain point due to Rust’s lifetime
rules.
In particular, my struct App can’t hold on to both the Storage and SyncPair
instances because one has references to the other (and Rust doesn’t allow one
field of a struct to have references to another field of the same struct).
It became clear that handling shared references was a pain, both for
development of vdirsyncer, and likely for other consumers of the vstorage
library.
I switched to using Arc<dyn Storage> in most places, and it was clear that my
previous approach wasn’t the right one. An Arc allows sharing references to
data by using atomic reference counting. Cloning an Arc is extremely cheap, and
Storage instances are only created once at start-up, so the near-unmeasurable
cost pays off instantly.
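As a rough illustration of why this helps, here is a minimal sketch of the ownership layout. The Storage trait shown here is a hypothetical one-method stand-in (the real vstorage trait is different), and MemoryStorage, the field names, and build_app are invented for the example.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the vstorage `Storage` trait; the real trait
// has a different and much larger interface.
trait Storage: Send + Sync {
    fn name(&self) -> &str;
}

// Invented example implementation, only here to have something to share.
struct MemoryStorage {
    name: String,
}

impl Storage for MemoryStorage {
    fn name(&self) -> &str {
        &self.name
    }
}

// The pair owns reference-counted handles instead of borrowing, so it
// carries no lifetime parameter.
struct SyncPair {
    a: Arc<dyn Storage>,
    b: Arc<dyn Storage>,
}

// `App` can hold the storages and the pairs that use them side by side,
// because neither field borrows from the other.
struct App {
    storages: Vec<Arc<dyn Storage>>,
    pairs: Vec<SyncPair>,
}

fn build_app() -> App {
    let local: Arc<dyn Storage> = Arc::new(MemoryStorage { name: "local".into() });
    let remote: Arc<dyn Storage> = Arc::new(MemoryStorage { name: "remote".into() });
    // `Arc::clone` only increments a counter; the storage is not copied.
    let pair = SyncPair {
        a: Arc::clone(&local),
        b: Arc::clone(&remote),
    };
    App {
        storages: vec![local, remote],
        pairs: vec![pair],
    }
}
```

With plain references, the equivalent App would need its pairs field to borrow from its storages field, which is exactly the self-referential layout that the borrow checker rejects.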
Testing the command line tool
The initial version of the command line just reads the configuration file and synchronises once. It is extremely minimal and my current goal is to iron out all the little bugs lurking around.
I had already adapted my configuration file to the new format. The next step was to make a full backup of all my calendars and contacts before proceeding.
During my first full sync, the vast majority of my items were in conflict. This is because they existed on both sides with non-semantic differences. The previous implementation handles these fine, but I intended to handle conflicts further down the roadmap. When synchronising from a server into an empty storage, conflicts are not an issue. But they do occur when synchronising a storage with pre-existing data on both sides. This is my case personally, and is also going to be the case for anyone switching over from the previous implementation.
I’ll note that while most of the synchronisation is in place, I’ve left one thing pending: conflict resolution. I’ve been thinking about this as an edge case that I can iron out after the initial alpha version, but that is actually not the case.
The first source of conflict is newlines: while the iCalendar specification
mandates that all lines end in \r\n, the majority of the files in my calendar
end in \n. Generally, tools are lenient about this type of issue, and this
should be no exception. I adapted my code to normalise all items as early as
possible.
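The normalisation step could look something like the sketch below. It assumes that items are normalised towards \r\n, as the specification mandates; the function name normalise_newlines is hypothetical and this is not the actual vdirsyncer code.

```rust
/// Rewrites every line ending as CRLF. Existing CRLF pairs are left as a
/// single CRLF rather than being doubled.
fn normalise_newlines(input: &str) -> String {
    // Collapse existing CRLF pairs to LF first, then expand every LF.
    input.replace("\r\n", "\n").replace('\n', "\r\n")
}
```

Applying the function twice gives the same result as applying it once, which matters if an item can pass through the normalisation step more than once.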
After the above tweak, the second attempt at synchronising didn’t go well
either. All my items exist on both sides (i.e. on my local system and on the
remote server). Because they are “new” items (as this is the first
synchronisation ever), they are not assumed to be identical, and the
result is a conflict: new items with the same UID on both sides.
I made some small changes so as to compare the hash of the items when there is
a conflict, and execute a no-op if the hashes match. Thanks to the previously
mentioned newline normalisation, the hashes match, and now almost all items are
detected as “no change, no action required”.
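The check can be sketched roughly as follows. The Action enum, content_hash, and resolve_new_on_both are hypothetical names, and the standard library's DefaultHasher is used purely for illustration; the hash function actually used in vdirsyncer is not stated here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical outcome type for an item that is "new" on both sides.
#[derive(Debug, PartialEq)]
enum Action {
    /// Both sides already hold the same content; nothing to do.
    NoOp,
    /// The contents differ and need real conflict resolution.
    Conflict,
}

/// Hashes the (already normalised) item content.
fn content_hash(item: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    item.hash(&mut hasher);
    hasher.finish()
}

/// Decides what to do when the same UID shows up as new on both sides.
fn resolve_new_on_both(a: &str, b: &str) -> Action {
    if content_hash(a) == content_hash(b) {
        Action::NoOp
    } else {
        Action::Conflict
    }
}
```

Comparing hashes rather than full contents is convenient when a hash of each item is already being stored as part of the synchronisation state.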
At this point, about ninety percent of my calendars are recognised as in-sync properly. However, many items are still failing. The following is a portion of an item that was in conflict:
-(de baño)
+(de baño)
This looks like an encoding issue. While handling different encodings is in scope, I didn’t expect to hit these so early (since both my local system and server use UTF-8).
I wrote some tests trying to replicate this, but failed. Finally, after a lot
of debugging (and not the fun type), I narrowed down the issue to this bug in
my XML parsing library of choice, roxmltree.
I managed to temporarily work around this by pinning the exact version of the
parser that I use. I was previously following the master branch due
to other recent fixes that I needed, but the latest stable version includes all
the fixes that I need and excludes this regression, so it works perfectly.
At this point, synchronisation works, but some items (10 out of 3298 calendar
items) are still in conflict. Some inspection indicates that fields like
DTSTAMP are the ones responsible for the conflict. The DTSTAMP field
indicates the last time that an item was edited.
I’m already ignoring conflicts in the PRODID field. This field indicates the
last program that edited a calendar component, and some servers change it to
their own name. The result is that uploading a file and then fetching
immediately might yield a different PRODID, so ignoring changes to it is the
safest thing to do.
The same needs to be done for DTSTAMP. This is especially true for some
WebCal servers, which generate a new DTSTAMP each time that an item is
requested.
Indeed, there is a whole list of properties that need to be ignored when comparing whether two events are the same or not. I intended to address these later during development, since I expected them to be an issue later on.
An iCalendar parser
When I wrote the code to ignore the PRODID, I simply wrote some code that
finds that line and removes it, but doesn’t parse the whole iCalendar
component. It works, but extending it to other properties would start to become
too much of an ugly hack.
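To make the limitation concrete, a line-based filter of this kind might look like the sketch below. IGNORED, is_ignored, and strip_ignored are hypothetical names rather than the actual vdirsyncer code, and the list only contains the two properties named above.

```rust
/// Properties dropped before comparing two items (illustrative list).
const IGNORED: [&str; 2] = ["PRODID", "DTSTAMP"];

/// Returns true if the line starts with one of the ignored property names.
fn is_ignored(line: &str) -> bool {
    IGNORED.iter().any(|name| {
        line.strip_prefix(*name)
            // A property name is followed by `:` (value) or `;` (parameters),
            // so `DTSTAMPED:...` does not match `DTSTAMP`.
            .map_or(false, |rest| rest.starts_with(':') || rest.starts_with(';'))
    })
}

/// Removes ignored property lines and re-joins the rest with CRLF.
fn strip_ignored(item: &str) -> String {
    item.lines()
        .filter(|line| !is_ignored(line))
        .map(|line| format!("{line}\r\n"))
        .collect()
}
```

This sketch does not handle folded (continued) lines and it matches property names case-sensitively. Shortcomings like these pile up with every property added to the list.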
What needs to be done here is some basic parsing of the iCalendar component, so
as to ignore those fields when comparing two items for semantic equality. I
already have a low-level iCalendar parser in my roadmap. It seems best to first
write the proper parser, and then use that. I could use the same approach that
I used for PRODID, but the result would likely be some pretty ugly code that is
hard to properly test and debug, and that would ideally be replaced by the
parser in future anyway.
I am currently focused on this low level parser. I have shifted the “initial command line” milestone to immediately after the parser is done.
Configuration parsing
I have committed all the work done for the configuration parsing. This also includes all TLS-related settings, including authentication based on client certificates, using a custom CA, and self-signed certificates by validating only the signature.
Some of these scenarios need more testing, and at some point I would really like a second pair of eyes having a look at the implementation, especially all the security-related code. I believe everything is correct and safe, but a single pair of eyes is never enough for this kind of code.
Upcoming month
This upcoming month I will be travelling to visit extended family in China. I intend to work on and off every other week.