Writing an emoji input method, part 1

Today I started work on emoji-im, an input method to type emoji. The general idea is pretty simple: when the input method is active, I type a word, and the input method pop-up suggests emoji based on the word typed. Pressing Space inserts that emoji.

Because this is implemented as an input method, it works in any application where typing regular text is possible.

I knew this was not something that I’d be able to complete in a day, but I at least wanted to get the basic scaffold and general structure set up on this first day. Spoiler: I think this went quite well.

I decided to use Hare for this. It’s a simple language, quite low level, with a solid approach to error management. Compilation is fast, and I only need two dependencies which compile in milliseconds. I used hare-wayland in the past for wlhc, and found myself quite comfortable developing with it.

I started my project today by generating client code for the Wayland protocols that I expect to need. I copied some bits from wlhc for this, and came to this initial Makefile:

DESTDIR=
PREFIX=/usr/local

.PHONY: build install
build:
	mkdir -p wayland/xdg wayland/wlr wayland/wp wayland/zwp

	hare-wlscanner \
		< $(shell pkg-config --variable=pkgdatadir wayland-protocols)/stable/xdg-shell/xdg-shell.xml \
		> wayland/xdg/shell.ha
	hare-wlscanner \
		< $(shell pkg-config --variable=pkgdatadir wayland-protocols)/staging/single-pixel-buffer/single-pixel-buffer-v1.xml \
		> wayland/wp/single_pixel_buffer.ha
	hare-wlscanner \
		< $(shell pkg-config --variable=pkgdatadir wayland-protocols)/staging/fractional-scale/fractional-scale-v1.xml \
		> wayland/wp/fractional_scale.ha
	hare-wlscanner \
		< $(shell pkg-config --variable=pkgdatadir wlr-protocols)/unstable/wlr-layer-shell-unstable-v1.xml \
		> wayland/wlr/layer_shell.ha
	hare-wlscanner \
		< input-method-unstable-v2.xml \
		> wayland/zwp/input_method.ha

	hare build -o emoji-im

install: build
	install -m755 emoji-im $(DESTDIR)$(PREFIX)/bin/emoji-im

This generates code to unmarshal events from, and send requests to, a Wayland server. The generated files are not intended to be edited manually, and the functions that they provide are a direct representation of the underlying protocol.

My code is in separate files, but it’s all compiled together as one big project.

I find this approach quite convenient. I can actually inspect the generated code to double check the names of functions and the types of their parameters. The library itself handles all the wire protocol serialisation and deserialisation.

Coming from Python, I’m used to libraries that generate abstractions to call some remote method at runtime. In these situations, even though I’d be using a proxy object that wraps some underlying IPC, there’s no source code for the client objects; it’s all generated at runtime. So double checking types, or figuring out the little details, was not as straightforward as checking the code for the function I was calling: there was no code!

In this sense, I find that Rust proc_macros tend to behave too much like Python. They generate code at compile time, and feed it straight to the compiler. It’s not as straightforward to double check details in this code.
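
To make the Python side of that comparison concrete, here is a minimal sketch of a runtime-generated proxy. The RemoteProxy class and the set_title method are hypothetical names made up for illustration; they don't come from any particular library.

```python
import inspect


class RemoteProxy:
    """Hypothetical IPC proxy: methods are synthesised on attribute access."""

    def __init__(self, send):
        # `send` stands in for whatever would perform the real IPC call.
        self._send = send

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for every remote method.
        def call(*args):
            return self._send(name, args)
        return call


# A fake transport that just echoes the request back.
proxy = RemoteProxy(lambda name, args: (name, args))

# This works, even though no `set_title` is defined anywhere.
result = proxy.set_title("hello")

# Introspection has nothing method-specific to report.
defined_in_class = "set_title" in vars(RemoteProxy)
signature = str(inspect.signature(proxy.set_title))
```

Here defined_in_class is False and signature is just "(*args)": there are no parameter names or types to check. A generated .ha file, by contrast, is an ordinary source file that can be opened and read.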

I started out writing code to bind the Wayland global objects that I’d need. I then wrote code to register a new input method. Each test run was a simple:

make && WAYLAND_DEBUG=1 ./emoji-im

Most of this was some straightforward scaffold that I was pleased to have finished quickly.

The next step was to intercept keyboard input and determine emoji suggestions based on that. Because my program would be intercepting all keyboard input, I wouldn’t be able to press Ctrl+C to kill it. In order to ensure that I didn’t lock myself out of my own session, I ran:

watch -n 10 pkill emoji-im

This kills the emoji keyboard every ten seconds, so if I ended up in a state where it was eating up all keystrokes, this would save the day. Occasionally it got killed at the wrong time. I could have just run my tests in a nested compositor instead.

The Wayland protocol for input methods is pretty well designed, and integrating with it was quite uneventful. Until I had to process keystrokes. Keymaps and key press events are transmitted using the Xorg keymap format (XKB). These are quite non-trivial to parse. Instead of actually parsing them properly, I hard-coded the key codes for all letters, Space and Tab on my own keyboard.
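
For illustration, this is roughly what such a hard-coded table looks like, sketched in Python rather than Hare. It assumes a US QWERTY layout and that key events carry Linux evdev key codes (the values defined in linux/input-event-codes.h); the table in emoji-im itself may well look different.

```python
# First evdev key code of each letter row on a US QWERTY keyboard.
# Codes within a row are consecutive (assumption: QWERTY layout).
ROWS = {
    16: "qwertyuiop",  # KEY_Q .. KEY_P
    30: "asdfghjkl",   # KEY_A .. KEY_L
    44: "zxcvbnm",     # KEY_Z .. KEY_M
}
KEY_TAB = 15
KEY_SPACE = 57

# Flatten the rows into a single key code -> character table.
KEYCODE_TO_CHAR = {
    first + offset: char
    for first, row in ROWS.items()
    for offset, char in enumerate(row)
}


def lookup(keycode):
    """Return the letter for a key code, or None if it is not a letter."""
    return KEYCODE_TO_CHAR.get(keycode)
```

A table like this ignores the keymap entirely, so it only works for one layout, and it knows nothing about modifiers. That is exactly the part that a proper XKB implementation takes care of.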

Some time later I was pointed at hare-xkb, which I will have to integrate eventually. I preferred to continue the wave of implementing new bits rather than sinking time into polishing something that can be left for a final stage. Adding tiny improvements helps keep motivation up; perfecting inconsequential details before everything else works does not.

Being able to recognise keystrokes, I started intercepting them and building a string with the characters. This is literally just appending a character to a string. This is the preedit string, which renders in text input areas only as a placeholder. Depending on this string, I intend to show emoji suggestions. When Space is pressed, I intend to insert the selected emoji. At this early stage, if the input string is smile, I’ll insert 🙂. At this point, I had to leave to deal with some personal stuff.
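
The logic described above is small enough to sketch in a few lines. This is a Python illustration, not the Hare code from emoji-im: the Preedit class, its feed method and the EMOJI table are hypothetical names, and keys are assumed to arrive already decoded into characters.

```python
# Hypothetical lookup table; "smile" is the only word handled at this stage.
EMOJI = {"smile": "🙂"}


class Preedit:
    """Accumulates typed characters until Space is pressed."""

    def __init__(self):
        self.text = ""

    def feed(self, char):
        """Process one decoded key; return the text to commit, or None."""
        if char != " ":
            # Any other key just extends the preedit string.
            self.text += char
            return None
        # Space: look up the word and reset the preedit string.
        word, self.text = self.text, ""
        return EMOJI.get(word)


preedit = Preedit()
committed = [preedit.feed(char) for char in "smile "]
```

After this runs, the last element of committed is "🙂" and every earlier element is None. In the real input method, each call would also have to send the updated preedit string (or the committed emoji) to the compositor.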

That’s basically it for day 1. This was a pretty fun project, and there’s plenty of work left to do. At this point, I have an emoji input method which, when active, lets me insert the smile emoji.