
How the clipboard works

2022-10-21 #clipboard #desktop #programming

Working out how copy-paste works from the Wayland specification is non-trivial unless you already understand a lot about how desktop computing works and about Wayland internals. It took me quite a while to figure it all out, though once you get there, it seems quite obvious.

Here’s my attempt at explaining how it works for mere mortals.

Terminology

Let me clarify that what we usually call “clipboard” is actually called a “selection”. I’ll use the term “clipboard” here anyway to keep this friendly, but keep in mind that it’s not the actual technical term.

Copying

When you select some text and press ctrl+c, you’d normally say that the program “copies” data into the clipboard. In reality, no copying happens at this point. What the program actually does is announce “I have the clipboard now, and have data of type text/plain”. At this point, the previous application that owned the clipboard (if any) is informed that it no longer owns the clipboard. From this point on, whenever another application gains focus, it’ll be informed that someone owns the clipboard and is offering data of type text/plain. This happens when an application gains focus because only a foreground application can access the clipboard.

However, we don’t just copy text, we can copy anything. Let’s copy a PNG image in Firefox (e.g.: right click, copy image). At this point, Firefox announces “I have the clipboard; I have data of type image/png, text/x-moz-url[1] or text/plain”. This means that the data can be pasted as a PNG image (i.e.: raw image bytes), as a URL (the URL of the image we’ve copied) or as text (again, the URL; this is the fallback for programs that don’t support either of the other two).
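To make the “announcing, not copying” idea concrete, here’s a toy model in Python. It is not the Wayland API: `ToyCompositor`, `App`, `claim_clipboard` and `focus` are names I made up for illustration. The only point is that the compositor remembers who owns the clipboard and which types are on offer, and never holds the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """A hypothetical client; `events` records what the compositor told it."""
    name: str
    events: list = field(default_factory=list)


class ToyCompositor:
    """Tracks the clipboard owner and the offered types. Holds no data."""

    def __init__(self):
        self.owner = None
        self.mime_types = []

    def claim_clipboard(self, app, mime_types):
        # The previous owner (if any) learns that it lost the clipboard.
        if self.owner is not None and self.owner is not app:
            self.owner.events.append("you no longer own the clipboard")
        # Only the owner and the list of types are stored, never the bytes.
        self.owner = app
        self.mime_types = list(mime_types)

    def focus(self, app):
        # An application hears about the current offer when it gains focus.
        if self.owner is not None:
            app.events.append(("offer", tuple(self.mime_types)))


compositor = ToyCompositor()
editor = App("editor")
firefox = App("firefox")
terminal = App("terminal")

compositor.claim_clipboard(editor, ["text/plain"])
compositor.claim_clipboard(firefox, ["image/png", "text/x-moz-url", "text/plain"])
compositor.focus(terminal)

print(editor.events)    # the editor was told it lost the clipboard
print(terminal.events)  # the terminal was told what Firefox offers
```

Notice that no image bytes appear anywhere in this sketch; at this stage there is only a list of type names.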

This is all oversimplified a tiny bit; the copying process is not one step but a few (though the introduction above should help it make sense). In full detail:

Pasting

Copying wasn’t hard; now let’s see how the other side works.

As mentioned above, applications are informed when another application owns the clipboard.

So let’s imagine you’ve copied an image in Firefox, and now switch to a terminal. The compositor will inform the terminal “Somebody owns the clipboard and is offering data of types image/png, text/x-moz-url or text/plain”. If you try to paste, the terminal will ignore the types it doesn’t know how to handle, and will request the data of type text/plain. At this point, the owner of the clipboard (Firefox) is informed that somebody wants to paste its data and receives a file descriptor[2] into which it must write the data. The terminal receives another file descriptor from which it can read the data. Anything written to the first one is read out of the second, so the data is transferred directly between applications with no middleman. It’s basically a pipe; data goes in one end, comes out the other.
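The transfer itself can be sketched with an ordinary pipe. The Python below is a simplified, single-process illustration rather than real Wayland client code: `OFFERED`, `pick_type`, `owner_send` and `receiver_read` are hypothetical names, and the payloads are made up. In a real session the two ends of the pipe live in different processes.

```python
import os

# What the owner (Firefox in the text) can produce; contents are made up.
OFFERED = {
    "image/png": b"\x89PNG fake image bytes",
    "text/x-moz-url": b"https://example.org/cat.png",
    "text/plain": b"https://example.org/cat.png",
}


def pick_type(offered, supported):
    """Return the first offered MIME type the receiver can handle, else None."""
    for mime in offered:
        if mime in supported:
            return mime
    return None


def owner_send(mime, write_fd):
    """Owner side: write the requested representation, then close the fd."""
    with os.fdopen(write_fd, "wb") as out:
        out.write(OFFERED[mime])


def receiver_read(read_fd):
    """Receiver side: read until the owner closes its end (EOF)."""
    with os.fdopen(read_fd, "rb") as src:
        return src.read()


# The terminal only understands plain text, so it skips the other types.
chosen = pick_type(list(OFFERED), {"text/plain"})

# One pipe, two ends: the owner writes into one, the receiver reads the other.
# The payload is tiny, so writing before reading cannot fill the pipe buffer.
read_fd, write_fd = os.pipe()
owner_send(chosen, write_fd)
pasted = receiver_read(read_fd)

print(chosen, pasted.decode())
```

The receiver knows the transfer is complete when it hits end-of-file, which is why the owner has to close its end once it is done writing.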

Some notes on this design

First of all, one has to understand that, under the hood, nothing is ever “copied into the clipboard”. When we click copy, nothing is copied. The “copy” semantic is only a user-interface concept. What really happens is “the application announces that it owns the clipboard; that the user has copied something”.

A big upside of this design is that no data is copied around unnecessarily. For example, an image editor will offer data as image/png, image/bmp, image/jpeg, etc. If the data had to be sent as soon as the “copy” action happens, then the image would have to be encoded into all these formats right away, but likely only one of them would ever be used. One could be copying a 600MB video, only to paste somewhere where a URL will be pasted.
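Here’s a small Python sketch of that laziness. `LazySource` is a hypothetical class and the “encoders” are stand-ins that just prefix some raw bytes; a real image editor would call its actual PNG, BMP and JPEG encoders. What it shows is that offering three types costs nothing until one of them is requested.

```python
class LazySource:
    """Offers several MIME types but encodes only the one that is requested."""

    def __init__(self, encoders):
        self._encoders = encoders  # MIME type -> zero-argument function
        self.encoded = []          # records which encoders actually ran

    def mime_types(self):
        # Listing the offered types does not run any encoder.
        return list(self._encoders)

    def send(self, mime):
        # Encoding happens here, at paste time, for one format only.
        self.encoded.append(mime)
        return self._encoders[mime]()


# Stand-in for the editor's in-memory image.
raw = bytes([255, 0, 0, 0, 255, 0])

source = LazySource({
    "image/png": lambda: b"png:" + raw,
    "image/bmp": lambda: b"bmp:" + raw,
    "image/jpeg": lambda: b"jpeg:" + raw,
})

print(source.mime_types())  # three types on offer, nothing encoded yet
payload = source.send("image/png")
print(source.encoded)       # only the requested format was encoded
```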

This approach yields the greatest flexibility, but also keeps unnecessary work and memory usage to a minimum.

There are a few other technical advantages to this design which are out of scope here (like the compositor not needing to allocate huge amounts of memory for clipboard data).

An issue with this design

A big problem with this design is that if I copy an image (e.g.: from GIMP) and then close that application, the clipboard selection is lost. I can no longer paste it; it’s gone forever.

This is a well-known issue on the Linux desktop. There are a couple of ways around it:


  1. Types starting with x- are non-standard. x-moz- means it’s a Mozilla-defined one. There seems to be no standard MIME type for a URL.

  2. A file descriptor is what you get when you open a file. You can think of it as an object into which you can write or from which you can read. In this case, there’s no real file, but it’s a useful abstraction when two applications need to send data to each other directly.
