Working out how copy-paste works from the Wayland specification is non-trivial unless you already understand a lot about how desktop computing and Wayland internals work. It took me quite a while to figure it all out, though once you get there, it seems quite obvious.
Here’s my attempt at explaining how it works for mere mortals.
Terminology
Let me clarify that what we usually call “clipboard” is actually called a “selection”. I’ll use the term “clipboard” here anyway to keep this friendly, but keep in mind that it’s not the actual technical term.
Copying
When you select some text and press ctrl+c, you’d normally say that the program “copies” data into the clipboard. In reality, no copying happens at this point. What the program actually does is announce “I have the clipboard now, and have data of type text/plain”. At this point, the previous application that owned the clipboard (if any) is informed that they no longer own the clipboard.
From this point on, whenever another application gains focus, it’ll be informed that someone owns the clipboard and is offering data of type text/plain. This happens when an application gains focus because only a foreground application can access the clipboard.
However, we don’t just copy text, we can copy anything. Let’s copy a png image in Firefox (e.g.: right click, copy image). At this point, Firefox announces “I have the clipboard; I have data of type image/png, text/x-moz-url[1] or text/plain”. This means that the data can be pasted as a png image (i.e.: raw image bytes), as a URL (the URL of the image we’ve copied) or as text (again, the URL; this is the fallback for programs that support neither of the other two).
This is all oversimplified a tiny bit: copying is not one step but a few (the introduction above should help them make sense). In full detail:
- The application creates a wl_data_source object, indicating that it’s going to offer data to other applications.
- The application adds the mime types that it can handle to the data source (with wl_data_source::offer).
- The application finally calls wl_data_device::set_selection, to indicate “I’m taking ownership of the clipboard, and the wl_data_source created above is what I’m offering”.
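To make the three steps a bit more concrete, here is a toy model in Python. It is only a sketch: it runs in a single process and does not talk to a compositor, and the class names (DataSource, Compositor) and their methods are hypothetical stand-ins that mirror the protocol loosely. The one behaviour borrowed from the real protocol is that the previous owner is told its source has been replaced (Wayland calls this the cancelled event).

```python
# Toy in-process model of the copy handshake. This is NOT the Wayland API;
# DataSource and Compositor are hypothetical names used for illustration.


class DataSource:
    """Stands in for a wl_data_source: a list of offered mime types."""

    def __init__(self, owner):
        self.owner = owner
        self.mime_types = []
        self.cancelled = False

    def offer(self, mime_type):
        # Step 2: advertise one mime type the owner can provide on request.
        self.mime_types.append(mime_type)


class Compositor:
    """Tracks which data source currently owns the selection."""

    def __init__(self):
        self.selection = None

    def set_selection(self, source):
        # Step 3: the new source takes over, and the previous owner is
        # told that its source is no longer the selection.
        if self.selection is not None:
            self.selection.cancelled = True
        self.selection = source

    def offered_types(self):
        # What a newly focused application is told about the clipboard.
        if self.selection is None:
            return []
        return list(self.selection.mime_types)


compositor = Compositor()

# An editor copies some text.
text_source = DataSource(owner="editor")  # step 1
text_source.offer("text/plain")           # step 2
compositor.set_selection(text_source)     # step 3

# Firefox copies an image and replaces the editor as owner.
image_source = DataSource(owner="firefox")
for mime_type in ("image/png", "text/x-moz-url", "text/plain"):
    image_source.offer(mime_type)
compositor.set_selection(image_source)

print(text_source.cancelled)
print(compositor.offered_types())
```

Note that no image bytes appear anywhere in this sketch: taking ownership only involves a list of type names.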
Pasting
Copying wasn’t hard, now let’s see how the other side works.
As mentioned above, applications are informed when another application owns the clipboard.
So let’s imagine you’ve copied an image in Firefox, and now switch to a terminal. The compositor will inform the terminal “Somebody owns the clipboard and is offering data of types image/png, text/x-moz-url or text/plain”.
If you try to paste, the terminal will ignore the types it doesn’t know how to handle, and will request the data of type text/plain. At this point, the owner of the clipboard (Firefox) is informed that somebody wants to paste its data and receives a file descriptor[2] into which it must write the data. The terminal receives another file descriptor from which it can read the data. Anything written to the first one is read out of the second, so the data is transferred directly between applications with no middleman. It’s basically a pipe; data goes in one end, comes out the other.
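The pipe part is easy to try out without Wayland at all. The sketch below is a simplification under a few assumptions: both "applications" are just two threads in one Python process, choose_type is a hypothetical helper for picking a format, and the payload is a made-up URL. In the real protocol the descriptors travel through the compositor (wl_data_offer::receive on the pasting side, the wl_data_source send event on the owner side).

```python
import os
import threading


def choose_type(offered, supported):
    """Return the first type the pasting side supports that is also offered."""
    for mime_type in supported:
        if mime_type in offered:
            return mime_type
    return None


def write_selection(fd, payload):
    # Owner side: write the requested representation, then close the
    # descriptor so the reader sees end-of-file.
    with os.fdopen(fd, "wb") as pipe_in:
        pipe_in.write(payload)


def read_selection(fd):
    # Pasting side: read until the writer closes its end.
    with os.fdopen(fd, "rb") as pipe_out:
        return pipe_out.read()


offered = ["image/png", "text/x-moz-url", "text/plain"]

# A terminal only understands plain text, so it ignores everything else.
chosen = choose_type(offered, supported=["text/plain"])

# Hypothetical payload the owner would produce for the chosen type.
payloads = {"text/plain": b"https://example.com/cat.png"}

read_fd, write_fd = os.pipe()
writer = threading.Thread(
    target=write_selection, args=(write_fd, payloads[chosen])
)
writer.start()
pasted = read_selection(read_fd)
writer.join()

print(chosen, pasted)
```

The writer runs in its own thread because a pipe has a limited buffer: for a large payload, the writing side would block until the reading side starts draining it.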
Some notes on this design
First of all, one has to understand that, under the hood, nothing is ever “copied into the clipboard”. When we click copy, nothing is copied. The “copy” semantic is only a user-interface concept. What really happens is “the application announces that it owns the clipboard; that the user has copied something”.
A big upside of this design is that no data is copied around unnecessarily. For example, an image editor will offer data as image/png, image/bmp, image/jpeg, etc. If the data had to be sent as soon as the “copy” action happens, then the image would have to be encoded into all these formats right away, but likely only one of these would ever be used. One could be copying a 600MB video, only to paste it somewhere where just a URL will be pasted.
This approach yields the greatest flexibility while keeping unnecessary work and memory usage to a minimum.
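The laziness is the whole point, so here is a small sketch of it. Everything in it is an assumption made for illustration: LazySource is a hypothetical class, and the "encoders" just prepend a label to some bytes instead of doing any real image encoding.

```python
class LazySource:
    """Offers several formats but encodes only the one that is requested."""

    def __init__(self, encoders):
        self.encoders = encoders  # mime type -> function returning bytes
        self.encode_calls = []    # record of which encoders actually ran

    def mime_types(self):
        # Announcing the formats costs nothing: no encoder is called here.
        return list(self.encoders)

    def send(self, mime_type):
        # Encoding happens only now, when somebody actually pastes.
        self.encode_calls.append(mime_type)
        return self.encoders[mime_type]()


pixels = bytes(range(16))  # stand-in for raw image data

# Placeholder encoders: they only tag the bytes, they are not real codecs.
source = LazySource({
    "image/png": lambda: b"PNG:" + pixels,
    "image/bmp": lambda: b"BMP:" + pixels,
    "image/jpeg": lambda: b"JPEG:" + pixels,
})

announced = source.mime_types()      # "copy": three formats are announced
calls_after_copy = list(source.encode_calls)

data = source.send("image/png")      # "paste": one format is encoded

print(announced, calls_after_copy, source.encode_calls)
```

After the paste, only the image/png encoder has run; the other two formats were offered but never produced.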
There’s a few other technical advantages to this design which are out-of-scope here (like the compositor not needing to allocate huge amounts of memory for clipboard data).
An issue with this design
A big problem with this design is that if I copy an image (e.g.: from GIMP) and then close that application, the clipboard selection is lost. I can no longer paste it; it’s gone forever.
This is a well-known issue on the Linux desktop. There are a couple of ways around it:
- When something is copied and all of a program’s windows are closed, the program can linger in the background, windowless, until it loses the clipboard. This might be really hard to implement for some applications due to how they’re designed, and needs to be implemented by every single one. It works for tools which focus on copying data though (this is what I do with shotman).
- A clipboard manager. Clipboard managers use a privileged API to always be notified when any application takes ownership of the clipboard; when this happens, they can copy all the data, and take ownership of the clipboard themselves. The majority of the implementations out there are broken and only handle text. Many dump the data to disk, which makes them a bit risky if you ever copy sensitive data into your clipboard. I’ve written a clipboard manager that works (clipmon), but it has one big problem: if you copy an image that is offered in many different formats, it has to copy all those formats. This adds a lot of unnecessary work (and memory usage) for many scenarios.
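The cost of the clipboard manager approach can be shown with another toy model. This is a sketch under stated assumptions and not how clipmon or any other real manager is written: Source and snapshot are hypothetical names, and the encoders are placeholders that only tag some bytes.

```python
class Source:
    """An application's offer: formats are produced on request and counted."""

    def __init__(self, encoders):
        self.encoders = encoders  # mime type -> function returning bytes
        self.encode_calls = 0

    def mime_types(self):
        return list(self.encoders)

    def send(self, mime_type):
        self.encode_calls += 1
        return self.encoders[mime_type]()


def snapshot(source):
    """Request every offered format up front, as a clipboard manager must."""
    return {
        mime_type: source.send(mime_type)
        for mime_type in source.mime_types()
    }


image = bytes(1000)  # stand-in for raw image data

# Placeholder encoders: they only tag the bytes, they are not real codecs.
app_source = Source({
    "image/png": lambda: b"PNG:" + image,
    "image/bmp": lambda: b"BMP:" + image,
    "image/jpeg": lambda: b"JPEG:" + image,
})

# The manager cannot know which format will be pasted later, so it asks
# for all of them and keeps every copy in memory.
stored = snapshot(app_source)

# The manager then re-offers the stored bytes under its own ownership, so
# the data survives the original application exiting.
manager_source = Source({
    mime_type: (lambda data=data: data)
    for mime_type, data in stored.items()
})

print(app_source.encode_calls, sum(len(data) for data in stored.values()))
```

Every offered format is encoded once and held in memory, even though a later paste will most likely ask for just one of them.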
[1] Types starting with x- are non-standard. x-moz- means it’s a Mozilla-defined one. There seems to be no standard mime type for a URL.
[2] A file descriptor is what you get when you open a file. You can think of it as an object into which you can write or from which you can read. In this case, there’s no real file, but it’s a useful abstraction when two applications need to send data to each other directly.