Introduction
We have been working to introduce a full multitouch and gesture solution for Ubuntu, and we’ve targeted Ubuntu 10.10 (Maverick) Netbook Edition as an initial test and integration milestone for our efforts. As we near the point of showing off our work, I would like to give an overview of the technical approaches we have taken for 10.10 and highlight our future architectural directions.
When we started a few months ago, multitouch support was really only integrated and enabled in the Linux kernel. There are a few families of devices with multitouch support, and we are targeting N-Trig touchscreens specifically. We also plan to support the Apple Magic Mouse and touchpads utilizing the BCM5974 chip, which can be found in Apple and other mainstream laptops. There are a few other multitouch devices that have drivers in Ubuntu 10.10, but we have not been able to fully test them.
Multitouch Slots
The first bit of good news was the addition of the multitouch slots protocol between the kernel and userspace. Henrik Rydberg, a member of the Canonical Multitouch team, wrote the slots protocol that is now available in Linux 2.6.36. Further, he has created a library named mtdev to convert legacy kernel drivers to the slots protocol in userspace. What’s so great about this slots protocol? It provides touch tracking, which ensures that applications don’t become confused when two or more fingers are on the device at once. With mtdev, the touch events representing your index finger and your thumb will be kept separate so an application can easily keep track of them.
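To make this concrete, here is a minimal sketch of reading slot-tagged events through mtdev, following the library’s open/get/idle/close interface. The device path is a placeholder for whatever event node your touch device exposes, and error handling is pared down for brevity.

```c
/* Minimal sketch: reading slot-based multitouch events via mtdev.
 * Build (roughly): gcc mt_read.c -lmtdev
 * /dev/input/event5 is a placeholder; substitute your touch device. */
#include <fcntl.h>
#include <stdio.h>
#include <linux/input.h>
#include <mtdev.h>

int main(void)
{
    struct mtdev dev;
    struct input_event ev;
    int fd = open("/dev/input/event5", O_RDONLY | O_NONBLOCK);
    if (fd < 0 || mtdev_open(&dev, fd) < 0)
        return 1;

    /* mtdev converts legacy multitouch event streams into the slots
     * protocol, so every event below is already slot-tagged. */
    while (!mtdev_idle(&dev, fd, 5000)) {      /* wait up to 5 s for events */
        while (mtdev_get(&dev, fd, &ev, 1) > 0) {
            if (ev.type != EV_ABS)
                continue;
            switch (ev.code) {
            case ABS_MT_SLOT:        /* which touch the following events describe */
                printf("slot %d\n", ev.value);
                break;
            case ABS_MT_TRACKING_ID: /* a value of -1 means the touch ended */
                printf("  tracking id %d\n", ev.value);
                break;
            case ABS_MT_POSITION_X:
                printf("  x %d\n", ev.value);
                break;
            case ABS_MT_POSITION_Y:
                printf("  y %d\n", ev.value);
                break;
            }
        }
    }
    mtdev_close(&dev);
    return 0;
}
```

Because each touch lives in its own slot with a stable tracking id, an application can follow the index finger and the thumb independently without any guesswork of its own.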
Multitouch Application Support
Now that we have tracked touches, we need some way to send them to applications. Applications built for Linux generally run through the X Window System, which provides windows and handles mouse and keyboard input, among many other features. The most logical way to send multitouch events to applications is through the X server: a touch on top of one application’s window will be sent only to that application, while applications with windows underneath should not be aware of the touch. The X server already handles this input propagation for mice, so it just needs to be extended for multitouch devices.
Unfortunately, there are a lot of hairy issues with X input that aren’t readily apparent to users. For example, if I hold the mouse button down over one window and drag outside of the window, the window still receives all the mouse events even though the cursor may physically be located above a different window. This functionality is implemented through a concept called “grabbing”. Without getting into the details here, supporting multitouch grabbing means we have to develop an extension to the current X server input architecture. Work is underway on this, as Peter Hutterer has proposed an XInput protocol extension for multitouch. Although we have a proposed protocol, the implementation will not be ready in time for Ubuntu 10.10. We are looking forward to our continued work with Peter and other X developers to help develop the implementation and provide it in a future Ubuntu release.
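To make the grab concept concrete, here is a small Xlib sketch. The drag behavior described above is an implicit grab that X performs automatically on a button press; the sketch uses an explicit XGrabPointer call to demonstrate the same delivery rule. It is illustrative only, and generalizing this behavior to whole sets of touches is exactly the part that does not exist yet.

```c
/* Illustrative sketch of a pointer grab in plain Xlib.
 * Build (roughly): gcc grab_demo.c -lX11
 * While the grab is held, all pointer events are delivered to `win`,
 * no matter which window the cursor is physically over. */
#include <X11/Xlib.h>
#include <stdio.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy)
        return 1;

    /* A trivial window standing in for a real application window. */
    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                     0, 0, 200, 200, 0, 0, 0xffffff);
    XSelectInput(dpy, win, StructureNotifyMask | ButtonPressMask |
                           ButtonReleaseMask | PointerMotionMask);
    XMapWindow(dpy, win);

    /* Wait until the window is actually mapped before grabbing. */
    XEvent ev;
    do {
        XNextEvent(dpy, &ev);
    } while (ev.type != MapNotify);

    /* Grab the pointer: until XUngrabPointer, motion and button events
     * go to `win` even when the cursor leaves it. */
    XGrabPointer(dpy, win, False,
                 ButtonPressMask | ButtonReleaseMask | PointerMotionMask,
                 GrabModeAsync, GrabModeAsync, None, None, CurrentTime);

    for (;;) {
        XNextEvent(dpy, &ev);
        if (ev.type == MotionNotify)
            printf("motion at %d,%d (still delivered to our window)\n",
                   ev.xmotion.x_root, ev.xmotion.y_root);
        if (ev.type == ButtonRelease)
            break;
    }
    XUngrabPointer(dpy, CurrentTime);
    XCloseDisplay(dpy);
    return 0;
}
```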
Gesture Support
So what multitouch features will be ready for Ubuntu 10.10? Without the ability to send multitouch events directly to applications, we can still send some data. We can listen to the multitouch events that are sent from the kernel to the X server and make decisions about whether they are useful and what they mean in a given context. We can group touches into a set of predefined meanings: gestures. For example, two touches that are moving towards each other can represent a pinch-to-zoom gesture. To facilitate the recognition of gestures, we have created a library named grail (Gesture Recognition and Instantiation Library). Grail takes tracked touches from mtdev or the kernel and attempts to recognize gestures from a predefined list:
* Swipe (moving fingers in a uniform direction)
* Pinch (moving fingers closer or farther apart)
* Rotate (moving fingers around each other)
* Tap (briefly touching and releasing without movement)
Further, grail can recognize each of the above gestures with one to five fingers. Lastly, grail uses a callback to ask whether any clients want a recognized gesture. If a client requests the gesture, the gesture event is passed to that client. If no client requested it, the touches are translated into single-touch pointer motion so the device still works for general mouse input; a sketch of this pattern follows below.
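To illustrate that recognize-then-ask flow, here is a toy sketch in C. None of these names are grail’s real API; they are hypothetical stand-ins that only demonstrate the callback pattern just described.

```c
/* Hypothetical sketch of the recognize-then-ask flow used by grail.
 * All names here are illustrative, not grail's actual interface. */
#include <stdbool.h>
#include <stdio.h>

enum gesture_type { GESTURE_SWIPE, GESTURE_PINCH, GESTURE_ROTATE, GESTURE_TAP };

struct gesture_event {
    enum gesture_type type;
    int num_touches;     /* grail recognizes one to five fingers */
    float dx, dy;        /* illustrative payload */
};

/* Callback supplied by the embedding layer (the X input module in 10.10):
 * returns true if some client wants this gesture. */
typedef bool (*client_query_fn)(const struct gesture_event *ev);

static void deliver_gesture(const struct gesture_event *ev)
{
    printf("gesture %d with %d touches delivered to client\n",
           ev->type, ev->num_touches);
}

static void fall_back_to_pointer(const struct gesture_event *ev)
{
    /* Unclaimed touches become ordinary single-touch pointer motion. */
    (void)ev;
    printf("no client wanted the gesture; emitting pointer motion\n");
}

static void on_gesture_recognized(const struct gesture_event *ev,
                                  client_query_fn any_client_wants)
{
    if (any_client_wants(ev))
        deliver_gesture(ev);
    else
        fall_back_to_pointer(ev);
}

/* Toy query: pretend one client subscribed to two-finger pinches only. */
static bool demo_query(const struct gesture_event *ev)
{
    return ev->type == GESTURE_PINCH && ev->num_touches == 2;
}

int main(void)
{
    struct gesture_event pinch = { GESTURE_PINCH, 2, 0.0f, -1.5f };
    struct gesture_event swipe = { GESTURE_SWIPE, 3, 4.0f,  0.0f };
    on_gesture_recognized(&pinch, demo_query);  /* delivered */
    on_gesture_recognized(&swipe, demo_query);  /* falls back */
    return 0;
}
```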
X Hacking
Here’s where things become interesting! Applications in Linux work through the X server as noted above, so it would be nice if we could pass gestures through X to the applications. In Ubuntu 10.10 we have taken the xserver-xorg-input-evdev module, an X input module/driver that translates kernel keyboard and mouse input into events passed to applications, and added a bit of code to pass kernel multitouch events through grail. When grail recognizes gestures, it uses a callback into the module to determine if any clients, X applications in this case, are listening for the gestures. If so, then the gesture event is passed to the correct client through X.
How do we pass gestures through X? We’ve written a new X Gesture extension. X clients can listen for a set of gestures on any X window. If a gesture occurs in the window, the gesture event is passed to the client. Note that gesture event propagation occurs similarly to input event propagation. A gesture occurs in a child window, and all windows up the X window hierarchy from the child window to the root window are tested for any clients listening for the gesture type. The first window with a client listening for the gesture type receives a gesture event, and propagation stops. The difference between normal mouse input and gesture propagation is in determining the child window. In normal input event propagation, the child window is the top-most window under the cursor when the event occurred. In the gesture case, the child window is the top-most window that contains all the touches that make up the gesture.
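A toy model of that propagation rule, using a hypothetical window tree rather than real X windows, might look like this:

```c
/* Toy model of gesture event propagation up a window hierarchy.
 * No real X windows are involved; this only demonstrates the rule:
 * walk from the child window toward the root and deliver to the
 * first window with a listener for the gesture type. */
#include <stdbool.h>
#include <stdio.h>

struct window {
    const char *name;
    struct window *parent;        /* NULL for the root window */
    bool listens_for_pinch;       /* stand-in for a per-type listener mask */
};

static void propagate_pinch(struct window *child)
{
    for (struct window *w = child; w; w = w->parent) {
        if (w->listens_for_pinch) {
            printf("pinch delivered to %s; propagation stops\n", w->name);
            return;
        }
    }
    printf("no listener found; gesture is discarded\n");
}

int main(void)
{
    struct window root   = { "root",   NULL,  false };
    struct window app    = { "app",    &root, true  };  /* listens for pinch */
    struct window canvas = { "canvas", &app,  false };  /* touches land here */

    /* The child window is the top-most window containing all the
     * touches of the gesture; delivery walks upward from there. */
    propagate_pinch(&canvas);
    return 0;
}
```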
(Note that the above leaves out some details for clarity. If you would like more details on X event propagation, see the X11R7.5 documentation.)
So we have multitouch gestures through X; we’re done now, right? Wrong. Our approach of embedding all of this can be termed a “hack”, a suboptimal solution. We fully recognize this and include it in Ubuntu 10.10 only as a stop-gap measure. We will be reaching out to the X developer community to open a dialog over the potential inclusion of a more optimal solution in X, if that is desirable for everyone. We have deliberately created the new X extension without publishing an official API for it, to make clear that the solution is still in its infancy and interfaces may change. However, we invite developers to poke around and play if they are so inclined. Stephen Webb, another member of the Canonical Multitouch team, has created a higher-level C library called geis that will be included in Ubuntu 10.10. We hope the library is flexible and extensible enough that we will not have to break backwards compatibility once all the X implementation details are worked out, but we can’t make any guarantees at this time.
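For the adventurous, a subscription through geis looks roughly like the sketch below. This is a hedged sketch based on my reading of the GEIS v1 interface shipped in Maverick; consult geis.h in the geis package for the authoritative names and signatures, and treat the gesture-type constant and struct layouts here as approximations.

```c
/* Hedged sketch of subscribing to gestures with the geis C API (GEIS v1).
 * Names follow the geis.h header shipped in Ubuntu 10.10 as best I recall;
 * verify against the installed header before relying on them.
 * Build (roughly): gcc geis_demo.c -lgeis */
#include <geis/geis.h>
#include <stdio.h>
#include <stdlib.h>

/* One callback shape serves all gesture phases in GEIS v1. */
static void gesture_cb(void *cookie, GeisGestureType type, GeisGestureId id,
                       GeisSize attr_count, GeisGestureAttr *attrs)
{
    (void)cookie; (void)attrs;
    printf("gesture event: type=%d id=%d attrs=%zu\n",
           (int)type, (int)id, (size_t)attr_count);
}

int main(int argc, char *argv[])
{
    /* Subscribe on an existing X window; its id would come from the
     * application (here a placeholder parsed from the command line). */
    GeisXcbWinInfo xcb_win_info = {
        .display_name = NULL,   /* default display */
        .screenp      = NULL,   /* default screen  */
        .window_id    = argc > 1 ? strtoul(argv[1], NULL, 0) : 0
    };
    GeisWinInfo win_info = { GEIS_XCB_FULL_WINDOW, &xcb_win_info };

    GeisInstance instance;
    if (geis_init(&win_info, &instance) != GEIS_STATUS_SUCCESS)
        return 1;

    GeisGestureFuncs funcs = {
        gesture_cb,  /* added   */
        gesture_cb,  /* removed */
        gesture_cb,  /* start   */
        gesture_cb,  /* update  */
        gesture_cb   /* finish  */
    };
    const char *gestures[] = { GEIS_GESTURE_TYPE_PINCH2, NULL };
    geis_subscribe(instance, GEIS_ALL_INPUT_DEVICES, gestures, &funcs, NULL);

    /* A real client would plug geis's event file descriptor into its main
     * loop; busy-polling here only keeps the sketch short. */
    for (;;)
        geis_event_dispatch(instance);

    geis_finish(instance);  /* unreached in this sketch */
    return 0;
}
```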
To give an idea of where we think the X gesture implementation is headed, I believe the first step is implementing true multitouch support through X as discussed above. Then, we can look into extending our small X Gesture extension to abstract out the gesture recognizer such that grail or any other gesture recognizer can be plugged in. We could then siphon multitouch events from DIX, a component of the X server that processes event propagation, instead of siphoning off events in an input module.
Again, I invite you to check out all our work at http://launchpad.net/canonical-multitouch/ and to poke us in #ubuntu-touch. We look forward to your comments and suggestions!