Multitouch/Architecture

Overview

The gesture framework is a complete gesture recognition stack. As input, it takes a multiplexed stream of touch events from the display server. The output is a multiplexed stream of gesture events.

INSERT PICTURE HERE

The gesture framework differs from many existing gesture stacks in that it:

supports both touchscreens and touchpads, and in the future will support independent devices like the Apple Magic Mouse.
supports multiple simultaneous gestures on the same device and in the same window.
has a backend architecture that allows for rapid development on newer window servers.
allows for gesture and raw multitouch support in the same window.
allows the client to decide whether to recognize a gesture or to replay the touches as multitouch events.
allows for multiple-gesture recognition decisions.
- For example, two three-touch taps in sequence may be recognized as a double three-touch tap. If one tap is seen but not a second one, the first tap gesture may still be rejected even though it has physically ended.
on X11, analyzes gestures as soon as they physically occur, even while the operating system is still attempting to recognize gestures.
- This reduces latency of gesture recognition once touch events reach the client application.

We also recognize that the multitouch gesture field is nascent. Although the gesture stack has undergone major architectural changes, we have maintained compatibility with previous implementations through a stable API layer.

The gesture stack consists of three main components:

Frame: Groups touches from the same device occurring on the same window together
Grail: Performs gesture recognition on touch frames
Geis: API layer allowing for backwards compatible gesture stack implementations

Frame

Frame groups touches into units that are easier for grail to operate on. Gestures are recognized per-device and per-window, so touches are grouped into units representing pairs of devices and windows. This is also where the backends for each window system are implemented. Frame events are platform independent.
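The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the frame API: the TouchEvent type, its field names, and group_into_frames are hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchEvent:
    """Hypothetical raw touch event; field names are assumptions."""
    device_id: int
    window_id: int
    touch_id: int
    x: float
    y: float


def group_into_frames(events):
    """Group touch events into one unit per (device, window) pair."""
    frames = defaultdict(list)
    for ev in events:
        frames[(ev.device_id, ev.window_id)].append(ev)
    return dict(frames)


events = [
    TouchEvent(device_id=1, window_id=10, touch_id=0, x=5.0, y=5.0),
    TouchEvent(device_id=1, window_id=10, touch_id=1, x=9.0, y=5.0),
    TouchEvent(device_id=1, window_id=20, touch_id=2, x=1.0, y=1.0),
]
frames = group_into_frames(events)
```

Here the two touches on window 10 end up in one unit and the touch on window 20 in another, even though all three come from the same device.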

INSERT PICTURE HERE

Some window systems, like X11, also have the concept of touch sequence acceptance and rejection ¹. This functionality is provided through frame as well.

Touch sequence acceptance and rejection is a core aspect of the gesture stack when used for system-level gestures. Imagine a finger painting application is open on a desktop environment where three-touch flicks are used to switch between applications. If the user performs a three-touch tap, the expected result is three dots drawn in the painting application. When the user performs a flick, the stack accepts the touch sequences and switches applications. This prevents the painting application from handling the touches. When the user performs a tap, the stack rejects the touch sequences because they do not match a known gesture. The painting application then receives the rejected touch sequences.
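The finger painting scenario can be reduced to a small decision function. The sketch below is a simplification under stated assumptions: the Decision enum, the SYSTEM_GESTURES table, and decide are hypothetical names, and a real stack would issue the accept or reject through the window system rather than return a value.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"  # the stack consumes the touches; the client never sees them
    REJECT = "reject"  # the touches are passed on to the client application


# Hypothetical table of system-level gestures: (gesture kind, touch count).
SYSTEM_GESTURES = {("flick", 3)}


def decide(gesture_kind, touch_count):
    """Accept touch sequences that match a system gesture, reject the rest."""
    if (gesture_kind, touch_count) in SYSTEM_GESTURES:
        return Decision.ACCEPT
    return Decision.REJECT


flick_result = decide("flick", 3)  # switches applications
tap_result = decide("tap", 3)      # painting application draws three dots
```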

Grail

Grail is the gesture recognizer of the gesture project. It takes the per-device, per-window touch frames from frame and analyzes them for potential gestures.

INSERT PICTURE HERE

Grail events are generated by frame events. Rather than duplicate the frame-provided data, grail events contain the gesture data and a reference to the frame event that generated them. This allows clients to see the full touch data comprising a gesture.

Grail gesture events consist of a set of touches, a uniform set of gesture properties, and a list of recognized gesture primitives. The supported primitives are:

Drag
Pinch
Rotate
Tap
Touch (Deprecated)

The gesture properties are:

Gesture ID
Gesture state (begin, update, end)
The frame event that generated the Grail event
The original centroid position of the touches
The original average radius, or distance from the centroid, of the touches
A best-fit 2D affine transformation of the touches from their original positions
A best-fit 2D affine transformation of the touches from their previous positions
A flag denoting the construction state of the gesture
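As a rough picture of how these properties fit together, the following sketch models a gesture event as a Python dataclass. It is not the grail data layout: every type and field name here is an assumption chosen for readability, and the radius is computed as the mean distance of the touches from their centroid.

```python
import math
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class GestureState(Enum):
    BEGIN = "begin"
    UPDATE = "update"
    END = "end"


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


@dataclass
class FrameEvent:
    """Hypothetical stand-in for a frame event."""
    touches: List[Tuple[float, float]]


@dataclass
class GrailEvent:
    gesture_id: int
    state: GestureState
    frame_event: FrameEvent              # a reference, not a copy of touch data
    original_centroid: Tuple[float, float]
    original_radius: float
    cumulative_transform: tuple = IDENTITY   # from the original positions
    incremental_transform: tuple = IDENTITY  # from the previous positions
    construction_finished: bool = False      # hypothetical name for the flag
    primitives: List[str] = field(default_factory=list)


def centroid_and_radius(touches):
    """Return the centroid and the mean distance of touches from it."""
    n = len(touches)
    cx = sum(x for x, _ in touches) / n
    cy = sum(y for _, y in touches) / n
    radius = sum(math.hypot(x - cx, y - cy) for x, y in touches) / n
    return (cx, cy), radius


frame = FrameEvent(touches=[(0.0, 0.0), (2.0, 0.0)])
centroid, radius = centroid_and_radius(frame.touches)
event = GrailEvent(1, GestureState.BEGIN, frame, centroid, radius)
```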

Drag, pinch, and rotate properties are encapsulated by the affine transformations. The transformation is of the form:

⎡ a -b  c ⎤
⎢ b  a  d ⎥
⎣ 0  0  1 ⎦

The position movement caused by dragging is the vector (c, d). The pinch magnitude, or scale, is sqrt(a² + b²). The angular rotation is cos^-1(a / s), where s is the scale.
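A minimal sketch of this decomposition, using only the formulas stated above, might look as follows. The function name decompose is hypothetical, and the example values assume the usual parametrization a = s·cos(θ), b = s·sin(θ). Note that the arccosine yields an unsigned angle between 0 and π, so the direction of rotation is not recovered by this formula alone.

```python
import math


def decompose(a, b, c, d):
    """Split the transform coefficients into drag, scale, and rotation angle."""
    scale = math.hypot(a, b)  # sqrt(a^2 + b^2)
    if scale == 0.0:
        raise ValueError("degenerate transform: scale is zero")
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    ratio = max(-1.0, min(1.0, a / scale))
    angle = math.acos(ratio)  # radians, in [0, pi]
    return (c, d), scale, angle


# Example: touches scaled by 2, rotated by 30 degrees, dragged by (4, -3).
theta = math.radians(30)
s = 2.0
drag, scale, angle = decompose(s * math.cos(theta), s * math.sin(theta), 4.0, -3.0)
```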

The construction state flag will be discussed in the Normal Mode of recognition.

There are two main modes in which the gesture recognizer operates: Normal Mode and Atomic Mode.

Normal Mode

This is the default behavior of grail. Multiple simultaneous gestures are supported, and the client may choose which gestures to accept and reject.

A pool of recently begun touches is maintained. In the current implementation this includes any touches that have begun within the past 60 milliseconds. When a new touch begins, it is combined in all possible combinations with touches in this pool in order to create potential gestures matching any active subscriptions.
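The combination step can be illustrated with a short sketch. The names below are hypothetical, subscriptions are reduced to a set of touch counts, and whether a touch that began exactly 60 milliseconds ago still counts as recent is an assumption made here.

```python
from itertools import combinations

POOL_WINDOW_MS = 60  # pool window quoted above


def candidate_touch_sets(pool, new_touch, now_ms, subscribed_counts):
    """Combine a new touch with every subset of recently begun touches.

    pool maps touch id -> begin time in milliseconds. Only combinations whose
    size matches a subscribed touch count are kept.
    """
    recent = sorted(
        tid for tid, began in pool.items() if now_ms - began <= POOL_WINDOW_MS
    )
    candidates = []
    for k in range(len(recent) + 1):
        if k + 1 not in subscribed_counts:
            continue
        for combo in combinations(recent, k):
            candidates.append(frozenset(combo) | {new_touch})
    return candidates


# Touch 1 began 80 ms ago and has left the pool; touches 2 and 3 are recent.
pool = {1: 0, 2: 30, 3: 50}
candidates = candidate_touch_sets(pool, new_touch=4, now_ms=80,
                                  subscribed_counts={2, 3})
```

With subscriptions for two-touch and three-touch gestures, the new touch produces three candidate sets: {2, 4}, {3, 4}, and {2, 3, 4}.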

INSERT PICTURE HERE

A new gesture instance is created for each combination of touches. Each instance has an event queue, and a new instance starts with one begin event describing the original state of the touches. Events are queued until any gesture primitive is recognized. When frame events are processed, any change to the touches in a gesture instance generates a new grail event. The new touch state is analyzed, and the subscription thresholds and timeouts are checked to determine whether any of the subscribed gesture primitives have been recognized. For example, the default rotate threshold is 1/50th of a revolution, and the default rotate timeout is one half second. If the threshold is met before the timeout expires, the rotate gesture primitive is recognized.
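The rotate example can be expressed as a small check. This is a sketch under assumptions rather than the recognizer itself: the function and constant names are hypothetical, the samples are assumed to arrive in time order, and the threshold is applied to the magnitude of the cumulative rotation.

```python
import math

ROTATE_THRESHOLD_RAD = 2 * math.pi / 50  # 1/50th of a revolution
ROTATE_TIMEOUT_S = 0.5                   # one half second


def rotate_recognized(samples):
    """Return True if the threshold is met before the timeout expires.

    samples is a time-ordered iterable of
    (seconds since gesture begin, cumulative rotation in radians).
    """
    for elapsed, angle in samples:
        if elapsed > ROTATE_TIMEOUT_S:
            return False
        if abs(angle) >= ROTATE_THRESHOLD_RAD:
            return True
    return False


in_time = rotate_recognized([(0.1, 0.05), (0.3, 0.13)])
too_late = rotate_recognized([(0.1, 0.05), (0.6, 0.20)])
```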

Once a gesture primitive has been recognized, the grail event queue is flushed to the client. The client must process the gesture events and decide whether to accept or reject each gesture.
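The queue-then-flush behavior can be sketched as follows. The GestureInstance class and its method names are hypothetical, events are represented as plain strings, and delivery to the client is modeled as a callback.

```python
class GestureInstance:
    """Holds events back until a gesture primitive is recognized."""

    def __init__(self, begin_event, deliver):
        self._queue = [begin_event]  # new instances start with a begin event
        self._deliver = deliver      # callback standing in for the client
        self.recognized = False

    def push(self, event, primitive_recognized=False):
        """Queue an event; flush everything once a primitive is recognized."""
        self._queue.append(event)
        if primitive_recognized:
            self.recognized = True
        if self.recognized:
            self._flush()

    def _flush(self):
        for ev in self._queue:
            self._deliver(ev)
        self._queue.clear()


delivered = []
instance = GestureInstance("begin", delivered.append)
instance.push("update-1")                             # still queued
queued_before = list(delivered)
instance.push("update-2", primitive_recognized=True)  # flushes the queue
instance.push("update-3")                             # delivered immediately
```

Until recognition the client sees nothing; afterwards it receives the full history, starting with the begin event, and can then accept or reject the gesture.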

Atomic Mode

This is the legacy behavior of grail. It supports the original architecture where only one gesture is supported per device and window at a time. Gesture acceptance and rejection is unsupported.

¹ XInput 2 Specification