The X Window System (also known as X11) is a client/server network protocol that has been in use for decades. It has been implemented by a number of different vendors for a wide variety of hardware platforms.
In Ubuntu, we ship X11 as implemented by the X.org project on Linux. This is the official "reference implementation" of X11 and is the implementation of X used in all major Linux distributions today.
X uses a client/server architecture. Clients communicate with the X server using the X11 network protocol, and can run either on the same machine as the X server or remotely on other machines.
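To make the local/remote distinction concrete, here is a minimal sketch of how a client might interpret a display name of the conventional form `host:display.screen`. The helper names `parse_display` and `transport` are hypothetical, and the mapping (a Unix socket under `/tmp/.X11-unix` for local displays, TCP port 6000 plus the display number for remote ones) reflects common Linux conventions rather than anything this page specifies. It does not handle IPv6 addresses or other extended display name forms.

```python
import re
from typing import NamedTuple


class Display(NamedTuple):
    host: str      # empty string means "this machine"
    display: int   # display number, e.g. 0 in ":0"
    screen: int    # screen number, defaults to 0


# Conventional form: [host]:display[.screen]
_DISPLAY_RE = re.compile(r"^(?P<host>[^:]*):(?P<display>\d+)(?:\.(?P<screen>\d+))?$")


def parse_display(value: str) -> Display:
    """Split a display name such as ':0' or 'remotehost:1.0' into its parts."""
    m = _DISPLAY_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised display name: {value!r}")
    return Display(m["host"], int(m["display"]), int(m["screen"] or 0))


def transport(d: Display) -> str:
    """Describe how a client would typically reach the X server for this display."""
    if d.host in ("", "unix"):
        # Local client: Unix domain socket (assumed Linux convention).
        return f"unix:/tmp/.X11-unix/X{d.display}"
    # Remote client: TCP, port 6000 + display number.
    return f"tcp:{d.host}:{6000 + d.display}"
```

For example, `transport(parse_display(":0"))` describes a local connection, while `transport(parse_display("remotehost:1.0"))` describes a TCP connection to another machine.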
The X server includes a framework for managing video and input device X drivers. These drivers interface with lower-level kernel device drivers (or, in a few cases, with the hardware directly). Typically they are developed and supported by the hardware vendor in conjunction with the kernel and X.org communities. In some cases they are developed and supported exclusively by the vendor (as with closed-source drivers) or by community volunteers (as when the hardware vendor has chosen not to support Linux itself). The X server itself is developed and supported by the X.org community. Distribution vendors like Canonical are then responsible for integrating the server, drivers, and clients into a form that users can easily install.
Client applications use a protocol library such as libX11 to send requests to the X server and to receive replies and events from it. They also typically employ one or more X toolkit libraries to draw and operate widgets like buttons and scroll bars.
Client applications that provide 3D functionality operate a bit differently. These can be divided into two groups. Some 3D client apps communicate through the X server using an OpenGL protocol library such as libglx; this is called "3D Indirect Rendering". Others bypass the X server entirely and communicate with the graphics hardware directly to gain extra performance; this is called "3D Direct Rendering" and uses the Direct Rendering Infrastructure (DRI) (see diagram below).
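One practical way to see which of the two paths a system is using is the `direct rendering:` line printed by the `glxinfo` utility. The sketch below only parses text that has already been captured from that tool. The function name `rendering_mode` is hypothetical, and the sample strings in the usage note are illustrative rather than real output from any particular machine.

```python
import re
from typing import Optional

# Matches a line such as "direct rendering: Yes" anywhere in the output.
_DIRECT_RE = re.compile(r"^direct rendering:\s*(yes|no)\b", re.IGNORECASE | re.MULTILINE)


def rendering_mode(glxinfo_output: str) -> Optional[str]:
    """Return 'direct', 'indirect', or None if the line is not present."""
    m = _DIRECT_RE.search(glxinfo_output)
    if m is None:
        return None
    return "direct" if m.group(1).lower() == "yes" else "indirect"
```

Calling `rendering_mode("name of display: :0\ndirect rendering: Yes\n")` would report `"direct"`. To use it on a live system, the text could be captured by running `glxinfo` with `subprocess.run(..., capture_output=True, text=True)`, assuming the tool is installed.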
The X server receives graphics requests from client programs and displays the results to the user. In the other direction, it delivers input events from devices such as keyboards, mice, and touchscreens back to the clients.
The X server itself is composed of a "core", "extension modules", and hardware-specific "drivers".
Extension modules add various technologies such as graphics acceleration, video acceleration, font support, compositing, screen resizing, and so forth.
Video and input device drivers contain logic for translating between general X operations and the specific register-level operations particular to the given piece of hardware.
We often say "the video driver", but actually for a given video card there are three different kinds of drivers:
2D DDX driver: The 2D video "Device Dependent X" (DDX) driver is what most ordinary 2D client applications use. It handles selecting the video mode and resolution, provides 2D and video acceleration, and does the initial setup for DRI. Example: xserver-xorg-video-radeon.
DRI driver: The "Direct Rendering Infrastructure" (DRI) driver is responsible for programming the 3D hardware. Usually DRI drivers use the Mesa state machine. Under the DRI, the GLX client-side library loads the DRI driver. Example: radeon_dri.so.
Kernel DRM driver: The "Direct Rendering Manager" (DRM) is the kernel-side component of the DRI that allows applications direct access to the graphics hardware. The DRM is responsible for security and for handling resource contention. Example: radeon.ko.
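Of the three layers above, the kernel DRM driver is the easiest to check for from userspace, because loaded kernel modules are listed in `/proc/modules`. The following is a minimal sketch that assumes the usual format of that file (module name in the first whitespace-separated column). The function names and the `SAMPLE` text are hypothetical and for illustration only.

```python
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Collect module names from text in /proc/modules format."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


def drm_driver_loaded(module: str, proc_modules_text: str) -> bool:
    """Report whether a kernel module such as 'radeon' appears as loaded."""
    return module in loaded_modules(proc_modules_text)


# Illustrative sample, not captured from a real system.
SAMPLE = """\
radeon 1470464 3 - Live 0x0000000000000000
evdev 24576 12 - Live 0x0000000000000000
"""
```

On a real system the text would come from reading `/proc/modules`. A driver compiled directly into the kernel rather than built as a module would not show up there, so a negative result is not conclusive.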
Input devices communicate with the X server via the XInput protocol.
As with video drivers, "the input driver" really refers to two different kinds of drivers. The Linux kernel provides a generic input driver called evdev, and there is a corresponding X driver, xserver-xorg-input-evdev, which interfaces with this kernel driver. Many input devices, including all keyboards and mice, can use evdev for all their needs.
Historically, before evdev, X used a wide variety of input device drivers. Many of these drivers still exist and can be useful for hardware not yet properly supported by evdev. Others are obsolete and are gradually falling by the wayside.
One advantage of evdev over the historical drivers is that it supports "Input Hotplug": hardware is detected and initialized automatically, without restarting X.
Another change is that specialized input buttons, such as multimedia hotkeys provided by ACPI, are managed at the kernel level rather than in userspace through the X server. This is an advantage because a lot of cruft had built up to work around hardware-specific problems in getting hotkeys to work, and handling them in the kernel allows that cruft to be cleaned up properly.
X in general:
X Development: http://www.x.org/wiki/Development
X Performance: http://www.x.org/wiki/Development/Documentation/Performance
xorg 7.5: http://www.x.org/wiki/Releases/7.5
2D DDX drivers:
How Video Cards Work: http://www.x.org/wiki/Development/Documentation/HowVideoCardsWork
DRI and the 3D infrastructure:
Direct Rendering Infrastructure: http://en.wikipedia.org/wiki/Direct_Rendering_Infrastructure
DRI Documentation: http://dri.freedesktop.org/wiki/Documentation
Mesa 3D and Direct Rendering Infrastructure wiki: http://dri.freedesktop.org/wiki/
Input Event Processing: http://www.x.org/wiki/Development/Documentation/InputEventProcessing
X Hotplug Proposal: http://www.x.org/wiki/XHotplugProposal
Hotkeys Architecture: https://wiki.ubuntu.com/Hotkeys/Architecture