Architectures for a Compositing Manager
There are various ways that we can implement an X compositing manager; this document tries to sketch out the tradeoffs involved.
1. Compositing manager on top of RENDER
Using RENDER to implement a compositing manager has the advantage that RENDER works directly with the objects that the compositing manager is manipulating: pixmaps and windows.
But it has disadvantages. First, the RENDER API does not expose the full power of modern graphics cards. If you want to use, say, pixel shaders in your compositing manager, you are out of luck.
Second, to get decent performance, we need hardware acceleration. The basic problem with hardware acceleration of RENDER is that it doesn't always match the hardware that well; in particular, for filtered scaling, hardware is mipmap based, a concept not exposed in RENDER.
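To make the mipmap mismatch concrete, here is a minimal software sketch of what the hardware does for filtered downscaling: it precomputes a chain of half-size images and samples from the level closest to the requested scale. The function names (downsample_2x, build_mipmaps, pick_level) and the grayscale list-of-rows image format are assumptions made for illustration; they are not part of RENDER or of any driver.

```python
import math


def downsample_2x(image):
    """Halve a grayscale image (a list of rows) with a 2x2 box filter."""
    h, w = len(image), len(image[0])
    nh, nw = max(h // 2, 1), max(w // 2, 1)
    out = []
    for y in range(nh):
        row = []
        for x in range(nw):
            # Clamp so that 1-pixel-wide or 1-pixel-tall levels still work.
            y0, y1 = min(2 * y, h - 1), min(2 * y + 1, h - 1)
            x0, x1 = min(2 * x, w - 1), min(2 * x + 1, w - 1)
            total = image[y0][x0] + image[y0][x1] + image[y1][x0] + image[y1][x1]
            row.append(total / 4.0)
        out.append(row)
    return out


def build_mipmaps(image):
    """Return [level0, level1, ...], ending with a 1x1 image."""
    levels = [image]
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        levels.append(downsample_2x(levels[-1]))
    return levels


def pick_level(scale, num_levels):
    """Pick a level for a scale factor (output size / source size)."""
    if scale >= 1.0:
        return 0  # magnification always samples the full-size image
    level = int(math.floor(math.log2(1.0 / scale)))
    return min(level, num_levels - 1)
```

RENDER has no way to name the intermediate levels, so a RENDER-based compositing manager cannot ask for this kind of sampling explicitly.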
Hardware accelerating RENDER to get good performance for the drawing we do within windows is not challenging, but getting good performance for full-window and full-desktop operations may be harder.
1.a. RENDER acceleration written directly to hardware
The first approach to accelerating RENDER is what we do currently: directly program the hardware. At some level, this allows optimum performance, but it involves duplicating work that's being done in the 3D drivers, which have a much more active development community. RENDER is also an unknown quantity to hardware vendors, so we're unlikely to get good support in closed-source or vendor-written drivers. Using a known API like OpenGL to define the hardware interaction would give us much more of a common language with hardware vendors.
1.b. RENDER acceleration written on top of GL in the server
The other approach is to implement RENDER on top of GL. Since GL is a pretty close match to the hardware, we shouldn't lose a lot of efficiency doing this, and features such as pixel shaders should eventually allow for very high-powered implementations of RENDER compositing. A start of this work has been done by Dave Reveman for the 'glitz' library.
1.b.i. GL in the server done with a GL-based server
If we are accelerating RENDER in the server with GL, we clearly need GL in the server. We could do this by running the entire server as a GL client, on top of something like mesa-solo, or nested on top of the existing server like Xgl. Both have their disadvantages: mesa-solo is far from usable for hosting an X server, and a nested X server still requires that the "backend" X server be maintained.
1.b.ii. GL in the server done incrementally
An alternative approach would be to start by implementing indirect rendering; once we had that, we'd have DRI drivers running inside the server process. It would be conceivable to use those DRI drivers to implement parts of the 2D rendering stack while keeping all the video card initialization, Xvideo implementation, and so forth the same. This work has been started in the accel_indirect_glx branch of Xorg and is making good progress.
2. Compositing manager on top of OpenGL
2.a. Compositing manager renders to separate server
The way that luminocity works is that it runs a headless "Xfake" server (all software), sucks the window contents off of that server, then displays them on an output server. Input events are forwarded the other way, from the output server to the headless server.
This model has a great deal of simplicity, because the CM can simply be a normal direct rendering client. And it doesn't perform that badly; rendering within windows simply isn't the limiting factor for normal desktop apps. Performance could be further optimized by running the CM in the same process as the headless server (imagine a libxserver).
The forwarding of input events is pretty difficult, and having to extend that for XKB, for Xinput, for Xrandr, and so forth would be painful, though doable. The real killer of this model is Xvideo and 3D applications. There's no way that we can get reasonable performance for such applications without having them talking to the output server.
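As a rough illustration of the simplest piece of input forwarding, the sketch below maps a pointer position on the output server back to window-local coordinates that could be replayed on the headless server. It assumes the compositing manager draws each window with a translation and a uniform scale; the forward_pointer function and the dictionary keys are hypothetical, and nothing here covers keyboard state, grabs, XKB, or Xinput.

```python
def forward_pointer(x, y, windows):
    """Map an output-server pointer position to (window id, local x, local y).

    `windows` is ordered top-most first. Each entry is a dict with the
    hypothetical keys 'id', 'x', 'y' (on-screen origin), 'width', 'height'
    (untransformed size) and 'scale' (uniform scale applied by the
    compositing manager). Returns None when no window is under the pointer.
    """
    for win in windows:
        # Invert the on-screen transform: undo the translation, then the scale.
        local_x = (x - win["x"]) / win["scale"]
        local_y = (y - win["y"]) / win["scale"]
        if 0 <= local_x < win["width"] and 0 <= local_y < win["height"]:
            return (win["id"], local_x, local_y)
    return None
```

Even this simple case needs the compositing manager to keep its own stacking order and transforms in sync with what it draws, which hints at why extending the scheme to every input extension would be painful.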
2.b. Compositing manager renders to same server
The more reasonable final model for a GL-based compositing manager is that we use a single X server. Application rendering, whether classic X, GL, or Xvideo is redirected to offscreen pixmaps. Then a GL based compositing manager transfers the contents of those pixmaps to textures and renders to the screen.
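The single-server model can be sketched in software as follows: every window owns an offscreen pixmap, and the compositing manager paints those pixmaps onto the screen from bottom to top. This is only a model of the data flow; a real compositing manager would do the blending with textured GL quads. The over and composite functions, the dictionary keys, and the premultiplied RGBA tuple format are assumptions for illustration.

```python
def over(src, dst):
    """Porter-Duff OVER for premultiplied RGBA tuples with components in [0, 1]."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    k = 1.0 - sa
    return (sr + dr * k, sg + dg * k, sb + db * k, sa + da * k)


def composite(screen_w, screen_h, windows, background=(0.0, 0.0, 0.0, 1.0)):
    """Paint windows bottom-to-top onto a fresh screen buffer.

    Each window is a dict with the hypothetical keys 'x', 'y' and 'pixmap'
    (a list of rows of premultiplied RGBA tuples).
    """
    screen = [[background for _ in range(screen_w)] for _ in range(screen_h)]
    for win in windows:  # bottom-most window first
        for row_idx, row in enumerate(win["pixmap"]):
            sy = win["y"] + row_idx
            if not 0 <= sy < screen_h:
                continue  # row is clipped by the screen edge
            for col_idx, pixel in enumerate(row):
                sx = win["x"] + col_idx
                if 0 <= sx < screen_w:
                    screen[sy][sx] = over(pixel, screen[sy][sx])
    return screen
```

Because the applications only ever draw into their own pixmaps, the compositing manager is free to apply any transform or effect at this final step.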
2.b.i. Compositing manager uses indirect rendering
We normally think that direct rendering is always better than indirect rendering. However, this may not be the case for the compositing manager. The source data that we are using is largely on the X server; for a direct rendering client to copy a pixmap onto a texture basically requires an XGetImage, meaning copying the data from the server to the client. So, the right approach may be to use an indirect rendering client, which could simply manipulate textures as opaque objects without ever needing to touch the texture data directly.
To do the work of rendering window pixmaps to the screen, we could simply generate temporary textures and use glCopyTexSubImage2D to copy from the composite-redirected pixmap into the texture.
A different approach would be to have a GL extension similar to pbuffers where a pixmap is "bound" to a texture. Would it be a problem if the application could write to the window (and thus pixmap) while the compositing manager was working? The pbuffer spec prohibits changing pbuffer contents while a pbuffer is bound to a texture. It also likely involves copying in any case, since textures have different formats and restrictions than pixmaps. Avoiding new GL extensions is definitely a plus in any case. So, perhaps the simple approach with the copy is the right one.
Automatic mipmap generation would have to be implemented for the DRI to make filtered scaling down efficient, since sucking the image back to the client to generate mipmaps would be horribly inefficient.
We might want a way to specify that the textures we are creating to shadow windows are temporary and can be discarded. pbuffers or the (draft) superbuffer specification may allow expressing the right semantics.
Update: the GLX_EXT_texture_from_pixmap spec addresses a few of these issues. It is only defined for pixmaps, not windows, but since the Composite extension exposes the redirected image as a pixmap, this works. It specifies a new call, glXBindTexImageEXT, that is required to act like glTexImage2D, i.e., it must act like a copy when it is executed. It also addresses the mipmapping issue by allowing the server to indicate whether it can generate mipmaps for bound pixmaps automatically. Finally, the expected usage model is that textures are bound to pixmaps for a single invocation and then unbound; the implementation may track updates between binds as an optimization, potentially eliminating copies.
While this does introduce some additional complexity due to the new extension, it does provide all the information necessary to enable a direct-rendering compositing manager (see below), again with a natural transition from older to newer stacks.
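The "track updates between binds" optimization can be modeled in a few lines. In this sketch, binding behaves like a copy taken at bind time, but the copy is skipped when the pixmap has not been written since the previous bind. The Pixmap and TextureCache classes and the serial counter are hypothetical stand-ins for the server's pixmap and damage tracking; they are not the GLX_EXT_texture_from_pixmap API.

```python
class Pixmap:
    """Stand-in for a composite-redirected window pixmap."""

    def __init__(self, data):
        self.data = list(data)
        self.serial = 0  # bumped on every write; plays the role of damage tracking

    def write(self, data):
        self.data = list(data)
        self.serial += 1


class TextureCache:
    """Copies a pixmap into its shadow texture only if it changed since the last bind."""

    def __init__(self):
        self._textures = {}  # id(pixmap) -> (serial at copy time, copied data)
        self.copies = 0      # number of real copies performed

    def bind(self, pixmap):
        cached = self._textures.get(id(pixmap))
        if cached is None or cached[0] != pixmap.serial:
            # Act like a copy at bind time: later writes do not alter the texture.
            self._textures[id(pixmap)] = (pixmap.serial, tuple(pixmap.data))
            self.copies += 1
        return self._textures[id(pixmap)][1]
```

A compositing manager that rebinds every window each frame would then pay for a copy only for the windows that actually changed.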
2.b.ii. Compositing manager uses direct rendering
A sophisticated implementation of direct rendering could perhaps make copying from a pixmap onto a texture more efficient than XGetImage. Right now the DRI relies heavily on having an application-memory copy of all textures, but it is conceivable that it could be extended to allow textures that are "locked" into video memory. With such a texture, you could imagine the server taking care of copying from pixmap to texture, even if rendering with the texture was being done by the client.
Still, there seems to be an inevitable high synchronization overhead for such a system; the client and server are doing a complex dance to get the rendering done.