Please note - I know almost nothing about gaming hardware or the state of the art in games. So, apart from what I actually tested and did, take my commentary with a large grain of salt.
This project had its birth in 1998, when I got the company's NT 4 Terminal Server Edition MSDN CD. I took it home, did a couple of test setups on a P166 with 64 MiB of RAM, and then tried to figure out something unusual to do. Just installing Office 97 didn't seem all that interesting, since that was no challenge.
When I saw my Quake CD, the idea was born...
Unfortunately, I knew nothing about optimizing Quake at that point. To make matters worse, my idea of optimizing the NT experience in general back then was to leave the default driver at 4-bit color and use an old, crappy 56Hz monitor to discourage anyone from working at the console and breaking it.
Surprisingly enough, nothing crashed when I tried it out over a TS connection. The experience was terrible: color was washed out and the frame rate varied from a high of about 2-3 fps (standing still) to a low of about 1/20 of a frame per second, but that was to be expected. I did nothing to compensate for color issues (which also affected speed), I was running at 640x480 (difficult for my P-120 with 16 MiB RAM and Win95 to play well locally), and I had done nothing to stop the software from trying to use sound. In any case, I filed it away as a lost cause. In the back of my head, I remember thinking that once really fast, high-RAM PCs were available, it might be possible to run a gaming LAN with one machine.
As it turns out, Quake was the last significant first-person shooter released that did not require 3D hardware and 16-bit color for acceptable playability. Video speeds rocketed into the stratosphere with the introduction of AGP, and by mid-1999 the whole idea of server-distributed video for an active game was ridiculous. This was obviously not going to be a winning idea.
At the very least, a "budget" approach to interactive gaming hardware would suggest that the central server be essentially a machine that does mechanical physics calculations, then distributes that information to what I would call a "narrow" client. Not thin - it would need memory and very, very fast video, taking the physics model it was passed and rendering scenes. This should be a real-time number-cruncher. Image caching? Forget it; all of the optimization has to be done in prediction.
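To make the "narrow client" idea a little more concrete, here is a minimal sketch of client-side prediction (dead reckoning), assuming the server sends only each entity's position, velocity, and a timestamp, and the client extrapolates between updates. The `EntityState` type and the `predict` and `correct` functions are hypothetical names for illustration, not anything taken from Quake or Terminal Services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityState:
    """Hypothetical compact update sent by the physics server."""
    x: float
    y: float
    vx: float  # units per second
    vy: float
    t: float   # server time (seconds) at which this state was valid


def predict(state: EntityState, now: float) -> tuple[float, float]:
    """Extrapolate the position to `now`, assuming constant velocity."""
    dt = now - state.t
    return (state.x + state.vx * dt, state.y + state.vy * dt)


def correct(predicted: tuple[float, float], fresh: EntityState,
            now: float, blend: float = 0.5) -> tuple[float, float]:
    """When a new update arrives, move part of the way from the current
    prediction toward the new extrapolated position instead of snapping."""
    tx, ty = predict(fresh, now)
    px, py = predicted
    return (px + (tx - px) * blend, py + (ty - py) * blend)


last = EntityState(x=0.0, y=0.0, vx=2.0, vy=0.0, t=10.0)
print(predict(last, 10.5))  # (1.0, 0.0): rendered between server updates
```

Blending rather than snapping is just one possible design choice: it hides small corrections at the cost of briefly drawing a slightly wrong position.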
Note that this is not the sweet spot that Terminal Services targets. Here's why.
If you have a chance to compare TS over a many-hop low-bandwidth link to standard desktop direct remote control tools, you will quickly notice that the Terminal Services session "feels" much better. Certainly there is uncontrollable lag, but the client interface performance is much better and even the perceived lag is reduced by the very compact data exchange.
How does it transfer data so rapidly? Well - it doesn't. The TS client software could be called a "smart client", since it works not with raw data, but with information; and it then juggles it in a very astute fashion.
In this context, I am using the word information to mean "structured, compact data". For example, when an icon's text changes, you don't get the entire graphic retransmitted: the screen text is read and transmitted as compressed text data filling only a few bytes, and then the client UI renders it back onscreen in the appropriate font.
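A rough back-of-the-envelope comparison shows why sending the text wins. This sketch assumes a 100x16-pixel label area at 8 bits per pixel and a made-up file name; the real RDP encoding differs, so only the order of magnitude matters here.

```python
LABEL = "Quarterly Report.doc"                  # hypothetical icon caption
WIDTH, HEIGHT, BYTES_PER_PIXEL = 100, 16, 1     # assumed label area, 8-bit color

bitmap_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL  # every pixel must be described
text_bytes = len(LABEL.encode("ascii"))          # just the characters

print(f"bitmap: {bitmap_bytes} bytes, text: {text_bytes} bytes")
print(f"the bitmap is {bitmap_bytes // text_bytes}x larger")
```

Font, size, and position information add a few more bytes to the text version, but the gap stays large.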
Getting raw text is not trivial, but it isn't the only trick the client uses. It also moonlights as an impressionist artist and a theater director.
When a new window is drawn in your session, even if it has never been seen before, it is not necessarily transmitted to you. Title bars and borders take up a lot of pixels; so the server tries to just tell the client "I called this API with these parameters" - and the client fills in the appropriate details.
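The same idea applies to window furniture. The sketch below packs a hypothetical "draw a standard window frame here" order (a one-byte order code plus a rectangle) with Python's `struct` module and compares it with the raw pixels that the title bar and borders would cover. The order layout, the frame dimensions, and the function names are invented for illustration; this is not the actual RDP wire format.

```python
import struct

ORDER_DRAW_FRAME = 0x01  # hypothetical order code
ORDER_FORMAT = "<BHHHH"  # code, left, top, width, height (little-endian)


def encode_frame_order(left, top, width, height):
    """Pack a hypothetical 'draw a standard window frame here' order."""
    return struct.pack(ORDER_FORMAT, ORDER_DRAW_FRAME, left, top, width, height)


def decode_frame_order(data):
    """Unpack the order; the client would then draw the frame itself."""
    code, left, top, width, height = struct.unpack(ORDER_FORMAT, data)
    if code != ORDER_DRAW_FRAME:
        raise ValueError(f"unexpected order code {code:#x}")
    return left, top, width, height


def frame_pixel_bytes(width, height, title_h=18, border=4, bytes_per_pixel=1):
    """Raw size of title bar plus borders (assumed dimensions, 8-bit color)."""
    inner = (width - 2 * border) * (height - title_h - border)
    return (width * height - inner) * bytes_per_pixel


order = encode_frame_order(10, 20, 400, 300)
print(len(order), "bytes as an order vs", frame_pixel_bytes(400, 300), "bytes as pixels")
```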
When you pop menus open or shut, or move windows around, you still don't cause much of a load. Any time the client has to acquire the screen, it hangs onto the entire thing in a cache. If the backdrop is updated, it only updates a tile from the cache. If you pop a window open, the client superimposes it; then, when the window shuts, the client immediately re-renders the last image it had of the backdrop instead of going to the server and saying, "I forgot what was over here 2 seconds ago... could you tell me again?"
So the client really works to provide application interfaces nicely - well, some interfaces. It was designed to use every possible trick to minimize data transfer, to the point where you can run Word over a 28K dialup. It was not designed to replace an AGP 4x video card with 32 MiB of onboard RAM. A quick 5-minute Half-Life: Counter-Strike session pumps as much data to a top-of-the-line video card as a 20-user Terminal Server sends to its clients in a year.
Usually there is no raw text to send. The backdrops do NOT repeat. Almost nothing on the screen is drawn through standard API calls, and your favorite FPS game definitely does not look like a standard window. The client might be able to cache some tiles, but it shouldn't be able to think fast enough to make use of them.
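A minimal sketch of the save-and-restore behavior described above, assuming the client keeps a copy of whatever a popup covers and puts it back when the popup closes. The `ScreenClient` class and its methods are hypothetical; the "screen" is a small 2D list of pixel values, and `server_requests` counts how often the client would have had to ask the server for pixels again.

```python
class ScreenClient:
    """Toy client framebuffer with a save-under cache for popups (hypothetical)."""

    def __init__(self, width: int, height: int, fill: int = 0):
        self.width, self.height = width, height
        self.pixels = [[fill] * width for _ in range(height)]
        self.saved = {}           # popup id -> (x, y, block of pixels it covered)
        self.server_requests = 0  # times we would have had to ask the server

    def open_popup(self, popup_id, x, y, w, h, color):
        # Clip to the screen so slice assignment never changes row lengths.
        w, h = min(w, self.width - x), min(h, self.height - y)
        # Save what is underneath before drawing over it.
        block = [row[x:x + w] for row in self.pixels[y:y + h]]
        self.saved[popup_id] = (x, y, block)
        for row in self.pixels[y:y + h]:
            row[x:x + w] = [color] * w

    def close_popup(self, popup_id):
        if popup_id not in self.saved:
            # Nothing cached: a real client would have to re-request this region.
            self.server_requests += 1
            return
        x, y, block = self.saved.pop(popup_id)
        for dy, saved_row in enumerate(block):
            self.pixels[y + dy][x:x + len(saved_row)] = saved_row


client = ScreenClient(8, 4, fill=7)
client.open_popup("menu", 2, 1, 3, 2, color=1)
client.close_popup("menu")  # restored locally from the cache
client.close_popup("menu")  # no saved copy any more: would need the server
print(client.server_requests)
```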
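Some rough arithmetic shows the scale of the mismatch. This sketch assumes an uncompressed 640x480 frame at 16-bit color redrawn 30 times per second and compares it with the raw capacity of a 28.8 kbps modem. The resolution, color depth, and frame rate are illustrative assumptions, not measurements from this test.

```python
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 2   # assumed 16-bit color
FPS = 30              # assumed game frame rate
MODEM_BPS = 28_800    # 28.8 kbps dialup, in bits per second

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
video_bytes_per_sec = frame_bytes * FPS
modem_bytes_per_sec = MODEM_BPS / 8

print(f"one frame: {frame_bytes:,} bytes")
print(f"full-screen video: {video_bytes_per_sec / 1e6:.1f} MB/s")
print(f"modem: {modem_bytes_per_sec:,.0f} bytes/s")
print(f"ratio: ~{video_bytes_per_sec / modem_bytes_per_sec:,.0f}x")
```

Even before any compression is considered, redrawing the whole screen continuously needs thousands of times more bandwidth than the link Terminal Services was tuned for.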
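To see why tile caching does little for a game, here is a toy simulation, assuming the client caches fixed-size tiles keyed by a hash of their contents and reuses any tile it has seen before. A mostly static desktop reuses almost every tile; frames where every pixel changes (a crude stand-in for a first-person view in motion) reuse essentially none. The tile size, frame generators, and `hit_rate` helper are all hypothetical.

```python
import hashlib
import random

TILE = 4  # tile edge in pixels (assumed)


def tiles(frame):
    """Yield each TILE x TILE block of a 2D frame (values 0-255) as bytes."""
    for ty in range(0, len(frame), TILE):
        for tx in range(0, len(frame[0]), TILE):
            yield bytes(v for row in frame[ty:ty + TILE] for v in row[tx:tx + TILE])


def hit_rate(frames):
    """Fraction of tiles, after the first frame, already in a content-hash cache."""
    cache, hits, total = set(), 0, 0
    for i, frame in enumerate(frames):
        for tile in tiles(frame):
            key = hashlib.sha1(tile).digest()
            if i > 0:
                total += 1
                hits += key in cache
            cache.add(key)
    return hits / total if total else 0.0


def desktop_frames(n, size=16):
    """Static backdrop; only a 'clock' pixel in the corner changes each frame."""
    frames = []
    for i in range(n):
        frame = [[0] * size for _ in range(size)]
        frame[0][0] = i % 256
        frames.append(frame)
    return frames


def game_frames(n, size=16, seed=1):
    """Every pixel changes every frame."""
    rng = random.Random(seed)
    return [[[rng.randrange(256) for _ in range(size)] for _ in range(size)]
            for _ in range(n)]


print(f"desktop: {hit_rate(desktop_frames(10)):.0%} of tiles reused")
print(f"game:    {hit_rate(game_frames(10)):.0%} of tiles reused")
```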
Still, last fall I ran across my old Quake CD. After some thought (what the heck am I thinking? I thought) I went ahead and installed it on my Win2K test server, and finally got around to trying it on a terminal connection.
Test system: Pentium MMX 233, 128 MiB RAM, Win2K SP2 server, which was also a native-mode AD domain controller. Yes, this is definitely a test box - not your average domain controller.
It turns out that a remote session running at 320x240 or so works quite well. There are several reasons this seems such an excellent match.
Surprisingly enough, the frame rate stayed above 10 fps most of the time. You can see a couple of shots below that show a turtle graphic - those are the ones where it dropped below 10. I suspect that with a slightly faster server the rate would have stayed above 10 fps throughout.
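For readers who have never seen the turtle: it is Quake's slow-frame warning. The sketch below models that kind of indicator, assuming it appears once a few consecutive frames each take at least 0.1 s (10 fps or worse) and clears as soon as a fast frame arrives. The threshold, the three-frame count, and the `TurtleIndicator` class are my assumptions about the behavior, not code taken from Quake.

```python
class TurtleIndicator:
    """Show a slow-frame warning after several consecutive slow frames (assumed rule)."""

    def __init__(self, threshold_s: float = 0.1, frames_needed: int = 3):
        self.threshold_s = threshold_s  # 0.1 s per frame == 10 fps
        self.frames_needed = frames_needed
        self.slow_streak = 0

    def update(self, frame_time_s: float) -> bool:
        """Feed one frame's duration; return True if the turtle should be drawn."""
        if frame_time_s < self.threshold_s:
            self.slow_streak = 0
            return False
        self.slow_streak += 1
        return self.slow_streak >= self.frames_needed


def fps(frame_times):
    """Average frames per second over a list of frame durations in seconds."""
    return len(frame_times) / sum(frame_times)


turtle = TurtleIndicator()
print([turtle.update(t) for t in (0.05, 0.12, 0.15, 0.2, 0.05)])
```

Requiring a short streak of slow frames keeps a single hiccup from flashing the icon.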
Below are some shots I took of the process; I put in a few notes between them.
Notice the title bar! Yes, I'm playing Quake in a web browser. Shockwave, eat your heart out.
This is a 320x240 session - note that if we go much lower, logon will be difficult. It can be done in a 320x200 session with no problems, though. I chose 320x240 since it was one of the resolutions Quake ran at. In fact, if I recall correctly, when Quake came out very few shipping PCs had the horsepower to run smoothly at much higher resolutions than this.

You can see we wind up with minimal screen real estate here, but Win2K gives me enough room for basic display needs. (Yes, it was 5:55 AM. So?)
The entryway. Yes, I'm heading for the Easy skill level. My reflexes aren't what they were in 1996.
I did a rapid turn here - turning is what would really choke frame rates.

Still, as we walk down an episode entry hall, it looks pretty good.

First ogre encounter. I missed catching the explosion by a hair, and it didn't even cause a hiccup in frame rate.

Turtling again.
Looks like Alvin forgot to set the /real-time_display_of_bullets_from_double-barreled_machine_guns optimization flag when compiling this version of the control...

I see - he optimized for /detailed_rocket_explosions_and_ogre_gore_splatter instead.

Performance actually improved after a while; note that rapid spin I just completed from the left, and no turtling. This was actually playable.
(Well, it would have been if people had stopped bugging me about slow domain logons.)

Gotta go find more rockets. I couldn't remember the cheat for it.

Can you believe this is Internet Explorer?
OK, since this was really done in the interests of determining how well TS performs in a ridiculous situation, we have an obligatory performance monitor chart.
If I had actually logged the data instead of charting it, I could have pulled out the usual irrelevant 50% of the parameters that I randomly pick when running perfmon. (This is why you always log for client reports: you can filter out the stuff you thought was important and which inevitably winds up flatlining.)
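Logging pays off because you can prune afterward. Here is a minimal sketch, assuming the log has been exported as CSV with a `Time` column followed by one column per counter; the counter values below are made up for illustration. It drops any counter whose value never changes during the run.

```python
import csv
import io

SAMPLE_LOG = """Time,Pages/sec,Output Errors,Output Compression Ratio
05:55:00,0,0,2.1
05:55:01,3,0,2.4
05:55:02,0,0,0
05:55:03,1,0,1.8
"""


def load_counters(text: str) -> dict[str, list[float]]:
    """Parse CSV text into {counter name: [samples]}, skipping the time column."""
    series: dict[str, list[float]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for name, value in row.items():
            if name == "Time":
                continue
            series.setdefault(name, []).append(float(value))
    return series


def drop_flatlines(series: dict[str, list[float]]) -> dict[str, list[float]]:
    """Keep only counters that changed at least once during the run."""
    return {name: vals for name, vals in series.items() if len(set(vals)) > 1}


active = drop_flatlines(load_counters(SAMPLE_LOG))
print(sorted(active))  # the flatlined 'Output Errors' column is gone
```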
In any case, here's the pretty picture and the discussion is below.

If you have no clue what the TS-specific counters mean, I suggest you check Q186536 - Terminal Server Performance Monitor Objects and Counters. It is supposed to apply only to NT4 TSE, but all of the counters documented there are also on Win2K. You can find direct Win2K documentation in the MSDN page on the Terminal Services Session Object.
Notice the utter lack of any significant paging. Quake did indeed run OK on a 32 MB Win95 system; the memory needs are piddling, and games generate throwaway data. The spikes are likely where I gave the server a breather and it decided to page out stuff related to unimportant processes such as DNS and File/Print.
In hindsight, these were really stupid metrics to watch - sort of.
Output Errors for TS are caused by ACK loss and malformed packets. Output timeouts are caused by noisy lines in some cases, and in high-latency networks by exceeding the internally set timeout.
Both of these are cumulative counters; all we showed here is that the cabling was probably good enough, latency wasn't an issue, and the packets were OK. Still, I used to wonder if under load a server might start sneaking behind your back and frying a few packets; apparently not.
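Because cumulative counters only ever climb, a final total says little about when anything happened during the run. A small sketch that turns cumulative samples into per-interval deltas and rates, assuming samples are taken at a fixed interval; treating a drop in the total as a counter reset that starts over from zero is an assumption on my part, and the helper names are hypothetical.

```python
def deltas(cumulative: list[int]) -> list[int]:
    """Convert cumulative counter samples into per-interval increments."""
    out = []
    for prev, cur in zip(cumulative, cumulative[1:]):
        # If the total went down, assume the counter was reset and count from zero.
        out.append(cur - prev if cur >= prev else cur)
    return out


def rate_per_sec(cumulative: list[int], interval_s: float) -> list[float]:
    """Per-second rate for each sampling interval."""
    return [d / interval_s for d in deltas(cumulative)]


print(deltas([0, 0, 2, 5, 1]))
```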
Next time I'll check the Save Screen Bitmap counters instead.
This is an interesting one. It climbs, then drops and starts over.
Output Compression Ratio is just what it says - it tells you how well data is being compressed on output. The server probably dumped the compression dictionary where the compression ratio drops to zero. Not surprisingly, this is where we get a spike in disk queue length as the system flushes things out.
This is the most interesting one. Note the continually high frame rates (not to be confused with Quake's concept of "frame rate"). That is as expected, but with my choice of counters, a lot of very active things are happening here that we are not seeing in any detail.
Everything we are dealing with is volatile. There are also going to be chaotic peaks and drops in demand on the server, and we are not catching any detail on those. So a first step would be to focus on the counters that are most active. This also means dropping the traditional checks - Average Disk Queue Length and Page Faults/sec may have made cameo appearances in every enterprise-level MCSE exam, but they don't address the reality of optimizing a Quake Terminal Server.
Another key issue here is granularity. Most servers, whether standard DCs, application servers, or Terminal Servers, have predictable daily cycles, and their variations are long-period ones. That is not the case here; we need microscopic time scales to catch the details of the action. As a result, we also have to consider the problem of bogging down the server and distorting the data if the measurements place too heavy a load on the TS. That means we can only measure a few metrics at a time, or we can only do short-period testing.
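The effect of losing the dictionary is easy to reproduce with a general-purpose compressor. The sketch below uses Python's `zlib` purely as a stand-in (RDP's own bulk compressor is not zlib), with made-up, highly repetitive "packets". One streaming compressor keeps its history across packets, so repeated content compresses well; a fresh compressor per packet starts with an empty history every time, much like a server that has just dumped its dictionary.

```python
import zlib

# Hypothetical screen-update packets with a lot of repetition between them.
packets = [b"DRAW_TEXT font=Tahoma size=8 x=%03d y=040 'Quake'" % i for i in range(50)]
raw_total = sum(len(p) for p in packets)


def streamed_size(pkts):
    """One compressor for the whole session; its history persists across packets."""
    comp = zlib.compressobj()
    return sum(len(comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH)) for p in pkts)


def reset_each_time_size(pkts):
    """A brand-new compressor per packet, as if the dictionary were dumped each time."""
    total = 0
    for p in pkts:
        comp = zlib.compressobj()
        total += len(comp.compress(p) + comp.flush())
    return total


print(f"raw bytes: {raw_total}")
print(f"ratio with persistent history: {raw_total / streamed_size(packets):.2f}")
print(f"ratio with history reset per packet: {raw_total / reset_each_time_size(packets):.2f}")
```

With packets this short, the per-packet version can even come out larger than the raw data, which is one reason a compressor that keeps history across the whole session is worth having.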
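A toy illustration of why the sampling interval matters, assuming a made-up per-second load series with short bursts: averaging over longer windows flattens exactly the bursts that matter most for a game. The load values and the `window_means` helper are hypothetical.

```python
from statistics import mean

# Synthetic per-second load (%) for one minute: 30% baseline,
# with a 2-second burst to 100% every 15 seconds.
load = [100 if t % 15 in (0, 1) else 30 for t in range(60)]


def window_means(series, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(series[i:i + window]) for i in range(0, len(series), window)]


for window in (1, 5, 60):
    peak = max(window_means(load, window))
    print(f"{window:>2}-second samples: peak seen = {peak:.1f}%")
```

The bursts are the same in every case; only the 1-second samples show them at full height.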
This produces another problem. We cannot realistically correlate data between runs, even if we use a demo loop. One alternative might be to use a heavy perfmon set, then do a very short test run. This has its own problems, though. For one thing, we would almost always be in a ramp-up or ramp-down situation, and intermittent cycling could skew the results through variations in caches.
A compromise might be this: after a few practice runs to get an idea of which parameters are closely correlated, measure a subset of related metrics. Load a demo loop a few times to stabilize the data, then do a "real" run.
A remaining weakness is that the demo segment would be identical every time, producing artificially high performance data after warmup. In a real-world game, the action is chaotic in the mathematical sense. The best possible solution would probably be to use a Quakebot so that the actual action and intensity are different each time.
Finally, some real means of comparing Quake's frame rate and macroscopic state is critical. After all, the optimization is for the delivered application - not for a server with minimized demands. What gives the smoothest gameplay? The best approach is probably to carefully synchronize the clocks on client and server, then record a demo loop and timestamp it. This will also help us see what transmission and client assembly lags are involved.
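Synchronizing client and server clocks can be done the way NTP does it: exchange timestamps, then estimate the clock offset and the round-trip delay. Below is a minimal sketch of that standard calculation, plus a helper that uses the offset to turn a server-side demo timestamp and a client-side display timestamp into a delivery lag. The function names and the millisecond values in the example are hypothetical, and the offset estimate assumes the network delay is the same in both directions.

```python
def clock_offset_and_delay(t0, t1, t2, t3):
    """NTP-style estimate from one request/response exchange.

    t0: client sends request (client clock)
    t1: server receives it (server clock)
    t2: server sends reply (server clock)
    t3: client receives reply (client clock)
    Returns (offset, delay) with offset = server clock - client clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def delivery_lag(server_event_time, client_display_time, offset):
    """Time from a server-side event to its appearance on the client,
    after converting the client timestamp to server time."""
    return (client_display_time + offset) - server_event_time


# Example (ms): client clock runs 100 ms behind the server,
# 20 ms each way on the wire, 5 ms of server processing.
offset, delay = clock_offset_and_delay(1000, 1120, 1125, 1045)
print(offset, delay)
print(delivery_lag(2000, 1950, offset))
```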