February 2020
In 2017, Brad Grantham and I finished the Alice 4, a tablet that did nothing but run late-1980s 3D graphics SGI demo programs, and we were looking for something to do next. The Alice 4 used what’s called a “fixed pipeline” GPU, meaning it could rasterize triangles but not do much else.
Modern GPUs can run fragment shaders, which are user-written programs that decide the color of each pixel individually. The wonderful website Shader Toy is a great demonstration of what’s possible. The 3D scene is a single 2D rectangle that fills the screen, and the 3D look comes entirely from the procedural “texture” on that rectangle. This rock, by Alexander Alekseev, is a great example, and one of the more difficult shaders we were trying to display:
We decided that the Alice 5, like the Alice 4, would be a tablet with an ARM CPU and FPGA GPU, but instead of the traditional 3D fixed pipeline, it would only run fragment shaders. In fact, it would do nothing but browse and run Shader Toy shaders!
The plan was to implement a RISC-V processor in Verilog and instantiate as many of these as we could on the Cyclone V FPGA. Brad amazingly got this done in almost no time, quite the feat considering it was his first Verilog project. He also wrote a RISC-V emulator to compare results to.
My job was to get the GLSL shader compiler working. We started with Khronos Group’s SPIR-V front-end, which generated an intermediate representation of the shader in what’s called Static Single Assignment (SSA) form, meaning that it pretends it has infinite immutable registers. My compiler back end took this intermediate form and generated RISC-V assembly language. The difficult part was converting the infinite number of registers to a finite number of them, spilling the rest to memory as necessary.
This turns out to be a notoriously difficult problem. In fact no mainstream compiler uses SSA (the idea is too new), so although it’s generally considered a superior form for the intermediate code, there’s relatively little research on how to actually assign registers from it. After months of reading papers and implementing complex data structures and algorithms, I got pretty decent results.
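To make the register-assignment problem concrete, here is a minimal sketch of one well-known approach, linear scan allocation, applied to straight-line SSA code. This is an illustration only: the post does not say which algorithm the Alice 5 compiler used, and the instruction format, the names `live_intervals` and `linear_scan`, and the register names `r0`, `r1` are all hypothetical. It also ignores control flow and phi nodes, and does not generate the loads and stores that spilled values would need.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    name: str   # SSA value name
    start: int  # index of the instruction that defines it
    end: int    # index of the last instruction that uses it


def live_intervals(program):
    """program is a list of (dest, [operands]) in SSA form (hypothetical format)."""
    start, end = {}, {}
    for i, (dest, operands) in enumerate(program):
        for op in operands:
            end[op] = i            # latest use seen so far
        start[dest] = i            # SSA: each value is defined exactly once
        end.setdefault(dest, i)    # a value that is never used dies immediately
    intervals = [Interval(n, start[n], end[n]) for n in start]
    return sorted(intervals, key=lambda iv: iv.start)


def linear_scan(intervals, num_regs):
    """Assign registers in order of definition, spilling when none are free."""
    free = [f"r{i}" for i in reversed(range(num_regs))]  # pop() yields r0 first
    active = []      # intervals currently holding a register, sorted by end
    reg = {}         # value name -> register
    spilled = set()  # value names that must live in memory
    for iv in intervals:
        # Release registers of values whose last use is strictly before this point.
        while active and active[0].end < iv.start:
            free.append(reg[active.pop(0).name])
        if free:
            reg[iv.name] = free.pop()
            active.append(iv)
        else:
            # Spill whichever value stays live longest: it blocks a register
            # for the most instructions.
            victim = active[-1]
            if victim.end > iv.end:
                reg[iv.name] = reg.pop(victim.name)
                spilled.add(victim.name)
                active[-1] = iv
            else:
                spilled.add(iv.name)
        active.sort(key=lambda a: a.end)
    return reg, spilled


# Five SSA values competing for two registers.
program = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),
    ("d", ["a", "c"]),
    ("e", ["b", "d"]),
]
intervals = live_intervals(program)
reg, spilled = linear_scan(intervals, num_regs=2)
print(reg, spilled)
```

Even in this toy case, two of the five values end up in memory. A real back end also has to reserve scratch registers for reloading spilled values, which is part of what makes the problem hard.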
I also wrote a RISC-V assembler, in case we wanted to add our own instructions to the architecture. (We didn’t, in the end.) We succeeded in getting the project working: a multi-core RISC-V-based GPU that ran Shader Toy shaders!
Our performance goal was to hit five frames per second for complex shaders (like the rock above) and 30 frames per second on simple ones (like my constructive solid geometry test below), at a resolution of 720×480.
At that we completely failed, and by a large margin. Our frame rate was so low that there were simply no optimizations or improvements that could plausibly get us to anything remotely interactive. We estimated that a few “easy” optimizations (faster hardware clock rate, some CPU pipelining) could get us a factor of four, and difficult optimizations (custom ASIC, specialized instructions, better optimizations in the compiler) might get us another factor of 30. Even with all those, the rock above would still only display one frame every three seconds. We’d still be a factor of 15 away from a slow but interactive frame rate.
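The arithmetic behind those factors can be checked in a few lines. This sketch uses only the numbers quoted in the text. The frame time before any optimization is derived from them, not a measurement reported in the post, and the variable names are made up for illustration.

```python
# Figures quoted in the text.
EASY_SPEEDUP = 4                 # faster clock rate, some CPU pipelining
HARD_SPEEDUP = 30                # custom ASIC, specialized instructions, better compiler
SECONDS_PER_FRAME_AFTER = 3.0    # rock shader with every optimization applied
TARGET_FPS = 5.0                 # performance goal for complex shaders

# The two sets of optimizations multiply.
combined_speedup = EASY_SPEEDUP * HARD_SPEEDUP

# Derived, not measured: frame time implied before any optimization.
implied_seconds_per_frame_now = SECONDS_PER_FRAME_AFTER * combined_speedup

# How much faster than "one frame every three seconds" the goal still is.
remaining_gap = TARGET_FPS * SECONDS_PER_FRAME_AFTER

print(f"combined speedup: {combined_speedup}x")
print(f"implied current frame time: {implied_seconds_per_frame_now / 60:.0f} minutes")
print(f"remaining gap to {TARGET_FPS:.0f} fps: {remaining_gap:.0f}x")
```

In other words, the quoted figures imply roughly six minutes per frame for the rock before any optimization, and a total shortfall of 120 × 15 = 1800 times relative to the five-frames-per-second goal.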
What’s a bit embarrassing is that we could have predicted this before starting. We could have estimated the number of instructions per frame for these shaders (about 200 million) and the number of cores we could fit on an FPGA (about 5), and known that we were impossibly far from having a tablet with interactive animations. I don’t know why we didn’t do a simple Fermi estimate. Perhaps we were too confident after our previous successes.
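A Fermi estimate of that kind might look like the following sketch. It uses only the figures from the text (about 200 million instructions per frame, about 5 cores, and the five-frames-per-second goal for complex shaders). The function names are hypothetical, and the per-core rate a real soft core could sustain is deliberately left as a parameter, since the post does not give one.

```python
INSTRUCTIONS_PER_FRAME = 200_000_000   # rough figure from the text
CORES = 5                              # rough figure from the text
TARGET_FPS = 5                         # goal for complex shaders


def required_per_core_rate(instr_per_frame, fps, cores):
    """Instructions per second each core must sustain to hit the target."""
    return instr_per_frame * fps / cores


def frames_per_second(instr_per_frame, cores, per_core_rate):
    """Frame rate achieved if every core sustains per_core_rate instructions/s."""
    return cores * per_core_rate / instr_per_frame


needed = required_per_core_rate(INSTRUCTIONS_PER_FRAME, TARGET_FPS, CORES)
print(f"each core must sustain {needed / 1e6:.0f} million instructions per second")
```

With these inputs every core would have to retire about 200 million instructions per second, with no stalls, just to reach the slow end of the goal. Plugging a measured per-core rate into `frames_per_second` gives the frame rate to expect instead.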
To make matters worse, my part of the project was beyond frustrating. Whereas the FPGA experience on Alice 4 was near-effortless, the chip we used for the Alice 5 (just a later revision of the same chip) never worked reliably. The demo program provided by the vendor worked fine, but almost any modification caused it to lock up.
After a year of work, Brad and I shelved the project. It was the first project we’d worked on that we ended not because we were bored of it, or because we’d achieved our goals, but because there was simply no path to success. I used to worry that my personal projects weren’t ambitious enough, but this experience put that fear to rest!
The source code is available on GitHub. Brad made a slide presentation with many more details.