In graduate school at UNC we made a graphics computer called the Pixel-Planes. I wrote a simple ray tracer for it, which as far as I know is the world’s first real-time SIMD ray tracer. Below is the homework paper I wrote for Turner Whitted's graphics class.
The Pixel Planes 5 architecture provide dozens of renderers, which are grids of 128x128 processors that map to regions of the screen. These SIMD processors are ideal for ray tracing, since each can claim the ray assigned to its pixel and trace the ray's route through the model. This project investigates the capabilities and limitations of a ray tracer implemented on these renderers.
Pixel Plane 5 renderers are Single Instruction Multiple Data (SIMD) arrays of 128x128 (16384) processors. Each processor has 208 bits of local memory and 4096 bits of additional, slower memory. All instructions are bit-serial, and an Enable register can mask out some processors from operations. Renderers are assigned to parts of the screen, often doing multiple parts serially since there are fewer renderers than regions on a screen.
In addition to the 208 bits of local memory, each processor also has 4096 bits of Backing Store, which can be accessed in 32-bit blocks. Access to this memory is significantly slower than access to the local memory, but accesses are asynchronous and can overlap other operations.
The renderers are fed SIMD instructions from the Graphics Processors, which are i860's. In this program, the i860's largely sit idle, although they might be a good source of processing power if optimizations were to be later added to the program.
The algorithm used is the standard screen-space subdivision ray tracing, where parts of the screen (in this case, pixels) are assigned to different processors for parallel processing. Each processor computes the initial ray depending on its position on the screen, then traces the ray into the model, intersecting with the objects, reflecting and shading as necessary.
The outline for each processor's algorithm is as follows:
procedure Trace-Ray(ray) for each generation do min-t = Max-Int for each sphere do Enable this processor t = intersect ray with sphere if t is imaginary or t < 0 or t > min-t then Disable this processor end if min-t = t point = ray × t normal = (point − sphere.center) / radius Shade based on normal dotted with light vector ray = reflection of ray Disable this pixel for future calculations in this generation end for end for end procedure
The projection can be orthogonal, where the tail of the initial ray is variable but the direction is constant, or perspective, where the opposite is true.
The biggest challenge with this implementation was the limited memory on the renderers—208 bits per processor. This limit, along with the desire to have at least a 1280x1024 display (and thus allow object positions anywhere in this range), limited the implementation of the algorithm to integer arithmetic; fixed-point would have taken too many additional bits. Fixed-point is used at one point, as noted below, but the ray parametric parameters, the model description, and most intermediate values are kept as 12 or 24 bit integers.
With most integers being on the order of 500, or 29, problems arose when calculating parts of the intersection formula. Namely, the coefficients of the quadratic equation were on the order of the squares of the integers, or 218, and they themselves are squared in the determinant of the quadratic formula, to yield numbers on the order of 236. The square root function needs an equal number of bit for the input, the output, and some temporary space, so well over half the memory needs to be free just for the square root operation!
This problem led to the use of the Backing Store. The ray, which is kept in six 12-bit integers, is swapped out to the Backing Store after the coefficients of the quadratic equation are calculated. It stays there until the parametric value T is calculated, whereupon it is swapped back in and the intersection point is found. The Backing Store is used a few other times to save important values while their space is used for temporary variables.
To increase interactivity, the scene is rendered at one-quarter the resolution (640x512) while the joystick is active; when the joystick stops, the full resolution (1280x1024) is rendered.
Although the original idea was to use fixed-point for all variables, some problems arose with the calculation of the parametric value T. When the normal of the sphere is calculated, 12 bits are used to store each of the normal's components, and the normal is simply kept as the length of the sphere's radius. This yields a normal which has length on the order of 200. When such a ray is intersected with another sphere which is on the order of 400 units away, the parametric value T truncates to 0, 1, or 2. To fix this, an additional 6 bits were added to the right of T's binary point, which significantly helped the accuracy of the reflections. This bias is removed after the value is plugged back into the ray.
It would be interesting, memory willing, to introduce statistically-based random jittering of reflection rays so as to simulate different surface properties. To minimize the noise in the resulting image, each pixel would have to be oversampled several dozen times; since we are not compute-bound on the renderers, this may be possible to do at a reasonable rate. Some problems may arise in keeping this averaged information in memory, and also in distributing random values to the processors in a way that does not reveal renderer boundaries as artifacts. A successful implementation of such path tracing might yield fairly decend and fast global illumination for simple scenes.
The renderers of Pixel Planes 5 are fairly poorly suited for ray tracing applications that desire anything more that the very basic of features. Future architectures, namely Pixel Flow with 256 bytes per pixel, will be able to not only render much more complex scenes at faster speeds, but will be able to use the extra memory to allow the incorporation of other standard ray tracing features, such as shadows, specular highlights, and transparency.