New approaches and possibilities in computer graphics using high level shading languages
Karlis Salitis, Student, Cs
Daugavpils University, Daugavpils, Latvia
Computer graphics is widely used in many areas of our work and entertainment, and there are many approaches to choose from when rendering an image. Our task is to concentrate on the largest sector of computer graphics, PC graphics, and learn more about it.
In this research we look at new approaches in the graphics pipeline that offer both hardware acceleration and great flexibility in creating effects.
Along the way, we also want to find out which solutions currently exist and what improvements we might expect in the future, from both software and hardware.
Computer graphics is advancing faster than ever. Impressive effects are blended into movies, a large number of cartoons are created entirely on computers, and graphics in games is starting to approach movie-like quality.
Although computer graphics might seem to be a straightforward science, there are many ways to achieve the same result, and the approach you choose often changes the rendered output radically.
Two forces drive the graphics industry. The first is the look of the rendered image (“If it looks like computer graphics, it is not good computer graphics”, Jeremy Birn); the second is the amount of time needed to produce that image. The ratio between the two is usually the main factor in deciding which approach to use.
The best quality is usually achieved with ray tracing, radiosity or photon mapping, as these methods take into account not only the object being drawn but also its surroundings, and so handle reflections, refractions and other phenomena effectively. Unfortunately, they are overkill for everyday tasks, since rendering a single frame on a standard PC might take weeks. These approaches are mostly used in movie production and digital imaging.
PC graphics, on the other hand, is usually needed for simulating processes or for entertainment. Here we are not looking for high-quality still images; we need the PC to render images in real time, which means updating the image at least 10 times per second. Visual flaws usually go unnoticed as long as the motion itself looks smooth.
The task of this paper is not to analyze every approach. Instead, we concentrate on the most widely used one and show how it can be used more efficiently by taking advantage of recent improvements in hardware and software. Computer clusters are clearly not widespread enough to count as widely used, so we stay with PC graphics.
There are many ways to do graphics on a PC, from simple raster graphics up to ray-traced images, but as you may have guessed, most of these approaches are either too limited or too slow to be useful. There is, however, a middle ground that offers the best of both worlds: the two industry-supported graphics application programming interfaces, OpenGL and DirectX. These interfaces are nothing more than two large sets of graphics commands, but their power lies in hardware support and worldwide use.
Let's take a look at the principles behind them. Both interfaces share a common structure: graphics tasks are ordered in a pipeline, which describes what happens to the data we pass in, and in what order. Geometric data is passed in vector form along with parameters that describe how it should be handled and which calculations must be applied. The resulting image is rendered to the screen or to a chosen pixel buffer.
The graphics pipeline can be divided into three stages, application, geometry and rasterizer, each responsible for a specific task.
The application stage is the entry point. It is the core of our application, where we decide which data must be drawn, set up what we do or do not want to see, and handle physics calculations and user input. Basically, we feed the next stage with the data we want to draw and specify how it should be handled further.
The geometry stage works only with vertex data passed from the previous stage and is subdivided into five sub-stages (Fig. 1).
The first sub-stage transforms all object coordinates relative to the eye position for the subsequent lighting calculations. The second sub-stage applies the Phong lighting model to every vertex lit by one or more lights; the resulting colors are eventually passed on to the rasterizer stage. The projection sub-stage transforms the geometry coordinates again, this time into a unit cube that represents the viewing volume. The clipping sub-stage makes sure that no geometry lies outside this volume after projection: vertices outside it are discarded, and new vertices are created where primitives cross its boundary. Finally, the screen mapping sub-stage stretches the unit cube so that its corners match screen coordinates, and the data is passed on.
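To make the sub-stages concrete, here is a minimal sketch of what happens to a single vertex, written in Python purely for illustration. It assumes an OpenGL-style projection matrix laid out like gluPerspective, column vectors, and a model-view matrix that is only a translation; the helper names are hypothetical. Lighting is left as a comment, and clipping is reduced to a per-vertex test, whereas real clipping works on whole primitives and creates new vertices.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """Model-view matrix that only moves the object (no rotation)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (same layout as gluPerspective)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def geometry_stage(vertex, modelview, projection, width, height):
    """Return (x, y, depth) in window coordinates, or None if the vertex is clipped."""
    eye = mat_vec(modelview, list(vertex) + [1.0])   # 1. model -> eye space
    # 2. per-vertex lighting would be evaluated here on `eye` (omitted)
    x, y, z, w = mat_vec(projection, eye)            # 3. projection -> clip space
    if w <= 0 or not all(-w <= c <= w for c in (x, y, z)):
        return None                                  # 4. clipping (vertex test only)
    nx, ny, nz = x / w, y / w, z / w                 # perspective divide -> unit cube
    return ((nx + 1.0) * 0.5 * width,                # 5. screen mapping
            (ny + 1.0) * 0.5 * height,
            (nz + 1.0) * 0.5)                        # depth in [0, 1]
```

For example, `geometry_stage((0, 0, 0), translation(0, 0, -5), perspective(90, 1, 1, 100), 640, 480)` maps the origin, five units in front of the eye, to the centre of a 640 x 480 window, while the same vertex placed behind the eye with `translation(0, 0, 5)` is rejected.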
Finally, the rasterizer stage takes primitives one by one and converts them into pixels. It also assigns color values from a texture if applicable, multiplies them by the light intensity, and outputs the pixel if it is not occluded. Blending operations may also occur during this stage.
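The following Python sketch illustrates the rasterizer stage under simplifying assumptions: triangles arrive already in window coordinates, each vertex carries a color that stands in for "texture color times light intensity", and coverage is tested at pixel centres with edge functions. The buffer layout and function names are hypothetical; the depth test uses the convention that a smaller depth is closer.

```python
def make_buffers(width, height):
    """Depth buffer cleared to 'infinitely far' and a black color buffer."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    return depth, color

def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p); the sign says which side of a->b p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, colors, depth, color):
    """Rasterize one triangle given as three (x, y, z) window-space vertices.

    colors holds one (r, g, b) per vertex, e.g. a texture color already
    multiplied by the lit intensity. Returns the number of pixels written."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return 0                                       # degenerate triangle
    written = 0
    for py in range(len(depth)):
        for px in range(len(depth[0])):
            cx, cy = px + 0.5, py + 0.5                # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, cx, cy) / area   # barycentric weights
            w1 = edge(x2, y2, x0, y0, cx, cy) / area
            w2 = edge(x0, y0, x1, y1, cx, cy) / area
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue                               # centre lies outside the triangle
            z = w0 * z0 + w1 * z1 + w2 * z2
            if z >= depth[py][px]:
                continue                               # depth test: pixel is occluded
            depth[py][px] = z
            color[py][px] = tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(*colors))
            written += 1
    return written
```

Drawing a farther triangle over a nearer one writes no pixels, because every fragment fails the depth test; this is the "output pixel if it is not occluded" step described above.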
Such an architecture is very convenient, since the programmer mostly has to worry about the application stage, and the API does the rest. However, there are cases where this model is very limiting. If you want to replace the API lighting function with a more precise one, there is simply no way to do so. This is why this approach is often referred to as the fixed-function pipeline.
There is one field, though, in which the fixed-function pipeline excels, and that is optimization. Because the pipeline always works the same way, the performance of each stage can be optimized. Moreover, an n-stage pipeline can give up to an n-fold speedup, because once the first stage has finished its job and passed the result to the second stage, it can start on the next chunk of data without stalling.
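The "up to n times" figure can be checked with a little arithmetic. The sketch below assumes every stage takes the same time per item and that there are no stalls, which is an idealization. Without pipelining, k items through n stages take k·n steps; with pipelining the first item needs n steps and then one item finishes every step, giving n + k − 1 steps.

```python
def sequential_time(items, stages, stage_time=1.0):
    """Without pipelining each item passes through all stages before the next starts."""
    return items * stages * stage_time

def pipelined_time(items, stages, stage_time=1.0):
    """With pipelining the first item needs `stages` steps to fill the pipeline;
    after that, one finished item leaves the pipeline every step."""
    return (stages + items - 1) * stage_time

def speedup(items, stages):
    return sequential_time(items, stages) / pipelined_time(items, stages)

for items in (1, 10, 1000, 1_000_000):
    print(items, round(speedup(items, 3), 3))
```

For the three-stage pipeline used above, the speedup is 1.0 for a single item, 2.5 for ten items, and approaches 3 as the number of items grows, which is why the gain is "up to" n rather than exactly n.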
If you have paid attention to computer prices, you might have noticed that a high-end video card is quite an expensive PC component. The reason is that video cards are becoming more complicated as they take over more tasks, whole stages of the pipeline, from the CPU.
Today it is hard to imagine the CPU being responsible for rasterization, as it is one of the most time-demanding tasks, yet during the 1990s this was common practice.
As we mentioned earlier, the APIs are industry-supported. In practice this means that hardware manufacturers implement more and more features in hardware and supply the code for the API functions with their drivers. When we call an API function, we are calling a function supplied with our hardware, and if the video card can perform the task without the CPU, it does so.
There is one more thing that makes hardware optimization of the pipelined approach very simple. Unlike many mathematical problems, pixels and vertices do not affect each other directly. Take rasterization, for example: once a geometric primitive has been rasterized into pixels, we can process all of them at the same time if we wish, as the only things that matter for each pixel are its own texture coordinates and light intensity. Even when one pixel occludes another, the occluded one is simply discarded later during the depth test. It is therefore a common trend among hardware manufacturers to increase the number of pixel and vertex pipelines on their graphics chips. For example, the NVIDIA GeForce 6800 GT/Ultra can process 6 vertices and 16 pixels in parallel, giving a performance boost that would be impossible to achieve even on a 20 GHz CPU.
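The independence argument can be illustrated with a small Python sketch. It is an assumption-laden model, not how a GPU is programmed: each fragment is a hypothetical (texel color, light intensity) pair, the per-fragment function reads nothing but its own inputs, and fragments are handed out in groups of 16 as a chip with 16 pixel units might do. Because no fragment depends on another, the grouped result is identical to the serial one.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(fragment):
    """Per-fragment work: uses only this fragment's own texel and light intensity."""
    texel, intensity = fragment
    return tuple(min(1.0, c * intensity) for c in texel)

def shade_serially(fragments):
    return [shade(f) for f in fragments]

def shade_in_groups(fragments, units=16):
    """Hand fragments out in groups of `units` and shade each group concurrently.
    pool.map returns results in input order, so the output order is preserved."""
    results = []
    with ThreadPoolExecutor(max_workers=units) as pool:
        for start in range(0, len(fragments), units):
            results.extend(pool.map(shade, fragments[start:start + units]))
    return results

fragments = [((0.2, 0.4, 0.8), (i % 10) / 5.0) for i in range(100)]
same = shade_in_groups(fragments) == shade_serially(fragments)
```

Python threads will not actually make this faster; the point of the sketch is only that the work can be split arbitrarily without changing the result, which is what lets hardware add more pixel units.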
To better understand the trends and problems in graphics development, we should take a brief look at the history of video cards.
According to the classification used in the book “The Cg Tutorial”, video cards have gone through five generations. Cards produced before 1990 could be counted as an additional generation, but since they served only as a storage area for the monitor to read from, they are not.
Accordingly, all cards produced after VGA controllers and up to 1998 belong to the first generation. Their only task was to accelerate rasterization by offloading it from the CPU to the video card.
The second generation appeared around 1999 and can be recognized by the marketing buzzword T&L, which stands for transform and lighting. These cards implemented the geometry stage in hardware, leaving the CPU to handle only the application stage.
The third generation came with an innovative idea: allowing the fixed-function pipeline in the geometry stage to be reprogrammed, effectively replacing the transformation and lighting calculations while still executing the program on the video card. We know these cards as the GeForce3/4, excluding the MX series. They also offered limited configurability of pixel processing, which allowed some calculations that the fixed-function rasterizer stage was not designed for.
Building on the success of the third generation, the fourth generation of cards appeared. Given the effects achievable with the programmable vertex pipelines of third-generation cards, it was clear that similar flexibility would be more than welcome at the rasterizer stage too, and the first programmable pixel pipelines appeared. The best-known card of this generation is the Radeon 9700/9800, as it was the first on the market and ran fragment programs reasonably fast. Cards of this generation are simple to use but limited feature-wise. There are precision and speed issues, but the main flaws are a highly limited instruction count and the absence of true branching. Performance also suffered with longer fragment programs.
Finally, this year a new generation of cards appeared in the form of the GeForce 6 series. These cards have roughly twice the shading power of the previous generation, true branching, floating-point texture formats with filtering, and the ability to access textures during the geometry stage for techniques such as displacement mapping. There are some flaws, but it is clear that programmable pipelines are now the key focus for hardware manufacturers.
If we use a vertex program in the geometry stage, it is executed for every vertex we pass. Similarly, if a fragment program is enabled, it is executed for every fragment, even if that fragment is occluded later. Newer cards implement an approach known as early Z, which moves the depth test for fragments in front of the fragment program, so a pixel can be rejected without running the program for it. However, early Z does not work with fragment programs that change the pixel depth value on the fly, or when blending is enabled.
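The effect of early Z on the amount of fragment-program work can be modelled by simply counting invocations. In this Python sketch each fragment is a hypothetical (pixel, depth) pair in submission order, smaller depth means closer, and, following the text above, early Z is switched off when the program writes depth or blending is enabled. For simplicity it assumes that a depth-writing program leaves the depth unchanged.

```python
def fragment_program_runs(fragments, early_z, writes_depth=False, blending=False):
    """Count how often the fragment program would execute for (pixel, depth) pairs."""
    depth_buffer = {}
    runs = 0
    test_early = early_z and not writes_depth and not blending
    for pixel, z in fragments:
        passes = z < depth_buffer.get(pixel, float("inf"))
        if test_early and not passes:
            continue                 # rejected before the program is run
        runs += 1                    # the fragment program executes
        if passes:
            depth_buffer[pixel] = z  # depth write after the (late) depth test
    return runs

front_to_back = [((0, 0), 0.2), ((0, 0), 0.5), ((0, 0), 0.9)]
back_to_front = list(reversed(front_to_back))
```

With early Z, three overlapping fragments drawn front to back cost one program run instead of three, while the same fragments drawn back to front still cost three. This is why the benefit of early Z depends on drawing order.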
Parallel processing remains in place when the pipelines are reprogrammed, so with 16 pixel units, 16 fragments are processed simultaneously.
There is a suspicion that early exits from branches might not work as expected on the current generation of hardware, and that execution will wait until all parallel pixel units have finished their job.
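The suspected behaviour can be expressed as a simple cost model. The Python sketch below assumes, as the paragraph above suspects, that the pixel units in a group run in lockstep, so a group stays busy until its slowest unit finishes; the per-fragment costs are hypothetical instruction counts. It is a model of the suspicion, not a measurement of any particular card.

```python
def lockstep_cost(lane_costs, group_size=16):
    """Assumed model: units in a group execute in lockstep, so a group is busy
    until its slowest unit finishes; an early exit in one unit saves nothing."""
    return sum(max(lane_costs[i:i + group_size])
               for i in range(0, len(lane_costs), group_size))

def independent_cost(lane_costs, group_size=16):
    """For comparison: units that could retire independently and be refilled,
    i.e. perfect load balancing across the group."""
    return sum(lane_costs) / group_size
```

With four units, one fragment that takes a 10-instruction branch while three others exit after 2 instructions costs `lockstep_cost([10, 2, 2, 2], group_size=4) == 10`, exactly as much as if all four took the long branch, whereas independent retirement would average 4.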
Now let's look at what happens when we replace the fixed-function pipeline. As we have already seen, our program can replace vertex or pixel processing, effectively overriding the algorithms built into the API. There is a catch, though: if we replace part of the pipeline, we must also take care of every function that part was performing. For example, if we use fog in our application, we must remember to calculate the fog coordinate in the vertex program, so that the pixel pipeline, if it is not overridden, can complete the calculation. If we use textures in our pixel program, the vertex unit must supply their lookup coordinates, or we must calculate them on the fly.
There are also some requirements that must not be forgotten. When the vertex pipeline is reprogrammed, the vertex program must always output a clip-space vertex position for the following stages, and the fragment program, in turn, must set the pixel color and, optionally, its depth. Each of these operations takes just one line of code, but they are vital for the program to work.
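These obligations can be summarized as an interface. The Python sketch below is hypothetical (real vertex and fragment programs would be written in Cg, HLSL or the OpenGL Shading Language): a vertex program must always produce a clip-space position and, depending on what later stages use, also a texture coordinate and a fog coordinate, while a fragment program must produce a color and may produce a depth. Using the absolute eye-space z as the fog coordinate is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = Sequence[Sequence[float]]

def mul(m: Mat4, v: Vec4) -> Vec4:
    """4x4 row-major matrix times column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

@dataclass
class VertexOut:
    clip_position: Vec4                             # always required
    texcoord: Optional[Tuple[float, float]] = None  # required if the fragment stage samples a texture
    fog_coord: Optional[float] = None               # required if fixed-function fog is still enabled

@dataclass
class FragmentOut:
    color: Vec4                                     # always required
    depth: Optional[float] = None                   # optional: None keeps the interpolated depth

def vertex_program(position, texcoord, modelview: Mat4, projection: Mat4) -> VertexOut:
    eye = mul(modelview, (*position, 1.0))
    return VertexOut(clip_position=mul(projection, eye),
                     texcoord=texcoord,
                     fog_coord=abs(eye[2]))         # eye-space distance along the view axis

def fragment_program(texel: Vec4, intensity: float) -> FragmentOut:
    r, g, b, a = texel
    return FragmentOut(color=(r * intensity, g * intensity, b * intensity, a))

def check_vertex_out(out: VertexOut, fog_enabled: bool, fragment_samples_texture: bool) -> None:
    """Raise if the vertex program left out something a later stage depends on."""
    if out.clip_position is None or len(out.clip_position) != 4:
        raise ValueError("vertex program must output a clip-space position")
    if fog_enabled and out.fog_coord is None:
        raise ValueError("fog is enabled but no fog coordinate was written")
    if fragment_samples_texture and out.texcoord is None:
        raise ValueError("fragment program samples a texture but has no texcoord")
```

A vertex program that writes only the position passes the check as long as fog and texturing are off, and fails it as soon as either is enabled, which mirrors the fog example in the text.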
When the first programmable video cards appeared, the biggest problem was that they were difficult to use. The APIs were not designed to support pipeline programming from the beginning, and the first solutions were awkward: the only way to reprogram part of the pipeline was to write the program in card-specific assembly and upload it through API extension calls.
This issue was first addressed by NVIDIA, which created Cg, one of the first compilers to accept C-like programs and, depending on the target hardware, compile them into matching assembly. Cg programs can also be compiled at application runtime.
Afterwards, it became clear that there should be an easy way to use a high-level shading language from within the API itself. Microsoft therefore implemented its HLSL in DirectX, and the OpenGL developers agreed to add functions for uploading program source code to the driver, which is responsible for compiling it into native assembly.
This compilation is performed on the CPU during the application stage.
As we have seen, to use programmable pipeline replacements, the whole chain must be in place: you need a working program, geometry, and a buffer to output to. More generally, this means writing the whole program from scratch while keeping in mind how and where the programmable paths will fit in. Such an approach is not always acceptable, as often you do not want to, or do not know how to, write graphics programs using the APIs.
Fortunately, there are solutions to this problem. Free packages are available that let you use programmable graphics through high-level shading languages directly from popular 3D modeling applications such as Maya and 3ds Max, as well as packages that let you experiment with different shading algorithms. More information can be found on the program home pages at the NVIDIA developer site.
Can and should we always use reprogrammed pipelines? Often the API calls do just fine, and if you can live with them, use them. According to discussions on the www.beyond3d.com and www.opengl.org boards, on the newest hardware the driver replaces the fixed-function pipeline with equivalent vertex and fragment programs anyway, so technically there is no difference.
Also, if you look at the implementation of the fixed-function pipeline in the Cg language, you will see that it includes many things that are not always needed, so the calculations can often be optimized or improved.
Let's look at the lighting equation implemented in the fixed-function pipeline and at ways to improve it. The fixed-function lighting model is very simple and easy to compute and understand. To speed things up, it is evaluated during the geometry stage, and the light intensity is interpolated across pixels later on. This approximation works fine for well-tessellated geometry, but image quality suffers badly for primitives with a large area or for lights very close to an object. An easy way to increase lighting quality is to move the lighting equation from the geometry stage to the rasterizer stage. However, because lighting is one of the most important aspects of computer graphics, a great deal of research has been done in this field, and there are far better methods than simply performing the lighting calculations per pixel.
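The failure case described above, a large primitive with a light very close to it, is easy to reproduce numerically. The Python sketch below uses only the diffuse (Lambert) part of the lighting model and a hypothetical scene: a 10 x 10 quad facing +z with a point light 0.5 units above its centre. It compares per-vertex lighting with interpolation against evaluating the same equation per pixel. For simplicity it interpolates bilinearly across the quad; real hardware interpolates per triangle, but at the centre of this symmetric quad the result is the same.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def diffuse(point, normal, light_pos):
    """Lambert term N.L for a point light (the diffuse part of the Phong model)."""
    to_light = normalize(tuple(l - p for l, p in zip(light_pos, point)))
    return max(0.0, sum(n * t for n, t in zip(normal, to_light)))

def bilerp(c00, c10, c11, c01, u, v):
    """Interpolate four corner values across the quad, u and v in [0, 1]."""
    return (c00 * (1 - u) * (1 - v) + c10 * u * (1 - v)
            + c11 * u * v + c01 * (1 - u) * v)

# Hypothetical scene: corners listed as (u, v) = (0, 0), (1, 0), (1, 1), (0, 1).
CORNERS = [(-5.0, -5.0, 0.0), (5.0, -5.0, 0.0), (5.0, 5.0, 0.0), (-5.0, 5.0, 0.0)]
NORMAL = (0.0, 0.0, 1.0)
LIGHT = (0.0, 0.0, 0.5)

def per_vertex(u, v):
    """Fixed-function style: light the corners, then interpolate the intensity."""
    c00, c10, c11, c01 = (diffuse(c, NORMAL, LIGHT) for c in CORNERS)
    return bilerp(c00, c10, c11, c01, u, v)

def per_pixel(u, v):
    """Evaluate the lighting equation at the surface point itself."""
    point = (-5.0 + 10.0 * u, -5.0 + 10.0 * v, 0.0)
    return diffuse(point, NORMAL, LIGHT)

print(per_vertex(0.5, 0.5), per_pixel(0.5, 0.5))
```

Directly under the light the per-pixel result is 1.0 (fully lit), while the interpolated per-vertex value is only about 0.07, because every corner sees the light at a grazing angle: the bright highlight in the middle of the quad simply disappears.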
The first thing you can do is implement bump mapping, proposed by Jim Blinn, at the rasterizer stage. This approach gives surfaces with low tessellation a bumpy appearance by supplying per-pixel normal data in a texture instead of using normals interpolated from the geometry stage. Bump mapping gives much better quality, and its implementation is quite simple.
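As a rough illustration of the per-pixel idea, the Python sketch below derives a tangent-space normal from a height map with central differences and uses it in the diffuse term. This is a simplified variant that assumes a flat underlying surface and a light direction already expressed in tangent space; the `strength` parameter and the function names are hypothetical, and in practice the normals are usually precomputed into a normal-map texture rather than derived in the shader.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def normal_from_height(height, x, y, strength=1.0):
    """Tangent-space normal at texel (x, y) from central differences of a height map.
    Texels on the border reuse their clamped neighbours."""
    rows, cols = len(height), len(height[0])
    dhdx = (height[y][min(x + 1, cols - 1)] - height[y][max(x - 1, 0)]) * 0.5
    dhdy = (height[min(y + 1, rows - 1)][x] - height[max(y - 1, 0)][x]) * 0.5
    return normalize((-dhdx * strength, -dhdy * strength, 1.0))

def bump_diffuse(height, x, y, light_dir, strength=1.0):
    """Per-pixel N.L using the perturbed normal instead of an interpolated one."""
    n = normal_from_height(height, x, y, strength)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

ramp = [[0.0, 1.0, 2.0]] * 3   # height rises along +x
```

On the `ramp` height map the normal at the centre texel tilts toward −x, so a light coming from the −x side lights that texel fully while a light from the +x side leaves it dark, even though the underlying geometry is perfectly flat.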
Better approaches built on bump mapping also exist. Two of the most promising are parallax mapping and relief mapping using a binary search in the pixel shader. These approaches offset the lookups into the base texture and the normal map depending on the view vector, improving the appearance of bumpiness even further.
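Both techniques boil down to computing a shifted texture coordinate before the color and normal lookups. The Python sketch below shows the idea under stated assumptions. `parallax_uv` uses the common form uv + h·V.xy/V.z with a scaled and biased height, where V is the normalized surface-to-eye vector in tangent space; the scale and bias values are assumed example values. `relief_uv` marches the view ray through a depth map (0 at the top surface, 1 at the deepest point), first with a coarse linear search and then with a binary search, the usual combination for relief mapping. `depth_at` stands in for a texture lookup and is hypothetical.

```python
def parallax_uv(uv, view_ts, height, scale=0.04, bias=-0.02):
    """Basic parallax mapping: shift uv along the view direction in proportion
    to the scaled and biased height sampled at uv (view_ts z must be > 0)."""
    vx, vy, vz = view_ts
    h = height * scale + bias
    return (uv[0] + h * vx / vz, uv[1] + h * vy / vz)

def relief_uv(uv, view_ts, depth_at, depth_scale=0.1, linear_steps=16, binary_steps=8):
    """Relief mapping: find where the view ray first goes below the depth map."""
    vx, vy, vz = view_ts
    # Texture-space offset covered while the ray descends from depth 0 to depth 1.
    dx, dy = -vx / vz * depth_scale, -vy / vz * depth_scale

    def at(t):
        return uv[0] + t * dx, uv[1] + t * dy

    lo, hi = 0.0, 1.0
    for i in range(1, linear_steps + 1):       # linear search: first point below the surface
        t = i / linear_steps
        if t >= depth_at(*at(t)):
            lo, hi = (i - 1) / linear_steps, t
            break
    for _ in range(binary_steps):              # binary search: refine between above and below
        mid = 0.5 * (lo + hi)
        if mid >= depth_at(*at(mid)):
            hi = mid
        else:
            lo = mid
    return at(hi)
```

When looking straight down (view vector (0, 0, 1)) neither function moves the texture coordinate; at an oblique angle, a surface at constant depth 0.5 makes `relief_uv` return a coordinate shifted away from the viewer by half of the full ray offset, and that shifted coordinate is then used for both the base texture and the normal map.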
We would like to stress that these approaches do not just make the rendered image better; they can also help a lot with modeling tasks. If you have a high-quality height map, you can use parallax or relief mapping to create the impression of geometry made of millions of polygons, while all you actually need is a single quad and a texture map. Moreover, height maps are easy to draw in any paint program, and a shader, once created, can be reused later.
But lighting is just a small part of what you can do. We recommend visiting sites such as ShaderTech to learn more and to see images rendered in real time on a PC through reprogrammed pipelines.
High-quality graphics on a PC within a reasonable amount of time is only possible using one of the graphics APIs. The APIs provide all the basic functions you might need when working with graphics, but because of speed constraints their functionality is limited to some extent and tied to the pipeline architecture. Then again, thanks to high demand and hardware improvements, there is now a way to move beyond the fixed pipeline model and gain more flexibility and opportunities.
Implementing new algorithms that replace parts of the fixed pipeline makes it clear that the programmable approach gives much better image quality and can greatly simplify modeling tasks. There are also three widely known shading languages that let you express your ideas relatively easily.
For 3D content creators and artists there are free software packages that let them concentrate on shading problems rather than programming tasks.
Since shading language compilers are now core elements of API and driver development, we believe that compiled code will become even more optimized with each new revision.
On the hardware side, it is clear that shading power is the main target for hardware manufacturers. We can also expect branching and pixel-discarding capabilities to improve in the next generation of hardware, allowing us to use even more sophisticated and demanding effects.
Real-Time Rendering, Second Edition; Tomas Akenine-Möller, Eric Haines; A K Peters, 2002.
OpenGL; http://www.opengl.org/
Matrix and Quaternion FAQ; http://skal.planet-d.net/demo/matrixfaq.htm
Phong lighting model.
Cg Toolkit and documentation.
FX Composer; http://developer.nvidia.com/object/fx_composer_home.html
Fixed-function pipeline in Cg.
Bump mapping; http://www.tweak3d.net/articles/bumpmapping/
Parallax mapping; www.infiscape.com/doc/parallax_mapping.pdf
Relief mapping with binary search.
ShaderTech; http://www.shadertech.com/