Programmable [[vertex]] and fragment [[shaders]] were added to the graphics pipeline to enable game programmers to generate even more realistic effects. Vertex shaders allow the programmer to alter per-vertex attributes, such as position, color, texture coordinates, and normal vector. Fragment shaders are used to calculate the color of a [[Fragment (computer graphics)|fragment]], that is, a candidate pixel. Programmable fragment shaders allow the programmer to substitute, for example, a lighting model other than the graphics card's default, which is typically simple [[Gouraud shading]]. Shaders have enabled graphics programmers to create lens effects, [[displacement mapping]], and [[depth of field]].
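As a rough illustration of what a programmable fragment shader makes possible, the Python sketch below evaluates a simple Lambertian diffuse term once per fragment from that fragment's (interpolated) normal, instead of interpolating colors that were lit at the vertices, as Gouraud shading does. It runs on the CPU; the function names are hypothetical, and a real shader would be written in a shading language such as GLSL or HLSL.

```python
import math

def normalize(v):
    """Scale a 3-component vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert_fragment(normal, light_dir, base_color):
    """Hypothetical per-fragment lighting model: color * max(N . L, 0).

    `normal` is the surface normal interpolated to this fragment, and
    `light_dir` points from the surface toward the light.
    """
    n = normalize(normal)
    l = normalize(light_dir)
    n_dot_l = max(sum(a * b for a, b in zip(n, l)), 0.0)
    return tuple(n_dot_l * c for c in base_color)

# A fragment whose normal points straight up, lit from directly above.
print(lambert_fragment((0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.25)))
# (1.0, 0.5, 0.25)
```

Because the lighting is recomputed for every fragment, highlights and falloff can vary inside a triangle, which per-vertex lighting followed by interpolation cannot reproduce.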
===Data types===
For many years graphics cards only supported paletted or integral color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes an additional alpha value is added, to be used for transparency. Common formats are:
*8 bits per pixel - Palette mode, where each value is an index into a table with the real color value specified in one of the other formats. Alternatively, the 8 bits can encode the color directly, commonly as 3 bits for red, 3 bits for green, and 2 bits for blue.
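The two ways of using an 8-bit pixel can be sketched in Python as below: a palette lookup, and a direct packing of the channels into one byte. The function names are hypothetical, and the bit layout chosen here (3 bits red, 3 bits green, 2 bits blue, with red in the high bits) is an assumption for illustration; hardware formats differ.

```python
def palette_lookup(indices, palette):
    """Palette mode: each 8-bit value is an index into a table of real colors."""
    return [palette[i] for i in indices]

def pack_rgb332(r, g, b):
    """Pack 8-bit-per-channel RGB into one byte: 3 bits red, 3 green, 2 blue."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)

def unpack_rgb332(byte):
    """Expand a 3-3-2 byte back to approximate 8-bit channels."""
    r = (byte >> 5) & 0b111
    g = (byte >> 2) & 0b111
    b = byte & 0b11
    # Rescale each field so its maximum value maps to 255.
    return (r * 255 // 7, g * 255 // 7, b * 255 // 3)
```

Packing discards the low bits of each channel, so `unpack_rgb332(pack_rgb332(r, g, b))` only approximates the original color.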
GPU operations work in a vectorized fashion: a single operation can be performed on up to four values at once. For instance, if one color <R1, G1, B1> is to be modulated by another color <R2, G2, B2>, the GPU can produce the resulting color <R1*R2, G1*G2, B1*B2> in a single operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of this, vector ([[SIMD]]) instructions have also been added to CPUs.
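The modulation example can be written out explicitly. On the GPU all four products come from one vector instruction; the Python sketch below, with a hypothetical `modulate` function and 4-component (RGBA) tuples, only spells out what that single instruction computes.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

def modulate(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise product, as one 4-wide vector multiply would compute it."""
    return (a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3])

# Modulate an RGBA color by a 50% gray filter; alpha stays at 1.0.
print(modulate((1.0, 0.5, 0.25, 1.0), (0.5, 0.5, 0.5, 1.0)))
# (0.5, 0.25, 0.125, 1.0)
```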
==GPGPU programming concepts==
GPUs are designed specifically for graphics and thus are very restrictive in terms of operations and programming. Because of this, GPUs are effective only at tackling problems that can be solved using [[Stream processing]], and the hardware can be used only in certain ways.
===Stream processing===
GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors - processors that can operate in parallel by running a single [[Kernel (computer science)|kernel]] on many records in a stream at once.
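The defining property of a stream kernel, that each invocation reads only its own record, can be sketched in Python. Here `concurrent.futures.ThreadPoolExecutor.map` merely stands in for the GPU's parallel hardware (CPython threads give no GPU-like speedup), and the kernel itself is a hypothetical example.

```python
from concurrent.futures import ThreadPoolExecutor

def kernel(record):
    """Hypothetical kernel: reads only its own record and produces one result.

    Because no invocation depends on another, the records can be processed
    in any order, or all at once, without changing the output.
    """
    x, y = record
    return x * x + y * y  # squared distance from the origin

stream = [(3, 4), (1, 1), (0, 2), (5, 12)]

# Executor.map returns results in input order regardless of scheduling.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(kernel, stream))

print(results)  # [25, 2, 4, 169]
```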
Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.
===GPU programming concepts===
====Computational resources====
There are a variety of computational resources available on the GPU:
*Programmable processors - Vertex and fragment pipelines allow the programmer to run a kernel on streams of data
Instead of the framebuffer, the programmer can direct output to a write-only texture. This is accomplished through Render-To-Texture (RTT), Copy-To-Texture (CTT), or the more recent framebuffer_objects. The texture is write-only while it is being rendered to, but once the operation is complete it can be switched for use as input.
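Because a texture is not read and written in the same pass, iterative algorithms commonly alternate between two textures, a pattern often called "ping-ponging". The Python sketch below models the two textures as lists; `run_pass` and `ping_pong` are hypothetical names, and the three-cell smoothing kernel is only an example.

```python
def run_pass(src, dst):
    """One rendering pass: read only from `src`, write only to `dst`.

    Example kernel: average each cell with its left and right neighbors,
    clamping reads at the edges.
    """
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0

def ping_pong(initial, passes):
    """Alternate two buffers so each pass's output becomes the next pass's input."""
    tex_a = list(initial)          # texture currently bound as input
    tex_b = [0.0] * len(initial)   # texture currently bound as render target
    for _ in range(passes):
        run_pass(tex_a, tex_b)
        tex_a, tex_b = tex_b, tex_a  # swap roles once the pass is complete
    return tex_a
```

Swapping references rather than copying mirrors the GPU practice of rebinding textures between passes instead of moving their contents.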
====Textures as stream====
The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.
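Even data that is naturally one-dimensional is often laid out in a 2D texture so that it fits this model. A minimal sketch of the index translation, assuming row-major layout and a hypothetical texture width `WIDTH`:

```python
WIDTH = 4  # hypothetical texture width in texels

def index_to_texel(i, width=WIDTH):
    """Map a 1D stream index to (x, y) coordinates in a row-major 2D texture."""
    return (i % width, i // width)

def texel_to_index(x, y, width=WIDTH):
    """Inverse mapping from texture coordinates back to the 1D index."""
    return y * width + x

def to_grid(stream, width=WIDTH):
    """Lay a 1D stream out as rows of a 2D grid, padding the last row with 0."""
    height = -(-len(stream) // width)  # ceiling division
    grid = [[0] * width for _ in range(height)]
    for i, value in enumerate(stream):
        x, y = index_to_texel(i, width)
        grid[y][x] = value
    return grid

print(to_grid([1, 2, 3, 4, 5, 6]))  # [[1, 2, 3, 4], [5, 6, 0, 0]]
```

On real hardware the integer texel coordinates would additionally be converted to the API's texture-coordinate convention, for example normalized coordinates sampled at texel centers.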
On the GPU, the programmer specifies only the body of the loop, as the kernel, and what data to loop over, by drawing geometry. For example, to run the kernel over the entire grid, the programmer draws a full-screen quad, which creates a fragment for each grid cell (i.e. each pixel).
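The division of labor can be made concrete with a CPU-side Python sketch. `draw_full_screen_quad` is a hypothetical stand-in for rasterization: its loop nest is what the GPU performs implicitly and in parallel, while `add_kernel` (also hypothetical) is the loop body the programmer actually writes.

```python
def draw_full_screen_quad(width, height, kernel, inputs):
    """Stand-in for rasterizing a full-screen quad: one fragment per grid cell."""
    return [[kernel(x, y, inputs) for x in range(width)] for y in range(height)]

def add_kernel(x, y, inputs):
    """Hypothetical kernel: the former loop body, run once per fragment."""
    a, b = inputs
    return a[y][x] + b[y][x]

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
print(draw_full_screen_quad(2, 2, add_kernel, (a, b)))  # [[11, 22], [33, 44]]
```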
=====Flow control=====
In regular programs it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs. Before then, conditional writes could be accomplished using a series of simpler instructions, but looping and conditional branching were not possible.
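A conditional write can be expressed without a branch by computing both candidate values and blending them with a 0-or-1 mask. The Python sketch below mimics this with local versions of `step` and `mix`, which follow the conventions of the GLSL built-ins of the same names; `clamp_above` is a hypothetical example, and this is a CPU emulation rather than GPU code.

```python
def step(edge, x):
    """0.0 if x < edge, else 1.0 (same convention as GLSL's step)."""
    return float(x >= edge)

def mix(x, y, a):
    """Linear blend x*(1-a) + y*a (same convention as GLSL's mix)."""
    return x * (1.0 - a) + y * a

def clamp_above(value, limit):
    """Branch-free form of: if value > limit: value = limit."""
    # Both outcomes are available; the mask selects one without a jump.
    mask = step(limit, value)  # 1.0 where value >= limit
    return mix(value, limit, mask)

print([clamp_above(v, 1.0) for v in (0.25, 1.0, 3.0)])  # [0.25, 1.0, 1.0]
```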
Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various techniques, such as static branch resolution, pre-computation, and Z-cull{{ref|survey}}, can be used to achieve branching when hardware support does not exist.
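Static branch resolution, in one common formulation, moves the branch out of the kernel and into the choice of geometry: instead of one kernel that tests whether each cell lies on the grid boundary, the interior and the border are drawn separately, each with its own branch-free kernel. The CPU-side Python sketch below assumes a grid of at least 3x3 cells and, purely for illustration, a boundary rule that holds border values fixed; all names are hypothetical.

```python
def interior_kernel(grid, x, y):
    """Branch-free kernel for interior cells: average of the four neighbors."""
    return (grid[y][x - 1] + grid[y][x + 1] + grid[y - 1][x] + grid[y + 1][x]) / 4.0

def boundary_kernel(grid, x, y):
    """Branch-free kernel for border cells (assumed rule: values stay fixed)."""
    return grid[y][x]

def border_cells(w, h):
    """Cells covered by drawing the four one-cell-wide edges of the grid."""
    for x in range(w):
        yield x, 0
        yield x, h - 1
    for y in range(1, h - 1):
        yield 0, y
        yield w - 1, y

def step_simulation(grid):
    """One pass as two 'draw calls', each with its own kernel and no per-cell test."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    # Draw the interior rectangle with the interior kernel.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = interior_kernel(grid, x, y)
    # Draw the border with the boundary kernel.
    for x, y in border_cells(w, h):
        out[y][x] = boundary_kernel(grid, x, y)
    return out
```

Neither kernel ever asks "am I on the edge?"; that decision is made once, by which geometry each kernel is drawn over.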
===GPU techniques===
====Map====
The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU: the programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The resulting stream, of the same size, is stored in the output buffer.
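The brightness example can be sketched as a map in Python. `gpu_map` and `brighten` are hypothetical names, and the clamp to 1.0 reflects the assumption that the output is a fixed-point render target, which cannot store larger values.

```python
def brighten(pixel, factor=1.5):
    """Map kernel: scale one intensity by a constant, clamped to at most 1.0."""
    return min(pixel * factor, 1.0)

def gpu_map(kernel, stream):
    """One output element per input element; no element reads another."""
    return [kernel(p) for p in stream]

image = [0.0, 0.25, 0.5, 0.8]
print(gpu_map(brighten, image))  # [0.0, 0.375, 0.75, 1.0]
```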
Some computations require calculating a smaller stream (possibly a stream of only one element) from a larger stream. This is called a reduction of the stream. Generally a reduction is accomplished in multiple steps: the results of the previous step are used as the input for the current step, and the range over which the operation is applied is halved each step until only one stream element remains. For two-dimensional problems the output size is halved in both directions, leaving one quarter as many elements as in the previous step.
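The multi-pass scheme above, for a square grid whose side is a power of two, can be sketched in Python. Each call to `reduce_pass` stands for one rendering pass whose output is one quarter the size of its input; the function names are hypothetical, and `op` is assumed to be associative (for example addition or `max`).

```python
def reduce_pass(grid, op):
    """One pass: each output cell combines a 2x2 block of the input."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [[op(op(grid[2 * y][2 * x], grid[2 * y][2 * x + 1]),
                op(grid[2 * y + 1][2 * x], grid[2 * y + 1][2 * x + 1]))
             for x in range(w)]
            for y in range(h)]

def reduce_grid(grid, op):
    """Repeat passes until one element remains; assumes a 2^k x 2^k grid."""
    passes = 0
    while len(grid) > 1:
        grid = reduce_pass(grid, op)
        passes += 1
    return grid[0][0], passes
```

An n x n grid therefore needs log2(n) passes; a 1024 x 1024 grid, for instance, is reduced in 10.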
====Stream filtering====
Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criterion.
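One widely used way to implement filtering on parallel hardware, named here as one option rather than as the method this article describes, is stream compaction with a prefix sum (scan): each element is flagged as kept or discarded, an exclusive scan of the flags gives every kept element its output position, and a scatter writes it there. The Python sketch below runs the three steps sequentially; its names are hypothetical, and `accumulate(..., initial=0)` requires Python 3.8 or later.

```python
from itertools import accumulate

def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = values[0] + ... + values[i-1]."""
    return list(accumulate(values, initial=0))[:-1]

def compact(stream, keep):
    # Step 1 (map): flag each element 1 if it passes the predicate, else 0.
    flags = [1 if keep(v) else 0 for v in stream]
    # Step 2 (scan): the exclusive sum of the flags is each kept element's slot.
    slots = exclusive_scan(flags)
    count = slots[-1] + flags[-1] if stream else 0
    # Step 3 (scatter): each kept element writes itself to its own slot.
    out = [None] * count
    for v, f, s in zip(stream, flags, slots):
        if f:
            out[s] = v
    return out
```

The scan is what makes the output size and positions computable in parallel, which is why filtering behaves like a non-uniform reduction rather than a map.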
The search operation allows the programmer to find a particular element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel.
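The point that parallelism comes from running many searches rather than from speeding up one can be sketched in Python. Each query is an independent binary search, a stand-in for the work of one fragment; the function names are hypothetical, and binary search over a sorted stream is chosen only for illustration.

```python
from bisect import bisect_left

def search_kernel(sorted_data, query):
    """One independent search: index of `query` in `sorted_data`, or -1."""
    i = bisect_left(sorted_data, query)
    return i if i < len(sorted_data) and sorted_data[i] == query else -1

def run_searches(sorted_data, queries):
    # Each iteration is independent; on a GPU each would be one fragment.
    return [search_kernel(sorted_data, q) for q in queries]

data = [2, 3, 5, 7, 11, 13]
print(run_searches(data, [7, 4, 13, 2]))  # [3, -1, 5, 0]
```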
====Data structures====
A variety of data structures can be represented on the GPU:
*Dense Arrays
*[http://graphics.stanford.edu/~mhouston/public_talks/R520-mhouston.pdf Slideshow for ATI GPGPU physics demonstration] by Stanford graduate student Mike Houston. See p. 13 for an overview of how conventional program tasks map to GPU hardware.
*[http://techreport.com/onearticle.x/8887 Tech Report article: "ATI stakes claims on physics, GPGPU ground"] by Scott Wasson
[[Category:Computational science]]