Whoa there! Someone has been putting wacky-backy in your pipe, me thinks.

Originally posted by Dan
GeForce3
350 MHz RAMDAC
64/128/256MB
7.36GB per second memory
5 Gpixels per second (3.2 Gpixels FSAA)
The GeForce3 is still a "4 pixel per clock" engine, which means at 200MHz it can output 800Mpixels per second. Period. It can also apply up to 2 textures per pixel, which gives a "top speed" of 1.6Gtexels per second. Pixels and texels are different things when quoting graphics processing "power".
Place that against a GeForce2 Ultra, which is also a "4 pixel per clock" engine but runs at 250MHz, and is therefore capable of 1Gpixel per second. Again, it can apply up to 2 textures per pixel, giving a "top speed" of 2Gtexels per second.
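To make those peak figures concrete, here is the same arithmetic as a quick Python sketch (nothing measured, just the clock and pipeline numbers quoted above):

```python
# Peak fill-rate = core clock * pixels per clock; texel rate assumes 2 textures per pixel.
def peak_rates(clock_mhz, pixels_per_clock, textures_per_pixel):
    mpixels = clock_mhz * pixels_per_clock      # Mpixels/s
    mtexels = mpixels * textures_per_pixel      # Mtexels/s
    return mpixels, mtexels

print(peak_rates(200, 4, 2))   # GeForce3:       (800, 1600) -> 0.8 Gpixel/s, 1.6 Gtexel/s
print(peak_rates(250, 4, 2))   # GeForce2 Ultra: (1000, 2000) -> 1.0 Gpixel/s, 2.0 Gtexel/s
```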
It's faster than the GeForce3?!?! The core rendering engine is, yes. But these figures are not the end of the story. We need to look around the main graphics engine, at the "support" chips: graphics memory.
You really need to examine the memory bandwidth available to the graphics processor.
Both the GeForce2 and GeForce3 use a 128bit bus to their graphics memory (the GeForce3 has a more efficient memory controller, allowing better use of the available bandwidth). Using DDR memory at 230/460MHz (standard for the GeForce2 Ultra and GeForce3), both have 7.4GB/s of memory bandwidth (rounded). Sounds like a lot, doesn't it? Not when you consider that to achieve 1Gpixel per second, drawing 32bit colour (4 bytes) with a 32bit Z-buffer (4 bytes), you need 1Gpixel/s * 8 bytes/pixel = 8GB/s of memory bandwidth. Oops. And that is ignoring any memory reads (Z-buffer reads, and colour reads if you are alpha-blending), and any texture reads (in case you actually want to texture the pixels you are rendering).
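The same sums in a couple of lines, just to show the shortfall (a rough sketch, using the figures above):

```python
# Available: 128-bit bus at an effective 460 MHz (230 MHz DDR).
available_gb_s = (128 / 8) * 460e6 / 1e9     # ~7.36 GB/s

# Required: 1 Gpixel/s of 4-byte colour + 4-byte Z writes, ignoring all reads and textures.
required_gb_s = 1e9 * (4 + 4) / 1e9          # 8.0 GB/s

print(available_gb_s, required_gb_s)         # 7.36 vs 8.0 -> memory-bandwidth limited
```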
What this means is that both the GeForce2 Ultra and the GeForce3 are limited by the speed of their memory architectures. When both are heavily stressed, the GeForce3 comes out on top by virtue of its more efficient use of the memory bandwidth it has available. But when not heavily stressed, the GeForce2 Ultra can beat the GeForce3 in speed, as seen in various reviews/benchmarks.
Now let's turn to the PS2. Where the GeForce2/3 has a 128bit bus to its graphics memory, the PS2's Graphics Synthesizer has a 2560bit bus to its embedded graphics memory. That is no typing mistake, it really is 2560 bits wide: a 1024bit read bus, a 1024bit write bus and a 512bit texture bus.
A bit of maths: 2560 bits = 320 bytes (128 bytes read, 128 bytes write, 64 bytes texture). 150MHz * 320 bytes = 48GB/s of memory bandwidth. That is a massive figure, achieved by using embedded DRAM (hence the ability to use such massive bus widths).
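Restating that in Python:

```python
# GS embedded DRAM buses: 1024-bit read + 1024-bit write + 512-bit texture, 150 MHz clock.
bytes_per_clock = (1024 + 1024 + 512) // 8   # 320 bytes per clock
print(150e6 * bytes_per_clock / 1e9)         # 48.0 GB/s
```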
Why is it 128 bytes read / 128 bytes write? Well, the PS2 can render "16 pixels per clock". If each pixel is 32bit colour with 32bit depth, that is 8 bytes per pixel (as above). 8 bytes * 16 = 128 bytes. And Sony allocated that for both reading and writing, so Z-buffering and alpha-blending have full bandwidth too.
And finally, 150MHz * "16 pixels per clock" = 2.4Gpixel/s. Note that is "pixel", not "texel". That is way beyond any PC rendering speed, and it has the bandwidth to actually achieve that speed as well.
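And the fill-rate sum, again just the figures from above:

```python
# 16 pixel pipelines at 150 MHz, each writing a 4-byte colour and a 4-byte Z value.
print(150e6 * 16 / 1e9)   # 2.4 Gpixel/s (untextured)
print(16 * (4 + 4))       # 128 bytes per clock -- exactly what the 1024-bit write bus carries
```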
The unfortunate thing about the PS2 GS is that when texturing is enabled, the fill-rate drops to 1.2Gpixel/s (or 1.2GTexel/s, as it can only do single texturing). They seem to have optimised the rendering read/write process, but not the texture fetching process.
Theory is a wonderful place: everything works in Theory.

Originally posted by Dan
Now for the polygons per second, you can't quite put a number on it. Because the GPU is programmable, the number of polygons depends on how efficiently the person programmed the GPU. It can theoretically (according to nVidia) push 125 million.
Well, let's take a look at the numbers, shall we?
Let's assume that we are drawing an infinite triangle-strip, so that on average only one vertex needs to be sent to the GPU for every triangle (the other two vertices are already in the GPU from previous triangles). The vertex consists of homogeneous co-ordinates (X, Y, Z, and W), one set of texture co-ordinates (S & T, or U & V, depending on your favourite API), and one colour (ARGB). Each of those values is a 32bit data variable (float for everything, apart from the ARGB8888 32bit word). 32bit = 4 bytes.
4 bytes * 7 = 28 bytes. 28 bytes per vertex, that is. Which also means 28 bytes per triangle, based on our generous assumption.
Okay, we are drawing 125 million triangles per second, so we need 125Mtris * 28 bytes/tri = 3.5Gbyte/sec of bandwidth.
Erm, AGP4x peaks out at 32bit * 66MHz * 4 = 1Gbyte/sec. That means we can only get 1/3.5 = 29% of the required bandwidth. So instead of 125 million triangles, we end up with 29% of 125 million triangles, which is 36 million triangles/sec.
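Putting that in Python, under the same assumptions (28-byte vertices, one new vertex per triangle on a long strip, AGP4x rounded to 1Gbyte/sec as above):

```python
bytes_per_vertex = 7 * 4                 # X, Y, Z, W, S, T as floats + one packed ARGB word
claimed_tris_per_s = 125e6
needed_gb_s = claimed_tris_per_s * bytes_per_vertex / 1e9   # 3.5 GB/s

agp4x_gb_s = 1.0                         # 32bit * 66MHz * 4 transfers/clock, rounded

achievable = claimed_tris_per_s * agp4x_gb_s / needed_gb_s
print(round(achievable / 1e6))           # ~36 million triangles per second over AGP4x
```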
Now then, sort out the bandwidth bottle-necks in the PC architecture, and you may be able to approach that theoretical figure ... did anyone mention XBox?
Okay, this is all really a rough hack through the figures, just trying to put a perspective on things. It's really in the hands of the developers that each platform comes to life, regardless of what the platform is capable of.