1 - 2 of 26 Posts

· Premium Member
538 Posts
Originally posted by Dan
7.36GB per second memory
5 Gpixels per second (3.2 Gpixels FSAA)
Whoa there! Someone has been putting wacky-backy in your pipe, methinks :)
The GeForce3 is still a "4 pixel per clock" engine, which means at 200MHz, it can output 800Mpixels per second. Period. It can also apply up to 2 textures per pixel, which gives a "top speed" of 1.6Gtexels per second. Pixels and texels are different things when quoting graphics processing "power".
Place that against a GeForce2 Ultra, which is also a "4 pixel per clock" engine, but runs at 250MHz, and is therefore capable of 1Gpixels per second. Again, it can apply up to 2 textures per pixel, giving a "top speed" of 2Gtexels per second.
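For anyone who wants to poke at those fill-rate figures, here is a minimal Python sketch of the arithmetic. The function and argument names are made up for illustration; the pipeline counts, clocks and texture counts are the ones quoted above.

```python
# Peak fill-rate arithmetic from the figures quoted above.
# The function name and argument names are illustrative only.

def peak_fill_rates(pipelines, clock_mhz, textures_per_pixel):
    """Return (Mpixels/s, Mtexels/s) for an idealised pixel engine."""
    mpixels = pipelines * clock_mhz            # pixels per clock * clocks per second
    mtexels = mpixels * textures_per_pixel     # each pixel can carry several texels
    return mpixels, mtexels

geforce3 = peak_fill_rates(pipelines=4, clock_mhz=200, textures_per_pixel=2)
geforce2_ultra = peak_fill_rates(pipelines=4, clock_mhz=250, textures_per_pixel=2)

print("GeForce3:       %d Mpixels/s, %d Mtexels/s" % geforce3)
print("GeForce2 Ultra: %d Mpixels/s, %d Mtexels/s" % geforce2_ultra)
```

These are peak numbers only; they say nothing about whether the memory can keep up.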
It's faster than the GeForce3?!?! The core rendering engine is, yes. But these figures are not the end of the story. We need to look around the main graphics engine, at the "support" chips: graphics memory.
You really need to examine the bandwidth to memory that the graphics processor has.
Both the GeForce2 and GeForce3 use 128bit memory buses to main memory (the GeForce3 has a more efficient bus controller, allowing better use of the available memory bandwidth). Using DDR memory at 230/460MHz (standard for GeForce2 Ultra and GeForce3), both have 7.4GB/s of memory bandwidth (rounded). Sounds a lot, doesn't it? Not when you think that to achieve 1Gpixel per second, drawing 32bit colour (4 bytes) with 32bit ZBuffer (4 bytes), you need 1Gpixel/s * 8bytes/pixel = 8GB/s memory bandwidth. Oops. And that is ignoring any memory reads (the ZBuffer, plus the frame buffer if you are alpha-blending), and any texture reads (in case you actually want to texture the pixels you are rendering).
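The same comparison as a rough Python sketch, counting bytes rather than bits. The helper names are hypothetical, and the model only covers colour and Z writes, exactly as in the paragraph above.

```python
# Available vs required memory bandwidth, in GB/s (1 GB = 1e9 bytes here).
# Helper names are illustrative; only frame-buffer writes are modelled.

def bus_bandwidth_gb_s(bus_bits, effective_mhz):
    """Peak bandwidth of a memory bus at its effective (DDR) data rate."""
    return (bus_bits / 8) * effective_mhz * 1e6 / 1e9

def write_bandwidth_needed_gb_s(gpixels_per_s, colour_bytes, z_bytes):
    """Bandwidth needed just to write colour and Z for every pixel."""
    return gpixels_per_s * (colour_bytes + z_bytes)

available = bus_bandwidth_gb_s(bus_bits=128, effective_mhz=460)
needed = write_bandwidth_needed_gb_s(gpixels_per_s=1.0, colour_bytes=4, z_bytes=4)

print("Available: %.2f GB/s" % available)
print("Needed:    %.2f GB/s" % needed)
print("Memory-bound before any reads or texturing:", needed > available)
```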
What this means is that both the GeForce2 Ultra and the GeForce3 are limited by the speed of their memory architectures. When both are heavily stressed, the GeForce3 comes out on top by virtue of its more efficient usage of the memory bandwidth it has available. But when not heavily stressed, the GeForce2 Ultra can beat the GeForce3 in speed, as seen in various reviews/benchmarks.

Now let's turn to the PS2. Where the GeForce2/3 has a 128bit bus to main memory, the PS2 has a 2560bit bus to main memory. That is no typing mistake, it really is 2560bits wide. It has a 1024bit read bus, it has a 1024bit write bus and a 512bit texture bus.
A bit of maths - 2560bits = 320bytes (128bytes read, 128bytes write, 64bytes texture). 150MHz * 320bytes = 48GB/s memory bandwidth. That is a massive figure, achieved by using embedded DRAM (hence the ability to use massive bus widths).
Why is it 128bytes read/128bytes write? Well, the PS2 can render "16 pixels per clock". If each pixel is 32bit colour, with 32bit depth, that is 8bytes per pixel (as above). 8bytes * 16 = 128bytes. And Sony allocated that for both reading and writing, so zbuffering and alpha-blending have full bandwidth too.
And finally, 150MHz * "16 pixels per clock" = 2.4Gpixel/s. Note that is "pixel" not "texel". That is way beyond any PC rendering speed, and it has the bandwidth to actually achieve that speed as well.
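Putting the PS2 numbers above into a short Python sketch (the function name and return layout are made up for illustration; the bus widths, clock and pixels-per-clock are the figures quoted above):

```python
# PS2 GS bandwidth and fill-rate arithmetic, using the figures quoted above.
# The function name and the tuple it returns are illustrative only.

def gs_figures(clock_mhz=150, read_bits=1024, write_bits=1024,
               texture_bits=512, pixels_per_clock=16, bytes_per_pixel=8):
    bus_bytes = (read_bits + write_bits + texture_bits) // 8   # bytes moved per clock
    bandwidth_gb_s = clock_mhz * bus_bytes / 1000              # MB/s -> GB/s
    fill_gpixels_s = clock_mhz * pixels_per_clock / 1000       # Mpixels/s -> Gpixels/s
    bytes_written_per_clock = pixels_per_clock * bytes_per_pixel
    return bus_bytes, bandwidth_gb_s, fill_gpixels_s, bytes_written_per_clock

bus_bytes, bandwidth, fill, per_clock = gs_figures()

print("Bus width:        %d bytes" % bus_bytes)
print("Bandwidth:        %.1f GB/s" % bandwidth)
print("Peak fill rate:   %.1f Gpixels/s" % fill)
print("Bytes per clock for 16 pixels: %d (write bus is %d bytes)" % (per_clock, 1024 // 8))
```

The last line is the point of the 128byte buses: 16 pixels at 8 bytes each fills the write bus exactly, every clock.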

The unfortunate thing about the PS2 GS is that when texturing is enabled, the fill-rate drops to 1.2Gpixel/s (or 1.2GTexel/s, as it can only do single texturing). They seem to have optimised the rendering read/write process, but not the texture fetching process.

Now for the polygons per second, you can't quite make a number. Because the GPU is programmable, the number of polygons depends on how efficiently the person programmed the GPU. It can theoretically (according to nVidia) push 125 million.
Theory is a wonderful place: everything works in Theory ;)
Well, let's take a look at the numbers, shall we? :)
Let's assume that we are drawing an infinite triangle-strip, so that on average, only one vertex needs to be sent to the GPU for every triangle (the other two vertices are already in the GPU from previous triangles). The vertex consists of homogeneous co-ordinates (X, Y, Z, and W), one set of texture co-ordinates (S & T, or U & V, depending on your favourite API), and one colour (ARGB). Each of those values is a 32bit data variable (float for everything, apart from the ARGB8888 32bit word). 32bit = 4bytes.
4 bytes * 7 = 28 bytes. 28 bytes per vertex, that is. Which also means 28 bytes per triangle, based on our generous assumption.
Okay, we are drawing 125 million triangles per second, so we need 125Mtris * 28 bytes/tri = 3.5Gbyte/sec bandwidth.
Erm, AGP4x peaks out at 32bit*66MHz*4 = 1Gbyte/sec (rounded). That means we can only get 1/3.5 = 29% of the required bandwidth. So instead of 125 million triangles, we end up with 29% of 125 million triangles, which is about 36 million triangles/sec.
Now then, sort out the bandwidth bottle-necks in the PC architecture, and you may be able to approach that theoretical figure ... did anyone mention XBox? ;)
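The triangle-rate estimate above, as a small Python sketch. All names are made up for illustration, and it keeps the same generous assumption of one new vertex per triangle and the rounded 1Gbyte/sec AGP4x figure.

```python
# Bus-limited triangle rate, using the assumptions from the post above.
# All names here are illustrative, not from any real API.

BYTES_PER_VALUE = 4               # every value is a 32bit word
VALUES_PER_VERTEX = 4 + 2 + 1     # X, Y, Z, W + S, T + packed ARGB

def bytes_per_triangle(new_vertices_per_triangle=1):
    """One new vertex per triangle models the infinite triangle-strip case."""
    return BYTES_PER_VALUE * VALUES_PER_VERTEX * new_vertices_per_triangle

def bus_limited_triangles(bus_bytes_per_sec, claimed_tris_per_sec):
    """Triangle rate the bus can actually feed, capped at the claimed peak."""
    return min(claimed_tris_per_sec, bus_bytes_per_sec / bytes_per_triangle())

claimed = 125e6                   # nVidia's theoretical figure
required_gb = claimed * bytes_per_triangle() / 1e9
agp4x = 1e9                       # rounded; 4 bytes * 66MHz * 4 is nearer 1.06e9
achievable = bus_limited_triangles(agp4x, claimed)

print("Bytes per triangle: %d" % bytes_per_triangle())
print("Required bandwidth: %.1f GB/s" % required_gb)
print("Achievable:         %.1f Mtris/s" % (achievable / 1e6))
```

Switch `new_vertices_per_triangle` to 3 (independent triangles, no strip) and the achievable figure drops to a third of that again.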

Okay, this is all really a rough hack through the figures, just trying to put a perspective on things. It's really in the hands of the developers that each platform comes to life, regardless of what the platform is capable of.

· Premium Member
538 Posts
Originally posted by Adair
Wow Lewpy, you really know a lot about that stuff. Thanks for all the info. So if a PS2 can do all that, then why aren't the games a lot more impressive? Is it the unoptimised texture fetching process, or are they just not using the full capability of their hardware on purpose?
I always check out technology/hardware when it comes out. My background is electronics, so highspeed chip technology always gets my interest ;)
Anyway, from what I understand, here is the main problem with the PS2 architecture: 4MB of VRAM.
If the game is running at 640x480 in 32bit colour, you are looking at at least 3 buffers that size: 2 display buffers, 1 ZBuffer. 640x480x32bit = 1.17MB. 1.17MB*3 = 3.51MB ... that leaves a whopping 0.49MB for textures .... gee whiz.
I don't know the real details of the PS2 GS, only what has been freely published on the net, but I would guess they have an interlaced mode, which means you could reduce the display frame's footprint by one-half, but obviously this would reduce the image quality.
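That VRAM budget as a quick Python sketch (the function name is made up for illustration, and 1MB is taken as 1024*1024 bytes):

```python
# VRAM budget for the frame buffers described above.
# The function name is illustrative; 1MB = 1024 * 1024 bytes.

MB = 1024 * 1024

def vram_budget(width, height, bytes_per_pixel, buffers, vram_mb=4):
    """Return (size of one buffer in MB, MB left over for textures)."""
    buffer_mb = width * height * bytes_per_pixel / MB
    return buffer_mb, vram_mb - buffers * buffer_mb

# 2 display buffers + 1 ZBuffer, all 32bit
buffer_mb, textures_mb = vram_budget(640, 480, bytes_per_pixel=4, buffers=3)

print("One buffer:        %.2f MB" % buffer_mb)
print("Left for textures: %.2f MB" % textures_mb)
```

Without the intermediate rounding the leftover comes out a shade under the 0.49MB figure, which does not change the conclusion.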
PS2 also supports anti-aliasing, but I believe it uses super-sampling. Thus, a larger display-buffer is required in VRAM, which leaves practically nothing for textures.
Also, PS2 does not support texture compression (only the pseudo-compression of paletted textures). This does not allow what little texture memory there is to be used efficiently.
This means that the programmers have to do very, very careful texture memory management, with many texture uploads per frame, in order to do something useful with textured primitives. Using large textures is almost impossible, as they just can't fit in the available space.
There is another problem I have thought about, but have no proof as to whether it is true or not. The PS2 GS renders 16 pixels simultaneously. What I don't know is whether there are hard limits on which pixels can be rendered simultaneously. Worst case, only aligned groups of parallel pixels can be rendered at the same time. This means if a vertical line is rendered, then only one of the 16 pixel pipelines is doing any useful work, while the other 15 are idle. This reduces the fill rate to 1/16 of its peak "theoretical" value. Bummer.
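That worst case can be put into a toy Python model. To be loud about it: the aligned 16x1 pixel group is purely an assumption made to illustrate the guess above, not a documented property of the GS, and every name here is hypothetical.

```python
# Toy model of the worst-case guess above: the 16 pipelines can only work
# on one aligned group of pixels at a time. The 16x1 group shape is an
# ASSUMPTION for illustration, not a known property of the PS2 GS.

def pipeline_utilisation(pixels, group_w=16, group_h=1):
    """Fraction of pipeline slots doing useful work for a set of (x, y) pixels."""
    pixels = set(pixels)
    # Every aligned group that is touched costs a full pass of all pipelines.
    groups = {(x // group_w, y // group_h) for x, y in pixels}
    return len(pixels) / (len(groups) * group_w * group_h)

vertical_line = [(100, y) for y in range(480)]
horizontal_line = [(x, 100) for x in range(640)]

print("Vertical line:   %.4f" % pipeline_utilisation(vertical_line))
print("Horizontal line: %.4f" % pipeline_utilisation(horizontal_line))
```

Try a different group shape, such as `group_w=4, group_h=4`, to see how much the answer depends on that unknown.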
It just seems that PS2 was not really designed for heavy textured primitive usage. You could almost argue that it doesn't need to be, because it has such a large tri-throughput capability that the texels on an object could be modelled with coloured triangles rather than textures. How feasible this is, I am not sure.
As soon as developers get to grips with the PS2, and understand how to make it perform, then the impressive stuff will flow. The PS2 also requires careful programming to extract the most from all the parallel engines in the unit. The art of doing this will take time to mature. Just look at the PSX stuff coming out still: would you have thought it was possible on a PSX a couple of years ago?

Originally posted by Dan
Also, you talk about memory problems, the internal memory gives it a nudge of speed by bypassing the GPU altogether...

"Lightspeed Memory Architecture implements four independent memory controllers that not only communicate with the graphics processor but with themselves as well. In theory, this "crossbar" memory architecture can be up to four times faster than previous designs by being able to move smaller amounts of data in 64-bit blocks rather than tying up the entire 256-bit capacity of the memory bus when it's not needed. " -Nvidia

This is only effective with the card's onboard memory. So as long as it is swapping between its own 64/128/256MB of memory, the memory argument is moot.
No, the memory argument is far from moot :) Their memory "crossbar" is a good thing, of course. But is it radically different from what Matrox implemented with their "dual independent buses", which split the bus into two smaller buses, each independent? nVidia have extended it to 4 smaller buses. Logical progression to me :)
Plus, what this really does for the system is allow it to approach the theoretical maximum memory bandwidth of the memory more often. It doesn't offer more bandwidth, just uses the bandwidth that is available more efficiently.
One thing to note from that quote from nVidia is that they talk about a 256bit bus. Well, technically, that is not true. What they are really saying is a 128bit bus running at DDR (double-data rate). It doesn't take much for someone to look at the figures and be led to believe they have a 256bit bus running at DDR (say, 460MHz). Marketing people suck, they deliberately try to skew the figures, using terms such as "effective".

Anyway, I am not trying to put nVidia down, I am trying to put things in perspective. I think they are doing some incredible design and implementation, just not as wonderfully great as they like to make out often ... quite often, sometimes! ;)

Originally posted by tekwiz99
:D ...never would have thought Lewpy would jump in with a lecture, so... does the PS2's design turn out to be a big waste? And speaking of XBox, how is it going to perform against the PS2? (focusing on their graphics engines)
I don't think the PS2 design is a big waste, it just needs to be designed/programmed for. That will take a few generations of games. The XBox is more like a PC in architecture, so companies that are inherently PC based can make the transition to console development far more easily. Good thing, bad thing? I don't know. Time will tell.
Graphically speaking, the XBox is basically a GeForce3 chip with 2 more texture units per pixel pipeline. This allows 4 textures per pixel, and doubles the peak texel throughput ... of course, this is only true if 4 textures are being applied simultaneously.
I am not sure how they have tackled the memory bandwidth problem. Also, from what I remember, the XBox GPU is the main memory controller for the entire console (including CPU). This means some of the available bandwidth is wasted looking after the other components in the system, but it does mean all display lists are inherently already in the GPU's local memory, as all main memory is local to it ;)
I guess they are using DDR SDRAM, and they probably have split it into banks. Then again, consoles tend to render at lower resolutions, and the GeForce3 has on-chip FSAA capabilities rather than super-sampling, so all these factors will help lower its memory bandwidth requirements.