Next Generation Emulation

KrossX

クロスエクス · 4,886 Posts · Discussion starter · #1
PhysX87: Software Deficiency
By: David Kanter | 07-05-2010

One of the latest challenges in computer gaming is modeling the game environment with a high degree of realism. The most obvious gaming improvements in the last 25 years have been graphical – from the early days of 2D sprites like Metroid or King's Quest, to 3D rendering with Glide and later DirectX and OpenGL, powering the latest games like Crysis. Features such as multi-sample anti-aliasing and anisotropic filtering produce more attractive images, and increasing amounts of effort and computational capacity are spent on accurately portraying difficult phenomena such as smoke, water reflections, hair and shadows. However, an accurate visualization of an object is only as convincing and realistic as the modeling of the object itself; a glass hurled against a wall that bounces away harmlessly is unlikely to be convincing, no matter how beautifully rasterized (or ray traced). Consequently, as graphics have improved, modeling the underlying behavior has become increasingly important. This article delves into the recent history of real-time game physics libraries (specifically PhysX) and analyzes the performance characteristics of PhysX. In particular, through our experiments we found that PhysX uses an exceptionally high degree of x87 code and no SSE, which is a known recipe for poor performance on any modern CPU.

[...]
Full article @ Real World Technologies
 
But both Ageia and Nvidia use PhysX to highlight the advantages of their hardware over the CPU for physics calculations. In Nvidia’s case, they are also using PhysX to differentiate themselves from AMD’s GPUs. The sole purpose of PhysX is to serve as a competitive differentiator that makes Nvidia’s hardware look good and sells more GPUs. Part of that is making sure that Nvidia GPUs look a lot better than the CPU, since that is what they claim in their marketing. Using x87 definitely makes the GPU look better, since the CPU will perform worse than if the code were properly generated to use packed SSE instructions.
that was a pretty interesting article.
it's true that if they really wanted to, they could use sse and get a nice speedup compared to the x87 fpu.
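To show what "packed" sse buys over scalar code, here's a minimal sketch of a physics-style update loop (position += velocity * dt) written both ways. It's an illustration only, not PhysX code; the function names are hypothetical, and for brevity the sse version assumes the element count is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps, ... */

/* Scalar version: one float per iteration. A 32-bit compiler targeting
   the x87 FPU typically turns the body into fld/fmul/fadd/fstp. */
static void integrate_scalar(float *pos, const float *vel, float dt, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pos[i] += vel[i] * dt;
}

/* Packed SSE version: four floats per iteration (mulps + addps).
   Sketch-only restriction: n must be a multiple of 4. */
static void integrate_sse(float *pos, const float *vel, float dt, size_t n)
{
    assert(n % 4 == 0);
    const __m128 vdt = _mm_set1_ps(dt);           /* dt in all 4 lanes */
    for (size_t i = 0; i < n; i += 4) {
        __m128 p = _mm_loadu_ps(pos + i);         /* 4 positions */
        __m128 v = _mm_loadu_ps(vel + i);         /* 4 velocities */
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));    /* p += v * dt, 4 at once */
        _mm_storeu_ps(pos + i, p);
    }
}
```

Both functions produce the same results; the packed one just issues a quarter as many arithmetic instructions, which is where the "nice speedup" comes from.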

the article says there are 16 sse regs... but that only applies to 64-bit (x64) code; 32-bit code only has 8 accessible regs, even when it runs on a 64-bit os (i wish there were 16! :().

the x87 fpu has some more complex instructions that sse doesn't have (stuff like fsincos/fptan...), but those can be emulated with multiple sse instructions.
the latency of the complex x87 fpu instructions is so high that even emulating them with sse algorithms ends up being faster.
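To make the "emulate it with sse" point concrete, here's a minimal sketch that computes sin for four floats at once using only packed sse multiplies and adds: a degree-9 Taylor polynomial evaluated with Horner's rule. It's an illustration, not how PhysX or any particular math library does it. The function names are hypothetical, there's no range reduction (so it only holds up for |x| ≤ π/2, to roughly 1e-5), and real libraries would typically use minimax coefficients instead of Taylor ones.

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE intrinsics */

#define HALF_PI_F 1.5707964f

/* sin(x) ~= x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, for 4 lanes at once.
   Only valid for |x| <= pi/2 because this sketch does no range reduction. */
static __m128 sin4_taylor(__m128 x)
{
    const __m128 one = _mm_set1_ps(1.0f);
    const __m128 c3  = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5  = _mm_set1_ps(1.0f / 120.0f);
    const __m128 c7  = _mm_set1_ps(-1.0f / 5040.0f);
    const __m128 c9  = _mm_set1_ps(1.0f / 362880.0f);

    __m128 x2 = _mm_mul_ps(x, x);
    /* Horner's rule in x^2: 1 + x2*(c3 + x2*(c5 + x2*(c7 + x2*c9))) */
    __m128 p = _mm_add_ps(c7, _mm_mul_ps(x2, c9));
    p = _mm_add_ps(c5, _mm_mul_ps(x2, p));
    p = _mm_add_ps(c3, _mm_mul_ps(x2, p));
    p = _mm_add_ps(one, _mm_mul_ps(x2, p));
    return _mm_mul_ps(x, p); /* multiply back by x for the odd powers */
}

/* Convenience wrapper: 4 floats in, 4 floats out. */
static void sin4_array(const float in[4], float out[4])
{
    for (int i = 0; i < 4; i++)
        assert(in[i] >= -HALF_PI_F && in[i] <= HALF_PI_F);
    _mm_storeu_ps(out, sin4_taylor(_mm_loadu_ps(in)));
}
```

The whole thing is a handful of independent mulps/addps instructions per 4 results, which is why this kind of approach can compete with a single long-latency x87 fsin per value; actual timings depend on the cpu and aren't measured here.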

anyways, standard x86 compilers targeting 32-bit usually generate their floating-point code for the x87 fpu by default (sse has to be enabled explicitly), and they're not very good at utilizing sse anyway.
it's possible this is the reason it uses so much x87 code.
(unless the PhysX code is emitted and generated at runtime; if that's the case, there's no good excuse not to be using sse)
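Since the x87-vs-sse choice is mostly a compiler target setting, one way to see what a given build uses for scalar float math is to check the compiler's predefined macros. This is a sketch under some assumptions: it relies on GCC/Clang's `__SSE_MATH__`/`__SSE2_MATH__` (defined when `-mfpmath=sse` is in effect, the x86-64 default) and MSVC's `_M_X64`/`_M_IX86_FP`; the function name is hypothetical.

```c
/* Reports which unit the compiler was told to use for scalar
   floating-point math in this translation unit. */
static const char *float_math_unit(void)
{
#if defined(__SSE2_MATH__) || defined(__SSE_MATH__)
    return "sse";   /* GCC/Clang with -mfpmath=sse (default on x86-64) */
#elif defined(_M_X64)
    return "sse";   /* MSVC x64 uses SSE2 for scalar float math */
#elif defined(_M_IX86_FP) && _M_IX86_FP >= 1
    return "sse";   /* MSVC 32-bit with /arch:SSE or /arch:SSE2 */
#elif defined(__i386__) || defined(_M_IX86)
    return "x87";   /* typical 32-bit default without SSE math flags */
#else
    return "unknown";
#endif
}
```

With GCC, building with plain `-m32` would typically report x87, while `-m32 -msse2 -mfpmath=sse` would report sse, so a 32-bit library shipped with default flags can easily end up all x87.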
 