maciek_urbanski wrote:
Nope, we didn't.
Yes, I know. Read the part about 'access to external memory' above.
I wasn't referring to you, but you can see why I deleted that part.
maciek_urbanski wrote:
This is not a lie. You should not accuse others of lying if you cannot prove they are.
But I agree - this value is the throughput in a very unlikely scenario. Essentially, they might benchmark a shader that fills an R8G8B8A8 render target with a solid color... but that's how marketing numbers are generated.
Yes, it is a lie; it's ImgTech's lie. A lie that you weren't aware of, apparently.
maciek_urbanski wrote:
Please provide a link to some benchmarks.
http://www.gp32x.com/board/index.php?sh ... ntry617065
dmdm is a PowerVR rep, so please drop this nonsense about me building "guesswork on guesswork"; you're being all kinds of cocky.
cb88 wrote:
@Exophase chill out, man... emulators aren't necessarily easy to code or figure out either, but look how many of them have already been ported to the Pandora.
Great, porting emulators is as difficult as reverse engineering GPUs now.
maciek_urbanski wrote:
Your last post was pretty much to the point until you made the above comparison.
Sorry, but the above makes no sense. On one hand you have a part capable of hiding latencies like nobody's business (that's what GPUs do in general - they nearly eliminate data-flow latencies for a certain class of computational tasks). On the other you have a regular SIMD CPU extension, generally orders of magnitude less efficient at what the GPU does,
Orders of magnitude? Would you like to tell me what a 32-bit SIMD scalar processor can do in one cycle that's ORDERS OF MAGNITUDE more efficient than the 2-way FPU SIMD that NEON can do? Yes, the hardware threads hide latency, but there are other mechanisms to hide latency even on CPUs, such as prefetching. They just take more work. On the other hand, the CPU has 256KB of L2 cache that is quite fast, although we don't know how much cache the GPU has.
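To illustrate what "more work" means for CPU-side latency hiding: with software prefetching the programmer (or compiler) has to issue explicit hints for data needed a few iterations later, whereas the GPU's hardware threads do the overlapping automatically. A minimal sketch, assuming GCC or Clang (`__builtin_prefetch` is a compiler builtin, not standard C); `PREFETCH_DISTANCE` is a hypothetical value, not tuned for any particular CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch.
 * The right value depends on memory latency and loop cost; 64 is only a guess. */
#define PREFETCH_DISTANCE 64

/* Sum an array while hinting the CPU to start loading the element it will
 * need PREFETCH_DISTANCE iterations later, so memory latency overlaps work. */
float sum_with_prefetch(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0: prefetch for reading; locality = 3: high temporal
             * locality. This is only a hint and the hardware may ignore it. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The result is identical with or without the hint; only timing can change, which is why the distance has to be found by measurement on the target.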
maciek_urbanski wrote:
and a VLIW unit, which I admit I know nothing about, but I would be utterly surprised if it could be as flexible and as good at latency hiding as the GPU is,
It's much more flexible. Go read the documentation. It can issue to 8 execution units per cycle, which basically addresses 4 units with 2x redundancy each, but it can do many similar ALU operations across nearly all of them. And of course it also has prefetching and decent caches. I'd like to know why you think the USSEs have such an amazing instruction set. 16-way threads are nice, but the USSEs were obviously made to be scaled, with the SGX 530/535 being the initial rollout parts. The newer SGXs already have more USSEs.
maciek_urbanski wrote:
without requiring insane levels of manual data micro-management. The mere fact that TI threw an SGX in there along with their DSP should tell you something.
OR it could be that the highest end PowerVR chip available has shaders anyway, or it could be that you can't do pixel shading using a DSP... take your pick? Your sentence suggests that the SGX is only good for its shaders. I entirely expect that some people will use NEON for transformation and lighting, especially if they want to maximize pixel shader computational throughput.
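As a rough picture of the transformation work that could be moved to the CPU, here is a batched vertex transform by a 4x4 matrix, written as column-wise 4-wide multiply-accumulates. A NEON version would keep each column in one 128-bit quad register; this sketch deliberately stays in portable C rather than using NEON intrinsics, and the column-major (OpenGL-style) layout, the `vec4` type and `transform_vertices` are assumptions for illustration, not anything from the SGX or NEON toolchains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a homogeneous vertex position (x, y, z, w). */
typedef struct { float v[4]; } vec4;

/* Transform n vertices by a column-major 4x4 matrix m:
 * out = col0*x + col1*y + col2*z + col3*w.
 * Each column step is a 4-wide multiply-accumulate, the shape a NEON
 * implementation would map onto quad registers. */
void transform_vertices(const float m[16], const vec4 *in, vec4 *out, size_t n)
{
    assert(m != NULL);
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int c = 0; c < 4; ++c) {
            const float s = in[i].v[c];          /* x, y, z or w */
            for (int r = 0; r < 4; ++r)
                acc[r] += m[c * 4 + r] * s;      /* column c, row r */
        }
        /* Written after all reads, so in-place use (out == in) is safe. */
        for (int r = 0; r < 4; ++r)
            out[i].v[r] = acc[r];
    }
}
```

For example, a translation matrix with (1, 2, 3) in its last column maps the point (1, 1, 1, 1) to (2, 3, 4, 1).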