Of course an N-way SIMD unit will be less expensive than N independent units. It'll also be much less flexible and the overall throughput will be less..
they surely will be less flexible, but the throughput claim is not clear-cut, as it does not follow from the flexibility part. because simd units are simpler, you could either:
a) use those units at 'casual' clocks aiming for a power consumption advantage (as you mention later in your post), or
b) push those units to higher clocks (where a similar number of independent units could not go), in which case the matter of throughput becomes one of statistics, and ultimately, a function of the nature of the workload at hand.
speaking of which, being simple and, at the same time, fit for the task at hand, is a good thing. and simd just happens to be so in the world of vector crunching.
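to put rough numbers on case b), here is a back-of-the-envelope sketch. everything in it is an assumption of mine for illustration: the function names, the clock figures, and the idea of squeezing 'workload fit' into a single utilization number (the fraction of simd lanes doing useful work per issue). it's a toy model, not a measurement of any real part.

```python
# toy model: useful ops per nanosecond (ghz * lanes), nothing more.
# all names and numbers are hypothetical, chosen only for illustration.

def simd_throughput(lanes, clock_ghz, utilization):
    """Useful ops/ns of an N-wide SIMD unit.

    utilization is the average fraction of lanes doing useful work,
    i.e. how well the workload actually vectorizes (0.0 to 1.0).
    """
    return lanes * clock_ghz * utilization


def independent_throughput(units, clock_ghz):
    """Useful ops/ns of N independent scalar units, assumed always busy."""
    return units * clock_ghz


def breakeven_utilization(simd_clock_ghz, indep_clock_ghz):
    """Lane utilization at which the SIMD unit matches the independent
    units, for equal lane/unit counts."""
    return indep_clock_ghz / simd_clock_ghz


if __name__ == "__main__":
    # assumed clocks: simd pushed to 1.5 ghz, independent units at 1.0 ghz
    print(breakeven_utilization(1.5, 1.0))
    print(simd_throughput(4, 1.5, 0.9), independent_throughput(4, 1.0))
    print(simd_throughput(4, 1.5, 0.5), independent_throughput(4, 1.0))
```

under these made-up clocks the simd unit comes out ahead whenever the workload keeps more than two thirds of its lanes busy, and falls behind below that. which side of the line you land on is exactly the 'nature of the workload' question.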
..is this really a fair comparison? nVidia and ATi have both moved to non-vector units (although ATi has multi-issue superscalar ones). It's just strange that the trend is going in the opposite direction..
to what extent the trend is 'in the opposite direction' is subject to debate. allow me to bring to your attention the original b3d analysis piece on nvidia's g80, where on page 8 one can read the following bit:
b3d article wrote:
"Inwardly, each 16 SP [ed: scalar processor, or shading processor - your pick] cluster is further organised in two pairs of 8 (let's call that 8x2) and the scheduler will effectively run the same instruction on each half cluster across a number of cycles, depending on thread type. [ed: emphasis by me]"
now, the above is surely speculative, but it's both educated speculation (the b3d folk really take pride in such stuff), and one that does not contradict common sense, thus i'm willing to bring it into our otherwise pragmatic discussion.
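to make the quoted bit concrete, here is how i read 'the same instruction on each half cluster across a number of cycles', as a small sketch. the 8-lane width comes from the quote; the batch sizes of 16 and 32 threads, and all the names, are my own assumptions for illustration, and none of this claims to be how the actual hardware schedules things.

```python
# sketch of one instruction issued over a batch of threads on a narrow
# simd unit: the same op is replayed over consecutive groups of lanes.
# names and batch sizes are hypothetical.

def issue_cycles(batch_threads, simd_lanes=8):
    """Cycles needed to push one instruction through a whole batch
    (ceiling division of batch size by lane count)."""
    return -(-batch_threads // simd_lanes)


def run_batch(op, values, simd_lanes=8):
    """Apply the same op to every thread's value, simd_lanes at a time.

    Returns the results and the number of issue cycles consumed.
    """
    results = []
    cycles = 0
    for start in range(0, len(values), simd_lanes):
        group = values[start:start + simd_lanes]  # threads served this cycle
        results.extend(op(v) for v in group)      # same instruction, all lanes
        cycles += 1
    return results, cycles


if __name__ == "__main__":
    out, cycles = run_batch(lambda x: x * 2, list(range(16)))
    print(cycles, out)
    print(issue_cycles(32))
```

seen this way, the unit is scalar from the point of view of a single thread, yet still simd from the point of view of the scheduler: one instruction fetch and decode gets amortized over the whole batch.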
PSP isn't the only SIMD unit that has facilities for reordering, although I haven't seen anything that's as general/extensive as it is.
neither have i, but then again, i've never seen first-hand the actual set-in-metal architecture of a contemporary gpu shader unit, which i suspect would show a similar level of register-file addressing sophistication, provided it sits in the simd domain.
I'm not making it up, but I don't remember where I got the exact information. Does it really matter though?
it clearly does not matter in this thread; i was just really surprised to read that bit, hence my asking.