@vikpaw: Regarding Hyper-Threading: yes, it does make it appear as though there's another core to hand out, but that core isn't real, and the extra thread will quite often take CPU resources away from the other thread sharing the same physical core.
For those who aren't sure what it is, it was designed to stop the old P4 from being so slow. A pipeline in a CPU is made up of all the stages an instruction goes through when it gets processed. Each stage takes one clock cycle, so several instructions are in flight at once, each at a different stage. Generally speaking, the longer the pipeline, the faster the CPU can be clocked: the work is split into more, smaller stages, so each clock cycle can be shorter. The catch is that an instruction needs more cycles to get from one end of the pipeline to the other, and when the CPU guesses wrong about a branch, everything in the pipeline has to be thrown away and refilled, which costs more cycles the longer the pipeline is. A shorter pipeline limits the clock speed earlier, but gets more work done per clock cycle. Clear as mud?! This is one reason why the earliest P4s were slower than the latest P3s (which had a shorter pipeline): the clock speed of those early P4s wasn't high enough to make up for the penalty of the longer pipeline.
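To put rough numbers on that trade-off, here's a toy model in Python. It assumes an ideal pipeline that finishes one instruction per clock except when a branch misprediction flushes it, costing about one cycle per stage. The 10 and 20 stage counts are in the spirit of the figures usually quoted for the P3 and the first P4s, but the misprediction rate and the clock speeds are made-up illustrative values, not measurements. Performance is simply instructions per clock (IPC) multiplied by clock speed.

```python
# Toy model: performance = instructions per clock (IPC) x clock speed.
# Assumptions (illustrative, not measured): an ideal pipeline completes one
# instruction per cycle, except that a branch misprediction flushes it and
# wastes roughly one cycle per pipeline stage.


def ipc(stages: int, mispredicts_per_instruction: float) -> float:
    """Average instructions completed per clock cycle."""
    cycles_per_instruction = 1.0 + mispredicts_per_instruction * stages
    return 1.0 / cycles_per_instruction


def throughput_gips(stages: int, clock_ghz: float,
                    mispredicts_per_instruction: float = 0.05) -> float:
    """Billions of instructions per second: IPC x clock speed in GHz."""
    return ipc(stages, mispredicts_per_instruction) * clock_ghz


short_pipe = throughput_gips(stages=10, clock_ghz=1.0)  # P3-like, hypothetical clock
long_slow = throughput_gips(stages=20, clock_ghz=1.3)   # early P4-like, hypothetical clock
long_fast = throughput_gips(stages=20, clock_ghz=2.0)   # later P4-like, hypothetical clock

print(f"10 stages @ 1.0 GHz: {short_pipe:.3f} GIPS")  # 0.667
print(f"20 stages @ 1.3 GHz: {long_slow:.3f} GIPS")   # 0.650
print(f"20 stages @ 2.0 GHz: {long_fast:.3f} GIPS")   # 1.000
```

With these assumptions, the 20-stage design at 1.3 GHz comes out slightly behind the 10-stage design at 1.0 GHz, and only pulls clearly ahead once its clock speed is pushed up to 2.0 GHz.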
The P4 had a very long pipeline because Intel wanted it to reach silly speeds (4GHz was the initial promise), but they soon discovered that not every instruction sent down the pipeline needs every part of it, and a thread that is waiting on memory leaves the pipeline idle, so a lot of the CPU sat unused in any given clock cycle. Intel's answer was Hyper-Threading (HT). It 'fools' the OS into thinking there are two CPU cores in the system, so the OS hands it two chunks of program (threads) at the same time. The HT logic in the CPU then slots instructions from one thread in alongside the other's so that more of the pipeline is used per clock cycle, but because the two threads are competing for the same resources, each of them quite often runs slower than it would on its own. In a desktop scenario HT worked quite well: we were more interested in being able to do more at once and didn't notice the latency impact HT was having. In a server environment, latency matters a great deal, so in the P4 days a single-core server CPU was more often quicker with HT turned off than with it on.
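Here's a toy simulation of that slot-filling idea, again in Python and heavily simplified. It assumes a hypothetical core that can issue four instructions per cycle, describes each thread as a list of instruction 'bundles' (0 meaning a cycle spent stalled, e.g. waiting on memory), and hands out issue slots with the two threads taking turns at first pick. The `run` function and the bundle format are made up for illustration; real HT scheduling is far more involved, so this only shows the shape of the trade-off.

```python
from typing import List

WIDTH = 4  # hypothetical: instructions the core can issue per clock cycle


def run(programs: List[List[int]], width: int = WIDTH) -> List[int]:
    """Run threads together on one simulated core; return each thread's finish cycle.

    Each program is a list of bundles: k > 0 means k independent instructions
    that may issue together; 0 means one cycle stalled (e.g. waiting on memory).
    A thread can't start its next bundle until the current one has fully issued.
    """
    n = len(programs)
    pos = [0] * n                                   # current bundle per thread
    left = [p[0] if p else 0 for p in programs]     # instructions left in that bundle
    finish = [0] * n
    cycle = 0
    while any(pos[t] < len(programs[t]) for t in range(n)):
        cycle += 1
        free = width
        # Rotate which thread gets first pick of this cycle's issue slots.
        for t in [(cycle + i) % n for i in range(n)]:
            if pos[t] >= len(programs[t]):
                continue                            # thread already finished
            if programs[t][pos[t]] == 0:
                done = True                         # a stall burns the cycle, no slots
            else:
                issued = min(left[t], free)
                left[t] -= issued
                free -= issued
                done = left[t] == 0
            if done:
                pos[t] += 1
                if pos[t] < len(programs[t]):
                    left[t] = programs[t][pos[t]]
                else:
                    finish[t] = cycle
    return finish


stall_prone = [3, 0, 3, 0]   # uses part of the core, stalls every other cycle
saturating = [4, 4, 4, 4]    # fills every issue slot, every cycle

print(run([stall_prone]), run([stall_prone, stall_prone]))  # [4] [5, 4]
print(run([saturating]), run([saturating, saturating]))     # [4] [8, 7]
```

In this sketch, two stall-prone threads finish together in 5 cycles instead of 8 back to back, but one of them takes 5 cycles instead of 4. Two threads that already fill every slot gain nothing: they take the same 8 cycles as running one after the other, and each takes roughly twice as long as it would alone. The model has no overheads, so it can't show the outright loss the Microsoft quote below mentions, only the lack of any gain.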
AMD have always had shorter pipelines and never needed HT. This is why the Athlon64 was so much quicker than the P4 even though it ran at lower clock speeds (fewer GHz). It was also the shameful period in which Intel's marketing department tried to fool us into thinking more GHz was always better. Yeah, right!
Today's CPUs all have shorter and more efficient pipelines than the old P4. In simplistic terms, because the pipeline is shorter, much less is lost when some of it goes unused (the P3 and AMD way of thinking, if you like), so there is less to gain from cramming two threads down one pipeline and the latency cost of doing so is relatively higher. That, and we now have two or more real cores to play with, not fake ones.
This Microsoft article, "SQL Server support in a hyper-threaded environment", discusses HT in a SQL Server environment. The bit that is interesting to me is:
Quote:
The performance of hyper-threaded environments varies. Conservative testing has shown 10 to 20 percent gains for SQL Server workloads, but the application patterns have a significant affect. You might find that some applications do not receive an increase in performance by taking advantage of hyper-threading. If the physical processors are already saturated, using logical* processors can actually reduce the workload achieved.
If you look at the SIMS CPU usage as described in my previous post, you'll see that a single person using SIMS can max out one CPU core rather than having the load spread over several cores. Ergo, in my opinion, the last sentence of the quote comes into effect.
EDIT: * A logical processor is one of the 'cores' that HT presents to the OS; with HT on, each physical core shows up as two logical processors.
EDIT2: What I didn't mention above is that with HT enabled there is contention not only in the pipeline but also in the L1, L2 and L3 caches, because both threads on a core share them. One thread (chunk of work) may throw out the other thread's cached data to make room for its own, which in turn gets thrown out by the other thread, meaning lots of return trips to main RAM (which is much slower than the caches).
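To illustrate that cache fight, here's a small Python sketch that counts misses in a fully associative LRU cache. The 8-line capacity and the 6-line working sets are hypothetical, and real L1/L2 caches are set-associative with their own replacement policies, so treat this as a picture of thrashing rather than a model of any actual CPU.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def count_misses(accesses: Iterable[Hashable], capacity: int) -> int:
    """Count misses in a fully associative LRU cache holding `capacity` lines."""
    cache: "OrderedDict[Hashable, None]" = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)          # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used line
    return misses


CAPACITY = 8   # hypothetical tiny cache of 8 lines
PASSES = 10

# Each thread repeatedly walks its own 6-line working set, which fits on its own.
thread_a = [("A", i) for _ in range(PASSES) for i in range(6)]
thread_b = [("B", i) for _ in range(PASSES) for i in range(6)]

alone = count_misses(thread_a, CAPACITY) + count_misses(thread_b, CAPACITY)
# HT-style sharing: both threads' accesses interleave in the same cache.
shared = count_misses([x for pair in zip(thread_a, thread_b) for x in pair], CAPACITY)
print(f"each thread alone: {alone} misses; sharing one cache: {shared} misses")
# each thread alone: 12 misses; sharing one cache: 120 misses
```

Each working set fits comfortably on its own, so after the first pass everything hits (12 misses in total). Interleaved, the combined 12 lines no longer fit, and with this access pattern LRU evicts each line just before it's needed again, so all 120 accesses miss.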