Sunday, August 23, 2009

multithreading

Multithreading computers have hardware support to efficiently execute multiple threads. These are distinguished from multiprocessing systems (such as multi-core systems) in that the threads have to share the resources of single core: the computing units, the CPU caches and the translation lookaside buffer (TLB). Where multiprocessing systems include multiple complete processing units, multithreading aims to increase utilization of a single core by leveraging thread-level as well as instruction-level parallelism. As the two techniques are complementary, they are sometimes combined in systems with multiple multithreading CPUs and in CPUs with multiple multithreading cores.



Advantages

Some advantages include:

* If a thread gets a lot of cache misses, the other thread(s) can continue, taking advantage of the unused computing resources, which thus can lead to faster overall execution, as these resources would have been idle if only a single thread was executed.
* If a thread can not use all the computing resources of the CPU (because instructions depend on each other's result), running another thread permits to not leave these idle.
* If several threads work on the same set of data, they can actually share their cache, leading to better cache usage or synchronization on its values.

Disadvantages

Some criticisms of multithreading include:

* Multiple threads can interfere with each other when sharing hardware resources such as caches or translation lookaside buffers (TLBs).
* Execution times of a single-thread are not improved but can be degraded, even when only one thread is executing. This is due to slower frequencies and/or additional pipeline stages that are necessary to accommodate thread-switching hardware.
* Hardware support for Multithreading is more visible to software, thus requiring more changes to both application programs and operating systems than Multiprocessing.

The mileage thus vary, Intel claims up to 30 percent benefits with its HyperThreading technology [1], a synthetic program just performing a loop of non-optimized dependent floating-point operations actually gets a 100 percent benefit when run in parallel. On the other hand, assembly-tuned programs using e.g. MMX or altivec extensions and performing data pre-fetches, such as good video encoders, do not suffer from cache misses or idle computing resources, and thus do not benefit from hardware multithreading and can indeed see degraded performance due to the contention on the shared resources.

Hardware techniques used to support multithreading often parallel the software techniques used for computer multitasking of computer programs.

Types of Multithreading

Block multi-threading

Concept

The simplest type of multi-threading is where one thread runs until it is blocked by an event that normally would create a long latency stall. Such a stall might be a cache-miss that has to access off-chip memory, which might take hundreds of CPU cycles for the data to return. Instead of waiting for the stall to resolve, a threaded processor would switch execution to another thread that was ready to run. Only when the data for the previous thread had arrived, would the previous thread be placed back on the list of ready-to-run threads.

For example:

1. Cycle i : instruction j from thread A is issued
2. Cycle i+1: instruction j+1 from thread A is issued
3. Cycle i+2: instruction j+2 from thread A is issued, load instruction which misses in all caches
4. Cycle i+3: thread scheduler invoked, switches to thread B
5. Cycle i+4: instruction k from thread B is issued
6. Cycle i+5: instruction k+1 from thread B is issued

Conceptually, it is similar to cooperative multi-tasking used in real-time operating systems in which tasks voluntarily give up execution time when they need to wait upon some type of event.

No comments:

Post a Comment