Threading is the method of choice for extracting performance from multi-core chips. It might seem that if a little threading is good, then a lot must be better. In fact, having too many threads can bog down a program. This article discusses why, and how task-based programming avoids the problem. The Intel® Threading Building Blocks (Intel® TBB) task scheduler serves as an example in this threading tutorial.

## Why Do Too Many Threads Hurt App Performance?

The impact of having too many threads comes in two ways. First, partitioning a fixed amount of work among too many threads gives each thread so little work that the overhead of starting and terminating threads swamps the useful work. Second, having too many threads running incurs overhead from the way they share finite hardware resources.

It is important to distinguish software threads from hardware threads. Software threads are the threads that programs create. Hardware threads are real physical resources. There may be one hardware thread per core on the chip, or more, as for example with Intel Hyper-Threading Technology.

## What is a Time-Slice in Threading?

When there are more software threads than hardware threads, the operating system typically resorts to round-robin scheduling. Each software thread gets a short turn, called a time slice, to run on a hardware thread. When the time slice runs out, the scheduler suspends the thread and allows the next thread waiting its turn to run on the hardware thread. Time slicing ensures that all software threads make some progress. Otherwise, some software threads might hog all the hardware threads and starve other software threads.

However, fair distribution of hardware threads incurs overhead. There are several kinds of overhead, and it helps to know the culprits so you can spot them when they appear. The most obvious overhead is saving the register state of a thread when suspending it, and restoring that state when resuming it. You might be surprised how much state there is on modern processors. However, schedulers typically allocate big enough time slices that the save/restore overheads are insignificant, so this obvious overhead is in fact not much of a concern.

A more subtle but significant overhead of time slicing is saving and restoring a thread's cache state, which can be megabytes. Modern processors rely heavily on cache memory, which can be about 10 to 100 times faster than main memory. Accesses that hit in cache are not only much faster; they also consume no bandwidth from the memory bus. When the cache is full, a processor must evict data from the cache to make room for new data. Typically, the choice for eviction is the least recently used data, which is usually data from an earlier time slice. Thus software threads tend to evict each other's data, and the cache fighting from too many threads can hurt performance.

A similar overhead, at a different level, is thrashing virtual memory. Virtual memory resides on disk, and the frequently used portions are kept in real memory. Similar to caches, the least recently used data is evicted from memory to disk when necessary to make room. Each software thread requires virtual memory for its stack and private data structures. As with caches, time slicing causes threads to fight each other for real memory, and thus hurts performance. In extreme cases, there can be so many threads that the program runs out of even virtual memory.

Another problem arises when a time slice expires for a thread holding a lock. All threads waiting for the lock must now wait for the holding thread to get another time slice and release the lock. The problem is even worse if the lock implementation is fair, in which case the lock is acquired in first-come, first-served order. If a waiting thread is suspended, then all threads waiting behind it are blocked from acquiring the lock. It's like having someone fall asleep in a check-out line. The more software threads there are without hardware threads to run them, the more likely this will become a problem.

## How to Organize Threads in an Application

A good solution is to limit the number of runnable threads to the number of hardware threads, and possibly to the number of outer-level caches if cache contention is a problem. Because target platforms vary in the number of hardware threads, avoid hard-coding your program to a fixed number of threads. Let your program's degree of threading adapt to the hardware. Runnable threads, not blocked threads, cause time-slicing overhead. When a thread blocks on an external event, such as a mouse click or disk I/O request, the operating system takes it off the round-robin schedule, so the thread no longer incurs time-slicing overhead.