Hi guys and girls ;). Hope you are doing well. I am going to keep this blog short and crisp. To make the best use of it, I recommend making your own modifications to the example code and running it on your machine.
You can fork my repo to access code used in this blog: https://github.com/gouravdhar/cpp-threads
First things first, the prerequisites include:
Basic understanding of the C++ language.
Functors in C++. It's a short read in case you are not aware of them (https://www.geeksforgeeks.org/functors-in-cpp/).
Lambda expressions in C++. Go through a few examples to get the hang of them. (https://www.geeksforgeeks.org/lambda-expression-in-c/)
The flow of the blog would be:
What are threads and why to use them?
First C++ Thread.
Multithreading in C++.
Resource Sharing and Locks.
What are threads and why to use them?
Wikipedia defines a thread as: “In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.”
Apart from the main thread, you can create your own threads to execute multiple instructions concurrently to save time.
Point to note: all the threads share the same resources, so it is important to synchronize their use efficiently. By resources I mean the code section, the data section, and OS resources (like open files and signals).
First C++ Thread
The std::thread class is defined in the <thread> header file (#include <thread>).
Creating a thread is simple. We pass a function pointer or a callable object (a functor or a lambda expression), containing the code to be executed by the thread, to the constructor of the thread object. Have a look at the code; I have used a function pointer.
Passing function pointer to thread
thread::join() makes the calling thread wait for the target thread to complete its execution. In this case the main thread waits for thread ‘t’ to complete. If many concurrent tasks are going on, we can synchronize the workflow using join().
thread::detach() makes both threads (the creating thread and the created thread) independent of each other, i.e. both execute independently. Suppose we don’t want to use join() (i.e. not wait for the completion of the thread we created, because we no longer care about its flow of execution); in that case we call thread::detach(). Once detach() has been called, we can no longer join(). It’s always better to check whether a thread is joinable using thread::joinable() before calling join(); otherwise join() throws an error (std::system_error).
As can be seen, the main function didn’t wait for the thread to complete its execution. Since both threads (main and t) run concurrently, the order of execution changes with every run. The threads effectively race for resources (in this case, cout). Let’s see what happens if we comment out detach().
As expected, join() was called, so the thread finished its execution first and only after that was “Main Thread” printed. This was just an example using one thread; if you observe, the flow in this example is mostly sequential. In real life we could have multiple threads running concurrently, with the main thread waiting for each of them depending on certain states.
It is important to call either join() or detach() on every thread you create. If neither is called before the std::thread object is destroyed, its destructor calls std::terminate() and the program aborts.
Multithreading in C++
There’s a hardware limit when it comes to the number of threads that can run concurrently. More threads can be created, but the number that can run in parallel is limited by the number of logical processors in your system. On Windows you can check this via Task Manager → Performance → CPU → Logical Processors. It can also be found programmatically using std::thread::hardware_concurrency().
Number of logical processors of CPU as shown in Task Manager
Output: 4 (for my system)
An example of multi-threading
We created a vector of worker threads and pushed threads created using lambda functions. We created thread::hardware_concurrency() number of threads.
At the end, we looped over the worker threads using for_each(); the lambda expression took a reference to each worker thread and called join() on it. The output, as can be seen, is random, haphazard, and different for every run. Any guesses why?
Because the threads run concurrently and cout is a resource shared by all of them simultaneously, there is a resource race between the threads, commonly known as a ‘race condition’. To prevent simultaneous access to resources we use locks (covered in the next section).
Have a look at the following scenario:
The task is to print the string “Thread is running” 200 times using a varying number of threads. The following code takes the number of threads to be used as input, performs the task, and prints the execution time of the program.
For the code to work properly, enter a number in the range (0, 200] that is a divisor of 200.
Print the string “Thread is running” 200 times using different number of threads
The results are interesting:
numThreads : 1   ==> Execution time : 0.046 sec
numThreads : 2   ==> Execution time : 0.062 sec
numThreads : 4   ==> Execution time : 0.031 sec
numThreads : 10  ==> Execution time : 0.077 sec
numThreads : 50  ==> Execution time : 0.063 sec
numThreads : 200 ==> Execution time : 0.155 sec
Having more threads in no way guarantees less execution time. Using too few threads does not fully use the CPU, which means you could do better in terms of execution time.
On the other hand, too many threads fight over the CPU and sometimes end up taking more time due to overheads like frequent context switching (among other factors).
So there’s basically a tradeoff, with an optimal number of threads letting your program complete execution in minimal time.
So what’s the optimal number of threads one should use? There’s no definitive answer, but a good first try is to keep the thread count around the number of logical processors of your CPU, as given by hardware_concurrency().
Resource Sharing and Locks
Suppose I decide to display the multiplication tables for the numbers 1, 10, 20 and 40, using multiple threads (one thread for each number). The output looks something like this:
Random output using multi-threads
The output doesn’t make sense, right?!
This is because the threads are using cout simultaneously while printing. If we could block simultaneous usage of cout, our problem would be solved. This is exactly why we use locks.
Webopedia describes a mutex as: “In computer programming, a mutual exclusion object (mutex) is a program object that allows multiple program threads to share the same resource, such as file access, but not simultaneously.”
After adding a lock, my table function would look something like this:
Added lock() and unlock()
Now that’s a decent output.
Here ‘mtx’ is an object of std::mutex. When a thread calls mtx.lock(), if the mutex is free the thread acquires the lock; otherwise the mutex is held by another thread and the caller waits for the lock to be released. The lock is released by mtx.unlock().
But there’s a problem. What if a runtime error occurs between lock() and unlock()? The thread won’t proceed and unlock() will never be called. All other threads would keep waiting indefinitely for mtx to be released, and main() would be waiting for the remaining threads to complete. Basically all the threads (except one) would wait forever and the program would reach a state commonly known as deadlock.
We can use std::lock_guard to escape the deadlock condition in this case. A lock_guard acquires the mutex when it is constructed and automatically releases it when the guard goes out of scope.
To simplify: the lock is released at the first ‘}’ that closes the guard’s scope, or when an exception propagates out of that scope.
Note: lock_guard guarantees that unlock is called on destruction, but deadlocks can occur for other reasons as well. A classic case is when thread1 holds a lock and waits for a resource acquired by thread2, while thread2 likewise waits for the resource held by thread1.
Example of a deadlock scenario
Note: each thread maintains its own call stack, and local variables live on that stack. So local variables are not shared between threads.
That’s all for the basics of threads. A more advanced concept is the thread pool, where we create ‘n’ worker threads and keep reusing them to complete tasks. This saves the execution time spent creating and destroying threads. The basic principle behind its working: initialize a vector of ‘n’ worker threads and a queue of tasks. Whenever a worker thread finishes a task, a new task is moved from the task queue to the worker, and this continues until the queue is empty. After that we iterate over the thread vector and call join() on each worker thread.