OS Research 1
Sec: 4
Id: 323240115
An Analysis of Python Threads: Parallelism vs. Concurrency in Single-Core and Multi-Core CPU Environments
Introduction to Python Threads:
Threading is the go-to way to implement concurrency or parallelism in programming. Python
offers numerous constructs and classes to leverage threads for better performance and
responsiveness. However, it is also important to follow thread safety best practices to avoid
critical issues like race conditions and deadlocks.
In this article, we will take a deep dive into threading in Python. Let’s discuss what threading
is and why you might want to use it. Then we will look at ways to create and manage
threads in Python. We will also explore some of the challenges of using threading and how
to avoid them.
What is threading?
Threading (or multi-threading) is an execution model that enables programmers to implement
concurrency or parallelism. A thread is a lightweight unit of a program that can run independently. All
threads share the same memory space and resources of the main program.
Threading offers several benefits:
Improved performance: Threads allow you to do more work in less time. For
example, if you have to make API calls to two different servers, you can create and
run two threads simultaneously, one for each API call.
Responsiveness: Threading boosts program responsiveness by allowing it to
handle multiple requests simultaneously. For example, a web server may create a
new thread for each incoming request. This allows the web server to concurrently
respond to the requests of multiple users, enhancing overall experience.
Simplified code: When done right, threading can simplify your code by allowing you
to break down large tasks into smaller, more manageable chunks. This adds to the
overall maintainability of a codebase.
Increased scalability: Threading can also improve the scalability of your program by
allowing it to take advantage of multiple CPU cores. For example, if you are
migrating from a single-core to a multi-core architecture, you can leverage thread
parallelism to scale up your application.
Simplified communication: Threads share the same memory space, which makes
communication between them more straightforward than with multiple processes.
This simplifies the implementation of tasks that require sharing data or coordination
between different parts of a program.
Multithreading vs multiprocessing:
Multi-processing involves running multiple processes simultaneously. Unlike a thread, each
process gets its own dedicated memory space. Multi-processing is well-suited for intensive
CPU-bound tasks, and it can take full advantage of multi-core processors. Each process
may run on a separate CPU core, increasing overall performance.
Processes typically consume more system resources than threads due to their independent
memory space. This can also limit the number of processes you can run concurrently.
Inter-process communication (IPC) between processes is generally more complicated and
expensive than inter-thread communication.
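As a rough illustration, here is a minimal multiprocessing sketch for a CPU-bound task (the compute function and the pool size are illustrative assumptions, not part of any particular application):

import multiprocessing

# A CPU-bound task that benefits from running on multiple cores
def compute(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker process has its own memory space and Python interpreter
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(compute, [10**6] * 4)
    print(results)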
Multithreading vs asynchronous programming:
Asynchronous programming is a great fit for I/O-bound scenarios, such as web scraping,
network requests, or database queries. It prevents blocking and maximizes the utilization of
a single thread. Consider asynchronous programming over threading when (a small sketch
follows this list):
You want to simplify the code by avoiding the need to worry about thread
synchronization.
You want to improve the performance of the program by avoiding the overhead of too
many context switches.
You are building an application that handles real-time updates to data, like chat or
streaming applications. Asynchronous apps excel at handling streams of data without
blocking.
You are building a responsive web application using JavaScript frameworks or
libraries, like Node.js or React.
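As a small sketch of this single-threaded, non-blocking style (the fetch coroutine and its delays are illustrative):

import asyncio

# Simulates an I/O-bound operation such as a network request
async def fetch(name, delay):
    await asyncio.sleep(delay)  # yields control instead of blocking
    return f"{name} done"

async def main():
    # Both "requests" run concurrently on a single thread
    results = await asyncio.gather(fetch("A", 1), fetch("B", 1))
    print(results)

asyncio.run(main())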
For applications that require maximum resource utilization on multi-core machines, the
official Python documentation recommends using the "multiprocessing" module. This is
because CPython's Global Interpreter Lock (GIL) allows only one thread to execute Python
bytecode at a time, so CPU-bound threads cannot run in parallel. However, it's important to
note that I/O-bound tasks can still benefit greatly from threading. During I/O-bound
operations, like file I/O or database queries, the GIL is released, allowing multiple threads to
make progress concurrently.
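To see this effect, consider the following minimal sketch, where time.sleep() stands in for a blocking I/O call (the task count and timing are illustrative):

import threading
import time

def io_task(name):
    # The GIL is released while the thread waits, so other threads can run
    time.sleep(1)
    print(f"{name} finished")

threads = [threading.Thread(target=io_task, args=(f"task-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All three tasks finish in about 1 second, not 3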
Creating threads
You can create a new thread by calling the threading.Thread() constructor. The constructor
accepts different arguments, including the thread target function, the thread name, and a list
of thread arguments. The target function contains the code that the thread will execute when
it starts.
For example, the following piece of code imports the threading module, defines a target
function, and then creates a new thread object using the threading.Thread() constructor.
import threading

def my_function():
    print("Hello from a new thread!")  # code the thread will run

my_thread = threading.Thread(target=my_function)
Starting threads
The above code created a thread object, but didn’t start it. To start the execution of a thread,
we use the start() method exposed by the Thread object. Invoking this function executes the
thread’s target function concurrently with the main program.
my_thread.start()
To wait for a thread to complete its execution, we can call the join() method on the Thread
object. This causes the calling thread (often the main program) to block until the thread
terminates.
my_thread.join()
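Putting these pieces together, here is a minimal end-to-end sketch that also passes a thread name and target arguments to the constructor (the greet function and its values are illustrative):

import threading

def greet(name, greeting):
    print(f"{greeting}, {name}!")

# Pass the target function's arguments and a thread name to the constructor
t = threading.Thread(target=greet, args=("Alice", "Hello"), name="GreeterThread")
t.start()
t.join()  # Block until the thread finishes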
Daemon threads
Daemon threads are threads that run in the background and don't prevent the main program
from exiting. You can make a thread a daemon either:
By passing daemon=True to the threading.Thread() constructor,
Or
By setting the Thread object's daemon property to True before invoking the start()
method.
For example, the following code sets the daemon property to True, and then calls start().
my_thread.daemon = True
my_thread.start()
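Alternatively, the same flag can be passed to the constructor when the thread is created:

my_thread = threading.Thread(target=my_function, daemon=True)
my_thread.start()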
There are several other functions exposed by the threading module that a developer should
know, such as current_thread(), which returns the Thread object for the calling thread;
enumerate(), which lists all alive Thread objects; and active_count(), which returns the
number of alive threads.
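For instance, a small sketch using these inspection functions:

import threading

def worker():
    # current_thread() returns the Thread object for the calling thread
    print(f"Running in: {threading.current_thread().name}")

t = threading.Thread(target=worker, name="WorkerThread")
t.start()
print(threading.enumerate())     # all Thread objects currently alive
print(threading.active_count())  # number of threads currently alive
t.join()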
Thread safety
Multithreaded code is prone to concurrency bugs if shared data is not protected. Race
conditions are errors that can occur when multiple threads access the same data at the
same time. Deadlocks are situations where two or more threads are waiting for each other
to release a resource. This can cause the threads to block indefinitely, halting the program.
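As a quick illustration, the following sketch shows a race condition on a shared counter (the loop counts are illustrative, and whether updates are actually lost depends on the interpreter version and timing):

import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        # Read-modify-write is not atomic: two threads can read the same
        # value and overwrite each other's update
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # may print less than 200000 because of lost updates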
Locks
Locks are synchronization primitives that ensure that only one thread can access a block of
code at a time. The threading module offers a Lock class that can be used for this purpose.
The Lock class has two main functions: acquire() and release().
At any time, a lock object can be in one of two possible states: “locked” or “unlocked”.
When you call acquire() on a locked Lock object, it blocks the current thread until
another thread calls release() on the same lock.
When you call acquire() on an unlocked Lock object, the state of the Lock is
immediately changed to “locked”.
When you call release() on a locked Lock object, the object’s state is immediately
changed to “unlocked”. Calling release() on an already unlocked object leads to a
runtime error.
The following code gives a simplified example of how to create and use a lock.
import threading
# Create a lock
lock = threading.Lock()
lock.acquire()
# Critical section
# ……
# Release the lock so that other threads can acquire it
lock.release()
An RLock, or Reentrant Lock, is an extension of the basic lock that can be acquired multiple
times by the same thread. It's especially useful in recursion scenarios, or when a function
calls another function that also needs the lock already held by the calling function.
The threading module provides the RLock class for this purpose. Consider the following
example where the same thread acquires and releases the RLock multiple times:
import threading

class SharedData:
    def __init__(self):
        self.counter1 = 1
        self.counter2 = 2
        self.lock = threading.RLock()

    def incrementCounter1(self):
        self.lock.acquire()
        try:
            self.counter1 = self.counter1 + 1
        finally:
            self.lock.release()

    def updateCounter2(self):
        self.lock.acquire()
        try:
            self.counter2 = self.counter2 + 1
        finally:
            self.lock.release()

    def updateCounters(self):
        # An RLock lets the same thread acquire the lock again inside
        # incrementCounter1() and updateCounter2() without deadlocking
        self.lock.acquire()
        try:
            self.incrementCounter1()
            self.updateCounter2()
        finally:
            self.lock.release()
Semaphores
Semaphores are objects that maintain counters for controlling access to a resource. They
allow a specific number of threads to access a resource concurrently. Each acquire() call
decrements the counter, and each release() call increments it. If the counter reaches 0, the
next acquire() call blocks until release() is called by another thread.
The threading module includes the Semaphore class for this purpose:
import threading
# Allow at most three threads to hold the semaphore at once
semaphore = threading.Semaphore(3)
semaphore.acquire()
# ……
semaphore.release()
Condition variables
Condition variables are synchronization primitives that allow threads to wait for specific
conditions to become true before proceeding. A condition variable is always linked to a lock.
It is typically used to coordinate the execution of different threads in response to some
shared state.
The Condition class in the threading module allows us to implement condition variables.
Calling the “wait” or “wait_for” functions of a condition variable object releases the linked lock
and waits for another thread to call “notify()” or “notify_all()”.
Consider this example where a job processing thread waits for a job producing thread to
create a job before starting its processing. The line comments provide explanations for the
different lines of code.
import threading

condition = threading.Condition()
jobs = []

def consume_job():
    with condition:
        # Release the lock and sleep until the producer calls notify()
        condition.wait_for(lambda: len(jobs) > 0)
        job = jobs.pop()  # fetch the new job and process it

def produce_job():
    with condition:
        jobs.append("new job")  # create a new job
        condition.notify()  # wake up a waiting consumer thread
All synchronization primitives that the threading module provides can be used with the
"with" statement. "with" is a form of "Resource Acquisition Is Initialization" (RAII), a
principle used to manage resources in a way that automatically releases them when they
go out of scope.
By using the “with” statement, you can prevent potential deadlocks and enhance the
readability and maintainability of your code.
# Using the "with" statement:
with my_lock:
    ...  # important code here

# is equivalent to:
my_lock.acquire()
try:
    ...  # important code here
finally:
    my_lock.release()
Thread pools
A thread pool is a group of pre-instantiated threads that stand ready to execute tasks. The
concurrent.futures module provides the ThreadPoolExecutor class for working with thread
pools. The following code creates a thread pool and uses it to perform some asynchronous
tasks. The line comments provide explanations for the different lines of code.
import concurrent.futures

# Function to simulate a time-consuming task
def perform_task(task_id):
    result = task_id * 2
    return result

tasks = [11, 12, 13, 14, 15, 16, 17]

# Create a pool of worker threads and submit each task to it;
# submit() returns a Future object representing the pending result
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = [executor.submit(perform_task, task_id) for task_id in tasks]
    # Block until all submitted tasks have completed
    concurrent.futures.wait(results)
    # Retrieve and print the result of each task
    for future in results:
        result = future.result()
        print(f"Result: {result}")
Synchronized queues
The queue module implements synchronized, thread-safe queues that can be used to
exchange data between threads. A queue can return its items in one of three orders:
First in, first out (FIFO): The items are removed in the order they were added.
Last in, first out (LIFO): This queue functions like a stack, where the most recently
added items are the first to be removed.
Priority: Items are removed based on their assigned priority, with the lowest-valued
(highest-priority) items being removed first.
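For instance, a minimal priority queue sketch, where items are (priority, value) tuples and the lowest-valued entry is retrieved first:

import queue

pq = queue.PriorityQueue()
pq.put((2, "low priority"))
pq.put((1, "high priority"))
print(pq.get())  # returns (1, "high priority") first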
The following code creates a LIFO queue and defines a producer thread that inserts some
items into the queue. It also initializes a consumer thread that processes values from the
queue. The line comments provide explanations for the different lines of code.
import threading
import queue
# Create a synchronized LIFO queue
lifo_queue = queue.LifoQueue()
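Continuing the snippet above, a minimal sketch of the producer and consumer threads it describes might look like this (the item values and counts are illustrative):

# Producer: insert some items into the queue
def producer():
    for item in [1, 2, 3]:
        lifo_queue.put(item)

# Consumer: process values from the queue (last in, first out)
def consumer():
    for _ in range(3):
        item = lifo_queue.get()  # blocks until an item is available
        print(f"Processed: {item}")
        lifo_queue.task_done()

producer_thread = threading.Thread(target=producer)
consumer_thread = threading.Thread(target=consumer)
producer_thread.start()
consumer_thread.start()
producer_thread.join()
consumer_thread.join()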
Worker threads vs per-request threads:
In a worker thread model, a pool of pre-defined threads is created at the start of the
application. These threads are designed to be long-lived, with the main thread consistently
distributing incoming workloads across them.
Conversely, in a per-request model, the main thread spawns a new thread for each
incoming request. These threads are short-lived — i.e., they terminate after processing the
request.
Depending on your resource configurations and application requirements, you can use
either worker threads or per-request threads; a short sketch contrasting the two follows the
lists below. Worker threads are a good choice when:
You want to have a smaller memory footprint by reducing the overhead of thread creation
and destruction.
Your users can tolerate slight delays in responses, especially during peak hours, as worker
threads may be busy processing other tasks.
You have long-running tasks that shouldn’t block the main thread.
Per-request threads make sense when:
You have abundant system resources, and can handle a large memory footprint, even during
peak usage.
Your application performs short-lived tasks in response to user requests, such as serving
web requests.
Your users require near-instantaneous responses, and your infrastructure can support the
rapid creation and management of short-lived threads.
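As a rough sketch, the two models can be contrasted as follows (the handle_request function and the request counts are illustrative):

import concurrent.futures
import threading

def handle_request(request_id):
    print(f"Handling request {request_id}")

# Worker thread model: a fixed pool of long-lived threads
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for request_id in range(10):
        pool.submit(handle_request, request_id)

# Per-request model: a new short-lived thread for every request
for request_id in range(10):
    threading.Thread(target=handle_request, args=(request_id,)).start()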
Conclusion:
Multithreading is an important concept for developers to grasp, regardless of the language
they are using. Python offers built-in classes and constructs that can be used to efficiently
and safely manage a large number of threads.
This article has introduced you to some of the most important classes and constructs. You
can use this knowledge to build scalable, multi-threaded applications that are free of race
conditions and deadlocks.