Concurrency and Parallelism


Understanding concurrency and parallelism in Python is crucial for optimizing your applications. These concepts are fundamental for high-performance Python, multi-core programming, and building responsive Python applications.

Concurrency and parallelism are often used interchangeably, but they have distinct meanings in the world of computing, especially when discussing Python performance optimization.

Concurrency deals with handling many things at once. It's about designing your program so that it can make progress on multiple tasks seemingly at the same time. Think of a chef juggling multiple cooking tasks – preparing ingredients, stirring a pot, chopping vegetables. They are all in progress, but the chef might only be performing one action at any given instant. In Python, concurrent programming is achieved through techniques like threading and asynchronous I/O. This is key for responsive Python UIs and non-blocking operations.

Parallelism deals with doing many things at once. It's about actually executing multiple tasks simultaneously, typically on multiple CPU cores. Imagine having multiple chefs in the kitchen, each working on a different dish. This directly leads to faster execution in Python and improved CPU utilization. In Python, true parallelism often involves multi-processing to bypass the Python GIL.
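To make the distinction concrete, here is a minimal sketch of parallelism using multiprocessing.Pool (the square_number helper is made up for illustration): each worker process runs with its own interpreter, so the work can be spread over several cores.

import multiprocessing

def square_number(n):
    """A small CPU-bound function to run in parallel."""
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8]
    # Each worker process has its own interpreter (and its own GIL)
    with multiprocessing.Pool(processes=4) as pool:
        squares = pool.map(square_number, numbers)
    print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64]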

 

Introduction to Concurrency


Python concurrency is a core concept for building efficient and scalable Python applications. It allows programs to appear to do multiple things at once, improving responsiveness and resource utilization.

Concurrency refers to the ability of a system to handle multiple tasks seemingly at the same time. While a single CPU might only execute one instruction at a time, concurrency creates the illusion of simultaneous execution by rapidly switching between tasks. This is particularly beneficial for I/O-bound operations (like network requests or file operations) where a program would otherwise spend a lot of time waiting. Mastering concurrent programming in Python is essential for modern Python development and building high-performance systems.

 

Example 1: Basic concurrent printing

import time
import threading

def task(name, delay):
    """A simple function to simulate a task."""
    print(f"Task {name}: Starting")
    time.sleep(delay)  # Simulate some work
    print(f"Task {name}: Finished")

# Create two threads, each running the task function
thread1 = threading.Thread(target=task, args=("A", 2))
thread2 = threading.Thread(target=task, args=("B", 1))

# Start the threads
thread1.start()
thread2.start()

# Wait for both threads to complete
thread1.join()
thread2.join()

print("All tasks completed concurrently.")

Explanation: This example demonstrates basic concurrency using Python's threading module. Two threading.Thread objects are created, each targeting the task function with different arguments. When start() is called on each thread, they begin executing concurrently. The time.sleep() simulates work, showing that "Task A" and "Task B" start almost simultaneously, even though Task A takes longer to finish. join() ensures the main program waits for both threads to complete before printing "All tasks completed concurrently," illustrating how concurrent tasks can improve overall perceived speed. This is a fundamental concept in Python threading examples.

 

Example 2: Concurrency with non-blocking I/O (conceptual, using asyncio)

import asyncio
import time

async def fetch_data(url):
    """Simulates fetching data from a URL asynchronously."""
    print(f"Fetching data from {url}...")
    await asyncio.sleep(2)  # Simulate network delay
    print(f"Finished fetching data from {url}")
    return f"Data from {url}"

async def main():
    start_time = time.time()
    # Concurrently fetch data from multiple URLs
    results = await asyncio.gather(
        fetch_data("http://example.com/api/data1"),
        fetch_data("http://example.com/api/data2"),
        fetch_data("http://example.com/api/data3")
    )
    end_time = time.time()
    print(f"All data fetched in {end_time - start_time:.2f} seconds.")
    for res in results:
        print(res)

if __name__ == "__main__":
    asyncio.run(main())

Explanation: This example introduces asyncio, Python's standard library for asynchronous programming. The fetch_data function is an async function that simulates a network request using await asyncio.sleep(). In main, asyncio.gather() is used to run multiple fetch_data calls concurrently. This demonstrates how asynchronous I/O in Python allows the program to initiate multiple I/O operations and switch between them without waiting for each to complete, significantly reducing total execution time for I/O-bound tasks. This is a key technique for building high-performance web applications with Python.

 

Example 3: Producer-Consumer with a Queue (intermediate)

import threading
import time
import queue

def producer(q, items_to_produce):
    """Produces items and puts them into the queue."""
    for i in range(items_to_produce):
        item = f"Item-{i}"
        time.sleep(0.1)  # Simulate time to produce an item
        print(f"Producer: Produced {item}")
        q.put(item)

def consumer(q, consumer_id):
    """Consumes items from the queue."""
    while True:
        item = q.get()
        if item is None:  # Sentinel value: no more items for this consumer
            q.task_done()
            break
        time.sleep(0.2)  # Simulate time to process an item
        print(f"Consumer {consumer_id}: Consumed {item}")
        q.task_done()

# Create a shared queue
q = queue.Queue()

# Create producer and consumer threads
producer_thread = threading.Thread(target=producer, args=(q, 5))
consumer1_thread = threading.Thread(target=consumer, args=(q, 1))
consumer2_thread = threading.Thread(target=consumer, args=(q, 2))

# Start the threads
producer_thread.start()
consumer1_thread.start()
consumer2_thread.start()

# Wait for the producer to finish, then send one sentinel per consumer
producer_thread.join()
q.put(None)
q.put(None)

# Wait for every queued item (including the sentinels) to be marked done
q.join()
consumer1_thread.join()
consumer2_thread.join()

print("All items processed. Producer-Consumer example finished.")

Explanation: This example showcases a classic producer-consumer pattern using a queue.Queue for thread-safe communication. The producer thread generates items and puts them into the queue, while multiple consumer threads retrieve and process them. Once the producer has finished, the main thread puts one None sentinel per consumer into the queue so that each worker can mark the sentinel as done and exit cleanly. The q.join() method blocks until every queued item (including the sentinels) has been marked as processed via q.task_done(). Because queue.Queue handles its own locking, this pattern avoids race conditions without explicit locks and is widely used for managing concurrent tasks and data pipelines in Python.

 

Example 4: Concurrent file processing (intermediate)

import threading
import queue
import time
import os

def process_file(filepath):
    """Simulates processing a file by reading its content."""
    try:
        with open(filepath, 'r') as f:
            content = f.read()
            word_count = len(content.split())
            print(f"Processed: {os.path.basename(filepath)} - Words: {word_count}")
        time.sleep(0.1) # Simulate file processing time
    except Exception as e:
        print(f"Error processing {filepath}: {e}")

def file_worker(q):
    """Worker thread that processes files from the queue."""
    while True:
        filepath = q.get()
        if filepath is None: # Sentinel value to stop the worker
            q.task_done()
            break
        process_file(filepath)
        q.task_done()

def main_file_processor(directory, num_workers):
    file_queue = queue.Queue()
    files = [os.path.join(directory, f) for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))]

    # Create worker threads
    workers = []
    for _ in range(num_workers):
        worker = threading.Thread(target=file_worker, args=(file_queue,))
        worker.start()
        workers.append(worker)

    # Add files to the queue
    for filepath in files:
        file_queue.put(filepath)

    # Add sentinel values to stop workers
    for _ in range(num_workers):
        file_queue.put(None)

    # Wait for all tasks to be processed
    file_queue.join()
    print("All files processed.")

# Create some dummy files for demonstration
if not os.path.exists("temp_files"):
    os.makedirs("temp_files")
with open("temp_files/file1.txt", "w") as f: f.write("This is file one with some words.")
with open("temp_files/file2.txt", "w") as f: f.write("File two is also here.")
with open("temp_files/file3.txt", "w") as f: f.write("Another file with more content for testing.")

main_file_processor("temp_files", 3)

Explanation: This example demonstrates concurrent file processing using a pool of worker threads. A queue.Queue is used to hold file paths, and multiple worker threads (file_worker) are created to process these files concurrently. Each worker takes a file path from the queue, simulates processing it with process_file, and then marks the task as done. This approach is highly efficient for I/O-bound tasks like reading and processing multiple files, as threads can work on different files simultaneously. This showcases how Python concurrency can be applied to practical data processing tasks.

 

Example 5: Concurrent web scraping with rate limiting (advanced, conceptual)

import asyncio
import aiohttp # requires: pip install aiohttp
import time
from collections import deque

# This is a simplified example and doesn't handle full rate limiting complexity
# such as handling HTTP headers for rate limits or complex backoff strategies.

async def fetch_url(session, url, delay):
    """Fetches a URL with a simulated delay for rate limiting."""
    print(f"Fetching {url}...")
    await asyncio.sleep(delay) # Simulate network and rate limiting delay
    async with session.get(url) as response:
        content = await response.text()
        print(f"Finished {url}: {len(content)} characters")
        return content

async def limited_fetcher(urls, rate_limit_interval_sec, concurrent_limit):
    """Fetches URLs with a rate limit and concurrent limit."""
    async with aiohttp.ClientSession() as session:
        semaphore = asyncio.Semaphore(concurrent_limit)
        last_request_times = deque() # To track request times for rate limiting

        async def _fetch_with_limit(url):
            async with semaphore:
                # Basic rate limiting: ensure a minimum interval between requests
                if last_request_times:
                    time_since_last = time.time() - last_request_times[-1]
                    if time_since_last < rate_limit_interval_sec:
                        await asyncio.sleep(rate_limit_interval_sec - time_since_last)
                
                result = await fetch_url(session, url, 0)  # No extra sleep here; the rate limit is handled above
                last_request_times.append(time.time())
                return result

        tasks = [_fetch_with_limit(url) for url in urls]
        await asyncio.gather(*tasks)

async def main_scraper():
    urls_to_scrape = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
        "http://example.com/page1",
        "http://example.org/page2",
    ]
    # Limit to 2 concurrent requests, with at least 0.5 seconds between each.
    await limited_fetcher(urls_to_scrape, rate_limit_interval_sec=0.5, concurrent_limit=2)
    print("All URLs fetched with rate limiting.")

if __name__ == "__main__":
    asyncio.run(main_scraper())

Explanation: This advanced example demonstrates concurrent web scraping with rate limiting using asyncio and aiohttp. It introduces asyncio.Semaphore to limit the number of concurrent active connections (e.g., to avoid overwhelming a server). A basic rate-limiting mechanism is implemented using a deque to ensure a minimum interval between requests to a website, preventing IP bans and respecting server policies. This illustrates how advanced concurrency patterns are vital for ethical and efficient web scraping in Python, managing both resource consumption and external service interactions.

 

 

Threads vs. Processes


Understanding the distinction between Python threads and Python processes is crucial for choosing the right concurrency model for your application and effectively utilizing multi-core CPUs in Python.

When designing concurrent Python applications, one of the first decisions you'll face is whether to use threads or processes. Both allow your program to perform multiple tasks, but they operate at different levels and have different implications for performance and resource usage.

Threads (Lightweight Processes):

Threads are units of execution within the same process.

They share the same memory space, which makes communication between threads easier and faster.

However, due to the Global Interpreter Lock (GIL) in CPython (the most common Python implementation), Python threads cannot truly run in parallel on multiple CPU cores for CPU-bound tasks. The GIL ensures that only one thread executes Python bytecode at a time.

Threads are best suited for I/O-bound tasks (e.g., network requests, file I/O, database queries) where the program spends most of its time waiting for external resources. During these waiting periods, the GIL can be released, allowing other threads to run.

Pros: Lower overhead for creation and switching, easy data sharing, good for I/O-bound operations.

Cons: Limited by the GIL for CPU-bound tasks, potential for race conditions if not synchronized correctly.

Processes (Heavyweight Processes):

Processes are independent programs running in their own memory space.

Each process has its own Python interpreter and its own GIL, meaning they can achieve true parallel execution on multi-core processors for CPU-bound tasks.

Communication between processes requires explicit mechanisms (e.g., multiprocessing.Queue, Pipe, shared memory); a short sketch appears after this list.

Processes are best suited for CPU-bound tasks (e.g., heavy computations, data processing) where you want to leverage multiple CPU cores.

Pros: True parallelism, isolated memory spaces (more robust), bypasses the GIL.

Cons: Higher overhead for creation and switching, more complex inter-process communication, higher memory consumption.

Choosing between threading vs multiprocessing in Python depends heavily on whether your application is I/O-bound or CPU-bound.
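Because processes do not share memory, data must be exchanged explicitly. The following is a minimal sketch of inter-process communication using multiprocessing.Queue (the worker function and message are illustrative only):

import multiprocessing

def worker(result_queue):
    """Runs in a child process and sends a result back to the parent."""
    result_queue.put("Hello from the child process")

if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=worker, args=(result_queue,))
    child.start()
    print(result_queue.get())  # Blocks until the child puts a value
    child.join()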

 

Example 1: Demonstrating shared memory in threads (conceptual)

import threading

shared_data = {"count": 0} # Data shared between threads

def increment_count():
    """Function to increment a shared counter."""
    for _ in range(100000):
        shared_data["count"] += 1

thread1 = threading.Thread(target=increment_count)
thread2 = threading.Thread(target=increment_count)

thread1.start()
thread2.start()

thread1.join()
thread2.join()

# Due to race conditions without locks, this might not be 200000
print(f"Final shared count (potential race condition): {shared_data['count']}")

Explanation: This example visually demonstrates that threads share the same memory space. Both thread1 and thread2 access and modify the shared_data dictionary. Without proper synchronization (like a lock), the final count is likely not the expected 200,000 due to race conditions, where multiple threads try to update the same variable simultaneously, leading to lost updates. This highlights a key challenge and consideration when working with shared mutable state in multi-threaded Python programs.

 

Example 2: Demonstrating isolated memory in processes

import multiprocessing
import os

def process_function(data):
    """Function run by a process, modifying its local copy of data."""
    data['value'] = os.getpid() # Modify a local copy
    print(f"Process {os.getpid()}: Data value inside process: {data['value']}")

if __name__ == "__main__":
    initial_data = {"value": "original"}
    print(f"Main process: Initial data value: {initial_data['value']}")

    process = multiprocessing.Process(target=process_function, args=(initial_data,))
    process.start()
    process.join()

    # The original 'initial_data' in the main process remains unchanged
    print(f"Main process: Data value after process: {initial_data['value']}")

Explanation: This example illustrates that processes have their own isolated memory spaces. The process_function receives a copy of initial_data. When the process modifies data['value'], it's modifying its local copy, not the initial_data in the main process. This fundamental difference means that changes made by one process are not automatically visible to another, emphasizing the need for explicit inter-process communication (IPC) mechanisms when sharing data between processes. This is a core concept for multi-processing in Python.

 

Example 3: CPU-bound task with threads vs. processes (conceptual)

import time
import threading
import multiprocessing
import os

def cpu_intensive_task():
    """A simple CPU-bound task."""
    result = 0
    for i in range(10_000_000):
        result += i * i
    return result

def run_with_threads(num_threads):
    print(f"\nRunning with {num_threads} threads (CPU-bound):")
    start_time = time.time()
    threads = []
    for _ in range(num_threads):
        thread = threading.Thread(target=cpu_intensive_task)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    end_time = time.time()
    print(f"Threads finished in: {end_time - start_time:.2f} seconds")

def run_with_processes(num_processes):
    print(f"\nRunning with {num_processes} processes (CPU-bound):")
    start_time = time.time()
    processes = []
    for _ in range(num_processes):
        process = multiprocessing.Process(target=cpu_intensive_task)
        processes.append(process)
        process.start()
    for process in processes:
        process.join()
    end_time = time.time()
    print(f"Processes finished in: {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    num_iterations = 2 # Change for more noticeable difference
    print(f"Number of CPU cores detected: {os.cpu_count()}")

    # Try running with threads
    run_with_threads(num_iterations)

    # Try running with processes
    run_with_processes(num_iterations)

Explanation: This example highlights the impact of the GIL on CPU-bound tasks. Both run_with_threads and run_with_processes execute the same CPU-intensive calculation. When you run this, you'll typically observe that the execution time with multiple threads is not significantly faster (and sometimes even slightly slower due to overhead) than a single thread, because of the GIL. However, with multiple processes, the execution time should ideally decrease proportionally to the number of available CPU cores, demonstrating true parallel execution and how multiprocessing bypasses the GIL. This is a critical distinction for Python performance tuning.

 

Example 4: I/O-bound task with threads vs. processes (conceptual)

import time
import threading
import multiprocessing

def io_intensive_task():
    """A simple I/O-bound task (simulating network request)."""
    time.sleep(1)  # Simulate waiting for a network response
    # In a worker thread this prints the thread's name; in a child process the
    # thread is just "MainThread", so the process name is shown as well.
    print(f"Task finished by {multiprocessing.current_process().name} / {threading.current_thread().name}")

def run_io_with_threads(num_tasks):
    print(f"\nRunning {num_tasks} I/O-bound tasks with threads:")
    start_time = time.time()
    threads = []
    for i in range(num_tasks):
        thread = threading.Thread(target=io_intensive_task, name=f"Thread-{i}")
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    end_time = time.time()
    print(f"Threads finished in: {end_time - start_time:.2f} seconds")

def run_io_with_processes(num_tasks):
    print(f"\nRunning {num_tasks} I/O-bound tasks with processes:")
    start_time = time.time()
    processes = []
    for i in range(num_tasks):
        process = multiprocessing.Process(target=io_intensive_task, name=f"Process-{i}")
        processes.append(process)
        process.start()
    for process in processes:
        process.join()
    end_time = time.time()
    print(f"Processes finished in: {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    num_io_tasks = 5

    # Run I/O-bound tasks with threads
    run_io_with_threads(num_io_tasks)

    # Run I/O-bound tasks with processes
    run_io_with_processes(num_io_tasks)

Explanation: This example demonstrates the effectiveness of threads for I/O-bound operations. Both functions perform simulated network requests. You'll observe that run_io_with_threads completes significantly faster than a sequential execution of the tasks, and is often comparable to, or slightly faster than, run_io_with_processes for the same number of tasks. This is because during the time.sleep() (simulating I/O wait), the GIL is released, allowing other threads to run. Processes, while offering true parallelism, incur higher overhead for I/O-bound tasks, making threading a common choice for concurrent I/O in Python.

 

Example 5: When to choose: Scenario comparison

import time
import threading
import multiprocessing
import os

def download_file(filename):
    """Simulates downloading a file (I/O-bound)."""
    print(f"Starting download: {filename}")
    time.sleep(2) # Simulate network delay
    print(f"Finished download: {filename}")

def compress_data(data_size):
    """Simulates compressing data (CPU-bound)."""
    print(f"Starting compression: {data_size}MB")
    # Simulate CPU-intensive work by a loop, scaling with data_size
    _ = sum(i * i for i in range(1_000_000 * data_size))
    print(f"Finished compression: {data_size}MB")

def main_threaded_download_compress():
    print("\n--- Scenario: Download with Threads, Compress with Threads ---")
    start_time = time.time()
    
    # Simulate downloading multiple files concurrently using threads
    download_threads = []
    for i in range(3):
        t = threading.Thread(target=download_file, args=(f"file{i+1}.zip",))
        download_threads.append(t)
        t.start()
    for t in download_threads:
        t.join()

    # Simulate compressing multiple data sets concurrently using threads
    compress_threads = []
    for i in range(2):
        t = threading.Thread(target=compress_data, args=(i + 1,))
        compress_threads.append(t)
        t.start()
    for t in compress_threads:
        t.join()

    end_time = time.time()
    print(f"Total time (Threads for both): {end_time - start_time:.2f} seconds")

def main_hybrid_download_compress():
    print("\n--- Scenario: Download with Threads, Compress with Processes ---")
    start_time = time.time()
    
    # Download files concurrently using threads (good for I/O)
    download_threads = []
    for i in range(3):
        t = threading.Thread(target=download_file, args=(f"file{i+1}.zip",))
        download_threads.append(t)
        t.start()
    for t in download_threads:
        t.join()

    # Compress data concurrently using processes (good for CPU)
    compress_processes = []
    for i in range(2):
        p = multiprocessing.Process(target=compress_data, args=(i + 1,))
        compress_processes.append(p)
        p.start()
    for p in compress_processes:
        p.join()

    end_time = time.time()
    print(f"Total time (Threads for download, Processes for compress): {end_time - start_time:.2f} seconds")

if __name__ == "__main__":
    print(f"Number of CPU cores: {os.cpu_count()}")
    main_threaded_download_compress()
    main_hybrid_download_compress()

Explanation: This advanced example demonstrates a practical scenario to illustrate when to choose threads versus processes. main_threaded_download_compress uses threads for both I/O-bound (download) and CPU-bound (compress) tasks. main_hybrid_download_compress uses threads for downloads and processes for compression. You'll likely see that the hybrid approach is more efficient overall because it leverages threads for I/O (where the GIL is less of a bottleneck) and processes for CPU-bound work (where true parallelism is needed). This example emphasizes the importance of analyzing your application's workload (I/O vs. CPU) to make informed decisions about Python concurrency models for optimal performance.

 

 

The Global Interpreter Lock (GIL)


Note: The Python GIL is a critical concept for anyone working with multi-threaded Python applications. Understanding its implications is essential for Python performance tuning and choosing between threading and multiprocessing.

The Global Interpreter Lock (GIL) is a mutex (or a lock) that protects access to Python objects, preventing multiple native threads from executing Python bytecode simultaneously. In simpler terms, even on multi-core processors, only one thread can be actively executing Python bytecode at any given time. This means that if you have a CPU-bound task in Python (a task that spends most of its time performing calculations rather than waiting for I/O), adding more threads to it will not make it run faster; in fact, it might even slow it down due to the overhead of context switching between threads. The GIL is one of the most frequently discussed topics in Python concurrency limitations.

Why does the GIL exist?

The GIL was introduced in CPython (the reference implementation of Python) to simplify memory management and prevent race conditions for memory-managed objects. Without the GIL, reference counting (Python's primary memory management technique) would be much more complex to implement and could lead to performance bottlenecks. It makes Python's C extensions easier to write, as they don't have to worry about complex multi-threading issues. However, it's the primary reason why Python's threading module isn't suitable for CPU-bound parallelism.
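To make the reference-counting point concrete, CPython exposes an object's reference count via sys.getrefcount. A small illustration (the exact counts can vary between Python versions and contexts):

import sys

data = []
print(sys.getrefcount(data))  # At least 2: the 'data' name plus the temporary argument reference

alias = data                  # A second name bound to the same list object
print(sys.getrefcount(data))  # One higher than before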

 

When is the GIL released?

The GIL is typically released by a thread during:

I/O operations: When a thread performs an I/O operation (like reading/writing a file, making a network request, or interacting with a database), it releases the GIL, allowing other threads to run. This is why Python threads are effective for I/O-bound tasks.

CPU-bound operations in C extensions: If a CPU-bound operation is implemented in C (and doesn't interact with Python objects), it can explicitly release the GIL, allowing other Python threads to run. Libraries like NumPy often do this.

Time-slicing: The GIL is also periodically released to allow other threads a chance to run, even if the current thread is CPU-bound. This time-slicing mechanism prevents a single CPU-bound thread from monopolizing the interpreter.
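The switch interval that drives this time-slicing is exposed by the interpreter itself; a minimal sketch (the default value shown is CPython's and may differ between versions):

import sys

# How often (in seconds) the interpreter asks the running thread to release the GIL
print(sys.getswitchinterval())   # 0.005 by default in CPython

# The interval can be tuned, e.g. lowered for many short-lived CPU-bound threads
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())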

 

Implications of the GIL

CPU-bound tasks: For tasks that are primarily CPU-bound, multiprocessing is generally preferred over threading to achieve true parallelism and utilize multiple CPU cores. Each process gets its own Python interpreter and GIL.

I/O-bound tasks: For tasks that involve a lot of waiting for external resources (like network operations or file I/O), threading can be very effective because the GIL is released during these waiting periods, allowing other threads to make progress.

Asynchronous programming: Techniques like asyncio (asynchronous I/O) offer another approach to concurrency for I/O-bound tasks without relying on threads, further bypassing the GIL's impact on I/O.

 

Example 1: Demonstrating GIL's effect on CPU-bound tasks

import time
import threading

def cpu_intensive_function():
    """A CPU-bound function that does a lot of calculations."""
    start_time = time.time()
    count = 0
    for _ in range(100_000_000): # Perform a large number of operations
        count += 1
    end_time = time.time()
    print(f"{threading.current_thread().name}: Finished CPU work in {end_time - start_time:.4f} seconds")

# Run a single CPU-bound task
print("--- Single Thread CPU-bound Task ---")
single_thread = threading.Thread(target=cpu_intensive_function, name="SingleThread")
single_thread.start()
single_thread.join()

# Run two CPU-bound tasks with threads
print("\n--- Two Threads CPU-bound Tasks ---")
thread1 = threading.Thread(target=cpu_intensive_function, name="Thread-1")
thread2 = threading.Thread(target=cpu_intensive_function, name="Thread-2")

start_multithreaded = time.time()
thread1.start()
thread2.start()

thread1.join()
thread2.join()
end_multithreaded = time.time()
print(f"Total time for two CPU-bound threads: {end_multithreaded - start_multithreaded:.4f} seconds")

Explanation: This example clearly demonstrates the GIL's impact on CPU-bound tasks. When you run the cpu_intensive_function with two threads, the total execution time will be roughly the sum of the individual execution times, not a reduction. This is because the GIL ensures that only one thread can execute Python bytecode at a time, preventing true parallel execution of these CPU-intensive operations across multiple cores. This is a common Python threading pitfall when dealing with computational workloads.

 

Example 2: GIL's release during I/O operations

import time
import threading

def io_bound_function(task_id):
    """An I/O-bound function that simulates network delay."""
    print(f"Task {task_id}: Starting I/O operation...")
    time.sleep(2) # Simulate network request, releases GIL
    print(f"Task {task_id}: Finished I/O operation.")

print("--- Two I/O-bound Tasks with Threads ---")
thread1 = threading.Thread(target=io_bound_function, args=(1,), name="IO-Thread-1")
thread2 = threading.Thread(target=io_bound_function, args=(2,), name="IO-Thread-2")

start_time = time.time()
thread1.start()
thread2.start()

thread1.join()
thread2.join()
end_time = time.time()
print(f"Total time for two I/O-bound threads: {end_time - start_time:.4f} seconds")

Explanation: In contrast to the CPU-bound example, this shows how the GIL's release during I/O operations makes threading effective for I/O-bound tasks. Both io_bound_function calls are simulated to take 2 seconds. When run concurrently with threads, the total time will be approximately 2 seconds, not 4 seconds. This is because while one thread is waiting for its simulated I/O operation to complete (during time.sleep()), the GIL is released, allowing the other thread to acquire it and start its own I/O operation. This is why Python threads are widely used for web scraping, networking, and database interactions.

 

Example 3: Visualizing GIL context switching (conceptual)

import threading
import time

def gil_observer_task(name):
    for i in range(5):
        # Simulate a small amount of CPU work
        _ = [j*j for j in range(100_000)]
        print(f"{name}: Step {i}")
        time.sleep(0.01) # Small sleep to ensure context switch

print("--- Observing GIL Context Switching ---")
thread_a = threading.Thread(target=gil_observer_task, args=("A",))
thread_b = threading.Thread(target=gil_observer_task, args=("B",))

thread_a.start()
thread_b.start()

thread_a.join()
thread_b.join()
print("Observation complete.")

Explanation: This example provides a more direct (though simplified) observation of the GIL's context switching. Both threads perform a small amount of CPU work and then a very brief time.sleep(). This sleep causes the GIL to be released and then reacquired, allowing the operating system to switch between threads. You'll see the output alternating between "A: Step X" and "B: Step Y", demonstrating that even for relatively CPU-bound tasks, the GIL is periodically given up, allowing other threads a chance to run. This mechanism prevents a single CPU-bound thread from completely starving other threads, contributing to the perceived concurrency in Python.

 

Example 4: C-extension releasing GIL (Conceptual with numpy)

import time
import threading
import numpy as np # pip install numpy

def numpy_cpu_task(array_size):
    """A CPU-bound task that uses NumPy, which releases the GIL."""
    print(f"NumPy thread: Starting calculation with {array_size} elements...")
    start_time = time.time()
    
    # NumPy operations are often implemented in C and can release the GIL
    # This matrix multiplication is a good example of a CPU-bound task that releases GIL
    a = np.random.rand(array_size, array_size)
    b = np.random.rand(array_size, array_size)
    c = np.dot(a, b) # This operation releases the GIL
    
    end_time = time.time()
    print(f"NumPy thread: Finished calculation in {end_time - start_time:.4f} seconds")
    return c

def regular_cpu_task():
    """A regular Python CPU-bound task that holds the GIL."""
    print("Regular Python thread: Starting CPU calculation...")
    start_time = time.time()
    count = 0
    for _ in range(50_000_000):
        count += 1
    end_time = time.time()
    print(f"Regular Python thread: Finished CPU work in {end_time - start_time:.4f} seconds")

print("--- Concurrent NumPy (GIL-released) and Regular Python (GIL-held) ---")

numpy_thread = threading.Thread(target=numpy_cpu_task, args=(1000,)) # Large array for noticeable effect
regular_thread = threading.Thread(target=regular_cpu_task)

start_total_time = time.time()
numpy_thread.start()
regular_thread.start()

numpy_thread.join()
regular_thread.join()
end_total_time = time.time()
print(f"Total time for mixed CPU tasks: {end_total_time - start_total_time:.4f} seconds")

Explanation: This advanced example demonstrates a key advantage of using libraries like NumPy in multi-threaded Python applications. numpy_cpu_task performs a matrix multiplication, which is implemented in C and explicitly releases the GIL during its execution. In contrast, regular_cpu_task performs a standard Python loop, holding the GIL. When both run concurrently, you'll observe that they can make progress relatively independently, indicating that the NumPy operation does not block the Python interpreter for the regular_cpu_task. This is why scientific computing in Python can benefit from threads even for CPU-bound tasks when leveraging C-optimized libraries.

 

Example 5: Strategies to overcome GIL for CPU-bound tasks

import time
import multiprocessing
import threading

def cpu_intensive_task_long():
    """A longer CPU-bound task."""
    result = 0
    for i in range(100_000_000):
        result += i
    return result

def run_cpu_with_processes():
    print("\n--- Running CPU-bound tasks with Processes (Bypassing GIL) ---")
    start_time = time.time()
    processes = []
    for _ in range(2):
        p = multiprocessing.Process(target=cpu_intensive_task_long)
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    end_time = time.time()
    print(f"Total time with processes: {end_time - start_time:.4f} seconds")

def run_cpu_with_threads_for_comparison():
    print("\n--- Running CPU-bound tasks with Threads (Limited by GIL) ---")
    start_time = time.time()
    threads = []
    for _ in range(2):
        t = threading.Thread(target=cpu_intensive_task_long)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    end_time = time.time()
    print(f"Total time with threads: {end_time - start_time:.4f} seconds")

if __name__ == "__main__":
    run_cpu_with_processes()
    run_cpu_with_threads_for_comparison()
    print("\n--- Other strategies for CPU-bound tasks: ---")
    print("1. Use multiprocessing module for true parallelism.")
    print("2. Rewrite CPU-intensive parts in C/C++/Rust and expose to Python (e.g., via Cython, ctypes).")
    print("3. Utilize libraries that are implemented in C and release the GIL (e.g., NumPy, Pandas).")
    print("4. Consider alternative Python interpreters (Jython, IronPython) which may not have a GIL (but have other tradeoffs).")

Explanation: This comprehensive example demonstrates the primary strategy for overcoming the GIL's limitations on CPU-bound tasks: using multiprocessing. It explicitly compares the execution time of a CPU-intensive task run by multiple processes versus multiple threads. As expected, processes will be significantly faster on multi-core machines because each process has its own interpreter and GIL. The additional print statements summarize other common strategies for Python CPU-bound optimization, including leveraging C extensions and considering alternative Python implementations. This is crucial for high-performance Python development when faced with the GIL.
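For strategy 1, a higher-level alternative to creating multiprocessing.Process objects by hand is concurrent.futures.ProcessPoolExecutor, which mirrors the ThreadPoolExecutor API shown later in this chapter. A minimal sketch, reusing a CPU-bound function of the same shape as above:

import concurrent.futures

def cpu_intensive_task_long():
    """Same kind of CPU-bound work as above."""
    result = 0
    for i in range(100_000_000):
        result += i
    return result

if __name__ == "__main__":
    # Each worker is a separate process, so the two tasks can run on separate cores
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(cpu_intensive_task_long) for _ in range(2)]
        for future in concurrent.futures.as_completed(futures):
            print(f"Result: {future.result()}")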

 

 

Threading (threading module)


Note: The Python threading module is the standard library for implementing concurrent operations using threads. It's an essential tool for building responsive Python applications, especially for I/O-bound tasks.

The threading module provides a high-level, object-oriented API for creating and managing threads in Python. It simplifies the complexities of multi-threading in Python, allowing developers to write concurrent code more easily. While threads in CPython are limited by the GIL for true CPU parallelism, they are incredibly effective for tasks that involve waiting for external resources. This makes Python threads ideal for applications like network programming, web servers, GUI applications, and data processing pipelines where I/O operations are frequent.

 

Key concepts in threading module

threading.Thread: The core class for creating a new thread of execution. You typically pass a target function and its arguments to its constructor.

start(): Method to begin a thread's execution.

join(): Method to wait for a thread to complete its execution. This ensures the main program doesn't exit before background threads have finished.

Thread Synchronization Primitives: Mechanisms like Lock, Semaphore, Event, and Condition are provided to manage access to shared resources and coordinate thread execution, preventing race conditions and ensuring thread safety.

threading.current_thread(): Returns the current Thread object.

threading.active_count(): Returns the number of Thread objects currently alive.

Using the threading module effectively involves identifying parts of your application that can run concurrently without relying heavily on CPU computations. It's the go-to choice for concurrent I/O in Python.
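As a quick illustration of the introspection helpers listed above, the sketch below prints the current thread's name from inside each worker and the number of live threads from the main thread (the worker names are made up for the example):

import threading
import time

def report():
    """Prints which thread is running this function."""
    print(f"Running in: {threading.current_thread().name}")
    time.sleep(0.5)

workers = [threading.Thread(target=report, name=f"Worker-{i}") for i in range(3)]
for w in workers:
    w.start()

# Main thread plus the three workers that are still sleeping
print(f"Threads currently alive: {threading.active_count()}")

for w in workers:
    w.join()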

 

 

Creating and managing threads


Note: Creating threads in Python using the threading module is straightforward. Proper thread management involves starting, waiting for, and naming threads for better debugging and control.

 

Example 1: Basic thread creation and execution

import threading
import time

def greet(name, delay):
    """A simple function to be run by a thread."""
    print(f"Thread '{threading.current_thread().name}': Hello, {name}!")
    time.sleep(delay)
    print(f"Thread '{threading.current_thread().name}': Goodbye, {name}!")

# Create a Thread object
# target: the function to be executed by the thread
# args: a tuple of arguments to pass to the target function
thread1 = threading.Thread(target=greet, args=("Alice", 1), name="GreetingThread-Alice")
thread2 = threading.Thread(target=greet, args=("Bob", 0.5), name="GreetingThread-Bob")

# Start the thread's execution
thread1.start()
thread2.start()

# Wait for the thread to complete its execution
# This makes the main program wait for thread1 and thread2 to finish
thread1.join()
thread2.join()

print("Main program finished. All greeting threads completed.")

Explanation: This is the most basic way to create and run threads in Python. Two threading.Thread objects are instantiated, each with a target function (greet) and arguments (args). Calling thread.start() initiates the thread's execution. thread.join() is crucial here; it blocks the main program's execution until the respective thread finishes. Without join(), the "Main program finished" message might appear before the threads complete, demonstrating the importance of waiting for threads to complete for predictable program flow. The name argument helps in debugging multi-threaded applications.

 

Example 2: Threading with subclassing Thread

import threading
import time

class MyThread(threading.Thread):
    def __init__(self, message, delay):
        super().__init__()
        self.message = message
        self.delay = delay
        # Optionally set a thread name
        self.name = f"MyCustomThread-{message}"

    def run(self):
        """The method that will be executed when the thread starts."""
        print(f"{self.name}: Starting with message: '{self.message}'")
        time.sleep(self.delay)
        print(f"{self.name}: Finishing with message: '{self.message}'")

# Create instances of our custom thread class
thread_a = MyThread("Task A", 1.5)
thread_b = MyThread("Task B", 0.8)

# Start the threads
thread_a.start()
thread_b.start()

# Wait for threads to complete
thread_a.join()
thread_b.join()

print("Main program finished. Custom threads completed.")

Explanation: This example demonstrates creating threads by subclassing threading.Thread. By overriding the run() method, you define the code that the thread will execute. This approach is often preferred when you need more complex thread behavior, internal state for the thread, or want to encapsulate thread-specific logic within a class. It provides a more organized way of managing thread code for larger concurrent Python projects.

 

Example 3: Daemon threads (background threads)

import threading
import time

def background_task():
    """A task that runs in the background indefinitely."""
    print("Background task started...")
    while True:
        time.sleep(1)
        print("Background task running...")

# Create a thread and set it as a daemon
daemon_thread = threading.Thread(target=background_task, name="DaemonThread", daemon=True)

# Start the daemon thread
daemon_thread.start()

print("Main program doing its work...")
time.sleep(3) # Main program does some work

print("Main program exiting.")
# Daemon thread automatically terminates when the main program exits

Explanation: This example introduces daemon threads in Python. A daemon thread is a thread that runs in the background and is automatically terminated when the main program exits (or when all non-daemon threads have exited). Setting daemon=True for a thread ensures it won't prevent the main program from exiting. This is useful for tasks like logging, garbage collection, or background monitoring that don't need to block the program's shutdown. Understanding Python daemon threads is important for graceful application termination.

 

Example 4: Managing a pool of threads (using ThreadPoolExecutor)

import concurrent.futures
import time

def process_item(item):
    """Simulates processing an item."""
    print(f"Processing item {item}...")
    time.sleep(1) # Simulate work
    print(f"Finished item {item}")
    return f"Processed {item}"

# Using ThreadPoolExecutor for managing a pool of worker threads
# max_workers: specifies the maximum number of threads in the pool
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    items_to_process = [1, 2, 3, 4, 5, 6]
    
    # Submit tasks to the thread pool
    # map is useful for applying a function to a list of iterables
    # as_completed returns futures as they complete
    
    # Using submit and as_completed for more control
    print("--- Using submit and as_completed ---")
    future_to_item = {executor.submit(process_item, item): item for item in items_to_process}
    for future in concurrent.futures.as_completed(future_to_item):
        item = future_to_item[future]
        try:
            result = future.result()
            print(f"Result for {item}: {result}")
        except Exception as exc:
            print(f'{item} generated an exception: {exc}')

    # Using map for simpler cases (results returned in order of submission)
    print("\n--- Using map ---")
    results_map = executor.map(process_item, items_to_process)
    for result in results_map:
        print(f"Map result: {result}")

print("All items processed using ThreadPoolExecutor.")

Explanation: This example introduces concurrent.futures.ThreadPoolExecutor, a higher-level API for managing thread pools in Python. Instead of manually creating and managing Thread objects, you define a pool of worker threads. Tasks are submitted to this pool, and the executor handles their scheduling and execution. This is a much more robust and efficient way to manage concurrent tasks, especially when you have many small tasks. It simplifies common patterns like waiting for all tasks to complete (executor.map returns results in order, as_completed gives them as they finish) and is highly recommended for production-ready concurrent Python applications.

 

Example 5: Thread local storage (advanced)

import threading
import time

# Create a thread-local storage object
thread_data = threading.local()

def worker_with_local_data(name):
    """A worker function that uses thread-local data."""
    # Each thread will have its own independent 'value' attribute
    thread_data.value = f"Data for {name}"
    print(f"{threading.current_thread().name}: Initial thread_data.value = {thread_data.value}")
    
    time.sleep(0.1) # Simulate some work

    # Verify that other threads' data is not affected
    print(f"{threading.current_thread().name}: After sleep, thread_data.value = {thread_data.value}")

thread1 = threading.Thread(target=worker_with_local_data, args=("Thread-1",), name="Thread-1")
thread2 = threading.Thread(target=worker_with_local_data, args=("Thread-2",), name="Thread-2")

thread1.start()
thread2.start()

thread1.join()
thread2.join()

print("Main program finished. Demonstrating thread-local storage.")
# Attempting to access thread_data.value from the main thread will likely raise an AttributeError
# print(f"Main thread: {getattr(thread_data, 'value', 'Not set in main thread')}")

Explanation: This advanced example demonstrates threading.local(), which provides thread-local storage. This means that any attributes set on a threading.local() object are specific to the thread that sets them. Each thread gets its own independent copy of these attributes, preventing data clashes when multiple threads might otherwise try to use a shared global variable for their own transient state. This is incredibly useful for managing state in multi-threaded environments without resorting to complex locking mechanisms for data that is unique to each thread. It's a key tool for designing robust concurrent systems in Python.

 

 

Thread Synchronization (Locks, Semaphores, Events)


Note: Thread synchronization in Python is critical for preventing race conditions and ensuring data integrity in multi-threaded applications. Using locks, semaphores, and events from the threading module allows you to coordinate thread execution and safely access shared resources.

When multiple threads access and modify shared resources (like variables, lists, or external files), unpredictable behavior can occur due to race conditions. Without proper synchronization, the final state of the shared resource might depend on the non-deterministic order in which threads execute their operations. The threading module provides various synchronization primitives to manage this access and ensure that threads cooperate correctly. Mastering Python thread safety is paramount for reliable concurrent programming.

 

Common synchronization primitives

Lock: The simplest synchronization primitive. It's used to protect critical sections of code, ensuring that only one thread can execute that section at a time. A thread acquires a lock before entering the critical section and releases it afterward. If another thread tries to acquire an already acquired lock, it waits until the lock is released. This is the cornerstone for mutual exclusion in Python.

RLock (Reentrant Lock): Similar to a Lock, but it can be acquired multiple times by the same thread. This is useful in recursive functions where a thread might need to acquire the same lock more than once (see the sketch after this list).

Semaphore: A more general synchronization primitive. It maintains an internal counter, which is decremented by each acquire() call and incremented by each release() call. Threads can acquire a semaphore as long as the counter is greater than zero. It's often used to limit the number of threads that can access a resource simultaneously (e.g., a limited number of database connections).

Event: A simple signaling mechanism between threads. One thread can set an internal flag, and other threads can wait for that flag to be set. It's useful for coordinating when a particular event has occurred.

Condition: A more advanced synchronization primitive built on top of a lock. It allows threads to wait for a specific condition to be met, and then be notified when the condition changes. Often used in producer-consumer patterns.
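RLock is the only primitive above without a dedicated example below, so here is a minimal sketch showing where reentrancy matters (the nested update functions are illustrative; a plain Lock would deadlock at the inner acquisition):

import threading

rlock = threading.RLock()
totals = {"count": 0}

def add(n):
    """Updates the shared total; safe to call even while the lock is already held."""
    with rlock:
        totals["count"] += n

def add_twice(n):
    with rlock:   # First acquisition by this thread
        add(n)    # Nested acquisition of the same lock: fine with RLock,
        add(n)    # but a plain threading.Lock would block here forever

threads = [threading.Thread(target=add_twice, args=(1,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals["count"])  # 10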

 

Example 1: Using Lock for mutual exclusion (Preventing Race Conditions)

import threading
import time

shared_counter = 0
# Create a Lock object
lock = threading.Lock()

def increment_counter_safely():
    """Increments a shared counter safely using a lock."""
    global shared_counter
    for _ in range(100_000):
        # Acquire the lock before accessing the shared resource
        lock.acquire()
        try:
            shared_counter += 1
        finally:
            # Always release the lock, even if an error occurs
            lock.release()

        # A more Pythonic way using 'with' statement for locks:
        # with lock:
        #     shared_counter += 1

print("--- Demonstrating Lock for Mutual Exclusion ---")
threads = []
for i in range(5):
    thread = threading.Thread(target=increment_counter_safely, name=f"Thread-{i}")
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

# Now, shared_counter should be the correct value (5 * 100,000 = 500,000)
print(f"Final shared counter (with lock): {shared_counter}")

# For comparison, without lock (will likely be incorrect due to race condition)
shared_counter_unlocked = 0
def increment_counter_unsafe():
    global shared_counter_unlocked
    for _ in range(100_000):
        shared_counter_unlocked += 1

print("\n--- Demonstrating Race Condition (Without Lock) ---")
threads_unsafe = []
for i in range(5):
    thread = threading.Thread(target=increment_counter_unsafe, name=f"UnsafeThread-{i}")
    threads_unsafe.append(thread)
    thread.start()

for thread in threads_unsafe:
    thread.join()
print(f"Final shared counter (without lock, likely incorrect): {shared_counter_unlocked}")

Explanation: This example vividly demonstrates the necessity of locks for thread safety. The increment_counter_safely function uses a threading.Lock to protect the shared_counter. Only one thread can acquire the lock and enter the critical section (where shared_counter is incremented) at any given time. This guarantees that increments are atomic and no updates are lost. The with lock: statement is the preferred Pythonic way to use locks, ensuring the lock is automatically released even if exceptions occur. The comparison with increment_counter_unsafe (without a lock) clearly shows the race condition and the resulting incorrect value, highlighting the importance of mutual exclusion when working with shared mutable state in multi-threaded Python.

 

Example 2: Using Semaphore to limit concurrent access

import threading
import time

# Create a Semaphore that allows up to 3 threads to access a resource simultaneously
semaphore = threading.Semaphore(3)

def access_resource(worker_id):
    """Simulates accessing a limited resource."""
    print(f"Worker {worker_id}: Attempting to acquire semaphore...")
    with semaphore: # Acquire the semaphore
        print(f"Worker {worker_id}: Acquired semaphore. Accessing resource...")
        time.sleep(2) # Simulate resource usage
        print(f"Worker {worker_id}: Releasing semaphore. Finished accessing resource.")

print("--- Demonstrating Semaphore for Limiting Concurrent Access ---")
threads = []
for i in range(7): # Create more threads than the semaphore limit
    thread = threading.Thread(target=access_resource, args=(i + 1,), name=f"WorkerThread-{i+1}")
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All workers have finished accessing the resource.")

Explanation: This example showcases threading.Semaphore, which is used to limit the number of threads that can access a particular resource concurrently. The semaphore is initialized with a value of 3, meaning only three threads can enter the with semaphore: block at any given time. Other threads attempting to acquire the semaphore will block until a currently active thread releases it. This is extremely useful for resource pooling, managing access to limited external services (like API rate limits), or controlling the number of parallel database connections. It's a key tool for resource management in concurrent Python applications.

 

Example 3: Using Event for signaling between threads

import threading
import time

# Create an Event object
event = threading.Event()

def waiter_thread(thread_id):
    """Waits for the event to be set."""
    print(f"Waiter {thread_id}: Waiting for event to be set...")
    # Blocks until the event's internal flag is true
    event.wait() 
    print(f"Waiter {thread_id}: Event received! Proceeding...")
    # Do work after event is set
    print(f"Waiter {thread_id}: Doing work after event...")
    time.sleep(1)
    print(f"Waiter {thread_id}: Work complete.")

def setter_thread():
    """Sets the event after a delay."""
    print("Setter: Will set event in 3 seconds...")
    time.sleep(3)
    print("Setter: Setting event!")
    # Set the event's internal flag to true
    event.set()
    print("Setter: Event set. Other threads notified.")

print("--- Demonstrating Event for Thread Signaling ---")

# Create waiter threads
waiters = []
for i in range(2):
    t = threading.Thread(target=waiter_thread, args=(i + 1,), name=f"WaiterThread-{i+1}")
    waiters.append(t)
    t.start()

# Create the setter thread
setter = threading.Thread(target=setter_thread, name="SetterThread")
setter.start()

# Wait for all threads to complete
setter.join()
for t in waiters:
    t.join()

print("All threads finished.")

Explanation: This example demonstrates threading.Event, a simple yet powerful signaling mechanism. The waiter_thread calls event.wait(), which blocks its execution until the event.set() method is called by setter_thread. Once event.set() is called, all threads currently waiting on the event are unblocked and can proceed. This is highly useful for coordinating the start of multiple threads based on a specific condition, or for notifying worker threads when new data is available or a specific state has been reached. It's an important tool for thread coordination in Python.

 

Example 4: Producer-Consumer with Condition Variable (advanced)

import threading
import time
import collections

# Create a shared buffer (queue) and a Condition variable
buffer = collections.deque()
buffer_limit = 5
condition = threading.Condition()

def producer():
    """Produces items and adds them to the buffer."""
    for i in range(10):
        with condition:
            # Wait if the buffer is full
            while len(buffer) == buffer_limit:
                print("Producer: Buffer full, waiting...")
                condition.wait() # Release lock and wait for consumer to notify
            
            item = f"Item-{i}"
            buffer.append(item)
            print(f"Producer: Produced {item}. Buffer: {list(buffer)}")
            # Notify consumers that new items are available
            condition.notify_all() 
        time.sleep(0.5) # Simulate time to produce

def consumer(consumer_id):
    """Consumes items from the buffer."""
    while True:
        with condition:
            # Wait if the buffer is empty
            while not buffer:
                print(f"Consumer {consumer_id}: Buffer empty, waiting...")
                condition.wait() # Release lock and wait for producer to notify
            
            item = buffer.popleft()
            print(f"Consumer {consumer_id}: Consumed {item}. Buffer: {list(buffer)}")
            # Notify producer that space is available
            condition.notify_all()
        time.sleep(1) # Simulate time to consume

print("--- Demonstrating Producer-Consumer with Condition Variable ---")

producer_thread = threading.Thread(target=producer, name="ProducerThread")
consumer1_thread = threading.Thread(target=consumer, args=(1,), name="ConsumerThread-1", daemon=True)
consumer2_thread = threading.Thread(target=consumer, args=(2,), name="ConsumerThread-2", daemon=True)

producer_thread.start()
consumer1_thread.start()
consumer2_thread.start()

# The consumer loops never exit on their own, so they were created as daemon
# threads above and are terminated automatically when the main thread exits.
# In a real application you would stop them gracefully with a sentinel value or an Event.
time.sleep(15)
print("\nMain: Exiting; daemon consumer threads terminate with the main thread.")

Explanation: This complex example demonstrates the producer-consumer pattern using threading.Condition. The producer and consumer threads coordinate their access to a shared buffer using a single Condition object. When the buffer is full, the producer waits (condition.wait()); when it's empty, the consumers wait. After an item is added/removed, condition.notify_all() wakes up waiting threads. This is a powerful pattern for managing shared queues and data pipelines where threads need to wait for specific conditions to be met before proceeding, ensuring efficient thread communication and synchronization.

 

Example 5: Deadlock scenario (and how to avoid)

import threading
import time

lockA = threading.Lock()
lockB = threading.Lock()

def task1():
    """Acquires LockA then LockB."""
    print("Task 1: Attempting to acquire LockA...")
    with lockA:
        print("Task 1: Acquired LockA. Attempting to acquire LockB...")
        time.sleep(0.1) # Simulate work before acquiring next lock
        with lockB:
            print("Task 1: Acquired LockB. Doing work...")
            time.sleep(0.5)
            print("Task 1: Released LockB and LockA.")

def task2():
    """Acquires LockB then LockA (potential for deadlock)."""
    print("Task 2: Attempting to acquire LockB...")
    with lockB: # This order is different from task1
        print("Task 2: Acquired LockB. Attempting to acquire LockA...")
        time.sleep(0.1)
        with lockA:
            print("Task 2: Acquired LockA. Doing work...")
            time.sleep(0.5)
            print("Task 2: Released LockA and LockB.")

print("--- Demonstrating Potential Deadlock ---")
thread1 = threading.Thread(target=task1, name="Thread-Task1")
thread2 = threading.Thread(target=task2, name="Thread-Task2")

thread1.start()
thread2.start()

thread1.join()
thread2.join()
print("All tasks theoretically completed (may deadlock).")

# How to avoid deadlock: Always acquire locks in a consistent order!
print("\n--- Avoiding Deadlock (Consistent Lock Order) ---")

def task1_no_deadlock():
    print("Task 1 (No Deadlock): Attempting to acquire LockA...")
    with lockA:
        print("Task 1 (No Deadlock): Acquired LockA. Attempting to acquire LockB...")
        with lockB: # Always acquire A then B
            print("Task 1 (No Deadlock): Acquired LockB. Doing work...")
            time.sleep(0.5)
            print("Task 1 (No Deadlock): Released LockB and LockA.")

def task2_no_deadlock():
    print("Task 2 (No Deadlock): Attempting to acquire LockA...")
    with lockA: # Always acquire A then B
        print("Task 2 (No Deadlock): Acquired LockA. Attempting to acquire LockB...")
        with lockB:
            print("Task 2 (No Deadlock): Acquired LockB. Doing work...")
            time.sleep(0.5)
            print("Task 2 (No Deadlock): Released LockA and LockB.")

thread1_safe = threading.Thread(target=task1_no_deadlock, name="Thread-SafeTask1")
thread2_safe = threading.Thread(target=task2_no_deadlock, name="Thread-SafeTask2")

thread1_safe.start()
thread2_safe.start()

thread1_safe.join()
thread2_safe.join()
print("All tasks completed (deadlock avoided).")

Explanation: This advanced example deliberately demonstrates a deadlock scenario in multi-threaded programming and then shows how to avoid it. In the task1 and task2 functions, two threads attempt to acquire two different locks (lockA and lockB) in different orders. If thread1 acquires lockA and thread2 acquires lockB at roughly the same time, both will then try to acquire the other's already held lock, resulting in a deadlock where neither thread can proceed. The "How to avoid deadlock" section then shows the solution: always acquire locks in a consistent, predetermined order across all threads. This ensures that no circular dependency for resources can arise. Understanding and preventing deadlocks is a crucial aspect of robust concurrent system design in Python.