Let's dive deep into two powerful and often intertwined topics: Iterators and Generators. These concepts are fundamental for writing efficient, clean, and Pythonic code, especially when dealing with large datasets or infinite sequences.
Iterators and Generators: Unlocking Python's Efficiency
Iterators and generators are essential tools in Python for processing sequences of data. While they both provide ways to iterate over data, they do so with distinct mechanisms and benefits. Understanding their differences and how to use them effectively will significantly enhance your Python programming skills.
__iter__ and __next__ Methods: The Foundation of Iteration
At its core, iteration in Python is powered by the iterator protocol. This protocol involves two special methods: __iter__ and __next__. Any object that implements these two methods is an iterator and can be used in a for loop.
The __iter__ method is called whenever an iterator is requested, for example at the start of a for loop. For an iterator, it should return the iterator object itself.
The __next__ method is responsible for returning the next item in the sequence. When there are no more items, it should raise the StopIteration exception, signaling the end of the iteration.
Understanding these methods is crucial for building custom iterable objects, giving you precise control over how your data is traversed.
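To make the protocol concrete, here is a minimal sketch of roughly what a for loop does behind the scenes (obj and item are just placeholder names):
Python
# A rough sketch of what "for item in obj:" does under the hood.
obj = [1, 2, 3]  # any iterable works here
iterator = iter(obj)           # calls obj.__iter__()
while True:
    try:
        item = next(iterator)  # calls iterator.__next__()
    except StopIteration:      # raised when the items run out
        break
    print(item)                # the body of the for loop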
Note: Python iteration, custom iterators, for loop mechanism, StopIteration exception.
Here are five code examples demonstrating the __iter__ and __next__ methods, ranging from beginner-friendly to more advanced:
Example 1: Basic Custom Counter Iterator (Beginner)
Python
class MyCounter:
    """
    A simple custom iterator that counts from a starting number up to an ending number.
    This demonstrates the basic __iter__ and __next__ methods for fundamental iteration.
    """
    def __init__(self, start, end):
        self.current = start
        self.end = end

    def __iter__(self):
        # The __iter__ method returns the iterator object itself.
        print("Inside __iter__ method: Initializing iterator.")
        return self

    def __next__(self):
        # The __next__ method returns the next item in the sequence.
        if self.current < self.end:
            num = self.current
            self.current += 1
            return num
        else:
            # When no more items, StopIteration is raised to signal the end.
            print("Inside __next__ method: No more items, raising StopIteration.")
            raise StopIteration

# --- Explanation ---
# This class creates an iterable counter. When you iterate over an instance of MyCounter,
# it will sequentially return numbers from 'start' up to (but not including) 'end'.
# It clearly shows how __iter__ returns 'self' and __next__ provides the next value
# or raises StopIteration.

# --- Usage Example ---
print("--- Example 1 Usage ---")
my_iterator = MyCounter(1, 5)
for number in my_iterator:
    print(f"Count: {number}")

# You can also manually call next() (though not common for regular use)
# print("\nManually calling next():")
# manual_counter = MyCounter(1, 3)
# print(next(manual_counter))
# print(next(manual_counter))
# print(next(manual_counter))  # This will raise StopIteration
Example 2: Iterating Over a Custom List-like Object (Intermediate)
Python
class MyCustomList:
    """
    A custom list-like class that allows iteration over its internal data.
    This demonstrates how to make a composite object iterable.
    """
    def __init__(self, data):
        self._data = list(data)  # Store data internally as a list
        self._index = 0

    def __iter__(self):
        # Reset index for new iteration, allowing multiple iterations.
        self._index = 0
        return self

    def __next__(self):
        if self._index < len(self._data):
            item = self._data[self._index]
            self._index += 1
            return item
        else:
            raise StopIteration

# --- Explanation ---
# This example shows how to make a custom container class iterable.
# The __iter__ method resets the internal index, ensuring that each new
# iteration starts from the beginning of the data. __next__ retrieves
# elements from the internal list.

# --- Usage Example ---
print("\n--- Example 2 Usage ---")
my_list = MyCustomList(['apple', 'banana', 'cherry'])
print("First iteration:")
for fruit in my_list:
    print(f"Fruit: {fruit}")

print("\nSecond iteration (re-starts from beginning):")
for fruit in my_list:
    print(f"Another Fruit: {fruit}")
Example 3: Infinite Sequence Iterator (Advanced Beginner/Intermediate)
Python
class InfiniteSequence:
    """
    An iterator that generates an infinite sequence of numbers starting from a given value.
    Demonstrates an iterator that never raises StopIteration on its own.
    """
    def __init__(self, start=0):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        num = self.current
        self.current += 1
        return num

# --- Explanation ---
# This class creates an iterator for an infinite sequence. It will continuously
# yield the next number. You typically need a condition to break out of a loop
# when using such an iterator, or it will run forever.

# --- Usage Example ---
print("\n--- Example 3 Usage ---")
infinity_counter = InfiniteSequence(10)
print("First 5 numbers from infinite sequence:")
for _ in range(5):  # Take only the first 5 for demonstration
    print(next(infinity_counter))

# Another way to use, with a break condition:
print("\nNumbers from 20 to 23 from infinite sequence:")
infinity_from_20 = InfiniteSequence(20)
for num in infinity_from_20:
    print(num)
    if num >= 23:  # Break condition to stop the infinite loop
        break
Example 4: Iterator for Fibonacci Sequence (Intermediate)
Python
class FibonacciIterator:
    """
    An iterator that generates the Fibonacci sequence up to a specified limit.
    """
    def __init__(self, max_limit):
        self.max_limit = max_limit
        self.a = 0
        self.b = 1

    def __iter__(self):
        # Reset for a new iteration
        self.a = 0
        self.b = 1
        return self

    def __next__(self):
        if self.a > self.max_limit:
            raise StopIteration
        current_fib = self.a
        self.a, self.b = self.b, self.a + self.b  # Advance to the next Fibonacci pair
        return current_fib

# --- Explanation ---
# This iterator calculates Fibonacci numbers on the fly. It keeps track of the
# previous two numbers to generate the next one. The iteration stops once the
# current Fibonacci number exceeds the `max_limit`.

# --- Usage Example ---
print("\n--- Example 4 Usage ---")
fib_seq = FibonacciIterator(50)
print("Fibonacci sequence up to 50:")
for num in fib_seq:
    print(num)

print("\nAnother Fibonacci sequence up to 100:")
fib_seq_100 = FibonacciIterator(100)
for num in fib_seq_100:
    print(num)
Example 5: File Line Reader Iterator (Advanced)
Python
class FileLineReader:
    """
    An iterator that reads lines from a file one by one, providing memory efficiency
    for large files by not loading the entire file into memory at once.
    """
    def __init__(self, filepath):
        self.filepath = filepath
        self.file_obj = None

    def __iter__(self):
        # Open the file for reading. If it's already open, close and reopen.
        if self.file_obj:
            self.file_obj.close()
        try:
            self.file_obj = open(self.filepath, 'r')
            return self
        except FileNotFoundError:
            print(f"Error: File not found at {self.filepath}")
            raise  # Re-raise the exception after printing

    def __next__(self):
        if self.file_obj is None:
            # If __iter__ failed (e.g., file not found)
            raise StopIteration
        line = self.file_obj.readline()
        if line:
            return line.strip()  # Remove newline characters
        else:
            # End of file, close the file object and raise StopIteration
            self.file_obj.close()
            self.file_obj = None  # Mark as closed
            raise StopIteration

# --- Explanation ---
# This iterator demonstrates a practical use case: reading large files line by line.
# It opens the file in __iter__ and reads one line at a time in __next__.
# This prevents loading the entire file into memory, which is crucial for performance
# and memory usage when dealing with very large files. It also handles file closing.

# --- Usage Example ---
print("\n--- Example 5 Usage ---")
# Create a dummy file for demonstration
with open("my_large_file.txt", "w") as f:
    f.write("Line 1: Hello Python!\n")
    f.write("Line 2: Iterators are cool.\n")
    f.write("Line 3: Generators are even cooler!\n")
    f.write("Line 4: End of file.\n")

file_reader = FileLineReader("my_large_file.txt")
print("Reading lines from my_large_file.txt:")
try:
    for line in file_reader:
        print(f"Read: {line}")
except FileNotFoundError:
    print("Could not read file.")

# Clean up the dummy file
import os
if os.path.exists("my_large_file.txt"):
    os.remove("my_large_file.txt")
The yield Keyword: The Power of Generators
While __iter__ and __next__ give you fine-grained control, Python offers a more concise and often more elegant way to create iterators: generator functions using the yield keyword. A function becomes a generator function if it contains at least one yield statement.
When a generator function is called, it doesn't execute its body immediately. Instead, it returns a special object called a generator iterator. The execution of the generator function is paused at each yield statement, and the yielded value is returned to the caller. The state of the function is saved, and execution resumes from where it left off the next time next() is called on the generator iterator.
This "lazy evaluation" makes generators incredibly memory efficient, as they only produce values on demand.
Note: Python generators, yield statement, generator functions, lazy evaluation, memory efficiency.
Here are five code examples illustrating the yield keyword, ranging from beginner-friendly to more advanced:
Example 1: Simple Counter Generator (Beginner)
Python
def simple_counter_generator(start, end):
    """
    A basic generator function that yields numbers from start up to (but not including) end.
    This is the simplest form of a generator, showing the power of 'yield'.
    """
    print("Generator started!")
    current = start
    while current < end:
        yield current  # Pause execution and return 'current', then resume next time
        current += 1
    print("Generator finished!")

# --- Explanation ---
# This function is a generator because it uses 'yield'. When called, it returns
# a generator object. Each time 'next()' is called on the generator object,
# the function executes until the next 'yield', returning the value.

# --- Usage Example ---
print("--- Example 1 Usage ---")
counter_gen = simple_counter_generator(1, 5)
print(f"Type of counter_gen: {type(counter_gen)}")

# Iterate using a for loop
print("Iterating with for loop:")
for num in counter_gen:
    print(f"Counted: {num}")

# You can also manually call next()
# new_counter_gen = simple_counter_generator(10, 13)
# print(next(new_counter_gen))
# print(next(new_counter_gen))
# print(next(new_counter_gen))
# print(next(new_counter_gen))  # This will raise StopIteration
Example 2: Fibonacci Sequence Generator (Intermediate)
Python
def fibonacci_generator(max_limit):
    """
    A generator function to produce the Fibonacci sequence up to a given limit.
    This demonstrates maintaining state across 'yield' calls for complex sequences.
    """
    a, b = 0, 1
    while a <= max_limit:
        yield a
        a, b = b, a + b

# --- Explanation ---
# This generator efficiently produces Fibonacci numbers. The state (a and b)
# is preserved between successive calls to the generator's `next()` method,
# making it perfect for generating sequences on demand.

# --- Usage Example ---
print("\n--- Example 2 Usage ---")
fib_gen = fibonacci_generator(50)
print("Fibonacci sequence up to 50 (generated):")
for num in fib_gen:
    print(num)

print("\nAnother Fibonacci sequence up to 100:")
for num in fibonacci_generator(100):  # Create a new generator object
    print(num)
Example 3: Infinite Even Number Generator (Advanced Beginner/Intermediate)
Python
def infinite_even_numbers(start=0):
    """
    A generator that yields an infinite sequence of even numbers.
    Useful for demonstrating generators that don't naturally terminate.
    """
    current = start
    if current % 2 != 0:  # Ensure we start with an even number
        current += 1
    while True:  # Infinite loop, relies on external break
        yield current
        current += 2

# --- Explanation ---
# This generator will run indefinitely, yielding even numbers.
# Like infinite iterators, you need a way to stop iterating externally,
# typically with a `break` statement or by taking a limited number of items.

# --- Usage Example ---
print("\n--- Example 3 Usage ---")
even_gen = infinite_even_numbers(10)
print("First 7 even numbers from 10 onwards:")
for _ in range(7):
    print(next(even_gen))

print("\nEven numbers from 25 to 30:")
for num in infinite_even_numbers(25):
    print(num)
    if num >= 30:
        break
Example 4: Reading Large Files with Generator (Advanced)
Python
def read_large_file_generator(filepath):
    """
    A generator function to read lines from a large file, yielding one line at a time.
    Excellent for memory-efficient processing of big files.
    """
    try:
        with open(filepath, 'r') as f:
            for line in f:
                yield line.strip()  # Yield each line, removing surrounding whitespace
    except FileNotFoundError:
        print(f"Error: File not found at {filepath}")
        # No need to raise StopIteration explicitly; the generator simply finishes.

# --- Explanation ---
# This is a classic use case for generators. Instead of loading the entire file
# into memory, it reads and yields lines one by one. This is highly memory-efficient
# for processing files that are too large to fit into RAM. The 'with' statement
# ensures the file is properly closed.

# --- Usage Example ---
print("\n--- Example 4 Usage ---")
# Create a dummy file for demonstration
with open("another_large_file.txt", "w") as f:
    for i in range(1, 10001):
        f.write(f"This is line {i} of a large file.\n")

print("Processing large file with generator:")
line_count = 0
for line in read_large_file_generator("another_large_file.txt"):
    # print(f"Processed: {line}")  # Uncomment to see lines being processed
    line_count += 1
    if line_count % 1000 == 0:
        print(f"Processed {line_count} lines...")
print(f"Finished processing. Total lines: {line_count}")

# Clean up the dummy file
import os
if os.path.exists("another_large_file.txt"):
    os.remove("another_large_file.txt")
Example 5: Pipeline of Generators (Advanced)
Python
def lines_from_file(filepath):
    """Yields lines from a file."""
    with open(filepath, 'r') as f:
        for line in f:
            yield line.strip()

def filter_short_lines(lines, min_length=10):
    """Yields lines that meet a minimum length."""
    for line in lines:
        if len(line) >= min_length:
            yield line

def capitalize_lines(lines):
    """Yields capitalized versions of lines."""
    for line in lines:
        yield line.upper()

# --- Explanation ---
# This example showcases a powerful pattern: chaining multiple generators together
# to form a processing pipeline. Each generator takes the output of the previous
# one as its input. This is extremely memory-efficient and highly readable,
# allowing for complex data transformations without creating intermediate lists.

# --- Usage Example ---
print("\n--- Example 5 Usage ---")
# Create a dummy file
with open("data_pipeline.txt", "w") as f:
    f.write("short\n")
    f.write("this is a medium line.\n")
    f.write("A very long line that should definitely pass the filter.\n")
    f.write("tiny\n")
    f.write("another medium sized line.\n")

print("Processing data with a generator pipeline:")
# Create the generator pipeline
processed_data = capitalize_lines(filter_short_lines(lines_from_file("data_pipeline.txt"), min_length=15))
for data_item in processed_data:
    print(f"Pipeline output: {data_item}")

# Clean up the dummy file
import os
if os.path.exists("data_pipeline.txt"):
    os.remove("data_pipeline.txt")
Generator Expressions: Concise Generators
Just as list comprehensions provide a concise way to create lists, generator expressions (also known as generator comprehensions) offer a compact syntax for creating generators. They look very similar to list comprehensions but use parentheses () instead of square brackets [].
Generator expressions are perfect for situations where you need a one-time, memory-efficient iteration over a sequence and don't need the full power of a generator function. They are evaluated lazily, producing values on the fly, just like generator functions.
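As a minimal sketch of the syntax and behavior difference (variable names are illustrative):
Python
squares_list = [x * x for x in range(5)]  # list comprehension: built eagerly, all values in memory
squares_gen = (x * x for x in range(5))   # generator expression: built lazily, values on demand
print(squares_list)       # [0, 1, 4, 9, 16]
print(next(squares_gen))  # 0 -- produced only when requested
print(list(squares_gen))  # [1, 4, 9, 16] -- the remaining values; the generator is now exhausted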
Note: Python generator expressions, generator comprehensions, concise generators, lazy evaluation, memory-efficient.
Here are five code examples demonstrating generator expressions, ranging from beginner-friendly to more advanced:
Example 1: Basic Number Generator Expression (Beginner)
Python
# --- Example 1: Basic Number Generator Expression (Beginner) ---
print("--- Example 1 Usage ---")
# Create a generator expression to yield squares of numbers from 0 to 4
squares_gen_exp = (x * x for x in range(5))

# --- Explanation ---
# This is the simplest form of a generator expression. It looks like a list comprehension
# but uses parentheses, making it a generator. It doesn't compute all squares at once,
# but rather generates them one by one as requested.

# --- Usage Example ---
print(f"Type of squares_gen_exp: {type(squares_gen_exp)}")
print("Iterating over squares using a for loop:")
for sq in squares_gen_exp:
    print(sq)

# Generator expressions are exhausted after one iteration
# print("\nTrying to iterate again (will produce nothing as it's exhausted):")
# for sq in squares_gen_exp:
#     print(sq)  # This will not print anything
Example 2: Filtering with Generator Expressions (Intermediate)
Python
# --- Example 2: Filtering with Generator Expressions (Intermediate) ---
print("\n--- Example 2 Usage ---")
data = [1, 5, 8, 12, 15, 20, 23, 25, 30]
# Generator expression to yield only even numbers from the data list
even_numbers_gen = (num for num in data if num % 2 == 0)

# --- Explanation ---
# Generator expressions can include conditional logic (an `if` clause),
# allowing you to filter items efficiently without creating an intermediate list.
# This makes them very powerful for processing large datasets selectively.

# --- Usage Example ---
print("Even numbers generated from data:")
for even_num in even_numbers_gen:
    print(even_num)

# Example with a direct use (e.g., sum)
sum_of_filtered_evens = sum(num for num in data if num % 2 == 0 and num > 10)
print(f"Sum of even numbers greater than 10: {sum_of_filtered_evens}")
Example 3: Transforming Data with Generator Expressions (Intermediate)
Python
# --- Example 3: Transforming Data with Generator Expressions (Intermediate) ---
print("\n--- Example 3 Usage ---")
words = ["hello", "world", "python", "generators", "efficiency"]
# Generator expression to yield capitalized versions of words
capitalized_words_gen = (word.upper() for word in words)

# --- Explanation ---
# Generator expressions are excellent for transforming data lazily. Here, each word
# is capitalized only when it's requested, rather than creating a new list of all
# capitalized words upfront.

# --- Usage Example ---
print("Capitalized words generated:")
for cap_word in capitalized_words_gen:
    print(cap_word)

# Combine transformation and filtering
filtered_and_transformed = (word.strip().title() for word in [" apple ", " banana ", " CHERRY "] if len(word.strip()) > 5)
print("\nFiltered and Title-cased fruits:")
for fruit in filtered_and_transformed:
    print(fruit)
Example 4: Using Generator Expressions with Functions (Advanced Beginner)
Python
# --- Example 4: Using Generator Expressions with Functions (Advanced Beginner) ---
print("\n--- Example 4 Usage ---")
def process_numbers(numbers_iterable):
    """Processes an iterable of numbers, multiplying them by 10."""
    for num in numbers_iterable:
        print(f"Processed: {num * 10}")

# Create a generator expression and pass it to a function that expects an iterable
gen_exp_for_func = (x for x in range(3, 7))
print("Calling process_numbers with a generator expression:")
process_numbers(gen_exp_for_func)

# --- Explanation ---
# Generator expressions are often used as arguments to functions that expect iterables.
# When a generator expression is the *only* argument to a function call, the outer
# parentheses for the generator expression itself can be omitted, making the syntax
# even more concise, as the sum() call below shows.

# Example where omitting parentheses is possible:
print("\nAnother example with direct passing to sum():")
total_sum = sum(i for i in range(1, 101))  # Sums numbers from 1 to 100
print(f"Sum of 1 to 100 (using generator expression): {total_sum}")
Example 5: Nested Generator Expressions (Advanced)
Python
# --- Example 5: Nested Generator Expressions (Advanced) ---
print("\n--- Example 5 Usage ---")
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Nested generator expression to flatten a matrix and yield only odd numbers
flat_odd_numbers_gen = (num for row in matrix for num in row if num % 2 != 0)

# --- Explanation ---
# Similar to nested list comprehensions, generator expressions can be nested.
# This allows for complex transformations and filtering of multi-dimensional
# data structures in a lazy and memory-efficient manner. Here, it flattens
# the matrix and only yields odd numbers, one at a time.

# --- Usage Example ---
print("Flat odd numbers from matrix:")
for odd_num in flat_odd_numbers_gen:
    print(odd_num)

# Another example: Cartesian product with a filter
colors = ["red", "green", "blue"]
sizes = ["small", "medium", "large"]
color_size_pairs = ((color, size) for color in colors for size in sizes if len(color) + len(size) > 10)
print("\nColor-size pairs with combined length > 10:")
for pair in color_size_pairs:
    print(pair)
Advantages of Generators (Memory Efficiency)
Generators offer significant advantages, especially concerning memory efficiency. This makes them indispensable when working with large datasets or potentially infinite sequences where loading all data into memory at once is either impossible or impractical.
Here's why generators excel in memory management:
Lazy Evaluation: Generators compute and yield values one at a time, only when requested. Unlike lists or other data structures that store all elements in memory simultaneously, generators keep only the state required to generate the next value. This is the cornerstone of their memory efficiency.
On-Demand Processing: When you iterate over a generator, it pauses execution after each yield and resumes exactly from that point when the next value is needed. This "pause-and-resume" mechanism means that only one item exists in memory at any given time (the one currently being processed), regardless of how many items the generator will eventually produce.
Handling Infinite Sequences: Since generators don't need to store all elements, they are perfectly suited for representing infinite sequences (like all natural numbers or all even numbers). You can generate values indefinitely without ever running out of memory.
Reduced Memory Footprint: For large datasets, generators dramatically reduce memory consumption. Instead of allocating memory for a massive list, you only need enough memory for the generator object itself and the single value it's currently producing. This can prevent MemoryError exceptions and allow your programs to handle data sizes that would otherwise be impossible.
Improved Performance for Large Data: While not always faster for small datasets (due to the overhead of suspending and resuming the function at each yield), for very large datasets the memory efficiency of generators often translates to better overall performance. Less memory usage means fewer cache misses and potentially faster data processing.
Note: Python memory optimization, large data processing, infinite sequences, lazy loading, resource management.
Here are five code examples demonstrating the memory efficiency advantages of generators:
Example 1: Generating a Large Range of Numbers (Beginner)
Python
import sys

# --- Example 1: Generating a Large Range of Numbers (Beginner) ---
print("--- Example 1 Usage ---")
# Using a list comprehension (loads all into memory)
list_of_numbers = [i for i in range(1_000_000)]
print(f"Size of list_of_numbers: {sys.getsizeof(list_of_numbers)} bytes")

# Using a generator expression (memory efficient)
generator_of_numbers = (i for i in range(1_000_000))
print(f"Size of generator_of_numbers: {sys.getsizeof(generator_of_numbers)} bytes")

# --- Explanation ---
# This example clearly shows the memory difference. The list comprehension
# creates a list with a million integers, taking up significant memory.
# The generator expression, however, creates a generator object whose size
# is constant and very small, regardless of the range it represents. It only
# generates numbers as they are requested.

# --- Usage Example ---
# You can iterate over both, but the generator saves memory.
# for num in generator_of_numbers:
#     pass  # Process numbers without holding all in memory
Example 2: Processing Large File Data (Intermediate)
Python
import sys
import os

# Create a large dummy file for demonstration
file_size_mb = 50
num_lines = file_size_mb * 1024 * 1024 // 70  # each line written below is roughly 70 characters
with open("huge_data.txt", "w") as f:
    for i in range(num_lines):
        f.write(f"This is a line of text for the large file example, line number {i}.\n")

# --- Example 2: Processing Large File Data (Intermediate) ---
print("\n--- Example 2 Usage ---")
def read_file_into_list(filepath):
    """Reads an entire file into a list of lines."""
    print(f"Loading '{filepath}' into memory as a list...")
    with open(filepath, 'r') as f:
        return f.readlines()

def read_file_with_generator(filepath):
    """Yields lines from a file using a generator."""
    print(f"Reading '{filepath}' using a generator...")
    with open(filepath, 'r') as f:
        for line in f:
            yield line.strip()

# Attempt to load the whole file (may consume a lot of RAM)
try:
    # list_lines = read_file_into_list("huge_data.txt")
    # print(f"Size of list_lines: {sys.getsizeof(list_lines) / (1024*1024):.2f} MB")
    # print(f"First 5 lines (list): {[line[:30] for line in list_lines[:5]]}")
    # print(f"Last 5 lines (list): {[line[:30] for line in list_lines[-5:]]}")
    # del list_lines  # Free memory if you uncommented this part
    print("List approach commented out to avoid potential memory issues on small RAM systems.")
    print("Uncomment to see the memory footprint of loading the whole file.")
except MemoryError:
    print("MemoryError: Could not load the entire file into memory as a list.")

# Process the file using the generator (memory efficient)
generator_lines = read_file_with_generator("huge_data.txt")
print(f"Size of generator_lines object: {sys.getsizeof(generator_lines)} bytes")
count = 0
for line in generator_lines:
    count += 1
    if count % 100000 == 0:
        print(f"Processed {count} lines so far (generator). Current line length: {len(line)}")
print(f"Finished processing {count} lines from file using generator.")

# Clean up the dummy file
os.remove("huge_data.txt")

# --- Explanation ---
# This example vividly demonstrates memory efficiency. Loading a large file
# entirely into a list (`readlines()`) consumes a huge amount of RAM, potentially
# leading to a `MemoryError`. The generator approach, however, processes the
# file line by line, keeping only one line in memory at any given time, thus
# having a constant and minimal memory footprint, regardless of file size.
Example 3: Infinite Data Stream Simulation (Advanced)
Python
import sys
import itertools  # Useful for working with infinite iterators

# --- Example 3: Infinite Data Stream Simulation (Advanced) ---
print("\n--- Example 3 Usage ---")
def simulated_data_stream():
    """Generates an infinite stream of simulated sensor readings."""
    sensor_id = 1
    value = 100.0
    while True:
        yield {'sensor_id': sensor_id, 'timestamp': '2025-06-13 17:30:00', 'value': value}
        sensor_id += 1
        value += 0.1  # Simulate some change in value

# Create a generator for the infinite stream
stream_gen = simulated_data_stream()
print(f"Size of infinite stream generator: {sys.getsizeof(stream_gen)} bytes")

# --- Explanation ---
# Generators are perfect for representing infinite data streams where you can't
# possibly store all data. This example simulates a continuous stream of sensor
# readings. The generator `stream_gen` maintains a tiny memory footprint, as it
# only generates a new reading when `next()` is called.

# --- Usage Example ---
print("Fetching first 5 readings from infinite stream:")
for i in range(5):
    print(next(stream_gen))

print("\nFetching next 3 readings:")
# Use itertools.islice for more controlled access to infinite generators
for reading in itertools.islice(stream_gen, 3):
    print(reading)
# You could imagine this feeding into a real-time analytics system without memory issues.
Example 4: Chaining Generators for Memory-Efficient Pipelines (Advanced)
Python
import sys

# --- Example 4: Chaining Generators for Memory-Efficient Pipelines (Advanced) ---
print("\n--- Example 4 Usage ---")
def generate_numbers(max_num):
    """Yields numbers up to max_num."""
    for i in range(max_num):
        yield i

def filter_evens(numbers):
    """Yields only even numbers from an iterable."""
    for num in numbers:
        if num % 2 == 0:
            yield num

def square_numbers(numbers):
    """Yields the square of each number from an iterable."""
    for num in numbers:
        yield num * num

# Create a pipeline of chained generators
# Numbers from 0 to 999,999 -> filter evens -> square them
pipeline = square_numbers(filter_evens(generate_numbers(1_000_000)))

# --- Explanation ---
# This demonstrates the power of chaining generators. Each step in the pipeline
# (`generate_numbers`, `filter_evens`, `square_numbers`) is a generator. Data
# flows through the pipeline one item at a time. No intermediate lists are created,
# meaning the memory footprint remains minimal even for very large sequences.

# --- Usage Example ---
print(f"Size of the final pipeline generator object: {sys.getsizeof(pipeline)} bytes")
processed_count = 0
sum_of_squares = 0
for sq in pipeline:
    sum_of_squares += sq
    processed_count += 1
    if processed_count % 100_000 == 0:
        print(f"Processed {processed_count} squared evens. Current sum: {sum_of_squares}")
print(f"Finished processing. Total squared evens: {processed_count}. Final sum: {sum_of_squares}")

# Compare to a list-based approach, which would materialize two large intermediate lists in memory:
# list_pipeline = [x*x for x in [num for num in range(1_000_000) if num % 2 == 0]]
# print(f"Size of list_pipeline: {sys.getsizeof(list_pipeline) / (1024*1024):.2f} MB")
Example 5: Dynamic Data Generation with Minimal Memory (Advanced)
Python
import sys
import random

# --- Example 5: Dynamic Data Generation with Minimal Memory (Advanced) ---
print("\n--- Example 5 Usage ---")
def generate_random_data(count, max_value):
    """Generates 'count' random numbers dynamically."""
    for _ in range(count):
        yield random.randint(1, max_value)

def analyze_data_stream(data_stream):
    """Analyzes a stream of data for max, min, and average."""
    total = 0
    count = 0
    max_val = float('-inf')
    min_val = float('inf')
    for value in data_stream:
        total += value
        count += 1
        max_val = max(max_val, value)
        min_val = min(min_val, value)
    if count == 0:
        return {'total': 0, 'count': 0, 'max': None, 'min': None, 'average': None}
    return {'total': total, 'count': count, 'max': max_val, 'min': min_val, 'average': total / count}

# Generate 10 million random numbers and analyze them without storing all in memory
num_to_generate = 10_000_000
max_random_val = 1000

# Create the generator for random data
random_data_gen = generate_random_data(num_to_generate, max_random_val)
print(f"Size of random_data_gen object: {sys.getsizeof(random_data_gen)} bytes")
print(f"Analyzing {num_to_generate} random numbers...")
analysis_results = analyze_data_stream(random_data_gen)

# --- Explanation ---
# This example shows how generators are used to process very large amounts of
# dynamically generated data. The `generate_random_data` function doesn't create
# a list of 10 million random numbers. Instead, it yields them one by one.
# The `analyze_data_stream` function then processes these numbers as they are
# generated, keeping only a few aggregate statistics in memory, leading to
# extremely low memory usage for processing a massive virtual dataset.

# --- Usage Example ---
print("\nAnalysis Results:")
print(f"Total numbers processed: {analysis_results['count']}")
print(f"Sum of numbers: {analysis_results['total']}")
print(f"Maximum number: {analysis_results['max']}")
print(f"Minimum number: {analysis_results['min']}")
print(f"Average number: {analysis_results['average']:.2f}")

# Building a list of 10 million random numbers instead would consume hundreds of MB of RAM:
# large_list = [random.randint(1, max_random_val) for _ in range(num_to_generate)]
# print(f"Size of large_list: {sys.getsizeof(large_list) / (1024*1024):.2f} MB")