Generators in Python provide an efficient way to produce numbers or other objects as and when they are needed, without having to store all the values in memory beforehand.
Introduction
You can think of generators as a simple way of creating iterators without having to write a class with __iter__() and __next__() methods.
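For comparison, here is a minimal sketch of the class-based iterator that a generator saves you from writing. The class name SquaresIterator is made up for illustration; it simply implements the two methods mentioned above.

```python
class SquaresIterator:
    """Hand-written iterator over the squares of `numbers` (illustrative name)."""

    def __init__(self, numbers):
        # Keep an iterator over the input so we can pull one value at a time.
        self._it = iter(numbers)

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # next() raises StopIteration once the input is exhausted,
        # which also ends this iterator.
        value = next(self._it)
        return value * value


for sq in SquaresIterator([1, 2, 3, 4]):
    print(sq)
#> 1
#> 4
#> 9
#> 16
```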
So how do you create a generator?
There are multiple ways, but the most common is to define a function with a yield statement instead of a return statement. Calling the function returns a generator, which you can iterate over with a for-loop.
# Define a Generator function: squares.
def squares(numbers):
    for i in numbers:
        yield i*i
Create the generator and iterate.
# Create generator and iterate
sq_gen = squares([1,2,3,4])
for i in sq_gen:
    print(i)
#> 1
#> 4
#> 9
#> 16
Generator Basics: The advantage of using Generators
Now let’s get into the details of a generator. But first let’s understand some basics.
Consider the following two approaches of printing the squares of values from 0 to 4:
Approach 1: Using list
# Approach 1: Using list
L = [0, 1, 2, 3, 4]
for i in L:
    print(i*i)
#> 0
#> 1
#> 4
#> 9
#> 16
Approach 2: Using range
# Approach 2: Using range
for i in range(5):
    print(i*i)
#> 0
#> 1
#> 4
#> 9
#> 16
The first approach uses a list, whereas the second uses range, which produces its values lazily, much like a generator. (Strictly speaking, range is a lazy sequence type rather than a generator, but it illustrates the same memory advantage.) Though the output is the same from both approaches, the difference shows up when the number of objects you want to iterate over grows very large.
The list object occupies actual space in memory. As the size of the list increases, say you want to iterate up to 5000, the required memory grows proportionately.
However, that is not the case with range. No matter the number of iterations, the size of the range object itself does not change. That’s something!
# Check size of List vs Generator.
import sys
print(sys.getsizeof(L))
print(sys.getsizeof(range(6)))
#> 120
#> 48
However, since range computes its values on demand, its memory requirement for iterating over 5000 numbers does not increase. The values are generated only when needed and are not actually stored.
# check size of a larger range
print(sys.getsizeof(range(5000)))
#> 48
That’s still the same number of bytes as range(6).
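To see the same effect with a real generator, here is a rough sketch that compares the reported sizes of a list, a range and a generator as the number of items grows. The exact byte counts depend on the Python version and platform, so no expected output is shown. Note also that sys.getsizeof reports only the container itself, not the integer objects a list refers to.

```python
import sys


def squares(numbers):
    # Generator function: yields one square at a time.
    for i in numbers:
        yield i * i


for n in (10, 1_000, 100_000):
    as_list = list(range(n))     # stores n references
    as_range = range(n)          # stores only start, stop and step
    as_gen = squares(range(n))   # stores only the paused function state
    print(n, sys.getsizeof(as_list), sys.getsizeof(as_range), sys.getsizeof(as_gen))
```

Only the list column should grow with n; the other two stay flat.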
Now, that’s the advantage of using generators.
The good part is, Python allows you to create your own generator as per your custom logic. There are multiple ways to do it though. Let’s see some examples.
Approach 1. Using the yield keyword
We have already seen this. Let’s implement the same logic of squaring numbers again, using a function with the yield keyword.
- Define the generator function
def squares(numbers):
    for i in numbers:
        yield i*i
- Create the generator object
nums_gen = squares([1,2,3,4])
nums_gen
#> <generator object squares at 0x...>
Notice that it has only created a generator object, not the values we want. Yet. To actually generate the values, you need to iterate over it and pull them out.
print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))
#> 1
#> 4
#> 9
#> 16
What does yield do?
The yield statement is what turns the function into a generator function: calling it returns a generator that can be iterated over.
Now, what happens when you use yield?
Two things mainly:
- Because you’ve used the yield statement in the function definition, calling the function returns a generator object, nums_gen, which automatically has a dunder __next__() method, making it an iterator. So, now you can call next(nums_gen).
- Once you call next(nums_gen), it starts executing the logic defined in squares() until it hits the yield keyword. Then it sends out the yielded value and pauses the function in that state without exiting. When next() is called again, the state at which it was last paused is remembered and execution continues from that point onwards. This continues until the generator is exhausted.
The magic in this process is that all the local variables you created within the function’s local namespace are still available in the next iteration, that is, when next is called again explicitly or when iterating in a for-loop.
Had we used return instead, the function would have exited, killing off all the variables in its local namespace.
yield basically makes the function remember its ‘state’. Such a function can be used to generate values as per a custom logic, fundamentally becoming a ‘generator’.
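The pause-and-resume behaviour can be made visible with a small sketch. The function name squares_traced and the events list are made up for illustration; the list simply records what the function body does, in order.

```python
events = []  # records what the generator body does, in order


def squares_traced(numbers):
    events.append("started")
    count = 0  # local variable that survives between next() calls
    for i in numbers:
        count += 1
        events.append(f"yielding #{count}")
        yield i * i
        events.append(f"resumed after #{count}")
    events.append("finished")


gen = squares_traced([1, 2])
print(events)
#> []   (calling the function runs none of its body)

first = next(gen)   # runs up to the first yield, then pauses
print(first, events)
#> 1 ['started', 'yielding #1']

second = next(gen)  # resumes right after the first yield
print(second, events)
#> 4 ['started', 'yielding #1', 'resumed after #1', 'yielding #2']
```

The local variable count keeps its value across the two next() calls, which is the remembered state described above.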
What happens after exhausting all the values?
Once the values have been exhausted, a StopIteration exception gets raised. You need to create the generator again in order to generate the values again.
# Once exhausted it raises StopIteration error
print(next(nums_gen))
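If you call next() by hand, there are two common ways to deal with exhaustion, sketched here: catch StopIteration, or pass a default value as the second argument to next(). A for-loop handles StopIteration for you, so neither is needed there.

```python
def squares(numbers):
    for i in numbers:
        yield i * i


gen = squares([1, 2])
print(next(gen))
#> 1
print(next(gen))
#> 4

# Option 1: catch the exception raised by an exhausted generator.
try:
    next(gen)
except StopIteration:
    print("generator is exhausted")
#> generator is exhausted

# Option 2: give next() a default to return instead of raising.
print(next(gen, None))
#> None
```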
You will need to re-create it and run it again.
nums_gen = squares([1,2,3,4])
This time, let’s iterate with a for-loop.
for i in nums_gen:
    print(i)
#> 1
#> 4
#> 9
#> 16
Good.
Alternatively, you can make an object that can be iterated over again and again without re-creating it. This can be done by writing a class that defines an __iter__() method containing a yield statement.
Approach 2. Create using class as an iterable
# Approach 2: Convert it to a class that implements an `__iter__()` method.
class Iterable(object):
    def __init__(self, numbers):
        self.numbers = numbers

    def __iter__(self):
        n = self.numbers
        for i in range(n):
            yield i*i

iterable = Iterable(4)
for i in iterable:  # iterator created here
    print(i)
#> 0
#> 1
#> 4
#> 9
It’s fully iterated now.
Run again without re-creating the iterable.
for i in iterable:  # iterator again created here
    print(i)
#> 0
#> 1
#> 4
#> 9
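Why does this work a second time? A short sketch of the likely explanation: every for-loop calls iter() on the object, and because __iter__() contains a yield, each call hands back a brand-new generator. A plain generator, by contrast, returns itself from iter(), so there is nothing to restart.

```python
class Iterable:
    def __init__(self, numbers):
        self.numbers = numbers

    def __iter__(self):
        # Each call returns a fresh generator over the same range.
        for i in range(self.numbers):
            yield i * i


def squares(numbers):
    for i in numbers:
        yield i * i


iterable = Iterable(4)
it1 = iter(iterable)
it2 = iter(iterable)
print(it1 is it2)
#> False
print(list(it1), list(it2))
#> [0, 1, 4, 9] [0, 1, 4, 9]

gen = squares(range(4))
print(iter(gen) is gen)
#> True
print(list(gen), list(gen))
#> [0, 1, 4, 9] []
```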
Approach 3. Creating a generator without using yield (generator expression)
gen = (i*i for i in range(5))
gen
#> <generator object <genexpr> at 0x000002372CA82E40>
for i in gen:
    print(i)
#> 0
#> 1
#> 4
#> 9
#> 16
Try again: it cannot be re-used, because it is already exhausted, so nothing is printed.
for i in gen:
    print(i)
This example seems redundant because the same can easily be done using range.
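As a quick sketch of the difference, compare a generator expression with the equivalent list comprehension. The only syntactic change is round brackets instead of square ones, but the list holds all its values and can be read repeatedly, while the generator is consumed once.

```python
gen = (i * i for i in range(5))        # generator expression: lazy, single use
print(list(gen))
#> [0, 1, 4, 9, 16]
print(list(gen))
#> []

squares_list = [i * i for i in range(5)]  # list comprehension: stored, reusable
print(squares_list)
#> [0, 1, 4, 9, 16]
print(squares_list)
#> [0, 1, 4, 9, 16]

# A generator expression can be passed straight to a function that
# consumes an iterable, without building a list first.
print(sum(i * i for i in range(5)))
#> 30
```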
Let’s see another example of reading a text file. Let’s split the sentences into a list of words.
gen = (i.split() for i in open("textfile.txt", "r", encoding="utf8"))
gen
#> <generator object <genexpr> at 0x000002372CA84190>
Iterate over the generator to get the words in each line.
for i in gen:
    print(i)
OUTPUT
#> ['Amid', 'controversy', 'over', '‘motivated’', 'arrest', 'in', 'sand', 'mining', 'case,']
#> ['Punjab', 'Congress', 'chief', 'Navjot', 'Singh', 'Sidhu', 'calls', 'for', '‘honest', 'CM', 'candidate’.']
#> ['Amid', 'the', 'intense', 'campaign', 'for', 'the', 'Assembly', 'election', 'in', 'Punjab,']
#> ['due', 'less', 'than', 'three', 'weeks', 'from', 'now', 'on', 'February', '20,', 'the', 'Enforcement', 'Directorate', '(ED)']
#> ['on', 'Friday', 'arrested', 'Bhupinder', 'Singh', '‘Honey’,', 'Punjab', 'Chief', 'Minister']
#> ['Charanjit', 'Singh', 'Channi’s', 'nephew,', 'in', 'connection', 'with', 'an', 'illegal', 'sand', 'mining', 'case.']
Let’s try that again, but just extract the first 3 words in each line.
gen = (i.split()[:3] for i in open("textfile.txt", "r", encoding="utf8"))
for i in gen:
    print(i)
OUTPUT
#> ['Amid', 'controversy', 'over']
#> ['Punjab', 'Congress', 'chief']
#> ['Amid', 'the', 'intense']
#> ['due', 'less', 'than']
#> ['on', 'Friday', 'arrested']
#> ['Charanjit', 'Singh', 'Channi’s']
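One caveat with the one-liners above is that the file opened inside the generator expression is never closed explicitly. A possible alternative, sketched here, is a generator function that opens the file in a with block, so the file is closed once iteration finishes. The function name first_words and the temporary stand-in file are assumptions made to keep the example self-contained; they are not part of the original example.

```python
import os
import tempfile


def first_words(path, n=3):
    """Yield the first `n` words of each line in the file at `path`."""
    with open(path, "r", encoding="utf8") as f:
        for line in f:
            yield line.split()[:n]


# Write a small stand-in file so the sketch runs on its own.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf8"
) as tmp:
    tmp.write("one two three four\nfive six\n")
    path = tmp.name

result = list(first_words(path))
print(result)
#> [['one', 'two', 'three'], ['five', 'six']]

os.remove(path)  # clean up the stand-in file
```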
Nice. We have covered the main ways of creating and working with generators. Hope the concept of generators is clear now.