
Generators in Python – How to lazily return values only when needed and save memory?

Generators in Python provide an efficient way of producing numbers or objects as and when they are needed, without having to store all the values in memory beforehand.

Introduction

You can think of Generators as a simple way of creating iterators without having to create a class with __iter__() and __next__() methods.

So how to create a Generator?

There are multiple ways, but the most common is to define a function that uses a yield statement instead of a return statement. Calling such a function returns a generator that you can iterate over with a for-loop.

# Define a Generator function: squares.
def squares(numbers):
  for i in numbers:
    yield i*i

Create the generator and iterate.

# Create generator and iterate
sq_gen = squares([1,2,3,4])
for i in sq_gen:
  print(i)

#> 1
#> 4
#> 9
#> 16
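For comparison, here is a rough sketch of the class-based iterator that the generator function saves you from writing. The class name SquaresIterator and the variables class_result and gen_result are made up for this illustration.

```python
class SquaresIterator:
    """Class-based iterator that returns the square of each input number."""

    def __init__(self, numbers):
        self._numbers = iter(numbers)  # underlying iterator over the inputs

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # next() raises StopIteration when the inputs run out,
        # which also ends the iteration for whoever is looping over us.
        n = next(self._numbers)
        return n * n


def squares(numbers):
    """Generator function with the same behaviour in three lines."""
    for i in numbers:
        yield i * i


class_result = list(SquaresIterator([1, 2, 3, 4]))
gen_result = list(squares([1, 2, 3, 4]))
print(class_result)  # [1, 4, 9, 16]
print(gen_result)    # [1, 4, 9, 16]
```

Both versions produce the same values; the generator function simply lets Python write the __iter__() and __next__() plumbing for you.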

Generator Basics: The advantage of using Generators

Now let’s get into the details of a generator. But first let’s understand some basics.

Consider the following two approaches of printing the squares of values from 0 to 4:

Approach 1: Using list

# Approach 1: Using list
L = [0, 1, 2, 3, 4]
for i in L:
  print(i*i)

#> 0
#> 1
#> 4
#> 9
#> 16

Approach 2: Using range

# Approach 2: Using range
for i in range(5):
  print(i*i)

#> 0
#> 1
#> 4
#> 9
#> 16

The first approach uses a list, whereas the second uses range, which produces its values lazily, much like a generator. (Strictly speaking, range is a lazy sequence rather than a generator, but the memory behaviour is the same.) Although the output is the same from both methods, the difference shows up when the number of objects you want to iterate over grows massively.

That is because the list object occupies actual space in memory. As the size of the list increases, say you want to iterate up to 5000, the required memory increases proportionately.

However, that is not the case with range. No matter the number of iterations, the size of the range object itself does not change. That’s something!

# Check size of list vs range (exact byte counts vary by Python version and platform).
import sys
print(sys.getsizeof(L))
print(sys.getsizeof(range(6)))

#> 120
#> 48

Since range computes its values on demand, its memory requirement for iterating over 5000 numbers does not increase. The values are generated only when needed and are not actually stored.

# check size of a larger range
print(sys.getsizeof(range(5000)))

#> 48

That’s still the same number of bytes as range(6).
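The same comparison can be made at several sizes, and with a true generator alongside range. This is only a sketch: the sizes dictionary is an illustrative name, and the byte counts are not shown because they depend on your Python version and platform. Note also that sys.getsizeof() on a list counts only the list's own array of references, not the integers it points to, so the real footprint of the list is even larger.

```python
import sys


def squares(numbers):
    for i in numbers:
        yield i * i


sizes = {}  # n -> (list bytes, range bytes, generator bytes)
for n in (10, 1_000, 100_000):
    as_list = [i * i for i in range(n)]   # all values stored up front
    as_range = range(n)                   # lazy sequence, nothing stored
    as_gen = squares(range(n))            # generator, nothing computed yet
    sizes[n] = (
        sys.getsizeof(as_list),
        sys.getsizeof(as_range),
        sys.getsizeof(as_gen),
    )
    print(n, sizes[n])
```

The first number in each tuple grows with n, while the second and third stay constant.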

[Figure: Python size of objects. Source: GeeksforGeeks]

Now, that’s the advantage of using generators.

The good part is, Python allows you to create your own generator as per your custom logic. There are multiple ways to do it though. Let’s see some examples.

Approach 1. Using the yield keyword

We have already seen this. Let’s recreate the same squares logic with the yield keyword inside a function, this time going through it step by step.

1. Define the generator function

def squares(numbers):
  for i in numbers:
    yield i*i

2. Create the generator object

nums_gen = squares([1,2,3,4])
nums_gen

#> <generator object squares at 0x...>

Notice that it has only created a generator object, not the values we want, at least not yet. To actually generate the values, you need to iterate over it or call next() on it.

print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))
print(next(nums_gen))

#> 1
#> 4
#> 9
#> 16

What does yield do?

The yield statement is what turns the function into a generator function: calling it returns a generator that can be iterated over.

Now, what happens when you use yield?

Two things mainly:

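  1. Because you’ve used the yield statement in the function definition, calling squares() returns a generator object instead of running the function body. The generator object comes with the dunder methods __iter__() and __next__() already implemented, which makes nums_gen an iterator. So, now you can call next(nums_gen).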

  2. Once you call next(nums_gen), it starts executing the logic defined in squares() until it hits the yield keyword. Then it sends the yielded value to the caller and pauses the function in that state without exiting. When next() is called again, the state at which it was last paused is remembered and execution continues from that point onwards. This continues until the generator is exhausted.
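The two points above can be traced with a small sketch. The events list is a name introduced here purely for illustration; it records the order in which the caller and the generator body run.

```python
events = []  # records the order in which things happen


def squares(numbers):
    events.append("body started")
    for i in numbers:
        events.append(f"about to yield {i * i}")
        yield i * i
        events.append(f"resumed after {i * i}")
    events.append("body finished")


gen = squares([1, 2])
events.append("generator created")  # the body has not run at this point

first = next(gen)                   # runs the body up to the first yield
events.append(f"caller got {first}")
second = next(gen)                  # resumes right after the first yield
events.append(f"caller got {second}")

for event in events:
    print(event)

#> generator created
#> body started
#> about to yield 1
#> caller got 1
#> resumed after 1
#> about to yield 4
#> caller got 4
```

Notice that "generator created" is recorded before "body started": calling squares() does not execute any of its code until the first next().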

The magic in this process is that all the local variables you created within the function’s local namespace are still available in the next iteration, that is, when next is called again explicitly or when iterating in a for-loop.

Had we used return instead, the function would have exited, discarding all the variables in its local namespace.

yield basically makes the function remember its ‘state’. Such a function can be used to generate values as per custom logic, which fundamentally makes it a ‘generator’.
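To see the remembered state directly, here is a small sketch with a hypothetical running_total() generator whose local variable total survives between calls. It uses the standard library's inspect.getgeneratorlocals() and inspect.getgeneratorstate() to peek inside the paused generator.

```python
import inspect


def running_total(numbers):
    total = 0  # local variable that survives between next() calls
    for n in numbers:
        total += n
        yield total


totals = running_total([5, 10, 20])
first = next(totals)    # total is now 5
second = next(totals)   # total picks up from 5 and becomes 15

paused_locals = inspect.getgeneratorlocals(totals)
state = inspect.getgeneratorstate(totals)

print(first, second)           # 5 15
print(paused_locals["total"])  # 15
print(state)                   # GEN_SUSPENDED
```

An ordinary function using return would have started with total = 0 on every call.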

What happens after exhausting all the values?

Once the values have been exhausted, a StopIteration exception is raised. A generator cannot be rewound, so you need to create it again in order to generate the values again.

# Once exhausted, next() raises StopIteration
print(next(nums_gen))
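If you would rather not let the exception propagate, one option is to catch StopIteration, and another is to pass a default value as the second argument to next(). A minimal sketch, where gen, consumed, raised and fallback are illustrative names:

```python
def squares(numbers):
    for i in numbers:
        yield i * i


gen = squares([1, 2])
consumed = [next(gen), next(gen)]  # the generator is exhausted after this

# Option 1: catch the exception explicitly.
try:
    next(gen)
    raised = False
except StopIteration:
    raised = True

# Option 2: give next() a default to return instead of raising.
fallback = next(gen, None)

print(consumed, raised, fallback)  # [1, 4] True None
```

A for-loop does the equivalent of option 1 for you, which is why it ends quietly when the generator runs out.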

You will need to re-create it and run it again.

nums_gen = squares([1,2,3,4])

This time, let’s iterate with a for-loop.

for i in nums_gen:
  print(i)

#> 1
#> 4
#> 9
#> 16

Good.

Alternatively, you can avoid having to re-create the generator by hand every time it is exhausted. This can be done by writing a class that defines an __iter__() method with a yield statement: every for-loop over an instance then gets a fresh generator.

Approach 2. Create using class as an iterable

# Approach 2: Convert it to a class that implements an `__iter__()` method.
class Iterable(object):
  def __init__(self, numbers):
    self.numbers = numbers

  def __iter__(self):
    n = self.numbers
    for i in range(n):
      yield i*i

iterable = Iterable(4)

for i in iterable: # iterator created here
  print(i)

#> 0
#> 1
#> 4
#> 9

It’s fully iterated now.

Run again without re-creating iterable.

for i in iterable: # iterator again created here
  print(i)

#> 0
#> 1
#> 4
#> 9
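Why does this work? One way to check is to call iter() on the object yourself, which is what a for-loop does behind the scenes. The sketch below repeats the class from above; the names it1, it2, distinct and same are only for illustration.

```python
class Iterable:
    def __init__(self, numbers):
        self.numbers = numbers

    def __iter__(self):
        # Because of the yield, each call returns a brand-new generator.
        for i in range(self.numbers):
            yield i * i


iterable = Iterable(4)

it1 = iter(iterable)
it2 = iter(iterable)
distinct = it1 is not it2    # two independent generators

first_pass = list(iterable)
second_pass = list(iterable)

same = iter(it1) is it1      # a generator returns itself from iter()

print(distinct, same)        # True True
print(first_pass)            # [0, 1, 4, 9]
print(second_pass)           # [0, 1, 4, 9]
```

The Iterable instance is never exhausted itself; only the short-lived generators it hands out are.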

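Approach 3. Creating a generator without using yield

You can also create a generator with a generator expression, which looks like a list comprehension but uses parentheses instead of square brackets.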

gen = (i*i for i in range(5))
gen

#> <generator object <genexpr> at 0x000002372CA82E40>

for i in gen:
  print(i)

#> 0
#> 1
#> 4
#> 9
#> 16

Try again. The generator is already exhausted, so it cannot be re-used and the loop prints nothing.

for i in gen:
  print(i)
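A generator expression is single-use, just like a generator created from a function. The short sketch below (variable names are illustrative) shows that a second pass comes back empty, and that a generator expression can be passed straight to a function such as sum() without building a list first.

```python
gen = (i * i for i in range(5))

first_pass = list(gen)    # consumes every value
second_pass = list(gen)   # nothing left to consume

# If you need the values more than once, build a list instead.
squares_list = [i * i for i in range(5)]

# If you need them only once, feed the expression directly to the consumer.
total = sum(i * i for i in range(5))

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
print(total)        # 30
```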

This example seems redundant because it can be easily done using range.

Let’s see another example, this time reading a text file. Let’s split each line into a list of words.

gen = (i.split() for i in open("textfile.txt", "r", encoding="utf8"))
gen

#> <generator object <genexpr> at 0x000002372CA84190>

Iterate over the generator to print the words of each line.

for i in gen:
  print(i)
#> ['Amid', 'controversy', 'over', '‘motivated’', 'arrest', 'in', 'sand', 'mining', 'case,']
#> ['Punjab', 'Congress', 'chief', 'Navjot', 'Singh', 'Sidhu', 'calls', 'for', '‘honest', 'CM', 'candidate’.']
#> ['Amid', 'the', 'intense', 'campaign', 'for', 'the', 'Assembly', 'election', 'in', 'Punjab,']
#> ['due', 'less', 'than', 'three', 'weeks', 'from', 'now', 'on', 'February', '20,', 'the', 'Enforcement', 'Directorate', '(ED)']
#> ['on', 'Friday', 'arrested', 'Bhupinder', 'Singh', '‘Honey’,', 'Punjab', 'Chief', 'Minister']
#> ['Charanjit', 'Singh', 'Channi’s', 'nephew,', 'in', 'connection', 'with', 'an', 'illegal', 'sand', 'mining', 'case.']

Let’s try that again, but just extract the first 3 words in each line.

gen = (i.split()[:3] for i in open("textfile.txt", "r", encoding="utf8"))
for i in gen:
  print(i)
#> ['Amid', 'controversy', 'over']
#> ['Punjab', 'Congress', 'chief']
#> ['Amid', 'the', 'intense']
#> ['due', 'less', 'than']
#> ['on', 'Friday', 'arrested']
#> ['Charanjit', 'Singh', 'Channi’s']
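One thing the expressions above leave open is the file handle, which is never closed explicitly. A possible variation, sketched here under the assumption that you are free to wrap the logic in a function, is a generator function that opens the file in a with block. The function name first_words and the temporary sample file are made up for this example so that it runs on its own.

```python
import os
import tempfile


def first_words(path, count=3):
    """Yield the first `count` words of each line; close the file when done."""
    with open(path, "r", encoding="utf8") as handle:
        for line in handle:  # the file object itself reads lazily, line by line
            yield line.split()[:count]


# Create a small throwaway file to read (illustrative data only).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf8"
) as tmp:
    tmp.write("one two three four\nfive six\n")
    sample_path = tmp.name

result = list(first_words(sample_path))
print(result)  # [['one', 'two', 'three'], ['five', 'six']]

os.remove(sample_path)  # clean up the throwaway file
```

The file is closed as soon as the generator is exhausted, and only one line is held in memory at a time.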

Nice. We have covered the main ways of creating and working with generators. Hope the concept of generators is clear now.
