What is an iterator in Python

We've learned before what an iterable is. It is an object we can iterate over - we can go over its items one by one.

An iterator is the object that does the actual iteration: it goes over an iterable and hands us its items.

The Python documentation describes it as:

An object representing a stream of data

An iterator reads data from an iterable and returns the items one by one. The data might come from a container (list, tuple, etc.) or from other sources such as files or network connections.

So an iterator is an intermediary between the data source - the iterable - and the code that needs to iterate over its data.

Getting an iterator for an iterable

We use the iter() function to get an iterator for an iterable.

my_list = [1, 2, 3]
iter(my_list)
# outputs: <list_iterator object at 0x10bbadc90>

The iter() function calls the __iter__() method on the iterable to obtain an iterator. The iterable's __iter__() method is responsible for creating and returning an iterator.

As we'll see later, the iterator needs access to the iterable to get data from it. So when the __iter__ method creates an iterator, it passes the iterable itself to the iterator.
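
As a small illustration of that access, the sketch below assumes the built-in list iterator, which reads from the list itself rather than from a copy. An item appended after the iterator was created is still reached:

```python
numbers = [1, 2]
it = iter(numbers)

first = next(it)      # the iterator hands us the first item

# The iterator keeps a reference to the list, not a snapshot,
# so an item appended now is still reached later.
numbers.append(3)

rest = list(it)       # collect the remaining items
print(first, rest)    # 1 [2, 3]
```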

(Another option is that the iterable has the __getitem__() method - we've discussed it in the article about iterables.)
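
A minimal sketch of that second option, using a hypothetical Menu class (not part of this article's examples) that defines only __getitem__(). The iter() function still returns an iterator for it:

```python
class Menu:
    """Hypothetical class with __getitem__() but no __iter__()."""

    def __init__(self, dishes):
        self._dishes = dishes

    def __getitem__(self, index):
        # The IndexError raised past the last index ends the iteration.
        return self._dishes[index]


menu = Menu(["soup", "salad"])
it = iter(menu)                      # works without __iter__()
items = [next(it), next(it)]

print(items)                         # ['soup', 'salad']
print(hasattr(Menu, "__iter__"))     # False
```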

Getting data from the iterator

Once we have an iterator for our iterable, how do we get data from it?

We pass the iterator to the next() function.

# create our iterable
numbers = [1,2,3]

# we get the iterator
it = iter(numbers)

# we call next() to get item
next(it)
1
next(it)
2

The next() function tells the iterator to give us the next item from the iterable.

Every time we iterate over an iterable (using for, in, etc.), behind the scenes, Python uses the next() function to get items one by one.

If the object passed to the next() function is not an iterator, next() raises TypeError.

>>> next(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not an iterator
>>> 

But how does next() know whether an object is an iterator, and how does it get data from it?

What makes an object an iterator?

The next() function checks if an object has the __next__() method.

If it does, it uses it to get the next item.

If it does not then next() raises TypeError.

So, an iterator is an object with the __next__() method.

The __next__() method is what makes an object an iterator. It is responsible for returning the next item from the iterable.

The next() function calls the __next__() method and returns what it returned.
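
To make this concrete, here is a minimal sketch with a hypothetical Counter class whose only special method is __next__(). That alone is enough for next() to accept it, while a list, which has no __next__(), is rejected:

```python
class Counter:
    """Hypothetical iterator: counts upwards and never ends."""

    def __init__(self):
        self.current = 0

    def __next__(self):
        self.current += 1
        return self.current


counter = Counter()
values = [next(counter), next(counter)]   # next() calls counter.__next__()

# A list has __iter__() but no __next__(), so next() rejects it.
try:
    next([1, 2, 3])
    list_error = None
except TypeError as exc:
    list_error = str(exc)

print(values)        # [1, 2]
print(list_error)
```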

We can also use the __next__() method directly:

numbers = [1,2,3]
it = iter(numbers)
it.__next__()
1
it.__next__()
2

But as with any dunder method, we should not. It’s better to use the next() function.

We will write our own iterator with the __next__() method shortly.

Now, let's look at what happens if there is no next item in an iterable.

Exhaustion

When there is no next item, we say the iterator is exhausted, and the __next__() method must raise the StopIteration exception.

numbers = [1,2,3]

it = iter(numbers)

it.__next__()
1
it.__next__()
2
it.__next__()
3

# now we call __next__() again and we get an error
it.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Once the __next__() method raises StopIteration, it needs to continue to do so for subsequent calls. Otherwise, it’s broken.

Let’s call __next__() once more:

>>> it.__next__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
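
The sketch below checks this behaviour for the built-in list iterator. It assumes CPython's list iterator, which stays exhausted even if the list grows afterwards:

```python
numbers = [1]
it = iter(numbers)
next(it)                       # consume the only item

results = []

# Two more calls: both must raise StopIteration.
for _ in range(2):
    try:
        next(it)
        results.append("item")
    except StopIteration:
        results.append("StopIteration")

# Growing the list afterwards does not revive the exhausted iterator.
numbers.append(2)
try:
    next(it)
    results.append("item")
except StopIteration:
    results.append("StopIteration")

print(results)   # ['StopIteration', 'StopIteration', 'StopIteration']
```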

We should always use the next() function instead of calling the __next__() method directly. When __next__() raises StopIteration, the next() function propagates that exception.

numbers = [1,2,3]

# get iterator
it = iter(numbers)

next(it)
1
next(it)
2
next(it)
3

next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
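
When the exception is inconvenient, next() also accepts an optional second argument that is returned instead of raising StopIteration. A minimal sketch, where the _MISSING sentinel is a name chosen here purely for illustration:

```python
numbers = [1, 2]
it = iter(numbers)

# A unique sentinel cannot be confused with a real item,
# unlike None, which an iterable might legitimately contain.
_MISSING = object()

collected = []
while True:
    item = next(it, _MISSING)   # returns _MISSING when `it` is exhausted
    if item is _MISSING:
        break
    collected.append(item)

print(collected)   # [1, 2]
```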

Once the iterator is exhausted, it is of no use anymore. We can’t get any more items from the iterable using an exhausted iterator.

If we want to iterate over an iterable again, we need to get a new iterator from it using iter():

numbers = [1,2,3]

# get iterator
it = iter(numbers)
next(it)
1
next(it)
2
next(it)
3
next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
# ^^ we exhausted iterator `it`

# but we can get a new iterator
it2 = iter(numbers)
next(it2)
1

Getting a new iterator is not possible for all iterables, though. We’ll explain that bit later.

How does it all work together?

So how does all of this come together?

Every time we iterate over the iterable with a for loop, Python uses the iter() function (which uses the __iter__() method) and the next() function (which uses the __next__() method) behind the scenes.

It works like this:

  1. Python uses the iter() function to get an iterator for an object.
  2. The iter() function uses the __iter__() method of the object to get an iterator. If the object has no __iter__() method (and no __getitem__() fallback, in which case it is not iterable), or if the object returned from __iter__() is not an iterator, iter() raises TypeError.
  3. Python then passes the iterator to the next() function.
  4. The next() function uses the __next__() method of the iterator to get the next item. (next() checks that the iterator has the __next__() method and raises TypeError if it does not, although the iter() function has already checked this.)
  5. If the __next__() method returns some item, it is used in the for loop, and we go back to step 3 - Python calls next() again.
  6. If there is no next item, the __next__() method raises StopIteration, stopping the whole process.
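
The two failure cases from step 2 can be sketched with two hypothetical classes, NotAnIterable and BadIterable, which exist only for this illustration:

```python
class NotAnIterable:
    """Hypothetical class with neither __iter__() nor __getitem__()."""


class BadIterable:
    """Hypothetical class whose __iter__() returns a non-iterator."""

    def __iter__(self):
        return 42   # an int has no __next__() method


errors = []
for obj in (NotAnIterable(), BadIterable()):
    try:
        iter(obj)
    except TypeError as exc:
        errors.append(str(exc))

# Both objects are rejected by iter() before any item is requested.
for message in errors:
    print(message)
```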

Here's the code that roughly corresponds to what is happening when we use a for loop:

# Say we have iterable `numbers` like this
numbers = [1, 2, 3]

# This is what Python does when we use a for loop:

# Python will get iterator for `numbers`
it = iter(numbers)

# It starts a while loop with condition True
# so it will run forever unless it is stopped
while True:
    try:
        # it calls the next() function, passing it the iterator
        item = next(it)
        # if it returns an item it uses it
        print(item)
    # otherwise next() raises StopIteration and this is where 
    # it will break the while loop
    except StopIteration:
        break

Let's now build our own iterator and iterable from scratch.

Building an iterator

Let's create a simple class Bistro with three fields, waitress, chef and barman:

class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman

Once we know how our class looks, we can create an iterator for it:

class BistroIterator:
    def __init__(self, bistro):
        self.bistro = bistro
        self.next_item = 'waitress'

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration

Our BistroIterator needs the Bistro object it iterates over so that it can access its data:

  • When we create our iterator, it stores the Bistro object into field bistro and sets which field it will return when it's asked for the next item - we chose to return field waitress.
  • When the __next__() method is called, it checks the next_item field to know what to return.
  • If the next_item field contains the value waitress, it sets the next_item to be 'chef' and returns the waitress field.
  • If it contains the value chef, it sets the next_item to 'barman' and returns the chef field.
  • If it contains the value barman, it sets the next_item to None and returns the barman field.
  • If the next_item is neither of those, it raises a StopIteration exception.
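
Before wiring the iterator into Bistro, we can drive it by hand. The sketch below repeats both classes so that it runs on its own, and creates the iterator directly because Bistro has no __iter__() method yet:

```python
class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman


class BistroIterator:
    def __init__(self, bistro):
        self.bistro = bistro
        self.next_item = 'waitress'

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration


our_bistro = Bistro('Mary', 'John', 'Alvin')
it = BistroIterator(our_bistro)    # created by hand, not via iter()

staff = [next(it), next(it), next(it)]

# The fourth call finds next_item set to None and raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(staff, exhausted)   # ['Mary', 'John', 'Alvin'] True
```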

Now we need to update our Bistro class with the __iter__() method. In it, we create a BistroIterator object, passing the Bistro object itself as the argument, and return it.

class Bistro:
    def __init__(self, waitress, chef, barman):
        self.waitress = waitress
        self.chef = chef
        self.barman = barman

    def __iter__(self):
        # hand this Bistro to a new iterator and return the iterator
        return BistroIterator(self)

Our Bistro class has become iterable, and we can use a for loop to iterate over its values:

our_bistro = Bistro('Mary', 'John', 'Alvin')
for item in our_bistro:
    print(item)

# outputs:
# Mary
# John
# Alvin

And that is the iterator's job: take data from the iterable and return the items from the __next__() method one by one.

Usually, it is not very useful to iterate over objects like Bistro, but it proves that we can make an iterable out of almost any object.

An iterator is (almost always) iterable & an iterable can be an iterator

We said before that an iterator is an object that has the __next__() method.

But...

Iterator protocol

The Python documentation says that an iterator is also required to have the __iter__() method in addition to the __next__() method to conform to the iterator protocol.

However, an iterator without __iter__() will still work because what makes the iterator work is the __next__() method - as we've seen before. Our BistroIterator has no __iter__() method, but it still works.

So an iterator does not have to conform to the iterator protocol to work with next().
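
One way to see the difference is the Iterator abstract base class from collections.abc, which recognizes only objects that have both methods. NextOnly and Full below are hypothetical classes made up for this sketch:

```python
from collections.abc import Iterator


class NextOnly:
    """Hypothetical iterator with __next__() but no __iter__()."""

    def __next__(self):
        return 1


class Full(NextOnly):
    """The same iterator plus the __iter__() the protocol asks for."""

    def __iter__(self):
        return self


works_with_next = next(NextOnly())            # next() needs only __next__()
next_only_conforms = isinstance(NextOnly(), Iterator)
full_conforms = isinstance(Full(), Iterator)

print(works_with_next)       # 1
print(next_only_conforms)    # False
print(full_conforms)         # True
```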

When an iterator does have the __iter__() method, it must return the iterator object itself.

class SomeIterator:
    def __iter__(self):
        return self

    def __next__(self):
        ...

However, most iterators do have the __iter__() method.

Why?

Iterator as iterable

We learned before that if an object has the __iter__() method, it is iterable.

So if an iterator has it, then it's also iterable!

And that means we can use an iterator with a for loop, with the in operator, and with functions that expect an iterable.

Let's try with a for loop:

numbers = [1, 2, 3]

# get iterator
it = iter(numbers)

# notice below we use iterator `it` not `numbers`
for item in it:
    print(item)

# outputs
# 1
# 2
# 3

It works because the for loop uses the iter() function to get an iterator for an iterable. Here we gave it an iterator. But the iterator for the list is also iterable - it has the __iter__() method, which returns self. So the for loop will use the same iterator as if we looped over numbers.

We can check that list's iterator returns self when we ask for its iterator:

numbers = [1, 2, 3]

# get iterator
it = iter(numbers)

it
# outputs:
# <list_iterator object at 0x10633f5b0>

# get iterator from iterator
it2 = iter(it)
it2
# outputs:
# <list_iterator object at 0x10633f5b0>

it is it2
# outputs:
# True

So when we give the for loop an iterator, the loop asks that iterator for its iterator and gets the very same object back. The loop then keeps calling next() on it until StopIteration is raised.

This would not be possible if the iterator had no __iter__() method. If we try the above with our BistroIterator, which doesn't have __iter__(), we'll get an error:

our_bistro = Bistro('Mary', 'John', 'Alvin')
it = iter(our_bistro)

for item in it:
    print(item)

# outputs:
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: 'BistroIterator' object is not iterable

When we add the __iter__() method, it will work:

class BistroIterator:
    def __init__(self, bistro):
        self.next_item = 'waitress'
        self.bistro = bistro

    def __next__(self):
        if self.next_item == 'waitress':
            self.next_item = 'chef'
            return self.bistro.waitress
        elif self.next_item == 'chef':
            self.next_item = 'barman'
            return self.bistro.chef
        elif self.next_item == 'barman':
            self.next_item = None
            return self.bistro.barman
        else:
            raise StopIteration

    def __iter__(self):
        return self


our_bistro = Bistro('Mary', 'John', 'Alvin')
it = iter(our_bistro)

for item in it:
    print(item)

# outputs:
# Mary
# John
# Alvin
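
The same holds for the in operator and for functions that expect an iterable. A short sketch with the built-in list iterator, which also shows that each of these consumes items from the iterator it is given:

```python
numbers = [1, 2, 3]

total = sum(iter(numbers))         # 6
as_tuple = tuple(iter(numbers))    # (1, 2, 3)

# `in` calls next() until it finds the item, consuming 1 and 2 here.
it = iter(numbers)
found = 2 in it                    # True
leftover = list(it)                # only what `in` did not consume

print(total, as_tuple, found, leftover)   # 6 (1, 2, 3) True [3]
```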

Not all iterables are the same

Although we can use iterators with the __iter__() method where an iterable is expected, we need to be aware of exhaustion.

Once an iterator is exhausted, we can't use it anymore to get data from it.

We said in the Exhaustion section that if we want to iterate over an iterable multiple times, we could get a new iterator for each iteration.

But an iterator that is also iterable will not give us a new iterator when we ask for it because its __iter__() method returns itself.

numbers = [1, 2, 3]

it = iter(numbers)
for item in it:
    print(item)
# outputs:
# 1
# 2
# 3

# Now `it` is exhausted.
# But `it` is also an iterable so 
# let's get an iterator from it using `iter()`

it2_from_it = iter(it)

for item in it2_from_it:
    print(item)

# Doesn't output anything as `it` and `it2_from_it`
# are the same object

So a function or other code that expects an iterable must not assume it will be able to iterate over the iterable more than once. If the iterable is also an iterator, it will be exhausted after the first iteration. On further iterations, that code or function will not get any items from it, which might break it.

We can iterate only once over iterables whose __iter__() method always returns the same iterator. They're often iterators themselves.
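
A minimal sketch of how this can go wrong, using a hypothetical count_and_total() helper that loops over its argument twice. It gives the right answer for a list and a silently wrong one for an iterator:

```python
def count_and_total(values):
    """Hypothetical helper that iterates over `values` twice."""
    count = 0
    for _ in values:          # first pass
        count += 1

    total = 0
    for value in values:      # second pass: empty if `values` is an iterator
        total += value

    return count, total


numbers = [1, 2, 3]
from_list = count_and_total(numbers)
from_iterator = count_and_total(iter(numbers))

print(from_list)        # (3, 6)
print(from_iterator)    # (3, 0)
```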

Different kinds of iterables

We've seen that some iterables will return a new iterator each time we ask for it. But others will not.

Which iterables return a new iterator every time?

When the iterable and the iterator are separate objects, the iterable will return a new iterator each time we ask for it.

Our iterable Bistro is a good example. It creates and returns a new BistroIterator object every time we ask it for an iterator. So we can iterate over our_bistro many times.

Other examples are container objects like list, tuple, dict, etc.

Each time we pass them to the iter() function (or call their __iter__() method), they produce a new iterator.

numbers = [1,2,3]
# get iterator
it = iter(numbers)
next(it)
1
next(it)
2
next(it)
3
next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
# we exhausted iterator `it`

# but we can get a new iterator
it2 = iter(numbers)
next(it2)
1

That is because, same as our Bistro object, they're separate objects from their iterators. They hold data and create an iterator - they have the __iter__() method. But the iterator is a different object, and only the iterator has the __next__() method. They don't.
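
We can check this split with hasattr(), and also confirm that two iterators obtained from the same list advance independently:

```python
numbers = [1, 2, 3]

list_has_next = hasattr(numbers, "__next__")              # the list itself
iterator_has_next = hasattr(iter(numbers), "__next__")    # its iterator

it_a = iter(numbers)
it_b = iter(numbers)
separate = it_a is not it_b     # each call created a new iterator

next(it_a)
next(it_a)
first_of_b = next(it_b)         # it_b is unaffected by it_a's progress

print(list_has_next, iterator_has_next, separate, first_of_b)
# False True True 1
```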

Which iterables return the same iterator every time?

When an object is both iterable and iterator (has both __iter__() and __next__() methods), it will return the same iterator - self - each time we ask for it.

This is usually the case when an object doesn't store the data we want to iterate over.

One example is an iterator that has the __iter__() method, such as the list iterator we've seen before. It doesn't have any data by itself. It needs another object - the iterable (list) - to read data from. It is an iterator for another object, so it just returns itself when asked for its iterator. It doesn't create another iterator, as it already is one.

But there are other objects which are both iterable and iterator. They have __iter__() and __next__() but are not merely an iterator for another iterable.

These objects still don't store any data, but they do the work of getting data from somewhere. One example is Python's file object (_io.TextIOWrapper). It doesn't contain any data; the data is in a file. But it has the __iter__() method as well as the __next__() method. It's not an iterator for another iterable. It is a standalone object that deals with the file to get data, handles file closing, etc. But when asked for an iterator, it will just return itself.
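
A short sketch of this behaviour with a real file object. The file name menu.txt and its contents are made up for the example, and the file is written to a temporary directory so that the snippet is self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "menu.txt")   # hypothetical file
    with open(path, "w") as f:
        f.write("soup\nsalad\n")

    with open(path) as f:
        same_object = iter(f) is f                     # the file is its own iterator
        first_pass = [line.strip() for line in f]      # reads every line
        second_pass = [line.strip() for line in f]     # nothing left to read

print(same_object)    # True
print(first_pass)     # ['soup', 'salad']
print(second_pass)    # []
```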

We can iterate only once over objects that are both iterator and iterable.

So:

  • Some objects are only iterables.
  • Some objects are only iterators, but most iterators are also iterables (even though they are iterators for another iterable).
  • And some objects are both iterable and iterator (without being an iterator for another iterable).

We can often use an iterable and an iterator interchangeably. But only when we understand how they work can we properly decide which one to use and how to use it, and avoid surprises.

Summary

  • An iterator is an object that provides iteration for an iterable.
  • We get an iterator from an iterable using the iter() function, which calls the __iter__() method on the iterable.
  • We get an item from the iterator by passing it to the next() function.
  • The next() function uses the __next__() method of the iterator to get the next item in the iterable. The iterator has knowledge of and access to the iterable.
  • An iterator is an object with the __next__() method.
  • When there are no more items in the iterable, the iterator is exhausted; in this case, its __next__() method must raise StopIteration.
  • The for loop uses the iter() and next() functions to iterate over the iterable.
  • The iterator protocol requires the iterator to have the __iter__() method, but next() will work without it anyway.
  • Most iterators have it, though, as it makes the iterator an iterable, enabling us to use it in places where an iterable is expected.
  • An iterator and an iterable can be separate objects.
  • When they're separate objects, we can get a new iterator for the iterable when the current iterator is exhausted.
  • But they can also be one object which is both iterator and iterable.
  • When it's one object, and it is exhausted, we can't get a new iterator, so we can only iterate over such an iterable once.
