collections is a built-in Python module that provides specialized container datatypes. Containers store values and give you convenient ways to access them. You have probably used lists, tuples, and dictionaries already, but structured data often calls for smarter objects.
In this article, I will walk you through the data structures provided by the collections module and show, with examples, when to use each of them.
Contents
- namedtuple
- What is namedtuple
- Another way of creating a namedtuple
- Why use namedtuple over dictionary
- Creating a namedtuple from a python Dictionary
- How to replace an attribute in a namedtuple
- Counter
- defaultdict
- OrderedDict
- What happens when you delete and re-insert keys in OrderedDict
- Sorting with OrderedDict
- ChainMap
- What happens when we have redundant keys in a ChainMap
- How to add a new dictionary to a ChainMap
- How to reverse the order of dictionaries in a ChainMap
- UserList
- UserString
- UserDict
# Import the collections module
import collections
Let us start with the namedtuple
What is namedtuple()
You can think of namedtuple in two ways: as an enhanced version of tuple, or as a quick way of creating a Python class with certain named attributes.
A key difference between a tuple and a namedtuple is that a tuple lets you access data only through indices, whereas a namedtuple also lets you access the elements by name.
You define which attributes a namedtuple holds and then create multiple instances of it, just as you would with a class.
So, in terms of usage, it feels more like a class, even though it has tuple in its name.
Let’s create a namedtuple that represents a ‘movie’ with the attributes ‘genre’, ‘rating’ and ‘lead_actor’.
# Creating a namedtuple.
# The field names are passed as a single string separated by spaces
from collections import namedtuple
movie = namedtuple('movie','genre rating lead_actor')
# Create instances of movie
ironman = movie(genre='action',rating=8.5,lead_actor='robert downey junior')
titanic = movie(genre='romance',rating=8,lead_actor='leonardo dicaprio')
seven = movie(genre='crime',rating=9,lead_actor='Brad Pitt')
Now you can access any detail of a movie by its field name, which is convenient and readable.
# Access the fields
print(titanic.genre)
print(seven.lead_actor)
print(ironman.rating)
#> romance
#> Brad Pitt
#> 8.5
Another way of creating a namedtuple
Alternatively, you can pass a list of field names instead of a single space-separated string.
Let us see an example.
# Creating namedtuple by passing fieldnames as a list of strings
book = namedtuple('book',['price','no_of_pages','author'])
harry_potter = book('500','367','JK ROWLING')
pride_and_prejudice = book('300','200','jane_austen')
tale = book('199','250','christie')
print('Price of pride and prejudice is', pride_and_prejudice.price)
print('author of harry potter is',harry_potter.author)
#> Price of pride and prejudice is 300
#> author of harry potter is JK ROWLING
The items in a namedtuple can be accessed both by index and by field name.
print(tale[1])
#> 250
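Because a namedtuple is still a tuple, the usual tuple operations keep working alongside access by name. The following is a small sketch of that idea; the Book type and its values are illustrative stand-ins for the book example above, not part of the original code.

```python
from collections import namedtuple

# Hypothetical record type, mirroring the `book` example above
Book = namedtuple('Book', ['price', 'no_of_pages', 'author'])
tale = Book('199', '250', 'christie')

# A namedtuple is still a tuple, so unpacking and indexing both work
price, pages, author = tale
same_field = tale[1] == tale.no_of_pages

# Helper attributes that every namedtuple provides
field_names = Book._fields   # field names in definition order
as_dict = tale._asdict()     # mapping of field name -> value

print(price, pages, author)  # 199 250 christie
print(same_field)            # True
print(field_names)           # ('price', 'no_of_pages', 'author')
print(as_dict['author'])     # christie
```

_fields and _asdict() are handy when you need to loop over the fields or hand the record to code that expects a mapping.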
Why use namedtuple over dictionary
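A major advantage of a namedtuple is that an instance takes up less memory than an equivalent dictionary, because the field names are stored once on the class rather than in every instance.
So, when you have a large number of records, namedtuples are more memory-efficient.
The example below demonstrates this.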
# Create a dict and namedtuple with same data and compare the size
import random
import sys
# Create Dict
dicts = {'numbers_1': random.randint(0, 10000),'numbers_2':random.randint(5000,10000)}
print('Size or space occupied by dictionary',sys.getsizeof(dicts))
# converting same dictionary to a namedtuple
data=namedtuple('data',['numbers_1','numbers_2'])
my_namedtuple= data(**dicts)
print('Size or space occupied by namedtuple',sys.getsizeof(my_namedtuple))
#> Size or space occupied by dictionary 240
#> Size or space occupied by namedtuple 64
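In the output above, the namedtuple has a size of 64 bytes, whereas the dictionary occupies 240 bytes, so the namedtuple is nearly 4x smaller. The exact numbers depend on your Python version and platform, and sys.getsizeof() counts only the container itself, not the objects it refers to.
You can imagine the effect when this is scaled up to a large number of such objects.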
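Since the byte counts vary between Python versions, it can be safer to compare the two sizes than to rely on specific numbers. Here is a minimal sketch of that comparison, assuming CPython; the Point type is a hypothetical example.

```python
import sys
from collections import namedtuple

# Hypothetical two-field record, stored both ways
Point = namedtuple('Point', ['x', 'y'])
as_tuple = Point(x=1, y=2)
as_dict = {'x': 1, 'y': 2}

# getsizeof() is shallow: it measures the container only,
# not the integers that both containers refer to
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)

print('namedtuple:', tuple_size, 'bytes')
print('dict:', dict_size, 'bytes')
print('namedtuple is smaller:', tuple_size < dict_size)
```

The absolute values printed here will differ from machine to machine; the relationship between them is the point.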
Creating a namedtuple from a Python dictionary
Did you notice how we converted a dictionary into a namedtuple using the ** operator?
All you need to do is first define the structure of the namedtuple and then pass the unpacked dictionary (**dict) to it as arguments. The only requirement is that the keys of the dict match the field names of the namedtuple.
# Convert a dictionary into a namedtuple
dictionary = {'price': 567, 'no_of_pages': 878, 'author': 'cathy thomas'}
# Convert
book = namedtuple('book',['price','no_of_pages','author'])
print(book(**dictionary))
#> book(price=567, no_of_pages=878, author='cathy thomas')
How to replace an attribute in a namedtuple
What if the value of one attribute has to be changed?
A namedtuple is immutable, so you cannot assign to a field directly. Instead, the ._replace() method returns a new namedtuple with the given fields updated, and you rebind the name to it.
# update the price of the book
my_book = book('250', '500', 'abc')
my_book = my_book._replace(price=300)  # _replace() returns a new object
print("Book Price:", my_book.price)
#> Book Price: 300
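One detail is easy to miss: _replace() does not change the namedtuple in place. It builds a new instance and leaves the original untouched. The sketch below illustrates this with a hypothetical Book type.

```python
from collections import namedtuple

# Hypothetical record type for illustration
Book = namedtuple('Book', ['price', 'no_of_pages', 'author'])
original = Book(250, 500, 'abc')

# Direct assignment is rejected because tuples are immutable
try:
    original.price = 300
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace() returns a new instance; the original is unchanged
updated = original._replace(price=300)

print(assignment_failed)  # True
print(original.price)     # 250
print(updated.price)      # 300
```

To "update" a record, rebind the name to the value returned by _replace().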
Counter
Counter is a class provided by the collections module.
Suppose you have a list of some random numbers. What if you want to know how many times each number occurs?
Counter lets you compute the frequencies easily. It works not just for numbers but for any iterable of hashable items, such as strings and lists.
Counter is a dict subclass used to count hashable objects.
It returns a dictionary-like object with the elements as keys and their counts (the number of times each element was present) as values.
EXAMPLES
#importing Counter from collections
from collections import Counter
numbers = [4,5,5,2,22,2,2,1,8,9,7,7]
num_counter = Counter(numbers)
print(num_counter)
#>Counter({2: 3, 5: 2, 7: 2, 4: 1, 22: 1, 1: 1, 8: 1, 9: 1})
Let’s use Counter to find the frequency of each character in a string
#counter with strings
string = 'lalalalandismagic'
string_count = Counter(string)
print(string_count)
#> Counter({'a': 5, 'l': 4, 'i': 2, 'n': 1, 'd': 1, 's': 1, 'm': 1, 'g': 1, 'c': 1})
As you saw, Counter shows which elements are present in a list or string along with their counts.
What if you have a sentence and want to count its words?
Use the split() method to make a list of the words in the sentence and pass it to Counter().
# Using counter on sentences
line = 'he told her that her presentation was not that good'
list_of_words = line.split()
line_count=Counter(list_of_words)
print(line_count)
#> Counter({'her': 2, 'that': 2, 'he': 1, 'told': 1, 'presentation': 1, 'was': 1, 'not': 1, 'good': 1})
How to find most common elements using Counter
Counter is very useful in real-life applications, especially when you need to process large amounts of data and find the frequency of some elements. Let me show you some very useful Counter methods.
Counter().most_common([n])
This returns a list of the n most common elements along with their counts, in descending order of count.
# Passing different values of n to most_common() function
print('The 2 most common elements in `numbers` are', Counter(numbers).most_common(2))
print('The 3 most common elements in `string` are', Counter(string).most_common(3))
#> The 2 most common elements in `numbers` are [(2, 3), (5, 2)]
#> The 3 most common elements in `string` are [('a', 5), ('l', 4), ('i', 2)]
The most_common() method can be used to find the most frequent item, which is useful in frequency analysis.
Counter(list_of_words).most_common(1)
#> [('her', 2)]
We can use the same method to find the most frequent characters in a string.
Counter(string).most_common(3)
#> [('a', 5), ('l', 4), ('i', 2)]
What happens if you don’t specify n while using most_common(n)?
All the elements and their counts are returned, in descending order of count.
Counter(string).most_common()
#>[('a', 5),('l', 4),('i', 2),('n', 1),('d', 1),('s', 1),('m', 1),('g', 1),('c', 1)]
Counter().elements()
The elements() method returns an iterator over the elements, repeating each one as many times as its count. Elements with a count less than one are skipped.
print(sorted(string_count.elements()))
#> ['a', 'a', 'a', 'a', 'a', 'c', 'd', 'g', 'i', 'i', 'l', 'l', 'l', 'l', 'm', 'n', 's']
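Counters can also be combined, which is useful when counts arrive in batches. The following sketch, with made-up drink orders as data, shows addition, subtraction, in-place updates and the zero default for missing keys.

```python
from collections import Counter

# Hypothetical orders from two parts of the day
morning = Counter(['tea', 'coffee', 'tea'])
evening = Counter(['coffee', 'juice'])

total = morning + evening         # add counts key by key
only_morning = morning - evening  # subtract, dropping results <= 0

# update() adds counts in place from another iterable
morning.update(['tea'])

print(total['coffee'])     # 2
print(dict(only_morning))  # {'tea': 2}
print(morning['tea'])      # 3
print(morning['milk'])     # 0 (missing keys count as zero, no KeyError)
```

Looking up a missing key returns 0 without adding that key to the Counter.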
defaultdict
A dictionary is a collection of key: value pairs. (Since Python 3.7, a regular dict also preserves insertion order.)
In the key: value pairs, each key must be unique and hashable. That is why a list cannot be a key in a dictionary, as it is mutable, but a tuple of hashable items can be.
# Dict with tuple as keys: OKAY
{('key1', 'key2'): "value"}
# Dict with list as keys: ERROR
{['key1', 'key2']: "value"}
How defaultdict is different from dict
If you try to access a key that is not present in a dictionary, it raises a KeyError. A defaultdict does not: if you access a key that is not present, the defaultdict returns a default value instead and stores it under that key.
Syntax: defaultdict(default_factory)
When we access a key that is not present, the default_factory function is called with no arguments to produce the default value.
# Creating a defaultdict and trying to access a key that is not present.
from collections import defaultdict
def_dict = defaultdict(object)
def_dict['fruit'] = 'orange'
def_dict['drink'] = 'pepsi'
print(def_dict['chocolate'])
#> <object object at 0x7f591a2f4510>
If you execute the above code, it does not give you a KeyError.
In case you want to signal that the value for the requested key is not present, you can define your own function and pass it to the defaultdict.
See the example below.
# Passing a function to return default value
def print_default():
    return 'value absent'
def_dict=defaultdict(print_default)
print(def_dict['chocolate'])
#> value absent
In all other ways, it is the same as a normal dictionary, and the same syntax is used for defaultdict too.
It is also possible to avoid the KeyError in a plain dictionary by using the get method.
# Make dict return a default value
mydict = {'a': 'Apple', 'b': 'Ball'}
mydict.get('c', 'NOT PRESENT')
#> 'NOT PRESENT'
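A common use of defaultdict is grouping items without first checking whether a key exists. The sketch below groups some made-up words by their first letter, and also shows one difference from dict.get(): reading a missing key from a defaultdict inserts it, whereas get() leaves the dictionary unchanged.

```python
from collections import defaultdict

# Hypothetical data: group words by their first letter
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

groups = defaultdict(list)  # missing keys start as an empty list
for word in words:
    groups[word[0]].append(word)

print(groups['a'])  # ['apple', 'avocado']
print(groups['b'])  # ['banana', 'blueberry']

# get() on a plain dict returns the fallback but stores nothing
plain = {'a': 'Apple'}
fallback = plain.get('z', 'NOT PRESENT')
print('z' in plain)   # False

# Reading a missing key from a defaultdict stores the default
empty = groups['z']
print('z' in groups)  # True
```

If you only want a fallback value and do not want new keys to appear, get() is the safer choice.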
OrderedDict
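An OrderedDict maintains the order in which the keys are inserted. Before Python 3.7, a regular dict gave no such guarantee; since Python 3.7 a regular dict also preserves insertion order, but OrderedDict still offers extras such as order-sensitive equality and the move_to_end() method.
It is a subclass of dict.
I am going to create an ordinary dict and an OrderedDict to show you how each is used.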
# create a dict and print items
vehicle = {'bicycle': 'hercules', 'car': 'Maruti', 'bike': 'Harley', 'scooter': 'bajaj'}
print('This is normal dict')
for key, value in vehicle.items():
    print(key, value)
print('-------------------------------')
# Create an OrderedDict and print items
from collections import OrderedDict
ordered_vehicle = OrderedDict()
ordered_vehicle['bicycle'] = 'hercules'
ordered_vehicle['car'] = 'Maruti'
ordered_vehicle['bike'] = 'Harley'
print('This is an ordered dict')
for key, value in ordered_vehicle.items():
    print(key, value)
#> This is normal dict
#> bicycle hercules
#> car Maruti
#> bike Harley
#> scooter bajaj
#> -------------------------------
#> This is an ordered dict
#> bicycle hercules
#> car Maruti
#> bike Harley
In an OrderedDict, the order remains unchanged even after you change the value of a key.
# Change the value of 'car' in this ordered dictionary.
ordered_vehicle['car'] = 'BMW'
for key, value in ordered_vehicle.items():
    print(key, value)
#> bicycle hercules
#> car BMW
#> bike Harley
What happens when you delete and re-insert keys in OrderedDict
When a key is deleted, the information about its order is also deleted. When you re-insert the key, it is treated as a new entry and corresponding order information is stored.
# deleting a key from an OrderedDict
ordered_vehicle.pop('bicycle')
for key, value in ordered_vehicle.items():
    print(key, value)
#> car BMW
#> bike Harley
On reinserting the key, it is treated as a new entry.
# Reinserting the same key and print
ordered_vehicle['bicycle'] = 'hercules'
for key, value in ordered_vehicle.items():
    print(key, value)
#> car BMW
#> bike Harley
#> bicycle hercules
You can see that bicycle is now last: the order changed when we deleted the key and re-inserted it.
There are several other useful operations. For example, we can sort the items as needed.
Sorting with OrderedDict
What if you want to sort the items in increasing order of their keys or values? This is often helpful in data analysis.
Sort the items by KEY (in ascending order)
# Sorting items in ascending order of their keys
drinks = {'coke':5,'apple juice':2,'pepsi':10}
OrderedDict(sorted(drinks.items(), key=lambda t: t[0]))
#> OrderedDict([('apple juice', 2), ('coke', 5), ('pepsi', 10)])
Sort the pairs by VALUE (in ascending order)
# Sorting according to values
OrderedDict(sorted(drinks.items(), key=lambda t: t[1]))
#> OrderedDict([('apple juice', 2), ('coke', 5), ('pepsi', 10)])
Sort the dictionary by length of key string (in ascending order)
# Sorting according to length of key string
OrderedDict(sorted(drinks.items(), key=lambda t: len(t[0])))
#> OrderedDict([('coke', 5), ('pepsi', 10), ('apple juice', 2)])
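Beyond sorting, OrderedDict has a few order-aware operations that a plain dict lacks. The following is a brief sketch of three of them, using placeholder keys: move_to_end(), popitem(last=False) and order-sensitive equality.

```python
from collections import OrderedDict

# Placeholder data for illustration
queue = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

queue.move_to_end('a')             # 'a' becomes the last key
order_after_move = list(queue)
first = queue.popitem(last=False)  # remove and return the first pair

print(order_after_move)  # ['b', 'c', 'a']
print(first)             # ('b', 2)

# Two OrderedDicts are equal only if their order matches too
od1 = OrderedDict([('x', 1), ('y', 2)])
od2 = OrderedDict([('y', 2), ('x', 1)])
ordered_equal = od1 == od2
plain_equal = dict(od1) == dict(od2)

print(ordered_equal)  # False
print(plain_equal)    # True
```

These operations are the main reason to reach for OrderedDict now that regular dicts keep insertion order.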
ChainMap
ChainMap is a container datatype that groups multiple dictionaries into a single view.
In many cases, you might have related dictionaries; you can store them collectively in a ChainMap.
You can view all the dictionaries in a ChainMap using the .maps attribute. The code below demonstrates this.
# Creating a ChainMap from 3 dictionaries.
from collections import ChainMap
dic1={'red':5,'black':1,'white':2}
dic2={'chennai':'tamil','delhi':'hindi'}
dic3={'firstname':'bob','lastname':'mathews'}
my_chain = ChainMap(dic1,dic2,dic3)
my_chain.maps
#> [{'red': 5, 'black': 1, 'white': 2}, {'chennai': 'tamil', 'delhi': 'hindi'}, {'firstname': 'bob', 'lastname': 'mathews'}]
You can print the keys of all the dictionaries in a ChainMap using the .keys() method.
print(list(my_chain.keys()))
#> ['firstname', 'lastname', 'chennai', 'delhi', 'red', 'black', 'white']
You can print the values of all the dictionaries in a ChainMap using the .values() method.
print(list(my_chain.values()))
#> ['bob', 'mathews', 'tamil', 'hindi', 5, 1, 2]
What happens when we have redundant keys in a ChainMap
It is possible that 2 dictionaries might have the same key. See an example below.
# Creating a chainmap whose dictionaries do not have unique keys
dic1 = {'red':1,'white':4}
dic2 = {'red':9,'black':8}
chain = ChainMap(dic1,dic2)
print(list(chain.keys()))
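#> ['red', 'black', 'white']
Observe that ‘red’ is not repeated; it is listed only once. When you look up a duplicated key, the value from the first dictionary that contains it is returned.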
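To see which value wins for a duplicated key, it helps to look the key up directly. The sketch below uses illustrative dictionary names (overrides and defaults are assumptions, not from the original example) and also shows that writes through a ChainMap go only to its first dictionary.

```python
from collections import ChainMap

# Illustrative names; the first mapping takes precedence on lookup
overrides = {'red': 1, 'white': 4}
defaults = {'red': 9, 'black': 8}
chain = ChainMap(overrides, defaults)

print(chain['red'])    # 1 (found in the first mapping)
print(chain['black'])  # 8 (falls through to the second mapping)

# Assignments affect only the first mapping
chain['black'] = 0
print(overrides)          # {'red': 1, 'white': 4, 'black': 0}
print(defaults['black'])  # 8 (unchanged)
print(chain['black'])     # 0
```

This lookup order is what makes ChainMap a natural fit for layered settings, where user overrides sit in front of defaults.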
How to add a new dictionary to a ChainMap
You can add a new dictionary at the beginning of a ChainMap using the .new_child() method, which returns a new ChainMap. The code below demonstrates this.
# Add a new dictionary to the chainmap through .new_child()
print('original chainmap', chain)
new_dic={'blue':10,'yellow':12}
chain=chain.new_child(new_dic)
print('chainmap after adding new dictionary', chain)
#> original chainmap ChainMap({'red': 1, 'white': 4}, {'red': 9, 'black': 8})
#> chainmap after adding new dictionary ChainMap({'blue': 10, 'yellow': 12}, {'red': 1, 'white': 4}, {'red': 9, 'black': 8})
How to reverse the order of dictionaries in a ChainMap
The order in which dictionaries are stored in a ChainMap can be reversed by applying the reversed() function to its .maps list.
# Reverse the order of dictionaries; wrap reversed() in list() so that maps stays a list
print('original chainmap', chain)
chain.maps = list(reversed(chain.maps))
print('reversed chainmap', chain)
#> original chainmap ChainMap({'blue': 10, 'yellow': 12}, {'red': 1, 'white': 4}, {'red': 9, 'black': 8})
#> reversed chainmap ChainMap({'red': 9, 'black': 8}, {'red': 1, 'white': 4}, {'blue': 10, 'yellow': 12})
UserList
I hope you are familiar with Python lists.
A UserList is a list-like container datatype that acts as a wrapper class around a list.
Syntax: collections.UserList([list])
You pass a normal list as an argument to UserList. Its contents are stored in a regular list, which can be accessed through the data attribute of the UserList instance.
# Creating a user list with argument my_list
from collections import UserList
my_list=[11,22,33,44]
# Accessing it through `data` attribute
user_list=UserList(my_list)
print(user_list.data)
#> [11, 22, 33, 44]
What is the use of UserLists
Suppose you want to double all the elements in certain lists, say as a reward. Or maybe you want to ensure that no element can be deleted from a given list.
In such cases, we need to add a certain ‘behavior’ to our lists, which can be done by subclassing UserList.
For example, let me show you how UserList can be used to override the functionality of a built-in method. The code below prevents appending a new value to a list.
# Creating a user list where appending new elements is not allowed.
class user_list(UserList):
    # Override append() to raise an error on insertion
    def append(self, s=None):
        raise RuntimeError("Authority denied for new insertion")
my_list = user_list([11, 22, 33, 44])
# trying to insert new element
my_list.append(55)
#> RuntimeError: Authority denied for new insertion
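The above code raises a RuntimeError and does not allow appending. This can be helpful if, for example, you want to make sure nobody can add their name to a list after a particular deadline. Note that only append() is blocked here; other methods such as insert() and extend() would need to be overridden in the same way. So UserList is quite useful in real-world situations.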
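The other scenario mentioned earlier, a list from which nothing can be deleted, can be sketched along the same lines. The class name NoDeleteList and the sample names are hypothetical, and the sketch covers only the most common deletion routes (del, pop(), remove() and clear()), not every possible one such as slice assignment.

```python
from collections import UserList

class NoDeleteList(UserList):
    """Hypothetical list that refuses the common deletion operations."""

    def _refuse(self, *args, **kwargs):
        raise RuntimeError("Deletion is not allowed")

    # Point each deleting operation at the same refusing method
    __delitem__ = _refuse
    pop = _refuse
    remove = _refuse
    clear = _refuse

names = NoDeleteList(['asha', 'ravi'])
names.append('meera')  # adding is still allowed

def try_delete_first():
    del names[0]

# Try each deletion route and count how many were refused
attempts = [try_delete_first, names.pop, names.clear,
            lambda: names.remove('asha')]
blocked = 0
for attempt in attempts:
    try:
        attempt()
    except RuntimeError:
        blocked += 1

print(blocked)     # 4
print(names.data)  # ['asha', 'ravi', 'meera']
```

Routing all the operations to one method keeps the policy in a single place, so the error message and behavior stay consistent.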
UserString
Just as UserList is a wrapper class for lists, UserString is a wrapper class for strings.
It allows you to add certain functionality or behavior to a string. You can pass any argument that can be converted to a string to this class and access the resulting string through the data attribute.
# import Userstring
from collections import UserString
num=765
# passing a string-convertible argument to UserString
user_string = UserString(num)
# accessing the string stored
user_string.data
#> '765'
As you can see in the above example, the number 765 was converted into the string ‘765’, which can be accessed through the data attribute of the UserString instance.
How and when UserString can be used
UserString can be used to build string-like objects with extra operations of your own.
What if you want to remove a particular word from a text wherever it is present?
Maybe some words have been misplaced and need to be removed.
Let’s see an example of how `UserString` can be used to remove certain odd words from a string.
# Using UserString to remove odd words from a text
class user_string(UserString):
    def append(self, new):
        self.data = self.data + new
    def remove(self, s):
        # Drop the word, then collapse the leftover whitespace
        self.data = " ".join(self.data.replace(s, "").split())
text = 'apple orange grapes bananas pencil strawberry watermelon eraser'
fruits = user_string(text)
for word in ['pencil', 'eraser']:
    fruits.remove(word)
print(fruits)
#> apple orange grapes bananas strawberry watermelon
You can see that ‘pencil’ and ‘eraser’ were removed using the remove() method of the user_string class.
Let us consider another case. What if you need to replace a word with some other word throughout a text?
UserString makes this easy, as shown below. The code replaces a certain word throughout the text.
I have defined a method inside the class that replaces a given word with ‘The Chairman’ throughout.
# using UserString to replace the name or a word throughout.
class user_string(UserString):
    def append(self, new):
        self.data = self.data + new
    # Note: this overrides the inherited replace() with a different signature
    def replace(self, replace_text):
        self.data = self.data.replace(replace_text, 'The Chairman')
text = 'Rajesh concluded the meeting very late. Employees were disappointed with Rajesh'
document = user_string(text)
document.replace('Rajesh')
print(document.data)
#> The Chairman concluded the meeting very late. Employees were disappointed with The Chairman
As you can see, ‘Rajesh’ is replaced with ‘The Chairman’ everywhere. In the same way, UserString lets you package text-processing steps like these as methods on the string itself.
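One practical reason to subclass UserString rather than str is that the inherited string methods return instances of your subclass, so custom methods remain available after calls such as strip() or upper(). The following sketch compares the two approaches; both class names are hypothetical.

```python
from collections import UserString

class ShoutString(UserString):
    """Hypothetical UserString subclass with one extra method."""
    def shout(self):
        return self.upper() + '!'

class PlainShout(str):
    """Hypothetical str subclass, for comparison."""

from_user_string = ShoutString('  hello ').strip()
from_str = PlainShout('  hello ').strip()

print(type(from_user_string).__name__)  # ShoutString
print(type(from_str).__name__)          # str

# The custom method is still available after strip()
print(from_user_string.shout())         # HELLO!
```

With the str subclass, strip() hands back a plain str, so any custom methods are lost at that point.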
UserDict
UserDict is a wrapper class for dictionaries. Its syntax and usage are similar to those of UserList and UserString.
Syntax: collections.UserDict([data])
We pass a dictionary as the argument, and its contents are stored in the data attribute of the UserDict.
# importing UserDict
from collections import UserDict
my_dict={'red':'5','white':2,'black':1}
# Creating an UserDict
user_dict = UserDict(my_dict)
print(user_dict.data)
#> {'red': '5', 'white': 2, 'black': 1}
How UserDict can be used
UserDict allows you to create a dictionary modified to your needs. Let’s see an example of how UserDict can be used to override the functionality of a built-in method. The code below prevents a key-value pair from being dropped with pop().
# Creating a dictionary where deleting an item with pop() is not allowed
class user_dict(UserDict):
    # Override pop() to block deletion
    def pop(self, s=None):
        raise RuntimeError("Not Authorised to delete")
data = user_dict({'red': '5', 'white': 2, 'black': 1})
# try to delete an item
data.pop('red')
#> RuntimeError: Not Authorised to delete
You will receive a RuntimeError. This will help if you don’t want to lose data.
What if some keys have junk values and you need to replace them with nil or ‘0’? See the example below on how to use UserDict for this.
class user_dict(UserDict):
    def replace(self, key):
        self[key] = '0'
file = user_dict({'red': '5', 'white': 2, 'black': 1, 'blue': 4567890})
# Set 'blue' and 'yellow' to '0' ('yellow' is not present yet, so it gets added)
for i in ['blue', 'yellow']:
    file.replace(i)
print(file)
#> {'red': '5', 'white': 2, 'black': 1, 'blue': '0', 'yellow': '0'}
The fields with junk values have been replaced with ‘0’. These are just simple examples of how a UserDict allows you to create a dictionary with the required functionality.
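A further reason to prefer UserDict over subclassing dict directly is that, in CPython, an overridden __setitem__ is honoured by the UserDict constructor and by update(), whereas the built-in dict bypasses it in both places. The sketch below illustrates the difference with two hypothetical classes that upper-case their keys.

```python
from collections import UserDict

class UpperKeysUserDict(UserDict):
    """Hypothetical mapping that stores every key in upper case."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

class UpperKeysDict(dict):
    """Same idea, but built on dict directly."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

# UserDict routes the constructor and update() through __setitem__
ud = UpperKeysUserDict({'red': 5})
ud.update({'white': 2})

# dict only uses the override for direct item assignment
d = UpperKeysDict({'red': 5})
d.update({'white': 2})
d['black'] = 1

print(sorted(ud))  # ['RED', 'WHITE']
print(sorted(d))   # ['BLACK', 'red', 'white']
```

With UserDict, a single override is usually enough to enforce a rule on every way of writing to the dictionary.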
These are the container datatypes from the collections module covered in this article. Choosing the right one can make code that handles large datasets simpler and, in cases such as namedtuple versus dict, more memory-efficient.
Conclusion
I hope you now understand when and why to use the above container datatypes. If you have any questions, please drop them in the comments.
This article was contributed by Shrivarsheni.