Reading 15: Dictionaries
I was reading the dictionary. I thought it was a poem about everything.
- Steven Wright
Overview
A dictionary is a new kind of Python data type. Dictionaries are fantastically useful and are found in nearly all Python programs. Once you learn what dictionaries are and how they work, you won't want to program without them.
Before we describe what dictionaries are, let's describe a problem they can solve.
Example problem: phone number database
You want to keep track of your friends' phone numbers. But since you have so many friends, this is a difficult job! How can the computer help?
For each friend, you need to store:
- the name of the friend
- their phone number
Also, you want to be able to retrieve the phone number for a given friend. Given what you know now, how can you do this?
Using a list
You could create a list of names and phone numbers...
...but it would not be easy to find the number corresponding to a particular name. It would be better if a name and the corresponding phone number were associated in some way.
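As a minimal sketch of the flat-list approach (the list name contacts and its entries are made up for illustration), names and numbers simply alternate, and finding a number means locating the name and then stepping to the next position:

```python
# Hypothetical flat list: names and numbers alternate.
contacts = ['Joe', '567-8901', 'Jane', '123-4567']

# Find the position of the name; the number sits right after it.
position = contacts.index('Jane')
number = contacts[position + 1]
print(number)
```

Nothing in the list itself ties a name to its number; the pairing exists only by position.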
Using a list of tuples
You could have a list of (<name>, <phone number>) tuples, one tuple per friend.
Let's see what we would need to do in order to find the phone number corresponding to a particular name, e.g. 'Ethan'.
We could write the code like this:
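A minimal sketch of such a lookup, assuming a hypothetical list named phone_list with made-up entries:

```python
# Hypothetical list of (name, phone number) tuples.
phone_list = [('Joe', '567-8901'), ('Jane', '123-4567'), ('Ethan', '246-8101')]

# Walk through the list until we find the name we want.
ethans_number = None
for (name, number) in phone_list:
    if name == 'Ethan':
        ethans_number = number
        break

print(ethans_number)
```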
This is not too bad, but:
- We can't modify the phone number! (Tuples are immutable.) We might use lists instead of tuples, but this isn't the only problem.
- We have to look through the entire list in the worst case to find one number, which is inefficient.
Using a dictionary
The Right Thing™ to do in cases like this is to use a dictionary. So let's talk about dictionaries and what makes them so awesome.
Keys and values
A dictionary (sometimes called a dict for short) is a Python data structure that associates keys with values. Each key is associated with exactly one value. (Sometimes this is called a mapping between keys and values.) Dictionaries allow you to do these things:
- find the value corresponding to a particular key
- change the value associated with a key
- add new key/value pairs
- delete key/value pairs
and they're fast! (Much faster than a list of tuples, for instance.)[1]
Because we can add key/value pairs to a dictionary and delete key/value pairs from a dictionary, dictionaries, like lists, are mutable.
There are two rules for keys and values:
- The values in a dictionary can be any Python value.
- The keys in a dictionary can be any kind of immutable Python value.[2]
Since strings are immutable, we can use strings as dictionary keys. You can also use numbers, tuples, and other kinds of values we haven't seen yet.[3] In the example above, we can use names as keys and phone numbers as values.
Dictionary syntax
We want to create a dictionary from our friends' names and phone numbers. First, we have to know the syntax of dictionaries.
Empty dictionary
Dictionaries use curly braces, and the simplest dictionary is the empty one, which is written as a pair of curly braces with nothing between them: {}
It's a dictionary with no key/value pairs. Pretty exciting!
Actually, though, empty dictionaries, like empty lists, are very useful. Often you start with an empty dictionary and then fill it up element-by-element in a loop, adding a new key/value pair for every iteration of the loop.
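As a small sketch of that pattern (the names squares and n are chosen only for this illustration), the loop below starts from an empty dictionary and adds one key/value pair per iteration, using the assignment syntax covered later in this reading:

```python
squares = {}                  # start with an empty dictionary
for n in range(1, 5):
    squares[n] = n * n        # add a new key/value pair each time around

print(squares)
```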
Non-empty dictionary
Alternatively, you can create a dictionary by writing out the key/value pairs inside of curly braces, separated in two ways:
- different key/value pairs are separated by commas
- the key and the value in a single key/value pair are separated by a colon (:)[4]
For our example, here is the dictionary we can create:
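Written as a literal, and using the variable name phone_numbers that the rest of this reading relies on, that dictionary could look like this:

```python
# Two key/value pairs: names are the keys, phone numbers are the values.
phone_numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }
print(phone_numbers)
```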
If there are more key/value pairs, we can add them too.
You can see that the first key/value pair is 'Joe' : '567-8901' and the second is 'Jane' : '123-4567'.
The keys are 'Joe' and 'Jane' and the values are '567-8901' and '123-4567'.
The spaces in the dictionary are not required but they help to keep it readable.
Most of the time, when we write out a dictionary like this (called a literal dictionary), the keys and values are Python values, but they can also be Python expressions. Here's a contrived example.
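One hypothetical way to write such a dictionary, with string concatenation standing in for any expression (the variable name computed is made up for this sketch):

```python
# Both the keys and the values are expressions; Python evaluates them
# before building the dictionary.
computed = { 'J' + 'oe' : '567-' + '8901', 'Ja' + 'ne' : '123-' + '4567' }
print(computed)
```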
This would give the same dictionary as the previous code.
In cases like this (which are very rare), the key expressions and the value expressions are evaluated before the dictionary is created. (It's not that rare to have computed values, but computed keys are extremely unusual.)
Dictionary types
The only restriction on the types of keys or values in a dictionary is that the key must be immutable i.e. its type must be the type of an immutable Python object. Other than that, a dictionary can have any type of key or value. In particular, a single dictionary can have different types of (immutable) keys, and different types of values. This is a bit unusual, but sometimes it's quite useful. So this is legal:
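A sketch of such a dictionary, with keys and values invented for illustration (only the name mydict comes from this reading):

```python
# Keys: a string, an integer, and a tuple (all immutable).
# Values: an integer, a list, and a string (the list is mutable, which is fine).
mydict = { 'foo' : 42, 10 : [1, 2, 3], (1, 2) : 'bar' }
print(mydict[(1, 2)])
```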
The mydict dictionary has three different (immutable) key types, and three different value types (not all immutable, but it doesn't matter for values).
Note
You may have heard of the JSON data format, which is a way of formatting structured data which is used a lot by internet applications. A JSON object is almost identical to a Python dictionary with string keys and different types of values. Python, like most programming languages, has a JSON library (actually more than one).
Getting a value given a key
The most common thing to do with a dictionary is to look up the value that corresponds to a particular key. We'll again assume the phone_numbers dictionary in which 'Joe' maps to '567-8901' and 'Jane' maps to '123-4567'.
To get Joe's phone number, all we have to write is phone_numbers['Joe'], which will evaluate to '567-8901'.
Notice that phone_numbers['Joe'] looks like accessing a list with an index of 'Joe'. Python is overloading the meaning of the square brackets! Before this, the value inside the brackets could only be an integer. But with a dictionary, it can be any key value (which means any immutable Python value). Python really likes to re-use its syntax for distinct but similar things!
Changing a value at a key
Another thing you commonly want to do with a dictionary is to change the value associated with a particular key. For instance, let's say that Joe's phone number changes. We can change the dictionary value too:
This is just like the syntax for changing a list value, except that the "index" is a string, not a number. Here's the new dictionary:
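Putting the update and the resulting dictionary together in one sketch, assuming the two-entry phone_numbers dictionary described earlier and the new number '314-1592' that shows up in later examples:

```python
phone_numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }

# Assigning to an existing key replaces the value stored at that key.
phone_numbers['Joe'] = '314-1592'
print(phone_numbers)
```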
Adding a new key/value pair
Another very common thing to do with dictionaries is to add new key/value pairs. The syntax for this is identical to the syntax for changing the values at existing keys, except that the keys are not in the dictionary until after you add them. For instance, let's say that you just made a new friend named Bob, and you wanted to add his phone number. No problem!
Now when you look at the entire dictionary, you see this:
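As a sketch, assuming the dictionary as it stands after Joe's number changed, and the placeholder number '000-0000' for Bob that appears in the del example later in this reading:

```python
phone_numbers = { 'Joe' : '314-1592', 'Jane' : '123-4567' }

# 'Bob' is not a key yet, so this assignment adds a new key/value pair.
phone_numbers['Bob'] = '000-0000'
print(phone_numbers)
```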
Note
Even though it looks like the key/value pairs are stored in the order they were added, be careful about relying on this. Python dictionaries are not sequences, so you can't index them by position. Since Python 3.7 the language guarantees that dictionaries keep their keys in "insertion order", but earlier versions of Python didn't, so code that depends on the order won't behave the same way on older versions.
Accessing a nonexistent key
What happens when you try to access a nonexistent key in a dictionary?
>>> phone_numbers['Mike']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Mike'
Python raises a KeyError exception. This is the right thing to do.[5]
Deleting a key/value pair: the del statement
It's not that common, but sometimes we want to delete a key/value pair. Let's say you had a falling out with your new friend Bob, and you decide you don't ever want to talk to him again. You might want to delete his phone number from your phone_numbers dictionary. Here's how to do it:
>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567', 'Bob': '000-0000'}
>>> del phone_numbers['Bob']
>>> phone_numbers
{'Joe': '314-1592', 'Jane': '123-4567'}
The new keyword del is short for "delete". Given a key, it removes the key/value pair that the key is part of from the dictionary. This is not a function or method call! del is actually a special Python statement. Because it isn't a function call, you don't have to put parentheses around its argument (and you shouldn't).
del can remove elements from things other than dictionaries (e.g. lists) but it's more useful with dictionaries than with lists. We will meet del again in future readings.
Back to the example: tuples as keys
Let's improve the example by using a tuple of first and last names as keys:
phone_numbers = {
    ('Joe', 'Smith') : '567-8910',
    ('Jane', 'Doe') : '123-4567',
    ('Adam', 'Blank') : '000-0000',
    ('Mike', 'Vanier') : '111-1111',
}
(Fun fact: we don't have to use the \<return> line continuation characters at the ends of the lines when writing out a dictionary like this.)
It's OK to use a tuple of strings as a dictionary key, because both tuples and strings are immutable, so a tuple of strings is immutable too. If we had e.g. a tuple of lists, that would not be immutable, so you couldn't use it as a key. Similarly, a list of strings is not an acceptable dictionary key. Let's try it anyway:
>>> phone_numbers = { ['Joe', 'Smith'] : '567-8910', ['Jane', 'Doe'] : '123-4567' }
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
The error message unhashable type: 'list' means that since lists are mutable, they can't be used as dictionary keys.[6]
OK, so we'll use tuples. Once we've done this, we can access a value corresponding to a tuple:
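For instance, a sketch using a two-entry version of the dictionary above (the variable joes_number is introduced only for this illustration):

```python
phone_numbers = {
    ('Joe', 'Smith') : '567-8910',
    ('Jane', 'Doe') : '123-4567',
}

# The whole tuple is the key.
joes_number = phone_numbers[('Joe', 'Smith')]
print(joes_number)
```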
We have to use the entire tuple; either the first or last name doesn't work:
>>> phone_numbers['Joe']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Joe'
>>> phone_numbers['Smith']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Smith'
Dictionaries and for loops
We've seen many things that can be looped over using for loops:
- lists
- strings
- files
So it shouldn't surprise you to learn that dictionaries can also be looped over in a for loop. When you have a dictionary in a for loop following the in keyword, you loop over the keys of the dictionary.
For instance, we could write this loop:
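A loop consistent with the output shown next could be written as follows (a sketch; the f-string layout is inferred from that output):

```python
phone_numbers = {
    ('Joe', 'Smith') : '567-8910',
    ('Jane', 'Doe') : '123-4567',
    ('Adam', 'Blank') : '000-0000',
    ('Mike', 'Vanier') : '111-1111',
}

# Looping over a dictionary yields its keys; index with each key to get the value.
for key in phone_numbers:
    print(f'key: {key}, value: {phone_numbers[key]}')
```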
which would print:
key: ('Joe', 'Smith'), value: 567-8910
key: ('Jane', 'Doe'), value: 123-4567
key: ('Adam', 'Blank'), value: 000-0000
key: ('Mike', 'Vanier'), value: 111-1111
We could use this to print out the phone numbers of every person in the dictionary whose first name is 'Joe':
for key in phone_numbers:
    (first_name, last_name) = key
    if first_name == 'Joe':
        print(f'name: {first_name} {last_name}, number: {phone_numbers[key]}')
Since there is only one 'Joe' in the dictionary, this will print:
name: Joe Smith, number: 567-8910
Dictionary methods
Dictionaries are objects in Python (like lists, strings, and files). Therefore, they have methods. In this section, we'll discuss a few of the most important ones. For a full list of dictionary methods, consult the Python documentation.
get
If you try to get the value in a dictionary corresponding to a key which isn't in the dictionary, it results in a KeyError exception if you use the square bracket syntax:
>>> phone_numbers[('William', 'Shakespeare')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: ('William', 'Shakespeare')
Instead of this, you can use the get method if you would rather return a default value:
>>> phone_numbers.get(('William', 'Shakespeare'), 'unknown')
# 'unknown' is the default value returned if the key isn't in the dictionary
'unknown'
The default value is the extra argument (here, 'unknown'); if you leave it out, get returns None for a missing key.
Using get is usually not what you want to do, but we'll see an example below where this method will be useful.
clear
If you want to empty out an existing dictionary, you can do that with the clear method:
>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '123-4567', ('Adam', 'Blank'): '000-0000', ('Mike', 'Vanier'): '111-1111'}
>>> phone_numbers.clear()
>>> phone_numbers
{}
This is rarely needed; it's usually easier to just assign a new empty dictionary to the variable (phone_numbers = {}).
On the other hand, if the dictionary was passed in as an argument to a function or was part of a larger data structure, you might need to use the clear method if you need to empty it out.
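The difference shows up when two names refer to the same dictionary. In this sketch (the function names reset_by_rebinding and reset_by_clearing, and the variables numbers and other, are hypothetical), assigning {} inside a function only rebinds the local name, while clear empties the dictionary the caller still holds:

```python
def reset_by_rebinding(d):
    d = {}          # rebinds the local name only; the caller's dictionary is untouched

def reset_by_clearing(d):
    d.clear()       # empties the very dictionary object the caller passed in

numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }
other = { 'Bob' : '000-0000' }

reset_by_rebinding(other)
print(len(other))     # the entry is still there

reset_by_clearing(numbers)
print(len(numbers))   # the dictionary is now empty
```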
keys and values
If you need the dictionary's keys or values as a separate thing, you can use the keys or values methods. These return (respectively) a dict_keys or a dict_values object. These are iterable objects (you can loop over them directly), and they can easily be converted to lists:
phone_numbers = {
    ('Joe', 'Smith') : '567-8910',
    ('Jane', 'Doe') : '123-4567',
    ('Adam', 'Blank') : '000-0000',
    ('Mike', 'Vanier') : '111-1111',
}
>>> phone_numbers.keys()
dict_keys([('Joe', 'Smith'), ('Jane', 'Doe'), ('Adam', 'Blank'), ('Mike', 'Vanier')])
>>> list(phone_numbers.keys())
[('Joe', 'Smith'), ('Jane', 'Doe'), ('Adam', 'Blank'), ('Mike', 'Vanier')]
>>> phone_numbers.values()
dict_values(['567-8910', '123-4567', '000-0000', '111-1111'])
>>> list(phone_numbers.values())
['567-8910', '123-4567', '000-0000', '111-1111']
It's rare that you actually need these methods.
items
The items method is like the keys and values methods combined: it returns a dict_items object which can be converted to a list of key/value pairs:
>>> phone_numbers.items()
dict_items([(('Joe', 'Smith'), '567-8910'), (('Jane', 'Doe'), '123-4567'), (('Adam', 'Blank'), '000-0000'), (('Mike', 'Vanier'), '111-1111')])
>>> list(phone_numbers.items())
[(('Joe', 'Smith'), '567-8910'), (('Jane', 'Doe'), '123-4567'), (('Adam', 'Blank'), '000-0000'), (('Mike', 'Vanier'), '111-1111')]
Sometimes the items method can be used to good effect in a for loop (kind of like the dictionary equivalent of the enumerate function):
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 123-4567
('Adam', 'Blank') 000-0000
('Mike', 'Vanier') 111-1111
You usually don't need to convert the items return value into a list, and you usually shouldn't. (In this respect, the items method is similar to the range function.)
update
The update method adds the key/value pairs from another dictionary into a dictionary, overwriting old values if the other dictionary has the same keys with different values.
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 123-4567
('Adam', 'Blank') 000-0000
('Mike', 'Vanier') 111-1111
>>> new_phone_numbers = {
...     ('Bob', 'Johnson') : '543-9876',
...     ('Jane', 'Doe') : '7654-321'
... }
>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '123-4567', ('Adam', 'Blank'): '000-0000', ('Mike', 'Vanier'): '111-1111'}
>>> phone_numbers.update(new_phone_numbers)
>>> phone_numbers
{('Joe', 'Smith'): '567-8910', ('Jane', 'Doe'): '7654-321', ('Adam', 'Blank'): '000-0000', ('Mike', 'Vanier'): '111-1111', ('Bob', 'Johnson'): '543-9876'}
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 7654-321
('Adam', 'Blank') 000-0000
('Mike', 'Vanier') 111-1111
('Bob', 'Johnson') 543-9876
We see that updating the phone_numbers dictionary with new_phone_numbers has provided a new phone number for Bob Johnson and has overwritten the old phone number for Jane Doe.
What about append?
There is no append method for dictionaries, because it's not needed! To add a new key/value pair, just use normal assignment syntax:
>>> phone_numbers[('Don', 'Knuth')] = '271-8281'
>>> for (key, value) in phone_numbers.items():
...     print(key, value)
...
('Joe', 'Smith') 567-8910
('Jane', 'Doe') 7654-321
('Adam', 'Blank') 000-0000
('Mike', 'Vanier') 111-1111
('Bob', 'Johnson') 543-9876
('Don', 'Knuth') 271-8281
The in operator
Previously we've seen the in operator for sequences. We can also use in with dictionaries.
<key> in <dictionary> means: is the key <key> one of the keys in the dictionary <dictionary>?
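A short sketch, assuming the two-entry phone_numbers dictionary from earlier in this reading:

```python
phone_numbers = { 'Joe' : '567-8901', 'Jane' : '123-4567' }

print('Joe' in phone_numbers)        # True: 'Joe' is a key
print('Mike' in phone_numbers)       # False: there is no such key
print('567-8901' in phone_numbers)   # False: in looks at keys, not values
```

Testing with in before indexing is one way to avoid the KeyError described earlier.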
Example: creating a frequency table
OK, let's do something useful!
We have a list of words. We want to create a frequency table for each word, which means that for each word, we want to record the number of times it occurs in the word list.
We will solve this by creating a dictionary:
- key: a word in the list
- value: the count of that word
Let's write the code, and also print out the resulting table at the end.
words = ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
freqs = {}
for word in words:
    if word in freqs:
        freqs[word] += 1
    else: # first time we've seen that word
        freqs[word] = 1
for (key, value) in freqs.items():
    print(f'Word: {key} occurs: {value} times')
This prints:
Word: to occurs: 2 times
Word: be occurs: 2 times
Word: or occurs: 1 times
Word: not occurs: 1 times
Word: that occurs: 1 times
Word: is occurs: 1 times
Word: the occurs: 1 times
Word: question occurs: 1 times
See how easy that was? Dictionaries can make many programming tasks much easier to accomplish.
Of course, it's pretty rare to find any code that can't be improved somewhere... What can we do here?
Remember the get method we described above? The idea there was that if the key wasn't in the dictionary, we would supply a default value to return. Here, we have a similar situation, except that we are setting the values in a dictionary. But if you look closely, you'll see that the line freqs[word] += 1 is equivalent to freqs[word] = freqs[word] + 1, which means that this line is both getting a value from a dictionary at a particular key and setting the value at the same key.
The trick to making this code simpler is to realize that when the key isn't in the dictionary, we can use the get method to just return a count of 0.
Then the code simplifies to this:
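A sketch of the simplified version, reusing the words list and the printing loop from the earlier listing:

```python
words = ['to', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
freqs = {}
for word in words:
    freqs[word] = freqs.get(word, 0) + 1  # a word not seen yet counts as 0

for (key, value) in freqs.items():
    print(f'Word: {key} occurs: {value} times')
```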
We aren't using the += operator any more, so line 4 is longer, but we've eliminated the if statement entirely. This counts as a win.
Conclusion
You may think that this is just another reading, but if you continue programming in Python we guarantee you that dictionaries will be one of the most useful things you will ever learn. They are used everywhere, and learning to use them effectively will take you a long way towards becoming a good Python programmer.
[1] Dictionaries allow you to look up the value corresponding to a key in constant time, which means that the time to retrieve a value given the key doesn't depend on how many key/value pairs are stored in the dictionary. In contrast, searching for a key/value pair in a list of tuples takes linear time, i.e. time proportional to the size of the list. You'll learn much more about constant time and linear time in CS 2.
[2] The reasons for this are technical, having to do with the way that dictionaries are implemented. Internally, they use what are called hash tables, and for a hash table to work correctly, the hash value of a dictionary key shouldn't change, which means that the key itself shouldn't change. You will meet hash tables again if you take any CS courses beyond CS 1.
[3] Like immutable sets of values. Don't worry, we'll get there.
[4] Yes, here it is, yet another meaning for the colon character.
[5] Some languages (*cough* *JavaScript*) return a special "undefined" value, which creates problems because programmers rarely check for "undefined" after doing every dictionary access.
[6] If you read the previous footnotes, you won't be surprised to see the word "unhashable", which relates to hash tables, the data structure used to implement dictionaries.