# Data structures¶

## Arrays¶

In Python the easiest way to implement a 2D array is a list of lists. For a 3D array: list of lists of lists and so on...

In [1]:
M = [[1, 2], [3, 4]]


You can access the emelents with multiple bracket [ ] indexing:

In [2]:
M[0][1]

Out[2]:
2
In [3]:
M = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 9], [0, 0]]]  # 3x2x2 array
M[2][0][1]

Out[3]:
9

Write a function that prints a 2D array in a tabular like format:

1   2
3   4
In [4]:
def array_print(M):
for i in range(len(M)):
for j in range(len(M[i])):
print M[i][j], "\t",   # TAB character
print

In [5]:
array_print([[1,2,3],[4,5,6],[7,8,9]])

1 	2 	3
4 	5 	6
7 	8 	9

In [6]:
array_print([[1,2,3, 3.5],[4,5,6, 6.5],[7,8,9, 9.5]])

1 	2 	3 	3.5
4 	5 	6 	6.5
7 	8 	9 	9.5


## Embedded data structures¶

Let's say that a store would like to set up a discount system for regular customers. The store's database has the name of the customers (a unique string) and their shopping costs so far. The customers are stored in a list, every element of the list is a pairs: their name and their list of shopping costs. For example one customer entry would look like this:

["Anna", [54, 23, 12, 56, 12, 71]]

The discounts are calculated in the following way:

Total shoppings > 200: $10\%$

Total shoppings > 500: $15\%$

Total shoppings > 1000: $20\%$

Otherwise no discount ($0\%$).

Write a function that calculates the discount for every customer.

• The input is the list of customer entries
• The output is the list of the discounts in a length 2 list: name and the discount
• For example a list of these: ["Anna", 10]

How to begin? Break down to subtasks!

1. Decide the discount from the total shopping cost ( 228 -> 10 )
2. calculate the total shopping cost (sum the list of costs)
3. Do this for every customer and return the results in a result list

There are two ways to achive this (design pattern):

• top-down: first write the final function assuming that the smaller subtask (functions) are already done, then do the smaller tasks
In [7]:
# top-down

def discount(customers):
result = []
for customer in customers:
result.append( calculate_discount(customer) )
return result

def calculate_discount(customer):
name = customer[0]
total_cost = 0
for shopping in customer[1]:
total_cost += shopping
return [name, discount_from_total(total_cost)]

def discount_from_total(total):
if total > 1000:
return 20
if total > 500:
return 15
if total > 200:
return 10
return 0

In [8]:
discount([["Anna", [54, 23, 12, 56, 12, 71]],
["Bill", [11, 3, 12, 1, 12, 55]],
["Hagrid", [111, 545, 343, 56, 12, 66]],
["Not_a_wizard", [54, 222, 65, 56, 43, 71]]])

Out[8]:
[['Anna', 10], ['Bill', 0], ['Hagrid', 20], ['Not_a_wizard', 15]]

## tuple¶

We have already seen strings and lists which are similar to the tuple.

An n-tuple can be created with a comma separeted list in a parenthesis or with the tuple() function:

In [9]:
t = (1, 5, 6, 2, 1)
print t[2]
type(t)

6

Out[9]:
tuple
In [10]:
l = [1, 2, 3]
t = tuple(l)
print t

(1, 2, 3)


for loop as before

In [11]:
for e in t:
print e

1
2
3


In [12]:
t[1] = 4

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-141c76cb54a2> in <module>()
----> 1 t[1] = 4

TypeError: 'tuple' object does not support item assignment

Parenthesis is also used for grouping operations ( $2*(3+4)$ ) but that is different from the parenthesis of the tuple. Also in some cases you don't have to write the parenthesis at all.

In [13]:
x = 2, 3, 4
print x

(2, 3, 4)

In [14]:
x, y = 2, 3
print x
print y

2
3


1-tuples are different than a single object in a parenthesis, you can construct a 1-tuple with an ending comma inside a parenthesis.

In [17]:
print type((1))
print type((1,))

<type 'int'>
<type 'tuple'>


# Mutable and immutable types¶

The tuple is almost identical to the list except that it is immutable: you cannot assign a single element. The list is mutable.

You cannot change a tuple once created, except creating a new one, like in case of strings:

In [18]:
s = ("h", "e", "l", "l", "o")
print s[2]
s = ("h", "a", "l", "l", "o")

string = "puppy"
string2 = string[:2] + "ff" + string[4:]
print string2

l
puffy

In [19]:
for e in s:
print e,

h a l l o

In [20]:
s[1] = "e"

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-73feb1f8575e> in <module>()
----> 1 s[1] = "e"

TypeError: 'tuple' object does not support item assignment

# Dictionary¶

A dictionary is a series of pairs, we call them key-value pairs. A dictionary can have any type of key and any type of value, but the key have to be immutable.

You can contruct dictionaries with a curly bracket { } or with the dict() function.

In [21]:
d = {"puppy": 5}
type(d)

Out[21]:
dict

A value can be anything, mutable and immutable.

In [22]:
d = {}
d["cat"] = [1, 5]


You can add a pair to the dictionary by assigning the value to the bracketed key.

In [23]:
d["new key"] = "new value"
print d

{'new key': 'new value', 'cat': [1, 5]}


You can access the elements by their keys in a bracket.

In [24]:
d["cat"]

Out[24]:
[1, 5]

You can use mixed keys with different types (as long as immutable):

In [25]:
d = dict()
d[5] = 7
d[(1, 5)] = "puppy"
d[(1, 5)] = "snake"
d["cat"] = 3.14
print d

{(1, 5): 'snake', 5: 7, 'cat': 3.14}


A for loop iterates over the keys:

In [26]:
for key in d:
print key, ":", d[key]

(1, 5) : snake
5 : 7
cat : 3.14


The order of the pairs are not strinct, they may seem random. This is because of the storing algorithm of the dict.

The dictionary uses a so-called hash function to calculate where to put the elements. Their actual order is not relevant, it can even change during your program. But the key-value pairs are kept togeather.

You can observe the hash values yourself with the hash() function:

In [28]:
print hash((1, 5))
print hash(5), hash(0), hash(False), hash(True)
print hash((5,))
print hash("puppy")
print hash("puppz")
print hash("muppy")
print hash("puppy, cat, python, galahad ...")

3713081631939823281
5 0 0 1
3430023387570
1778334363375243195
1778334363375243192
1523879377499237882
5497814539177767733


## Hash function¶

The dictionary uses the hash function, but a similar function is common in both theoretical and applied computer science. There are advanced algorithm courses about hash functions.

What is a hash function:

• assign a natural number to every piece of data (the value has a fixed width bit representation: 64, 128, 256 ... )
• it is a function so it assigns the same value to the same thing (every time).
• Sort-of unique meaning that there are a few different data with the same hash value
• it should be calculated quickly

There may be extra requirements in other applications (cryptographic hash function):

• infeasable to invert (find out the data from the hash), not impossible but rather slow
• (pseudo)random and non-continuous
• infeasable to mimic a given data (find a piece of data to match a given hash value)

### Applications¶

• checksum (file hash)
• indexing, fast access (dict)
• pseudo random number generators (cryptography)

# Iterating and functions of containers¶

A data type is iterable if it can be iterated over with a for loop. Example:

• list
• string (characters)
• tuple
• dict

 for x in """iterable""": """do stuff""" 

There are some useful functions that can be used on any iterables:

In [29]:
# repetition
print "puppy "*3
print (1, 2, 3)*3
print [1, 2, 3]*3

puppy puppy puppy
(1, 2, 3, 1, 2, 3, 1, 2, 3)
[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [30]:
# concatenation (except for dict)
(1, 2) + (2, 4, 6)

Out[30]:
(1, 2, 2, 4, 6)
In [ ]:
# logical "and"
print all((False, True, True, True))
print all((0, 1, 1, 1))

In [31]:
# logical "or"
any((0, 1, 1, 1))

Out[31]:
True
In [32]:
# "transpose"
zip([1, 2, 3], [11, 12, 13])

Out[32]:
[(1, 11), (2, 12), (3, 13)]
In [33]:
# sum (for numbers, not for strings)
sum((1, 2, 3))

Out[33]:
6
In [ ]: