Data structures

Arrays

In Python the easiest way to implement a 2D array is a list of lists. For a 3D array: list of lists of lists and so on...

In [1]:
M = [[1, 2], [3, 4]]

You can access the emelents with multiple bracket [ ] indexing:

In [2]:
M[0][1]
Out[2]:
2
In [3]:
M = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 9], [0, 0]]]  # 3x2x2 array
M[2][0][1]
Out[3]:
9

Write a function that prints a 2D array in a tabular like format:

1   2
3   4
In [4]:
def array_print(M):
    for i in range(len(M)):
        for j in range(len(M[i])):
            print M[i][j], "\t",   # TAB character
        print
In [5]:
array_print([[1,2,3],[4,5,6],[7,8,9]])
1 	2 	3 	
4 	5 	6 	
7 	8 	9 	
In [6]:
array_print([[1,2,3, 3.5],[4,5,6, 6.5],[7,8,9, 9.5]])
1 	2 	3 	3.5 	
4 	5 	6 	6.5 	
7 	8 	9 	9.5 	

Embedded data structures

Let's say that a store would like to set up a discount system for regular customers. The store's database has the name of the customers (a unique string) and their shopping costs so far. The customers are stored in a list, every element of the list is a pairs: their name and their list of shopping costs. For example one customer entry would look like this:

["Anna", [54, 23, 12, 56, 12, 71]]

The discounts are calculated in the following way:

Total shoppings > 200: $10\%$

Total shoppings > 500: $15\%$

Total shoppings > 1000: $20\%$

Otherwise no discount ($0\%$).

Write a function that calculates the discount for every customer.

  • The input is the list of customer entries
  • The output is the list of the discounts in a length 2 list: name and the discount
  • For example a list of these: ["Anna", 10]

How to begin? Break down to subtasks!

  1. Decide the discount from the total shopping cost ( 228 -> 10 )
  2. calculate the total shopping cost (sum the list of costs)
  3. Do this for every customer and return the results in a result list

There are two ways to achive this (design pattern):

  • top-down: first write the final function assuming that the smaller subtask (functions) are already done, then do the smaller tasks
  • bottom-up: write the smaller subtasks first, then work your way up to the final task
In [7]:
# top-down

def discount(customers):
    result = []
    for customer in customers:
        result.append( calculate_discount(customer) )
    return result


def calculate_discount(customer):
    name = customer[0]
    total_cost = 0
    for shopping in customer[1]:
        total_cost += shopping
    return [name, discount_from_total(total_cost)]


def discount_from_total(total):
    if total > 1000:
        return 20
    if total > 500:
        return 15
    if total > 200:
        return 10
    return 0
In [8]:
discount([["Anna", [54, 23, 12, 56, 12, 71]], 
            ["Bill", [11, 3, 12, 1, 12, 55]],
            ["Hagrid", [111, 545, 343, 56, 12, 66]], 
            ["Not_a_wizard", [54, 222, 65, 56, 43, 71]]])
Out[8]:
[['Anna', 10], ['Bill', 0], ['Hagrid', 20], ['Not_a_wizard', 15]]

tuple

We have already seen strings and lists which are similar to the tuple.

An n-tuple can be created with a comma separeted list in a parenthesis or with the tuple() function:

In [9]:
t = (1, 5, 6, 2, 1)
print t[2]
type(t)
6
Out[9]:
tuple
In [10]:
l = [1, 2, 3]
t = tuple(l)
print t
(1, 2, 3)

for loop as before

In [11]:
for e in t:
    print e
1
2
3

But you cannot assign individual elements (like the string)

In [12]:
t[1] = 4
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-141c76cb54a2> in <module>()
----> 1 t[1] = 4

TypeError: 'tuple' object does not support item assignment

Parenthesis is also used for grouping operations ( $2*(3+4)$ ) but that is different from the parenthesis of the tuple. Also in some cases you don't have to write the parenthesis at all.

In [13]:
x = 2, 3, 4
print x
(2, 3, 4)
In [14]:
x, y = 2, 3
print x
print y
2
3

1-tuples are different than a single object in a parenthesis, you can construct a 1-tuple with an ending comma inside a parenthesis.

In [17]:
print type((1))
print type((1,))
<type 'int'>
<type 'tuple'>

Mutable and immutable types

The tuple is almost identical to the list except that it is immutable: you cannot assign a single element. The list is mutable.

You cannot change a tuple once created, except creating a new one, like in case of strings:

In [18]:
s = ("h", "e", "l", "l", "o")
print s[2]
s = ("h", "a", "l", "l", "o")

string = "puppy"
string2 = string[:2] + "ff" + string[4:]
print string2
l
puffy
In [19]:
for e in s:
    print e,
h a l l o
In [20]:
s[1] = "e"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-73feb1f8575e> in <module>()
----> 1 s[1] = "e"

TypeError: 'tuple' object does not support item assignment

Dictionary

A dictionary is a series of pairs, we call them key-value pairs. A dictionary can have any type of key and any type of value, but the key have to be immutable.

You can contruct dictionaries with a curly bracket { } or with the dict() function.

In [21]:
d = {"puppy": 5}
type(d)
Out[21]:
dict

A value can be anything, mutable and immutable.

In [22]:
d = {}
d["cat"] = [1, 5]

You can add a pair to the dictionary by assigning the value to the bracketed key.

In [23]:
d["new key"] = "new value"
print d
{'new key': 'new value', 'cat': [1, 5]}

You can access the elements by their keys in a bracket.

In [24]:
d["cat"]
Out[24]:
[1, 5]

You can use mixed keys with different types (as long as immutable):

In [25]:
d = dict()
d[5] = 7
d[(1, 5)] = "puppy"
d[(1, 5)] = "snake"
d["cat"] = 3.14
print d
{(1, 5): 'snake', 5: 7, 'cat': 3.14}

A for loop iterates over the keys:

In [26]:
for key in d:
    print key, ":", d[key]
(1, 5) : snake
5 : 7
cat : 3.14

The order of the pairs are not strinct, they may seem random. This is because of the storing algorithm of the dict.

The dictionary uses a so-called hash function to calculate where to put the elements. Their actual order is not relevant, it can even change during your program. But the key-value pairs are kept togeather.

You can observe the hash values yourself with the hash() function:

In [28]:
print hash((1, 5))
print hash(5), hash(0), hash(False), hash(True)
print hash((5,))
print hash("puppy")
print hash("puppz")
print hash("muppy")
print hash("puppy, cat, python, galahad ...")
3713081631939823281
5 0 0 1
3430023387570
1778334363375243195
1778334363375243192
1523879377499237882
5497814539177767733

Hash function

The dictionary uses the hash function, but a similar function is common in both theoretical and applied computer science. There are advanced algorithm courses about hash functions.

What is a hash function:

  • assign a natural number to every piece of data (the value has a fixed width bit representation: 64, 128, 256 ... )
    • it is a function so it assigns the same value to the same thing (every time).
  • Sort-of unique meaning that there are a few different data with the same hash value
  • it should be calculated quickly

There may be extra requirements in other applications (cryptographic hash function):

  • infeasable to invert (find out the data from the hash), not impossible but rather slow
  • (pseudo)random and non-continuous
  • infeasable to mimic a given data (find a piece of data to match a given hash value)

Applications

  • checksum (file hash)
  • indexing, fast access (dict)
  • pseudo random number generators (cryptography)

Iterating and functions of containers

A data type is iterable if it can be iterated over with a for loop. Example:

  • list
  • string (characters)
  • tuple
  • dict

for x in """iterable""": """do stuff"""

There are some useful functions that can be used on any iterables:

In [29]:
# repetition
print "puppy "*3
print (1, 2, 3)*3
print [1, 2, 3]*3
puppy puppy puppy 
(1, 2, 3, 1, 2, 3, 1, 2, 3)
[1, 2, 3, 1, 2, 3, 1, 2, 3]
In [30]:
# concatenation (except for dict)
(1, 2) + (2, 4, 6)
Out[30]:
(1, 2, 2, 4, 6)
In [ ]:
# logical "and"
print all((False, True, True, True))
print all((0, 1, 1, 1))
In [31]:
# logical "or"
any((0, 1, 1, 1))
Out[31]:
True
In [32]:
# "transpose"
zip([1, 2, 3], [11, 12, 13])
Out[32]:
[(1, 11), (2, 12), (3, 13)]
In [33]:
# sum (for numbers, not for strings)
sum((1, 2, 3))
Out[33]:
6
In [ ]: