Binary trees

Let us consider a list of real numbers and you want to find an element in the list. Suppose that the numbers are all different. If you iterate over the elements to find a particular one then it takes as much steps as the length of the list. This is what we call

  • linear or
  • $O(n)$ pronounced "order $n$"

time complexity.

But you can do better if the list is already sorted. The advantage of the binary search is that it takes $O(\log(n))$ time but you have to sort your numbers first.

The best performance is when you sort your numbers and don't modify (insert or erase) them later. In this case you can search in them efficiently.

Binary search

  • Look at the middle of the list and how it compares to the searched number.
    • If the middle is smaller, then the searched number must be in the second half.
      • In this case continue your search in the second half of the list.
    • If the middle is greater, then the searched number must be in the first half.
      • In this case continue your search in the first half of the list.
    • If they are equal, then you found it.
In [1]:
import sys

def binary_search(l, x):
    a = 0
    b = len(l)-1
    while a <= b:
        i = (a + b)/2
        print >>sys.stderr, i,
        if l[i] < x:
            a = i+1
        elif l[i] > x:
            b = i-1
        else:
            print >>sys.stderr
            return i
    print >>sys.stderr
    return None
In [2]:
l = [3,10,-1,3.14,20,2]
l.sort()
print l
print binary_search(l,10)
[-1, 2, 3, 3.14, 10, 20]
4
2 4

This search takes at most $\log_2(n)$ steps, since the possible places get halved at every step.

In [3]:
for i in range(-1,11):
    print binary_search(range(10),i)
None
0
1
2
3
4
5
6
7
8
9
None
4 1 0
4 1 0
4 1
4 1 2
4 1 2 3
4
4 7 5
4 7 5 6
4 7
4 7 8
4 7 8 9
4 7 8 9

In this algoritm the numbers have to be pre-sorted. You have to be careful when instering a new element.

Write a class for storing a sorted list. The class will keep the order of the elements by insert a new element to its correct place.

In [4]:
class OrderedList(object):
    
    def __init__(self, elements=[]):
        self.l = sorted(elements)
    
    def search(self, x):
        a = 0
        b = len(self.l)-1
        while a <= b:
            i = (a + b)/2
            if self.l[i] < x:
                a = i+1
            elif self.l[i] > x:
                b = i-1
            else:
                return i
        return None
    def insert(self, x):
        a = 0
        b = len(self.l)-1
        while a <= b:
            i = (a + b)/2
            if self.l[i] < x:
                a = i+1
            elif self.l[i] > x:
                b = i-1
            else:
                return
        self.l.insert(a, x)
        
    def erase(self, x):
        index = self.search(x)
        if index:
            del self.l[index]
            
    def __iter__(self):
        return iter(self.l)
    def __repr__(self):
        return str(self.l)
In [5]:
l = OrderedList([1, -1, 2])
print l
l.insert(5)
print l
l.erase(2)
print l
l.insert(3.14)
l.insert(0)
print l
print l.search(1)
[-1, 1, 2]
[-1, 1, 2, 5]
[-1, 1, 5]
[-1, 0, 1, 3.14, 5]
2

Binary trees

A binary tree is a directed graph where every node have at most two outgoing edges.

We will implement a tree for storing numbers in a similar way than in the previous ordered list.

In [6]:
class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
    def insert(self, data):
        if self.data > data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif self.data < data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)

This implementation keeps the tree sorted in a way that smaller numbers always go to the left and greater numbers to the right.

If a node has no outgoing edge then it is marked by None. If you want to insert an elements then do the following.

  • If the new number is smaller then the current node then you should insert it to the left
    • If the left edge (branch) is None then insert a new node (leaf) there.
    • If the left branch is already occupied then call the insert method to that node recursively.
  • If the new number is greater than the current node, then do the same for the right branch.
  • If they are equal then you don't have to do anything because the number was already in the tree.

No we add a to_list() method which retrieves the elements in order.

  • First start with an empty list
  • Concatenate the result of to_list on the left branch (recursive)
  • Append the current element
  • Concatenate the result of to_list on the right branch (recursive)

The result is the list of ordered elements.

Python Tutor link

In [7]:
class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
    def insert(self, data):
        if self.data > data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif self.data < data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)
    def to_list(self):
        s = []
        if self.left is not None:
            s += self.left.to_list()
        s.append(self.data)
        if self.right is not None:
            s += self.right.to_list()
        return s
root = Node(5)
root.insert(4)
root.insert(1)
root.insert(7)
print root.to_list()
[1, 4, 5, 7]

This is good for storing the elements ordered and also you can find and add elements with $O(\log(n))$ time complexity.

Although this technique is slow if the tree is very uneven (unbalanced). The actual time complexity is the depth (height) of the tree. Look this!

In [8]:
root = Node(5)
root.insert(4)
root.insert(3)
root.insert(2)
root.insert(1)

There are self-balancing binary trees (AVL trees) and you will learn about them on Algorithm theory class.

Now we implement the find(data) method, it is quite straight forward.

Python Tutor link

In [9]:
class Node(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None
    def insert(self, data):
        if self.data > data:
            if self.left is None:
                self.left = Node(data)
            else:
                self.left.insert(data)
        elif self.data < data:
            if self.right is None:
                self.right = Node(data)
            else:
                self.right.insert(data)
    def to_list(self):
        s = []
        if self.left is not None:
            s += self.left.to_list()
        s.append(self.data)
        if self.right is not None:
            s += self.right.to_list()
        return s
    def find(self, data):
        if data == self.data:
            return self
        elif data < self.data and self.left is not None:
            return self.left.find(data)
        elif self.right is not None:
            return self.right.find(data)
        else:
            return None
root = Node(5)
root.insert(4)
root.insert(1)
root.insert(7) 
print root.find(4)
print root.find(4).to_list()
<__main__.Node object at 0x0000000008D3BB70>
[1, 4]

An application

A very useful application of binary trees are expression trees where the nodes are operations and the leaves are numbers.

If a node is an operation, then store the operation in self.data and the two branches are the two operands. The leaves are numbers without outgoing edges.

In [10]:
class Node:
    pass

def calculate(node):
    if node.data == "+":
        return calculate(node.left) + calculate(node.right)
    elif node.data == "*":
        return calculate(node.left) * calculate(node.right)
    else:
        return node.data
    
def write(node):
    if node.data == "+" or node.data == "*":
        return "(" + write(node.left) + ")" + node.data + "(" + write(node.right) + ")"
    else:
        return str(node.data)

x0 = Node()
x0.data = "+"
x1 = Node()
x1.data = "*"
x2 = Node()
x2.data = 7
x3 = Node()
x3.data = 5
x1.left = x2
x1.right = x3
x4 = Node()
x4.data = -3
x0.left = x1
x0.right = x4
print calculate(x0)
print write(x0)
32
((7)*(5))+(-3)