# File operations and command line
We will learn how to read and write files from Python, and how to write a Python program and run it from the command line.

## Reading files
Python, like most languages, handles files through file objects.

The `open(filename[, mode])` function opens a file and returns a file object (or raises an error).
The `mode` can be `'r'` (read), `'w'` (write), `'r+'` (read and write), or in case of binary files: `'rb'`, `'wb'`, `'r+b'`.

In [None]:
f = open('E0.csv') # open for reading, returns file object
print f

The file object is not very useful on its own.
This file contains the English Premier League statistics from the 2015/16 season.
`f.read()` reads the whole file into a string. We print only the first 100 characters:

In [None]:
f = open('E0.csv')
content = f.read()
print content[:100]

Read only the first line, then the second one:

In [None]:
f = open('E0.csv')
first_line = f.readline()
print first_line
second_line = f.readline()
print second_line

The file object is _iterable_: looping over it yields the file line by line.
Mind the newline character at the end of each line.

In [None]:
f = open('E0.csv')
L = []
for line in f:
    L.append(line)
print L[:10]

The list `L` now contains the rows of the file. You can split the lines into cells with `.split(",")` but that's for later.
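
As a small preview of that splitting step, here is a minimal sketch. The line below is made up for illustration (it is not copied from `E0.csv`), and the names `cells` and `raw_cells` are hypothetical.

```python
# A made-up line in the style of a .csv row (not taken from E0.csv).
line = "01/01/16,TeamA,TeamB,2,1\n"

# Strip the trailing newline first, then split on the comma.
cells = line.rstrip("\n").split(",")
print(cells)

# Without rstrip, the last cell keeps the newline character.
raw_cells = line.split(",")
print(repr(raw_cells[-1]))
```

Every cell is still a string at this point, so numbers need an explicit `int()` or `float()` conversion.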

## Writing a file
Let's say that you care only about the results of the team 'Liverpool'.
As a first step, write a file named `'Liverpool.csv'` containing anything.

For writing we open with `open(filename, 'w')`. **If you write a file, don't forget to close it!**

In [None]:
f = open('Liverpool.csv', 'w')
f.write('YNWA')
f.close()

Note: for reading a text file we use `open('E0.csv', 'r')` but reading is the default, so you can skip the mode `r`.

Let's read the results row by row, choose the rows containing the word 'Liverpool', and save those lines in 'Liverpool.csv'.
The header of the file will be the same!

In [None]:
with open('E0.csv') as f:
    L = [f.readline()]  # keep the header line
    for line in f:
        if 'Liverpool' in line:
            L.append(line)
# the input file is closed here; now write the selected lines
with open('Liverpool.csv', 'w') as f:
    for l in L:
        f.write(l)

Note that the **write** method does not write into a new line automatically, you have to insert the newline characters manually.
In the example the lines already contained newline characters.
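
When the strings do not end with a newline yet, you have to add it yourself. A minimal sketch, assuming a scratch file in a temporary directory (the file name `teams.txt` and the list `teams` are made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file, so no real data file is touched.
path = os.path.join(tempfile.mkdtemp(), "teams.txt")

teams = ["Liverpool", "Everton", "Arsenal"]
with open(path, "w") as f:
    for team in teams:
        f.write(team + "\n")  # write() adds nothing, so append the newline

# Read the file back to check that each team landed on its own line.
with open(path) as f:
    lines = f.readlines()
print(lines)
```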

`with open(filename, 'rb') as f` does the same as `f = open(filename, 'rb')`, but the file is closed automatically at the end of the block, even if an error occurs inside it.
This way you can make sure the file is closed and not left open.

The `with` form is preferred for safety reasons (it protects against data corruption)!
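
You can check this behaviour through the `closed` attribute of the file object. The sketch below uses a hypothetical scratch file, and the variable names are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "scratch.txt")

with open(path, "w") as f:
    f.write("YNWA\n")
    inside = f.closed   # still open inside the block
after = f.closed        # closed as soon as the block ends

# The file is closed even when the block is left through an exception.
try:
    with open(path) as g:
        raise ValueError("something went wrong while reading")
except ValueError:
    pass
closed_after_error = g.closed

print(inside)
print(after)
print(closed_after_error)
```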

## `csv` and `json` in python
The previous file was a **comma separated values** file with the `.csv` extension.
In these files the records follow each other line by line, and inside a record the cells are separated by commas (`,`).
Python can handle this format with the `csv` module.

In [None]:
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        L.append(row)
print L[0]
print L[19]

The difference is that the cells are handled, too. You can determine which delimiter to use (here `,`).
The `quotechar` determines how to read strings with special characters or commas.
This is useful for example when you want to write the decimal number `2,25` in a `.csv` file.
The `csv` module handles all these.
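
To see what the `quotechar` does, compare `csv.reader` with a plain `split`. This is only a sketch: the two lines and the column name `AverageGoals` are made up, and it relies on `csv.reader` accepting any iterable of lines, so no file is needed.

```python
import csv

# Made-up lines; the second cell of the data row contains a comma inside quotes.
lines = ['Team,AverageGoals', 'Liverpool,"2,25"']

rows = list(csv.reader(lines, delimiter=',', quotechar='"'))
print(rows)

# A plain split cuts the quoted cell in two and keeps the quote characters.
naive = lines[1].split(',')
print(naive)
```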

In Python 2, open `.csv` files as binary (`rb`/`wb`); the `csv` module won't work properly otherwise. (In Python 3 the `csv` module instead expects text mode, opened with `newline=''`.)

In general, opening a text file as binary is not a big problem,
but opening a binary file as text may cause problems.
This is due to historical differences between operating systems in how they mark the end of a line; see more [about the EOL characters](https://en.wikipedia.org/wiki/Newline).

### Reading csv format as `dict`
If you look at the data closely, you can see that a dictionary would be even better.
It's better to refer to the cells by name and not by index.

This format uses the first line (header) as dictionary keys.

In [None]:
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        L.append(row)
print L[0]
print L[1]
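
The same idea can be tried without a file, since `csv.DictReader` accepts any iterable of lines. In this sketch the lines are made up; only the column names follow the data file.

```python
import csv

# Made-up lines: a header followed by two records.
lines = ['HomeTeam,AwayTeam,FTHG,FTAG',
         'Liverpool,Tottenham,1,1',
         'West Ham,Liverpool,2,0']

rows = list(csv.DictReader(lines))

# The header line became the keys; it is not returned as a record.
print(len(rows))
print(rows[0]['HomeTeam'])
# Cell values are always strings, even when they look like numbers.
print(repr(rows[1]['FTHG']))
```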

Store the data of the Liverpool matches. We will write the 'Date', 'HomeTeam', 'AwayTeam', 'FTHG' (Full Time Home Goals), 'FTAG' (Full Time Away Goals) and 'FTR' (Full Time Result) values!
We will use `csv.DictWriter` to write the data: `writer.writeheader()` writes the header first,
then `writer.writerows()` writes the actual data.

The `fieldnames` parameter tells which fields (columns) to use.
The `extrasaction='ignore'` ignores the other fields.

In [None]:
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        # keep the matches where Liverpool played at home or away
        if x['HomeTeam'] == 'Liverpool' or x['AwayTeam'] == 'Liverpool':
            L.append(x)
# both files are closed by their `with` blocks, no close() call is needed
with open('Liverpool.csv', 'wb') as output:
    fields = ['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(L)
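
The effect of `extrasaction` can be seen on a tiny example. This sketch assumes Python 3, where the `csv` module writes text and an in-memory `io.StringIO` buffer can stand in for the output file; the record and its `Referee` value are made up.

```python
import csv
import io

# One made-up record with more keys than we want to write.
rows = [{'HomeTeam': 'Liverpool', 'AwayTeam': 'Tottenham',
         'FTHG': '1', 'FTAG': '1', 'Referee': 'A. Referee'}]
fields = ['HomeTeam', 'FTHG']

# extrasaction='ignore' silently drops the keys that are not in fieldnames.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction='ignore')
writer.writeheader()
writer.writerows(rows)
output_lines = buffer.getvalue().splitlines()
print(output_lines)

# With the default (extrasaction='raise') the extra keys cause a ValueError.
strict = csv.DictWriter(io.StringIO(), fieldnames=fields)
try:
    strict.writerows(rows)
    raised = False
except ValueError:
    raised = True
print(raised)
```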

### The `json` format
JSON stands for JavaScript Object Notation.

This format can store most of the types and also combine them:
numbers, strings, lists, dicts, list of lists, list of dicts etc.

A list is written as comma separated values in square brackets `[ ]`; a dict contains the usual `key:value` pairs in curly brackets `{ }`.

    {
        "Liverpool" : {
            "Players": [
                "Steven Gerrard",
                "Bill Shankly"
            ],
            "Results" : [
                {
                    "HomeTeam":"Liverpool",
                    "AwayTeam":"Tottenham",
                    "HTG":1,
                    "ATG":1
                },
                {
                    "HomeTeam":"West Ham",
                    "AwayTeam":"Liverpool",
                    "HTG":2,
                    "ATG":0
                }
            ],
            "Points":1,
            "Goals Scored":1,
            "Goals Conceded":3
        }
    }

Python can handle this format with the `json` module.
After reading the file, `data` is a dictionary containing all sorts of objects.

In the printed output, `u'Steven Gerrard'` means a unicode string.

In [None]:
import json
with open('Liverpool.json') as data_file:    
    data = json.load(data_file)

print data
print data['Liverpool']['Players']

Now write a json file! To look better you can use the `sort_keys`, `indent` and `separators` parameters.
The `json.dumps(obj)` returns a string which encodes an object (`obj`) in a json format.
We write that into a file and that's all.

In [None]:
import json
with open('Liverpool.json') as data_file:
    data = json.load(data_file)
# data_file is already closed here by the `with` block
with open('Liverpool_matches.json', 'wb') as f:
    f.write(json.dumps(data['Liverpool']['Results'],
            sort_keys=True, indent=4, separators=(',', ': ')))

There are several functions for handling the json format:
* `json.dumps(obj)`: encodes `obj` to a JSON formatted string
* `json.dump(obj, file)`: encodes `obj` to JSON and writes the result into `file`
* `json.load(file)`: reads the content of `file` to a python object (it can be a complex python data)
* `json.loads(JSON_formatted_string)`: converts a JSON formatted string into a python object
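
The four functions form two pairs, one working with strings and one with files. A minimal round-trip sketch, using one match from the example above and a hypothetical scratch file named `match.json`:

```python
import json
import os
import tempfile

match = {"HomeTeam": "Liverpool", "AwayTeam": "Tottenham", "HTG": 1, "ATG": 1}

# dumps / loads: object -> string -> object
text = json.dumps(match, sort_keys=True)
print(text)
restored = json.loads(text)

# dump / load: object -> file -> object (hypothetical scratch file)
path = os.path.join(tempfile.mkdtemp(), "match.json")
with open(path, "w") as f:
    json.dump(match, f)
with open(path) as f:
    from_file = json.load(f)

print(restored == match)
print(from_file == match)
```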

More details in [the Python docs](https://docs.python.org/2/library/json.html).

## Command line arguments

We will run Python code as standalone programs!

### The `sys` module

Write your Python code and save it with the `.py` extension.
Your OS may recognise it as a Python program, or you can run it with the interpreter.

You can communicate with your program via `input` or with **command line arguments**.
Your first program prints its arguments.
The first one is the name of the script file; the others are optional. The list `sys.argv` stores these arguments (a list of strings).
You have to `import sys` first.

Save the following as **cli.py** and run it from the command line.

~~~ python
import sys

print 'Number of arguments:', len(sys.argv)
print 'List of arguments:', str(sys.argv)

~~~

In [None]:
! python cli.py arg1 arg2

The `!` tells the notebook to run the line on the command line, not as Python code.

You can use the values in `sys.argv`; we call them *positional parameters*, since you refer to them by their place in the list `sys.argv`.
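
Positional access can be tried inside the notebook with an ordinary list shaped like `sys.argv`. In this sketch the helper `describe` and the argument list are hypothetical.

```python
def describe(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    return argv[0], argv[1:]

# Hypothetical list, shaped like what `python cli.py arg1 arg2` would put in sys.argv.
script, positional = describe(['cli.py', 'arg1', 'arg2'])
print(script)
print(positional)
print(positional[0])  # the first real argument is at index 1 of the full list
```

In a real script you would pass `sys.argv` itself to such a helper.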

Exercise: calculate the power of a number.
Write a Python program which takes two command line arguments: base and exponent.

If the numbers are integers then calculate with integers, otherwise calculate with floats.

Save the following as **power.py**.

~~~ python
import sys

def is_intstring(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

a = []

for i in range(1,3):
    if is_intstring(sys.argv[i]):
        a.append(int(sys.argv[i]))
    else:
        a.append(float(sys.argv[i]))

print a[0] ** a[1]

~~~

This is how to run it.

In [None]:
!python power.py 4.2 3

!python power.py 2 100
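
The conversion step of `power.py` can also be folded into a single helper. This is an alternative sketch, not the script above; the name `parse_number` is made up.

```python
def parse_number(s):
    """Return an int if the string looks like an integer, otherwise a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)

# Integer arguments stay integers, so the result is exact even when it is large.
big = parse_number('2') ** parse_number('100')
print(big)

# As soon as one argument is not an integer, the calculation uses floats.
mixed = parse_number('4.2') ** parse_number('3')
print(type(mixed))
```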

## argparse
If you want to use more complex command line arguments, you can use the `argparse` module.

The following is an example of a _flag_, an option that is either on or off.
For example, if `verbosity` is on, you want lots of information printed on your screen; if it is off, you want less. Save it as **parser.py**.

~~~ python
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--verbosity", help="increase output verbosity")
args = parser.parse_args()
print type(args.verbosity)
if args.verbosity:
    print "verbosity turned on"
else:
    print "verbosity turned off"
    
~~~

In [None]:
!python parser.py --verbosity 1

In [None]:
!python parser.py
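
It is worth checking what `args.verbosity` actually holds. The sketch below passes an explicit list to `parse_args` instead of using the real command line, so it runs inside the notebook; the variable names `given` and `missing` are made up.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--verbosity", help="increase output verbosity")

# An explicit list stands in for the command line arguments.
given = parser.parse_args(["--verbosity", "0"])
missing = parser.parse_args([])

print(repr(given.verbosity))   # the value arrives as a string
print(bool(given.verbosity))   # a non-empty string counts as true in an `if`
print(missing.verbosity)       # None when the option is not given
```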

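This works, but the option needs a value, and `argparse` stores that value as a string, so even `--verbosity 0` counts as on. A nicer solution is a boolean flag created with `action="store_true"`.
This way you only have to write `-v`, `--verbose`, or nothing. Save the following as **parser2.py**.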

~~~ python
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", help="increase output verbosity", action="store_true")
args = parser.parse_args()
print type(args.verbose)
if args.verbose:
    print "verbosity turned on"
else:
    print "verbosity turned off"

~~~

In [None]:
!python parser2.py
!python parser2.py --verbose
!python parser2.py -v

You can even print a nice help menu with explanatory text (`help="increase output verbosity"`).

In [None]:
!python parser2.py --help
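
The `store_true` flag can also be tested without a separate script file, because `parse_args` accepts an explicit list of arguments. A minimal sketch (the names `on`, `on_long` and `off` are made up):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", help="increase output verbosity", action="store_true")

# Each list plays the role of the arguments typed after the script name.
on = parser.parse_args(["-v"])
on_long = parser.parse_args(["--verbose"])
off = parser.parse_args([])

print(on.verbose)
print(on_long.verbose)
print(off.verbose)
```

With `store_true` the attribute is a real `bool`, so it can be used directly in an `if`.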

Back to the `.csv` file.
Let's say you want to know in how many matches a given team scored at least a given number of goals.
For example, in how many matches did Liverpool score at least one goal?

The name of the team is the `-t` or `--team` argument; by default it is `'Liverpool'`, but you can set it to other teams as well.

The minimum number of goals is 0 by default, but you can change it with `-g` or `--goals`.

The `action='store'` option stores the values in the `args` object. You can access them as `args.team` and `args.goals`.

Save the following as **goals.py**.

~~~ python
import argparse
import csv

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--team", help="The team we are looking for", action="store", type=str, default='Liverpool')
parser.add_argument("-g", "--goals", help="Number of minimum goals scored", action="store", type=float, default=0)
args = parser.parse_args()

m = 0
team = args.team
goals = args.goals
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        if x['HomeTeam'] == team and float(x['FTHG']) >= goals:
            m += 1
        elif x['AwayTeam'] == team and float(x['FTAG']) >= goals:
            m += 1
print m

~~~

In [None]:
!python goals.py -h

In [None]:
!python goals.py -g 1
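
To check the counting logic without the data file, the loop can be moved into a function and fed a few records by hand. This is a sketch: the function name `count_matches` is made up, and the two records reuse the results from the JSON example above, with the goal counts stored as strings as `csv.DictReader` would return them.

```python
import argparse

def count_matches(rows, team, goals):
    """Count the matches in which `team` scored at least `goals` goals."""
    m = 0
    for x in rows:
        if x['HomeTeam'] == team and float(x['FTHG']) >= goals:
            m += 1
        elif x['AwayTeam'] == team and float(x['FTAG']) >= goals:
            m += 1
    return m

parser = argparse.ArgumentParser()
parser.add_argument("-t", "--team", action="store", type=str, default='Liverpool')
parser.add_argument("-g", "--goals", action="store", type=float, default=0)
# An explicit list replaces the command line: same as `python goals.py -g 1`.
args = parser.parse_args(["-g", "1"])

rows = [
    {'HomeTeam': 'Liverpool', 'AwayTeam': 'Tottenham', 'FTHG': '1', 'FTAG': '1'},
    {'HomeTeam': 'West Ham', 'AwayTeam': 'Liverpool', 'FTHG': '2', 'FTAG': '0'},
]
result = count_matches(rows, args.team, args.goals)
print(result)
```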