We will learn how to read and write files from python, and how to write a python program and run it from the command line.
Python, like most languages, handles files through file objects.
The open(filename[, mode]) function opens a file and returns a file object (or raises an error).
The mode can be 'r' (read), 'w' (write), 'r+' (both), or in case of binary files: 'rb', 'wb', 'r+b'.
f = open('E0.csv') # open for reading, returns file object
print f
The file object is not so useful on its own.
This file contains the English Premier League statistics from the season of 2015/16.
The f.read() method reads the whole file into a string. We print only the first 100 characters:
f = open('E0.csv')
content = f.read()
print content[:100]
Read only the first line, then the second one:
f = open('E0.csv')
first_line = f.readline()
print first_line
second_line = f.readline()
print second_line
The file object is iterable, line by line.
Mind the newline character at the end of each line:
f = open('E0.csv')
L = []
for line in f:
    L.append(line)
print L[:10]
The list L now contains the rows of the file. You can split the lines into cells with .split(","), but that's for later.
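As a preview of that splitting step, here is a minimal sketch. The two sample rows are hypothetical stand-ins for lines read from the file, not real data from E0.csv.

```python
# Hypothetical sample lines in the same comma separated layout
lines = ["Date,HomeTeam,AwayTeam\n", "01/01/16,TeamA,TeamB\n"]

rows = []
for line in lines:
    # rstrip drops the trailing newline, split cuts the line at the commas
    rows.append(line.rstrip("\n").split(","))

print(rows[1])
```

Without the rstrip call the last cell of each row would still carry the newline character.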
Let's say that you care only about the results of the team 'Liverpool'.
Write a file named 'Liverpool.csv' containing anything.
For writing we open the file with open(filename, 'w'). If you write a file, don't forget to close it!
f = open('Liverpool.csv', 'w')
f.write('YNWA')
f.close()
Note: for reading a text file we use open('E0.csv', 'r'), but reading is the default, so you can skip the mode 'r'.
Let's read the results row by row, choose the rows containing the word 'Liverpool', and save those lines in 'Liverpool.csv'. The header of the file will be the same!
f = open('E0.csv')
L = [f.readline()]  # the first line is the header
for line in f:
    if 'Liverpool' in line:
        L.append(line)
f.close()
with open('Liverpool.csv', 'w') as f:
    for l in L:
        f.write(l)
Note that the write method does not start a new line automatically; you have to insert the newline characters yourself. In the example the lines already contained newline characters.
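When the strings do not end in a newline, it has to be appended by hand. The sketch below illustrates this; the file name chant.txt and the temporary directory are assumptions made only for the example.

```python
import os
import tempfile

# Hypothetical output file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "chant.txt")

words = ["You'll", "never", "walk", "alone"]
with open(path, "w") as f:
    for word in words:
        f.write(word + "\n")  # write() adds nothing, so append the newline

with open(path) as f:
    lines = f.readlines()

print(lines)
```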
The with open(filename, 'rb') as f form is the same as f = open(filename, 'rb'), but the file will be closed at the end of the block.
In this way you make sure that the file is closed and not left open.
The with form is preferred for safety reasons (it protects against data corruption)!
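The closing behaviour can be checked through the closed attribute of the file object. This is a small sketch; the file demo.txt and the simulated error are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("YNWA\n")
    inside = f.closed   # still open inside the block
after = f.closed        # closed once the block has ended

# The file is closed even if the block is left because of an error
try:
    with open(path) as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed

print(inside)
print(after)
print(closed_after_error)
```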
csv and json in python
The previous file was a comma separated values file with the .csv extension.
In those files the records follow each other line by line; inside a record the cells are separated by commas (,).
Python can handle this format with the csv module.
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in reader:
        L.append(row)
print L[0]
print L[19]
The difference is that the cells are handled, too. You can choose which delimiter to use (here ,).
The quotechar determines how to read strings that contain special characters or commas.
This is useful for example when you want to write the decimal number 2,25 in a .csv file.
The csv module handles all these.
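The effect of the quotechar can be seen on a small in-memory example. The column name odds and the two lines below are made up for illustration; csv.reader accepts any iterable of strings, so no file is needed here.

```python
import csv

# Hypothetical lines: the second cell holds a decimal comma inside quotes
lines = ['team,odds', 'Liverpool,"2,25"']

reader = csv.reader(lines, delimiter=',', quotechar='"')
parsed = [row for row in reader]

# A plain split cuts the quoted cell in two and keeps the quote characters
naive = lines[1].split(',')

print(parsed[1])
print(naive)
```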
Open the .csv files as binary (rb/wb); the csv module won't work properly otherwise.
In general opening a text file as binary is not a big problem, but opening a binary file as text may cause problems. The cause is the end-of-line (EOL) characters, which differ between operating systems for historical reasons.
dict
If you look at the data closely, you can see that a dictionary would be even better. It's better to refer to the cells by name than by index.
The csv.DictReader uses the first line (header) as dictionary keys.
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        L.append(row)
print L[0]
print L[1]
Store the data of the Liverpool matches. We will write the 'Date', 'HomeTeam', 'AwayTeam', 'FTHG' (Full Time Home Goals), 'FTAG' (Full Time Away Goals) and 'FTR' (Full Time Result) values!
We will use csv.DictWriter to write the data: writer.writeheader() writes the header first, then writer.writerows() writes the actual data.
The fieldnames parameter tells which fields (columns) to use.
The extrasaction='ignore' option ignores the other fields.
import csv
L=[]
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        if x['HomeTeam'] == 'Liverpool' or x['AwayTeam'] == 'Liverpool':
            L.append(x)
with open('Liverpool.csv', 'wb') as output:
    fields = ['Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR']
    writer = csv.DictWriter(output, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(L)
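To see what extrasaction='ignore' changes, the sketch below writes one made-up record into an in-memory buffer instead of a file. The record, the Referee value and the use of StringIO are assumptions for the example; the try/except import only picks the buffer class that matches the Python version.

```python
import csv
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO        # Python 3

# Hypothetical record with one field ('Referee') that we do not want to write
rows = [{'HomeTeam': 'TeamA', 'AwayTeam': 'TeamB', 'FTHG': '2', 'Referee': 'Someone'}]
fields = ['HomeTeam', 'AwayTeam', 'FTHG']

output = StringIO()
writer = csv.DictWriter(output, fieldnames=fields, extrasaction='ignore')
writer.writeheader()
writer.writerows(rows)
text = output.getvalue()

# With the default extrasaction='raise' the extra field is an error
strict = csv.DictWriter(StringIO(), fieldnames=fields)
try:
    strict.writerow(rows[0])
    raised = False
except ValueError:
    raised = True

print(text)
print(raised)
```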
json format
JSON stands for JavaScript Object Notation.
This format can store most of the types and also combine them: numbers, strings, lists, dicts, lists of lists, lists of dicts etc.
A list is written as comma separated items in brackets [ ]; a dict contains the usual key:value pairs in curly brackets { }.
{
    "Liverpool": {
        "Players": [
            "Steven Gerrard",
            "Bill Shankly"
        ],
        "Results": [
            {
                "HomeTeam": "Liverpool",
                "AwayTeam": "Tottenham",
                "HTG": 1,
                "ATG": 1
            },
            {
                "HomeTeam": "West Ham",
                "AwayTeam": "Liverpool",
                "HTG": 2,
                "ATG": 0
            }
        ],
        "Points": 1,
        "Goals Scored": 1,
        "Goals Conceded": 3
    }
}
Python can handle this format with the json module.
After reading the file, data is a dictionary containing all sorts of objects.
The u in u'Steven Gerrard' marks a unicode string.
import json
with open('Liverpool.json') as data_file:
    data = json.load(data_file)
print data
print data['Liverpool']['Players']
Now write a json file! To make it look better you can use the sort_keys, indent and separators parameters.
The json.dumps(obj) call returns a string which encodes an object (obj) in JSON format.
We write that into a file and that's all.
import json
with open('Liverpool.json') as data_file:
    data = json.load(data_file)
with open('Liverpool_matches.json', 'wb') as f:
    f.write(json.dumps(data['Liverpool']['Results'],
                       sort_keys=True, indent=4, separators=(',', ': ')))
There are several ways to handle the json format:
json.dumps(obj): encodes obj to a JSON formatted string
json.dump(obj, file): encodes obj and writes it into a file
json.load(file): reads the content of file to a python object (it can be complex python data)
json.loads(JSON_formatted_string): converts a JSON formatted string into a python object
More details in https://docs.python.org/2/library/json.html.
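The string based pair, json.dumps and json.loads, can be tried without any file. A minimal round trip, reusing one match from the JSON example above:

```python
import json

results = [{"HomeTeam": "Liverpool", "AwayTeam": "Tottenham", "HTG": 1, "ATG": 1}]

# Python object -> JSON formatted string (keys sorted, indented by 4 spaces)
text = json.dumps(results, sort_keys=True, indent=4, separators=(',', ': '))

# JSON formatted string -> Python object
restored = json.loads(text)

print(text)
print(restored == results)
```

The file based pair, json.dump and json.load, does the same conversion but reads from and writes to an open file object.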
We will run python code as standalone programs!
sys module
Write some python code and save it with the .py extension.
Your OS can recognise it as a python program, or you can run it with an interpreter.
You can communicate with your program via input or with command line arguments.
Your very first program prints its arguments.
The first one is the name of the code file. The others are optional. The list sys.argv stores these parameters (list of strings).
You have to import sys first.
Save the following as cli.py and run it from the command line.
import sys
print 'Number of arguments:', len(sys.argv)
print 'List of arguments:', str(sys.argv)
! python cli.py arg1 arg2
The ! tells the notebook to run the line in the command line, not as python code.
You can use the values in sys.argv; we call them positional parameters since you refer to them by their place in the list sys.argv.
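Since sys.argv is an ordinary list of strings, the positional idea can be sketched with a hand-made list standing in for it. The helper split_arguments and the sample values are hypothetical, introduced only for this illustration.

```python
def split_arguments(argv):
    """Separate the script name (position 0) from the remaining arguments."""
    return argv[0], argv[1:]

# Stand-in for sys.argv after running: python cli.py 3 4
script, args = split_arguments(['cli.py', '3', '4'])

# The arguments arrive as strings, so convert them before calculating
total = int(args[0]) + int(args[1])

print(script)
print(args)
print(total)
```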
Exercise: calculate the power of a number. Write a python program which has two command line arguments: base and exponent.
If the numbers are integers then calculate with integers, otherwise calculate with floats.
Save the following as power.py.
import sys
def is_intstring(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
a = []
for i in range(1, 3):
    if is_intstring(sys.argv[i]):
        a.append(int(sys.argv[i]))
    else:
        a.append(float(sys.argv[i]))
print a[0] ** a[1]
This is how to run it.
!python power.py 4.2 3
!python power.py 2 100
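The same integer-or-float logic can also be written as functions, which makes it possible to try it without the command line. This is only a sketch of one possible variant; parse_number and power are hypothetical names, not part of power.py.

```python
def parse_number(s):
    """Return an int if the string is an integer, otherwise a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)

def power(base, exponent):
    return parse_number(base) ** parse_number(exponent)

big = power('2', '100')    # both integers: the result stays an exact integer
mixed = power('4.2', '3')  # a float base gives a float result

print(big)
print(mixed)
```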
If you want to use more complex command line arguments, then you can use the argparse module.
The following is an example of a flag used to mark two options: on or off.
For example if verbosity is on, then you want lots of info; if it's off, then you want less info printed on your screen.
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--verbosity", help="increase output verbosity")
args = parser.parse_args()
print type(args.verbosity)
if args.verbosity:
    print "verbosity turned on"
else:
    print "verbosity turned off"
!python parser.py --verbosity 1
!python parser.py
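This version needs a value after --verbosity, and the value is stored as a string, so any non-empty value (even 0) turns verbosity on. A nicer solution is a boolean flag with action="store_true".
In this way you only have to write -v or --verbose, or nothing.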
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", help="increase output verbosity", action="store_true")
args = parser.parse_args()
print type(args.verbose)
if args.verbose:
    print "verbosity turned on"
else:
    print "verbosity turned off"
!python parser2.py
!python parser2.py --verbose
!python parser2.py -v
You can even print a nice help menu with explanatory text (help="increase output verbosity").
!python parser2.py --help
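Both kinds of argument can be tried without the command line, because parse_args also accepts an explicit list of strings. The sketch below rebuilds the two parsers from the examples above; the variable names are chosen only for this illustration.

```python
import argparse

# Option that expects a value: the value is kept as a string
value_parser = argparse.ArgumentParser()
value_parser.add_argument("--verbosity", help="increase output verbosity")
with_zero = value_parser.parse_args(["--verbosity", "0"])
missing = value_parser.parse_args([])

# Boolean flag: present means True, absent means False
flag_parser = argparse.ArgumentParser()
flag_parser.add_argument("-v", "--verbose", help="increase output verbosity",
                         action="store_true")
off = flag_parser.parse_args([])
on = flag_parser.parse_args(["-v"])

print(repr(with_zero.verbosity))  # the string '0', which counts as true in an if
print(missing.verbosity)          # None when the option is not given
print(off.verbose)
print(on.verbose)
```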
Back to the .csv file.
Let's say you want to know in how many matches a given team scored at least a given number of goals.
For example, in how many matches did Liverpool score at least one goal?
The name of the team is the -t or --team argument; by default it is 'Liverpool', but you can set it to other teams as well.
The minimum number of goals is 0 by default, but you can change it with -g or --goals.
The action='store' option stores them in the args object. You can access them as args.team and args.goals.
import argparse
import csv
parser = argparse.ArgumentParser()
parser.add_argument("-t", "--team", help="The team we are looking for", action="store", type=str, default='Liverpool')
parser.add_argument("-g", "--goals", help="Number of minimum goals scored", action="store", type=float, default=0)
args = parser.parse_args()
m = 0
team = args.team
goals = args.goals
with open('E0.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for x in reader:
        if x['HomeTeam'] == team and float(x['FTHG']) >= goals:
            m += 1
        elif x['AwayTeam'] == team and float(x['FTAG']) >= goals:
            m += 1
print m
!python goals.py -h
!python goals.py -g 1
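The counting rule can be checked on a few made-up rows before running it on the real file. In this sketch the function count_matches and the sample lines are hypothetical; only the column names HomeTeam, AwayTeam, FTHG and FTAG come from E0.csv.

```python
import csv

# Hypothetical results in the same column layout as the real file
sample = [
    'HomeTeam,AwayTeam,FTHG,FTAG',
    'Liverpool,TeamA,2,0',
    'TeamB,Liverpool,1,1',
    'TeamA,TeamB,3,0',
]

def count_matches(lines, team, goals):
    """Count the matches in which team scored at least the given number of goals."""
    count = 0
    for row in csv.DictReader(lines):
        if row['HomeTeam'] == team and float(row['FTHG']) >= goals:
            count += 1
        elif row['AwayTeam'] == team and float(row['FTAG']) >= goals:
            count += 1
    return count

print(count_matches(sample, 'Liverpool', 1))
print(count_matches(sample, 'Liverpool', 2))
```

Keeping the counting in a function that takes the lines as a parameter makes it easy to test with a short list like this one.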