About strings

Special characters

Some useful special characters:

In [1]:
new_line = "\n"         # line feed: starts a new line
horizontal_tab = "\t"   # tab: the displayed width depends on the terminal or editor
rep = "\\"              # backslash
one_quotion = "\'"      # '
double_quotion = "\""   # "
############################# these come from the typewriter era
bell = "\a"             # alert: beeps on some computers (rarely supported now)
carriage_return = "\r"  # jumps to the beginning of the same line
vertical_tab = "\v"     # vertical tab (rarely used now)
Example for \n and \t:
In [2]:
print "Tabular:\n\nFrom\there\n-------------------------\nTo \tthere"
Tabular:

From	here
-------------------------
To 	there
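The width of a tab can be made explicit with the expandtabs method, which replaces each tab with spaces up to the next tab stop. The following is a small sketch; the variable names are only illustrative, and print is written with parentheses so that it runs under both Python 2 and Python 3.

```python
# A tab is one character even though the escape "\t" is typed as two.
line = "From\there"
tab_length = len("\t")

expanded_4 = line.expandtabs(4)    # tab stops every 4 columns
expanded_12 = line.expandtabs(12)  # tab stops every 12 columns

print(tab_length)
print(expanded_4)
print(expanded_12)
```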

String operations

Substring:

In [3]:
s = "Firstsecondthird"
print s[:5]
print s[5:11]
print s[11:]
First
second
third
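Slices can also count from the end of the string with negative indices. A minimal sketch using the same string (the names last_five and middle are made up for this example):

```python
s = "Firstsecondthird"

last_five = s[-5:]      # the last five characters
middle = s[5:-5]        # from index 5 up to five characters before the end
rebuilt = s[:5] + s[5:] # two adjacent slices always give back the whole string

print(last_five)
print(middle)
print(rebuilt == s)
```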

Concatenation (string to string):

In [4]:
s = "puppy"
print "5 " + s
5 puppy
In [5]:
print 5 + s
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-134562b9452d> in <module>()
----> 1 print 5 + s

TypeError: unsupported operand type(s) for +: 'int' and 'str'
In [6]:
s = "puppy"
quantity = 5
print str(quantity) + " " + s
5 puppy

The str() function converts a value to a string.
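As a rough illustration, str() works on other built-in types as well, not only on integers. The list of values below is arbitrary and only serves the example.

```python
values = [5, 3.5, True, None, [1, 2]]

# Convert every value to its string form.
as_text = [str(v) for v in values]
print(as_text)

# After conversion the result can be concatenated with other strings.
label = "count: " + str(len(values))
print(label)
```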

Useful string methods

Upper/lower case

In [7]:
s = "some LOWER-, some UPPERcase letters"

The following method makes the first character uppercase and all the others lowercase:

In [8]:
s.capitalize()
Out[8]:
'Some lower-, some uppercase letters'

All capitals:

In [9]:
s.upper()
Out[9]:
'SOME LOWER-, SOME UPPERCASE LETTERS'

All lowercase:

In [10]:
s.lower()
Out[10]:
'some lower-, some uppercase letters'

The isalpha method checks whether the string contains only alphabetic characters (commas, hyphens and whitespace are not alphabetic):

In [11]:
s.isalpha()
Out[11]:
False
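To see which part of the string makes isalpha return False, one option is to test the words one by one. This is only a sketch, and it uses the split method that is introduced later in this section.

```python
s = "some LOWER-, some UPPERcase letters"

# Pair every whitespace-separated word with its isalpha() result.
flags = [(word, word.isalpha()) for word in s.split()]
for word, is_alpha in flags:
    print(word + ": " + str(is_alpha))
```

Only the word that carries the hyphen and the comma fails the test.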

Substring and splitting

You can check whether one string contains another, and find the position (index) where it starts:

In [12]:
'case' in s
Out[12]:
True
In [13]:
s.find('case')
Out[13]:
23
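Two details of find are worth a short sketch: it returns -1 when the substring is not present, and an optional second argument sets the index where the search starts. The search terms below are arbitrary.

```python
s = "some LOWER-, some UPPERcase letters"

found = s.find('case')           # index of the first match
missing = s.find('dog')          # -1 means "not found"
second_some = s.find('some', 1)  # start searching at index 1, skipping the first 'some'

print(found)
print(missing)
print(second_some)
```

Because -1 is also a valid index (the last character), check the result before using it to slice.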

You can split a string into a list of strings with the split method. By default the string is split on whitespace (tab, space, newline ...).

In [14]:
s.split()   # split on whitespace
Out[14]:
['some', 'LOWER-,', 'some', 'UPPERcase', 'letters']
In [15]:
s = "first second\tthird\nfourth"
s.split()
Out[15]:
['first', 'second', 'third', 'fourth']
In [16]:
s = "first, second, third, fourth"
s.split(", ")            # split by a given separator (a comma and a space)
Out[16]:
['first', 'second', 'third', 'fourth']
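Splitting with and without an explicit separator behaves differently around repeated separators. The sketch below uses made-up input strings to show the difference, and also the optional second argument that limits the number of splits.

```python
csv_line = "a,,b"
spaced = "a  b"   # two spaces between the letters

on_comma = csv_line.split(",")    # an explicit separator keeps empty fields
default_split = spaced.split()    # the default treats a run of whitespace as one separator
single_space = spaced.split(" ")  # an explicit space does not
limited = "first, second, third".split(", ", 1)  # split at most once

print(on_comma)
print(default_split)
print(single_space)
print(limited)
```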

Trimming the ends: strip

The following method removes the whitespace characters from the beginning and the end:

In [17]:
s = '  \t white spaces \t \n\n    '
s.strip()
Out[17]:
'white spaces'

With an argument, strip removes the given characters instead:

In [18]:
s = "...once upon a time,"
print s.strip(".,")   # strips both sides
print s.rstrip(".,")  # strips from the end
print s.lstrip(".,")  # strips from the beginning
once upon a time
...once upon a time
once upon a time,
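The argument of strip is treated as a set of characters, not as a prefix or suffix, and characters in the middle of the string are never touched. A short sketch with invented strings:

```python
# Any mix of 'x' and 'y' is removed from both ends, in any order.
trimmed = "xxhelloxyx".strip("xy")

# Dots inside the string stay; only the leading and trailing ones go.
inner_kept = "..a.b..".strip(".")

print(trimmed)
print(inner_kept)
```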

Formatting

Alignment

In [19]:
s = "where"
In [20]:
print '0123456789'*3
print s.center(30)
print s.rjust(30)
print s.ljust(30)
012345678901234567890123456789
            where             
                         where
where                         

The argument of each method is the total width of the result.
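These methods also accept an optional second argument, a single fill character used instead of spaces. A minimal sketch (the widths are chosen arbitrarily):

```python
s = "where"

dotted_right = s.rjust(10, '.')  # pad on the left with dots
dotted_left = s.ljust(10, '.')   # pad on the right with dots
starred = s.center(11, '*')      # pad on both sides with stars
too_narrow = s.center(3)         # a width smaller than the string changes nothing

print(dotted_right)
print(dotted_left)
print(starred)
print(too_narrow)
```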

You can print a table nicely:

In [21]:
tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(13)
    for i in range(1, len(row)):
        tabular_string += str(row[i]).rjust(7)
    tabular_string += "\n" 
print tabular_string
First row         -2   -310
Second row         3      1
Third row       -321     11

format method

The string on which format is called is the template, and the arguments are the values to substitute into it.

The numbers in the braces are the indices of the arguments.

In [22]:
'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}{0}'.format('X', 'Y', 'Z')
Out[22]:
'X-Y-Z X, Y, Z, XXX'

The format marker "{ }" can carry optional formatting instructions after a colon: {number:optional}

optional   Meaning
d          decimal
b          binary
o          octal
x, X       hex, capital HEX
f, F       float
e, E       exponential form: something times 10 to some power
<          left justified
>          right justified
^          centered
c^         centered but with a character 'c' as padding
In [23]:
print "01234 01234 01234 0123456789"
print '{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center')
01234 01234 01234 0123456789
0123   1234     | **center**
In [24]:
"int {0:d},  hex {0:x} {0:X},  oct {0:o},  bin {0:b}".format(42)
Out[24]:
'int 42,  hex 2a 2A,  oct 52,  bin 101010'
In [25]:
"{0}, {0:e}, {0:f}, {0:8.4f}, {0:15.1f}".format(-12.345)
Out[25]:
'-12.345, -1.234500e+01, -12.345000, -12.3450,           -12.3'
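A few of the options from the table that the cells above do not exercise can be sketched as follows. The values and widths are arbitrary, and the trailing "|" is only there to make the padding visible.

```python
left = "{0:<8}|".format("ab")         # left justified in 8 columns
centered = "{0:^8}|".format("ab")     # centered in 8 columns
dashed = "{0:->8}".format("ab")       # right justified, padded with '-'
upper_exp = "{0:.2E}".format(12345.678)  # exponential form with a capital E
upper_hex = "{0:X}".format(255)       # capital hex digits

print(left)
print(centered)
print(dashed)
print(upper_exp)
print(upper_hex)
```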

You can also name the parameters, which is more convenient than using indices.

In [26]:
'The center is: ({x}, {y})'.format(x=3, y=5)
Out[26]:
'The center is: (3, 5)'
In [27]:
x1 = 3; y1 = 4
print 'The center is: ({x}, {y})'.format(x=x1, y=y1)
The center is: (3, 4)
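Named fields also combine with a dictionary: unpacking it with ** passes its keys as the parameter names. The dictionary point below is a hypothetical example, and the second template shows that named fields accept the same formatting instructions as numbered ones.

```python
point = {'x': 3, 'y': 5}

# The keys of the dictionary fill the fields with the same names.
text = 'The center is: ({x}, {y})'.format(**point)

# Named fields can carry format instructions after a colon as well.
padded = '({x:>4d}, {y:>4d})'.format(**point)

print(text)
print(padded)
```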
In [28]:
table_string = ""
for row in tabular:
    table_string += "{0:_<13}".format(row[0])
    for i in range(1, len(row)):
        table_string += "{0:>7d}".format(row[i])
    table_string += "\n" 
print table_string
First row____     -2   -310
Second row___      3      1
Third row____   -321     11
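The column widths above are fixed by hand (13 and 7). One possible variation, sketched here under the assumption that every row has the same number of cells, is to compute each width from the data and pass it to format through a nested field such as {0:>{1}}. The names widths and lines are made up for this example.

```python
tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row", -321, 11]]

# Width of each column: the longest cell in that column once converted to text.
widths = [max(len(str(row[i])) for row in tabular) for i in range(len(tabular[0]))]

lines = []
for row in tabular:
    # First column left justified, the numeric columns right justified.
    cells = ["{0:<{1}}".format(row[0], widths[0])]
    for i in range(1, len(row)):
        cells.append("{0:>{1}}".format(row[i], widths[i]))
    lines.append("  ".join(cells))

table_string = "\n".join(lines)
print(table_string)
```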

Regular expressions (RegEx) in Python

Regular expression functions are not built in; you have to import the re module first. Put this at the beginning of your code.

In [29]:
import re

The findall function finds all matching substrings in a string:

In [30]:
pattern = r'[0-9]+'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.findall(pattern, string)
['1', '7']
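Note that findall returns strings even when the pattern matches digits, so the matches have to be converted before doing arithmetic. A small sketch with an invented sentence:

```python
import re

text = "3 apples, 12 pears and 105 plums"

# The matches are strings; int() turns them into numbers.
numbers = [int(n) for n in re.findall(r'[0-9]+', text)]
total = sum(numbers)

# A different character class picks out the lowercase words instead.
words = re.findall(r'[a-z]+', text)

print(numbers)
print(total)
print(words)
```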

Since the backslash and other special characters have a meaning in a RegEx pattern, you have to be careful with them.

It is best to write the pattern as a so-called raw string. In a raw string a backslash stands for an actual backslash, so you don't have to escape it.

If you put an r in front of the string literal, it is a raw string.

In [31]:
raw_string = r"aa\txx\s"
not_raw = "aa\txx\\"
print "Raw:     " + raw_string
print "Not raw: " + not_raw
Raw:     aa\txx\s
Not raw: aa	xx\
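Comparing lengths makes the difference concrete. In this sketch the ordinary literal holds a single tab character, while the raw literal holds a backslash followed by a letter; a raw pattern is equal to the ordinary string with the backslash doubled.

```python
escaped = "\t"   # one character: a tab
raw = r"\t"      # two characters: a backslash and the letter t

same_pattern = (r"\s+" == "\\s+")  # both spell backslash, s, plus

print(len(escaped))
print(len(raw))
print(same_pattern)
```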

You can also replace every match of a pattern with another string:

In [32]:
pattern = r'[0-9]+'
substitute = r'12'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)
Once upon a time there was 12 little puppy and 12 dwarfs

You can group parts of the pattern with parentheses and refer to the groups by number in the substitution: \1 is the first group.

In [33]:
pattern = r'[0-9]+\s([a-z]+)'
substitute = r'12 \1'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)
Once upon a time there was 12 little puppy and 12 dwarfs
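With several groups, the substitution can reorder the matched parts. The sketch below rearranges a date; the input string and the output layout are invented for illustration.

```python
import re

# Three groups: year, month and day, separated by hyphens.
pattern = r'([0-9]+)-([0-9]+)-([0-9]+)'
substitute = r'\3/\2/\1'   # day/month/year

reordered = re.sub(pattern, substitute, "Due on 2024-01-31.")
print(reordered)
```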