About strings¶

Special characters¶

Some useful special characters:

new_line = "\n"         # line break
horizontal_tab = "\t"   # tab; its displayed width depends on the terminal or editor
backslash = "\\"        # a single backslash
single_quote = "\'"     # '
double_quote = "\""     # "
############################# these come from the typewriter era
bell = "\a"             # alert; this actually beeps on some computers (rarely supported now)
carriage_return = "\r"  # jumps to the beginning of the same line
vertical_tab = "\v"     # moves down to the next vertical tab stop

print "Tabular:\n\nFrom\there\n-------------------------\nTo \tthere"

Tabular:

From	here
-------------------------
To 	there
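Each escape sequence above is stored as a single character, even though it takes two characters to type. A minimal sketch of that point, with illustrative variable names; print() is written with parentheses here so the snippet also runs under Python 3:

```python
# Each escape sequence counts as exactly one character.
new_line = "\n"
backslash = "\\"
row = "From\there"

print(len(new_line))    # 1
print(len(backslash))   # 1
print(len(row))         # 9: "From" + one tab + "here"
print(repr(row))        # repr() shows the escape instead of the tab itself
```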

String operations¶

Substring:

s = "Firstsecondthird"
print s[:5]
print s[5:11]
print s[11:]

First
second
third
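In a slice s[start:end] the start index is included and the end index is excluded, which is why the three slices above fit together without overlap. A short sketch of this, with one extra case (a negative index counts from the end of the string) that is not part of the example above:

```python
s = "Firstsecondthird"

first = s[:5]      # characters 0..4
middle = s[5:11]   # characters 5..10, index 11 is excluded
last = s[-5:]      # the last five characters

print(first + " " + middle + " " + last)

# Gluing the three slices together gives back the original string.
whole = s[:5] + s[5:11] + s[11:]
print(whole == s)
```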

Concatenation (string to string):

s = "puppy"
print "5 " + s

5 puppy

print 5 + s

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-134562b9452d> in <module>()
----> 1 print 5 + s

TypeError: unsupported operand type(s) for +: 'int' and 'str'

s = "puppy"
quantity = 5
print str(quantity) + " " + s

5 puppy

The str() function converts a value to a string.
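A small sketch of str() on a few different types, and of int() for the opposite direction. The variable names price and total are only illustrative:

```python
quantity = 5
price = 2.5
s = "puppy"

label = str(quantity) + " " + s
print(label)              # 5 puppy
print(str(price))         # 2.5
print(str([1, 2]))        # [1, 2]

# int() goes the other way: from a string of digits to a number.
total = int("5") + quantity
print(total)              # 10
```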

Useful string methods¶

Upper/lower case¶

s = "some LOWER-, some UPPERcase letters"

The capitalize method makes the first character uppercase and every other character lowercase:

s.capitalize()

'Some lower-, some uppercase letters'

All capitals:

s.upper()

'SOME LOWER-, SOME UPPERCASE LETTERS'

All lowercase:

s.lower()

'some lower-, some uppercase letters'

The isalpha method checks whether the string contains only alphabetic characters (commas, hyphens and whitespace are not alphabetic):

s.isalpha()

False

Substring and splitting¶

You can check whether one string contains another, and also find the position (index) where it starts:

'case' in s

True

s.find('case')

23
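The find method returns -1 when the substring is not present, and the search is case-sensitive. A brief sketch, where the names position, missing and second are chosen just for this example:

```python
s = "some LOWER-, some UPPERcase letters"

position = s.find('case')
missing = s.find('CASE')     # the search is case-sensitive

print(position)              # 23
print(missing)               # -1 means "not found"

# The returned index can be used for slicing.
print(s[position:position + 4])

# An optional second argument sets where the search starts.
second = s.find('some', 1)
print(second)                # 13
```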

You can split a string into a list of strings with the split method. By default the string is split at whitespace (space, tab, newline, ...).

s.split()   # split by whitespaces

['some', 'LOWER-,', 'some', 'UPPERcase', 'letters']

s = "first second\tthird\nfourth"
s.split()

['first', 'second', 'third', 'fourth']

s = "first, second, third, fourth"
s.split(", ")            # split by a given separator (a comma and a space)

['first', 'second', 'third', 'fourth']
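Splitting with no argument and splitting with an explicit " " separator behave differently when there are several spaces in a row. The sketch below shows this, and also the join method, which puts a list of strings back together:

```python
s = "first  second"           # two spaces between the words

by_whitespace = s.split()     # a run of whitespace counts as one separator
by_space = s.split(" ")       # every single space is a separator

print(by_whitespace)          # ['first', 'second']
print(by_space)               # ['first', '', 'second']

# join is the inverse of split for a fixed separator.
words = "first, second, third, fourth".split(", ")
rejoined = ", ".join(words)
print(rejoined)
```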

Erase from the sides: `strip`¶

The following method erases the whitespace characters from the beginning and the end:

s = '  \t white spaces \t \n\n    '
s.strip()

'white spaces'

With an argument, strip removes the given characters instead of whitespace:

s = "...once upon a time,"
print s.strip(".,")   # strips both sides
print s.rstrip(".,")  # strips from the end
print s.lstrip(".,")  # strips from the beginning

once upon a time
...once upon a time
once upon a time,
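The argument of strip is treated as a set of characters, not as one substring: characters are removed from each end as long as they belong to the set. A sketch with a made-up file name, showing why strip is not a safe way to remove a suffix:

```python
name = "report.txt"

# Removes any of '.', 't', 'x' from both ends, not the suffix ".txt".
stripped = name.strip(".tx")
print(stripped)               # repor

# The order of the characters in the argument does not matter.
same = name.strip("xt.")
print(stripped == same)
```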

Formatting¶

Alignment¶

s = "where"

print '0123456789'*3
print s.center(30)
print s.rjust(30)
print s.ljust(30)

012345678901234567890123456789
            where             
                         where
where

The argument of each method is the total width of the resulting string.
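These three methods also accept an optional second argument, the fill character used for padding. If the width is smaller than the string, the string is returned unchanged. A minimal sketch:

```python
s = "where"

centered = s.center(11, "*")
right = s.rjust(11, ".")
left = s.ljust(11, "-")

print(centered)    # ***where***
print(right)       # ......where
print(left)        # where------

# A width shorter than the string does not cut anything off.
short = s.center(3)
print(short)       # where
```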

You can print a table nicely:

tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(13)
    for i in range(1, len(row)):
        tabular_string += str(row[i]).rjust(7)
    tabular_string += "\n" 
print tabular_string

First row         -2   -310
Second row         3      1
Third row       -321     11

`format` method¶

The format method is called on the format string, and its arguments are the values to substitute.

The numbers in the braces are the indices of the arguments.

'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}{0}'.format('X', 'Y', 'Z')

'X-Y-Z X, Y, Z, XXX'

The format marker "{ }" can carry optional formatting instructions after a colon: {number:optional}

optional	Meaning
d	decimal
b	binary
o	octal
x, X	hex, capital HEX
f, F	float
e, E	exponential form: something times 10 to some power
<	left justified
>	right justified
^	centered
c^	centered but with a character `'c'` as padding

print "01234 01234 01234 0123456789"
print '{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center')

01234 01234 01234 0123456789
0123   1234     | **center**

"int {0:d},  hex {0:x} {0:X},  oct {0:o},  bin {0:b}".format(42)

'int 42,  hex 2a 2A,  oct 52,  bin 101010'

"{0}, {0:e}, {0:f}, {0:8.4f}, {0:15.1f}".format(-12.345)

'-12.345, -1.234500e+01, -12.345000, -12.3450,           -12.3'

You can also name the parameters, which is more convenient than indices.

'The center is: ({x}, {y})'.format(x=3, y=5)

'The center is: (3, 5)'

x1 = 3; y1 = 4
print 'The center is: ({x}, {y})'.format(x=x1, y=y1)

The center is: (3, 4)
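Named parameters can also come from a dictionary, unpacked with `**`, and they can be mixed with indexed ones and with formatting instructions. A sketch, where the dictionary point and the label 'P' are made up for illustration:

```python
point = {"x": 3, "y": 4}

# ** unpacks the dictionary into the keyword arguments x=3, y=4.
text = 'The center is: ({x}, {y})'.format(**point)
print(text)

# Indexed and named parameters in one format string.
mixed = '{0} is at ({x:>3d}, {y:<3d})'.format('P', x=3, y=4)
print(mixed)
```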

table_string = ""
for row in tabular:
    table_string += "{0:_<13}".format(row[0])
    for i in range(1, len(row)):
        table_string += "{0:>7d}".format(row[i])
    table_string += "\n" 
print table_string

First row____     -2   -310
Second row___      3      1
Third row____   -321     11

Regular expressions (RegEx) in Python¶

The regular expression functions live in the re module, which is not loaded by default, so you have to import it. Put this at the beginning of your code.

import re

The findall function finds all matching substrings in a string:

pattern = r'[0-9]+'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.findall(pattern, string)

['1', '7']
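findall always returns a list of strings, so the matches have to be converted before doing arithmetic with them. A short sketch along those lines; the second pattern (a capital letter followed by lowercase letters) is an extra example, not taken from the text above:

```python
import re

string = "Once upon a time there was 1 little puppy and 7 dwarfs"

numbers = re.findall(r'[0-9]+', string)
print(numbers)                       # ['1', '7']

# The matches are strings; convert them to int before adding them up.
total = sum([int(n) for n in numbers])
print(total)                         # 8

capitalized = re.findall(r'[A-Z][a-z]+', string)
print(capitalized)                   # ['Once']

# No match gives an empty list, not an error.
nothing = re.findall(r'[0-9]+', "no digits here")
print(nothing)                       # []
```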

Since the backslash and other special characters have a meaning in a RegEx pattern, you have to be careful with them.

It is best to write the pattern as a so-called raw string. In a raw string one backslash really means one backslash, so you don't have to escape it.

Putting an r in front of the opening quote makes the string raw.

raw_string = r"aa\txx\s"
not_raw = "aa\txx\\"
print "Raw:     " + raw_string
print "Not raw: " + not_raw

Raw:     aa\txx\s
Not raw: aa	xx\
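The difference is easy to see from the lengths: in a raw string the backslash and the letter after it stay two separate characters. A minimal sketch:

```python
raw_string = r"\t"     # a backslash followed by the letter t
not_raw = "\t"         # one tab character

print(len(raw_string))   # 2
print(len(not_raw))      # 1

# The raw string equals the normal string with an escaped backslash.
same = (raw_string == "\\t")
print(same)              # True
```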

You can also replace every match of a pattern with the sub function.

pattern = r'[0-9]+'
substitute = r'12'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)

Once upon a time there was 12 little puppy and 12 dwarfs

You can group parts of the pattern with parentheses and refer to the groups by number in the substitution: \1 is the first group.

pattern = r'[0-9]+\s([a-z]+)'
substitute = r'12 \1'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)

Once upon a time there was 12 little puppy and 12 dwarfs
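With more than one group, the numbers follow the order of the opening parentheses, so \1 and \2 can be used to swap the matched parts. A sketch that puts both the number and the word in groups; the pattern is a variation of the one above, not the original:

```python
import re

pattern = r'([0-9]+)\s([a-z]+)'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"

# \2 is the word, \1 is the number: swap them in the result.
swapped = re.sub(pattern, r'\2 (\1)', string)
print(swapped)

# re.search gives access to the groups of the first match.
match = re.search(pattern, string)
print(match.group(1))    # 1
print(match.group(2))    # little
```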