# About strings

## Special characters

Some useful special characters:

In [None]:
new_line = "\n"         # line feed: starts a new line
horizontal_tab = "\t"   # tab; how wide it looks depends on the terminal or editor
backslash = "\\"        # a single backslash
single_quote = "\'"     # '
double_quote = "\""     # "
############################# these come from the typewriter era
bell = "\a"             # alert; this actually beeps on some computers (rarely supported now)
carriage_return = "\r"  # jumps back to the beginning of the same line
vertical_tab = "\v"     # vertical tab (rarely used today)

In [None]:
print "Tabular:\n\nFrom\there\n-------------------------\nTo \tthere"
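Each escape sequence stands for a single character, even though it takes two characters to type. The following is a minimal sketch to check this; the variable names are only illustrative, and the single-argument `print(...)` form is used because it runs under both Python 2 and Python 3.

```python
# Each escape sequence is ONE character in the resulting string.
tab = "\t"
backslash = "\\"
tab_length = len(tab)              # 1
backslash_length = len(backslash)  # 1

# repr() gives the escaped, source-code-like form of a string,
# which is handy for spotting invisible characters.
escaped_view = repr("a\tb")        # the text 'a\tb', quotes included
print(escaped_view)
```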


## String operations

Substring:

In [None]:
s = "Firstsecondthird"
print s[:5]
print s[5:11]
print s[11:]
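The slice `s[a:b]` takes the characters from index `a` up to, but not including, index `b`; a missing bound means the start or the end of the string. A short sketch of the same idea, extended with negative indices, which count from the end (the variable names are made up for this example):

```python
s = "Firstsecondthird"

first = s[:5]        # 'First'  (indices 0..4)
second = s[5:11]     # 'second' (indices 5..10)
last_five = s[-5:]   # 'third'  (the last five characters)
one_char = s[0]      # 'F'      (a single index gives a one-character string)

# Adjacent slices always add back up to the original string.
rebuilt = s[:5] + s[5:11] + s[11:]
print(rebuilt == s)
```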

Concatenation (string to string):

In [None]:
s = "puppy"
print "5 " + s

In [None]:
print 5 + s   # raises TypeError: a number cannot be added to a string

In [None]:
s = "puppy"
quantity = 5
print str(quantity) + " " + s

The <code style="color:green">str()</code> function converts a value (here a number) to a string, so that it can be concatenated with other strings.
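Adding a number to a string directly raises a `TypeError`, which is why the explicit conversion is needed. Here is a small sketch that catches the error and then converts explicitly; the helper name `describe` is hypothetical and exists only for this example.

```python
def describe(quantity, noun):
    # str() turns the number into text, so + means concatenation.
    return str(quantity) + " " + noun

try:
    broken = 5 + "puppy"            # int + str is not defined
except TypeError:
    broken = None                   # execution ends up here

fixed = describe(5, "puppy")        # '5 puppy'
also_fixed = describe(2.5, "cups")  # '2.5 cups'
print(fixed)
```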

## Useful string methods
### Upper/lower case

In [None]:
s = "some LOWER-, some UPPERcase letters"

The following method makes the first character uppercase and every other character lowercase:

In [None]:
s.capitalize()

All capitals:

In [None]:
s.upper()

All lowercase:

In [None]:
s.lower()

The following method checks whether the string contains only alphabetic characters (commas, hyphens and whitespace are not alphabetic):

In [None]:
s.isalpha()
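`isalpha` has a few sibling methods that test for other character classes. A brief sketch with made-up sample strings, showing which inputs pass which test:

```python
only_letters = "puppy".isalpha()        # True
with_digit = "puppy7".isalpha()         # False: '7' is not a letter
letters_or_digits = "puppy7".isalnum()  # True: letters and digits only
only_digits = "2024".isdigit()          # True
only_space = " \t\n".isspace()          # True
empty = "".isalpha()                    # False: an empty string never passes
print(only_letters)
```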

### Substring and splitting

You can check whether a string contains another one, and also find the position (index) where it starts:

In [None]:
'case' in s

In [None]:
s.find('case')
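When the substring is absent, `find` returns `-1` instead of raising an error, which is easy to misuse in a condition. A minimal sketch of the return values (variable names are illustrative):

```python
s = "some LOWER-, some UPPERcase letters"

position = s.find('case')     # 23: index where the first match starts
missing = s.find('dog')       # -1: find() signals "not found" with -1
at_start = s.find('some')     # 0: a match at the very beginning

# -1 is truthy and 0 is falsy, so test membership with `in`, not with find().
contains_some = 'some' in s   # True

# Slicing from the found position recovers the matched text.
matched = s[position:position + len('case')]
print(matched)
```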

You can split a string into a list of strings with the <code style="color:green">split</code> method.
By default the string is split at whitespace (space, tab, newline, ...).

In [None]:
s.split()   # split at whitespace

In [None]:
s = "first second\tthird\nfourth"
s.split()

In [None]:
s = "first, second, third, fourth"
s.split(", ")            # split by a given separator (a comma and a space)
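Splitting has an inverse, `join`, and the default and explicit separators treat repeated separators differently. A short sketch of both points, using sample strings that are assumptions for illustration:

```python
s = "first, second, third, fourth"
parts = s.split(", ")            # ['first', 'second', 'third', 'fourth']
rejoined = " | ".join(parts)     # 'first | second | third | fourth'

# With no argument, a run of whitespace counts as one separator ...
collapsed = "a  b".split()       # ['a', 'b']
# ... but an explicit separator keeps the empty pieces between repeats.
kept_empty = "a,,b".split(",")   # ['a', '', 'b']
print(rejoined)
```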

### Erase from the sides: <code style="color:green">strip</code>

The following method removes the whitespace characters from the beginning and the end of the string:

In [None]:
s = '  \t white spaces \t \n\n    '
s.strip()

With an argument, it strips the given characters instead of whitespace:

In [None]:
s = "...once upon a time,"
print s.strip(".,")   # strips both sides
print s.rstrip(".,")  # strips from the end
print s.lstrip(".,")  # strips from the beginning
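The argument of `strip` is treated as a set of characters, not as a prefix or suffix, which can give surprising results. A small sketch of this pitfall; the file name is a made-up example:

```python
filename = "report.txt"
# '.', 't' and 'x' are removed one by one from the end
# until a character outside the set is reached.
surprising = filename.rstrip(".txt")   # 'repor': the final 't' of 'report' goes too

# Characters in the middle are never touched, only the two ends.
inner_kept = "..a.b..".strip(".")      # 'a.b'
print(surprising)
```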

## Formatting

### Alignment

In [None]:
s = "where"

In [None]:
print '0123456789'*3
print s.center(30)
print s.rjust(30)
print s.ljust(30)

The argument of each method gives the total width of the resulting string; the original text is padded with spaces to reach it.
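The alignment methods also accept an optional second argument, the padding character. A brief sketch, with arbitrary widths and fill characters chosen for illustration:

```python
s = "where"

starred = s.center(11, '*')          # '***where***'
dotted_right = s.rjust(8, '.')       # '...where'
dotted_left = s.ljust(8, '.')        # 'where...'

# A width smaller than the string leaves it unchanged (nothing is cut off).
too_narrow = s.center(3)             # 'where'

# Padding numbers with zeros is a common use.
zero_padded = str(42).rjust(5, '0')  # '00042'
print(starred)
```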

You can print a table nicely:

In [None]:
tabular = [["First row", -2, -310], ["Second row", 3, 1], ["Third row",-321, 11]]
tabular_string = ""
for row in tabular:
    tabular_string += row[0].ljust(13)
    for i in range(1, len(row)):
        tabular_string += str(row[i]).rjust(7)
    tabular_string += "\n" 
print tabular_string

### <code style="color:green">format</code> method
The string you call the method on is the format string, and the arguments are the values to substitute into it.

The numbers in the braces select the arguments by position, starting from 0.

In [None]:
'{0}-{1}-{2} {0}, {1}, {2}, {0}{0}{0}'.format('X', 'Y', 'Z')
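The indices can be left out when the arguments are used once and in order, and doubled braces produce literal braces. A minimal sketch of these variations (Python 2.7 or later is assumed for the empty `{}` form):

```python
in_order = '{}-{}'.format('X', 'Y')       # 'X-Y': empty braces count 0, 1, ...
swapped = '{1}{0}'.format('a', 'b')       # 'ba': indices pick the order
repeated = '{0}{0}{0}'.format('ab')       # 'ababab': one argument, used three times
literal_braces = '{{{0}}}'.format('x')    # '{x}': {{ and }} are literal braces
print(in_order)
```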

The format marker `"{ }"` can have optional formatting instructions: `{number:optional}`

| optional | Meaning |
|:----|:---------|
| d | decimal integer |
| b | binary |
| o | octal |
| x, X | hexadecimal, with lowercase or uppercase digits |
| f, F | fixed-point float |
| e, E | exponential form: a number times 10 to some power |
| < | left justified |
| > | right justified |
| ^ | centered |
| c^ | centered, padded with the character `'c'` |

In [None]:
print "01234 01234 01234 0123456789"
print '{0:5} {1:5d} {2:>5} {3:*^10}'.format('0123', 1234, '|', 'center')
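A number after the colon sets the minimum width, and for floats a `.N` part sets the number of decimals. The sketch below shows the default alignment (text to the left, numbers to the right) and a few width and precision combinations; the sample values are arbitrary.

```python
text_default = "{0:6}|".format('ab')       # 'ab    |': strings align left by default
number_default = "{0:6}|".format(42)       # '    42|': numbers align right by default
forced_left = "{0:<6d}|".format(42)        # '42    |': '<' overrides the default

two_decimals = "{0:.2f}".format(3.14159)   # '3.14': precision only
padded = "{0:8.4f}".format(-12.345)        # '-12.3450': width 8, 4 decimals
wide = "{0:15.1f}".format(-12.345)         # '-12.3' right-aligned in 15 characters
print(padded)
```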

In [None]:
"int {0:d},  hex {0:x} {0:X},  oct {0:o},  bin {0:b}".format(42)

In [None]:
"{0}, {0:e}, {0:f}, {0:8.4f}, {0:15.1f}".format(-12.345)

You can also name the parameters, which is more convenient than using indices.

In [None]:
'The center is: ({x}, {y})'.format(x=3, y=5)

In [None]:
x1 = 3; y1 = 4
print 'The center is: ({x}, {y})'.format(x=x1, y=y1)
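When the values already live in a dictionary, it can be unpacked into named parameters with `**`. A short sketch; the dictionary `point` is a hypothetical example, not part of the cells above.

```python
point = {'x': 3, 'y': 5}

# ** passes each key of the dictionary as a named parameter.
from_dict = 'The center is: ({x}, {y})'.format(**point)

# Positional and named parameters can be mixed in one format string.
mixed = '{0} is at ({x}, {y})'.format('The center', x=3, y=5)
print(from_dict)
```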

In [None]:
table_string = ""
for row in tabular:
    table_string += "{0:_<13}".format(row[0])
    for i in range(1, len(row)):
        table_string += "{0:>7d}".format(row[i])
    table_string += "\n" 
print table_string

# Regular expressions (RegEx) in Python

You have to import the <code style="color:green">re</code> module first, because its functions are not built in.
Put this at the beginning of your code.

In [None]:
import re

The <code style="color:green">findall</code> function finds all matching substrings in a string:

In [None]:
pattern = r'[0-9]+'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.findall(pattern, string)
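`findall` always returns strings, so numeric matches have to be converted before doing arithmetic. If the pattern contains one group in parentheses, `findall` returns only the grouped part of each match. A minimal sketch of both behaviours:

```python
import re

string = "Once upon a time there was 1 little puppy and 7 dwarfs"

matches = re.findall(r'[0-9]+', string)        # ['1', '7']: strings, not numbers
numbers = [int(m) for m in matches]            # [1, 7]
total = sum(numbers)                           # 8

# With one group, only the part inside the parentheses is returned.
words_after_numbers = re.findall(r'[0-9]+\s([a-z]+)', string)  # ['little', 'dwarfs']
print(total)
```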

Backslashes and other special characters are common in RegEx patterns, so you have to be careful with them.

It is best to write the pattern as a so-called __raw__ string. In a raw string one backslash really means one backslash,
so you don't have to escape it.

A string is raw if you put an <code style="color:green">r</code> in front of it.

In [None]:
raw_string = r"aa\txx\s"
not_raw = "aa\txx\\"
print "Raw:     " + raw_string
print "Not raw: " + not_raw
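The difference is easy to verify by comparing lengths: in a raw string the backslash and the letter after it stay two separate characters. A short sketch (variable names are illustrative):

```python
raw = r"\t"        # a backslash followed by the letter t
normal = "\t"      # a single tab character

raw_length = len(raw)         # 2
normal_length = len(normal)   # 1

# A raw pattern equals the normal string with every backslash doubled.
same_pattern = (r"\d+" == "\\d+")   # True
print(raw)
```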

You can also replace every match of a pattern with the <code style="color:green">sub</code> function.

In [None]:
pattern = r'[0-9]+'
substitute = r'12'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)
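`re.sub` replaces every match by default. Its optional `count` argument limits the number of replacements, and the replacement may also be a function that receives each match object. A minimal sketch; the helper name `double` is made up for this example.

```python
import re

string = "Once upon a time there was 1 little puppy and 7 dwarfs"

# count=1 replaces only the first match.
first_only = re.sub(r'[0-9]+', '12', string, count=1)

def double(match):
    # match.group(0) is the whole matched text, e.g. '7'
    return str(int(match.group(0)) * 2)

# A function as replacement is called once per match.
doubled = re.sub(r'[0-9]+', double, string)
print(doubled)
```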

You can group parts of the pattern with parentheses and refer to the groups by number in the substitution: <code style="color:green">\1</code> is the first group.

In [None]:
pattern = r'[0-9]+\s([a-z]+)'
substitute = r'12 \1'
string = "Once upon a time there was 1 little puppy and 7 dwarfs"
print re.sub(pattern, substitute, string)
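With more than one group, the references `\1`, `\2`, ... can reorder the matched parts. The same groups are also available on the match object returned by `re.search`. A short sketch, assuming a two-group variant of the pattern above:

```python
import re

string = "Once upon a time there was 1 little puppy and 7 dwarfs"
pattern = r'([0-9]+)\s([a-z]+)'   # group 1: the number, group 2: the next word

# Swap the two groups; the parentheses in the replacement are literal text.
swapped = re.sub(pattern, r'\2 (\1)', string)

# re.search returns the first match; group(0) is the whole match.
first_match = re.search(pattern, string)
whole = first_match.group(0)      # '1 little'
number = first_match.group(1)     # '1'
word = first_match.group(2)       # 'little'
print(swapped)
```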