I've taken a keen interest in some of the parallel implementations of Monte Carlo algorithms he's written about, most recently he wrote an implementation using Scala. You can check out the post here.

I decided to try my hand at writing an implementation of Darren's toy problem using python and NumPy. It also gave me the opportunity to do some parallel programming in python since most of the parallel programming I do is either in C or C++ using MPI. For a reminder of his toy problem, see the post linked above.

I'll cut straight to the chase and just post the code. I've commented it so it should be fairly easy to read.

```
import sys
import numpy as np
from multiprocessing import Pool
def integrate(its):
# I totally cheated and tweaked the number of chunks
# to get the fastest result
chunks = 10000
chunk_size = its / chunks
np.random.seed() # Each process needs a different seed!
sum = 0.0
for i in xrange(chunks): # For each chunk...
# ...do a vectorised Monte Carlo calculation
u = np.random.uniform(size=its/chunks)
sum += np.sum(np.exp(-u * u)) # Do the Monte Carlo
# We did 'its' total iterations in this process, so
# normalise the result and return it
return sum / float(its)
if __name__ == '__main__':
num_procs = int(sys.argv[1])
iters = 1000000000
its = iters / num_procs # Each process gets a share of the iterations
pool = Pool(processes=num_procs)
# Each process calls 'integrate' with 'its' iterations
result = pool.map(integrate, [its] * num_procs)
# pool.map returns a list of length 'num_procs', with
# element 'i' being the return value of 'integrate' from
# process 'i'
# Renormalise by the number of processors
print sum(result) / float(num_procs)
```

And here are the timings. I just used the linux `time`

command, so take these with a pinch of salt.

```
num_procs time
1 0m23.578s
2 0m12.111s
3 0m8.490s
4 0m7.898s
5 0m7.492s
6 0m6.989s
7 0m6.497s
8 0m6.353s
9 0m6.402s
10 0m6.459s
11 0m6.460s
12 0m6.525s
16 0m6.680s
32 0m7.333s
64 0m8.692s
```

Can you guess how many cores my laptop has?

Comparing with the Scala code yields some interesting things. The python + NumPy code above will run anywhere from two (`num_proc = 8`

) to four times (`num_proc = 1`

) faster than the Scala code. However, note that I did spend some time tweaking the chunksize. The use of vectorising some of the Monte Carlo calculations in NumPy also helps a lot.

**Note:** If you plan on running this, be careful passing in a number of processes that doesn't divide the number of iterations, since all of those divisions are integer divisions in python 2.x.

`vim`

on a `python`

script that I can save and reuse later for reproducibility. My workflow goes something like this:
- Write some basic plotting script and have the script save the plot as a
`pdf`

file. - Run the script.
- Open the plot and decide something needs tweaking (colours, fonts, labels, etc).
- Go back to
`vim`

and make the tweak. Save the script. - Re-run the script.
- Go to 3.

I do this until I am satisfied my plot will look nice in my paper/slides/poster. This is kind of a pain. It takes a lot of time because I'm a perfectionist when it comes to plotting.

Using the matplotlib GUI backends interactively from an IPython session means I can tweak my plot and get immediate visual feedback. This makes steps 3. and 4. above (sans `vim`

) much quicker. I can also save my plot from the GUI as whatever file type I like. The downside is once I save the plot from the GUI and close my IPython session, I have lost all the commands that created the plot. I've lost the reproducibility.

This is what a typical session might look like:

```
In [1]: import matplotlib
In [2]: matplotlib.use('qt4agg')
In [3]: import matplotlib.pyplot as plt
In [4]: plt.ion()
In [5]: fig = plt.figure()
In [6]: ax = fig.add_subplot(1, 1, 1)
In [7]: ax.plot([1, 2, 3])
Out[7]: [<matplotlib.lines.Line2D at 0x11178f5d0>]
In [8]: plt.draw() # Now look at the plot, decide tweaks
In [9]: # Some tweak here...
In [10]: plt.draw() # Look again. Decide it's ok.
In [11]: plt.savefig('high_quality.pdf')
In [12]: # End session
Do you really want to exit ([y]/n)?
```

To recover the reproducibility aspect, I'd like to be able to save everything I did to a file (a python script I can re-run later to reproduce the figure).

Turns out, there's an `IPython`

magic function that does this that I didn't know about. Here's what a typical `IPython`

session would look like using the `%save`

magic:

```
In [1]: import matplotlib
In [2]: matplotlib.use('qt4agg')
In [3]: import matplotlib.pyplot as plt
In [4]: plt.ion()
In [5]: fig = plt.figure()
In [6]: ax = fig.add_subplot(1, 1, 1)
In [7]: ax.plot([1, 2, 3])
Out[7]: [<matplotlib.lines.Line2D at 0x11178f5d0>]
In [8]: plt.draw() # Now look at the plot, decide tweaks
In [9]: # Some tweak here...
In [10]: plt.draw() # Look again. Decide it's ok.
In [11]: plt.savefig('high_quality.pdf')
In [12]: %save plotting_script.py 1-11
```

Now I have all the commands I ran in the `plotting_script.py`

. I can rerun this and produce exactly the same plot.

`numpy.cumsum`

is pretty useful. It returns an array of the running sum of elements from another array.
```
In [1]: a = np.array([1, 2, 3, 4])
In [2]: np.cumsum(a)
Out[2]: array([ 1, 3, 6, 10])
```

It's sometimes the case that I don't *quite* want to compute the cumulative sum, but instead some small variation. Say I wanted to compute this instead: `b[i] = b[i-1] + c * a[i]`

for each `i = 1, 2, 3, ..., n`

.

Well now I'm stuffed. I can't hack `cumsum`

to give me the right thing. Although, during my quest of trying, I discovered that `numpy.cumsum`

is identical to `numpy.add.acumulate`

. Then I read the documentation for `accumulate`

and understood that any binary `numpy.ufunc`

will implement the `accumulate`

method and 'do the right thing'. The function `numpy.multiply`

is one such example and gives rise to `numpy.cumprod`

.

So, now I needed to write my own `ufunc`

. Turns out it's kind of a disaster when you just want something quick and dirty.

`numpy.frompyfunc`

Well, it turns out I don't need to write any `c`

. I can define a normal python binary function:

```
In [3]: def f(x, y):
...: return x + 2.0 * y
...:
```

And I can convert it to a `ufunc`

like so:

```
In [4]: f_ufunc = np.frompyfunc(f, 2, 1) # f takes 2 args and returns one
```

Blast! Ye not work:

```
In [13]: a = np.array([1.0, 2.0, 3.0, 4.0])
In [14]: f_ufunc.accumulate(a)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-6afc67f05a6b> in <module>()
----> 1 f_ufunc.accumulate(a)
ValueError: could not find a matching type for f (vectorized).accumulate, requested type has type code 'd'
```

Turns out I didn't have to do any of the hard work. There's an open (as of `numpy`

version 1.8.0) ticket on github that gives the workaround:

```
In [15]: a = np.array([1.0, 2.0, 3.0, 4.0], dtype=object)
In [16]: f_ufunc.accumulate(a)
Out[16]: array([1.0, 5.0, 11.0, 19.0], dtype=object)
```

Victory.

]]>`f`

with respect to `x`

is as follows:
```
def derivative(f, x, h=1e-8):
return (f(x) - f(x + h)) / h
```

From the definition of the derivative,

$$f'(x) = \text{lim}_{h \to 0} \frac{f(x+h) - f(x)}{h},$$

the smaller you make `h`

, the more accurate your computed derivative gets. The numerical analysts reading this are probably already cringing; don't worry. For those that are not numerical analysts, it turns out that computing derivatives this way on a computer is not a good idea. Not only is this calculation merely an approximation to the derivative, roundoff error makes the computation incorrect when `h`

gets too small. Instead, we're going to take a different route, a route that can produce exact derivatives for certain classes of functions.

First we need a few mathematical definitions of some objects that I will be using. These aren't too complicated.

We define the set of dual numbers $\mathbb{D} := \{ a + b \varepsilon \, \, | \, \, a, b \in \mathbb{R} \}$, with the added property that $\varepsilon^2 = 0$. These look a bit like complex numbers but with an $\varepsilon$ instead of an $i$. In fact, if, instead of $\varepsilon^2 = 0$, we had that $\varepsilon^2 = -1$ then $\mathbb{D}$ would indeed be the set of complex numbers, $\mathbb{C}$.

We now need to define what it means to add and multiply two dual numbers together. One could write down many definitions of what it means to add two dual numbers, but we'll take the 'obvious' one:

$$(a + b \varepsilon) + (c + d \varepsilon) = (a + c) + (b + d) \varepsilon.$$

That is, we just add the like-terms of each dual number.

Note that $\varepsilon$ is *not* a real number. We make no attempt at understanding what numbers like $1 + 3 \varepsilon$ actually mean. The symbol $\varepsilon$ is just that, a symbol. It is a symbol that follows the rule that when you multiply it by itself, you get zero. With that in mind, we define what it means to multiply two dual numbers together like so,

$$ \begin{aligned} (a + b \varepsilon)(c + d \varepsilon) &= ac + ad \varepsilon + bc \varepsilon + bd \varepsilon^2 \\ &= ac + (ad + bc) \varepsilon, \end{aligned} $$

since $\varepsilon^2 = 0$.

You can also divide dual numbers. I won't go through it, but it's a good exercise. Given two dual numbers, $a + b \varepsilon$ and $c + d \varepsilon$, work out an expression for $\frac{a + b \varepsilon}{c + d \varepsilon}$. Furthermore, can you find any conditions on $a, b, c, d$ such that the division doesn't work? An obvious one is when $c = d = 0$, but can you find others?

So we have written down what it means to add or multiply two dual numbers together. Using this, we can implement a dual number class in python relatively easily:

```
class DualNumber(object):
"""
Dual numbers have the form z = a + be
"""
def __init__(self, a, b):
self.a = a
self.b = b
def __repr__(self):
return str(self.a) + " + " + str(self.b) + " * e"
def __add__(self, other):
return DualNumber(self.a + other.a, self.b + other.b)
```

The `__add__`

method is a python magic method that allows us to implement the `+`

operation between two objects of type `DualNumber`

:

```
In [1]: d1 = DualNumber(1, 2)
In [2]: print d1
1 + 2 * e
In [3]: d2 = DualNumber(3, 4)
In [4]: print d2
3 + 4 * e
In [5]: d1 + d2
Out[5]: 4 + 6 * e
```

Notice we can't add a `DualNumber`

to an `int`

yet:

```
In [6]: d1 + 4
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-3-2f7f1012ae70> in <module>()
----> 1 d1 + 4
/Users/damon/ad.py in __add__(self, other)
11
12 def __add__(self, other):
---> 13 return DualNumber(self.a + other.a, self.b + other.b)
14
15 def __sub__(self, other):
AttributeError: 'int' object has no attribute 'a'
```

Inside `__add__`

, the variable `other`

is of type `int`

, which does not have a member variable called `a`

. To fix this, we can add some logic:

```
def __add__(self, other):
if not isinstance(other, DualNumber):
new_other = DualNumber(other, 0)
else:
new_other = other
return DualNumber(self.a + new_other.a, self.b + new_other.b)
```

All I'm doing here is making sure that the thing on the right-hand side of the `+`

operation is an object of type `DualNumber`

and, if not, create one. Numbers like `int`

s and `float`

s can trivially by converted to objects of type `DualNumber`

, you just set the coefficient of $\epsilon$ to zero.

```
In [7]: d1 + 4
Out[7]: 5 + 2 * e
```

The `__mul__`

method works similarly:

```
def __mul__(self, other):
"""
Multiplies two dual numbers together
"""
if not isinstance(other, DualNumber):
new_other = DualNumber(other, 0)
else:
new_other = other
new_a = self.a * new_other.a
new_b = self.a * new_other.b + self.b * new_other.a
return DualNumber(new_a, new_b)
```

Now we can multiply two objects of type `DualNumber`

:

```
In [8]: d1 * d2
Out[8]: 3 + 10 * e
```

Most functions we want to differentiate aren't defined on the space of dual numbers. Mostly we see functions that take a real number as input and give another real number back, $f : \mathbb{R} \to \mathbb{R}$. However, we know how to add, multiply, and divide dual numbers together, so for functions that only involve multiplication, addition, and division, we can actually apply them to dual numbers fairly easily.

For example, take the polynomial $p(x) = 3x^2 + 2x + 1$. In python this looks like:

```
def p(x):
return 3 * x * x + 2 * x + 1
```

Since we know how to multiply and add dual numbers, we can plug in an object of type `DualNumber`

here and everything works as expected:

```
In [9]: p(d1)
Out[9]: 6 + 16 * e
```

So, if $f : \mathbb{R} \to \mathbb{R}$ only involves multiplication, addition, or division, then we can extend the function to the duals, $\hat{f} : \mathbb{D} \to \mathbb{D}$.

You might be getting bored by now, and I don't blame you. We haven't exactly done anything Earth-shattering here. How do we use the dual numbers to get derivatives? Well, what happens if we plug in a general dual number and compute the result by hand? Using the polynomial $p$ we defined above,

$$
\begin{aligned}
p(a + b \epsilon) &= 3(a + b \epsilon)(a + b \epsilon) + 2(a + b \epsilon) + 1 \\

&= 3a^2 + 6ab \epsilon + 3b^2 \epsilon^2 + 2a + 2b \epsilon + 1 \\
&= 3a^2 + 6ab \epsilon + 2a + 2b \epsilon + 1 \\
&= (3a^2 + 2a + 1) + (6ab + 2b) \epsilon \\
&= (3a^2 + 2a + 1) + b(6a + 2) \epsilon
\end{aligned}
$$

Let $b = 1$ and let $a = x$:

$$
\begin{aligned}
p(x + \epsilon) &= (3x^2 + 2x + 1) + (6x + 2) \epsilon \\

&= p(x) + p'(x) \epsilon
\end{aligned}
$$

The first term is just $p(x)$, but the second term is $p'(x) \epsilon$. So if we plug in $x + \epsilon$ to any given function $f$, and just look at the coefficient of $\epsilon$, we get the derivative:

```
def derivative(f):
def f_prime(x):
# Evaluate at x + e
f_x_plus_epsilon = f(DualNumber(x, 1))
# Get the coefficient of e
deriv = f_x_plus_epsilon.b
return deriv
return f_prime
```

Returning the derivative like this means we can keep a hold of the callable:

```
In [10]: p_prime = derivative(p)
In [11]: p(3) # Should print 3 * 3 * 3 + 2 * 3 + 1 = 34
Out[11]: 34
In [12]: p_prime(3) # Should print 6 * 3 + 2 = 20
Out[12]: 20
```

It's not unusual to ask for the second, third, or even higher derivatives. The method outlined above can be easily extended to suit these needs.

For example, if you want second derivatives you need a new set of numbers of the form $a + b \epsilon + c \epsilon^2$, but with a different condition: $\epsilon^3 = 0$.

Can you implement a class in python to compute second derivatives?

Automatic differentiation can only be used if the functions you are trying to differentiate only involve operations defined on the dual numbers. In our case, those operations are addition, multiplication and division, so polynomials (or quotients thereof) are a good fit here. Functions like $f(x) = \sin (x)$ cannot be differentiated in this way, since one needs to make sense of $\sin (x + \epsilon)$.

To differentiate more general functions than polynomials, one needs to have access to the Taylor series of that function. Functions that are $k$ times continuously differentiable can be approximated by a polynomial of degree $k$. Now we have a polynomial again and can use automatic differentiation.

Here's some complete code for you to throw rocks at:

```
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
class DualNumber(object):
"""
Dual numbers have the form z = a + be
"""
def __init__(self, a, b):
self.a = a
self.b = b
def __repr__(self):
return str(self.a) + " + " + str(self.b) + " * e"
def __add__(self, other):
if not isinstance(other, DualNumber):
new_other = DualNumber(other, 0)
else:
new_other = other
return DualNumber(self.a + new_other.a, self.b + new_other.b)
def __mul__(self, other):
if not isinstance(other, DualNumber):
new_other = DualNumber(other, 0)
else:
new_other = other
return DualNumber(self.a * new_other.a, self.a * new_other.b + self.b * new_other.a)
def derivative(f):
def f_prime(x):
f_x_plus_epsilon = f(DualNumber(x, 1))
deriv = f_x_plus_epsilon.b
return deriv
return f_prime
def f(x):
return x * x * x
fp = derivative(f)
x = np.linspace(-2, 2, num=1000)
y = f(x)
yp = [fp(xi) for xi in x] # Unfortunately we can't vectorise here
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, y, label=r'$f(x) = x^3$')
ax.plot(x, yp, label=r"$f' (x) = 3x^2$")
ax.legend()
plt.show()
```

Fin.

]]>In my last post, I gave a very brief introduction into matplotlib. This time I'd like to cover some of the more finicky details. Details that pertain to making figures look good in your journal publications.

We'll cover three main aspects of plot aesthetics. First, font size. Optimising font size is crucial to making your figures look the part in journals. Second, when the fonts in your plots don't match the fonts in the main body text, the flow of the document becomes jarring. We'll talk about how to fix that. Third, tick marks. Usually, figures need to be shrunk down to fit in a page nicely. Adjusting the number of tick marks can make your figures look a lot less busy. We'll also go over optimising the physical size of your graphics so that the effect of shrinking your figures down to fit in the page doesn't undo all the hard work you just put in to make your graphs look good.

To wrap up, I'm going to try package all these touches into a single file. This should make it easy to create reproducible figures.

The first thing to notice is that the font size will need some adjust for most journals. Usually, printed font size is around `10pt`

or `11pt`

. Setting the font size of the various text elements in your plots is, of course, a matter of personal preference. That said, I tend to make the font size in my figures one size smaller than that of the text in the main body of the article. Your journal's website should tell you what font size they use.

There are three main font elements in a matplotlib figure; the axes labels; tick labels; and the text inside a legend. The most common way to adjust these is to manipulate the `rcParams`

dictionary. In the following example python code, we set the font size of each of these elements to `9pt`

.

```
from matplotlib import rcParams
rcParams['axes.labelsize'] = 9
rcParams['xtick.labelsize'] = 9
rcParams['ytick.labelsize'] = 9
rcParams['legend.fontsize'] = 9
```

As long as you remember to do this before saving the figure, you should see the changes. I usually leave all the `rcParams`

related foo at the top of my script, just to keep it all in one place.

Probably the most important issue is the font the is used in the typesetting of your figure's text elements. By default, matplotlib does not use the ubiquitous Computer Modern Roman font. In fact, by default, it does not even use `LaTeX`

to typeset the text. To do so, see the following:

```
from matplotlib import rcParams
rcParams['font.family'] = 'serif'
rcParams['font.serif'] = ['Computer Modern Roman']
rcParams['text.usetex'] = True
```

A lot of journals use Computer Modern Roman, but some don't. Times is a commonly used font. Using a nondefault font is a little more tricky and I will address that later.

When figures get shrunk down to fit inside a column of a journal article, it might be worth adjusting the number of ticks there are on the y-axis. Matplotlib tries to be clever and gives you a 'nice-looking' number of ticks, but sometimes it doesn't cut the mustard. To fiddle around, you'll need to play with a `ticker`

object. We'll use a convenience ticker called `MaxNLocator`

:

```
from matplotlib.ticker import MaxNLocator
my_locator = MaxNLocator(6)
# Set up axes and plot some awesome science
ax.yaxis.set_major_locator(my_locator)
```

The number `6`

just says, "give me no more than 6 ticks, and give them to me

at nice locations". It doesn't actually say that, I just look at code so long

all day I feel like it starts talking to me. Coffee helps.

This is probably the most important of all the setting mentioned so far. Setting the physical figure dimensions appropriately will mean that your graphs aren't 'shrunk down' to fit into your papers. This means font sizes will stay true to their set values. This is also the most finicky setting.

We will optimise figure dimensions for the specific `LaTeX`

document that you are editing your paper in. Presumably you will be using some journal's custom style file. We need the width, in points, of where your figure will go. Add this to your `tex`

file to force the compiler to show you the text width:

```
\documentclass{some_journal}
\usepackage{some_packages,here}
% maybe some other preamble goes here
\begin{document}
% some awesome science goes here
\showthe\textwidth % <-- this tells you the textwidth
% some more awesome science
\end{document}
```

This number will probably be 300--400 pts. Lets's say 350pt. The following python code will convert this number to figure dimensions that look aesthetically pleasing:

```
WIDTH = 350.0 # the number latex spits out
FACTOR = 0.45 # the fraction of the width you'd like the figure to occupy
figwidthpt = WIDTH * FACTOR
inchesperpt = 1.0 / 72.27
golden_ratio = (np.sqrt(5) - 1.0) / 2.0 # because it looks good
figwidthin = fig_width_pt * inches_per_pt # figure width in inches
figheightin = fig_width_in * golden_ratio # figure height in inches
fig_dims = [fig_width_in, fig_height_in] # fig dims as a list
```

You can then either set the figure dimensions when creating the figure:

```
fig = plt.figure(figsize=fig_dims)
```

Or you can pre compute `fig_dims`

by hand (say it's `7.3`

wide by `4.2`

high) and set the corresponding entry in the `matplotlibrc`

file:

```
figure.figsize : 7.3, 4.2
```

Then, when the time comes to include figures into your document, remember to use the `FACTOR`

variable to have your plots scaled properly:

```
\documentclass{some_journal}
\usepackage{some_packages,here,graphicx}
% maybe some other preamble goes here
\begin{document}
% some awesome science goes here
\begin{figure}
\includegraphics[width=0.45\textwidth]{figure.pdf} % <-- FACTOR = 0.45
\end{figure}
% some more awesome science
\end{document}
```

Now your figures should look good. This is all personal preference, though. Feel free to play around and find something you like.

Well, those four issues are the ones whose defaults irk me the most, but it's kind of a pain to type out all that code for each plot I do. Even still, I might be writing several papers at once, each containing several plots. There's a better way to manage this, and it's to use the `matplotlibrc`

file.

As a quick introduction, the keys of the `rcParams`

dictionary correspond to entries in the `matplotlibrc`

file. For the issues discussed above, here are the mappings from the `rcParams`

dictionary to the entry in the `matplotlibrc`

file:

```
# rcParams dict
rcParams['axes.labelsize'] = 9
rcParams['xtick.labelsize'] = 9
rcParams['ytick.labelsize'] = 9
rcParams['legend.fontsize'] = 9
rcParams['font.family'] = 'serif'
rcParams['font.serif'] = ['Computer Modern Roman']
rcParams['text.usetex'] = True
rcParams['figure.figsize'] = 7.3, 4.2
# matplotlibrc file
axes.labelsize : 9.0 # fontsize of the x any y labels
xtick.labelsize : 9.0 # fontsize of the tick labels
ytick.labelsize : 9.0 # fontsize of the tick labels
legend.fontsize : 9.0
font.family : serif
font.serif : Computer Modern Roman
text.usetex : True # use latex for all text handling
figure.figsize : 7.3, 4.2
```

Usually, the `matplotlibrc`

file lives in `~/.matplotlib`

. If it doesn't, system-wide defaults are used. If it does, then that file determines the defaults to be used for all plots you create. The downside is that it's likely the case, especially if you're working on more than one paper, that you'd like a different set of defaults to apply to each one.

Enter `rc_file`

. This function takes a path to a file that is treated as a `matplotlibrc`

file. Now you can keep your default settings in one place for each journal. Moreover, if you submit any new papers to that journal, all your plots will look the same across multiple papers. Just copy the matplotlib defaults, from here, override with your defaults, and then stick the following at the top of your plotter:

```
from matplotlib import rc_file
rc_file('/path/to/my/rc/file.rc') # <-- the file containing your settings
import matplotlib.pyplot as plt
fig = plt.figure()
# plot some awesome science
fig.tight_layout(pad=0.1) # Make the figure use all available whitespace
fig.savefig('awesome_science.pdf')
```

Notice the use of `tight_layout`

, which makes the figure use all the available whitespace.

Now all your plots will look the same, every time, and for every journal. Your plots are now reproducible.

]]>I want to write about something that I care about. I care about software and I care about science. More importantly, I care about how these two interact and how software can help the scientific community. Presentation of science to the public in an understandable fashion is difficult. It's always hard to gauge at what level a scientist should pitch their presentation, and that's where pictures come in handy.

Most people abosrb information best visually. Auditory and kinematic learners also exist, but most people are visual learners. To that end, I decided to write about matplotlib, a plotting library written in Python. This should be fairly introductory.

For some reason, installing python packages is notoriously hard if you're not exposed to the various different approaches already out there. I'm not going to go through all of them. Instead, I recommend you install matplotlib through your operating system's package manager. If you're on Ubuntu, it's `aptitude`

. If you're on RHEL, it's `yum`

. If you're on OS X, you don't have a package manager by default and you should install one. I recommend macports, but others speak volumes about homebrew.

Personally, I like to install software from source by hand. If you're scared of doing it you should give it a try. It's super educational. I will go through the installation from source procedure on a Mac in another blog post, since I don't want to bloat this one. In what remains, I'll assume you have it installed.

To check you have installed matplotlib correctly, you can execute the following

in a terminal:

```
$ python -c "import matplotlib"
$ echo $?
0
```

The command `echo $?`

just prints out the exit status of the previous command. If you get a `0`

then the previous command was successful. If you get a `1`

, or anything nonzero, then matplotlib is not installed correctly.

Now that matplotlib is installed correctly, you can create your first plot.

From now on, I'll be using the IPython prompt, but you don't need that to along. All the commands are the same and will work in the regular python prompt.

First, we boot up the `ipython`

console:

```
$ ipython
Python 2.7.3 (default, Nov 29 2012, 11:01:29)
Type "copyright", "credits" or "license" for more information.
IPython 0.13.2 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]:
```

Now we import `pyplot`

, a set of convenience methods that interface well with matplotlib's internals:

```
In [1]: import matplotlib.pyplot as plt
```

Next, we create a `Figure`

object:

```
In [2]: fig = plt.figure()
```

The `Figure`

is the object that holds the axes, which we create now:

```
In [3]: ax = fig.add_subplot(1, 1, 1)
```

The `Axes`

[1] object is where most of the magic happens. This is the object you will interface with the most to do all of your plotting. Here is a simple plotting command that creates a line plot:

```
In [4]: ax.plot([1, 2, 3], [1, 4, 9])
Out[4]: [<matplotlib.lines.Line2D at 0x10860ad50>]
```

The `plot`

function takes a list of `x`

-coordinates and a list of

`y`

-coordinates and returns a list of objects called `Line2D`

objects. A `Line2D`

object is a matplotlib object that represents the lines in the figure that join the coordinates together. We will explore `Line2D`

objects, and other objects, in a different post. For now, just take it for granted that `ax.plot`

connects the passed coordinates with lines to create a line plot. We can save the figure with the following:

```
In [5]: fig.savefig('plot.pdf')
```

This will save the figure in `pdf`

format. Open it and take a look. It should look a little like this:

There are lots of file types that matplotlib supports. The ones I use most commonly are `pdf`

and `png`

. In fact, the image above is an `svg`

file. It is common in the scientific community to produce scalable vector graphics, and matplotlib allows this. It also supports `ps`

and `eps`

. For a full list of supported file types see here.

[1] It's actually an `AxesSubplot`

object, not an `Axes`

object. An `AxesSubplot`

is just an `Axes`

object with some extra functions to allow manipulation of its position within a `Figure`

. The reason for this is that there may be more than one set of axes in a figure.

You can stop here, or you can follow along with a more complicated line plot using `numpy`

, a high performance python library for dealing with array objects. Carrying on from within the same ipython session:

```
In [6]: ax.cla()
```

This clears the axes of the old plot ready to plot something new.

```
In [7]: import numpy as np
```

Here we import `numpy`

so we can use it below:

```
In [8]: x = np.linspace(0, 5, num=1000, endpoint=True)
```

This creates an array, `x`

, of 1000 equally spaced points between `0`

and `1`

, inclusive.

```
In [9]: y = x * x * np.sin(2.0 * np.pi * x)
```

This line takes advantage of some of `numpy`

's machinery. The multiplication of `numpy`

arrays is done component-wise.

```
In [10]: ax.plot(x, y)
Out[10]: [<matplotlib.lines.Line2D at 0x108a728d0>]
```

Again, we pass in the `x`

- and `y`

-coordinates to `plot`

. This draws lines between the passed coordinates. Go ahead and save your creation. It should look something like this: