Hoyt Koepke

Python Libraries All Researchers Should Know About

One of my frequent experiences with Python is coding something up and then finding that someone else has already written it. I guess it’s one of the disadvantages of the world of open source. In light of that, I’ve accumulated a list of a few libraries that have significantly affected the way I code and greatly increased my productivity. These are, of course, alongside my own Code and Open Source Software.

Cython

Cython is one of the best tools for writing highly optimized mathematical code. It translates a Python module into equivalent C code. By itself, this doesn’t give you that much of a speedup, as the generated code still primarily drives the Python interpreter through the Python/C API. However, if you declare variables as statically typed (dynamic typing is what forces the code to be run through the interpreter), those parts are translated into plain C and compiled directly into machine code. Thus the difference between your quickly scripted Python code and a highly optimized executable is just a few type declarations.

For example, suppose you have the following code:

def foo(A):
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i,j] += i*j

where A is a two-dimensional NumPy array. This code uses interpreted loops and thus runs slower than you’d like. However, add type information and run it through Cython:

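from numpy cimport ndarray  # compile-time access to NumPy's array type

# "float" here is a C float, so A must be a float32 array
# (declare the buffer as "double" for NumPy's default float64).
def bar(ndarray[float, ndim=2] A):
    # C-typed indices let Cython turn the loops into plain C loops
    cdef unsigned int i, j

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i,j] += i*j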

Cython takes this code and translates it into C code. Any parts that require python are run through python’s C API. The rest is turned into C code that handles array indexing, looping, python exception management, etc.

For more information on using cython and numpy arrays, see the cython/numpy tutorial.

A general rule of thumb is that your program spends 80% of its time running 20% of the code. Thus a good strategy for efficient coding is to write everything, profile your code, and optimize only the parts that need it. Python’s profilers are great, and Cython allows you to do the last step with minimal effort.
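As a rough illustration of the "write, profile, then optimize" strategy, here is a minimal sketch using the standard library's cProfile and pstats modules. The functions slow_sum_of_products, cheap_setup, and main are hypothetical stand-ins for your own code, not anything from a real project.

```python
import cProfile
import io
import pstats


def slow_sum_of_products(n):
    """Deliberately slow pure-Python double loop (the hypothetical hot spot)."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


def cheap_setup():
    """A cheap step that should barely register in the profile."""
    return list(range(100))


def main():
    cheap_setup()
    return slow_sum_of_products(200)


# Profile a single call to main().
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and print the most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

In a report like this, the double loop should dominate the cumulative time, which marks it as the natural candidate for a typed Cython rewrite while the rest of the program stays in plain Python.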

NumPy/SciPy

NumPy provides a very flexible nd-array object with extensive slicing, indexing, masking, and linear algebra functionality. Numerous functions such as cos and exp work efficiently on an entire array at once. A decent reference guide is available online.
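A short sketch of the features mentioned above (slicing, masking, whole-array functions, and linear algebra) might look like the following. The array contents and variable names are made up purely for illustration.

```python
import numpy as np

# A small 2-D array to work with: 3 rows, 4 columns, values 0..11.
A = np.arange(12, dtype=float).reshape(3, 4)

# Slicing: every row, columns 1 and 2 (a view onto A, not a copy).
middle = A[:, 1:3]

# Masking: select the elements greater than 5 with a boolean array.
large = A[A > 5]

# Elementwise functions operate on the whole array with no Python loop.
x = np.linspace(0.0, np.pi, 5)
y = np.cos(x) * np.exp(-x)

# Linear algebra: solve M @ v = b for v.
M = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
v = np.linalg.solve(M, b)
```

None of these operations needs an explicit Python loop, which is where most of the speed comes from.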

SciPy extends NumPy with routines for optimization, numerical integration and differentiation, statistics, interpolation, clustering, spatial computation, image processing, Fourier transforms, and signal processing. It also includes support for reading and writing MATLAB files and some audio formats (scipy.io). The SciPy reference guide details the SciPy API.
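To give a feel for a few of these routines, here is a minimal sketch that integrates a function numerically, minimizes a simple one-dimensional function, and round-trips an array through the MATLAB file format. The objective function and the variable name "A" stored in the file are illustrative assumptions.

```python
from io import BytesIO

import numpy as np
from scipy import integrate, optimize
from scipy import io as sio

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)


# Optimization: a simple quadratic whose minimum is at x = 3.
def objective(x):
    return (x - 3.0) ** 2 + 1.0


solution = optimize.minimize_scalar(objective)

# scipy.io: write an array in MATLAB format and read it back.
# An in-memory buffer stands in for a real .mat file here.
original = np.arange(6, dtype=float).reshape(2, 3)
buffer = BytesIO()
sio.savemat(buffer, {"A": original})
buffer.seek(0)
loaded = sio.loadmat(buffer)["A"]
```

Here solution.x holds the location of the minimum and solution.fun the function value there.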

SQLAlchemy

SQLAlchemy makes leveraging the power of a database incredibly simple and intuitive. It is essentially a wrapper around an SQL database. You build queries using intuitive operators, then it generates the SQL, queries the database, and returns an iterator over the results. If you use SQLite – already embedded in Python’s standard library – using a database is painless. And, if you tell SQLite to build its database in memory, you’ve got a really powerful and super fast data structure.
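The workflow described above could be sketched roughly as follows, assuming SQLAlchemy 1.4 or newer (older releases use a slightly different select() call). The runs table and its columns are hypothetical and exist only for this example.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, select,
)

# "sqlite:///:memory:" asks SQLite for a database that lives only in RAM.
engine = create_engine("sqlite:///:memory:")

# Hypothetical table used only for this illustration.
metadata = MetaData()
runs = Table(
    "runs",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("method", String),
    Column("iterations", Integer),
)
metadata.create_all(engine)

# Build the query with Python operators; no SQL is written by hand.
query = (
    select(runs.c.method, runs.c.iterations)
    .where(runs.c.iterations < 100)
    .order_by(runs.c.iterations)
)
sql_text = str(query)  # the SQL that SQLAlchemy generates for this query

with engine.begin() as conn:
    conn.execute(
        runs.insert(),
        [
            {"method": "gradient", "iterations": 120},
            {"method": "newton", "iterations": 8},
            {"method": "bfgs", "iterations": 25},
        ],
    )
    # Executing the query returns an iterable of result rows.
    fast_methods = [row.method for row in conn.execute(query)]
```

Printing sql_text shows the generated SELECT statement, which is a handy way to check what the operators translate into.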

PyTables

PyTables is a great way of managing large amounts of data in a hierarchical fashion. It optimizes resources, automatically transferring data between disk and memory as needed. It also supports on-the-fly (de)compression and works naturally with numpy arrays.
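A minimal sketch of that hierarchical, compressed storage might look like this. The file name, the experiment group, and the samples array are illustrative assumptions, and the compression settings are just one reasonable choice.

```python
import os
import tempfile

import numpy as np
import tables

# Some data to store: 1000 rows of 10 values each.
data = np.arange(10000, dtype=np.float64).reshape(1000, 10)

# zlib compression, applied transparently on write and undone on read.
filters = tables.Filters(complevel=5, complib="zlib")

path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write: create a group (like a directory) and a compressed array inside it.
with tables.open_file(path, mode="w") as h5:
    group = h5.create_group("/", "experiment", "Hypothetical experiment")
    h5.create_carray(group, "samples", obj=data, filters=filters)

# Read: navigate the hierarchy by name and pull only a slice into memory.
with tables.open_file(path, mode="r") as h5:
    samples = h5.root.experiment.samples
    stored_shape = samples.shape
    chunk = samples[100:110, :]  # returned as an ordinary NumPy array
```

Only the requested slice is loaded into memory as a NumPy array, so the same pattern should carry over to arrays much larger than RAM.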

Qt / PyQt

For writing user interfaces in C++, it is very difficult to beat Qt (free under the GPL). You design your GUI in a great editor, which then generates either code for your program or configuration files to load at runtime. The code is cross-platform over Linux, Mac OS X, and Windows. If you need to develop a GUI, and don’t mind the GPL (I don’t), this is, in my experience, the easiest way to do so.

PyQt brings the ease of Qt to Python. And I do mean ease – I’ve designed a reasonably complex GUI-driven application that required me to write only a few dozen lines of GUI code; the rest was done with Qt Designer.