pivot tables with sqlalchemy

If your database doesn’t support pivots, here is a quick technique to get pivot columns with sqlalchemy

import operator
from sqlalchemy.sql import case, func, select

def pivot_report(report, pivot_on=None, pivot_columns=None, pivot_func=func.sum,
                    non_pivot_columns=None, group_by=None):
    """ produce a pivot on a select

    if we have a report: 

        id, type, count
        1, white, 10
        1, black, 20
        2, white, 12
        2, black, 20

    and we want 

        id, black count, white count
        1, 20, 10
        2, 20, 12

    pass in type as the pivot_on, and [count] as pivot columns  

    """

    # find all possible values of the pivot
    pivot_values = map(
        operator.itemgetter(0),
        select([pivot_on], from_obj=[report]).distinct().execute()
    )

    # build the new pivot columns
    new_columns = [
        pivot_func(case([(pivot_on == value, column)])).label("%s %s" % (value, column))
        for value in pivot_values
        for column in pivot_columns
    ]

    return select(
        non_pivot_columns + new_columns,
        from_obj=[report],
        group_by=group_by
    )

# example code
from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy import create_engine

metadata = MetaData()
example = Table('example', metadata,
    Column('id', Integer),
    Column('type', String),
    Column('count', Integer),
)

engine = create_engine('sqlite:///:memory:')
metadata.bind = engine
example.create()
example.insert().execute(
    dict(id=1, type='white', count=10),
    dict(id=1, type='black', count=20),
    dict(id=2, type='white', count=30),
    dict(id=2, type='black', count=40),
)

report = example.select()

# now build the pivot
report = pivot_report(
    report,
    pivot_on=report.c.type,
    pivot_columns=[report.c.count],
    non_pivot_columns=[report.c.id],
    group_by=[report.c.id])

for r in report.execute():
    print r.items()

object ceremony, dynamic languages, JSON and algebraic data types

The reason most people end up using a dynamic language is to avoid the boilerplate associated with object creation. You know, typing “FileWriter fout = new FileWriter(”fred.txt”);” gets boring quickly. I think this is a good enough reason to move to another language on its own. This boilerplate is also sometimes called ceremony, and I have come to realize that far from being low ceremony, dynamic languages actually revolve around ceremony.

Think of the hash of hashes, list of hashes approach that you often find in a perl, python or ruby program. These are so useful, that they have been codified as JSON – which can simply be evaled to return your data in several languages. Ad-hoc data structures like this have a great appeal when hacking something in python, yet you quickly get to a pain point when using them when the data doesn’t look exactly as you would expect and you need to handle exceptions and edge cases.

So why is the hash of hashes approach so tempting? Because it avoids the ceremony around object creation. Things like Python’s “__init__” and Perl’s “bless” are ceremony and are necessarily ceremony because all a type in a dynamic language is just data with some ceremony. Clearly, perl has got a perfect name for the ceremony in “bless”.

The eureka moment comes when you use a typed language that is low ceremony, such as Haskell. Algebraic data types give you the freedom to create complex data structures without ceremony, which you can then process without the hassle involved in unpicking a big blob of JSON, which will typically need lots of switches. Instead, you can pattern match on the type.

So if you went to a dynamic language to avoid the ceremony, you may well be moving in the wrong direction.

cogent: the unsung hero of bioinformatics and python

I recently started using cogent – the COmparative GENomics Toolkit and discovered that it is an excellent piece of kit. A google search for ‘python ensembl‘ doesn’t even show it at all, yet it definitely has the best bindings for ensembl avaiable in python – they’re based on sqlalchemy making it easy enough to pull of any query. Have a look at the full list of examples.

Installing python bioinformatics tools with virtualenv and pip

Python seems to have developed a decent set of tools for quickly building development environments. I want to store my notes on how to get a good environment for bioinformatics set up quickly.

First of all, if you haven’t already, install virtualenv and pip. Both are easy installable. Now install virtualenv wrapper.

Now we are going to setup a bioinformatics environment with both biopython and pygr installed so that you can hack on them. Firstly create a new virtualenv, passing the no site packages flag to keep this clean:

james@flapjack:~/Documents/virtualenvs$ mkvirtualenv --no-site-packages bio
New python executable in bio/bin/python
Installing setuptools............done.
(bio)james@flapjack:~/Documents/virtualenvs$ cdvirtualenv
(bio)james@flapjack:~/Documents/virtualenvs/bio$

Now, to install biopython we first use pip to install numpy:

(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install numpy
Downloading/unpacking numpy
...

Important to remember the ‘-E’ flag which tells pip to use the virtualenv we are in (this should be added to virtualenv_wrapper IMHO). Now we can install biopython from our github fork, using the ‘-e’ flag to keep it editable (i.e we are hacking on it).

(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -e git://github.com/jamescasbon/biopython.git#egg=biopython
Obtaining biopython from git+git://github.com/jamescasbon/biopython.git#egg=biopython
Cloning git://github.com/jamescasbon/biopython.git to ./src/biopython
remote: Counting objects: 22719, done.
...

Next up, we want pygr so we need pyrex to build the c files:

(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -U pyrex -f http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/Pyrex-0.9.8.5.tar.gz
Downloading/unpacking pyrex
Downloading Pyrex-0.9.8.5.tar.gz (242Kb): 242Kb downloaded
In the tar file /var/folders/Gn/GneSaDeKGaGpZXx+hcopdU+++TI/-Tmp-/tmpTEdhFd/Pyrex-0.9.8.5.tar.gz the member Pyrex-0.9.8.5/Demos/embed/Makefile is invalid: 'filename None not found'
Running setup.py egg_info for package pyrex
Installing collected packages: pyrex
Running setup.py install for pyrex
changing mode of build/scripts-2.5/pyrexc from 644 to 755
changing mode of /Users/james/Documents/virtualenvs/bio/bin/pyrexc to 755
Successfully installed pyrex

Now, to get and editable pygr:

(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -e git://github.com/jamescasbon/pygr.git#egg=pygr
Obtaining pygr from git+git://github.com/jamescasbon/pygr.git#egg=pygr
Cloning git://github.com/jamescasbon/pygr.git to ./src/pygr
remote: Counting objects: 6281, done.
...
Successfully installed pygr

Finally, ipython:

(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install ipython
Downloading/unpacking ipython
Downloading ipython-0.9.1.tar.gz (2.8Mb): 2.8Mb downloaded

We now have a completely isolated environment, where pygr and biopython are editable:

(bio)james@flapjack:~/Documents/virtualenvs/bio$ bin/ipython
Python 2.5.4 (r254:67916, Mar 2 2009, 10:40:04)
Type "copyright", "credits" or "license" for more information.

IPython 0.9.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object'. ?object also works, ?? prints more.

In [1]: import pygr

In [2]: pygr.__file__
Out[2]: '/Users/james/Documents/virtualenvs/bio/src/pygr/pygr/__init__.pyc'

More money…

My company has met its goals and secured a second tranche of VC money. To celebrate, we’ve even got a website: Population Genetics Technologies.

Two excellent pieces of writing

Now that I’ve hung up my hat at the Open Rights Group, I actually have time to read stuff for pleasure again. And it has been with great pleasure that I’ve read the two pieces listed below. Sometimes it doesn’t matter what you’re writing about – the quality of your prose sings through. In the case of these two pieces, though, that quality is matched by the urgency of the subject matter. Enjoy.

Making textmate virtualenv aware

So I am using textmate for my python development, but I wanted it to pick up any virtualenv configured in a project. Here’s how to hack the python bundle…

First off, the run script command needs to be aware of the virtualenv stuff. So open up the bundle editor, and replace this:

is_test_script = ENV["TM_FILEPATH"] =~ /(?:\b|_)(?:test)(?:\b|_)/ or
File.read(ENV["TM_FILEPATH"]) =~ /\bimport\b.+(?:unittest)/

TextMate::Executor.run(ENV["TM_PYTHON"] || "python", "-u", ENV["TM_FILEPATH"]) do |str, type|

with:

is_test_script = ENV["TM_FILEPATH"] =~ /(?:\b|_)(?:test)(?:\b|_)/ or
File.read(ENV["TM_FILEPATH"]) =~ /\bimport\b.+(?:unittest)/

# default python
python = ENV["TM_PYTHON"] || "python"

# try for virtualenv if it exists
if ENV.has_key?("TM_PROJECT_DIRECTORY")
virtualenv_python = ENV["TM_PROJECT_DIRECTORY"] + "/bin/python"
if FileTest.exists?(virtualenv_python)
python = virtualenv_python
end
end

TextMate::Executor.run(python, "-u", ENV["TM_FILEPATH"]) do |str, type|

Now, we also want the unit tests to pick up that environment as well, so you need to do the same with the Run Project Unit Tests command. I am using nose to collect tests, and the nosexml plugin to format the results. You should install them if you need to. Replace:

# Find all files that end with "Test.py" and run
# them.

find . -name "*Test.py" -exec "${TM_PYTHON:-python}" '{}' \;|pre

with:

cd $TM_PROJECT_DIRECTORY
if [ -f bin/activate ]
then
source bin/activate
fi
nosetests --xml --xml-formatter=nosexml.TextMateFormatter

PS, it seems like the python bundle needs some love, anyone know the maintainer?

An interesting new campaign

RentalRights looks like it is finally getting a campaign together to organise in support of people who rent. If you rent and live in the uk you should check it out. No other country seems to treat tenants so badly.

chimpy: MailChimp API for python

I needed to use MailChimp from a Django app, so I have knocked up a wrapper for their API. Come on over to the google code site for chimpy if it is useful to you.

Why you shouldn’t use BT business

My new office use BT for their ADSL. We had a problem with mail disappearing between people in the office. So I emailed BT and asked them what was happening. Before they would help, they needed three emails in the last 72 hours that had disappeared.

I provided three emails satisfying this criteria. I did what most people would do and sent an email with the subject ‘test’ which was not delivered and sent this to BT. Their response was that emails with the word test in the subject line would be spam filtered, and you cannot access to the mails they have filtered. Since we are a company that performs tests, I asked if this could be removed? BT said that they could not do this. OK, so can BT just whitelist all mails within our domain? Again, no.

BT also cannot provide email lists. Think about that. The flagship British technology company is incapable of providing your business with an email list. That is so poor, I’m speechless.

All in all, BT don’t seem capable of providing reliable email service and are also not willing to help fix problems when they occur. And their online management interface sucks as well. Avoid.

When you add in phorm to this, I see no reason to ever use a BT product if you can avoid it.