<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>machine-envy &#187; bioinformatics</title>
	<atom:link href="http://www.machine-envy.com/blog/category/bioinformatics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.machine-envy.com/blog</link>
	<description></description>
	<lastBuildDate>Tue, 27 Jul 2010 23:23:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Variant Call Format: really?</title>
		<link>http://www.machine-envy.com/blog/2010/07/27/variant-call-format-really/</link>
		<comments>http://www.machine-envy.com/blog/2010/07/27/variant-call-format-really/#comments</comments>
		<pubDate>Tue, 27 Jul 2010 23:23:10 +0000</pubDate>
		<dc:creator>James Casbon</dc:creator>
				<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://www.machine-envy.com/blog/?p=274</guid>
		<description><![CDATA[1000 genomes are making their genotypes available in variant call format (vcf).  Now as others have noticed, vcf isn&#8217;t the prettiest format around.  There are a few things to dislike:

The data is in &#8216;wide&#8217; format which means that a file is fifteen screens wide and hides rare variation in a load of noise [...]]]></description>
			<content:encoded><![CDATA[<p>1000 genomes are making their genotypes available in <a href="http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:variant_call_format">variant call format</a> (vcf).  Now as others have <a href="http://plindenbaum.blogspot.com/2010/05/first-rule-of-bioinfo-club.html">noticed</a>, vcf isn&#8217;t the prettiest format around.  There are a few things to dislike:</p>
<ul>
<li>The data is in &#8216;wide&#8217; format which means that a file is fifteen screens wide and hides rare variation in a load of noise &#8211; it&#8217;s also &#8216;ungreppable&#8217;.</li>
<li>The use of tab delimiting with embedded semi-colon delimiting is an <a href="http://www.sequenceontology.org/gff3.shtml">old trick</a>, but these days everyone has a JSON parser handy, so you should just use JSON for a structured field inside a tab-delimited file.  Since strings, lists and key/value pairs all have simple JSON representations, this will cover all cases.</li>
<li>The fact that the file headers dynamically describe the structure of the fields makes it less a format and more a family of formats.  The genotype fields have arbitrary keys.</li>
<li>Despite the flexibility in defining genotypes, the locus information <a href="http://biostar.stackexchange.com/questions/995/adding-an-extra-column-to-a-vcf-file">is not extensible</a>, so you cannot annotate the site with extra information.</li>
</ul>
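<p>To make the JSON suggestion concrete, here is a minimal sketch of turning a semi-colon delimited field into a JSON one while keeping the file tab delimited.  The function name <code>info_to_json</code> and the sample line are made up for illustration (the line is simplified and is not a complete vcf record), and all values are left as strings rather than typed:</p>

```python
import json


def info_to_json(info):
    """Convert a 'K=V;K=V1,V2;FLAG' style field into a JSON object string."""
    record = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate empty items such as a trailing ';'
        key, sep, value = item.partition("=")
        if not sep:
            record[key] = True  # bare key with no '=': treat as a flag
        elif "," in value:
            record[key] = value.split(",")  # comma-separated values become a list
        else:
            record[key] = value
    return json.dumps(record, sort_keys=True)


# Hypothetical, simplified line: the last column is the structured field.
line = "20\t14370\trs6054257\tG\tA\tNS=3;DP=14;AF=0.5,0.25;DB"
fields = line.split("\t")
fields[-1] = info_to_json(fields[-1])
json_line = "\t".join(fields)

# Any JSON parser can now read the last column back without custom splitting.
parsed = json.loads(json_line.split("\t")[-1])
```

<p>Because <code>json.dumps</code> escapes tab characters inside strings, the JSON column cannot break the outer tab delimiting.</p>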
]]></content:encoded>
			<wfw:commentRss>http://www.machine-envy.com/blog/2010/07/27/variant-call-format-really/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>cogent: the unsung hero of bioinformatics and python</title>
		<link>http://www.machine-envy.com/blog/2009/10/27/cogent-the-unsung-hero-of-bioinformatics-and-python/</link>
		<comments>http://www.machine-envy.com/blog/2009/10/27/cogent-the-unsung-hero-of-bioinformatics-and-python/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 16:57:53 +0000</pubDate>
		<dc:creator>James Casbon</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[ensembl]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.machine-envy.com/blog/?p=225</guid>
		<description><![CDATA[I recently started using cogent &#8211; the COmparative GENomics Toolkit and discovered that it is an excellent piece of kit.  A Google search for &#8216;python ensembl&#8217; doesn&#8217;t show it at all, yet it definitely has the best bindings for ensembl available in python &#8211; they&#8217;re based on sqlalchemy making it easy enough to [...]]]></description>
			<content:encoded><![CDATA[<p>I recently started using <a href="http://pypi.python.org/pypi/cogent">cogent &#8211; the COmparative GENomics Toolkit</a> and discovered that it is an excellent piece of kit.  A Google search for &#8216;<a href="http://www.google.co.uk/search?q=python+ensembl">python ensembl</a>&#8217; doesn&#8217;t show it at all, yet it definitely has the best <a href="http://pycogent.sourceforge.net/examples/query_ensembl.html">bindings for ensembl available in python</a> &#8211; they&#8217;re based on <a href="http://www.sqlalchemy.org/">sqlalchemy</a>, making it easy enough to pull off any query.  Have a look at the full list of <a href="http://pycogent.sourceforge.net/examples/index.html">examples</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.machine-envy.com/blog/2009/10/27/cogent-the-unsung-hero-of-bioinformatics-and-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Installing python bioinformatics tools with virtualenv and pip</title>
		<link>http://www.machine-envy.com/blog/2009/07/11/installing-python-bioinformatics-tools-with-virtualenv-and-pip/</link>
		<comments>http://www.machine-envy.com/blog/2009/07/11/installing-python-bioinformatics-tools-with-virtualenv-and-pip/#comments</comments>
		<pubDate>Sat, 11 Jul 2009 13:56:52 +0000</pubDate>
		<dc:creator>James Casbon</dc:creator>
				<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.machine-envy.com/blog/?p=222</guid>
		<description><![CDATA[Python seems to have developed a decent set of tools for quickly building development environments.  I want to store my notes on how to get a good environment for bioinformatics set up quickly.
First of all, if you haven&#8217;t already, install virtualenv and pip.  Both are easily installable.  Now install virtualenv wrapper.
Now we [...]]]></description>
			<content:encoded><![CDATA[<p>Python seems to have developed a decent set of tools for quickly building development environments.  I want to store my notes on how to get a good environment for bioinformatics set up quickly.</p>
<p>First of all, if you haven&#8217;t already, install virtualenv and pip.  Both are easily installable.  Now install <a href="http://www.doughellmann.com/projects/virtualenvwrapper/">virtualenv wrapper</a>.</p>
<p>Now we are going to set up a bioinformatics environment with both biopython and pygr installed so that you can hack on them.  First, create a new virtualenv, passing the no site packages flag to keep it clean:</p>
<p><code lang="bash">james@flapjack:~/Documents/virtualenvs$ mkvirtualenv --no-site-packages bio<br />
New python executable in bio/bin/python<br />
Installing setuptools............done.<br />
(bio)james@flapjack:~/Documents/virtualenvs$ cdvirtualenv<br />
(bio)james@flapjack:~/Documents/virtualenvs/bio$ </code></p>
<p>Now, to install biopython we first use pip to install numpy:</p>
<p><code lang="bash">(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install numpy<br />
Downloading/unpacking numpy<br />
...</code></p>
<p>It is important to remember the &#8216;-E&#8217; flag, which tells pip to use the virtualenv we are in (this should be added to virtualenv_wrapper IMHO).  Now we can install biopython from our github fork, using the &#8216;-e&#8217; flag to keep it editable (i.e. we are hacking on it).</p>
<p><code lang="bash">(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -e git://github.com/jamescasbon/biopython.git#egg=biopython<br />
Obtaining biopython from git+git://github.com/jamescasbon/biopython.git#egg=biopython<br />
  Cloning git://github.com/jamescasbon/biopython.git to ./src/biopython<br />
remote: Counting objects: 22719, done.<br />
...</code></p>
<p>Next up, we want pygr, so we need pyrex to build the C files:</p>
<p><code lang="bash">(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -U pyrex -f http://www.cosc.canterbury.ac.nz/greg.ewing/python/Pyrex/Pyrex-0.9.8.5.tar.gz<br />
Downloading/unpacking pyrex<br />
  Downloading Pyrex-0.9.8.5.tar.gz (242Kb): 242Kb downloaded<br />
  In the tar file /var/folders/Gn/GneSaDeKGaGpZXx+hcopdU+++TI/-Tmp-/tmpTEdhFd/Pyrex-0.9.8.5.tar.gz the member Pyrex-0.9.8.5/Demos/embed/Makefile is invalid: 'filename None not found'<br />
  Running setup.py egg_info for package pyrex<br />
Installing collected packages: pyrex<br />
  Running setup.py install for pyrex<br />
    changing mode of build/scripts-2.5/pyrexc from 644 to 755<br />
    changing mode of /Users/james/Documents/virtualenvs/bio/bin/pyrexc to 755<br />
Successfully installed pyrex</code></p>
<p>Now, to get an editable pygr:</p>
<p><code lang="bash">(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install -e git://github.com/jamescasbon/pygr.git#egg=pygr<br />
Obtaining pygr from git+git://github.com/jamescasbon/pygr.git#egg=pygr<br />
  Cloning git://github.com/jamescasbon/pygr.git to ./src/pygr<br />
remote: Counting objects: 6281, done.<br />
...<br />
Successfully installed pygr</code></p>
<p>Finally, ipython:</p>
<p><code lang="bash">(bio)james@flapjack:~/Documents/virtualenvs/bio$ pip -E . install ipython<br />
Downloading/unpacking ipython<br />
  Downloading ipython-0.9.1.tar.gz (2.8Mb): 2.8Mb downloaded<br />
</code></p>
<p>We now have a completely isolated environment, where pygr and biopython are editable:</p>
<p><code lang="python">(bio)james@flapjack:~/Documents/virtualenvs/bio$ bin/ipython<br />
Python 2.5.4 (r254:67916, Mar  2 2009, 10:40:04)<br />
Type "copyright", "credits" or "license" for more information.<br />
<br />
IPython 0.9.1 -- An enhanced Interactive Python.<br />
?         -> Introduction and overview of IPython's features.<br />
%quickref -> Quick reference.<br />
help      -> Python's own help system.<br />
object?   -> Details about 'object'. ?object also works, ?? prints more.<br />
<br />
In [1]: import pygr<br />
<br />
In [2]: pygr.__file__<br />
Out[2]: '/Users/james/Documents/virtualenvs/bio/src/pygr/pygr/__init__.pyc'<br />
</code></p>
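<p>The same check can be scripted rather than eyeballed: a package is coming from the editable checkout if its <code>__file__</code> lives under the environment&#8217;s <code>src</code> directory.  This is only a rough sketch, assuming Python 3.4 or later (for <code>os.path.commonpath</code>) rather than the 2.5 used above; the helper name <code>is_under</code> is made up for illustration:</p>

```python
import os


def is_under(path, root):
    """Return True if path lies inside the directory tree rooted at root."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    # commonpath returns the longest shared directory prefix of both paths;
    # path is inside root exactly when that prefix is root itself.
    return os.path.commonpath([path, root]) == root


# Paths taken from the session above; the site-packages path is hypothetical.
src_root = "/Users/james/Documents/virtualenvs/bio/src"
editable = is_under(
    "/Users/james/Documents/virtualenvs/bio/src/pygr/pygr/__init__.pyc", src_root
)
system_wide = is_under(
    "/usr/lib/python2.5/site-packages/pygr/__init__.pyc", src_root
)
```

<p>In a live session you would pass <code>pygr.__file__</code> instead of a hard-coded path.</p>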
]]></content:encoded>
			<wfw:commentRss>http://www.machine-envy.com/blog/2009/07/11/installing-python-bioinformatics-tools-with-virtualenv-and-pip/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

