phyloGenerator2

Overview

Like phyloGenerator1, I built pG2 with the intention of making it easily extendable. That said, I think I did a slightly better job with pG2!

There are five major components of pG2:

pG2.rb - handles all arguments to the program, and runs everything
Cap.rb - the 'God Class' that keeps everything running, and handles the next three modules (which do all the work)
Download.rb - downloads all sequences, and does basic sequence checking.
Hawkeye.rb - performs secondary sequence checks (the Hawkeye method).
PhyloGen.rb - builds phylogenies.

These files had different, slightly more amusing names during early development; you can see them if you go back through the Git history.

Take a look at Cap.rb to see how everything fits together; there's really nothing very complex happening behind the scenes, honest. It's quite sobering to me just how much simpler all of this is than the original pG1 was; the user interface was almost all of what I was focusing on.

More details

Each of the modules assumes its current working directory is somewhere it can write things out. It also assumes either that the directory contains a set of genus_species_gene.fasta files it can use to build a phylogeny, or that its main job is to create such files. Download creates these files, and Hawkeye renames the bad ones so that PhyloGen can simply build a phylogeny from what it sees. Every module takes arguments that are fairly self-explanatory, and a Hash of additional parameters that are all described in the Guide to pG2 on this website. Keeping everything as files in the working directory is the simplest way of sharing state among the different modules, without creating some sort of convoluted database, and it makes it ridiculously easy to manipulate sequence identities etc.
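To make the file-based convention concrete, here is a minimal Ruby sketch of how a module might read the working directory. It is not taken from the pG2 source: the helper names (parse_sequence_file, reject_sequence, usable_sequences) are hypothetical, the ".rejected" suffix is an assumption (the text above only says that Hawkeye renames bad files), and the pattern assumes none of the three name parts contains an underscore.

```ruby
require "tmpdir"

# Hypothetical helper: split "genus_species_gene.fasta" into its parts.
# Returns nil for file names that do not fit the pattern.
def parse_sequence_file(path)
  match = File.basename(path).match(/\A([^_]+)_([^_]+)_([^_]+)\.fasta\z/)
  return nil unless match

  { genus: match[1], species: match[2], gene: match[3] }
end

# Hypothetical helper: take a file out of the pool by renaming it.
# The ".rejected" suffix is an assumption made for this sketch.
def reject_sequence(path)
  File.rename(path, "#{path}.rejected")
end

# Hypothetical helper: everything a tree-building step would "see".
def usable_sequences(dir)
  Dir.glob(File.join(dir, "*.fasta")).sort.map { |f| parse_sequence_file(f) }.compact
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  %w[Quercus_robur_rbcL.fasta Quercus_ilex_rbcL.fasta].each do |name|
    File.write(File.join(dir, name), ">placeholder\nACGT\n")
  end
  reject_sequence(File.join(dir, "Quercus_ilex_rbcL.fasta"))
  usable_sequences(dir).each { |seq| puts seq.inspect }
end
```

Because a renamed file no longer matches "*.fasta", a later step needs no extra bookkeeping to skip it, which is the appeal of sharing state through the directory.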

Each of the modules also assumes it has access to the programs it needs through the shell. In other words, if you have set up pG2 so that it works properly, each of its subcomponents will also work on its own.
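If you want to check that assumption before running anything, a rough sketch like the following would do. The helpers on_path? and missing_programs are hypothetical and not part of pG2, and the program names in the usage line are placeholders rather than a list of pG2's actual dependencies.

```ruby
# Hypothetical helper: true if `program` is an executable file in one of
# the directories listed in the PATH environment variable.
def on_path?(program)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, program)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical helper: the subset of `programs` that could not be found.
def missing_programs(programs)
  programs.reject { |program| on_path?(program) }
end

# Placeholder names; substitute whatever external tools your setup uses.
missing = missing_programs(%w[some_aligner some_tree_builder])
warn "Not found on PATH: #{missing.join(', ')}" unless missing.empty?
```

This only confirms that something executable with that name exists; it says nothing about whether the version installed is one that pG2 can work with.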

I achieve some degree of parallelism by simply calling each module in its own thread. It would be trivial to re-call Download with a new set of species once Hawkeye has removed sequences from the overall pool.
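As an illustration of the thread-per-call idea (not the actual code in Cap.rb), the sketch below runs a set of labelled jobs, each in its own thread, and collects their results. The helper run_in_threads and the job labels are hypothetical, and the lambdas stand in for real module calls.

```ruby
# Hypothetical helper: run each job in its own thread and return a Hash
# of label => result. Thread#value waits for the thread to finish and
# re-raises any exception the job raised.
def run_in_threads(jobs)
  threads = jobs.map { |label, job| [label, Thread.new { job.call }] }
  threads.map { |label, thread| [label, thread.value] }.to_h
end

# Stand-in jobs; in practice each lambda would wrap a call to a module.
results = run_in_threads(
  "download gene A" => -> { :finished_a },
  "download gene B" => -> { :finished_b }
)
results.each { |label, outcome| puts "#{label}: #{outcome}" }
```

Plain threads are enough for this kind of work because most of the time is spent waiting on the network or on external programs rather than running Ruby code.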

If you're interested in using different methods and programs inside phyloGenerator2, it's trivial to alter the relevant parts of each component.