Why?

There is some interest in the bioinformatics community in using rake
as a workflow tool (see e.g. this blog post from BioinformaticsZen).

Rake could be ideal for this type of work: a typical workflow will
take data and perform a first set of conversions on it (i.e. a task),
followed by a second set of conversions (that is dependent on the
first task), and so on. And obviously, bioinformaticians want to keep their data
in databases rather than files…

A typical Rakefile could look like this:



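 # Task names that start with a digit have to be quoted:
 # a bare :001_load_data is not a valid Ruby symbol literal.
 task :"001_load_data" do
   # ...
 end

 task :"002_calculate_averages" => [:"001_load_data"] do
   # ...
 end

 task :"003_make_histogram_of_averages" => [:"002_calculate_averages"] do
   # ...
 end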

The trouble is that there is no way yet to check whether a task has to
be rerun, because there are no timestamps. Regular rake will
rerun all three tasks from the example above, regardless of whether some of
them have already been completed.

BioRake adds this task timestamp functionality to rake for working with
databases. The functionality needed is very similar to the one available for
FileTasks.

So if we had reloaded the data (001), the timestamp for that task in the
metadata would be later than the one for task 002. As a result, task
002 would automatically be rerun if we were to run task 003.
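The comparison described above can be sketched in a few lines of Ruby. This is only an illustration of the idea: `needs_rerun?` is a hypothetical helper, not part of BioRake, and the dates are made up.

```ruby
# Hypothetical helper (not BioRake's API): a task is out of date when it has
# never run, or when any prerequisite finished later than the task itself.
def needs_rerun?(own_time, prerequisite_times)
  return true if own_time.nil? # never run before
  prerequisite_times.any? { |t| t.nil? || t > own_time }
end

# Made-up completion times: 001 was reloaded after 002 had already run.
load_data_time = Time.utc(2009, 1, 2, 12, 0, 0)
averages_time  = Time.utc(2009, 1, 1, 12, 0, 0)

puts needs_rerun?(averages_time, [load_data_time])
```

Once 002 has been rerun, its new timestamp is later than the one for 003, so the same check would then mark 003 as out of date as well.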

Install

 gem sources -a http://gems.github.com (you only have to do this once)

 sudo gem install jandot-biorake

Implementation

I’ve started to implement an additional type of task, called
event. The above snippet from a Rakefile would actually contain

 event :"001_load_data" do
   # ...
 end

 event :"002_calculate_averages" => [:"001_load_data"] do
   # ...
 end

instead of using the task keyword.

Similar to a FileTask, timestamps are used to check whether certain tasks
have to be re-run. FileTasks have the advantage that every file
has a timestamp; events on a database do not. To work around this, the
metadata of event completion times is stored in the .rake directory inside
the current directory.

An event task automatically:

* checks the metadata to see if the task has already been run
* if so, checks whether any prerequisites have timestamps that are newer than the task itself
* (re)runs the task if necessary
* updates the metadata
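The steps above could be sketched roughly as follows. `SketchEvent` is a hypothetical stand-in written for this README, not BioRake's actual class, and it assumes one empty stamp file per event whose modification time serves as the completion timestamp. The real metadata format inside .rake may differ.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for an event task (not BioRake's real class).
# Assumption: one stamp file per event; its mtime is the completion time.
class SketchEvent
  def initialize(name, prerequisites: [], metadata_dir: ".rake", &action)
    @name = name.to_s
    @prerequisites = prerequisites
    @metadata_dir = metadata_dir
    @action = action
  end

  # Completion time recorded in the metadata, or nil if never run.
  def timestamp
    File.exist?(stamp_path) ? File.mtime(stamp_path) : nil
  end

  # Bring prerequisites up to date first, then (re)run this event if needed.
  # Returns true when the action was executed.
  def invoke
    @prerequisites.each(&:invoke)
    return false unless needed?

    @action.call
    FileUtils.mkdir_p(@metadata_dir)
    FileUtils.touch(stamp_path) # update the metadata
    true
  end

  private

  # Needed when never run, or when a prerequisite is newer than this event.
  def needed?
    own = timestamp
    return true if own.nil?
    @prerequisites.any? { |p| p.timestamp > own }
  end

  def stamp_path
    File.join(@metadata_dir, @name)
  end
end

# A temporary directory stands in for .rake so the sketch leaves no files behind.
Dir.mktmpdir do |dir|
  log = []
  load_data = SketchEvent.new(:load_data, metadata_dir: dir) { log << :load_data }
  averages = SketchEvent.new(:calculate_averages,
                             prerequisites: [load_data],
                             metadata_dir: dir) { log << :calculate_averages }
  averages.invoke # first call runs both events
  averages.invoke # second call finds everything up to date
  p log           # each event appears once
end
```

Using file modification times keeps the sketch close to how FileTasks work; a real implementation might instead store explicit timestamps in a single metadata file.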

To re-run all tasks from scratch, call Rake::EventTask.clean or simply run

 rm -rf .rake

to reset the metadata to the state before any events occurred.

Status

Even though the tests seem to run and I’ve tried some things out, I
can’t guarantee production-level stability (well: call it beta). Use
at your own risk.

Sample

The sample/ directory contains an example Rakefile. Suppose a
researcher has intensities for a group of individuals on a number of
probes. This information should be loaded into a database with the
tables individuals, probes and intensities.

As the intensities table contains foreign keys for individual and
probe, the individuals and probes tables have to be loaded
before the intensities can be loaded.

In rake-speak, this would look like:

 event :load_probes do
   # load the actual data
 end

 event :load_individuals do
   # load the actual data
 end

 event :load_intensities => [:load_probes, :load_individuals] do
   # load the actual data
 end

In a later step, the researcher might want to calculate the average
intensity per probe. This would be a new task that depends on the
intensities being loaded:

 event :calculate_averages => [:load_intensities] do
   # calculate averages and store in probes table
 end
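As a rough idea of what the body of such an event might compute, here is a plain-Ruby sketch of averaging intensities per probe. The row layout and the values are made up for illustration, and `average_per_probe` is a hypothetical helper; the sample Rakefile itself works against the database and may do this differently.

```ruby
# Hypothetical helper: rows are [individual, probe, intensity] triples,
# as they might come back from the intensities table.
def average_per_probe(rows)
  rows.group_by { |_individual, probe, _intensity| probe }
      .transform_values { |group| group.sum { |row| row[2] } / group.size }
end

# Made-up example data, not taken from the sample/ directory.
rows = [
  ["individual1", "probeA", 1.0],
  ["individual2", "probeA", 3.0],
  ["individual1", "probeB", 4.0],
  ["individual2", "probeB", 6.0]
]

average_per_probe(rows).each do |probe, average|
  puts "#{probe}: #{average}"
end
```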

Here, the database that will contain the data is called sample.sqlite3. The
metadata about completed events is stored in the .rake directory.

Try a rake -T…
