Skip to content

Exhibit-based scripts for searching modENCODE genome data

Notifications You must be signed in to change notification settings

lstein/ModENCODE-Faceted-Browsing

Repository files navigation

        ModENCODE Data Collection for Amazon EC2

1) The filesystem that contains this README must be mounted at /modencode!!

2) The data is stored on multiple large volumes, equalling almost
   4 TB total. You may load all the data or just a subset. Please
   examine /modencode/DATA_SNAPSHOTS.txt for the list of datasets and
   comment out the volumes you do not wish to attach.

3) Run /modencode/bin/setup.pl. This will do the following:

   - Create the volumes selected in DATA_SNAPSHOTS.txt and mount them
     in /modencode/data/all_files.

   - Create a series of links within /modencode/data that point to the
     datafiles, organized hierarchically by experiment type.

   - Create a data web browser and an FTP site. These are identical to
     what can be found at ftp://ftp.modencode.org and http://data.modencode.org.

To browse the datasets, you may:

   - Log in via ssh and explore the file system. Datasets are organized
      hierarchically at /modencode/data.

   - Browse the data using a shopping cart metaphor, using an interface
      located at http://your-instance-address/. From this interface you
      can download compressed tarballs of selected datasets, or view
      the data on the ModENCODE genome browser or modMINE databases.

   - Download individual datasets via FTP. Point your FTP client at
      ftp://your-instance-address/. Anonymous FTP is allowed.

If at some later date, you should wish to add additional data volumes,
uncomment the desired volumes in DATA_SNAPSHOTS.txt, and run
/modencode/bin/attach_snapshots.pl. Note that this script will only
mount snapshots and will never unmount them. You will have to unmount,
detach, (and potentially destroy) unwanted volumes manually using the
AWS console.

After mounting new data volumes, you should also run
/modencode/bin/generate_ftp_tree.pl to rebuild the hierarchical
list of data sets.

4) The large data volumes will be automatically deleted when the
running instance terminates. However, this filesystem will not be
deleted automatically.

About

Exhibit-based scripts for searching modENCODE genome data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages