# FAHMunge

Tools for munging Folding@Home datasets.

## How to use

  1. Install FAHMunge via setup.py, using the Anaconda (not system) Python.
  2. Log in to the work server using the usual FAH login.
  3. Check whether the munging script is already running (`screen -r -d`). If it is, stop here.
  4. Start a screen session.
  5. cd /data/choderalab/fah/Software/FAHMunge
  6. /data/choderalab/anaconda/bin/python scripts/munge_fah_data.py
  7. To stop, press Ctrl-C while the script is in its "sleep" phase (see the sketch below).
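The "sleep" phase is the safe interruption point because the script works in passes: it munges every project listed in the metadata CSV, then sleeps before starting the next pass. A schematic of that loop (`munge_all_projects` is a hypothetical stand-in, not the actual function name):

```python
import time

while True:
    munge_all_projects()  # hypothetical: one munging pass over every project in projects.csv
    print("sleeping...")
    time.sleep(3600)      # Ctrl-C during this sleep stops the script between passes
```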

The metadata for FAH is a CSV file located here:

/data/choderalab/fah/Software/FAHMunge/projects.csv

### Example CSV

project,location,pdb
"10491","/home/server.140.163.4.245/server2/data/SVR2359493877/PROJ10491/","/home/server.140.163.4.245/server2/projects/GPU/p10491/topol-renumbered-explicit.pdb"
"10492","/home/server.140.163.4.245/server2/data/SVR2359493877/PROJ10492/","/home/server.140.163.4.245/server2/projects/GPU/p10492/topol-renumbered-explicit.pdb"
"10495","/home/server.140.163.4.245/server2/data/SVR2359493877/PROJ10492/","/home/server.140.163.4.245/server2/projects/GPU/p10495/MTOR_HUMAN_D0/RUN%(run)d/system.pdb"

The pdb column points the pipeline at the PDB file whose atom numbering is used for the munged data. The first two rows are examples of using a single PDB for the whole project. The third row shows how to use a different PDB for each run: %(run)d is substituted with the run number via `filename % vars()` in Python, which allows run numbers (or other variables) to be interpolated. The substitution is done per run, not per clone.
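A minimal sketch of that substitution, using the third row's path (here `run` is simply whatever run number is in scope when `filename % vars()` is evaluated):

```python
filename = "/home/server.140.163.4.245/server2/projects/GPU/p10495/MTOR_HUMAN_D0/RUN%(run)d/system.pdb"

for run in range(3):
    pdb_path = filename % vars()  # %(run)d is looked up in the current namespace
    print(pdb_path)

# .../MTOR_HUMAN_D0/RUN0/system.pdb
# .../MTOR_HUMAN_D0/RUN1/system.pdb
# .../MTOR_HUMAN_D0/RUN2/system.pdb
```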

## Single vs. multi-process

There is also a multiprocessing version in the scripts/ folder. However, the scripts generate potentially large temporary files, and the single-process version seems to put less strain on the /tmp filesystem, so we prefer it for now.
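One mitigation worth noting: Python's standard tempfile module honors the TMPDIR environment variable, so temporary files can be redirected to a filesystem with more headroom (this is general Python behavior, not a documented FAHMunge option, and it helps only if the scripts create their temporary files via tempfile):

```python
import tempfile

# Consults $TMPDIR (then $TEMP, $TMP) before falling back to /tmp.
print(tempfile.gettempdir())
```

For example, launching the script with `TMPDIR=/data/tmp` set would move the pressure off /tmp.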

## More description

Overall pipeline (Core17/18):

  1. Extract XTC data from the bzip2-compressed result files
  2. Append all-atom coordinates and filenames to an HDF5 file
  3. Extract protein-only coordinates and filenames into a second HDF5 file (sketched below)
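A conceptual sketch of one pass over a single clone directory. FAHMunge's real implementation differs; in particular, the results*/positions.xtc.bz2 layout, the function names, and the use of mdtraj's HDF5 writer here are illustrative assumptions:

```python
import bz2
import glob
import os
import shutil

import mdtraj as md
from mdtraj.formats import HDF5TrajectoryFile


def append_frames(traj, h5_path):
    """Append frames to an HDF5 trajectory file, writing the topology once."""
    new_file = not os.path.exists(h5_path)
    with HDF5TrajectoryFile(h5_path, mode="a") as f:
        if new_file:
            f.topology = traj.topology
        f.write(coordinates=traj.xyz,
                cell_lengths=traj.unitcell_lengths,
                cell_angles=traj.unitcell_angles)


def munge_clone(clone_dir, pdb_path, allatom_h5, protein_h5):
    # 1. Extract each XTC from its bzip (the rate-limiting step in practice).
    for bz_path in sorted(glob.glob(os.path.join(clone_dir, "results*/positions.xtc.bz2"))):
        xtc_path = bz_path[:-len(".bz2")]
        with bz2.open(bz_path, "rb") as src, open(xtc_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        traj = md.load(xtc_path, top=pdb_path)

        # 2. Append all-atom coordinates to the first HDF5 file.
        append_frames(traj, allatom_h5)

        # 3. Append a protein-only slice to the second HDF5 file.
        protein = traj.atom_slice(traj.topology.select("protein"))
        append_frames(protein, protein_h5)
```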

General instructions:

  1. Run the FAH servers.
  2. Edit scripts/munge_fah_data.py to load your FAH data.
  3. Run scripts/munge_fah_data.py in a screen session.
  4. rsync your stripped data to the analysis machines periodically.

Note: the rate-limiting step appears to be bunzip2 decompression.

## Sync to hal.cbio.mskcc.org

Munged no-solvent and all-atom data are rsynced nightly from plfah1 and plfah2 to hal.cbio.mskcc.org via the choderalab robot user account, to:

/cbio/jclab/projects/fah/fah-data/munged

This is done via a crontab (the --chmod flags make the synced files group- and world-readable, but not writable):

04 01 * * * rsync -av --chmod=g-w,g+r,o-w,o+r server@plfah1.mskcc.org:/data/choderalab/fah/munged/no-solvent /cbio/jclab/projects/fah/fah-data/munged >> $HOME/plfah1-rsync-no-solvent.log 2>&1
39 01 * * * rsync -av --chmod=g-w,g+r,o-w,o+r server@plfah2.mskcc.org:/data/choderalab/fah/munged/no-solvent /cbio/jclab/projects/fah/fah-data/munged >> $HOME/plfah2-rsync-no-solvent.log 2>&1
34 03 * * * rsync -av --chmod=g-w,g+r,o-w,o+r server@plfah1.mskcc.org:/data/choderalab/fah/munged/all-atoms /cbio/jclab/projects/fah/fah-data/munged >> $HOME/plfah1-rsync-all-atoms.log 2>&1
50 03 * * * rsync -av --chmod=g-w,g+r,o-w,o+r server@plfah2.mskcc.org:/data/choderalab/fah/munged/all-atoms /cbio/jclab/projects/fah/fah-data/munged >> $HOME/plfah2-rsync-all-atoms.log 2>&1

To install this crontab as the choderalab user:

crontab ~/crontab

To list the active crontab:

crontab -l

Transfers are logged in the choderalab account:

plfah1-rsync-all-atoms.log
plfah1-rsync-no-solvent.log
plfah2-rsync-all-atoms.log
plfah2-rsync-no-solvent.log
