A software toolkit for the interconversion of standard data models for phenotypic data
Documentation: https://cnag-biomedical-informatics.github.io/convert-pheno
Google Colab tutorial: https://colab.research.google.com/drive/1T6F3bLwfZyiYKD6fl1CIxs9vG068RHQ6?usp=sharing
CLI Source Code: https://github.com/cnag-biomedical-informatics/convert-pheno
CPAN Distribution: https://metacpan.org/pod/Convert::Pheno
Web App UI Source Code: https://github.com/cnag-biomedical-informatics/convert-pheno-ui
Docker Hub Image: https://hub.docker.com/r/manuelrueda/convert-pheno/tags
convert-pheno - A script to interconvert common data models for phenotypic data
convert-pheno [-i input-type] <infile> [-o output-type] <outfile> [-options]
Arguments:
(input-type):
-ibff Beacon v2 Models ('individuals' JSON|YAML) file
-iomop OMOP-CDM CSV files or PostgreSQL dump
-ipxf Phenopacket v2 (JSON|YAML) file
-iredcap (experimental) REDCap (raw data) export CSV file
-icdisc (experimental) CDISC-ODM v1 XML file
-icsv (experimental) Raw data CSV
(Wish-list)
#-iopenehr openEHR
#-ifhir HL7/FHIR
(output-type):
-obff Beacon v2 Models ('individuals' JSON|YAML) file
-opxf Phenopacket v2 (JSON|YAML) file
(Wish-list)
#-oomop OMOP-CDM PostgreSQL dump
Compatible with -i(bff|pxf):
-ocsv Flatten data to CSV
-ojsonf Flatten data to 1D-JSON (or 1D-YAML if suffix is .yml|.yaml)
-ojsonld (experimental) JSON-LD (interoperable w/ RDF ecosystem; YAML-LD if suffix is .ymlld|.yamlld)
Options:
-exposures-file <file> CSV file with a list of 'concept_id' considered to be exposures (with -iomop)
-mapping-file <file> Fields mapping YAML (or JSON) file
-max-lines-sql <number> Maximum number of lines read from SQL dump [500]
-min-text-similarity-score <score> Minimum score for cosine similarity (or Sorensen-Dice coefficient) [0.8] (to be used with --search mixed)
-ohdsi-db Use Athena-OHDSI database (~2.2GB) with -iomop
-omop-tables <tables> OMOP-CDM tables to be processed. Tables <CONCEPT> and <PERSON> are always included.
-out-dir <directory> Output (existing) directory
-O Overwrite output file
-path-to-ohdsi-db <directory> Directory for the file <ohdsi.db>
-phl|print-hidden-labels Print original values (before DB mapping) of text fields <_labels>
-rcd|redcap-dictionary <file> REDCap data dictionary CSV file
-schema-file <file> Alternative JSON Schema for mapping file
-search <type> Type of search [>exact|mixed]
-svs|self-validate-schema Perform a self-validation of the JSON schema that defines mapping (requires IO::Socket::SSL)
-sep|separator <char> Delimiter character for CSV files [;] e.g., --sep $'\t'
-stream Enable incremental processing with -iomop and -obff [>no-stream|stream]
-sql2csv Print SQL TABLES (only valid with -iomop). Mutually exclusive with --stream
-test Does not print time-changing-events (useful for file-based cmp)
-text-similarity-method <method> The method used to compare values to DB [>cosine|dice]
-u|username <username> Set the username
Generic Options:
-debug <level> Print debugging level (from 1 to 5, being 5 max)
-help Brief help message
-log Save log file (JSON). If no argument is given then the log is named [convert-pheno-log.json]
-man Full documentation
-no-color Don't print colors to STDOUT [>color|no-color]
-v|verbose Verbosity on
-V|version Print Version
convert-pheno is a command-line front-end to the CPAN module Convert::Pheno.
A script that uses Convert::Pheno to interconvert common data models for phenotypic data
If you plan to only use the CLI, we recommend installing it via CPAN. See details below.
Download a docker image (latest version - amd64|x86-64) from Docker Hub by executing:
docker pull manuelrueda/convert-pheno:latest
docker image tag manuelrueda/convert-pheno:latest cnag/convert-pheno:latest
See additional instructions below.
Please download the Dockerfile from the repo:
wget https://raw.githubusercontent.com/cnag-biomedical-informatics/convert-pheno/main/Dockerfile
And then run:
docker buildx build -t cnag/convert-pheno:latest .
To run the container (detached) execute:
docker run -tid -e USERNAME=root --name convert-pheno cnag/convert-pheno:latest
To enter:
docker exec -ti convert-pheno bash
The command-line executable can be found at:
/usr/share/convert-pheno/bin/convert-pheno
The default container user is root, but you can also run the container as $UID=1000 (dockeruser).
docker run --user 1000 -tid --name convert-pheno cnag/convert-pheno:latest
Alternatively, you can use make to perform all the previous steps:
wget https://raw.githubusercontent.com/cnag-biomedical-informatics/convert-pheno/main/Dockerfile
wget https://raw.githubusercontent.com/cnag-biomedical-informatics/convert-pheno/main/makefile.docker
make -f makefile.docker install
make -f makefile.docker run
make -f makefile.docker enter
Docker containers are fully isolated. If you need to mount a volume to the container, please use the following syntax (-v host:container).
Find an example below (note that you need to change the paths to match yours):
docker run -tid --volume /media/mrueda/4TBT/data:/data --name convert-pheno-mount cnag/convert-pheno:latest
Then I will do something like this:
# First I create an alias to simplify invocation (from the host)
alias convert-pheno='docker exec -ti convert-pheno-mount /usr/share/convert-pheno/bin/convert-pheno'
# Now I use the alias to run the command (note that I use the flag --out-dir to specify the output directory)
convert-pheno -ibff /data/individuals.json -opxf pxf.json --out-dir /data
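Aliases are only expanded in interactive shells, and `docker exec -ti` needs a terminal, so the alias above will not work from a script or a cron job. A minimal sketch of a non-interactive alternative follows; the function name `convert_pheno` and the `CONTAINER` variable are illustrative choices, not part of the distribution.

```shell
# Name of the running container; defaults to the one started in the example above.
CONTAINER=${CONTAINER:-convert-pheno-mount}

# Wrapper function: unlike an alias, it also works inside shell scripts.
# The -ti flags are omitted because scripts usually have no terminal attached.
convert_pheno() {
    docker exec "$CONTAINER" /usr/share/convert-pheno/bin/convert-pheno "$@"
}
```

It is called with the same arguments as the alias, for example `convert_pheno -ibff /data/individuals.json -opxf pxf.json --out-dir /data`.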
The script runs on the Linux command line and has been tested on Debian, RedHat, and macOS systems (commands are shown for Debian only). Perl 5 is installed by default on Linux, but we will install a few CPAN modules with cpanminus.
git clone https://github.com/cnag-biomedical-informatics/convert-pheno.git
cd convert-pheno
Install system level dependencies:
sudo apt-get install cpanminus libbz2-dev zlib1g-dev libperl-dev libssl-dev
Now you have to choose one of the 3 options below:
Option 1: Install dependencies (they're harmless to your system) as sudo:
cpanm --notest --sudo --installdeps .
bin/convert-pheno --help
Option 2: Install the dependencies at ~/perl5:
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
cpanm --notest --installdeps .
bin/convert-pheno --help
To ensure Perl recognizes your local modules every time you start a new terminal, you should type:
echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc
Option 3: Install the dependencies in a "virtual environment" (at local/). We'll be using the module Carton for that:
mkdir local
cpanm --notest --local-lib=local/ Carton
export PATH=$PATH:local/bin; export PERL5LIB=$(pwd)/local/lib/perl5:$PERL5LIB
carton install
carton exec -- bin/convert-pheno -help
First install system level dependencies:
sudo apt-get install cpanminus libbz2-dev zlib1g-dev libperl-dev libssl-dev
Now you have to choose one of the 3 options below:
Option 1: System-level installation:
cpanm --notest --sudo Convert::Pheno
convert-pheno -h
Option 2: Install Convert-Pheno and the dependencies at ~/perl5
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
cpanm --notest Convert::Pheno
convert-pheno --help
To ensure Perl recognizes your local modules every time you start a new terminal, you should type:
echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc
Option 3: Install Convert-Pheno and the dependencies in a "virtual environment" (at local/). We'll be using the module Carton for that:
mkdir local
cpanm --notest --local-lib=local/ Carton
echo "requires 'Convert::Pheno';" > cpanfile
export PATH=$PATH:local/bin; export PERL5LIB=$(pwd)/local/lib/perl5:$PERL5LIB
carton install
carton exec -- convert-pheno -help
* Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, openSUSE, macOS) should do as well.
* Perl 5 (>= 5.26 core; installed by default in most Linux distributions). Check the version with "perl -v".
* >= 4GB of RAM
* 1 core
* At least 16GB HDD
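The Perl requirement above can be checked from a shell before installing. This is a minimal sketch, assuming a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions do); `version_ge` is a hypothetical helper and is not shipped with convert-pheno.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
version_ge() {
    # sort -V orders version strings; if $2 comes first (or ties), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Perl against the 5.26 minimum listed above.
if command -v perl >/dev/null 2>&1; then
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_ge "$perl_version" 5.26.0; then
        echo "Perl $perl_version: OK"
    else
        echo "Perl $perl_version: older than 5.26"
    fi
else
    echo "perl not found in PATH"
fi
```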
For executing convert-pheno you will need:
- Input file(s):
  A text file in one of the accepted formats. With --iomop I/O files can be gzipped.
- Optional:
  Athena-OHDSI database
  The database file is available at this link (~2.2GB). The database may be needed when using -iomop.
  Regardless of whether you're using the containerized or non-containerized version, the download procedure is the same. In Linux you can use wget, curl or aria2c:
  $ wget 'https://drive.google.com/uc?export=download&id=1-Ls1nmgxp-iW-8LkRIuNNdNytXa8kgNw&confirm=t' -O ohdsi.db --no-check-certificate
  or
  $ curl -L 'https://drive.google.com/uc?export=download&id=1-Ls1nmgxp-iW-8LkRIuNNdNytXa8kgNw&confirm=t' > ohdsi.db
  or
  $ aria2c -x2 'https://drive.google.com/uc?export=download&id=1-Ls1nmgxp-iW-8LkRIuNNdNytXa8kgNw&confirm=t' -o ohdsi.db
  (you can install wget, curl or aria2c inside the container by typing sudo apt install wget, sudo apt install curl or sudo apt install aria2).
  Once downloaded, you have two options:
  a) Move the file ohdsi.db inside the share/db/ directory.
  or
  b) Use the option --path-to-ohdsi-db
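Downloads from file-sharing links sometimes save an HTML page instead of the real file, so it can be worth checking ohdsi.db before pointing convert-pheno at it. The sketch below ASSUMES that ohdsi.db is a SQLite database, which this document does not state; `looks_like_sqlite` is a hypothetical helper, and `head -c` is assumed to be available (it is on GNU and BSD systems).

```shell
# Hypothetical helper: check that a file starts with the SQLite 3 header.
# ASSUMPTION: ohdsi.db is a SQLite file; the documentation does not say so.
looks_like_sqlite() {
    # Every SQLite 3 database begins with the 15 characters "SQLite format 3".
    [ -f "$1" ] && [ "$(head -c 15 "$1")" = "SQLite format 3" ]
}

db_dir=${1:-.}   # directory holding ohdsi.db; defaults to the current directory
if looks_like_sqlite "$db_dir/ohdsi.db"; then
    echo "ohdsi.db looks valid; you can pass: --path-to-ohdsi-db $db_dir"
else
    echo "ohdsi.db is missing or does not look like a SQLite file; try downloading it again"
fi
```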
Examples:
$ bin/convert-pheno -ipxf phenopackets.json -obff individuals.json
$ $path/convert-pheno -ibff individuals.json -opxf phenopackets.yaml --out-dir my_out_dir
$ $path/convert-pheno -iredcap redcap.csv -opxf phenopackets.json --redcap-dictionary redcap_dict.csv --mapping-file mapping_file.yaml
$ $path/convert-pheno -iomop dump.sql -obff individuals.json
$ $path/convert-pheno -iomop dump.sql.gz -obff individuals.json.gz --stream -omop-tables measurement -verbose
$ $path/convert-pheno -icdisc cdisc_odm.xml -obff individuals.json --rcd redcap_dict.csv --mapping-file mapping_file.yaml --search mixed --min-text-similarity-score 0.6
$ $path/convert-pheno -iomop *csv -obff individuals.json -sep ','
$ carton exec -- $path/convert-pheno -ibff individuals.json -opxf phenopackets.json # If using Carton
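The examples above convert one file at a time. To convert a whole directory, the same call can be wrapped in a loop. This is a minimal sketch using only the -ibff, -opxf and --out-dir options documented above; the function `convert_dir`, the `CONVERT_PHENO` variable and the `.pxf.json` naming scheme are illustrative assumptions.

```shell
# Executable to call; override it if convert-pheno is not in your PATH.
CONVERT_PHENO=${CONVERT_PHENO:-convert-pheno}

# Convert every BFF .json file in $1 to PXF, writing results into $2.
# $2 must already exist (--out-dir expects an existing directory).
convert_dir() {
    in_dir=$1
    out_dir=$2
    for infile in "$in_dir"/*.json; do
        [ -e "$infile" ] || continue      # skip if the pattern matched nothing
        name=$(basename "$infile" .json)
        "$CONVERT_PHENO" -ibff "$infile" -opxf "$name.pxf.json" --out-dir "$out_dir"
    done
}
```

A call such as `convert_dir bff_files my_out_dir` would then produce one PXF file per input file.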
* Error message: CSV_XS ERROR: 2023 - EIQ - QUO character not allowed @ rec 1 pos 21 field 1
Solution: Make sure you use the right character separator for your data with --sep <char>.
The script tries to guess it from the file extension, but sometimes extension and actual separator do not match.
When using REDCap as input, make sure that <--iredcap> and <--rcd> files use the same field separator.
The default value for the separator is ';'.
Example for tab separator in CLI:
--sep $'\t'
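For the CSV_XS separator error described above, a quick look at the header line can help you pick a value for --sep. The following is a minimal sketch and not part of convert-pheno: `guess_sep` is a hypothetical helper that counts how often each candidate character (semicolon, comma, tab) occurs in the first line of a file and prints the most frequent one. It ignores quoting, so treat the result as a hint only.

```shell
# Hypothetical helper: guess the delimiter of a CSV-like file by counting
# candidate characters in its first line.
guess_sep() {
    header=$(head -n 1 "$1")
    best=';'          # fall back to the documented default separator
    best_count=0
    for cand in ';' ',' "$(printf '\t')"; do
        # Delete everything except the candidate, then count what is left.
        count=$(printf '%s' "$header" | tr -cd "$cand" | wc -c | tr -d ' ')
        if [ "$count" -gt "$best_count" ]; then
            best=$cand
            best_count=$count
        fi
    done
    printf '%s' "$best"
}
```

The result can be passed straight to the command line, for example `--sep "$(guess_sep redcap.csv)"`, where redcap.csv stands in for your own file.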
The author requests that any published work that utilizes Convert-Pheno includes a citation to the following reference:
Rueda, M et al., (2024). Convert-Pheno: A software toolkit for the interconversion of standard data models for phenotypic data. Journal of Biomedical Informatics. DOI
Written by Manuel Rueda, PhD. Info about CNAG can be found at https://www.cnag.eu.
Copyright (C) 2022-2024, Manuel Rueda - CNAG.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License version 2.0.