SplitThreader (ARCHIVED)

NOTE: SplitThreader is archived. It has become technically outdated and hard to maintain in its current form, so we are finally shutting it down (April 2024) after 8 years.

A brief history and post-mortem of SplitThreader

I wrote SplitThreader to help me analyze the particularly gnarly genome of SKBR3, a breast cancer cell line I studied for my PhD. As of this writing, GitHub says SplitThreader is 8 years old, and we've kept it running somehow on an old web server that whole time.

SplitThreader was my first real visualization tool I made, the first of many, and it taught me a lot about making visualization tools, but I also learned a lot since then about software development and making a good bioinformatics tool. Because of some things I didn't know back then, SplitThreader is now harder and more expensive to maintain than the very next software project I made, Ribbon. SplitThreader was always more niche, visualizing long-range structural variants that are rare in healthy individuals and only really need SplitThreader's level of firepower when there are tons of them in cancer genomes like SKBR3.

Some things I would do differently today if I had bandwidth to give SplitThreader a makeover:

Require a pre-analyzed copy number profile instead of doing that processing on the server. Back in 2014-ish when I started SplitThreader, I couldn't find a nice tool that could generate a copy number profile for every 10kb bin in the human genome, so I duct-taped some tools together. To use SplitThreader, users would have to run this terrible bash script to get a copy number input file. I think bedtools might have something that can actually bin coverage now across the whole genome, but back when I made SplitThreader it could only output coverage for every single base on its own line, so my bash script was using that and then binning it with awk, which took a long time. Then inside the SplitThreader server, it would run some more processing in Python and finally use an R script to segment the copy number data. The way to change SplitThreader for the better today would be to adopt one or more copy number file formats that now are more standardized and have tools to generate them. It would use the input file directly without any extra processing, taking it whether it's segmented or not, and not trying to do any of that for the user. This would be more flexible and remove the need for the server.
Make it front-end only. Back-ends cost money and require a bigger server, and they also result in uploading user data which then needs to be secured or at the very least occasionally deleted. SplitThreader and other applications on the old institutional web server would have issues every year or so when the server would fill up (mostly from other more data-heavy applications) and wouldn't take any more uploads. By removing the copy number processing from the server side and similarly moving the VCF wrangling to the front-end, we could completely remove the need for the server. Today we could even use WebAssembly-compiled bedtools and bcftools for this through biowasm! Making SplitThreader front-end-only would also make it much easier for clinical users to install a local version to keep patient data secure.

An opportunity

If anyone is interested to take this on as a project, please give it a try and let me know how it goes! I think it could be a great project for a budding bioinformatics software engineer.

Previous README contents:

SplitThreader is an interactive web application for analysis of rearrangements in a cancer genome.

Name		Name	Last commit message	Last commit date
Latest commit History 184 Commits
bin		bin
css		css
images		images
js		js
resources/annotation		resources/annotation
user_data		user_data
user_uploads		user_uploads
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SplitThreader_tests.html		SplitThreader_tests.html
analysis.php		analysis.php
check_progress.php		check_progress.php
example_copycat.txt		example_copycat.txt
example_lumpy.vcf		example_lumpy.vcf
example_variants.bedpe		example_variants.bedpe
example_variants.csv		example_variants.csv
file_upload.php		file_upload.php
header.html		header.html
index.php		index.php
info.php		info.php
input_validation.php		input_validation.php
run_algorithm.php		run_algorithm.php
title.html		title.html
vis.php		vis.php

License

MariaNattestad/SplitThreader

Folders and files

Latest commit

History

Repository files navigation

SplitThreader (ARCHIVED)

A brief history and post-mortem of SplitThreader

An opportunity

Previous README contents:

Screenshots:

About

Topics

Resources

License

Stars

Watchers

Forks

Languages