There’s a critical need for a new generation of bioinformatics and clinical informatics software that can scale to handle the vast increase in data. The challenges of managing, sharing, accessing, and analyzing data are exacerbated not just by ever-increasing data volumes but, even more so, by the need to integrate and analyze a wide variety of data types and sources: research data, clinical data, insurance data, patient data, sensor data, and environmental data.

Both the Massachusetts Biotechnology Council and the Massachusetts Technology Leadership Council have released reports highlighting the unique opportunity for innovation and partnerships between Massachusetts life science companies and Massachusetts big data IT companies to:

“take the lead in developing information-based solutions that increase efficiency and mitigate risk at all levels of the biopharma value chain, from basic research to patient care.”

The reports can be found here: MASS Bio IMPACT2020 and MA TLC Big Data and Life Sciences

Paradigm4 has jumped in to do just that. We are helping accelerate life science and clinical informatics research with SciDB, an all-in-one data management and scalable analytics software solution. SciDB powers the NIH’s One Thousand Genomes browser, a public interface to public genomic and clinical cancer research data. Paradigm4’s SciDB is also enabling a number of pharma, biotech, instrumentation, and personalized medicine companies to do interactive, data-driven discovery on big data from diverse public and private data sources.

For example, many of our customers struggle with both public and proprietary data spread across huge numbers of files. We make their exploratory analysis both faster and easier by putting their data in a scientific database that serves as a shared, curated resource. We’ve loaded the ~5800 files comprising the public TCGA level 3 data into a SciDB schema so that companies can join their proprietary data with public data for faster selection and scalable math.

Our database goes well beyond other genomic array data management solutions because we support massively scalable in-database mathematical analysis. One biotech company can now correlate all genes by all genes, for all tumor types, all samples, and all probes, in less than half the time it used to take them to correlate one gene by all genes. That’s 49B calculations, all done in-database. A personalized medicine company was previously unable to deal with the combinatorics of Fisher’s tests over all pairs of their target research genes because it took too long. With SciDB, they do 200 million Fisher’s tests in several seconds.
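To make the combinatorics concrete, here is a minimal plain-Python sketch of Fisher's exact test run over every pair of genes. It is an illustration only, not SciDB code and not any customer's actual pipeline. It assumes each gene is reduced to a 0/1 flag per sample (for instance, mutated or not), and the names `fisher_exact_two_sided`, `pairwise_fisher`, and `status` are hypothetical. The number of pairs grows as n(n-1)/2, so 20,000 genes would already mean roughly 200 million tests.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with the row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one (the small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def pairwise_fisher(status):
    """Run the test on every gene pair.

    status maps a gene name to a list of 0/1 flags, one per sample
    (an assumed encoding chosen for this illustration).
    """
    results = {}
    for g1, g2 in combinations(sorted(status), 2):
        x, y = status[g1], status[g2]
        # Build the 2x2 contingency table for this pair of genes.
        a = sum(1 for i, j in zip(x, y) if i and j)
        b = sum(1 for i, j in zip(x, y) if i and not j)
        c = sum(1 for i, j in zip(x, y) if not i and j)
        d = sum(1 for i, j in zip(x, y) if not i and not j)
        results[(g1, g2)] = fisher_exact_two_sided(a, b, c, d)
    return results


# Toy input: three made-up genes across eight samples.
status = {
    "GENE_A": [1, 1, 1, 1, 0, 0, 0, 0],
    "GENE_B": [1, 1, 1, 1, 0, 0, 0, 0],
    "GENE_C": [1, 1, 1, 0, 1, 0, 0, 0],
}
pvalues = pairwise_fisher(status)
```

A per-pair loop like this is fine for a handful of genes but becomes the bottleneck at the scale described above, which is the motivation for running the same math in bulk next to the data.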

These are just a few examples of how SciDB accelerates research and drives scientific discovery. Share your data management and analytics challenges: we’re keen to help.