Big data in the cloud: Medical research use case

Using private clouds to analyze big data is not new. Companies have been using virtual environments to run analyses and applications and otherwise break down their data for a few years now. Research firm IDC predicted in November 2015 that annual spending on big data infrastructure, services and software would reach $48.6 billion by 2019. This spending will come from nearly every industry, a testament to the usefulness of big data tools across the spectrum.

One of the most important verticals for big data is life sciences. Medical data is growing at an astounding rate, and health care organizations, federal and otherwise, need somewhere to put all of this information. Cloud infrastructure is quickly becoming a suitable answer to this looming problem for the medical field.

We have previously touched on the cloud’s general role in big data analytics. Now let’s look at a crucial use case: how cloud infrastructure can support the day-to-day operation of the medical research industry.

NIH looks for answers

Medical data is one of the world’s most important resources, but we’re running out of places to put it. The National Institutes of Health, and the National Human Genome Research Institute in particular, are exploring new ways to host the enormous amount of data generated by genomic research, according to Forbes contributor Kalev Leetaru.

The NHGRI is expected to close out funding for the Online Mendelian Inheritance in Man database, which has been in operation for 50 years. The database costs $2.1 million a year to run and generates 23 million page views annually. When the NIH stops funding it, what will that mean for the future of medical data as a whole?

Genomics alone contributes a hefty portion of the big data generated by the medical industry. The amount of data created by sequencing the human genome is monstrous. According to a report published in the scientific journal PLoS Biology in July 2015, between 100 million and 2 billion genomes could be sequenced by 2025, which could create as much as 40 exabytes of data. The question remains: Where is all of this data going to go?
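To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the upper-bound numbers quoted above (2 billion genomes, 40 exabytes) and assumes decimal units. The 10-terabyte drive size is a hypothetical round number chosen for illustration, not a figure from the report.

```python
# Back-of-the-envelope storage arithmetic for the upper-bound projection.
# Assumption: decimal (SI) units, i.e. 1 EB = 10**18 bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18

# Upper-bound figures quoted in the article.
projected_storage_bytes = 40 * EXABYTE
projected_genomes = 2_000_000_000

# Hypothetical drive capacity, for illustration only.
drive_capacity_bytes = 10 * TERABYTE


def drives_needed(total_bytes: int, drive_bytes: int) -> int:
    """Return the number of whole drives required (ceiling division)."""
    return -(-total_bytes // drive_bytes)


# Average storage per genome implied by pairing the two upper bounds.
per_genome_gb = projected_storage_bytes / projected_genomes / GIGABYTE
drives = drives_needed(projected_storage_bytes, drive_capacity_bytes)

print(f"Implied storage per genome: {per_genome_gb:.0f} GB")
print(f"10 TB drives needed for 40 EB: {drives:,}")
```

Even under these simplified assumptions, the total works out to millions of drives before any backups or redundant copies are counted.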

“While an increasing number of academic institutions offer centralized institutional repository systems, few are designed or capable of handling the kinds of massive multi-terabyte datasets being generated by the new era of ‘big data’ research,” Leetaru wrote.

What’s the answer? The cloud, of course: it offers a ready home for the big data generated by genome sequencing.

A different way to store data

According to Dataversity contributor Jonathan Buckley, big data analytics is changing the way the medical industry interacts with the research it conducts. Specifically, big data helps scientists improve their gene-mapping methods and better observe the complex inner workings of human genetic makeup. For instance, scientists at Israel’s University of Haifa were able to use analysis tools to collect and sift through mountains of data to improve understanding of the social character of genes – or how they interact.

The massive amounts of data required to conduct this kind of research need to go somewhere. This is where the cloud comes in. By moving data and related tools to cloud environments, health care organizations and scientists at the NIH can organize data better and improve their analytics processes without having to invest in huge data centers of their own. Implementing the right kind of private cloud architecture can take away the headache of finding a new place to put all the data the medical research industry constantly generates.

Or, as Leetaru put it:

“In the end, as the academic enterprise moves towards a future ever more entwined with the world of big data, it faces new challenges in supporting the contemporary needs and long-term preservation of data intensive research and offers a powerful new application area for the cloud.”

Finding a private cloud provider that can store medical data – like the projected exabytes generated by genome sequencing – will be critical for health care organizations and the research institutions they support.