Hadoop, the open-source software framework built to process large data sets, continues to make waves. A 2014 Allied Market Research report estimated that the global Hadoop market will be worth $50.2 billion by 2020, which corresponds to a compound annual growth rate of 58.2 percent from 2013 to 2020. This growth suggests that companies will continue to use Hadoop to analyze large data sets, and some of that analysis will inevitably take place in a company’s cloud infrastructure.
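To put the growth rate in perspective, the two figures quoted above can be run backwards to see what starting market size they imply. The short Python sketch below is only an illustration: it assumes seven annual compounding periods between 2013 and 2020, and the function name `implied_base` is made up for this example. The 2013 figure it produces is derived from the quoted numbers, not taken from the report itself.

```python
# Back out the starting market size implied by a final value and a
# compound annual growth rate (CAGR).
# Assumption: seven compounding periods, 2013 through 2020.

def implied_base(final_value, cagr, years):
    """Return the starting value that grows to final_value at the given CAGR."""
    return final_value / (1 + cagr) ** years

FINAL_2020_BILLIONS = 50.2  # figure quoted from the 2014 report
CAGR = 0.582                # 58.2 percent per year
YEARS = 7                   # assumed: 2013 -> 2020

base_2013 = implied_base(FINAL_2020_BILLIONS, CAGR, YEARS)
print(f"Implied 2013 market size: ${base_2013:.2f} billion")
```

In other words, a 58.2 percent annual rate means the market would have to multiply roughly 25-fold over the period to reach the projected figure.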
Companies in many industries use Hadoop to get value from big data. In 2007, The New York Times used Hadoop clusters to convert 11 million TIFF images to PDF files. More recently, according to Forbes contributor Ben Kepes, AtScale raised $7 million for its Hadoop-based business intelligence solution. The distributed computing platform has influenced big data since its inception, and its future looks bright.
Enterprise data center application
Companies may choose to run Hadoop in the cloud or on their own physical servers. Forrester analyst Richard Fichera maintained in an April 2014 report that enterprises should run Hadoop on-premises in their data centers instead of migrating the data to a cloud computing environment. According to TechRepublic, Fichera based his argument on the fact that the large data sets moved by Hadoop favor an on-premises solution because of the predictable way the data is fed through the system. Fichera also argued that big data moves more slowly in a cloud computing environment.
However, the growing enterprise use of cloud computing makes Fichera’s argument somewhat moot. It makes the most sense for enterprises to install Hadoop where their data is, and more often than not, that data is located in the cloud. A combination of cloud-based and data center-based computing facilitated by the Hadoop platform is therefore likely to become the norm. In addition, as data sets grow, it may simply be easier to store everything in the cloud than to keep expanding on-premises setups.
Big data computing in the cloud
Hadoop allows users to analyze large data sets across clusters of machines, and cloud computing has long been a friend of big data. According to a January 2015 article published in Information Systems, big data and cloud computing are conjoined, as the cloud provides the platform necessary for big data analysis to take place over clusters of commodity compute resources. The evaluation of big data is driven by cloud-based applications developed through the use of platforms like Hadoop.
“Therefore, cloud computing not only provides facilities for the computation and processing of big data but also serves as a service model,” the researchers said.
The future of Hadoop
As enterprises grow and need more computing power to handle larger data sets, it is likely they will turn to the Hadoop platform. According to TechCrunch contributor Stefan Groschupf, the growth of the industry depends on standardization. As the Hadoop community continues to grow, Groschupf stressed, the lack of industry-wide standards will become harder to get past. Currently, there is no standardization process that unifies practices and eliminates the need to test computations against each version of Hadoop as it’s released.