Big Data Requires Scalable Storage Systems
Big Data problems require Big Data-capable storage systems that can scale as your
data grows. Traditional NAS arrays usually scale only one way: up. Although scaling up
has worked well in the past, it’s often ill-equipped for the demands of today’s data. In
contrast, scale-out architecture offers new advantages. In this blog, I’ll review the
limitations of “scale-up” architecture and how “scale-out” architectures address those limitations.
When Scaling Up Breaks Down
Scaling up a storage array means adding additional capacity or performance to an
existing controller or existing set of Highly Available (HA) controllers. To use an
automotive analogy, you could add performance to your existing car by adding a cold
air intake or a forced induction system. Meanwhile, if you wanted to add capacity to
fit more stuff, you could add a rooftop box or a trailer. With data storage, the scaling up
process is the same. You can add more disk shelves to increase your capacity, and
there may be limited ways, like increasing memory or adding cache, to boost
performance, but traditional NAS arrays will always bump up against a ceiling.
[Figure: Scale Up vs. Scale Out]
This limit usually happens when storage administrators and engineers can still add
capacity, but there is effectively no way to increase performance. When controllers
with the same performance level become responsible for more and more capacity,
the performance per TiB ratio drops to the point where it may no longer be acceptable
for the application and business requirements.
For example, if we have an array controller capable of 10,000 IOPS and 5 GB/s,
doubling the storage capacity behind it cuts the performance per TiB in half.
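To make the math concrete, here is a quick back-of-the-envelope sketch. The starting capacity of 500 TiB is a hypothetical figure chosen for illustration; only the 10,000 IOPS controller limit comes from the example above.

```python
# Back-of-the-envelope sketch: performance per TiB falls as capacity
# grows behind a fixed scale-up controller. Numbers are illustrative.

def perf_per_tib(iops: float, capacity_tib: float) -> float:
    """IOPS available per TiB of usable capacity."""
    return iops / capacity_tib

CONTROLLER_IOPS = 10_000   # fixed controller performance (from the example)
capacity_tib = 500         # hypothetical starting capacity

before = perf_per_tib(CONTROLLER_IOPS, capacity_tib)
after = perf_per_tib(CONTROLLER_IOPS, capacity_tib * 2)  # capacity doubled

print(f"Before doubling: {before:.0f} IOPS/TiB")  # 20 IOPS/TiB
print(f"After doubling:  {after:.0f} IOPS/TiB")   # 10 IOPS/TiB
```

The same halving applies to throughput: 5 GB/s spread over twice the capacity yields half the GB/s per TiB.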
The classic escape routes from this quandary require substantial compromises:
increasingly expensive controllers to support larger and larger capacities, which
drives the price per TiB up. This approach can also create “array sprawl,” where
many independent arrays with siloed performance and capacity are deployed,
leaving pockets of stranded capacity on some arrays while others still have
performance to spare. This design is fraught with inefficiency, operational
complexity, and bloat, and it can have disastrous effects on workloads whose
performance needs grow as the data set expands.
However, organizations can solve many of the problems of scale-up systems with a
“scale-out” storage system that scales performance alongside capacity increases.
Scale-out works by sharing a single file system across many nodes connected to a
dedicated back-end (BE) switching fabric. As you add nodes with additional capacity,
their compute and network capabilities are also added to the cluster, increasing
aggregate performance capabilities under a single namespace.
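The contrast with scale-up can be sketched with illustrative numbers. The per-node figures below are assumptions, and the model assumes linear scaling, which real clusters only approximate; the point is that the performance-per-TiB ratio holds steady as nodes are added.

```python
# Sketch (illustrative figures): in a scale-out cluster, each added node
# contributes both capacity and performance, so the IOPS-per-TiB ratio
# stays constant as the cluster grows (assuming ideal linear scaling).

NODE_IOPS = 10_000   # assumed per-node performance
NODE_TIB = 100       # assumed per-node capacity

def cluster_stats(nodes: int) -> tuple[int, int, float]:
    """Return (total IOPS, total TiB, IOPS per TiB) for a cluster size."""
    total_iops = nodes * NODE_IOPS
    total_tib = nodes * NODE_TIB
    return total_iops, total_tib, total_iops / total_tib

for nodes in (4, 8, 16):
    iops, tib, ratio = cluster_stats(nodes)
    print(f"{nodes:2d} nodes: {iops:>7,} IOPS, {tib:>5} TiB, {ratio:.0f} IOPS/TiB")
```

Contrast this with the scale-up example earlier, where doubling capacity halved the ratio because controller performance stayed fixed.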
Scale-out storage is suitable for data-intensive workloads across industries, including
Advanced Driver Assistance Systems (ADAS), media and entertainment workflows,
scientific computing, and IoT.