Big Data: What's the Big Deal?

The value of holding data has been recognised since the dawn of commerce when merchants inscribed key business data onto tablets. That analysing business data can provide insights and unlock revenue is not new thinking, so what is the big deal with Big Data?

Big Data's disruptive value stems from the advent of high-performance digital data capture. These technologies have made data collection virtually free. Internet-based marketing systems capture information about consumer preferences as we search and buy. Digital still and video cameras can be emptied and re-used repeatedly, so more data is captured per event. The availability of computing power to manipulate this data enables companies to use multi-petabyte databases to analyse customer patterns. The resulting content is stored forever so that it can be reanalysed: those terabytes per day of oilfield 3D seismic data might become next decade's oil find; today's genomic profile might deliver tomorrow's cure for cancer.

The result is the growth of enormous stores of unstructured data. The challenge now for data managers is to make storage at this scale affordable, accessible for analysis, and durable, so that long-term value can be realised.

However, like the cavalry, a new set of “wide area storage” solutions based on second-generation object storage is arriving to help manage these issues. Object storage works like a valet parking system. The creator of a piece of data (an object) hands it to the storage system in exchange for an object identifier (the ‘parking ticket’). When the data is required, the user hands the system the ‘ticket’ and the data is returned. The power of this model is that it is highly scalable: many objects can be stored and retrieved in parallel, and retrieval can be independent of the original application. Any application or user with authorised access can use the data. Historically, this technology was limited by the performance of the object storage “box.”
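The valet model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ObjectStore` class, its `put` and `get` methods, and the use of a random UUID as the ticket are hypothetical names chosen for this example, not the interface of any particular product.

```python
import uuid


class ObjectStore:
    """Toy in-memory object store (hypothetical, for illustration only)."""

    def __init__(self):
        # Flat mapping from object identifier to the stored bytes.
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store an object and hand back its identifier (the 'parking ticket')."""
        object_id = str(uuid.uuid4())  # assumption: IDs are random UUIDs
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the object for a ticket; raises KeyError if it is unknown."""
        return self._objects[object_id]


store = ObjectStore()
ticket = store.put(b"3D seismic survey, block 7")

# Any caller holding the ticket can retrieve the object;
# it does not need to be the application that stored it.
retrieved = store.get(ticket)
```

Because the store is a flat namespace keyed only by identifier, there is no directory tree to traverse or lock, which is one reason this access pattern lends itself to parallel storage and retrieval.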

The new generation is different: it can copy and disperse objects efficiently across large numbers of independent elements, which can themselves be distributed, providing resilience without traditional replication. The resulting capabilities are well suited to the challenges of Big Data: almost infinite scale, low cost per petabyte, affordable 100 percent online storage, global access and a content store that lives forever without disruptive data migration.

How is this possible? First, the underlying object storage is natively scalable; unlike systems with centralised indices, to add data you simply add more objects, which are then dispersed across scaled-out components to add access, performance or storage. If you need more storage or bandwidth, you add more pre-packaged storage, processors or network capacity; the system takes care of the rest. Second, wide area storage systems are built on the assumption that individual components will fail. Because objects are copied and dispersed across resources, components can fail while data remains continuously available. This “failure-assumed” model enables wide area storage to use lower-cost disk and processor technologies, which translates into reduced capital cost compared with traditional storage. The ability to defer replacement of failed components also delivers lower running costs. In addition, because the system is shared, the overhead of disaster protection is shared by everyone. Third, cloud-like access models embed the capability for “wide” geographic access, supporting a broad range of uses, from streaming data to parallel processing.
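To make the "failure-assumed" idea concrete, here is a minimal sketch of dispersal with redundancy. The article does not say which dispersal algorithm these systems use, so simple XOR parity is swapped in purely as a stand-in: an object is split into two data fragments plus one parity fragment, each imagined to sit on a separate node, and any single fragment can be lost without losing the object. The function names `disperse` and `recover` are hypothetical.

```python
from typing import List, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes) -> Tuple[List[Optional[bytes]], int]:
    """Split data into two fragments plus an XOR parity fragment.

    Returns the three fragments and the original length (needed to
    strip the zero padding added when the length is odd).
    """
    half = (len(data) + 1) // 2
    first = data[:half]
    second = data[half:].ljust(half, b"\x00")  # pad to equal length
    return [first, second, xor_bytes(first, second)], len(data)


def recover(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the object when at most one fragment is missing (None)."""
    if sum(f is None for f in fragments) > 1:
        raise ValueError("more than one fragment lost; cannot recover")
    first, second, parity = fragments
    if first is None:
        first = xor_bytes(second, parity)
    elif second is None:
        second = xor_bytes(first, parity)
    return (first + second)[:length]


original = b"genomic profile"
fragments, length = disperse(original)

fragments[0] = None  # simulate the node holding the first fragment failing
restored = recover(fragments, length)
```

In this toy layout the redundancy costs half as much again as the data itself, whereas keeping a full second copy would double it. That is the general sense in which dispersal can provide resilience without traditional replication, although production systems would be expected to use far more fragments and stronger codes than this sketch.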

Finally, and most intriguingly, object storage offers the capability for content, once stored, to never require migration. For any technology manager who has endured the pain of migration to the next big thing, the appeal of this is strong.