Three “V”s of Big Data – Volume, Velocity, Variability

If you attended Distributech 2012 in San Diego this year, you couldn’t miss the huge Oracle banner draped across the Marriott hotel next to the convention center.  Add to this a small Oracle city on the vendor floor, and it is clear that the major vendors see continuing heavy growth in database applications for energy utilities.

But the buzz in Big Data is not about Oracle, or IBM or SAP.  A whole raft of upstarts is appearing, seemingly out of nowhere, offering to spin huge amounts of loosely structured data into operational and marketing gold.  Most of the new folks are built on one flavor or another of Hadoop, touting the NoSQL, massively parallel approach to grinding through large data sets.  But does a migration to a NoSQL database architecture make sense in a utility context?

Today’s relational databases running on modern hardware are extraordinarily capable, tried-and-true platforms.  They do what they were designed to do very well, and a well-designed RDB installation will handle all the terabytes you can throw at it.  Notice I said well-designed.  Additionally, for those already managing large datasets, staging on a relational database means there is no re-training overhead and less technical risk involved in launching new infrastructure.

But even the major RDB players are offering NoSQL products these days, so if the Big in Big Data isn’t just data volume, what is it, and when do you need it?  Consider the three “V”s of Big Data – Volume, Velocity and Variability.  If you are looking for an ad hoc way to combine loosely structured data from a range of disparate sources in order to pick off the best candidates for an energy efficiency incentive program, the variability of the data is high, both in structure and content.  If you are monitoring system conditions and real-time data streams to send out energy alerts to customer smartphones, the velocity is high.  These are natural applications for NoSQL infrastructure.

On the other hand, if the data set is large but structured, as in energy market and settlement data, operational data or even meter data, then a well-designed relational database is the better solution.  An RDB is better suited to the financial analytics, reporting and other functions that this type of data supports, and because the structure is consistent, these operations can be optimized to provide excellent performance.  Layer on a configurable computational engine and you have a decision support system that can run a wide range of sophisticated analyses with responsive performance over huge data sets, while maintaining full SOX auditability and reproducibility.  As with so many things, it’s a matter of choosing the technology that is appropriate for the task.
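
To make the structured case concrete, here is a minimal sketch in Python using the standard-library sqlite3 module.  The table name, column names and sample readings are all hypothetical, made up purely for illustration, and SQLite stands in for whatever production RDB you actually run.  The point is only that when every row has the same shape, a usage rollup is a single declarative query that the database can index and optimize.

```python
import sqlite3

# Hypothetical schema and sample values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter_reads (meter_id TEXT, read_hour TEXT, kwh REAL)"
)
conn.executemany(
    "INSERT INTO meter_reads VALUES (?, ?, ?)",
    [
        ("M1", "2012-01-24T00:00", 1.5),
        ("M1", "2012-01-24T01:00", 2.0),
        ("M2", "2012-01-24T00:00", 0.5),
        ("M2", "2012-01-24T01:00", 1.0),
    ],
)

# Because the structure is consistent, we can index the column we group on.
conn.execute("CREATE INDEX idx_meter_reads_meter ON meter_reads (meter_id)")


def usage_by_meter(connection):
    """Return total kWh per meter as a dict keyed by meter_id."""
    rows = connection.execute(
        "SELECT meter_id, SUM(kwh) FROM meter_reads "
        "GROUP BY meter_id ORDER BY meter_id"
    ).fetchall()
    return dict(rows)


totals = usage_by_meter(conn)
print(totals)
```

The same query text serves four rows or four billion; what changes is the indexing, partitioning and hardware behind it, which is where the "well-designed" part comes in.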