In a recent blog post, we mentioned three key insights gained from Paradigm4’s survey of data scientists. Surprisingly (or not), we discovered that incorporating diverse data types into analytical workflows is a major pain point for data scientists using traditional database software or file systems. While lots of media attention has been devoted to technologies for managing unstructured data – think videos, emails and social media content – far less has been written about the challenges of dealing with structured data that doesn’t fit neatly into a relational database.
What types of structured data are we talking about? Time-based data such as stock trades and quotes, sensor data collected from factories, utilities and vehicles, images captured by satellites and surveillance cameras, and DNA sequences generated in research labs are just a few examples. Among other drivers, the nascent Internet of Things (IoT) promises to generate mind-boggling amounts of exactly this type of data in the near future.
In the spirit of not “leaving data on the table,” data scientists want to harness this data in their organizations to drive innovation, profits and insights. But if storing this data in ill-fitting relational architectures, and extracting it again, proves too difficult or time consuming, data scientists will forsake this potentially high-value data. With the right tools, data scientists can manage and analyze this data far more effectively than they can with unwieldy workarounds that sit on top of relational or file-based storage systems.
At Paradigm4, we’ve rejected this “square peg in a round hole” philosophy. Instead, we built SciDB from the ground up to help data scientists store, manage and analyze the growing diversity of data found throughout their organizations. SciDB’s native array model leverages the existing structure and dimensionality of data, resulting in lightning-fast joins and windowing operations and the ability to perform complex analytics directly in-database. SciDB is also ACID compliant, which ensures transactional integrity, and massively scalable, which lets data scientists focus on analytics instead of hardware management.
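To make the point about dimensionality a little more concrete, here is a minimal sketch in plain Python. It is not SciDB code and does not use SciDB's query language or API; the function names and the toy data are made up for illustration. It only contrasts two layouts: a row-style layout, where the time coordinate is just another column that has to be filtered on, and an array-style layout, where the time coordinate is the position itself, so the neighbors in a window are already adjacent.

```python
# Illustrative only: plain Python, not SciDB's API or query language.
# All names and data here are hypothetical.

# Row-style layout: one (time_index, value) record per reading.
rows = [(t, float(t * t)) for t in range(10)]


def window_mean_rows(rows, center, radius):
    """Scan every row and filter on the time column to find the neighbors."""
    vals = [v for (t, v) in rows if center - radius <= t <= center + radius]
    return sum(vals) / len(vals)


# Array-style layout: the time coordinate is the list position.
values = [float(t * t) for t in range(10)]


def window_mean_array(values, center, radius):
    """Slice the neighbors directly; no scan over unrelated readings."""
    lo = max(0, center - radius)
    hi = min(len(values), center + radius + 1)
    window = values[lo:hi]
    return sum(window) / len(window)


# Both layouts give the same answer for a window of radius 1 around t = 5.
result_rows = window_mean_rows(rows, 5, 1)
result_array = window_mean_array(values, 5, 1)
```

The two functions compute the same window average, but the array version reaches the neighboring readings by position rather than by searching for matching time values. A real array database does far more than this toy shows (chunking, distribution, multiple dimensions), so treat it as an intuition aid rather than a description of how SciDB is implemented.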
If incorporating diverse data types into your analytical workflows is proving to be a major pain point, give SciDB a look. After all, everyone knows how that square peg/round hole thing inevitably turns out.