A lot of people are asking us why we commissioned our recently completed Data Scientist survey. We wanted to separate hype from reality. And who better to ask than the practitioners who have to wrest value from big data every day? Turns out these value wranglers are struggling with the differences between hype and reality in a few key areas.
1. While Hadoop has garnered widespread media coverage, its limitations get little attention in the press. Hadoop is well suited to embarrassingly parallel problems but falls short for large-scale complex analytics, which, as the survey found, most users are already doing or will be doing within the next two years.
Paradigm4 handles the problems that Hadoop can’t. Our customers build recommendation engines, find genetic anomalies, study climate change, and surface interesting trends in financial networks across petabytes of unimaginably diverse data.
2. The traditional approach to analytic infrastructure (database separate from analytics engine) is broken. For complex analytics, data scientists are forced to move large volumes of data from existing data stores to dedicated mathematical and statistical computing software. This time-consuming and coding-intensive step adds no analytical value and impedes productivity.
Because our analytical database brings the analytics to the data (instead of the bass-ackwards approach of hauling the data to the analytics), we save busy data scientists from worrying about low-value, error-prone, and time-consuming ‘extract, transform, and load’ processing. The result: data scientists spend more time exploring, analyzing, and modeling their data, and produce valuable insights faster.
3. Incorporating diverse data types into analytical workflows is a major pain point for data scientists using traditional database software or file systems.
Relational databases and Hadoop are great at what they were designed to do. But big and diverse data demand a new data model. With Paradigm4’s native array data model, incorporating disparate and difficult data sources is a snap because (i) the array data model is flexible enough to accommodate disparate data, (ii) ordered storage makes joining data along the dimensions of an array very fast, and (iii) fast regrid and window capabilities let users easily work with data that have different time or space granularity.
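To make the regrid and window ideas concrete, here is a minimal sketch in plain NumPy. It is not Paradigm4's API: the function names `regrid` and `window`, their parameters, and the sample data are all assumptions made up for illustration. It shows how coarsening a fine-grained series lets it line up, cell for cell, with a coarser one.

```python
import numpy as np

def regrid(a, block, agg=np.mean):
    """Coarsen a 1-D array by aggregating consecutive blocks of `block` cells.

    Hypothetical helper for illustration only, not a Paradigm4 function.
    """
    a = np.asarray(a, dtype=float)
    if a.size % block != 0:
        raise ValueError("array length must be a multiple of block")
    # Each row of the reshaped array is one block; aggregate across the row.
    return agg(a.reshape(-1, block), axis=1)

def window(a, radius, agg=np.mean):
    """Aggregate each cell with its neighbors within `radius` cells.

    Windows are truncated at the array edges. Hypothetical helper.
    """
    a = np.asarray(a, dtype=float)
    return np.array(
        [agg(a[max(0, i - radius): i + radius + 1]) for i in range(a.size)]
    )

# Made-up data: one series sampled every minute, another every five minutes.
per_minute = np.arange(10, dtype=float)
per_5min = np.array([100.0, 200.0])

# Regrid the fine series to five-minute cells so both share one dimension,
# then combine them cell by cell.
coarse = regrid(per_minute, 5)
combined = coarse + per_5min

# A sliding window smooths the fine series without changing its granularity.
smoothed = window(per_minute, 1)
```

Once both series sit on the same grid, combining them is a positional, cell-by-cell operation rather than a key lookup, which is the intuition behind point (ii) above.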
Read the survey to find out more.

About the Survey

The Paradigm4 Data Scientist Survey was fielded by Innovation Enterprise, an independent research firm, from March 27 to April 23, 2014. The responses were generated from a survey of 111 data scientists in the U.S.