Many organizations have realized significant competitive advantages with Big Data, but could do even more with new software paradigms. The financial industry in particular runs into limits because it depends on complex analytics, especially matrix math, which most Big Data architectures cannot readily accommodate.
- ETL gets in the way of interactive, exploratory analytics. Moving Big Data is burdensome. Analytics solutions that separate the storage engine from the analytics engine are impractical for Big Data. But many approaches to Big Data complex analytics require exactly that: specialized math software external to the data warehouse. Putting ETL in the loop slows analysts down. Interactive, exploratory, “big math” ought to be painless.
- In-memory solutions don’t scale for complex analytics. Big Data connotes data sets far larger than a single machine’s memory. Although some “embarrassingly parallel” problems decompose into multiple smaller independent problems that can be distributed across a cluster, many of the complex analyses that financial institutions need do not. Even if your data does fit on one machine, performance is limited by the number of cores you have. Analytics ought to scale past the limits of one machine’s memory or core count, up to as many machines as you have available.
- Hadoop doesn’t do complex math. Hadoop and databases with embedded MapReduce get plenty of buzz, but they’re challenged by complex analytics that are not embarrassingly parallel. Plus, these architectures often require a lot of low-level coding, turning data scientists into computer scientists. Big Data chores ought to be invisible and automatic.
- Quant-friendly languages are demoted. Typical Big Data solutions don’t let quants and data scientists develop analytical solutions in the languages they prefer, such as R and Python. Analytics solutions should promote collaboration and capitalize on contemporary analysis tools.
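To make the distinction between embarrassingly parallel work and matrix math concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation; the function names `chunked_sum` and `blocked_matmul` are hypothetical. A sum splits into chunks that never need to see each other, while every output block of a matrix product must read a whole row of blocks from one input and a whole column of blocks from the other, which on a cluster means moving data between machines.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def chunked_sum(values: List[float], n_chunks: int) -> float:
    """Embarrassingly parallel: each chunk is summed without seeing the others."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)  # the only coordination is one final add


def blocked_matmul(
    a: Matrix, b: Matrix, block: int
) -> Tuple[Matrix, Dict[Tuple[int, int], int]]:
    """Blocked product C = A @ B for square matrices of equal size.

    Also returns, for each output block, how many input blocks it had to read.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    reads: Dict[Tuple[int, int], int] = {}
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            touched = 0
            for bk in range(0, n, block):
                touched += 2  # one block of A and one block of B
                for i in range(bi, min(bi + block, n)):
                    for k in range(bk, min(bk + block, n)):
                        aik = a[i][k]
                        for j in range(bj, min(bj + block, n)):
                            c[i][j] += aik * b[k][j]
            reads[(bi // block, bj // block)] = touched
    return c, reads


if __name__ == "__main__":
    print(chunked_sum([float(x) for x in range(10)], 3))
    product, reads = blocked_matmul([[1.0, 2.0], [3.0, 4.0]],
                                    [[5.0, 6.0], [7.0, 8.0]], block=1)
    print(product, reads)
```

The `reads` count grows with the number of blocks along each dimension, so the finer you partition a matrix across machines, the more cross-machine traffic each output block needs. That is the part a plain split-and-combine approach does not handle for you.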
Analysts want to explore data regardless of its size, iterate rapidly to build models that use complex analytic approaches and draw on all available data, and then deploy those models. Ask your Big Data vendors if their infrastructure supports these objectives. Here are some examples of awesome things you should be able to do with a Big Data exploratory analytics database.
- Build the ARCA book for one day of all exchange-traded US equities (186 million quotes) in 80 seconds on a 32-instance commodity hardware cluster. Run it in about half the time on a cluster twice as large.
- Run a Principal Component Analysis on a 50M x 50M sparse matrix in minutes.
- Select data sets (based on complex criteria) in constant time, irrespective of how big your data set gets.
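For readers who want a feel for why sparse PCA is tractable at all, here is a small single-machine sketch in plain Python. It is not the method behind the figures above. It swaps in power iteration to find only the top principal component, and the function name `top_principal_component` and the dict-per-row sparse layout are assumptions made for illustration. The key trick is that the covariance matrix is never built and the data is never centered explicitly (centering would destroy sparsity). Instead, each step uses the identity C v = (XᵀX v − n μ (μ·v)) / (n − 1), where μ is the vector of column means.

```python
import math
import random
from typing import Dict, List, Tuple

SparseRow = Dict[int, float]  # column index -> nonzero value


def top_principal_component(
    rows: List[SparseRow], n_cols: int, iters: int = 200, seed: int = 0
) -> Tuple[List[float], float]:
    """Return (unit direction, variance) of the first principal component."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows to estimate a covariance")

    # Column means, computed from the nonzero entries only.
    mu = [0.0] * n_cols
    for row in rows:
        for j, x in row.items():
            mu[j] += x
    mu = [m / n for m in mu]

    # Random unit starting vector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    variance = 0.0
    for _ in range(iters):
        mu_dot_v = sum(m * x for m, x in zip(mu, v))
        # Start from the centering correction: -n * mu * (mu . v)
        w = [-n * m * mu_dot_v for m in mu]
        # Add X^T (X v), touching only the nonzero entries of each row.
        for row in rows:
            s = sum(x * v[j] for j, x in row.items())
            for j, x in row.items():
                w[j] += x * s
        w = [x / (n - 1) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all rows identical: no variance in any direction
        variance = norm  # ||C v|| approaches the top eigenvalue as v converges
        v = [x / norm for x in w]
    return v, variance


if __name__ == "__main__":
    # Four points on the line y = x; the empty dict is the all-zero row (0, 0).
    data = [{0: 1.0, 1: 1.0}, {0: 2.0, 1: 2.0}, {0: 3.0, 1: 3.0}, {}]
    direction, variance = top_principal_component(data, n_cols=2)
    print(direction, variance)
```

Each iteration costs time proportional to the number of nonzero entries, which is why sparsity matters so much. Scaling this to a matrix with tens of millions of rows and columns would additionally require distributing the rows and the vector across machines, which this sketch does not attempt.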
If you’d like to learn more, we recently published a paper describing several financial calculations that require a flexible, extensible and scalable infrastructure and citing specific methods to solve these problems.
Check it out: