Don’t Leave Data on the Table
Array DBMS Supports Complex Analytics at Scale
How much does your data cost? Are you using it all? The edge goes to those who find patterns and exploit signals only discernible in big datasets. So why are you using only some of your data? Odds are your analytic software and legacy processes constrain you. SciDB gives you the flexibility to store and use it all.
- SciDB is an array database management system. Arrays are the natural way to organize, store, and retrieve ordered or multifaceted data.
- Arrays have an unfair speed advantage for multidimensional selections and joins.
- Select any two dimensions of your array and you have a matrix—represented in exactly the format you need to run complex analytics that drive predictive models.
- SciDB uses a distributed massively parallel processing architecture that lets you store and access as much as you need by scaling out on commodity hardware.
- SciDB allows you to do complex math directly in the database. Embedded complex math means you won’t be waiting to export your data to a math software package, and you’ll never have to select a subset of data to fit in memory.
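To make the "any two dimensions give you a matrix" point concrete, here is a minimal Python sketch. It does not use SciDB or its query languages. The sample array, the dimension names (`day`, `sensor`, `channel`), and the helper `matrix_slice` are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical 3-D array with dimensions (day, sensor, channel).
# Cells are kept as {coordinates: value}; this is an illustrative
# layout, not SciDB's storage format.
DIMS = ("day", "sensor", "channel")
SHAPE = {"day": 2, "sensor": 3, "channel": 2}

cells = {
    (d, s, c): 100 * d + 10 * s + c
    for d, s, c in product(range(2), range(3), range(2))
}


def matrix_slice(cells, row_dim, col_dim, fixed):
    """Return a 2-D list over (row_dim, col_dim), holding the
    remaining dimensions at the coordinates given in `fixed`."""
    rows = []
    for r in range(SHAPE[row_dim]):
        row = []
        for c in range(SHAPE[col_dim]):
            coord = dict(fixed, **{row_dim: r, col_dim: c})
            key = tuple(coord[name] for name in DIMS)
            row.append(cells.get(key))
        rows.append(row)
    return rows


# Day-by-sensor matrix for channel 1.
day_by_sensor = matrix_slice(cells, "day", "sensor", {"channel": 1})
```

The same helper yields a sensor-by-channel matrix by passing `"sensor"`, `"channel"`, and a fixed `day`, with no reshaping of the underlying cells.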
Store it all, use it all, with SciDB.
Array Data Model
Geo-spatial data, scientific data, financial feeds, sensor data, sequencing data, time-series data, and other highly faceted data do not fit neatly or efficiently into tables, the data model used in relational databases. SciDB’s native multi-dimensional array data model is designed from the ground up for ordered, highly dimensional, multifaceted data. And data is never overwritten, allowing you to record and access data corrections and updates over time.
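The no-overwrite idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not SciDB's implementation or API: the class `VersionedArray` and its `write` and `read` methods are hypothetical names, and real systems store versions far more compactly.

```python
class VersionedArray:
    """Append-only cell store: every write adds a new version
    instead of overwriting (hypothetical sketch, not SciDB's API)."""

    def __init__(self):
        self._versions = []  # list of {coord: value} deltas, oldest first

    def write(self, updates):
        """Record a batch of cell updates and return its version number."""
        self._versions.append(dict(updates))
        return len(self._versions)  # versions are numbered from 1

    def read(self, coord, version=None):
        """Return the value at `coord` as of `version` (default: latest)."""
        if version is None:
            version = len(self._versions)
        # Walk backwards from the requested version to the newest
        # write that touched this cell.
        for delta in reversed(self._versions[:version]):
            if coord in delta:
                return delta[coord]
        return None  # the cell was empty at that version


prices = VersionedArray()
v1 = prices.write({(0, 0): 101.5, (0, 1): 99.0})
v2 = prices.write({(0, 0): 101.7})  # a later correction to one cell

latest = prices.read((0, 0))           # the corrected value
as_recorded = prices.read((0, 0), v1)  # the original value is still readable
```

Because the correction is stored as a new version, both the corrected value and the value as first recorded remain available for audit or replay.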
Dramatic Storage and Operational Benefits
SciDB’s array data model provides dramatic storage efficiencies as the number of dimensions and attributes grows. Math operations run directly on the native data format. Because data is partitioned in N dimensions rather than just one, semantically related data can be accessed efficiently, which speeds up clustering, array operations, and population selection.
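A rough Python sketch of N-dimensional partitioning follows. It assumes fixed-size rectangular chunks, and the function names `chunk_of` and `partition` and the sample 4 x 4 grid are hypothetical. It illustrates the general technique, not SciDB's actual chunking code.

```python
from collections import defaultdict


def chunk_of(coord, chunk_shape):
    """Map a cell coordinate to the coordinate of the chunk that holds it."""
    return tuple(c // size for c, size in zip(coord, chunk_shape))


def partition(cells, chunk_shape):
    """Group cells by chunk so neighbours in every dimension stay together."""
    chunks = defaultdict(dict)
    for coord, value in cells.items():
        chunks[chunk_of(coord, chunk_shape)][coord] = value
    return dict(chunks)


# Hypothetical 4 x 4 array (row, column), cut into 2 x 2 chunks.
grid = {(r, c): r * 4 + c for r in range(4) for c in range(4)}
chunks = partition(grid, chunk_shape=(2, 2))

# A selection of rows 0-1, columns 2-3 lives in exactly one chunk.
window = chunks[(0, 1)]
```

With one-dimensional partitioning by row, the same window would span two partitions, each carrying columns the query never asked for.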
Distributed MPP Architecture
Get cost-effective scaling of data management and analytics with SciDB’s shared-nothing, massively parallel processing (MPP) architecture. Scale out across tens to thousands of commodity-hardware nodes, in the cloud or on premises. No need for big and expensive high-performance computers or costly database appliances. Hit the memory limit on a scale-up architecture and you’ll need a new system. With SciDB, just add more nodes.
SciDB moves analytics to the data, eliminating time-intensive ETL processes. Because SciDB stores data in a native array format, linear algebra operations like covariance and SVD are accelerated by 10x to 100x. SciDB is built to perform parallel linear algebra on commodity hardware clusters, so analytical workflows scale to hundreds of billions of data elements. SciDB eliminates the tedium of manual data distribution and lets you leverage existing R and Python code. Shorten your “ask-to-answer” loops with SciDB’s scalable, cost-effective in-database analytics.
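One way to picture "moving analytics to the data" is a covariance computed from small per-node summaries rather than from exported raw data. The Python sketch below is a simplified illustration under stated assumptions: the function names, the three-way split, and the sample values are hypothetical, and it uses the textbook sum-of-products formula, which is not necessarily what SciDB does internally and can lose precision on poorly scaled data.

```python
from functools import reduce


def partial_stats(xs, ys):
    """Per-chunk summary: count, sum of x, sum of y, sum of x*y."""
    return (len(xs), sum(xs), sum(ys), sum(x * y for x, y in zip(xs, ys)))


def combine(a, b):
    """Merge two summaries; the merge is associative, so any order works."""
    return tuple(p + q for p, q in zip(a, b))


def covariance(stats):
    """Sample covariance from a merged summary."""
    n, sx, sy, sxy = stats
    return (sxy - sx * sy / n) / (n - 1)


# Hypothetical data, already split across three "nodes".
node_chunks = [
    ([1.0, 2.0], [2.0, 4.0]),
    ([3.0, 4.0], [6.0, 8.0]),
    ([5.0], [10.0]),
]

# Each summary could be computed where its chunk is stored;
# only four numbers per node need to travel over the network.
partials = [partial_stats(xs, ys) for xs, ys in node_chunks]
total = reduce(combine, partials)

cov_xy = covariance(total)
```

The raw values never leave their chunk; only the summaries are combined, which is the property that lets this style of computation scale out as nodes are added.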
Open-source software reduces your costs. SciDB Community Edition is provided under an open-source license. Developers can implement custom operators, aggregates, and other extensions to the SciDB codebase. SciDB runs on existing commodity hardware or in the cloud, delivering cost-effective analytics without the need for expensive appliances or high-performance computers.