Don’t Leave Data on the Table
Array DBMS Supports Complex Analytics at Scale
Parallel processing without parallel programming frees data scientists to analyze more and program less.
How much does your data cost? Are you using it all? The edge goes to those who find patterns and exploit signals only discernible in big datasets. So why are you using only some of your data? Odds are your analytic software and legacy processes constrain you—either forcing you to drop some data or undertake computer science projects to code parallel processing.
SciDB gives you the flexibility to store and use it all.
- SciDB is an array database management system. Arrays are the natural way to organize, store, and retrieve ordered or multifaceted data.
- Arrays have an unfair speed advantage for multidimensional selections and joins.
- Select any two dimensions of your array and you have a matrix—represented in exactly the format you need to run complex analytics that drive predictive models.
- SciDB uses a distributed massively parallel processing architecture that lets you store and access as much as you need by scaling out on commodity hardware.
- SciDB allows you to do complex math directly in the database. Embedded complex math means you won’t be waiting to export your data to a math software package, and you’ll never have to select a subset of data to fit in memory.
Store it all, use it all, with SciDB.
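To make the "select any two dimensions and you have a matrix" idea concrete, here is a minimal sketch in plain Python. It is not SciDB code: the dictionary-backed `cube`, the `select_matrix` helper, and the (instrument, day, hour) dimensions are all hypothetical, and the cell values are invented for illustration.

```python
from itertools import product

# Hypothetical 3-D array indexed by (instrument, day, hour); values are made up.
instruments, days, hours = 2, 3, 4
shape = (instruments, days, hours)

# A dict keyed by coordinates stands in for an array store.
cube = {
    (i, d, h): 100 * i + 10 * d + h
    for i, d, h in product(range(instruments), range(days), range(hours))
}

def select_matrix(cube, fixed_axis, fixed_value, shape):
    """Fix one dimension and return the other two as a row-major matrix."""
    row_axis, col_axis = [a for a in range(3) if a != fixed_axis]
    matrix = []
    for r in range(shape[row_axis]):
        row = []
        for c in range(shape[col_axis]):
            coords = [None, None, None]
            coords[fixed_axis] = fixed_value
            coords[row_axis] = r
            coords[col_axis] = c
            row.append(cube[tuple(coords)])
        matrix.append(row)
    return matrix

# Fix instrument 1: rows are days, columns are hours.
day_by_hour = select_matrix(cube, fixed_axis=0, fixed_value=1, shape=shape)

# Fix day 2: rows are instruments, columns are hours.
instrument_by_hour = select_matrix(cube, fixed_axis=1, fixed_value=2, shape=shape)
```

Because the data is already addressed by coordinates, no pivot or reshaping step is needed before the matrix can be handed to a linear algebra routine.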
Array Data Model
Geospatial data, scientific data, financial feeds, sensor data, sequencing data, time-series data, and other highly faceted data do not fit neatly or efficiently into tables, the data model used in relational databases. SciDB’s native multidimensional array data model is designed from the ground up for ordered, high-dimensional, multifaceted data. Data is never overwritten, so you can record and access corrections and updates over time.
Dramatic storage and operational benefits stem from the array data model. SciDB is designed to handle both dense and sparse arrays efficiently, which yields dramatic storage efficiencies as the number of dimensions and attributes grows. Math operations run directly on the native data format. Partitioning data along each coordinate of an array facilitates fast joins and access along any dimension, which speeds up clustering, array operations, and population selection.
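The sketch below illustrates, under stated assumptions, why sparse storage and coordinate partitioning help. It is a toy model in plain Python, not SciDB's storage engine: the `SparseChunkedArray` class, the `select_window` method, and the `CHUNK` size of 1000 are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict

CHUNK = 1000  # hypothetical chunk edge length along every dimension

def chunk_of(coords):
    """Map a cell's coordinates to the ID of the chunk that holds it."""
    return tuple(c // CHUNK for c in coords)

class SparseChunkedArray:
    """Toy sparse array: only non-empty cells are stored, grouped by chunk."""

    def __init__(self):
        self.chunks = defaultdict(dict)

    def set(self, coords, value):
        self.chunks[chunk_of(coords)][coords] = value

    def select_window(self, low, high):
        """Return cells with low <= coords <= high (inclusive).

        Cells are scanned only in chunks whose ID overlaps the window.
        """
        lo_chunk, hi_chunk = chunk_of(low), chunk_of(high)
        hits = {}
        for chunk_id, cells in self.chunks.items():
            if all(l <= c <= h for l, c, h in zip(lo_chunk, chunk_id, hi_chunk)):
                for coords, value in cells.items():
                    if all(l <= x <= h for l, x, h in zip(low, coords, high)):
                        hits[coords] = value
        return hits

arr = SparseChunkedArray()
arr.set((12, 40_500), 1.5)
arr.set((999, 41_000), 2.5)
arr.set((250_000, 7), 3.5)

# Three cells are stored, even though the logical array spans billions of cells.
stored_cells = sum(len(cells) for cells in arr.chunks.values())
window = arr.select_window((0, 40_000), (999, 41_999))
```

In this model the storage cost tracks the number of populated cells rather than the logical extent of the array, and a window query skips every chunk that cannot contain a match.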
Need to support concurrent users, reads, and writes? That’s what database management systems were built to do. Try that with files and you are looking for trouble: forked files, corrupted data, inconsistent results. Databases solve these problems with ACID transactions, so you can curate once and analyze many times. ACID guarantees that transactions are all or nothing (Atomicity), all users see the same valid data (Consistency), transactions don’t interfere with each other (Isolation), and committed data is never lost (Durability). ACID guarantees make for repeatable results. SciDB combines full ACID guarantees with versioned, no-overwrite array storage. When using versioned arrays, write transactions in SciDB create new versions of the array rather than modifying pre-existing versions.
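The versioned, no-overwrite behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only, not SciDB's implementation: the `VersionedArray` class and its `write` and `read` methods are hypothetical, and copying the full array on every write is a simplification that a real system would avoid.

```python
class VersionedArray:
    """Toy no-overwrite store: every write appends a new version."""

    def __init__(self):
        self._versions = []  # list of dict snapshots; index 0 holds version 1

    def write(self, updates):
        """Apply all updates to a copy of the latest version.

        Earlier versions are left untouched. Returns the new version number.
        """
        snapshot = dict(self._versions[-1]) if self._versions else {}
        snapshot.update(updates)
        self._versions.append(snapshot)
        return len(self._versions)

    def read(self, version=None):
        """Read the latest version, or an earlier one by its number."""
        if not self._versions:
            return {}
        number = len(self._versions) if version is None else version
        if not 1 <= number <= len(self._versions):
            raise ValueError(f"no such version: {number}")
        return dict(self._versions[number - 1])

prices = VersionedArray()
v1 = prices.write({(0, 0): 10.0, (0, 1): 11.0})
v2 = prices.write({(0, 1): 11.5})  # a later correction to one cell

latest = prices.read()
original = prices.read(v1)
```

An analysis that pins itself to `v1` keeps returning the same result after the correction lands, which is the repeatability argument made above.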
Distributed MPP Architecture
Get cost-effective scaling of data management and analytics with SciDB’s shared-nothing, massively parallel processing (MPP) architecture. Scale out on tens to thousands of commodity hardware nodes in the cloud or on premises. No need for big and expensive high-performance computers or costly database appliances. Hit the memory limit on a scale-up architecture and you’ll need a new system. With SciDB, just add more nodes.
SciDB moves analytics to the data, eliminating time-intensive ETL processes. Arrays are the natural way to store data for linear algebra operations (like SVD and covariance), which saves time moving, organizing, and preparing data for matrix math. SciDB then performs massively parallel linear algebra on commodity hardware clusters, without parallel programming. The result: analytical workflows scale to hundreds of billions of data elements without turning the analysis into a computer science project. SciDB eliminates the tedium of manual data distribution and lets you leverage R and Python coding skills. Program faster, and shorten your “ask-to-answer” loops with SciDB’s scalable, cost-effective in-database analytics.
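As a rough illustration of moving the computation to the data, the sketch below computes a covariance from per-partition partial sums, so that only four numbers per partition need to be combined. It is plain Python and makes several assumptions: the `partial_stats` and `covariance` functions are hypothetical, the threads merely stand in for cluster nodes, the sample data is invented, and the one-pass formula is chosen for brevity rather than numerical robustness. It says nothing about how SciDB implements its own operators.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(rows):
    """Per-partition sufficient statistics for paired values (x, y)."""
    n = len(rows)
    sum_x = sum(x for x, _ in rows)
    sum_y = sum(y for _, y in rows)
    sum_xy = sum(x * y for x, y in rows)
    return n, sum_x, sum_y, sum_xy

def covariance(partitions):
    """Sample covariance of x and y, combined from per-partition sums."""
    # Each partition is summarized independently; threads stand in for nodes.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partial_stats, partitions))
    n = sum(p[0] for p in parts)
    sum_x = sum(p[1] for p in parts)
    sum_y = sum(p[2] for p in parts)
    sum_xy = sum(p[3] for p in parts)
    return (sum_xy - sum_x * sum_y / n) / (n - 1)

# Two hypothetical partitions of the same logical dataset, where y = 2x.
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]
cov_xy = covariance(partitions)
```

The caller writes no locking or data-distribution logic. That is the division of labor the paragraph above describes, with the database taking the place of the thread pool.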
Open-source software reduces your costs. SciDB Community Edition is provided under an open-source license. Developers can implement custom operators, aggregates, and other extensions to the SciDB codebase. SciDB runs on existing commodity hardware or in the cloud, delivering cost-effective analytics without the need for expensive appliances or high-performance computers.
Scale out by adding commodity hardware nodes to your cluster. Commodity hardware means affordability and flexibility. Proprietary hardware locks you into a vendor and comes at a high cost of ownership. SciDB lets you leverage the best price/performance the industry offers.
Programmable from R & Python
Leverage existing workforce skill sets rather than learning new development tools. Analysts and developers can harness the power of SciDB arrays and distributed processing from familiar R and Python interfaces.