Python Integration

Python programmers want their analytics to just work, on extremely large datasets as nimbly as on small ones. They want to concentrate on the analytics, not on parallelism, data formatting, and memory management. Enter SciDB-Py, which offers Python programmers the best of both worlds: it gives you SciDB’s capacity for massively scalable, ad hoc, complex analytics, and it lets you reach all of these features from within a programming environment that is familiar and powerful in its own right. With SciDB-Py you remain a Python programmer, while Python’s reach expands to include SciDB’s massive-scale data management and analytical capabilities.

With SciDB-Py, you can do all of the following from inside a Python program:

  • Use SciDB as a storage back-end
  • Use SciDB to offload large computations to a cluster
  • Use SciDB to filter and join data before performing analytics
  • Use SciDB to share data among multiple users, all with ACID guarantees
  • Use SciDB to perform multidimensional windowing and aggregation
  • Use SciDB’s massively scalable analytical capabilities, including statistical methods, correlation, and dense and sparse linear algebra operations (e.g., truncated SVD)
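The phrase "multidimensional windowing and aggregation" can be made concrete with a small example. The sketch below is plain, local Python and does not call SciDB-Py. The function name `window_mean` and the edge behavior (cells outside the grid are skipped, so border windows average fewer values) are illustrative assumptions, not SciDB's documented semantics. It computes, for every cell of a 2-D grid, the mean of the neighbourhood that extends `radius` cells in each direction along both dimensions.

```python
from typing import List


def window_mean(grid: List[List[float]], radius: int) -> List[List[float]]:
    """Hypothetical helper: mean over each cell's (2*radius+1) x (2*radius+1) window.

    Assumes radius >= 0 and a non-empty rectangular grid. Neighbours that fall
    outside the grid are skipped, so windows at the border average fewer cells.
    """
    rows, cols = len(grid), len(grid[0])
    result = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            # Walk the window along both dimensions around cell (i, j).
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    r, c = i + di, j + dj
                    if 0 <= r < rows and 0 <= c < cols:
                        total += grid[r][c]
                        count += 1
            result[i][j] = total / count  # the centre cell is always counted
    return result


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]
    smoothed = window_mean(data, radius=1)
    print(smoothed[1][1])  # centre window covers all nine values -> 5.0
    print(smoothed[0][0])  # corner window covers 1, 2, 4, 5 -> 3.0
```

On a nine-cell grid this loop is trivial. The point of offloading the same kind of computation to SciDB is that the database evaluates it over arrays far too large for a single Python process.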

The package is available for download from GitHub.