SciDB-R Integration

R programmers want their analytics to just work, on extremely large data sets as nimbly as on small ones. They want to concentrate on the analytics, not on parallelism, data formatting, and memory management.

Enter SciDB-R, which offers R programmers the best of both worlds: SciDB’s capacity for massively scalable, ad hoc, complex analytics, accessible from within a familiar programming environment that offers powerful graphing and visualization capabilities.

SciDB-R lets you remain an R programmer while extending R’s power with SciDB’s massive-scale data management and analytical capabilities. With SciDB-R, you can do all of the following from inside an R program:

  • Use SciDB as a storage back-end
  • Use SciDB to offload large computations to a cluster
  • Use SciDB to filter and join data before performing analytics
  • Use SciDB to share data among multiple users, all with ACID guarantees
  • Use SciDB to perform multidimensional windowing and aggregation
  • Use SciDB’s massively scalable analytical capabilities, including statistical methods, correlation, and dense and sparse linear algebra operations (e.g., truncated SVD)

You can download the package from GitHub. It is also available on the Comprehensive R Archive Network (CRAN).

Here is an example of R code that uses SciDB to perform a calculation on a very large array that already exists in SciDB.


library("scidb")
scidbconnect()

U = scidb("Z")        # U is an R representation of SciDB array Z
set.seed(1)
x = cbind(rnorm(5))   # An R column vector
y = U %*% x           # Computed by SciDB, returning a SciDB array object
y[, drop = FALSE]     # Return the computed result to R