R programmers want their analytics to just work, on extremely large data sets as nimbly as on small ones. They want to concentrate on the analytics, not on parallelism, data formatting and memory management.
Enter SciDB-R, which gives R programmers the best of both worlds: it lets you use SciDB’s capacity for massively scalable, ad hoc, complex analytics, and it lets you access those features from within a familiar programming environment with powerful graphing and visualization capabilities.
SciDB-R lets you remain an R programmer while extending R with SciDB’s massive-scale data management and analytical capabilities. With SciDB-R, you can do all of the following from inside an R program:
- Use SciDB as a storage back-end
- Use SciDB to offload large computations to a cluster
- Use SciDB to filter and join data before performing analytics
- Use SciDB to share data among multiple users, all with ACID guarantees
- Use SciDB to perform multidimensional windowing and aggregation
- Use SciDB’s massively scalable analytical capabilities, including statistical methods, correlation, and dense and sparse linear algebra operations (e.g., truncated SVD)
Here is an example of R code that uses SciDB to multiply a huge, previously existing SciDB array by a vector created in R:
library("scidb")
scidbconnect()
U = scidb("Z")        # U is an R representation of SciDB array Z
set.seed(1)
x = cbind(rnorm(5))   # An R column vector
y = U %*% x           # Computed by SciDB, returning a SciDB array object
y[, drop = FALSE]     # Return the computed result to R
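Because SciDB-R overloads standard R operators such as `%*%`, the expression `U %*% x` reads exactly like ordinary R. For comparison, here is a minimal base-R sketch of the same computation on a small in-memory matrix, which needs no SciDB connection. `Z_local` is a hypothetical stand-in for array Z, and the sketch assumes Z has five columns so that it conforms with the five-element vector `x`. Running both versions on a small test array is one way to sanity-check results before scaling up.

```r
# Base-R sketch: the same matrix-vector product, computed in memory.
# Z_local is a hypothetical 3 x 5 stand-in for SciDB array Z (assumed to
# have 5 columns so that it conforms with the 5-element vector x).
set.seed(1)
x = cbind(rnorm(5))                          # 5 x 1 column vector, as above
Z_local = matrix(1:15, nrow = 3, ncol = 5)   # filled column-wise with 1..15
y_local = Z_local %*% x                      # ordinary in-memory product
print(dim(y_local))                          # prints [1] 3 1
```

Each entry of `y_local` is the dot product of one row of `Z_local` with `x`; with SciDB-R the same expression is evaluated inside SciDB instead of in R's memory.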