There has been a substantial increase in the amount
of digital data collected over the last several decades. Together with
this data comes the desire to transform it, through analysis, into
useful information. Traditional analytic tools
fail when the data grows large, and general-purpose
database systems used to manage large collections of data cannot
perform the sophisticated analyses required. The analyst who
wants to work with large datasets faces a dilemma when it comes to tool
selection. Our work resolves this tension with a hybrid
strategy that integrates R and SciDB: Agrios. R is a
powerful data analysis tool, and SciDB is an array database management
system. Our integration focuses on the automated movement of data
between the two systems, in an effort to improve performance.
Contributions include semantic mappings between the two languages, a
cost-based interaction model, a start-to-finish system implementation,
and test results quantifying the performance of the hybrid approach.
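To make the idea of a cost-based choice concrete, here is a minimal sketch in Python. It is not the Agrios cost model: it assumes, purely for illustration, that the only cost is the number of bytes transferred between the two systems, and the names `Operand`, `movement_cost`, and `choose_site` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """An input array, where it currently lives, and its size (illustrative)."""
    name: str
    site: str        # "R" or "SciDB"
    size_bytes: int


def movement_cost(operands, site):
    """Bytes that must be transferred to run the operation at `site`.

    Assumption: an operand already stored at `site` costs nothing to use.
    """
    return sum(op.size_bytes for op in operands if op.site != site)


def choose_site(operands, sites=("R", "SciDB")):
    """Pick the site with the lowest transfer cost.

    Ties go to the first site listed, because min() keeps the first minimum.
    """
    return min(sites, key=lambda s: movement_cost(operands, s))


# A large array stored in SciDB combined with a small vector held in R.
operands = [
    Operand("A", "SciDB", 8_000_000_000),
    Operand("v", "R", 80_000),
]

best = choose_site(operands)
print(best, movement_cost(operands, best))
```

In this toy case the small vector is shipped to SciDB, so the large array does not have to move. A realistic model would also have to account for where the result is needed and for the cost of the computation itself.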
- Learn more about SciDB, and the research behind it.
- Learn more about R.
- Paradigm4 offers a commercial system built on SciDB, as well as an R-to-SciDB connector.
Patrick Leyshock, PhD candidate in Computer Science, Portland State University
David Maier, Maseeh Professor of Emerging Technologies, Portland State University
Kristin Tufte, Research Assistant Professor, Portland State University
My doctoral dissertation: "Optimizing Data Movement in Hybrid Analytic Systems".
"Minimizing Data Movement through Query Transformation", from the 2014 International Conference on Big Data.
"Data Movement in Hybrid Analytic Systems: A Case for Automation", from the 2014 International Conference on Scientific and Statistical Database Management.
"Agrios: A Hybrid Approach to Big Array Analytics", from the IEEE's 2013 International Conference on Big Data, and the accompanying slides.
Poster from Intel's "Big Data" ISTC meeting, January 2013.
My PhD dissertation proposal from July 2012, and the accompanying slides.
Abstract of lightning talk and poster presentation from XLDB 2012.
National Science Foundation Award #1110917, and Intel's Big Data Science and Technology Center