Current projects


Ensemble is a National Science Digital Library project that lets computing education communities share and discuss digital educational materials through a distributed portal to existing collections, while each collection maintains its own curation.



The focus of this project is hybrid stream-archive query processing. Data stream processing is interesting and has been well studied; however, to take full advantage of data streams, one must be able to combine and correlate information in those streams with archived historical data. This project aims to find efficient mechanisms for combining stream data with 'similar' data from the archive. Techniques being investigated include porthole queries and adaptive access.



We believe that data management systems of the future must stress data movement over data storage. NIAGARA is an initial effort, conducted with the University of Wisconsin, to move beyond disk-centric data management to net-centric systems. It emphasizes handling richer structures of data, such as XML, and incremental, stream-based processing of data.



This project looks at streams containing missing data and will explore various mechanisms to deal with it. As a motivating scenario, we collaborate with the PORTAL ADUS and look at traffic data streams from Inductive Loop Detectors installed on the Portland, Oregon Metropolitan Area Freeway system.



The Superimposed Pluggable Architecture for Contexts and Excerpts (SPARCE) is a middleware architecture for superimposed information management.

Exploiting the User Interface for Data Integration in Effectiveness Research


This project looks at the roles of the database developer and the data analyst, providing tools that allow database developers to losslessly transform databases in a way that then allows analysts to make informed integration decisions.

FDD: Full Disclosure of Data Preparation and Use


In this project, we explore how data users (e.g., analysts who work with various datasets as part of their job) clean, transform, and integrate their data, with a particular focus on whether they document their steps. The project will interview data users from several application domains to determine whether they document their steps as they work with their datasets and whether they believe the documentation is useful.



Agrios integrates R and SciDB, facilitating rapid analyses of large datasets by automatically managing data movement between the two systems.