Argonne National Laboratory Mathematics and Computer Science Division

Seminars & Events


"Processing Sliding Window Joins Over High Speed Data Streams"

DATE: December 19, 2011
TIME: 1:30 PM - 2:30 PM
SPEAKER: Abhirup Chakraborty, Postdoc Interviewee
LOCATION: Building 240, Seminar Room 4301, Argonne National Laboratory
HOST: Raj Kettimuthu

Description:
In a growing number of information-processing applications, such as network-traffic monitoring, sensor networks, financial analysis, and data mining for e-commerce, data takes the form of continuous streams rather than traditional stored database tuples. These applications share several features: they require real-time analysis, they involve huge volumes of data, and they are subject to unpredictable, bursty arrivals of data elements. In such applications, processing queries over data streams by first loading them into a traditional database management system (DBMS) or into main memory is infeasible. High-speed data streams, together with a large number of simultaneous continuous queries, place heavy demands on the available resources.

In this seminar, I outline the challenges of processing join queries over data streams and present a few algorithms that generate exact results for join queries by incorporating secondary storage and non-dedicated computers.
The proposed techniques exploit the high bandwidth of a disk subsystem by making the data access pattern largely sequential, eliminating small, random disk accesses. I present an I/O-efficient algorithm to process hybrid join queries, which join a fast, time-varying, or bursty data stream with a persistent disk relation. Such a hybrid join is the crux of a number of common transformations in an active data warehouse. The algorithm reduces the response time for output results by exploiting spatio-temporal locality within the input stream, and it minimizes disk overhead through disk-I/O amortization.
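To illustrate the general idea of disk-I/O amortization in a stream-relation join, here is a minimal Python sketch. It is not the speaker's algorithm: the class names, the hash partitioning of the disk relation, the `memory_limit` parameter, and the "flush the fullest partition" policy (used here as a crude stand-in for exploiting locality in the stream) are all hypothetical. Stream tuples are buffered per partition, and one sequential read of a partition then serves every buffered tuple that maps to it, so results stay exact while the number of disk reads drops.

```python
from collections import defaultdict


class PartitionedRelation:
    """Hypothetical stand-in for a disk relation hash-partitioned on the join key.
    Each read_partition call is counted as one sequential disk read."""

    def __init__(self, rows, num_partitions):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)
        for key, payload in rows:
            self.partitions[hash(key) % num_partitions].append((key, payload))
        self.reads = 0

    def read_partition(self, pid):
        self.reads += 1
        return self.partitions.get(pid, [])


class BatchedHybridJoin:
    """Buffers stream tuples per partition; when memory_limit tuples are buffered,
    it joins the largest pending batch with a single read of its partition."""

    def __init__(self, relation, memory_limit):
        self.relation = relation
        self.memory_limit = memory_limit  # assumed >= 1
        self.pending = defaultdict(list)  # partition id -> buffered stream tuples
        self.buffered = 0

    def insert(self, key, payload):
        self.pending[hash(key) % self.relation.num_partitions].append((key, payload))
        self.buffered += 1
        if self.buffered >= self.memory_limit:
            largest = max(self.pending, key=lambda p: len(self.pending[p]))
            return self._probe(largest)
        return []

    def _probe(self, pid):
        batch = self.pending.pop(pid)
        self.buffered -= len(batch)
        # Build an in-memory index over the partition read in one sequential pass.
        index = defaultdict(list)
        for key, r_payload in self.relation.read_partition(pid):
            index[key].append(r_payload)
        return [(key, s, r) for key, s in batch for r in index.get(key, [])]

    def flush_all(self):
        results = []
        for pid in list(self.pending):
            results.extend(self._probe(pid))
        return results


relation = PartitionedRelation([(k, f"r{k}") for k in range(8)], num_partitions=2)
join = BatchedHybridJoin(relation, memory_limit=4)
out = []
for k in [0, 2, 4, 1, 6]:
    out.extend(join.insert(k, f"s{k}"))
out.extend(join.flush_all())
print(out)
print(relation.reads)  # 3 partition reads for 5 stream tuples
```

A tuple-at-a-time lookup would issue one disk access per stream tuple; here three tuples that share a partition are served by one read. The trade-off is latency: buffered tuples wait until their batch is flushed, which is why the choice of which batch to flush matters.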

Lastly, I present a mechanism to distribute the load of a stream join operator across a shared-nothing system. The algorithm uses a fixed, predefined communication pattern and dynamically maintains the degree of declustering to minimize communication and processing overheads. Experimental results demonstrate the efficacy of the proposed algorithms.
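As a rough illustration of how a sliding-window join can be spread over a shared-nothing system, the sketch below hash-partitions tuples on the join key so that matching tuples from both streams meet on the same node, where a symmetric sliding-window hash join runs locally. This is a generic technique, not the speaker's mechanism: the class names, the time-based window, the fixed `degree` (number of nodes used), and the assumption that tuples arrive in non-decreasing timestamp order are all hypothetical. The talk's algorithm adjusts the degree of declustering dynamically, which would also require migrating window state; this sketch does not model that.

```python
from collections import defaultdict, deque


class WindowJoinNode:
    """One node's share of a symmetric sliding-window hash join over streams A and B.
    Tuples older than (now - window) are evicted. Assumes non-decreasing timestamps."""

    def __init__(self, window):
        self.window = window
        self.fifo = {"A": deque(), "B": deque()}  # arrival order, used for eviction
        self.index = {"A": defaultdict(list), "B": defaultdict(list)}  # key -> [(ts, payload)]

    def _evict(self, side, now):
        fifo, index = self.fifo[side], self.index[side]
        while fifo and fifo[0][0] < now - self.window:
            ts, key, payload = fifo.popleft()
            index[key].remove((ts, payload))  # linear scan; acceptable for a sketch
            if not index[key]:
                del index[key]

    def process(self, side, ts, key, payload):
        other = "B" if side == "A" else "A"
        self._evict(other, ts)
        self._evict(side, ts)
        # Probe the other stream's window; results are ordered as (key, A payload, B payload).
        matches = [(key, payload, p) if side == "A" else (key, p, payload)
                   for _, p in self.index[other].get(key, [])]
        self.fifo[side].append((ts, key, payload))
        self.index[side][key].append((ts, payload))
        return matches


class HashPartitionedJoin:
    """Routes each tuple to node hash(key) % degree; the degree is fixed in this sketch."""

    def __init__(self, degree, window):
        self.nodes = [WindowJoinNode(window) for _ in range(degree)]

    def process(self, side, ts, key, payload):
        node = self.nodes[hash(key) % len(self.nodes)]
        return node.process(side, ts, key, payload)


join = HashPartitionedJoin(degree=3, window=10)
events = [("A", 0, 1, "a1"), ("B", 5, 1, "b1"), ("B", 12, 2, "b2"),
          ("A", 20, 2, "a3"), ("A", 20, 1, "a2")]
out = []
for event in events:
    out.extend(join.process(*event))
print(out)
```

In this run, `a2` finds no partner because `b1` (timestamp 5) has already slid out of the 10-unit window by time 20, while `b2` (timestamp 12) is still inside it and joins with `a3`.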



