Strumenti Utente

Strumenti Sito


dm:abstract.orlandi

Succinct data structures are theoretical and practical data structures based on compressed data. Their main goal is to obtain data compression close to the best and add little extra data to support operations on the underlying data while keeping them in compressed format. They lie at the base of more complex data structures that find application in massive data sets analysis, text and web indexing, and more.

The field has been active since the 80s and yet there is room for improvement even in the most basic data structures, which handle binary or arbitrary alphabet strings. In particular, I will present a dissertation of different results.

After dividing succinct data structures in non-systematic (those who alter the original representation of the data and exploit such representation for smaller size footrprint) and systematic (those who keep the original data in raw format), we will discuss theoretical results about non-systematic succinct data structures over binary strings, where we improve the space footprint at hand. On the side of systematic data structures we will present both lower and upper bounds regarding strings, providing novel techniques in both directions.

Finally, I will describe two applications (approximated substring counting and simulated web crawling) that prove how a careful implementation of succinct data structures can lead to competitive results even on large scale applications.

dm/abstract.orlandi.txt · Ultima modifica: 11/01/2011 alle 23:27 (9 anni fa) da Fosca Giannotti