Extracting and summarizing information from large data repositories

Albanese, Massimiliano (2006) Extracting and summarizing information from large data repositories. [Doctoral thesis] (Unpublished)

Full text available as: PDF (1260 KB)

Abstract

Information retrieval from large data repositories has become an important area of computer science. Research in this field is strongly encouraged by the ever-increasing rate at which today's society produces digital data. Unfortunately, most of this data (e.g. video recordings, plain text documents) is unstructured. Two major issues thus arise in this scenario: i) extracting structured data (information) from unstructured data; ii) summarizing information, i.e. reducing large volumes of information to a short summary or abstract comprising only the most essential facts. In this thesis, techniques for extracting and summarizing information from large data repositories are presented. In particular, attention is focused on two kinds of repositories: video data collections and natural language text document repositories. We show how the same principles can be applied to summarizing information in both domains and present solutions tailored to each domain. The thesis presents a novel video summarization algorithm, the Priority Curve Algorithm, that outperforms previous solutions, and three heuristic algorithms, OptStory+, GenStory and DynStory, for creating succinct stories about entities of interest using the information collected by algorithms that extract structured data from heterogeneous data sources. In particular, a Text Attribute Extraction (TAE) algorithm for extracting information from natural language text is presented. Experimental results show that our approach to summarization is promising.

Document type: Doctoral thesis
Keywords: Information extraction, Information summarization, Automatic story creation
MIUR scientific-disciplinary sectors: Area 09 Ingegneria industriale e dell'informazione > ING-INF/04 AUTOMATICA
Doctoral program coordinator: Cordella, Luigi Pietro
Doctoral program tutor: Picariello, Antonio
Full text status: Accessible
Date: 2006
Number of pages: 132
Institution: Università degli Studi di Napoli Federico II
Department or unit: Informatica e Sistemistica
Thesis type: Doctorate
Eprint status: Unpublished
Doctoral program name: Ingegneria informatica ed automatica
Doctoral cycle: XVIII
System number: 577
Deposited on: 28 July 2008
Last modified: 04 February 2009 09:38
