Albanese, Massimiliano (2006) Extracting and summarizing information from large data repositories. [Doctoral thesis] (Unpublished)

Document type: Doctoral thesis
Language: English
Title: Extracting and summarizing information from large data repositories
Authors: Albanese, Massimiliano (email: not specified)
Date: 2006
Date type: Publication
Number of pages: 132
Institution: Università degli Studi di Napoli Federico II
Department: Informatica e sistemistica
Doctoral program: Ingegneria informatica ed automatica
Doctoral cycle: 18
Doctoral program coordinator: Cordella, Luigi Pietro (email: not specified)
Tutor: Picariello, Antonio (email: not specified)
Keywords: Information extraction, Information summarization, Automatic story creation
MIUR scientific-disciplinary sectors: Area 09 - Industrial and information engineering > ING-INF/04 - Automatica
Deposited on: 28 Jul 2008
Last modified: 30 Apr 2014 19:23
URI: http://www.fedoa.unina.it/id/eprint/577
DOI: 10.6092/UNINA/FEDOA/577

Abstract

Information retrieval from large data repositories has become an important area of computer science. Research in this field is strongly encouraged by the ever-increasing rate at which today's society produces digital data. Unfortunately, most of this data (e.g. video recordings, plain text documents) is unstructured. Two major issues thus arise in this scenario: i) extracting structured data (information) from unstructured data; ii) summarizing information, i.e. reducing large volumes of information to a short summary or abstract comprising only the most essential facts. In this thesis, techniques for extracting and summarizing information from large data repositories are presented. In particular, attention is focused on two kinds of repositories: video data collections and natural language text document repositories. We show how the same principles can be applied to summarizing information in both domains and present solutions tailored to each domain. The thesis presents a novel video summarization algorithm, the Priority Curve Algorithm, that outperforms previous solutions, and three heuristic algorithms, OptStory+, GenStory and DynStory, for creating succinct stories about entities of interest using the information collected by algorithms that extract structured data from heterogeneous data sources. In particular, a Text Attribute Extraction (TAE) algorithm for extracting information from natural language text is presented. Experimental results show that our approach to summarization is promising.
