Albanese, Massimiliano
(2006)
Extracting and summarizing information from large data repositories.
[Doctoral thesis]
(Unpublished)
Item Type: |
Doctoral thesis
|
Resource language: |
English |
Title: |
Extracting and summarizing information from large data repositories |
Creators: |
Creators | Email |
---|
Albanese, Massimiliano | UNSPECIFIED |
|
Date: |
2006 |
Date type: |
Publication |
Number of Pages: |
132 |
Institution: |
Università degli Studi di Napoli Federico II |
Department: |
Informatica e sistemistica |
Doctoral programme: |
Ingegneria informatica ed automatica |
Doctoral cycle: |
18 |
Doctoral programme coordinator: |
Name | Email |
---|
Cordella, Luigi Pietro | UNSPECIFIED |
|
Tutor: |
Name | Email |
---|
Picariello, Antonio | UNSPECIFIED |
|
Keywords: |
Information extraction, Information summarization, Automatic story creation |
MIUR scientific-disciplinary sectors: |
Area 09 - Ingegneria industriale e dell'informazione > ING-INF/04 - Automatica |
Date Deposited: |
28 Jul 2008 |
Last Modified: |
30 Apr 2014 19:23 |
URI: |
http://www.fedoa.unina.it/id/eprint/577 |
DOI: |
10.6092/UNINA/FEDOA/577 |
Collection description
Information retrieval from large data repositories has become an important area of computer science. Research in this field is driven by the ever-increasing rate at which today's society produces digital data. Unfortunately, most of this data (e.g. video recordings, plain text documents) is unstructured. Two major issues thus arise in this scenario: i) extracting structured data (information) from unstructured data; ii) summarizing information, i.e. reducing large volumes of information to a short summary or abstract comprising only the most essential facts.
In this thesis, techniques for extracting and summarizing information from large data repositories are presented. In particular, attention is focused on two kinds of repositories: video data collections and natural language text document repositories. We show how the same principles can be applied to summarizing information in both domains and present solutions tailored to each domain. The thesis presents a novel video summarization algorithm, the Priority Curve Algorithm, that outperforms previous solutions, and three heuristic algorithms, OptStory+, GenStory and DynStory, for creating succinct stories about entities of interest using the information collected by algorithms that extract structured data from heterogeneous data sources. In particular, a Text Attribute Extraction (TAE) algorithm for extracting information from natural language text is presented. Experimental results show that our approach to summarization is promising.