Albanese, Massimiliano (2006) Extracting and summarizing information from large data repositories. [Doctoral thesis] (Unpublished)

Item Type: Doctoral thesis
Language: English
Title: Extracting and summarizing information from large data repositories
Creators: Albanese, Massimiliano (email: unspecified)
Date: 2006
Date Type: Publication
Number of Pages: 132
Institution: Università degli Studi di Napoli Federico II
Department: Informatica e sistemistica
PhD name: Ingegneria informatica ed automatica
PhD cycle: 18
PhD Coordinator: Cordella, Luigi Pietro (email: unspecified)
Tutor: Picariello, Antonio (email: unspecified)
Uncontrolled Keywords: Information extraction, Information summarization, Automatic story creation
MIUR S.S.D.: Area 09 - Ingegneria industriale e dell'informazione > ING-INF/04 - Automatica
Date Deposited: 28 Jul 2008
Last Modified: 30 Apr 2014 19:23
URI: http://www.fedoa.unina.it/id/eprint/577

Abstract

Information retrieval from large data repositories has become an important area of computer science. Research in this field is driven by the ever-increasing rate at which today's society produces digital data. Unfortunately, most of this data (e.g. video recordings, plain text documents) is unstructured. Two major issues thus arise: i) extracting structured data, i.e. information, from unstructured data; ii) summarizing information, i.e. reducing large volumes of information to a short summary or abstract comprising only the most essential facts. This thesis presents techniques for extracting and summarizing information from large data repositories. In particular, it focuses on two kinds of repositories: video data collections and natural language text document repositories. We show how the same principles can be applied to summarizing information in both domains and present solutions tailored to each domain. The thesis presents a novel video summarization algorithm, the Priority Curve Algorithm, that outperforms previous solutions, and three heuristic algorithms, OptStory+, GenStory and DynStory, for creating succinct stories about entities of interest using the information collected by algorithms that extract structured data from heterogeneous data sources. In particular, a Text Attribute Extraction (TAE) algorithm for extracting information from natural language text is presented. Experimental results show that our approach to summarization is promising.
