The Density Valued Data Analysis in a Temporal Framework: The Data Model Approach.
[Doctoral thesis]
High frequency data are characterized by a very large number of observations within the period of reference, often a single day. Typically, such data are summarized by their average, or by the variation of the observed values expressed through the upper and lower values (or suitable quantiles). The resulting interval or range conveys useful information about the variability of the data. More recently, histograms and boxplots have been employed to obtain a more informative representation of high frequency data. Anomalies and random or systematic errors can affect these representations and, consequently, their interpretation and use. To address these problems, and assuming the classical decomposition of data as the sum of a model plus an error, we propose to represent intra-period high frequency data by density models such as beanplots, based on a suitable mixture of distributions. The location, size and shape of such models are summarized by the estimated model coefficients and visualized by means of classical beanplot silhouettes. On this model-based approach we build a beanplot time series, that is, a vector time series whose elements are the estimated coefficients of each beanplot. In this way the problem of storing high frequency data is solved through a few coefficients: in fact, only one beanplot and the generating matrix are required. The main advantage of this representation and of the corresponding visualization, however, lies in their capacity to highlight anomalies or anticipate structural pattern changes in a beanplot time series, as well as to provide useful tools for short-period forecasting. In this respect, multivariate control chart techniques can be used to signal anomalous observations or to give early warnings of structural changes.
At the same time, these models are useful for studying the evolution in the medium and long run, either through classical approaches developed for multivariate time series or through approaches based on time series factor analysis applied to multivariate sequences of coefficient vectors. These models of single or multiple beanplot time series over the chosen period are also useful in forecasting problems. In the case of multiple beanplot time series, based on different sets of high frequency data observed simultaneously or on the same set observed on different occasions, cluster analysis methods can be used to search for suitable prototypes when building composite indicators, or to discover homogeneous (and contiguous) time segments corresponding to pattern changes. The tools considered throughout this thesis are useful in various financial applications such as trading, stock picking, statistical arbitrage and risk management.
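The pipeline described above (one density per period, a short coefficient vector per density, monitoring of the resulting vector series) can be illustrated with a minimal sketch. It is not the thesis's method or its R library. It assumes simulated data, a plain Gaussian kernel density estimate in place of the mixture model, three moment-based descriptors (mean, standard deviation, skewness) in place of the estimated mixture coefficients, and a crude per-coordinate z-score rule in place of a multivariate control chart. All function names, the bandwidth and the threshold are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def kde_silhouette(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` on `grid`.

    This is the curve a beanplot draws (mirrored) for one period.
    """
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def period_coefficients(values):
    """Summarize one period by location, size and shape descriptors.

    Assumption: moments stand in for the mixture coefficients of the thesis.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / len(values)
    return (mean, sd, skew)


def flag_periods(coeffs, threshold=3.5):
    """Return indices of periods where any coefficient lies more than
    `threshold` sample standard deviations from its mean over the series.

    Assumption: a simple stand-in for a multivariate control chart.
    """
    columns = list(zip(*coeffs))
    centers = [statistics.fmean(c) for c in columns]
    spreads = [statistics.stdev(c) for c in columns]
    return [
        t
        for t, row in enumerate(coeffs)
        if any(abs(x - m) / s > threshold
               for x, m, s in zip(row, centers, spreads))
    ]


# Simulated high frequency data: 20 periods of 200 observations each.
rng = random.Random(0)
periods = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(20)]
periods[12] = [v + 5.0 for v in periods[12]]  # inject a level shift

# Density silhouette of the first period on a fixed grid (step 0.1).
grid = [-6.0 + 0.1 * i for i in range(121)]
silhouette = kde_silhouette(periods[0], grid, bandwidth=0.3)

# Vector time series of coefficients, one row per period, then monitoring.
coeffs = [period_coefficients(p) for p in periods]
flagged = flag_periods(coeffs)
print(flagged)
```

Storing `coeffs` instead of `periods` reduces each period from 200 values to 3, which is the storage argument made above. The shifted period is picked up through its location coefficient.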
The Thesis is structured as follows:
Chapter 1 The Analysis of Massive Data Sets
Chapter 2 Complex Data in a Temporal Framework
Chapter 3 Foundations of Interval Data Representations
Chapter 4 Foundations of Boxplot and Histogram Data Representations
Chapter 5 Foundations of Density Valued Data: Representations
Chapter 6 Visualization and Exploratory Analysis of Beanplot Data
Chapter 7 Beanplot Modelling
Chapter 8 Beanplot Time Series Forecasting
Chapter 9 Beanplot Time Series Clustering
Chapter 10 Beanplot Model Evaluation
Chapter 11 Case Studies: Market Monitoring, Asset Allocation, Statistical Arbitrage and Risk Management
The Thesis is accompanied by a library of programs in R built on the presented methods.