Robust methods for Partial Least Squares Regression: methodological contributions and applications in environmental field

Camminatiello, Ida (2006) Robust methods for Partial Least Squares Regression: methodological contributions and applications in environmental field. [PhD thesis] (Unpublished)

Full text available as: PDF (1208 KB)

Abstract

Several epidemiological studies have demonstrated short-term associations between high levels of pollution and increased acute mortality and morbidity. Vehicle emissions are an important source of environmental pollution, so in order to reduce it, it is necessary to estimate the emissions produced by different classes of vehicles in different situations (traffic, road type, etc.). The analysis is based on research carried out by the Italian National Research Council (CNR) on the relationship between the pollutants produced by motor vehicles and kinematic parameters under different traffic and road conditions (driving cycles). The model, based on the vehicle dynamics equation, involves strongly correlated variables, missing data and few observations, so the most appropriate statistical methodology for analysing data with these characteristics is Partial Least Squares (PLS) regression. The results of the CNR analysis showed that the different driving cycles (traffic, road, etc.) can produce outliers, because of the different kinematic variables they generate. The aim of this thesis is to analyse the proposed model while taking these outliers into account, by applying a robust approach to PLS regression. We proceed as follows. In the first chapter we show that multicollinearity among the independent variables makes Ordinary Least Squares (OLS) regression unreliable or even inapplicable, so other techniques have to be used, such as Ridge Regression, Principal Component Regression, Latent Root Regression and Partial Least Squares (PLS) regression. It has been argued that in many cases PLS is the best of these solutions; however, its results are affected by outliers. In the second chapter we describe the most important robust methods for estimating the regression parameters and the variance/covariance matrix.
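The first-chapter argument can be illustrated numerically. The following is a minimal Python/NumPy sketch on synthetic data (not the CNR data); the helper names `ols` and `pls1_one_component` are hypothetical. Two almost identical predictors make the centred design matrix nearly singular: least squares still returns coefficients, but they are driven by noise, while a single PLS component, with weights proportional to X'y as in the first NIPALS step, shares the effect between the two collinear columns.

```python
import numpy as np

def ols(Xc, yc):
    """Least squares on centred data; lstsq still answers when X'X is nearly singular."""
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return beta

def pls1_one_component(Xc, yc):
    """Coefficients of a one-component PLS1 fit on centred data (hypothetical helper)."""
    w = Xc.T @ yc                  # weight vector proportional to X'y
    w /= np.linalg.norm(w)
    t = Xc @ w                     # score vector
    c = (t @ yc) / (t @ t)         # least squares regression of y on the score
    return w * c                   # first loading satisfies p'w = 1, so beta = w * c

rng = np.random.default_rng(0)
n = 20
z = rng.normal(size=n)
X = np.column_stack([z, z + 1e-6 * rng.normal(size=n)])   # two nearly identical predictors
y = 2 * z + 0.1 * rng.normal(size=n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

print("condition number of Xc:", np.linalg.cond(Xc))      # very large
print("OLS coefficients:      ", ols(Xc, yc))               # individually driven by noise
print("PLS (1 component):     ", pls1_one_component(Xc, yc))  # close to [1, 1]
```

With one component the PLS coefficient vector is simply the weight vector w scaled by the regression coefficient of y on the score t, because the first loading always satisfies p'w = 1.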
Unfortunately, several affine equivariant robust estimators with a high breakdown point cannot be applied when the number of units is smaller than the number of variables. We therefore propose an approach that combines leave-one-out methods and Singular Value Decomposition (SVD), which we call SSVD. In the third chapter we show that both PLS regression algorithms, NIPALS and SIMPLS, are affected by outliers. The sensitivity of SIMPLS to outliers is due to its use of the cross-covariance matrix between the independent and dependent variables as well as its use of least squares regressions; the sensitivity of NIPALS is due to its use of least squares regressions. There are two ways to address the problem of outliers. The first is to use regression diagnostics to detect them, although the multivariate nature of the data can make outliers very difficult to detect. The second is to use a robust procedure for PLS regression. Several such procedures have been proposed, but evidence of their use in the statistical literature is still scarce. A first class of robust alternatives applies robust regression within the NIPALS algorithm; a second class uses a robust cross-covariance matrix together with a robust regression method. We describe these robust alternatives with their advantages and disadvantages, propose a robust approach, and conclude with a simulation study. In the fourth chapter we apply some robust methods for PLS regression, together with our approach, to the CNR environmental data in order to compare the results and show that our approach is a valid alternative in the presence of multicollinearity and outliers.
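The first class of robust alternatives mentioned above (robust regression inside NIPALS) can be sketched as follows. This is a minimal Python/NumPy sketch under stated assumptions, not the SSVD-based approach proposed in the thesis nor any specific published algorithm: it handles a single response (PLS1), uses a Huber M-estimator fitted by iteratively reweighted least squares as the robust regression, and centres with medians. The names `ls_slope`, `huber_slope` and `nipals_pls1`, the tuning constant k = 1.345 and the synthetic data are illustrative choices. Every regression step of NIPALS goes through one pluggable `slope` function, so the least squares and robust versions differ only in that argument.

```python
import numpy as np

def ls_slope(x, y):
    """Least squares slope of y on x through the origin."""
    return (x @ y) / (x @ x)

def huber_slope(x, y, k=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of the slope of y on x through the origin, by IRLS.
    The residual scale is re-estimated with the normalised MAD at each step."""
    b = ls_slope(x, y)                               # least squares starting value
    for _ in range(n_iter):
        r = y - b * x
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s == 0:                                   # most points fitted exactly
            break
        u = np.abs(r) / s
        wts = np.minimum(1.0, k / np.maximum(u, 1e-12))   # Huber weights
        b_new = ((wts * x) @ y) / ((wts * x) @ x)
        converged = abs(b_new - b) < tol * (1 + abs(b))
        b = b_new
        if converged:
            break
    return b

def nipals_pls1(X, y, n_components, slope=ls_slope, center=np.mean):
    """PLS1 by NIPALS; every least squares step is delegated to `slope`."""
    x0, y0 = center(X, axis=0), center(y)
    E, f = X - x0, y - y0
    n_vars = X.shape[1]
    W, P, q = [], [], []
    for _ in range(n_components):
        w = np.array([slope(f, E[:, j]) for j in range(n_vars)])  # X columns on y
        w /= np.linalg.norm(w)
        t = E @ w                                                  # scores
        p = np.array([slope(t, E[:, j]) for j in range(n_vars)])  # X columns on t
        c = slope(t, f)                                            # y on t
        E = E - np.outer(t, p)                                     # deflate X
        f = f - c * t                                              # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)       # coefficients in the original X space
    return beta, y0 - x0 @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)
y_bad = y.copy()
y_bad[:3] += 25.0                                # three gross outliers in the response

beta_ls, _ = nipals_pls1(X, y_bad, n_components=2)
beta_rob, _ = nipals_pls1(X, y_bad, n_components=2, slope=huber_slope, center=np.median)
print("least squares NIPALS:", beta_ls)
print("Huber-type NIPALS:   ", beta_rob)
```

With `slope=ls_slope`, `center=np.mean` and as many components as predictors, the sketch reproduces ordinary least squares, which is a convenient correctness check. Huber weights shrink the influence of large residuals but do not bound the influence of high-leverage points, so this is only one possible choice of robust regression.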

Document type: PhD thesis
Keywords: Partial Least Squares, Robust statistics, environmental pollution
MIUR scientific-disciplinary sectors: Area 13 Economic and statistical sciences > SECS-S/01 Statistics
PhD programme coordinator: Lauro, Natale Carlo
PhD programme tutor: D’Ambra, Luigi
Full text status: Accessible
Date: 2006
Number of pages: 130
Institution: Università degli Studi di Napoli Federico II
Department or unit: Mathematics and Statistics
Thesis type: Doctoral
Eprint status: Unpublished
PhD programme: Statistics
PhD cycle: XVIII
System number: 593
Deposited on: 30 July 2008
Last modified: 04 February 2009 09:38
