Robust methods for Partial Least Squares Regression: methodological contributions and applications in environmental field
Camminatiello, Ida (2006) Robust methods for Partial Least Squares Regression: methodological contributions and applications in environmental field. [Doctoral thesis] (Unpublished)
Several epidemiological studies have demonstrated short-term associations between high levels of pollution and increased acute mortality and morbidity. Vehicle emissions are an important source of environmental pollution, so it is necessary to estimate the emissions produced by different classes of vehicles in different situations (traffic, road, etc.) in order to reduce environmental pollution. The analysis is based on research carried out by the Italian National Research Council (CNR) on the relationship between the pollutants produced by motor vehicles and kinematic parameters, considering different traffic and road situations (driving cycles). The model, based on the vehicle dynamics equation, involves strongly correlated variables, missing data and few observations, so the most appropriate statistical methodology for data with these characteristics is Partial Least Squares (PLS) regression. The results of the CNR analysis showed that the different driving cycles (traffic, road, etc.) can produce outliers, because of the different kinematic variables they generate. The aim of this thesis is to analyse the proposed model taking the outliers into account by applying a robust approach to PLS regression. We proceed as follows. In the first chapter we show that multicollinearity among the independent variables in regression analysis renders Ordinary Least Squares (OLS) inapplicable, so other techniques have to be used, such as Ridge Regression, Principal Component Regression, Latent Root Regression Analysis and Partial Least Squares (PLS) regression. It has been stated that in many cases PLS is the better solution. However, its results are affected by outliers. In the second chapter we describe the most important robust methods for estimating the regression parameters and the variance/covariance matrix.
Unfortunately, several affine equivariant estimators with a high breakdown point cannot be applied when the number of units is smaller than the number of variables. We therefore propose an approach which combines “leave-one-out” methods and Singular Value Decomposition (SVD). We call this method SSVD. In the third chapter we show that both algorithms for PLS regression, NIPALS and SIMPLS, are affected by outliers. The SIMPLS algorithm’s sensitivity to outliers is due to its use of the cross-covariance matrix between the independent and dependent variables, as well as its use of least squares regressions. The NIPALS algorithm’s sensitivity to outliers is due to its use of least squares regressions. There are two ways to address the problem of outliers. The first is to use regression diagnostics to detect them; because of the multivariate nature of the data, however, detection can be very difficult. The second is to use a robust procedure for PLS regression. Several procedures have been proposed, but evidence of their use in the statistical literature is still scarce. A first class of robust alternatives for PLS regression involves the application of robust regression within the NIPALS algorithm. A second class includes methods which use a robust cross-covariance matrix and a robust regression method. We describe the different robust alternatives with their advantages and disadvantages, propose a robust approach, and end with a simulation study. In the fourth chapter we apply some robust methods for PLS regression, together with our approach, to the CNR environmental data in order to compare the results and to show that our approach is a valid alternative in the presence of multicollinearity and outliers.
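As an illustration of the sensitivity to outliers described above, the following minimal Python sketch (standard library only) computes the first PLS weight vector for a single response, which is proportional to the cross-covariance between the centred predictors and the centred response, and shows how one outlying observation can rotate it. A simple leave-one-out loop then flags the observation whose removal changes that direction most. This is not the SSVD method proposed in the thesis, whose details are not given in this abstract; the simulated data, the function names and the injected outlier are all assumptions made for illustration.

```python
import math
import random


def first_pls_weight(X, y):
    """Unit-norm first PLS weight vector for one response: w ∝ Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Cross-covariance (up to a constant factor) between each predictor and y.
    w = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def angle_deg(u, v):
    """Angle in degrees between two unit vectors, ignoring their sign."""
    cos = abs(sum(a * b for a, b in zip(u, v)))
    return math.degrees(math.acos(min(1.0, cos)))


def leave_one_out_angles(X, y):
    """Rotation of the weight vector when each observation is dropped in turn."""
    w_full = first_pls_weight(X, y)
    return [
        angle_deg(w_full, first_pls_weight(X[:i] + X[i + 1:], y[:i] + y[i + 1:]))
        for i in range(len(X))
    ]


def make_data(n=20, seed=0):
    """Hypothetical data: three nearly collinear predictors driven by one factor."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        t = rng.gauss(0, 1)  # latent factor shared by all predictors
        X.append([t + 0.05 * rng.gauss(0, 1) for _ in range(3)])
        y.append(2.0 * t + 0.1 * rng.gauss(0, 1))
    return X, y


X, y = make_data()
w_clean = first_pls_weight(X, y)

# Replace observation 0 with an artificial outlier (an assumption for the demo).
X_out = [row[:] for row in X]
y_out = y[:]
X_out[0] = [10.0, -10.0, 0.0]
y_out[0] = 20.0
w_out = first_pls_weight(X_out, y_out)

shift = angle_deg(w_clean, w_out)            # rotation caused by the outlier
loo = leave_one_out_angles(X_out, y_out)     # influence of each observation
flagged = max(range(len(loo)), key=loo.__getitem__)

print("clean weights:       ", [round(v, 3) for v in w_clean])
print("contaminated weights:", [round(v, 3) for v in w_out])
print("rotation (degrees):  ", round(shift, 1))
print("most influential observation:", flagged)
```

In this sketch the clean weights are close to equal across the three collinear predictors, while a single contaminated observation is enough to pull the weight vector far from that direction. Robust alternatives of the kind discussed in the abstract replace the classical cross-covariance and the least squares steps with estimators that limit the influence of such observations.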