Rella Riccardi, Maria (2022) Econometric methods and machine learning algorithms to investigate factors contributing to pedestrian crash severity. [Doctoral thesis]


Item Type: Doctoral thesis
Resource language: English
Title: Econometric methods and machine learning algorithms to investigate factors contributing to pedestrian crash severity
Author: Rella Riccardi, Maria
Date: 10 March 2022
Number of Pages: 384
Institution: Università degli Studi di Napoli Federico II
Department: Ingegneria Civile, Edile e Ambientale
Doctoral programme: Ingegneria dei sistemi civili (Civil Systems Engineering)
Doctoral cycle: 34
Doctoral programme coordinator:
Montella, Alfonso
Mauriello, Filomena
Keywords: Pedestrian, road safety, vulnerable road users, parametric models, non-parametric models, measure of performance, contributory factors
MIUR scientific-disciplinary sectors: Area 08 - Civil Engineering and Architecture > ICAR/04 - Roads, Railways and Airports
Date Deposited: 17 Mar 2022 10:33
Last Modified: 28 Feb 2024 10:35

Collection description

Road traffic crashes are a serious public health problem. The worldwide burden of road traffic injuries and deaths is borne disproportionately by vulnerable road users (VRUs), who include children, elderly people, pedestrians, cyclists, and motorcyclists. Reducing the growing number of crashes involving VRUs, and the fatalities among them, is the most serious challenge for the new decade of action for road safety. Among vulnerable road users, pedestrians form the second-largest group of road casualties after car occupants and are the most exposed to road risks. The severity of vehicle-pedestrian crashes confirms that actions to improve pedestrian safety are strongly needed, and that they require identifying which factors affect crash injury severity and how. Econometric models have been widely used for crash severity analysis. More recently, machine learning algorithms have been used for crash severity prediction in place of the more traditional econometric models. To support the choice of an appropriate prediction method, this research also compares econometric models and machine learning methods in terms of their ability to identify significant explanatory variables affecting crash severity and in terms of their performance. Analyses were carried out on three case studies using three national databases of vehicle-pedestrian crashes that occurred in Great Britain in 2016-2018, in Sweden in 2015-2019, and in Italy in 2014-2018, in order to investigate how model performance varies with sample size. The econometric models used were the multinomial logit, the ordered logit, the random parameters multinomial logit, and the random parameters ordered logit; the machine learning algorithms were association rules, classification trees, random forests, artificial neural networks, and support vector machines.
This research further investigated the problem of imbalanced distributions of the response classes. Crash severity levels are very unevenly represented in the data, which affects the classification accuracy of both parametric and non-parametric methods in predicting the most severe crashes. The quantitative comparison of the models relied on three performance metrics: F-measure, G-mean, and Area Under Curve. It showed that machine learning tools outperformed the econometric models, and that some algorithms (SVM, ANN, and RF) also outperformed the other machine learning algorithms. The qualitative evaluation showed that the machine learning tools uncover more hidden correlations in the data than the econometric models and provide valuable insights into the interdependence among the roadway, environmental, vehicle, and road user factors contributing to the severity of pedestrian crashes. In the British case study, for fatal crashes, 19 variables were significant in both the econometric models and the machine learning algorithms, 1 variable was significant only in the econometric models, and 7 variables were significant only in the machine learning algorithms. In the Swedish case study, 13 variables were significant in both the econometric models and the machine learning algorithms, and 5 variables were significant only in the machine learning algorithms. In the Italian case study, 16 variables were significant in both the econometric models and the machine learning algorithms, and 3 variables were significant only in the machine learning algorithms. In the Swedish and Italian case studies, no variable was identified only by the econometric models.
On the other hand, the random parameters econometric models provided evidence of heterogeneity in the data. This variability in the effect of variables across the sample population highlights the need to account for potential unobserved heterogeneity across vehicle-pedestrian crashes, since doing so may improve understanding, reduce erroneous inferences and predictions, and produce more accurate and informative results.
In conclusion, the econometric models confirmed their advantage in offering easy-to-interpret outputs and understandable relations between dependent and independent variables; the magnitude and direction of the effect of each indicator variable were clear as well. Machine learning tools, by contrast, exhibited higher classification accuracy and the ability to highlight more hidden relations in the data. However, some machine learning tools (SVM and ANN) achieved very high classification performance but produced results that are difficult to interpret, whereas others, such as AR, CT, and RF, provided very intuitive results but with lower prediction accuracy. From the methodological perspective, the results suggest that the joint use of econometric methods and machine learning algorithms may overcome the limits of each group of methods, with a satisfactory trade-off between prediction accuracy and interpretability, providing powerful insights into the factors contributing to fatal and serious crashes. From the engineering perspective, once the interdependences between contributory patterns and severity in pedestrian crashes have been detected, a combination of engineering, social, and management strategies, together with appropriate safety countermeasures, can be identified and planned to effectively mitigate pedestrian crash severity, increasing the perceived safety of walking and contributing to the vision of zero road deaths by 2050.
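The abstract names F-measure and G-mean as metrics suited to imbalanced severity classes but does not define them. The following is a minimal sketch using their common binary-class definitions (fatal versus non-fatal); the thesis may use multi-class or weighted variants. The function names and the toy data of 100 crashes with 5 fatal ones are hypothetical and are not taken from the thesis. Area Under Curve is omitted because it requires predicted scores rather than hard labels.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def accuracy(y_true, y_pred):
    """Share of correctly classified cases."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


# Hypothetical toy data: 5 fatal crashes (1) and 95 non-fatal crashes (0).
y_true = [1] * 5 + [0] * 95

# Classifier A never predicts a fatal crash.
always_negative = [0] * 100

# Classifier B finds 4 of the 5 fatal crashes and raises 10 false alarms.
detector = [1] * 4 + [0] * 1 + [1] * 10 + [0] * 85

for name, y_pred in [("always_negative", always_negative), ("detector", detector)]:
    print(name,
          round(accuracy(y_true, y_pred), 3),
          round(f_measure(y_true, y_pred), 3),
          round(g_mean(y_true, y_pred), 3))
```

On this toy data the classifier that never predicts a fatal crash reaches 0.95 accuracy but scores 0 on both F-measure and G-mean, while the second classifier has lower accuracy (0.89) yet an F-measure of about 0.42 and a G-mean of about 0.85. This illustrates why plain accuracy can be misleading when the most severe class is rare.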

