PELLEGRINO, MARIA SOLE (2018) Assessing and inferring intra and inter-rater agreement. [Doctoral thesis]

Text: PhD_thesis_of_Pellegrino_Maria_Sole.pdf (6MB)
Document type: Doctoral thesis
Language: English
Title: Assessing and inferring intra and inter-rater agreement
Authors: PELLEGRINO, MARIA SOLE (mariasole.pellegrino@unina.it)
Date: 11 December 2018
Number of pages: 106
Institution: Università degli Studi di Napoli Federico II
Department: Ingegneria Industriale
PhD programme: Ingegneria industriale
PhD cycle: 31
PhD programme coordinator: Grassi, Michele (michele.grassi@unina.it)
Tutor: Vanacore, Amalia (email not specified)
Keywords: Rater repeatability and reproducibility; subjective evaluations; statistical benchmarking
MIUR scientific-disciplinary sectors: Area 13 - Economic and statistical sciences > SECS-S/02 - Statistics for experimental and technological research
Deposited on: 02 Jan 2019 15:32
Last modified: 22 Jun 2020 09:24
URI: http://www.fedoa.unina.it/id/eprint/12616

Abstract

This research work aims to provide a scientific contribution in the field of subjective decision making, since assessing the consensus, or equivalently the degree of agreement, among a group of raters, as well as between two or more series of evaluations provided by the same rater on categorical scales, is a subject of both scientific and practical interest. Specifically, the work focuses on the analysis of agreement measures commonly adopted for assessing the performance (evaluative abilities) of one or more human raters (i.e. a group of raters) providing subjective evaluations about a given set of items/subjects. This topic is common to many contexts, ranging from medicine (diagnosis) to engineering (usability testing), industry (visual inspection) and agribusiness (sensory analysis).

In the thesis, the performance of the agreement indexes under study, belonging to the family of kappa-type agreement coefficients, has been assessed mainly with regard to their inferential aspects, focusing on scenarios with small sample sizes that do not satisfy the asymptotic conditions required for the applicability of standard inferential methods. Such scenarios have been poorly investigated in the specialized literature, despite their evident interest in many experimental contexts. The critical analysis of the specialized literature highlighted two issues regarding the adoption of agreement coefficients: 1) the degree of agreement is generally characterized via a straightforward benchmarking procedure that does not take sampling uncertainty into account; 2) there is no evidence in the literature of a synthetic index able to assess the performance of a rater and/or a group of raters in terms of more than one evaluative ability (for example, repeatability and reproducibility).

Regarding the former issue, an inferential benchmarking procedure based on non-parametric confidence intervals, built via bootstrap resampling techniques, has been proposed. The statistical properties of the suggested benchmarking procedure have been investigated via a Monte Carlo simulation study exploring many scenarios defined by varying the level of agreement, the sample size and the rating scale dimension. The simulation study has been carried out for different agreement coefficients and different confidence interval constructions, in order to provide a comparative analysis of their performances. Regarding the latter issue, a novel composite index has been proposed, able to assess a rater's ability to provide evaluations that are both repeatable (i.e. stable over time) and reproducible (i.e. consistent over different rating scales). The inferential benchmarking procedure has also been extended to the proposed composite index, and their performances have been investigated under different scenarios via Monte Carlo simulation. The proposed tools have been successfully applied to two real case studies, concerning the assessment of university teaching quality and the sensory analysis of some food and beverage products, respectively.
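The sketch below is a minimal, illustrative Python example of the kind of inferential benchmarking the abstract describes: it is not taken from the thesis, which studies several kappa-type coefficients and confidence interval constructions. It uses Cohen's kappa with a percentile bootstrap confidence interval and compares the interval's lower bound, rather than the point estimate, against conventional agreement thresholds (Landis-Koch style labels, assumed here only for illustration).

import numpy as np

def cohen_kappa(r1, r2, n_categories):
    # Cohen's kappa for two sets of ratings on a nominal scale coded 0..n_categories-1
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    p_o = np.mean(r1 == r2)                               # observed agreement
    p1 = np.bincount(r1, minlength=n_categories) / n      # marginal distribution, rater 1
    p2 = np.bincount(r2, minlength=n_categories) / n      # marginal distribution, rater 2
    p_e = np.sum(p1 * p2)                                 # chance agreement under independence
    return (p_o - p_e) / (1 - p_e)

def bootstrap_kappa_ci(r1, r2, n_categories, n_boot=2000, alpha=0.05, seed=0):
    # Non-parametric percentile bootstrap CI: resample items with replacement
    rng = np.random.default_rng(seed)
    r1, r2 = np.asarray(r1), np.asarray(r2)
    n = len(r1)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = cohen_kappa(r1[idx], r2[idx], n_categories)
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Inferential benchmarking: the extent of agreement is labelled by the threshold
# that the CI lower bound exceeds, so sampling uncertainty is taken into account.
BENCHMARKS = [(0.8, "almost perfect"), (0.6, "substantial"),
              (0.4, "moderate"), (0.2, "fair"), (0.0, "slight")]

def benchmark(lower_bound):
    for threshold, label in BENCHMARKS:
        if lower_bound >= threshold:
            return label
    return "poor"

# Toy usage: 20 items rated on a 3-point scale by two raters (hypothetical data)
rater1 = [0, 1, 2, 1, 0, 2, 2, 1, 0, 1, 2, 0, 1, 2, 1, 0, 2, 1, 1, 0]
rater2 = [0, 1, 2, 1, 0, 2, 1, 1, 0, 1, 2, 0, 1, 2, 2, 0, 2, 1, 1, 1]
lo, hi = bootstrap_kappa_ci(rater1, rater2, n_categories=3)
print(f"95% bootstrap CI: [{lo:.2f}, {hi:.2f}] -> {benchmark(lo)}")

With small samples, as in the scenarios studied in the thesis, the bootstrap interval is typically wide, so the benchmarked label is more conservative than the one obtained by benchmarking the point estimate directly.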
