PELLEGRINO, MARIA SOLE
(2018)
Assessing and inferring intra and inter-rater agreement.
[Doctoral thesis]
Item Type: |
Doctoral thesis
|
Resource language: |
English |
Title: |
Assessing and inferring intra and inter-rater agreement |
Creators: |
Creators | Email |
---|
PELLEGRINO, MARIA SOLE | mariasole.pellegrino@unina.it |
|
Date: |
11 December 2018 |
Number of Pages: |
106 |
Institution: |
Università degli Studi di Napoli Federico II |
Department: |
Ingegneria Industriale |
Doctoral programme: |
Ingegneria industriale |
Doctoral cycle: |
31 |
Coordinator of the doctoral programme: |
Name | Email |
---|
Grassi, Michele | michele.grassi@unina.it |
|
Tutor: |
Name | Email |
---|
Vanacore, Amalia | UNSPECIFIED |
|
Keywords: |
Rater repeatability and reproducibility; subjective evaluations; statistical benchmarking |
MIUR scientific-disciplinary sectors: |
Area 13 - Economics and Statistics > SECS-S/02 - Statistics for experimental and technological research |
Date Deposited: |
02 Jan 2019 15:32 |
Last Modified: |
22 Jun 2020 09:24 |
URI: |
http://www.fedoa.unina.it/id/eprint/12616 |
Collection description
This research aims to contribute to the field of subjective decision making. Assessing the consensus, or equivalently the degree of agreement, on categorical scales is a subject of both scientific and practical interest. The agreement may be measured among a group of raters or between multiple series of evaluations provided by the same rater.
Specifically, the research focuses on the measures of agreement commonly adopted for assessing the performance (evaluative abilities) of one or more human raters (i.e. a group of raters). These raters provide subjective evaluations of a given set of items/subjects. The topic arises in many contexts, ranging from medicine (diagnosis) to engineering (usability tests), industry (visual inspections) and agribusiness (sensory analysis).
The agreement indexes under study belong to the family of kappa-type agreement coefficients. Their performance has been assessed mainly with respect to their inferential aspects. The focus is on small-sample scenarios that do not satisfy the asymptotic conditions required by standard inferential methods.
Such scenarios have been poorly investigated in the specialized literature, although they are of evident interest in many experimental contexts.
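The kappa-type coefficients mentioned above share a common form: observed agreement corrected for the agreement expected by chance. They differ only in how chance agreement is modeled.

The formulas below cover the two-series case (e.g. two raters, or one rater on two occasions) over a $q$-category nominal scale. The abstract does not say which coefficients the thesis studies, so this selection is illustrative: Cohen's kappa, Scott's pi, Brennan-Prediger and Gwet's AC1.

```latex
% p_{kk}: proportion of items placed in category k by both series
% p_{k\cdot}, p_{\cdot k}: marginal proportions of category k in series 1 and 2
% \pi_k = (p_{k\cdot} + p_{\cdot k})/2; q: number of rating categories
\kappa = \frac{p_a - p_e}{1 - p_e}, \qquad p_a = \sum_{k=1}^{q} p_{kk}

p_e^{\text{Cohen}} = \sum_{k=1}^{q} p_{k\cdot}\, p_{\cdot k}, \qquad
p_e^{\text{Scott}} = \sum_{k=1}^{q} \pi_k^{2}, \qquad
p_e^{\text{BP}} = \frac{1}{q}, \qquad
p_e^{\text{AC1}} = \frac{1}{q-1} \sum_{k=1}^{q} \pi_k \,(1 - \pi_k)
```

In every case $\kappa = 1$ means perfect agreement and $\kappa = 0$ means agreement no better than the chance model assumed by that coefficient.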
A critical analysis of the specialized literature highlighted two shortcomings in the adoption of agreement coefficients:
1) the degree of agreement is generally characterized by a straightforward benchmarking procedure that does not take sampling uncertainty into account;
2) the literature offers no synthetic index able to assess the performance of a rater and/or a group of raters in terms of more than one evaluative ability (for example, repeatability and reproducibility).
To address the first shortcoming, an inferential benchmarking procedure has been proposed, based on nonparametric confidence intervals built via bootstrap resampling. Its statistical properties have been investigated through a Monte Carlo simulation study. The study explores many scenarios defined by varying the level of agreement, the sample size and the rating scale dimension. It has been carried out for different agreement coefficients and different types of confidence interval, in order to compare their performance.
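The bootstrap-based benchmarking idea above can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's procedure. It rests on these assumptions:
- Cohen's kappa for two series of ratings on the same items;
- a percentile bootstrap interval obtained by resampling items;
- the Landis and Koch (1977) verbal scale as the benchmark;
- agreement classified by the interval's lower bound (a conservative choice assumed here).

All helper names (`cohen_kappa`, `bootstrap_ci`, `landis_koch`, `inferential_benchmark`, `simulate_pair`, `coverage`) are hypothetical. So is the data-generating model in `simulate_pair`: the second series repeats the first with probability `theta` and is otherwise uniform at random. Under that model the true kappa equals `theta`, so the Monte Carlo loop can estimate the interval's coverage for a given sample size `n` and scale dimension `q`.

```python
import math
import random
from collections import Counter


def cohen_kappa(r1, r2):
    """Cohen's kappa for two series of nominal ratings of the same items."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n      # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)     # chance agreement from the marginals
    if p_e == 1.0:        # both series use one and the same category: kappa is 0/0
        return math.nan
    return (p_o - p_e) / (1.0 - p_e)


def bootstrap_ci(r1, r2, n_boot=1000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for kappa, resampling items (pairs of ratings)."""
    if rng is None:
        rng = random.Random(0)
    n = len(r1)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        k = cohen_kappa([r1[i] for i in idx], [r2[i] for i in idx])
        if not math.isnan(k):   # assumption: degenerate resamples are simply dropped
            reps.append(k)
    if not reps:
        return math.nan, math.nan
    reps.sort()
    m = len(reps)
    # simple order-statistic percentiles
    return reps[int(alpha / 2 * (m - 1))], reps[int((1 - alpha / 2) * (m - 1))]


def landis_koch(value):
    """Landis & Koch (1977) verbal benchmark scale for kappa."""
    if math.isnan(value):
        return "undefined"
    if value > 0.80:
        return "almost perfect"
    if value > 0.60:
        return "substantial"
    if value > 0.40:
        return "moderate"
    if value > 0.20:
        return "fair"
    if value >= 0.0:
        return "slight"
    return "poor"


def inferential_benchmark(r1, r2, **kw):
    """Assumption: benchmark the lower CI bound rather than the point estimate."""
    lo, hi = bootstrap_ci(r1, r2, **kw)
    return cohen_kappa(r1, r2), (lo, hi), landis_koch(lo)


def simulate_pair(n, q, theta, rng):
    """Hypothetical model: series 2 repeats series 1 with prob. theta, else rates
    uniformly at random; with uniform marginals the true kappa equals theta."""
    r1 = [rng.randrange(q) for _ in range(n)]
    r2 = [a if rng.random() < theta else rng.randrange(q) for a in r1]
    return r1, r2


def coverage(n, q, theta, n_sim=200, n_boot=500, alpha=0.05, seed=1):
    """Monte Carlo estimate of how often the bootstrap CI contains the true kappa."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        r1, r2 = simulate_pair(n, q, theta, rng)
        lo, hi = bootstrap_ci(r1, r2, n_boot=n_boot, alpha=alpha, rng=rng)
        hits += lo <= theta <= hi
    return hits / n_sim


if __name__ == "__main__":
    r1, r2 = simulate_pair(n=20, q=3, theta=0.7, rng=random.Random(42))
    print(inferential_benchmark(r1, r2))
    print(coverage(n=20, q=3, theta=0.7, n_sim=100, n_boot=300))
```

With small samples, some resamples place every item in one category for both series, which makes kappa 0/0. Dropping those resamples is one simple treatment, and others are possible. Trying other interval types (e.g. BCa) or other coefficients would only change `bootstrap_ci` or `cohen_kappa`.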
To address the second shortcoming, a novel composite index has been proposed. It assesses a rater's ability to provide evaluations that are both repeatable (i.e. stable over time) and reproducible (i.e. consistent over different rating scales). The inferential benchmarking procedure has also been extended to the proposed composite index, and its performance has been investigated under different scenarios via Monte Carlo simulation.
The proposed tools have been successfully applied to two real case studies: the assessment of university teaching quality and the sensory analysis of food and beverage products.