On the use of event logs for the analysis of system failures

Pecchia, Antonio (2011) On the use of event logs for the analysis of system failures. [Doctoral thesis] (Unpublished)

Computer systems underpin daily human activities and, more importantly, play a key role in many critical domains. For this reason, understanding the failure behavior of computer systems is crucial to engineers. Event logs, i.e., the set of files where computing entities record events related to regular and anomalous activities that occurred during the system's operational phase, represent a valuable source of data for failure analysis. Studies based on event logs span the past three decades; however, computer systems have changed deeply over this timeframe. It is therefore of paramount importance to investigate whether the traditional assumptions and techniques underlying log-based failure analysis remain suitable in light of the changes that have occurred in the computer systems industry. The focus of the thesis is to evaluate the accuracy of current logging mechanisms at reporting failures, and to develop novel techniques that make event logs effective for inferring failure data. The techniques involve the production, collection, and correlation of the failure data in the log to support accurate characterization of system dependability. The benefits that can be achieved by adopting the proposed techniques are shown by means of experiments conducted in the context of real-world, complex distributed systems.

Keywords: logging mechanism; event log; failure analysis; dependability
Date: 30 November 2011
Institution: Università di Napoli Federico II
