De Filippo, Maria Rosaria (2013) Computational approaches to analyze next generation sequencing data. [PhD thesis]

Full text: tesi_dottorato_MariaRosariaDeFilippo.pdf (PDF, 11MB)
Item Type: PhD thesis
Language: English
Title: Computational approaches to analyze next generation sequencing data
Creators: De Filippo, Maria Rosaria (mdefilippo@unisa.it)
Date: 2 April 2013
Number of Pages: 142
Institution: Università degli Studi di Napoli Federico II
Department: Cellular and Molecular Biology and Pathology "L. Califano"
Doctoral school: Biotechnology
PhD programme: Computational Biology and Bioinformatics
PhD cycle: 25
PhD programme coordinator: Cocozza, Sergio (cocozza@unina.it)
Tutors:
Weisz, Alessandro (aweisz@unisa.it)
Chiusano, Maria Luisa (chiusano@unina.it)
Uncontrolled Keywords: Next generation sequencing, miRNA-Seq, exome sequencing
MIUR scientific-disciplinary sectors: Area 05 - Biological sciences > BIO/11 - Molecular biology
Area 01 - Mathematical and computer sciences > INF/01 - Computer science
Area 06 - Medical sciences > MED/26 - Neurology
Thematic areas (7th Framework Programme): HEALTH and CONSUMER PROTECTION > Biotechnology, generic tools and technologies for human health
Date Deposited: 03 Apr 2013 10:18
Last Modified: 16 Jul 2014 12:42
URI: http://www.fedoa.unina.it/id/eprint/9489
DOI: 10.6092/UNINA/FEDOA/9489

Abstract

Advances in next generation sequencing over the last few years have enabled a growing number of applications in biology and medicine, from whole-genome to small-RNA sequencing, with increased throughput accompanied by plunging costs. This thesis focuses on two of the most widely used applications: small-RNA sequencing, to investigate the biological function of the growing population of small non-coding RNAs, including microRNAs; and exome sequencing, to identify single nucleotide variations (SNVs) and small insertions and deletions (InDels). Two datasets were used: the first was obtained by small-RNA sequencing of human breast cancer MCF-7 cells in two different conditions, and the second by exome sequencing of patients with a rare syndrome, malignant migrating partial seizures of infancy. Each experiment produced a large amount of data, which required comprehensive analysis pipelines. Small-RNA sequencing is a novel technology widely used to investigate, with high sensitivity and specificity, populations of small non-coding RNAs, comprising microRNAs and other regulatory transcripts. Gathering biologically relevant information, such as the detection and differential expression analysis of known and novel non-coding RNAs and target prediction, requires multiple statistical and bioinformatics tools from different sources, each focusing on a single step of the analysis. Accordingly, a novel modular pipeline called iMir was designed for the comprehensive analysis of miRNA-Seq data, from adapter trimming and quality filtering to differential expression analysis and biological target prediction, together with other useful options, by integrating multiple open-source modules and resources in an automated workflow.
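
The first two iMir steps named above, adapter trimming and quality filtering, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not iMir's actual code: the adapter sequence (a commonly used Illumina small-RNA 3' adapter), the 18-30 nt length window, the minimum 8 nt adapter overlap, the Phred+33 encoding, the mean-quality threshold of 20 and the helper names `find_adapter` and `trim_and_filter` are all hypothetical choices made for illustration.

```python
from typing import Optional, Tuple

# All values below are assumptions for illustration, not iMir's settings.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumed Illumina small-RNA 3' adapter
MIN_LEN, MAX_LEN = 18, 30          # assumed miRNA-sized insert window (nt)
MIN_MEAN_QUAL = 20                 # assumed mean Phred threshold
MIN_OVERLAP = 8                    # shortest adapter prefix accepted at the read end


def find_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = MIN_OVERLAP) -> int:
    """Return the start index of the adapter (or a 3'-truncated prefix of it) in seq, else -1."""
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # The read may end partway through the adapter: look for the longest
    # adapter prefix that exactly matches a suffix of the read.
    for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return -1


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def trim_and_filter(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Trim the 3' adapter; return (insert, insert_qual), or None if the read is discarded."""
    pos = find_adapter(seq)
    if pos == -1:
        return None  # no adapter found: insert length unknown, so discard
    insert, insert_qual = seq[:pos], qual[:pos]
    if not (MIN_LEN <= len(insert) <= MAX_LEN):
        return None  # too short or too long to be a mature miRNA
    if mean_phred(insert_qual) < MIN_MEAN_QUAL:
        return None  # low-quality insert
    return insert, insert_qual


# Illustrative read: hsa-miR-21-5p (DNA alphabet) followed by a partial adapter.
read = "TAGCTTATCAGACTGATGTTGA" + "TGGAATTCTCGG"
print(trim_and_filter(read, "I" * len(read)))
```

Reads without a detectable adapter are discarded here because their insert length is unknown; a production pipeline might instead use an error-tolerant adapter match, which this exact-match sketch does not attempt.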
iMir was applied to analyze miRNA-Seq datasets from human breast cancer MCF-7 cells simultaneously, resulting in rapid and accurate identification, quantification and differential expression analysis of ~450 miRNAs, including several novel miRNAs and isomiRs, as well as identification of the putative mRNA targets of the differentially expressed miRNAs.

Exome sequencing, the targeted sequencing of the coding regions of the genome, is a powerful and cost-effective technique for dissecting the genetic basis of diseases and traits that have proved intractable with conventional gene-discovery strategies. To reduce the number of false-positive variant calls and simplify interpretation of the results, a comprehensive pipeline integrating different tools was developed. After quality control and alignment, base quality score recalibration and local realignment around InDels were performed, and SNVs and InDels were called. Finally, several filters were applied to discard low-quality and low-coverage variants. The pipeline was then used to analyze exome sequencing data from six patients with malignant migrating partial seizures of infancy, also known as MMPSI or MMPEI. After analysis and filtering, variants shared by six, five, four and three patients were examined to identify the putative disease-causing mutation(s). The results indicate that the pipeline identifies SNVs and short InDels accurately and reliably provides a global, quantitative catalogue of nucleotide variants in the exome.
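
The final filtering step and the cross-patient comparison can be illustrated with a short Python sketch. It assumes each patient's calls have already been parsed into dictionaries with hypothetical fields `chrom`, `pos`, `ref`, `alt`, `qual` and `depth`; the cut-offs (QUAL >= 30, depth >= 10) are placeholders, since the abstract does not state the thresholds used, and "shared by k patients" is read here as "present in at least k patients". The toy calls are invented for illustration and are not data from the study.

```python
from collections import Counter

# Placeholder cut-offs (assumptions, not the thesis's actual values).
MIN_QUAL = 30.0
MIN_DEPTH = 10


def passes_filters(call: dict) -> bool:
    """Keep a call only if both its quality and its read depth reach the cut-offs."""
    return call["qual"] >= MIN_QUAL and call["depth"] >= MIN_DEPTH


def shared_variants(calls_by_patient: dict, min_patients: int) -> set:
    """Return (chrom, pos, ref, alt) keys passing the filters in at least min_patients patients."""
    counts = Counter()
    for calls in calls_by_patient.values():
        # One set per patient, so a duplicated call counts only once for that patient.
        kept = {(c["chrom"], c["pos"], c["ref"], c["alt"]) for c in calls if passes_filters(c)}
        counts.update(kept)
    return {variant for variant, n in counts.items() if n >= min_patients}


def make_call(chrom, pos, ref, alt, qual, depth):
    """Build one hypothetical variant-call record."""
    return {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt, "qual": qual, "depth": depth}


# Invented toy calls for six hypothetical patients P1..P6.
toy = {f"P{i}": [make_call("chr1", 1000, "A", "G", 50.0, 40)] for i in range(1, 7)}
for patient in ("P1", "P2", "P3"):
    toy[patient].append(make_call("chr1", 2000, "C", "T", 45.0, 25))
toy["P4"].append(make_call("chr1", 2000, "C", "T", 45.0, 4))  # fails the depth filter

for k in (6, 5, 4, 3):
    print(k, sorted(shared_variants(toy, k)))
```

Applying the filters before counting means a low-coverage call (as for P4 above) does not contribute to the sharing count; an "exactly k patients" reading would instead compare `n == min_patients`.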
