De Filippo, Maria Rosaria (2013) Computational approaches to analyze next generation sequencing data. [PhD thesis]
Item Type: | PhD thesis |
---|---|
Resource language: | English |
Title: | Computational approaches to analyze next generation sequencing data |
Creators: | De Filippo, Maria Rosaria (mdefilippo@unisa.it) |
Date: | 2 April 2013 |
Number of Pages: | 142 |
Institution: | Università degli Studi di Napoli Federico II |
Department: | Biologia e patologia cellulare e molecolare "L. Califano" |
Doctoral school: | Biotechnology |
PhD programme: | Computational Biology and Bioinformatics |
Doctoral cycle: | 25 |
PhD programme coordinator: | Cocozza, Sergio (cocozza@unina.it) |
Tutors: | Weisz, Alessandro (aweisz@unisa.it); Chiusano, Maria Luisa (chiusano@unina.it) |
Keywords: | Next generation sequencing, miRNA-Seq, exome sequencing |
MIUR scientific-disciplinary sectors: | Area 05 - Biological sciences > BIO/11 - Molecular biology; Area 01 - Mathematical and computer sciences > INF/01 - Computer science; Area 06 - Medical sciences > MED/26 - Neurology |
Thematic areas (7th Framework Programme): | HEALTH and CONSUMER PROTECTION > Biotechnology, generic tools and technologies for human health |
Date Deposited: | 03 Apr 2013 10:18 |
Last Modified: | 16 Jul 2014 12:42 |
URI: | http://www.fedoa.unina.it/id/eprint/9489 |
DOI: | 10.6092/UNINA/FEDOA/9489 |
Collection description
Advances in next generation sequencing over the last few years have enabled a growing number of applications in biology and medicine, from whole-genome to small-RNA sequencing, with increasing throughput accompanied by plunging costs. This thesis focuses on two of the most widely used applications: small-RNA sequencing, to investigate the biological function of the growing population of small non-coding RNAs, including microRNAs (miRNAs); and exome sequencing, to identify single nucleotide variations (SNVs) and small insertions and deletions (InDels). Two datasets were used in this context: the first obtained by small-RNA sequencing of human breast cancer MCF-7 cells under two different conditions, the second obtained by exome sequencing of patients with a rare syndrome, malignant migrating partial seizures of infancy. Each experiment produced a large amount of data, requiring comprehensive analysis pipelines.

Small-RNA sequencing is a novel technology widely used to investigate, with high sensitivity and specificity, populations of small non-coding RNAs, including microRNAs and other regulatory transcripts. Extracting biologically relevant information, such as detection and differential expression analysis of known and novel non-coding RNAs and target prediction, requires combining multiple statistical and bioinformatics tools from different sources, each covering a single step of the analysis pipeline. As a result, a novel modular pipeline called iMir was designed for comprehensive analysis of miRNA-Seq data by integrating multiple open-source modules and resources into an automated workflow. It covers adapter trimming, quality filtering, differential expression analysis and biological target prediction, together with other useful options.
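The first preprocessing steps named above (adapter trimming and quality filtering) can be illustrated with a minimal Python sketch. It is not iMir's code: the adapter sequence is only an example (the adapter of the actual library must be used), the 18 to 30 nt length window and the mean Phred threshold of 20 are hypothetical values, and qualities are assumed to be Phred+33 encoded.

```python
# Minimal sketch of small-RNA read preprocessing: 3' adapter trimming followed by
# length and quality filtering. This is NOT iMir's implementation; ADAPTER and
# the thresholds below are hypothetical example values.
from typing import Iterable, Iterator, Tuple

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # example 3' adapter sequence (assumption)
MIN_LEN, MAX_LEN = 18, 30          # hypothetical length window for miRNA-sized inserts
MIN_MEAN_QUAL = 20                 # hypothetical minimum mean Phred score


def trim_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Cut the read at the first full adapter match or, failing that, remove a
    partial adapter (a prefix of at least min_overlap bases) at the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest partial adapter first, down to min_overlap bases.
    for k in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def preprocess(reads: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs that survive trimming and filtering."""
    for seq, qual in reads:
        trimmed = trim_adapter(seq)
        trimmed_qual = qual[: len(trimmed)]
        # The length check runs first, so mean_phred never sees an empty string.
        if MIN_LEN <= len(trimmed) <= MAX_LEN and mean_phred(trimmed_qual) >= MIN_MEAN_QUAL:
            yield trimmed, trimmed_qual
```

Trimming is done before filtering because raw small-RNA reads usually run past the short insert into the adapter, so the length window is only meaningful on the trimmed sequence.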
The pipeline was applied to the simultaneous analysis of the miRNA-Seq datasets from human breast cancer MCF-7 cells, enabling rapid and accurate identification, quantification and differential expression analysis of ~450 miRNAs, including several novel miRNAs and isomiRs, as well as identification of the putative mRNA targets of the differentially expressed miRNAs.

Exome sequencing, the targeted sequencing of the coding regions of the genome, is a powerful and cost-effective technique for dissecting the genetic basis of diseases and traits that have proved intractable with conventional gene-discovery strategies. To reduce the number of false-positive variants and simplify interpretation of the results, a comprehensive pipeline integrating different tools was developed. Starting from quality checking and alignment, base quality score recalibration and local realignment around InDels were performed, and SNVs and InDels were then called. Finally, several filters were applied to discard variants with low quality or low coverage. The pipeline was then used to analyze exome sequencing data from six patients with malignant migrating partial seizures of infancy, also known as MMPSI or MMPEI. After analysis and filtering, variants shared by all six patients, and by five, four and three of them, were studied to identify putative disease-causing mutation(s). The results indicate that the pipeline accurately identifies SNVs and short InDels and reliably provides a global, quantitative catalogue of nucleotide variants in the exome.
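The final filtering step and the comparison of variants across patients can be sketched in Python as follows. This is an illustrative assumption, not the thesis pipeline: the `Call` record, the thresholds (quality of at least 30, read depth of at least 10) and the function names are hypothetical, and parsing of real VCF files is omitted.

```python
# Minimal sketch: filter per-patient variant calls by quality and read depth,
# then keep variants carried by at least `min_patients` patients. Not the
# thesis pipeline itself; thresholds and the Call record are hypothetical.
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float  # variant quality, e.g. the QUAL column of a VCF record
    depth: int   # read depth at the site, e.g. the INFO/DP field


VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)

MIN_QUAL = 30.0  # hypothetical minimum variant quality
MIN_DEPTH = 10   # hypothetical minimum read depth


def passing_variants(calls: Iterable[Call]) -> Set[VariantKey]:
    """Return the variants of one patient that pass both filters."""
    return {
        (c.chrom, c.pos, c.ref, c.alt)
        for c in calls
        if c.qual >= MIN_QUAL and c.depth >= MIN_DEPTH
    }


def shared_variants(
    per_patient: Dict[str, Iterable[Call]], min_patients: int
) -> Dict[VariantKey, List[str]]:
    """Map each filtered variant seen in >= min_patients patients to its carriers."""
    carriers: Dict[VariantKey, List[str]] = {}
    for patient, calls in per_patient.items():
        for key in passing_variants(calls):
            carriers.setdefault(key, []).append(patient)
    return {k: sorted(v) for k, v in carriers.items() if len(v) >= min_patients}
```

Calling `shared_variants(calls_by_patient, n)` for n = 6, 5, 4 and 3 mirrors the tiered comparison described above; whether the thesis counted variants present in exactly or in at least n patients is not stated, so the "at least" reading is an assumption. Matching on (chrom, pos, ref, alt) also assumes that variants are represented identically across patients (for example, normalized InDels).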