CNVs dataset and analysis

Prepared by: Mohammed Abdulghani Taha Supervised by: Assist. Prof. Gokmen Altay

What are CNVs?
Copy number variations (CNVs) are large regions of the genome that have been deleted or duplicated on certain chromosomes [4]. For example, a chromosome that normally has sections in the order A-B-C-D might instead have sections A-B-C-C-D (a duplication of "C") or A-B-D (a deletion of "C") [4].

Techniques to discover CNVs


Several techniques have been used to discover CNVs [3]: ROMA (Representational Oligonucleotide Microarray Analysis), Fosmid Paired-End Sequencing, SKY (Spectral Karyotyping) or FISH (Fluorescence In Situ Hybridization), CGH (Comparative Genomic Hybridization), RT/Q-PCR (Real Time Quantitative PCR), and aCGH (Array Comparative Genomic Hybridization).

aCGH (Array Comparative Genomic Hybridization)


aCGH has become one of the powerful new array-based methods [3]. In an aCGH experiment, a DNA sample of interest (the test sample) and a reference sample are mixed [6]. The combined sample is then hybridized to the microarray and imaged [6].

aCGH (Array Comparative Genomic Hybridization)


The normal microarray procedure is followed [3]. Based on the color of each dot, the color's intensity, and a complicated algorithm, the amount of duplication or deletion can be estimated [7]. A higher fluorescence intensity ratio indicates that the target genome contains more copies.

Quantification of dataset
An aCGH array consists of a number of probes, and each probe contains a small DNA fragment. Measuring the fluorescence intensity at each probe yields a vector $V = (v_1, v_2, \ldots, v_n)$, where $v_i$ is the log2 ratio of the target genome to the reference genome at the $i$th probe:
$v_i = \log_2\left(\frac{\text{fluorescence intensity in the target genome}}{\text{fluorescence intensity in the reference genome}}\right)$
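As a minimal sketch of the quantification step, the log2 ratio can be computed per probe in base R. The intensity values and the variable names `target` and `reference` are made up for illustration and do not come from the dataset discussed here.

```r
# Hypothetical fluorescence intensities for five probes (illustrative values only)
target    <- c(200, 400, 100, 200, 800)   # test sample
reference <- c(200, 200, 200, 200, 200)   # reference sample

# One log2 ratio per probe: 0 means equal signal, positive means more
# signal in the target, negative means less signal in the target
log_ratio <- log2(target / reference)
print(log_ratio)
```

A value of 1 corresponds to a doubled intensity in the target and a value of -1 to a halved intensity.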

Pre-processing
Pre-processing consists of three steps: normalization, segmentation, and calling.

Normalization
The aim of normalization is to make the log2 ratios from different hybridizations comparable [10]. Types of normalization [10]: median normalization, mode normalization, and spatial normalization.
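One common reading of median normalization for log2 ratios is to subtract each array's median so that every hybridization is centered at zero. The sketch below assumes that reading; the two arrays and their values are hypothetical.

```r
# Hypothetical log2 ratios: rows are probes, columns are two hybridizations
arrays <- cbind(array1 = c(0.2, 0.3, 1.2, 0.1, 0.2),
                array2 = c(-0.1, 0.0, 0.9, -0.2, -0.1))

# Subtract each column's median so both arrays are centered at zero
array_medians <- apply(arrays, 2, median)
centered <- sweep(arrays, 2, array_medians, `-`)
print(centered)
```

After centering, the two hypothetical arrays show the same profile, which is the sense in which normalization makes hybridizations comparable.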

Median Normalization
Median normalization must be applied after either mean normalization or standard deviation normalization. The normalized data for each microarray must be compiled into a matrix. Each microarray provides 2 data sets for analysis; this experiment used 2 microarrays, giving 4 data sets.

Median Normalization
X denotes a gene, N is the number of data sets used, and P is the number of genes on each microarray. M1 is the median red intensity for genes X11 ... X1n. Mm is also calculated; it is the median of all the combined red medians M1 ... Mp. A1 is the median of all the expression ratio values in Data Set #1.

Median Normalization
Each gene's expression ratio was then multiplied by a ratio specific to its data set:
Data Set #1: Ratio = (Mm / A1).
Data Set #2: Ratio = (Mm / A2).
Data Set #3: Ratio = (Mm / A3).
Data Set #4: Ratio = (Mm / A4).
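The scaling step can be sketched in base R as follows. The ratio matrix is hypothetical, and `Mm` is set to an assumed constant rather than derived from red-intensity medians, since those raw intensities are not part of this example.

```r
# Hypothetical expression ratios: rows are genes, columns are Data Sets #1 to #4
ratios <- matrix(c(1.0, 2.0, 3.0,
                   2.0, 4.0, 6.0,
                   0.5, 1.0, 1.5,
                   1.0, 3.0, 5.0), nrow = 3)

# A1 ... A4: median expression ratio of each data set
A <- apply(ratios, 2, median)

# Assumed value for Mm (in the text it is a median of red-intensity medians)
Mm <- 2

# Multiply every ratio in data set k by Mm / Ak
scaled <- sweep(ratios, 2, Mm / A, `*`)
print(apply(scaled, 2, median))
```

Because a median scales with a positive multiplier, every data set ends up with the same median, Mm.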

Segmentation

Segmentation divides the genome into contiguous segments. Clones that belong to the same segment have the same copy number. The purposes of segmentation are noise reduction, detection of aberrations (loss, normal, gain), and breakpoint analysis [10].
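To make the idea of a breakpoint concrete, here is a toy sketch that finds a single breakpoint by minimizing the within-segment sum of squares. It is not the segmentation algorithm used by any package named in this document, and the log2 ratios are invented.

```r
# Hypothetical log2 ratios along one chromosome, ordered by position
log_ratio <- c(0.1, -0.1, 0.0, 0.1, 0.9, 1.1, 1.0, 0.9)
n <- length(log_ratio)

# Residual sum of squares when the profile is split after position k
rss <- sapply(1:(n - 1), function(k) {
  left  <- log_ratio[1:k]
  right <- log_ratio[(k + 1):n]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
})

# The best single breakpoint, and the smoothed level of each segment
breakpoint <- which.min(rss)
segment_means <- c(mean(log_ratio[1:breakpoint]),
                   mean(log_ratio[(breakpoint + 1):n]))
print(breakpoint)
print(segment_means)
```

Replacing each clone's value by its segment mean is what reduces the noise; practical methods extend this to many breakpoints and add a statistical test for each one.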

Calling

Calling is the process of categorizing the different segmentation states as loss, normal, gain, or amplification [10].
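A simple way to picture calling is to apply fixed thresholds to the segment means. The thresholds below are arbitrary assumptions chosen for illustration; an actual calling procedure may rely on a statistical model instead.

```r
# Hypothetical segment means (log2 ratios) from a segmentation step
segment_means <- c(-0.8, 0.05, 0.4, 1.5)

# Assumed cut-offs: at or below -0.3 is a loss, above 0.3 a gain,
# above 1.0 an amplification
calls <- cut(segment_means,
             breaks = c(-Inf, -0.3, 0.3, 1.0, Inf),
             labels = c("loss", "normal", "gain", "amplification"))
print(data.frame(segment_means, calls))
```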

The pre-processed data

Analysis (Clustering)
Similarity: The copy numbers of a clone in two samples are in agreement if they are equal. Two clones in two samples are in concordance if the samples agree on which clone has the larger copy number.
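The two similarity notions can be sketched for a pair of samples as below. The copy numbers are hypothetical, and treating a tie in both samples as concordant is an assumption of this sketch.

```r
# Hypothetical called copy numbers for four clones in two samples
sample_a <- c(0, 1, 2, 1)
sample_b <- c(0, 1, 1, 1)

# Agreement: fraction of clones with equal copy number in both samples
agreement <- mean(sample_a == sample_b)

# Concordance: fraction of clone pairs for which both samples give the
# same ordering (a tie in both samples counts as concordant here)
pairs <- combn(length(sample_a), 2)
order_a <- sign(sample_a[pairs[1, ]] - sample_a[pairs[2, ]])
order_b <- sign(sample_b[pairs[1, ]] - sample_b[pairs[2, ]])
concordance <- mean(order_a == order_b)

print(agreement)
print(concordance)
```

Either score could serve as the similarity between two samples when building a clustering.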

Clustering

Real-life dataset

Analysis of multiple CNVs


This example covers association tests involving several CNVs, where data from a CGH array are analysed. Real data sets are used to illustrate how to analyze CNV data. Start by loading the package CNVassoc:
> library(CNVassoc)

and some required libraries


> library(xtable)

Analysis of multiple CNVs


Step 1. Use any aCGH calling procedure (here, CGHcall).
Step 2. Build blocks/regions of consecutive probes with similar signatures (here, CGHregions).
Step 3. Use the signature that occurs most often in a block to perform the association test (here, multiCNVassoc).
Step 4. Correct for multiple testing, taking into account the dependency among signatures (here, getPvalBH).

Analysis of multiple CNVs


To illustrate, the authors apply these steps to the breast cancer data studied by Neve et al. The data consist of CGH arrays of 1 Mb resolution and are available from Bioconductor (http://www.bioconductor.org). The authors chose 50 samples. In this example, the association between estrogen receptor positivity (a dichotomous variable; 0: negative, 1: positive) and CNVs was tested.

Analysis of multiple CNVs


The original data set contained 2621 probes. The data were reduced to 459 blocks after the application of CGHcall and CGHregions. The data are saved in an object called NeveData, which is a list with two components. The data can be loaded as usual:
> data(NeveData)
> intensities <- NeveData$data
> pheno <- NeveData$pheno

Analysis of multiple CNVs


The calling can be performed with the CGHcall package by using the following instructions:
dontrun{}
This process takes about 20 minutes. As an alternative, the authors saved the final object of class cghCall, which can be loaded as
> data(NeveCalled)

Analysis of multiple CNVs


The CGHcall function does not estimate the underlying number of copies for each segment; it assigns the underlying status: loss, normal, or gain. The probabilities of each status are obtained with

> probs <- getProbs(NeveCalled)

This is a data frame that looks like this:
> probs[1:5, 1:7]

Analysis of multiple CNVs


The next step is to determine the regions that are recurrent or common among samples, using the CGHregions function. This can be done by executing:
dontrun{
library(CGHregions)
NeveRegions <- CGHregions(NeveCalled)
}

Analysis of multiple CNVs


This process takes about 3 minutes. We have stored the result in the object NeveRegions, which can be loaded as usual:
> data(NeveRegions)
Now we have to get the posterior probabilities for each block/region.
> probsRegions <- getProbsRegions(probs,
+ NeveRegions, intensities)

Analysis of multiple CNVs


Finally, the association between each region and estrogen receptor positivity can be analyzed with the multiCNVassoc function.
> pvals <- multiCNVassoc(probsRegions, formula =
+ "pheno~CNV", model = "mult", num.copies = 0:2,
+ cnv.tol = 0.01)

Analysis of multiple CNVs


The function getPvalBH produces the FDR-adjusted p-values:
> pvalsBH <- getPvalBH(pvals)
> head(pvalsBH)
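For intuition about FDR adjustment, the standard Benjamini-Hochberg correction is available in base R through p.adjust. The sketch below uses made-up p-values and is not a substitute for getPvalBH, whose internals are not described here.

```r
# Hypothetical raw p-values from five association tests
raw_p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Benjamini-Hochberg adjustment: each sorted p-value is multiplied by
# (number of tests / rank), then monotonicity is enforced
adjusted_p <- p.adjust(raw_p, method = "BH")
print(adjusted_p)

# Regions that would be reported at an assumed FDR threshold of 0.05
significant <- which(adjusted_p <= 0.05)
print(significant)
```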
