Application of DNA Microarrays in Cytogenetics

*Corresponding author: Mohammad Azhar Aziz, Team Leader-Colorectal Cancer Research program (CCRP), Research Scientist/Principal Investigator, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences, King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh 11426 KSA, Mail Code 2216, Saudi Arabia, Tel: ext. 53994; Email:


Introduction
The human genome, comprising approximately 3 billion bases compacted into 23 pairs of chromosomes, is the blueprint of life in an individual. Our ability to study this blueprint is reflected in our understanding of the organization and function of the human genome and of the implications of any changes in it. Scientific evidence is growing to support the notion that this genetic blueprint is characteristic of an individual. As we progress towards making personalized medicine a reality in the clinic, we need to assess our ability to study the genome and its variations. Cytogenetics deals with the study of the complete human genome arranged in the form of 46 chromosomes.
Inherited and acquired changes in the chromosomes are well known to be associated with a number of clinically important diseases and syndromes. Cytogenetics demanded better tools and technologies to look beyond superficial staining and coarse analysis of structural and numerical variations in the chromosomes, both of which depend on our visual capabilities. One of the tools that revolutionized the way cytogenetics could be studied is the microarray. As the name implies, it is an array of molecules arranged at a microscopic scale on a solid surface. Several kinds of arrays are employed to study the human genome.
This review deals with the DNA microarray (also known as the chromosome microarray, CMA), which is used to study structural and numerical variations in the human genome. A DNA microarray can interrogate almost the entire human genome on a single chip. As the physics of the solid surface on which these arrays are arranged progressed, we witnessed a parallel evolution in the ability to design probes that give insight into genomic structure and organization. With the two combined, the field of DNA microarrays has grown at a phenomenal pace. This resulted in our ability to interrogate the human genome with increasingly higher resolution. Many new discoveries about genome-level changes were reported owing to this technological advancement.
The databases of genomics grew faster than would be expected from Moore's law. However, the increase in the quantity and quality of data poses challenges for analyses that could provide meaningful information. As of now, the capability to generate data using microarrays seems to be leveling off, but data analysis is still in its 'lag phase'. Newer technologies in the form of single-base sequencing (next generation or massively parallel sequencing) are pushing these boundaries further.

Evolution of microarrays
Cytogenetics, like other fields of genetics, has been transformed by the ability to interrogate the genome with unprecedented resolution. The deeper we delve into the compact chromosomes, the more we learn about details that were previously inaccessible. The classical way of studying chromosomes and their abnormalities is giving way to molecular details that can provide information at the level of a single base. There are two types of karyotype abnormalities associated with disease: numerical and structural. Numerical variation refers to a change in the number of chromosomes, which is 46 in a normal cell. Structural variation is a change in the structure of the chromosomes. These definitions are becoming more refined and accurate with high-resolution mapping of abnormalities in the human genome.
Conventional cytogenetics involving G-banding, fluorescence in situ hybridization (FISH) and other forms of staining allowed us to discover the macroscopic form of chromosomes and the implications of any associated abnormality. Karyotyping using quinacrine and Giemsa stains was helpful in the initial understanding of the organization of chromosomes. Down's syndrome, Turner's syndrome and Klinefelter's syndrome result from numerical aberrations that could be detected by karyotyping. The discovery of the 'Philadelphia chromosome' in chronic myelogenous leukemia opened up the field of associating structural variation with a disease phenotype [1].

International Journal of Cell Science & Molecular Biology
Molecular cytogenetic techniques, such as FISH, were introduced into the clinical cytogenetics laboratory to increase resolution. FISH is targeted to a particular chromosomal region or regions. Testing may be locus specific as indicated by the patient's phenotype, or it may be used to interrogate multiple loci, such as the subtelomeric regions [2].
To further understand the genome-wide numerical changes associated with a phenotype, the comparative genomic hybridization (CGH) technique evolved. In this technique, genomic DNA is extracted from diseased and normal cells and labeled separately using red and green dyes. When this mixture is allowed to hybridize with genomic DNA arranged in the form of a bacterial artificial chromosome (BAC) array, a color pattern is generated based on the amplification or deletion of specific regions in the disease sample relative to the normal cells. The ability to generate arrays using BACs has increased the resolution at which chromosomes can be mapped. PIK3CA was identified as an oncogene in ovarian cancer using this approach [3]. Comparative genomic hybridization [4,5] applied to metaphase chromosomes gave way to array-CGH (aCGH) with the unraveling of the full human genome sequence. Now, the genome can be investigated at any time and stage without the constraint of culturing cells. This was especially useful for solid tumors, studies of which had been outnumbered by studies of non-solid tumors owing to this constraint. Array-CGH can measure genome-wide copy numbers, especially for the complex karyotypes frequently found in cancer cells [6]. However, the resolution limit of CGH allows detection of genetic aberrations only in the range of 1-20 Mb [7]. CGH is also limited by its inability to detect copy number changes coincident with regions of loss of heterozygosity (LOH). Uniparental disomy cannot be studied using CGH. Low-level mosaicism is likewise not revealed by CGH owing to its working principle.
These constraints were addressed by single nucleotide polymorphism (SNP) arrays, which allow genotyping using thousands of SNPs across the entire genome. SNPs are the most common source of genetic variation in the human genome. They are germline point variations that occur naturally in the course of evolution. To be defined as a SNP, a polymorphism must occur with a minor allele frequency of at least 1% in a given population [8]. The design of these arrays has evolved, with a continuous increase in the number of probes per array. The 10K GeneChip enables genotyping of over 10,000 SNPs in one experiment, with analysis at 105 kb resolution. Results from SNP arrays have been shown to be very accurate and to overlap with earlier results obtained using CGH. The number of SNPs that can be studied in a single experiment has been increasing. Currently, the capability has stretched to about a million SNPs and copy number variations. For the purpose of genotyping, it is now possible to genotype up to 2.5 million SNPs [9].
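The 1% minor allele frequency criterion can be illustrated with a minimal sketch. The input format (one two-letter genotype string per individual), the function names and the restriction to biallelic sites are assumptions made for this illustration; they do not correspond to any particular array vendor's software.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic site.

    `genotypes` is a list of two-letter strings such as "AG", one per
    individual (a hypothetical input format chosen for this sketch).
    """
    # Count every allele carried by every individual.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("this sketch only handles biallelic sites")
    if len(counts) < 2:
        return 0.0  # only one allele observed in this sample
    return min(counts.values()) / sum(counts.values())


def is_snp(genotypes, threshold=0.01):
    """Apply the minor-allele-frequency criterion (at least 1% by default)."""
    return minor_allele_frequency(genotypes) >= threshold


if __name__ == "__main__":
    cohort = ["AA"] * 90 + ["AG"] * 10  # illustrative genotypes, not real data
    print(minor_allele_frequency(cohort), is_snp(cohort))
```

The frequency is computed over alleles rather than individuals, so a cohort of 100 diploid individuals contributes 200 alleles to the denominator.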

Cancer
For a complex disease like cancer, the microarray is a valuable tool in understanding the causal molecular genetics. Both somatic and germline changes have been implicated in causing different types of cancer [10][11][12]. There is an ongoing quest to find causal determinants of cancer in different settings. Finding biomarkers that can predict predisposition to a particular type of cancer has become possible with microarrays. Using CGH, correlations have been made between chromosomal loss/gain and tumor DNA ploidy, genotypes and phenotypes [13]. Mapping SNPs to account for predisposition towards a particular cancer has been successfully attempted in several cancer types. The genetic makeup of an individual may also decide the outcome of a particular therapy. A well-known example is the mutant KRAS gene, which renders patients refractory to anti-epidermal growth factor receptor (EGFR) therapy with drugs like cetuximab. Cellular tumor antigen p53 and other genes have been implicated in the outcome of certain well-established therapies, steering researchers towards personalized medicine. High-resolution technology also calls for proper study designs in order to derive accurate inferences from the data. Analyzing data from matched tumor-normal samples from the same patient is a critical step in elucidating genomic imbalances associated with cancer [11,14]. Such analysis strategies will be crucial to understanding the initiation and progression of cancer. Biomarker discovery has been made possible by the availability of microarrays. Prognostic, diagnostic and therapeutic biomarkers are still being actively pursued. The quest for these biomarkers is making us appreciate the individual nature of each patient, reinforcing the concept of personalized medicine.

Neurological disorders
Pathogenic chromosomal abnormalities as small as 40 to 600 kilobases detected by aCGH have been increasingly reported in patients with mental retardation and/or developmental delay [15]. With genome-wide aCGH analysis at high resolution, very small deletions and amplifications have been detected successfully in idiopathic mental retardation [16]. A number of genetic diseases associated with mental disorders and syndromes have been better understood through high-resolution analysis of their genetic causes. In some cases specific genes were involved, while in others there were contiguous gene duplications and deletions. New syndromes and their causes can now be discovered by applying microarray technology retrospectively to stored tissue samples and biobanks [17].

Infertility and reproductive disorders
In the field of human reproduction, the first application of CMAs was the detection of chromosomal abnormalities in miscarriages [18] and in fetuses with morphological abnormalities [19]. This was one of the first areas to witness revolutionary changes in assessing susceptibility to infertility and the probability of successful in vitro fertilization (IVF) procedures. In some countries, it is regular practice to screen the fetus for chromosomal abnormalities and inform the parents of possible consequences. This gives parents a chance to make an informed decision. The use of microarrays in preimplantation settings, ongoing pregnancies, miscarriages and patients with reproductive deficiencies is proving increasingly useful [20]. Cytogenetic evaluation of products of conception is crucial in determining the cause of pregnancy loss [21].

Types of Microarray Platforms
The evolution of microarrays for cytogenetic applications has produced several variants, shaped by advances in manufacturing technology as well as by the intended applications. There is no consensus on what constitutes a better platform. For some researchers resolution is most important, while for others wide coverage is more critical. Adapting to the needs of researchers, several companies have introduced different platforms, as discussed below.
The underlying principle of the different types of commercially available arrays is the complementarity of bases. Probes belonging to a particular SNP or CNV are allowed to hybridize with single-stranded genomic DNA of the cell(s) under study. The signal intensity, which is captured by an image scanner, is proportional to the level of hybridization. This image is then processed by image analysis software to generate numbers that are used for further analysis by computational biologists. The platforms differ in how the probes are anchored to the solid surface, in the number of probes and in the choice of probes. The design of the microarray is therefore the most crucial aspect for obtaining informative results.
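As a rough illustration of how probe intensities become numbers for downstream analysis, the sketch below converts a sample and a reference intensity into a log2 ratio and assigns a copy number state. The function names and the thresholds of +0.3 and -0.3 are assumptions chosen for this example only; real platforms apply their own normalization and calling algorithms.

```python
import math


def log2_ratio(sample_intensity, reference_intensity):
    """Log2 ratio of sample to reference signal for one probe."""
    return math.log2(sample_intensity / reference_intensity)


def call_copy_number_state(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a log2 ratio as gain, loss or neutral.

    The default thresholds are illustrative assumptions, not vendor values.
    """
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical (sample, reference) intensity pairs for three probes.
    for sample, reference in [(1500, 1000), (500, 1000), (1000, 1000)]:
        ratio = log2_ratio(sample, reference)
        print(f"{ratio:+.2f} {call_copy_number_state(ratio)}")
```

A log scale is used so that a doubling and a halving of signal are symmetric around zero.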
Sample requirements for microarrays differ. Initially, high-quality DNA was required, but now even formalin-fixed paraffin-embedded (FFPE) samples can be used with microarrays designed especially for such samples. This advancement has unlocked a treasure of archived samples that can be revisited to explore previously unexplained genomic changes. Uniparental disomy was discovered in one such attempt [22]. Apart from whole-genome microarrays, whose resolution limits depend on the number of probe sets on the array, almost all companies offer customized arrays. With customized arrays, researchers can design their own probes based on their area of interest and print them in a layout that suits their sample size. For the sake of brevity, only three companies that are well known among researchers and with which the author has personal experience are discussed here. Comparison of platforms, both between versions of one platform and across platforms, has been a recurring exercise [23][24][25][26]. Since microarrays have become a necessary tool in clinical settings, these companies have developed microarrays that can be used for diagnostics; for example, the CytoScan HD array from Affymetrix has a diagnostic platform that is approved by the United States Food and Drug Administration (US FDA).

Affymetrix arrays
For genotyping, Affymetrix arrays began with 10,000 SNPs. Later came the 100K and 500K arrays and the SNP 5.0 array, which had an additional 420,000 non-polymorphic probes to detect copy number variation [27]. The latest genotyping array was SNP 6.0, which carries about a million probes for detecting polymorphisms and almost an equal number for copy number variation. For cytogenetic analyses such as breakpoint estimation and LOH, the CytoScan 750K array was introduced, which has 200K gene-centric SNPs and allows estimation of LOH up to 5 Mb. This platform can detect uniparental disomy as well. This format evolved further into the CytoScan HD array, with 750K SNPs and an ability to detect 25-50 kb copy number changes. It is the latest available array platform for studying molecular cytogenetics at the highest available resolution. Other array types are available that can be customized or used for specific diseases, especially cancer (OncoScan) and inherited congenital disorders. Arrays that can handle formalin-fixed paraffin-embedded samples are also available.

Illumina arrays
Like the Affymetrix platform, the Illumina BeadArray has gradually increased in capacity over the years, from 100K SNPs (Human-1) to the current one million (HumanHap 1M), with intermediate steps of 240K, 317K, 550K and 650K [28]. Illumina has designed arrays belonging to the 'core' and 'omni' families. While the former is useful in genotyping, the latter can be used for detecting copy number variation as well. Specific genotyping of exonic regions or of regions relevant in cancer is also possible with the use of special arrays. Illumina has developed its Infinium technology to deliver a platform for detecting cancer-relevant copy number variations.

Agilent Arrays
Agilent began with array-CGH technology and combined it with SNPs in subsequent generations. Agilent has teamed up with Baylor College of Medicine, USA, to generate microarray platforms dedicated to specific areas of research. These arrays consist of dual-color 60-mer oligonucleotide probes targeting genomic areas relevant for prenatal, postnatal and cancer research. Agilent also has a dedicated platform for studying CNV association in unrelated populations. The capacity of Agilent platforms has grown from 15K probes to about a million using SurePrint technology. Agilent platforms allow multiple samples to be tested on a single slide by using a multi-array format. 1x244K, 2x105K, 4x44K and 8x15K SurePrint HD arrays are available to serve different research needs. Agilent claims that SurePrint G3 CGH microarrays utilize the industry's highest-fidelity long-mer probes, resulting in the most accurate detection of copy number measurements [29].

Simultaneous analysis of many cytogenetic events is possible by using microarrays
The ability of microarrays to provide a snapshot of genomic imbalances from different perspectives is valuable in seeking details about various cytogenetic events. While we learn about the loss or gain of genomic regions, we can also learn about the possibility of uniparental disomy. Using these arrays we can carry out genome-wide association studies (GWAS), which require simultaneous measurement of SNPs associated with the phenotype. GWAS have found many applications in determining the susceptibility of a population to a particular disease and in identifying possible interventions. Developing new drug targets and confirming earlier targets would be a helpful application of GWAS in the direction of personalized medicine [30]. SNP arrays offer great robustness, high resolution and the possibility of detecting a variety of genomic alterations such as submicroscopic deletions, amplifications, loss of heterozygosity and uniparental disomy [8]. Simultaneous analysis of different genomic abnormalities circumvents the problem of experimental variation between settings. With all information coming from the same tissue or cells, captured at the same time and with the same DNA content, confounding factors are less of a concern.
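How copy number and genotype information from the same array can be combined may be sketched as follows. The inputs (a total copy number and the fraction of heterozygous SNP calls within a segment), the function name and the 5% cutoff are hypothetical simplifications for illustration; they are not the calling rules of any vendor software.

```python
def classify_segment(copy_number, het_fraction, het_cutoff=0.05):
    """Classify one genomic segment from two SNP-array readouts.

    copy_number:  estimated total copy number of the segment (2 is normal).
    het_fraction: fraction of SNPs in the segment called heterozygous.
    het_cutoff:   below this fraction the segment is treated as having lost
                  heterozygosity (an illustrative assumption).
    """
    if copy_number < 2:
        return "deletion"
    if copy_number > 2:
        return "amplification"
    # Two copies but almost no heterozygous calls: copy-neutral LOH, which
    # is the pattern expected for a region of uniparental isodisomy.
    if het_fraction < het_cutoff:
        return "copy-neutral LOH"
    return "normal"


if __name__ == "__main__":
    # Hypothetical segments: (copy number, heterozygous fraction).
    for segment in [(1, 0.0), (3, 0.4), (2, 0.01), (2, 0.3)]:
        print(segment, classify_segment(*segment))
```

The third case is the one that intensity-only methods such as CGH cannot see, because the copy number is unchanged and only the genotype calls reveal the event.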

Microarrays replacing conventional cytogenetic techniques?
As with conventional and molecular cytogenetic studies, chromosome abnormalities of unclear clinical significance are sometimes uncovered by microarray analysis. These unclear results require cytogenetic analysis of parents or other relatives to fully interpret the abnormal finding. Through the testing of parents or by FISH confirmation studies, many of these genomic alterations can be clarified. Thus, the situations encountered in microarray analysis are not unlike those experienced early on in the clinical cytogenetics laboratory in the elucidation of chromosomal heteromorphisms, nor unlike the finding of a novel subtle abnormality by conventional G-banding [2]. In a comparative study, both Affymetrix and Illumina microarray platforms exhibited high detection sensitivity and resolution in identifying clinically relevant genomic aberrations in chronic lymphocytic leukemia (CLL), including those that escape routine FISH-based analyses. Copy number alterations (CNAs) present in only 16% of the cells, as determined by FISH, were unambiguously identified by microarrays. When similar interpretation criteria were applied, results obtained from different microarray platforms were comparable. This opens up the possibility of fully replacing the current FISH panel with microarray-based profiling in all CLL patients. Microarray-based genomic profiling allows the detection of putative prognostically relevant abnormalities (i.e., focal TP53 deletions, CNLOH of 17p, size of 13q14 deletions and genomic complexity) that would have remained undetected by routine FISH procedures [26].
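To give a sense of how small the signal from an alteration present in only 16% of cells is, the sketch below computes the expected log2 ratio for a mosaic sample under a simple mixture model. The model (altered cells mixed with otherwise diploid cells, no noise) and the function names are assumptions of this illustration, not the method used in the cited study.

```python
import math


def expected_log2_ratio(fraction, altered_copies, normal_copies=2):
    """Expected log2 ratio when `fraction` of cells carry `altered_copies`.

    Assumes the remaining cells carry `normal_copies` and that signal is
    proportional to the mean copy number across all cells. A homozygous
    deletion in every cell (mean copy number zero) is not handled.
    """
    mean_copies = (1 - fraction) * normal_copies + fraction * altered_copies
    return math.log2(mean_copies / normal_copies)


def mosaic_fraction(log2_ratio, altered_copies, normal_copies=2):
    """Invert the mixture model to estimate the fraction of altered cells."""
    mean_copies = normal_copies * 2 ** log2_ratio
    return (mean_copies - normal_copies) / (altered_copies - normal_copies)


if __name__ == "__main__":
    # A one-copy deletion present in 16% of cells.
    ratio = expected_log2_ratio(0.16, altered_copies=1)
    print(round(ratio, 3))  # approximately -0.12
    print(round(mosaic_fraction(ratio, altered_copies=1), 2))
```

Under this model a one-copy deletion in 16% of cells shifts the log2 ratio by only about -0.12, compared with -1 for the same deletion in every cell, which is why probe density and low noise matter for detecting mosaicism.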
Novel oncogenes are being discovered by integrating information from other microarray data. Glyoxalase I was recently discovered as a novel metabolic oncogene in human gastric cancer [12]. The cause of spontaneous abortions can now be better ascertained by employing microarrays [21]. Microarray testing in the prenatal setting has increased dramatically, and a recommendation was made recently that this analysis replace fluorescence in situ hybridization (FISH) in preimplantation genetic screening (PGS), as it provides a more comprehensive view of the genome [31]. In acute myeloid leukemia (AML) and myelodysplastic syndromes (MDS), trisomy of chromosome 8 was known to be involved, but there was a possibility that cryptic abnormalities were present in cases with an extra chromosome 8 as the seemingly sole aberration. Possible hidden abnormalities could be point mutations and cytogenetically unidentifiable chromosome aberrations resulting in fusion genes, small deletions or amplifications [32,33]. Results of over 25 published studies support the use of arrays in MDS testing. Because few balanced translocations are found in MDS, this disease is particularly amenable to microarray testing, and studies have shown better disease classification, identification of cryptic changes and prognostication in this heterogeneous group of disorders. Novel genomic alterations identified by array testing may lead to better targeted therapies for treating patients with MDS [34].

Better Analysis can change data into information
Microarrays have made it possible to generate a wealth of data from small samples. All 23,000 genes located on 3 billion bases compacted into 46 chromosomes are now accessible. However, the process of data generation and analysis is heavily dependent on computational and statistical tools. A study design without proper controls and annotation may lead to 'more data, less information'. The minimum information about a microarray experiment (MIAME) guidelines were therefore prepared and made mandatory for any researcher wishing to deposit data in the databases [35]. Open-source tools and commercial software have been developed with the complex steps of microarray data generation in mind: hybridization, image capture, normalization and background detection. Most platforms have their own software suited to their designs, but it is mostly used for data extraction and primary analysis. Chromosome Analysis Suite (ChAS) from Affymetrix and GenomeStudio from Illumina are software packages widely used to extract data from their respective platforms.
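Two of the steps named above, background correction and normalization, can be sketched in their simplest form. The specific choices here (subtracting a local background with a floor of 1.0, and median-centering the log ratios) are common textbook simplifications used as assumptions for this example; they are not the algorithms implemented in ChAS or GenomeStudio.

```python
from statistics import median


def background_correct(raw, background, floor=1.0):
    """Subtract local background from each raw intensity.

    Values are floored at `floor` so that a later log transform stays
    defined (an assumption of this sketch).
    """
    return [max(r - b, floor) for r, b in zip(raw, background)]


def median_center(log_ratios):
    """Shift log ratios so that their median is zero.

    This assumes that most of the genome is copy-number neutral, so the
    median probe should sit at a log ratio of zero.
    """
    centre = median(log_ratios)
    return [value - centre for value in log_ratios]


if __name__ == "__main__":
    print(background_correct([100, 50], [20, 60]))
    print(median_center([0.1, 0.2, 0.3]))
```

Median-centering removes a global offset between sample and reference, for example one caused by unequal DNA input or labeling efficiency, without altering the differences between probes.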

Challenges for Microarrays
Current microarray technologies cannot identify balanced rearrangements and some ploidies [36]. Chromosomal rearrangements like inversions and translocations cannot be studied using microarray technology. Discovery of novel SNPs or point mutations is not possible because it requires single-base resolution. Since the technology depends on the ability of genomic DNA to hybridize and generate light signals, it is difficult to achieve a perfect, error-free oligonucleotide design on the microarray. Background hybridization adds to the noise and may cause loss of important information. The low sensitivity of microarrays is a concern that could possibly be addressed by next generation sequencing.
Next generation sequencing provides resolution at the level of a single base. The ability to read thousands of genomic sequences in a massively parallel manner is opening up possibilities to pin genomic abnormalities down to one base. This has been made possible by the capability to sequence an entire human genome in a span of days and at a cost of a few thousand dollars. Earlier, this took years and millions of dollars.
Ethical concerns about the use of microarrays and next generation sequencing are also arising in the clinical setting. Findings from research samples cannot be used for diagnostic purposes, yet they could be life-saving for some patients. Microarrays alone are not yet fully optimized to carry out diagnostic tests in all settings. With a few exceptions (e.g., the CytoScan HD array), microarray results should be complemented with other techniques.
With data analysis tools still far from perfect and error-free, it is risky to rely on these high-throughput techniques for making clinically crucial decisions. Careful examination of microarray data would help clinicians advise patients while taking into consideration other well-proven and time-tested parameters already used in the clinic.