Spatial Point Pattern Analysis and Its Use in Geographical Epidemiology

Spatial epidemiology is a subfield of health geography focused on the spatial distribution of health outcomes. Point pattern analysis is the evaluation of the pattern, or distribution, of a set of points on a surface. It can refer to the actual spatial or temporal locations of these points, and may also include data from point sources. It is one of the most fundamental concepts in geography and spatial analysis. In this study, geographical epidemiology, data for spatial analysis, disease clustering and disease mapping are introduced. The paper also covers applications of spatial point pattern analysis in environmental epidemiology and geographical epidemiology.


Introduction
The analysis of spatial point patterns came to prominence in geography during the late 1950s and early 1960s, when a spatial analysis paradigm began to take firm hold within the discipline. Researchers borrowed freely from the plant ecology literature, adopting techniques that had been used there in the description of spatial patterns and applying them in other contexts: for example, in studies of settlement distributions [1,2], the spatial arrangement of stores within urban areas [3] and the distribution of drumlins in glaciated areas [4]. The methods that were used could be classified into two broad types [5]. The first were distance-based techniques, using information on the spacing of the points to characterize pattern (typically, mean distance to the nearest neighbouring point). Other techniques were area-based, relying on various characteristics of the frequency distribution of the observed numbers of points in regularly defined sub-regions of the study area ('quadrats'). For many geographers, point pattern analysis will conjure up images of nearest-neighbour analysis applied inappropriately to data sets of doubtful relevance [6].
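The two families of techniques mentioned above can be illustrated with a minimal sketch. It computes the mean nearest-neighbour distance (distance-based) and a grid of quadrat counts (area-based). The function names, the toy coordinates and the 2-by-2 grid are assumptions made for illustration only.

```python
import math
from collections import Counter

def mean_nearest_neighbour_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)

def quadrat_counts(points, width, height, nx, ny):
    """Count points in each cell of an nx-by-ny grid laid over the study area."""
    counts = Counter()
    for x, y in points:
        # Points lying exactly on the upper edge are assigned to the last cell.
        col = min(int(x / width * nx), nx - 1)
        row = min(int(y / height * ny), ny - 1)
        counts[(row, col)] += 1
    return [[counts[(r, c)] for c in range(nx)] for r in range(ny)]

# Hypothetical point pattern on a 4-by-4 study area.
points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (3.5, 3.5)]
print(mean_nearest_neighbour_distance(points))
print(quadrat_counts(points, 4.0, 4.0, 2, 2))
```

A small mean nearest-neighbour distance relative to the size of the study area, or quadrat counts concentrated in a few cells, both point informally towards clustering. Formal tests built on either summary are not shown here.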
In this paper, geographical epidemiology is introduced in section 2. Section 3 explains spatial point pattern analysis and its use in geographical epidemiology. Conclusions are given in section 4. The last part of the paper includes references.

Geographical Epidemiology
Geographical epidemiology can be defined as the description of spatial patterns of disease morbidity and mortality, part of descriptive epidemiological studies, with the aim of formulating hypotheses about the aetiology of diseases [7]. One can identify different branches in geographical epidemiology, which reflect the different needs of public health specialists and epidemiologists in the assessment of ill-health aetiology. Predominant among the methods of geographical epidemiology are the following: disease mapping, disease clustering and ecological analysis [8]. There is usually a close relationship between these branches. However, as almost all geographical epidemiological studies are descriptive in nature and depend on scale, one should bear in mind that a more comprehensive picture of a spatial problem can be achieved when the results from geographical aggregate-level data are combined with those at the individual level [9]. Multilevel modelling, hierarchical regression and contextual analysis are names for statistical methods that allow this combination. Multilevel modelling is a powerful, relatively new technique that can be used to determine how much of the ecological effect can be explained by variations in the distribution of individual-level risk factors, and recently attempts have been made to integrate this kind of analysis into geographical epidemiology [10,11]. There are also new developments incorporating changes over time along with spatial variation. Such models are able to provide insights into the aetiology of diseases that are otherwise unavailable [12,13].

Data for spatial analysis
There are usually two important types of spatial data: point and area data. Each item of health data (including population, environmental exposure, mortality and morbidity) may be connected with a point, that is, a precise spatial position such as a home or a street address, or with an area, which could be defined as a spatial region by postcode, ward, local authority, province or country [14]. A public health specialist may also come across spatial data in the form of a continuous surface, such as the statistical surfaces of pollution interpolated from fixed-point characteristics [12]. As data for spatial analysis come from different sources, and have often been collected without taking into account the interests of geographical epidemiologists, it is essential to ensure that precise and complete point and/or area health data are used in spatial epidemiology [15,16]. In the developed world, most mortality and cancer incidence data are of good quality. Nevertheless, other health data such as rates of suicide, congenital anomalies and hospital admissions may be subject to partial ascertainment, so that rates are underestimated. In addition, the diagnosis, collection, coding and reporting of a given health outcome may differ between geographical regions and over time [17]. The danger of ignoring data-quality issues is that, because of missing cases or inaccurate baseline population data, one might arrive at a misleadingly high or low estimate of risk [18,19]. Confidentiality may also be an important issue. Breaching the confidentiality of spatial data may cause concern, especially when it discloses areas with high rates of morbidity/mortality or high levels of pollutants [14].
Open Access Journal of Biostatistics & Biometrics

Disease clustering
Searching for disease clustering is one of the branches of geographical epidemiology and involves an assessment of local or global accumulation of disease [20]. There are different types of clustering, including general and specific. General clustering involves the analysis of the overall clustering tendency of the disease incidence in a study region, and is paralleled by the assessment of global spatial autocorrelation, in which the exact location of clusters is not investigated. The second type of investigation uses specific disease-clustering methods, which are designed to examine the exact location of the clusters [21]. As we will discuss the importance of, and the ways of detecting, global and local clustering in areal data in the section below on spatial autocorrelation, here we will focus only on the detection of clusters in point data. Methods for the detection of clusters in point format data are more numerous than those for areal format data, and are usually divided into the following three groups: global, localised and focused (ie, assessing clustering around a putative source) [22]. There are a number of tests available that help to assess different kinds of clusters in point format data. However, we will discuss only three of them very briefly, and refer readers to Bailey and Gatrell, and Gatrell et al, for a complete discussion. The Cuzick and Edwards method determines global clustering by examining the k nearest neighbours of each case. The geographical analysis machine and the spatial scan statistic assess localised clustering by drawing circles of different sizes over the area of study and comparing the risk of disease inside and outside each circle [23,24]. The spatial scan statistic has an advantage over the geographical analysis machine in that it takes into account the problem of multiple testing [25].
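The idea of examining the k nearest neighbours of each case can be sketched as follows, for point data that contain both cases and controls. The statistic T_k counts, over all cases, how many of the k nearest neighbours are themselves cases. The function names, the toy coordinates and the use of random relabelling to obtain a p-value are assumptions of this sketch and are not taken from the cited papers.

```python
import math
import random

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself.
    Ties in distance are broken by index order."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi))
    return others[:k]

def cuzick_edwards_tk(points, is_case, k):
    """T_k: for every case, count how many of its k nearest neighbours are cases."""
    return sum(
        sum(1 for j in k_nearest(points, i, k) if is_case[j])
        for i in range(len(points))
        if is_case[i]
    )

def permutation_p_value(points, is_case, k, n_sim=999, seed=0):
    """Monte Carlo p-value: shuffle case/control labels over the fixed locations."""
    rng = random.Random(seed)
    observed = cuzick_edwards_tk(points, is_case, k)
    labels = list(is_case)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        if cuzick_edwards_tk(points, labels, k) >= observed:
            at_least += 1
    return (at_least + 1) / (n_sim + 1)

# Hypothetical data: three cases close together, three controls elsewhere.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
is_case = [True, True, True, False, False, False]
print(cuzick_edwards_tk(points, is_case, k=2))
print(permutation_p_value(points, is_case, k=2))
```

A large T_k relative to its distribution under random relabelling suggests that cases tend to have other cases as neighbours, which is a global rather than a located form of clustering.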

Disease mapping
Data visualisation is the first step in disclosing the complex structure in data [26]. Data visualisation may not only create interest and attract the attention of the viewer but also provide a way of discovering the unexpected [27]. Although plots of data and other graphical displays are among the fundamental tools for analysts in general, for a spatial analyst, visualising spatial data usually means using a map [6]. Disease mapping is one of the branches of geographical epidemiology fulfilling the need to create accurate maps of disease morbidity and mortality [28]. For instance, dot or dot-density maps are used to display point data, whereas choropleth maps are used for areal data, and contour or isopleth maps are used for continuous surface data [12]. The use of mapping in the medical context has developed so rapidly during recent decades that the presentation of maps is now established as a basic tool in the analysis of public health data [8,29].
There are two main classes of disease maps for areal data: maps of standardised rates and maps of the statistical significance of the difference between the disease risk in each area and the overall risk averaged over the whole map [7]. There are pros and cons for each of these classes. For instance, mapping rates in small areas tends to create a misleading picture (see the section on Smoothing), while mapping statistical significance, particularly in areas with large populations, produces small p values that indicate statistical significance but do not disclose scientifically interesting differences [30]. The mapping of standardised rates is generally preferred to the mapping of p values, with the influence of sampling variation controlled by a smoothing technique (see the section on Smoothing) [31].
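As a rough illustration of why standardised rates in small areas can mislead, the sketch below computes a standardised morbidity/mortality ratio by indirect standardisation, that is, observed cases divided by the cases expected from reference rates. The choice of indirect standardisation, the age strata, the reference rates and the populations are all assumptions made for this example.

```python
def expected_cases(population_by_stratum, reference_rates):
    """Cases expected if the area experienced the reference stratum-specific rates."""
    return sum(
        population_by_stratum[s] * reference_rates[s]
        for s in population_by_stratum
    )

def standardised_ratio(observed, expected):
    """Observed cases divided by expected cases (1.0 means 'as expected')."""
    return observed / expected

# Hypothetical reference rates per person per year, by age stratum.
reference_rates = {"0-64": 0.001, "65+": 0.01}

# A large area and a small area with the same age structure.
large_area = {"0-64": 8000, "65+": 2000}
small_area = {"0-64": 80, "65+": 20}

large_ratio = standardised_ratio(35, expected_cases(large_area, reference_rates))
small_ratio = standardised_ratio(1, expected_cases(small_area, reference_rates))
print(large_ratio, small_ratio)
```

In the small area a single observed case is enough to push the ratio well above that of the large area, even though it rests on far less information. This is the instability that smoothing is meant to control.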

Applications in environmental epidemiology
Epidemiology is concerned with the study of the distribution and determinants of health and diseases, morbidity, injuries, disability, and mortality in populations [32]. Epidemiologic studies are applied to the control of health problems in populations. Epidemiology is one of the core disciplines used to examine the associations between environmental exposures and health outcomes. Environmental epidemiology refers to the study of diseases and health conditions (occurring in the population) that are linked to environmental factors [33,34]. The exposures, which most of the time are outside the control of the individual, may usually be considered involuntary and stem from ambient and occupational environments [35]. According to this conception of environmental epidemiology, standard epidemiologic methods are used to study the association between environmental factors (exposures) and health outcomes. Examples of topics studied include air and water pollution, the occupational environment with its possible use of physical and chemical agents, and the psychosocial environment (Rothman K.J.). Some of the techniques used in environmental epidemiology are as follows:
a. Spatial clustering
b. Space-time interaction
c. Modelling the raised incidence of disease

Spatial Point Pattern Analysis and Geographical Epidemiology
The behavior of a general spatial distribution process can be characterized in terms of its first-order and second-order properties. First-order properties describe the spatially varying intensity of a point pattern, in which intensity is defined as the expected (mean) value of the distribution at locations throughout the region of interest [36]. Second-order properties describe the covariance (or autocorrelation) structure of the point pattern and can be identified by analyzing the distribution of distances between those sample points [2,37,38].
Ripley's K function is regarded as a suitable tool to characterize the second-order properties of a point pattern. Multiplied by the intensity of the pattern, $K(d)$ gives the expected number of further points within a circle of radius $d$ centred on a randomly chosen point. Its estimator is formally defined as [39]:
$$\hat{K}(d) = \frac{A}{n^{2}} \sum_{i=1}^{n} \sum_{j \neq i} \varpi_{ij}^{-1} \, I_d(u_{ij})$$
where $n$ is the number of events in the plot, $A$ is the area of the plot, $I_d$ is a counter variable (equal to 1 if $u_{ij} \le d$ and 0 otherwise), $u_{ij}$ is the distance between events $i$ and $j$, and $\varpi_{ij}$ is a weighting factor to correct for edge effects. In our study, toroidal edge correction was used to avoid edge effects by treating the rectangular study plot encompassing the study region as a torus; that is, the part of a sample that falls outside the rectangle is made to appear at the corresponding opposite border [42]. Points at opposite sides of the plot are then close to each other and the boundary no longer exists [43] (Figure 1).
The question of whether the geographical incidence of disease shows any tendency towards clustering in geographical space has a long and rich history (Figure 2). Do cases of disease tend to occur in proximity to other cases? The problem has become more urgent in recent years in the light of concerns raised about possible links between disease incidence and potential sources of environmental contamination, such as nuclear installations. Evidence of clustering might also lend support to other theories of disease incidence, such as a viral aetiology. For example, exposure to a common, persistent viral infection, either during gestation or as a young child with an immune system that had been protected at a very early age, might provide clues to explaining possible leukaemia clustering [44-49].
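Returning to the K function with toroidal edge correction described earlier in this section, the following is a minimal sketch of how such an estimate might be computed. It assumes a rectangular plot with its lower-left corner at the origin, and it assumes that under toroidal wrapping every pair of events receives a weight of 1. The function name and the toy coordinates are hypothetical.

```python
import math

def ripley_k_toroidal(points, width, height, d):
    """Estimate K(d) on a width-by-height rectangle treated as a torus.
    Only meaningful for d up to half the shorter side of the rectangle."""
    n = len(points)
    area = width * height
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            # Wrap each coordinate difference around the opposite border.
            dx = abs(xi - xj)
            dy = abs(yi - yj)
            dx = min(dx, width - dx)
            dy = min(dy, height - dy)
            if math.hypot(dx, dy) <= d:
                count += 1  # ordered pair (i, j) falls within distance d
    return area * count / n ** 2

# Hypothetical pattern on the unit square: the first two points are far apart
# in ordinary distance (0.8) but close once the plot is wrapped (0.2).
points = [(0.1, 0.5), (0.9, 0.5), (0.5, 0.5)]
d = 0.25
print(ripley_k_toroidal(points, 1.0, 1.0, d))
print(math.pi * d ** 2)  # value of K(d) under complete spatial randomness
```

An estimate above the value for complete spatial randomness at a given distance suggests clustering at that scale, and an estimate below it suggests regularity. In practice the comparison is usually made against simulation envelopes rather than a single number, which this sketch does not attempt.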

Conclusion
In this paper, geographical epidemiology, data for spatial analysis, disease clustering and disease mapping are explained. Applications in environmental epidemiology and in geographical epidemiology with spatial point pattern analysis, including spatial clustering, are also introduced. Examples are drawn from the studies of Gatrell et al., and I would like to thank them for their enlightening information. In future work, spatial point process analysis can be examined for geographical epidemiology.