Integration of artificial neural network and geographic information system applications in simulating groundwater quality

Background: Because experiments on water quality are time-consuming and expensive, models are often employed as a supplement to simulate water quality. Artificial neural network (ANN) is an efficient tool in hydrologic studies, yet it cannot present its results in the form of maps and geo-referenced data. Methods: In this study, ANN was applied to simulate groundwater quality, and a geographic information system (GIS) was used as a pre-processing and post-processing tool in simulating water quality in the Mazandaran Plain (southern coast of the Caspian Sea, Iran). Groundwater quality was simulated using a multilayer perceptron (MLP) network. The groundwater quality index (GWQI) was determined and the factors affecting groundwater quality were estimated. After modeling in ANN, the model was validated. The study area was then divided into 1×1 km pixels (raster format) in the GIS environment. The model input layers were combined, and a raster layer containing the model input values and geographic coordinates was generated. Using the geographic coordinates, the pixel values (model inputs) were entered into the ANN (Neuro Solutions software). Groundwater quality was simulated with the validated optimum network at sites without water quality measurements. In the next step, the ANN simulation results were entered into the GIS environment and a groundwater quality map was generated from them. Results: The results revealed that integrating the capabilities of ANN and GIS gives high accuracy and efficiency in the simulation of groundwater quality. Conclusion: This method can be employed over extensive areas to simulate hydrologic parameters.


Introduction
Groundwater is one of the most important water resources on earth, and studies of its quality are vital for the protection and planning of water resources, particularly in arid and semi-arid regions such as Iran. Groundwater presently accounts for more than 90% of Iran's total drinking water consumption. It is less vulnerable to bacterial pollution and evaporation than surface water and is therefore the more important of the two resources. One of the major limiting factors in water exploitation is unsuitable water quality. Human activities such as agriculture, manufacturing and urban development affect the quality of groundwater. Unfortunately, groundwater quality has been endangered in recent decades by inappropriate exploitation and increased human activity. Thus, it is necessary to study water quality in order to manage water resources properly. Since experiments on water quality are time-consuming and expensive, models are often employed as a supplement to simulate water quality. Artificial neural networks (ANNs) began to be applied in water resources modeling in the early 1990s. Their use has increased significantly over the last decade, resulting in a number of studies on their applications (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12). ANNs have provided an appealing solution for simulating water resources systems (13,14). Multilayer perceptron (MLP) feed-forward networks have been widely applied to simulate hydrological parameters (15). Numerous studies have applied neural networks to groundwater forecasting (16)(17)(18)(19)(20)(21)(22)(23)(24)(25)(26)(27)(28)(29). An index is needed in water quality studies in order to evaluate the quality of water. In past decades, various water quality indices have been developed for specific purposes (30). 
Since the required water quality parameters were available, we applied the groundwater quality index (GWQI), first introduced by Ribeiro et al (31), to evaluate groundwater quality. Many studies have been conducted to measure surface water and groundwater quality indices. The general water quality index (WQI) was developed by Brown et al (32) and later improved by the Scottish Development Department (33), following the suggestion by Horton (34) that various water quality data could be aggregated into a single overall index (35)(36)(37)(38)(39)(40)(41). Gholami et al (42) presented a model for estimating groundwater salinity on the southern coast of the Caspian Sea using statistical and geographic information system (GIS) techniques. They estimated local changes in water salinity (electrical conductivity, EC) with acceptable accuracy. Lateef (43) investigated the groundwater quality of Tikrit and Samarra in Iraq using the WQI. He categorized the groundwater quality of 10 wells in these cities using the WQI, found that the groundwater quality was unsatisfactory in the south of Samarra city, and attributed this deterioration to the region's north-to-south groundwater drainage system. Singh et al (44) applied a GIS-based multi-criteria analysis by assigning weights to different water quality parameters. They grouped water quality into six classes ranging from very good to unfit for drinking and found that water quality varied from moderate to good in most parts of the study area, except in some areas where groundwater was classified as 'poor to unfit'. An evaluation of land use and land cover change from 1989 to 2006 using Landsat and LISS III satellite data indicated that the 'poor to unfit' groundwater quality resulted from rapid urbanization and industrialization (44). Krishna et al (45) applied a GIS-ANN hybrid system to predict arsenic concentration in groundwater. 
Their results revealed that GIS-ANN integration has a high capability in water quality modeling. This study was conducted to simulate groundwater quality and to provide a methodology for combining ANN and GIS capabilities in modeling hydrologic parameters on the southern coast of the Caspian Sea.

Study area
The study area is located at 50º 30′ to 53º 50′ E longitude and 35º 55′ to 36º 45′ N latitude in Mazandaran province (Figure 1), on the southern Caspian coast in northern Iran. The study plain covers approximately 10 000 km². The southern coasts of the Caspian Sea mainly consist of plains made of Quaternary sediments. However, there are diverse geologic formations and changes in elevation and slope in the central regions of the Alborz Mountains.
Determination of groundwater quality index
In this study, eight water quality parameters were selected: the cations and anions K⁺, Na⁺, Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻, pH, and total dissolved solids (TDS). These parameters were used to estimate the water quality index. Our choice of index type was limited by the lack of measurements of microbial pollution in the region. Out of about 200 drinking water wells in the study area, 85 wells were selected because they had adequate water quality secondary data for 2008 to 2013 (46). The GWQI of these 85 wells was estimated from these data (6 years with 4 samples per year). The locations of the wells in the study area are shown in Figure 1. Before the GWQI can be computed, a standard criterion must be chosen to determine the maximum values of the parameters; the Iranian national standards for potable water for the eight parameters are presented in Table 1. Eq. (1) was applied to estimate the GWQI based on the standard values given in Table 1, where GWQI denotes the groundwater quality index, w_i is the relative weight of parameter i, C_i is its concentration and Cs_i is its national standard concentration for potable water. Each parameter has a different weight according to its contribution to water quality. The weighted parameter values are then aggregated using some form of sum or mean (e.g., arithmetic, harmonic or geometric), frequently with individual weighting factors (34,41,47,48). The relative importance (weight) of each parameter in the final GWQI is defined by the extent of its contribution to water quality, and the final index is obtained by aggregating all the normalized parameters. 
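Eq. (1) combines w_i, C_i and Cs_i. As a minimal sketch, the code below assumes the common weighted-sum form GWQI = Σ w_i·C_i/Cs_i built from those symbols. This form, the parameter names and every numeric value in the example are illustrative assumptions, not the authors' exact Eq. (1) or the contents of Tables 1 and 2.

```python
def gwqi(concentrations, standards, weights):
    """Groundwater quality index as a weighted sum of C_i / Cs_i.

    ASSUMED form of Eq. (1): GWQI = sum_i w_i * C_i / Cs_i.
    All three arguments are dicts keyed by parameter name.
    """
    missing = [p for p in weights if p not in concentrations or p not in standards]
    if missing:
        raise KeyError(f"no concentration or standard for: {missing}")
    # Each parameter contributes its weight times its concentration-to-standard ratio.
    return sum(w * concentrations[p] / standards[p] for p, w in weights.items())


# Hypothetical two-parameter example (placeholder numbers, not Tables 1-2).
weights = {"Na": 0.5, "Cl": 0.5}
standards = {"Na": 200.0, "Cl": 250.0}  # placeholder limits, mg/L
sample = {"Na": 20.0, "Cl": 25.0}       # placeholder measurements, mg/L
print(gwqi(sample, standards, weights))
```

With the paper's real weights and standards, the same function would be applied to each well's averaged concentrations.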
The weight of each parameter in the final GWQI is given in Table 2. GWQI values are categorized into three classes: high (GWQI > 0.15), low (GWQI < 0.04) and suitable (0.04 < GWQI < 0.15) (49). GIS was used for data collection and processing. Different digital/base maps were prepared in the GIS environment, including a digital elevation model (DEM), transmissivity of aquifer formations, water table depth (50), residential and industrial areas (from topographic maps of the region) and GWQI values (from the water quality secondary data (46)).
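The three-class rule above can be written as a small function. The class names and thresholds follow the text; assigning values exactly equal to 0.04 or 0.15 to the 'suitable' class is an assumption, because the stated inequalities leave the boundaries open.

```python
def classify_gwqi(value, low=0.04, high=0.15):
    """Return 'low', 'suitable' or 'high' for a GWQI value (thresholds from Ref. 49).

    Values exactly on a threshold are assigned to 'suitable' (an assumption).
    """
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "suitable"


for v in (0.02, 0.04, 0.10, 0.15, 0.30):
    print(v, classify_gwqi(v))
```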
Groundwater quality simulation using ANN
An ANN consists of an input layer, one or more hidden layers and an output layer. In this study, an MLP was applied to simulate groundwater quality; a typical MLP structure is illustrated in Figure 2. An MLP is obtained by adding one or more hidden layers to a single-layer perceptron, and this topology can solve complex problems (51). Determining the optimum network structure and the number of neurons is an important part of network design. The MLP is the most extensively applied neural network architecture in the literature for classification and regression problems (52)(53)(54)(55). A three-layer (input, hidden and output) feed-forward neural network with Levenberg-Marquardt (LM) back-propagation learning was employed to simulate groundwater quality (GWQI). The feed-forward network was the first and simplest type of ANN devised: information moves in only one direction, forward, from the input nodes through the hidden nodes to the output nodes. In the first stage of simulation, all data were normalized and divided into three subsets: training data (65% of all data), test data (23%) and cross-validation data (12%). Hyperbolic tangent and sigmoid transfer functions were tested, and the trial-and-error results showed that the hyperbolic tangent was the better transfer function. The objective of network training is to find a network that reproduces the relationship between the model inputs and outputs. Since there are no definite rules for designing a neural network structure, we evaluated different network structures. The data came from the 85 drinking water wells selected for their available water quality secondary data. 
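The normalization and three-way split described above can be sketched as follows. The text does not say which normalization was used, so min-max scaling to [-1, 1] (a range that suits the hyperbolic tangent) and a seeded random shuffle before the 65/23/12 split are assumptions; the function names are hypothetical.

```python
import random


def minmax_scale(column, lo=-1.0, hi=1.0):
    """Linearly rescale a list of numbers to [lo, hi] (assumed normalization)."""
    cmin, cmax = min(column), max(column)
    span = cmax - cmin
    if span == 0:  # constant column: map everything to the midpoint
        return [(lo + hi) / 2.0] * len(column)
    return [lo + (hi - lo) * (v - cmin) / span for v in column]


def split_indices(n, fractions=(0.65, 0.23, 0.12), seed=0):
    """Shuffle record indices and split them into train / test / cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_test = round(fractions[1] * n)
    # Whatever remains after rounding goes to the cross-validation subset.
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]


train, test, cv = split_indices(85)
print(len(train), len(test), len(cv))
```

For 85 wells, rounding gives 55 training, 20 test and 10 cross-validation records.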
The GWQI was used as the output variable and groundwater quality factors as input variables. The candidate input factors were water table depth, distance from contaminant centers, site elevation (site location), population, number of households and aquifer formation transmissivity. The input values were derived from the DEM, aquifer transmissivity and water table depth maps (50), and residential and industrial areas were delineated from topographic maps and satellite images of the region. GWQI values were estimated from the water quality secondary data (46). The input and output data were then imported into the ANN environment (Neuro Solutions software). The most commonly applied method for determining the optimum structure, learning rate and momentum parameter is trial and error (15). We varied the number of hidden neurons from 1 to 10 and found by trial and error that an MLP with a hyperbolic tangent transfer function trained with the LM technique was the best network structure for groundwater quality simulation. Network training is one of the main stages of ANN modeling; the weight coefficients of the hidden and output layers are determined in this stage (51,56,57). To develop an ANN model, both significant and independent inputs must be determined (58,59). Determining the ANN model structure generally involves defining the number of layers, the number of nodes in each layer and how they are connected (10).
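The trial-and-error search over 1 to 10 hidden neurons with LM training can be illustrated with a self-contained sketch. It is not the Neuro Solutions implementation: it assumes one hidden layer of tanh units with a linear output unit, a forward-difference Jacobian, Levenberg's damping term λI, and selection of the hidden-layer size by cross-validation MSE. All function names are hypothetical.

```python
import math
import random

def mlp_predict(p, x, n_hidden):
    """One hidden layer of tanh units feeding a linear output unit.

    p is a flat list laid out as [W1 (n_hidden*d), b1 (n_hidden), w2 (n_hidden), b2].
    """
    d = len(x)
    out = p[-1]  # output bias b2
    for j in range(n_hidden):
        z = p[n_hidden * d + j]  # hidden bias b1[j]
        for k in range(d):
            z += p[j * d + k] * x[k]
        out += p[n_hidden * (d + 1) + j] * math.tanh(z)
    return out

def residuals(p, X, y, h):
    return [mlp_predict(p, xi, h) - yi for xi, yi in zip(X, y)]

def jacobian(p, X, y, h, eps=1e-6):
    """Forward-difference Jacobian of the residuals with respect to the parameters."""
    r0 = residuals(p, X, y, h)
    J = [[0.0] * len(p) for _ in r0]
    for k in range(len(p)):
        q = list(p)
        q[k] += eps
        for i, rk in enumerate(residuals(q, X, y, h)):
            J[i][k] = (rk - r0[i]) / eps
    return J, r0

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def train_lm(X, y, h, iters=50, lam=1e-2, seed=0):
    """Levenberg-Marquardt training; returns (params, final MSE, MSE history)."""
    rng = random.Random(seed)
    n = h * (len(X[0]) + 2) + 1
    p = [rng.uniform(-0.5, 0.5) for _ in range(n)]

    def sse(q):
        return sum(r * r for r in residuals(q, X, y, h))

    cost = sse(p)
    history = [cost / len(y)]
    for _ in range(iters):
        J, r = jacobian(p, X, y, h)
        JtJ = [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]
        g = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(n)]
        while lam < 1e10:
            # Solve (J^T J + lam I) step = -J^T r for the damped update.
            A = [[JtJ[a][b] + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
            step = solve(A, [-ga for ga in g])
            trial = [pi + si for pi, si in zip(p, step)]
            trial_cost = sse(trial)
            if trial_cost < cost:  # accept: shift towards Gauss-Newton
                p, cost, lam = trial, trial_cost, lam / 10
                history.append(cost / len(y))
                break
            lam *= 10  # reject: shift towards gradient descent
        else:
            break  # no improving step found; stop training
    return p, cost / len(y), history

def select_hidden(X_tr, y_tr, X_cv, y_cv, max_hidden=10):
    """Trial and error over 1..max_hidden neurons, scored by cross-validation MSE."""
    best = None
    for h in range(1, max_hidden + 1):
        p, _, _ = train_lm(X_tr, y_tr, h)
        cv_mse = sum(r * r for r in residuals(p, X_cv, y_cv, h)) / len(y_cv)
        if best is None or cv_mse < best[1]:
            best = (h, cv_mse, p)
    return best
```

`select_hidden` would be called with the training and cross-validation subsets described above, keeping the test subset for the final evaluation. Damping with λI interpolates between Gauss-Newton steps (small λ) and short gradient-descent steps (large λ), which is the defining idea of LM.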
Here, the GWQI was used to evaluate water quality, and its value was estimated for each of the 85 selected drinking water wells. Appropriate input parameters were selected by trial and error and sensitivity analysis. Eight input patterns (Eqs. 2-9) were investigated and their efficiencies were evaluated and compared. In these patterns, GWQI is the groundwater quality index, T is the transmissivity of aquifer formations (m²/day), G_wTable is the groundwater table depth (m), L_c is the distance of a well from contaminant and residential centers (m), P and H are the population and the number of households within an area of 1 km², and E is the site elevation. A sensitivity analysis of the model inputs was carried out to determine the key parameters for groundwater quality; such an analysis studies the effect of each input on the output and shows whether any insignificant inputs can be ignored. The results revealed that the most significant factors, and therefore the best inputs for simulating groundwater quality, were transmissivity of aquifer formations, groundwater depth and distance from residential and industrial areas. An optimum network is defined by three main components: transfer function, network architecture and learning rule (60). The network size is usually determined by trial-and-error experimentation: the procedure begins with one neuron in one hidden layer and increases the size until the test performance is satisfactory (61). ANN efficiency was evaluated using the mean squared error (MSE) and the coefficient of determination (R²), defined as (Eqs. 10 and 11):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Q_i - \hat{Q}_i\right)^2 \qquad (10)$$
$$R^2 = \frac{\left[\sum_{i=1}^{n}\left(Q_i - \bar{Q}\right)\left(\hat{Q}_i - \tilde{Q}\right)\right]^2}{\sum_{i=1}^{n}\left(Q_i - \bar{Q}\right)^2 \sum_{i=1}^{n}\left(\hat{Q}_i - \tilde{Q}\right)^2} \qquad (11)$$
where $Q_i$ is the observed value, $\hat{Q}_i$ is the simulated value, $\bar{Q}$ is the mean of the observed data, $\tilde{Q}$ is the mean of the simulated data and $n$ is the number of data. 
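The MSE and R² used to evaluate the network translate directly into code. The sketch assumes R² is the squared Pearson correlation between observed and simulated values, the reading suggested by the text defining both an observed mean and a simulated mean; the function names are hypothetical.

```python
def mse(obs, sim):
    """Mean squared error between observed and simulated values."""
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov * cov / (var_obs * var_sim)


observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 4.0, 6.0, 8.0]
print(mse(observed, simulated), r_squared(observed, simulated))
```

The example shows why both measures are reported: the simulated values are perfectly correlated with the observations (R² = 1) yet biased, which only the MSE reveals.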
The optimized network was then validated by comparing the actual and estimated values in the test stage, again using the MSE and R².
Integration of ANN and GIS in simulating groundwater quality
ANN is an efficient tool in hydrologic studies. In this study, ANN and GIS were integrated to simulate groundwater quality: ANN was used for the simulation, and GIS as a pre-processing and post-processing system for the data. GIS was thus applied as an efficient tool to provide base maps and to estimate the quantitative model parameters. Different digital/base maps were prepared in the GIS environment, including DEM, transmissivity of aquifer formations, water table depth (50), number of households and population, residential and industrial areas (from topographic maps of the region) and GWQI values (from water quality secondary data (46)). Eighty-five drinking water wells were selected to simulate GWQI. The estimated values of these parameters were then entered into the ANN environment (Neuro Solutions software) for modeling. The data were first separated into three parts: training, cross-validation and testing data. The network structure (transfer function, inputs and number of neurons) was optimized by trial and error, and the optimized model was then evaluated on the testing data. After validation, the optimum model was applied to simulate GWQI at sites without water quality experiments. In this step, GIS played a pre-processing role in the modeling process. The objective of the present study was to use ANN to simulate groundwater quality in the form of geo-referenced maps for sites without water quality data. The results revealed that the optimized network requires three inputs: transmissivity of aquifer formation, water table depth and distance from contaminant centers. Raster layers of these three input factors were prepared in the GIS pre-processing stage and combined by overlay analysis with a pixel size of 1×1 km. 
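The overlay step, in which co-registered 1×1 km raster layers are stacked into one record per pixel carrying its coordinates and input values, can be sketched in plain Python. The layer names, the north-up grid with upper-left corner (x0, y0), the use of pixel-centre coordinates and the skipping of NoData cells (None) are assumptions; in practice this is done with GIS overlay tools.

```python
def pixel_table(layers, x0, y0, cell=1000.0):
    """Stack equally shaped raster layers into one record per pixel.

    layers: dict mapping a layer name to a 2D list (rows run north to south).
    x0, y0: map coordinates of the grid's upper-left corner.
    Pixels where any layer is None (NoData) are skipped.
    """
    names = list(layers)
    nrows, ncols = len(layers[names[0]]), len(layers[names[0]][0])
    for name in names:
        grid = layers[name]
        if len(grid) != nrows or any(len(row) != ncols for row in grid):
            raise ValueError(f"layer {name!r} has a different shape")
    records = []
    for i in range(nrows):
        for j in range(ncols):
            values = {n: layers[n][i][j] for n in names}
            if any(v is None for v in values.values()):
                continue
            # Pixel-centre coordinates: x grows eastwards, y decreases southwards.
            records.append({"x": x0 + (j + 0.5) * cell,
                            "y": y0 - (i + 0.5) * cell, **values})
    return records


# Hypothetical 2x2 example with the three input layers used in the study.
layers = {"T": [[120.0, 95.0], [300.0, None]],
          "depth": [[2.5, 4.0], [1.2, 3.3]],
          "dist": [[800.0, 1500.0], [400.0, 2200.0]]}
for rec in pixel_table(layers, x0=0.0, y0=2000.0):
    print(rec)
```

Each record is one row of the table that would be exported to the ANN software.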
The surface of the study plain was thus divided into more than 10 000 geo-referenced pixels (1×1 km), each holding the values of the model inputs (transmissivity of aquifer formation, water table depth and distance from contaminant centers). The site coordinates of every pixel were inserted automatically in the GIS environment. The pixel data (network inputs and coordinates) were exported from GIS and imported into Neuro Solutions software, where GWQI was simulated with the validated optimum network for every pixel of the study plain. The simulated GWQI values were then imported back from ANN into the GIS environment together with their geographic location (X, Y). In this phase, GIS had a post-processing role: using the simulated GWQI values and their geographic coordinates, we generated a groundwater quality map of the study plain. The actual GWQI values of the 85 drinking water wells were overlaid on the generated GWQI raster layer, and accuracy was evaluated by comparing the simulated and actual GWQI in GIS. The evaluation showed that the results were accurate and acceptable. Finally, the GWQI raster layer was classified and presented as a groundwater quality map. The study stages are shown in Figure 3. In this study, groundwater quality was simulated over an extensive area with high accuracy using ANN and GIS capabilities, and the results were presented as a geo-referenced map.
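The post-processing check, overlaying the observed well values on the simulated GWQI raster, amounts to looking up the pixel that contains each well. A minimal sketch, assuming a north-up 1×1 km grid whose upper-left corner is (x0, y0) and hypothetical function and field names; the resulting (observed, simulated) pairs could then be scored with MSE and R² or by comparing GWQI classes.

```python
import math


def rasterize(records, x0, y0, cell=1000.0, field="gwqi"):
    """Index simulated pixel records by (row, col) on a north-up grid."""
    grid = {}
    for rec in records:
        col = math.floor((rec["x"] - x0) / cell)
        row = math.floor((y0 - rec["y"]) / cell)
        grid[(row, col)] = rec[field]
    return grid


def compare_at_wells(wells, grid, x0, y0, cell=1000.0):
    """wells: list of (x, y, observed_gwqi). Returns (observed, simulated) pairs.

    Wells falling outside the simulated grid are skipped.
    """
    pairs = []
    for x, y, observed in wells:
        key = (math.floor((y0 - y) / cell), math.floor((x - x0) / cell))
        if key in grid:
            pairs.append((observed, grid[key]))
    return pairs


# Hypothetical example: two simulated pixels and two wells, one off the grid.
sim = [{"x": 500.0, "y": 1500.0, "gwqi": 0.10}, {"x": 1500.0, "y": 1500.0, "gwqi": 0.20}]
grid = rasterize(sim, x0=0.0, y0=2000.0)
print(compare_at_wells([(1200.0, 1900.0, 0.18), (5000.0, 5000.0, 0.30)], grid, 0.0, 2000.0))
```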

Results
GWQI values were estimated for the studied drinking water wells based on the 6-year sampling data with four samples per year. We also estimated the factors significant for water quality, including aquifer formation transmissivity, water table depth, site elevation, distance from contaminant centers and population. A number of the estimated significant factors and GWQI values are presented in Table 3. These data were imported into the ANN environment to simulate groundwater quality. In the training stage, on the basis of changes in the input data pattern and the sensitivity analysis, three factors were identified as the best inputs for simulating groundwater quality: transmissivity of aquifer formation, groundwater depth and distance from contaminant centers (42). Digital maps of these three factors, generated in GIS, are shown in Figures 4, 5 and 6. Figure 7 shows the results of the ANN simulation of groundwater quality in the training stage (R² = 0.95). Tables 4 and 5 present the results of the network evaluation in the training stage.
The optimum network structure for groundwater quality simulation was an MLP with three inputs, a hyperbolic tangent transfer function, the LM (Levenberg-Marquardt) training technique and one hidden neuron. The hyperbolic tangent transfer function and the LM training technique are among the best choices for modeling hydrologic parameters and have been preferred in several studies worldwide (62). After the network was optimized, the testing stage (efficiency evaluation) was performed. Figure 8 compares the estimated and actual GWQI values in the validation (testing) stage (R² = 0.73). According to these results, ANN can simulate GWQI with acceptable accuracy, a capability supported by previous studies of similar modeling (63). The objective of this study was to estimate groundwater quality at sites without water quality data and to present the results in a form usable by all users. In GIS, raster layers of the three input factors were combined by overlay analysis with a pixel size of 1×1 km. The pixel data (network inputs and coordinates) were exported from GIS and imported into Neuro Solutions software, where GWQI was simulated with the validated optimum network for the whole study plain. The simulated GWQI values were then imported back into GIS together with their geographic location (X, Y), and a groundwater quality map of the study plain was generated from them. The groundwater quality maps are shown in Figures 9 and 10, with the actual GWQI values overlaid on the generated GWQI map; overlaying the actual values on the map allows the accuracy of the results to be evaluated. The results revealed that the estimated GWQI has suitable accuracy and can be used, in particular, to classify groundwater quality. 
Comparison between water quality zones and estimated GWQI values showed the efficiency and accuracy of integrating ANN and GIS in modeling (45,64). GWQI values are classified into three categories: high (GWQI > 0.15), low (GWQI < 0.04) and suitable (0.04 < GWQI < 0.15) (49). As Figures 9 and 10 show, the presented methodology simulates groundwater quality accurately enough for groundwater classification, and the remaining errors do not compromise the accuracy of water quality classification over a plain or a watershed. The actual GWQI values of the 85 drinking water wells were overlaid on the generated GWQI raster layer in GIS, and accuracy was evaluated by comparing the simulated and actual GWQI; the results were accurate and acceptable. Finally, the GWQI raster layer was classified and presented as the groundwater quality map.

Discussion
Eight model structures were developed to evaluate the possible impacts of including or excluding transmissivity of aquifer formation, water table depth, distance from contaminant centers, elevation, number of households and population as inputs. The results showed that three factors, namely transmissivity of aquifer formation, water table depth and distance from contaminant centers, are the most important factors and the best inputs for groundwater quality modeling (42). ANN is an efficient modeling tool, but it cannot present its results in the form of maps and geo-referenced data. We therefore applied ANN to simulate groundwater quality and used GIS as a pre-processing and post-processing tool for monitoring and mapping the results. GIS also increased modeling accuracy and speed. The best network structure for groundwater quality simulation was an MLP with a hyperbolic tangent transfer function and the LM training technique; previous studies have shown that ANNs trained with LM are efficient for simulating hydrological parameters (65,66). In the training stage, the MSE and R² were 0.01 and 0.9, respectively. In the cross-validation stage, the mean MSE was 0.016, and in the testing stage the MSE and R² were 0.0005 and 0.73, respectively. The results also showed that the network trained with the LM algorithm gave the best performance. ANN simulation of hydrologic parameters has yielded good results in the past, and in most cases there has been a high correlation between simulated and observed hydrographs (67)(68)(69). For example, Litta et al (70) developed an ANN model with the LM algorithm to derive thunderstorm forecasts 1 to 24 h ahead at Kolkata. The basis of the present study is the automatic link between ANN and GIS in modeling and mapping the results. 
The results should also support overlay analysis with other digital data. GIS can provide a large volume of input data in a short time, ANN can simulate hydrologic parameters for sites without water quality data in a short time, and the integration of ANN and GIS can present the results as digital maps. The groundwater quality maps indicated that, in terms of Iranian potable water quality standards, groundwater quality is unsuitable over most of the study plain. Plans to conserve water resources and optimize their use are therefore necessary. Fortunately, quantitative data on the network inputs (transmissivity, groundwater table depth and distance from contaminant centers such as settlements and manufacturing sites) are available for the Mazandaran plain, so the current methodology or similar methods can be applied across the plain.

Conclusion
The results clearly revealed that ANNs are capable of modeling groundwater quality, which substantiates the general improvement achieved by using neural networks in several other hydrological fields (12). The sensitivity analysis of the network input factors showed that the most important factors for water quality are water table depth, type of aquifer formation and distance from contaminant centers. Pollution is higher in coastal areas than elsewhere as a result of the high water table, the presence of alluvial sediments, population density and upstream watershed flow (71). Water quality management planning should therefore focus on the coastal area. Moreover, this methodology (ANN and GIS integration) can be applied to model other quality indices. A smaller pixel size would give a more accurate input for distance from contaminant centers; however, the number of input pixels is limited by the capacity of the ANN software. In addition, we did not have access to precise data for two main inputs, namely groundwater table depth and transmissivity of aquifer formation. ANN can be an efficient tool for simulating hydrologic parameters given suitable inputs and an optimum network structure, and GIS is an efficient system for data processing and mapping. Coupling ANN with GIS capabilities could provide practitioners with easily interpretable water quality maps for managing these resources. The presented methodology, together with other groundwater models, can therefore be used for prospective planning of sustainable groundwater development and management of groundwater resources.

Acknowledgments
We thank ABFAR (Mazandaran Rural Water and Sewer Company) for providing the groundwater quality secondary data and for helping us with data pre-processing.

Ethical issues
There were no ethical issues in writing this article.
Figure 9. Map of groundwater quality (GWQI) obtained from ANN and GIS capabilities (west of the study plain). The accuracy of the results was evaluated by comparing the simulated GWQI values with the actual GWQI values.
Figure 10. Map of groundwater quality (GWQI) obtained from ANN and GIS capabilities (east of the study plain). The accuracy of the results was evaluated by comparing the simulated GWQI values with the actual GWQI values.