Predicting arsenic and heavy metals contamination in groundwater resources of Ghahavand plain based on an artificial neural network optimized by imperialist competitive algorithm

Background: The effects of trace elements on human health and the environment give importance to the analysis of heavy metal contamination in environmental samples and, more particularly, human food sources. Therefore, the current study aimed to predict arsenic and heavy metal (Cu, Pb, and Zn) contamination in the groundwater resources of Ghahavand plain based on an artificial neural network (ANN) optimized by the imperialist competitive algorithm (ICA). Methods: This study presents a new method for predicting heavy metal concentrations in the groundwater resources of Ghahavand plain based on ANN and ICA. The developed approaches were trained using 75% of the data to obtain the optimum coefficients and then tested using 25% of the data. Two statistical indicators, the coefficient of determination (R²) and the root-mean-square error (RMSE), were employed to evaluate model performance. A comparison of the performances of the ANN-ICA and ANN models revealed the superiority of the new model. Results of this study demonstrate that heavy metal concentrations can be reliably predicted by applying the new approach. Results: Results from different statistical indicators during the training and validation periods indicate that the best performance can be obtained with the ANN-ICA model. Conclusion: This method can be employed effectively to predict heavy metal concentrations in the groundwater resources of Ghahavand plain.


Introduction
Environmental pollutants, especially toxic heavy metals, can be discharged into natural cycles (e.g., soil, water, and air) through urbanization, industrialization, agriculture, mining, and exploitation of natural resources (1). Many elements such as Cu, Fe, Mn, Ni, and Zn are essential for human life and play major roles in health in low concentrations, but they can be toxic at high levels. Others, including As, Cd, Cr, Hg, Pb, and Sn, have no known essential function in living organisms and are toxic even at low concentrations. Heavy metals can cause serious adverse health effects in humans; thus, they are known as the most dangerous pollutants (2-6). Surface and groundwater resources are important for human life and for economic development (1). More than 50% of the world's population depends on groundwater resources for drinking, agriculture, and general survival (7,8). Therefore, the contamination of groundwater by toxic heavy metals is a serious global environmental problem.
Arsenic is a widely distributed metalloid that is also a carcinogen for humans, even at low levels of exposure (9). The combustion of fossil fuels, smelting of non-ferrous metals, and use of arsenical pesticides in agriculture are the main sources of this element in the environment (10). Some foods, such as vegetables, fruits, nuts, red meat, and shellfish, are known as sources of copper. Although Cu can play a critical role in various biochemical processes (11), a constant diet of this element results in the dissolution of the barrier that keeps undesirable toxins from entering the brain. Critical doses of this element can cause adverse health effects such as fatigue, hair loss, inflammation of brain tissues, panic attacks, premenstrual syndrome, anorexia, allergies, liver and kidney dysfunction, and also cancer (12). Poor reproductive capacity, blood pressure, impaired organ function, tumors, and hepatic abnormalities are known symptoms of chronic exposure to lead (13). Lead can also affect brain activity by interfering with synaptogenesis and neurotransmitter release. It has been proven that the consumption of Pb-contaminated food can cause adverse effects on human health, such as a reduction in IQ, learning disabilities, kidney failure, hyperactivity, slow growth, impaired hearing, and antisocial behaviors (9,14). Zinc is known as an essential element in biological systems because of its role in catalyzing reactions and the reversible changes in the oxidation state of metal ions. It should be noted that exposure to high levels of Zn can cause disruptions in some physiological activities, particularly breathing (15,16).
In recent years, different artificial neural network (ANN) approaches have been successfully applied in a large number of studies on forecasting water resources problems because of their ability to model nonlinear systems (17-20). Nor et al (21) developed ANN-based models for estimating nitrate and sulfate in water sources. Their results showed the good accuracy of ANN models. Mandal et al (22) presented an ANN model based on a backpropagation (BP) training algorithm (ANN-BP) for predicting removal efficiency. Their results showed that the ANN-BP can predict adsorption efficiency with acceptable accuracy. Keskin et al (23) investigated the applicability of ANN models for predicting water pollution sources in several areas of Turkey. They found that the ANN model can yield acceptable results. Hossain and Piantanakulchai (24) proposed a model based on GIS and the classification tree method to predict groundwater arsenic contamination risk. They demonstrated that the proposed model can effectively forecast the degree of As accumulation in groundwater with acceptable accuracy. Alizamir and Sobhanardakani (19) applied ANNs to forecast As, Pb, and Zn concentrations in the groundwater resources of Asadabad plain. Their results showed the feasibility of ANNs in modeling the concentrations of heavy metals. Alizamir et al (20) applied two ANNs (MLP and RBF) to estimate heavy metal concentrations in the Asadabad plain. As demonstrated in their study, the MLP model offered better results than the single RBF model.
In the current study, an ANN with two training algorithms was proposed for the prediction of heavy metal concentrations in the groundwater resources of Ghahavand plain. Artificial intelligence models are modeling tools that can identify statistical relationships between the input and output parameters of a complex system. This study introduces a model for predicting heavy metal concentrations using an ANN trained with the imperialist competitive algorithm (ICA) and the Levenberg-Marquardt (LM) algorithm.

Study area
Ghahavand plain, with an area of 2360 km², is located in Hamadan province, western Iran. Drinking water for residents of this plain is supplied by 1788 wells, 104 springs, and 96 aqueducts (25,26).

Sample collection
Based on Cochran's formula, a total of 60 groundwater samples were collected from 20 different wells under exploitation in the study area, including agricultural and residential regions. The locations of groundwater sampling stations are presented in Figure 1.

Sample preparation and analysis
In the current study, groundwater samples were taken according to the method introduced by Sobhanardakani et al (1). The samples were then filtered through Whatman No. 42 filter paper, preserved with 65% nitric acid (Merck, Germany), and kept at 4°C for further analysis (1,27). Finally, the concentrations of arsenic and heavy metals (Cu, Pb, and Zn) in the groundwater samples were determined using an inductively coupled plasma-optical emission spectrometer (710-ES, Varian, Australia) at the wavelengths of 188.980 nm for As, 324.754 nm for Cu, 220.353 nm for Pb, and 206.200 nm for Zn.

Artificial neural network
ANNs can describe nonlinear and complex relationships using a part of the input and output training patterns from the dataset. These approaches establish a nonlinear relationship between inputs and outputs (28). An ANN is characterized by its architecture, which describes the connection pattern between nodes, the method used to determine the connection weights, and the activation function (29). Because of their ability to learn a system's dynamics from data, ANNs are able to solve large-scale complex problems (30,31). The most commonly used neural network architecture is the feed-forward neural network (FFNN). A three-layered FFNN consists of a number of neurons in each layer and the connections that link them (30).
Training a network is an optimization process that adjusts the connection weights to minimize the error; it continues until the values of the output layer are as close as possible to the actual outputs (28). In this study, the LM training algorithm was utilized to tune the weights (29,32). Figure 2 shows the feed-forward network for this study, which has one hidden layer with several nodes between the input and output layers.
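The forward pass of such a network, and the squared error that training tries to minimize, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' MATLAB code: the sigmoid hidden nodes and linear output node follow the activation functions reported in the Results, while the function names, the weight initialization range, and the single-output layout are assumptions made for the example.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid for the hidden nodes (numerically stable form)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights for a one-hidden-layer, single-output network."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b_out": rng.uniform(-0.5, 0.5),
    }


def forward(net, x):
    """Propagate one input vector through the network."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(net["w_hidden"], net["b_hidden"])
    ]
    # Linear output node: weighted sum of hidden activations plus bias.
    return sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"]


def sum_squared_error(net, inputs, targets):
    """Training objective: total squared difference between output and target."""
    return sum((forward(net, x) - t) ** 2 for x, t in zip(inputs, targets))
```

A training algorithm such as LM or ICA would then search for the weights in `net` that make `sum_squared_error` as small as possible on the training set.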

Imperialist competitive algorithm
The ICA was proposed by Atashpaz-Gargari and Lucas (33) as a novel optimization algorithm inspired by imperialistic competition. Like other evolutionary algorithms, it starts with an initial population. In this approach, the individuals of the population are called countries and are of two types: colonies and imperialists. Together, they form empires (33). In competition with each other, powerful empires obtain new colonies, and weak ones collapse. At the end of the algorithm, only the most powerful imperialist remains, and all the countries are colonies of the strongest empire. These colonies have the same position and cost as the imperialist. The ICA has been applied to several benchmark problems, where it proved reliable in optimizing different cost functions. A flowchart of the ICA is presented in Figure 3.
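The steps described above (initial countries, empire formation, movement of colonies toward their imperialist, and competition between empires) can be sketched as a small Python routine. It is a simplified illustration under stated assumptions, not the implementation used in this study: the parameter names and default values (`beta`, `revolution_rate`, `xi`), the box bounds, and the roulette-style selection are choices made for the example. In an ANN-ICA setting, a country would typically encode the network weights and the cost would be the training error, but the source does not give these details.

```python
import random


def ica_minimize(cost, dim, bounds, n_countries=40, n_imperialists=5,
                 n_decades=200, beta=2.0, revolution_rate=0.1, xi=0.1, seed=0):
    """Minimize cost(x) over a box with a basic imperialist competitive algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_country():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def clip(x):
        return [min(hi, max(lo, v)) for v in x]

    # Initial population: the cheapest countries become imperialists.
    countries = sorted((random_country() for _ in range(n_countries)), key=cost)
    empires = [{"imp": c, "imp_cost": cost(c), "colonies": []}
               for c in countries[:n_imperialists]]
    # Hand out the remaining countries; stronger (cheaper) imperialists
    # are more likely to receive each colony.
    worst = max(e["imp_cost"] for e in empires)
    power = [worst - e["imp_cost"] + 1e-12 for e in empires]
    for c in countries[n_imperialists:]:
        rng.choices(empires, weights=power)[0]["colonies"].append(c)

    for _ in range(n_decades):
        for e in empires:
            for i, col in enumerate(e["colonies"]):
                # Assimilation: move the colony toward its imperialist.
                col = clip([c + beta * rng.random() * (p - c)
                            for c, p in zip(col, e["imp"])])
                # Revolution: occasionally replace the colony at random.
                if rng.random() < revolution_rate:
                    col = random_country()
                e["colonies"][i] = col
                # Position exchange: a better colony becomes the imperialist.
                col_cost = cost(col)
                if col_cost < e["imp_cost"]:
                    e["colonies"][i] = e["imp"]
                    e["imp"], e["imp_cost"] = col, col_cost

        if len(empires) > 1:
            # Total empire cost = imperialist cost + xi * mean colony cost.
            def total_cost(e):
                cols = e["colonies"]
                mean_col = sum(map(cost, cols)) / len(cols) if cols else 0.0
                return e["imp_cost"] + xi * mean_col

            totals = [total_cost(e) for e in empires]
            weakest = max(range(len(empires)), key=totals.__getitem__)
            others = [k for k in range(len(empires)) if k != weakest]
            weights = [max(totals) - totals[k] + 1e-12 for k in others]
            winner = empires[rng.choices(others, weights=weights)[0]]
            loser = empires[weakest]
            if loser["colonies"]:
                # Competition: the weakest empire loses its worst colony.
                worst_col = max(loser["colonies"], key=cost)
                loser["colonies"].remove(worst_col)
                winner["colonies"].append(worst_col)
            else:
                # Collapse: an empire without colonies is absorbed.
                winner["colonies"].append(loser["imp"])
                empires.pop(weakest)

    best = min(empires, key=lambda e: e["imp_cost"])
    return best["imp"], best["imp_cost"]
```

For example, `ica_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of a simple quadratic bowl; replacing the lambda with a network's training error would turn the same loop into a weight optimizer.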

Model performance evaluation
The following statistical indicators were selected to evaluate the performance of the ANN models: 1) root-mean-square error (RMSE) (Eq. 1), 2) Pearson correlation coefficient (r) (Eq. 2), and 3) coefficient of determination (R²) (Eq. 3), where n is the total number of data points, and P_i and O_i are the heavy metal concentrations predicted by the ANN methods and the measured values, respectively.
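For reference, the standard definitions of these indicators in terms of n, P_i, and O_i are given below. The bars denote sample means, a notation introduced here. Taking R² as the square of r is an assumption; R² is sometimes defined instead as one minus the ratio of the residual to the total sum of squares, and the text does not say which form Eq. 3 uses.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \tag{1}\\
r &= \frac{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)\left(P_i - \bar{P}\right)}
          {\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}\,
           \sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}} \tag{2}\\
R^2 &= r^2 \tag{3}
\end{align}
```

A lower RMSE and values of r and R² closer to 1 indicate closer agreement between predicted and measured concentrations.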

Results
Descriptive statistics of element contents (µg/L) in groundwater samples collected from Ghahavand plain are presented in Table 1. The average levels of As, Cu, Pb, and Zn in groundwater samples were 8.26 ± 1.09 µg/L, 9.25 ± 0.06 µg/L, 2.57 ± 0.30 µg/L, and 10.41 ± 4.68 µg/L, respectively. The results of statistical analysis (one-sample t test) showed that the mean concentrations of the analyzed elements were lower than the maximum permissible limits (100.0, 200.0, 100.0, and 2000.0 µg/L for As, Cu, Pb, and Zn, respectively) established by the World Health Organization (WHO) (25).
Neural networks have been successfully applied in different fields for environmental problems. In the present study, the same training and testing data sets were employed for the development of the ANN-ICA and ANN-LM models. The collected data were divided into training and testing parts (80% and 20%, respectively). Because there is no established criterion in ANN modeling for determining how many hidden nodes are needed, selecting the optimum number of hidden nodes is a difficult task. Here, a three-layer MLP with one hidden layer was used, and a trial-and-error procedure was applied to select the number of hidden nodes (32,34). Sigmoid and linear functions were employed as the activation functions of the hidden and output nodes, respectively. For all heavy metal concentrations, the ANN models were first trained using the data in the training sets to obtain the optimized set of learning coefficients and then tested. RMSE, determination coefficients (R²), and Pearson correlation coefficients (r) were used as evaluation criteria.
For the ANN simulations, program codes were written in MATLAB software.
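The data split and the trial-and-error choice of hidden nodes can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB code. The random shuffle, the function names, and the `train_and_score` callable are hypothetical; the training fraction is kept as a parameter because the text reports an 80%/20% split here and 75%/25% in the abstract.

```python
import random


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def select_hidden_nodes(candidates, train_and_score):
    """Trial and error: try each hidden-layer size and keep the lowest error.

    train_and_score is a hypothetical callable that trains a network with
    the given number of hidden nodes and returns its error (e.g. RMSE).
    """
    scores = {h: train_and_score(h) for h in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

With 60 samples and the default fraction, `split_samples` returns 48 training and 12 testing samples. `select_hidden_nodes(range(2, 11), train_and_score)` would then compare networks with 2 to 10 hidden nodes, a range chosen only for illustration.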

Discussion
To demonstrate the merits of the proposed ANN-ICA approach, the prediction accuracy of the model was compared to the prediction accuracy of the ANN-LM method, which was used as the benchmark. Table 2 presents a numerical comparison of the ANN-ICA and ANN-LM models.

Conclusion
In the current study, the long-term changes in trends of heavy metal levels (As, Cu, Pb, and Zn) in groundwater resources of Ghahavand plain were estimated using an ANN with ICA and LM. Observations collected in the Ghahavand plain were used for model training and testing. Four predictive models for As, Pb, Cu, and Zn were created using the ANN-ICA approach. The ANN-ICA and ANN-LM methods were compared to assess prediction accuracy. The results, measured in terms of RMSE, r, and R², revealed that the ANN-ICA model was superior to the ANN-LM model. Heavy metal concentrations can be estimated from easily available data using the ANN-ICA technique. The proposed ANN-ICA approach can be implemented for forecasting heavy metal concentrations in groundwater resources in environmental modeling studies.

Acknowledgments
[...] Azad University for providing the facilities necessary to conduct and complete this study.

Figure 2. The neural network model for estimating heavy metal concentrations in groundwater resources of Ghahavand plain.

Figure 4. Observed and simulated As concentrations by ANN-ICA model during training and testing phases.

Figure 5. Observed and simulated As concentrations by ANN-LM model during training and testing phases.

Figure 6. Observed and simulated Cu concentrations by ANN-ICA model during training and testing phases.

Figure 7. Observed and simulated Cu concentrations by ANN-LM model during training and testing phases.

Figure 8. Observed and simulated Pb concentrations by ANN-ICA model during training and testing phases.

Figure 9. Observed and simulated Pb concentrations by ANN-LM model during training and testing phases.

Figure 10. Observed and simulated Zn concentrations by ANN-ICA model during training and testing phases.

Figure 11. Observed and simulated Zn concentrations by ANN-LM model during training and testing phases.


Table 1. Descriptive statistics of metal contents (µg/L) in groundwater resources of Ghahavand plain

Table 2. Comparative performance of ANNs for As, Cu, Pb, and Zn concentrations