Application of imputation methods for missing values of PM10 and O3 data: Interpolation, moving average and K-nearest neighbor methods

Saeipourdizaj, Parisa; Sarbakhsh, Parvin; Gholampour, Akbar

doi:10.34172/EHEM.2021.25


Environmental Health Engineering And Management Journal


Volume 8, Issue 3 (Summer 2021)

Environ. Health Eng. Manag. 2021, 8(3): 215-226


Corresponding author: Parvin Sarbakhsh, Health and Environment Research Center, Tabriz University of Medical Sciences; Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran. Email: p.sarbakhsh@gmail.com

Abstract:

Background: In air quality studies, missing data are very common, arising from causes such as equipment failure or human error. The way such missing data are handled can affect the results of the analysis. The main aim of this study was to review the types of missingness mechanisms and imputation methods, to apply some of these methods to impute missing PM10 and O3 values in Tabriz, and to compare their efficiency.
Methods: The imputation methods used were mean imputation, the expectation-maximization (EM) algorithm, regression, classification and regression trees (CART), predictive mean matching (PMM), interpolation, moving average, and K-nearest neighbor (KNN). PMM was also evaluated with spatial and temporal dependencies included in the model. Missing values were simulated at random at rates of 10%, 20%, and 30%. The efficiency of the methods was compared using the coefficient of determination (R2), mean absolute error (MAE), and root mean square error (RMSE).
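The evaluation design above (hide a fixed share of observed values, impute them, then score the imputations against the hidden truth) can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes values are removed completely at random (MCAR), represents gaps as `None`, and uses R2 = 1 - SS_res/SS_tot; the study may instead have used the squared correlation between observed and imputed values. The helper names (`mask_at_random`, `evaluate`) and the synthetic series are hypothetical.

```python
import math
import random


def mask_at_random(values, rate, seed=0):
    """Hide a fraction `rate` of `values` completely at random (assumed MCAR).

    Returns the masked copy (gaps are None) and the sorted masked indices.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(values))
    idx = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in idx:
        masked[i] = None
    return masked, idx


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return float("nan") if ss_tot == 0 else 1.0 - ss_res / ss_tot


def evaluate(true_values, imputed, idx):
    """Score imputations only at the positions that were masked."""
    obs = [true_values[i] for i in idx]
    pred = [imputed[i] for i in idx]
    return {"R2": r2(obs, pred), "MAE": mae(obs, pred), "RMSE": rmse(obs, pred)}


if __name__ == "__main__":
    # Hypothetical hourly PM10-like series (synthetic, not the study's data).
    series = [50 + 10 * math.sin(i / 3) for i in range(48)]
    for rate in (0.10, 0.20, 0.30):
        masked, idx = mask_at_random(series, rate, seed=1)
        observed = [v for v in masked if v is not None]
        fill = sum(observed) / len(observed)  # mean imputation as a baseline
        imputed = [fill if v is None else v for v in masked]
        print(f"{rate:.0%}", evaluate(series, imputed, idx))
```

Scoring only the masked positions keeps the metrics focused on imputation error; including the untouched observations would inflate R2 and shrink MAE and RMSE.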
Results: Across all indicators, interpolation performed best, followed by moving average and then KNN. PMM performed poorly both with and without spatio-temporal information.
Conclusion: Because pollution data always depend on preceding and subsequent observations, methods whose computations rely on values before and after each gap performed better than the others. These methods are therefore recommended for imputing pollutant data.
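The conclusion's point, that methods drawing on observations before and after a gap suit pollutant series, is easiest to see in code. The sketch below gives simple versions of the three best-performing approaches named in the Results. It is illustrative only: the function names, the window size `k`, the edge handling, the KNN distance rule, and the assumption that KNN works across a table of time points by stations (or co-measured variables) are choices made here, not details reported in the abstract.

```python
import bisect
import math


def linear_interpolate(series):
    """Fill None gaps on a straight line between the nearest observed values
    before and after each gap; leading/trailing gaps copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)  # index of first observation after i
        if 0 < pos < len(known):
            a, b = known[pos - 1], known[pos]
            out[i] = series[a] + (i - a) * (series[b] - series[a]) / (b - a)
        else:
            out[i] = series[known[0]] if pos == 0 else series[known[-1]]
    return out


def moving_average_impute(series, k=2):
    """Fill each gap with the mean of observed values within k steps on either
    side, widening the window until it contains at least one observation."""
    if all(v is None for v in series):
        raise ValueError("series has no observed values")
    n = len(series)
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        width = k
        while True:
            lo, hi = max(0, i - width), min(n, i + width + 1)
            window = [series[j] for j in range(lo, hi) if series[j] is not None]
            if window:
                out[i] = sum(window) / len(window)
                break
            width += 1
    return out


def knn_impute(rows, k=3):
    """Fill missing cells in a table of time points (rows) by stations or
    variables (columns). Each gap takes the mean of the same column in the k
    nearest rows that observe it; distance is Euclidean over the columns both
    rows observe, rescaled for skipped columns. Cells with no usable
    neighbour stay None."""
    n_cols = len(rows[0])
    out = [list(r) for r in rows]
    for t, row in enumerate(rows):
        for j in range(n_cols):
            if row[j] is not None:
                continue
            candidates = []
            for s, other in enumerate(rows):
                if s == t or other[j] is None:
                    continue
                shared = [c for c in range(n_cols)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                candidates.append((math.sqrt(sq * n_cols / len(shared)), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                out[t][j] = sum(val for _, val in nearest) / len(nearest)
    return out


if __name__ == "__main__":
    pm10 = [40.0, None, 44.0, None, None, 50.0]  # hypothetical hourly values
    print(linear_interpolate(pm10))  # [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]
    print(moving_average_impute(pm10, k=1))  # [40.0, 42.0, 44.0, 44.0, 50.0, 50.0]
    stations = [[1.0, 10.0], [2.0, 20.0], [None, 21.0], [8.0, 80.0]]
    print(knn_impute(stations, k=1))  # [[1.0, 10.0], [2.0, 20.0], [2.0, 21.0], [8.0, 80.0]]
```

Interpolation and the moving average rely only on a series' own neighbours in time, so they suit data where consecutive readings are strongly related, as the Conclusion argues for pollutants. KNN instead borrows values from the most similar time points in the table, so its accuracy depends on how informative the other columns are about the missing one.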

Keywords: Air pollution, Algorithms, Environmental pollutants, Spatio-temporal analysis, Humans
eprint link: http://eprints.kmu.ac.ir/id/eprint/38224


Type of Study: Original Article | Subject: General
Received: 2021/09/19 | Accepted: 2021/08/1 | Published: 2021/09/26


DOR: 20.1001.1.24233765.2021.8.3.4.6

Ethics code: IR.TBZMED.REC.1398.352


Saeipourdizaj P, Sarbakhsh P, Gholampour A. Application of imputation methods for missing values of PM10 and O3 data: Interpolation, moving average and K-nearest neighbor methods. Environ. Health Eng. Manag. 2021; 8(3): 215-226
URL: http://ehemj.com/article-1-815-en.html

Rights and permissions: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
