Kybernetika 59 no. 6, 861-879, 2023

Seasonal time-series imputation of gap missing algorithm (STIGMA)

Eduardo Rangel-Heras, Pavel Zuniga, Alma Y. Alanis, Esteban A. Hernandez-Vargas and Oscar D. Sanchez

DOI: 10.14736/kyb-2023-6-0861

Abstract:

This work presents a new approach for imputing missing data in weather time series that exhibit a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm exploits the seasonal pattern to impute unknown values by averaging available data. We test the algorithm on data measured every $10$ minutes over a period of $365$ days during the year 2010; the variables are global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix with $52,560$ rows (data points over time) and $4$ columns (weather variables). The distinguishing feature of this work is that the algorithm is well suited to imputing values when the missing data occur contiguously and in seasonal patterns. The algorithm uses a date-time index to collect available data for imputing the missing values, repeating the process until all missing values are calculated. The tests are performed by removing $5\
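The averaging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is one possible reading of "collect available data via a seasonal index, average, and repeat until no values are missing". The function name `seasonal_gap_impute` and the parameters `period` and `window` are hypothetical, and the choice to average the same seasonal position in the neighbouring cycles is an assumption.

```python
import math


def seasonal_gap_impute(values, period, window=1, max_passes=100):
    """Fill NaN gaps by averaging samples at the same seasonal position.

    values     : list of floats; NaN marks a missing sample
    period     : samples per seasonal cycle (assumed: 144 for 10-minute
                 data with a daily cycle)
    window     : number of cycles before/after to search (hypothetical)
    max_passes : safety limit on the number of repetitions
    """
    filled = list(values)
    n = len(filled)
    for _ in range(max_passes):
        missing = [i for i, v in enumerate(filled) if math.isnan(v)]
        if not missing:
            break  # every value is known
        updates = {}
        for i in missing:
            # Collect available samples one or more whole cycles away.
            neighbours = []
            for k in range(1, window + 1):
                for j in (i - k * period, i + k * period):
                    if 0 <= j < n and not math.isnan(filled[j]):
                        neighbours.append(filled[j])
            if neighbours:
                updates[i] = sum(neighbours) / len(neighbours)
        if not updates:
            break  # no available data can reach the remaining gaps
        # Apply the updates after the pass so that values imputed in this
        # pass only serve as inputs from the next pass onwards.
        for i, v in updates.items():
            filled[i] = v
    return filled


if __name__ == "__main__":
    nan = float("nan")
    # Three cycles of four samples; two contiguous values missing in cycle 2.
    series = [10.0, 20.0, 30.0, 40.0,
              12.0, nan, nan, 42.0,
              14.0, 24.0, 34.0, 44.0]
    print(seasonal_gap_impute(series, period=4))
```

In this sketch a gap longer than one cycle is filled from its edges inwards over several passes, which is one way to interpret "repeating the process until all missing values are calculated".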

Keywords:

contiguous missing values, seasonal patterns, time-series

Classification:

62-04, 68Pxx
