Kybernetika 57 no. 1, 15-37, 2021

A depth-based modification of the k-nearest neighbour method

Ondřej Vencálek and Daniel HlubinkaDOI: 10.14736/kyb-2021-1-0015

Abstract:

We propose a new nonparametric procedure to solve the problem of classifying objects represented by $d$-dimensional vectors into $K \geq 2$ groups. The newly proposed classifier was inspired by the $k$ nearest neighbour (kNN) method. It is based on the idea of a depth-based distributional neighbourhood and is called $k$ nearest depth neighbours (kNDN) classifier. The kNDN classifier has several desirable properties: in contrast to the classical kNN, it can utilize global properties of the considered distributions (symmetry). In contrast to the maximal depth classifier and related classifiers, it does not have problems with classification when the considered distributions differ in dispersion or have unequal priors. The kNDN classifier is compared to several depth-based classifiers as well as the classical kNN method in a simulation study. According to the average misclassification rates, it is comparable to the best current depth-based classifiers.

Keywords:

data depth, Bayes classifier, k nearest depth neighbours, nonparametric

Classification:

62H30, 62G30

References:

  1. C. Agostinelli and M. Romanazzi: Local depth. J. Statist. Plann. Inference 141 (2011), 817-830.   DOI:10.1016/j.jspi.2010.08.001
  2. C. B. Barber, D. P. Dobkin and H. Huhdanpaa: The quickhull algorithm for convex hulls. ACM Trans. Math. Software (TOMS) 22 (1996), 4, 469-483.   DOI:10.1145/235815.235821
  3. A. Christmann and P. J. Rousseeuw: Measuring overlap in binary regression. Comput. Statist. Data Analysis 37 (2001), 65-75.   DOI:10.1016/S0167-9473(00)00063-3
  4. L. H. Cox, M. M. Johnson and K. Kafadar: Exposition of statistical graphics technology. In: ASA Proc Stat. Comp Section 1982, pp. 55-56.   DOI:10.1016/S0167-9473(00)00063-3
  5. S. Dutta and A. K. Ghosh: On classification based on Lp depth with an adaptive choice of p. Preprint, 2011.   CrossRef
  6. S. Dutta and A. K. Ghosh: On robust classification using projection depth. Ann. Inst.Statist. Math. 64 (2012), 3, 657-676.   DOI:10.1007/s10463-011-0324-y
  7. S. Dutta, A. K. Ghosh and P. Chaudhuri: Some intriguing properties of Tukey's half-space depth. Bernoulli 17 (2011), 4, 1420-1434.   DOI:10.3150/10-BEJ322
  8. E. Fix and J. L. Hodges: Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties. Technical Report 4, Randolph Field, Texas: USAF School of Aviation Medicine, 1951.   CrossRef
  9. R. Fraiman, R. Y. Liu and J. Meloche: Multivariate density estimation by probing depth. Lecture Notes-Monograph Series 1997, pp. 415-430.   CrossRef
  10. A. K. Ghosh and P. Chaudhuri: On maximum depth and related classifiers. Scand. J. Statist. 32 (2005), 327-350.   DOI:/10.1111/j.1467-9469.2005.00423.x
  11. A. K. Ghosh and P. Chaudhuri: On data depth and distribution-free discriminant analysis using separating surfaces. Bernoulli 11 (2005), 1, 1-27.   DOI:10.3150/bj/1110228239
  12. K. Habel, R. Grasman, R. B. Gramacy, P. Mozharovskyi and D. C. Sterratt: Geometry: Mesh Generation and Surface Tessellation. R package version 0.4.5.    CrossRef
  13. D. Hlubinka, L. Kotík and O. Vencálek: Weighted data depth. Kybernetika 46 (2010), 1, 125-148.   CrossRef
  14. M. Hubert and S. van der Veeken: Fast and robust classifiers adjusted for skewness. In: COMPSTAT 2010: Proceedings in Computational Statistics: 19th Symposium held in Paris 2010 (Y. Lechevallier and G. Saporta, eds.), Springer, Heidelberg 2010, pp. 1135-1142.   CrossRef
  15. R. Jörnsten: Clustering and classification based on the $L_1$ data depth. J. Multivar. Anal. 90 (2004), 67-89.   CrossRef
  16. L. Kotík and D. Hlubinka: A weighted localization of halfspace depth and its properties. J. Multivar. Anal. 157 (2017), 53-69.   DOI:10.1016/j.jmva.2017.02.008
  17. D. Kosiorowski and Z. Zawadzki: DepthProc An R Package for Robust. Exploration of Multidimensional Economic Phenomena, 2020.   CrossRef
  18. T. Lange, K. Mosler and P. Mozharovskyi: Fast nonparametric classification based on data depth. Statist. Papers 55 (2014), 1, 49-69.   CrossRef
  19. J. Li, J. A. Cuesta-Albertos and R. Y. Liu: DD-classifier: Nonparametric classification procedure based on DD-plot. J. Amer. Statist. Assoc. 107 (2012), 498, 737-753.   DOI:10.1080/01621459.2012.688462
  20. R. Y. Liu: On a notion of data depth based on random simplices. Ann. Statist. 18 (1990), 1, 405-414.   DOI:10.1214/aos/1176347507
  21. R. Y. Liu, J. M. Parelius and K. Singh: Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Statist. 27 (1999), 783-858.   DOI:10.1214/aos/1018031260
  22. K. Mardia, J. Kent and J. Bibby: Multivariate Analysis. Academic Press, 1979.   CrossRef
  23. D. Paindaveine and G. Van Bever: From depth to local depth: a focus on centrality. J. Amer. Statist. Assoc. 105 (2013), 1105-1119.   CrossRef
  24. D. Paindaveine and G. Van Bever: Nonparametrically consistent depth-based classifiers. Bernoulli 21 (2015), 1, 62-82.   DOI:10.3150/13-BEJ561
  25. O. Pokotylo, P. Mozharovskyi and R. Dyckerhoff: Depth and depth-based classification with R package ddalpha. J. Statist. Software 91 (2019), 5, 1-46.   CrossRef
  26. R. Serfling: Depth functions in nonparametric multivariate inference. In: Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications (R. Y. Liu, R. Serfling, and D. L. Souvaine, eds.), American Mathematical Society, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 7, New York 2006, pp. 1-16.   CrossRef
  27. O. Vencalek: $k$-Depth-nearest neighbour method and its performance on skew-normal distributons. Acta Univ. Palacki Olomouc., Fac. Rer. Nat., Mathematica 52 (2013), 2, pp. 121-129.   CrossRef
  28. I. C. Yeh, K. J. Yang and T. M. Ting: Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems Appl. 36 (2009), 5866-5871.   DOI:10.1016/j.eswa.2008.07.018
  29. A. Zakai and Y. Ritov: Consistency and localizability. J. Machine Learning Res. 10 (2009), 827-856.   CrossRef
  30. Y. Zuo and R. Serfling: General notion of statistical depth function. Ann. Statist. 28 (2000), 461-482.   DOI:10.1214/aos/1016218226
  31. Y. Zuo and R. Serfling: Structural properties and convergence results for contours of sample statistical depth functions. Ann. Statist. 28 (2000) 2, 483-499.   DOI:10.1214/aos/1016218227