A SIMILARITY METRIC FOR CATEGORICAL DISTRIBUTIONS THAT ACCOUNTS FOR THE KINSHIP OF DIFFERENT CATEGORIES

Authors

  • S. D. Shtovba, Vasyl’ Stus Donetsk National University, Vinnytsia
  • M. V. Petrychko, Vinnytsia National Technical University
  • M. Yu. Petranova, Vasyl’ Stus Donetsk National University, Vinnytsia

DOI:

https://doi.org/10.31649/1997-9266-2023-167-2-49-57

Keywords:

categorical distribution, kinship categories, similarity metric, Czekanowski metric, pose detection, reviewer recommendation, generalized Pareto distribution

Abstract

Estimating the level of similarity of two objects is a common problem in pattern recognition, clustering and classification. Typical applications include reviewer recommendation, analysis of similar text documents, human pose detection in video, clustering of species distributions, and recommendation in online shops. When attributes are categorical, an object is described as a distribution of membership degrees over categories. Similarity metrics for such distributions are usually defined as a superposition of the objects' similarities in each category, most often as a sum of the per-category similarities. Moreover, each category is considered independently and in isolation from the others. In some practical problems, however, the categories are related to one another. It is therefore expedient to consider not only the direct similarity of objects, i.e. the similarity between equivalent categories, but also their indirect similarity, i.e. cross-similarity through kindred categories. This paper proposes a similarity metric of two categorical distributions that accounts for the kinship of different categories. The metric has two components. The first component is the Czekanowski metric: it defines the direct similarity of two categorical distributions as the sum of the intersections (element-wise minima) of the two objects' membership degrees. The residuals left after the intersection are accounted for in the second component, which is defined through the element-wise product of two matrices: the matrix of residual compositions built from the membership degrees of the two categorical distributions, and the matrix of pairwise category kinship. The kinship index of each pair of categories is assumed to be known. With a large number of categories, the combined noisy contribution of weakly related categories becomes prominent. It is therefore proposed to filter out this noise and to account only for the contribution of strongly related categories.
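The two-component structure described in the abstract can be illustrated with a short Python sketch. It is not the authors' published formulation: the function name `kinship_similarity`, the `threshold` parameter, the use of an outer product of residuals as the "residual composition" matrix, and a hard threshold as the noise filter are all assumptions made for illustration. Membership vectors are assumed to be non-negative and to sum to 1, and kinship values are assumed to lie in [0, 1].

```python
import numpy as np


def kinship_similarity(a, b, kinship, threshold=0.0):
    """Hypothetical sketch of a kinship-aware similarity of two distributions.

    Assumptions (not taken from the paper):
    - a, b are non-negative membership vectors that each sum to 1;
    - the residual composition matrix is the outer product of the residuals;
    - kinship[i, j] lies in [0, 1]; links below `threshold` are treated as noise.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    k = np.asarray(kinship, dtype=float)

    # Component 1: Czekanowski similarity, the sum of element-wise minima.
    overlap = np.minimum(a, b)
    direct = overlap.sum()

    # Membership left over after the intersection.
    res_a = a - overlap
    res_b = b - overlap

    # Residual composition matrix (assumed: outer product of the residuals).
    composition = np.outer(res_a, res_b)

    # Noise filter: zero out weak kinship links below the threshold.
    k_filtered = np.where(k >= threshold, k, 0.0)

    # Component 2: sum of the element-wise product of the two matrices.
    indirect = (composition * k_filtered).sum()
    return direct + indirect


if __name__ == "__main__":
    # Hypothetical symmetric kinship matrix for three categories.
    K = [[1.0, 0.1, 0.8],
         [0.1, 1.0, 0.2],
         [0.8, 0.2, 1.0]]
    a = [0.6, 0.4, 0.0]
    b = [0.2, 0.4, 0.4]
    print(round(kinship_similarity(a, b, K), 3))                 # 0.728
    print(round(kinship_similarity(a, b, K, threshold=0.9), 3))  # 0.6
```

In the example, the direct part is 0.6 and the leftover 0.4 of category 0 in `a` meets the leftover 0.4 of category 2 in `b` through a kinship of 0.8, adding about 0.128. Raising `threshold` to 0.9 discards that link, leaving only the Czekanowski part. The outer product was chosen because each residual vector then sums to 1 minus the direct similarity, which keeps the total within [0, 1] under the stated assumptions; the authors' actual composition rule may differ.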

Author Biographies

S. D. Shtovba, Vasyl’ Stus Donetsk National University, Vinnytsia

Dr. Sc. (Eng.), Professor, Professor of the Chair of Information Technologies

M. V. Petrychko, Vinnytsia National Technical University

Cand. Sc. (Eng.), Junior Researcher of the Laboratory of Artificial Intelligence Problems

M. Yu. Petranova, Vasyl’ Stus Donetsk National University, Vinnytsia

Cand. Sc. (Eng.), Junior Researcher of the Laboratory of Artificial Intelligence Problems

Published

2023-05-04

How to Cite

S. D. Shtovba, M. V. Petrychko, and M. Yu. Petranova, “A similarity metric for categorical distributions that accounts for the kinship of different categories” (in Ukrainian), Visnyk of Vinnytsia Politechnical Institute, no. 2, pp. 49–57, May 2023.

Section

Information technologies and computer sciences
