Binarization of features based on frequency discretization for clustering tasks: conference paper, abstract

Description

Publication type: conference paper, abstract, article in conference proceedings

Conference: Hybrid Methods of Modeling and Optimization in Complex Systems (HMMOCS-III 2024), Krasnoyarsk

Year of publication: 2025

DOI: 10.1051/itmconf/20257204003

Abstract: This paper explores the transformation of heterogeneous features, including continuous ones, into binary form using frequency discretization. The method is particularly useful for clustering tasks, because binary features make results easier to interpret through logical expressions. For unsupervised learning, where class labels are unknown, we propose a binarization approach that converts continuous features into binary values based on their frequency distribution. Our experiments show that this technique not only preserves essential information but also improves clustering quality, measured by the Rand Index against known groupings of industrial product batches. The method reduces noise, simplifies the feature space, and enhances cluster interpretability. Among the distance metrics tested, cosine distance gave the best results. These findings highlight the potential of frequency discretization for improving clustering outcomes.
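The abstract does not spell out the exact encoding, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that "frequency discretization" means equal-frequency (quantile) binning and that each cut point becomes one binary feature `x >= cut`. It also shows cosine distance between binary vectors and the Rand Index for scoring a partition against a known grouping. All function names are hypothetical, and the clustering step is omitted because the abstract does not name the algorithm.

```python
import math
from itertools import combinations


def equal_frequency_cuts(values, n_bins):
    """Cut points that split `values` into n_bins groups of roughly equal size.

    Assumption: cut k is the element at index k*n//n_bins of the sorted values
    (no interpolation); duplicate cuts and cuts equal to the minimum are dropped,
    so a constant feature yields no binary features at all.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * n) // n_bins]
        if cut not in cuts and cut > ordered[0]:
            cuts.append(cut)
    return cuts


def binarize_column(values, n_bins):
    """Encode each value as indicators [x >= c for each cut c] (thermometer code)."""
    cuts = equal_frequency_cuts(values, n_bins)
    return [[1 if x >= c else 0 for c in cuts] for x in values]


def binarize_matrix(rows, n_bins):
    """Binarize every column of a row-major matrix and concatenate the indicators."""
    columns = [list(col) for col in zip(*rows)]
    encoded = [binarize_column(col, n_bins) for col in columns]
    return [sum((enc[i] for enc in encoded), []) for i in range(len(rows))]


def cosine_distance(a, b):
    """1 - cos(a, b); all-zero vectors are treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree (same vs. different group)."""
    agree = 0
    pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        pairs += 1
    return agree / pairs
```

For example, `binarize_matrix([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]], 2)` places one median cut on each feature and returns `[[0, 0], [0, 0], [1, 1], [1, 1]]`. A thermometer code was chosen here because the cosine distance between two objects then grows with how many quantile boundaries separate them; one-hot interval indicators are an equally plausible reading of the abstract.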

Source

Journal: ITM Web of Conferences

Article number: 04003

Place of publication: Krasnoyarsk

Authors

  • Masich Igor (Reshetnev Siberian State University of Science and Technology)
  • Shkaberina Guzel (Reshetnev Siberian State University of Science and Technology)
  • Masich Danila (Siberian Federal University)

Indexed in

  • Russian Science Citation Index (RSCI, eLIBRARY.RU)