Case study on well-known topic modeling methods for document classification
Date
2021
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
IEEE
Access Rights
info:eu-repo/semantics/embargoedAccess
Abstract
Topic modeling has numerous applications, such as text categorization, topic clustering, document tagging, and feature extraction over large document collections. In this study, three methods were applied separately to the same document set: Latent Dirichlet Allocation (LDA), a practical exploratory topic modeling method; Bidirectional Encoder Representations from Transformers (BERT), a transformer-based machine learning method; and Term Frequency-Inverse Document Frequency (TF-IDF). The document set consists of 801 sport and education articles collected from the internet by graduate students. The purpose of the study is to observe which method is best suited to topic modeling and, if possible, to increase the accuracy rate through an ensemble of these methods. The study found that, although it has some disadvantages, BERT assigned documents to the correct topic with an average success rate of 92.6%, clearly outperforming the other methods.
Description
Keywords
Classification, Topic modeling, LDA, BERT, TF-IDF
Source
Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021]
WoS Q Value
N/A
Scopus Q Value
N/A