Case study on well-known topic modeling methods for document classification

Date

2021

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Access Rights

info:eu-repo/semantics/embargoedAccess

Abstract

Topic modeling has numerous applications, such as text categorization, topic clustering, document tagging, and feature extraction on large document collections. In this study, three methods were applied separately to the same document set: Latent Dirichlet Allocation (LDA), a practical exploratory topic modeling method; Bidirectional Encoder Representations from Transformers (BERT), a transformer-based machine learning method; and Term Frequency-Inverse Document Frequency (TF-IDF). The document set consists of 801 sports and education articles collected from the internet by graduate students. The purpose of this study is to observe which method is best suited to topic modeling and, if possible, to increase the accuracy rate through an ensemble of these methods. The study found that, although it has some disadvantages, BERT assigned documents to the correct topic with an average success rate of 92.6%, outperforming the other methods.

Description

Keywords

Classification, Topic modeling, LDA, BERT, TF-IDF

Source

Proceedings of the Sixth International Conference on Inventive Computation Technologies [ICICT 2021]

WoS Q Value

N/A

Scopus Q Value

N/A

Volume

Issue

Citation