SMOTE-text: A modified SMOTE for Turkish text classification

dc.authorid 0000-0003-1250-5949
dc.contributor.author Çürükoğlu, Nur
dc.contributor.author Özpınar, Alper
dc.date.accessioned 2024-10-12T19:47:13Z
dc.date.available 2024-10-12T19:47:13Z
dc.date.issued 2021
dc.department İstanbul Ticaret Üniversitesi, Mühendislik Fakültesi, Mekatronik Mühendisliği (İngilizce) Bölümü en_US
dc.description.abstract One of the most common problems faced by large enterprises is the loss of know-how when employees change roles or leave. Creating a well-organized, indexed, connected, user-friendly, and sustainable digital enterprise memory can solve this problem and provide a practical transfer of know-how to newly recruited personnel. One problem that arises in this context is the correct classification of the documents to be stored in the digital library. Text classification, also known as text categorization, is in its most general sense the process of assigning text to labeled groups. A document can relate to one or more subjects, and choosing the correct labels is sometimes challenging. The distribution of documents in the information repository varies with the company's business areas. Good machine-learning-based text classification requires balanced datasets of previous samples related to the business. A lack of documents from minor business areas creates an imbalanced training dataset. Synthetic data can be created to overcome this problem, but the existing methods are suited to numerical inputs and not to text classification. This article presents a modified version of the Synthetic Minority Oversampling Technique (SMOTE) algorithm for text classification that integrates the Turkish dictionary into the oversampling step. en_US
dc.identifier.citation Curukoglu, N., & Ozpinar, A. (2021). SMOTE-Text: A Modified SMOTE for Turkish Text Classification. In Lecture notes on data engineering and communications technologies, v.76, (pp. 82–92).
dc.identifier.doi 10.1007/978-3-030-79357-9_9
dc.identifier.endpage 92 en_US
dc.identifier.issn 2367-4512
dc.identifier.scopus 2-s2.0-85109805397 en_US
dc.identifier.scopusquality Q3 en_US
dc.identifier.startpage 82 en_US
dc.identifier.uri https://doi.org/10.1007/978-3-030-79357-9_9
dc.identifier.uri https://hdl.handle.net/11467/8831
dc.identifier.volume 76 en_US
dc.indekslendigikaynak Scopus en_US
dc.language.iso en en_US
dc.publisher Springer Science and Business Media Deutschland GmbH en_US
dc.relation.ispartof Lecture Notes on Data Engineering and Communications Technologies en_US
dc.relation.publicationcategory Book Chapter - International en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Imbalanced Data Sets en_US
dc.subject Machine Learning en_US
dc.subject Oversampling en_US
dc.subject SMOTE-Text en_US
dc.subject Text Classification en_US
dc.title SMOTE-text: A modified SMOTE for Turkish text classification en_US
dc.type Book Chapter en_US
