[90503] Graph based keyword extraction method for scientific publications

Ali, Abdirahman Mohamed

dc.contributor.advisor	Kakışım, Arzu
dc.contributor.author	Ali, Abdirahman Mohamed
dc.date.accessioned	2023-04-07T15:24:32Z
dc.date.available	2023-04-07T15:24:32Z
dc.date.issued	2022
dc.department	Enstitüler, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı	en_US
dc.description	Thesis (Master's) -- İstanbul Ticaret Üniversitesi -- Includes bibliography.	en_US
dc.description	Q 335/A45	en_US
dc.description.abstract	As technological capabilities grow, the volume of data produced is increasing rapidly, and reading and analyzing that data has become very time-consuming. Because many text files lack keywords that briefly describe their content, a reader must examine the entire document to understand it. Many methods have therefore been proposed that aim to automate text summarization using keyword extraction. Recent keyword extraction approaches, both supervised and unsupervised, build on techniques such as machine learning, deep learning, and topic models. Most of them aim to extract the most relevant words and phrases from the given text. In scientific publications, however, it is often difficult to describe a paper with a limited number of keywords, and two publications with similar content sometimes share no common words or phrases among their keywords. In such cases, generating keywords that do not appear in the paper but are related to its context is important for revealing the contextual similarity between papers.

In this study, a graph-based unsupervised keyword extraction approach for scientific papers is presented. The proposed method takes academic publications as input and builds a word association graph containing the n-grams that are frequently observed in these publications. For a new paper, it likewise generates n-grams, selects the graph nodes that match them, and performs random walks over these selected nodes to obtain different n-gram sequences. The method then selects as keywords the n-grams observed most frequently across a varying number of generated n-gram sequences.
Experimental results are presented by comparing our method with eight different methods using three different datasets.

Keywords: Graph-based, keyword extraction, n-grams, random walk.

ÖZET

Owing to technological capabilities that grow by the day, the volume of data produced is increasing rapidly. Reading and analyzing data has therefore become a very time-consuming task. Since many text files do not contain keywords that briefly describe the content of the text, the whole document must be examined to understand its content. Accordingly, many methods have been presented that aim to automate the text summarization process using keyword extraction approaches. Recently, keyword extraction methods have been proposed that build on different approaches such as machine learning, deep learning, and topic models, and that come in two forms, supervised and unsupervised. Most of these methods aim to extract the most relevant words and sentences from the given text. In scientific publications, however, it is often difficult to express a paper with a limited number of keywords. Sometimes no common word or phrase is seen between the keywords of two scientific publications with similar content. In this case, creating keywords that do not appear in the paper but are related to its context is very important for revealing the contextual similarity between papers. In this study, a graph-based unsupervised keyword extraction and suggestion approach for scientific papers is presented. The proposed method takes academic publications as input and creates a word association graph containing the n-grams frequently observed in these publications. Similarly, it generates n-grams for a newly arriving academic publication and obtains n-gram sequences by performing random walks over the graph nodes that match these n-grams. Our method selects as keywords the n-grams observed most frequently across a varying number of generated n-gram sequences. The experimental results of our method are presented in comparison with eight different methods on two different datasets.

Keywords: Keyword extraction, graph-based, n-gram, random walk.	en_US
dc.identifier.endpage	23	en_US
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://katalog.ticaret.edu.tr/e-kaynak/tez/90503.pdf
dc.identifier.uri	https://hdl.handle.net/11467/6454
dc.identifier.yoktezid	763012	en_US
dc.institutionauthor	Ali, Abdirahman Mohamed
dc.language.iso	en	en_US
dc.publisher	İstanbul Ticaret Üniversitesi	en_US
dc.relation.publicationcategory	Tez	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Yapay zeka	en_US
dc.subject	Data mining	en_US
dc.subject	Veri madenciliği	en_US
dc.subject	Information storage and retrieval	en_US
dc.subject	Bilgi depolama ve erişim sistemleri	en_US
dc.subject	Mathematical logic	en_US
dc.subject	Matematiksel mantık	en_US
dc.subject	Natural language processing (Computer science)	en_US
dc.subject	Doğal dil işleme (Bilgisayar bilimi)	en_US
dc.subject	Bilim	en_US
dc.title	[90503] Graph based keyword extraction method for scientific publications	en_US
dc.type	Master Thesis	en_US
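
The pipeline described in the abstract (build an n-gram graph from a corpus, match a new paper's n-grams to graph nodes, run random walks, keep the most frequently visited n-grams) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not specify how edges are defined, how long the walks are, or how transitions are weighted, so the sketch assumes that two frequent n-grams are linked when they occur in the same document, that walks have a fixed length, and that the next node is chosen uniformly at random. The function names (`ngrams`, `build_graph`, `extract_keywords`), all parameters, and the whitespace tokenization are hypothetical.

```python
import random
from collections import Counter, defaultdict


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def build_graph(corpus, n_max=2, min_count=2):
    """Build an undirected graph over n-grams found in >= min_count documents.

    Assumption: two frequent n-grams are linked when they occur in the
    same document. The abstract does not state the actual edge rule.
    """
    doc_grams = [set(ngrams(doc.split(), n_max)) for doc in corpus]
    counts = Counter(g for grams in doc_grams for g in grams)
    frequent = {g for g, c in counts.items() if c >= min_count}
    graph = defaultdict(set)
    for grams in doc_grams:
        present = grams & frequent
        for a in present:
            for b in present:
                if a != b:
                    graph[a].add(b)
    return dict(graph)


def extract_keywords(graph, text, n_max=2, walks=50, walk_len=5,
                     top_k=5, seed=0):
    """Walk from the nodes matching the new text; return the top_k visited."""
    rng = random.Random(seed)
    # Start nodes: n-grams of the new paper that also exist in the graph.
    starts = sorted(set(ngrams(text.split(), n_max)) & set(graph))
    visits = Counter()
    for start in starts:
        for _ in range(walks):
            node = start
            for _ in range(walk_len):
                visits[node] += 1
                # Uniform transition to a neighbour (assumed, unweighted).
                node = rng.choice(sorted(graph[node]))
    return [g for g, _ in visits.most_common(top_k)]


corpus = [
    "graph based keyword extraction",
    "keyword extraction random walk",
    "random walk graph",
]
graph = build_graph(corpus)
keywords = extract_keywords(graph, "keyword extraction")
```

Because the walks can leave the start nodes, the returned list may contain n-grams that do not occur in the new text, which mirrors the abstract's goal of producing keywords that are related to a paper's context without appearing in it.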

Collection

Fen Bilimleri Enstitüsü Tez Koleksiyonu
