Benchmark effect of web search engines on text mining

dc.contributor.author: Toprak, Ahmet
dc.contributor.author: Turan, Metin
dc.date.accessioned: 2021-05-04T09:43:12Z
dc.date.available: 2021-05-04T09:43:12Z
dc.date.issued: 2021 [en_US]
dc.department: Faculties, Faculty of Engineering, Department of Computer Engineering [en_US]
dc.description.abstract: There have been many studies on dictionary creation, and over time they have used different methods and different analyses. Especially since the emergence of the World Wide Web, efforts to create dictionaries from current data have gained importance. The performance of web search engines therefore directly affects any model that uses web documents to create dictionaries automatically. In this research, web search engines were evaluated in terms of how relevant the documents they suggest are to the query. For this purpose, a model that automatically creates a dictionary from web documents was developed. First, topic seed words were determined from the documents initially presented to the system, and an initial search was executed with these seed words. The TF-IDF metric was then used to select meaningful words from the first returned document: the top n words with the highest TF-IDF values were selected, where the value of n was determined experimentally. When the web was searched with these words, which were added to the dictionary, the search engine suggested new documents. By repeating this process, experimental dictionaries of a certain size were obtained. Because the documents suggested by each search engine generally differ, the similarity of the dictionaries produced from the top suggested documents can measure each engine's performance in selecting relevant documents. Hash similarity was used to evaluate dictionary performance. According to the results, the highest dictionary similarity was 73.9% for the Google search engine, 68.7% for the Bing search engine and 60.5% for the Yandex search engine. [en_US]
dc.identifier.endpage: 92 [en_US]
dc.identifier.issue: 1 [en_US]
dc.identifier.startpage: 84 [en_US]
dc.identifier.uri: https://hdl.handle.net/11467/4896
dc.identifier.volume: 4 [en_US]
dc.language.iso: en [en_US]
dc.publisher: Murat GÖK [en_US]
dc.relation.ispartof: Veri Bilimi [en_US]
dc.relation.publicationcategory: Article - National Peer-Reviewed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Automatic Dictionary Creation [en_US]
dc.subject: Hash Similarity [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Performance of Web [en_US]
dc.subject: TF-IDF Metric [en_US]
dc.title: Benchmark effect of web search engines on text mining [en_US]
dc.type: Article [en_US]
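
The dictionary-building loop summarized in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the record does not state: the web search engine is replaced by a hypothetical `toy_search` function over a four-document in-memory corpus, IDF is computed over the documents processed so far, seed words are the top-n TF-IDF words of the initial documents, a small stopword list is applied, and each round uses the highest-ranked result that has not been processed yet. All function and variable names are illustrative, not the authors' code.

```python
import math
import re
from collections import Counter
from typing import Callable

# Hypothetical stopword list; the abstract does not mention stopword handling.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "that", "are", "but", "in", "can", "be", "from", "by"}

def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens with stopwords removed (deliberately simple)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def tfidf_top_n(doc: str, corpus: list[str], n: int) -> list[str]:
    """Return the n words of `doc` with the highest TF-IDF; IDF is computed over `corpus`."""
    tokens = tokenize(doc)
    if not tokens:
        return []
    tf = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for word, count in tf.items():
        df = sum(1 for s in doc_sets if word in s)
        idf = math.log((1 + len(doc_sets)) / (1 + df)) + 1  # smoothed IDF
        scores[word] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]

def build_dictionary(
    initial_docs: list[str],
    search: Callable[[list[str]], list[str]],
    n: int,
    target_size: int,
    max_rounds: int = 50,
) -> list[str]:
    """Grow a dictionary by repeatedly searching with the newest words (hypothetical sketch)."""
    corpus = list(initial_docs)
    seen = set(initial_docs)
    # Seed words: top-n TF-IDF words of each initial document (assumption).
    dictionary: list[str] = []
    for doc in initial_docs:
        for word in tfidf_top_n(doc, corpus, n):
            if word not in dictionary:
                dictionary.append(word)
    query = list(dictionary)
    for _ in range(max_rounds):
        if len(dictionary) >= target_size:
            break
        # Take the highest-ranked result that has not been processed yet.
        first = next((d for d in search(query) if d not in seen), None)
        if first is None:
            break
        seen.add(first)
        corpus.append(first)
        new_words = [w for w in tfidf_top_n(first, corpus, n) if w not in dictionary]
        if new_words:
            dictionary.extend(new_words)
            query = new_words  # the next search uses only the newly added words
    return dictionary[:target_size]

# Hypothetical stand-in for a web search engine: ranks a tiny in-memory corpus
# by how many query words each document contains.
TOY_WEB = [
    "search engines rank web documents for a query",
    "tf idf weights words that are frequent in a document but rare overall",
    "a dictionary of topic words can be built from retrieved documents",
    "web crawlers collect documents that search engines index",
]

def toy_search(query: list[str]) -> list[str]:
    q = set(query)
    scored = [(len(q & set(tokenize(d))), d) for d in TOY_WEB]
    return [d for score, d in sorted(scored, key=lambda x: -x[0]) if score > 0]

if __name__ == "__main__":
    seed_docs = ["text mining extracts topic words from web documents"]
    print(build_dictionary(seed_docs, toy_search, n=3, target_size=10))
```

Replacing `toy_search` with a real search engine would require that engine's own client API, which is deliberately not shown here; the loop only needs a function that maps a list of query words to a ranked list of document texts.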
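
The abstract names hash similarity as the evaluation measure but does not define it or say what each generated dictionary is compared against. One common hash-based measure is MinHash, which estimates the Jaccard similarity of two word sets; the sketch below assumes that interpretation, and the function names and the `num_hashes` parameter are hypothetical. An exact `jaccard` helper is included so the estimate can be checked against the true value.

```python
import hashlib

def _hash(seed: int, word: str) -> int:
    """Deterministic 64-bit hash; varying `seed` simulates independent hash functions."""
    digest = hashlib.blake2b(f"{seed}:{word}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash_signature(words: list[str], num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over the word set."""
    unique = set(words)
    if not unique:
        raise ValueError("cannot compute a signature for an empty dictionary")
    return [min(_hash(i, w) for w in unique) for i in range(num_hashes)]

def minhash_similarity(a: list[str], b: list[str], num_hashes: int = 128) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    sig_a = minhash_signature(a, num_hashes)
    sig_b = minhash_signature(b, num_hashes)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_hashes

def jaccard(a: list[str], b: list[str]) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two word sets."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)

if __name__ == "__main__":
    generated = ["search", "engine", "query", "document", "index"]
    reference = ["search", "engine", "ranking", "document", "crawler"]
    print(round(jaccard(generated, reference), 3), minhash_similarity(generated, reference))
```

For dictionaries as small as these, the exact Jaccard value is cheap to compute directly; MinHash pays off when many large dictionaries must be compared, because each one is reduced to a fixed-length signature.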

Files

Original bundle
Name: Benchmark Effect of Web Search Engines on Text Mining[#772245]-1210981.pdf
Size: 714.3 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.56 KB
Format: Item-specific license agreed upon to submission