Developing a protective - preventive and machine learning based model on child abuse

Mert, Fatih

dc.contributor.advisor	Zaim, Abdul Halim
dc.contributor.advisor	Aydın, Muhammed Ali
dc.contributor.author	Mert, Fatih
dc.date.accessioned	2022-11-01T16:40:18Z
dc.date.available	2022-11-01T16:40:18Z
dc.date.issued	2021
dc.department	Institutes, Fen Bilimleri Enstitüsü, Department of Computer Engineering	en_US
dc.description	Thesis (Master's) -- İstanbul Ticaret Üniversitesi -- Includes bibliography.	en_US
dc.description.abstract	ABSTRACT
Online grooming is a growing problem in societies, and the time people spend online has recently risen sharply. People can remain anonymous while posting, sharing their opinions, and taking part in online chats. Anonymity also gives those who attempt illegal activities a chance to hide their identity. Online grooming is one of the most significant of these activities: sexual predators can easily use online chat platforms to quickly build a friendly relationship with children or teenagers, gain their trust, and get them to share obscene media files. These predators often try to convince their victims to meet in person, which may lead to sexual intercourse with a minor. To draw attention to this challenge, which most societies face, this study aims to identify predators at an early stage of online communication. The objective is to detect child grooming in online chat records by using machine learning techniques. In the first part of the study, a multi-label classification on a Wikipedia dataset assigns a given text to toxicity types with more than 97 percent accuracy. The outcome of this work is also used in the second stage, in which the PAN12 dataset is used to train and test the model. The resulting model reaches more than 92 percent accuracy in identifying suspicious messages in chat records, so that sexual predators can be recognized.

Keywords: Child abuse detection, machine learning, multi-label text classification, online sexual predator identification

CONTENTS
ABSTRACT
ÖZET
ACKNOWLEDGEMENTS
FIGURES
TABLES
SYMBOLS AND ABBREVIATIONS
1. INTRODUCTION
  1.1. Motivation
  1.2. Scope
  1.3. Goals
  1.4. Disclaimer
  1.5. Outline
2. LITERATURE REVIEW
  2.1. Online Child Abuse
  2.2. Related Work
  2.3. Terminology Overview
  2.4. Types of Machine Learning Models
    2.4.1. Supervised learning
      2.4.1.1. SVM
      2.4.1.2. Naive Bayes
      2.4.1.3. Logistic regression
      2.4.1.4. K-nearest neighbors
      2.4.1.5. AdaBoost
    2.4.2. Unsupervised learning
    2.4.3. Semi-supervised learning
    2.4.4. Reinforcement learning
  2.5. Evaluation Metrics
3. METHODOLOGY
  3.1. Data Gathering
  3.2. Exploratory Data Analysis
    3.2.1. Data pre-filtering and pre-processing
      3.2.1.1. Toxic comment classification
      3.2.1.2. Sexual predator identification
  3.3. Data Labeling
  3.4. Building Classification Model
    3.4.1. Encoding text data for machine learning
      3.4.1.1. Bag-of-words (BoW)
      3.4.1.2. TF-IDF
  3.5. Classification
    3.5.1. Pseudocodes for classification models
      3.5.1.1. Pseudocode for toxic comment classification model
      3.5.1.2. Pseudocode for sexual predator identification model
    3.5.2. Performance measurement of machine learning classifiers
4. RESEARCH FINDINGS AND DISCUSSION
5. CONCLUSION AND SUGGESTIONS
  5.1. Future Work
REFERENCES
APPENDICES
  Appendix A. Toxic Comment Classification
  Appendix B. Sexual Predator Identification
  Appendix C. Other Attachments
CURRICULUM VITAE	en_US
dc.identifier.endpage	76	en_US
dc.identifier.startpage	1	en_US
dc.identifier.uri	https://katalog.ticaret.edu.tr/e-kaynak/tez/88876.pdf
dc.identifier.uri	https://hdl.handle.net/11467/5465
dc.identifier.yoktezid	671381	en_US
dc.institutionauthor	Mert, Fatih
dc.language.iso	en	en_US
dc.publisher	İstanbul Ticaret Üniversitesi	en_US
dc.relation.publicationcategory	Thesis	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Application software	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Data mining	en_US
dc.subject	Database management	en_US
dc.subject	Victims of crimes	en_US
dc.subject	Information storage and retrieval	en_US
dc.subject	Pattern recognition systems	en_US
dc.subject.other	Q 336/M47	en_US
dc.title	Developing a protective - preventive and machine learning based model on child abuse	en_US
dc.type	Master Thesis	en_US

Files

Name: 88876.pdf
Size: 5.33 MB
Format: Adobe Portable Document Format

Collection

Fen Bilimleri Enstitüsü Thesis Collection