CREATING A COMPREHENSIVE DATA SET FOR DECEPTION DETECTION STUDIES IN TURKISH TEXTS
Date
2024
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Suat TEKER
Access Rights
info:eu-repo/semantics/openAccess
Abstract
Purpose- Deception detection has grown in importance with the widespread use of digital communication and online platforms. Although deception detection has been studied in many languages, Turkish-language datasets for detecting deceptive reviews are still lacking. This study addresses that gap by creating a comprehensive dataset for deception detection in Turkish hotel reviews, covering real, fake, and AI-generated comments. The dataset is intended to facilitate research on deception detection, enhance the reliability of user-generated content, and support the development of automated methods for identifying deceptive texts.
Methodology- The dataset comprises 5,013 Turkish hotel reviews: real reviews from Tripadvisor, fake reviews written by humans, and fake reviews generated by AI through the OpenAI GPT API. The collected data underwent extensive preprocessing to ensure quality and reliability, including data cleaning, the application of filtering criteria, and balancing of the distribution of real and fake comments. Descriptive and statistical analyses were performed to identify linguistic patterns and structural differences across the three categories. Specifically, the linguistic features examined were comment length, complexity, readability (measured with the Gunning Fog Index), and pronoun usage.
Findings- Real comments are longer and more detailed than fake and AI-generated comments, while fake comments are simpler and clearer, a pattern consistent with deception detection studies in other languages. AI-generated comments frequently use the pronoun ‘we’, whereas fake comments tend to mimic personal experience with the pronoun ‘I’. Pronoun usage in real comments is more balanced and reflects an authentic language structure.
Conclusion- This study contributes to fake comment detection by providing the first large-scale Turkish deception detection dataset. The findings can help businesses improve the credibility of online comments. Future work could focus on machine learning applications and on comparisons with other languages.
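The readability and pronoun features named in the methodology can be illustrated with a minimal Python sketch. It is not the study's implementation. It uses the standard Gunning Fog formula, 0.4 × (words per sentence + 100 × complex words / words), and makes several assumptions that the abstract does not confirm: syllables are counted as vowels (a common heuristic for Turkish), a "complex" word has three or more syllables (the English convention, which may not be the threshold the authors used), and pronouns are counted only as the standalone tokens "ben" (I) and "biz" (we), ignoring person marking carried by verb suffixes. All function names are hypothetical.

```python
import re

# Assumption: each Turkish syllable contains exactly one vowel,
# so the syllable count is approximated by the vowel count.
TURKISH_VOWELS = set("aeıioöuüâîû")


def count_syllables(word):
    """Approximate the syllable count of a lower-cased Turkish word."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def tokenize(text):
    """Lower-case with Turkish I/İ handling and return alphabetic tokens."""
    text = text.replace("I", "ı").replace("İ", "i").lower()
    return re.findall(r"[^\W\d_]+", text)


def split_sentences(text):
    """Split on sentence-final punctuation and drop empty fragments."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def gunning_fog(text, complex_threshold=3):
    """Gunning Fog Index: 0.4 * (words/sentences + 100 * complex/words).

    complex_threshold is an assumed syllable cutoff, not taken from the study.
    """
    words = tokenize(text)
    sentences = split_sentences(text)
    if not words or not sentences:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= complex_threshold]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


def pronoun_counts(text):
    """Count standalone first-person pronouns 'ben' (I) and 'biz' (we)."""
    words = tokenize(text)
    return {"ben": words.count("ben"), "biz": words.count("biz")}


if __name__ == "__main__":
    review = "Biz otelde kaldık. Ben çok memnun kaldım."
    print(round(gunning_fog(review), 2))
    print(pronoun_counts(review))
```

Because Turkish is agglutinative, a three-syllable cutoff is likely to flag far more words as complex than it would in English, so any real analysis would need a threshold calibrated for Turkish.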
Description
Keywords
Deception detection, Turkish dataset, text analysis, fake reviews, hotel reviews