Creating a comprehensive data set for deception detection studies in Turkish texts

Akkol, Ekin; Gökşen, Yılmaz

Files

Full Text (393.33 KB)

Date

2024

Authors

Akkol, Ekin

Gökşen, Yılmaz

Publisher

Suat TEKER

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Purpose- Deception detection has gained importance with the widespread use of digital communication and online platforms. Although numerous studies have addressed deception detection in various languages, a significant gap remains in the availability of a Turkish-language dataset for detecting deceptive reviews. This study addresses that gap by creating a comprehensive dataset for deception detection in Turkish hotel reviews, covering real, fake, and AI-generated comments. The dataset aims to facilitate research on deception detection, enhance the reliability of user-generated content, and contribute to the development of automated methods for identifying deceptive texts.

Methodology- The dataset comprises 5,013 Turkish hotel reviews: real reviews from Tripadvisor, fake reviews written by humans, and fake reviews generated by AI using the OpenAI GPT API. The collected data underwent extensive preprocessing to ensure quality and reliability, including data cleaning, the application of filtering criteria, and balancing the distribution of real and fake comments. Descriptive and statistical analyses were performed to identify linguistic patterns and structural differences across the three categories. Specifically, linguistic features such as comment length, complexity, readability (measured using the Gunning Fog Index), and pronoun usage were examined.

Findings- Real comments are longer and more detailed than fake and AI-generated comments, while fake comments are simpler and clearer, which is consistent with deception detection studies in other languages. AI-generated comments frequently use the pronoun ‘we’, while fake comments tend to mimic personal experience with the pronoun ‘I’. In addition, pronoun usage in real comments is more balanced and reflects an authentic language structure.

Conclusion- This study contributes to fake comment detection by providing the first large-scale Turkish deception detection dataset. The findings can help businesses improve the credibility of online comments. Future work could focus on machine learning applications and comparisons with other languages.
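
The readability measure named in the abstract can be illustrated with a minimal sketch. It uses the standard Gunning Fog formula, 0.4 × (words per sentence + 100 × share of complex words), where a complex word has three or more syllables. The abstract does not say how the authors tokenized text or counted syllables, so everything below is an assumption: the function names are hypothetical, sentences are split on `.`, `!` and `?`, and syllables are counted as vowels, which works for Turkish because each syllable contains exactly one vowel.

```python
import re

# Assumption: Turkish syllables are counted by counting vowels.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")


def count_syllables(word: str) -> int:
    """Return the number of vowels in a word, used here as its syllable count."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)


def gunning_fog(text: str) -> float:
    """Gunning Fog Index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    # Split into sentences on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters (digits and underscores excluded).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        return 0.0
    # A word counts as complex when it has three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    complex_share = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + complex_share)


if __name__ == "__main__":
    review = "Otel çok güzeldi. Personel yardımsever ve odalar temizdi."
    print(round(gunning_fog(review), 2))
```

Because the three-syllable threshold was designed for English, an agglutinative language such as Turkish is likely to yield high absolute scores under this sketch, so the values are more useful for comparing the three review categories with each other than as grade-level estimates.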

Keywords

Deception detection, Turkish dataset, text analysis, fake reviews, hotel reviews

Link

https://hdl.handle.net/20.500.14034/2562
https://doi.org/10.17261/Pressacademia.2024.1960

Collections

Dergipark
Department of Management Information Systems Collection