CREATING A COMPREHENSIVE DATA SET FOR DECEPTION DETECTION STUDIES IN TURKISH TEXTS

dc.contributor.author: Akkol, Ekin
dc.contributor.author: Gökşen, Yılmaz
dc.date.accessioned: 2025-03-21T07:37:53Z
dc.date.available: 2025-03-21T07:37:53Z
dc.date.issued: 2024
dc.department: İzmir Bakırçay Üniversitesi
dc.description.abstract: Purpose- Deception detection has gained importance with the widespread use of digital communication and online platforms. Although deception detection has been studied in many languages, no Turkish-language dataset has been available for detecting deceptive reviews. This study addresses that gap by creating a comprehensive dataset for deception detection in Turkish hotel reviews, covering real, fake, and AI-generated comments. The dataset is intended to facilitate research on deception detection, improve the reliability of user-generated content, and support the development of automated methods for identifying deceptive texts. Methodology- The study compiled a dataset of 5,013 Turkish hotel reviews comprising real reviews from Tripadvisor, fake reviews written by humans, and fake reviews generated by AI using the OpenAI GPT API. The collected data underwent extensive preprocessing to ensure quality and reliability, including data cleaning, the application of filtering criteria, and balancing of the distribution of real and fake comments. Descriptive and statistical analyses were performed to identify linguistic patterns and structural differences across the three categories. Specifically, comment length, complexity, readability (measured with the Gunning Fog Index), and pronoun usage were examined. Findings- Real comments are longer and more detailed than fake and AI-generated comments, while fake comments are simpler and clearer, a pattern consistent with deception detection studies in other languages. AI-generated comments frequently use the pronoun ‘we’, while human-written fake comments tend to mimic personal experience with the pronoun ‘I’. Pronoun usage in real comments is more balanced and reflects an authentic language structure. Conclusion- This study contributes to fake comment detection by providing the first large-scale Turkish deception detection dataset. The findings can help businesses improve the credibility of online comments. Future work could focus on machine learning applications and on comparisons with other languages.
dc.description.sponsorship: Suat TEKER
dc.identifier.doi: 10.17261/Pressacademia.2024.1960
dc.identifier.doi: https://doi.org/10.17261/Pressacademia.2024.1960
dc.identifier.endpage: 145
dc.identifier.issn: 2148-6689
dc.identifier.issue: 2
dc.identifier.startpage: 138
dc.identifier.uri: https://hdl.handle.net/20.500.14034/2562
dc.identifier.volume: 11
dc.language.iso: en
dc.publisher: Suat TEKER
dc.relation.ispartof: Research Journal of Business and Management
dc.relation.publicationcategory: Article - National Peer-Reviewed Journal - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/openAccess
dc.snmz: KA_DergiPark_20250319
dc.subject: Deception detection
dc.subject: Turkish dataset
dc.subject: text analysis
dc.subject: fake reviews
dc.subject: hotel reviews
dc.title: CREATING A COMPREHENSIVE DATA SET FOR DECEPTION DETECTION STUDIES IN TURKISH TEXTS
dc.type: Article
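
The linguistic features named in the abstract (comment length, Gunning Fog readability, and first-person pronoun usage) could be computed along the following lines. This is an illustrative sketch only, not the authors' implementation: every function name, the vowel-counting syllable heuristic, the three-syllable threshold for "complex" words, and the pronoun form lists are assumptions introduced here. The Fog formula itself is the standard one, 0.4 × (words per sentence + 100 × complex words / words).

```python
import re

# Assumption: one vowel per syllable, which holds for most Turkish words.
TURKISH_VOWELS = set("aeıioöuüâîû")

# Assumption: a small hand-picked set of explicit first-person pronoun forms.
# Turkish often drops pronouns and marks person on the verb, so this undercounts.
FIRST_SINGULAR = {"ben", "beni", "bana", "benim", "bende", "benden"}
FIRST_PLURAL = {"biz", "bizi", "bize", "bizim", "bizde", "bizden"}


def tr_lower(text):
    """Lowercase with Turkish casing: 'I' -> 'ı' and 'İ' -> 'i'."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    """Return lowercase alphabetic tokens (digits and punctuation dropped)."""
    return re.findall(r"[^\W\d_]+", tr_lower(text))


def syllable_count(word):
    """Approximate syllables by counting vowels."""
    return sum(ch in TURKISH_VOWELS for ch in word)


def gunning_fog(text):
    """Gunning Fog Index with 'complex' taken as three or more syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    complex_words = sum(1 for w in words if syllable_count(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))


def review_features(text):
    """Collect the per-review features mentioned in the abstract."""
    words = tokenize(text)
    return {
        "word_count": len(words),
        "gunning_fog": gunning_fog(text),
        "i_pronouns": sum(w in FIRST_SINGULAR for w in words),
        "we_pronouns": sum(w in FIRST_PLURAL for w in words),
    }


if __name__ == "__main__":
    print(review_features("Otel çok güzeldi. Biz çok memnun kaldık."))
```

Because Turkish is agglutinative, a three-syllable threshold flags far more words as complex than it does in English, so scores from this sketch are comparable only across reviews processed the same way, not with English Fog values.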
