Title: Detecting Questions in Online Communities: A Machine Learning Approach
Authors: Omarova, Dilnaz; Dael, Fares A.; Shayea, Ibraheem; Abitova, Gulnara; Sailaukhanov, Eldos
Type: Conference Object
Conference: 16th IEEE International Conference on Computational Intelligence and Communication Networks, CICN 2024, 22 December 2024 through 23 December 2024, Indore (conference code 206392)
Sponsors: IEEE MP Section; Institution of Electronics and Telecommunications Engineers (IETE)
Year of publication: 2024
Pages: 290-297
ISBN: 979-833150526-4
DOI: https://doi.org/10.1109/CICN63059.2024.10847570
Handle: https://hdl.handle.net/20.500.14034/2101
Scopus EID: 2-s2.0-85218039043
Date accessioned: 2025-03-20
Date available: 2025-03-20
Language: en
Access rights: info:eu-repo/semantics/closedAccess
Keywords: Machine Learning; Natural Language Processing; Sentiment Analysis; Word Embeddings

Abstract:
The proliferation of online forums and communities has greatly facilitated knowledge sharing and user support. It has also created a significant challenge: managing redundant, semantically similar questions. Traditional keyword-based methods have proven inadequate for this problem because natural language is inherently complex and the same idea can be expressed in many different ways. This study investigates three machine learning algorithms for detecting semantically similar questions: Logistic Regression, Random Forest, and Gradient Boosting (XGBoost). The models are evaluated on the Quora Question Pairs dataset using metrics including accuracy, precision, recall, and F1-score. Beyond this comparative analysis, the research proposes a framework for improving information retrieval and user experience in online forums. The study also highlights the potential for future integration of deep learning models and more advanced semantic understanding techniques to further improve the detection of semantically similar questions. © 2024 IEEE.
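The record includes neither evaluation code nor results. The following is a minimal sketch of how the four metrics named in the abstract can be computed from a classifier's predictions on question pairs. It assumes binary labels coded 1 = duplicate pair and 0 = non-duplicate pair, which is the coding of the Quora Question Pairs `is_duplicate` field. The function name `binary_metrics`, the toy labels, and the convention of returning 0.0 when a denominator is zero are illustrative assumptions. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Hypothetical helper: accuracy, plus precision/recall/F1 for class 1 (duplicate)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    if any(label not in (0, 1) for label in list(y_true) + list(y_pred)):
        raise ValueError("labels must be 0 (not duplicate) or 1 (duplicate)")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # duplicates correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # distinct pairs wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # duplicates missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # distinct pairs correctly kept

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    print(binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

For the toy labels above there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives. That gives an accuracy of 4/6 and a precision, recall, and F1 of 2/3 each. Precision and recall are computed for the duplicate class because that is the class a forum would act on, for example by merging or linking questions.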