Performance of Naïve Bayes, SVM, and IndoBERT in Sentiment Analysis of IndiHome Twitter Data with Imbalanced-Data Handling Strategies
DOI: https://doi.org/10.14421/fourier.2025.141.29-44
Keywords: Sentiment Analysis, Stratified 5-Fold Cross Validation, SMOTE, Class Weighting
Abstract
This study compares three sentiment analysis approaches applied to Twitter data about the IndiHome service: Naïve Bayes, Support Vector Machine (SVM), and Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT). It is motivated by the limitations of traditional models in recognizing positive opinions and by the class imbalance that often arises in social-media-based analysis. The data consist of 7393 tweets (January 2019–August 2024), manually labeled as positive or negative. The models are evaluated with stratified 5-fold cross-validation and on test data, with two imbalance-handling techniques applied: Synthetic Minority Oversampling Technique (SMOTE) and class weighting. IndoBERT performs best, with an accuracy of 0.96 and a macro F1-score of 0.95 without any imbalance handling; SVM reaches an accuracy of 0.95 with class weighting, and Naïve Bayes improves from an accuracy of 0.89 to 0.92 after SMOTE. Sentiment trend analysis shows that negative opinions dominate, particularly regarding service speed and stability. These findings confirm that IndoBERT is more effective at capturing the context of Indonesian-language text, while imbalance-handling techniques remain relevant for improving the performance of traditional models.
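The abstract reports both accuracy and macro F1-score and names class weighting as one of the imbalance-handling strategies. The sketch below illustrates, on hypothetical toy labels (not the study's data), why the two metrics can diverge under class imbalance and how a common "balanced" weighting heuristic is computed. The function names are illustrative, and the weighting formula (n_samples / (n_classes * class_count)) is an assumption; the abstract does not state which scheme the authors used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1-score treating `cls` as the positive class (0.0 if no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy data: 8 negative and 2 positive tweets.
y_true = ["neg"] * 8 + ["pos"] * 2
# A degenerate classifier that always predicts the majority class.
y_majority = ["neg"] * 10

weights = balanced_class_weights(y_true)
acc = accuracy(y_true, y_majority)
mf1 = macro_f1(y_true, y_majority)

print("class weights:", weights)
print("accuracy:", round(acc, 3))
print("macro F1:", round(mf1, 3))
```

In this toy case the majority-only classifier still scores a high accuracy while its macro F1 is much lower, because the minority class contributes an F1 of zero. The larger weight assigned to the minority class shows how class weighting makes errors on that class count more during training.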
License
Copyright (c) 2025 Adinda Anas Qolbu, Nina Fitriyati, Nur Inayah

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.












