Performance of Naïve Bayes, SVM, and IndoBERT in Sentiment Analysis of IndiHome Twitter Data with Imbalanced-Data Handling Strategies

Adinda Anas Qolbu; Nina Fitriyati; Nur Inayah

doi:10.14421/fourier.2025.141.29-44

Authors

Adinda Anas Qolbu, UIN Syarif Hidayatullah Jakarta
Nina Fitriyati, UIN Syarif Hidayatullah Jakarta
Nur Inayah, UIN Syarif Hidayatullah Jakarta

Keywords:

Sentiment Analysis, Stratified 5-Fold Cross Validation, SMOTE, Class Weighting

Abstract

This study compares the performance of three sentiment analysis approaches, namely Naïve Bayes, Support Vector Machine (SVM), and Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT), on Twitter data about the IndiHome service. It is motivated by the limitations of traditional models in recognizing positive opinions and by the data imbalance that often arises in social-media-based analysis. The data consist of 7,393 tweets (January 2019–August 2024) manually labeled as positive or negative. The models were evaluated using stratified 10-fold cross-validation and test data, with two imbalance-handling techniques applied: the Synthetic Minority Oversampling Technique (SMOTE) and class weighting. IndoBERT performed best, with an accuracy of 0.96 and a macro F1-score of 0.95 without special handling, while SVM reached an accuracy of 0.95 with class weighting, and the accuracy of Naïve Bayes improved from 0.89 to 0.92 after SMOTE. Sentiment trend analysis indicates that negative opinions dominate, mainly concerning the speed and stability of the service. These findings confirm that IndoBERT is more effective at understanding Indonesian-language context, while imbalance-handling techniques remain relevant for improving the performance of traditional models. The results provide an empirical basis for choosing sentiment analysis models that are more accurate and better adapted to Indonesian, and that can help improve service quality.
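To make the evaluation terms in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of three of the ideas it names: class weighting, stratified fold assignment, and the macro F1-score. The function names, the "balanced" weighting heuristic (n_samples / (n_classes * class_count)), the unshuffled round-robin fold assignment, and the toy labels are all assumptions made for illustration; SMOTE is not shown.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def stratified_folds(labels, k):
    """Deal the indices of each class round-robin into k folds so that
    every fold keeps roughly the overall class ratio (no shuffling here)."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels: 8 negative and 2 positive tweets.
    labels = ["neg"] * 8 + ["pos"] * 2
    always_negative = ["neg"] * len(labels)

    accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
    print("class weights:", balanced_class_weights(labels))
    print("fold sizes:", [len(f) for f in stratified_folds(labels, 2)])
    print("accuracy:", accuracy)
    print("macro F1:", round(macro_f1(labels, always_negative), 3))
```

On these toy labels, a classifier that always predicts the majority class is right 8 times out of 10, yet its macro F1-score is much lower because the F1 of the minority class is zero. This is the reason macro F1 is commonly reported alongside accuracy when classes are imbalanced.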
