Performance Evaluation of Machine Learning Naïve Bayes Algorithms for Network Traffic Classification
DOI:
https://doi.org/10.47577/technium.v13i.9473
Keywords:
Network traffic classification, Machine learning, Naïve Bayes, Bernoulli, Multinomial, Gaussian, Performance evaluation.
Abstract
Network Traffic Classification (NTC) is important for network tasks such as management and malware detection. Artificial Intelligence (AI), including Machine Learning (ML) and Deep Learning (DL), plays an important role today because it can be applied across very different fields and to complex problems. ML in particular provides tools for key networking areas such as traffic management and security. Performance evaluation is an important aspect of any system. This paper presents a method for NTC that uses three variants of the ML Naïve Bayes (NB) algorithm (Bernoulli, Multinomial and Gaussian) to classify captured network traffic from two datasets, and it evaluates and compares the performance of these variants. The first dataset is the VPN-nonVPN (ISCXVPN2016) dataset; the second is a packet capture of regular Wi-Fi traffic flows for video browsing on the web. The variants were compared in terms of F1-score, accuracy and processing time. On average, Bernoulli NB achieved 93.05% accuracy with a processing time of 742 ms, Multinomial NB achieved 98.78% accuracy with 78.3 ms, and Gaussian NB achieved 69.14% accuracy with 46.85 ms.
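The three measures named in the abstract (accuracy, F1-score and processing time) can be illustrated with a minimal, library-free sketch. This is not the paper's code: the helper names (`accuracy`, `macro_f1`, `evaluate`), the toy threshold classifier and the toy data are all hypothetical, and macro averaging of F1 is an assumption, since the abstract does not say which averaging was used.

```python
import time
from typing import Callable, Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(
    predict: Callable[[List[float]], int],
    X: Sequence[List[float]],
    y_true: Sequence[int],
) -> Dict[str, float]:
    """Time the prediction pass and score it with accuracy and macro F1."""
    start = time.perf_counter()
    y_pred = [predict(x) for x in X]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {
        "accuracy": accuracy(y_true, y_pred),
        "f1": macro_f1(y_true, y_pred),
        "time_ms": elapsed_ms,
    }


def toy_predict(x: List[float]) -> int:
    """Hypothetical stand-in classifier: thresholds the first feature."""
    return 1 if x[0] > 0.5 else 0


# Toy data (invented for illustration only, not from either dataset)
X = [[0.1], [0.4], [0.6], [0.9], [0.3], [0.8]]
y = [0, 0, 1, 1, 1, 1]

result = evaluate(toy_predict, X, y)
print(result)
```

Any trained classifier's prediction function could be passed in place of `toy_predict`, so the same harness would score Bernoulli, Multinomial and Gaussian NB models on equal terms. Note that this sketch times only prediction; whether the paper's processing time also includes training is not stated in the abstract.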
License
Copyright (c) 2023 Afrah Salman Dawood
This work is licensed under a Creative Commons Attribution 4.0 International License.