Adaptive SMOTE for Extreme Class Imbalance in Network Intrusion Detection on CIC-IDS2017
Oluwapelunmi Bankole *
Department of Management Information Systems, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, Nevada 89154, United States.
*Author to whom correspondence should be addressed.
Abstract
This study addresses extreme class imbalance in network intrusion detection systems, where minority attack classes can constitute less than 0.001% of network traffic. Using the CIC-IDS2017 dataset, which has an initial imbalance ratio of 191,678:1, we developed an adaptive SMOTE-based framework that applies tiered, class-dependent oversampling (1%, 0.8%, or 0.5% of the majority class, depending on minority-class rarity). The preprocessing pipeline combines data cleaning, XGBoost-based recursive feature elimination (reducing 78 features to 50), min-max normalization, and targeted synthetic oversampling. It reduces the imbalance ratio by 99.9% (from 191,678:1 to 204:1) while adding only 4.7% more training samples. A Random Forest classifier achieved 99.79% overall accuracy, with 97–99% recall for network-layer attacks (DoS variants, SSH-Patator, FTP-Patator, Port Scan) and 96–99% precision across all successfully detected classes. The complete pipeline (preprocessing, feature selection, sampling, training, evaluation) executes in 14.2 minutes on standard hardware, making it practical for periodic retraining in cloud-based IDS deployments. However, ultra-rare application-layer attacks remain undetected (XSS: 0% recall; SQL Injection: 0% recall), which indicates that synthetic oversampling alone cannot address attacks that require application-layer feature engineering beyond flow-level statistics. This work demonstrates that adaptive, class-dependent SMOTE strategies can handle extreme imbalance for attacks with discriminative flow-based signatures, and it delineates the cases that require complementary approaches.
Keywords: Network intrusion detection, class imbalance, SMOTE, deep learning, feature selection, CIC-IDS2017, minority class detection, adaptive sampling, cybersecurity, cloud computing security
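The tiered oversampling idea summarized in the abstract can be illustrated with a minimal sketch. Only the three fractions (1%, 0.8%, 0.5%) come from the abstract. The rarity cut-offs, the assignment of fractions to rarity bands, the reading of each fraction as a target class size, and the names `TIERS` and `tiered_targets` are assumptions made here for illustration and are not the paper's actual settings.

```python
from collections import Counter

# (cut-off, fraction) pairs: a class whose size relative to the majority
# falls below `cut-off` is raised to `fraction` of the majority size.
# The fractions are from the abstract; the cut-offs and their pairing
# with the fractions are hypothetical.
TIERS = [
    (0.001, 0.005),  # rarest band    -> 0.5% of majority
    (0.005, 0.008),  # middle band    -> 0.8% of majority
    (0.010, 0.010),  # least-rare band -> 1% of majority
]


def tiered_targets(labels, tiers=TIERS):
    """Return {class: target_count} for classes that need oversampling."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    targets = {}
    for label, count in counts.items():
        if label == majority_label:
            continue
        share = count / majority_count
        for cutoff, fraction in tiers:
            if share < cutoff:
                target = int(round(fraction * majority_count))
                # Only oversample; never shrink a class.
                if target > count:
                    targets[label] = target
                break  # first matching tier wins
    return targets


# Toy label distribution (not CIC-IDS2017 counts).
labels = (["BENIGN"] * 100_000 + ["A"] * 50 + ["B"] * 300
          + ["C"] * 900 + ["D"] * 5_000)
targets = tiered_targets(labels)
```

A dictionary of this shape can be passed as the `sampling_strategy` argument of `SMOTE` in imbalanced-learn, provided each listed class has more samples than the `k_neighbors` setting. Classes above the largest cut-off (class `D` in the toy data) are left unchanged.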