Comparative Analysis of Deep LSTM Architectures with Multi-Head Attention for Enhanced IoT Intrusion Detection: A Memory-Efficient Approach
Oluwapelunmi Bankole *
Department of Management, Entrepreneurship & Technology, Lee Business School, University of Nevada, Las Vegas, 4505 S Maryland Pkwy, Las Vegas, NV 89154, USA.
*Author to whom correspondence should be addressed.
Abstract
The proliferation of Internet of Things (IoT) devices has introduced significant cybersecurity challenges, necessitating robust intrusion detection systems capable of identifying sophisticated attacks in resource-constrained environments. This study presents a comparative analysis of eight Long Short-Term Memory (LSTM) architectural variants for network intrusion detection, addressing a gap identified in recent literature: the lack of systematic evaluation of deep LSTM architectures. Using the CICIDS2017 benchmark dataset, we evaluated Vanilla LSTM, Bidirectional LSTM, Stacked LSTM (shallow and deep), Stacked Bidirectional LSTM, LSTM with Self-Attention, LSTM with Multi-Head Attention, and a novel hybrid CNN-LSTM-Attention architecture. Our experimental results show that the LSTM with Multi-Head Attention architecture achieved the best overall performance, with 98.41% accuracy, 98.51% precision, 98.40% recall, 98.44% F1-score, and 99.67% ROC-AUC, using only 14,290 parameters. Notably, our memory-efficient implementation trained all architectures within a 1 GB RAM constraint using 400,000 samples, making the approach viable for edge computing scenarios. The novel hybrid CNN-LSTM-Attention architecture, which combines convolutional local-pattern extraction, bidirectional recurrent temporal modeling, and self-attention-based dynamic feature weighting, achieved 96.37% accuracy and the highest ROC-AUC of 99.75% with only 7,010 parameters, indicating strong discriminative capability. All evaluations were performed on a held-out stratified test set, with SMOTE applied exclusively to the training partition, so that performance is assessed under the natural class imbalance.
These results provide practical, empirically grounded deployment guidelines for IoT intrusion detection across resource availability scenarios ranging from edge devices (1 GB RAM) to cloud environments, with direct relevance to emerging regulatory frameworks that emphasize efficient and interpretable AI-powered security systems.
Keywords: Intrusion detection systems, Long Short-Term Memory, LSTM, deep learning, attention mechanisms, multi-head attention, Internet of Things, IoT security, network security, CICIDS2017, memory-efficient computing, edge computing, cybersecurity, binary classification, neural networks
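The evaluation protocol summarized in the abstract (a stratified held-out test set, with oversampling applied only to the training partition) can be illustrated with a minimal sketch. It is not the study's implementation: the toy two-feature data, the helper names `stratified_split` and `smote_like`, the 80/20 split, and the simplified pure-Python interpolation standing in for a full SMOTE library are all assumptions made for illustration.

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), preserving each class's proportion."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx

def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not a library call):
    interpolate between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy data: 90 benign (0) and 10 attack (1) flows, 2 features each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)] + \
    [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

train_idx, test_idx = stratified_split(y, test_frac=0.2, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Oversample the minority class in the training partition only;
# the test partition keeps its natural class imbalance.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_needed = y_train.count(0) - y_train.count(1)
X_train_bal = X_train + smote_like(minority, n_needed, k=3, seed=2)
y_train_bal = y_train + [1] * n_needed

train_counts = Counter(y_train_bal)
test_counts = Counter(y_test)
```

Splitting before oversampling matters because synthetic points are interpolated from real minority samples; generating them before the split would let information about test samples leak into training and inflate the reported metrics.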