Adversarial Threats to AI-Driven Systems: Exploring the Attack Surface of Machine Learning Models and Countermeasures

Abayomi Titilola Olutimehin *

Royal Holloway University of London, Egham, Surrey, United Kingdom.

Adekunbi Justina Ajayi

Obafemi Awolowo University, PMB 013, Ile-Ife, Osun State, Nigeria.

Olufunke Cynthia Metibemu

Ekiti State University, Iworoko Road, PMB 5363, Ado-Ekiti, Ekiti State, Nigeria.

Adebayo Yusuf Balogun

University of Tampa, 401 W Kennedy Blvd, Tampa, FL 33606, United States of America.

Tunbosun Oyewale Oladoyinbo

University of Maryland Global Campus, 3501 University Blvd E, Adelphi, MD 20783, United States.

Oluwaseun Oladeji Olaniyi

University of the Cumberlands, 104 Maple Drive, Williamsburg, KY 40769, United States of America.

*Author to whom correspondence should be addressed.


Abstract

Adversarial attacks pose a critical threat to the reliability of AI-driven systems by exploiting vulnerabilities at the data, model, and deployment levels. This study presents a quantitative analysis using the CIFAR-10 Adversarial Examples Dataset from IBM’s Adversarial Robustness Toolbox and the MITRE ATLAS AI Model Vulnerabilities Dataset to assess attack success rates and attack surface exposure. A convolutional neural network (CNN) classifier was evaluated against Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and Carlini & Wagner (C&W) attacks, which yielded misclassification rates of 42.2%, 65.5%, and 86.8%, respectively. A Chi-Square Goodness-of-Fit Test (p < 0.001) confirmed that model-level vulnerabilities were disproportionately targeted (53.6%). These vulnerabilities pose severe risks across real-world AI applications. In cybersecurity, adversarial perturbations compromise intrusion detection systems, malware classification models, and spam filters, allowing cybercriminals to bypass AI-driven defenses. In autonomous vehicles, subtle adversarial modifications to traffic signs and road patterns can mislead AI-based navigation, increasing the likelihood of accidents. In financial systems, adversarial attacks deceive fraud detection models, enabling unauthorized transactions and financial fraud. Countermeasure evaluation showed that adversarial training provided the highest robustness gain (23.29%), while detection algorithms were the least effective (15.34%). These findings emphasize the need for hybrid AI security frameworks that combine adversarial training with real-time anomaly detection, and for standardized evaluation benchmarks for AI security testing that ensure resilience across industries, particularly in high-stakes AI applications.
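For readers unfamiliar with the first attack named in the abstract, the following is a minimal sketch of the FGSM idea: move each input feature by a fixed step epsilon in the direction of the sign of the loss gradient. It is not the study's implementation. The study evaluated a CNN on CIFAR-10; this sketch assumes a tiny logistic-regression model instead, and the weights, input, and epsilon below are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model (stand-in for the CNN)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x.

    For a logistic model this has the closed form (p - y) * w.
    """
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """One-step FGSM: x_adv = clip(x + epsilon * sign(grad_x loss), 0, 1)."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    x_adv = [xi + epsilon * sign(g) for xi, g in zip(x, grad)]
    # Keep features in the valid [0, 1] range, as for normalized pixels.
    return [min(1.0, max(0.0, v)) for v in x_adv]


# Hypothetical model parameters and input (not taken from the study).
w = [2.0, -3.0, 1.5]
b = -0.2
x = [0.6, 0.2, 0.5]
y = 1  # true label

x_adv = fgsm(w, b, x, y, epsilon=0.2)
clean_label = int(predict_proba(w, b, x) > 0.5)
adv_label = int(predict_proba(w, b, x_adv) > 0.5)
print("clean prediction:", clean_label, "adversarial prediction:", adv_label)
```

With these illustrative values, a perturbation of at most 0.2 per feature is enough to change the predicted class. PGD can be viewed as repeating this step several times with a smaller step size and projecting back into the epsilon-ball after each step.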

Keywords: Adversarial attacks, AI security, model vulnerabilities, adversarial training, machine learning defenses


How to Cite

Olutimehin, Abayomi Titilola, Adekunbi Justina Ajayi, Olufunke Cynthia Metibemu, Adebayo Yusuf Balogun, Tunbosun Oyewale Oladoyinbo, and Oluwaseun Oladeji Olaniyi. 2025. “Adversarial Threats to AI-Driven Systems: Exploring the Attack Surface of Machine Learning Models and Countermeasures”. Journal of Engineering Research and Reports 27 (2):341-62. https://doi.org/10.9734/jerr/2025/v27i21413.
