YOLOv11-GA: Global Attention and Adaptive Loss for Sonar Target Detection
Huiwen Zhang
Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China.
Xiaoxia Yang *
Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China.
Na Gao
Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China.
Ning Wang
Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China.
Cuicui Zhang *
School of Marine Science and Technology, Tianjin University, Tianjin, China.
*Author to whom correspondence should be addressed.
Abstract
Sonar image target detection is valuable in fields such as marine resource exploration, underwater object search, and rescue operations. However, sonar images are inherently corrupted by complex background noise, and existing datasets are often class-imbalanced; as a result, detection models adapt poorly and perform suboptimally in sonar scenarios. To address these issues, this paper proposes YOLOv11-GA, an improved sonar target detection method based on YOLOv11, with two modifications. First, a Global Attention Mechanism (GAM) is inserted before the SPPF layer in the YOLOv11 backbone network to strengthen the model's focus on key features while suppressing non-key features. Second, an Adaptive Threshold Focal Loss (ATFL) function dynamically adjusts loss weights, reducing interference from easily classified samples and directing learning toward difficult-to-classify samples. The method was validated on two publicly available datasets, the Sonar Common Target Detection Dataset (SCTD) and the Forward-Looking Sonar Marine Debris Dataset (FLSMDD), through ablation studies and comparisons against multiple baseline and mainstream detectors. Experimental results show that YOLOv11-GA outperforms the baseline YOLOv11 by 2.4% in mAP@0.5 on SCTD and by 1.6% on FLSMDD, confirming its enhanced adaptability and detection capability in complex underwater environments. Furthermore, the model maintains an inference speed of 68.84 FPS, which satisfies the real-time requirements of practical sonar applications such as underwater navigation and monitoring systems.
Keywords: Sonar image target detection, Class imbalance, Global Attention Mechanism (GAM), Adaptive Threshold Focal Loss (ATFL)