Designing Interesting Opponents through Online Learning in Predator–Prey Environments

Gajjala Lilly Rani

Avanthi’s Scientific Technological and Research Academy, Hyderabad, Telangana, India.

Ankatwar Gajanan *

Government Degree College (Arts and Commerce), Adilabad, Telangana, India.

Alurwad Tripat Venkatreddy

Government Degree College, Nirmal, Telangana, India.

K. Krunal Yadav

Government Degree College (Arts and Commerce), Adilabad, Telangana, India.

Narote Preetham

Telangana Tribal Welfare Residential Degree College (Boys), Boath@Adilabad, Telangana, India.

*Author to whom correspondence should be addressed.


Abstract

Predator–prey environments are widely used benchmarks for multi-agent reinforcement learning (MARL) because they combine cooperation among predators with competition against prey. Many deployed predator opponents, however, rely on static or pre-trained policies that become predictable, reduce behavioural diversity, and limit long-term engagement. This study investigates how online learning can generate adaptive and interesting opponents that continuously challenge prey agents. We propose an online-learning framework that integrates reinforcement learning with dynamic opponent adaptation and opponent modelling in a discrete 20×20 grid world containing four coordinated predator agents and one evasive prey. Agents use a state representation comprising relative agent positions, Euclidean distances, velocity vectors, and historical actions. Predator learning combines temporal-difference updates (Q-learning) with PPO/MAPPO-style policy optimization for stability, while an opponent model is updated online to predict behaviours and support coordinated decision-making. Interestingness is quantified using behavioural diversity (entropy), novelty (distance between current and historical behaviours), and challenge, alongside standard performance indicators and confusion-matrix-based evaluation of action prediction (Chase, Surround, Ambush). Across 10,000 training episodes and multiple runs under identical conditions, the proposed online-learning predators achieved the highest cumulative rewards and converged faster than the random, scripted, and offline-RL baselines, attaining an 87% capture success rate (56 percentage points above the random baseline). Online learning also produced the greatest behavioural diversity (entropy = 2.21) while remaining strategically effective. Opponent modelling showed strong classification performance (85% Chase, 81% Surround, 82% Ambush; precision/recall/F1 = 0.83), and training yielded emergent cooperative strategies including coordinated trapping, dynamic flanking, ambush positioning, and adaptive pursuit. Overall, continuous online adaptation improves robustness and engagement by preventing behavioural stagnation, though it introduces computational overhead and potential instability in non-stationary MARL settings. Future work should address scalability and explore hierarchical, graph-based, transformer-based, and meta-learning extensions. The work was conducted entirely in simulation without human or animal data.
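As an illustration of the two interestingness measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that behavioural diversity is the Shannon entropy of the empirical action distribution and that novelty is the mean Euclidean distance from a current behaviour descriptor to its k nearest historical descriptors (a formulation borrowed from novelty search). The function names, the logarithm base, the parameter k, and the descriptor layout are all hypothetical; the abstract specifies none of them, so the sketch is not expected to reproduce the reported entropy of 2.21.

```python
import math
from collections import Counter


def action_entropy(actions, base=2.0):
    """Shannon entropy of the empirical action distribution.

    `base` is an assumption; the abstract does not state the log base.
    """
    total = len(actions)
    if total == 0:
        return 0.0
    counts = Counter(actions)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def novelty(current, archive, k=3):
    """Mean Euclidean distance from `current` to its k nearest archived
    behaviour descriptors (hypothetical fixed-length numeric tuples)."""
    if not archive:
        return 0.0
    nearest = sorted(math.dist(current, past) for past in archive)[:k]
    return sum(nearest) / len(nearest)


# Usage with made-up data: four equally frequent actions and a small archive.
episode_actions = ["up", "down", "left", "right"] * 5
diversity = action_entropy(episode_actions)
score = novelty((0.0, 0.0), [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0)], k=2)
```

Under these assumptions, a predator that always takes the same action scores zero diversity, and a behaviour that sits close to many archived behaviours scores low novelty.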

Keywords: Multi-agent reinforcement learning, online learning, predator–prey environment, adaptive opponents, opponent modelling, behavioural diversity, reinforcement learning


How to Cite

Rani, Gajjala Lilly, Ankatwar Gajanan, Alurwad Tripat Venkatreddy, K. Krunal Yadav, and Narote Preetham. 2026. “Designing Interesting Opponents through Online Learning in Predator–Prey Environments”. Journal of Engineering Research and Reports 28 (6):306-22. https://doi.org/10.9734/jerr/2026/v28i61929.
