Modern video games pose significant challenges for traditional automated testing algorithms, yet intensive testing is crucial to ensure game quality. To address these challenges, researchers designed gaming agents using Reinforcement Learning, Imitation Learning, or Large Language Models. However, these agents often neglect the diverse strategies employed by human players due to their different personalities, resulting in repetitive solutions in similar situations. Without mimicking varied gaming strategies, these agents struggle to trigger diverse in-game interactions or uncover edge cases.
In this paper, we present MIMIC, a novel framework that integrates diverse personality traits into gaming agents, enabling them to adopt different gaming strategies for similar situations. By mimicking different playstyles, MIMIC can achieve higher test coverage and richer in-game interactions across different games. It also outperforms state-of-the-art agents in Minecraft by achieving a higher task completion rate and providing more diverse solutions. These results highlight MIMIC's significant potential for effective game testing.
MIMIC consists of four components: the Planner, Action Summarizer, Action Executor, and Memory System.
The Planner is the core module, generating action plans from predefined personality traits and past experiences. These experiences are stored in the Memory System, with the Action Summarizer analyzing execution results to produce summaries. The Action Executor then translates the Planner’s output into in-game actions. In each action iteration, the Planner produces a plan, the Executor executes it, and the Summarizer records feedback as new memory to guide future planning. The following sections detail each component.
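The plan, execute, summarize cycle described above can be sketched as a simple loop. This is a minimal illustration rather than MIMIC's actual implementation: `MemorySystem`, `run_agent`, and the three stub functions are hypothetical names, and the stubs return fixed strings where the real components would call an LLM or the game.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemorySystem:
    """Holds summaries of past iterations (hypothetical, list-backed)."""
    entries: List[str] = field(default_factory=list)

    def add(self, summary: str) -> None:
        self.entries.append(summary)

    def recall(self) -> List[str]:
        # Return a copy so callers cannot mutate the stored memory.
        return list(self.entries)


def run_agent(
    personality: str,
    task: str,
    planner: Callable[[str, str, List[str]], str],
    executor: Callable[[str], str],
    summarizer: Callable[[str, str], str],
    memory: MemorySystem,
    iterations: int,
) -> List[str]:
    """Run the Planner -> Executor -> Summarizer -> Memory cycle."""
    plans = []
    for _ in range(iterations):
        # 1. Plan from the personality, the task, and past experience.
        plan = planner(personality, task, memory.recall())
        # 2. Translate the plan into in-game actions (stubbed here).
        result = executor(plan)
        # 3. Summarize the outcome and store it for future planning.
        memory.add(summarizer(plan, result))
        plans.append(plan)
    return plans


# Stub components (assumptions): stand-ins for the LLM and the game.
def stub_planner(personality: str, task: str, memories: List[str]) -> str:
    return f"[{personality}] plan {len(memories) + 1} for task: {task}"


def stub_executor(plan: str) -> str:
    return f"executed: {plan}"


def stub_summarizer(plan: str, result: str) -> str:
    return f"summary of ({plan}) -> ({result})"


memory = MemorySystem()
plans = run_agent(
    "cautious", "Obtain 1 diamond",
    stub_planner, stub_executor, stub_summarizer,
    memory, iterations=3,
)
print(plans)
```

Passing the components in as callables keeps the loop independent of any particular model or game interface, so each stub can be replaced separately.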
The task Obtain 1 diamond has long been regarded as a significant challenge in the community and is the focus task of the NeurIPS MineRL Competition. It requires completing at least 13 sub-goals in sequence, each with multiple valid variants, making it a long-horizon task that typically takes humans over ten minutes to solve. Players must also handle dynamic requirements posed by the game, such as hunger or safety, which further expand the solution space.
Existing gaming agents are optimized for effective task completion, which often results in homogeneous and repetitive behaviours. For example, in our evaluation, we observed that ODYSSEY, a state-of-the-art LLM-based agent, consistently followed a single optimized path to obtain a diamond regardless of variations in the environment and the events triggered in different runs. While these agents can achieve high task completion rates, their behaviours diverge from how human players play the game. Human players rarely pursue a task in a strictly optimized manner. Instead, they exhibit adaptive and diverse behaviours in response to spontaneous in-game events, shaped by their personality traits. For example, an aggressive player attempting to obtain a diamond may still fight creatures to collect dropped items, even when such actions do not directly benefit the primary task. As a result, existing gaming agents fail to emulate the diverse and adaptive behaviours of real players, which limits their ability to cover the wide range of unpredictable in-game scenarios and reduces their overall effectiveness for game testing.
This motivates us to propose MIMIC, an LLM-based agent framework that integrates personality profiles into the core planning process. Unlike existing agents that narrowly pursue optimal action sequences, MIMIC leverages recent advances in LLMs capable of simulating consistent personality traits to model gameplay behaviours that more closely resemble those of real human players. By conditioning its dynamic Planner on distinct personality prompts, MIMIC generates strategies that are task-oriented and driven by personality-specific tendencies. This enables it to pursue goals while continuously responding to in-game events in a manner consistent with a player of that personality, leading to more diverse, realistic, and interaction-rich testing trajectories.
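As a rough illustration of conditioning a planner on a personality prompt, the sketch below assembles a planning prompt from a personality profile, a task, and the current observation. The profile wording, the `PERSONALITY_PROFILES` table, and `build_planner_prompt` are all assumptions made for this example; they are not the prompts MIMIC uses.

```python
from typing import Dict

# Hypothetical profile descriptions, written for illustration only.
PERSONALITY_PROFILES: Dict[str, str] = {
    "aggressive": "You enjoy confrontation and readily fight creatures you encounter.",
    "cautious": "You avoid danger and prepare carefully before taking risks.",
}


def build_planner_prompt(personality: str, task: str, observation: str) -> str:
    """Compose a planning prompt conditioned on a personality profile."""
    profile = PERSONALITY_PROFILES[personality]
    return "\n".join([
        f"Personality: {profile}",
        f"Task: {task}",
        f"Current observation: {observation}",
        "Propose the next action, staying consistent with the personality.",
    ])


# Same task and observation, different personality, different prompt.
observation = "A zombie is nearby; the cave entrance is dark."
print(build_planner_prompt("aggressive", "Obtain 1 diamond", observation))
print(build_planner_prompt("cautious", "Obtain 1 diamond", observation))
```

Because the task and observation are held fixed, any difference in the resulting plans can be attributed to the personality section of the prompt.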
For example, while completing the Obtain 1 diamond task, our aggressive agent dedicated 21.52% of its actions to combat, frequently upgrading armour and engaging a variety of creatures. In contrast, the cautious agent avoided combat entirely and prioritized safety by crafting torches before mining. Meanwhile, the adrenaline-seeking agent actively crafted swords and explored high-risk areas to encounter enemies, reflecting a strong preference for challenge-oriented interactions. These results demonstrate that, by integrating personality into the gaming agent, MIMIC generates meaningful actions in response to various environments according to the specified personality, driving exploration towards more diverse scenarios.
Furthermore, human decision-making is influenced not only by personality but also by experience and personal preferences. To model this, we introduce a Memory System that records past actions and outcomes. During planning, the system retrieves relevant and preferred memories, filtered by personality alignment, to help guide the agent's decisions in a consistent, human-like manner.
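One way to picture personality-filtered retrieval is the sketch below. It assumes each memory carries coarse tags, keeps only the memories whose tags overlap a personality's preferred tags, and ranks the rest by word overlap with the query. The tag table, the `MemoryRecord` fields, and the overlap score are placeholders chosen for this example, not the retrieval method MIMIC implements.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class MemoryRecord:
    action: str            # what the agent did
    outcome: str           # summarized result of the action
    tags: FrozenSet[str]   # coarse labels, e.g. {"combat"}


# Hypothetical mapping from personality to the memory tags it prefers.
PREFERRED_TAGS: Dict[str, Set[str]] = {
    "aggressive": {"combat", "armour"},
    "cautious": {"safety", "lighting"},
}


def relevance(record: MemoryRecord, query: str) -> int:
    """Placeholder relevance: number of words shared with the query."""
    query_words = set(query.lower().split())
    record_words = set(f"{record.action} {record.outcome}".lower().split())
    return len(query_words & record_words)


def retrieve(
    memories: List[MemoryRecord], query: str, personality: str, k: int = 2
) -> List[MemoryRecord]:
    """Keep personality-aligned memories, then rank them by relevance."""
    preferred = PREFERRED_TAGS.get(personality, set())
    aligned = [m for m in memories if m.tags & preferred]
    ranked = sorted(aligned, key=lambda m: relevance(m, query), reverse=True)
    return ranked[:k]


memories = [
    MemoryRecord("fight zombie near cave", "collected rotten flesh",
                 frozenset({"combat"})),
    MemoryRecord("craft torches before mining", "cave lit safely",
                 frozenset({"safety", "lighting"})),
    MemoryRecord("craft iron chestplate", "armour upgraded",
                 frozenset({"armour"})),
]

for record in retrieve(memories, "zombie near cave", "aggressive"):
    print(record.action, "->", record.outcome)
```

In this sketch an unknown personality retrieves nothing, because its preferred tag set is empty. A real system would more plausibly use embedding similarity for relevance and a softer alignment score instead of a hard filter.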
By combining personality-driven planning with memory-aware decision-making, MIMIC delivers a novel testing framework that mimics the behavioural diversity of human players and enables broader exploration of in-game scenarios.
To be published in the Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering, ASE'25
@inproceedings{MIMIC_YIFEI_ASE_2025,
author = {Chen, Yifei and Habchi, Sarra and Wei, Lili},
title = {MIMIC: Integrating Diverse Personality Traits for Better Game Testing Using Large Language Model},
year = {2025},
isbn = {},
publisher = {},
address = {},
url = {https://doi.org/10.48550/arXiv.2510.01635},
doi = {10.48550/arXiv.2510.01635},
abstract = {Modern video games pose significant challenges for traditional automated testing algorithms, yet intensive testing is crucial to ensure game quality. To address these challenges, researchers designed gaming agents using Reinforcement Learning, Imitation Learning, or Large Language Models. However, these agents often neglect the diverse strategies employed by human players due to their different personalities, resulting in repetitive solutions in similar situations. Without mimicking varied gaming strategies, these agents struggle to trigger diverse in-game interactions or uncover edge cases. In this paper, we present MIMIC, a novel framework that integrates diverse personality traits into gaming agents, enabling them to adopt different gaming strategies for similar situations. By mimicking different playstyles, MIMIC can achieve higher test coverage and richer in-game interactions across different games. It also outperforms state-of-the-art agents in Minecraft by achieving a higher task completion rate and providing more diverse solutions. These results highlight MIMIC's significant potential for effective game testing.},
booktitle = {Proceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering},
pages = {},
numpages = {13},
location = {Seoul, South Korea},
series = {ASE '25}
}