Mikołaj Koziarkiewicz
intro splash
Figure 1. Source: SD 2.1+own work.

Spring has sprung in full force in the Northern hemisphere, so it’s time to dust off the old "submit publication" button! In this entry, we’ll look into ways of extracting training data from computer-generated data sources without breaking too much of a sweat. Specifically, we’ll introduce the data source we’ll work on, and define a problem statement. This will form a basis for later blog entries that will tackle various aspects of said problem. Read on!

Introduction

One of the interesting aspects of our times is the increasing importance of data derived from "human-made" sources. Early automation was almost always grounded in physical reality. As time went by, systems built upon data delivered by other systems became more worthwhile: analysis of social media trends, electronic sales projections, and so on. And, well, there’s also a lot built on top of data originating from video games. Anti-cheat systems, usage analytics – used both for objectively valuable insights, like improving accessibility, and for less laudable goals, like "optimizing" microtransactions. With the normalization of gaming as a pastime for all ages, there’s a gold mine of opportunity here. And we’ll take some advantage of it; but first, a bit more about our subject.

The game in question is MechWarrior Online, subsequently referred to as MWO. As the name suggests, it is a multiplayer-only affair set in the Battletech/MechWarrior universe. It offers several game modes, virtually all of them centered around two teams fighting each other to destruction while balancing that against auxiliary objectives.

intro pgi promo screenshot 1
Figure 2. Promotional screenshot of MechWarrior Online. An Ebon Jaguar heavy mech engaging a target (possibly an armless Bushwacker) with support weapons. Source: Piranha Games.

Everyone pilots a Battlemech (no combined arms here): a customizable, heavily armed, ludicrously armored, single-seat, bipedal[1] combat vehicle. While it sounds like a First-Person Shooter with some sci-fi bling thrown on, that couldn’t be further from the truth. Mechs have multiple, independently destroyable components, each potentially housing multiple armaments and pieces of equipment. Players need to manage ammunition, heat generated by their own (or hostile[2]) weapon fire, damage distribution, terrain use, team coordination, and so on.

The general vibe feels less like a squad-based FPS, and more like something between naval combat and simultaneously controlling an armored platoon. Mad aiming skillz are much less important here than situational awareness, forethought, planning, and team coordination. Not only that, but the mechs themselves are – as previously stated – highly customizable, to the point that a significant amount of a player’s time is spent tweaking and trying out different weapon/equipment configurations. All this coalesces into a unique experience, and so a unique data landscape to work on.

Possible premises and their choice

Having explained the circumstances we have on our hands, let’s see what we can do with the source material. Since many players occasionally record their games for later analysis (and some for streaming), the original idea was to create a "virtual coach" that would, based on said recordings, call out possible improvements in the player’s style.

Positioning (with respect to friendlies and likely enemy placement) is probably the most important learnable skill in MWO – but the full data needed for it (known friendly and enemy positions at a given point in time) is difficult to extract from existing footage alone. Not only that, developing the model itself would be decidedly non-trivial. In other words – assisting positioning in MWO is an intriguing challenge, but complex enough to warrant a whole separate series.

So – at least for this blog series – let’s try something more manageable: situational awareness. With all the aspects of managing a mech occupying the player’s mental bandwidth, slipping up and being completely oblivious to a hostile mech[3] running through the field of vision is surprisingly common. Such mistakes can easily prove fatal, as the overlooked opponent can get behind the player’s mech and start tearing it apart. Moreover, the initial situation is often an error on the opponent’s part, and would be a prime opportunity to engage.

Having contextualized our circumstances, we now need a problem statement with specific requirements and goal conditions. Here it is:

Develop a model that detects enemies in view that could have been targeted, but weren’t, and marks them on the footage. Bonus goal: mark situations where the player was not actively engaging any target, but could.

OK, looks good and relatively manageable. From our problem statement, we can work out that we need some way of determining the following (a rough sketch of how this could be captured per frame comes right after the list):

  1. the positions of mechs in a given frame of the footage;

  2. whether a given mech is friendly or hostile;

  3. whether a given hostile mech is "non-targeted";

  4. whether the player is in a "non-engaging" state.
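
To make the above a bit more concrete, here is a minimal, purely illustrative Python sketch of a per-frame record that such a pipeline could produce – every name and field here is hypothetical, not a commitment to a final design:

    # Illustrative only: a per-frame record matching points 1-4 above.
    from dataclasses import dataclass, field

    @dataclass
    class MechSighting:
        bbox: tuple[int, int, int, int]  # point 1: (x, y, width, height) in frame coordinates
        hostile: bool                    # point 2: friend or foe
        targeted: bool                   # point 3: is this mech currently marked by the player?

    @dataclass
    class FrameAnnotation:
        frame_index: int
        sightings: list[MechSighting] = field(default_factory=list)
        player_engaging: bool = False    # point 4: is the player actively engaging a target?

With records like these in hand, points 3 and 4 reduce to simple queries over the annotations.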

We can also declare a couple of non-functional requirements:

  • our solution does not have to be especially performant – we’re operating on pre-recorded footage;

  • our solution should favor recall over precision – we’re fine if, e.g., a friendly mech is falsely marked as a "non-targeted" hostile mech, as that should be apparent from the context of the footage, and the recommendation simply discarded.

Now, we’ll take a brief look at the game’s UI to determine what, exactly, we want to train.

Identifying screen UI elements relevant for machine learning model training

Let’s examine some screenshots, enhanced with context markings. Examples are demonstrated in the slideshow below:

  1. Targeting reticles: yes, mechs have several of them, corresponding to different weapon types and their hardpoint locations.

  2. Unmarked hostile mech.

  3. Target designator on a marked, hostile mech: the player can mark at most one hostile mech at a time.

  4. Targeted mech’s scan result, showing weapon and component status[4].

This is only a fraction of the information the game presents to the player on screen, but it is pretty much all that we need for our purposes.

In the end, we have two types of data to extract from the video’s frames:

  • UI-derived information, such as weapon firing state,

  • detectable objects.

While the former is extractable using simple image manipulation and "classic" computer vision techniques, this is not so with detectable objects, i.e. the mechs. For that, we need to train some sort of object detection model.
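
To give a feel for the "classic" side, a UI-derived bit of state can often be read out by template-matching a known HUD element inside a fixed screen region. The sketch below (using OpenCV) is only an assumption-laden illustration: the region coordinates, the template, and the threshold are placeholders to be pinned down in later entries.

    # A minimal sketch, assuming the HUD element of interest sits in a known,
    # fixed screen region. Region, template and threshold are placeholders.
    import cv2

    def ui_element_present(frame_gray, template_gray, region, threshold=0.8):
        """Check whether a HUD template appears in a fixed (x, y, w, h) region."""
        x, y, w, h = region
        crop = frame_gray[y:y + h, x:x + w]
        result = cv2.matchTemplate(crop, template_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, _ = cv2.minMaxLoc(result)
        return max_val >= threshold

This kind of approach is viable for HUD elements precisely because they sit at (mostly) fixed screen positions and a constant scale – neither of which holds for the mechs themselves.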

We could go over each recording and meticulously mark each and every mech. But who has the time (or money to hire annotators) for that?

We might consider "traditional" motion detection techniques, used widely in consumer IP cameras (and explained in a myriad of online tutorials), but that option also falls flat. Why? Because both the objects and the camera are moving – sometimes quite vigorously. So that’s one possible free lunch out of reach. We will, however, later look into exploiting research on motion detection from moving cameras – but that’s for another entry.
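
For illustration, the IP-camera-style approach mentioned above boils down to background subtraction over consecutive frames, along the lines of the (hypothetical) sketch below. On first-person footage where the camera pans, twists, and shakes, nearly every pixel ends up flagged as "motion", which is exactly why it is of little use here.

    # Standard static-camera motion detection, shown only to illustrate why it
    # fails on MWO footage. The file name is a placeholder.
    import cv2

    cap = cv2.VideoCapture("mwo_match.mp4")
    subtractor = cv2.createBackgroundSubtractorMOG2(history=120)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        motion_mask = subtractor.apply(frame)
        # With a fixed camera, motion_mask isolates moving objects;
        # with a constantly moving first-person camera, it lights up everywhere.
    cap.release()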

Now, take another look at the screenshot: see how the hostile mech is nicely marked[5]? And how about that nice bracketing of the actively targeted mech? Almost like a bounding box, right?

Well, it looks like we have a way out – we’ll try to automatically extract detection boxes by annotating targeted hostile mechs as objects to be detected. We can use that data as inputs for subsequent training of our "primary" detection models.
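
As a first approximation, that extraction could look something like the sketch below: mask the designator’s colour, clean up the mask, and take the joint bounding rectangle of whatever remains. The colour range (a guess at the HUD’s red band), the kernel size, and the assumption that nothing else on screen shares that exact colour are all unverified and will need revisiting later in the series.

    # Hypothetical sketch: derive a bounding box from the target designator
    # around the currently targeted hostile mech.
    import cv2
    import numpy as np

    def designator_bbox(frame_bgr):
        """Return (x, y, w, h) of the target designator, or None if not found."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        # Low-hue reds only; a second range near hue 180 may be needed in practice.
        mask = cv2.inRange(hsv, np.array([0, 150, 150]), np.array([10, 255, 255]))
        # Smooth out speckles before contour extraction.
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((9, 9), np.uint8))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        # Bounding rectangle of all designator fragments taken together.
        return cv2.boundingRect(np.vstack(contours))

In practice the red band also contains other HUD markings, so the real extraction will be more involved – but the general idea, turning HUD cues into bounding boxes, is the plan.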

intro screenshot processed
Figure 3. For completeness, a screenshot that is more representative of in-game situations – an evidently more complex scene. The view is unzoomed, meaning the full interface, including the quasi-diegetic cockpit displays, is visible. The player and two teammates are engaging an (unlucky) hostile mech that has just rounded a corner. The opponent’s readout shows damage to the torso components, with the central portion actively receiving fire. PII has been edited out.

Summary

In this entry:

  • we’ve introduced the use case we’re going to handle in the blog series initiated by this entry: extracting training data from video games, and putting it to use.

  • We’ve chosen MechWarrior Online (MWO) as our exemplary data source.

  • We’ve also examined the automation problem landscape in MWO:

    • we considered several potential premises, such as assisting with player positioning,

    • but, for the immediate future, settled on a more manageable problem: helping with situational awareness.

  • Finally, we also identified the screen elements that are relevant for our model training.

In the next several entries of the series, we’ll explore how to obtain training data for the defined task, by way of identifying and extracting the relevant UI elements. We’re going to use, and eventually compare, several different methods to accomplish this. And yes, that means we’ll actually start writing some code. Stay tuned!


1. Battletech fans will be quick to note that the last two points aren’t always the case in the setting, but they are in MWO, so you can safely ignore the exceptions (unless they’re in your rear arc).
2. …​or sometimes friendly…​
3. Especially a small and fast one that is equipped for stealth.
4. This one in particular is at full health.
5. That’s the weight class marker. The cross-hatched diamond signifies the "assault" class, the heaviest one in MWO.