Hi guys, my name is Badump, and this is my first developer log. I am the guy who writes the pathfinding, netcode, unit movement, and right now also targeting code; in general with high performance code. Expect dev logs on those in the near future.
Recently I have been working on the targeting system that we are going to use in Sanctuary.
It is also supposed to handle things like spatial category queries. (Meaning get all units of type x in range n around some location or similar).
First things first, what are the likely use cases, and performance concerns?
Well let's start with a very heavy use case, imagine 10,000 artillery units, 5,000 for both sides. All units are in range of one another.
Crudely drawn by Timevir to demonstrate the example. Shockingly didn't copy and paste 10000 times.
If we do brute force comparison, then that is 5000^2 or 25 million evaluations. Ouch. (That is not even a peak work case, that would be 30,000 vs 30,000, or 900 million.)
One solution is to spread out calculations over time, but as you know that is going to create some gameplay problems at higher unit counts, which is something we don't want to cause.
Also imagine your radar flickers from a power shortage, now all units need to reacquire targets, causing pretty big gameplay issues, as you lose tons of dps from your units not firing.
So what we need is something that does it instantly and does not wait for so long.
Also we need some other features. Examples are not targeting units that are going to be soon dead (overkill) and trying to target units that have the lowest angular distance, meaning you don't want your battleship guns to rotate 180 degrees if there are enemies in front of the guns.
The solution that I created is to take note of the fact that most of those unit ranges overlap, meaning we can combine them together into a big group, and then calculate for that group instead of calculating for single units. That in the best case scenario reduces our factor by 5000, from 25 million to 5000, or (900m -> 30k). Still that is not always possible so our factor is going to be higher than 1 but much much less than 5000. So it is between 100x-1000x faster, and the gap grows when more units are added. For those familiar with Big O Notation: O(n^2) vs O(n log n).
The next piece of the puzzle is targeting stuff that is close by angle. So we want to prefer enemies that are not going to cost much time to rotate towards. Solving that problem precisely is going to be costly, so we are going to have to create a small workaround.
We split all the units into a ring around the average center of the group itself, in some angle increment, for now 10* meaning 36 ring pieces.
Also units have an overkill factor, we generally try to avoid scheduling units that have more incoming damage than their current HP.
Alright, what must be done now is stated, now how? Here is an example of how the targeting works, showing one of our code-based test cases;
This code is what our modders would be using to target stuff, or some similar API in lua(which is not in scope for now, stay tuned). So first the target is instantiated with some fixed size, and player count must be known before the game. Unfortunately for now it generates around 50MB per player, which is the fastest solution, but might be bad in the case where there are 32 players. I am still working on a more efficient solution which I will present at a later date.
The targeting priorities are set, which summarized is what the unit wants to target. Right now it is just a test, but imagine 0 means land and 1 means air. Our unit doesn't target air.
For now it is simply matching if the target contains all the categories in any of the targeting unit list. (Now this code is burstable, don't be misled by the usage of the list, that is just a convenience API).
The other units are of type 0 ie. land. All is good. Different players also have different groups due to the possibility of multiple teams. In case there are only 2 teams, and locked teams, it is possible to send team id, instead of player id. So for now 2 teams with 32 players = all good, 32 players FFA (where there are 32 teams) is not so good. Then we finish adding all the targeting units, and perform some magic to make them into groups.
Afterwards we add units that are targets. For now some features are missing, like adding the unit HP (it is always the same). That will be developed later on.
The targets are added into the groups, and distributed to the correct ring segment (for reducing turret movement)
Then comes the last part, where we distribute the targets themselves onto the units, each group is processed in parallel (as each firing unit is pretty much independent, they have their own targets). That is done simply by having each unit iterate radially outwards from the closest ring segment. So if it starts on segment 0(meaning it has rotation of around 0 degrees), it searches for targets by looping like this: 0,1,-1,2 ,-2,3,-3 …
Currently the code as is, for a test case of 20x20 = 400 square of units, targeting 9100 units, finishes in around 800µs (1000µs = 1ms). This is my favorite scenario in gameplay, with artillery targeting a swarm of Tier 1 units, and the whole reason I spent the time to write this code. That is around 4 million comparisons.
There are currently some big flaws with our performance doctrine that we are still working on. The ring doesn't keep track of segments that are all about to die (overkill) so later units have to loop through earlier overkill units to find a target. Keeping track of it would reduce the time a lot.
And right now this code runs only on a single thread for now, because some parts are not completely thread safe.
Fixing those two issues would allow it to execute much faster.(But it is already 600µs).
Although it’s important to remember that normally you don't have so many units trying to find a target in the same frame, so this is a worst case scenario even with our simulated numbers.
Currently the largest problem is creating a memory efficient list that is also thread safe and well actually fast. Unfortunately, Unity nativeList is not thread-safe for expanding.
PS: When I initially wrote this post, I forgot to turn off debug mode! I got it to 40µs :).
PPS: As the tests were actually too fast for profiling, I increased the size of the scale until they were at the very least hitting 10ms for execution. But I will explain this in the next post.
Stay tuned for next time! Badump