What Good Balance Argument is Not

I am going to touch on the most sensitive topic of Starcraft. Balance.

People are already arguing on various forums about which race is the strongest or weakest, even though Legacy of the Void is no more than a month old. This piece is not about my take on the current balance, but rather about what is not a good argument on the state of balance.

To begin, I wrote about my view on the concept of balance more than two years ago, and it has not changed much.

Instead of discussing what makes a good argument on balance, I will give it a little twist and discuss what does not make a good argument on balance. I shamelessly take this idea from a famous paper named “What Theory is Not” that was published two decades ago. In that paper, the authors discuss several things that people commonly mistake for “theory” but that are not theory. I am applying the same approach to the balance of Starcraft.

I intentionally do not link any example for the several common observations mentioned below, because it is unnecessary to upset anyone just to get my point across.

Also, more importantly, the poor arguments I list below should not be taken as evidence for the opposite claim. That is, when I say “race A is broken because of reason X” is a poor argument for balance, it does not mean that race A is NOT broken when X is observed. A layman’s example: “person A is intelligent because today is Sunday” is a poor argument for a person’s intelligence, but it does not necessarily mean that I am saying person A is not intelligent when today is Sunday. Person A can be intelligent, and today can be Sunday at the same time.

Poor arguments of balance

Winners of major tournaments

If a specific race has not won any tournament for a period of time, it does not mean that race is the weakest. Conversely, if a specific race has won most of the tournaments, it does not mean that race is the strongest.

If a race is strong or weak, everyone who plays that race will be affected. This basic yet important logic applies to other points I am going to put forward later, so I will just quote it for prominence.

The balance of the races affects every player directly (play that race) and indirectly (play against that race). The balance status of a race affects everyone who plays that race regardless of skill.

Hypothetically, Life wins every tournament, but no other Zerg player is in any round of 16 (I just give a random yet understandable threshold). Will you say Zerg is the strongest? No, you will probably say that Life is the strongest player, but Zerg is the weakest race. You may even joke that Life is holding Zerg back because Mr. David Kim will not buff Zerg since the race is winning everything.

I am not endorsing the idea that if a race is poorly represented in a certain cut off, it is the weakest. However, it is definitely a much stronger argument for balance than the frequency of winners itself, based on the same logic.

Race distribution in grandmaster only

I have an unpopular opinion. Ladder ranking is the most accurate measurement of skill for the population of Starcraft players.

When I say it is the most accurate, I don’t mean it is perfect, because it is far from it. There are many assumptions in the above statement. For example, minor considerations like multiple accounts, inactivity, smurfing, unranked games etc. are not explicitly taken into account. However, when you look at the whole population of players, these have relatively little impact on the relationship between skill and ladder ranking. If you want to argue against this statement, you are basically arguing that there is no measurement of skill for the player population, which is a completely different discussion altogether. This is because there is no better way to practically put the skill of the whole population of players to the test.

Then, as the subheading suggests, why is race distribution based on grandmaster only a poor argument for balance?

We can conceptualise the relative skill of every player on a continuum with one end being grandmaster rank 1 and the other end being bronze rank 100, and the league placement is a categorical cut off on the continuum (a bit of an ordinal versus interval scale thing). Going back to the quoted logic I have mentioned above, the grandmaster players are not representative of the whole population on balance, so what you can conclude from them is limited. Assuming that the race population is reasonably evenly distributed (that is, each race has around 33.3% of total players, excluding random), which was quite close the last time I checked, any unequal distribution of race should be observed in other leagues as well.

Let’s say Terran is the strongest race. The proportion of Terran players in grandmaster should be statistically higher than the expected 33.3%, and let that proportion be 80% hypothetically. Based on the quoted logic, the proportion of Terran players in master should also be statistically higher than the expected 33.3%, but probably lower than 80%, so let it be 60%. The same thing should be observed in diamond league, with the proportion being statistically higher than the expected 33.3%. This pattern should carry on down the leagues until the proportion drops below 33.3%, because we assumed that the overall race distribution of the population is equal or close to it.
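The arithmetic behind that last sentence can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the league sizes and the 40% figure for diamond are hypothetical numbers I picked for the example, while 80% and 60% are the hypothetical figures used above.

```python
# Hypothetical league sizes as a fraction of all players.
# These are illustrative assumptions, not real ladder data.
league_share = {
    "grandmaster": 0.01,
    "master": 0.04,
    "diamond": 0.20,
    "lower": 0.75,  # platinum and below, lumped together
}

# Terran share in the top leagues. 80% and 60% come from the text;
# 40% for diamond is an extra assumption made for this sketch.
terran_share = {"grandmaster": 0.80, "master": 0.60, "diamond": 0.40}

# Assumed overall Terran share of the population (an equal third).
population_terran = 1 / 3


def implied_lower_share(league_share, terran_share, population_terran):
    """Terran share the remaining leagues must have so that the
    size-weighted average over all leagues equals the population share."""
    accounted = sum(league_share[lg] * terran_share[lg] for lg in terran_share)
    remaining_weight = 1.0 - sum(league_share[lg] for lg in terran_share)
    return (population_terran - accounted) / remaining_weight


lower = implied_lower_share(league_share, terran_share, population_terran)
print(f"Implied Terran share in the lower leagues: {lower:.1%}")
```

With these made-up numbers, the lower leagues have to sit below 33.3% Terran for the population total to add up, which is the pattern described above.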

I know that some of you are on the edge of your seat and want to point out the obvious flaws in this argument. My point is that race distribution in grandmaster only is a poor argument for overall balance. However, it does not mean that it has no implication on balance at all. This is because it is plausible that the impact of “imbalance” is asymmetrical or disproportionate for players of different levels, even though the racial balance should affect every player. This is a very important difference, so let me explain it.

Let’s say the current state of Starcraft is in a perfect state of balance, and any minor change will tilt the scale. I now decrease the production time of SCV by 1 second. Obviously, Terran is now the strongest race, assuming all else remains equal. Given the strong mechanics of grandmaster players, this will have a strong effect on their results. In contrast, bronze players, who don’t always make workers constantly, will benefit less from this change. Nevertheless, everyone still gets affected.
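A rough way to see the asymmetry is to count workers. The sketch below is a toy model, not real game data: the build times, the ten-minute window and the "uptime" values (the fraction of time a player actually keeps worker production going) are all assumptions chosen for illustration.

```python
def workers_built(window_seconds, build_time, uptime):
    """Average workers produced by one Command Center in a time window.

    uptime is the fraction of the window in which the player is actually
    producing a worker (1.0 means production never idles).
    """
    return window_seconds * uptime / build_time


# All numbers below are illustrative assumptions, not real game data.
WINDOW = 600         # ten minutes, in seconds
OLD_BUILD_TIME = 17  # assumed SCV build time before the change
NEW_BUILD_TIME = 16  # one second faster, as in the thought experiment


def gain(uptime):
    """Extra workers gained from the faster build time at a given uptime."""
    return (workers_built(WINDOW, NEW_BUILD_TIME, uptime)
            - workers_built(WINDOW, OLD_BUILD_TIME, uptime))


gm_gain = gain(1.0)      # hypothetical grandmaster: constant production
bronze_gain = gain(0.5)  # hypothetical bronze: idle half of the time

print(f"Grandmaster gains {gm_gain:.2f} workers, bronze gains {bronze_gain:.2f}")
```

In this toy model the gain scales with uptime, so the player who never idles gets twice the benefit of the player who idles half of the time. Both gain something, which matches the point that everyone is affected, just not equally.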

Therefore, taken together, simply looking at the race distribution in grandmaster alone is not a good argument for balance. At best, you may make an argument about the asymmetrical effect of balance on players of different skill levels.

Now I am going to take it one step further, as some of you may argue that the asymmetrical effect of balance for different skill levels is an indication of general balance anyway. In a way, yes, but it can also imply otherwise. Race A may be the strongest at the highest level, which is captured by the distribution in grandmaster, but it may not be the strongest at other levels. These effects may then offset each other and reach an equilibrium, resulting in a general balance that is not captured if only the distribution of grandmaster is studied. Does this contradict the statement that “the balance status of a race affects everyone who plays that race”? No. When I say “affects”, I do not specify that everyone has to be affected in the same direction. But is that even practically possible? Yes. A good example would be forcing High Templar to autocast storm on any enemy unit in its vision (that is, to auto attack like other units do when an enemy unit is in range). This drastically weakens High Templar in the hands of top players, but it may become stronger in the hands of wood league players who don’t even know how to cast storm.

I know that you may want to point out many assumptions that I make above (e.g., people pick their race without considering the balance status, and also stick to it), but you probably won’t argue against my suggestion that race distribution in grandmaster only is a poor argument for the overall balance of the game.

Korean effect

There is a consensus that South Korean players are better than “foreigners”.

When a non-Korean player beats a Korean player, people sometimes argue that “even foreigners can beat Koreans with that race”. Assuming that race is so strong that a presumed inferior player (foreigner) can defeat a presumed superior player (Korean), shouldn’t the effect be magnified if both are Koreans and one of them uses that race? When that happens, everyone will already be discussing how imbalanced it is without needing the example of Korean versus foreigner. Since most of the time such an effect is not observed in matches between Koreans, it just means that foreigners are capable of defeating Koreans to begin with.

With that being said, there can be extreme outliers.

Match up on ladder

I still remember when I was in platinum in Wings of Liberty, and I told a friend that I rarely got matched against a Terran. He said Terran was strong, so the Terran players had been promoted to diamond. When I was promoted to diamond, Terran was still a relatively rare race. My friend said the same thing, but this time he said the Terran players were in master.

That actually doesn’t really make sense. If Terran was so strong overall that its players got pushed up by one league, shouldn’t those in gold have been promoted to platinum, so that I got matched against them anyway? While his statement seems rather silly, the phenomenon itself may have an alternative explanation.

Again, let’s assume the population distribution of the races is close to equal (around 33.3% each), and a certain race does not get matched as often. That means fewer people of that race are playing the game, if the sample size is big enough and there is no relationship between race and play time. I once said to a friend, what if it is an implicit sign of balance? Clearly, it is just a for-fun speculation, but what if people don’t play when their race is weak? This may actually be plausible. Of course, it is not a good argument for balance either.

What is a good balance argument?

Honestly, it is really hard to put forward one, as the game is so dynamic.

In my opinion, there were only two periods when one could be conclusive that the game was imbalanced. The first is the patchzerg period at the end of Wings of Liberty, and the second is the Blink Stalker PvT era in early 2014. The article on TeamLiquid about patchzerg was probably the best argument I have seen about balance. The main reason why I say it is the best is that the author convincingly put forward different pieces of evidence that converge to tell a single story.

This is also how scientists make a claim about something, but the general public only reads the dumbed down version filtered by usually not-so-qualified journalists who focus on the findings. As the saying goes, a big claim requires big evidence. When you claim that a race is too strong or weak to the extent that it warrants a patch, you should be capable of putting forward convincing evidence. This is usually lacking in balance discussions on forums. Based on what is written, the authors usually give me the feeling that they form a conclusion first (a race is too strong or weak), then attempt to find support for it.

I actually got into a debate with someone on Reddit before (not going to link it here, as I explained above), because that person simply annoyed me by saying something pretty stupid, and I quote one of the lines: “Ro 16 and Ro 32 Terran had the biggest share. And SSL has one Terran from the last possible 2”. This person was arguing with another person that Terran was imbalanced, because of the relatively high number of Terran in Ro16 and Ro32 of GSL, and because there was one Terran in SSL (semi final or final) even though there were only two Terran to begin with.

Here is my reply: “So are you suggesting that the relatively high representation in GSL Ro32 (37.5%) and Ro16 (43.75%), and the only two terran in SSL are doing well (one in final and the other lost in Ro8) as the argument that terran is OP? If you’re, then the argument itself is self-contradictory. If you use race representation on a larger scale in GSL (i.e., Ro32 or Ro16), as the evidence that the more represented race should be considered stronger, then the fact that only two terran made it to SSL suggest otherwise. Conversely, if you use the fact that maru and dream made it far in SSL even though they are the only two terran, then the fact that only 1 terran (12.5%) made it to Ro8 of GSL contradicts the argument.”
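To make the percentages in that reply concrete, here is a small sketch. The Terran head counts are not stated in the reply. I back them out from the quoted percentages (37.5% of 32 is 12, 43.75% of 16 is 7, 12.5% of 8 is 1), and the equal-third baseline is an assumption.

```python
# Terran counts implied by the percentages quoted in the reply:
# 37.5% of 32 = 12, 43.75% of 16 = 7, 12.5% of 8 = 1.
gsl_terran = {"Ro32": (12, 32), "Ro16": (7, 16), "Ro8": (1, 8)}

# Assumed baseline: each race holds an equal third of the field.
EXPECTED_SHARE = 1 / 3


def terran_shares(counts):
    """Return the Terran share of the field at each cut off."""
    return {stage: terran / total for stage, (terran, total) in counts.items()}


shares = terran_shares(gsl_terran)
for stage, share in shares.items():
    verdict = "above" if share > EXPECTED_SHARE else "below"
    print(f"{stage}: {share:.2%} Terran ({verdict} an equal third)")
```

The same race is over-represented at two cut offs and under-represented at the next one, which is why picking a single cut off to support a conclusion is not convincing.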

Anyway, this post is not about how to make a good argument, but simply points out the common poor arguments used.

EDIT: In response to the comments on Reddit regarding this post.

I have read through most of the comments, and it is quite funny that many understood part of the article but misinterpreted its implications.

Probably only u/KOUJIROFRAU/ truly understands what I am saying between the lines. People seem to go too far into arguing what makes a good argument for balance, for example, putting forward different evidence based on statistics (and using our brain). I basically indirectly suggest that having different evidence is the way to go (by using the TL article as an example), and it is more about being convincing than being conclusive.

u/Jay727/ is sort of saying the same thing based on the same ground, but what I wrote is intended to highlight why certain statements are not good arguments for balance and nothing more. The statement, and I quote, “Which is why I don’t buy arguments like “Looking at tournament winners is bad because what if Life wins everything on his own”” is a misquote, as I did not write such a sentence. Also, the statement “But when it’s 10 different Protoss winning everything then there might be something wrong.” hinges on the same quoted logic that balance affects everyone. Assuming the race distribution of participants is reasonably close to equal and there are multiple different winners, which may suggest that there is something about the race and it affects everyone, the same pattern should be observed in the distribution at earlier cut offs like Ro16 and Ro32. That is the reason why looking at the winners alone is not a good argument.

By the way, ZelotypiaGaming, you really know how to pick a topic title. lol.

12 thoughts on “What Good Balance Argument is Not”

whartanto2 says:

December 8, 2015 at 10:26 pm

There is only one solution: Double Blind Randomized Trial! :)

Scruffy Bearded Guy says:

December 9, 2015 at 4:04 am

I would like to add also the following here…

A guy ranked above you isn’t necessarily literally better. Obviously the ladder system should be used to differentiate players’ skill levels, but I would say it is more accurate to say that the ladder is better at measuring your *own* progress.

From a technical perspective though, I think that overall the ladder system of SC is one of the best around, if not the best out there, in terms of being an accurate/fair system that will put you where you deserve to be.

PS. Good job btw regarding your blog.

1. Maxilicious says:
  
  December 9, 2015 at 3:28 pm
  
  Like I said it’s not accurate, but it’s probably the best we can get in practice. Also, the rule of thumb is, when you speak to someone who is of a lower league than you, the league placement is the perfect measurement. On the other hand, when you speak to someone who is of a higher league than you, the league placement is bullshit. Never forget this!
  
Kongeskog says:

December 9, 2015 at 4:08 am

Thank you for a well written article with an academic approach to the phenomenon. I have just one remark on the interesting statement: “The balance status of a race affects everyone who plays that race regardless of skill.” Let us imagine that one race is easier to control in terms of micro (popularly known as A-move); it would probably have an edge in the lower leagues. In the more skilled leagues, higher APM and skill make better use of the units’ abilities and spells, so the A-move advantage would not be noticeable, or it may even be a drawback. I am aware that you somewhat addressed this with your example of auto-cast storms. However, even though you may argue that the game might seem balanced from a wider perspective, for the players it affects it is not balanced.

1. Scruffy Bearded Guy says:
  
  December 9, 2015 at 4:25 am
  
  If you want my opinion, that is correct, although this is what makes the multiplayer experience in such a complicated game, and it should feel like that because it also exposes the different skills that the players have (even in the same leagues).
  
whartanto2 says:

December 9, 2015 at 3:08 pm

Lies, Damned Lies and Statistics
—————————————–

If you can bear following the thought process of a non academic layman here:
1. Anecdotal evidence is not data, that’s so common sense obvious – the winner of a single tournament does not represent anything. Even if there is a perfect balance of 33.3% outcomes, a single race will win at any given time. However, if you have 100 INDEPENDENT tournaments and 50% of them are won by a single race, then you may have something (independent as in the outcome of one tournament does not affect the rest, so if Maru is so good and wins 50% of the tournaments he’s in, that’s just Maru. If different Terrans win 50% of tournaments in existence, that is the race).

2. “Statistics do not tell the whole story”: say we take another argument about balance in real life: the fact that women in America are paid 80% of what men in the same role are paid. Or that during a recession, African Americans are more likely to be fired than white Americans. The counter arguments to those statistics are: women in the same role spend on average less time on their career due to child care, and if we compare people with the same years of experience, then the difference disappears! Also, while African Americans are fired more often than white Americans, white Americans are fired more often than Asian Americans, which leads to the implausible argument that white Americans (the majority in management positions) are racist against other white Americans.

How does this apply in Starcraft? There are many hidden points that need to be considered before taking statistics at face value. The fact that there are fewer players of race X in a certain league is not the full story. The argument in the patchzerg article that “all zerg plays look the same” means absolutely nothing. If there is only one dominant strategy (nash equilibrium? or whatever), then everyone will use that strategy. Like MMM. In fact, only Protoss is unique enough that you can differentiate Rain and MC from how they play. The rest.. if you block out the players’ names, I won’t be able to tell who’s playing.

3. Double blind randomized trial is the only solution.

1. whartanto2 says:
  
  December 9, 2015 at 3:18 pm
  
  Continuing the previous post:
  
  3. Double blind randomized trial is the only solution.
  As recently as the 1900s, you were more likely to die if you went to a doctor than if you didn’t. The gold standard of modern science is a randomized trial; it is the single biggest reason why the mortality rate in the world dropped. In the past, if a patient was sick, doctors would bleed them to get rid of “bad blood”. If they died, then the doctors did not bleed them enough; if they lived, the bloodletting worked. This is no different from judging witches by “burn them, if they die, they are innocent”. The same kind of argument still plagues modern economics (Keynesian vs austerity, etc.) and it turns into dogmatic storytelling (my “God/Theory/statistic” is better than yours).
  
  Here is an unlikely way to scientifically measure SC2 balance: get 100 random people who have never played games before, split them randomly into 3 races, and make them compete for a year. Then you will get the true number for balance :) I can’t think of a control group design.. would it be a safe assumption to expect a 33.3% split? or not? Anyway, you’re the assistant professor, figure it out :)
  
  1. Maxilicious says:
    
    December 9, 2015 at 3:26 pm
    
    This is actually quite well thought out. I am sure we can agree that no matter what methods we come up with, they will be imperfect in terms of inferring causality. It is quite a struggle to come up with real life experiments to even propose a causal effect, so it is usually a case of having realistic trade offs with theoretical perfections. It will be interesting if we can truly conduct a reasonably well controlled experiment on racial balance in Starcraft.
    
ZeloTypia (@ZelosGaming) says:

December 9, 2015 at 6:56 pm

FYI https://www.reddit.com/r/starcraft/comments/3vx3cx/ladder_ranking_is_the_most_accurate_measurement/

1. Maxilicious says:
  
  December 10, 2015 at 5:40 pm
  
  As always, thanks for sharing.
  
HyperONE says:

November 5, 2021 at 8:41 am

Hi, commenting on this old article because I have been walking through the general guides and found this spoke to me.

I know a big topic regarding balance lately has been the power of proxy battery Void Rays in PvT. It is rather common for the Terran player to complain that the build completely kills Reaper FE in favor of Marine FE or Reactor FE, and that defending the build is disproportionately difficult compared to executing it. Perhaps you do or don’t agree, I can’t be sure. I’m sure you’ve more or less said your piece on this topic, but I think it would be enlightening to have an article talking about this interaction in some detail.

1. Max says:
  
  November 5, 2021 at 10:39 am
  
  I actually don’t have a strong opinion on proxy void ray. Balance change is not going to happen, so it is down to balancing through maps.