League of Legends Competitive Matches Exploratory Data Analysis

by Lucien Chen

Introduction

It feels like every time I introduce myself, the first question I receive is something along the lines of: “Wait, your name is Lucien? Like Lucian from League of Legends?” I have always found this question strange, since the spelling and pronunciation are quite different.

Inspired by this story, and having played League and watched the competitive scene for over 10 years, I thought it would be interesting to take a dive into the stats of competitive matches, particularly those of Lucian.

For some context, League of Legends is a multiplayer online battle arena (MOBA) game enjoyed by millions of players all around the world. We are going to explore a dataset that contains information about competitive matches across many different regions and tiers to try to answer the question:

Is Lucian a better carry than other ADCs?

Source: data

Important Terminology

Throughout this notebook/website, there are going to be several abbreviations and other terms used by people in the League community. Here is a short list of terms that will be relevant to our analysis:

Cleaning the Data

Originally, the dataset had 149,232 rows and 123 columns. However, not every column is relevant, since we are only investigating the relative strength of one champion. As such, I selected the columns that will be useful in our analysis. I also fixed some values that weren’t in the correct format, e.g., converting datacompleteness to a Boolean data type when it was previously stored as a str.
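The cleaning step described above could look roughly like the sketch below. The tiny `raw` frame is made-up illustration data, the `url` column is just a placeholder for an unneeded column, and the assumption that datacompleteness is stored as the strings "complete" and "partial" is mine; the mapping should be adjusted to whatever the raw file actually contains.

```python
import pandas as pd

# Hypothetical stand-in for the raw data (the real file has 123 columns).
raw = pd.DataFrame({
    "gameid": ["G1", "G1", "G2"],
    "datacompleteness": ["complete", "complete", "partial"],  # assumed encoding
    "position": ["bot", "sup", "bot"],
    "champion": ["Lucian", "Nami", "Jinx"],
    "url": [None, None, None],  # placeholder for a column we do not need
})

# Columns to keep for this toy example.
KEEP = ["gameid", "datacompleteness", "position", "champion"]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Select the relevant columns and convert datacompleteness to bool."""
    out = df[KEEP].copy()
    out["datacompleteness"] = out["datacompleteness"] == "complete"
    return out

cleaned = clean(raw)
print(cleaned.dtypes)
```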

Here are the columns I kept for the analysis and what they represent:

And here is what the first few rows of the cleaned up dataset look like:

| gameid | datacompleteness | league | side | position | champion | win | kills | deaths | assists | damagetochampions | damageshare | earnedgold | golddiffat10 | xpdiffat10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ESPORTSTMNT01_2690210 | True | LCK CL | Blue | top | Renekton | False | 2 | 3 | 2 | 15768 | 0.278784 | 7164 | 52 | -44 |
| ESPORTSTMNT01_2690210 | True | LCK CL | Blue | jng | Xin Zhao | False | 2 | 5 | 6 | 11765 | 0.208009 | 5368 | 485 | 432 |
| ESPORTSTMNT01_2690210 | True | LCK CL | Blue | mid | LeBlanc | False | 2 | 2 | 3 | 14258 | 0.252086 | 5945 | 162 | 71 |
| ESPORTSTMNT01_2690210 | True | LCK CL | Blue | bot | Samira | False | 2 | 4 | 2 | 11106 | 0.196358 | 6835 | 296 | 265 |
| ESPORTSTMNT01_2690210 | True | LCK CL | Blue | sup | Leona | False | 1 | 5 | 6 | 3663 | 0.0647631 | 2908 | 528 | -587 |

Now we want to see if ADCs actually fulfill their role as carries.

This graph displays the percentiles of damageshare for each position. Notice that ADCs have the highest damage share at each breakpoint (e.g., 25th percentile, median, max), which is a clear indicator that they are indeed the carries of their games.

We are also interested in whether Lucian has a significant pick rate. If he is picked in very few games, it may be an indication that he is only useful in certain situations, which may not be representative of all games played.
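For readers who want the numbers behind a plot like this, here is a minimal sketch of how the per-position breakpoints could be computed with pandas. The values in `df` are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Made-up rows: one damageshare value per player-game.
df = pd.DataFrame({
    "position": ["top", "bot", "sup", "top", "bot", "sup"],
    "damageshare": [0.25, 0.30, 0.06, 0.20, 0.34, 0.08],
})

# describe() reports min, quartiles, and max for each position.
summary = (
    df.groupby("position")["damageshare"]
      .describe()[["min", "25%", "50%", "75%", "max"]]
)
print(summary)
```

Comparing the rows of `summary` column by column is the tabular version of reading the breakpoints off the graph.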

It looks like Lucian has about a 5% pick rate. While that’s not the highest pick rate among ADCs, it still represents quite a large number of games, so we can proceed.
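A pick rate like this can be computed as each champion’s share of all bot-lane rows. The sketch below uses a made-up `bot` frame of 20 rows, so the numbers are illustrative only.

```python
import pandas as pd

# Made-up bot-lane picks: 1 Lucian game out of 20.
bot = pd.DataFrame({"champion": ["Lucian"] + ["Jinx"] * 10 + ["Zeri"] * 9})

# normalize=True turns raw counts into proportions of all rows.
pick_rates = bot["champion"].value_counts(normalize=True)
print(pick_rates)
```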

Dealing with Missingness

We are trying to find metrics to quantify whether or not Lucian is a better ADC. We are interested in relative statistics, since something like overall damage dealt will naturally increase with longer games. It looks like there are a lot of missing values for golddiffat10 and xpdiffat10. Since both of these columns represent how well Lucian does relative to his opponents, we can focus on the missingness of golddiffat10 alone. Let’s take a look at exactly how many missing values there are in the entire dataframe and try to figure out why that is.
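Counting the missing values, and checking how they line up with datacompleteness, might look like the following sketch. The frame `df` is synthetic and only mimics the pattern described in this section.

```python
import numpy as np
import pandas as pd

# Synthetic rows: the early-game columns are missing exactly where
# datacompleteness is False (an assumption made for this toy example).
df = pd.DataFrame({
    "datacompleteness": [True, True, False, False],
    "golddiffat10": [52.0, -44.0, np.nan, np.nan],
    "xpdiffat10": [10.0, 5.0, np.nan, np.nan],
    "kills": [2, 3, 1, 4],
})

# Number of missing values in every column.
missing_counts = df.isna().sum()

# Share of missing golddiffat10 values within each datacompleteness group.
missing_by_completeness = (
    df["golddiffat10"].isna().groupby(df["datacompleteness"]).mean()
)
print(missing_counts)
print(missing_by_completeness)
```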

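Here is the distribution of total variation distances (TVDs) from a permutation test in which we randomly shuffled the datacompleteness column. The simulated TVDs are low, so let’s plot our observed TVD of 1 to see how drastic the difference is.

As we can see, none of the shuffled TVDs comes close to the observed value, so the empirical p-value is 0. The missingness of golddiffat10 therefore depends on datacompleteness, which tells us which rows have incomplete data, and the values are clearly not missing completely at random. Because the missingness is explained by another observed column, this pattern is usually labeled missing at random (MAR) conditional on datacompleteness rather than NMAR in the strict sense.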
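The permutation test described above can be sketched as follows. The data are synthetic (150 complete rows and 50 incomplete rows, with golddiffat10 missing exactly on the incomplete ones), and the helper `tvd` and the column name `gold_missing` are names I made up for this example. Only 200 shuffles are used to keep the sketch fast.

```python
import numpy as np
import pandas as pd

def tvd(df, group_col, cat_col):
    """TVD between the distributions of cat_col in the two groups of group_col."""
    dist = pd.crosstab(df[cat_col], df[group_col], normalize="columns")
    return dist.diff(axis=1).iloc[:, -1].abs().sum() / 2

rng = np.random.default_rng(0)

# Synthetic data: golddiffat10 is missing exactly when the row is incomplete.
complete = np.array([True] * 150 + [False] * 50)
df = pd.DataFrame({"datacompleteness": complete, "gold_missing": ~complete})

observed = tvd(df, "gold_missing", "datacompleteness")

# Shuffle datacompleteness to break any link with missingness.
shuffled = df.copy()
sims = []
for _ in range(200):
    shuffled["datacompleteness"] = rng.permutation(df["datacompleteness"].to_numpy())
    sims.append(tvd(shuffled, "gold_missing", "datacompleteness"))

# Empirical p-value: share of shuffled TVDs at least as large as the observed one.
p_value = float(np.mean(np.array(sims) >= observed))
print(observed, p_value)
```

A TVD of 1 means the two groups share no category at all, which is what happens when missingness is perfectly determined by the other column.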

Performing a Hypothesis Test

Let’s finally answer our question:

Is Lucian a better carry than other ADCs?

We are going to measure ability to carry by revisiting win rate. If a champion wins more often than other champions, on average, there is a high likelihood that the champion is more effective in their role, i.e., better at carrying.

Let’s take a look at the win rates as well as how many games each champion appeared in:

| champion | mean | count |
| --- | --- | --- |
| Aphelios | 0.490725 | 4043 |
| Ashe | 0.455446 | 303 |
| Caitlyn | 0.527548 | 726 |
| Draven | 0.500938 | 533 |
| Ezreal | 0.462601 | 1738 |
| Jhin | 0.470588 | 1156 |
| Jinx | 0.502806 | 3920 |
| Kai’Sa | 0.479 | 1000 |
| Kalista | 0.524338 | 1171 |
| Kog’Maw | 0.421965 | 173 |
| Lucian | 0.533507 | 1149 |
| Miss Fortune | 0.519481 | 539 |
| Nilah | 0.492754 | 69 |
| Samira | 0.503571 | 280 |
| Senna | 0.524331 | 822 |
| Seraphine | 0.521341 | 328 |
| Sivir | 0.522556 | 1064 |
| Tristana | 0.494737 | 475 |
| Twitch | 0.496599 | 441 |
| Varus | 0.477431 | 576 |
| Vayne | 0.453901 | 141 |
| Xayah | 0.482523 | 1173 |
| Zeri | 0.533549 | 2474 |
| Ziggs | 0.420513 | 195 |
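A table with a mean and a count per champion is the shape you get from a grouped aggregation. Here is a minimal sketch on a made-up `bot` frame, so the numbers do not match the table above.

```python
import pandas as pd

# Made-up bot-lane rows: one row per ADC per game.
bot = pd.DataFrame({
    "champion": ["Lucian", "Lucian", "Jinx", "Jinx", "Jinx", "Zeri"],
    "win": [True, False, True, True, False, True],
})

# The mean of a Boolean column is the win rate; count is the number of games.
win_rates = bot.groupby("champion")["win"].agg(["mean", "count"])
print(win_rates)
```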

It looks like Lucian’s win rate is quite high. The graph shows the win rates for each ADC depending on which side their team played on. Lucian performs about average on red side but strongly outperforms on blue side, which pulls his overall win rate up.
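The per-side breakdown behind a graph like this could be computed along these lines. Again, the rows in `bot` are invented for illustration.

```python
import pandas as pd

# Made-up bot-lane rows with the side each team played on.
bot = pd.DataFrame({
    "champion": ["Lucian"] * 4 + ["Jinx"] * 4,
    "side": ["Blue", "Blue", "Red", "Red"] * 2,
    "win": [True, True, True, False, True, False, False, True],
})

# One row per champion, one column per side, win rate in each cell.
by_side = bot.groupby(["champion", "side"])["win"].mean().unstack("side")
print(by_side)
```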

Let’s define our hypotheses:

To test this, we’ll run a hypothesis test at the 5% significance level (95% confidence).

We observe that Lucian has a 53.35% win rate across the 1,149 games in which he was played. We want to see whether his win rate is significantly higher than that of the rest of the ADCs, so we will repeatedly select 1,149 games at random and compute the win rate of each sample, which will be our test statistic.

Across 10,000 simulations, the chance of seeing a win rate at least as high as Lucian’s is 1.33%, which is rather low.

Since 1.33% is below 5%, we reject the null at the 5% significance level. It is rare to see a win rate as high as Lucian’s by chance, which suggests that he may be a better carry.
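The simulation can be sketched as below. This is not the exact code behind the 1.33% figure. The pool of games is synthetic (20,000 ADC results with exactly half of them wins), sampling without replacement is my assumption, and `simulate_p_value` is a name made up for this example. The observed rate of 613/1149 is Lucian’s 0.533507 win rate from the table expressed as wins over games.

```python
import numpy as np

def simulate_p_value(wins, n_games, observed_rate, n_sims=10_000, seed=0):
    """Share of random samples whose win rate is at least observed_rate."""
    rng = np.random.default_rng(seed)
    wins = np.asarray(wins, dtype=float)
    # Each simulation draws n_games results and records their win rate.
    sim_rates = np.array([
        rng.choice(wins, size=n_games, replace=False).mean()
        for _ in range(n_sims)
    ])
    return float(np.mean(sim_rates >= observed_rate)), sim_rates

# Synthetic pool of ADC results: 10,000 wins and 10,000 losses.
pool = np.array([1] * 10_000 + [0] * 10_000)

p_value, sim_rates = simulate_p_value(pool, n_games=1149, observed_rate=613 / 1149)
print(p_value)
```

Because the pool here is invented, the resulting p-value will not reproduce the one reported above; the point is only to show the mechanics of comparing one observed win rate against many random samples of the same size.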

Key Finding

The data suggests that Lucian may be a better carry; however, his overall win rate is skewed by his outperformance on blue side. With this in mind, there may be extraneous factors that affect his performance relative to other ADCs.