A Better Player Comparison Tool for MLS Transfers to Europe
I love FBRef. The level of detail you can get for free is fantastic, and it covers women’s sports too (though for some reason the level of detail for NWSL is greatly reduced this season).
Recently they came out with two huge improvements — first is the scouting report, which helps put data into context. It looks like this:
You can also download it as an .xls file. The percentile ranks are nice to see and give a sense of context. Then they added a comparison tool where you can compare players directly, side by side. Very useful, but how do you know which players to compare?
That’s where this little box comes in — most similar players. But if you look at it, it doesn’t make much sense.
These are the most similar players to Tanner Tessmann. Joe Corona? I don’t know about that one. Also, it’s just for MLS, and I’m more interested in who Tessmann compares to now that he’s moved.
I decided to make my own algorithm, with two key changes. First, I didn't just use the raw FBRef data; I used it to create calculated fields, such as touch ratios, to get a better comparison. Second, I weighted certain attributes based on position. All in all, I had about 100 variables to draw from, grouped into categories.
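As an illustration of a calculated field, here is a minimal sketch of touch ratios: the share of a player's touches that happen in each zone of the pitch. The column names (touches, touches_att_3rd and so on) are hypothetical stand-ins for FBRef's touch columns, and the exact ratios the tool uses aren't listed in this post.

```python
# Hypothetical column names; FBRef's actual export headers may differ.
TOUCH_ZONES = ("touches_def_3rd", "touches_mid_3rd", "touches_att_3rd", "touches_att_pen")


def add_touch_ratios(row: dict[str, float]) -> dict[str, float]:
    """Return a copy of a player-season row with touch-share ratios added."""
    out = dict(row)
    total = row.get("touches", 0.0)
    for zone in TOUCH_ZONES:
        # Share of all touches that happened in this zone; 0.0 if no touches recorded.
        out[f"{zone}_share"] = row.get(zone, 0.0) / total if total else 0.0
    return out


season = {"touches": 1000.0, "touches_att_3rd": 300.0, "touches_att_pen": 50.0}
print(add_touch_ratios(season)["touches_att_3rd_share"])  # 0.3
```

Ratios like these describe how a player uses the ball independent of how much playing time or possession they get, which is why they help with style comparisons.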
Once I had pulled every player season from the top 5 leagues and filtered for a minimum of three 90s (270 minutes) to avoid noise, I calculated the percentile rank of each stat for every player. From there, I pull an MLS player's scouting report covering the last 365 days and calculate the percentile ranks of those stats. I don't use the percentile ranks from the scouting report itself, because those are position-specific and less useful for my purposes.
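A minimal sketch of the filtering and percentile-ranking step, assuming each player-season is a dict of per-90 stats with a hypothetical "90s" key, and that the MLS player's numbers are ranked against the same filtered European pool (the post doesn't spell that out). Ties get half credit, a midrank convention chosen here for illustration; the tool may handle ties differently.

```python
from bisect import bisect_left, bisect_right

MIN_90S = 3.0  # minimum number of full 90s played, as described above


def filter_pool(seasons: list[dict[str, float]]) -> list[dict[str, float]]:
    """Drop player-seasons with fewer than MIN_90S 90s to reduce noise."""
    return [s for s in seasons if s.get("90s", 0.0) >= MIN_90S]


def percentile_rank(value: float, sorted_pool: list[float]) -> float:
    """Percentile rank (0 to 1) of value within a sorted, non-empty pool; ties count half."""
    below = bisect_left(sorted_pool, value)
    equal = bisect_right(sorted_pool, value) - below
    return (below + 0.5 * equal) / len(sorted_pool)


def rank_player(
    player: dict[str, float], pool: list[dict[str, float]], stats: list[str]
) -> dict[str, float]:
    """Percentile-rank each of the player's stats against the filtered pool."""
    ranks = {}
    for stat in stats:
        column = sorted(s[stat] for s in pool)
        ranks[stat] = percentile_rank(player[stat], column)
    return ranks


pool = filter_pool([{"90s": 2.0, "xg": 0.9}, {"90s": 10.0, "xg": 0.1}, {"90s": 20.0, "xg": 0.5}])
print(rank_player({"xg": 0.3}, pool, ["xg"]))  # {'xg': 0.5}
```

The same function works for a European player-season (ranked against the pool it belongs to) and for an MLS player's last-365-days numbers, so both end up on one 0 to 1 scale.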
I take the average of the absolute value of the percentile rank differences for the stats in each category, and then take a weighted average of those category results using the category weights.
Then I subtract the resulting number from 1 to get a similarity score.
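The scoring step can be sketched as follows, assuming percentile ranks on a 0 to 1 scale, a hypothetical mapping of category names to stat names, and per-category weights that would change with position. The categories and weight values in the example are made up for illustration and are not the ones the tool uses.

```python
def similarity(
    ranks_a: dict[str, float],
    ranks_b: dict[str, float],
    categories: dict[str, list[str]],
    weights: dict[str, float],
) -> float:
    """1 minus the weighted average of per-category mean absolute percentile gaps."""
    weighted_sum = 0.0
    weight_total = 0.0
    for category, stats in categories.items():
        # Mean absolute difference in percentile rank across this category's stats.
        gap = sum(abs(ranks_a[s] - ranks_b[s]) for s in stats) / len(stats)
        w = weights.get(category, 1.0)  # unweighted categories default to 1.0
        weighted_sum += w * gap
        weight_total += w
    return 1.0 - weighted_sum / weight_total


# Hypothetical categories and forward weights, for illustration only.
categories = {"shooting": ["xg", "shots"], "passing": ["prog_passes"]}
forward_weights = {"shooting": 2.0, "passing": 1.0}
a = {"xg": 0.9, "shots": 0.8, "prog_passes": 0.4}
b = {"xg": 0.7, "shots": 0.8, "prog_passes": 0.6}
print(round(similarity(a, b, categories, forward_weights), 3))  # 0.867
```

Averaging within each category first keeps a category with many stats from drowning out one with only a few; the weights then decide how much each category counts for a given position.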
In theory the scores can range from 0 to 1, but the most similar players typically score between 0.8 and 0.9. Anything above 0.9 is closer than even an individual player gets between their own seasons. The lowest scores fall below 0.5, but rarely below 0.425.
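Once every candidate player-season has a score, a top-five list is simply the highest scores. A minimal sketch, where score_fn stands in for a similarity function like the weighted score described above, and the candidate keys are hypothetical labels such as player name plus season.

```python
import heapq
from typing import Any, Callable


def most_similar(
    target: Any,
    candidates: dict[str, Any],
    score_fn: Callable[[Any, Any], float],
    n: int = 5,
) -> list[tuple[str, float]]:
    """Score every candidate against target and return the n best (name, score) pairs."""
    scored = ((name, score_fn(target, ranks)) for name, ranks in candidates.items())
    # heapq.nlargest keeps only the top n, highest score first.
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


# Toy example with single-number "ranks" and a simple 1 - |difference| score.
toy = {"Player A": 0.5, "Player B": 0.9, "Player C": 0.1}
top = most_similar(0.8, toy, lambda t, r: 1 - abs(t - r), n=2)
print([name for name, _ in top])  # ['Player B', 'Player A']
```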
Here’s an example of the output for the top 5 most similar players:
Denis Bouanga’s 2018-2019 season is the closest to Aaronson’s 2020. If we go back to FBRef, we can see where they are close:
A lot of those per 90 numbers are extremely close, especially for shot creation and xG. That's a pretty key similarity! I think this is a very good measure of play style, though not necessarily of quality; after all, the jump from MLS to Ligue 1 is still significant, even though the gap may be closing.
I’m still working on the best way of packaging and communicating this data to interested parties. I’ve created a basic template which I can put on socials when players move:
Where to improve:
The biggest next step is to look more closely at team data and how it relates to players. Things like usage matter far more than final third touches, and adjusting defensive actions for possession is extremely important.
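One simple way to do the possession adjustment, sketched here as an assumption rather than the tool's method: scale a player's defensive actions to a 50/50 baseline in proportion to how much of the ball the opponent had, on the logic that a side with less possession gets more chances to tackle and intercept. The function name and the linear form are illustrative; other adjustment curves exist.

```python
def possession_adjust(raw_per90: float, team_possession_pct: float) -> float:
    """Scale a defensive per-90 stat to a 50/50 possession baseline (linear sketch)."""
    if not 0.0 < team_possession_pct < 100.0:
        raise ValueError("team possession must be strictly between 0 and 100")
    opponent_possession = 100.0 - team_possession_pct
    # More opponent possession means more chances to defend, so scale those numbers down.
    return raw_per90 * 50.0 / opponent_possession


# A player making 6 tackles + interceptions per 90 on a team with 40% possession.
print(possession_adjust(6.0, 40.0))  # 5.0
```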
Also, because of the work needed to turn an FBref scouting report into a workable data source, it's currently difficult to open the project up to everyone. Automating that process could let people run the algorithm on their own.
If you want to know how a player compares to his European counterparts, feel free to ask me on Twitter @Thundering165, or use this link to create your own comparisons.