# Probabilities of First Round Outcomes in the World Cup

Did you know that there are 40 possible point distributions for a group in the first round of the World Cup? I was wondering how many there are and how likely each of them is, so here are the results of my calculations. There are 729 (3^6) possible combinations of results for the six matches of the group stage, ignoring the actual goals scored and only considering whether the first-mentioned team wins, draws or loses.

By sorting each resulting point distribution by the number of points, the results can be reduced to 40 combinations, where the first digit belongs to the first-placed team, the second to the second-placed team, etc. There is only one variation, a streak of six consecutive draws, that leads to a distribution with three points for every team (3333). At the other end of the scale, each of the distributions 6443 and 7441 can be reached through 36 different sequences of match results.
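The enumeration described above can be reproduced with a short brute-force script. This is a sketch under the stated scoring rules (3 points for a win, 1 for a draw), not the code used for the original calculation; the team numbering and the 'W'/'D'/'L' encoding are illustrative choices.

```python
from collections import Counter
from itertools import combinations, product

# The six pairings of a four-team group (teams numbered 0 to 3).
MATCHES = list(combinations(range(4), 2))


def point_distribution(results):
    """Turn six results into a points string sorted from first to last place.

    Each result is 'W' (first-mentioned team wins), 'D' (draw) or
    'L' (first-mentioned team loses); a win earns 3 points, a draw 1.
    """
    points = [0, 0, 0, 0]
    for (first, second), result in zip(MATCHES, results):
        if result == "W":
            points[first] += 3
        elif result == "L":
            points[second] += 3
        else:
            points[first] += 1
            points[second] += 1
    # At most 9 points per team, so one digit per team is enough.
    return "".join(str(p) for p in sorted(points, reverse=True))


def count_distributions():
    """Count how many of the 3**6 result sequences produce each distribution."""
    return Counter(
        point_distribution(results)
        for results in product("WDL", repeat=len(MATCHES))
    )


if __name__ == "__main__":
    counts = count_distributions()
    print(len(counts), sum(counts.values()))  # distinct distributions, sequences
    for key in ("3333", "6443", "7441"):
        print(key, counts[key])
```

Run as a script, it reports 40 distinct distributions out of 729 sequences, with 3333 reached once and 6443 and 7441 reached 36 times each, which is the maximum for any distribution.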

But which point distributions have the highest probability? Well, that depends on the probabilities of the individual match outcomes. If a win, a loss and a draw are equally likely, the probability of a distribution is simply the number of variations leading to it divided by the number of all possible variations. For 6443 this would be 36/729, or 4.94 percent. However, the assumption that all results occur with the same frequency is very unlikely to hold. To calculate "empirical" probabilities, I looked at the last five World Cups and counted the number of draws in the first round: 63 of 240 matches ended without a winner. Thus, the draw probability for any match, without any further information on the teams, is 26.25 percent. For further calculations, I simply assumed that both teams have an equal winning probability of (1 - 0.2625)/2 = 36.875 percent.
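The weighting described above can be sketched by summing, for every distribution, the probabilities of all result sequences that lead to it. This sketch assumes that matches are independent and that every match has the same draw probability of 63/240 and equal win probabilities for both teams, as stated in the text; the function name and parameter are illustrative.

```python
from collections import defaultdict
from itertools import combinations, product

MATCHES = list(combinations(range(4), 2))  # six pairings, teams 0 to 3
P_DRAW = 63 / 240  # 63 draws in 240 group matches, 1998 to 2014


def distribution_probabilities(p_draw=P_DRAW):
    """Probability of each sorted point distribution.

    Assumes independent matches, a draw probability of p_draw and an equal
    win probability (1 - p_draw) / 2 for both teams in every match.
    """
    p_win = (1 - p_draw) / 2
    probs = defaultdict(float)
    for results in product("WDL", repeat=len(MATCHES)):
        points = [0, 0, 0, 0]
        prob = 1.0
        for (first, second), result in zip(MATCHES, results):
            if result == "D":
                points[first] += 1
                points[second] += 1
                prob *= p_draw
            else:
                points[first if result == "W" else second] += 3
                prob *= p_win
        key = "".join(str(p) for p in sorted(points, reverse=True))
        probs[key] += prob
    return dict(probs)


if __name__ == "__main__":
    probs = distribution_probabilities()
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for key, p in ranked[:5]:
        print(f"{key}: {100 * p:.2f} %")
```

Passing `p_draw=1/3` reproduces the equal-probability case (for example 36/729 for 6443). With the 26.25 percent draw rate, this sketch ranks 6443 first at about 6.4 percent, followed by 6633 and 9630 at about 6.0 percent each.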

Take a look at the results:

Number of Variations, Probabilities and Real Point Distributions (1998 to 2014) for World Cup Group Stage

Obviously, the point distributions with a higher calculated empirical probability do indeed occur more often. With only 40 groups included since 1998, the observed distribution naturally lacks some smoothness. But there is probably also a problem with the assumption of equal winning probabilities. Since 1998, the two combinations with the highest computed probabilities have together occurred only once, or in 2.5 percent of the included groups, although they should have occurred in 12.4 percent. A reasonable explanation is that team strength plays a crucial role in how the groups are formed. The highest-ranked teams in the FIFA ranking are distributed over all groups, making it likely that their winning probability in every match is higher than assumed in my calculations. Point distributions like 6633 and 6443 are more likely to occur if a group consists of teams of similar strength. The way the groups are drawn makes such a balanced strength distribution, and therefore these point distributions, less probable.
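To gauge whether small-sample noise alone could explain the gap, one can ask how likely at most one such group among 40 would be if the 12.4 percent were correct. This is a rough illustrative check, assuming the 40 groups are independent draws from the equal-strength model; the constant names are hypothetical.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N_GROUPS = 40       # 8 groups per World Cup, 1998 to 2014
P_EXPECTED = 0.124  # combined model probability of the two top distributions
OBSERVED = 1        # groups that actually ended with one of them

if __name__ == "__main__":
    print(f"expected groups: {N_GROUPS * P_EXPECTED:.2f}")
    print(f"P(at most {OBSERVED} of {N_GROUPS}): "
          f"{prob_at_most(OBSERVED, N_GROUPS, P_EXPECTED):.3f}")
```

The model expects about 5 such groups, and the chance of seeing at most one is roughly 3 percent, which supports the suspicion that the equal-strength assumption matters more than the lack of smoothness.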

by Tobias Wolfanger

# Body Data of Bundesliga Players by Position

As promised last week, here's my follow-up post with a look at the body data of Bundesliga players by position. I aggregated the data I collected from whoscored.com last week and calculated the average age, height, weight and BMI for each position.

The difficulty of an analysis by position arises from the simple fact that some players can and do play in more than one position, or at least in some variation of it. 163 out of 546 players in the data set played at least two different positions during the past season, so it is necessary to decide how to deal with this noise in the data. Aggregating the data on a higher level would not be a good solution. Imagine combining centre backs (D(C)), left backs (D(L)) and right backs (D(R)) into one position: putting lively full backs and heavyset centre backs together would ruin a lot of the expected insight.

So what did I do about it? If a player played more than one position in the last season, I made a duplicate entry for each position played. So if, for example, Thomas Müller played as an attacking midfielder in the centre, on the left and on the right, and as a forward, he has four entries in the data set used for the analysis. I also removed those players from the data set who were members of a team but didn't make any appearances. So all results presented in the following diagrams can be interpreted as mean values of the body data of players who made at least one appearance in the respective position in the past season. The data set used for the analysis can be downloaded here.
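The duplication approach described above can be sketched as follows. The player records are made-up values for illustration, not the whoscored.com data, and the record layout is an assumption; the point is only how multi-position players enter each position's average once.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (name, positions played, appearances, age, height in cm,
# weight in kg). Made-up values for illustration, not the whoscored.com data.
PLAYERS = [
    ("Player A", ["AM(C)", "AM(L)", "AM(R)", "FW"], 30, 24, 186, 76),
    ("Player B", ["D(C)"], 28, 27, 191, 85),
    ("Player C", ["D(L)", "D(C)"], 19, 25, 180, 74),
    ("Player D", ["GK"], 0, 21, 192, 88),  # no appearances
]


def averages_by_position(records):
    """Mean age, height, weight and BMI per position.

    A player who appeared in several positions is counted once per position;
    squad members without any appearance are dropped.
    """
    rows = defaultdict(list)
    for _name, positions, appearances, age, height, weight in records:
        if appearances == 0:
            continue
        bmi = weight / (height / 100) ** 2
        for position in positions:
            rows[position].append((age, height, weight, bmi))
    return {
        position: tuple(round(mean(column), 1) for column in zip(*values))
        for position, values in rows.items()
    }


if __name__ == "__main__":
    for position, (age, height, weight, bmi) in sorted(averages_by_position(PLAYERS).items()):
        print(f"{position:6} age {age}  height {height}  weight {weight}  BMI {bmi}")
```

In this toy sample, Player A contributes to four positions and Player C to two, while Player D is dropped for lack of appearances.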

## Age and Position

Looking at the following diagram, the reader might ask why midfielders (M) and defenders (D) are so much younger on average. This is more or less a statistical artifact: the database at whoscored.com cannot specify the position further for players with few appearances, so the players summarized under these generic positions are mostly younger ones. The same is true for forwards (FW), except that their position is never specified further (centre, left or right).

Overall, there is not a big difference in age between positions. Besides goalkeepers (GK) being the oldest on average, there might be a slight tendency to staff the more defensive positions with older players. Maybe this is where experience comes into play: we all know that a single defensive mistake often has a more serious effect on the result than one made in the opponent's half.

Average Age of Bundesliga Players by Position in Years

## Goalkeepers are the tallest on the field

As I suggested in my last post, goalkeepers are indeed the tallest players on average. They also have the highest mean weight and BMI. This is not surprising given that their job is to keep a clean sheet. A few extra centimeters make it much easier to block a higher share of the shots coming towards them. Some extra weight, as long as it does not affect their ability to reach the farthest corners of the goal, can help them dominate their six-yard box.

Average Height of Bundesliga Players by Position in cm

## Heavyweight in the Penalty Box

The men in front of them, the centre backs (D(C)), are the second tallest and heaviest on the field. Given the height of their natural opponents, they need considerable height to maintain dominance in the air. Forwards are shorter and lighter than centre backs, but exceed all the remaining positions. They seem to have the physical attributes needed to hold their own against the defenders in the penalty box. So it is no surprise that players deployed as defensive midfielders (DM(C)) are the next tallest and heaviest.

Average Weight of Bundesliga Players by Position in kg

## Midfielders and Full Backs

The left and right backs are shorter than their centre back colleagues, with an average height and weight resembling those of midfielders. Differences between the various positions in (attacking) midfield and at full back are marginal. Similar requirements, such as speed or technical skill, might be one reason for this, and might also explain why many full backs are occasionally deployed as attacking midfielders and vice versa.

So what can we get out of this analysis? At least there seems to be a connection between the physique of a professional football player and the positions he plays. So now I'm pretty sure why I spent almost all of my football "career" as a defender. Not to mention that reflexes like a railway crossing gate never gave me a shot at the goalkeeper's position.

by Tobias Wolfanger

# Body Data of Bundesliga Players and Average Germans

So recently I came across the wonderful website whoscored.com, which offers a great database with a lot of player and team data far beyond the information usually available. Having dealt with football data at the aggregate level of leagues before, I thought it might be a good idea to take a closer look at some features to gain insights at the micro level of the game. So here I am, digging into some of the data I scraped from the website.

While wondering which hypothesis to pursue, I decided to start with the basics. What can be said about the physique of professional football players? How do they compare to the German average?

I plotted the weight and height of all Bundesliga players and added lines marking the edges of the Body Mass Index (BMI) zones. The BMI is calculated by dividing the weight in kg by the square of the height in meters. It is used as a rough indicator of the physical condition of individuals or populations, taking their height into account.
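The formula above also explains the shape of the zone edges in such a plot: for a fixed BMI value, the boundary weight is BMI times height squared. This sketch assumes the common WHO adult cut-offs of 18.5, 25 and 30, since the post does not list which edges were drawn; the helper names are hypothetical.

```python
# Common WHO adult BMI cut-offs (an assumption: the post does not list its edges).
BMI_EDGES = {"underweight/normal": 18.5, "normal/overweight": 25.0, "overweight/obese": 30.0}


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kg divided by the square of height in meters."""
    return weight_kg / height_m**2


def edge_weight(bmi_value: float, height_m: float) -> float:
    """Weight (kg) at which a person of the given height sits exactly on a BMI edge."""
    return bmi_value * height_m**2


if __name__ == "__main__":
    # A player with the average height and weight reported in this post.
    print(f"BMI at 183.7 cm / 78.6 kg: {bmi(78.6, 1.837):.1f}")
    for label, value in BMI_EDGES.items():
        curve = [(h, round(edge_weight(value, h / 100), 1)) for h in range(165, 201, 5)]
        print(label, curve)
```

A player of average height and weight (183.7 cm, 78.6 kg) has a BMI of about 23.3. Note that the mean of individual BMIs is generally not the same as the BMI of the mean height and weight.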

Height and Weight of Bundesliga Players in the 2012/13 Season with BMI Zones

## Marco Reus, you and me

Not surprisingly, there is a strong correlation (Pearson's r = 0.82) between height and weight, with averages of 183.7 cm and 78.6 kg. Compared to the average German male (178 cm, 83.4 kg), Bundesliga players are more than 5 cm taller and nearly 5 kg lighter, if the whole male population is included. These figures are of course biased, because older people tend to be shorter and heavier, at least until they reach their 60s. The following table therefore compares the physique of Bundesliga players to that of average German males in their respective age groups. The data are from chapter four of Statistisches Jahrbuch 2012.
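Pearson's r, as quoted above, is the covariance of the two variables divided by the product of their standard deviations. The following is a minimal sketch of that formula; the sample values are hypothetical, not the scraped player data (on Python 3.10 or later, `statistics.correlation` computes the same coefficient).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical heights (cm) and weights (kg), for illustration only.
    heights = [176, 181, 184, 188, 192]
    weights = [71, 76, 80, 82, 88]
    print(f"r = {pearson_r(heights, weights):.2f}")
```

Values near 1 mean that taller players are almost always heavier; r = 0.82 indicates a strong but not perfect linear relationship.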

Comparison of height, weight and overweight percentage of Bundesliga players and average German males

While there is almost no difference in weight between the two groups, the professional players tend to be a few centimeters taller. In the group of players aged 30 to 35, the difference is 6 cm. The main reason for this: more than 22 percent of the players in this age group are goalkeepers, who tend to have longer careers and are taller than other players.

Regarding the BMI, most players fall in the normal weight zone, with a tendency towards its upper edge. According to the BMI criteria, only a few players can be classified as slightly overweight. It's improbable that any of them are actually fat. I think the more plausible reason some of them reach the overweight zone is their high share of muscle tissue, since the BMI does not distinguish between fat and muscle. Compared to average males, the percentage of overweight football players is rather small.
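The overweight share discussed above is the fraction of players whose BMI reaches the overweight threshold. This sketch assumes the common cut-off of 25 and uses hypothetical height and weight pairs; as the text notes, it cannot tell muscle from fat.

```python
def overweight_share(players, threshold=25.0):
    """Fraction of (height_cm, weight_kg) pairs with BMI at or above the threshold.

    BMI only relates weight to height, so muscular players can land above
    the threshold without carrying excess fat.
    """
    if not players:
        raise ValueError("empty sample")
    bmis = [weight / (height / 100) ** 2 for height, weight in players]
    return sum(b >= threshold for b in bmis) / len(bmis)


# Hypothetical (height in cm, weight in kg) pairs, not the scraped data.
SAMPLE = [(183, 78), (190, 92), (175, 70), (186, 88)]

if __name__ == "__main__":
    print(f"overweight share: {100 * overweight_share(SAMPLE):.0f} %")
```

In this toy sample, two of four players (BMI about 25.5 and 25.4) cross the threshold.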

The conclusion so far: Bundesliga players have average weight for their age groups but are slightly taller. Only a small share of them is overweight by BMI criteria.

## What else can be done with body data?

As the goalkeeper example has shown, some positions seem to place special demands on players' body measurements. I'll soon write a follow-up post dealing with this relationship. Finally, there is probably also a connection between average body height and the performance of teams. Have a look at this blog post by Chris Anderson, which suggests a strong correlation between the average height of a population and the FIFA coefficient of its national team.

by Tobias Wolfanger