When third place is enough

Introduction

The current (2026) edition of the FIFA World Cup finals has 48 teams in it, arranged in twelve groups of four.

Historically, the World Cup finals has usually had a number of teams that is a power of 2 (16 up to 1978, and 32 from 1998 to 2022), which allows the easy procedure of advancing the top two teams in each group of four to the next stage (usually a knockout). When the number of teams is not a power of 2, it is not obvious what should happen. In the first 24-team competition (Spain, 1982), the top two teams from each of the six groups advanced to a second-round group stage¹ of four groups of three,² and the winners of those groups advanced to the semi-finals. Groups of three are awkward, though, and for the 1986 competition (Mexico) a 16-team knockout was used for the second stage: the top two teams from each first-round group advanced, along with the four best third-placed teams (by points, and then goal difference). This year’s competition uses a similar procedure: the second-round knockout stage will have 32 teams, so the 8 best third-placed teams from the twelve groups will advance.

Which made me wonder: how many points does a third-placed team need to have a good chance of advancing to the knockout? Is three points likely to be enough, or will it take four?

This seems like the sort of thing a simulation could shed some light on. Make some simplifying assumptions, simulate, and see what happens.

Packages

Load the tidyverse, and set the random number seed for reproducibility:

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.3     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

set.seed(457299)

The simulation

The simplifying assumption I made was that every game was probabilistically the same: there is a certain probability of a draw, and if there is a winner, it is equally likely to be either team. How likely is a draw? I went back to the 2022 group stage, in which there were 8 groups and thus \(8 \times 6 = 48\) matches, of which 10 were draws, so my best guess at the probability of a draw is 10/48:

pdraw <- 10 / 48
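Ten draws in 48 matches is a small sample, so this estimate is fairly uncertain. As a rough side check (not part of the simulation itself), base R's `binom.test()` gives an exact binomial confidence interval for the draw probability. This assumes the 48 matches are independent with a common draw probability, which is the same simplification the simulation makes; the names `n_draws`, `n_matches` and `draw_ci` are only for illustration.

```r
# Draws and matches in the 2022 group stage, as counted above
n_draws <- 10
n_matches <- 48

# Exact (Clopper-Pearson) 95% confidence interval for the draw probability
draw_ci <- binom.test(n_draws, n_matches)$conf.int
draw_ci
```

If the interval is wide, it may be worth rerunning the simulation with a few different values of `pdraw` to see how sensitive the conclusions are.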

Next, to simulate a single match. I use two uniform(0, 1) random numbers; if the first is less than pdraw, it’s a draw; otherwise, I use the second random number to decide who wins. Since I’m going to be doing calculations with points later, I return the numbers of points for each team:

sim1 <- function(p_draw) {
  r <- runif(2)
  if (r[1] < p_draw) return(c(1, 1)) # draw
  if (r[2] < 0.5) return(c(3, 0))
  return(c(0, 3))
}

Let’s test this for reasonableness:

map(1:10, \(x) sim1(pdraw))

[[1]]
[1] 3 0

[[2]]
[1] 0 3

[[3]]
[1] 0 3

[[4]]
[1] 3 0

[[5]]
[1] 0 3

[[6]]
[1] 3 0

[[7]]
[1] 1 1

[[8]]
[1] 0 3

[[9]]
[1] 3 0

[[10]]
[1] 0 3

That seems not unreasonable.
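Ten matches are too few to judge whether draws happen at the right rate. A slightly more careful check, sketched here, simulates many matches and compares the observed proportions of losses, draws and wins for the first team with the values the model implies. `sim1()` and `pdraw` are repeated so the block runs on its own; `n_sim` is an arbitrary choice.

```r
# sim1() as defined above, repeated so this block runs on its own
sim1 <- function(p_draw) {
  r <- runif(2)
  if (r[1] < p_draw) return(c(1, 1)) # draw
  if (r[2] < 0.5) return(c(3, 0))
  return(c(0, 3))
}

pdraw <- 10 / 48
n_sim <- 10000 # arbitrary number of simulated matches

# Points for the first team in each match: 0 = loss, 1 = draw, 3 = win
first_team_pts <- replicate(n_sim, sim1(pdraw)[1])

# Observed proportions next to the probabilities implied by the model
obs <- c(
  loss = mean(first_team_pts == 0),
  draw = mean(first_team_pts == 1),
  win  = mean(first_team_pts == 3)
)
expected <- c(loss = (1 - pdraw) / 2, draw = pdraw, win = (1 - pdraw) / 2)
round(rbind(observed = obs, expected = expected), 3)
```

The two rows should agree to within simulation error.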

Next, we simulate all six results in a group, and calculate the number of points for each team. This has some steps:

make a vector of team names
make all the combinations of pairs of teams
keep only the ones where the first team is alphabetically before the second one (to make sure the teams play each other only once and no team plays itself)
simulate the result of each game
do some rearrangement to make a long dataframe with one number of points for each team in each game (thus, here, with 12 rows)
total up the number of points for each team
arrange them in descending order, and return just a vector of points:

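sim_group <- function(n_team = 4, p_draw) {
  teams <- paste0("T", 1:n_team)
  # all pairs of teams, keeping each fixture once (and no team against itself)
  crossing(T1 = teams, T2 = teams) %>% 
    filter(T1 < T2) %>% 
    rowwise() %>% 
    # simulate each game; pts_1 and pts_2 are the points for T1 and T2
    mutate(pts = list(sim1(p_draw))) %>% 
    unnest_wider(pts, names_sep = "_") -> d
  # one row for each team in each game
  d %>% select(team = T1, pts = pts_1) -> d1
  d %>% select(team = T2, pts = pts_2) -> d2
  # total the points for each team, sorted from first place to last
  bind_rows(d1, d2) %>% 
    group_by(team) %>% 
    summarize(tot_pt = sum(pts)) %>% 
    arrange(desc(tot_pt)) %>% 
    pull(tot_pt)
}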
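For comparison, the same steps can be sketched in base R, using `combn()` to list the fixtures and vectorised random numbers for the results. This is an alternative written for illustration, not the function used in the rest of the post; `sim_group_base` is a hypothetical name, and it uses the same match model as `sim1()`.

```r
# Base-R sketch of the group simulation; sim_group_base is a hypothetical name
sim_group_base <- function(n_team = 4, p_draw) {
  # Each column is one fixture: every pair of teams meets exactly once
  fixtures <- combn(n_team, 2)
  n_match <- ncol(fixtures)

  # Same match model as sim1(): a draw with probability p_draw,
  # otherwise either team is equally likely to win
  is_draw <- runif(n_match) < p_draw
  first_wins <- runif(n_match) < 0.5
  pts_first <- ifelse(is_draw, 1, ifelse(first_wins, 3, 0))
  pts_second <- ifelse(is_draw, 1, ifelse(first_wins, 0, 3))

  # Total the points for each team, then sort from first place to last
  totals <- tapply(
    c(pts_first, pts_second),
    factor(c(fixtures[1, ], fixtures[2, ]), levels = seq_len(n_team)),
    sum
  )
  sort(as.vector(totals), decreasing = TRUE)
}

sim_group_base(p_draw = 10 / 48)
```

Because it draws its random numbers in a different order, it will not reproduce the tidyverse version's output for a given seed, but the distribution of results should be the same.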

To test, simulate 10 groups:

map(1:10, \(x) sim_group(p_draw = pdraw))

[[1]]
[1] 7 6 2 1

[[2]]
[1] 9 4 3 1

[[3]]
[1] 9 6 3 0

[[4]]
[1] 5 4 4 3

[[5]]
[1] 9 6 3 0

[[6]]
[1] 5 4 4 2

[[7]]
[1] 7 4 3 2

[[8]]
[1] 9 6 3 0

[[9]]
[1] 6 4 4 3

[[10]]
[1] 9 6 1 1

Each of these is a possible arrangement of points in a 4-team group (you can work out an arrangement of wins, draws, and losses that will add up to that many points). A hint for this is that if there are no draws, the points add up to 18 (for example, 9, 6, 3, 0), with each of the games sharing out 3 points, but every draw decreases the total by one, so that 7, 4, 3, 2 must have two draws, both of which must have featured the bottom team (no other way to get 2 points).
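The points arithmetic above is easy to turn into a small check. In this sketch, `count_draws` is a hypothetical helper that takes the final points totals in a single round-robin group and returns the number of drawn games they imply.

```r
# count_draws is a hypothetical helper: the number of drawn games implied by
# the final points totals in a single round-robin group
count_draws <- function(pts) {
  n_games <- choose(length(pts), 2) # 6 games in a 4-team group
  3 * n_games - sum(pts)            # each draw costs the group one point
}

count_draws(c(9, 6, 3, 0))
count_draws(c(7, 4, 3, 2))
count_draws(c(5, 4, 4, 3))
```

The first two calls give 0 and 2 draws respectively, matching the reasoning in the text.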

Now, the route to simulation is a bit clearer: simulate the right number of groups, and record the number of points obtained by the team in each position:

sim_all_groups <- function(n_group, n_team = 4, p_draw) {
  tibble(gp = paste0("G", 1:n_group)) %>% 
    rowwise() %>% 
    mutate(pts = list(sim_group(n_team, p_draw))) %>% 
    unnest_wider(pts, names_sep = "_")
}

and to test (with 6 groups, because 12 is unwieldy):

sim_all_groups(6, p_draw = pdraw)

This might be a simulated version of the 1986 World Cup (24 teams). Then, four of the third-place teams advanced, and so according to this simulation, it would have taken 4 points to do that.

Then, given a table like the one above, we need to extract the number of points earned by each third-place team, rank them (breaking ties at random), and label each number of third-place points by whether or not it was enough to qualify for the next round. That’s the job of the next function:

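third_place_qual <- function(n_group, n_qual, n_team = 4, p_draw = pdraw) {
  # p_draw defaults to the global pdraw set earlier;
  # n_team and p_draw are passed on to the group simulation
  sim_all_groups(n_group, n_team = n_team, p_draw = p_draw) %>%
    select(pts_3) %>%
    # rank from most points to fewest, breaking ties at random
    mutate(rk = rank(-pts_3, ties.method = "random")) %>%
    mutate(qual = rk <= n_qual) %>%
    select(pts_3, qual)
}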

Let’s up things to 12 groups and 8 qualifiers:

third_place_qual(12, 8)

This time, three of the third-place teams earned 4 points, and they all qualified. Seven of them earned 3 points; five of them qualified and two did not. The last two simulated third-place teams only got 2 points, and that was not enough to make it to the next stage.
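The random tie-breaking can be checked in isolation. This sketch hard-codes the third-place points just described (three 4s, seven 3s, two 2s) rather than simulating them, and applies the same `rank()` call used inside `third_place_qual()`.

```r
# Third-place points as described above: three 4s, seven 3s, two 2s
pts_3 <- c(rep(4, 3), rep(3, 7), rep(2, 2))
n_qual <- 8

# Rank from most points to fewest; tied teams are ordered at random
rk <- rank(-pts_3, ties.method = "random")
qual <- rk <= n_qual

# Cross-tabulate points against qualification
table(points = pts_3, qualified = qual)
```

Whichever way the ties fall, exactly eight teams qualify: all three with 4 points, five of the seven with 3 points, and neither of the teams with 2 points. Only the identity of the five lucky 3-point teams changes from run to run.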

The last step is to do the above many times, and count up the fraction of teams qualifying for the knockout stage with each number of points:

tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(thirds = list(third_place_qual(12, 8))) %>% 
  unnest(thirds) %>% 
  group_by(pts_3) %>% 
  summarize(prob = mean(qual)) -> d

with these results:³

According to this simulation, 3 points is the break-even: a third-place team with 3 points has a slightly better than even chance of qualifying. With 4 points, it is almost certain; with 2 points it is almost impossible.

I have to say that this was about what I was expecting; it seemed to me that a team with 3 points and a decent goal difference should make it to the knockout, but that 2 points would not be enough.
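One piece of this can be computed exactly rather than simulated. Under the same equal-strength assumption, a four-team group has only 3^6 = 729 possible combinations of results, so the distribution of the third-placed team's points within a single group can be enumerated. The sketch below does that in base R; it says nothing about qualification, which depends on all twelve groups at once, and the names `one_outcome`, `enumerated` and `third_dist` are hypothetical.

```r
pdraw <- 10 / 48

# All 3^6 = 729 combinations of results for the six games in a 4-team group.
# For each game: "H" = first team wins, "D" = draw, "A" = second team wins.
fixtures <- combn(4, 2)
outcomes <- expand.grid(rep(list(c("H", "D", "A")), ncol(fixtures)),
                        stringsAsFactors = FALSE)

# Third-place points and probability for one combination of results
one_outcome <- function(res) {
  pts <- numeric(4)
  for (g in seq_along(res)) {
    i <- fixtures[1, g]
    j <- fixtures[2, g]
    if (res[g] == "D") {
      pts[i] <- pts[i] + 1
      pts[j] <- pts[j] + 1
    } else if (res[g] == "H") {
      pts[i] <- pts[i] + 3
    } else {
      pts[j] <- pts[j] + 3
    }
  }
  n_draw <- sum(res == "D")
  prob <- pdraw^n_draw * ((1 - pdraw) / 2)^(length(res) - n_draw)
  c(third = sort(pts, decreasing = TRUE)[3], prob = prob)
}

enumerated <- t(apply(as.matrix(outcomes), 1, one_outcome))

# Exact probability of each third-place points total under the model
third_dist <- tapply(enumerated[, "prob"], enumerated[, "third"], sum)
round(third_dist, 3)
```

Comparing `third_dist` with the proportions of each `pts_3` value in the simulated data is a useful sanity check on the group simulation.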

Limitations

Of course, not all the teams in a World Cup finals are really of equal strength, so it is not really true that the probabilities of win, loss, and draw are the same for all matches. It might be instructive to run the simulation with 6 groups and 4 third-place teams qualifying, and compare the results with what actually happened in the World Cup finals that used that system.

Also, none of this says anything about goal difference, because I have not simulated the goal-scoring process in any fashion. It would be an interesting question to simulate the effect of goal difference on the likelihood of qualification with 3 points. (As I write this, Scotland have 3 points from their first two games, with the third game being against Brazil, so this is a very relevant question to Scotland fans.)

I like running simulations using rowwise, but I suspect this slows things down. I’ll take that, because I can understand what I am doing, but I wouldn’t like to re-do the simulation every time I rebuild this post.
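One general way to avoid rerunning a slow step is to save its result to a file and read it back on later runs. The sketch below shows that pattern with base R's `saveRDS()` and `readRDS()`. It is an illustration only, not necessarily how this post is built: `cached`, `run_simulation` and `cache_file` are hypothetical names, the stand-in simulation is trivial, and the file lives in `tempdir()` so the sketch is safe to run.

```r
# Hypothetical cache file; tempdir() is used only so the sketch is safe to run
cache_file <- file.path(tempdir(), "third_place_sim.rds")

# Stand-in for the slow simulation step (an assumption for illustration)
run_simulation <- function() {
  data.frame(sim = 1:3, value = runif(3))
}

# Run the slow step only if no saved result exists; otherwise read it back
cached <- function(path, compute) {
  if (file.exists(path)) {
    readRDS(path)
  } else {
    result <- compute()
    saveRDS(result, path)
    result
  }
}

first_run <- cached(cache_file, run_simulation)  # computes and saves
second_run <- cached(cache_file, run_simulation) # reads the saved copy
all.equal(first_run, second_run)
```

Deleting the file forces the simulation to run again, which is the thing to remember after changing the simulation code.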

Credit

Image credit

Footnotes

¹ You would think that FIFA would have learned not to have a second-round group stage after the 1978 debacle in which Argentina, playing last, knew that they had to thrash Peru to eliminate Brazil on goal difference, and proceeded to do so.
² England’s “reward” for winning their first-round group was to go up against West Germany and Spain, while France, second in that same group, went on to face Austria and Northern Ireland.
³ I cheated, behind the scenes. I ran the simulation once, saving the results, and then below read in the saved results. This way I don’t have to rerun the simulation every time I rebuild this blog post. This means putting an eval: false on the top of the simulation code chunk, and having a chunk hidden with echo: false that reads in the saved results from a file.