Covering the same ground?

code
analysis
Author

Ken Butler

Published

April 22, 2026

Description

An exploration of two different functions that appear to do the same thing.

Packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.1     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Introduction

In a previous post, I had this dataframe:

my_url <- "http://datafiles.ritsokiguess.site/wisconsin.txt"
wisc <- read_table(my_url)
wisc %>% select(location) %>% 
  mutate(state = "WI") -> wisc
wisc

These are mostly, but not all, cities in Wisconsin, even though I am apparently claiming that they all are.

Correcting the states

(this discussion is taken from the aforementioned previous post)

The last three cities are in the wrong state: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN), and Chicago is in Illinois (IL).

The first step is to make a small dataframe with the cities that need to be corrected, and the states they are actually in:

corrections <- tribble(
  ~location, ~state,
  "Dubuque", "IA",
  "St.Paul", "MN",
  "Chicago", "IL"
)
corrections

Note that the columns of this dataframe have the same names as the ones in the original dataframe wisc.

After some messing about with left_join, I eventually landed on this function from dplyr, which literally applies the corrections:

wisc %>% 
  rows_update(corrections) -> wisc_correct
Matching, by = "location"
wisc_correct

The values to look up (the “keys”) are by default in the first column, which is where they are in corrections. If they had not been, I would have used a by in the same way as with a join.

The new dplyr and something very similar

I was reading the blog post about the most recent update of dplyr, and when I got to recode_values there, I had this great sense of déjà vu, because it seems to do almost exactly the same thing. Except that my lookup table is the wrong kind of thing: instead of saying how old values of state go to new values of state, it says that some of the values of state need to be changed according to which value of location they go with.

This seems like a very similar problem, and it was not clear to me whether recode_values or replace_values from the new dplyr can help with it.

Apparently the idea in dplyr came from this post by Libby Heeren. Let me see if I can reproduce the ideas there, but in my own way.

The idea was that there was a survey like this:

fake_survey

and the scores are actually Likert scores, to be mapped to text like this:

likert_mapping

To get a table with the scores and the text that goes with them, Libby uses a technique with !!!, which I don’t understand, but my go-to for this sort of look-up is a left_join:

fake_survey %>% 
  left_join(likert_mapping)
Joining with `by = join_by(score)`

with one missing value corresponding to a response that is not between 1 and 5.

Another way to do this exact thing is with recode_values:

fake_survey %>% 
  mutate(text = recode_values(score, 
                              from = likert_mapping$score,
                              to = likert_mapping$text))

The reason rows_update works for my data is that the original data and the table of corrections have the same two columns. To make that work for our fake survey, we could create a column text of missing values1 first:

fake_survey %>% 
  mutate(text = NA_character_) %>% 
  rows_update(likert_mapping, unmatched = "ignore")
Matching, by = "score"

to achieve the same end.

Back to Wisconsin

Now, does my Wisconsin data respond to this?

wisc %>% 
  mutate(state2 = recode_values(location,
                                from = corrections$location,
                                to = corrections$state))

This is not as elegant as I would have liked, but it gives the raw material to work with: the correct state is state2 if it is not missing, and state otherwise, which is either a coalesce or a use of replace_values, where the code below defines correct_state to be state2 with all its missing values replaced by the corresponding ones in state:

wisc %>% 
  mutate(state2 = recode_values(location,
                                from = corrections$location,
                                to = corrections$state)) %>% 
  mutate(correct_state = replace_values(state2, NA ~ state))

and that gets us there, but I have to say I like rows_update better for this job.

Picture credit

Cover art of the album Same Ground by Brunswick.

Footnotes

  1. Which have to be created as missing text, so that when rows_update fills in the other values, which are text, it doesn’t get upset about trying to make a column out of things that are of two different types.↩︎