Covering the same ground?

Description

An exploration of two different functions that appear to do the same thing.

Packages

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0.9000     ✔ readr     2.2.0     
✔ forcats   1.0.1          ✔ stringr   1.6.0     
✔ ggplot2   4.0.2          ✔ tibble    3.3.1     
✔ lubridate 1.9.5          ✔ tidyr     1.3.2     
✔ purrr     1.2.2          
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Introduction

In a previous post, I had this dataframe:

my_url <- "http://datafiles.ritsokiguess.site/wisconsin.txt"
wisc <- read_table(my_url)
wisc %>% select(location) %>% 
  mutate(state = "WI") -> wisc
wisc

These are mostly, but not all, cities in Wisconsin, even though I am apparently claiming that they all are.

Correcting the states

(This discussion is taken from the previous post mentioned above.)

The last three cities are in the wrong state: Dubuque is in Iowa (IA), St. Paul is in Minnesota (MN), and Chicago is in Illinois (IL).

The first step is to make a small dataframe with the cities that need to be corrected, and the states they are actually in:

corrections <- tribble(
  ~location, ~state,
  "Dubuque", "IA",
  "St.Paul", "MN",
  "Chicago", "IL"
)
corrections

Note that the columns of this dataframe have the same names as the ones in the original dataframe wisc.

After some messing about with left_join, I eventually landed on this function from dplyr, which literally applies the corrections:

wisc %>% 
  rows_update(corrections) -> wisc_correct

Matching, by = "location"

wisc_correct

The values to look up (the “keys”) are by default in the first column, which is where they are in corrections. If they had not been, I would have used a by argument in the same way as with a join.
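As a minimal sketch of that last point, here is what the by version might look like when the key column is not first. The dataframes wisc_small and corrections_rev are hypothetical stand-ins made up for illustration (a few rows only, not the real data), with the columns of the corrections deliberately reversed:

```r
library(dplyr)

# Hypothetical stand-in for the real data: a handful of rows, all labelled WI
wisc_small <- tibble(
  location = c("Madison", "Milwaukee", "Dubuque", "St.Paul", "Chicago"),
  state = "WI"
)

# The same corrections, but with the key column second (assumed layout)
corrections_rev <- tibble(
  state = c("IA", "MN", "IL"),
  location = c("Dubuque", "St.Paul", "Chicago")
)

# Name the key column explicitly, as you would in a join
wisc_small %>%
  rows_update(corrections_rev, by = "location") -> wisc_small_correct
wisc_small_correct
```

Naming the key with by also suppresses the "Matching, by = ..." message, since there is nothing left for rows_update to guess.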

The new `dplyr` and something very similar

I was reading the blog post about the most recent update of dplyr, and when I got to replace_values there, I had this great sense of déjà vu, because it seems to do almost exactly the same thing. The catch is that my lookup table is the wrong kind of thing: instead of saying how old values of state map to new values of state, it says which values of state need to be changed according to the value of location they go with.

This seems like a very similar problem, and it is not clear to me whether recode_values or replace_values from the new dplyr can help with it.

Apparently the idea in dplyr came from this post by Libby Heeren. Let me see if I can reproduce the ideas there, but in my own way.

The idea was that there was a survey like this:

fake_survey

and the scores are actually Likert scores, to be mapped to text like this:

likert_mapping

To get a table with the scores and the text that goes with them, Libby uses a technique with !!!, which I don’t understand, but I do understand this:

fake_survey %>% 
  left_join(likert_mapping)

Joining with `by = join_by(score)`

with one missing value corresponding to a response that is not between 1 and 5.

Another way to do this exact thing is with recode_values:

fake_survey %>% 
  mutate(text = recode_values(score, 
                              from = likert_mapping$score,
                              to = likert_mapping$text))

Now, does my Wisconsin data respond to the same treatment?

wisc %>% 
  mutate(state2 = recode_values(location,
                                from = corrections$location,
                                to = corrections$state))

This is not as elegant as I would have liked, but it gives the raw material to work with: the correct state is state2 if it is not missing, and state otherwise. That can be done with either coalesce or replace_values. The code below uses replace_values to define correct_state as state2 with each missing value replaced by the corresponding value of state:

wisc %>% 
  mutate(state2 = recode_values(location,
                                from = corrections$location,
                                to = corrections$state)) %>% 
  mutate(correct_state = replace_values(state2, NA ~ state))

and that gets us there, but I have to say I like rows_update better for this job.
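For comparison, the coalesce route might look something like the sketch below. The small dataframes are hypothetical stand-ins, and to keep the sketch runnable on older versions of dplyr, state2 is built with base R's match rather than recode_values; that substitution is mine, not part of the approach above.

```r
library(dplyr)

# Hypothetical stand-ins for wisc and corrections (a few rows only)
wisc_small <- tibble(
  location = c("Madison", "Dubuque", "Chicago"),
  state = "WI"
)
corrections_small <- tibble(
  location = c("Dubuque", "Chicago"),
  state = c("IA", "IL")
)

wisc_small %>%
  # Look up each location in the corrections; NA where there is no match
  mutate(state2 = corrections_small$state[match(location, corrections_small$location)]) %>%
  # Take state2 where it exists, and fall back to state otherwise
  mutate(correct_state = coalesce(state2, state)) -> wisc_coalesced
wisc_coalesced
```

coalesce takes the first non-missing value across its arguments, row by row, so the order of state2 and state matters here.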

The reason rows_update works here is that the original data and the table of corrections have the same two columns. To make that work for our fake survey, we could create a blank column text first:

fake_survey %>% 
  mutate(text = "") %>% 
  rows_update(likert_mapping, unmatched = "ignore")

Matching, by = "score"

with the only difference being that the unmatched score has a blank text rather than a missing value.
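If a missing value is preferred to a blank, one option is to start the new column off as missing rather than empty. This is a sketch only: survey_demo and mapping_demo are made-up stand-ins (including the Likert labels, which are my assumption), not the data from the post.

```r
library(dplyr)

# Hypothetical survey: the score of 7 is deliberately out of range
survey_demo <- tibble(id = 1:4, score = c(2, 5, 7, 1))

# Hypothetical mapping; the label text is assumed for illustration
mapping_demo <- tibble(
  score = c(1, 2, 3, 4, 5),
  text = c("Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree")
)

survey_demo %>%
  # Start with a missing character column instead of a blank one
  mutate(text = NA_character_) %>%
  # unmatched = "ignore" skips mapping rows whose score is not in the survey
  rows_update(mapping_demo, by = "score", unmatched = "ignore") -> survey_updated
survey_updated
```

Here the out-of-range score keeps its NA, which is the same result a left_join on score would give for these stand-in tables.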
