Note that the columns of this dataframe have the same names as the ones in the original dataframe wisc.
After some messing about with left_join, I eventually landed on this function from dplyr, which literally applies the corrections:
wisc %>%rows_update(corrections) -> wisc_correct
Matching, by = "location"
wisc_correct
The values to look up (the “keys”) are by default in the first column, which is where they are in corrections. If they had not been, I would have used a by in the same way as with a join.
The new dplyr and something very similar
I was reading the blog post about the most recent update of dplyr, and when I got to recode_values there, I had this great sense of déjà vu, because it seems to do almost exactly the same thing. Except that my lookup table is the wrong kind of thing: instead of saying how old values of state go to new values of state, it says that some of the values of state need to be changed according to which value of location they go with.
This seems like a very similar problem, and it was not clear to me whether recode_values or replace_values from the new dplyr can help with it.
Apparently the idea in dplyr came from this post by Libby Heeren. Let me see if I can reproduce the ideas there, but in my own way.
The idea was that there was a survey like this:
fake_survey
and the scores are actually Likert scores, to be mapped to text like this:
likert_mapping
To get a table with the scores and the text that goes with them, Libby uses a technique with !!!, which I don’t understand, but my go-to for this sort of look-up is a left_join:
fake_survey %>%left_join(likert_mapping)
Joining with `by = join_by(score)`
with one missing value corresponding to a response that is not between 1 and 5.
Another way to do this exact thing is with recode_values:
fake_survey %>%mutate(text =recode_values(score, from = likert_mapping$score,to = likert_mapping$text))
The reason rows_update works for my data is that the original data and the table of corrections have the same two columns. To make that work for our fake survey, we could create a column text of missing values1 first:
This is not as elegant as I would have liked, but it gives the raw material to work with: the correct state is state2 if it is not missing, and state otherwise, which is either a coalesce or a use of replace_values, where the code below defines correct_state to be state2 with all its missing values replaced by the corresponding ones in state:
and that gets us there, but I have to say I like rows_update better for this job.
Picture credit
Cover art of the album Same Ground by Brunswick.
Footnotes
Which have to be created as missing text, so that when rows_update fills in the other values, which are text, it doesn’t get upset about trying to make a column out of things that are of two different types.↩︎