Note that the columns of this dataframe have the same names as the ones in the original dataframe wisc.
After some messing about with left_join, I eventually landed on this function from dplyr, which literally applies the corrections:
wisc %>%rows_update(corrections) -> wisc_correct
Matching, by = "location"
wisc_correct
The values to look up (the “keys”) are by default in the first column, which is where they are in corrections. If they had not been, I would have used a by in the same way as with a join.
The new dplyr and something very similar
I was reading the blog post about the most recent update of dplyr, and when I got to replace_values there, I had this great sense of deja vu, because it seems to do almost exactly the same thing. Except that my lookup table is the wrong kind of thing: instead of saying how old values of state go to new values of state, it says that some of the values of state need to be changed according to which value of location they go with.
This seems like a very similar problem, and it is not clear to me whether recode_values or replace_values from the new dplyr can help with it.
Apparently the idea in dplyr came from this post by Libby Heeren. Let me see if I can reproduce the ideas there, but in my own way.
The idea was that there was a survey like this:
fake_survey
and the scores are actually Likert scores, to be mapped to text like this:
likert_mapping
To get a table with the scores and the text that goes with them, Libby uses a technique with !!!, which I don’t understand, but I do understand this:
fake_survey %>%left_join(likert_mapping)
Joining with `by = join_by(score)`
with one missing value corresponding to a response that is not between 1 and 5.
Another way to do this exact thing is with recode_values:
fake_survey %>%mutate(text =recode_values(score, from = likert_mapping$score,to = likert_mapping$text))
This is not as elegant as I would have liked, but it gives the raw material to work with: the correct state is state2 if it is not missing, and state otherwise, which is either a coalesce or a use of replace_values, where the code below defines correct_state to be state2 with all its missing values replaced by the corresponding ones in state:
and that gets us there, but I have to say I like rows_update better for this job.
The reason rows_update works here is that the original data and the table of corrections have the same two columns. To make that work for our fake survey, we could create a blank column text first: