── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Introduction
Making a lot of changes in text, all in one go.
Let’s suppose you have a data frame like this:
d
# A tibble: 5 × 3
  x1       x2    y
  <chr>    <chr> <chr>
1 one      two   two
2 four     three four
3 seven    nine  eight
4 six      eight seven
5 fourteen nine  twelve
What you want to do is to change all the even numbers in columns x1 and x2, but not y, to the number versions of themselves, so that, for example, eight becomes 8. This would seem to be a job for str_replace_all, but how to manage the multitude of changes?
Making a lot of changes with str_replace_all
I learned today that you can feed str_replace_all a named vector. Wossat, you say? Well, one of these:
quantile(1:7)
0% 25% 50% 75% 100%
1.0 2.5 4.0 5.5 7.0
The numbers here are the five-number summary; the labels that go with them, saying which percentile each one is, are the names attribute. You can make one of these yourself like this:
x <- 1:3
x
[1] 1 2 3
names(x) <- c("first", "second", "third")
x
 first second  third
     1      2      3
The value of this for us is that you can hand str_replace_all the whole boatload of potential changes at once, as a named vector of the changes it might make.
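To see the idea on something tiny first, here is a minimal sketch. The strings and the vector name small_changes are made up for illustration; the only thing taken from stringr is that the names of the vector are treated as the patterns and the values as their replacements.

```r
library(stringr)

# Hypothetical toy input, not the data frame from this post
fruit <- c("one apple", "two pears")

# names = what to look for, values = what to put in its place
small_changes <- c("one" = "1", "two" = "2")

# Each pattern is tried on every string, one after the other
result <- str_replace_all(fruit, small_changes)
result
```

This should give "1 apple" and "2 pears": both replacements are made in a single call.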
In our example, we wanted to replace the even numbers by the numeric versions of themselves, so let’s make a little data frame with all of those:
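A sketch of what such a lookup table might look like, assuming it is called changes with columns from and to (the names used in the discussion that follows), and assuming it stops at fourteen because that is the biggest even number in x1 and x2:

```r
library(tibble)

# One row per replacement: the word on the left, its digits on the right.
# The digits are kept as text, because a replacement has to be a string.
changes <- tribble(
  ~from,      ~to,
  "two",      "2",
  "four",     "4",
  "six",      "6",
  "eight",    "8",
  "ten",      "10",
  "twelve",   "12",
  "fourteen", "14"
)
changes
```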
I think this is as high as we need to go. I like a tribble for this so that you can easily see what is going to replace what.
For the named vector, the values are the new values (the ones I called to in changes), while the names are the old ones (from). So let’s construct that. There is one extra thing: I want to replace whole words only (and not end up with something like 4teen, which sounds like one of those 90s boy bands), so what I’ll do is to put “word boundaries” [1] around the from values: [2]
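One way this might go, as a sketch rather than the definitive version: it rebuilds d and changes so that it runs on its own, uses `\\b` as the regular-expression word boundary, and uses setNames to attach the names. The names my_changes and result are my own choices for illustration.

```r
library(tidyverse)

# The example data frame from the introduction
d <- tribble(
  ~x1,        ~x2,     ~y,
  "one",      "two",   "two",
  "four",     "three", "four",
  "seven",    "nine",  "eight",
  "six",      "eight", "seven",
  "fourteen", "nine",  "twelve"
)

# Lookup table of even numbers (assumed to stop at fourteen)
changes <- tribble(
  ~from,      ~to,
  "two",      "2",
  "four",     "4",
  "six",      "6",
  "eight",    "8",
  "ten",      "10",
  "twelve",   "12",
  "fourteen", "14"
)

# Values are the new text; names are the old text wrapped in \\b,
# so that "four" matches only as a whole word and leaves "fourteen" alone
my_changes <- setNames(changes$to, str_c("\\b", changes$from, "\\b"))

# Apply every replacement to x1 and x2 only; y is left untouched
result <- d %>%
  mutate(across(c(x1, x2), \(x) str_replace_all(x, my_changes)))
result
```

With this, x1 should come out as one, 4, seven, 6, 14 and x2 as 2, three, nine, 8, nine, while y keeps its words. Without the word boundaries, the four pattern would fire first on fourteen and leave 4teen behind.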