<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Ken&#39;s blog</title>
<link>https://blog.ritsokiguess.site/</link>
<atom:link href="https://blog.ritsokiguess.site/index.xml" rel="self" type="application/rss+xml"/>
<description>Ken&#39;s R Blog</description>
<generator>quarto-1.8.24</generator>
<lastBuildDate>Thu, 19 Feb 2026 05:00:00 GMT</lastBuildDate>
<item>
  <title>Log-linear models</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/log-linear/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>What log-linear models are and how to use them</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>When you have two categorical variables, and you want to test for an association between them, the go-to method is a chi-squared test. For example, you might have a survey of eyewear worn by people who classify themselves as male or female:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/eyewear.txt"</span></span>
<span id="cb3-2">eyewear <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_delim</span>(my_url, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 2 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (1): gender
dbl (3): contacts, glasses, none

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">eyewear</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["gender"],"name":[1],"type":["chr"],"align":["left"]},{"label":["contacts"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["glasses"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["none"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"female","2":"121","3":"32","4":"129"},{"1":"male","2":"42","3":"37","4":"85"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Does one gender tend to prefer a particular type of eyewear more than the other gender does? Put another way, if you know a person’s gender, does that help you predict what they are likely to wear on their eyes? That is to say, is there an association between eyewear and gender?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">eyewear <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>gender) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chisq.test</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> eyewear_chisq</span>
<span id="cb6-5">eyewear_chisq</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Pearson's Chi-squared test

data:  .
X-squared = 17.718, df = 2, p-value = 0.0001421</code></pre>
</div>
</div>
<p>With a P-value of 0.0001, there certainly is an association.</p>
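<p>The above is the way contingency tables are usually laid out, but it is not the most convenient layout for graphs (or, as we see later, for log-linear modelling). A “long” layout, with one row per combination of categories and a single column of frequencies, works better:</p>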
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">eyewear <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>gender, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"eyewear"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"count"</span>) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> eyewear1</span>
<span id="cb8-3">eyewear1</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["gender"],"name":[1],"type":["chr"],"align":["left"]},{"label":["eyewear"],"name":[2],"type":["chr"],"align":["left"]},{"label":["count"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"female","2":"contacts","3":"121"},{"1":"female","2":"glasses","3":"32"},{"1":"female","2":"none","3":"129"},{"1":"male","2":"contacts","3":"42"},{"1":"male","2":"glasses","3":"37"},{"1":"male","2":"none","3":"85"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Now we can draw a graph to see what kind of association there is:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(eyewear1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> gender, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> count, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> eyewear)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Females are more likely to wear contacts and less likely to wear glasses than males.</p>
<p>A word about the graph: this is a bar chart, but one with two properties:</p>
<ul>
<li>the frequencies are not counted from the data, but given in a variable, so we use <code>geom_col</code> rather than <code>geom_bar</code>. In <code>geom_col</code>, the <code>y</code> aesthetic contains the frequencies.</li>
<li>the bars are stacked, but also scaled to have unit height (by dividing each frequency by the total for its bar). The <img src="https://latex.codecogs.com/png.latex?y">-axis label <code>count</code> is thus a misnomer; the values shown are proportions rather than counts.</li>
</ul>
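<p>To make the scaling concrete, here is a minimal sketch of the arithmetic behind <code>position = "fill"</code>. The data are typed in by hand so that the chunk stands alone, and the names <code>eyewear_long</code>, <code>eyewear_props</code> and <code>proportion</code> are my own choices for illustration:</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# the eyewear frequencies in long layout, typed in by hand
eyewear_long &lt;- tibble(
  gender = rep(c("female", "male"), each = 3),
  eyewear = rep(c("contacts", "glasses", "none"), times = 2),
  count = c(121, 32, 129, 42, 37, 85)
)

# divide each frequency by the total for its bar (that is, for its gender)
eyewear_props &lt;- eyewear_long %&gt;%
  group_by(gender) %&gt;%
  mutate(proportion = count / sum(count)) %&gt;%
  ungroup()
eyewear_props</code></pre>
</div>
<p>Within each gender the proportions add up to 1, which is why both bars have the same height. For example, 121 of the 282 females (about 43%) wear contacts, against 42 of the 164 males (about 26%).</p>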
<p>In a contingency table, one of the variables is often logically an outcome, <code>eyewear</code> here. Putting the logical explanatory variable as <code>x</code> and the logical outcome as <code>fill</code> enables us to compare the size of the coloured rectangles for each value of the variable on the <img src="https://latex.codecogs.com/png.latex?x">-axis (which is the appropriate way to do it).<sup>1</sup></p>
</section>
<section id="log-linear-models" class="level2">
<h2 class="anchored" data-anchor-id="log-linear-models">Log-linear models</h2>
<p>For two categorical variables, a chi-squared test followed by a graph (if the result is significant, to understand the association) is a straightforward way to go.</p>
<p>But what if we have more than two categorical variables? The picture can be more nuanced, in that there can be associations between only some of the categorical variables, and we now want to do two things: first, find out which categorical variables are associated, and second, for those that are, understand those associations.</p>
<p>The second thing can be done by drawing a graph like the one above, but the first needs a new modelling procedure: a <em>log-linear model</em>. This models the frequencies as having a Poisson distribution, with the log of the expected frequency having a linear relationship with effects for each categorical variable and possibly their interactions. A log-linear model is a generalized linear model with Poisson family (using the default log link).</p>
<p>One point that can be tricky to understand is that one or more of the categorical variables can be logically a response (like <code>eyewear</code> above), but as far as the modelling is concerned, <em>all categorical variables are explanatory</em>, with the frequency variable being the response.</p>
<p>The starting point is to fit a model including all interactions between (model) explanatory variables, with the frequency variable as (model) response. Let’s illustrate with the eyewear data, using the “long” layout (which is needed for log-linear modelling):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">eyewear1<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gender <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> eyewear,</span>
<span id="cb10-2">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"poisson"</span>,</span>
<span id="cb10-3">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> eyewear1)</span></code></pre></div></div>
</div>
<p>The next step is to see what can be removed from this model, which is most easily done with <code>drop1</code>:<sup>2</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(eyewear1<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"1.953993e-14","3":"47.95815","4":"NA","5":"NA","_rn_":"<none>"},{"1":"2","2":"1.782863e+01","3":"61.78678","4":"17.82863","5":"0.0001344505","_rn_":"gender:eyewear"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The interaction between gender and eyewear is significant. In a log-linear model, a significant interaction indicates a significant association between the categorical variables involved, so this is the same conclusion as for the chi-squared test, with almost the same P-value (0.00013 here against 0.00014 there).</p>
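<p>Another way to see what <code>drop1</code> is testing, as a sketch rather than a required step: fit the model without the interaction (main effects only, which says that gender and eyewear are independent) and compare it with the full model. The names <code>eyewear_long</code>, <code>full</code> and <code>indep</code> are mine, and the data are typed in by hand so that the chunk runs on its own:</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># the eyewear frequencies in long layout, typed in by hand
eyewear_long &lt;- data.frame(
  gender = rep(c("female", "male"), each = 3),
  eyewear = rep(c("contacts", "glasses", "none"), times = 2),
  count = c(121, 32, 129, 42, 37, 85)
)

# model with the interaction, and model without it (independence)
full &lt;- glm(count ~ gender * eyewear, family = "poisson", data = eyewear_long)
indep &lt;- glm(count ~ gender + eyewear, family = "poisson", data = eyewear_long)

# likelihood ratio test for the interaction term
anova(indep, full, test = "Chisq")

# residual deviance of the independence model
deviance(indep)</code></pre>
</div>
<p>Because the model with the interaction fits the six frequencies exactly, its deviance is essentially zero, and so the deviance of the independence model should agree with the LRT value of 17.83 in the <code>drop1</code> table.</p>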
<p>Technical aside: the P-values from the log-linear model and from the chi-squared test are very close, but not exactly equal. Letting <img src="https://latex.codecogs.com/png.latex?O"> denote an observed frequency and <img src="https://latex.codecogs.com/png.latex?E"> the corresponding expected frequency, the chi-squared test statistic is the sum over all cells of <img src="https://latex.codecogs.com/png.latex?(O%20-%20E)%5E2%20/%20E">, the so-called Pearson chi-squared, while the log-linear test statistic in this case is twice the sum over all cells of <img src="https://latex.codecogs.com/png.latex?O%20%5Cln(O%20/%20E)">, the likelihood ratio test statistic. These are not the same but are often very similar. End of technical aside.</p>
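<p>For anyone who wants to check the two statistics by hand, here is a small sketch in base R. It takes the likelihood ratio statistic to be twice the sum of observed times log of observed over expected, and the names <code>observed</code>, <code>expected</code>, <code>pearson</code> and <code>lrt</code> are hypothetical ones chosen for this illustration:</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># observed frequencies: rows are genders, columns are eyewear types
observed &lt;- matrix(c(121, 32, 129,
                     42, 37, 85),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(gender = c("female", "male"),
                                   eyewear = c("contacts", "glasses", "none")))

# expected frequencies under no association: row total * column total / grand total
expected &lt;- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic
pearson &lt;- sum((observed - expected)^2 / expected)

# likelihood ratio statistic
lrt &lt;- 2 * sum(observed * log(observed / expected))

c(pearson = pearson, lrt = lrt)</code></pre>
</div>
<p>The two values should match the 17.718 from <code>chisq.test</code> and the 17.829 from <code>drop1</code> respectively.</p>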
</section>
<section id="which-airline-to-fly" class="level2">
<h2 class="anchored" data-anchor-id="which-airline-to-fly">Which airline to fly?</h2>
<p>Let’s do a more interesting example. Two airlines, Alaska Airlines and America West, fly into various airports in the west of the US. Each flight is either on time or delayed at its destination. Is one airline more punctual than the other, and does this depend on which airport you are looking at? This time we have three categorical variables: <code>airline</code>, <code>airport</code>, and on time/delayed, which we will call <code>status</code>. I found these data in a textbook, laid out like this:</p>
<pre><code>               Alaska Airlines       America West
Airport       On time    Delayed   On time    Delayed
Los Angeles      497        62        694       117
Phoenix          221        12       4840       415
San Diego        212        20        383        65
San Francisco    503       102        320       129
Seattle         1841       305        201        61

Total           3274       501       6438       787</code></pre>
<p>This is not modelling-friendly, and not even tidy (in the <code>tidyverse</code> sense of the term). There is a further problem in that we have two rows of headers (often the way a three-way table is displayed). I condensed the two rows of headers into one:</p>
<pre><code>airport    aa_ontime aa_delayed aw_ontime aw_delayed
LosAngeles   497          62       694        117
Phoenix      221          12      4840        415
SanDiego     212          20       383         65
SanFrancisco 503         102       320        129
Seattle     1841         305       201         61</code></pre>
<p>which will pose an additional problem in a moment, but at least we can now read in the data:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/airlines.txt"</span></span>
<span id="cb14-2">airlines <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_table</span>(my_url)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>
── Column specification ────────────────────────────────────────────────────────
cols(
  airport = col_character(),
  aa_ontime = col_double(),
  aa_delayed = col_double(),
  aw_ontime = col_double(),
  aw_delayed = col_double()
)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">airlines</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["airport"],"name":[1],"type":["chr"],"align":["left"]},{"label":["aa_ontime"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["aa_delayed"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["aw_ontime"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["aw_delayed"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"LosAngeles","2":"497","3":"62","4":"694","5":"117"},{"1":"Phoenix","2":"221","3":"12","4":"4840","5":"415"},{"1":"SanDiego","2":"212","3":"20","4":"383","5":"65"},{"1":"SanFrancisco","2":"503","3":"102","4":"320","5":"129"},{"1":"Seattle","2":"1841","3":"305","4":"201","5":"61"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
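<p>We need to get each of the three categorical variables in its own column, with one column of frequencies. The column names such as <code>aa_ontime</code> each encode two things, an airline and a status, separated by an underscore. This calls for the variant of <code>pivot_longer</code> with <em>two</em> variable names in <code>names_to</code>, together with <code>names_sep</code> to say where to split the column names:</p>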
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">airlines <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-2">   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>airport, </span>
<span id="cb17-3">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"airline"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"status"</span>), </span>
<span id="cb17-4">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_"</span>, </span>
<span id="cb17-5">                <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"freq"</span> ) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> punctual</span>
<span id="cb17-6">punctual</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["airport"],"name":[1],"type":["chr"],"align":["left"]},{"label":["airline"],"name":[2],"type":["chr"],"align":["left"]},{"label":["status"],"name":[3],"type":["chr"],"align":["left"]},{"label":["freq"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"LosAngeles","2":"aa","3":"ontime","4":"497"},{"1":"LosAngeles","2":"aa","3":"delayed","4":"62"},{"1":"LosAngeles","2":"aw","3":"ontime","4":"694"},{"1":"LosAngeles","2":"aw","3":"delayed","4":"117"},{"1":"Phoenix","2":"aa","3":"ontime","4":"221"},{"1":"Phoenix","2":"aa","3":"delayed","4":"12"},{"1":"Phoenix","2":"aw","3":"ontime","4":"4840"},{"1":"Phoenix","2":"aw","3":"delayed","4":"415"},{"1":"SanDiego","2":"aa","3":"ontime","4":"212"},{"1":"SanDiego","2":"aa","3":"delayed","4":"20"},{"1":"SanDiego","2":"aw","3":"ontime","4":"383"},{"1":"SanDiego","2":"aw","3":"delayed","4":"65"},{"1":"SanFrancisco","2":"aa","3":"ontime","4":"503"},{"1":"SanFrancisco","2":"aa","3":"delayed","4":"102"},{"1":"SanFrancisco","2":"aw","3":"ontime","4":"320"},{"1":"SanFrancisco","2":"aw","3":"delayed","4":"129"},{"1":"Seattle","2":"aa","3":"ontime","4":"1841"},{"1":"Seattle","2":"aa","3":"delayed","4":"305"},{"1":"Seattle","2":"aw","3":"ontime","4":"201"},{"1":"Seattle","2":"aw","3":"delayed","4":"61"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and now we are ready to go.</p>
<p>The first stage is a log-linear model with the three-way interaction between <code>airport</code>, <code>airline</code>, and <code>status</code>, using <code>freq</code> as the (model) response:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">punctual<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(freq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> airport <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> airline <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> status,</span>
<span id="cb18-2">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"poisson"</span>,</span>
<span id="cb18-3">                  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> punctual)</span></code></pre></div></div>
</div>
<p>and then we see what, if anything, we can get rid of. <code>drop1</code> is nicer than <code>summary</code> for two reasons: (i) it only displays what actually can be dropped (the highest-order interactions in this kind of model), and (ii) when a categorical variable has more than two levels (such as <code>airport</code> here), <code>summary</code> only shows how each level compares to the baseline, not whether there is an effect of that categorical variable as a whole.</p>
<p>So:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(punctual<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"5.218048e-13","3":"183.4348","4":"NA","5":"NA","_rn_":"<none>"},{"1":"4","2":"3.216569e+00","3":"178.6513","4":"3.216569","5":"0.5222589","_rn_":"airport:airline:status"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The three-way interaction is not significant, so that term can be removed from the model. I like <code>update</code> for this task, because the model is complicated enough that copying, pasting, and editing the previous <code>glm</code> code risks fitting the wrong model next:<sup>3</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">punctual<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(punctual<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> airport<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>airline<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>status)</span>
<span id="cb20-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(punctual<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"3.216569","3":"178.6513","4":"NA","5":"NA","_rn_":"<none>"},{"1":"4","2":"6432.454138","3":"6599.8889","4":"6429.23757","5":"0.000000e+00","_rn_":"airport:airline"},{"1":"4","2":"240.107798","3":"407.5426","4":"236.89123","5":"4.334046e-50","_rn_":"airport:status"},{"1":"1","2":"45.465141","3":"218.8999","4":"42.24857","5":"8.037869e-11","_rn_":"airline:status"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
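<p>As a quick check on what <code>update</code> did (a sketch only; the names <code>f</code> and <code>f2</code> are hypothetical), <code>update</code> also works on a bare formula, so we can list the terms that remain without fitting anything:</p>
<div class="cell">
<pre class="sourceCode r"><code class="sourceCode r"># the formula of the first model
f &lt;- freq ~ airport * airline * status

# remove the three-way interaction, keeping everything else
f2 &lt;- update(f, . ~ . - airport:airline:status)
f2

# the terms left in the model
attr(terms(f2), "term.labels")</code></pre>
</div>
<p>The three main effects and the three two-way interactions should remain, with only the three-way interaction gone.</p>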
<p>There are three two-way interactions up for grabs, but they are <em>all significant</em>, so this is where we stop. The modelling indicates there are significant associations between airport and airline, between airport and status, and between airline and status.</p>
<p>As far as the modelling is concerned, all three categorical variables are “explanatory”, but from a logical point of view, <code>status</code> seems like an outcome: particular airlines or airports cause a flight to be on time or delayed. Our main interest in this kind of modelling is “what is associated with the logical outcome?”. The log-linear modelling says that there is an effect of airline on status, and also a separate effect of airport on status (but not an effect of the <em>combination</em> of airline and airport on status, because the <code>airport:airline:status</code> interaction was not significant).</p>
<p>The airline-status association was what interested us most, so let’s go after that, remembering that in our graphs, the logical outcome should be <code>fill</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airline, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb21-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-13-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This is a bit hard to see, so let’s zoom in on the <img src="https://latex.codecogs.com/png.latex?y">-axis:</p>
<div id="fig-airline-one" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-airline-one-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airline, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_cartesian</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-airline-one-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>Figure&nbsp;1 says that Alaska Airlines is a bit less punctual than America West overall, about 87% on time vs.&nbsp;about 89% on time. So, we seem to have an answer to our question. But we have two other significant associations to explore. Bear in mind that the graph above is “collapsed” over airports.</p>
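<p>If you would rather read the percentages off a table than off a zoomed-in graph, here is a minimal sketch, assuming <code>punctual</code> is the tidied data frame from above with columns <code>airline</code>, <code>status</code> and <code>freq</code>. The names <code>by_airline</code> and <code>proportion</code> are mine. Like the graph, this is collapsed over airports:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# total flights for each airline-status combination, summing over airports
by_airline &lt;- punctual %&gt;%
  count(airline, status, wt = freq) %&gt;%
  # turn the totals into proportions within each airline
  group_by(airline) %&gt;%
  mutate(proportion = n / sum(n)) %&gt;%
  ungroup()
by_airline</code></pre></div>
</div>
<p>The proportions within each airline add up to 1, so the on-time rows should line up with the heights of the bars in Figure&nbsp;1.</p>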
<p>The next most interesting association is the other one with <code>status</code>, namely airport by status:</p>
<div id="fig-airline-two" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-airline-two-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airport, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-airline-two-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2
</figcaption>
</figure>
</div>
<p>Figure&nbsp;2 says that the airports differ a fair bit in terms of punctuality, from over 95% on time at Phoenix down to only just over 75% at San Francisco.<sup>4</sup></p>
<p>So, the airports are different in terms of punctuality, but the status-airline graph above aggregates over the different airports. Some of you may be beginning to feel uneasy at this point.</p>
<p>It is usually only associations with the logical outcome variable that concern us, but we have one more association that we can investigate, airport by airline. Neither of these is a logical outcome, so it doesn’t matter which is <code>x</code> and which is <code>fill</code>, but there are more airports than airlines, so I’ll put airports on the <img src="https://latex.codecogs.com/png.latex?x">-axis:</p>
<div id="fig-airline-three" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-airline-three-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airport, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> airline)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb24-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-airline-three-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3
</figcaption>
</figure>
</div>
<p>The airports are actually <em>very</em> different in terms of which of the airlines fly into them: almost all of the flights into Phoenix are America West, and most of the flights into San Francisco and Seattle are Alaska Airlines.</p>
<p>Figure&nbsp;2 and Figure&nbsp;3 tell us that there is a potential bias here: America West fly mostly into Phoenix, where it is easier to be on time, and Alaska Airlines fly mostly into Seattle and San Francisco, where it is more difficult to be on time. Maybe the punctuality picture for Alaska Airlines is better than Figure&nbsp;1 suggests?</p>
<p>Let’s compare apples with apples: make a graph of the punctuality of each airline <em>at each airport</em>, since the airports do differ one from another:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airline, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> airport)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Once again, zooming in on the <img src="https://latex.codecogs.com/png.latex?y">-axis seems to be the thing:</p>
<div id="fig-airline-four" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-airline-four-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(punctual, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> airline, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> status)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> airport) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb26-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coord_cartesian</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ylim =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>))</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-airline-four-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4
</figcaption>
</figure>
</div>
<p>and now we see that Alaska Airlines is <em>more</em> punctual at every single airport than America West, despite being less punctual overall! In other words, we have unearthed a Simpson’s Paradox.</p>
<p>Figure&nbsp;4 is the kind of thing we would draw to investigate a <em>three</em>-way association. In this case, <code>airport:airline:status</code> was not significant so we didn’t look at this graph before. The kind of thing that would have made the three-way association significant is if one airline was more punctual than the other at some airports and less punctual at others. Here, Alaska Airlines is more punctual at every airport, and, in Figure&nbsp;4, by about the same amount everywhere, so that the three-way interaction was not significant. What prompted us to look at this graph was the potential for bias in Figure&nbsp;1, which aggregates over airports that Figure&nbsp;2 and Figure&nbsp;3 showed to be very different from each other. That small difference in punctuality between airlines in Figure&nbsp;1 was significant, but not in the way we thought!</p>
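<p>To see the Simpson’s Paradox in numbers rather than bar heights, one possibility is to work out the proportions within each airport-airline combination and then put the two airlines side by side. This is only a sketch, again assuming <code>punctual</code> has columns <code>airport</code>, <code>airline</code>, <code>status</code> and <code>freq</code>; the names <code>by_airport</code> and <code>proportion</code> are mine:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

by_airport &lt;- punctual %&gt;%
  count(airport, airline, status, wt = freq) %&gt;%
  # proportions within each airport-airline combination
  group_by(airport, airline) %&gt;%
  mutate(proportion = n / sum(n)) %&gt;%
  ungroup() %&gt;%
  select(airport, airline, status, proportion) %&gt;%
  # one column per airline, so the two can be compared along each row
  pivot_wider(names_from = airline, values_from = proportion)
by_airport</code></pre></div>
</div>
<p>Each row is one airport and one status, with a column of proportions for each airline, so comparing along the on-time rows is the comparison that Figure&nbsp;4 makes.</p>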
</section>
<section id="ovarian-cancer-a-four-way-table" class="level2">
<h2 class="anchored" data-anchor-id="ovarian-cancer-a-four-way-table">Ovarian cancer, a four-way table</h2>
<p>In a 1973 retrospective study, 299 women who had previously been operated on for ovarian cancer were assessed, and four variables measured:</p>
<ul>
<li><code>stage</code> of cancer at time of operation (early or advanced)</li>
<li>type of <code>operation</code> (radical or limited)</li>
<li>X-ray treatment received (<code>xray</code>) (yes or no)</li>
<li>10-year <code>survival</code> (yes or no)</li>
</ul>
<p>Logically, <code>survival</code> is the outcome, so we will be mainly interested in associations with that.</p>
<p>The data, no tidying needed this time:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/cancer.txt"</span></span>
<span id="cb27-2">cancer <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_delim</span>(my_url, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 16 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (4): stage, operation, xray, survival
dbl (1): freq

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">cancer </span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["stage"],"name":[1],"type":["chr"],"align":["left"]},{"label":["operation"],"name":[2],"type":["chr"],"align":["left"]},{"label":["xray"],"name":[3],"type":["chr"],"align":["left"]},{"label":["survival"],"name":[4],"type":["chr"],"align":["left"]},{"label":["freq"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"early","2":"radical","3":"no","4":"no","5":"10"},{"1":"early","2":"radical","3":"no","4":"yes","5":"41"},{"1":"early","2":"radical","3":"yes","4":"no","5":"17"},{"1":"early","2":"radical","3":"yes","4":"yes","5":"64"},{"1":"early","2":"limited","3":"no","4":"no","5":"1"},{"1":"early","2":"limited","3":"no","4":"yes","5":"13"},{"1":"early","2":"limited","3":"yes","4":"no","5":"3"},{"1":"early","2":"limited","3":"yes","4":"yes","5":"9"},{"1":"advanced","2":"radical","3":"no","4":"no","5":"38"},{"1":"advanced","2":"radical","3":"no","4":"yes","5":"6"},{"1":"advanced","2":"radical","3":"yes","4":"no","5":"64"},{"1":"advanced","2":"radical","3":"yes","4":"yes","5":"11"},{"1":"advanced","2":"limited","3":"no","4":"no","5":"3"},{"1":"advanced","2":"limited","3":"no","4":"yes","5":"1"},{"1":"advanced","2":"limited","3":"yes","4":"no","5":"13"},{"1":"advanced","2":"limited","3":"yes","4":"yes","5":"5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
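<p>A quick sanity check before any modelling: the frequencies ought to add up to the 299 women in the study. A minimal sketch, assuming <code>cancer</code> was read in as above (the name <code>total_women</code> is mine):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# add up the frequency column over all 16 rows of the table
total_women &lt;- cancer %&gt;%
  summarize(total = sum(freq))
total_women</code></pre></div>
</div>
<p>Adding up the <code>freq</code> column displayed above by hand gives 299, so that is what this should show.</p>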
<p>Strap yourself in. Step 1:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(freq <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> stage <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> operation <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> xray <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> survival,</span>
<span id="cb30-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> cancer, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">family =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"poisson"</span>)</span>
<span id="cb30-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"1.554312e-14","3":"98.12961","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"6.026558e-01","3":"96.73227","4":"0.6026558","5":"0.4375665","_rn_":"stage:operation:xray:survival"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
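<p>Where do the numbers in the <code>drop1</code> table come from? The model with everything in it fits the 16 frequencies perfectly, which is why the deviance on the <code>&lt;none&gt;</code> line is zero up to rounding error. The <code>LRT</code> value is the amount by which the deviance goes up when the term is removed, and with <code>test = "Chisq"</code> the P-value is the upper tail of a chi-squared distribution with <code>Df</code> degrees of freedom. A small check by hand, with the two numbers copied from the table (the names <code>lrt</code> and <code>p_value</code> are mine):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># LRT for the four-way interaction, copied from the drop1 table above
lrt &lt;- 0.6026558

# upper-tail chi-squared probability, 1 degree of freedom (the Df column)
p_value &lt;- pchisq(lrt, df = 1, lower.tail = FALSE)
p_value</code></pre></div>
</div>
<p>This should agree with the 0.4376 in the <code>Pr(&gt;Chi)</code> column, up to the rounding of the LRT value I copied.</p>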
<p>The four-way interaction (thankfully) comes out:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, </span>
<span id="cb31-2">                   . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> stage<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>operation<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>xray<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb31-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"0.6026558","3":"96.73227","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"2.3575888","3":"96.48720","4":"1.7549331","5":"0.1852578","_rn_":"stage:operation:xray"},{"1":"1","2":"1.1773024","3":"95.30692","4":"0.5746466","5":"0.4484184","_rn_":"stage:operation:survival"},{"1":"1","2":"0.9557671","3":"95.08538","4":"0.3531113","5":"0.5523571","_rn_":"stage:xray:survival"},{"1":"1","2":"1.2337838","3":"95.36340","4":"0.6311281","5":"0.4269418","_rn_":"operation:xray:survival"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>None of the three-way interactions are significant (at least in this model), so we remove the least significant of them:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.2</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> stage<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>xray<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb32-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"0.9557671","3":"95.08538","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"3.0866591","3":"95.21627","4":"2.1308920","5":"0.1443567","_rn_":"stage:operation:xray"},{"1":"1","2":"1.5660529","3":"93.69567","4":"0.6102858","5":"0.4346802","_rn_":"stage:operation:survival"},{"1":"1","2":"1.5512410","3":"93.68085","4":"0.5954739","5":"0.4403102","_rn_":"operation:xray:survival"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and none of <em>these</em> are significant, so remove the least significant of them:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.4</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.3</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> operation<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>xray<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb33-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"1.551241","3":"93.68085","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"1.697682","3":"91.82729","4":"0.1464406","5":"0.70196030","_rn_":"xray:survival"},{"1":"1","2":"6.841961","3":"96.97157","4":"5.2907197","5":"0.02143936","_rn_":"stage:operation:xray"},{"1":"1","2":"1.931103","3":"92.06072","4":"0.3798619","5":"0.53767715","_rn_":"stage:operation:survival"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Now there is a two-way interaction up for grabs, and finally something is significant: <code>stage:operation:xray</code>, which therefore has to stay. The two-way interaction <code>xray:survival</code> has the highest P-value, so out it comes:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.4</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> xray<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb34-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"1.697682","3":"91.82729","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"6.927690","3":"95.05730","4":"5.2300086","5":"0.02220042","_rn_":"stage:operation:xray"},{"1":"1","2":"2.024220","3":"90.15383","4":"0.3265384","5":"0.56770454","_rn_":"stage:operation:survival"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
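<p>By now the procedure is mechanical enough that you could imagine automating it: run <code>drop1</code>, take out the term with the largest P-value, and repeat until everything that could come out is significant. Here is a rough sketch of that idea. The function name <code>backward_eliminate</code> and the cutoff <code>alpha = 0.05</code> are my own assumptions, not something from a package, and I would still want to look at each step by eye rather than trust this blindly:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical helper: repeat the drop1-then-update steps done by hand above
backward_eliminate &lt;- function(model, alpha = 0.05) {
  repeat {
    tab &lt;- drop1(model, test = "Chisq")
    tab &lt;- tab[-1, , drop = FALSE]       # discard the &lt;none&gt; row
    if (nrow(tab) == 0) break            # nothing left that can be removed
    p &lt;- tab[["Pr(&gt;Chi)"]]
    worst &lt;- which.max(p)                # least significant removable term
    if (p[worst] &lt; alpha) break          # everything removable is significant
    # refit without that term
    new_formula &lt;- as.formula(paste(". ~ . -", rownames(tab)[worst]))
    model &lt;- update(model, new_formula)
  }
  model
}</code></pre></div>
</div>
<p>Assuming <code>cancer.1</code> is the model fitted in Step 1, <code>backward_eliminate(cancer.1)</code> would then hand back the model at which the by-hand process stops, and running <code>drop1</code> on that shows which terms survived.</p>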
<p>and on we go:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.6</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> stage<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>operation<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb35-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"2.024220","3":"90.15383","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"135.197636","3":"221.32725","4":"133.173416","5":"8.284940e-31","_rn_":"stage:survival"},{"1":"1","2":"4.115730","3":"90.24534","4":"2.091510","5":"1.481196e-01","_rn_":"operation:survival"},{"1":"1","2":"7.254229","3":"93.38384","4":"5.230009","5":"2.220042e-02","_rn_":"stage:operation:xray"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>It looks as if we are beginning to get somewhere. One non-significant effect here:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.7</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">update</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.6</span>, . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> . <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> operation<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>survival)</span>
<span id="cb36-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop1</span>(cancer<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">test =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chisq"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":[""],"name":["_rn_"],"type":[""],"align":["left"]},{"label":["Df"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Deviance"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["AIC"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["LRT"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Pr(>Chi)"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"NA","2":"4.115730","3":"90.24534","4":"NA","5":"NA","_rn_":"<none>"},{"1":"1","2":"136.729112","3":"220.85872","4":"132.613382","5":"1.098503e-30","_rn_":"stage:survival"},{"1":"1","2":"9.345738","3":"93.47535","4":"5.230009","5":"2.220042e-02","_rn_":"stage:operation:xray"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and now everything is significant, so we breathe a sigh of relief and see what’s left. Out of the two remaining significant associations, only one involves our logical outcome <code>survival</code>, and that is <code>stage:survival</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(cancer, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> stage, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> survival)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-27-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>that is to say, most of the patients whose cancer was in the early stage survived for 10 years, and most of those whose cancer was in the advanced stage did not.<sup>5</sup></p>
<p>The people who collected these data were probably looking for an effect of the actual treatments, but both <code>xray:survival</code> and <code>operation:survival</code> disappeared from our model by virtue of not being significant, so there is no evidence of either the X-ray treatment or the type of operation having any effect on survival.</p>
<p>That other association is of less interest, but we can still look at it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(cancer, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> stage, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> freq, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> xray)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb38-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fill"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> operation)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/log-linear/index_files/figure-html/unnamed-chunk-28-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>If a patient had the limited operation, whether or not they also had the X-ray treatment depended on the stage of cancer (more likely if it was advanced). On the other hand, if the patient had the radical operation, whether or not they had the X-ray treatment had nothing to do with the stage of cancer.</p>
<p>This last, of course, has nothing to do with survival.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>“For each type of eyewear, this many of the respondents are male or female” seems to make less sense.↩︎</p></li>
<li id="fn2"><p>This is a generalized linear model, so the right test is based on the deviance / likelihood ratio, not <img src="https://latex.codecogs.com/png.latex?F">.↩︎</p></li>
<li id="fn3"><p>The next model is actually <code>freq ~ (airport + airline + status)^2</code> in R notation: that is, including all the two-way interactions but no higher.↩︎</p></li>
<li id="fn4"><p>Canadians of a certain age will remember “San Francisky? So how did you came? Did you drove or did you flew?”, from (a much younger) Eugene Levy’s character Sid Dithers on SCTV. I personally can’t believe how much Dan Levy’s character on Schitt’s Creek looks like his dad used to look.↩︎</p></li>
<li id="fn5"><p>Not exactly an earth-shattering conclusion.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/log-linear/</guid>
  <pubDate>Thu, 19 Feb 2026 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/log-linear/Screenshot from 2026-02-19 15-38-47.png" medium="image" type="image/png" height="47" width="144"/>
</item>
<item>
  <title>Quarto websites and Codeberg pages (part 2)</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Putting a Quarto website on your own domain using Codeberg and <code>grebedoc.dev</code>.</p>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>In <a href="../../posts/quarto-codeberg/index.html">Part 1</a>, I talked about sharing your Quarto website with the world using Codeberg Pages. But what if you have your own domain, and you want your website to appear there? My recommendation is to use the service <code>grebedoc.dev</code>. There is some setup involved, but I will take you through that.</p>
</section>
<section id="one-big-website-or-several-small-ones" class="level2">
<h2 class="anchored" data-anchor-id="one-big-website-or-several-small-ones">One big website or several small ones?</h2>
<p>Before we get to details, it is worth making a design decision: do you want one website that grows lots of pieces as your site expands, or do you want several small websites that you can link together?</p>
<p>I went the first way, but eventually came to regret my decision, because the big website had to be re-rendered as a whole every time I made a change anywhere, and that began to take a while. I teach at a university, so my website naturally has one part for each course I teach, as well as one for the program I coordinate. This was originally one Quarto website, with each part in a subfolder. This worked perfectly well but, as I say, it was starting to get unwieldy, especially given that I was typically making changes to only one part of it at a time.</p>
<p>Andrew Heiss teaches at Georgia State, and has a very big website on his own domain. I looked at how he had arranged the <a href="https://www.andrewheiss.com/teaching/">teaching part</a> of his website, and I saw that he had used a subdomain <code>classes</code> for all the classes he teaches, and had separate subdomains within that for each instance of each class. I didn’t want to go quite that far, but the idea of mapping each small website to a subdomain appealed to me: I could then have one website per class and re-render only the one I was working on. It took some back and forth with the person behind <code>grebedoc.dev</code>, but I now have that working (for example, this blog is one such small website), and I will talk about how I got it working.</p>
</section>
<section id="one-website-at-the-root-of-your-domain" class="level2">
<h2 class="anchored" data-anchor-id="one-website-at-the-root-of-your-domain">One website at the root of your domain</h2>
<p>Let’s suppose you own <code>mydomain.com</code> (adapt the below to your situation) and you have a Quarto website that you would like to appear when somebody visits <code>mydomain.com</code>. This could be (as discussed above) one big website with things in subfolders, or it could be a small one with links to all your other websites and, perhaps, your “About Me” page.</p>
<p>There are two things to consider:</p>
<ul>
<li>getting the DNS records<sup>1</sup> for your domain to point to the right place (there are about three parts to this)</li>
<li>setting up a “webhook” in Codeberg to notify the server whenever you push an update to your website.</li>
</ul>
<p>Your domain has a domain registrar (the people you pay money to each year to keep your domain yours). Mine is <code>namecheap.com</code>; <code>godaddy.com</code> is another one. When you log into your domain registrar’s website, you’ll see a list of the domains you have registered with them. I own two domains, so mine looks like this:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 18-09-52.png" class="img-fluid"></p>
<p>When a visitor to your domain types <code>mydomain.com</code> into their web browser, their browser needs to know what to show them in response. This is accomplished by converting the address the user typed into an IP address. <code>grebedoc.dev</code> is going to do the work of making your website visible to the world, so it is Grebedoc’s IP address that you need to provide. There are actually two forms of IP address: IPv4, the four numbers separated by dots that you are probably familiar with, and the newer IPv6, which looks like hex codes separated by colons.</p>
<p>To tell your domain registrar which IP address(es) it should map your domain to, you need to find the DNS settings for your domain. On <code>namecheap.com</code>, I click the Manage button next to the domain name, and then Advanced DNS.</p>
<p>A DNS “record” (probably listed as a row in the table you are looking at) has three parts:</p>
<ul>
<li>the Type</li>
<li>the Host</li>
<li>the Value</li>
</ul>
<p>To set the IPv4 address, you need an A record. Find out how you add a new record (mine has a button under the table); there might be a dropdown where you can select the Type of record you want to add. For your A record, set</p>
<ul>
<li>Type as <code>A</code></li>
<li>Host as <code>@</code></li>
<li>Value as <code>185.187.152.7</code></li>
</ul>
<p>The <code>@</code> means, loosely, “the root of my website”. The Value given here is, at this writing, the IPv4 address of <code>grebedoc.dev</code>.</p>
<p>Now we set the IPv6 address. This needs an AAAA record:</p>
<ul>
<li>Type as <code>AAAA</code></li>
<li>Host as <code>@</code></li>
<li>Value as <code>2a05:b0c4:4:1::3</code></li>
</ul>
<p>This last is the IPv6 address of <code>grebedoc.dev</code> (at this writing).</p>
<p>There is a third thing to add, which is how Grebedoc knows that you actually control the domain (otherwise, anyone could put up a website on your domain, or take down the one you put up there). In the <a href="https://grebedoc.dev/">Grebedoc docs</a>, this is called Method A. For this, you need:</p>
<ul>
<li>Type as <code>TXT</code></li>
<li>Host as <code>_git-pages-repository</code></li>
<li>Value as <code>https://codeberg.org/username/reponame.git</code></li>
</ul>
<p>The last one is the Git HTTPS clone URL of the Codeberg repo where your rendered website lives. Substitute appropriately for your situation. Be careful to get the right Host; there should be an underscore at the front, and hyphens between the words. Here are mine, after I am finished (the Host for the TXT record only shows part of the value):<sup>2</sup></p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 18-45-10.png" class="img-fluid"></p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 18-45-31.png" class="img-fluid"></p>
<p>Save all of these changes.</p>
<p>Changes to the DNS settings can take a few minutes to propagate. If you want to find out when this has happened, you can use the <code>dig</code> tool. When everything is working, you should see something equivalent to this:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 18-50-18.png" class="img-fluid"></p>
<p><code>dig</code> requires a domain name, possibly with a subdomain on the front, and the Type of DNS record you want to look up. You should see an ANSWER SECTION in the output. For the A record, this should have your domain name first, and the Grebedoc IP address at the end. For the TXT record, the ANSWER SECTION should have the HTTPS git clone address of your Codeberg repo.</p>
<p>Before the propagation has completed, there will be either no answer section, or it will show something else.</p>
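<p>As a rough sketch, the lookups might look like this in a terminal. Here <code>mydomain.com</code> is the placeholder domain used above, and <code>+short</code> is an optional <code>dig</code> flag that prints only the answer instead of the full output with its ANSWER SECTION:</p>
<pre><code># A record: once propagated, this should print the Grebedoc IPv4 address set above
dig mydomain.com A +short

# AAAA record: likewise for the IPv6 address
dig mydomain.com AAAA +short

# TXT record: the Host of the TXT record goes in front of the domain name
dig _git-pages-repository.mydomain.com TXT +short</code></pre>
<p>Leave off <code>+short</code> if you would rather see the whole output, as in the screenshot.</p>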
<p>There is one more thing to do, and that is to set things up so that whenever you push your website, the server is notified and the website your users see is updated. This is done by using a Webhook in Codeberg (analogous to a Github Action). Go back to your repo in Codeberg. Click on Settings, and then Webhooks, and then Add Webhook, and then Forgejo. There are two things to change. The first is the Target URL at the top, which should be the <code>http</code> version<sup>3</sup> of your domain, as in <code>http://mydomain.com/</code> with a trailing slash. The second is the Branch filter further down: change that to <code>pages</code> (because you are using the pages branch, and when <em>that</em> changes, the server needs to be notified). Click Update webhook.</p>
<p>Now, the next time you push your website, it should appear at your domain. It can take a few minutes for everything to come together. If you need to troubleshoot, make sure the <code>dig</code> output above makes sense, and make sure the webhook is working properly. To do that, go to Settings and Webhooks again. This time, you should see your domain listed among the webhooks. Click on the pencil on the right. At the bottom, you should see one or more Recent Deliveries (one for each push):</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 19-22-25.png" class="img-fluid"></p>
<p>If all went well, you should see a green checkmark next to the most recent of these, and clicking on the blue commit ID will give you more detail. You want the Response code to be a green 200 (“all good”), or at least one of the other 200 codes. I sometimes get a 202 response (“Accepted”), which means that the server has received the request but has not finished processing it yet; in my experience, the website updates shortly afterwards. This one is A-OK:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 19-24-21.png" class="img-fluid"></p>
<p>And now, with a little luck, you can visit your domain and see your website!</p>
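<p>If you would rather check from a terminal, one possibility (assuming you have <code>curl</code> installed, and again using the placeholder <code>mydomain.com</code>) is to ask for just the response headers:</p>
<pre><code># -I asks for the response headers only, not the page itself
curl -I http://mydomain.com/</code></pre>
<p>A status line with a 200 code near the top of the output suggests that the site is being served; anything else is a hint to go back to the troubleshooting steps above.</p>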
</section>
<section id="using-subdomains" class="level2">
<h2 class="anchored" data-anchor-id="using-subdomains">Using subdomains</h2>
<p>I mentioned earlier that you can have one big website or lots of little ones. To do the latter, you map each little website to a subdomain of your domain. For example, the website for the course I teach starting next week is at <a href="http://stad29.ritsokiguess.site" class="uri">http://stad29.ritsokiguess.site</a>.</p>
<p>The process for doing this is a lot like the one for setting up a website at the root of your domain:</p>
<ul>
<li>setting two DNS records</li>
<li>setting up a webhook in the Codeberg repository</li>
</ul>
<p>As before, changes to the DNS records will take time to propagate, so that is the thing to do first. This time, we need a CNAME record and another TXT record:</p>
<ul>
<li>Type as <code>CNAME</code></li>
<li>Host: the name of your subdomain (only), such as <code>blog</code></li>
<li>Value as <code>grebedoc.dev.</code></li>
</ul>
<p>The Value needs to have a dot on the end. Then:</p>
<ul>
<li>Type as <code>TXT</code></li>
<li>Host as <code>_git-pages-repository</code> followed by a dot followed by the name of your subdomain (only), for example <code>_git-pages-repository.blog</code></li>
<li>Value as <code>https://codeberg.org/username/reponame.git</code>: the repo containing the website you want to map to the subdomain (in my case, <code>https://codeberg.org/nxskok/blog.git</code>)</li>
</ul>
<p>Save these. You can use <code>dig</code> to check on progress:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 19-42-03.png" class="img-fluid"></p>
<p>and (noting the input for the second one)</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 19-43-28.png" class="img-fluid"></p>
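<p>In text form, and as a sketch only, the two lookups would be something like the following, with <code>blog</code> as the example subdomain and <code>mydomain.com</code> as the placeholder domain (<code>+short</code> is optional and prints only the answer):</p>
<pre><code># CNAME record: once propagated, this should print grebedoc.dev. (with the dot)
dig blog.mydomain.com CNAME +short

# TXT record: the subdomain goes between _git-pages-repository and the domain
dig _git-pages-repository.blog.mydomain.com TXT +short</code></pre>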
<p>The final step is the webhook. This is the same as before, except that you include the subdomain. This is the one for my blog:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Screenshot from 2026-01-01 21-20-33.png" class="img-fluid"></p>
<p>Push to this repo, maybe wait a few minutes, and then check to see what the website at your subdomain looks like.</p>
</section>
<section id="reminder" class="level2">
<h2 class="anchored" data-anchor-id="reminder">Reminder</h2>
<p>If you have many little websites, the logical way to link them together is to have your “main” site (the one at the root of your domain) have links to all the subdomain sites, and for the subdomain sites to have a link back to the main one. Bear in mind, though, that these are independent sites, so these links should be actual URLs rather than local links.</p>
<p>On your main site, you might do this with a navigation bar, a <code>navbar</code> in Quarto jargon. My main site has a <code>_quarto.yml</code> file that looks like this:</p>
<pre><code>project:
  type: website
  output-dir: .

website:
  title: "Ken's website"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: http://programs-courses.ritsokiguess.site/
        text: Programs and courses
      - href: http://stac32.ritsokiguess.site/
        text: STAC32
      - href: http://stac33.ritsokiguess.site/
        text: STAC33
      - href: http://stad29.ritsokiguess.site/
        text: STAD29
      - href: http://datafiles.ritsokiguess.site/
        text: Data files
      - href: http://pasias.ritsokiguess.site/
        text: PASIAS
      - href: http://lecture-notes.ritsokiguess.site/
        text: Lecture notes
      - href: http://blog.ritsokiguess.site/
        text: Blog
      - text: "Quercus"
        href: http://q.utoronto.ca

  sidebar:
    contents: auto

format:
  html:
    theme: cosmo
    css: styles.css
    toc: true</code></pre>
<p>Each of the entries in the <code>navbar</code> is a link to one of the subdomain websites, so the <code>href</code> part of the link is an actual URL. (If you have one big website, these links would be to files in the same folder system, but in my situation that is no longer the case.)</p>
</section>
<section id="credits" class="level2">
<h2 class="anchored" data-anchor-id="credits">Credits</h2>
<p>Photo credit: Bengt Nyman on <a href="https://commons.wikimedia.org/wiki/File:Podiceps_Cristatus_2015-5786.jpg">Wikimedia Commons</a>. (It is a Great Crested Grebe; “grebedoc” is Codeberg backwards.)</p>
<p>Thanks to Catherine “Whitequark” for putting <code>grebedoc.dev</code> together.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>Don’t worry, I’ll get to what these are and what you need to do with them.↩︎</p></li>
<li id="fn2"><p>My repo <code>web-all</code> is the “front page” to my website, with links to everything else, so it is small.↩︎</p></li>
<li id="fn3"><p>The <a href="https://grebedoc.dev/">grebedoc.dev</a> docs explain why this is, and when you can change it to <code>https</code>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>websites</category>
  <guid>https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/</guid>
  <pubDate>Thu, 01 Jan 2026 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/quarto-codeberg-part-2/Podiceps_Cristatus_2015-5786.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Quarto websites and Codeberg pages (part 1)</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/quarto-codeberg/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Considerations in getting Quarto websites and Codeberg to play nicely</p>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>It is New Year’s Eve, and it is snowing here in Toronto, so I would rather be inside writing this.</p>
<p>One of the things Quarto can produce is a static website (meaning, it turns your content into a website that a user can look at but not interact with). This is ideal for websites for courses or workshops, where the idea is to provide material for your students to work with, or (at least in part) for a blog, where the idea is to share your ideas with the world. (In a blog, you might want to enable comments, but that is another story that we do not get into here.) A standard way to share these with the world was to use Github Pages, where you set your website up as a Github repo, and then configured the Pages part of Settings, and got a URL that would display the site.</p>
<p>There are, these days, good reasons <a href="https://sanctum.geek.nz/why-not-github.html">not to use Github</a>. There are other places to host your code that are not owned by big corporations (Microsoft, in the case of Github) and are not in the US. Some of these are based on <a href="https://forgejo.org/">Forgejo</a>, such as <a href="https://codeberg.org/">codeberg.org</a>, hosted in Germany, and <a href="https://about.worktree.ca/">worktree.ca</a>, hosted in Canada. I use both of these. You can also host Forgejo yourself. These all still use Git, so you can handle a lot of things the same way you are used to (commit, push, pull, etc).</p>
<p>One thing that Github Pages has going for it is that the mechanism is fairly simple, and Quarto has a “publish gh-pages” that will handle most of the details for you. Codeberg has its own equivalent, Codeberg Pages, that works a bit differently. The website you want to publish has to meet two conditions. It must be:</p>
<ul>
<li>on a branch called <code>pages</code></li>
<li>in the root folder of the repo.</li>
</ul>
<p>I have a couple of ways of making this work, which may not be elegant, but they appear to work. The elegance, I leave to others.</p>
<p>In part 2, I discuss how you can use your own domain with all of this.</p>
</section>
<section id="website-and-source-in-the-same-place" class="level2">
<h2 class="anchored" data-anchor-id="website-and-source-in-the-same-place">Website and source in the same place</h2>
<p>This is about the least aesthetic way to do it: have your <code>.qmd</code> files and the <code>.html</code> files produced from them in the same folder. This means that when you open up your website project in R Studio or wherever, you will see your <code>.html</code> files mixed in amongst your source. If you are prepared to put up with this, I think this is the easiest way to go.</p>
<p>When you create a Quarto website, you get a file called <code>_quarto.yml</code> in which lives the design of your website. Here is the top of one of mine:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg/Screenshot from 2025-12-31 15-20-04.png" class="img-fluid"></p>
<p>The key thing is the <code>output-dir: .</code> — this means to build the website in the same folder as your <code>.qmd</code> files with your content. This solves the second Codeberg Pages bullet point above: it doesn’t matter that there are other files in with the <code>.html</code> ones, because the <code>.html</code> files are the ones that will be shown on your website.</p>
<p>The other thing is that first bullet point: “on a branch called <code>pages</code>”. Branches in <code>git</code> used to scare me, because the discussion of them you would find online tended also to get mixed up with “pull requests” and dealing with other people working on a project, neither of which is likely to apply to you working solo on a website. I discovered that you can have just one branch called <code>pages</code> and work on it all the time.</p>
<p>If you have used <a href="https://happygitwithr.com/">Happy Git with R</a>, the mechanism you start with on Codeberg is like the one in “existing project, Github first” (Chapter 16 there). On Codeberg, you <em>have to</em> create a repo there first, and only then put stuff in it, even if you have a project in R Studio that maybe already has been attached to <code>git</code>.</p>
<p>So, log into Codeberg, and create a new repository by clicking on the + sign top right and then New Repository. When you do that and give it a name, you’ll be greeted by this:<sup>1</sup></p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg/Screenshot from 2025-12-31 15-46-09.png" class="img-fluid"></p>
<p>These are your instructions for attaching your R Studio website to Codeberg. I find the SSH approach easier, so that is the version shown here (see <a href="https://happygitwithr.com/">Happy Git with R</a>, chapters 9 and 10, for discussion of SSH vs HTTPS); click the button to see the HTTPS version. However, we are going to set up a <code>pages</code> branch right away, so what you need to do is replace each instance of <code>main</code> in these instructions with <code>pages</code>. If your website is already a (local) git repository, add <code>git switch -c pages</code> before the two lines shown (and, if you like, get rid of your other branch, which is probably called <code>main</code>). Remember that the last thing you do is <code>git push -u origin pages</code>, which will connect the local branch <code>pages</code> that you just created with the remote branch of the same name on Codeberg.</p>
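<p>Put together, and as a sketch only, the commands for a website that is already a local git repository might look like this. I am assuming that the two lines Codeberg shows are a <code>git remote add</code> and a <code>git push</code>, and <code>username</code> and <code>reponame</code> are placeholders for your own:</p>
<pre><code># run in the folder containing your website
git switch -c pages    # create a branch called pages and switch to it

# connect the local repository to the (empty) Codeberg one, using SSH
git remote add origin git@codeberg.org:username/reponame.git

# push the pages branch, linking it to the remote branch of the same name
git push -u origin pages</code></pre>
<p>Use the exact lines that Codeberg gives you in preference to these, changing only <code>main</code> to <code>pages</code>.</p>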
<p>So now I have a local project, which I called <code>disposable1</code>, with (more or less) the template website in it, which I rendered, and then the whole thing was pushed to Codeberg.</p>
<p>The final step is to use Codeberg pages to share the website with the world. The documentation for Codeberg Pages is <a href="https://docs.codeberg.org/codeberg-pages/">here</a>. The key URL format is this one: <code>https://username.codeberg.page/reponame/</code>, so now I should be able to point my browser at <code>https://nxskok.codeberg.page/disposable/</code>, and this is what I see:</p>
<p><img src="https://blog.ritsokiguess.site/posts/quarto-codeberg/Screenshot from 2025-12-31 16-21-31.png" class="img-fluid"></p>
<p>There are no special Settings requirements for this: any Codeberg repository with a <code>pages</code> branch and an <code>index.html</code> file in the root directory can be accessed in this way.</p>
</section>
<section id="website-and-source-in-different-places" class="level2">
<h2 class="anchored" data-anchor-id="website-and-source-in-different-places">Website and source in different places</h2>
<p>The approach described above works<sup>2</sup> for most of my websites, but when I came to reorganize my blog as a Quarto blog website, it didn’t seem to like having the content and the <code>html</code> files in the same place. This is possibly because a Quarto blog has a structure that expects certain files to be in certain places, and it doesn’t like you tinkering with it. When you render a blog, or any other Quarto website that does <em>not</em> have an <code>output-dir:</code> line in <code>_quarto.yml</code>, the rendered site ends up in its entirety in a folder <code>_site</code> below the root of your R Studio project.</p>
<p>You’ll remember that for Codeberg Pages to work, the website has to be in the root folder of the repo, and now it is not. So we need a different solution. The simplest one I could come up with is to maintain <em>two</em> repos, one with the source code and the website in the <code>_site</code> folder, and the other with just the contents of the <code>_site</code> folder, with an <code>index.html</code> at the root.</p>
<p>Using my blog as an example, the two repos are called <code>blog1</code> with the source <code>.qmd</code> files, and <code>blog</code> with the rendered website.</p>
<p><code>blog1</code> is a Quarto website project, in the same way that <code>disposable1</code> was. This, however, is not going to be shared with the world, so you can work on the <code>main</code> branch as usual, and it could be a private repo, or not be on Codeberg at all (I keep <code>blog1</code> on <code>worktree.ca</code>). Create this repo wherever you are going to keep it, and follow the instructions to connect it to your local repo, as written (with the <code>main</code> branch). The other thing to say about this is that the <code>_site</code> folder that will be created each time you render your website does <em>not</em> need to be under version control (it can always be rebuilt), so the <code>_site</code> folder should be in <code>.gitignore</code> for this repository.</p>
<p>The local version of the second repository should be a subfolder in the same containing folder that <code>blog1</code> is a subfolder of. (I have a folder called <code>r-projects</code> that all my R projects live in, so that both <code>blog1</code> and <code>blog</code> are subfolders of <code>r-projects</code>.) This is <em>not</em> an R project of any sort, so we will use the command line to work with it. I will be using <code>bash</code>; adjust as necessary to your setup. To start with, <code>blog</code> should be an empty folder.</p>
<p>Then fire up a command line (terminal), and put yourself in the <code>blog</code> folder. (As a check, <code>ls</code> should return no files.) Make sure that you rendered the website <code>blog1</code>, so that <code>blog1</code> has a subfolder called <code>_site</code>. Then copy the files in <code>_site</code> into <code>blog</code>, like this:</p>
<pre><code>cp -R ../blog1/_site/* .</code></pre>
<p>The <code>-R</code> recursively copies any subfolders of <code>_site</code>. The <code>*</code> on the end of the copy command is important; if you omit this, <code>blog</code> will contain a subfolder called <code>_site</code>, with the website below <em>that</em>, and this is not what you want. To check that it worked, run <code>ls</code> now, and you should see a file called <code>index.html</code>, along with some other files and possibly subfolders. (If it’s a blog, you’ll also see <code>about.html</code> and a subfolder called <code>posts</code>.)</p>
<p>The contents of <code>blog</code> now need to be put under version control, in a branch called <code>pages</code>, something like this:</p>
<pre><code>git init
git switch -c pages
git add .
git commit -m "first commit"</code></pre>
<p>The advantage to running <code>git switch</code> <em>first</em> is that no other branches get created, not even <code>main</code>.</p>
<p>Next, we connect a new Codeberg repo with this one. In Codeberg, create a new repo (mine has the same name <code>blog</code> as the local one), and follow the instructions for pushing an existing repository, remembering to use the <code>pages</code> branch on both ends, something like this:</p>
<pre><code>git remote add origin ssh://git@codeberg.org/username/blog.git
git push -u origin pages</code></pre>
<p>Now, you can check out the current state of your website at <code>https://username.codeberg.page/reponame/</code>, substituting your username (mine is <code>nxskok</code>) and repo name (<code>blog</code>).</p>
<p>From here on, the workflow is this:</p>
<ul>
<li>add some content to your website, in <code>blog1</code></li>
<li>run <code>quarto render</code> to render the site (creating a new version of <code>_site</code> inside <code>blog1</code>)</li>
<li>go to your terminal and run the <code>cp</code> command (above) to copy the new version of your website into <code>blog</code></li>
<li>also run the <code>git</code> commands to push the new version of your website to Codeberg</li>
<li>see how the new version of the published website looks</li>
<li>rinse and repeat. (From time to time, also commit the source project <code>blog1</code>.)</li>
</ul>
<p>I found it useful to create a shell script to do the copying and the git commands, since they are the same every time. Mine is called <code>copy.sh</code>. It lives in <code>blog</code> and looks like this:</p>
<pre><code>cp -R ../blog1/_site/* .
git add .
git commit -m "update site"
git push</code></pre>
<p>This way, I only need to run <code>sh copy.sh</code> to copy the new version of the website into <code>blog</code> and send it to Codeberg. My commit messages are very uninformative, but that hasn’t tripped me up yet.</p>
</section>
<section id="looking-ahead" class="level2">
<h2 class="anchored" data-anchor-id="looking-ahead">Looking ahead</h2>
<p>In <a href="../../posts/quarto-codeberg-part-2/index.html">part 2</a>, I talk about using your own domain with Codeberg.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I called my repo “disposable” because I will literally be disposing of it at some point after I write this.↩︎</p></li>
<li id="fn2"><p>For a satisfactory-to-me definition of “works”, anyway.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>websites</category>
  <guid>https://blog.ritsokiguess.site/posts/quarto-codeberg/</guid>
  <pubDate>Wed, 31 Dec 2025 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/quarto-codeberg/Screenshot from 2025-12-31 16-26-12.png" medium="image" type="image/png" height="50" width="144"/>
</item>
<item>
  <title>Tidy linear programming</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/tidy-linear-programming/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Using the <code>tidyLP</code> package to solve linear programs</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyLP)</span></code></pre></div></div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Yesterday on Mastodon I saw <a href="https://phanpy.social/#/cupoftea.social/s/115796843055024831">this post</a> about the <code>tidyLP</code> <a href="https://github.com/colin-fraser/tidyLP">package</a>. I was thinking “I have a spiffy new blog (with the same un-spiffy old posts) and I haven’t written anything in 2025, so maybe I should write about my experiences with the package before the year is out”.</p>
<p>A back-story: I moved to the UK to teach for a bit (this was some years ago) and learned that Operations Research (linear programming, integer programming, transportation and assignment problems, etc.) fell under the remit of Statistics, even though there is no actual randomness in any of the above, so I was expected to teach that stuff even though I had not used it in a long time, or ever. So I had some late nights learning or re-learning material to teach to my students in the next week. The book we used was the massive tome by <a href="https://www.amazon.ca/Operations-Research-Applications-Algorithms-Book/dp/0534209718?crid=HSGIU9IYZ0H8&amp;dib=eyJ2IjoiMSJ9.48NWmo7K8MWbQ8yLS94HaC9h85kcclEwMZBYR4cveUglcDaYatVNHVQmVO1wJARhgOAKaqsgtyTx-axi4ZZj7IMVEVdMKN-9nUNPER6Jq91cBCoEUCujflD-AHmQMDWtTdZF1K7ngvJp30BQ66P64bVvUtio7k5rfWmL1Sj-mSuMbV2PgmQvj7bZXHXvxilzEhhqLdJd68xrh58QjDdGpWaEdKO4QRJrNriCwjM52PaPMpZbs23xx73BuLlukYOdV50BYgTzaYpy1POxVDIzAXpfdDMn6OefnBCY4ITcNJQ.XeidaYkn6ZpiSGvMCvAV3OKiwJKTLwYurwti7hISEY4&amp;dib_tag=se&amp;keywords=winston+operations+research&amp;qid=1767112930&amp;sprefix=winston+oper%2Caps%2C151&amp;sr=8-2">Winston</a> (this edition). The link doesn’t say how many pages, but “weight: 2.45 kg” gives you the idea. Why we were using a massively wordy American text rather than a much more concise British one, I have no idea.</p>
</section>
<section id="a-first-example" class="level2">
<h2 class="anchored" data-anchor-id="a-first-example">A first example</h2>
<p>One of the first linear programming problems in Winston goes like<sup>1</sup> this:</p>
<p>A toy company makes wooden toy soldiers and toy trains. Each soldier requires 2 hours of finishing time and one hour of carpentry time, and results in a profit of $3. Each train requires 1 hour of finishing time and 1 hour of carpentry time, and results in a profit of $2. There are 100 finishing hours and 80 carpentry hours available, and previous experience indicates that a maximum of 40 soldiers can be sold. How many soldiers and trains should be made in order to maximize profit?</p>
<p>There are two things that make this a linear programming problem:</p>
<ul>
<li>the function to be optimized (here profit) is linear</li>
<li>there are some (also linear) constraints on what we can do.</li>
</ul>
<p>The standard by-hand method of solving these problems is called the Simplex Method. The idea is that the constraints define a region in space called a simplex, and the highest profit is at one of the corners of the simplex. The Simplex Method tells you how to move from corner to corner while increasing the profit (and thus, when there is nowhere else to move to, you have found the optimal solution).</p>
<p>Unless you are learning the method specifically, these days there is no need to learn the Simplex Method to solve these problems, and the <code>tidyLP</code> package provides a function <code>lp_solve</code> to find the solution for you.</p>
<p>In this problem, the logical flow is:</p>
<ul>
<li>the numbers of soldiers and trains you make determine how much profit you make.</li>
<li>the numbers of soldiers and trains made also determine how many finishing and carpentry hours you need</li>
<li>there are limits on all of those things, which you need to respect.</li>
</ul>
<p>The way you set this up in <code>tidyLP</code> is to begin with a dataframe which captures the relationships in the first two of those bullet points. Each row of that dataframe contains something you want to optimize over, and each column contains the contribution of each additional unit of those things to your profit or to your finishing and carpentry hours, like this:<sup>2</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">d_csv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb4-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">product, finishing, carpentry, profit</span></span>
<span id="cb4-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">soldier, 2, 1, 3</span></span>
<span id="cb4-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">train, 1, 1, 2</span></span>
<span id="cb4-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb4-6">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(d_csv)</span>
<span id="cb4-7">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["product"],"name":[1],"type":["chr"],"align":["left"]},{"label":["finishing"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["carpentry"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["profit"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"soldier","2":"2","3":"1","4":"3"},{"1":"train","2":"1","3":"1","4":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This takes care of the first part of the problem statement.</p>
<p>Our constraints come in two flavours:</p>
<ul>
<li>ones that are related to columns of the dataframe we just made (the ones on finishing and carpentry hours)</li>
<li>ones that are related to the variables we are optimizing over (in this case, the limit on the number of soldiers made).</li>
</ul>
<p>These go into the problem specification in different ways.</p>
<p>Using <code>tidyLP</code> to solve this problem goes in three steps, which can be “piped” together (hence “tidy”):</p>
<ul>
<li>use <code>tidy_lp</code> to specify the problem and constraints</li>
<li>use <code>lp_solve</code> to solve the problem</li>
<li>use <code>bind_solution</code> to glue the solution onto our dataframe</li>
</ul>
<p>I then like to save the latter, and then inspect it. Here’s how it goes for this problem:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy_lp</span>(</span>
<span id="cb5-2">  d,</span>
<span id="cb5-3">  profit,</span>
<span id="cb5-4">  finishing <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>),</span>
<span id="cb5-5">  carpentry <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">80</span>),</span>
<span id="cb5-6">  (product <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"soldier"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>)</span>
<span id="cb5-7">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lp_solve</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_solution</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> sol</span>
<span id="cb5-10">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(product, .solution)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["product"],"name":[1],"type":["chr"],"align":["left"]},{"label":[".solution"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"soldier","2":"20"},{"1":"train","2":"60"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The inputs to <code>tidy_lp</code> are our dataframe, the column of it to optimize (by default maximize), and then one input for each constraint. For constraints expressed in terms of the columns of our dataframe, you put the column name on the left of a squiggle, and on the right you put for example <code>leq</code> (“less than or equal to”) with a number inside saying <em>what</em> it is less than or equal to. For constraints expressed in terms of what we are optimizing over (the soldiers constraint here), the notation is different: the above shows how we express “the number of soldiers is at most 40”. I put soldiers and trains in a column called <code>product</code>.</p>
<p>For constraints, there are also <code>geq()</code> and <code>eq()</code>, which are “greater or equal” and “exactly equal” respectively.</p>
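<p>One way to read the constraint notation, as a sketch in base R rather than anything taken from the <code>tidyLP</code> documentation: each constraint is a weighted sum of the quantities being optimized over, with the weights given by whatever is on the left of the squiggle. Under that reading (which is an assumption), <code>product == "soldier"</code> is just a column of ones and zeros that picks out the soldiers. The candidate <code>plan</code> below is hypothetical, chosen only to have something to check:</p>
<pre class="sourceCode r"><code class="sourceCode r"># the same numbers as in the dataframe above, rebuilt so this runs on its own
d &lt;- data.frame(
  product   = c("soldier", "train"),
  finishing = c(2, 1),
  carpentry = c(1, 1),
  profit    = c(3, 2)
)

# a hypothetical plan to test: an arbitrary candidate, not the optimum
plan &lt;- c(soldier = 10, train = 30)

# each constraint as a weighted sum of the plan, compared with its limit
c(
  finishing = sum(d$finishing * plan) &lt;= 100,
  carpentry = sum(d$carpentry * plan) &lt;= 80,
  soldiers  = sum((d$product == "soldier") * plan) &lt;= 40
)

# the profit this plan would make
sum(d$profit * plan)</code></pre>
<p>All three checks come out <code>TRUE</code> and the profit is 90, so this plan respects the constraints, though nothing here says it is the best one.</p>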
<p>The last step shows that the optimal solution is to make 20 soldiers and 60 trains. This happens to be the same answer as Winston got, so it is presumably correct.</p>
<p>We can go further, and investigate how many of our finishing and carpentry hours we ended up using. For this, we note that what I called <code>sol</code> has all the columns in our original dataframe in it, so we can do calculations like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(finishing<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>profit, \(x) x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["product"],"name":[1],"type":["chr"],"align":["left"]},{"label":["finishing"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["carpentry"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["profit"],"name":[4],"type":["dbl"],"align":["right"]},{"label":[".solution"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"soldier","2":"40","3":"20","4":"60","5":"20"},{"1":"train","2":"60","3":"60","4":"120","5":"60"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(finishing<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>profit, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution)))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["finishing"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["carpentry"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["profit"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"100","2":"80","3":"180"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This shows that we made a profit of $60 from soldiers and $120 from trains for a total of $180. Also, we used 40 finishing hours for soldiers and 60 for trains, so we used up our total 100 finishing hours. For carpentry hours, we used 20 for making soldiers and 60 for making trains, so we also used up our total of 80 of these. We only made 20 soldiers, though, not reaching our limit of 40: this constraint is not “binding”.</p>
<p>None of this, of course, is actually a <em>proof</em> that the answer is best (other than our confidence in <code>lp_solve</code>): it respects all the constraints, but in principle there could be a better number of soldiers and trains to make that also respects the constraints.</p>
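<p>With only two things to optimize over, the corner idea mentioned earlier can be checked by brute force. The following is a minimal sketch in base R (it does not use <code>tidyLP</code>, and the names <code>A</code>, <code>b</code> and <code>corners</code> are made up for this illustration). Each constraint, including “you cannot make a negative number of anything”, is written as one row of <code>A x &lt;= b</code>. Every pair of boundary lines is intersected, the intersections that respect all the constraints are kept as corners, and the profit is worked out at each one:</p>
<pre class="sourceCode r"><code class="sourceCode r"># columns are (soldiers, trains); each row is one constraint of the form A x &lt;= b
A &lt;- rbind(
  finishing = c(2, 1),
  carpentry = c(1, 1),
  demand    = c(1, 0),
  s_nonneg  = c(-1, 0),   # -soldiers &lt;= 0, that is, soldiers &gt;= 0
  t_nonneg  = c(0, -1)    # -trains &lt;= 0, that is, trains &gt;= 0
)
b &lt;- c(100, 80, 40, 0, 0)
profit &lt;- c(3, 2)

pairs &lt;- combn(nrow(A), 2)   # all pairs of constraints
corners &lt;- list()
for (k in seq_len(ncol(pairs))) {
  i &lt;- pairs[, k]
  M &lt;- A[i, ]
  if (abs(det(M)) &lt; 1e-9) next          # parallel lines never meet
  x &lt;- solve(M, b[i])                   # where the two boundary lines cross
  if (all(A %*% x &lt;= b + 1e-9)) {       # keep only points that are feasible
    corners[[length(corners) + 1]] &lt;- x
  }
}
corners &lt;- do.call(rbind, corners)
colnames(corners) &lt;- c("soldier", "train")
cbind(corners, profit = drop(corners %*% profit))</code></pre>
<p>The feasible corners are (0, 0), (40, 0), (40, 20), (20, 60) and (0, 80), with profits 0, 120, 160, 180 and 160 respectively, so the largest profit among the corners is at 20 soldiers and 60 trains. This leans on the standard result that a linear program with a bounded feasible region attains its optimum at a corner, and the brute-force approach only stays practical for very small problems.</p>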
</section>
<section id="a-second-example" class="level2">
<h2 class="anchored" data-anchor-id="a-second-example">A second example</h2>
<p>This one comes directly from Winston (Problem 3 in Section 3.2 in my edition):</p>
<blockquote class="blockquote">
<p>Leary Chemical manufactures three chemicals: A, B, and C. These chemicals are produced via two production processes: 1 and 2. Running process 1 for an hour costs $4 and yields 3 units of A, 1 of B, and 1 of C. Running process 2 for an hour costs $1 and produces 1 unit of A and 1 of B. To meet customer demands, at least 10 units of A, 5 of B, and 3 of C must be produced daily. Graphically determine a daily production plan that minimizes the cost of meeting Leary Chemical’s daily demands.</p>
</blockquote>
<p>Note that in this problem, there is no benefit to producing extra of the chemicals beyond what is needed to meet customer demands: here, we are not selling it at a profit, but producing it at a cost.</p>
<p>The logical flow is that the number of hours processes 1 and 2 are run (to be optimized over) determines the amounts of chemicals A, B, and C that are produced, so these are the relationships that need to be summarized in our dataframe. I have called the processes P1 and P2 (so that I don’t get confused):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">d_csv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb8-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">process, A, B, C, cost</span></span>
<span id="cb8-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">P1, 3, 1, 1, 4</span></span>
<span id="cb8-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">P2, 1, 1, 0, 1</span></span>
<span id="cb8-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb8-6">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(d_csv)</span>
<span id="cb8-7">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["process"],"name":[1],"type":["chr"],"align":["left"]},{"label":["A"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"P1","2":"3","3":"1","4":"1","5":"4"},{"1":"P2","2":"1","3":"1","4":"0","5":"1"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Process 2 does not produce any of chemical C, hence the zero in the dataframe.</p>
<p>The constraints (meeting customer demand) are all in terms of columns of this dataframe (there are, for example, no limits on how many hours we can run each process for), so there is no difficulty in expressing the constraints:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy_lp</span>(</span>
<span id="cb9-2">  d, </span>
<span id="cb9-3">  cost, </span>
<span id="cb9-4">  A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>),</span>
<span id="cb9-5">  B <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>),</span>
<span id="cb9-6">  C <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>),</span>
<span id="cb9-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.direction =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"min"</span></span>
<span id="cb9-8">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lp_solve</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_solution</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> sol</span>
<span id="cb9-11">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(process, .solution)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["process"],"name":[1],"type":["chr"],"align":["left"]},{"label":[".solution"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"P1","2":"3"},{"1":"P2","2":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The one new thing is that this is a minimum-cost problem. This is specified by using the additional input <code>.direction</code>.</p>
<p>The solution is to run process 1 for 3 hours and process 2 for 2 hours. How much of each chemical is produced under this plan?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>cost, \(x) x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["process"],"name":[1],"type":["chr"],"align":["left"]},{"label":["A"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[5],"type":["dbl"],"align":["right"]},{"label":[".solution"],"name":[6],"type":["dbl"],"align":["right"]}],"data":[{"1":"P1","2":"9","3":"3","4":"3","5":"12","6":"3"},{"1":"P2","2":"2","3":"2","4":"0","5":"2","6":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>cost, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution)))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"11","2":"5","3":"3","4":"14"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This plan produces exactly the required amounts of chemicals B and C, but one unit more of chemical A than is needed. This seems to be unavoidable: process 1 is the only one that makes C, so to produce enough of C it has to be run for 3 hours, and to produce enough of B, process 2 then has to be run for 2 hours as well. Doing that produces <img src="https://latex.codecogs.com/png.latex?9+2%20=%2011"> units of A.</p>
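<p>As a quick check of that argument, here is a minimal sketch that compares the amounts produced with the amounts required. The requirements (10, 5, and 3 units of A, B, and C) are the ones used in the constraint check later in this post. The per-hour figures are an assumption, reconstructed by dividing each row of the table above by its <code>.solution</code>, and the names <code>per_hour</code>, <code>hours</code>, and <code>required</code> are purely illustrative:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# Assumed per-hour amounts and costs, reconstructed from the table above
per_hour &lt;- tribble(
  ~process, ~A, ~B, ~C, ~cost,
  "P1",      3,  1,  1,  4,
  "P2",      1,  1,  0,  1
)
hours &lt;- c(3, 2)                      # the solution: 3 hours of P1, 2 of P2
required &lt;- c(A = 10, B = 5, C = 3)   # minimum amounts needed

# Total produced under the plan, then the surplus over what is required
produced &lt;- per_hour %&gt;% 
  summarize(across(A:C, \(x) sum(x * hours))) %&gt;% 
  unlist()
surplus &lt;- produced - required
surplus</code></pre></div>
</div>
<p>Under these assumptions the surplus is 1 for A and 0 for both B and C, which matches the description above.</p>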
</section>
<section id="being-a-statistician" class="level2">
<h2 class="anchored" data-anchor-id="being-a-statistician">Being a statistician</h2>
<p>As I said earlier, this is a mathematical problem and not a statistical one. Suppose we had no knowledge of the simplex method or the <code>tidyLP</code> package. What would we do? One way is to pick some number of hours to run each process, and then work out the cost of doing so and the amounts of the three chemicals produced. The working-out is a bit complicated, so we’ll put it in a function, imaginatively called <code>f</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">f <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(d, p1, p2) {</span>
<span id="cb12-2">  v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(p1, p2)</span>
<span id="cb12-3">  d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(A<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>cost, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> v)))</span>
<span id="cb12-4">}</span></code></pre></div></div>
</div>
<p>This returns a one-row dataframe with the total cost and the amounts of A, B, and C produced:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["A"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"11","2":"5","3":"3","4":"14"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>reproducing our answer from earlier.</p>
<p>All right, let’s generate some literally random<sup>3</sup> numbers of hours to run each process (let’s say between 0 and 5):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">nrow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span></span>
<span id="cb14-2">max_hours <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb14-3">g <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(nrow, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, max_hours), </span>
<span id="cb14-4">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(nrow, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, max_hours))</span>
<span id="cb14-5">g</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"3.936926","2":"0.4219791"},{"1":"2.568848","2":"1.2480891"},{"1":"1.241611","2":"3.0353055"},{"1":"3.828483","2":"4.0922087"},{"1":"4.101035","2":"1.4918833"},{"1":"2.613874","2":"3.9595575"},{"1":"1.342721","2":"3.9360561"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Now, we can run our function for each of those. My function is not vectorized,<sup>4</sup> so I use <code>rowwise</code> for this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ans =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, p1, p2)))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["ans"],"name":[3],"type":["list"],"align":["right"]}],"data":[{"1":"3.936926","2":"0.4219791","3":"<tibble[,4]>"},{"1":"2.568848","2":"1.2480891","3":"<tibble[,4]>"},{"1":"1.241611","2":"3.0353055","3":"<tibble[,4]>"},{"1":"3.828483","2":"4.0922087","3":"<tibble[,4]>"},{"1":"4.101035","2":"1.4918833","3":"<tibble[,4]>"},{"1":"2.613874","2":"3.9595575","3":"<tibble[,4]>"},{"1":"1.342721","2":"3.9360561","3":"<tibble[,4]>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and now I unnest <code>ans</code> wider to see the actual values:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ans =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, p1, p2))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ans)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["A"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[6],"type":["dbl"],"align":["right"]}],"data":[{"1":"3.936926","2":"0.4219791","3":"12.232757","4":"4.358905","5":"3.936926","6":"16.169683"},{"1":"2.568848","2":"1.2480891","3":"8.954633","4":"3.816937","5":"2.568848","6":"11.523481"},{"1":"1.241611","2":"3.0353055","3":"6.760138","4":"4.276916","5":"1.241611","6":"8.001749"},{"1":"3.828483","2":"4.0922087","3":"15.577657","4":"7.920692","5":"3.828483","6":"19.406140"},{"1":"4.101035","2":"1.4918833","3":"13.794987","4":"5.592918","5":"4.101035","6":"17.896021"},{"1":"2.613874","2":"3.9595575","3":"11.801179","4":"6.573431","5":"2.613874","6":"14.415052"},{"1":"1.342721","2":"3.9360561","3":"7.964220","4":"5.278777","5":"1.342721","6":"9.306941"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
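<p>Because each total is a linear combination of the two numbers of hours, the same table could also be obtained without <code>rowwise</code>, using a single matrix multiplication. This is only a sketch of an alternative, not the approach used in this post. It assumes the per-hour table reconstructed from the earlier output, and the names <code>per_hour</code>, <code>hours</code>, <code>coefs</code>, and <code>totals</code> are illustrative:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# Assumed per-hour table (reconstructed from the outputs in this post)
per_hour &lt;- tribble(
  ~process, ~A, ~B, ~C, ~cost,
  "P1",      3,  1,  1,  4,
  "P2",      1,  1,  0,  1
)

# Two of the randomly generated rows shown above, plus the optimal plan
hours &lt;- tibble(p1 = c(3.936926, 2.568848, 3), 
                p2 = c(0.4219791, 1.2480891, 2))

# (n x 2 matrix of hours) %*% (2 x 4 matrix of per-hour values)
# gives an n x 4 matrix of totals, one row per plan
coefs &lt;- per_hour %&gt;% select(A:cost) %&gt;% as.matrix()
totals &lt;- as.matrix(hours) %*% coefs
bind_cols(hours, as_tibble(totals))</code></pre></div>
</div>
<p>The third row reproduces the totals for the optimal plan: 11, 5, and 3 units of A, B, and C, at a cost of 14.</p>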
<p>I cannot just minimize the cost here; I also need to make sure that I produce enough of each chemical (at least 10 units of A, 5 of B, and 3 of C):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ans =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, p1, p2))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ans) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">constr_ok =</span> (A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> B <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> C <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["A"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["constr_ok"],"name":[7],"type":["lgl"],"align":["right"]}],"data":[{"1":"3.936926","2":"0.4219791","3":"12.232757","4":"4.358905","5":"3.936926","6":"16.169683","7":"FALSE"},{"1":"2.568848","2":"1.2480891","3":"8.954633","4":"3.816937","5":"2.568848","6":"11.523481","7":"FALSE"},{"1":"1.241611","2":"3.0353055","3":"6.760138","4":"4.276916","5":"1.241611","6":"8.001749","7":"FALSE"},{"1":"3.828483","2":"4.0922087","3":"15.577657","4":"7.920692","5":"3.828483","6":"19.406140","7":"TRUE"},{"1":"4.101035","2":"1.4918833","3":"13.794987","4":"5.592918","5":"4.101035","6":"17.896021","7":"TRUE"},{"1":"2.613874","2":"3.9595575","3":"11.801179","4":"6.573431","5":"2.613874","6":"14.415052","7":"FALSE"},{"1":"1.342721","2":"3.9360561","3":"7.964220","4":"5.278777","5":"1.342721","6":"9.306941","7":"FALSE"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Only two of these rows satisfy all the constraints, so I use <code>filter</code> to keep only those, and then find the one with the smallest cost:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ans =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, p1, p2))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ans) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">constr_ok =</span> (A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> B <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> C <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(constr_ok) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(cost)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["A"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["constr_ok"],"name":[7],"type":["lgl"],"align":["right"]}],"data":[{"1":"4.101035","2":"1.491883","3":"13.79499","4":"5.592918","5":"4.101035","6":"17.89602","7":"TRUE"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Using this amazingly mindless method, our best cost is 17.9 (compared to the actual optimum of 14), obtained by running process 1 for 4.10 hours and process 2 for 1.49 hours. But at least it satisfies the constraints.</p>
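<p>The whole search can also be wrapped up in a function of the number of simulated plans, which makes it easy to see how the answer changes as that number grows. This is a sketch rather than the code used in this post: the name <code>random_search</code> and its arguments are hypothetical, the per-hour table is an assumption reconstructed from the earlier output, and the totals are computed directly by matrix multiplication rather than through <code>f</code>:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# Assumed per-hour table (reconstructed from the outputs in this post)
per_hour &lt;- tribble(
  ~process, ~A, ~B, ~C, ~cost,
  "P1",      3,  1,  1,  4,
  "P2",      1,  1,  0,  1
)

# Hypothetical helper: simulate n plans, keep the feasible ones,
# and return the cheapest of those
random_search &lt;- function(n, max_hours = 5) {
  coefs &lt;- per_hour %&gt;% select(A:cost) %&gt;% as.matrix()
  plans &lt;- tibble(p1 = runif(n, 0, max_hours),
                  p2 = runif(n, 0, max_hours))
  totals &lt;- as_tibble(as.matrix(plans) %*% coefs)
  bind_cols(plans, totals) %&gt;% 
    filter(A &gt;= 10, B &gt;= 5, C &gt;= 3) %&gt;% 
    slice_min(cost)
}

random_search(1000)</code></pre></div>
</div>
<p>With a very small <code>n</code> there may be no feasible plan at all, in which case the result has zero rows. Any plan that is returned satisfies the constraints, so its cost cannot be less than the true minimum of 14.</p>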
<p>The thing about simulations is that you can get an answer as close as you like to the right one by running enough of them. Let’s try 1000 simulations this time. The code is all there; all we need to do is to redefine <code>g</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">nrow <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb19-2">max_hours <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb19-3">g <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p1 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(nrow, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, max_hours), </span>
<span id="cb19-4">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p2 =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(nrow, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, max_hours))</span>
<span id="cb19-5">g</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"2.681756432","2":"3.26809100"},{"1":"4.220664427","2":"3.51459720"},{"1":"0.306669499","2":"3.37486575"},{"1":"2.155122914","2":"0.06010657"},{"1":"3.122742808","2":"3.76729996"},{"1":"0.090997053","2":"0.06658156"},{"1":"3.069837393","2":"4.39841154"},{"1":"4.239164961","2":"2.26184748"},{"1":"1.785051429","2":"2.98804787"},{"1":"3.779636234","2":"2.12729480"},{"1":"2.692373365","2":"3.63347638"},{"1":"0.697365643","2":"1.09778122"},{"1":"1.270416415","2":"4.36051841"},{"1":"4.184453659","2":"1.40298538"},{"1":"3.705935983","2":"3.04199949"},{"1":"1.124288834","2":"4.33842393"},{"1":"0.683619949","2":"2.73112159"},{"1":"1.053103689","2":"0.43644393"},{"1":"2.331357282","2":"1.37987647"},{"1":"4.468375881","2":"4.81258735"},{"1":"3.392530765","2":"1.32562317"},{"1":"1.019673651","2":"0.17350536"},{"1":"3.978068285","2":"2.39904705"},{"1":"0.702518842","2":"0.69188476"},{"1":"3.116149988","2":"4.85209382"},{"1":"2.199044834","2":"2.17443052"},{"1":"0.362968483","2":"4.78235796"},{"1":"2.830451869","2":"3.17101336"},{"1":"3.759679869","2":"4.87322406"},{"1":"0.387655944","2":"2.11461417"},{"1":"2.053531203","2":"4.64799502"},{"1":"1.463114250","2":"0.94087878"},{"1":"2.066622246","2":"1.68521555"},{"1":"1.850604656","2":"3.96808993"},{"1":"4.401713180","2":"2.69495170"},{"1":"4.470760839","2":"1.82843801"},{"1":"3.276760514","2":"3.60922651"},{"1":"0.350593672","2":"4.71432238"},{"1":"3.312415443","2":"4.14282870"},{"1":"4.523762858","2":"2.85673697"},{"1":"1.346989926","2":"1.92116358"},{"1":"0.886633700","2":"4.86282857"},{"1":"0.889994018","2":"3.48154148"},{"1":"4.494623984","2":"2.64447558"},{"1":"0.199866502","2":"2.17520227"},{"1":"1.410205986","2":"4.56590690"},{"1":"1.900360439","2":"3.37395465"},{"1":"3.596185055","2":"3.83710019"},{"1":"4.129914842","2":"1.45863305"},{"1":"1.032740711","2":"2.70308136"},{"1":"2
.313242554","2":"4.93300385"},{"1":"4.181956762","2":"4.11317949"},{"1":"3.974857194","2":"1.11404110"},{"1":"3.293594426","2":"4.82978089"},{"1":"0.649740339","2":"1.93051393"},{"1":"1.842122055","2":"2.97276177"},{"1":"1.546278641","2":"2.55606581"},{"1":"4.115609898","2":"4.64324346"},{"1":"3.338556497","2":"1.03783231"},{"1":"3.227868255","2":"4.16478023"},{"1":"4.837843372","2":"3.90581128"},{"1":"1.223178542","2":"0.87653868"},{"1":"0.879791206","2":"4.33372451"},{"1":"0.959862584","2":"4.02665210"},{"1":"0.249471802","2":"1.85713532"},{"1":"1.596514347","2":"0.94499421"},{"1":"2.358308712","2":"4.97015350"},{"1":"4.250783623","2":"4.22504896"},{"1":"0.833412317","2":"1.54982607"},{"1":"0.113604706","2":"1.58813144"},{"1":"2.236500444","2":"2.82395387"},{"1":"4.069606406","2":"3.47846835"},{"1":"2.491060935","2":"4.81358506"},{"1":"0.342663929","2":"4.38632029"},{"1":"0.215243852","2":"0.54707817"},{"1":"1.783670355","2":"3.83755077"},{"1":"4.132840768","2":"0.70524512"},{"1":"2.127100769","2":"1.95546208"},{"1":"3.173015605","2":"2.74874115"},{"1":"0.619475906","2":"2.55355559"},{"1":"4.802773519","2":"2.31976449"},{"1":"2.530440856","2":"1.47992734"},{"1":"2.755555459","2":"3.47705634"},{"1":"0.287206069","2":"3.99939972"},{"1":"3.788660821","2":"3.31429285"},{"1":"2.241677693","2":"3.58683751"},{"1":"3.033530543","2":"4.61389028"},{"1":"2.005183410","2":"3.98972982"},{"1":"1.876963867","2":"1.62610541"},{"1":"2.464050887","2":"0.08802365"},{"1":"4.647404803","2":"0.12479907"},{"1":"2.374650650","2":"2.77344557"},{"1":"2.629445348","2":"4.10275604"},{"1":"3.278929758","2":"2.80391401"},{"1":"4.338343260","2":"4.63879311"},{"1":"1.120008696","2":"2.24298083"},{"1":"2.943287232","2":"2.37920928"},{"1":"3.078437928","2":"2.90582449"},{"1":"3.042016363","2":"3.52089700"},{"1":"3.732427259","2":"0.44862694"},{"1":"0.152926525","2":"3.56598171"},{"1":"3.210204395","2":"2.53416220"},{"1":"1.639173370","2":"4.61634329"},{"1":"2.939318345","2":"4.53983494"},{"1":"0.4
43962164","2":"1.06786349"},{"1":"1.591010523","2":"3.83565955"},{"1":"4.075607219","2":"0.04203427"},{"1":"1.505241956","2":"2.86324842"},{"1":"2.684726645","2":"3.56870832"},{"1":"4.323734220","2":"3.35174441"},{"1":"3.347857086","2":"3.16468201"},{"1":"1.082929459","2":"3.30478068"},{"1":"4.390607531","2":"0.74409315"},{"1":"2.835711287","2":"2.78138004"},{"1":"3.839837421","2":"3.19391870"},{"1":"4.118184133","2":"3.88266494"},{"1":"4.578088344","2":"1.14045689"},{"1":"3.109886021","2":"3.45442540"},{"1":"1.575790646","2":"3.91063271"},{"1":"0.798735000","2":"1.62361114"},{"1":"2.378147888","2":"3.94192352"},{"1":"3.617821180","2":"0.27372418"},{"1":"1.462334606","2":"3.75808680"},{"1":"1.092959860","2":"4.97112249"},{"1":"2.096412497","2":"3.29136041"},{"1":"3.660197653","2":"2.39909554"},{"1":"1.285274725","2":"3.49966475"},{"1":"3.372348950","2":"4.78374179"},{"1":"4.047477796","2":"4.11278534"},{"1":"0.853650784","2":"0.74330832"},{"1":"4.035406181","2":"4.75486054"},{"1":"4.007457438","2":"4.36725460"},{"1":"1.245862443","2":"4.50698016"},{"1":"4.820047461","2":"0.16820073"},{"1":"0.792836081","2":"1.96227123"},{"1":"2.167869449","2":"0.64050725"},{"1":"1.426539829","2":"1.37694954"},{"1":"1.701713831","2":"2.15686649"},{"1":"1.153467181","2":"2.70401407"},{"1":"4.042797282","2":"3.01389483"},{"1":"2.928349497","2":"4.51726048"},{"1":"1.637484693","2":"1.81117173"},{"1":"1.776968908","2":"0.77806204"},{"1":"1.231907834","2":"2.37482377"},{"1":"3.207188272","2":"4.66864860"},{"1":"0.699355149","2":"2.19564133"},{"1":"0.579103901","2":"2.97259485"},{"1":"3.567489333","2":"0.92283679"},{"1":"0.090164460","2":"3.01435787"},{"1":"1.500796945","2":"4.42953114"},{"1":"4.571803603","2":"1.13979048"},{"1":"4.842577650","2":"3.65017101"},{"1":"0.572723324","2":"4.91642860"},{"1":"2.495833179","2":"4.64364701"},{"1":"2.586623206","2":"2.76607288"},{"1":"1.571321379","2":"0.10046206"},{"1":"2.817794729","2":"1.76392272"},{"1":"0.859534466","2":"1.72191957"},{"1":"2.276
136799","2":"4.14756010"},{"1":"4.548406983","2":"2.38368452"},{"1":"0.737537147","2":"4.04845020"},{"1":"4.119226481","2":"1.34441086"},{"1":"0.234077831","2":"2.63912527"},{"1":"1.087920482","2":"0.80650510"},{"1":"4.417315630","2":"2.69514230"},{"1":"2.745800426","2":"3.79934458"},{"1":"3.746932134","2":"1.22406321"},{"1":"4.476159342","2":"4.46110619"},{"1":"1.232130035","2":"1.25397680"},{"1":"2.389816500","2":"3.54920596"},{"1":"0.345800514","2":"1.13927823"},{"1":"2.947808083","2":"0.38647145"},{"1":"2.014047649","2":"4.60352338"},{"1":"1.556820585","2":"2.79583927"},{"1":"0.893199970","2":"0.91644995"},{"1":"2.327338137","2":"0.55129079"},{"1":"1.973330663","2":"3.40120139"},{"1":"3.918346330","2":"4.86396325"},{"1":"0.378640190","2":"2.39913378"},{"1":"2.140140703","2":"2.46665578"},{"1":"3.701353722","2":"1.17797376"},{"1":"0.736359952","2":"3.53054087"},{"1":"2.185136403","2":"4.30162652"},{"1":"0.142376269","2":"4.35748632"},{"1":"2.429875641","2":"3.27553501"},{"1":"3.286084877","2":"3.03055219"},{"1":"2.060518370","2":"0.36449665"},{"1":"3.111702630","2":"1.43516220"},{"1":"3.560827817","2":"3.83435873"},{"1":"2.566636956","2":"3.39356715"},{"1":"0.674086686","2":"1.74568148"},{"1":"3.341253353","2":"3.93854164"},{"1":"3.108700969","2":"1.76926142"},{"1":"4.763196872","2":"3.52551963"},{"1":"2.009304382","2":"0.81658397"},{"1":"4.660763483","2":"2.88438685"},{"1":"3.429002797","2":"0.46896691"},{"1":"1.032511253","2":"3.92076716"},{"1":"4.796315314","2":"0.14461133"},{"1":"1.286672684","2":"3.95990222"},{"1":"4.894492534","2":"0.68734352"},{"1":"2.838194756","2":"0.64407938"},{"1":"1.793894466","2":"1.03810356"},{"1":"1.365585286","2":"1.77263212"},{"1":"0.479475542","2":"3.94547458"},{"1":"0.810423746","2":"1.34317136"},{"1":"2.179998739","2":"4.75094527"},{"1":"3.673187657","2":"4.37496311"},{"1":"1.737287962","2":"1.12302098"},{"1":"1.664739173","2":"4.96173536"},{"1":"1.206129192","2":"2.28218115"},{"1":"1.030669471","2":"3.05474542"},{"1":"1.98338
0315","2":"0.55114987"},{"1":"4.240523890","2":"1.82951402"},{"1":"0.416045474","2":"2.78723144"},{"1":"1.105555495","2":"4.48321280"},{"1":"0.591406261","2":"2.79892676"},{"1":"1.518500776","2":"3.56475545"},{"1":"2.565070244","2":"1.96521619"},{"1":"4.255323547","2":"3.55432739"},{"1":"3.278020865","2":"0.51695047"},{"1":"4.040452022","2":"3.48626958"},{"1":"0.242934416","2":"2.66090106"},{"1":"0.907126105","2":"2.28467087"},{"1":"0.735177431","2":"2.25545112"},{"1":"3.715863299","2":"4.78950328"},{"1":"0.559266513","2":"3.57335341"},{"1":"4.572441570","2":"4.03705705"},{"1":"1.070722945","2":"3.72139640"},{"1":"3.722671811","2":"3.64746950"},{"1":"0.156641066","2":"3.09147273"},{"1":"4.083133889","2":"3.34510029"},{"1":"4.073923933","2":"1.68169573"},{"1":"0.463305868","2":"1.91258008"},{"1":"2.635602558","2":"3.99471941"},{"1":"4.004227842","2":"1.06790970"},{"1":"1.093217224","2":"4.16386578"},{"1":"3.597111767","2":"1.28266940"},{"1":"3.173311141","2":"4.34468730"},{"1":"2.889764785","2":"0.92221334"},{"1":"3.273847561","2":"2.49429483"},{"1":"0.406226758","2":"1.82094099"},{"1":"4.276700622","2":"4.09190510"},{"1":"3.466932764","2":"1.35993602"},{"1":"3.377647102","2":"1.41186491"},{"1":"1.104276483","2":"2.19676980"},{"1":"1.235129102","2":"3.64101909"},{"1":"0.611563449","2":"4.12264012"},{"1":"2.507669883","2":"1.90068623"},{"1":"3.871890845","2":"3.37007234"},{"1":"2.878063380","2":"0.48838529"},{"1":"1.523309187","2":"0.30666196"},{"1":"1.984524283","2":"3.65667292"},{"1":"4.216693043","2":"1.53014072"},{"1":"0.906047839","2":"2.65189375"},{"1":"4.529641817","2":"1.00310443"},{"1":"2.927319675","2":"2.13667975"},{"1":"2.959705094","2":"2.21174376"},{"1":"4.196318872","2":"2.45701568"},{"1":"0.540253127","2":"3.07029313"},{"1":"0.430583493","2":"1.95224411"},{"1":"3.915006582","2":"0.77562703"},{"1":"2.531198930","2":"4.48656595"},{"1":"3.268101842","2":"0.31954454"},{"1":"4.515691265","2":"4.78659515"},{"1":"1.260786684","2":"4.07627926"},{"1":"0.3823385
61","2":"0.20842728"},{"1":"4.665254682","2":"2.02047608"},{"1":"2.091367056","2":"0.24166086"},{"1":"0.554678598","2":"3.35805323"},{"1":"3.950313574","2":"0.63240203"},{"1":"0.913320213","2":"1.41965584"},{"1":"1.349036954","2":"0.15164937"},{"1":"1.755678831","2":"3.75656006"},{"1":"1.680557994","2":"3.81416524"},{"1":"1.356439753","2":"4.39222345"},{"1":"4.681929856","2":"0.47208960"},{"1":"3.978088646","2":"2.51610469"},{"1":"4.432860258","2":"4.95078876"},{"1":"3.109374309","2":"0.43333294"},{"1":"3.937118867","2":"3.34819046"},{"1":"3.727676090","2":"0.16231181"},{"1":"4.228513323","2":"0.18486420"},{"1":"4.229601056","2":"1.67214646"},{"1":"3.658868525","2":"4.04579825"},{"1":"4.829029591","2":"3.11398375"},{"1":"4.648501169","2":"3.87226070"},{"1":"3.762809199","2":"3.66201965"},{"1":"4.101655523","2":"1.72021633"},{"1":"3.722606646","2":"1.12116420"},{"1":"3.658712752","2":"1.81968059"},{"1":"4.563323890","2":"2.09894465"},{"1":"4.616987982","2":"0.92859512"},{"1":"2.415537509","2":"1.61927741"},{"1":"4.420326225","2":"3.29084121"},{"1":"3.250023687","2":"4.82995143"},{"1":"4.320222223","2":"4.17802133"},{"1":"4.984772546","2":"2.81589553"},{"1":"0.045249563","2":"3.97264997"},{"1":"3.385545781","2":"2.18371848"},{"1":"2.066756887","2":"1.57187274"},{"1":"2.384360592","2":"1.43073404"},{"1":"4.052082244","2":"1.64980922"},{"1":"4.604089578","2":"3.87849830"},{"1":"3.392593858","2":"2.79753688"},{"1":"2.398809468","2":"3.28745748"},{"1":"2.818514449","2":"3.40018666"},{"1":"0.718455745","2":"2.07018246"},{"1":"1.270594103","2":"1.49912916"},{"1":"3.769166151","2":"2.19151183"},{"1":"4.945476778","2":"3.59461323"},{"1":"3.914238218","2":"1.41216740"},{"1":"4.848364156","2":"1.92372720"},{"1":"0.936664611","2":"2.46428151"},{"1":"0.168116544","2":"3.91388305"},{"1":"3.186351395","2":"4.87206947"},{"1":"0.397526876","2":"0.47469839"},{"1":"2.729647468","2":"0.86259547"},{"1":"4.405393514","2":"4.94140024"},{"1":"2.744705199","2":"0.62599922"},{"1":"3.008602310
","2":"0.03274363"},{"1":"3.072912650","2":"1.13795786"},{"1":"3.890687334","2":"0.46753832"},{"1":"0.128465063","2":"0.78767231"},{"1":"0.667240794","2":"2.46136033"},{"1":"3.771626974","2":"3.37910761"},{"1":"2.555607485","2":"2.32908797"},{"1":"1.216383001","2":"4.90062332"},{"1":"0.596108247","2":"1.58253192"},{"1":"1.914038634","2":"2.42420830"},{"1":"3.034739606","2":"2.38345478"},{"1":"1.063062743","2":"2.58693043"},{"1":"4.441554021","2":"3.48072939"},{"1":"3.220471466","2":"0.94087060"},{"1":"3.472749988","2":"3.51511270"},{"1":"3.809879940","2":"1.40735098"},{"1":"1.107944662","2":"4.85743250"},{"1":"2.270063666","2":"4.13474630"},{"1":"2.016059225","2":"0.57220545"},{"1":"4.491959475","2":"0.52705708"},{"1":"0.290856702","2":"2.52908954"},{"1":"3.463035939","2":"1.46405928"},{"1":"0.464228910","2":"1.27598160"},{"1":"3.345598254","2":"2.08750489"},{"1":"1.769738990","2":"1.52429791"},{"1":"3.238670473","2":"4.82222999"},{"1":"2.737533954","2":"0.92250839"},{"1":"3.064941096","2":"4.37943386"},{"1":"0.482357290","2":"0.13510076"},{"1":"0.608279108","2":"3.41011145"},{"1":"2.437757264","2":"2.55602113"},{"1":"1.806412090","2":"4.84037086"},{"1":"1.720513150","2":"1.12369981"},{"1":"0.287697997","2":"0.01101916"},{"1":"4.545120442","2":"0.34584385"},{"1":"3.357640518","2":"1.11422829"},{"1":"4.368188551","2":"2.95874390"},{"1":"4.262192128","2":"0.86312024"},{"1":"3.571882310","2":"0.99670475"},{"1":"4.117247179","2":"3.96963782"},{"1":"2.891927445","2":"4.05431996"},{"1":"3.858330827","2":"1.15054626"},{"1":"1.040855207","2":"2.08314425"},{"1":"2.115746269","2":"1.52901392"},{"1":"1.455160390","2":"3.46468956"},{"1":"1.754558366","2":"3.11062614"},{"1":"1.143808768","2":"1.35493718"},{"1":"0.215780248","2":"3.55840783"},{"1":"1.196496832","2":"2.09590873"},{"1":"1.541777058","2":"3.28788213"},{"1":"1.853888580","2":"4.86387920"},{"1":"0.446446947","2":"2.96356349"},{"1":"1.968045607","2":"1.56408284"},{"1":"4.645793387","2":"4.43379603"},{"1":"2.755928133",
"2":"0.19388290"},{"1":"2.036781033","2":"1.78681929"},{"1":"2.359562222","2":"3.53961218"},{"1":"4.405514069","2":"0.92695689"},{"1":"0.349432931","2":"1.86272762"},{"1":"0.665478741","2":"2.23004118"},{"1":"4.666087698","2":"3.87133321"},{"1":"1.615997301","2":"4.54370565"},{"1":"2.701944596","2":"1.79651942"},{"1":"2.846267445","2":"1.26320184"},{"1":"2.755240613","2":"0.75028046"},{"1":"0.111648460","2":"4.91255983"},{"1":"3.308188450","2":"2.48249557"},{"1":"3.320375272","2":"2.33551243"},{"1":"3.114130681","2":"0.24664973"},{"1":"4.421909400","2":"3.87613043"},{"1":"1.410046726","2":"2.66902803"},{"1":"3.774737574","2":"1.73091649"},{"1":"2.565416821","2":"0.20967950"},{"1":"2.083223315","2":"0.90202131"},{"1":"1.030886632","2":"1.84917194"},{"1":"2.109301593","2":"2.98928065"},{"1":"2.426911336","2":"1.71218550"},{"1":"4.933093652","2":"3.50332262"},{"1":"4.037919323","2":"3.25049869"},{"1":"2.036516961","2":"2.34434219"},{"1":"3.445410632","2":"3.02874765"},{"1":"0.119290093","2":"1.41833690"},{"1":"3.640356049","2":"3.41132711"},{"1":"4.176627235","2":"4.44588394"},{"1":"3.291555870","2":"0.35157065"},{"1":"3.354017527","2":"1.12873744"},{"1":"2.072136072","2":"4.99944890"},{"1":"1.133940992","2":"2.70589655"},{"1":"2.488351841","2":"3.62799382"},{"1":"0.197658103","2":"4.62132797"},{"1":"4.525700728","2":"4.34305396"},{"1":"0.444123929","2":"2.70032297"},{"1":"4.719606201","2":"0.13763491"},{"1":"4.301409902","2":"4.94176380"},{"1":"3.737682480","2":"0.52501803"},{"1":"3.931856130","2":"2.12709240"},{"1":"2.429729309","2":"1.29043585"},{"1":"0.466180386","2":"2.20172675"},{"1":"0.055991390","2":"3.86011011"},{"1":"2.079654153","2":"4.86257841"},{"1":"3.032555681","2":"3.76203624"},{"1":"0.207569685","2":"1.17848311"},{"1":"4.131739988","2":"1.56756364"},{"1":"1.947617115","2":"0.40608750"},{"1":"2.554591937","2":"2.62664852"},{"1":"2.526769064","2":"4.05237048"},{"1":"1.149180178","2":"2.15908896"},{"1":"3.998206332","2":"1.96431217"},{"1":"3.123439422","2
":"2.97161678"},{"1":"0.203994452","2":"2.60989527"},{"1":"2.579801552","2":"4.70912998"},{"1":"4.817258308","2":"4.68748051"},{"1":"4.907802879","2":"0.28167868"},{"1":"2.460561927","2":"3.75218241"},{"1":"1.690838756","2":"4.44348494"},{"1":"3.841122113","2":"1.79398810"},{"1":"0.236251568","2":"0.68182509"},{"1":"4.534029694","2":"0.92464036"},{"1":"3.949737524","2":"1.31345551"},{"1":"1.291593660","2":"1.96124048"},{"1":"0.695849929","2":"0.23985092"},{"1":"1.831084210","2":"3.02847895"},{"1":"2.236491228","2":"4.22579185"},{"1":"3.459886252","2":"4.75358652"},{"1":"2.478570647","2":"3.31614572"},{"1":"0.661136103","2":"0.70791577"},{"1":"1.431301810","2":"3.85036049"},{"1":"2.338705894","2":"3.43988342"},{"1":"2.630587857","2":"3.12949493"},{"1":"4.778157731","2":"4.46172966"},{"1":"1.399690174","2":"4.50567517"},{"1":"0.682069933","2":"0.95452458"},{"1":"1.936939969","2":"0.86831157"},{"1":"1.865845413","2":"1.96336267"},{"1":"1.070110938","2":"0.95907664"},{"1":"1.064191140","2":"3.49348415"},{"1":"3.439665341","2":"3.79937877"},{"1":"0.642582717","2":"1.57055732"},{"1":"2.985688605","2":"0.31893362"},{"1":"2.656088655","2":"4.24310823"},{"1":"0.935126918","2":"0.94157375"},{"1":"2.732572433","2":"4.13081927"},{"1":"2.985674755","2":"1.17141348"},{"1":"4.061547320","2":"0.98031703"},{"1":"0.393836899","2":"2.22161875"},{"1":"1.842365211","2":"4.76848129"},{"1":"1.583405794","2":"0.87812017"},{"1":"3.644829699","2":"1.20666036"},{"1":"1.670910380","2":"4.04659341"},{"1":"3.838924735","2":"2.25935168"},{"1":"0.530277810","2":"4.45028499"},{"1":"1.716994271","2":"2.57200724"},{"1":"0.367547713","2":"4.16078087"},{"1":"4.644137411","2":"4.67588224"},{"1":"1.777397667","2":"0.80290946"},{"1":"0.955571526","2":"0.59959446"},{"1":"2.471277303","2":"2.46015590"},{"1":"3.755072985","2":"4.04024293"},{"1":"1.246943207","2":"1.56097945"},{"1":"4.707665025","2":"2.54962607"},{"1":"4.219086549","2":"4.26530960"},{"1":"2.844935671","2":"4.69427487"},{"1":"4.791130489","2":
"2.33254954"},{"1":"2.122378938","2":"3.62999148"},{"1":"3.411888308","2":"3.55839843"},{"1":"3.648850584","2":"4.66221622"},{"1":"0.291900777","2":"1.81991661"},{"1":"2.014770210","2":"2.90489679"},{"1":"3.276173503","2":"1.73275803"},{"1":"1.350325113","2":"4.61789324"},{"1":"0.786678810","2":"3.94669057"},{"1":"4.931091430","2":"4.17317905"},{"1":"3.514745509","2":"4.88166323"},{"1":"2.923667914","2":"1.56746635"},{"1":"4.225300509","2":"4.50106492"},{"1":"4.271334750","2":"0.73734940"},{"1":"0.053484004","2":"0.32518076"},{"1":"2.130169634","2":"4.66547977"},{"1":"0.475654715","2":"2.11278279"},{"1":"3.343948288","2":"4.97643435"},{"1":"2.403745863","2":"0.12117485"},{"1":"4.250370479","2":"1.66127750"},{"1":"2.589723679","2":"3.71361334"},{"1":"3.706940525","2":"0.47385870"},{"1":"1.437369840","2":"2.54428433"},{"1":"0.030483559","2":"2.50126722"},{"1":"1.261492745","2":"0.34367188"},{"1":"0.113958318","2":"3.84784774"},{"1":"4.227383448","2":"1.27379227"},{"1":"2.403147940","2":"2.88009357"},{"1":"3.876488968","2":"3.06814683"},{"1":"1.686990247","2":"2.94282904"},{"1":"3.581857589","2":"3.10206486"},{"1":"3.726596223","2":"3.67905305"},{"1":"0.728255709","2":"0.36933504"},{"1":"1.356074425","2":"2.58009716"},{"1":"1.890479783","2":"1.62734578"},{"1":"4.880820308","2":"2.60337590"},{"1":"1.031097792","2":"2.27314492"},{"1":"0.810947006","2":"1.41828255"},{"1":"2.659744461","2":"4.99998713"},{"1":"2.399762868","2":"3.09282787"},{"1":"0.549986783","2":"4.82849057"},{"1":"3.700435733","2":"4.06743347"},{"1":"3.896167648","2":"2.25408932"},{"1":"0.989529758","2":"3.26825958"},{"1":"0.861660988","2":"2.72975240"},{"1":"0.044455335","2":"0.04935198"},{"1":"0.016232209","2":"4.72619127"},{"1":"2.674471319","2":"2.57050349"},{"1":"0.167742560","2":"3.85842126"},{"1":"3.165580550","2":"0.34707192"},{"1":"1.431444273","2":"3.18841830"},{"1":"3.275846632","2":"2.40022589"},{"1":"2.836027750","2":"4.57938969"},{"1":"1.822444261","2":"4.06545213"},{"1":"2.984263796","2":"1
.07332619"},{"1":"2.304069465","2":"0.15457433"},{"1":"0.078299403","2":"1.08850482"},{"1":"0.240343367","2":"4.76315433"},{"1":"2.487365786","2":"4.58094807"},{"1":"2.161819399","2":"1.17414229"},{"1":"0.732466492","2":"1.60520458"},{"1":"0.482260110","2":"2.17385455"},{"1":"2.868899119","2":"2.59312706"},{"1":"1.638789598","2":"3.50098582"},{"1":"3.461392793","2":"0.13178325"},{"1":"3.650110943","2":"2.76080840"},{"1":"1.838945007","2":"1.65997794"},{"1":"2.203985174","2":"4.13179198"},{"1":"4.686004397","2":"3.31627867"},{"1":"4.673397153","2":"0.52773477"},{"1":"4.728946512","2":"3.70028668"},{"1":"1.510938243","2":"3.51144117"},{"1":"0.868381914","2":"3.50518847"},{"1":"3.654666704","2":"3.05424356"},{"1":"0.476012717","2":"1.34888848"},{"1":"1.689708049","2":"4.45174397"},{"1":"2.024836871","2":"1.30763988"},{"1":"1.437568653","2":"3.64372547"},{"1":"3.768194984","2":"2.38947374"},{"1":"0.649651258","2":"1.31133033"},{"1":"3.730824054","2":"2.59939881"},{"1":"1.462943458","2":"0.35223277"},{"1":"3.497092094","2":"3.58100898"},{"1":"4.898067580","2":"2.69664355"},{"1":"1.138083626","2":"2.11505620"},{"1":"0.932863312","2":"3.46430573"},{"1":"1.261470200","2":"0.88855512"},{"1":"4.366244258","2":"1.50114303"},{"1":"3.853905419","2":"1.38775052"},{"1":"4.208567807","2":"3.69994109"},{"1":"3.991364619","2":"0.46699371"},{"1":"0.702523737","2":"0.79868190"},{"1":"0.441067712","2":"3.58414495"},{"1":"4.077421003","2":"1.87465165"},{"1":"0.058763059","2":"1.70216969"},{"1":"4.467383892","2":"1.14950178"},{"1":"0.544470137","2":"4.14131174"},{"1":"3.167354354","2":"2.84878714"},{"1":"2.529906263","2":"1.00546973"},{"1":"4.727540442","2":"4.44404893"},{"1":"3.144002887","2":"4.96307952"},{"1":"3.351523833","2":"1.44532729"},{"1":"4.320481352","2":"2.61345167"},{"1":"3.135272899","2":"0.64861556"},{"1":"3.656544122","2":"2.90385509"},{"1":"3.626472995","2":"3.81585351"},{"1":"4.586577507","2":"0.25879531"},{"1":"3.994475415","2":"4.45258254"},{"1":"4.158571056","2":"2.5
4097549"},{"1":"1.971229110","2":"3.98151072"},{"1":"0.879513798","2":"2.51536287"},{"1":"0.376912585","2":"0.48588763"},{"1":"3.985834161","2":"0.35438853"},{"1":"2.996571956","2":"4.15307108"},{"1":"1.247242192","2":"1.69964943"},{"1":"4.094184884","2":"2.61556236"},{"1":"3.908610567","2":"3.03546623"},{"1":"2.776131169","2":"0.37872381"},{"1":"3.469683049","2":"4.79213767"},{"1":"3.318015774","2":"0.95617775"},{"1":"1.204248331","2":"4.92860433"},{"1":"3.681277448","2":"1.74985348"},{"1":"4.046862744","2":"1.70318579"},{"1":"0.782656625","2":"4.28720185"},{"1":"4.953227262","2":"0.79096937"},{"1":"2.809946479","2":"4.59511430"},{"1":"4.477789430","2":"0.11661283"},{"1":"2.359730364","2":"2.30394353"},{"1":"0.833568971","2":"1.56792297"},{"1":"3.678371490","2":"1.30329467"},{"1":"0.727198903","2":"2.91126221"},{"1":"3.183849066","2":"2.83413135"},{"1":"0.287767323","2":"3.70728811"},{"1":"3.339474321","2":"1.86845583"},{"1":"4.803468013","2":"0.78140011"},{"1":"1.788488153","2":"2.07120784"},{"1":"2.373312884","2":"4.25141837"},{"1":"2.674218565","2":"0.33824869"},{"1":"2.122645394","2":"0.52338877"},{"1":"4.816558101","2":"1.29598037"},{"1":"2.942756576","2":"0.45445442"},{"1":"2.839464457","2":"2.86453553"},{"1":"4.227997843","2":"2.62037897"},{"1":"2.738009243","2":"1.26169173"},{"1":"0.694008776","2":"4.76791925"},{"1":"3.836502316","2":"0.84546400"},{"1":"2.454541748","2":"2.00393412"},{"1":"2.872428674","2":"2.32439729"},{"1":"4.276689293","2":"2.16841637"},{"1":"4.343928251","2":"0.02832469"},{"1":"1.221323798","2":"4.92534717"},{"1":"3.266814109","2":"1.26554573"},{"1":"3.391609988","2":"2.92136719"},{"1":"0.378243965","2":"2.46918672"},{"1":"3.398523715","2":"0.73988387"},{"1":"3.989917678","2":"1.21555590"},{"1":"4.906791043","2":"0.09423732"},{"1":"4.143570733","2":"4.91369518"},{"1":"2.474369038","2":"0.27454461"},{"1":"2.697523247","2":"2.36349011"},{"1":"2.301583121","2":"1.60666120"},{"1":"3.809579527","2":"0.98679283"},{"1":"2.193685584","2":"4.937
43380"},{"1":"1.855288236","2":"2.87624948"},{"1":"4.741167170","2":"1.71107817"},{"1":"4.544226890","2":"4.09447654"},{"1":"1.873151739","2":"4.60378080"},{"1":"4.790627175","2":"1.49885128"},{"1":"3.365505117","2":"4.65502917"},{"1":"4.532607640","2":"0.05499697"},{"1":"1.452707255","2":"3.97520038"},{"1":"2.827690785","2":"4.78990737"},{"1":"4.429769109","2":"3.69827799"},{"1":"1.519029783","2":"4.57859815"},{"1":"4.605812898","2":"4.91531808"},{"1":"2.608877969","2":"4.22716726"},{"1":"3.496260811","2":"4.26699653"},{"1":"3.703849140","2":"3.56068048"},{"1":"1.979603997","2":"1.48426640"},{"1":"4.231001714","2":"2.66544151"},{"1":"1.103027832","2":"1.92505051"},{"1":"3.924132066","2":"2.47383785"},{"1":"2.775580867","2":"4.60762888"},{"1":"1.591025278","2":"0.33791135"},{"1":"3.710548097","2":"2.47524651"},{"1":"4.591084374","2":"0.26393105"},{"1":"1.674219605","2":"4.99423865"},{"1":"0.285464206","2":"2.46590034"},{"1":"2.089820692","2":"0.83611239"},{"1":"0.136168069","2":"4.18457166"},{"1":"1.212246089","2":"1.25345822"},{"1":"3.002747418","2":"1.25174692"},{"1":"0.642112324","2":"1.59021294"},{"1":"1.518280676","2":"1.98148111"},{"1":"1.900190779","2":"2.23209288"},{"1":"3.724436053","2":"1.83521940"},{"1":"2.885950389","2":"4.37827500"},{"1":"4.512871712","2":"2.65412890"},{"1":"0.965394143","2":"0.58157342"},{"1":"2.787510875","2":"3.32735157"},{"1":"2.864824825","2":"1.68926535"},{"1":"4.599557987","2":"1.55234452"},{"1":"0.905671179","2":"4.65828968"},{"1":"4.754956495","2":"3.27863046"},{"1":"4.285847449","2":"1.35766572"},{"1":"4.001721704","2":"1.55205654"},{"1":"1.905299624","2":"1.73549592"},{"1":"1.651556030","2":"1.17487207"},{"1":"1.591808868","2":"1.14984424"},{"1":"2.099742590","2":"3.54653828"},{"1":"1.747749813","2":"2.64249719"},{"1":"2.880538680","2":"3.83982881"},{"1":"3.201856914","2":"1.12789361"},{"1":"2.163675801","2":"2.66708489"},{"1":"2.622807177","2":"3.83796040"},{"1":"1.918349636","2":"2.70594210"},{"1":"3.709920165","2":"0.80720
278"},{"1":"2.467297483","2":"4.76701099"},{"1":"4.868654255","2":"3.93980534"},{"1":"3.026172146","2":"2.29799518"},{"1":"2.556861070","2":"1.41738747"},{"1":"4.369351026","2":"3.21120856"},{"1":"0.991584151","2":"3.36751787"},{"1":"0.620491912","2":"3.96192019"},{"1":"4.610172319","2":"4.15430305"},{"1":"1.916051029","2":"1.78140902"},{"1":"4.080878827","2":"3.53097930"},{"1":"3.303020140","2":"1.37375462"},{"1":"3.092442179","2":"2.93686627"},{"1":"2.011481950","2":"3.37941268"},{"1":"2.646023075","2":"1.53824399"},{"1":"2.211877427","2":"4.12241924"},{"1":"3.632670776","2":"1.45546911"},{"1":"0.051227220","2":"0.03132977"},{"1":"1.781200038","2":"2.57974738"},{"1":"1.467085702","2":"1.09423841"},{"1":"1.379815304","2":"1.17240923"},{"1":"4.708584348","2":"3.89740990"},{"1":"4.877634393","2":"1.50181710"},{"1":"4.264177125","2":"0.11567627"},{"1":"0.158449031","2":"1.32078739"},{"1":"3.641623372","2":"2.03176533"},{"1":"2.357514981","2":"2.88519072"},{"1":"3.715018741","2":"4.87280053"},{"1":"4.995556800","2":"1.85222533"},{"1":"1.109677320","2":"0.97098270"},{"1":"0.968780637","2":"4.51078962"},{"1":"1.734090445","2":"2.34465944"},{"1":"1.543150023","2":"2.11792814"},{"1":"3.281485497","2":"1.52574011"},{"1":"1.090944224","2":"4.50021059"},{"1":"1.945336986","2":"4.74787051"},{"1":"3.449245899","2":"2.31733677"},{"1":"0.961323780","2":"3.93899145"},{"1":"1.250478274","2":"3.34065837"},{"1":"3.668710835","2":"0.83142261"},{"1":"1.093771831","2":"2.37300583"},{"1":"3.673365138","2":"2.46678394"},{"1":"4.354112707","2":"2.11616376"},{"1":"2.404298226","2":"0.76713267"},{"1":"2.229136690","2":"3.54847350"},{"1":"2.960170867","2":"0.40640964"},{"1":"3.443050059","2":"0.70685122"},{"1":"1.126155632","2":"1.43138342"},{"1":"0.040641233","2":"3.59246532"},{"1":"4.207968652","2":"2.58547570"},{"1":"2.152172560","2":"0.56564493"},{"1":"3.883907439","2":"0.73194318"},{"1":"3.313547344","2":"4.82971686"},{"1":"2.132664741","2":"1.49298079"},{"1":"3.999123684","2":"0.8089652
8"},{"1":"2.963944060","2":"2.03079708"},{"1":"2.348846375","2":"1.81156322"},{"1":"4.747185441","2":"2.26890983"},{"1":"1.859718339","2":"2.27512615"},{"1":"2.498146404","2":"3.38637918"},{"1":"3.272317913","2":"0.95884730"},{"1":"4.081700768","2":"2.39129727"},{"1":"3.846036412","2":"4.19652323"},{"1":"0.406625373","2":"0.43987804"},{"1":"2.892268801","2":"3.76757838"},{"1":"2.303476834","2":"0.26776906"},{"1":"0.649281229","2":"4.21420540"},{"1":"2.910783506","2":"0.62218324"},{"1":"0.732353115","2":"0.52284360"},{"1":"1.685694777","2":"3.35120901"},{"1":"1.255938859","2":"0.53476669"},{"1":"3.263779117","2":"0.93107676"},{"1":"2.872836958","2":"0.58651632"},{"1":"4.099955269","2":"3.73280612"},{"1":"4.568321940","2":"0.68761529"},{"1":"2.699765128","2":"1.33612964"},{"1":"3.404589291","2":"2.29803483"},{"1":"3.302947258","2":"4.25347789"},{"1":"2.769253060","2":"4.37112936"},{"1":"1.617829653","2":"4.33135687"},{"1":"4.525274207","2":"3.62359380"},{"1":"3.641481601","2":"1.98506327"},{"1":"1.880340112","2":"3.97311946"},{"1":"0.912643601","2":"2.22123060"},{"1":"4.459253812","2":"3.39028365"},{"1":"2.229220202","2":"2.39375619"},{"1":"0.057248792","2":"1.84127394"},{"1":"1.461185025","2":"2.12454111"},{"1":"3.761039212","2":"0.99408877"},{"1":"2.911169558","2":"4.32459038"},{"1":"4.022386220","2":"1.43213505"},{"1":"1.631599922","2":"3.67710822"},{"1":"2.715494565","2":"2.85223244"},{"1":"2.802780585","2":"2.96802607"},{"1":"0.542732193","2":"0.61498757"},{"1":"0.707301695","2":"2.85556229"},{"1":"3.056998211","2":"1.63497689"},{"1":"3.804270125","2":"1.40353939"},{"1":"0.601353556","2":"3.61243918"},{"1":"4.589517057","2":"0.18474808"},{"1":"1.710032324","2":"1.70740578"},{"1":"2.999916801","2":"4.03018599"},{"1":"3.260096879","2":"2.39363358"},{"1":"2.604738302","2":"0.99431347"},{"1":"3.943883372","2":"4.80561375"},{"1":"4.400050471","2":"0.47256051"},{"1":"1.799998659","2":"2.74647992"},{"1":"1.777041376","2":"0.83512168"},{"1":"3.368462586","2":"0.91328375"
},{"1":"3.412761203","2":"3.05157338"},{"1":"3.188916191","2":"3.49704745"},{"1":"1.062173751","2":"1.79967131"},{"1":"0.770097146","2":"1.53055071"},{"1":"4.282034393","2":"4.69698854"},{"1":"2.966531923","2":"1.55054521"},{"1":"3.305523335","2":"3.05048416"},{"1":"0.502370796","2":"2.09874992"},{"1":"1.050647476","2":"0.18317446"},{"1":"1.672860947","2":"2.44388656"},{"1":"4.870340782","2":"3.18770128"},{"1":"1.503723805","2":"0.56537426"},{"1":"2.550193071","2":"4.35207459"},{"1":"3.922687507","2":"0.86922999"},{"1":"4.037082042","2":"0.12134759"},{"1":"3.920877343","2":"1.11661213"},{"1":"2.182635991","2":"2.15590969"},{"1":"1.519750885","2":"3.90060628"},{"1":"4.122276166","2":"2.83958983"},{"1":"2.285541946","2":"1.12571974"},{"1":"3.182338959","2":"3.90709685"},{"1":"0.490878165","2":"1.04077092"},{"1":"2.417328497","2":"1.06175593"},{"1":"4.146254855","2":"2.96273889"},{"1":"2.772227590","2":"1.30818760"},{"1":"0.692102524","2":"4.27557463"},{"1":"1.380954302","2":"1.29035513"},{"1":"3.532661862","2":"2.78937334"},{"1":"3.779911739","2":"1.58675813"},{"1":"2.031824631","2":"2.89149851"},{"1":"1.474164259","2":"1.89873452"},{"1":"4.874377229","2":"2.49143331"},{"1":"2.574707903","2":"1.01529047"},{"1":"3.072797764","2":"3.00956173"},{"1":"0.712035950","2":"1.85663079"},{"1":"3.820550886","2":"2.82971727"},{"1":"1.667749698","2":"4.80664515"},{"1":"3.618206874","2":"1.13645153"},{"1":"4.077616513","2":"3.87750864"},{"1":"1.923907431","2":"4.94686215"},{"1":"0.341394470","2":"1.74755471"},{"1":"0.111140456","2":"1.20097935"},{"1":"1.298324737","2":"4.43630846"},{"1":"4.229425398","2":"1.13422213"},{"1":"4.685995021","2":"1.82784512"},{"1":"3.849972645","2":"4.85167429"},{"1":"2.085422516","2":"3.51969545"},{"1":"3.986714050","2":"1.91328375"},{"1":"0.484679425","2":"0.55964564"},{"1":"1.131390715","2":"0.55059128"},{"1":"4.703492376","2":"1.76610775"},{"1":"3.514380688","2":"2.18937904"},{"1":"2.619229740","2":"2.92083551"},{"1":"1.075048143","2":"0.82708595"},
{"1":"4.114332355","2":"0.97904606"},{"1":"4.502302073","2":"4.08939773"},{"1":"2.015944328","2":"4.89152776"},{"1":"4.772705156","2":"1.92333131"},{"1":"1.774720104","2":"4.20057443"},{"1":"4.431152418","2":"2.94853508"},{"1":"0.782136824","2":"4.09938862"},{"1":"3.283786947","2":"1.90609628"},{"1":"4.416186657","2":"0.76347100"},{"1":"2.229564541","2":"2.26256092"},{"1":"2.914050027","2":"1.06912935"},{"1":"1.030531790","2":"1.65658008"},{"1":"3.372334413","2":"4.24945801"},{"1":"1.418246906","2":"4.60821306"},{"1":"4.134175746","2":"0.68625177"},{"1":"1.268770562","2":"0.33699842"},{"1":"1.420192849","2":"1.46646722"},{"1":"1.719791791","2":"4.81501378"},{"1":"3.552083687","2":"0.78875968"},{"1":"2.412016136","2":"1.37998211"},{"1":"3.922592903","2":"4.64670060"},{"1":"1.329627185","2":"1.98291623"},{"1":"0.411057583","2":"4.38757746"},{"1":"3.364433486","2":"0.38207646"},{"1":"0.930196146","2":"1.26248916"},{"1":"0.074732474","2":"4.82011020"},{"1":"4.196403107","2":"4.92686354"},{"1":"3.222217527","2":"1.97193442"},{"1":"1.743582907","2":"3.41212319"},{"1":"0.818279039","2":"4.48678056"},{"1":"0.339022407","2":"2.15039010"},{"1":"0.819749637","2":"1.58778445"},{"1":"4.057023022","2":"1.77248092"},{"1":"4.313625020","2":"4.71355484"},{"1":"1.338671842","2":"3.86030315"},{"1":"4.575618776","2":"3.19309338"},{"1":"0.980941023","2":"4.15028282"},{"1":"2.597428653","2":"0.47890843"},{"1":"1.625922621","2":"4.28056381"},{"1":"3.134490049","2":"2.07379960"},{"1":"0.532642675","2":"2.22601289"},{"1":"1.951647123","2":"2.21480178"},{"1":"4.117995261","2":"1.84434235"},{"1":"1.101874184","2":"4.58670317"},{"1":"3.481693808","2":"0.93482401"},{"1":"4.001330169","2":"1.37611178"},{"1":"1.770021894","2":"4.04733392"},{"1":"0.349624633","2":"4.70768757"},{"1":"3.648033497","2":"0.48400309"},{"1":"2.713978702","2":"1.82117964"},{"1":"0.393405105","2":"2.66315581"},{"1":"2.118778960","2":"3.35632423"},{"1":"2.501213761","2":"0.03572150"},{"1":"3.251215442","2":"2.08513053"},{"
1":"1.903200428","2":"3.06925182"},{"1":"3.440020039","2":"3.96308129"},{"1":"2.977326812","2":"3.11349698"},{"1":"3.185796407","2":"2.03556049"},{"1":"1.745930833","2":"3.95952357"},{"1":"4.445182200","2":"2.73991175"},{"1":"0.813436267","2":"0.51041954"},{"1":"4.229074104","2":"4.38965160"},{"1":"3.021419395","2":"3.03068856"},{"1":"3.929076488","2":"4.78890923"},{"1":"0.523072160","2":"4.04602298"},{"1":"3.521381068","2":"4.66837445"},{"1":"4.539941051","2":"0.75678778"},{"1":"4.744668650","2":"3.59138109"},{"1":"4.514409972","2":"0.60604121"},{"1":"2.673610678","2":"1.59640505"},{"1":"4.006652973","2":"3.97800917"},{"1":"0.996500559","2":"3.20766268"},{"1":"0.810601914","2":"1.14891730"},{"1":"2.872489517","2":"4.97490021"},{"1":"4.685042450","2":"1.77067922"},{"1":"3.920566845","2":"4.82774962"},{"1":"0.933438282","2":"0.57897885"},{"1":"2.755865152","2":"0.02807700"},{"1":"4.941225691","2":"0.63999449"},{"1":"0.399844049","2":"3.41020411"},{"1":"0.870087000","2":"2.07967927"},{"1":"3.618597783","2":"2.78694977"},{"1":"4.230784533","2":"4.60576533"},{"1":"3.083402748","2":"3.16390249"},{"1":"1.798390080","2":"4.17613848"},{"1":"1.088907698","2":"0.04094100"},{"1":"4.483486165","2":"1.99714482"},{"1":"0.685482798","2":"2.30839853"},{"1":"0.786741400","2":"0.76764598"},{"1":"0.579424675","2":"4.25353969"},{"1":"2.477565845","2":"2.25426941"},{"1":"1.901852879","2":"1.08028159"},{"1":"3.231321248","2":"4.83206305"},{"1":"0.094100430","2":"1.62721095"},{"1":"4.089133410","2":"0.57143362"},{"1":"2.640511196","2":"0.41455878"},{"1":"2.170883522","2":"0.13494948"},{"1":"2.454007462","2":"3.72757840"},{"1":"1.250568684","2":"3.12253909"},{"1":"0.007469247","2":"4.26415846"},{"1":"0.747895982","2":"0.59041560"},{"1":"4.621709657","2":"0.94665304"},{"1":"0.377911417","2":"1.82478590"},{"1":"3.585721357","2":"3.90784285"},{"1":"0.640335778","2":"3.81566174"},{"1":"2.079635360","2":"1.93083603"},{"1":"0.662929304","2":"2.33266071"},{"1":"1.853197159","2":"0.21206648"},{"1"
:"4.829803196","2":"2.23258877"},{"1":"0.213604068","2":"2.99244405"},{"1":"4.484229673","2":"3.52021909"},{"1":"1.900619827","2":"2.29921207"},{"1":"3.894802583","2":"3.05247843"},{"1":"3.287436477","2":"4.98622971"},{"1":"3.845935966","2":"4.54571172"},{"1":"0.625386996","2":"2.65544494"},{"1":"4.619211113","2":"0.52313229"},{"1":"0.937192716","2":"1.48966507"},{"1":"1.540855070","2":"0.98748605"},{"1":"3.994215180","2":"2.66791282"},{"1":"4.121170583","2":"4.45261961"},{"1":"4.769741159","2":"3.29457376"},{"1":"2.231326891","2":"2.46044954"},{"1":"3.103353032","2":"1.13852910"},{"1":"0.598204141","2":"4.42834751"},{"1":"0.472719995","2":"2.93416943"},{"1":"3.565240863","2":"3.04193472"},{"1":"2.031860775","2":"0.78382146"},{"1":"1.194060924","2":"1.73757425"},{"1":"4.692623555","2":"4.17745803"},{"1":"0.862667095","2":"0.56923944"},{"1":"4.420253370","2":"3.90135668"},{"1":"1.344735499","2":"1.10701123"},{"1":"3.078708904","2":"3.48703868"},{"1":"4.599773076","2":"3.06765286"},{"1":"1.436120351","2":"2.82932850"},{"1":"4.950472916","2":"2.66041719"},{"1":"0.225719175","2":"2.40656034"},{"1":"3.537300951","2":"4.76620277"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and then run the exact same code again:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ans =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(d, p1, p2))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ans) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">constr_ok =</span> (A <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> B <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> C <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(constr_ok) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(cost)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p1"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["p2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["A"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["B"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["C"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["cost"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["constr_ok"],"name":[7],"type":["lgl"],"align":["right"]}],"data":[{"1":"3.026172","2":"2.297995","3":"11.37651","4":"5.324167","5":"3.026172","6":"14.40268","7":"TRUE"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and this time we have gotten much closer to the answer that <code>tidyLP</code> gave us: a cost of 14.4 from running process 1 for 3.03 hours and process 2 for 2.30 hours. We can also check that enough of each chemical was produced: 11.38 of A (at least 10 needed), 5.32 of B (at least 5), and 3.03 of C (at least 3).</p>
<p>Simulation for the win. Or something like that.</p>
</section>
<section id="integer-programming" class="level2">
<h2 class="anchored" data-anchor-id="integer-programming">Integer programming</h2>
<p>The <code>tidyLP</code> package can also handle “integer programming”, where the answers must be integers, or binary “yes/no” decisions. In the package <a href="https://github.com/colin-fraser/tidyLP">README</a>, the author gives a practical, non-toy example of selecting a fantasy basketball team, given limits on the total salary and requirements for players playing each position and the number that can be from the same NBA team. There are over 300 players to select from, and each player is either in or not in the fantasy team (so it is a binary problem).</p>
<p>Here is a rather artificial integer problem, based on problem 15 in section 9.2 of Winston:<sup>5</sup></p>
<blockquote class="blockquote">
<p>Fruit Computer makes two types of computer. A Pear computer requires 1 hour of labour and 2 chips, and sells for $400; an Apricot computer requires 2 hours of labour and 5 chips, and sells for $900. There are 23 chips and 15 hours of labour available. The company should make at least as many Apricot computers as Pear ones. How many computers of each type should Fruit Computer manufacture to maximize revenue?</p>
</blockquote>
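<p>Written out as an optimization problem, this might look like the following. It is only a sketch of the formulation: the names <span class="math inline">\(x_P\)</span> and <span class="math inline">\(x_A\)</span> (the numbers of Pear and Apricot computers made) are labels introduced here for illustration, not anything taken from Winston or from <code>tidyLP</code>:</p>
<p><span class="math display">\[
\begin{gathered}
\text{maximize } 400 x_P + 900 x_A \\
\text{subject to } x_P + 2 x_A \le 15 \quad \text{(labour)}, \\
2 x_P + 5 x_A \le 23 \quad \text{(chips)}, \\
x_A \ge x_P, \\
x_P,\ x_A \in \{0, 1, 2, \ldots\}
\end{gathered}
\]</span></p>
<p>The last line is what makes this an integer programming problem rather than an ordinary linear programming one: it does not make sense to build a fraction of a computer.</p>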
<p>The setup for this is the same as for a regular linear programming problem:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">d_csv <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb21-2"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">computer, labour, chips, selling</span></span>
<span id="cb21-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Pear, 1, 2, 400</span></span>
<span id="cb21-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Apricot, 2, 5, 900</span></span>
<span id="cb21-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span></span>
<span id="cb21-6">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(d_csv)</span>
<span id="cb21-7">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["computer"],"name":[1],"type":["chr"],"align":["left"]},{"label":["labour"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["chips"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["selling"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"Pear","2":"1","3":"2","4":"400"},{"1":"Apricot","2":"2","3":"5","4":"900"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This time, we have a constraint in terms of the computers themselves (not just the columns of the dataframe <code>d</code>). The right side of a constraint has to be a number, so we write this as “the number of Apricots made minus the number of Pears made must be greater than or equal to zero”. With that in mind, and solving as before, we get:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy_lp</span>(</span>
<span id="cb22-2">  d, </span>
<span id="cb22-3">  selling, </span>
<span id="cb22-4">  labour <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>),</span>
<span id="cb22-5">  chips <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>),</span>
<span id="cb22-6">  (computer <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Apricot"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (computer <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pear"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)</span>
<span id="cb22-7">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lp_solve</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_solution</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> sol</span>
<span id="cb22-10">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(computer, .solution)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["computer"],"name":[1],"type":["chr"],"align":["left"]},{"label":[".solution"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"Pear","2":"3.285714"},{"1":"Apricot","2":"3.285714"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
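<p>As a rough check on that output, here is a base-R sketch (not part of the original analysis; the names <code>coefs</code> and <code>x</code> are made up for illustration). It shows the coefficient that the third constraint puts on each computer, and then works out what happens if the chips constraint and the Apricot-versus-Pear constraint are both exactly met, which is an assumption about which constraints bind:</p>
<pre class="sourceCode r"><code class="sourceCode r">computer &lt;- c("Pear", "Apricot")
# left side of the third constraint: -1 for Pear, +1 for Apricot
coefs &lt;- (computer == "Apricot") - (computer == "Pear")
coefs
# equal numbers x of each computer use 2x + 5x = 7x chips, so 7x = 23
x &lt;- 23 / 7
c(chips = 2 * x + 5 * x, labour = 1 * x + 2 * x, revenue = 400 * x + 900 * x)</code></pre>
<p>The value 23/7 is the 3.285714 in the output above: all 23 chips are used, with labour to spare.</p>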
<p>Obviously, we cannot make any fractional computers. To force the answers to be integers, set <code>.all_int</code> to be TRUE:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy_lp</span>(</span>
<span id="cb23-2">  d, </span>
<span id="cb23-3">  selling, </span>
<span id="cb23-4">  labour <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>),</span>
<span id="cb23-5">  chips <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">23</span>),</span>
<span id="cb23-6">  (computer <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Apricot"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (computer <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pear"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>),</span>
<span id="cb23-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.all_int =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span></span>
<span id="cb23-8">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb23-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lp_solve</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb23-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_solution</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> sol</span>
<span id="cb23-11">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(computer, .solution)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["computer"],"name":[1],"type":["chr"],"align":["left"]},{"label":[".solution"],"name":[2],"type":["dbl"],"align":["right"]}],"data":[{"1":"Pear","2":"1"},{"1":"Apricot","2":"4"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The best solution is to make 1 Pear computer and 4 Apricots. Note that this is not what you get by rounding off the decimal solution. Rounding down to 3 Pears and 3 Apricots uses 9 hours of labour and 21 chips, so it satisfies the constraints, with a revenue of $3900, but…</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb24-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(labour<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>selling, \(x) x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["computer"],"name":[1],"type":["chr"],"align":["left"]},{"label":["labour"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["chips"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["selling"],"name":[4],"type":["dbl"],"align":["right"]},{"label":[".solution"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"Pear","2":"1","3":"2","4":"400","5":"1"},{"1":"Apricot","2":"8","3":"20","4":"3600","5":"4"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">sol <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb25-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(labour<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>selling, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> .solution)))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["labour"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["chips"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["selling"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"9","2":"22","3":"4000"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The revenue is $4000, more than from making 3 computers of each type. This solution uses 22 of the 23 chips and only 9 of the 15 available hours of labour. Requiring the answers to be integers has limited us a good bit.</p>
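<p>With a problem this small, the integer answer can also be checked by brute force. This is a base-R sketch rather than anything the solver does: it lists every whole-number combination, keeps the ones that satisfy the three constraints, and picks the one with the largest revenue. The names <code>grid</code>, <code>feasible</code> and <code>best</code> are made up for illustration, and the upper limit of 15 on each count is simply large enough to cover everything the labour constraint allows.</p>
<pre class="sourceCode r"><code class="sourceCode r"># every combination of 0 to 15 Pears and 0 to 15 Apricots
grid &lt;- expand.grid(pear = 0:15, apricot = 0:15)
feasible &lt;- subset(
  grid,
  pear + 2 * apricot &lt;= 15 &amp;        # labour
    2 * pear + 5 * apricot &lt;= 23 &amp;  # chips
    apricot &gt;= pear                  # at least as many Apricots as Pears
)
feasible$revenue &lt;- 400 * feasible$pear + 900 * feasible$apricot
# the feasible combination with the largest revenue
best &lt;- feasible[which.max(feasible$revenue), ]
best</code></pre>
<p>This picks out 1 Pear and 4 Apricots with a revenue of 4000, agreeing with the solver.</p>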


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I have simplified some details, but the problem is the same.↩︎</p></li>
<li id="fn2"><p>The obvious way to set this up is with a <code>tribble</code>, but I always find that I get fed up with typing quotes with these, so the way I use here takes advantage of being able to read in a piece of text as if it were a CSV file.↩︎</p></li>
<li id="fn3"><p>Well, I <em>am</em> a statistician↩︎</p></li>
<li id="fn4"><p>At least, I don’t think it is.↩︎</p></li>
<li id="fn5"><p>I fiddled with the numbers and added an extra constraint.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/tidy-linear-programming/</guid>
  <pubDate>Tue, 30 Dec 2025 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/tidy-linear-programming/Constraint-set-lines-points-number-solutions.gif" medium="image" type="image/gif"/>
</item>
<item>
  <title>Welcome To My Blog</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/welcome/</link>
  <description><![CDATA[ 





<p>My blog is now a Quarto blog. Welcome!</p>
<p>This is where I work:</p>
<p><img src="https://blog.ritsokiguess.site/posts/welcome/OIP.7OcVp7ykJSUaa3pfHoxi-QHaE8.webp" class="img-fluid"></p>
<p>Well, not literally <em>here</em>, but in this building.</p>



 ]]></description>
  <category>news</category>
  <guid>https://blog.ritsokiguess.site/posts/welcome/</guid>
  <pubDate>Sun, 28 Dec 2025 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Assignments, Worksheets and targets</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>In my teaching, I create worksheets and assignments for my students.</p>
<p>These contain a number of questions based on a small number of scenarios (usually two or three). I like each scenario and its associated questions to live in a separate Quarto file (for ease of moving to another worksheet or assignment later, in case I don’t get as far in class as I had anticipated). I also want the worksheets to be available with and without solutions. (My students see a worksheet without solutions at their tutorial, and they get to try it with a TA available for help. After the tutorials are all done for the week, I post the solutions so that the students can see how they did.)</p>
<p>This means that I need to navigate several things:</p>
<ol type="1">
<li>creating a scenario and its questions</li>
<li>figuring out how to set up the solutions so that they can be rendered or not</li>
<li>setting up a worksheet with several scenarios and their questions and solutions</li>
<li>creating two versions of the rendered document, one with solutions and one without</li>
<li>making all this happen in Targets, respecting the dependency of each worksheet on its constituent scenario files (and maybe datafiles as well, if they are likely to change).</li>
</ol>
<p>Let’s take these in turn. I’m going to create a baby worksheet called Worksheet 99 which has two scenarios with a couple of questions each, using one familiar and one possibly unfamiliar dataset.</p>
</section>
<section id="two-scenarios" class="level2">
<h2 class="anchored" data-anchor-id="two-scenarios">Two scenarios</h2>
<p>The first scenario is based on the infamous <code>mtcars</code> dataset. I put this in the file <code>motor-trend.qmd</code>, which looks like this:</p>
<pre><code>## Motor Trend cars

In 1974, the *Motor Trend* magazine collected data on fuel consumption
and other features of 32 different makes of car. The data are available
in the built-in dataset `mtcars`. The variables of interest to us are:

- `mpg`: fuel consumption in miles per US gallon
- `cyl`: number of cylinders in the engine
- `wt`: weight of car, in thousands of pounds.

(@) Make a suitable plot of fuel consumption against weight.

(@) Modify your plot to distinguish cars with different numbers of cylinders by colour.</code></pre>
<p>This is not a self-contained Quarto file: it’s Quarto all right, but it’s designed to be included in another file (which it will be, later). In the Quarto documentation, they recommend giving a file like this a name with an underscore on the front, to make sure it doesn’t get rendered by accident (if, for example, the folder is a Quarto project and you render the whole folder). I’m going to control things with Targets, however, so I’m not going to worry about that.</p>
<p>The other notable feature here is how I label the two questions: the <code>(@)</code> on the front, which will auto-number them from 1 upwards in the final worksheet.<sup>1</sup></p>
<p>The second scenario is based on some data on making soap, which lives in <code>soapy.qmd</code>:</p>
<pre><code>## Making soap

A factory makes soap. There are two production lines, `a` and `b`. 
These can be run at different speeds; running the production line faster
produces more soap, but it also produces more scrap (soap that cannot be
sold). Does the amount of scrap differ by production line? Answer the
questions below to find out. The data is in
&lt;https://ritsokiguess.site/datafiles/soap.txt&gt;.

(@) Read in and display some of the data.

(@) Make a suitable plot of the scrap produced and the production line. How do the production lines compare?

(@) Do you get a different story if you include speed in your plot?
</code></pre>
<p>This is structured the same way as the first file, and will have three numbered questions when it is rendered.</p>
</section>
<section id="adding-the-solutions" class="level2">
<h2 class="anchored" data-anchor-id="adding-the-solutions">Adding the solutions</h2>
<p>This, I have to admit, I stole more or less wholesale from <a href="https://nrennie.rbind.io/blog/r-tutorial-worksheets-quarto/">Nicola Rennie</a>, whose blog post you would do well to read. There are two key ideas:</p>
<ul>
<li><a href="https://quarto.org/docs/computations/parameters.html#knitr">Parameterised documents</a></li>
<li><a href="https://quarto.org/docs/authoring/conditional.html">Conditional content</a></li>
</ul>
<p>In the YAML header of a Quarto document, you can have a section called <code>params</code> which supplies some default values for parameters. The example in the Quarto documentation is this:</p>
<pre><code>---
params:
  alpha: 0.1
  ratio: 0.1
---
</code></pre>
<p>which sets default values for the parameters <code>alpha</code> and <code>ratio</code>. You access them in the Quarto document through R like this, in an R code block:</p>
<pre><code>params$alpha</code></pre>
<p>You can supply different values by running <code>quarto render</code> with the <code>-P</code> option, like this:</p>
<pre><code>quarto render myfile.qmd -P alpha:0.2 -P ratio:0.3</code></pre>
<p>and then 0.2 and 0.3 will get passed down into your document.</p>
<p>Wait, you say, <em>what</em> YAML block? Neither of our files even <em>has</em> a YAML block. Well, when we get around to making the worksheet itself out of our two scenarios, we’ll have a proper “main” Quarto document that includes our two files, and not only will <em>that</em> have a YAML header with default parameter values in it, but those values will also get passed down into our “child” documents containing the scenarios. The parameter we will use will be called <code>hide_answers</code> and will be either <code>true</code> or <code>false</code>.</p>
<p>All right, now to conditional content. Here’s how you hide some content if you are creating an HTML document (from the Quarto docs):</p>
<pre><code>::: {.content-hidden when-format="html"}

Will not appear in HTML.

:::
</code></pre>
<p>The <code>:::</code> lines mark the beginning and end of a so-called “div block”. Inside the <code>{}</code> on the top line are a class that the text is given (here, being hidden) and an optional condition saying when it should be hidden (here, when the document format is HTML).</p>
<p>That’s all fine and wonderful, but we want to make our content hidden when <em>something in R</em> is true (namely, <code>params$hide_answers</code> is TRUE). The way around this is to use inline R code to produce the top and bottom lines of our div block:</p>
<p><img src="https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/Screenshot from 2024-11-02 00-52-36.png" class="img-fluid"></p>
<p>at the top, and</p>
<p><img src="https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/Screenshot from 2024-11-02 00-54-27.png" class="img-fluid"></p>
<p>at the bottom. The way this works is if <code>params$hide_answers</code> is TRUE, these lines create a div block with the content-hidden class (that is, the text between these two lines is hidden), but if <code>params$hide_answers</code> is FALSE, no div block is created at all, and the text between these two lines is displayed.</p>
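<p>Since the two lines of inline code appear above only as screenshots, here is a sketch of the same logic as ordinary R functions. The function names are hypothetical, the exact text in the screenshots may differ, and this version returns an empty string when nothing should be hidden:</p>
<pre><code># Hypothetical helpers; the names are for illustration only.
# Each returns the text of a div-block line, or "" if answers are shown.
div_open &lt;- function(hide) if (hide) "::: {.content-hidden}" else ""
div_close &lt;- function(hide) if (hide) ":::" else ""

div_open(TRUE)   # the line that starts a hidden div block
div_open(FALSE)  # an empty string, so no div block is started</code></pre>
<p>Called inline at the start and end of a solution, as <code>`r div_open(params$hide_answers)`</code> and <code>`r div_close(params$hide_answers)`</code>, these should produce the same two lines.</p>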
<p>Now we have the machinery to add some optionally-displayable text, that is to say, solutions, to our problems. What you do is to add the code that optionally starts the div block at the <em>start</em> of a solution, and the code that optionally ends it at the end. Thus, for example, the Motor Trend question file with solutions<sup>2</sup> looks like this:</p>
<p><img src="https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/Screenshot from 2024-11-02 00-49-51.png" class="img-fluid"></p>
<p>This process for adding solutions to a file of questions really ought to be called Renniefication.</p>
</section>
<section id="making-a-worksheet" class="level2">
<h2 class="anchored" data-anchor-id="making-a-worksheet">Making a worksheet</h2>
<p>Now that we have scenarios, questions and solutions, we can put together our Worksheet 99. This is how it goes together, with some comments below:</p>
<p><img src="https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/Screenshot from 2024-11-02 21-21-14.png" class="img-fluid"></p>
<ul>
<li>In the YAML block at the top:
<ul>
<li>I include <code>df-print: paged</code> to make dataframes (in the solutions) display nicely, and, as a pre-emptive strike, <code>embed-resources: true</code> to make sure my output HTML doesn’t lose any of its graphs if the file gets moved around.</li>
<li><em>This</em> is where my <code>params</code> block goes. I have one parameter here, the <code>hide_answers</code> that I mentioned earlier, which I have set to <code>true</code> here.</li>
</ul></li>
<li>In the diminutive body of this document, I have space for overall instructions, and loading of any packages the worksheet might need.</li>
<li>The separate scenario files are loaded using the <code>include</code> Quarto “shortcode”. I think this is the cleanest way to do it, but you could also use R Markdown style “child documents” here. This works as if the file contents have been literally copied and pasted where the <code>include</code> is, and has the effect that the parameters (that is, the value of <code>hide_answers</code>) are passed down into the included files. When I refer to <code>params$hide_answers</code> inside <code>motor-trend.qmd</code> (as I did above), it uses the right value and correctly includes or excludes the solutions.</li>
</ul>
<p>This is, you might say, the end of the story. You render this file once with <code>hide_answers: true</code> to make a worksheet to give to your students, and later you change <code>true</code> to <code>false</code> to make the solutions for them.</p>
<p>However, there is more human intervention here than you might like. Both the question document and the solutions document will be called <code>worksheet_99.html</code>, and you’ll have to remember (or look) to find out whether it currently contains questions or solutions. It would be nice to make <em>two</em> html files, one with just the questions and the other with solutions as well, each with different names like <code>worksheet_99_q.html</code> and <code>worksheet_99_a.html</code>.</p>
<p>The other thing is how to keep everything up to date. If you change either of the included files, you want to be able to re-render <code>worksheet_99.qmd</code> without having to remember to do so. Veteran Fortran programmers like me would solve this with a Makefile. The R way to do this is to use the <code>targets</code> package, which we discuss shortly.</p>
</section>
<section id="rendering-with-parameters" class="level2">
<h2 class="anchored" data-anchor-id="rendering-with-parameters">Rendering with parameters</h2>
<p>One way to supply parameter values is to put them in <code>params</code> in the YAML block. But you can also supply them on the command line if you render that way, like this:</p>
<pre><code>quarto render worksheet_99.qmd -P hide_answers:true</code></pre>
<p>This puts the questions without solutions into <code>worksheet_99.html</code>. But we can go one step further and set the name of the output file, like this:</p>
<pre><code>quarto render worksheet_99.qmd -P hide_answers:true -o worksheet_99_q.html</code></pre>
<p>Then, to make an HTML file with the solutions as well, you change <code>hide_answers:true</code> to <code>hide_answers:false</code> and change the <code>-o</code> part to <code>-o worksheet_99_a.html</code>.</p>
<p>There is enough repetitive stuff here that I wrote a function to do it:</p>
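<pre><code># Render a worksheet twice: once without solutions, once with them.
# Needs SplitPath() from DescTools and str_c() from stringr.
# The ... inputs are never used inside the function.
renderify &lt;- function(fname, ...) {
  ans &lt;- DescTools::SplitPath(fname)
  # output names: base filename plus _q.html or _a.html
  qq &lt;- stringr::str_c(ans$filename, "_q.html")
  aa &lt;- stringr::str_c(ans$filename, "_a.html")
  # common part of the command: filename with extension, but no folder
  cmd &lt;- stringr::str_c("quarto render ", ans$fullfilename)
  cmd1 &lt;- stringr::str_c(cmd, " -P hide_answers:true -o ", qq)
  cmd2 &lt;- stringr::str_c(cmd, " -P hide_answers:false -o ", aa)
  # run from the folder the .qmd lives in, then go back (even after an error)
  olddir &lt;- setwd(ans$dirname)
  on.exit(setwd(olddir), add = TRUE)
  system(cmd1)
  system(cmd2)
  fname
}</code></pre>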
<p>I wrote this just <em>before</em> reading <a href="https://blog.djnavarro.net/posts/2024-10-06_fs/">Danielle Navarro’s excellent blog post</a> on the <code>fs</code> package, and now I realize that this would have been a great reason to learn about that package. I had, however, gotten this working using <code>SplitPath</code> from the <code>DescTools</code> package, so this is what you get. Also, I realize, now that I look at the code, that it would have benefitted greatly from using <code>glue::glue</code> rather than <code>str_c</code> from <code>stringr</code>.<sup>3</sup></p>
<p>Anyway: the function takes as input a filename (of a <code>.qmd</code> file) and:</p>
<ul>
<li>splits the input filename up into folder, base filename, and extension.</li>
<li>constructs two output filenames by gluing <code>_q.html</code> and <code>_a.html</code> onto the end of the base filename</li>
<li>constructs the common part of the <code>quarto render</code> command. <code>fullfilename</code> is the base filename plus its extension but <em>not</em> including its folder. This is important for reasons we see in a moment.</li>
<li>constructs the full <code>quarto render</code> commands using the <code>-P</code> and <code>-o</code> options we saw above.</li>
</ul>
<p>So now I run <code>cmd1</code> and <code>cmd2</code> that I so laboriously constructed, right? Not so fast. When you run <code>quarto render</code> from the command line, the file you’re rendering has to be in the <em>same folder</em> that you are currently in. This is not usually the case for me: my project has worksheets and assignments in a subfolder <code>assignments</code>, and when I am running <code>targets</code> that is all controlled from the main project folder. So, very carefully, I change to the subfolder the <code>.qmd</code> is in (saving my previous folder to go back to later), <em>then</em> run my commands, <em>then</em> go back to the folder I was in.</p>
<p>I hope in this way I am safe from <a href="https://github.com/jennybc/here_here">having my computer set on fire</a>, although I could undoubtedly stand to learn about the <code>here</code> package too.</p>
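<p>The filename handling inside <code>renderify</code> can also be sketched in base R, without <code>DescTools</code> or <code>stringr</code>. This is an illustration rather than a replacement: the helper name <code>render_commands</code> is hypothetical, and it only builds the command strings and the folder to run them from, without running anything.</p>
<pre><code># Hypothetical helper: build (but do not run) the two render commands.
render_commands &lt;- function(fname) {
  file &lt;- basename(fname)                    # filename with extension, no folder
  base &lt;- tools::file_path_sans_ext(file)    # filename without extension
  cmd &lt;- paste("quarto render", file)
  list(
    dir = dirname(fname),                    # folder to run the commands from
    questions = paste0(cmd, " -P hide_answers:true -o ", base, "_q.html"),
    solutions = paste0(cmd, " -P hide_answers:false -o ", base, "_a.html")
  )
}

cmds &lt;- render_commands("assignments/worksheet_99.qmd")
cmds$questions
cmds$solutions</code></pre>
<p>For the worksheet here, <code>cmds$questions</code> is the same <code>quarto render</code> command as the one typed out by hand earlier.</p>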
</section>
<section id="doing-this-in-targets" class="level2">
<h2 class="anchored" data-anchor-id="doing-this-in-targets">Doing this in <code>targets</code></h2>
<p>The (very brief) idea behind <code>targets</code> is that certain of your files (like documents) depend on certain other things (included files, here, or functions or datasets in general). The <a href="https://books.ropensci.org/targets/walkthrough.html">Targets book</a> has a great intro walkthrough. What you do is to create “targets”, and then have functions that express how targets depend on each other. The definition of the targets lives in a file <code>_targets.R</code> in the project folder. Here is the relevant bit of that:</p>
<pre><code>worksheet99 &lt;- list(
  tar_target(worksheet_99_file, "assignments/worksheet_99.qmd", format = "file"),
  tar_target(motor_trend, "assignments/motor-trend.qmd", format = "file"),
  tar_target(soapy, "assignments/soapy.qmd", format = "file"),
  tar_target(worksheet_99, renderify("assignments/worksheet_99.qmd",
                                      worksheet_99_file,
                                      motor_trend,
                                      soapy))
)</code></pre>
<p>My function <code>renderify</code> is in a file <code>R/functions.R</code> which has been <code>source</code>d earlier in <code>_targets.R</code>.</p>
<p>First, I create targets for each of my three files: the two question files, and the main worksheet file. Inside <code>tar_target</code>, the first thing is the name of the target you’re making, the second is where the file lives, and the third thing is <code>format = "file"</code>. Then, the last target is a function call. As we have seen, <code>renderify</code> creates the two output files for the worksheet, but also serves the double duty of enforcing the dependency of the final worksheet on the targets made from the other three files. Targets knows this because those other three targets are also input to <code>renderify</code>, so if any of those three targets have changed (meaning, any of the files from which those targets are made), the whole worksheet will be re-rendered. The relevant part of <code>targets::tar_visnetwork</code> shows this: target <code>worksheet_99</code> depends on targets <code>worksheet_99_file</code>, <code>motor_trend</code>, and <code>soapy</code>.</p>
<p>The project from which the above was taken has targets for a whole bunch of worksheets, assignments, tests etc. If you look at <code>targets</code> examples, you will usually see, at the end of the file, a <code>list()</code> that defines every single one of the targets. But for me, this was getting out of hand, so I defined and saved a separate <code>list()</code> for each worksheet or assignment, and then at the end of my <code>_targets.R</code>, I glue them all together into one list with some code like this:</p>
<pre><code>c(worksheet1, worksheet2, worksheet3, worksheet4, 
  worksheet5, worksheet6, worksheet7, worksheet8, 
  worksheet9, worksheet99)</code></pre>
<p>Now, I can run <code>targets::tar_make()</code> and my worksheet will be re-rendered (twice, to get the two output files) if and only if any of the files making it up have changed.</p>
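<p>The reason <code>c()</code> is the right tool for gluing the lists together is that it joins lists end to end into one flat list. A toy sketch, using character strings as stand-ins for real targets (the names <code>ws1</code> and <code>ws2</code> are made up):</p>
<pre><code># stand-ins for two per-worksheet lists of targets
ws1 &lt;- list("target_a", "target_b")
ws2 &lt;- list("target_c")

all_targets &lt;- c(ws1, ws2)
length(all_targets)      # one flat list with three elements
length(list(ws1, ws2))   # by contrast, a list of two lists</code></pre>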
<p>You might be thinking that <code>renderify</code> doesn’t <em>need</em> those other inputs, and you would be quite right: the only thing the function uses is the filename that is the first input. I used <code>...</code> in my function code to allow arbitrarily many other inputs, and these are used <em>only</em> to create the dependency, so that Targets knows what depends on what. This is the cleanest way I could think of.</p>
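<p>This works because Targets finds dependencies by inspecting the code of each command, not by looking at what the function does with its inputs. As a rough illustration of the idea only (the analysis Targets actually does is more involved, and base R’s <code>all.vars</code> is just a stand-in for it here), these are the names that appear in the command for the last target:</p>
<pre><code># the command for the worksheet_99 target, captured without running it
command &lt;- quote(
  renderify("assignments/worksheet_99.qmd",
            worksheet_99_file,
            motor_trend,
            soapy)
)
# the variable names mentioned in the command (function names are left out)
deps &lt;- all.vars(command)
deps</code></pre>
<p>The three names that come back are the three file targets, even though <code>renderify</code> itself never touches them.</p>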
</section>
<section id="results" class="level2">
<h2 class="anchored" data-anchor-id="results">Results</h2>
<p>So maybe now you want to see the result of all of this:</p>
<ul>
<li><a href="https://ritsokiguess.site/datafiles/worksheet_99_q.html">questions file</a></li>
<li><a href="https://ritsokiguess.site/datafiles/worksheet_99_a.html">solutions file</a></li>
</ul>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I would actually prefer to have these questions numbered 1(a) and 1(b), but I haven’t figured out how to control this numbering in the way you can in LaTeX.↩︎</p></li>
<li id="fn2"><p>Much briefer than my usual solutions, it has to be said.↩︎</p></li>
<li id="fn3"><p>Which is really just <code>paste0</code>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <guid>https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/</guid>
  <pubDate>Sat, 02 Nov 2024 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/assignments-worksheets-and-targets/Screenshot 2025-12-28 at 13-18-13 Data science workflows with the Targets package in R End-to-end example with code Towards Data Science.png" medium="image" type="image/png" height="76" width="144"/>
</item>
<item>
  <title>Honestly Significant Differences</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/honestly-significant-differences/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>In running an analysis of variance, the standard <img src="https://latex.codecogs.com/png.latex?F">-test is, by itself, not very helpful.</p>
<p>You are testing a null hypothesis that all the treatment groups have the same mean, against an alternative that the null is not true. You reject this null, and conclude… what? All you can say at this point is that not all the groups have the same mean: this tells you nothing about which groups differ from which.</p>
<p>To learn more, the standard procedure is to run some kind of followup. One way is to compare each pair of groups with a two-sample <img src="https://latex.codecogs.com/png.latex?t">-test. The problem is that if you have <img src="https://latex.codecogs.com/png.latex?k"> groups, you have <img src="https://latex.codecogs.com/png.latex?k(k-1)/2"> pairs of groups, and therefore that many two-sample <img src="https://latex.codecogs.com/png.latex?t">-tests all run at once, so you need to do something about the multiple testing. How can you avoid that?</p>
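<p>To get a feel for the size of the problem, here is a rough sketch (my illustration, not part of the analysis that follows) that counts the pairs and works out the chance of at least one false rejection when each test is run at level 0.05. The calculation assumes the tests are independent, which pairwise comparisons that share groups are not, so treat the answer as an approximation only:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">k &lt;- 3                     # number of groups
n_pairs &lt;- choose(k, 2)    # k(k-1)/2 pairwise comparisons
alpha &lt;- 0.05              # level of each individual test
# chance of at least one false rejection, assuming independent tests
familywise &lt;- 1 - (1 - alpha)^n_pairs
n_pairs
familywise</code></pre></div>
</div>
<p>Even with only three groups, the chance of at least one false alarm is a good deal bigger than 0.05, and it grows quickly as <code>k</code> increases.</p>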
</section>
<section id="an-example" class="level2">
<h2 class="anchored" data-anchor-id="an-example">An example</h2>
<p>One of the benefits of exercise is that it stresses bones and makes them stronger. Researchers at Purdue did a study in which they randomly assigned rats to one of three exercise groups (“high jumping”, “low jumping” and a control group that was not made to do any jumping). The rats were made to do 10 jumps a day, 5 days a week, for 8 weeks, and at the end of this time each rat’s bone density was measured. Did the amount of exercising affect the bone density, and if so, how?</p>
<p>A starting point is to read in the data and make a graph. With one quantitative and one categorical variable, the right kind of graph is a boxplot:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/jumping.txt"</span></span>
<span id="cb3-2">rats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_delim</span>(my_url,<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 30 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: " "
chr (1): group
dbl (1): density

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">rats</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 30 × 2
   group   density
   &lt;chr&gt;     &lt;dbl&gt;
 1 Control     611
 2 Control     621
 3 Control     614
 4 Control     593
 5 Control     593
 6 Control     653
 7 Control     600
 8 Control     554
 9 Control     603
10 Control     569
# ℹ 20 more rows</code></pre>
</div>
</div>
<p>and then</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(rats, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>density, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_inorder</span>(group))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/honestly-significant-differences/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The boxplot shows that the bone density is much higher on average for the high-jumping rats than for the others; there seems to be not much difference between the control rats and the ones doing low jumping.</p>
<p>An annoying detail is that <code>ggplot</code> will put the groups in alphabetical (= nonsensical) order unless you stop it from doing so. In the data read in from the file, the jumping groups are in a sensible order, so I can use <code>fct_inorder</code> from <code>forcats</code> to arrange the <code>group</code> categories in the order they appear in the data.</p>
<p>Under the standard assumptions for analysis of variance (which we don’t assess in this post), the ANOVA <img src="https://latex.codecogs.com/png.latex?F">-test is obtained this way:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">rats<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span> <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aov</span>(density <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> group, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> rats)</span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(rats<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>            Df Sum Sq Mean Sq F value Pr(&gt;F)   
group        2   7434    3717   7.978 0.0019 **
Residuals   27  12579     466                  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1</code></pre>
</div>
</div>
<p>The null hypothesis, which says that all three groups have the same mean bone density, is rejected. So, not all the groups have the same mean. But that’s <em>all</em> we learn here. We learn nothing about which groups differ from which, which is what we really want to know.</p>
</section>
<section id="some-math" class="level2">
<h2 class="anchored" data-anchor-id="some-math">Some math</h2>
<p>Let’s see how far we can get with math. Let’s assume the null hypothesis is true (that all of our <img src="https://latex.codecogs.com/png.latex?k"> groups have the same mean <img src="https://latex.codecogs.com/png.latex?%5Cmu">), and we’ll also assume that all the observations have a normal distribution with the same variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2"> (a standard assumption of ANOVA), and, to make life easier, that all the groups have the same number of observations <img src="https://latex.codecogs.com/png.latex?n">.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?Y_%7Bij%7D"> denote the <img src="https://latex.codecogs.com/png.latex?j">th observation in group <img src="https://latex.codecogs.com/png.latex?i">. Then, under our assumptions, the <img src="https://latex.codecogs.com/png.latex?Y_%7Bij%7D"> are independent, each having a normal distribution with mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">. Hence, each group’s sample mean <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BY%7D_%7Bi%7D"> has a normal sampling distribution with mean <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2%20/%20n">. (Having each group be the same size makes these variances the same.)</p>
<p>Tukey’s idea was: one of the sample means will be largest, just by chance, and one of them will be smallest, just by chance. What is the distribution of the largest one minus the smallest one? This leans on a known result: if <img src="https://latex.codecogs.com/png.latex?X_1,%20%5Cldots,%20X_k"> are independently normal with the same distribution, the distribution of the largest one minus the smallest one, scaled by an estimate of spread, has what is called a <a href="https://en.wikipedia.org/wiki/Studentized_range"><em>studentized range distribution</em></a>, for which (as we used to say in the old days) tables are available: <img src="https://latex.codecogs.com/png.latex?(X_%7Bmax%7D%20-%20X_%7Bmin%7D)/s"> has a studentized range distribution, which depends on <img src="https://latex.codecogs.com/png.latex?k"> and the degrees of freedom in <img src="https://latex.codecogs.com/png.latex?s">.</p>
<p>Now, since the <img src="https://latex.codecogs.com/png.latex?%5Cbar%7BY%7D_%7Bi%7D"> are also normal, it follows that <img src="https://latex.codecogs.com/png.latex?(%5Cbar%7BY%7D_%7Bmax%7D%20-%5Cbar%7BY%7D_%7Bmin%7D)%20/%20(s%20/%20%5Csqrt%7Bn%7D)"> also has a studentized range distribution, where <img src="https://latex.codecogs.com/png.latex?s"> in this case is the square root of a pooled estimate of variance, which is just the average of the within-group variances (because the sample size within each group is the same). To make this work, Tukey said that if you take the upper 5% point of this distribution, and scale it properly, you can say that any group means will rarely differ by this much if the null hypothesis is true, and thus that any two group means that <em>do</em> differ by more than this are significantly different.</p>
<p>The value of doing this is that you are only doing one test, based on how far apart the largest and smallest sample means might be, and applying that to <em>all</em> pairs of groups, so that you avoid the multiple testing problem of doing all possible two-sample <img src="https://latex.codecogs.com/png.latex?t">-tests.</p>
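<p>The claim that the pooled variance is just the average of the within-group variances when the groups are the same size is easy to check. This is a small sketch on made-up data (the data frame <code>d</code> and its columns are invented for illustration, not the rats data), comparing the error mean square from <code>aov</code> with the average of the three group variances:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)
# made-up data: three groups of 10 observations each
d &lt;- data.frame(
  group = rep(c("a", "b", "c"), each = 10),
  y = rnorm(30, mean = 600, sd = 20)
)
fit &lt;- aov(y ~ group, data = d)
# error mean square: residual sum of squares over residual df
mse &lt;- deviance(fit) / df.residual(fit)
# average of the within-group sample variances
avg_var &lt;- mean(tapply(d$y, d$group, var))
all.equal(mse, avg_var)</code></pre></div>
</div>
<p>With unequal group sizes the two would generally differ, because the pooled estimate weights each group’s variance by its degrees of freedom.</p>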
<p>So, say we have three groups with 10 observations in each (as in our jumping rats data), so that the error degrees of freedom are 30 - 3 = 27. The upper 5% point of the appropriate studentized range distribution is</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qtukey</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">nmeans =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span>)</span>
<span id="cb10-2">q</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3.506426</code></pre>
</div>
</div>
<p>then we multiply that by the square root of the error mean square divided by the number of observations in each group (10 here), which is the estimated standard error of a group mean:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> q <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">466</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb12-2">w</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 23.93633</code></pre>
</div>
</div>
<p>and we say that any sample means differing by more than that are significantly different:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">rats <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(group) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_density =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(density))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3 × 2
  group    mean_density
  &lt;chr&gt;           &lt;dbl&gt;
1 Control          601.
2 Highjump         639.
3 Lowjump          612.</code></pre>
</div>
</div>
<p>Control and Lowjump are not far enough apart to be significantly different, but the two comparisons with Highjump are both significant.</p>
<p>In practice, we would use <code>TukeyHSD</code> which does all of that for us:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">TukeyHSD</span>(rats<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = density ~ group, data = rats)

$group
                  diff       lwr       upr     p adj
Highjump-Control  37.6  13.66604 61.533957 0.0016388
Lowjump-Control   11.4 -12.53396 35.333957 0.4744032
Lowjump-Highjump -26.2 -50.13396 -2.266043 0.0297843</code></pre>
</div>
</div>
<p>with the same results.</p>
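<p>The adjusted P-values can be checked by hand in the same spirit. This is a sketch that assumes the error mean square rounded to 466 (as shown in the ANOVA table) and 10 observations per group, so the answers should be close to, but not exactly equal to, the <code>p adj</code> column. <code>ptukey</code> is the base R distribution function that goes with <code>qtukey</code>; the names <code>q_obs</code> and <code>p_adj</code> are mine:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">mse &lt;- 466                      # error mean square from the ANOVA table (rounded)
n &lt;- 10                         # observations per group
diffs &lt;- c(37.6, 11.4, 26.2)    # absolute differences in sample means
# observed studentized range statistic for each pair of groups
q_obs &lt;- diffs / sqrt(mse / n)
# upper-tail probability from the studentized range distribution
p_adj &lt;- ptukey(q_obs, nmeans = 3, df = 27, lower.tail = FALSE)
p_adj</code></pre></div>
</div>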
</section>
<section id="some-simulation" class="level2">
<h2 class="anchored" data-anchor-id="some-simulation">Some simulation</h2>
<p>Can we simulate the distribution of the difference between the highest and lowest sample means of our three groups? We have to set this up so that the null hypothesis is true, that is, so that all three simulated samples come from populations with the same mean. It doesn’t matter what mean we use, but we may as well use the overall mean of our data, along with an SD equal to the square root of the error mean square:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">rats <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">grand_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(density)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> grand_mean</span>
<span id="cb18-3">grand_mean</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 617.4333</code></pre>
</div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">gp_sd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">466</span>)</span>
<span id="cb20-2">gp_sd</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 21.58703</code></pre>
</div>
</div>
<p>so we simulate three groups of 10 observations from a normal distribution with mean 617.4333 and SD 21.5870, and then compare the highest mean with the lowest one. I’m putting this into a function because I’m going to build a simulation around it. The steps are:</p>
<ul>
<li>set up for as many samples as I want</li>
<li>work rowwise</li>
<li>draw a random sample of the right size with the right mean and SD in each row</li>
<li>work out the mean of each sample</li>
<li>stop working rowwise</li>
<li>find the largest and smallest of the simulated sample means, and the difference between them</li>
<li>return that difference as a number:</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">ksample <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(nobs, nsample, mu, sigma) {</span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>nsample) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(nobs, mu, sigma))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(my_sample)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mn =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(my_mean), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mx =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(my_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rnge =</span> mx <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> mn) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(rnge)</span>
<span id="cb22-10">}</span></code></pre></div></div>
</div>
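<p>If the list-column approach feels heavy, the same calculation can be sketched in base R. The name <code>ksample_base</code> is mine, for illustration only; it takes the same inputs as <code>ksample</code> and returns the same kind of thing, a single number that is the largest simulated sample mean minus the smallest:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">ksample_base &lt;- function(nobs, nsample, mu, sigma) {
  # one sample mean for each of the nsample simulated groups
  means &lt;- replicate(nsample, mean(rnorm(nobs, mu, sigma)))
  # largest sample mean minus smallest
  max(means) - min(means)
}
# three groups of 10, using the mean and SD from above
ksample_base(10, 3, 617.4333, sqrt(466))</code></pre></div>
</div>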
<p>Does it work?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ksample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, grand_mean, gp_sd)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 16.44373</code></pre>
</div>
</div>
<p>Well, I guess that looks all right. So now let’s do this many times:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">457299</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rnge =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ksample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, grand_mean, gp_sd)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> ranges</span>
<span id="cb26-4">ranges</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1,000 × 2
# Rowwise: 
     sim  rnge
   &lt;int&gt; &lt;dbl&gt;
 1     1  7.73
 2     2 25.1 
 3     3  3.45
 4     4 13.2 
 5     5 17.5 
 6     6 22.9 
 7     7 12.6 
 8     8 12.9 
 9     9 12.0 
10    10  2.92
# ℹ 990 more rows</code></pre>
</div>
</div>
<p>The differences between the highest sample mean and the lowest one vary a lot from one simulation to the next, so let’s look at their distribution:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(ranges, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> rnge)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/honestly-significant-differences/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Skewed to the right, with a lower limit of zero.</p>
<p>So now, let’s compare this null distribution with our actual data:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">rats <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(group) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_density =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(density))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 3 × 2
  group    mean_density
  &lt;chr&gt;           &lt;dbl&gt;
1 Control          601.
2 Highjump         639.
3 Lowjump          612.</code></pre>
</div>
</div>
<p>We can get a Tukey-style P-value by taking each difference in means, and seeing what proportion of our simulated ranges (largest sample mean minus smallest) are at least that big:</p>
<ul>
<li>control vs highjump, difference is 37.6:</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">ranges <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(rnge <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">37.6</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 2
# Rowwise: 
  `rnge &gt;= 37.6`     n
  &lt;lgl&gt;          &lt;int&gt;
1 FALSE           1000</code></pre>
</div>
</div>
<ul>
<li>control vs lowjump, difference is 11.4:</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1">ranges <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(rnge <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">11.4</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
# Rowwise: 
  `rnge &gt;= 11.4`     n
  &lt;lgl&gt;          &lt;int&gt;
1 FALSE            544
2 TRUE             456</code></pre>
</div>
</div>
<ul>
<li>highjump vs lowjump, difference is 26.2:</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">ranges <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(rnge <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">26.2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
# Rowwise: 
  `rnge &gt;= 26.2`     n
  &lt;lgl&gt;          &lt;int&gt;
1 FALSE            985
2 TRUE              15</code></pre>
</div>
</div>
<p>Compare these with the P-values from Tukey’s procedure:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">TukeyHSD</span>(rats<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = density ~ group, data = rats)

$group
                  diff       lwr       upr     p adj
Highjump-Control  37.6  13.66604 61.533957 0.0016388
Lowjump-Control   11.4 -12.53396 35.333957 0.4744032
Lowjump-Highjump -26.2 -50.13396 -2.266043 0.0297843</code></pre>
</div>
</div>
<p>Our simulated P-values were respectively 0, 0.456, and 0.015, which are very much consistent with the actual ones from Tukey’s procedure.</p>
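<p>As a reminder of where the simulated P-values come from, here is a minimal sketch that divides the <code>TRUE</code> counts from the three tables above by the 1000 simulated ranges. The names <code>n_sim</code> and <code>n_at_least</code> are invented for this illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n_sim &lt;- 1000 # number of simulated ranges (the counts above add up to this)
# how many simulated ranges were at least as big as each observed difference
n_at_least &lt;- c(highjump_control = 0, lowjump_control = 456, highjump_lowjump = 15)
n_at_least / n_sim # simulated P-values</code></pre></div>
</div>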
<p>If you wanted to do this the way we used to do it, rather than getting P-values, you get a critical value as the 95th percentile of the simulated mean differences:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">ranges <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb39-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">q =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(rnge, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 1
      q
  &lt;dbl&gt;
1  22.5</code></pre>
</div>
</div>
<p>and then you’d say that any means that differed by more than 22.5 were significantly different (the two comparisons involving <code>Highjump</code>) and any differing by less than that were not (<code>Lowjump</code> vs <code>Control</code>).</p>
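<p>That comparison can be sketched in code, assuming the three differences and the critical value are copied by hand from the output above (the names <code>diffs</code> and <code>critical</code> are invented for this illustration):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># differences in means, copied from the TukeyHSD output
diffs &lt;- c(
  "Highjump-Control" = 37.6,
  "Lowjump-Control" = 11.4,
  "Lowjump-Highjump" = -26.2
)
critical &lt;- 22.5 # 95th percentile of the simulated ranges
# TRUE means the two groups differ by more than the critical value
abs(diffs) &gt; critical</code></pre></div>
</div>
<p>The sign of a difference only says which group was subtracted from which, so it is the absolute value that gets compared with the critical value.</p>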


</section>

 ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/honestly-significant-differences/</guid>
  <pubDate>Thu, 29 Feb 2024 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/honestly-significant-differences/Screenshot 2025-12-28 at 13-29-10 Example of One-Way ANOVA - Minitab.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Looking In on Purrr 1.0</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/looking-in-on-purrr-10/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>A brief look at some of what’s new in Purrr 1.0.</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for purrr, the magrittr pipe, and crossing from tidyr</span></span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="square-roots-and-logs-with-map" class="level2">
<h2 class="anchored" data-anchor-id="square-roots-and-logs-with-map">Square roots (and logs) with <code>map</code></h2>
<section id="introduction-1" class="level3">
<h3 class="anchored" data-anchor-id="introduction-1">Introduction</h3>
<p>The square root function is vectorized:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
 [9] 3.000000 3.162278</code></pre>
</div>
</div>
<p>so let’s make ourselves work harder by defining one that is not:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">sqrt1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(x[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1</code></pre>
</div>
</div>
<p>How can we use <code>sqrt1</code> to calculate the square roots of all of the numbers 1 through 10? This is what <code>map</code> and friends from <code>purrr</code> are for.</p>
<p>There are now three ways to use <code>map</code>.</p>
</section>
<section id="method-1-the-original-way" class="level3">
<h3 class="anchored" data-anchor-id="method-1-the-original-way">Method 1: the original way</h3>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(sqrt1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
 [9] 3.000000 3.162278</code></pre>
</div>
</div>
<p>I never liked this because the thing I was for-eaching over had to be the first input of the function, and any arguments after the first one had to be added separately. For example, if you want base 10 logs<sup>1</sup> of a bunch of numbers:<sup>2</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(log, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980
 [8] 0.9030900 0.9542425 1.0000000</code></pre>
</div>
</div>
<p>These examples use <code>map_dbl</code> because <code>sqrt1</code> and <code>log</code> return a decimal number or <code>dbl</code>.</p>
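<p>To see what the <code>_dbl</code> buys you, here is a small sketch (redefining <code>sqrt1</code> so that it runs on its own; <code>as_list</code> and <code>as_dbl</code> are illustrative names). Plain <code>map</code> always returns a list, while <code>map_dbl</code> returns an ordinary vector of decimal numbers:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
sqrt1 &lt;- function(x) sqrt(x[1]) # same definition as above

as_list &lt;- map(1:3, sqrt1)    # plain map: a list with three elements
as_dbl &lt;- map_dbl(1:3, sqrt1) # map_dbl: a numeric vector of length 3

is.list(as_list)
is.double(as_dbl)</code></pre></div>
</div>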
<p>This approach would be awkward if you wanted to compute, let’s say, the log of 10 to different bases:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">log_base <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, x)</span>
<span id="cb11-2">base <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># the second one is e</span></span>
<span id="cb11-3">base <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(log_base)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3.321928 2.302585 1.000000</code></pre>
</div>
</div>
<p>I had to define a helper function with the thing to be for-eached over as its first argument.</p>
<p>Historically, this notation comes from the <code>apply</code> family of functions. In this case:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sapply</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, log, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980
 [8] 0.9030900 0.9542425 1.0000000</code></pre>
</div>
</div>
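<p>If you want the base R version to promise one decimal number per input, in the way that <code>map_dbl</code> does, <code>vapply</code> is probably the nearest equivalent. A sketch, with <code>logs</code> as an illustrative name:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># numeric(1) says that each result must be a single number;
# the trailing 10 is passed on to log as the base, as with sapply
logs &lt;- vapply(1:10, log, numeric(1), 10)
all.equal(logs, log10(1:10))</code></pre></div>
</div>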
</section>
<section id="method-2-lambda-functions" class="level3">
<h3 class="anchored" data-anchor-id="method-2-lambda-functions">Method 2: lambda functions</h3>
<p>Second, the way I came to prefer (which I will now have to unlearn, see below) is this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(.))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
 [9] 3.000000 3.162278</code></pre>
</div>
</div>
<p>I would read this to myself in English as “for each thing in 1 through 10, work out the square root of it”, where <code>~</code> was read as “work out” and <code>.</code> (or <code>.x</code> if you prefer) was read as “it”.</p>
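<p>A quick check that the two spellings of “it” do the same thing (a sketch that redefines <code>sqrt1</code> so it runs on its own; <code>with_dot</code> and <code>with_dot_x</code> are illustrative names):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
sqrt1 &lt;- function(x) sqrt(x[1]) # same definition as above

with_dot &lt;- map_dbl(1:10, ~ sqrt1(.))    # "it" written as .
with_dot_x &lt;- map_dbl(1:10, ~ sqrt1(.x)) # "it" written as .x
identical(with_dot, with_dot_x)</code></pre></div>
</div>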
<p>You can also create a new column of a dataframe this way:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">root =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(x, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(.)))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
       x  root
   &lt;int&gt; &lt;dbl&gt;
 1     1  1   
 2     2  1.41
 3     3  1.73
 4     4  2   
 5     5  2.24
 6     6  2.45
 7     7  2.65
 8     8  2.83
 9     9  3   
10    10  3.16</code></pre>
</div>
</div>
<p>This is a little odd, for learners, because the thing inside the <code>sqrt1</code> is crying out to be called <code>x</code>. I still think this is all right: “for each thing in <code>x</code>, work out the square root of it”, in the same way that you would use <code>i</code> as a loop index in a for loop.</p>
<p>The log examples both work more smoothly this way:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(., <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 0.0000000 0.3010300 0.4771213 0.6020600 0.6989700 0.7781513 0.8450980
 [8] 0.9030900 0.9542425 1.0000000</code></pre>
</div>
</div>
<p>and</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">base</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  2.000000  2.718282 10.000000</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">base <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, .))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3.321928 2.302585 1.000000</code></pre>
</div>
</div>
<p>without the need to handle additional inputs specially, and without the requirement to have the “it” be the first input to the function. The call to the function looks exactly the same as it does when you call it outside a <code>map</code>, which makes it easier to learn.</p>
</section>
<section id="method-3-anonymous-functions" class="level3">
<h3 class="anchored" data-anchor-id="method-3-anonymous-functions">Method 3: anonymous functions</h3>
<p>A third way of specifying what to “work out” is to use an “anonymous function”: a function, typically a one-liner, defined inline without a name. Anonymous functions have always been part of R, but R 4.1 added a compact backslash notation for writing them. This is how it goes:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(\(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
 [9] 3.000000 3.162278</code></pre>
</div>
</div>
<p>This one, to my mind, is not any clearer than the “work out” notation with a squiggle, though you can still cast your eyes over it and read “for each thing in 1 through 10, work out the square root of it” with a bit of practice.</p>
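<p>The backslash is only shorthand for the word <code>function</code>, so the two calls in this sketch should give the same answer (it redefines <code>sqrt1</code> so it runs on its own, and needs R 4.1 or later; <code>short</code> and <code>long</code> are illustrative names):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
sqrt1 &lt;- function(x) sqrt(x[1]) # same definition as above

# backslash shorthand
short &lt;- map_dbl(1:10, \(x) sqrt1(x))
# the same anonymous function written out in full
long &lt;- map_dbl(1:10, function(x) sqrt1(x))
identical(short, long)</code></pre></div>
</div>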
<p>This notation wins where the input things have names:<sup>3</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1">number <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span>
<span id="cb27-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(number, \(number) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(number))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427
 [9] 3.000000 3.162278</code></pre>
</div>
</div>
<p>And thus also in defining new columns of a dataframe:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">root =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(x, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(x)))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
       x  root
   &lt;int&gt; &lt;dbl&gt;
 1     1  1   
 2     2  1.41
 3     3  1.73
 4     4  2   
 5     5  2.24
 6     6  2.45
 7     7  2.65
 8     8  2.83
 9     9  3   
10    10  3.16</code></pre>
</div>
</div>
<p>The clarity comes from the ability to use the name of the input column also as the name of the input to the anonymous function, so that everything joins up: “for each thing in <code>x</code>, work out the square root of that <code>x</code>”.<sup>4</sup></p>
<p>This also works if you are for-eaching over two columns, for example working out logs of different numbers to different bases:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span></span>
<span id="cb31-2">base</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  2.000000  2.718282 10.000000</code></pre>
</div>
</div>
<p><code>crossing</code> (from <code>tidyr</code>) makes a dataframe out of all combinations of its inputs, and so:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">crossing</span>(x, base) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb33-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">log_of =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(x, base, \(x, base) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(x, base)))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 9 × 3
      x  base log_of
  &lt;int&gt; &lt;dbl&gt;  &lt;dbl&gt;
1     2  2     1    
2     2  2.72  0.693
3     2 10     0.301
4     3  2     1.58 
5     3  2.72  1.10 
6     3 10     0.477
7     4  2     2    
8     4  2.72  1.39 
9     4 10     0.602</code></pre>
</div>
</div>
<p>This doesn’t only apply to making dataframe columns, but again works nicely any time the input things have names:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">u <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb35-2">v <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span></span>
<span id="cb35-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map2_dbl</span>(u, v, \(u, v) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(u<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span>v))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3.464102 3.741657 4.000000 4.242641 4.472136</code></pre>
</div>
</div>
</section>
</section>
<section id="collatz" class="level2">
<h2 class="anchored" data-anchor-id="collatz">Collatz</h2>
<p>When I am teaching this stuff, I say that if the thing you are working out is complicated, write a function to do that first, and <em>then</em> worry about for-eaching it. For example, imagine you want a function that takes an integer as input, and the output is:</p>
<ul>
<li>if the input is even, half the input</li>
<li>if the input is odd, three times the input plus one</li>
</ul>
<p>This is a bit long to put in the anonymous function of a <code>map</code>, so we’ll define a function <code>hotpo</code> to do it first:<sup>5</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1">hotpo <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb37-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stopifnot</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(x)) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># error out if input is not an integer</span></span>
<span id="cb37-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> (x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) {</span>
<span id="cb37-4">    ans <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%/%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span></span>
<span id="cb37-5">  } <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> {</span>
<span id="cb37-6">    ans <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb37-7">  }</span>
<span id="cb37-8">  ans</span>
<span id="cb37-9">}</span>
<span id="cb37-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 10</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5.6</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error in hotpo(5.6): x == round(x) is not TRUE</code></pre>
</div>
</div>
<p>So now, we can use a <code>map</code> to work out <code>hotpo</code> of each of the numbers 1 through 6:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">first <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span></span>
<span id="cb43-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(first, hotpo)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  4  1 10  2 16  3</code></pre>
</div>
</div>
<p>or</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(first, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(.))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  4  1 10  2 16  3</code></pre>
</div>
</div>
<p>or</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(first, \(first) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(first))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  4  1 10  2 16  3</code></pre>
</div>
</div>
<p>where we call our function inside the anonymous function. The answer is the same whichever way you do it, and you can reasonably argue that the last one is the clearest, because the input to <code>map_int</code> and the input to the anonymous function have the same name.</p>
<p>This one is <code>map_int</code> because <code>hotpo</code> returns an integer.</p>
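<p>As a hedged illustration of why the suffix matters, here is a minimal sketch, assuming R 4.1 or later (for the <code>\(x)</code> shorthand) and a reasonably recent purrr. The helper <code>halve</code> is made up for this example and is not part of the code above. Because <code>halve</code> can return a number that is not whole, <code>map_int</code> refuses to build an integer vector from its results, while <code>map_dbl</code> has no such difficulty:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

# hypothetical helper, for illustration only: gives decimals for odd inputs
halve &lt;- function(x) x / 2

# map_dbl collects the results into a vector of decimal numbers
map_dbl(1:4, \(x) halve(x))

# map_int cannot turn 0.5 into an integer, so this call is an error;
# try() catches the error so that the rest of the script keeps running
result &lt;- try(map_int(1:4, \(x) halve(x)), silent = TRUE)
inherits(result, "try-error")</code></pre></div>
</div>
<p>With <code>hotpo</code>, every result is a whole number, so <code>map_int</code> is the natural choice.</p>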
<p>This function is actually more than a random function defined on integers; it is part of an open problem in number theory called the <a href="https://www.quantamagazine.org/why-mathematicians-still-cant-solve-the-collatz-conjecture-20200922/">Collatz conjecture</a>. The idea is if you do this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1"><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span></span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 10</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb51-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 5</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 16</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 8</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb57" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb57-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>))))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)))))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb61-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>))))))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1</code></pre>
</div>
</div>
<p>you obtain a sequence of integers. If you ever get to 1, you’ll go back to 4, 2, 1, and loop forever, so we’ll say the sequence ends if it gets to 1. The Collatz conjecture says that, no matter where you start, you will <em>always</em> get to 1.<sup>6</sup></p>
<p>Let’s assume that we <em>are</em> going to get to 1, and write a function to generate the whole sequence. The two key ingredients are: the <code>hotpo</code> function we wrote, and a <code>while</code> loop to keep going until we do get to 1:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb63" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb63-1">hotpo_seq <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb63-2">  ans <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> x</span>
<span id="cb63-3">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">while</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb63-4">    x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo</span>(x)</span>
<span id="cb63-5">    ans <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(ans, x)</span>
<span id="cb63-6">  }</span>
<span id="cb63-7">  ans</span>
<span id="cb63-8">}</span></code></pre></div></div>
</div>
<p>and test it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb64" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb64-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo_seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 10  5 16  8  4  2  1</code></pre>
</div>
</div>
<p>the same short ride that we had above, and a rather longer one:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb66" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb66-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo_seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">27</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  [1]   27   82   41  124   62   31   94   47  142   71  214  107  322  161  484
 [16]  242  121  364  182   91  274  137  412  206  103  310  155  466  233  700
 [31]  350  175  526  263  790  395 1186  593 1780  890  445 1336  668  334  167
 [46]  502  251  754  377 1132  566  283  850  425 1276  638  319  958  479 1438
 [61]  719 2158 1079 3238 1619 4858 2429 7288 3644 1822  911 2734 1367 4102 2051
 [76] 6154 3077 9232 4616 2308 1154  577 1732  866  433 1300  650  325  976  488
 [91]  244  122   61  184   92   46   23   70   35  106   53  160   80   40   20
[106]   10    5   16    8    4    2    1</code></pre>
</div>
</div>
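<p>Since nobody has proved that every starting point reaches 1, a cautious variant might give up after a fixed number of steps rather than trusting the <code>while</code> loop to stop. The following is only a sketch under that assumption: the name <code>hotpo_seq_capped</code> and the default of 1000 steps are invented here, and the single step is written inline so that the block runs on its own:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># a sketch, not the function above: stop after at most max_steps steps
# (max_steps and its default are assumptions made for this example)
hotpo_seq_capped &lt;- function(x, max_steps = 1000) {
  stopifnot(x == round(x))
  ans &lt;- x
  for (i in seq_len(max_steps)) {
    if (x == 1) break    # reached 1, so the sequence is complete
    # one step: halve an even number, triple an odd one and add 1
    x &lt;- if (x %% 2 == 0) x %/% 2 else 3 * x + 1
    ans &lt;- c(ans, x)
  }
  ans
}

length(hotpo_seq_capped(27))      # 112, the same length as the output above
length(hotpo_seq_capped(27, 10))  # 11: the starting point plus 10 steps</code></pre></div>
</div>
<p>A sequence that hits the cap simply comes back unfinished, so checking whether its last element is 1 tells you whether it got there.</p>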
<p>Now, let’s suppose that we want to make a dataframe with the sequences for the starting points 1 through 10. Each sequence is a vector rather than a single integer, so we need to do this with <code>map</code>:<sup>7</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb68" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb68-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb68-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sequence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(start, \(start) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo_seq</span>(start)))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
   start sequence  
   &lt;int&gt; &lt;list&gt;    
 1     1 &lt;int [1]&gt; 
 2     2 &lt;dbl [2]&gt; 
 3     3 &lt;dbl [8]&gt; 
 4     4 &lt;dbl [3]&gt; 
 5     5 &lt;dbl [6]&gt; 
 6     6 &lt;dbl [9]&gt; 
 7     7 &lt;dbl [17]&gt;
 8     8 &lt;dbl [4]&gt; 
 9     9 &lt;dbl [20]&gt;
10    10 &lt;dbl [7]&gt; </code></pre>
</div>
</div>
<p>and we have made a list-column. You can see by the lengths of the vectors in the list-column how long each sequence is.<sup>8</sup> We might want to make explicit how long each sequence is, and how high it goes:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb70" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb70-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">start =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sequence =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map</span>(start, \(start) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo_seq</span>(start))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seq_len =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(sequence, \(sequence) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(sequence))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb70-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">seq_max =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_int</span>(sequence, \(sequence) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(sequence))) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> seq_info</span>
<span id="cb70-5">seq_info</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 4
   start sequence   seq_len seq_max
   &lt;int&gt; &lt;list&gt;       &lt;int&gt;   &lt;int&gt;
 1     1 &lt;int [1]&gt;        1       1
 2     2 &lt;dbl [2]&gt;        2       2
 3     3 &lt;dbl [8]&gt;        8      16
 4     4 &lt;dbl [3]&gt;        3       4
 5     5 &lt;dbl [6]&gt;        6      16
 6     6 &lt;dbl [9]&gt;        9      16
 7     7 &lt;dbl [17]&gt;      17      52
 8     8 &lt;dbl [4]&gt;        4       8
 9     9 &lt;dbl [20]&gt;      20      52
10    10 &lt;dbl [7]&gt;        7      16</code></pre>
</div>
</div>
<p>To verify for a starting point of 7:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb72" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb72-1">q <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hotpo_seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)</span>
<span id="cb72-2">q</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1]  7 22 11 34 17 52 26 13 40 20 10  5 16  8  4  2  1</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb74" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb74-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(q)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 17</code></pre>
</div>
</div>
<p>This does indeed have a length of 17 and goes up as high as 52 before coming back down to 1.</p>
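<p>The <code>&lt;dbl&gt;</code> entries in the list-column have a likely explanation: literals such as <code>2</code>, <code>3</code> and <code>1</code> are decimal numbers in R, so arithmetic with them produces decimal numbers even when <code>x</code> starts out as an integer. (The sequence starting at 1 stays <code>&lt;int&gt;</code> because the loop never runs.) Here is a minimal sketch of an integer-only version, using the <code>L</code> suffix to make integer literals; the names <code>hotpo_int</code> and <code>hotpo_seq_int</code> are invented for this example:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical integer-only variants; the L suffix makes a literal an integer
hotpo_int &lt;- function(x) {
  stopifnot(is.integer(x))
  if (x %% 2L == 0L) x %/% 2L else 3L * x + 1L
}

hotpo_seq_int &lt;- function(x) {
  ans &lt;- x
  while (x != 1L) {
    x &lt;- hotpo_int(x)
    ans &lt;- c(ans, x)
  }
  ans
}

typeof(3 * 3L + 1)         # "double", because the literals 3 and 1 are doubles
typeof(hotpo_seq_int(7L))  # "integer"
hotpo_seq_int(7L)</code></pre></div>
</div>
<p>One caveat: R’s integers cannot go beyond <code>.Machine$integer.max</code> (about 2.1 billion), so for large starting points the decimal-number version may actually be the safer choice.</p>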
</section>
<section id="keeping-and-discarding-by-name" class="level2">
<h2 class="anchored" data-anchor-id="keeping-and-discarding-by-name">Keeping and discarding by name</h2>
<p>We don’t have to make a dataframe of these (though that, these days, is usually my preferred way of working). We can instead put the sequences in a <code>list</code>. This one is a “named list”, with each sequence paired with its starting point (its “name”):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb76" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb76-1">seq_list <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> seq_info<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sequence</span>
<span id="cb76-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(seq_list) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> seq_info<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>start</span>
<span id="cb76-3">seq_list</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>$`1`
[1] 1

$`2`
[1] 2 1

$`3`
[1]  3 10  5 16  8  4  2  1

$`4`
[1] 4 2 1

$`5`
[1]  5 16  8  4  2  1

$`6`
[1]  6  3 10  5 16  8  4  2  1

$`7`
 [1]  7 22 11 34 17 52 26 13 40 20 10  5 16  8  4  2  1

$`8`
[1] 8 4 2 1

$`9`
 [1]  9 28 14  7 22 11 34 17 52 26 13 40 20 10  5 16  8  4  2  1

$`10`
[1] 10  5 16  8  4  2  1</code></pre>
</div>
</div>
<p>If these were in a dataframe as above, a <code>filter</code> would pick out the sequences for particular starting points. As an example, we will pick out the sequences for odd-numbered starting points. With a list, this is a chance to learn about the new <code>keep_at</code> and <code>discard_at</code>. There are already <code>keep</code> and <code>discard</code>,<sup>9</sup> for selecting by value, but the new ones allow selecting by name.</p>
<p>There are different ways to use <code>keep_at</code>, but one is to write a function that accepts a name and returns <code>TRUE</code> if that is one of the names you want to keep. Mine, which keeps the sequences for odd-numbered starting points, is below. The names are text, so I convert the name to an integer and then test it for oddness, using <code>%%</code> as we did in <code>hotpo</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb78" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb78-1">is_odd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb78-2">  x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.integer</span>(x)</span>
<span id="cb78-3">  x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb78-4">}</span>
<span id="cb78-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_odd</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] TRUE</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb80" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb80-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_odd</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] FALSE</code></pre>
</div>
</div>
<p>and now I keep the sequences that have odd starting points thus:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb82" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb82-1">seq_list <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb82-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">keep_at</span>(\(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_odd</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>$`1`
[1] 1

$`3`
[1]  3 10  5 16  8  4  2  1

$`5`
[1]  5 16  8  4  2  1

$`7`
 [1]  7 22 11 34 17 52 26 13 40 20 10  5 16  8  4  2  1

$`9`
 [1]  9 28 14  7 22 11 34 17 52 26 13 40 20 10  5 16  8  4  2  1</code></pre>
</div>
</div>
<p><code>discard_at</code> selects the ones for which the helper function is <code>FALSE</code>, which in this case will give us the even-numbered starting points:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb84" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb84-1">seq_list <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb84-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">discard_at</span>(\(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_odd</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>$`2`
[1] 2 1

$`4`
[1] 4 2 1

$`6`
[1]  6  3 10  5 16  8  4  2  1

$`8`
[1] 8 4 2 1

$`10`
[1] 10  5 16  8  4  2  1</code></pre>
</div>
</div>
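<p>One of the other ways to use <code>keep_at</code> and <code>discard_at</code> is to give the names directly as text, with no helper function. This is a minimal sketch, with a small hand-typed list (<code>small_list</code>, invented for this example) standing in for <code>seq_list</code> so that it runs on its own:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

# a small stand-in for seq_list, typed in by hand
small_list &lt;- list(`1` = 1, `2` = c(2, 1), `3` = c(3, 10, 5, 16, 8, 4, 2, 1))

# keep by giving the names directly, as text
keep_at(small_list, c("1", "3"))

# discard by name; this leaves the same two elements
discard_at(small_list, "2")

# the base R equivalent of the keep_at above
small_list[c("1", "3")]</code></pre></div>
</div>
<p>The function form used above earns its keep when the names to select follow a rule, such as oddness, rather than being a short list you can type out.</p>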
</section>
<section id="final-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="final-thoughts">Final thoughts</h2>
<p>I have long been a devotee of the lambda-function notation with a <code>map</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb86" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb86-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb86-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(x, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(.))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1.000000 1.414214 1.732051 2.000000 2.236068</code></pre>
</div>
</div>
<p>but I have always had vague misgivings about teaching this, because it is not immediately obvious why the thing inside <code>sqrt1</code> is not also <code>x</code>. The reason, of course, is the same as in this Python loop:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb88" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb88-1">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'a'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'b'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'c'</span>]</span>
<span id="cb88-2"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> x:</span>
<span id="cb88-3">  <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(i)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>a
b
c</code></pre>
</div>
</div>
<p>where <code>i</code> stands for “the element of <code>x</code> that I am currently looking at”, but it takes a bit of thinking for the learner to get to that point.</p>
<p>Using the anonymous function approach makes things a bit clearer:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb90" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb90-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span></span>
<span id="cb90-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dbl</span>(x, \(x) <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt1</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1.000000 1.414214 1.732051 2.000000 2.236068</code></pre>
</div>
</div>
<p>where <code>x</code> appears <em>three</em> times in the <code>map</code>: first as the vector of values of which we want the square roots, then as the argument of the anonymous function, and finally as the input to <code>sqrt1</code>, so that everything appears to line up.</p>
<p>But there is some sleight of hand here: the meaning of <code>x</code> actually changes as you go along! The first <code>x</code> is a vector, but the second and third <code>x</code> values are <em>numbers</em>, elements of the vector <code>x</code>. Maybe this is all right, because we are used to treating vectors elementwise in R:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb92" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb92-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb92-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">root =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(x))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 2
      x  root
  &lt;int&gt; &lt;dbl&gt;
1     1  1   
2     2  1.41
3     3  1.73
4     4  2   
5     5  2.24</code></pre>
</div>
</div>
<p>Functions like <code>sqrt</code> are vectorized, so the <code>mutate</code> really means something like “take the elements of <code>x</code> one at a time and take the square root of each one, gluing the result back together into a vector”. So, in the grand scheme of things, I am sold on the (new) anonymous function way of running <code>map</code>, and I think I will be using this rather than the lambda-function way of doing things in the future.</p>
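<p>The change of meaning described above can be seen directly. In this minimal sketch (using base R’s <code>sqrt</code> and <code>length</code> so that it runs on its own, and an argument name, <code>element</code>, chosen only for illustration), the outer <code>x</code> has length 5 while each inner <code>x</code> has length 1, and renaming the argument makes no difference to the answer:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

x &lt;- 1:5

# the outer x is the whole vector
length(x)

# inside the anonymous function, x is one element at a time,
# so this gives five 1s
map_int(x, \(x) length(x))

# the argument name is arbitrary: both of these give the same square roots
roots_same_name &lt;- map_dbl(x, \(x) sqrt(x))
roots_other_name &lt;- map_dbl(x, \(element) sqrt(element))
all.equal(roots_same_name, roots_other_name)

# and both agree with the vectorized call
all.equal(roots_same_name, sqrt(x))</code></pre></div>
</div>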
<p>Now, if you’ll excuse me, I have to attend to all the times I’ve used <code>map</code> in my lecture notes!</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>R’s <code>log</code> function has two arguments: the number whose log you want, and then the base of the log, which defaults to <img src="https://latex.codecogs.com/png.latex?e">.↩︎</p></li>
<li id="fn2"><p>Ignoring the fact that <code>log</code> is vectorized.↩︎</p></li>
<li id="fn3"><p>The logic here seems to require the vector to have a <em>singular</em> name.↩︎</p></li>
<li id="fn4"><p>The input to the anonymous function could be called anything, but it seems like a waste to not use the same name as the column being for-eached over.↩︎</p></li>
<li id="fn5"><p><code>%/%</code> is integer division, discarding the remainder, and <code>%%</code> is the remainder itself. We need to be careful with the division because, for example, <code>4 / 2</code> is actually a <em>decimal</em> number, what we old FORTRAN programmers used to write as <code>2.0</code> or <code>2.</code>.↩︎</p></li>
<li id="fn6"><p>Spoiler: nobody has been able to prove that this is always true, but every starting point that has been tried gets to 1.↩︎</p></li>
<li id="fn7"><p>Using plain <code>map</code> means that its output will be a <code>list</code>, and in a dataframe will result in the new column being a list-column with something more than a single number stored in each cell.↩︎</p></li>
<li id="fn8"><p>I am a little bothered by most of them being <code>dbl</code> rather than <code>int</code>.↩︎</p></li>
<li id="fn9"><p>I must be having flashbacks of SAS, because I expected the opposite of “keep” to be “drop”.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <guid>https://blog.ritsokiguess.site/posts/looking-in-on-purrr-10/</guid>
  <pubDate>Fri, 23 Dec 2022 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/looking-in-on-purrr-10/OIP.gorIIacKboTjCTCHfiZ6qQHaIl.webp" medium="image" type="image/webp"/>
</item>
<item>
  <title>A Journey with Targets</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/journey-with-targets/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>There are lots of good reasons to adopt a workflow based around functions, and the <code>targets</code> package provides an excellent way to do that, by asking the user to express how the functions relate to one another.</p>
<p>This solves a problem I always had when trying to express things in functions: I would end up with hundreds of little functions that I would have to keep straight, especially when something went wrong later and I would have to try to remember how my edifice of functions had actually been constructed.</p>
<p>There are lots of good introductions to <code>targets</code>, none better than the Walkthrough in the <a href="https://books.ropensci.org/targets/">user manual</a>. Having worked through that myself, I wanted to see whether I could build a <code>targets</code> project from scratch. This is the story of that process. Thanks to <a href="https://fosstodon.org/@blasbenito">Blas Benito</a> and <a href="https://mastodon.social/@rmflight">Robert Flight</a> on Mastodon for guiding me when I got stuck.</p>
<p>My aim was to read in some data from a website, make a graph, do an analysis, and write a report including these things, and to do this in a way that makes it easy to add a second analysis of a second dataset to the report. The second analysis I do here is (deliberately) structured in a different way to the first one, so that we can see what implications that has for the process.</p>
</section>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>We<sup>1</sup> begin by installing the <code>targets</code> and <code>tarchetypes</code><sup>2</sup> packages.</p>
<p>Next, we go down to the console and run</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(targets)</span></code></pre></div></div>
</div>
<p>Most of the actual running of things happens in the console with this way of working. Next,</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">use_targets</span>()</span></code></pre></div></div>
</div>
<p>This creates and opens a file called <code>_targets.R</code> that will say how everything is connected together. It is the equivalent of a <code>Makefile</code> in what it does, though the way it works is a bit different from a <code>Makefile</code>. The one <code>use_targets</code> creates is a template, with places to fill in what we need.</p>
<p>I was trying to keep things tidy, so I also created some folders at this point: <code>data</code>, where the data we read in from the website will be stored, <code>report</code>, where the files for my report will live, and <code>R</code>, where the code for our functions will live.</p>
<p>If you are using git/github, there is one more piece of setup to do. <code>targets</code> will create some (potentially) large objects in the folder <code>_targets</code> of your project, and these don’t need to be under version control (because they can always be recreated), so we add the <code>_targets</code> folder to <code>.gitignore</code>.</p>
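<p>For what it’s worth, the folder creation and the <code>.gitignore</code> edit can also be done from the console. This is only a sketch in base R, assuming the working directory is the project root and that any existing <code>.gitignore</code> ends with a newline; doing it by hand in a file manager and a text editor works just as well:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Sketch only: assumes the working directory is the project root.
# Create the three folders (quietly does nothing if they already exist).
for (folder in c("data", "report", "R")) {
  dir.create(folder, showWarnings = FALSE)
}
# Tell git to ignore the store that targets creates.
cat("_targets/\n", file = ".gitignore", append = TRUE)</code></pre></div>
</div>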
<p>All of my code is <a href="https://github.com/nxskok/soapy">here</a>.</p>
</section>
<section id="the-soap-data" class="level2">
<h2 class="anchored" data-anchor-id="the-soap-data">The soap data</h2>
<p>The standard way to run <code>targets</code> is to define functions to do everything that needs doing (which you put in one or more files in the <code>R</code> folder), and then edit <code>_targets.R</code> to say how those functions work together to create what you need.</p>
<p>I have three functions: one to read the data from a file, one to make a scatterplot with points and lines defined by groups, and one to fit a regression model, strictly an analysis of covariance model, like this (in file <code>functions.R</code> in folder <code>R</code>):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">read_data <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(filename) {</span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_delim</span>(filename, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span>
<span id="cb3-3">}</span>
<span id="cb3-4"></span>
<span id="cb3-5">plot_lines <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb3-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(x, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> speed, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> scrap, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> line)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">method =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)</span>
<span id="cb3-9">}</span>
<span id="cb3-10"></span>
<span id="cb3-11">fit_model1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(x) {</span>
<span id="cb3-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(scrap <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> speed <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> line, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> x)</span>
<span id="cb3-13">}</span></code></pre></div></div>
</div>
<p>The background is this: our data come from a factory that makes soap bars. The interest here is in how much <code>scrap</code> soap (which cannot be made into soap bars) is produced, which might depend on the <code>speed</code> at which the production is run. There are two different production <code>line</code>s, labelled <code>a</code> and <code>b</code>; the plot suggests that the <code>scrap</code>-<code>speed</code> relationship can be modelled by separate but parallel lines for each production <code>line</code>, which is what the model fits.</p>
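<p>To see what “separate but parallel lines” means in terms of the model, here is a small sketch. The data are made up purely for illustration (they are <em>not</em> the soap data), and the names <code>toy</code> and <code>fit</code> are hypothetical:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Made-up data for illustration only; not the soap data.
set.seed(1)
toy &lt;- data.frame(
  speed = rep(seq(100, 300, by = 50), times = 2),
  line = rep(c("a", "b"), each = 5)
)
toy$scrap &lt;- 100 + 1.2 * toy$speed - 50 * (toy$line == "b") + rnorm(10, sd = 5)
# Same formula as in fit_model1: main effects only, no interaction term.
fit &lt;- lm(scrap ~ speed + line, data = toy)
# The coefficients are an intercept, one slope for speed, and a shift for line b.
names(coef(fit))</code></pre></div>
</div>
<p>There is one slope (for <code>speed</code>) shared by both production lines, and the coefficient for <code>line</code> shifts the intercept for line <code>b</code> up or down; that is what makes the fitted lines parallel. Using <code>speed * line</code> in the formula instead would let each production line have its own slope.</p>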
<p>Now we have to edit <code>_targets.R</code> to express this. Here is the top bit of mine, with comments edited out:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(targets)</span>
<span id="cb4-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tarchetypes) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Load other packages as needed. # nolint</span></span>
<span id="cb4-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_option_set</span>(</span>
<span id="cb4-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">packages =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"tidyverse"</span>), <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># packages that your targets need to run</span></span>
<span id="cb4-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rds"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># default storage format</span></span>
<span id="cb4-6">)</span>
<span id="cb4-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_source</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"R/functions.R"</span>)</span></code></pre></div></div>
</div>
<p>First we have the targets script load both <code>targets</code> and <code>tarchetypes</code> that we installed earlier.<sup>3</sup> Then we add any packages that we need for the analysis to run: in this case <code>tidyverse</code>, or if you’re a stickler for efficiency, <code>readr</code> and <code>ggplot2</code>. The storage format came from <code>use_targets</code>; unless you are creating huge things in your code, the default will be fine. Finally, we load the functions that we wrote.</p>
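<p>For the sticklers, the leaner version of that call might look something like this. It is a sketch rather than something taken from the project, and it assumes the functions use nothing from the <code>tidyverse</code> beyond <code>readr</code> and <code>ggplot2</code>:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(targets)
tar_option_set(
  packages = c("readr", "ggplot2"), # read_delim comes from readr, ggplot from ggplot2
  format = "rds" # default storage format, as before
)</code></pre></div>
</div>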
<p>The bottom bit of <code>_targets.R</code> is the bit that looks like a Makefile, where we say how those functions are going to be used. This is the currently relevant part of mine:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_download</span>(file1, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/soap.txt"</span>,</span>
<span id="cb5-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">paths =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/soap.txt"</span>),</span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_data</span>(file1)),</span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(plot1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_lines</span>(soap)),</span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(model1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model1</span>(soap))</span>
<span id="cb5-7">)</span></code></pre></div></div>
</div>
<p>Each of these creates a “target”, the first thing inside <code>tar_target</code> (or <code>tar_download</code>), and anything else says how that target is made. Each <code>tar_target</code> can (and undoubtedly will) use previously made targets.</p>
<p>I have to talk about the first one. My datafile existed on a website with the URL shown, but <code>targets</code> works with local files only. <code>tarchetypes</code> contains a number of recipes for doing jobs like this. Here, <code>tar_download</code> creates a target <code>file1</code> by downloading the datafile from its website to a file in <code>data</code>. <code>file1</code> then refers to that file without us having to remember where the file is.</p>
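<p>As a rough picture of the download step alone, this is approximately what one would do by hand in base R. It is an analogy rather than what <code>tar_download</code> does internally, it has none of the dependency tracking, and the names <code>soap_url</code> and <code>dest</code> are hypothetical:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># By-hand analogue of the download step only; no targets involved.
soap_url &lt;- "http://ritsokiguess.site/datafiles/soap.txt"
dest &lt;- "data/soap.txt"
# Make sure the data folder exists, then fetch the file into it.
dir.create(dirname(dest), showWarnings = FALSE)
download.file(soap_url, destfile = dest)</code></pre></div>
</div>
<p>The advantage of the <code>tar_download</code> version is that the download becomes a target like any other, so that later targets can depend on it.</p>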
<p>So, having created a target <code>file1</code> that refers to the (downloaded and locally stored) datafile, the next three targets do this:</p>
<ul>
<li>read the data in from the local file using our function <code>read_data</code> and store it as the target <code>soap</code></li>
<li>make a scatterplot with lines of the soap data, using our function <code>plot_lines</code> and store it as <code>plot1</code></li>
<li>fit a model, using our function <code>fit_model1</code>, and store it as <code>model1</code>.</li>
</ul>
<p>Having set up our pipeline, now we can think about running it. Before that, we can go to the console and type <code>tar_visnetwork()</code>. This makes a diagram showing how the bits fit together. In this case we have, left to right:</p>
<ul>
<li><code>file1_url</code>: the url where the data will come from</li>
<li><code>file1</code>, which depends on <code>file1_url</code></li>
<li><code>soap</code>, which depends on <code>file1</code></li>
<li><code>plot1</code>, which depends on <code>soap</code></li>
<li><code>model1</code>, which also depends on <code>soap</code>.</li>
</ul>
<p>The value of looking at this diagram is to make sure that we have coded the dependency structure properly, which it seems (in this case) I have done.</p>
<p>Also note that the targets are colour-coded according to whether they are up-to-date or outdated. At first, everything will be outdated, but later only some of it will be outdated. <code>targets</code> knows to only update what needs updating.</p>
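<p>If you prefer text to a diagram, much the same information is available in the console. A short sketch, assuming it is run from the project root where <code>_targets.R</code> lives:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(targets)
tar_manifest() # one row per target, with the command that builds it
tar_outdated() # names of the targets that are currently out of date</code></pre></div>
</div>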
<p>Next, we actually run everything. This is done in the console with <code>tar_make()</code>. This will update everything that needs updating, using in each case the recipe in <code>_targets.R</code>. The output from <code>tar_make()</code> tells you whether each target was “built” (updated) or “skipped” (nothing that target depends on had changed). If anything goes wrong, there will be an error message. I find the error messages not very helpful, but it is at least clear which of the targets caused the error, which gives a place to start looking.</p>
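<p>When a target does fail, the stored metadata is one more place to look. A minimal sketch, again meant for the console in the project root:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(targets)
# Run the pipeline; this stops with an error if a target fails.
tar_make()
# Afterwards, show only the targets that have an error message recorded.
tar_meta(fields = error, complete_only = TRUE)</code></pre></div>
</div>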
<p>Once everything has run, the output from each target is stored,<sup>4</sup> and can be inspected (in the Console) using <code>tar_read</code>. For example, <code>tar_read(soap)</code> will display the dataframe that was read in from the file, and <code>tar_read(plot1)</code> will display the scatterplot. You can do the same thing with the fitted model object <code>model1</code>, but this is probably not what you want; you would probably prefer to look at the <code>summary</code> of the model,<sup>5</sup> which you can do like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_load</span>(model1)</span>
<span id="cb6-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model1)</span></code></pre></div></div>
</div>
<p><code>tar_load</code> puts a copy of the target named into your workspace, and then you can do something with it as you normally would.</p>
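<p>The difference between <code>tar_read</code> and <code>tar_load</code>, as a sketch using the targets from this pipeline (it assumes <code>tar_make()</code> has already run; the name <code>soap_data</code> is hypothetical):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(targets)
# tar_read returns the stored value, so it can be used inline...
summary(tar_read(model1))
# ...or saved under any name you like.
soap_data &lt;- tar_read(soap)
# tar_load is used for its side effect: it creates an object called model1.
tar_load(model1)
summary(model1)</code></pre></div>
</div>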
</section>
<section id="making-a-child-report" class="level2">
<h2 class="anchored" data-anchor-id="making-a-child-report">Making a child report</h2>
<p>So far, this is very standard <code>targets</code> work: use functions to make a pipeline that you specify in <code>_targets.R</code>, and then use <code>tar_make</code> to run it. But, having gotten this to work, I wanted to add a report, and I wanted to do so flexibly, so that I could easily add a different analysis of a different dataset later. The way I like to do this is to use “child documents”: write each report as a separate <code>.Rmd</code> file, and then have a “parent” report that loads each child report.</p>
<p>We’ll get to the parent report later. Let’s write a report about the soap data first, which will be saved in <code>soap.Rmd</code> in the <code>report</code> folder. This is a child document, so <em>it has no YAML header</em>, and we begin right away with the report header and a description of the dataset. The report structure will be simple: we display the data, display the scatterplot (and talk about it a bit), then display the regression output (and talk about that a bit).</p>
<p>There is a standard <code>targets</code> way of making reports like these: we do all the computation in previous targets (as we have done), and then we read in what we want to display with <code>tar_read</code> or <code>tar_load</code> as we did above, instead of doing more computation to obtain a target that we already have. For this report, that means having three code chunks, the content of which we have already seen:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_read</span>(soap)</span></code></pre></div></div>
</div>
<p>to display the data,</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_read</span>(plot1)</span></code></pre></div></div>
</div>
<p>to display the scatterplot, and</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_load</span>(model1)</span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summary</span>(model1)</span></code></pre></div></div>
</div>
<p>to display the model output. That, together with my comments, is <code>soap.Rmd</code>.</p>
<p>Finally, we need to add this into the pipeline, so that <code>targets</code> knows to update this part of the report if the text in it changes, or if any of <code>soap</code>, <code>plot1</code>, or <code>model1</code> changes.<sup>6</sup> This is a so-called “file” target, and we add it to the end of <code>_targets.R</code> to make this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_download</span>(file1, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/soap.txt"</span>,</span>
<span id="cb10-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">paths =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/soap.txt"</span>),</span>
<span id="cb10-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_data</span>(file1)),</span>
<span id="cb10-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(plot1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_lines</span>(soap)),</span>
<span id="cb10-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(model1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model1</span>(soap)),</span>
<span id="cb10-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/soap.Rmd"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"file"</span>)</span>
<span id="cb10-8">)</span></code></pre></div></div>
</div>
<p><code>targets</code> knows additionally that the soap report depends on <code>soap</code>, <code>plot1</code>, and <code>model1</code> because of the <code>tar_read</code> and <code>tar_load</code> statements in the report. (This doesn’t always show up in <code>tar_visnetwork</code>, but <code>targets</code> knows about it all the same.)</p>
</section>
<section id="making-a-parent-report" class="level2">
<h2 class="anchored" data-anchor-id="making-a-parent-report">Making a parent report</h2>
<p>This is a genuine Markdown report, so it begins with a YAML header specifying the author, title, date, etc. Then follows a setup chunk with <code>knitr::opts_chunk$set(echo = FALSE)</code>: the only code in this part of the report is the <code>tar_read</code> and <code>tar_load</code> statements that the reader doesn’t need to see.</p>
<p>Then we need to say that this report depends on <code>soap_report</code>, so that if <em>that</em> changes, the parent report needs to be updated. I come from a Makefile background, so this took some time (and help) for me to figure out, but the way to do it, as in the child report, is to <code>tar_load</code> the target that represents the report:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_load</span>(soap_report)</span></code></pre></div></div>
</div>
<p>Then I had some preamble text.</p>
<p>Then we load the child report itself. It seems that it should be possible to use <code>soap_report</code> directly (it contains the path to the child report), but I couldn’t get this to work, so I specified the actual path to the child report directly with <code>child = "report/soap.Rmd"</code> as a chunk option.<sup>7</sup></p>
<p>The last thing to do here is to add the parent report as a target. We now have:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_download</span>(file1, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/soap.txt"</span>,</span>
<span id="cb12-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">paths =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/soap.txt"</span>),</span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_data</span>(file1)),</span>
<span id="cb12-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(plot1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_lines</span>(soap)),</span>
<span id="cb12-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(model1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model1</span>(soap)),</span>
<span id="cb12-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/soap.Rmd"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"file"</span>),</span>
<span id="cb12-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_render</span>(final_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/report.Rmd"</span>)</span>
<span id="cb12-9">)</span></code></pre></div></div>
</div>
<p>The parent report is the thing that needs to be knitted, so we use the special target <code>tar_render</code> (from <code>tarchetypes</code>), which says to take the document stored in the second input, and knit it to create the target that is the first input. After this runs, there is a file <code>report.html</code> in <code>report</code> that is the knitted report.</p>
</section>
<section id="including-the-code-in-a-report" class="level2">
<h2 class="anchored" data-anchor-id="including-the-code-in-a-report">Including the code in a report</h2>
<p>If I were writing the report for someone else, I wouldn’t expect them to be very interested in the code that produced the plot or the model summary. But for teaching, it is very useful to show what code the output came from. The problem with the <code>targets</code>-style analysis we just did is that, by separating the computation from the reporting, the code is nowhere to be found.</p>
<p>At the expense of good <code>targets</code> style and efficient computation, however, there is no problem including the computation in the report, so that my second child report, in <code>report/spiders.Rmd</code>, looks like any other R Notebook you might write, with code chunks and the output immediately below them, to read in the data from a website, make a boxplot and run a logistic regression. The background for this one is that a certain spider is or is not found on a beach, and the research hypothesis is that whether or not this spider is found depends somehow on the size of the grains of sand on that beach.</p>
<p>So there is no difficulty composing this kind of report, and no need to write extra functions and make extra targets to compute its constituent pieces. The only extra thing that needs to go in <code>_targets.R</code> is this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_download</span>(file1, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/soap.txt"</span>,</span>
<span id="cb13-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">paths =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"data/soap.txt"</span>),</span>
<span id="cb13-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_data</span>(file1)),</span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(plot1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">plot_lines</span>(soap)),</span>
<span id="cb13-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(model1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_model1</span>(soap)),</span>
<span id="cb13-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(soap_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/soap.Rmd"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"file"</span>),</span>
<span id="cb13-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_target</span>(spiders_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/spiders.Rmd"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">format =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"file"</span>),</span>
<span id="cb13-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_render</span>(final_report, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"report/report.Rmd"</span>)</span>
<span id="cb13-10">)</span></code></pre></div></div>
</div>
<p>(the second-to-last line, entirely analogous to the one for the other child report). The extra stuff that goes in the parent report starts with turning the display of code back on:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>opts_chunk<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">echo =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
</div>
<p>Another <code>tar_load</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tar_load</span>(spiders_report)</span></code></pre></div></div>
</div>
<p>(to build the dependence of the parent report on this child one as well), and then the actual importation of the second child report with <code>child = "report/spiders.Rmd"</code> in the options of another empty code chunk.</p>
<p>This puts all the dependencies in the right places, and so another <code>tar_make</code> will build the whole report with its two child reports on the two datasets, updating anything that needs updating.</p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>The authorly “we”, meaning you, the reader, and I are going on this journey together. Or so the fiction has it.↩︎</p></li>
<li id="fn2"><p>This is needed for some extensions to <code>targets</code>, two of which I use here.↩︎</p></li>
<li id="fn3"><p>The reason for <code>tarchetypes</code> will become clear shortly.↩︎</p></li>
<li id="fn4"><p>In an <code>.rds</code> file in the <code>_targets</code> folder.↩︎</p></li>
<li id="fn5"><p>Or run something like <code>broom::tidy</code>.↩︎</p></li>
<li id="fn6"><p>Which might be because we changed the plot-drawing function to make a different plot, for example.↩︎</p></li>
<li id="fn7"><p>Actually done Quarto-style within the chunk, using the special comment line <code>#|</code>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/journey-with-targets/</guid>
  <pubDate>Mon, 19 Dec 2022 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/journey-with-targets/logo.png" medium="image" type="image/png" height="166" width="144"/>
</item>
<item>
  <title>Random sampling from groups</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/random-sampling-groups/</link>
  <description><![CDATA[ 





<section id="packages" class="level1">
<h1>Packages</h1>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>In a previous post, I discussed how we might sample in groups, where each group was a sample from a different population.</p>
<p>I introduced this function:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">gen_sample <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n, mean, sd) {</span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb3-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb3-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb3-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb3-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(gp, z)</span>
<span id="cb3-7">}</span></code></pre></div></div>
</div>
<p>that samples from normal populations with possibly different means, SDs, and sample sizes in different groups.</p>
</section>
<section id="explanation" class="level1">
<h1>Explanation</h1>
<p>This is (more or less) the same explanation that appeared at the end of the previous post, so feel free to skip if you have read it before.</p>
<p>The first step is to make a data frame with one row for each sample that will be generated. This uses the inputs to the function above, so we will make some up:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb4-2">mean <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb4-3">sd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb4-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  gp        n  mean    sd
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2
2 y         3    10     1</code></pre>
</div>
</div>
<p>Evidently, in a function for public consumption, you would check that all the inputs are the same length, or you would rely on <code>tibble</code> telling you that only vectors of length 1 are recycled.<sup>1</sup> The groups are for no good reason called <code>x</code> and <code>y</code>.</p>
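<p>As a minimal sketch of what such a check might look like (the helper <code>check_lengths</code> is a made-up name for illustration, not something from the previous post), one could compare the lengths of the inputs before building the data frame:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper (not from the post): stop early unless all the
# inputs have the same length.
check_lengths &lt;- function(n, mean, sd) {
  lens &lt;- lengths(list(n = n, mean = mean, sd = sd))
  if (length(unique(lens)) != 1) {
    stop("n, mean, and sd must all have the same length; got ",
         paste(names(lens), lens, sep = " = ", collapse = ", "))
  }
  invisible(TRUE)
}

check_lengths(c(5, 3), c(20, 10), c(2, 1))   # passes silently
# check_lengths(c(5, 3), c(20, 10), 2)       # would stop with an error</code></pre></div>
</div>
<p>A call to something like this could go at the top of <code>gen_sample</code>, though relying on the error from <code>tibble</code> is less work.</p>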
<p>The next two lines of the function, the <code>rowwise</code> and the <code>mutate</code>, generate random samples, one for each group, according to the specifications, and store each of them in one cell of the two-row spreadsheet:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
# Rowwise: 
  gp        n  mean    sd z        
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;list&gt;   
1 x         5    20     2 &lt;dbl [5]&gt;
2 y         3    10     1 &lt;dbl [3]&gt;</code></pre>
</div>
</div>
<p>The new column <code>z</code> is a list column, since the top cell of the column is a vector of length 5, and the bottom cell is a vector of length 3. To actually see the values they contain, we <code>unnest</code> <code>z</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 8 × 5
  gp        n  mean    sd     z
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2 21.9 
2 x         5    20     2 20.1 
3 x         5    20     2 18.8 
4 x         5    20     2 20.4 
5 x         5    20     2 15.8 
6 y         3    10     1  9.74
7 y         3    10     1 11.0 
8 y         3    10     1 10.2 </code></pre>
</div>
</div>
<p>and, finally, the middle three columns were only used to generate the values in <code>z</code>, so they can be thrown away now by <code>select</code>ing only <code>gp</code> and <code>z</code>.</p>
<p>The <code>rowwise</code> is necessary:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 5
  gp        n  mean    sd     z
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2 22.7 
2 x         5    20     2  9.92
3 y         3    10     1 22.7 
4 y         3    10     1  9.92</code></pre>
</div>
</div>
<p>because <code>rnorm</code> is vectorized. Here <code>n</code> is a vector of length 2, so <code>rnorm</code> returns just two values, one from each normal distribution, and without <code>rowwise</code> that same pair of values is stored for both the <code>x</code> sample and the <code>y</code> sample. This is very much <em>not</em> what we want.</p>
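<p>To see this outside a data frame, here is a small base R sketch using the same made-up inputs as above. The <code>Map</code> call is only an illustration of a once-per-group alternative, not the approach used in this post:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n &lt;- c(5, 3)
mean &lt;- c(20, 10)
sd &lt;- c(2, 1)

# Because n has length 2, rnorm() returns 2 values, not 5 + 3 = 8.
length(rnorm(n, mean, sd))

# Base R alternative to rowwise(): Map() calls rnorm() once per group,
# giving a list of two vectors, of lengths 5 and 3.
z &lt;- Map(rnorm, n, mean, sd)
lengths(z)</code></pre></div>
</div>
<p>The list that <code>Map</code> returns plays the same role as the list column <code>z</code> made by <code>rowwise</code> and <code>mutate</code>.</p>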
<p>The same idea can be used to draw random chi-squared data (say):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, df))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 2
      df       z
   &lt;dbl&gt;   &lt;dbl&gt;
 1     2  0.298 
 2     2  1.11  
 3     2  0.445 
 4     2  1.11  
 5     2  0.0144
 6     6  7.54  
 7     6  4.46  
 8     6 14.8   
 9     6  9.13  
10     6  7.99  </code></pre>
</div>
</div>
<p>(five values from <img src="https://latex.codecogs.com/png.latex?%5Cchi%5E2_2">, followed by five from <img src="https://latex.codecogs.com/png.latex?%5Cchi%5E2_6">.)</p>
<p>This suggests that I ought to be able to generalize my function <code>gen_sample</code>. Generalizing to any number of groups needs no extra work: the length of the input <code>n</code> determines the number of groups, and the values in <code>n</code> determine the size of each of those groups.</p>
<p>The interesting generalization is the distribution to sample from. The first parameter of the functions <code>rnorm</code>, <code>rchisq</code> etc. is always the number of random values to generate, but the remaining parameters are different for each distribution. This suggests that my generalized random sample generator ought to have the name of the random sampling function as input, followed by some means to allow any other inputs needed by that sampling function.</p>
</section>
<section id="generalizing" class="level1">
<h1>Generalizing</h1>
<p>To generalize <code>gen_sample</code> to a new function <code>sample_groups</code>, we need to consider how to handle different distributions. The distribution itself is not the hard part; that can be specified by having the random sample generator function for the desired distribution as an input to the new function. The problem is that each distribution has different parameters, which need to be inputs to <code>sample_groups</code>.</p>
<p>The standard way of doing this kind of thing is to use the <code>...</code> input to a function. If I had just one group, it would go like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">sample_group <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n, dist, ...) {</span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dist</span>(n, ...)</span>
<span id="cb14-3">}</span></code></pre></div></div>
</div>
<p>and then</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_group</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, rnorm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  8.634868  8.737404  7.642048 10.836333  9.608791</code></pre>
</div>
</div>
<p>or</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_group</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, rpois, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3 6 1 1 1 0</code></pre>
</div>
</div>
<p>The additional (named) inputs to <code>sample_group</code> are passed on unchanged to the random sample generator, to generate respectively normal random values with mean 10 and SD 3, and Poisson random values with mean 2.<sup>2</sup></p>
<p>The random sample generators are vectorized, but the obvious thing for generating two samples from different distributions does not work:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_group</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), rnorm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  7.124158 21.351194</code></pre>
</div>
</div>
<p>We appear to have one value from each distribution, not six from the first and three from the second. This, I <em>think</em>, is what the help files say will happen: if <code>n</code> has length greater than 1, its length is taken to be the number of values required.</p>
<p>To allow the parameters at the end of the call to differ between groups, another way is to pass each distribution’s parameters in a <code>list</code>. This turns out to be what I do later.</p>
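<p>A rough sketch of the list idea, for illustration only: the function name <code>sample_with_list</code> and its <code>pars</code> input are invented here, and this is not necessarily how the final <code>sample_groups</code> works. The parameters for one group go in a named list, and <code>do.call</code> puts the call together:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical sketch: one sampling function and one named list of
# parameters per group.
sample_with_list &lt;- function(n, dist, pars) {
  # Builds and runs the call dist(n, &lt;named parameters in pars&gt;)
  do.call(dist, c(list(n), pars))
}

sample_with_list(5, rnorm, list(mean = 10, sd = 3))
sample_with_list(6, rpois, list(lambda = 2))

# Several groups at once: parallel lists of sizes, functions, and parameters.
z &lt;- Map(sample_with_list,
         c(5, 6),
         list(rnorm, rpois),
         list(list(mean = 10, sd = 3), list(lambda = 2)))
lengths(z)</code></pre></div>
</div>
<p>Because each group carries its own list, the groups do not need to have the same parameters, or even the same number of them.</p>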
</section>
<section id="a-bad-approach" class="level1">
<h1>A bad approach</h1>
<p>Let’s suppose I ask the user to write the code to generate each sample as text (a vector of pieces of text, one for each sample). Here’s how my example above would look:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rnorm(5, mean = 10, sd = 3)"</span>, </span>
<span id="cb21-2">          <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"rpois(6, lambda = 2)"</span>)</span>
<span id="cb21-3">code</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "rnorm(5, mean = 10, sd = 3)" "rpois(6, lambda = 2)"       </code></pre>
</div>
</div>
<p>The problem is that this is text, not runnable code. One way to turn this into something useful is to <code>parse</code> it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">pc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> code[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb23-2">pc</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>expression(rnorm(5, mean = 10, sd = 3))</code></pre>
</div>
</div>
<p>This has turned the text into an <code>expression</code>, something that can be evaluated, thus:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eval</span>(pc)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  9.273030  8.994949 15.743894 11.803060 13.231719</code></pre>
</div>
</div>
<p>And now we have a strategy:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb27-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(code) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> code))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb27-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eval</span>(expr))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb27-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 3
   code                        expr               z
   &lt;chr&gt;                       &lt;list&gt;         &lt;dbl&gt;
 1 rnorm(5, mean = 10, sd = 3) &lt;expression&gt;  9.57  
 2 rnorm(5, mean = 10, sd = 3) &lt;expression&gt; -0.0996
 3 rnorm(5, mean = 10, sd = 3) &lt;expression&gt;  6.94  
 4 rnorm(5, mean = 10, sd = 3) &lt;expression&gt; 15.5   
 5 rnorm(5, mean = 10, sd = 3) &lt;expression&gt;  9.09  
 6 rpois(6, lambda = 2)        &lt;expression&gt;  2     
 7 rpois(6, lambda = 2)        &lt;expression&gt;  4     
 8 rpois(6, lambda = 2)        &lt;expression&gt;  6     
 9 rpois(6, lambda = 2)        &lt;expression&gt;  0     
10 rpois(6, lambda = 2)        &lt;expression&gt;  4     
11 rpois(6, lambda = 2)        &lt;expression&gt;  3     </code></pre>
</div>
</div>
<p>So now we have the ingredients for a version of <code>sample_groups</code> based on the user writing the random-sampling code for us. I added one extra thing: labelling the groups with letters, since otherwise they would be identified only by the (unwieldy) code strings:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">sample_groups <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(code) {</span>
<span id="cb29-2">  n_pop <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(code)</span>
<span id="cb29-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(code, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> letters[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n_pop]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">expr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">text =</span> code))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb29-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eval</span>(expr))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb29-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(gp, z)</span>
<span id="cb29-9">}</span></code></pre></div></div>
</div>
<p>and to test:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_groups</span>(code)</span>
<span id="cb30-2">d</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 2
   gp        z
   &lt;chr&gt; &lt;dbl&gt;
 1 a      7.81
 2 a     10.3 
 3 a      7.07
 4 a     10.1 
 5 a      8.93
 6 b      2   
 7 b      1   
 8 b      2   
 9 b      1   
10 b      6   
11 b      2   </code></pre>
</div>
</div>
<p>The <code>eval(parse(text = something))</code> idea is, apparently, not very well regarded.<sup>3</sup> One immediate problem is that the user could put any code at all (as long as it evaluates to a vector of numbers) into the input <code>code</code>, which seems less than secure:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">code <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"1:3"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mtcars$mpg"</span>)</span>
<span id="cb32-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_groups</span>(code)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 35 × 2
   gp        z
   &lt;chr&gt; &lt;dbl&gt;
 1 a       1  
 2 a       2  
 3 a       3  
 4 b      21  
 5 b      21  
 6 b      22.8
 7 b      21.4
 8 b      18.7
 9 b      18.1
10 b      14.3
# ℹ 25 more rows</code></pre>
</div>
</div>
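<p>The risk is not limited to reading data that happens to be lying around: the string is run as code, so it can have side effects as well as returning numbers. A minimal sketch of this (the names <code>env</code> and <code>touched</code> are made up for illustration, and the code is evaluated in a scratch environment so that nothing in the workspace is disturbed):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># a scratch environment, so that the side effect stays contained
env &lt;- new.env()
# looks like "code that evaluates to a vector of numbers", but does more
code &lt;- "{ touched &lt;- TRUE; c(1, 2, 3) }"
z &lt;- eval(parse(text = code), envir = env)
z                                                  # the numbers the caller expected
exists("touched", envir = env, inherits = FALSE)   # the assignment ran as well</code></pre></div>
</div>
<p>Here the side effect is harmless, but the same mechanism would run anything else the string contained.</p>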
</section>
<section id="a-better-way" class="level1">
<h1>A better way</h1>
<p>I want to get back to the user inputting the desired random sample generators as functions, and then running those functions on the rest of the input. This is what <code>do.call</code> does:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(rnorm, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1]  7.493600 13.180428  8.346022  8.472318 11.365156</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(rpois, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2 1 4 1 2 1</code></pre>
</div>
</div>
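<p>As a check on what <code>do.call</code> is doing, here is a small sketch that compares it with an ordinary function call. The seed value is arbitrary and the names <code>via_do_call</code> and <code>direct</code> are made up for illustration; resetting the seed before each call means the two calls should produce the same random numbers if they really are the same call:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)
# function first, then its arguments as a named list
via_do_call &lt;- do.call(rnorm, list(n = 5, mean = 10, sd = 3))
set.seed(1)
# the same call written out in the usual way
direct &lt;- rnorm(n = 5, mean = 10, sd = 3)
identical(via_do_call, direct)   # TRUE if do.call made exactly this call</code></pre></div>
</div>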
<p>Having realized (i) that <code>do.call</code> is what I wanted, and (ii) that the input parameters to the functions need to be in a <code>list</code>, I packaged up those distribution parameters into a <code>list</code> of <code>list</code>s first. It is actually not necessary to make the list of distributions into a <code>list</code>, but it works if you do that too:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">dist <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(rnorm, rpois)</span>
<span id="cb38-2">pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), </span>
<span id="cb38-3">             <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb38-4">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist =</span> dist, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pars =</span> pars)</span>
<span id="cb38-5">d</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  dist   pars            
  &lt;list&gt; &lt;list&gt;          
1 &lt;fn&gt;   &lt;named list [3]&gt;
2 &lt;fn&gt;   &lt;named list [2]&gt;</code></pre>
</div>
</div>
<p>and then we put the <code>do.call</code> in a rowwise mutate, wrapping the whole thing in a list to make a list-column:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(dist, pars))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb40-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 3
   dist   pars                 z
   &lt;list&gt; &lt;list&gt;           &lt;dbl&gt;
 1 &lt;fn&gt;   &lt;named list [3]&gt;  5.97
 2 &lt;fn&gt;   &lt;named list [3]&gt; 12.0 
 3 &lt;fn&gt;   &lt;named list [3]&gt; 10.1 
 4 &lt;fn&gt;   &lt;named list [3]&gt; 18.3 
 5 &lt;fn&gt;   &lt;named list [3]&gt;  6.27
 6 &lt;fn&gt;   &lt;named list [2]&gt;  3   
 7 &lt;fn&gt;   &lt;named list [2]&gt;  2   
 8 &lt;fn&gt;   &lt;named list [2]&gt;  2   
 9 &lt;fn&gt;   &lt;named list [2]&gt;  2   
10 &lt;fn&gt;   &lt;named list [2]&gt;  2   
11 &lt;fn&gt;   &lt;named list [2]&gt;  2   </code></pre>
</div>
</div>
<p>It works!</p>
<p>And so, we can now make our function:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb42" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb42-1">sample_groups <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(dist, pars) {</span>
<span id="cb42-2">  nr <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(pars)</span>
<span id="cb42-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">dist =</span> dist, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pars =</span> pars, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> letters[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>nr]) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(dist, pars))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb42-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(gp, z)</span>
<span id="cb42-8">}</span></code></pre></div></div>
</div>
<p>and to test:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1">dists <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(rnorm, rpois)</span>
<span id="cb43-2">pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lambda =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb43-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_groups</span>(dists, pars)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 11 × 2
   gp        z
   &lt;chr&gt; &lt;dbl&gt;
 1 a     13.6 
 2 a     10.7 
 3 a     14.3 
 4 a      7.93
 5 a      6.76
 6 b      2   
 7 b      0   
 8 b      2   
 9 b      4   
10 b      3   
11 b      3   </code></pre>
</div>
</div>
<p>The only weirdness is that the user has to specify a list of lists for the parameters (because <code>do.call</code> needs a list for inputs to its function). But it definitely works.</p>
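<p>For comparison, the same pairing-up of functions with parameter lists can be sketched without <code>rowwise</code>, using <code>map2</code> from <code>purrr</code> (part of the tidyverse). This is an alternative shown only for illustration, not what <code>sample_groups</code> does, and the name <code>z_list</code> is made up here:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(purrr)
library(tibble)
dists &lt;- list(rnorm, rpois)
pars &lt;- list(list(n = 5, mean = 10, sd = 3), list(n = 6, lambda = 2))
# map2 calls do.call(dists[[i]], pars[[i]]) for each i in turn
z_list &lt;- map2(dists, pars, do.call)
lengths(z_list)   # one sample per distribution, of the sizes asked for
# long format: one letter per group, repeated once per observation
tibble(gp = rep(letters[seq_along(z_list)], lengths(z_list)),
       z = unlist(z_list))</code></pre></div>
</div>
<p>The list of lists is still needed here, since each element of <code>pars</code> is handed to <code>do.call</code> as the argument list for the corresponding function.</p>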
<p>One shortcut is that if you want all the samples to be from the same distribution, you need to specify only one function in the input that I called <code>dists</code>:<sup>4</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb45-1">dist <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(rnorm)</span>
<span id="cb45-2">new_pars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>),</span>
<span id="cb45-3">                 <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb45-4">                 <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))</span>
<span id="cb45-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample_groups</span>(dist, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pars =</span> new_pars)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 9 × 2
  gp        z
  &lt;chr&gt; &lt;dbl&gt;
1 a      3.58
2 a      5.26
3 a      4.75
4 b      8.15
5 b     12.5 
6 c     21.1 
7 c     17.7 
8 c     17.7 
9 c     14.6 </code></pre>
</div>
</div>
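<p>The shortcut relies on two things: a bare function cannot be a tibble column, while a length-one list is recycled to match the other columns. A minimal sketch of both (the names <code>bare</code> and <code>wrapped</code> are made up for illustration, and <code>try</code> is used only so that the failure does not stop the script):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tibble)
# a function on its own is not a vector, so this fails
bare &lt;- try(tibble(dist = rnorm, n = 1:3), silent = TRUE)
inherits(bare, "try-error")
# a list containing one function is recycled to all three rows
wrapped &lt;- tibble(dist = list(rnorm), n = 1:3)
nrow(wrapped)</code></pre></div>
</div>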


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>So, for example, if both your sample sizes are the same, you could define e.g. <code>n &lt;- 10</code> and it would get expanded to length 2 in the function.↩︎</p></li>
<li id="fn2"><p>It may take a trip to the help files to find out what R calls these parameters.↩︎</p></li>
<li id="fn3"><p>There seems to be a recurring theme on Stack Overflow that if <code>eval(parse())</code> is the answer, you are asking the wrong question.↩︎</p></li>
<li id="fn4"><p>This is where the <code>list()</code> is important: there is no problem having a list-column of functions, but you cannot have a column which is just a function.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/random-sampling-groups/</guid>
  <pubDate>Mon, 09 May 2022 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/random-sampling-groups/Stratified_Sampling-278x232.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Why the rank sum test is also a waste of time</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/rank-sum-also-waste/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>In the same way that the signed rank test is mostly a waste of time, here I argue that the rank sum test is very rarely useful, and offer a less well-known test to use instead.</p>
<p>In a <a href="http://ritsokiguess.site/blogg/posts/2022-05-04-why-the-signed-rank-test-is-a-waste-of-time/">previous post</a>, I argued that the signed rank test is a waste of time (to learn and to do) in all but unlikely cases. In this post, I argue that the same is true for the rank sum test (Mann-Whitney test, Wilcoxon two-sample rank sum test), and suggest a much-neglected test to use instead.</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(smmr)</span></code></pre></div></div>
</div>
<p>The package <code>smmr</code> is one that I wrote myself, and lives on GitHub <a href="https://github.com/nxskok/smmr">here</a>. It may be installed using <code>install_github</code> from package <code>remotes</code>, or using <code>pkg_install</code> from package <code>pak</code>.</p>
</section>
<section id="the-two-sample-t-test" class="level2">
<h2 class="anchored" data-anchor-id="the-two-sample-t-test">The two-sample <img src="https://latex.codecogs.com/png.latex?t">-test</h2>
<p>The two-sample <img src="https://latex.codecogs.com/png.latex?t">-test is probably your first idea for comparing (means of) two independent samples from two different populations, such as one set of people who undergo some (real) treatment and a different set of people who undergo some different (e.g. placebo) treatment.<sup>1</sup></p>
<p>There are actually two flavours of two-sample <img src="https://latex.codecogs.com/png.latex?t">-test:</p>
<ul>
<li>If you took a mathematical statistics class, you most likely learned about the <em>pooled test</em>. Here, as well as the normality assumption that we talk about later, it is assumed that the two populations have the <em>same variance</em>, which makes the algebra a lot easier. (To run the test, we estimate the common variance by making a weighted average of the two sample variances.)</li>
<li>The default used by R’s <code>t.test</code> is a different test due to Welch and to Satterthwaite.<sup>2</sup> This does not assume equal variances, but the test statistic (obtained by estimating each population’s variance separately) does not have an exact <img src="https://latex.codecogs.com/png.latex?t">-distribution. Instead, the test statistic is treated as having, approximately, a <img src="https://latex.codecogs.com/png.latex?t">-distribution with a different (usually fractional) number of degrees of freedom, for which Welch and Satterthwaite each give a formula; this approximation is usually good. The <a href="https://en.wikipedia.org/wiki/Welch%E2%80%93Satterthwaite_equation#:~:text=In%20statistics%20and%20uncertainty%20analysis,corresponding%20to%20the%20pooled%20variance.">Wikipedia page</a> gives the formula and references to the original papers (in Further Reading).</li>
</ul>
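<p>The two calculations described above can be sketched by hand and checked against <code>t.test</code>. This is a minimal illustration with made-up data: the seed, the sample sizes, and names such as <code>sp2</code> and <code>df_welch</code> are assumptions for the example only:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)   # arbitrary seed, for reproducibility only
x &lt;- rnorm(10, mean = 100, sd = 30)
y &lt;- rnorm(10, mean = 140, sd = 30)
nx &lt;- length(x)
ny &lt;- length(y)

# pooled test: common variance as a weighted average of the sample variances
sp2 &lt;- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
t_pooled &lt;- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# Welch-Satterthwaite: each variance estimated separately, approximate df
vx &lt;- var(x) / nx
vy &lt;- var(y) / ny
t_welch &lt;- (mean(x) - mean(y)) / sqrt(vx + vy)
df_welch &lt;- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

# compare the hand calculations with what t.test reports
pooled &lt;- t.test(x, y, var.equal = TRUE)
welch &lt;- t.test(x, y)
all.equal(t_pooled, unname(pooled$statistic))
all.equal(t_welch, unname(welch$statistic))
all.equal(df_welch, unname(welch$parameter))</code></pre></div>
</div>
<p>With equal sample sizes, as here, the two test statistics coincide and only the degrees of freedom differ; with unequal sample sizes the statistics differ as well.</p>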
<p>In practice, when both tests apply, the two tests usually give very similar P-values, and so there is no harm in only ever using the Welch-Satterthwaite test, and thus the pooled <img src="https://latex.codecogs.com/png.latex?t">-test is really only a curiosity of math stat classes. If you really want the pooled test, you have to ask <code>t.test</code> for it specifically (using <code>var.equal = TRUE</code>). If, however, the two populations have different variances, then the pooled test can be very misleading.</p>
<p>I will illustrate, first from populations with equal variances. Before that, though, generating the random samples in long format is a bit annoying, so I define a function to do that first, with the sample sizes, population means, and population SDs as vector input (each of length 2):<sup>3</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">gen_sample <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">function</span>(n, mean, sd) {</span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb4-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb4-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(gp, z)</span>
<span id="cb4-7">}</span></code></pre></div></div>
</div>
<p>The simulated data values are called <code>z</code> and the groups are called <code>x</code> and <code>y</code> in column <code>gp</code>. As written, the data are always drawn from a normal distribution.</p>
<p>Now we generate some data and run the two flavours of <img src="https://latex.codecogs.com/png.latex?t">-test on it. This time, the means are different (so the null is actually false) but the SDs are the same:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">d1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gen_sample</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">140</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> d1)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Welch Two Sample t-test

data:  z by gp
t = -2.2065, df = 16.785, p-value = 0.04158
alternative hypothesis: true difference in means between group x and group y is not equal to 0
95 percent confidence interval:
 -59.836964  -1.311008
sample estimates:
mean in group x mean in group y 
       104.9265        135.5005 </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> d1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var.equal =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Two Sample t-test

data:  z by gp
t = -2.2065, df = 18, p-value = 0.04058
alternative hypothesis: true difference in means between group x and group y is not equal to 0
95 percent confidence interval:
 -59.685224  -1.462748
sample estimates:
mean in group x mean in group y 
       104.9265        135.5005 </code></pre>
</div>
</div>
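<p>In this case, the P-values are almost identical: because the sample sizes are equal, the two <img src="https://latex.codecogs.com/png.latex?t">-statistics are the same, and only the degrees of freedom differ (16.785 for Welch against 18 for pooled).</p>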
<p>Let’s try it again, but this time group <img src="https://latex.codecogs.com/png.latex?y"> has a smaller sample and also a larger variance:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">d2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gen_sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">140</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>))</span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> d2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Welch Two Sample t-test

data:  z by gp
t = -1.6394, df = 4.4934, p-value = 0.1686
alternative hypothesis: true difference in means between group x and group y is not equal to 0
95 percent confidence interval:
 -127.22157   30.20689
sample estimates:
mean in group x mean in group y 
       106.2476        154.7549 </code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> d2, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">var.equal =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Two Sample t-test

data:  z by gp
t = -2.2024, df = 13, p-value = 0.04629
alternative hypothesis: true difference in means between group x and group y is not equal to 0
95 percent confidence interval:
 -96.0887846  -0.9258949
sample estimates:
mean in group x mean in group y 
       106.2476        154.7549 </code></pre>
</div>
</div>
<p>This time, the pooled test has a much smaller P-value (0.046 against 0.169), which makes us think there is a real difference between the means, even though the (correct<sup>4</sup>) Welch test says that the evidence is not strong enough, probably because the sample sizes are not big enough.</p>
<p>If the smaller sample had come from the population with smaller variance, things would not have been so bad estimation-wise, but having the small sample be less informative about its population mean is asking for trouble.</p>
<p>The same derivation that Welch used applies to a comparison of any number of groups, so there is also a Welch ANOVA for comparing the means of three or more groups, without assuming equal variances. Likewise, the <img src="https://latex.codecogs.com/png.latex?F">-statistic no longer has exactly an <img src="https://latex.codecogs.com/png.latex?F"> distribution, so Welch obtained an approximate denominator degrees of freedom so that the <img src="https://latex.codecogs.com/png.latex?F">-test is still good enough. R has this in <code>oneway.test</code>. Welch’s ANOVA deserves to be a lot better known than it is.<sup>5</sup></p>
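<p>As a minimal sketch of what this looks like (the data frame <code>d_anova</code>, its three groups and the seed are invented here purely for illustration), <code>oneway.test</code> runs the Welch version by default, and the usual equal-variance ANOVA only if you ask for it with <code>var.equal = TRUE</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical data: three normal groups with unequal sizes and SDs
set.seed(1)
d_anova &lt;- data.frame(
  gp = rep(c("a", "b", "c"), times = c(10, 8, 5)),
  z = c(rnorm(10, 100, 30), rnorm(8, 100, 30), rnorm(5, 140, 60))
)
# Welch ANOVA: does not assume equal variances (this is the default)
welch &lt;- oneway.test(z ~ gp, data = d_anova)
# ordinary one-way ANOVA for comparison: assumes equal variances
pooled &lt;- oneway.test(z ~ gp, data = d_anova, var.equal = TRUE)
welch
pooled</code></pre></div></div>
</div>
<p>The numerator degrees of freedom are the same for both tests; it is the denominator degrees of freedom that Welch’s approximation changes, in the same spirit as the two-sample case.</p>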
<p>The two-sample <img src="https://latex.codecogs.com/png.latex?t">-tests have a normality assumption, like the one-sample <img src="https://latex.codecogs.com/png.latex?t">. Here, it is that the observations from each population have a normal distribution, independently of each other and of the observations from the other population. As with a one-sample test, the Central Limit Theorem helps, and with larger sample sizes, the normality matters less. I tend to say that each sample should be close enough to normal in shape given its sample size (as you might assess with separate normal quantile plots for each sample), but this is somewhat too stringent, because the <img src="https://latex.codecogs.com/png.latex?t"> statistic for either of the two-sample tests is based on the difference between the sample means, and the sampling distribution of that difference will tend to be a bit closer to normal than the sampling distribution of either sample mean individually. You might assess this with a bootstrap distribution of the <img src="https://latex.codecogs.com/png.latex?t">-statistic (or of the difference in sample means), though this requires care to get bootstrap samples of the same size as the original ones (simply resampling rows of a long data frame will not do this).</p>
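<p>One way to do such a bootstrap, as a rough sketch only: resample <em>within</em> each group, so that each bootstrap sample keeps its original size. The data frame <code>d_boot</code>, the helper <code>boot_diff</code> and the seed are made up for this illustration; the layout (values in <code>z</code>, groups <code>x</code> and <code>y</code> in <code>gp</code>) copies the simulated data above.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical data: 10 values in group x, 5 in group y
set.seed(2)
d_boot &lt;- data.frame(
  gp = rep(c("x", "y"), times = c(10, 5)),
  z = c(rnorm(10, 100, 30), rnorm(5, 140, 60))
)
# one bootstrap replicate of the difference in sample means:
# each group is resampled separately, so the sizes stay at 10 and 5
boot_diff &lt;- function(d) {
  x &lt;- d$z[d$gp == "x"]
  y &lt;- d$z[d$gp == "y"]
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
}
diffs &lt;- replicate(1000, boot_diff(d_boot))
# normal quantile plot of the bootstrap distribution
qqnorm(diffs)
qqline(diffs)</code></pre></div></div>
</div>
<p>If the points on the normal quantile plot follow the line reasonably well, that is some reassurance about the normality that matters for the test.</p>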
</section>
<section id="the-rank-sum-test" class="level2">
<h2 class="anchored" data-anchor-id="the-rank-sum-test">The rank sum test</h2>
<p>So, what to do if the observations within each sample are not as normal as you would like? Something that is often suggested is the rank sum test, often with the names Mann and Whitney attached, and sometimes with the same name Wilcoxon that is attached to the signed rank test. I illustrate with the same data I used for the second version of the two-sample <img src="https://latex.codecogs.com/png.latex?t">:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wilcox.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> d2)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Wilcoxon rank sum exact test

data:  z by gp
W = 15, p-value = 0.2544
alternative hypothesis: true location shift is not equal to 0</code></pre>
</div>
</div>
<p>To see where the <code>W = 15</code> came from:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">d2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rk =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rank</span>(z)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(gp) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb15-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rank_sum =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(rk),</span>
<span id="cb15-4">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> s1</span>
<span id="cb15-5">s1</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 3
  gp    rank_sum     n
  &lt;chr&gt;    &lt;dbl&gt; &lt;int&gt;
1 x           70    10
2 y           50     5</code></pre>
</div>
</div>
<p>The two groups are pooled together and ranked from smallest to largest, and then the ranks for each group are summed. There are only 5 observations in group <code>y</code>, so its rank sum is smaller, even though the ranks in this group are typically larger (to go with the values themselves being typically larger). To account for the different sample sizes, the following calculation is done:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">s1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">W =</span> rank_sum <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
# Rowwise: 
  gp    rank_sum     n     W
  &lt;chr&gt;    &lt;dbl&gt; &lt;int&gt; &lt;dbl&gt;
1 x           70    10    15
2 y           50     5    35</code></pre>
</div>
</div>
<p>and the smaller of the two values in <code>W</code> is the test statistic. The smaller the test statistic is, the more significant (the smallest possible value is zero). In this case, the P-value is 0.2544, not significant.</p>
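<p>Here is a small check of that calculation, using base R only and two tiny made-up samples <code>x</code> and <code>y</code> (not the data above). It also shows two things worth knowing: the two values of <code>W</code> always add up to the product of the sample sizes (above, 15 + 35 = 10 × 5), so either one determines the other, and <code>wilcox.test</code> reports the value for the first group, which in the example above happens to be the smaller one.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical samples, no ties
x &lt;- c(1.1, 2.5, 3.7, 8.2)
y &lt;- c(4.0, 5.6, 9.3)
n_x &lt;- length(x)
n_y &lt;- length(y)
# rank everything together, x values first
rk &lt;- rank(c(x, y))
# rank sum for each group, minus its smallest possible value n(n+1)/2
W_x &lt;- sum(rk[seq_len(n_x)]) - n_x * (n_x + 1) / 2
W_y &lt;- sum(rk[n_x + seq_len(n_y)]) - n_y * (n_y + 1) / 2
c(W_x = W_x, W_y = W_y)
# the two values add up to the product of the sample sizes
W_x + W_y == n_x * n_y
# the statistic reported by R is the one for the first sample
wilcox.test(x, y)$statistic</code></pre></div></div>
</div>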
<p>What happens if one of the samples is more variable, even if the means and sample sizes are the same, so that the null hypothesis is still true? A test at the 5% level should then still reject only 5% of the time. Let’s do a simulation to find out:<sup>6</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gen_sample</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wilcox.test</span>(z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> gp, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> my_sample))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_val =</span> my_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d5</span>
<span id="cb19-6">d5</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 4
# Rowwise: 
     sim my_sample         my_test p_val
   &lt;int&gt; &lt;list&gt;            &lt;list&gt;  &lt;dbl&gt;
 1     1 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.971
 2     2 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.280
 3     3 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.684
 4     4 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.579
 5     5 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.853
 6     6 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.481
 7     7 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.684
 8     8 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.190
 9     9 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.971
10    10 &lt;tibble [20 × 2]&gt; &lt;htest&gt; 0.247
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>How many of these P-values are 0.05 or less?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">d5 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_val <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
# Rowwise: 
  `p_val &lt;= 0.05`     n
  &lt;lgl&gt;           &lt;int&gt;
1 FALSE            9193
2 TRUE              807</code></pre>
</div>
</div>
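<p>This says: we are rejecting over 8% of the time, with a test that is supposed to reject only 5% of the time. The reason for doing 10,000 simulations is so that we can get a good sense of whether this is “really” greater than 5%. (The count of 824 used below is evidently from a different run of the simulation than the 807 shown above; the two agree to within simulation error, and the conclusion is the same either way.)</p>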
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prop.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">824</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    1-sample proportions test with continuity correction

data:  824 out of 10000, null probability 0.05
X-squared = 220.32, df = 1, p-value &lt; 2.2e-16
alternative hypothesis: true p is not equal to 0.05
95 percent confidence interval:
 0.07712114 0.08800255
sample estimates:
     p 
0.0824 </code></pre>
</div>
</div>
<p>The probability of incorrectly rejecting the true null is definitely not 0.05, and the confidence interval indicates that it is substantially greater than 0.05. So we should not be using the rank sum test if the two populations have different variances: in other words, the rank sum test suffers from the same problems as the pooled <img src="https://latex.codecogs.com/png.latex?t">-test.</p>
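<p>To avoid typing the number of rejections in by hand, the count can be passed straight to <code>prop.test</code>. This is a compact base-R sketch of the same idea rather than the code used above: the names <code>n_sim</code>, <code>p_vals</code> and <code>n_reject</code> and the seed are my own, and I use fewer simulations to keep it quick.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(3)
n_sim &lt;- 2000
# equal means and sample sizes, unequal SDs: the null is true
p_vals &lt;- replicate(n_sim, {
  x &lt;- rnorm(10, mean = 100, sd = 30)
  y &lt;- rnorm(10, mean = 100, sd = 5)
  wilcox.test(x, y)$p.value
})
# count the rejections at the 5% level and test that count directly
n_reject &lt;- sum(p_vals &lt;= 0.05)
prop.test(n_reject, n_sim, p = 0.05)</code></pre></div></div>
</div>
<p>Done this way, the count that is tested is always the one that the simulation actually produced.</p>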
<p>This is often stated as saying that the rank sum test is actually testing a null hypothesis of equal <em>distributions</em>, and if you reject, as you too often do here, the distributions could differ in some way other than in their means. This might be what you want (though, as this simulation shows, you do not get much power to detect unequal spreads), but in the kind of situation where you would have done a <img src="https://latex.codecogs.com/png.latex?t">-test, it most likely is not. We don’t want to be worrying about whether spreads or distribution shapes differ when our principal interest is in means or medians.</p>
</section>
<section id="moods-median-test" class="level2">
<h2 class="anchored" data-anchor-id="moods-median-test">Mood’s median test</h2>
<p>So, if the rank sum test doesn’t do the job when the <img src="https://latex.codecogs.com/png.latex?t">-test doesn’t, what <em>does</em>? I suggest a test called Mood’s median test, which seems to be unjustly maligned. It is a sort of two-sample version of the sign test, and like the sign test, it is a test for medians.<sup>7</sup></p>
<p>To illustrate, let’s generate some data from right-skewed chi-squared distributions: one sample with 2 df (that has mean 2) and one sample with 6 df (that has mean 6):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb25-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb25-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, df))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb25-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d6</span>
<span id="cb25-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(d6, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> z)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ncol =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/rank-sum-also-waste/index_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>At least the first of these does not look very normal (dissuading us from a <img src="https://latex.codecogs.com/png.latex?t">-test), and they don’t seem to have the same spread or shape of distribution (dissuading us from a rank sum test).</p>
<p>The idea behind the test is to work out the median of all the data, and then to count the number of observations above and below this grand median. This is much as you would do for a sign test, but here we count aboves and belows for each group separately:<sup>8</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">med <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(d6<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>z)</span>
<span id="cb26-2">med </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 3.522339</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">tab <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(d6, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">above =</span> (z <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> med)))</span>
<span id="cb28-2">tab</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>     above
group FALSE TRUE
    2    15    5
    6     5   15</code></pre>
</div>
</div>
<p>If the two groups have the same median, about 50% of the observations in each group should be above the overall median and about 50% below. If the two groups have different medians, one of the groups will have most of its observations above the grand median, and the other one will have most of its observations below. As for the sign test, it doesn’t matter how <em>far</em> above or below the grand median each observation is, just whether it <em>is</em> above or below.</p>
<p>In the example above, knowing which group an observation is from tells you something about whether it is likely to be above or below the grand median (if the 2 df group, probably below; if the 6 df group, probably above). Hence there appears to be an <em>association</em> between group and being above or below, and you might imagine testing this with a chi-squared test for association. This is how I run the test in my <code>smmr</code> package:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median_test</span>(d6, z, df)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>$grand_median
[1] 3.522339

$table
     above
group above below
    2     5    15
    6    15     5

$test
       what        value
1 statistic 10.000000000
2        df  1.000000000
3   P-value  0.001565402</code></pre>
</div>
</div>
<p>The P-value is definitely small enough to conclude that there <em>is</em> an association, and hence to (correctly) conclude that the two groups have different medians.</p>
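<p>The same statistic and P-value can be reproduced with base R’s <code>chisq.test</code>, as long as the continuity correction is switched off with <code>correct = FALSE</code>. In this sketch the table of counts from above is typed in by hand; the names <code>tab_check</code> and <code>res</code> are just for this illustration.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># counts from the table above: rows are groups, columns are
# whether the value is above the grand median
tab_check &lt;- matrix(
  c(15, 5, 5, 15),
  nrow = 2,
  dimnames = list(group = c("2", "6"), above = c("FALSE", "TRUE"))
)
# chi-squared test for association, without the continuity correction
res &lt;- chisq.test(tab_check, correct = FALSE)
res</code></pre></div></div>
</div>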
<p>A couple of technicalities:</p>
<ul>
<li>This runs the chi-squared test <em>without</em> Yates’ correction, even for 2-by-2 tables. This is to enable you to get the same result by hand-calculation, should you remember how to do that. The <code>chisq.test</code> function <em>does</em> by default use this correction for 2-by-2 tables (see the help for <code>chisq.test</code>).</li>
<li>You might have observed in this example that the row totals are fixed (each sample is of size 20, split between above and below the grand median somehow), and also the column totals are fixed (altogether 20 of the data values must be above the grand median and 20 below). This is always true, and so you might consider running Fisher’s exact test here instead, which I do below. I didn’t put this in my package, for a number of reasons: (i) I would have to teach my students Fisher’s exact test first; (ii) depending on the level of the class, I would also have to teach the hypergeometric distribution; (iii) Mood’s median test also applies to more than two groups (see below), but it is not clear to me that Fisher’s exact test applies to the <img src="https://latex.codecogs.com/png.latex?k"> by 2 table you would get from <img src="https://latex.codecogs.com/png.latex?k"> groups.</li>
</ul>
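<p>To see what the correction does for these data, here is a minimal sketch that types in the table from the output above and runs <code>chisq.test</code> both ways. The name <code>tab_check</code> is made up for illustration; it is not part of the original analysis.</p>

```r
# Above/below counts copied from the median_test output above
tab_check <- matrix(c(5, 15, 15, 5), nrow = 2, byrow = TRUE,
                    dimnames = list(group = c("2", "6"),
                                    above = c("above", "below")))

# No continuity correction: the same statistic that median_test reports
no_yates <- chisq.test(tab_check, correct = FALSE)
no_yates$statistic    # 10

# Default for 2-by-2 tables: Yates' continuity correction
with_yates <- chisq.test(tab_check)
with_yates$statistic  # 8.1
```

<p>The corrected statistic is smaller, so its P-value is larger than the one from <code>median_test</code>.</p>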
<p>Fisher’s exact test for the same data looks like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fisher.test</span>(tab)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Fisher's Exact Test for Count Data

data:  tab
p-value = 0.003848
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  1.794006 48.338244
sample estimates:
odds ratio 
  8.418596 </code></pre>
</div>
</div>
<p>The P-value is bigger here, but still significant. End of technicalities.</p>
<p>As the sign test does, this test counts only whether each data value is above or below something, and this is using the data inefficiently <em>if</em> the actual values are meaningful. Thus you would expect the <img src="https://latex.codecogs.com/png.latex?t">-test to be more powerful if it is valid, but this is of no concern, because in that case you would use the <img src="https://latex.codecogs.com/png.latex?t">-test. When the <img src="https://latex.codecogs.com/png.latex?t">-test is not valid, Mood’s median test makes no assumptions about the data (in contrast to the rank sum test): if the two populations have the same median, about 50% of the values in each group will be above that common median, and if they don’t, there will be an association between group and being above/below that the chi-squared test has a chance at finding. This is regardless of the shape of either distribution.</p>
</section>
<section id="more-than-two-groups-anova-revisited" class="level2">
<h2 class="anchored" data-anchor-id="more-than-two-groups-anova-revisited">More than two groups: ANOVA revisited</h2>
<p>You may have noticed that it doesn’t really matter how many groups you have: you work out the median of all the observations and count above and below within each group, no matter how many groups there are. The null hypothesis in this case is that <em>all</em> the groups have the same median, and the alternative is “not the null”. This is analogous to one-way ANOVA, where the null hypothesis is that all the groups have the same <em>mean</em>, and if rejected, there is further work to do to find which groups differ from which. You might do that with Tukey’s method.</p>
<p>You might use Mood’s median test in an ANOVA-type situation where you felt that the observations within each group were not normal enough given your sample sizes. Since this test is analogous to the <img src="https://latex.codecogs.com/png.latex?F">-test, you may need a followup to decide which groups have different medians. One way to do this is to run Mood’s median test on all possible pairs of groups (ignoring the data in groups other than the ones you are comparing), and then do an adjustment to the P-values like Bonferroni or Holm to account for the multiple testing.</p>
<p>I would actually go further than this. I would begin by drawing a boxplot to assess normality and equal spreads within each group, and then:</p>
<ul>
<li>if the data are normal enough and the spreads<sup>9</sup> are more or less equal, do ordinary ANOVA followed by Tukey if needed.</li>
<li>if the data are normal enough but the spreads are not more or less equal, do Welch’s ANOVA (see above) using <code>oneway.test</code>, following up if needed with <a href="https://aaronschlegel.me/games-howell-post-hoc-multiple-comparisons-test-python.html#:~:text=The%20Games%2DHowell%20test%20is,variances%20or%20equal%20sample%20sizes.">Games-Howell</a>. The Games-Howell procedure is available in package <code>PMCMRplus</code> as <code>gamesHowellTest</code>.</li>
<li>if the data are not normal enough, do Mood’s median test, followed by pairwise Mood’s median tests, adjusting for multiple testing.</li>
</ul>
<p>All of this seems to need an example. I use the <code>InsectSprays</code> data. These are counts of insects in experimental units treated with different insecticides. The fact that these are counts suggests that higher counts might be more variable:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">data</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"InsectSprays"</span>)</span></code></pre></div></div>
</div>
<p>As suggested above, we start with boxplots:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(InsectSprays, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> spray, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> count)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_boxplot</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/rank-sum-also-waste/index_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>The smaller counts (associated with sprays C, D, and E) do seem to be less variable. The normality is mostly not too bad, though there are some high outliers with sprays C and D. (There are only twelve observations for each spray.)</p>
<p>The counts do not have a common spread across sprays, so ordinary ANOVA is out of the question, but Welch ANOVA might be OK:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">oneway.test</span>(count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> spray, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> InsectSprays)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    One-way analysis of means (not assuming equal variances)

data:  count and spray
F = 36.065, num df = 5.000, denom df = 30.043, p-value = 7.999e-12</code></pre>
</div>
</div>
<p>Strongly significant, so we need some kind of post-hoc test. The recommended one is called Games-Howell,<sup>10</sup> which can be found in the <code>PMCMRplus</code> package:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb38" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb38-1">PMCMRplus<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gamesHowellTest</span>(count <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> spray, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> InsectSprays)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>
    Pairwise comparisons using Games-Howell test</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>data: count by spray</code></pre>
</div>
<div class="cell-output cell-output-stdout">
<pre><code>  A       B       C       D       E      
B 0.99725 -       -       -       -      
C 6.6e-06 6.7e-07 -       -       -      
D 0.00013 1.3e-05 0.05567 -       -      
E 3.2e-05 3.7e-06 0.44661 0.60060 -      
F 0.92475 0.98879 3.4e-05 0.00029 0.00011</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>
P value adjustment method: none</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>alternative hypothesis: two.sided</code></pre>
</div>
</div>
<p>A look at the boxplot suggests that the sprays divide into two sets: A, B, F (high insect count), and C, D, E (low). This is how Games-Howell comes out, though the C-D difference is almost significant.</p>
<p>If you are bothered by the outliers, then Mood’s median test is the way to go (from <code>smmr</code>):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median_test</span>(InsectSprays, count, spray)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>$grand_median
[1] 7

$table
     above
group above below
    A    11     0
    B    11     0
    C     0    11
    D     1    11
    E     0    12
    F    12     0

$test
       what        value
1 statistic 6.533256e+01
2        df 5.000000e+00
3   P-value 9.561214e-13</code></pre>
</div>
</div>
<p>This is also strongly significant. Looking at the table of values above and below suggests the same division of the sprays into two sets, and the pairwise median tests confirm it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pairwise_median_test</span>(InsectSprays, count, spray)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 15 × 4
   g1    g2        p_value adj_p_value
   &lt;fct&gt; &lt;fct&gt;       &lt;dbl&gt;       &lt;dbl&gt;
 1 A     B     0.391         1        
 2 A     C     0.00000273    0.0000409
 3 A     D     0.0000446     0.000668 
 4 A     E     0.000000963   0.0000145
 5 A     F     0.528         1        
 6 B     C     0.00000273    0.0000409
 7 B     D     0.0000446     0.000668 
 8 B     E     0.000000963   0.0000145
 9 B     F     0.414         1        
10 C     D     0.00165       0.0248   
11 C     E     0.0661        0.991    
12 C     F     0.000000963   0.0000145
13 D     E     0.123         1        
14 D     F     0.0000446     0.000668 
15 E     F     0.000000963   0.0000145</code></pre>
</div>
</div>
<p>though this time sprays C and D <em>are</em> significantly different, with an adjusted P-value of 0.025.</p>
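<p>To show what goes into a table like this, here is a by-hand sketch of pairwise Mood's median tests in base R. It is not the <code>smmr</code> code, and the names <code>mood_p_value</code>, <code>spray_pairs</code> and <code>pairwise_by_hand</code> are made up for illustration. It assumes, as the counts in the <code>median_test</code> table above suggest, that values exactly equal to the grand median are dropped. The adjusted P-values in the output above are each 15 times the unadjusted ones (capped at 1), which is what a Bonferroni adjustment does, so that is the adjustment used here:</p>

```r
data("InsectSprays")

# Mood's median test for the groups in g: classify each value as above or
# below the grand median, dropping values exactly equal to it
mood_p_value <- function(y, g) {
  grand_median <- median(y)
  keep <- y != grand_median
  tab <- table(droplevels(g[keep]), y[keep] > grand_median)
  chisq.test(tab, correct = FALSE)$p.value
}

# All 15 pairs of sprays, one pair per column
spray_pairs <- combn(levels(InsectSprays$spray), 2)

# Test each pair using only the data from those two sprays
p_value <- apply(spray_pairs, 2, function(pair) {
  d <- InsectSprays[InsectSprays$spray %in% pair, ]
  mood_p_value(d$count, d$spray)
})

pairwise_by_hand <- data.frame(
  g1 = spray_pairs[1, ],
  g2 = spray_pairs[2, ],
  p_value = p_value,
  adj_p_value = p.adjust(p_value, method = "bonferroni")
)
pairwise_by_hand
```

<p>Replacing <code>"bonferroni"</code> with <code>"holm"</code> in <code>p.adjust</code> gives the Holm adjustment instead. With samples this small, <code>chisq.test</code> may warn that the chi-squared approximation may be incorrect for some pairs.</p>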
</section>
<section id="appendix-generating-samples-from-groups" class="level2">
<h2 class="anchored" data-anchor-id="appendix-generating-samples-from-groups">Appendix: generating samples from groups</h2>
<p>Earlier, I threw this function at you without explaining it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb48" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb48-1">gen_sample</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>function (n, mean, sd) 
{
    tibble(gp = c("x", "y"), n = n, mean = mean, sd = sd) %&gt;% 
        rowwise() %&gt;% mutate(z = list(rnorm(n, mean, sd))) %&gt;% 
        unnest(z) %&gt;% select(gp, z)
}</code></pre>
</div>
</div>
<p>There are different ways to generate samples from different groups with possibly different means, SDs and sample sizes. This is how I like to do it. Let me take you through the process.</p>
<p>The first step is to make a data frame with one row for each sample that will be generated. This uses the inputs to the function above, so we will make some up:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1">n <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb50-2">mean <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb50-3">sd <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb50-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 4
  gp        n  mean    sd
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2
2 y         3    10     1</code></pre>
</div>
</div>
<p>Evidently, in a function for public consumption, you would check that all the inputs are the same length, or you would rely on <code>tibble</code> telling you that only vectors of length 1 are recycled.<sup>11</sup> The groups are for no good reason called <code>x</code> and <code>y</code>.</p>
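<p>As a small illustration of that recycling rule, with made-up inputs and the illustrative names <code>ok</code> and <code>bad</code>: a single value for <code>n</code> is expanded to both groups, while a vector of the wrong length is an error rather than being silently recycled.</p>

```r
library(tibble)

# A length-1 n is recycled to match the two groups
ok <- tibble(gp = c("x", "y"), n = 10, mean = c(20, 10), sd = c(2, 1))
ok

# Lengths 3 and 2 are not compatible, so tibble() stops with an error;
# try() catches the error so that the script carries on
bad <- try(
  tibble(gp = c("x", "y"), n = c(5, 3, 4), mean = c(20, 10), sd = c(2, 1)),
  silent = TRUE
)
inherits(bad, "try-error")  # TRUE
```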
<p>The next two lines generate random samples, one for each group, according to the specifications, and store each of them in one cell of the two-row data frame:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) </span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 5
# Rowwise: 
  gp        n  mean    sd z        
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;list&gt;   
1 x         5    20     2 &lt;dbl [5]&gt;
2 y         3    10     1 &lt;dbl [3]&gt;</code></pre>
</div>
</div>
<p>The new column <code>z</code> is a list column, since the top cell of the column is a vector of length 5, and the bottom cell is a vector of length 3. To actually see the values they contain, we <code>unnest</code> <code>z</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb54" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb54-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb54-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb54-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb54-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 8 × 5
  gp        n  mean    sd     z
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2 23.1 
2 x         5    20     2 22.2 
3 x         5    20     2 20.6 
4 x         5    20     2 17.0 
5 x         5    20     2 21.1 
6 y         3    10     1 10.7 
7 y         3    10     1 10.1 
8 y         3    10     1  9.25</code></pre>
</div>
</div>
<p>and, finally, the middle three columns were only used to generate the values in <code>z</code>, so they can be thrown away now by <code>select</code>ing only <code>gp</code> and <code>z</code>.</p>
<p>The <code>rowwise</code> is necessary:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">gp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"y"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> sd) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb56-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(n, mean, sd))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb56-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 5
  gp        n  mean    sd     z
  &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1 x         5    20     2 15.3 
2 x         5    20     2  9.45
3 y         3    10     1 15.3 
4 y         3    10     1  9.45</code></pre>
</div>
</div>
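<p>because <code>rnorm</code> is vectorized. Without <code>rowwise</code>, it is called once with the whole columns as input; since <code>n</code> then has length 2, <code>rnorm</code> takes that length as the number of values to generate, and draws one value from each normal distribution. The resulting list of length 1 is recycled, so the <code>x</code> and <code>y</code> samples both get the same two values. This is very much <em>not</em> what we want.</p>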
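<p>The behaviour of <code>rnorm</code> with a vector for its first input can be checked directly. This is a small sketch using the same made-up inputs as above; the names <code>one_call</code>, <code>x_call</code> and <code>y_call</code> are only for illustration.</p>

```r
n <- c(5, 3)
mean <- c(20, 10)
sd <- c(2, 1)

# One call with a vector n: the number of values is length(n), not sum(n)
one_call <- rnorm(n, mean, sd)
length(one_call)  # 2

# What rowwise arranges instead: one call per row, with single values as input
x_call <- rnorm(n[1], mean[1], sd[1])
y_call <- rnorm(n[2], mean[2], sd[2])
length(x_call)    # 5
length(y_call)    # 3
```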
<p>I used the same idea to draw my random chi-squared data later on:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb58-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">z =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, df))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(z)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 40 × 2
      df      z
   &lt;dbl&gt;  &lt;dbl&gt;
 1     2  2.23 
 2     2  3.58 
 3     2  1.56 
 4     2 11.1  
 5     2  0.924
 6     2  0.263
 7     2  0.692
 8     2  2.81 
 9     2  4.85 
10     2  2.39 
# ℹ 30 more rows</code></pre>
</div>
</div>
<p>(twenty values from <img src="https://latex.codecogs.com/png.latex?%5Cchi%5E2_2">, followed by twenty from <img src="https://latex.codecogs.com/png.latex?%5Cchi%5E2_6">.)</p>
<p>This suggests that I ought to be able to generalize my function <code>gen_sample</code>. Generalizing to any number of groups needs no extra work: the length of the input <code>n</code> determines the number of groups, and the values in <code>n</code> determine the size of each of those groups.</p>
<p>The interesting generalization is the distribution to sample from. The first parameter of the functions <code>rnorm</code>, <code>rchisq</code> etc. is always the number of random values to generate, but the remaining parameters are different for each distribution. This suggests that my generalized random sample generator ought to have the name of the random sampling function as input, followed by <code>...</code> to allow any other inputs needed by that sampling function; these then get passed on. At present, this idea is still living in my head, so I think I need to write another blog post about that to make sure that it does indeed work.</p>
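<p>In the meantime, here is one way the idea might look. This is an untested-in-anger sketch rather than a finished function: the name <code>gen_sample_any</code> and its argument <code>rfun</code> are made up, the groups are simply numbered, and it uses <code>pmap</code> from <code>purrr</code> in place of <code>rowwise</code> to call the sampling function once per group.</p>

```r
library(tidyverse)

# rfun: a sampling function such as rnorm or rchisq whose first input is
#   called n; ...: named parameter vectors for rfun, one value per group
gen_sample_any <- function(n, rfun, ...) {
  # One row per group; length-1 inputs are recycled by tibble()
  params <- tibble(n = n, ...)
  # pmap() calls rfun once per row, passing the columns as named inputs
  z <- pmap(params, rfun)
  tibble(gp = seq_along(z), z = z) %>%
    unnest(z)
}

# Two normal samples of sizes 5 and 3
gen_sample_any(c(5, 3), rnorm, mean = c(20, 10), sd = c(2, 1))

# Twenty values from each of two chi-squared distributions
gen_sample_any(20, rchisq, df = c(2, 6))
```

<p>Because the sample size is passed by the name <code>n</code>, this sketch would need adjusting for sampling functions whose first input has a different name, such as <code>rhyper</code>, which calls it <code>nn</code>.</p>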


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>If each person undergoes <em>both</em> treatments, this is matched pairs, and requires a one-sample test on the <em>differences</em> between the outcomes on each treatment for each person.↩︎</p></li>
<li id="fn2"><p>Two independent pieces of work that came to the same answer. R uses the name Welch, SAS the name Satterthwaite.↩︎</p></li>
<li id="fn3"><p>Explanation of this function is in the Appendix.↩︎</p></li>
<li id="fn4"><p>Because the variances are different now.↩︎</p></li>
<li id="fn5"><p>See below for an example.↩︎</p></li>
<li id="fn6"><p>Steps: make a data frame with a row for each simulation; work rowwise; generate a long dataframe with a sample from each population for each simulation; run the rank sum test for each simulation; extract the P-value.↩︎</p></li>
<li id="fn7"><p>Like the sign test, it is not very powerful when the data are actually normal, but why do you care about that?↩︎</p></li>
<li id="fn8"><p>The <code>with</code> says to look in data frame <code>d6</code> for <code>df</code> and <code>z</code>.↩︎</p></li>
<li id="fn9"><p>E.g. as measured by the heights of the boxplot boxes, which are IQRs.↩︎</p></li>
<li id="fn10"><p>This is a variation on Tukey which does not assume equal variances, and so is exactly what we want.↩︎</p></li>
<li id="fn11"><p>So, for example, if both your sample sizes are the same, you could define e.g. <code>n &lt;- 10</code> and it would get expanded to length 2 in the function.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/rank-sum-also-waste/</guid>
  <pubDate>Fri, 06 May 2022 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/rank-sum-also-waste/Screenshot 2025-12-28 at 18-17-54 2.5. Rank Sum Test — Introduction to Statistics and Data Science.png" medium="image" type="image/png" height="109" width="144"/>
</item>
<item>
  <title>Why the Signed-Rank Test Is a Waste of Time</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>The signed rank test is only rarely useful, and, as we see, even more rarely useful in the kind of situation where we might think of using it.</p>
<p>(The image accompanying this post is of Frank Wilcoxon, who developed the signed rank test.)</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="the-one-sample-t-test" class="level2">
<h2 class="anchored" data-anchor-id="the-one-sample-t-test">The one-sample t-test</h2>
<p>Suppose we have one sample of independent, identically distributed observations from some population, and we want to see whether we believe the population mean or median is some value, like 15. The standard test is the one-sample <img src="https://latex.codecogs.com/png.latex?t">-test, where here we are pretending that we have a reason to know that the mean is greater than 15 if it is not 15:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb5-2">x</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 20.86560 13.76096 15.19321 13.90139 16.63971 18.12691 12.76501 18.37393
 [9] 16.01214 19.28764</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    One Sample t-test

data:  x
t = 1.7818, df = 9, p-value = 0.05423
alternative hypothesis: true mean is greater than 15
95 percent confidence interval:
 14.95704      Inf
sample estimates:
mean of x 
 16.49265 </code></pre>
</div>
</div>
<p>The P-value is 0.0542, so we do not quite reject the null hypothesis that the population mean is 15: there is not enough evidence (at <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200.05">) to conclude that the population mean is greater than 15. Here we have made a type II error, because the population mean is actually 16, so the null hypothesis is false but we failed to reject it.</p>
<p>We use these data again later.</p>
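<p>To get a sense of how likely a type II error was in this setup, here is a minimal sketch using <code>power.t.test</code> from base R. It assumes the values we happen to know from generating the data (true mean 16, so a difference of 1 from the null mean, and population SD 3); in practice these would have to be guessed. With so small a difference and only ten observations, I would expect the power to be low.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># A sketch: power of the one-sided, one-sample t-test for the setup above
# (n = 10, true mean 16 vs. null mean 15, so delta = 1; population SD 3)
pw &lt;- power.t.test(n = 10, delta = 1, sd = 3, sig.level = 0.05,
                   type = "one.sample", alternative = "one.sided")
pw$power      # probability of correctly rejecting the null
1 - pw$power  # probability of a type II error</code></pre></div></div>
</div>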
<p>The theory behind the <img src="https://latex.codecogs.com/png.latex?t">-test assumes that the population from which the sample is taken has a normal distribution. In practice, this is assessed by looking at a histogram or a normal quantile plot of the data:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(x), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> x)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>With a small sample, it is hard to detect whether a deviation from normality like this indicates a non-normal population or is just randomness.<sup>1</sup> In this case, I actually generated my sample from a normal distribution, so I know the answer here is randomness.</p>
<p>There is another issue here, the Central Limit Theorem. This says, in words, that the sampling distribution of the sample mean from a large sample will be approximately normal, <em>no matter what the population distribution is</em>. How close the approximation is will depend on how non-normal the population is; if the population is very non-normal (for example, very skewed or has extreme outliers), it might take a very large sample for the approximation to be of any use.</p>
<p>Example: the chi-squared distribution is right-skewed, with one parameter, the degrees of freedom. As the degrees of freedom increases, the distribution becomes less skewed and more normal in shape.<sup>2</sup></p>
<p>Consider the chi-squared distribution with 12 df:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dchisq</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This is mildly skewed. Is a sample of size 20 from this distribution large enough to use the <img src="https://latex.codecogs.com/png.latex?t">-test? We can simulate the sampling distribution of the sample mean, since we know what the population is, by drawing many (in this case 1000) samples from it and seeing how normal the simulated sample means look:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(my_sample)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> my_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This is a tiny bit skewed right (the very largest values are slightly too large and the very smallest ones not quite small enough, though the rest of the values hug the line), but I would consider this close enough to trust the <img src="https://latex.codecogs.com/png.latex?t">-test.</p>
<p>Now consider the chi-squared distribution with 3 df, which is more skewed:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dchisq</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> y)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>How normal is the sampling distribution of the sample mean now, again with a sample of size 20?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">df =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_mean =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(my_sample)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> my_mean)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>This time, the normal quantile plot definitely strays from the line in a way that indicates a right-skewed non-normal sampling distribution of the sample mean. With this sample size, if the population is as skewed as a chi-squared distribution with 12 degrees of freedom, the <img src="https://latex.codecogs.com/png.latex?t">-test is fine, but if it is as skewed as a chi-squared distribution with 3 degrees of freedom, the <img src="https://latex.codecogs.com/png.latex?t">-test is at best questionable.</p>
<p>So, consideration of whether to use a <img src="https://latex.codecogs.com/png.latex?t">-test has two parts: how <em>normal</em> the population is (answered by asking how normal your <em>sample</em> is), and how <em>large</em> the sample is. The larger the sample size is, the less the normality matters, but it is an awkward judgement call to assess whether the non-normality in the data distribution matters enough given the sample size.<sup>3</sup></p>
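<p>One way to make this judgement call a little more concrete, when you are willing to assume a population, is to simulate how often the <img src="https://latex.codecogs.com/png.latex?t">-test rejects a null hypothesis that is actually true. The sketch below is my own illustration rather than part of the analysis above: it assumes samples of size 20 from the chi-squared distribution with 3 df, whose mean is 3, and an upper-tailed test at <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200.05">. If the <img src="https://latex.codecogs.com/png.latex?t">-test were behaving properly, the proportion of rejections would be close to 0.05.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># A sketch: estimate how often the one-sided t-test rejects a true null
# when sampling from a chi-squared distribution with 3 df
mu &lt;- 3   # the mean of a chi-squared distribution equals its df
p_vals &lt;- replicate(
  10000,
  t.test(rchisq(n = 20, df = 3), mu = mu, alternative = "greater")$p.value
)
mean(p_vals &lt;= 0.05)   # compare with the nominal 0.05</code></pre></div></div>
</div>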
</section>
<section id="the-sign-test" class="level2">
<h2 class="anchored" data-anchor-id="the-sign-test">The sign test</h2>
<p>If you have decided that your sample is not close enough to normally distributed (given the sample size), and therefore that you should not be using the <img src="https://latex.codecogs.com/png.latex?t">-test, what do you do? Two standard options are the sign test and the signed-rank test, with the latter often recommended over the former because the sign test lacks power. Both tests are non-parametric, in that they do not depend on the data having (at least approximately) any specific distribution.</p>
<p>For the sign test, you count how many of your observations are above the null median and how many are below. Here we use the same data as we used for the <img src="https://latex.codecogs.com/png.latex?t">-test:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  `x &gt; 15`     n
  &lt;lgl&gt;    &lt;int&gt;
1 FALSE        3
2 TRUE         7</code></pre>
</div>
</div>
<p>The number of values (say) above the null median is the test statistic. If the null hypothesis is true, each value is independently either above or below the null median with probability 0.5, and thus the test statistic has a binomial distribution with <img src="https://latex.codecogs.com/png.latex?n"> equal to the sample size and <img src="https://latex.codecogs.com/png.latex?p%20=%200.5">. Hence the P-value for an upper-tailed test is</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dbinom</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">prob =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.171875</code></pre>
</div>
</div>
<p>The split of 7 values above 15 and 3 below is still fairly close<sup>4</sup> to 50–50, and so the P-value is large, much larger than for the <img src="https://latex.codecogs.com/png.latex?t">-test.</p>
<p>The sign test does not use the data very efficiently: it only counts whether each data value is above or below the hypothesized median. Thus, if you are in a position to use the <img src="https://latex.codecogs.com/png.latex?t">-test, which uses the actual data values, you should do so. However, the sign test is almost completely assumption-free: as long as the observations really are independent, it does not matter at all what the population distribution looks like.</p>
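<p>The sign-test P-value can also be obtained without summing binomial probabilities by hand. This is a small sketch using two base R functions, <code>binom.test</code> and <code>pbinom</code>, with the count of 7 values above 15 out of 10 typed in directly; both lines should reproduce the 0.171875 found above.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># 7 of the 10 observations were above the null median of 15
binom.test(x = 7, n = 10, p = 0.5, alternative = "greater")$p.value
# the same upper-tail probability, P(X &gt;= 7), from the binomial distribution
pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)</code></pre></div></div>
</div>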
</section>
<section id="the-signed-rank-test" class="level2">
<h2 class="anchored" data-anchor-id="the-signed-rank-test">The signed rank test</h2>
<p>The signed-rank test occupies a kind of middle ground between the sign test and the <img src="https://latex.codecogs.com/png.latex?t">-test.</p>
<p>Here’s how it works for our data, testing for a median of 15, against an upper-tailed alternative:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">diff =</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">abs_diff =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(diff)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rk =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rank</span>(abs_diff)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb18-5">d</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 4
       x   diff abs_diff    rk
   &lt;dbl&gt;  &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;
 1  20.9  5.87     5.87     10
 2  13.8 -1.24     1.24      4
 3  15.2  0.193    0.193     1
 4  13.9 -1.10     1.10      3
 5  16.6  1.64     1.64      5
 6  18.1  3.13     3.13      7
 7  12.8 -2.23     2.23      6
 8  18.4  3.37     3.37      8
 9  16.0  1.01     1.01      2
10  19.3  4.29     4.29      9</code></pre>
</div>
</div>
<p>Subtract the hypothesized median from each data value, and then rank the differences from smallest to largest in terms of absolute value. The smallest difference in size is 0.193, which gets rank 1, and the largest in size is <img src="https://latex.codecogs.com/png.latex?5.87">, which gets rank 10. One of the negative differences is <img src="https://latex.codecogs.com/png.latex?-2.23">, which is the fifth largest in size (has rank 6).</p>
<p>The next stage is to sum up the ranks separately for the positive and negative differences:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(diff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sum =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(rk))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  `diff &gt; 0`   sum
  &lt;lgl&gt;      &lt;dbl&gt;
1 FALSE         13
2 TRUE          42</code></pre>
</div>
</div>
<p>There are only three negative differences, so their ranks add up to only 13, compared to a sum of 42 for the positive differences.<sup>5</sup> For an upper-tailed test, the test statistic is the sum of the ranks of the positive differences, which, if <em>large</em> enough, will lead to rejection of the null hypothesis.</p>
<p>Is 42 large enough to reject the null with?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wilcox.test</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Wilcoxon signed rank exact test

data:  x
V = 42, p-value = 0.08008
alternative hypothesis: true location is greater than 15</code></pre>
</div>
</div>
<p>The P-value is 0.0801, not small enough to reject the null median of 15 in favour of a larger value. It is a little bigger than for the <img src="https://latex.codecogs.com/png.latex?t">-test, but smaller than for the sign test.</p>
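<p>As a check on where that P-value comes from, here is a sketch using <code>psignrank</code>, the base R distribution function for the signed-rank statistic. It assumes no ties and no differences of exactly zero, which is the case for our data. Under the null hypothesis, each rank is equally likely to belong to a positive or a negative difference, and the P-value is the probability of a rank sum of 42 or more; this should agree with the 0.08008 from <code>wilcox.test</code>.</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n &lt;- 10
v &lt;- 42            # sum of the ranks of the positive differences
n * (n + 1) / 2    # total of all the ranks: 13 + 42 = 55
# P(V &gt;= 42) when each rank is equally likely to carry either sign
psignrank(v - 1, n = n, lower.tail = FALSE)</code></pre></div></div>
</div>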
<p>A historical note: the name usually attached to the signed-rank test is <a href="https://en.wikipedia.org/wiki/Frank_Wilcoxon">Frank Wilcoxon</a>. He worked out the null distribution of the signed rank statistic (an exercise in combinatorics).</p>
<p>The name of the R function, <code>wilcox.test</code>, is a bit of a confusing misnomer, because there was also a statistician called <a href="https://en.wikipedia.org/wiki/Walter_Francis_Willcox">Walter Francis Willcox</a>, who had nothing to do with this test.</p>
</section>
<section id="assessing-the-signed-rank-test" class="level2">
<h2 class="anchored" data-anchor-id="assessing-the-signed-rank-test">Assessing the signed rank test</h2>
<p>The signed-rank test seemed to behave well enough in our example, with actually normal data. But the point of mentioning the test is as something to use when the data are <em>not</em> normal.</p>
<p>So let’s take some samples from our skewed chi-squared distribution with 3 df.</p>
<p>This distribution has this median:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">med <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">qchisq</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb24-2">med</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 2.365974</code></pre>
</div>
</div>
<p>and away we go. I’ll do 10,000 simulations this time:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">wilcox.test</span>(my_sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> med, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb26-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_p =</span> my_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> dd</span>
<span id="cb26-6">dd</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10,000 × 4
# Rowwise: 
     sim my_sample  my_test   my_p
   &lt;int&gt; &lt;list&gt;     &lt;list&gt;   &lt;dbl&gt;
 1     1 &lt;dbl [10]&gt; &lt;htest&gt; 0.862 
 2     2 &lt;dbl [10]&gt; &lt;htest&gt; 0.722 
 3     3 &lt;dbl [10]&gt; &lt;htest&gt; 0.5   
 4     4 &lt;dbl [10]&gt; &lt;htest&gt; 0.615 
 5     5 &lt;dbl [10]&gt; &lt;htest&gt; 0.0322
 6     6 &lt;dbl [10]&gt; &lt;htest&gt; 0.947 
 7     7 &lt;dbl [10]&gt; &lt;htest&gt; 0.0322
 8     8 &lt;dbl [10]&gt; &lt;htest&gt; 0.0244
 9     9 &lt;dbl [10]&gt; &lt;htest&gt; 0.0967
10    10 &lt;dbl [10]&gt; &lt;htest&gt; 0.539 
# ℹ 9,990 more rows</code></pre>
</div>
</div>
<p>Since the null hypothesis is true, the P-values should have a uniform distribution:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(dd, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> my_p)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/index_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That doesn’t look very uniform, but rather skewed to the right, with too many low values, so that the test rejects more often than it should:<sup>6</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">dd <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(my_p <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
# Rowwise: 
  `my_p &lt;= 0.05`     n
  &lt;lgl&gt;          &lt;int&gt;
1 FALSE           9194
2 TRUE             806</code></pre>
</div>
</div>
<p>A supposed test at <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200.05"> actually has a probability near 0.08 of making a type I error. (That’s why I did 10,000 simulations, in the hope of ruling out sampling variability as the reason for it being different from 0.05.)</p>
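<p>As a rough check on that claim, the counts above can be turned into a confidence interval for the true type I error rate. This is a minimal sketch that hard-codes the 806 rejections out of 10,000 from the output above; the name <code>ci</code> is made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># 806 rejections at the 0.05 level out of 10,000 simulated samples
ci &lt;- prop.test(806, 10000)$conf.int
ci</code></pre></div>
</div>
<p>If the whole interval sits above 0.05, sampling variability alone is an unlikely explanation for the excess rejections.</p>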
<p>To investigate what happened, let’s look at one random sample and see whether we can reason it out:<sup>7</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">457297</span>)</span></code></pre></div></div>
</div>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rchisq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb32-2">x</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 5.6369920 7.8136036 2.8290842 8.2463800 0.6228536 0.8004611 2.2627925
 [8] 6.3275717 1.2881783 0.6634575</code></pre>
</div>
</div>
<p>and go through the calculations for the signed rank statistic again:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">diff =</span> x <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> med) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">abs_diff =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(diff)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb34-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">rk =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rank</span>(abs_diff)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb34-5">d</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 10 × 4
       x   diff abs_diff    rk
   &lt;dbl&gt;  &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;
 1 5.64   3.27     3.27      7
 2 7.81   5.45     5.45      9
 3 2.83   0.463    0.463     2
 4 8.25   5.88     5.88     10
 5 0.623 -1.74     1.74      6
 6 0.800 -1.57     1.57      4
 7 2.26  -0.103    0.103     1
 8 6.33   3.96     3.96      8
 9 1.29  -1.08     1.08      3
10 0.663 -1.70     1.70      5</code></pre>
</div>
</div>
<p>There are five positive and five negative differences, exactly as we would expect. But four of the positive differences are the <em>four largest ones in size</em>, so that the sum of the ranks for the positive differences is quite a bit larger than the sum of the ranks for the negative differences, 36 as against 19:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb36-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(diff <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb36-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sum =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(rk))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  `diff &gt; 0`   sum
  &lt;lgl&gt;      &lt;dbl&gt;
1 FALSE         19
2 TRUE          36</code></pre>
</div>
</div>
<p>This seems like a bit of a difference in rank sums, given that the null hypothesis is actually <em>true</em>.</p>
<p>Why did this happen, and why might it happen again? The population distribution is skewed to the right, so that there will occasionally be sample values <em>much</em> larger than the null median (even if that median is correct). There cannot be sample values much <em>smaller</em> than the null median, because the distribution is bunched up at the bottom. That means that the positive differences will tend to be the largest ones in size, and hence the test statistic will tend to be bigger, and the P-value smaller, than ought to be the case.</p>
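<p>One way to check this explanation is to rerun the simulation with a symmetric population, where large positive and large negative differences should be equally likely. The sketch below assumes a standard normal population (an illustrative choice, not part of the simulation above), whose median is 0, and uses an arbitrary seed; the name <code>dd_sym</code> is made up:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

set.seed(1) # arbitrary seed, for reproducibility only
tibble(sim = 1:10000) %&gt;%
  rowwise() %&gt;%
  # one sample of 10 from a symmetric population with median 0
  mutate(my_sample = list(rnorm(10, 0, 1))) %&gt;%
  # same one-sided signed-rank test as before, now with null median 0
  mutate(my_p = wilcox.test(my_sample, mu = 0, alternative = "greater")$p.value) %&gt;%
  ungroup() -&gt; dd_sym
dd_sym %&gt;% count(my_p &lt;= 0.05)</code></pre></div>
</div>
<p>If skewness is what causes the trouble, the proportion of rejections here should be no bigger than 0.05. It can be a little smaller, because the signed-rank statistic is discrete and cannot hit 0.05 exactly with a sample of size 10.</p>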
</section>
<section id="conclusions" class="level2">
<h2 class="anchored" data-anchor-id="conclusions">Conclusions</h2>
<p>The usual get-out for the above is to say that the signed-rank test only applies to symmetric distributions. Except that, one of the principal ways that the <img src="https://latex.codecogs.com/png.latex?t">-test can fail is that the population distribution is skewed, and what we are then saying is that in that situation, we cannot use the signed-rank test either. Really, the only situation in which the signed-rank test has any value is when the population is symmetric with long tails or outliers, which seems to me a small fraction of the times when you would not want to use a <img src="https://latex.codecogs.com/png.latex?t">-test.</p>
<p>So, the official recommendation is:</p>
<ul>
<li>when the population distribution seems normal enough (given the sample size), use the <img src="https://latex.codecogs.com/png.latex?t">-test</li>
<li>when the population distribution is not normal enough but is apparently symmetric, use the signed-rank test</li>
<li>otherwise, use the sign test.</li>
</ul>
<p>The second of those seems a bit unlikely (or unlikely to be sure enough about in practice), so that when I teach this stuff, it’s the <img src="https://latex.codecogs.com/png.latex?t">-test or the sign test. As I have explained, the signed-rank test is only very rarely useful, and therefore, I contend, it is a waste of time to learn about it.</p>
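<p>A sign test needs nothing beyond base R. The following is a minimal sketch of one possible way to do it: it reuses the ten values of <code>x</code> and the value of <code>med</code> printed above, counts the values above the null median, and refers that count to a binomial distribution with success probability 0.5. The names <code>n_above</code>, <code>n_used</code> and <code>sign_test</code> are made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># the sample and null median printed earlier in this post
x &lt;- c(5.6369920, 7.8136036, 2.8290842, 8.2463800, 0.6228536,
       0.8004611, 2.2627925, 6.3275717, 1.2881783, 0.6634575)
med &lt;- 2.365974
n_above &lt;- sum(x &gt; med) # values above the null median
n_used &lt;- sum(x != med)    # values exactly equal to the median are dropped
sign_test &lt;- binom.test(n_above, n_used, p = 0.5, alternative = "greater")
sign_test$p.value</code></pre></div>
</div>
<p>With five values above the null median and five below, the P-value is about 0.62, so the sign test sees nothing unusual in this sample, however large the positive differences happen to be.</p>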


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>This is a good place to use Di Cook’s idea of a “line-up”, where you generate eight normal quantile plots of actual normal data, and then add your actual normal quantile plot, shuffle them around, and see whether you can pick which one is your data. If you can, your data is different from what random normals would produce.↩︎</p></li>
<li id="fn2"><p>In the limit as degrees of freedom increases, the distribution <em>is</em> normal.↩︎</p></li>
<li id="fn3"><p>When you have actual data from some unknown distribution, one way to get a sense of the sampling distribution of the sample mean is to use the bootstrap: generate a large number of samples <em>from the sample</em> with replacement, work out the mean of each one, and then see whether that distribution of sample means is close to normal. This is still a subjective call, but at least it is only one thing to assess, rather than having to combine an assessment of normality with an assessment of sample size.↩︎</p></li>
<li id="fn4"><p>In the sense that if you tossed a fair coin 10 times, you would not be terribly surprised to see 7 heads and 3 tails.↩︎</p></li>
<li id="fn5"><p>The three negative differences average to a rank of about 4.3, while the seven positive differences average to a rank of 6. Thus the positive differences are typically bigger in size than the negative ones are.↩︎</p></li>
<li id="fn6"><p>A 95% confidence interval for the true type I error probability is from 0.075 to 0.086, so it is definitely higher than 0.05. <code>prop.test</code> is a nice way to get this interval.↩︎</p></li>
<li id="fn7"><p>I am cheating and using one that I think makes the point clear.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/</guid>
  <pubDate>Wed, 04 May 2022 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/signed-rank-waste-of-time/Screenshot 2025-12-28 at 13-45-13 12.4 Wilcoxon Signed-Rank Test - Statistics LibreTexts.png" medium="image" type="image/png" height="190" width="144"/>
</item>
<item>
  <title>Tidy simulation</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/tidy-simulation/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Using rowwise to save calculation, estimate power or test size, bootstrap distributions</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>To see what might happen when a process is repeated many times, we can calculate. Or we can physically re-run the process many times, and count up the results: simulation of the process.</p>
<p>This can be applied to estimating probabilities, obtaining bootstrap distributions (for example when assessing normality), or estimating the power or size of tests.</p>
<p>I want my simulations here to be reproducible, so I will set the random number seed first:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">457299</span>)</span></code></pre></div></div>
</div>
</section>
<section id="tossing-a-coin" class="level2">
<h2 class="anchored" data-anchor-id="tossing-a-coin">Tossing a coin</h2>
<p>Imagine we toss a fair coin 10 times. How likely are we to get 8 or more heads? If you remember the binomial distribution, you can work it out. But if you don’t? Make a virtual coin, toss it 10 times, count the number of heads, repeat many times, see how many of those are 8 or greater.</p>
<p>Let’s set up our virtual coin first:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">coin <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"H"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"T"</span>)</span></code></pre></div></div>
</div>
<p>and, since getting a head on one toss doesn’t prevent a head on others, ten coin tosses would be a sample of size 10 with replacement from this coin:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(coin, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "H" "T" "H" "H" "H" "H" "H" "H" "T" "T"</code></pre>
</div>
</div>
<p>Seven heads this time.</p>
<p>I have a mechanism I use for “tidy simulation”:</p>
<ul>
<li>set up a dataframe with a column called <code>sim</code> to label the simulations</li>
<li>work <code>rowwise</code></li>
<li>for each <code>sim</code>, do one copy of the thing you’ll be doing many times (in this case, simulating 10 coin tosses)</li>
<li>calculate whatever you want to calculate for each <code>sim</code></li>
<li>summarize the results</li>
</ul>
<p>For this problem, the code looks like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">my_sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(coin, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">heads =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(my_sample <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"H"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(heads <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["heads >= 8"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"946"},{"1":"TRUE","2":"54"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>It is probably a good idea to run this one line at a time, both to see what each step does and, later, as you develop your own simulations.</p>
<p>In this case, 54 of the 1000 simulated sets of 10 coin tosses gave at least 8 heads, so our estimate of the probability of getting 8 or more heads in 10 tosses of a fair coin is 0.054.</p>
<p>Some notes about the code:</p>
<ul>
<li>I am using 1000 simulations as my “many” repeats of tossing a coin 10 times. A larger number would give a more accurate answer, but would take longer to run.<sup>1</sup></li>
<li>Working <code>rowwise</code> allows us to treat each row of the dataframe we are building as an independent entity. This makes the code in the two <code>mutate</code>s that come next much easier to follow, because our mental model only has to work one row at a time.<sup>2</sup></li>
<li><code>my_sample</code> behaves like <em>one</em> sample of 10 coin tosses, though in fact it is a whole column of samples of 10 coin tosses. It is a vector of length 10, so to get it into one cell of our dataframe, we wrap it in <code>list</code>, making the whole column a list-column.</li>
<li>Once again thinking of <code>my_sample</code> as a single sample, we then count the number of heads in it. I could use <code>count</code>, or <code>table</code>, but I don’t want to get caught by samples with no heads or no tails. This way counts 1 for each H in the sample, then adds up the counts.<sup>3</sup></li>
<li>Finally, count up the number of simulated sets of 10 coin tosses that had 8 or more heads. <code>count</code> accepts a logical condition as well as a column. (Behind the scenes it constructs a column of <code>TRUE</code> and <code>FALSE</code> first, and then counts that.)</li>
</ul>
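<p>If you would rather have the estimated probability directly than a table of counts, one option is to drop the rowwise grouping and take the mean of the logical condition. This is a sketch under the same setup as above; the seed is arbitrary, and the names <code>sims</code> and <code>prop</code> are for illustration only:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

set.seed(1) # arbitrary seed
coin &lt;- c("H", "T")
tibble(sim = 1:1000) %&gt;%
  rowwise() %&gt;%
  mutate(my_sample = list(sample(coin, 10, replace = TRUE))) %&gt;%
  mutate(heads = sum(my_sample == "H")) %&gt;%
  ungroup() %&gt;% # without this, summarize would work one row at a time
  summarize(prop = mean(heads &gt;= 8)) -&gt; sims
sims</code></pre></div>
</div>
<p>The <code>ungroup</code> matters: a rowwise dataframe is grouped by row, so summarizing it as it stands would give one value per simulation rather than one overall proportion.</p>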
<p>In this case, we know the right answer:<sup>4</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pbinom</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lower.tail =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 0.0546875</code></pre>
</div>
</div>
<p>Our simulation came out very close to this.</p>
<p>Aside:<sup>5</sup> we can work out how accurate our simulation might be by noting that our 1000 simulations are also like Bernoulli trials: each one gives us 8 or more heads or it doesn’t, with unknown probability that is precisely the thing that we are trying to estimate. Thus:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">54</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  54 and 1000
number of successes = 54, number of trials = 1000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.04082335 0.06987401
sample estimates:
probability of success 
                 0.054 </code></pre>
</div>
</div>
<p>tells us, with 95% confidence, that the probability of 8 or more heads is between 0.041 and 0.070. To nail it down more precisely, use more than 1000 simulations.</p>
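<p>To get a sense of how many more simulations that might take, the approximate 95% margin of error of a simulated proportion is <code>1.96 * sqrt(p * (1 - p) / n)</code>, where <code>n</code> is the number of simulations. The sketch below plugs in the exact answer from above for <code>p</code>; the normal approximation and the names <code>n_sim</code>, <code>margin</code> and <code>margins</code> are assumptions made for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

p &lt;- pbinom(7, 10, 0.5, lower.tail = FALSE) # exact P(8 or more heads)
tibble(n_sim = c(1000, 10000, 100000)) %&gt;%
  # approximate 95% margin of error for each number of simulations
  mutate(margin = qnorm(0.975) * sqrt(p * (1 - p) / n_sim)) -&gt; margins
margins</code></pre></div>
</div>
<p>Ten times as many simulations shrinks the margin by a factor of about 3.2 (the square root of 10), so each extra decimal place of accuracy costs about a hundred times as many simulations.</p>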
</section>
<section id="how-long-is-my-longest-suit" class="level2">
<h2 class="anchored" data-anchor-id="how-long-is-my-longest-suit">How long is my longest suit?</h2>
<p>In the game of bridge, each player, in two partnerships of 2, receives a hand of 13 cards randomly dealt from the usual deck of 52 cards. There is then an “auction” in which the two partnerships compete for the right to name the trump suit and play the hand. The bids in this auction are an undertaking to win a certain number of the 13 tricks with the named suit as trumps.<sup>6</sup> Your partner cannot see your cards, and so in the bidding you have to share information about the strength and suit distribution of your hand using standard methods<sup>7</sup> (you are not allowed to deceive your opponents), so that as a partnership you can decide how many tricks you can win between you.</p>
<p>One of the considerations in the bidding is the length of your longest suit, that is, the suit you hold the most cards in. The longest suit might have only 4 cards (e.g. if you have 4 spades and 3 of each of the other suits), but if you are lucky<sup>8</sup> you might be dealt a hand with 13 cards all of the same suit and have a longest suit of 13 cards. Evidently something in between those is more likely, but <em>how</em> likely?</p>
<p>For a simulation, we need to set up a deck of cards and select 13 cards from it <em>without</em> replacement (since you can’t draw the same card twice in the same hand). The only thing that matters here is the suits, so we’ll set up a deck with only suits and no denominations like Ace or King. (This will make the sampling without replacement look a bit odd.)</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">deck <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"S"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"H"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>),</span>
<span id="cb12-2">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"D"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"C"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>))</span>
<span id="cb12-3">deck</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "S" "H" "H" "H" "H" "H" "H"
[20] "H" "H" "H" "H" "H" "H" "H" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D" "D"
[39] "D" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C"</code></pre>
</div>
</div>
<p>and deal ourselves a hand of 13 cards thus:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">hand <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(deck, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb14-2">hand</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "D" "C" "C" "D" "S" "H" "S" "S" "D" "S" "H" "H" "H"</code></pre>
</div>
</div>
<p>(note, for example, that the four Hearts in this hand are actually four <em>different</em> ones of the thirteen H in <code>deck</code>, since we are sampling without replacement. I could have labelled them by which Heart they were, but that would have made counting them more difficult.)</p>
<p>Then count the number of cards in each suit:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">tab <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(hand)</span>
<span id="cb16-2">tab</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>hand
C D H S 
2 3 4 4 </code></pre>
</div>
</div>
<p>This time the longest suit has four cards (Hearts and Spades tie for longest):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(tab)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4</code></pre>
</div>
</div>
<p>Using <code>table</code> is safe here, because we don’t care whether there are any suits with no cards in the hand, only about the greatest number of cards in any suit that we have cards in.<sup>9</sup></p>
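<p>If the empty suits did matter, one way to keep them (a sketch only, not something this simulation needs) is to turn the hand into a factor with all four suits as levels before counting. The names <code>small_hand</code>, <code>suit_levels</code>, <code>tab_plain</code> and <code>tab_full</code> are made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># a made-up three-card hand with no Diamonds or Clubs (illustration only)
small_hand &lt;- c("S", "S", "H")
suit_levels &lt;- c("S", "H", "D", "C")
# plain table() only lists the suits that actually appear
tab_plain &lt;- table(small_hand)
# a factor with all four levels keeps the missing suits, with counts of zero
tab_full &lt;- table(factor(small_hand, levels = suit_levels))
tab_full
# the length of the longest suit is the same either way
max(tab_plain) == max(tab_full)</code></pre></div>
</div>
<p>Since the maximum is unchanged by adding zero counts, the plain <code>table</code> is enough for finding the longest suit.</p>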
<p>All of that leads us to this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hand =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(deck, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">suits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">table</span>(hand))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">longest =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(suits)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(longest)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["longest"],"name":[1],"type":["int"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"4","2":"354"},{"1":"5","2":"435"},{"1":"6","2":"171"},{"1":"7","2":"38"},{"1":"8","2":"2"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Note: the hands, and the tables of how many cards a hand has in each suit, are more than single numbers, so they need to be wrapped in <code>list</code> so that each one can be stored in a single cell of a list-column.</p>
<p>The most likely length for the longest suit is 5 cards, which happens a bit less than half the time (435 of these 1000 simulated hands). According to this, a longest suit of 8 cards happens about once in 500 hands (2 of the 1000 here), and longer longest suits are even less likely. (To estimate these small probabilities accurately, you need a lot of simulations, like, way more than 1000.)</p>
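<p>To get a rough feel for how uncertain a simulated proportion like that is, one could work out a binomial standard error for it. This is a minimal sketch using the counts from the table above; the names <code>n_sim</code>, <code>n_eight</code>, <code>p_hat</code> and <code>se</code> are mine, for illustration only:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">n_sim &lt;- 1000    # number of simulated hands
n_eight &lt;- 2     # hands whose longest suit had 8 cards (from the table above)
# simulated estimate of the probability of an 8-card longest suit
p_hat &lt;- n_eight / n_sim
# binomial standard error of that estimate
se &lt;- sqrt(p_hat * (1 - p_hat) / n_sim)
c(estimate = p_hat, se = se)</code></pre></div>
</div>
<p>With only two such hands out of 1000, the standard error comes out at the same order of magnitude as the estimate itself, which is why many more simulations are needed to pin down a probability this small.</p>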
<p>Aside: the standard way of assessing hand <em>strength</em> is via high-card points: 4 for an ace, 3 for a king, 2 for a queen and 1 for a jack. All the other cards count zero. To simulate the number of points you might get in a hand, build a deck with the points for each card. There are four cards of each rank, and nine ranks that are worth no points, making 36 cards worth zero:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">deck <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>),</span>
<span id="cb21-2">          <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>,<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rep</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">36</span>))</span></code></pre></div></div>
</div>
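<p>As a quick sanity check on a deck like this, one could confirm that it has 52 cards and 40 points in total, which makes the average for a 13-card hand 10 points. This is only a sketch: it rebuilds the same deck so that it stands alone, and the name <code>expected_points</code> is mine:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># the same points deck as above, rebuilt so this check stands alone
deck &lt;- c(rep(4, 4), rep(3, 4), rep(2, 4), rep(1, 4), rep(0, 36))
length(deck)   # number of cards: 52
sum(deck)      # total high-card points in the deck: 40
# a hand is 13 of the 52 cards, so on average it gets 13/52 of the points
expected_points &lt;- 13 * sum(deck) / length(deck)
expected_points   # 10</code></pre></div>
</div>
<p>The simulated point counts should therefore centre somewhere around 10.</p>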
<p>The simulation process after that is a lot like before:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hand =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(deck, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb22-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">points =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(hand)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb22-5">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["sim"],"name":[1],"type":["int"],"align":["right"]},{"label":["hand"],"name":[2],"type":["list"],"align":["right"]},{"label":["points"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"<dbl [13]>","3":"13"},{"1":"2","2":"<dbl [13]>","3":"10"},{"1":"3","2":"<dbl [13]>","3":"11"},{"1":"4","2":"<dbl [13]>","3":"4"},{"1":"5","2":"<dbl [13]>","3":"4"},{"1":"6","2":"<dbl [13]>","3":"6"},{"1":"7","2":"<dbl [13]>","3":"13"},{"1":"8","2":"<dbl [13]>","3":"1"},{"1":"9","2":"<dbl [13]>","3":"13"},{"1":"10","2":"<dbl [13]>","3":"7"},{"1":"11","2":"<dbl [13]>","3":"10"},{"1":"12","2":"<dbl [13]>","3":"12"},{"1":"13","2":"<dbl [13]>","3":"6"},{"1":"14","2":"<dbl [13]>","3":"9"},{"1":"15","2":"<dbl [13]>","3":"10"},{"1":"16","2":"<dbl [13]>","3":"13"},{"1":"17","2":"<dbl [13]>","3":"8"},{"1":"18","2":"<dbl [13]>","3":"14"},{"1":"19","2":"<dbl [13]>","3":"16"},{"1":"20","2":"<dbl [13]>","3":"17"},{"1":"21","2":"<dbl [13]>","3":"7"},{"1":"22","2":"<dbl [13]>","3":"10"},{"1":"23","2":"<dbl [13]>","3":"6"},{"1":"24","2":"<dbl [13]>","3":"16"},{"1":"25","2":"<dbl [13]>","3":"12"},{"1":"26","2":"<dbl [13]>","3":"12"},{"1":"27","2":"<dbl [13]>","3":"15"},{"1":"28","2":"<dbl [13]>","3":"9"},{"1":"29","2":"<dbl [13]>","3":"11"},{"1":"30","2":"<dbl [13]>","3":"3"},{"1":"31","2":"<dbl [13]>","3":"5"},{"1":"32","2":"<dbl [13]>","3":"12"},{"1":"33","2":"<dbl [13]>","3":"11"},{"1":"34","2":"<dbl [13]>","3":"7"},{"1":"35","2":"<dbl [13]>","3":"10"},{"1":"36","2":"<dbl [13]>","3":"17"},{"1":"37","2":"<dbl [13]>","3":"15"},{"1":"38","2":"<dbl [13]>","3":"11"},{"1":"39","2":"<dbl [13]>","3":"7"},{"1":"40","2":"<dbl [13]>","3":"6"},{"1":"41","2":"<dbl [13]>","3":"0"},{"1":"42","2":"<dbl [13]>","3":"14"},{"1":"43","2":"<dbl [13]>","3":"8"},{"1":"44","2":"<dbl [13]>","3":"13"},{"1":"45","2":"<dbl [13]>","3":"16"},{"1":"46","2":"<dbl [13]>","3":"5"},{"1":"47","2":"<dbl [13]>","3":"12"},{"1":"48","2":"<dbl [13]>","3":"9"},{"1":"49","2":"<dbl 
[13]>","3":"10"},{"1":"50","2":"<dbl [13]>","3":"11"},{"1":"51","2":"<dbl [13]>","3":"15"},{"1":"52","2":"<dbl [13]>","3":"10"},{"1":"53","2":"<dbl [13]>","3":"20"},{"1":"54","2":"<dbl [13]>","3":"7"},{"1":"55","2":"<dbl [13]>","3":"11"},{"1":"56","2":"<dbl [13]>","3":"12"},{"1":"57","2":"<dbl [13]>","3":"15"},{"1":"58","2":"<dbl [13]>","3":"10"},{"1":"59","2":"<dbl [13]>","3":"3"},{"1":"60","2":"<dbl [13]>","3":"7"},{"1":"61","2":"<dbl [13]>","3":"11"},{"1":"62","2":"<dbl [13]>","3":"16"},{"1":"63","2":"<dbl [13]>","3":"6"},{"1":"64","2":"<dbl [13]>","3":"17"},{"1":"65","2":"<dbl [13]>","3":"5"},{"1":"66","2":"<dbl [13]>","3":"7"},{"1":"67","2":"<dbl [13]>","3":"7"},{"1":"68","2":"<dbl [13]>","3":"4"},{"1":"69","2":"<dbl [13]>","3":"13"},{"1":"70","2":"<dbl [13]>","3":"7"},{"1":"71","2":"<dbl [13]>","3":"9"},{"1":"72","2":"<dbl [13]>","3":"7"},{"1":"73","2":"<dbl [13]>","3":"6"},{"1":"74","2":"<dbl [13]>","3":"6"},{"1":"75","2":"<dbl [13]>","3":"13"},{"1":"76","2":"<dbl [13]>","3":"14"},{"1":"77","2":"<dbl [13]>","3":"16"},{"1":"78","2":"<dbl [13]>","3":"6"},{"1":"79","2":"<dbl [13]>","3":"6"},{"1":"80","2":"<dbl [13]>","3":"10"},{"1":"81","2":"<dbl [13]>","3":"15"},{"1":"82","2":"<dbl [13]>","3":"12"},{"1":"83","2":"<dbl [13]>","3":"9"},{"1":"84","2":"<dbl [13]>","3":"9"},{"1":"85","2":"<dbl [13]>","3":"20"},{"1":"86","2":"<dbl [13]>","3":"7"},{"1":"87","2":"<dbl [13]>","3":"11"},{"1":"88","2":"<dbl [13]>","3":"11"},{"1":"89","2":"<dbl [13]>","3":"2"},{"1":"90","2":"<dbl [13]>","3":"15"},{"1":"91","2":"<dbl [13]>","3":"11"},{"1":"92","2":"<dbl [13]>","3":"13"},{"1":"93","2":"<dbl [13]>","3":"10"},{"1":"94","2":"<dbl [13]>","3":"13"},{"1":"95","2":"<dbl [13]>","3":"12"},{"1":"96","2":"<dbl [13]>","3":"8"},{"1":"97","2":"<dbl [13]>","3":"10"},{"1":"98","2":"<dbl [13]>","3":"11"},{"1":"99","2":"<dbl [13]>","3":"4"},{"1":"100","2":"<dbl [13]>","3":"13"},{"1":"101","2":"<dbl [13]>","3":"6"},{"1":"102","2":"<dbl [13]>","3":"12"},{"1":"103","2":"<dbl 
[13]>","3":"10"},{"1":"104","2":"<dbl [13]>","3":"7"},{"1":"105","2":"<dbl [13]>","3":"3"},{"1":"106","2":"<dbl [13]>","3":"15"},{"1":"107","2":"<dbl [13]>","3":"19"},{"1":"108","2":"<dbl [13]>","3":"16"},{"1":"109","2":"<dbl [13]>","3":"7"},{"1":"110","2":"<dbl [13]>","3":"6"},{"1":"111","2":"<dbl [13]>","3":"8"},{"1":"112","2":"<dbl [13]>","3":"8"},{"1":"113","2":"<dbl [13]>","3":"12"},{"1":"114","2":"<dbl [13]>","3":"5"},{"1":"115","2":"<dbl [13]>","3":"4"},{"1":"116","2":"<dbl [13]>","3":"11"},{"1":"117","2":"<dbl [13]>","3":"9"},{"1":"118","2":"<dbl [13]>","3":"9"},{"1":"119","2":"<dbl [13]>","3":"9"},{"1":"120","2":"<dbl [13]>","3":"9"},{"1":"121","2":"<dbl [13]>","3":"10"},{"1":"122","2":"<dbl [13]>","3":"7"},{"1":"123","2":"<dbl [13]>","3":"15"},{"1":"124","2":"<dbl [13]>","3":"9"},{"1":"125","2":"<dbl [13]>","3":"4"},{"1":"126","2":"<dbl [13]>","3":"7"},{"1":"127","2":"<dbl [13]>","3":"4"},{"1":"128","2":"<dbl [13]>","3":"13"},{"1":"129","2":"<dbl [13]>","3":"7"},{"1":"130","2":"<dbl [13]>","3":"11"},{"1":"131","2":"<dbl [13]>","3":"5"},{"1":"132","2":"<dbl [13]>","3":"6"},{"1":"133","2":"<dbl [13]>","3":"14"},{"1":"134","2":"<dbl [13]>","3":"10"},{"1":"135","2":"<dbl [13]>","3":"16"},{"1":"136","2":"<dbl [13]>","3":"6"},{"1":"137","2":"<dbl [13]>","3":"12"},{"1":"138","2":"<dbl [13]>","3":"15"},{"1":"139","2":"<dbl [13]>","3":"17"},{"1":"140","2":"<dbl [13]>","3":"2"},{"1":"141","2":"<dbl [13]>","3":"7"},{"1":"142","2":"<dbl [13]>","3":"12"},{"1":"143","2":"<dbl [13]>","3":"11"},{"1":"144","2":"<dbl [13]>","3":"5"},{"1":"145","2":"<dbl [13]>","3":"9"},{"1":"146","2":"<dbl [13]>","3":"11"},{"1":"147","2":"<dbl [13]>","3":"9"},{"1":"148","2":"<dbl [13]>","3":"12"},{"1":"149","2":"<dbl [13]>","3":"6"},{"1":"150","2":"<dbl [13]>","3":"11"},{"1":"151","2":"<dbl [13]>","3":"10"},{"1":"152","2":"<dbl [13]>","3":"9"},{"1":"153","2":"<dbl [13]>","3":"7"},{"1":"154","2":"<dbl [13]>","3":"11"},{"1":"155","2":"<dbl [13]>","3":"14"},{"1":"156","2":"<dbl 
[13]>","3":"9"},{"1":"157","2":"<dbl [13]>","3":"2"},{"1":"158","2":"<dbl [13]>","3":"14"},{"1":"159","2":"<dbl [13]>","3":"7"},{"1":"160","2":"<dbl [13]>","3":"10"},{"1":"161","2":"<dbl [13]>","3":"7"},{"1":"162","2":"<dbl [13]>","3":"18"},{"1":"163","2":"<dbl [13]>","3":"6"},{"1":"164","2":"<dbl [13]>","3":"10"},{"1":"165","2":"<dbl [13]>","3":"16"},{"1":"166","2":"<dbl [13]>","3":"8"},{"1":"167","2":"<dbl [13]>","3":"14"},{"1":"168","2":"<dbl [13]>","3":"10"},{"1":"169","2":"<dbl [13]>","3":"3"},{"1":"170","2":"<dbl [13]>","3":"0"},{"1":"171","2":"<dbl [13]>","3":"8"},{"1":"172","2":"<dbl [13]>","3":"8"},{"1":"173","2":"<dbl [13]>","3":"15"},{"1":"174","2":"<dbl [13]>","3":"8"},{"1":"175","2":"<dbl [13]>","3":"14"},{"1":"176","2":"<dbl [13]>","3":"14"},{"1":"177","2":"<dbl [13]>","3":"14"},{"1":"178","2":"<dbl [13]>","3":"18"},{"1":"179","2":"<dbl [13]>","3":"8"},{"1":"180","2":"<dbl [13]>","3":"9"},{"1":"181","2":"<dbl [13]>","3":"16"},{"1":"182","2":"<dbl [13]>","3":"9"},{"1":"183","2":"<dbl [13]>","3":"8"},{"1":"184","2":"<dbl [13]>","3":"14"},{"1":"185","2":"<dbl [13]>","3":"19"},{"1":"186","2":"<dbl [13]>","3":"11"},{"1":"187","2":"<dbl [13]>","3":"0"},{"1":"188","2":"<dbl [13]>","3":"15"},{"1":"189","2":"<dbl [13]>","3":"8"},{"1":"190","2":"<dbl [13]>","3":"11"},{"1":"191","2":"<dbl [13]>","3":"11"},{"1":"192","2":"<dbl [13]>","3":"9"},{"1":"193","2":"<dbl [13]>","3":"10"},{"1":"194","2":"<dbl [13]>","3":"0"},{"1":"195","2":"<dbl [13]>","3":"11"},{"1":"196","2":"<dbl [13]>","3":"7"},{"1":"197","2":"<dbl [13]>","3":"11"},{"1":"198","2":"<dbl [13]>","3":"8"},{"1":"199","2":"<dbl [13]>","3":"8"},{"1":"200","2":"<dbl [13]>","3":"10"},{"1":"201","2":"<dbl [13]>","3":"2"},{"1":"202","2":"<dbl [13]>","3":"11"},{"1":"203","2":"<dbl [13]>","3":"19"},{"1":"204","2":"<dbl [13]>","3":"18"},{"1":"205","2":"<dbl [13]>","3":"5"},{"1":"206","2":"<dbl [13]>","3":"10"},{"1":"207","2":"<dbl [13]>","3":"12"},{"1":"208","2":"<dbl [13]>","3":"12"},{"1":"209","2":"<dbl 
[13]>","3":"8"},{"1":"210","2":"<dbl [13]>","3":"2"},{"1":"211","2":"<dbl [13]>","3":"6"},{"1":"212","2":"<dbl [13]>","3":"4"},{"1":"213","2":"<dbl [13]>","3":"7"},{"1":"214","2":"<dbl [13]>","3":"12"},{"1":"215","2":"<dbl [13]>","3":"14"},{"1":"216","2":"<dbl [13]>","3":"15"},{"1":"217","2":"<dbl [13]>","3":"10"},{"1":"218","2":"<dbl [13]>","3":"6"},{"1":"219","2":"<dbl [13]>","3":"12"},{"1":"220","2":"<dbl [13]>","3":"20"},{"1":"221","2":"<dbl [13]>","3":"8"},{"1":"222","2":"<dbl [13]>","3":"15"},{"1":"223","2":"<dbl [13]>","3":"10"},{"1":"224","2":"<dbl [13]>","3":"4"},{"1":"225","2":"<dbl [13]>","3":"9"},{"1":"226","2":"<dbl [13]>","3":"11"},{"1":"227","2":"<dbl [13]>","3":"12"},{"1":"228","2":"<dbl [13]>","3":"8"},{"1":"229","2":"<dbl [13]>","3":"3"},{"1":"230","2":"<dbl [13]>","3":"7"},{"1":"231","2":"<dbl [13]>","3":"9"},{"1":"232","2":"<dbl [13]>","3":"9"},{"1":"233","2":"<dbl [13]>","3":"5"},{"1":"234","2":"<dbl [13]>","3":"15"},{"1":"235","2":"<dbl [13]>","3":"12"},{"1":"236","2":"<dbl [13]>","3":"12"},{"1":"237","2":"<dbl [13]>","3":"8"},{"1":"238","2":"<dbl [13]>","3":"7"},{"1":"239","2":"<dbl [13]>","3":"10"},{"1":"240","2":"<dbl [13]>","3":"7"},{"1":"241","2":"<dbl [13]>","3":"7"},{"1":"242","2":"<dbl [13]>","3":"10"},{"1":"243","2":"<dbl [13]>","3":"9"},{"1":"244","2":"<dbl [13]>","3":"10"},{"1":"245","2":"<dbl [13]>","3":"10"},{"1":"246","2":"<dbl [13]>","3":"15"},{"1":"247","2":"<dbl [13]>","3":"14"},{"1":"248","2":"<dbl [13]>","3":"14"},{"1":"249","2":"<dbl [13]>","3":"6"},{"1":"250","2":"<dbl [13]>","3":"7"},{"1":"251","2":"<dbl [13]>","3":"3"},{"1":"252","2":"<dbl [13]>","3":"14"},{"1":"253","2":"<dbl [13]>","3":"4"},{"1":"254","2":"<dbl [13]>","3":"6"},{"1":"255","2":"<dbl [13]>","3":"13"},{"1":"256","2":"<dbl [13]>","3":"9"},{"1":"257","2":"<dbl [13]>","3":"11"},{"1":"258","2":"<dbl [13]>","3":"11"},{"1":"259","2":"<dbl [13]>","3":"10"},{"1":"260","2":"<dbl [13]>","3":"7"},{"1":"261","2":"<dbl [13]>","3":"6"},{"1":"262","2":"<dbl 
[13]>","3":"11"},{"1":"263","2":"<dbl [13]>","3":"6"},{"1":"264","2":"<dbl [13]>","3":"11"},{"1":"265","2":"<dbl [13]>","3":"10"},{"1":"266","2":"<dbl [13]>","3":"9"},{"1":"267","2":"<dbl [13]>","3":"10"},{"1":"268","2":"<dbl [13]>","3":"4"},{"1":"269","2":"<dbl [13]>","3":"10"},{"1":"270","2":"<dbl [13]>","3":"12"},{"1":"271","2":"<dbl [13]>","3":"20"},{"1":"272","2":"<dbl [13]>","3":"5"},{"1":"273","2":"<dbl [13]>","3":"8"},{"1":"274","2":"<dbl [13]>","3":"2"},{"1":"275","2":"<dbl [13]>","3":"16"},{"1":"276","2":"<dbl [13]>","3":"13"},{"1":"277","2":"<dbl [13]>","3":"4"},{"1":"278","2":"<dbl [13]>","3":"16"},{"1":"279","2":"<dbl [13]>","3":"6"},{"1":"280","2":"<dbl [13]>","3":"9"},{"1":"281","2":"<dbl [13]>","3":"7"},{"1":"282","2":"<dbl [13]>","3":"9"},{"1":"283","2":"<dbl [13]>","3":"6"},{"1":"284","2":"<dbl [13]>","3":"7"},{"1":"285","2":"<dbl [13]>","3":"7"},{"1":"286","2":"<dbl [13]>","3":"8"},{"1":"287","2":"<dbl [13]>","3":"8"},{"1":"288","2":"<dbl [13]>","3":"7"},{"1":"289","2":"<dbl [13]>","3":"7"},{"1":"290","2":"<dbl [13]>","3":"11"},{"1":"291","2":"<dbl [13]>","3":"13"},{"1":"292","2":"<dbl [13]>","3":"10"},{"1":"293","2":"<dbl [13]>","3":"14"},{"1":"294","2":"<dbl [13]>","3":"6"},{"1":"295","2":"<dbl [13]>","3":"8"},{"1":"296","2":"<dbl [13]>","3":"3"},{"1":"297","2":"<dbl [13]>","3":"9"},{"1":"298","2":"<dbl [13]>","3":"10"},{"1":"299","2":"<dbl [13]>","3":"10"},{"1":"300","2":"<dbl [13]>","3":"9"},{"1":"301","2":"<dbl [13]>","3":"3"},{"1":"302","2":"<dbl [13]>","3":"10"},{"1":"303","2":"<dbl [13]>","3":"12"},{"1":"304","2":"<dbl [13]>","3":"13"},{"1":"305","2":"<dbl [13]>","3":"9"},{"1":"306","2":"<dbl [13]>","3":"4"},{"1":"307","2":"<dbl [13]>","3":"6"},{"1":"308","2":"<dbl [13]>","3":"14"},{"1":"309","2":"<dbl [13]>","3":"8"},{"1":"310","2":"<dbl [13]>","3":"10"},{"1":"311","2":"<dbl [13]>","3":"4"},{"1":"312","2":"<dbl [13]>","3":"13"},{"1":"313","2":"<dbl [13]>","3":"10"},{"1":"314","2":"<dbl [13]>","3":"9"},{"1":"315","2":"<dbl 
[13]>","3":"13"},{"1":"316","2":"<dbl [13]>","3":"11"},{"1":"317","2":"<dbl [13]>","3":"9"},{"1":"318","2":"<dbl [13]>","3":"20"},{"1":"319","2":"<dbl [13]>","3":"9"},{"1":"320","2":"<dbl [13]>","3":"8"},{"1":"321","2":"<dbl [13]>","3":"7"},{"1":"322","2":"<dbl [13]>","3":"12"},{"1":"323","2":"<dbl [13]>","3":"19"},{"1":"324","2":"<dbl [13]>","3":"8"},{"1":"325","2":"<dbl [13]>","3":"5"},{"1":"326","2":"<dbl [13]>","3":"7"},{"1":"327","2":"<dbl [13]>","3":"19"},{"1":"328","2":"<dbl [13]>","3":"10"},{"1":"329","2":"<dbl [13]>","3":"5"},{"1":"330","2":"<dbl [13]>","3":"13"},{"1":"331","2":"<dbl [13]>","3":"15"},{"1":"332","2":"<dbl [13]>","3":"18"},{"1":"333","2":"<dbl [13]>","3":"7"},{"1":"334","2":"<dbl [13]>","3":"11"},{"1":"335","2":"<dbl [13]>","3":"13"},{"1":"336","2":"<dbl [13]>","3":"14"},{"1":"337","2":"<dbl [13]>","3":"11"},{"1":"338","2":"<dbl [13]>","3":"9"},{"1":"339","2":"<dbl [13]>","3":"8"},{"1":"340","2":"<dbl [13]>","3":"13"},{"1":"341","2":"<dbl [13]>","3":"13"},{"1":"342","2":"<dbl [13]>","3":"11"},{"1":"343","2":"<dbl [13]>","3":"10"},{"1":"344","2":"<dbl [13]>","3":"13"},{"1":"345","2":"<dbl [13]>","3":"13"},{"1":"346","2":"<dbl [13]>","3":"10"},{"1":"347","2":"<dbl [13]>","3":"4"},{"1":"348","2":"<dbl [13]>","3":"5"},{"1":"349","2":"<dbl [13]>","3":"16"},{"1":"350","2":"<dbl [13]>","3":"10"},{"1":"351","2":"<dbl [13]>","3":"10"},{"1":"352","2":"<dbl [13]>","3":"12"},{"1":"353","2":"<dbl [13]>","3":"11"},{"1":"354","2":"<dbl [13]>","3":"0"},{"1":"355","2":"<dbl [13]>","3":"5"},{"1":"356","2":"<dbl [13]>","3":"7"},{"1":"357","2":"<dbl [13]>","3":"11"},{"1":"358","2":"<dbl [13]>","3":"7"},{"1":"359","2":"<dbl [13]>","3":"8"},{"1":"360","2":"<dbl [13]>","3":"15"},{"1":"361","2":"<dbl [13]>","3":"7"},{"1":"362","2":"<dbl [13]>","3":"8"},{"1":"363","2":"<dbl [13]>","3":"14"},{"1":"364","2":"<dbl [13]>","3":"13"},{"1":"365","2":"<dbl [13]>","3":"12"},{"1":"366","2":"<dbl [13]>","3":"13"},{"1":"367","2":"<dbl [13]>","3":"10"},{"1":"368","2":"<dbl 
[13]>","3":"17"},{"1":"369","2":"<dbl [13]>","3":"9"},{"1":"370","2":"<dbl [13]>","3":"10"},{"1":"371","2":"<dbl [13]>","3":"13"},{"1":"372","2":"<dbl [13]>","3":"8"},{"1":"373","2":"<dbl [13]>","3":"12"},{"1":"374","2":"<dbl [13]>","3":"6"},{"1":"375","2":"<dbl [13]>","3":"15"},{"1":"376","2":"<dbl [13]>","3":"11"},{"1":"377","2":"<dbl [13]>","3":"6"},{"1":"378","2":"<dbl [13]>","3":"8"},{"1":"379","2":"<dbl [13]>","3":"8"},{"1":"380","2":"<dbl [13]>","3":"9"},{"1":"381","2":"<dbl [13]>","3":"9"},{"1":"382","2":"<dbl [13]>","3":"7"},{"1":"383","2":"<dbl [13]>","3":"8"},{"1":"384","2":"<dbl [13]>","3":"9"},{"1":"385","2":"<dbl [13]>","3":"11"},{"1":"386","2":"<dbl [13]>","3":"18"},{"1":"387","2":"<dbl [13]>","3":"14"},{"1":"388","2":"<dbl [13]>","3":"16"},{"1":"389","2":"<dbl [13]>","3":"11"},{"1":"390","2":"<dbl [13]>","3":"3"},{"1":"391","2":"<dbl [13]>","3":"8"},{"1":"392","2":"<dbl [13]>","3":"8"},{"1":"393","2":"<dbl [13]>","3":"6"},{"1":"394","2":"<dbl [13]>","3":"20"},{"1":"395","2":"<dbl [13]>","3":"16"},{"1":"396","2":"<dbl [13]>","3":"16"},{"1":"397","2":"<dbl [13]>","3":"5"},{"1":"398","2":"<dbl [13]>","3":"12"},{"1":"399","2":"<dbl [13]>","3":"8"},{"1":"400","2":"<dbl [13]>","3":"5"},{"1":"401","2":"<dbl [13]>","3":"5"},{"1":"402","2":"<dbl [13]>","3":"13"},{"1":"403","2":"<dbl [13]>","3":"9"},{"1":"404","2":"<dbl [13]>","3":"8"},{"1":"405","2":"<dbl [13]>","3":"10"},{"1":"406","2":"<dbl [13]>","3":"14"},{"1":"407","2":"<dbl [13]>","3":"12"},{"1":"408","2":"<dbl [13]>","3":"8"},{"1":"409","2":"<dbl [13]>","3":"11"},{"1":"410","2":"<dbl [13]>","3":"13"},{"1":"411","2":"<dbl [13]>","3":"2"},{"1":"412","2":"<dbl [13]>","3":"12"},{"1":"413","2":"<dbl [13]>","3":"11"},{"1":"414","2":"<dbl [13]>","3":"6"},{"1":"415","2":"<dbl [13]>","3":"13"},{"1":"416","2":"<dbl [13]>","3":"10"},{"1":"417","2":"<dbl [13]>","3":"16"},{"1":"418","2":"<dbl [13]>","3":"10"},{"1":"419","2":"<dbl [13]>","3":"9"},{"1":"420","2":"<dbl [13]>","3":"10"},{"1":"421","2":"<dbl 
[13]>","3":"12"},{"1":"422","2":"<dbl [13]>","3":"8"},{"1":"423","2":"<dbl [13]>","3":"17"},{"1":"424","2":"<dbl [13]>","3":"15"},{"1":"425","2":"<dbl [13]>","3":"11"},{"1":"426","2":"<dbl [13]>","3":"12"},{"1":"427","2":"<dbl [13]>","3":"13"},{"1":"428","2":"<dbl [13]>","3":"6"},{"1":"429","2":"<dbl [13]>","3":"11"},{"1":"430","2":"<dbl [13]>","3":"7"},{"1":"431","2":"<dbl [13]>","3":"13"},{"1":"432","2":"<dbl [13]>","3":"8"},{"1":"433","2":"<dbl [13]>","3":"2"},{"1":"434","2":"<dbl [13]>","3":"21"},{"1":"435","2":"<dbl [13]>","3":"7"},{"1":"436","2":"<dbl [13]>","3":"15"},{"1":"437","2":"<dbl [13]>","3":"3"},{"1":"438","2":"<dbl [13]>","3":"4"},{"1":"439","2":"<dbl [13]>","3":"8"},{"1":"440","2":"<dbl [13]>","3":"10"},{"1":"441","2":"<dbl [13]>","3":"12"},{"1":"442","2":"<dbl [13]>","3":"22"},{"1":"443","2":"<dbl [13]>","3":"6"},{"1":"444","2":"<dbl [13]>","3":"6"},{"1":"445","2":"<dbl [13]>","3":"11"},{"1":"446","2":"<dbl [13]>","3":"15"},{"1":"447","2":"<dbl [13]>","3":"5"},{"1":"448","2":"<dbl [13]>","3":"8"},{"1":"449","2":"<dbl [13]>","3":"10"},{"1":"450","2":"<dbl [13]>","3":"9"},{"1":"451","2":"<dbl [13]>","3":"14"},{"1":"452","2":"<dbl [13]>","3":"11"},{"1":"453","2":"<dbl [13]>","3":"12"},{"1":"454","2":"<dbl [13]>","3":"10"},{"1":"455","2":"<dbl [13]>","3":"9"},{"1":"456","2":"<dbl [13]>","3":"6"},{"1":"457","2":"<dbl [13]>","3":"5"},{"1":"458","2":"<dbl [13]>","3":"12"},{"1":"459","2":"<dbl [13]>","3":"12"},{"1":"460","2":"<dbl [13]>","3":"10"},{"1":"461","2":"<dbl [13]>","3":"11"},{"1":"462","2":"<dbl [13]>","3":"9"},{"1":"463","2":"<dbl [13]>","3":"12"},{"1":"464","2":"<dbl [13]>","3":"16"},{"1":"465","2":"<dbl [13]>","3":"12"},{"1":"466","2":"<dbl [13]>","3":"11"},{"1":"467","2":"<dbl [13]>","3":"17"},{"1":"468","2":"<dbl [13]>","3":"4"},{"1":"469","2":"<dbl [13]>","3":"7"},{"1":"470","2":"<dbl [13]>","3":"7"},{"1":"471","2":"<dbl [13]>","3":"10"},{"1":"472","2":"<dbl [13]>","3":"11"},{"1":"473","2":"<dbl [13]>","3":"10"},{"1":"474","2":"<dbl 
[13]>","3":"5"},{"1":"475","2":"<dbl [13]>","3":"14"},{"1":"476","2":"<dbl [13]>","3":"12"},{"1":"477","2":"<dbl [13]>","3":"18"},{"1":"478","2":"<dbl [13]>","3":"15"},{"1":"479","2":"<dbl [13]>","3":"10"},{"1":"480","2":"<dbl [13]>","3":"8"},{"1":"481","2":"<dbl [13]>","3":"7"},{"1":"482","2":"<dbl [13]>","3":"2"},{"1":"483","2":"<dbl [13]>","3":"10"},{"1":"484","2":"<dbl [13]>","3":"11"},{"1":"485","2":"<dbl [13]>","3":"6"},{"1":"486","2":"<dbl [13]>","3":"8"},{"1":"487","2":"<dbl [13]>","3":"13"},{"1":"488","2":"<dbl [13]>","3":"7"},{"1":"489","2":"<dbl [13]>","3":"16"},{"1":"490","2":"<dbl [13]>","3":"3"},{"1":"491","2":"<dbl [13]>","3":"5"},{"1":"492","2":"<dbl [13]>","3":"13"},{"1":"493","2":"<dbl [13]>","3":"16"},{"1":"494","2":"<dbl [13]>","3":"5"},{"1":"495","2":"<dbl [13]>","3":"11"},{"1":"496","2":"<dbl [13]>","3":"8"},{"1":"497","2":"<dbl [13]>","3":"6"},{"1":"498","2":"<dbl [13]>","3":"14"},{"1":"499","2":"<dbl [13]>","3":"3"},{"1":"500","2":"<dbl [13]>","3":"16"},{"1":"501","2":"<dbl [13]>","3":"13"},{"1":"502","2":"<dbl [13]>","3":"7"},{"1":"503","2":"<dbl [13]>","3":"3"},{"1":"504","2":"<dbl [13]>","3":"19"},{"1":"505","2":"<dbl [13]>","3":"7"},{"1":"506","2":"<dbl [13]>","3":"16"},{"1":"507","2":"<dbl [13]>","3":"4"},{"1":"508","2":"<dbl [13]>","3":"11"},{"1":"509","2":"<dbl [13]>","3":"4"},{"1":"510","2":"<dbl [13]>","3":"17"},{"1":"511","2":"<dbl [13]>","3":"13"},{"1":"512","2":"<dbl [13]>","3":"10"},{"1":"513","2":"<dbl [13]>","3":"17"},{"1":"514","2":"<dbl [13]>","3":"13"},{"1":"515","2":"<dbl [13]>","3":"6"},{"1":"516","2":"<dbl [13]>","3":"12"},{"1":"517","2":"<dbl [13]>","3":"10"},{"1":"518","2":"<dbl [13]>","3":"7"},{"1":"519","2":"<dbl [13]>","3":"14"},{"1":"520","2":"<dbl [13]>","3":"12"},{"1":"521","2":"<dbl [13]>","3":"11"},{"1":"522","2":"<dbl [13]>","3":"15"},{"1":"523","2":"<dbl [13]>","3":"7"},{"1":"524","2":"<dbl [13]>","3":"11"},{"1":"525","2":"<dbl [13]>","3":"16"},{"1":"526","2":"<dbl [13]>","3":"10"},{"1":"527","2":"<dbl 
[13]>","3":"2"},{"1":"528","2":"<dbl [13]>","3":"7"},{"1":"529","2":"<dbl [13]>","3":"10"},{"1":"530","2":"<dbl [13]>","3":"11"},{"1":"531","2":"<dbl [13]>","3":"6"},{"1":"532","2":"<dbl [13]>","3":"11"},{"1":"533","2":"<dbl [13]>","3":"17"},{"1":"534","2":"<dbl [13]>","3":"5"},{"1":"535","2":"<dbl [13]>","3":"5"},{"1":"536","2":"<dbl [13]>","3":"12"},{"1":"537","2":"<dbl [13]>","3":"8"},{"1":"538","2":"<dbl [13]>","3":"19"},{"1":"539","2":"<dbl [13]>","3":"13"},{"1":"540","2":"<dbl [13]>","3":"21"},{"1":"541","2":"<dbl [13]>","3":"16"},{"1":"542","2":"<dbl [13]>","3":"8"},{"1":"543","2":"<dbl [13]>","3":"5"},{"1":"544","2":"<dbl [13]>","3":"5"},{"1":"545","2":"<dbl [13]>","3":"8"},{"1":"546","2":"<dbl [13]>","3":"11"},{"1":"547","2":"<dbl [13]>","3":"13"},{"1":"548","2":"<dbl [13]>","3":"8"},{"1":"549","2":"<dbl [13]>","3":"13"},{"1":"550","2":"<dbl [13]>","3":"12"},{"1":"551","2":"<dbl [13]>","3":"11"},{"1":"552","2":"<dbl [13]>","3":"8"},{"1":"553","2":"<dbl [13]>","3":"6"},{"1":"554","2":"<dbl [13]>","3":"7"},{"1":"555","2":"<dbl [13]>","3":"6"},{"1":"556","2":"<dbl [13]>","3":"14"},{"1":"557","2":"<dbl [13]>","3":"17"},{"1":"558","2":"<dbl [13]>","3":"17"},{"1":"559","2":"<dbl [13]>","3":"8"},{"1":"560","2":"<dbl [13]>","3":"7"},{"1":"561","2":"<dbl [13]>","3":"10"},{"1":"562","2":"<dbl [13]>","3":"5"},{"1":"563","2":"<dbl [13]>","3":"7"},{"1":"564","2":"<dbl [13]>","3":"9"},{"1":"565","2":"<dbl [13]>","3":"2"},{"1":"566","2":"<dbl [13]>","3":"3"},{"1":"567","2":"<dbl [13]>","3":"6"},{"1":"568","2":"<dbl [13]>","3":"0"},{"1":"569","2":"<dbl [13]>","3":"9"},{"1":"570","2":"<dbl [13]>","3":"9"},{"1":"571","2":"<dbl [13]>","3":"18"},{"1":"572","2":"<dbl [13]>","3":"5"},{"1":"573","2":"<dbl [13]>","3":"7"},{"1":"574","2":"<dbl [13]>","3":"11"},{"1":"575","2":"<dbl [13]>","3":"9"},{"1":"576","2":"<dbl [13]>","3":"11"},{"1":"577","2":"<dbl [13]>","3":"12"},{"1":"578","2":"<dbl [13]>","3":"6"},{"1":"579","2":"<dbl [13]>","3":"14"},{"1":"580","2":"<dbl 
[13]>","3":"4"},{"1":"581","2":"<dbl [13]>","3":"11"},{"1":"582","2":"<dbl [13]>","3":"11"},{"1":"583","2":"<dbl [13]>","3":"6"},{"1":"584","2":"<dbl [13]>","3":"12"},{"1":"585","2":"<dbl [13]>","3":"10"},{"1":"586","2":"<dbl [13]>","3":"10"},{"1":"587","2":"<dbl [13]>","3":"2"},{"1":"588","2":"<dbl [13]>","3":"14"},{"1":"589","2":"<dbl [13]>","3":"14"},{"1":"590","2":"<dbl [13]>","3":"3"},{"1":"591","2":"<dbl [13]>","3":"14"},{"1":"592","2":"<dbl [13]>","3":"9"},{"1":"593","2":"<dbl [13]>","3":"12"},{"1":"594","2":"<dbl [13]>","3":"12"},{"1":"595","2":"<dbl [13]>","3":"19"},{"1":"596","2":"<dbl [13]>","3":"8"},{"1":"597","2":"<dbl [13]>","3":"1"},{"1":"598","2":"<dbl [13]>","3":"10"},{"1":"599","2":"<dbl [13]>","3":"4"},{"1":"600","2":"<dbl [13]>","3":"15"},{"1":"601","2":"<dbl [13]>","3":"9"},{"1":"602","2":"<dbl [13]>","3":"6"},{"1":"603","2":"<dbl [13]>","3":"21"},{"1":"604","2":"<dbl [13]>","3":"12"},{"1":"605","2":"<dbl [13]>","3":"13"},{"1":"606","2":"<dbl [13]>","3":"13"},{"1":"607","2":"<dbl [13]>","3":"14"},{"1":"608","2":"<dbl [13]>","3":"14"},{"1":"609","2":"<dbl [13]>","3":"2"},{"1":"610","2":"<dbl [13]>","3":"10"},{"1":"611","2":"<dbl [13]>","3":"10"},{"1":"612","2":"<dbl [13]>","3":"3"},{"1":"613","2":"<dbl [13]>","3":"4"},{"1":"614","2":"<dbl [13]>","3":"6"},{"1":"615","2":"<dbl [13]>","3":"6"},{"1":"616","2":"<dbl [13]>","3":"6"},{"1":"617","2":"<dbl [13]>","3":"6"},{"1":"618","2":"<dbl [13]>","3":"4"},{"1":"619","2":"<dbl [13]>","3":"11"},{"1":"620","2":"<dbl [13]>","3":"13"},{"1":"621","2":"<dbl [13]>","3":"18"},{"1":"622","2":"<dbl [13]>","3":"19"},{"1":"623","2":"<dbl [13]>","3":"13"},{"1":"624","2":"<dbl [13]>","3":"10"},{"1":"625","2":"<dbl [13]>","3":"16"},{"1":"626","2":"<dbl [13]>","3":"8"},{"1":"627","2":"<dbl [13]>","3":"9"},{"1":"628","2":"<dbl [13]>","3":"9"},{"1":"629","2":"<dbl [13]>","3":"10"},{"1":"630","2":"<dbl [13]>","3":"10"},{"1":"631","2":"<dbl [13]>","3":"8"},{"1":"632","2":"<dbl [13]>","3":"10"},{"1":"633","2":"<dbl 
[13]>","3":"13"},{"1":"634","2":"<dbl [13]>","3":"13"},{"1":"635","2":"<dbl [13]>","3":"10"},{"1":"636","2":"<dbl [13]>","3":"1"},{"1":"637","2":"<dbl [13]>","3":"6"},{"1":"638","2":"<dbl [13]>","3":"4"},{"1":"639","2":"<dbl [13]>","3":"8"},{"1":"640","2":"<dbl [13]>","3":"14"},{"1":"641","2":"<dbl [13]>","3":"14"},{"1":"642","2":"<dbl [13]>","3":"6"},{"1":"643","2":"<dbl [13]>","3":"16"},{"1":"644","2":"<dbl [13]>","3":"11"},{"1":"645","2":"<dbl [13]>","3":"12"},{"1":"646","2":"<dbl [13]>","3":"1"},{"1":"647","2":"<dbl [13]>","3":"16"},{"1":"648","2":"<dbl [13]>","3":"8"},{"1":"649","2":"<dbl [13]>","3":"13"},{"1":"650","2":"<dbl [13]>","3":"11"},{"1":"651","2":"<dbl [13]>","3":"15"},{"1":"652","2":"<dbl [13]>","3":"4"},{"1":"653","2":"<dbl [13]>","3":"14"},{"1":"654","2":"<dbl [13]>","3":"12"},{"1":"655","2":"<dbl [13]>","3":"8"},{"1":"656","2":"<dbl [13]>","3":"10"},{"1":"657","2":"<dbl [13]>","3":"10"},{"1":"658","2":"<dbl [13]>","3":"12"},{"1":"659","2":"<dbl [13]>","3":"10"},{"1":"660","2":"<dbl [13]>","3":"10"},{"1":"661","2":"<dbl [13]>","3":"11"},{"1":"662","2":"<dbl [13]>","3":"8"},{"1":"663","2":"<dbl [13]>","3":"9"},{"1":"664","2":"<dbl [13]>","3":"9"},{"1":"665","2":"<dbl [13]>","3":"5"},{"1":"666","2":"<dbl [13]>","3":"5"},{"1":"667","2":"<dbl [13]>","3":"10"},{"1":"668","2":"<dbl [13]>","3":"12"},{"1":"669","2":"<dbl [13]>","3":"17"},{"1":"670","2":"<dbl [13]>","3":"14"},{"1":"671","2":"<dbl [13]>","3":"9"},{"1":"672","2":"<dbl [13]>","3":"17"},{"1":"673","2":"<dbl [13]>","3":"15"},{"1":"674","2":"<dbl [13]>","3":"9"},{"1":"675","2":"<dbl [13]>","3":"8"},{"1":"676","2":"<dbl [13]>","3":"5"},{"1":"677","2":"<dbl [13]>","3":"6"},{"1":"678","2":"<dbl [13]>","3":"8"},{"1":"679","2":"<dbl [13]>","3":"11"},{"1":"680","2":"<dbl [13]>","3":"10"},{"1":"681","2":"<dbl [13]>","3":"13"},{"1":"682","2":"<dbl [13]>","3":"8"},{"1":"683","2":"<dbl [13]>","3":"10"},{"1":"684","2":"<dbl [13]>","3":"8"},{"1":"685","2":"<dbl [13]>","3":"14"},{"1":"686","2":"<dbl 
[13]>","3":"11"},{"1":"687","2":"<dbl [13]>","3":"6"},{"1":"688","2":"<dbl [13]>","3":"4"},{"1":"689","2":"<dbl [13]>","3":"9"},{"1":"690","2":"<dbl [13]>","3":"10"},{"1":"691","2":"<dbl [13]>","3":"9"},{"1":"692","2":"<dbl [13]>","3":"8"},{"1":"693","2":"<dbl [13]>","3":"8"},{"1":"694","2":"<dbl [13]>","3":"20"},{"1":"695","2":"<dbl [13]>","3":"8"},{"1":"696","2":"<dbl [13]>","3":"6"},{"1":"697","2":"<dbl [13]>","3":"3"},{"1":"698","2":"<dbl [13]>","3":"12"},{"1":"699","2":"<dbl [13]>","3":"5"},{"1":"700","2":"<dbl [13]>","3":"6"},{"1":"701","2":"<dbl [13]>","3":"7"},{"1":"702","2":"<dbl [13]>","3":"14"},{"1":"703","2":"<dbl [13]>","3":"7"},{"1":"704","2":"<dbl [13]>","3":"7"},{"1":"705","2":"<dbl [13]>","3":"11"},{"1":"706","2":"<dbl [13]>","3":"6"},{"1":"707","2":"<dbl [13]>","3":"12"},{"1":"708","2":"<dbl [13]>","3":"15"},{"1":"709","2":"<dbl [13]>","3":"11"},{"1":"710","2":"<dbl [13]>","3":"9"},{"1":"711","2":"<dbl [13]>","3":"1"},{"1":"712","2":"<dbl [13]>","3":"9"},{"1":"713","2":"<dbl [13]>","3":"8"},{"1":"714","2":"<dbl [13]>","3":"14"},{"1":"715","2":"<dbl [13]>","3":"6"},{"1":"716","2":"<dbl [13]>","3":"14"},{"1":"717","2":"<dbl [13]>","3":"12"},{"1":"718","2":"<dbl [13]>","3":"11"},{"1":"719","2":"<dbl [13]>","3":"5"},{"1":"720","2":"<dbl [13]>","3":"7"},{"1":"721","2":"<dbl [13]>","3":"7"},{"1":"722","2":"<dbl [13]>","3":"11"},{"1":"723","2":"<dbl [13]>","3":"4"},{"1":"724","2":"<dbl [13]>","3":"12"},{"1":"725","2":"<dbl [13]>","3":"10"},{"1":"726","2":"<dbl [13]>","3":"11"},{"1":"727","2":"<dbl [13]>","3":"9"},{"1":"728","2":"<dbl [13]>","3":"11"},{"1":"729","2":"<dbl [13]>","3":"10"},{"1":"730","2":"<dbl [13]>","3":"9"},{"1":"731","2":"<dbl [13]>","3":"8"},{"1":"732","2":"<dbl [13]>","3":"7"},{"1":"733","2":"<dbl [13]>","3":"9"},{"1":"734","2":"<dbl [13]>","3":"6"},{"1":"735","2":"<dbl [13]>","3":"9"},{"1":"736","2":"<dbl [13]>","3":"6"},{"1":"737","2":"<dbl [13]>","3":"10"},{"1":"738","2":"<dbl [13]>","3":"10"},{"1":"739","2":"<dbl 
[13]>","3":"7"},{"1":"740","2":"<dbl [13]>","3":"13"},{"1":"741","2":"<dbl [13]>","3":"12"},{"1":"742","2":"<dbl [13]>","3":"10"},{"1":"743","2":"<dbl [13]>","3":"6"},{"1":"744","2":"<dbl [13]>","3":"12"},{"1":"745","2":"<dbl [13]>","3":"3"},{"1":"746","2":"<dbl [13]>","3":"10"},{"1":"747","2":"<dbl [13]>","3":"16"},{"1":"748","2":"<dbl [13]>","3":"9"},{"1":"749","2":"<dbl [13]>","3":"10"},{"1":"750","2":"<dbl [13]>","3":"10"},{"1":"751","2":"<dbl [13]>","3":"7"},{"1":"752","2":"<dbl [13]>","3":"4"},{"1":"753","2":"<dbl [13]>","3":"1"},{"1":"754","2":"<dbl [13]>","3":"10"},{"1":"755","2":"<dbl [13]>","3":"4"},{"1":"756","2":"<dbl [13]>","3":"9"},{"1":"757","2":"<dbl [13]>","3":"10"},{"1":"758","2":"<dbl [13]>","3":"19"},{"1":"759","2":"<dbl [13]>","3":"11"},{"1":"760","2":"<dbl [13]>","3":"20"},{"1":"761","2":"<dbl [13]>","3":"10"},{"1":"762","2":"<dbl [13]>","3":"6"},{"1":"763","2":"<dbl [13]>","3":"9"},{"1":"764","2":"<dbl [13]>","3":"18"},{"1":"765","2":"<dbl [13]>","3":"13"},{"1":"766","2":"<dbl [13]>","3":"10"},{"1":"767","2":"<dbl [13]>","3":"14"},{"1":"768","2":"<dbl [13]>","3":"6"},{"1":"769","2":"<dbl [13]>","3":"5"},{"1":"770","2":"<dbl [13]>","3":"6"},{"1":"771","2":"<dbl [13]>","3":"6"},{"1":"772","2":"<dbl [13]>","3":"4"},{"1":"773","2":"<dbl [13]>","3":"13"},{"1":"774","2":"<dbl [13]>","3":"9"},{"1":"775","2":"<dbl [13]>","3":"11"},{"1":"776","2":"<dbl [13]>","3":"8"},{"1":"777","2":"<dbl [13]>","3":"16"},{"1":"778","2":"<dbl [13]>","3":"6"},{"1":"779","2":"<dbl [13]>","3":"9"},{"1":"780","2":"<dbl [13]>","3":"11"},{"1":"781","2":"<dbl [13]>","3":"8"},{"1":"782","2":"<dbl [13]>","3":"12"},{"1":"783","2":"<dbl [13]>","3":"12"},{"1":"784","2":"<dbl [13]>","3":"13"},{"1":"785","2":"<dbl [13]>","3":"13"},{"1":"786","2":"<dbl [13]>","3":"13"},{"1":"787","2":"<dbl [13]>","3":"8"},{"1":"788","2":"<dbl [13]>","3":"10"},{"1":"789","2":"<dbl [13]>","3":"10"},{"1":"790","2":"<dbl [13]>","3":"13"},{"1":"791","2":"<dbl [13]>","3":"11"},{"1":"792","2":"<dbl 
[13]>","3":"11"},{"1":"793","2":"<dbl [13]>","3":"14"},{"1":"794","2":"<dbl [13]>","3":"12"},{"1":"795","2":"<dbl [13]>","3":"4"},{"1":"796","2":"<dbl [13]>","3":"9"},{"1":"797","2":"<dbl [13]>","3":"8"},{"1":"798","2":"<dbl [13]>","3":"14"},{"1":"799","2":"<dbl [13]>","3":"11"},{"1":"800","2":"<dbl [13]>","3":"13"},{"1":"801","2":"<dbl [13]>","3":"5"},{"1":"802","2":"<dbl [13]>","3":"9"},{"1":"803","2":"<dbl [13]>","3":"6"},{"1":"804","2":"<dbl [13]>","3":"3"},{"1":"805","2":"<dbl [13]>","3":"15"},{"1":"806","2":"<dbl [13]>","3":"10"},{"1":"807","2":"<dbl [13]>","3":"19"},{"1":"808","2":"<dbl [13]>","3":"10"},{"1":"809","2":"<dbl [13]>","3":"9"},{"1":"810","2":"<dbl [13]>","3":"6"},{"1":"811","2":"<dbl [13]>","3":"16"},{"1":"812","2":"<dbl [13]>","3":"7"},{"1":"813","2":"<dbl [13]>","3":"23"},{"1":"814","2":"<dbl [13]>","3":"13"},{"1":"815","2":"<dbl [13]>","3":"8"},{"1":"816","2":"<dbl [13]>","3":"10"},{"1":"817","2":"<dbl [13]>","3":"10"},{"1":"818","2":"<dbl [13]>","3":"11"},{"1":"819","2":"<dbl [13]>","3":"8"},{"1":"820","2":"<dbl [13]>","3":"9"},{"1":"821","2":"<dbl [13]>","3":"18"},{"1":"822","2":"<dbl [13]>","3":"15"},{"1":"823","2":"<dbl [13]>","3":"6"},{"1":"824","2":"<dbl [13]>","3":"12"},{"1":"825","2":"<dbl [13]>","3":"15"},{"1":"826","2":"<dbl [13]>","3":"12"},{"1":"827","2":"<dbl [13]>","3":"11"},{"1":"828","2":"<dbl [13]>","3":"6"},{"1":"829","2":"<dbl [13]>","3":"7"},{"1":"830","2":"<dbl [13]>","3":"9"},{"1":"831","2":"<dbl [13]>","3":"12"},{"1":"832","2":"<dbl [13]>","3":"4"},{"1":"833","2":"<dbl [13]>","3":"19"},{"1":"834","2":"<dbl [13]>","3":"12"},{"1":"835","2":"<dbl [13]>","3":"8"},{"1":"836","2":"<dbl [13]>","3":"9"},{"1":"837","2":"<dbl [13]>","3":"9"},{"1":"838","2":"<dbl [13]>","3":"12"},{"1":"839","2":"<dbl [13]>","3":"16"},{"1":"840","2":"<dbl [13]>","3":"2"},{"1":"841","2":"<dbl [13]>","3":"15"},{"1":"842","2":"<dbl [13]>","3":"13"},{"1":"843","2":"<dbl [13]>","3":"10"},{"1":"844","2":"<dbl [13]>","3":"6"},{"1":"845","2":"<dbl 
[13]>","3":"3"},{"1":"846","2":"<dbl [13]>","3":"11"},{"1":"847","2":"<dbl [13]>","3":"14"},{"1":"848","2":"<dbl [13]>","3":"12"},{"1":"849","2":"<dbl [13]>","3":"6"},{"1":"850","2":"<dbl [13]>","3":"13"},{"1":"851","2":"<dbl [13]>","3":"12"},{"1":"852","2":"<dbl [13]>","3":"7"},{"1":"853","2":"<dbl [13]>","3":"9"},{"1":"854","2":"<dbl [13]>","3":"19"},{"1":"855","2":"<dbl [13]>","3":"8"},{"1":"856","2":"<dbl [13]>","3":"13"},{"1":"857","2":"<dbl [13]>","3":"11"},{"1":"858","2":"<dbl [13]>","3":"9"},{"1":"859","2":"<dbl [13]>","3":"4"},{"1":"860","2":"<dbl [13]>","3":"13"},{"1":"861","2":"<dbl [13]>","3":"4"},{"1":"862","2":"<dbl [13]>","3":"12"},{"1":"863","2":"<dbl [13]>","3":"10"},{"1":"864","2":"<dbl [13]>","3":"5"},{"1":"865","2":"<dbl [13]>","3":"7"},{"1":"866","2":"<dbl [13]>","3":"10"},{"1":"867","2":"<dbl [13]>","3":"13"},{"1":"868","2":"<dbl [13]>","3":"13"},{"1":"869","2":"<dbl [13]>","3":"11"},{"1":"870","2":"<dbl [13]>","3":"9"},{"1":"871","2":"<dbl [13]>","3":"14"},{"1":"872","2":"<dbl [13]>","3":"11"},{"1":"873","2":"<dbl [13]>","3":"7"},{"1":"874","2":"<dbl [13]>","3":"6"},{"1":"875","2":"<dbl [13]>","3":"12"},{"1":"876","2":"<dbl [13]>","3":"9"},{"1":"877","2":"<dbl [13]>","3":"11"},{"1":"878","2":"<dbl [13]>","3":"15"},{"1":"879","2":"<dbl [13]>","3":"8"},{"1":"880","2":"<dbl [13]>","3":"4"},{"1":"881","2":"<dbl [13]>","3":"7"},{"1":"882","2":"<dbl [13]>","3":"6"},{"1":"883","2":"<dbl [13]>","3":"14"},{"1":"884","2":"<dbl [13]>","3":"12"},{"1":"885","2":"<dbl [13]>","3":"16"},{"1":"886","2":"<dbl [13]>","3":"4"},{"1":"887","2":"<dbl [13]>","3":"2"},{"1":"888","2":"<dbl [13]>","3":"11"},{"1":"889","2":"<dbl [13]>","3":"15"},{"1":"890","2":"<dbl [13]>","3":"7"},{"1":"891","2":"<dbl [13]>","3":"13"},{"1":"892","2":"<dbl [13]>","3":"12"},{"1":"893","2":"<dbl [13]>","3":"17"},{"1":"894","2":"<dbl [13]>","3":"16"},{"1":"895","2":"<dbl [13]>","3":"9"},{"1":"896","2":"<dbl [13]>","3":"16"},{"1":"897","2":"<dbl [13]>","3":"6"},{"1":"898","2":"<dbl 
[13]>","3":"6"},{"1":"899","2":"<dbl [13]>","3":"10"},{"1":"900","2":"<dbl [13]>","3":"5"},{"1":"901","2":"<dbl [13]>","3":"11"},{"1":"902","2":"<dbl [13]>","3":"9"},{"1":"903","2":"<dbl [13]>","3":"8"},{"1":"904","2":"<dbl [13]>","3":"8"},{"1":"905","2":"<dbl [13]>","3":"9"},{"1":"906","2":"<dbl [13]>","3":"3"},{"1":"907","2":"<dbl [13]>","3":"15"},{"1":"908","2":"<dbl [13]>","3":"10"},{"1":"909","2":"<dbl [13]>","3":"8"},{"1":"910","2":"<dbl [13]>","3":"16"},{"1":"911","2":"<dbl [13]>","3":"7"},{"1":"912","2":"<dbl [13]>","3":"9"},{"1":"913","2":"<dbl [13]>","3":"17"},{"1":"914","2":"<dbl [13]>","3":"11"},{"1":"915","2":"<dbl [13]>","3":"12"},{"1":"916","2":"<dbl [13]>","3":"11"},{"1":"917","2":"<dbl [13]>","3":"10"},{"1":"918","2":"<dbl [13]>","3":"3"},{"1":"919","2":"<dbl [13]>","3":"4"},{"1":"920","2":"<dbl [13]>","3":"4"},{"1":"921","2":"<dbl [13]>","3":"11"},{"1":"922","2":"<dbl [13]>","3":"16"},{"1":"923","2":"<dbl [13]>","3":"13"},{"1":"924","2":"<dbl [13]>","3":"7"},{"1":"925","2":"<dbl [13]>","3":"6"},{"1":"926","2":"<dbl [13]>","3":"13"},{"1":"927","2":"<dbl [13]>","3":"8"},{"1":"928","2":"<dbl [13]>","3":"9"},{"1":"929","2":"<dbl [13]>","3":"5"},{"1":"930","2":"<dbl [13]>","3":"11"},{"1":"931","2":"<dbl [13]>","3":"7"},{"1":"932","2":"<dbl [13]>","3":"10"},{"1":"933","2":"<dbl [13]>","3":"7"},{"1":"934","2":"<dbl [13]>","3":"12"},{"1":"935","2":"<dbl [13]>","3":"12"},{"1":"936","2":"<dbl [13]>","3":"4"},{"1":"937","2":"<dbl [13]>","3":"10"},{"1":"938","2":"<dbl [13]>","3":"9"},{"1":"939","2":"<dbl [13]>","3":"17"},{"1":"940","2":"<dbl [13]>","3":"10"},{"1":"941","2":"<dbl [13]>","3":"14"},{"1":"942","2":"<dbl [13]>","3":"12"},{"1":"943","2":"<dbl [13]>","3":"12"},{"1":"944","2":"<dbl [13]>","3":"15"},{"1":"945","2":"<dbl [13]>","3":"6"},{"1":"946","2":"<dbl [13]>","3":"15"},{"1":"947","2":"<dbl [13]>","3":"12"},{"1":"948","2":"<dbl [13]>","3":"7"},{"1":"949","2":"<dbl [13]>","3":"11"},{"1":"950","2":"<dbl [13]>","3":"22"},{"1":"951","2":"<dbl 
[13]>","3":"14"},{"1":"952","2":"<dbl [13]>","3":"17"},{"1":"953","2":"<dbl [13]>","3":"4"},{"1":"954","2":"<dbl [13]>","3":"8"},{"1":"955","2":"<dbl [13]>","3":"11"},{"1":"956","2":"<dbl [13]>","3":"14"},{"1":"957","2":"<dbl [13]>","3":"6"},{"1":"958","2":"<dbl [13]>","3":"12"},{"1":"959","2":"<dbl [13]>","3":"13"},{"1":"960","2":"<dbl [13]>","3":"13"},{"1":"961","2":"<dbl [13]>","3":"5"},{"1":"962","2":"<dbl [13]>","3":"10"},{"1":"963","2":"<dbl [13]>","3":"9"},{"1":"964","2":"<dbl [13]>","3":"11"},{"1":"965","2":"<dbl [13]>","3":"12"},{"1":"966","2":"<dbl [13]>","3":"10"},{"1":"967","2":"<dbl [13]>","3":"4"},{"1":"968","2":"<dbl [13]>","3":"13"},{"1":"969","2":"<dbl [13]>","3":"8"},{"1":"970","2":"<dbl [13]>","3":"7"},{"1":"971","2":"<dbl [13]>","3":"10"},{"1":"972","2":"<dbl [13]>","3":"8"},{"1":"973","2":"<dbl [13]>","3":"10"},{"1":"974","2":"<dbl [13]>","3":"7"},{"1":"975","2":"<dbl [13]>","3":"13"},{"1":"976","2":"<dbl [13]>","3":"10"},{"1":"977","2":"<dbl [13]>","3":"10"},{"1":"978","2":"<dbl [13]>","3":"7"},{"1":"979","2":"<dbl [13]>","3":"4"},{"1":"980","2":"<dbl [13]>","3":"13"},{"1":"981","2":"<dbl [13]>","3":"15"},{"1":"982","2":"<dbl [13]>","3":"6"},{"1":"983","2":"<dbl [13]>","3":"11"},{"1":"984","2":"<dbl [13]>","3":"11"},{"1":"985","2":"<dbl [13]>","3":"8"},{"1":"986","2":"<dbl [13]>","3":"14"},{"1":"987","2":"<dbl [13]>","3":"11"},{"1":"988","2":"<dbl [13]>","3":"13"},{"1":"989","2":"<dbl [13]>","3":"11"},{"1":"990","2":"<dbl [13]>","3":"20"},{"1":"991","2":"<dbl [13]>","3":"9"},{"1":"992","2":"<dbl [13]>","3":"6"},{"1":"993","2":"<dbl [13]>","3":"10"},{"1":"994","2":"<dbl [13]>","3":"11"},{"1":"995","2":"<dbl [13]>","3":"10"},{"1":"996","2":"<dbl [13]>","3":"9"},{"1":"997","2":"<dbl [13]>","3":"13"},{"1":"998","2":"<dbl [13]>","3":"17"},{"1":"999","2":"<dbl [13]>","3":"3"},{"1":"1000","2":"<dbl [13]>","3":"9"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>I stopped it there, partly to show what the dataframe looks like at this point (a hand of 13 point values, and a points total that is the sum of these) and partly because I wanted to do about three things with this, and it made sense to save what we have done thus far.</p>
<p>First, a bar chart of how likely each number of points is:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(d, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> points)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_bar</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/tidy-simulation/index_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>If you do more simulations, you can check whether the shape is indeed smooth (I’m guessing it is). The average number of points is 10 (there are 40 points in the deck and yours is one of four hands) and the distribution is right-skewed because it is possible, though rather unlikely, to get over 20 points.</p>
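<p>The figure of 10 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the usual high-card point count (ace 4, king 3, queen 2, jack 1, everything else 0); the names <code>suit_points</code> and <code>deck_points</code> are made up for illustration and are not part of the simulation above:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># point values for the 13 ranks in one suit: A, K, Q, J, then nine spot cards
suit_points &lt;- c(4, 3, 2, 1, rep(0, 9))
# four suits make the full deck of 52 cards
deck_points &lt;- rep(suit_points, times = 4)
# total points in the deck
sum(deck_points)
# expected points in a 13-card hand: 13 cards times the mean points per card
13 * mean(deck_points)</code></pre></div>
</div>
<p>The two results are 40 and 10, matching the reasoning in the text.</p>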
<p>In most bidding systems, having 13 or more points justifies opening the bidding (making the first bid in the auction if everyone before you has passed). How likely is that?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(points <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["points >= 13"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"749"},{"1":"TRUE","2":"251"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Only about a quarter of the time.</p>
<p>Having 20 or more points qualifies your hand for an opening bid at the 2-level.<sup>10</sup> How likely is that?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(points <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["points >= 20"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"985"},{"1":"TRUE","2":"15"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>A bit of a rarity, less than a 2% shot.</p>
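<p>With only 1000 simulated hands, both of these proportions carry some simulation error. One rough way to gauge it is an exact binomial confidence interval for each proportion. The sketch below uses base R’s <code>binom.test</code> with the counts from the two tables above (251 and 15 out of 1000); the variable names are illustrative only:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">n_sims &lt;- 1000
# hands with 13 or more points, and with 20 or more points, from the tables above
opening &lt;- binom.test(251, n_sims)
two_level &lt;- binom.test(15, n_sims)
# estimated proportions
opening$estimate
two_level$estimate
# 95% confidence intervals reflecting simulation error
opening$conf.int
two_level$conf.int</code></pre></div>
</div>
<p>Running more simulations would narrow both intervals.</p>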
</section>
<section id="bootstrapping-a-sampling-distribution" class="level2">
<h2 class="anchored" data-anchor-id="bootstrapping-a-sampling-distribution">Bootstrapping a sampling distribution</h2>
<p>To return to the messy world of actual applied statistics: there are a lot of procedures based on an assumption of the right things having normal distributions.<sup>11</sup> One of the commonest questions is whether we should be using the normal-theory procedure or something else (non-parametric, maybe). Let’s take an example. The data <a href="http://ritsokiguess.site/datafiles/jays15-home.csv">here</a> are information about Toronto Blue Jays baseball games from the early part of the 2015 season:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/jays15-home.csv"</span></span>
<span id="cb26-2">jays <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(my_url)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 25 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (12): date, box, team, opp, result, wl, gb, winner, loser, save, Daynig...
dbl   (7): row, game, runs, Oppruns, innings, position, attendance
lgl   (1): venue
time  (1): game time

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">jays</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["row"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["game"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["date"],"name":[3],"type":["chr"],"align":["left"]},{"label":["box"],"name":[4],"type":["chr"],"align":["left"]},{"label":["team"],"name":[5],"type":["chr"],"align":["left"]},{"label":["venue"],"name":[6],"type":["lgl"],"align":["right"]},{"label":["opp"],"name":[7],"type":["chr"],"align":["left"]},{"label":["result"],"name":[8],"type":["chr"],"align":["left"]},{"label":["runs"],"name":[9],"type":["dbl"],"align":["right"]},{"label":["Oppruns"],"name":[10],"type":["dbl"],"align":["right"]},{"label":["innings"],"name":[11],"type":["dbl"],"align":["right"]},{"label":["wl"],"name":[12],"type":["chr"],"align":["left"]},{"label":["position"],"name":[13],"type":["dbl"],"align":["right"]},{"label":["gb"],"name":[14],"type":["chr"],"align":["left"]},{"label":["winner"],"name":[15],"type":["chr"],"align":["left"]},{"label":["loser"],"name":[16],"type":["chr"],"align":["left"]},{"label":["save"],"name":[17],"type":["chr"],"align":["left"]},{"label":["game time"],"name":[18],"type":["time"],"align":["right"]},{"label":["Daynight"],"name":[19],"type":["chr"],"align":["left"]},{"label":["attendance"],"name":[20],"type":["dbl"],"align":["right"]},{"label":["streak"],"name":[21],"type":["chr"],"align":["left"]}],"data":[{"1":"82","2":"7","3":"Monday, Apr 13","4":"boxscore","5":"TOR","6":"NA","7":"TBR","8":"L","9":"1","10":"2","11":"NA","12":"4-3","13":"2","14":"1","15":"Odorizzi","16":"Dickey","17":"Boxberger","18":"02:30:00","19":"N","20":"48414","21":"-"},{"1":"83","2":"8","3":"Tuesday, Apr 14","4":"boxscore","5":"TOR","6":"NA","7":"TBR","8":"L","9":"2","10":"3","11":"NA","12":"4-4","13":"3","14":"2","15":"Geltz","16":"Castro","17":"Jepsen","18":"03:06:00","19":"N","20":"17264","21":"--"},{"1":"84","2":"9","3":"Wednesday, Apr 
15","4":"boxscore","5":"TOR","6":"NA","7":"TBR","8":"W","9":"12","10":"7","11":"NA","12":"5-4","13":"2","14":"1","15":"Buehrle","16":"Ramirez","17":"NA","18":"03:02:00","19":"N","20":"15086","21":"+"},{"1":"85","2":"10","3":"Thursday, Apr 16","4":"boxscore","5":"TOR","6":"NA","7":"TBR","8":"L","9":"2","10":"4","11":"NA","12":"5-5","13":"4","14":"1.5","15":"Archer","16":"Sanchez","17":"Boxberger","18":"03:00:00","19":"N","20":"14433","21":"-"},{"1":"86","2":"11","3":"Friday, Apr 17","4":"boxscore","5":"TOR","6":"NA","7":"ATL","8":"L","9":"7","10":"8","11":"NA","12":"5-6","13":"4","14":"2.5","15":"Martin","16":"Cecil","17":"Grilli","18":"03:09:00","19":"N","20":"21397","21":"--"},{"1":"87","2":"12","3":"Saturday, Apr 18","4":"boxscore","5":"TOR","6":"NA","7":"ATL","8":"W-wo","9":"6","10":"5","11":"10","12":"6-6","13":"3","14":"1.5","15":"Cecil","16":"Marimon","17":"NA","18":"02:41:00","19":"D","20":"34743","21":"+"},{"1":"88","2":"13","3":"Sunday, Apr 19","4":"boxscore","5":"TOR","6":"NA","7":"ATL","8":"L","9":"2","10":"5","11":"NA","12":"6-7","13":"4","14":"1.5","15":"Miller","16":"Norris","17":"Grilli","18":"02:41:00","19":"D","20":"44794","21":"-"},{"1":"89","2":"14","3":"Tuesday, Apr 21","4":"boxscore","5":"TOR","6":"NA","7":"BAL","8":"W","9":"13","10":"6","11":"NA","12":"7-7","13":"2","14":"2","15":"Buehrle","16":"Norris","17":"NA","18":"02:53:00","19":"N","20":"14184","21":"+"},{"1":"90","2":"15","3":"Wednesday, Apr 22","4":"boxscore","5":"TOR","6":"NA","7":"BAL","8":"W","9":"4","10":"2","11":"NA","12":"8-7","13":"2","14":"1","15":"Sanchez","16":"Jimenez","17":"Castro","18":"02:36:00","19":"N","20":"15606","21":"++"},{"1":"91","2":"16","3":"Thursday, Apr 23","4":"boxscore","5":"TOR","6":"NA","7":"BAL","8":"W","9":"7","10":"6","11":"NA","12":"9-7","13":"1","14":"Tied","15":"Hutchison","16":"Tillman","17":"Castro","18":"02:36:00","19":"N","20":"18581","21":"+++"},{"1":"92","2":"27","3":"Monday, May 
4","4":"boxscore","5":"TOR","6":"NA","7":"NYY","8":"W","9":"3","10":"1","11":"NA","12":"13-14","13":"4","14":"3.5","15":"Dickey","16":"Martin","17":"Cecil","18":"02:18:00","19":"N","20":"19217","21":"+"},{"1":"93","2":"28","3":"Tuesday, May 5","4":"boxscore","5":"TOR","6":"NA","7":"NYY","8":"L","9":"3","10":"6","11":"NA","12":"13-15","13":"5","14":"4.5","15":"Pineda","16":"Estrada","17":"Miller","18":"02:54:00","19":"N","20":"21519","21":"-"},{"1":"94","2":"29","3":"Wednesday, May 6","4":"boxscore","5":"TOR","6":"NA","7":"NYY","8":"W","9":"5","10":"1","11":"NA","12":"14-15","13":"3","14":"3.5","15":"Buehrle","16":"Sabathia","17":"NA","18":"02:30:00","19":"N","20":"21312","21":"+"},{"1":"95","2":"30","3":"Friday, May 8","4":"boxscore","5":"TOR","6":"NA","7":"BOS","8":"W","9":"7","10":"0","11":"NA","12":"15-15","13":"3","14":"4","15":"Sanchez","16":"Miley","17":"NA","18":"02:41:00","19":"N","20":"30430","21":"++"},{"1":"96","2":"31","3":"Saturday, May 9","4":"boxscore","5":"TOR","6":"NA","7":"BOS","8":"W","9":"7","10":"1","11":"NA","12":"16-15","13":"3","14":"3","15":"Hutchison","16":"Kelly","17":"NA","18":"03:12:00","19":"D","20":"42917","21":"+++"},{"1":"97","2":"32","3":"Sunday, May 10","4":"boxscore","5":"TOR","6":"NA","7":"BOS","8":"L","9":"3","10":"6","11":"NA","12":"16-16","13":"3","14":"4","15":"Buchholz","16":"Dickey","17":"Uehara","18":"02:38:00","19":"D","20":"42419","21":"-"},{"1":"98","2":"40","3":"Monday, May 18","4":"boxscore","5":"TOR","6":"NA","7":"LAA","8":"W","9":"10","10":"6","11":"NA","12":"18-22","13":"5","14":"4.5","15":"Osuna","16":"Morin","17":"NA","18":"03:28:00","19":"D","20":"29306","21":"+"},{"1":"99","2":"41","3":"Tuesday, May 19","4":"boxscore","5":"TOR","6":"NA","7":"LAA","8":"L","9":"2","10":"3","11":"NA","12":"18-23","13":"5","14":"4.5","15":"Santiago","16":"Sanchez","17":"Street","18":"02:32:00","19":"N","20":"15062","21":"-"},{"1":"100","2":"42","3":"Wednesday, May 
20","4":"boxscore","5":"TOR","6":"NA","7":"LAA","8":"L","9":"3","10":"4","11":"NA","12":"18-24","13":"5","14":"4.5","15":"Weaver","16":"Hutchison","17":"Street","18":"02:36:00","19":"N","20":"16402","21":"--"},{"1":"101","2":"43","3":"Thursday, May 21","4":"boxscore","5":"TOR","6":"NA","7":"LAA","8":"W","9":"8","10":"4","11":"NA","12":"19-24","13":"5","14":"4.5","15":"Dickey","16":"Shoemaker","17":"NA","18":"02:22:00","19":"N","20":"19014","21":"+"},{"1":"102","2":"44","3":"Friday, May 22","4":"boxscore","5":"TOR","6":"NA","7":"SEA","8":"L","9":"3","10":"4","11":"NA","12":"19-25","13":"5","14":"5.5","15":"Hernandez","16":"Estrada","17":"Rodney","18":"02:30:00","19":"N","20":"21195","21":"-"},{"1":"103","2":"45","3":"Saturday, May 23","4":"boxscore","5":"TOR","6":"NA","7":"SEA","8":"L","9":"2","10":"3","11":"NA","12":"19-26","13":"5","14":"5.5","15":"Paxton","16":"Buehrle","17":"Rodney","18":"02:25:00","19":"D","20":"33086","21":"--"},{"1":"104","2":"46","3":"Sunday, May 24","4":"boxscore","5":"TOR","6":"NA","7":"SEA","8":"W","9":"8","10":"2","11":"NA","12":"20-26","13":"5","14":"4.5","15":"Sanchez","16":"Walker","17":"NA","18":"02:50:00","19":"D","20":"37929","21":"+"},{"1":"105","2":"47","3":"Monday, May 25","4":"boxscore","5":"TOR","6":"NA","7":"CHW","8":"W","9":"6","10":"0","11":"NA","12":"21-26","13":"5","14":"3.5","15":"Hutchison","16":"Noesi","17":"NA","18":"02:10:00","19":"N","20":"15168","21":"++"},{"1":"106","2":"48","3":"Tuesday, May 26","4":"boxscore","5":"TOR","6":"NA","7":"CHW","8":"W-wo","9":"10","10":"9","11":"NA","12":"22-26","13":"4","14":"3","15":"Delabar","16":"Robertson","17":"NA","18":"03:12:00","19":"N","20":"17276","21":"+++"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>There is a lot of information here, but we’re going to focus on the attendances over near the right side, and in particular, we’re interested in the mean attendance over all games of which these are a sample (“early-season Blue Jays games in the years between 2010 and 2019”, or something like that). There are, of course, lots of reasons that attendances might vary (opposition, weather, weekend vs.&nbsp;weekday, etc.) that we are going to completely ignore here.</p>
<p>The normal<sup>12</sup> way to estimate a population mean is to use the confidence interval based on the one-sample <img src="https://latex.codecogs.com/png.latex?t">-test, but before we jump into that, we should look at a graph of the attendances:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(jays, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> attendance)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/tidy-simulation/index_files/figure-html/unnamed-chunk-19-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Well, that doesn’t look much like a normal distribution. It’s very much skewed to the right. There seem to be two<sup>13</sup> schools of thought as to what we should do now:</p>
<ul>
<li>we have a large enough sample (<img src="https://latex.codecogs.com/png.latex?n%20=%2025">) so that we should get enough help from the central limit theorem (also expressed as “the <img src="https://latex.codecogs.com/png.latex?t">-test is robust to non-normality”) and therefore the <img src="https://latex.codecogs.com/png.latex?t">-procedure should be fine.</li>
<li>this distribution is a long way from being normal, so we should not use a <img src="https://latex.codecogs.com/png.latex?t">-procedure at all; instead, we should use a sign test or signed-rank test,<sup>14</sup> inverted to get a confidence interval for the median attendance.</li>
</ul>
<p>Both of these have an air of handwavery about them. How do we decide between them? Well, let’s think about this a little more carefully. When it comes to getting confidence limits, it all depends on the <em>sampling distribution of the sample mean</em>. If that is close enough to normal, the <img src="https://latex.codecogs.com/png.latex?t">-interval is good. But this comes from repeated sampling. You conceptualize it by imagining taking lots of samples from the same population, working out the mean of each sample, and making something like a histogram or normal quantile plot of those. But we only have the one sample. How, then, do we think about the other sample means we might have got?</p>
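<p>When the population is known, the repeated-sampling idea can be carried out literally. Here is a minimal sketch, assuming (purely for illustration) an exponential population with mean 25000 as a stand-in for something right-skewed. The names <code>my_sample</code>, <code>my_mean</code> and <code>sample_means</code> are made up here, and none of this uses the attendance data:</p>

```r
library(tidyverse)

set.seed(457299)  # arbitrary seed so that the sketch is reproducible

# Assumption: an exponential population with mean 25000 stands in for a
# right-skewed population; this is NOT the Blue Jays attendance data
tibble(sim = 1:1000) %>% 
  rowwise() %>% 
  mutate(my_sample = list(rexp(25, rate = 1 / 25000))) %>%  # one sample of size 25 per row
  mutate(my_mean = mean(my_sample)) %>%                      # the mean of that sample
  ungroup() -> sample_means

# Normal quantile plot of the 1000 simulated sample means
ggplot(sample_means, aes(sample = my_mean)) + stat_qq() + stat_qq_line()
```

<p>If the points on the normal quantile plot follow the line reasonably well, the sampling distribution of the sample mean is close to normal for samples of this size from that population. With real data, though, there is no known population to draw fresh samples from.</p>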
<p>A way around this is to use the <strong>bootstrap</strong>. The idea is to think of the sample we have as a population (resembling, we hope, the population of “all possible attendances” that we want to make inferences about) and to take samples from our sample(!) of the same size as the sample we had. If we do this the obvious way (without replacement), we’ll get back the original sample we had, possibly in a different order, every time. So what we do instead is to sample from our sample, but <em>with</em> replacement so as to get a different set of values each time, with some values missing and some values repeated. Like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">s <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(jays<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>attendance, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb30-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sort</span>(s)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] 15062 15062 15086 15086 15086 15168 16402 17264 17276 17276 18581 19217
[13] 21519 21519 21519 21519 21519 29306 29306 33086 34743 37929 42419 42917
[25] 44794</code></pre>
</div>
</div>
<p>Sorting the sample reveals that the first two values and the next three are repeats, so there must be some values from the original sample that are missing. (This is the only reason I sorted them.)</p>
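<p>Rather than sorting and looking, the repeats and the missing values can be counted directly. This is a small sketch on a made-up vector <code>x</code> of distinct values (an assumption for illustration, not the attendance data):</p>

```r
set.seed(457299)  # arbitrary seed so that the sketch is reproducible

# Assumption: a small made-up "sample" of distinct values
x <- c(11, 14, 17, 20, 23, 26, 29, 32)

# A bootstrap sample: same size as x, drawn with replacement
s <- sample(x, replace = TRUE)

table(s)            # how many times each sampled value appears
setdiff(x, s)       # original values that did not make it into this bootstrap sample
length(unique(s))   # number of distinct values that were drawn
```

<p>Every value in <code>x</code> is either drawn at least once or shows up in the <code>setdiff</code>, so the last two results always account for all of <code>x</code>. For a sample of size 25, the expected proportion of the original observations that appear at least once in a bootstrap sample is <code>1 - (24/25)^25</code>, or about 64%.</p>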
<p>The original data had a mean of</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">jays <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarise</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mean_att =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(attendance))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["mean_att"],"name":[1],"type":["dbl"],"align":["right"]}],"data":[{"1":"25070.16"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>but the bootstrap sample has a mean of</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(s)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 23946.44</code></pre>
</div>
</div>
<p>which is different. If we were to take more bootstrap samples and find the mean of each one, we would get a sense of the sampling distribution of the sample mean. That is to say, we <em>simulate</em> the bootstrapped sampling distribution of the sample mean. Given what we’ve seen in the other simulations, the structure of the code below ought to come as no surprise:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb35-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb35-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">s =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(jays<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>attendance, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">replace =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb35-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">m =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(s)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb35-5">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["sim"],"name":[1],"type":["int"],"align":["right"]},{"label":["s"],"name":[2],"type":["list"],"align":["right"]},{"label":["m"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"<dbl [25]>","3":"22328.24"},{"1":"2","2":"<dbl [25]>","3":"29363.76"},{"1":"3","2":"<dbl [25]>","3":"24193.64"},{"1":"4","2":"<dbl [25]>","3":"22907.08"},{"1":"5","2":"<dbl [25]>","3":"22604.32"},{"1":"6","2":"<dbl [25]>","3":"22905.20"},{"1":"7","2":"<dbl [25]>","3":"23956.76"},{"1":"8","2":"<dbl [25]>","3":"23566.96"},{"1":"9","2":"<dbl [25]>","3":"24943.72"},{"1":"10","2":"<dbl [25]>","3":"27149.84"},{"1":"11","2":"<dbl [25]>","3":"24689.68"},{"1":"12","2":"<dbl [25]>","3":"29450.48"},{"1":"13","2":"<dbl [25]>","3":"27092.08"},{"1":"14","2":"<dbl [25]>","3":"24950.48"},{"1":"15","2":"<dbl [25]>","3":"27056.88"},{"1":"16","2":"<dbl [25]>","3":"29750.72"},{"1":"17","2":"<dbl [25]>","3":"26114.32"},{"1":"18","2":"<dbl [25]>","3":"28310.60"},{"1":"19","2":"<dbl [25]>","3":"27962.92"},{"1":"20","2":"<dbl [25]>","3":"25474.72"},{"1":"21","2":"<dbl [25]>","3":"24925.20"},{"1":"22","2":"<dbl [25]>","3":"25694.80"},{"1":"23","2":"<dbl [25]>","3":"25906.48"},{"1":"24","2":"<dbl [25]>","3":"25864.88"},{"1":"25","2":"<dbl [25]>","3":"25613.20"},{"1":"26","2":"<dbl [25]>","3":"24231.96"},{"1":"27","2":"<dbl [25]>","3":"27809.72"},{"1":"28","2":"<dbl [25]>","3":"21860.92"},{"1":"29","2":"<dbl [25]>","3":"24208.52"},{"1":"30","2":"<dbl [25]>","3":"22599.00"},{"1":"31","2":"<dbl [25]>","3":"25477.48"},{"1":"32","2":"<dbl [25]>","3":"30260.52"},{"1":"33","2":"<dbl [25]>","3":"25947.80"},{"1":"34","2":"<dbl [25]>","3":"30445.32"},{"1":"35","2":"<dbl [25]>","3":"24801.44"},{"1":"36","2":"<dbl [25]>","3":"26660.80"},{"1":"37","2":"<dbl [25]>","3":"24440.08"},{"1":"38","2":"<dbl [25]>","3":"23743.60"},{"1":"39","2":"<dbl [25]>","3":"28456.04"},{"1":"40","2":"<dbl [25]>","3":"25558.32"},{"1":"41","2":"<dbl [25]>","3":"26199.88"},{"1":"42","2":"<dbl 
[25]>","3":"26557.20"},{"1":"43","2":"<dbl [25]>","3":"24809.68"},{"1":"44","2":"<dbl [25]>","3":"25503.56"},{"1":"45","2":"<dbl [25]>","3":"22932.20"},{"1":"46","2":"<dbl [25]>","3":"22614.48"},{"1":"47","2":"<dbl [25]>","3":"22889.56"},{"1":"48","2":"<dbl [25]>","3":"22707.60"},{"1":"49","2":"<dbl [25]>","3":"22143.44"},{"1":"50","2":"<dbl [25]>","3":"25165.92"},{"1":"51","2":"<dbl [25]>","3":"25398.28"},{"1":"52","2":"<dbl [25]>","3":"26414.72"},{"1":"53","2":"<dbl [25]>","3":"26757.96"},{"1":"54","2":"<dbl [25]>","3":"23003.76"},{"1":"55","2":"<dbl [25]>","3":"24830.96"},{"1":"56","2":"<dbl [25]>","3":"22968.96"},{"1":"57","2":"<dbl [25]>","3":"23267.08"},{"1":"58","2":"<dbl [25]>","3":"24961.44"},{"1":"59","2":"<dbl [25]>","3":"24659.84"},{"1":"60","2":"<dbl [25]>","3":"25865.36"},{"1":"61","2":"<dbl [25]>","3":"23100.08"},{"1":"62","2":"<dbl [25]>","3":"23845.56"},{"1":"63","2":"<dbl [25]>","3":"23471.28"},{"1":"64","2":"<dbl [25]>","3":"25033.20"},{"1":"65","2":"<dbl [25]>","3":"24442.04"},{"1":"66","2":"<dbl [25]>","3":"24553.48"},{"1":"67","2":"<dbl [25]>","3":"27273.64"},{"1":"68","2":"<dbl [25]>","3":"29995.20"},{"1":"69","2":"<dbl [25]>","3":"27038.24"},{"1":"70","2":"<dbl [25]>","3":"24066.68"},{"1":"71","2":"<dbl [25]>","3":"22675.28"},{"1":"72","2":"<dbl [25]>","3":"25429.72"},{"1":"73","2":"<dbl [25]>","3":"26133.80"},{"1":"74","2":"<dbl [25]>","3":"30593.72"},{"1":"75","2":"<dbl [25]>","3":"25758.40"},{"1":"76","2":"<dbl [25]>","3":"24698.84"},{"1":"77","2":"<dbl [25]>","3":"23416.60"},{"1":"78","2":"<dbl [25]>","3":"26464.80"},{"1":"79","2":"<dbl [25]>","3":"24968.00"},{"1":"80","2":"<dbl [25]>","3":"25810.48"},{"1":"81","2":"<dbl [25]>","3":"27907.32"},{"1":"82","2":"<dbl [25]>","3":"26470.84"},{"1":"83","2":"<dbl [25]>","3":"26150.84"},{"1":"84","2":"<dbl [25]>","3":"29442.08"},{"1":"85","2":"<dbl [25]>","3":"26966.24"},{"1":"86","2":"<dbl [25]>","3":"27422.12"},{"1":"87","2":"<dbl [25]>","3":"23569.08"},{"1":"88","2":"<dbl 
[25]>","3":"24321.28"},{"1":"89","2":"<dbl [25]>","3":"27376.36"},{"1":"90","2":"<dbl [25]>","3":"23109.48"},{"1":"91","2":"<dbl [25]>","3":"24693.04"},{"1":"92","2":"<dbl [25]>","3":"24922.44"},{"1":"93","2":"<dbl [25]>","3":"24351.80"},{"1":"94","2":"<dbl [25]>","3":"23089.64"},{"1":"95","2":"<dbl [25]>","3":"25246.60"},{"1":"96","2":"<dbl [25]>","3":"24235.60"},{"1":"97","2":"<dbl [25]>","3":"26758.20"},{"1":"98","2":"<dbl [25]>","3":"25337.44"},{"1":"99","2":"<dbl [25]>","3":"23846.28"},{"1":"100","2":"<dbl [25]>","3":"26825.16"},{"1":"101","2":"<dbl [25]>","3":"26696.32"},{"1":"102","2":"<dbl [25]>","3":"27730.24"},{"1":"103","2":"<dbl [25]>","3":"22622.92"},{"1":"104","2":"<dbl [25]>","3":"26323.60"},{"1":"105","2":"<dbl [25]>","3":"24849.32"},{"1":"106","2":"<dbl [25]>","3":"24085.40"},{"1":"107","2":"<dbl [25]>","3":"28418.88"},{"1":"108","2":"<dbl [25]>","3":"26292.04"},{"1":"109","2":"<dbl [25]>","3":"25219.04"},{"1":"110","2":"<dbl [25]>","3":"27100.32"},{"1":"111","2":"<dbl [25]>","3":"23916.96"},{"1":"112","2":"<dbl [25]>","3":"26991.76"},{"1":"113","2":"<dbl [25]>","3":"26068.40"},{"1":"114","2":"<dbl [25]>","3":"25132.12"},{"1":"115","2":"<dbl [25]>","3":"24001.48"},{"1":"116","2":"<dbl [25]>","3":"26696.40"},{"1":"117","2":"<dbl [25]>","3":"22732.60"},{"1":"118","2":"<dbl [25]>","3":"23556.68"},{"1":"119","2":"<dbl [25]>","3":"27564.20"},{"1":"120","2":"<dbl [25]>","3":"24606.72"},{"1":"121","2":"<dbl [25]>","3":"23246.92"},{"1":"122","2":"<dbl [25]>","3":"26101.76"},{"1":"123","2":"<dbl [25]>","3":"25381.08"},{"1":"124","2":"<dbl [25]>","3":"22986.00"},{"1":"125","2":"<dbl [25]>","3":"25739.60"},{"1":"126","2":"<dbl [25]>","3":"24206.68"},{"1":"127","2":"<dbl [25]>","3":"26700.40"},{"1":"128","2":"<dbl [25]>","3":"25077.40"},{"1":"129","2":"<dbl [25]>","3":"26731.84"},{"1":"130","2":"<dbl [25]>","3":"23492.08"},{"1":"131","2":"<dbl [25]>","3":"23344.84"},{"1":"132","2":"<dbl [25]>","3":"26618.00"},{"1":"133","2":"<dbl 
[25]>","3":"28368.68"},{"1":"134","2":"<dbl [25]>","3":"24155.40"},{"1":"135","2":"<dbl [25]>","3":"25222.96"},{"1":"136","2":"<dbl [25]>","3":"21046.80"},{"1":"137","2":"<dbl [25]>","3":"24793.92"},{"1":"138","2":"<dbl [25]>","3":"24801.00"},{"1":"139","2":"<dbl [25]>","3":"24656.08"},{"1":"140","2":"<dbl [25]>","3":"23776.04"},{"1":"141","2":"<dbl [25]>","3":"28508.28"},{"1":"142","2":"<dbl [25]>","3":"26074.44"},{"1":"143","2":"<dbl [25]>","3":"23203.24"},{"1":"144","2":"<dbl [25]>","3":"28235.84"},{"1":"145","2":"<dbl [25]>","3":"23134.64"},{"1":"146","2":"<dbl [25]>","3":"25498.92"},{"1":"147","2":"<dbl [25]>","3":"26066.60"},{"1":"148","2":"<dbl [25]>","3":"24830.80"},{"1":"149","2":"<dbl [25]>","3":"19995.00"},{"1":"150","2":"<dbl [25]>","3":"23970.52"},{"1":"151","2":"<dbl [25]>","3":"25016.76"},{"1":"152","2":"<dbl [25]>","3":"23719.92"},{"1":"153","2":"<dbl [25]>","3":"27908.04"},{"1":"154","2":"<dbl [25]>","3":"26769.20"},{"1":"155","2":"<dbl [25]>","3":"23571.88"},{"1":"156","2":"<dbl [25]>","3":"27517.64"},{"1":"157","2":"<dbl [25]>","3":"24691.76"},{"1":"158","2":"<dbl [25]>","3":"27483.76"},{"1":"159","2":"<dbl [25]>","3":"22318.88"},{"1":"160","2":"<dbl [25]>","3":"21996.20"},{"1":"161","2":"<dbl [25]>","3":"26633.60"},{"1":"162","2":"<dbl [25]>","3":"27708.08"},{"1":"163","2":"<dbl [25]>","3":"26954.08"},{"1":"164","2":"<dbl [25]>","3":"27075.96"},{"1":"165","2":"<dbl [25]>","3":"24387.52"},{"1":"166","2":"<dbl [25]>","3":"24700.08"},{"1":"167","2":"<dbl [25]>","3":"25754.88"},{"1":"168","2":"<dbl [25]>","3":"26785.44"},{"1":"169","2":"<dbl [25]>","3":"27324.04"},{"1":"170","2":"<dbl [25]>","3":"28040.40"},{"1":"171","2":"<dbl [25]>","3":"29174.80"},{"1":"172","2":"<dbl [25]>","3":"29951.76"},{"1":"173","2":"<dbl [25]>","3":"26012.20"},{"1":"174","2":"<dbl [25]>","3":"20123.12"},{"1":"175","2":"<dbl [25]>","3":"24158.00"},{"1":"176","2":"<dbl [25]>","3":"26639.28"},{"1":"177","2":"<dbl [25]>","3":"23756.92"},{"1":"178","2":"<dbl 
[25]>","3":"20516.24"},{"1":"179","2":"<dbl [25]>","3":"22804.36"},{"1":"180","2":"<dbl [25]>","3":"23501.92"},{"1":"181","2":"<dbl [25]>","3":"28515.88"},{"1":"182","2":"<dbl [25]>","3":"27795.64"},{"1":"183","2":"<dbl [25]>","3":"24586.52"},{"1":"184","2":"<dbl [25]>","3":"24129.16"},{"1":"185","2":"<dbl [25]>","3":"23432.80"},{"1":"186","2":"<dbl [25]>","3":"26059.72"},{"1":"187","2":"<dbl [25]>","3":"24888.52"},{"1":"188","2":"<dbl [25]>","3":"22565.80"},{"1":"189","2":"<dbl [25]>","3":"27391.92"},{"1":"190","2":"<dbl [25]>","3":"27796.28"},{"1":"191","2":"<dbl [25]>","3":"27477.84"},{"1":"192","2":"<dbl [25]>","3":"23197.12"},{"1":"193","2":"<dbl [25]>","3":"26103.96"},{"1":"194","2":"<dbl [25]>","3":"23225.84"},{"1":"195","2":"<dbl [25]>","3":"22955.40"},{"1":"196","2":"<dbl [25]>","3":"22773.52"},{"1":"197","2":"<dbl [25]>","3":"20978.44"},{"1":"198","2":"<dbl [25]>","3":"23525.60"},{"1":"199","2":"<dbl [25]>","3":"22553.32"},{"1":"200","2":"<dbl [25]>","3":"22022.00"},{"1":"201","2":"<dbl [25]>","3":"23994.44"},{"1":"202","2":"<dbl [25]>","3":"25376.20"},{"1":"203","2":"<dbl [25]>","3":"23576.12"},{"1":"204","2":"<dbl [25]>","3":"27398.20"},{"1":"205","2":"<dbl [25]>","3":"25950.84"},{"1":"206","2":"<dbl [25]>","3":"25990.64"},{"1":"207","2":"<dbl [25]>","3":"25094.04"},{"1":"208","2":"<dbl [25]>","3":"26100.08"},{"1":"209","2":"<dbl [25]>","3":"27924.56"},{"1":"210","2":"<dbl [25]>","3":"27775.92"},{"1":"211","2":"<dbl [25]>","3":"23106.04"},{"1":"212","2":"<dbl [25]>","3":"25514.48"},{"1":"213","2":"<dbl [25]>","3":"26145.88"},{"1":"214","2":"<dbl [25]>","3":"29640.52"},{"1":"215","2":"<dbl [25]>","3":"24154.08"},{"1":"216","2":"<dbl [25]>","3":"25731.92"},{"1":"217","2":"<dbl [25]>","3":"24343.48"},{"1":"218","2":"<dbl [25]>","3":"26748.92"},{"1":"219","2":"<dbl [25]>","3":"26159.00"},{"1":"220","2":"<dbl [25]>","3":"27085.76"},{"1":"221","2":"<dbl [25]>","3":"25111.16"},{"1":"222","2":"<dbl [25]>","3":"24089.36"},{"1":"223","2":"<dbl 
[25]>","3":"24269.64"},{"1":"224","2":"<dbl [25]>","3":"27047.76"},{"1":"225","2":"<dbl [25]>","3":"23138.96"},{"1":"226","2":"<dbl [25]>","3":"32821.44"},{"1":"227","2":"<dbl [25]>","3":"22307.72"},{"1":"228","2":"<dbl [25]>","3":"28259.72"},{"1":"229","2":"<dbl [25]>","3":"20867.32"},{"1":"230","2":"<dbl [25]>","3":"26074.08"},{"1":"231","2":"<dbl [25]>","3":"25642.68"},{"1":"232","2":"<dbl [25]>","3":"21771.24"},{"1":"233","2":"<dbl [25]>","3":"22610.44"},{"1":"234","2":"<dbl [25]>","3":"23719.20"},{"1":"235","2":"<dbl [25]>","3":"25747.04"},{"1":"236","2":"<dbl [25]>","3":"30744.16"},{"1":"237","2":"<dbl [25]>","3":"22970.04"},{"1":"238","2":"<dbl [25]>","3":"19344.56"},{"1":"239","2":"<dbl [25]>","3":"24925.12"},{"1":"240","2":"<dbl [25]>","3":"25089.20"},{"1":"241","2":"<dbl [25]>","3":"26419.64"},{"1":"242","2":"<dbl [25]>","3":"27927.48"},{"1":"243","2":"<dbl [25]>","3":"20719.36"},{"1":"244","2":"<dbl [25]>","3":"23667.56"},{"1":"245","2":"<dbl [25]>","3":"25832.44"},{"1":"246","2":"<dbl [25]>","3":"22953.96"},{"1":"247","2":"<dbl [25]>","3":"26278.92"},{"1":"248","2":"<dbl [25]>","3":"23010.88"},{"1":"249","2":"<dbl [25]>","3":"26377.64"},{"1":"250","2":"<dbl [25]>","3":"25469.24"},{"1":"251","2":"<dbl [25]>","3":"28234.56"},{"1":"252","2":"<dbl [25]>","3":"24706.00"},{"1":"253","2":"<dbl [25]>","3":"25370.08"},{"1":"254","2":"<dbl [25]>","3":"25027.52"},{"1":"255","2":"<dbl [25]>","3":"26107.36"},{"1":"256","2":"<dbl [25]>","3":"23955.64"},{"1":"257","2":"<dbl [25]>","3":"25696.84"},{"1":"258","2":"<dbl [25]>","3":"20927.24"},{"1":"259","2":"<dbl [25]>","3":"23332.32"},{"1":"260","2":"<dbl [25]>","3":"25495.00"},{"1":"261","2":"<dbl [25]>","3":"25995.40"},{"1":"262","2":"<dbl [25]>","3":"25773.48"},{"1":"263","2":"<dbl [25]>","3":"25325.80"},{"1":"264","2":"<dbl [25]>","3":"24843.04"},{"1":"265","2":"<dbl [25]>","3":"25666.32"},{"1":"266","2":"<dbl [25]>","3":"25475.20"},{"1":"267","2":"<dbl [25]>","3":"24857.96"},{"1":"268","2":"<dbl 
[25]>","3":"23149.92"},{"1":"269","2":"<dbl [25]>","3":"24974.96"},{"1":"270","2":"<dbl [25]>","3":"23692.16"},{"1":"271","2":"<dbl [25]>","3":"22884.56"},{"1":"272","2":"<dbl [25]>","3":"24017.72"},{"1":"273","2":"<dbl [25]>","3":"25173.56"},{"1":"274","2":"<dbl [25]>","3":"25703.24"},{"1":"275","2":"<dbl [25]>","3":"26935.68"},{"1":"276","2":"<dbl [25]>","3":"25784.44"},{"1":"277","2":"<dbl [25]>","3":"22149.16"},{"1":"278","2":"<dbl [25]>","3":"26414.84"},{"1":"279","2":"<dbl [25]>","3":"26939.80"},{"1":"280","2":"<dbl [25]>","3":"20963.28"},{"1":"281","2":"<dbl [25]>","3":"22976.76"},{"1":"282","2":"<dbl [25]>","3":"27026.36"},{"1":"283","2":"<dbl [25]>","3":"22926.84"},{"1":"284","2":"<dbl [25]>","3":"29529.08"},{"1":"285","2":"<dbl [25]>","3":"24593.92"},{"1":"286","2":"<dbl [25]>","3":"25269.36"},{"1":"287","2":"<dbl [25]>","3":"24134.52"},{"1":"288","2":"<dbl [25]>","3":"24804.68"},{"1":"289","2":"<dbl [25]>","3":"28474.36"},{"1":"290","2":"<dbl [25]>","3":"25659.00"},{"1":"291","2":"<dbl [25]>","3":"22297.04"},{"1":"292","2":"<dbl [25]>","3":"27860.76"},{"1":"293","2":"<dbl [25]>","3":"24599.20"},{"1":"294","2":"<dbl [25]>","3":"26701.28"},{"1":"295","2":"<dbl [25]>","3":"25624.28"},{"1":"296","2":"<dbl [25]>","3":"21605.40"},{"1":"297","2":"<dbl [25]>","3":"25416.88"},{"1":"298","2":"<dbl [25]>","3":"26947.48"},{"1":"299","2":"<dbl [25]>","3":"24008.28"},{"1":"300","2":"<dbl [25]>","3":"25607.88"},{"1":"301","2":"<dbl [25]>","3":"24511.68"},{"1":"302","2":"<dbl [25]>","3":"23854.96"},{"1":"303","2":"<dbl [25]>","3":"26470.68"},{"1":"304","2":"<dbl [25]>","3":"24093.76"},{"1":"305","2":"<dbl [25]>","3":"24182.28"},{"1":"306","2":"<dbl [25]>","3":"24200.84"},{"1":"307","2":"<dbl [25]>","3":"24451.72"},{"1":"308","2":"<dbl [25]>","3":"24680.08"},{"1":"309","2":"<dbl [25]>","3":"24625.84"},{"1":"310","2":"<dbl [25]>","3":"29378.36"},{"1":"311","2":"<dbl [25]>","3":"27493.84"},{"1":"312","2":"<dbl [25]>","3":"26539.72"},{"1":"313","2":"<dbl 
[25]>","3":"27219.92"},{"1":"314","2":"<dbl [25]>","3":"23071.24"},{"1":"315","2":"<dbl [25]>","3":"20324.84"},{"1":"316","2":"<dbl [25]>","3":"23442.96"},{"1":"317","2":"<dbl [25]>","3":"26064.72"},{"1":"318","2":"<dbl [25]>","3":"25081.64"},{"1":"319","2":"<dbl [25]>","3":"26675.28"},{"1":"320","2":"<dbl [25]>","3":"22514.20"},{"1":"321","2":"<dbl [25]>","3":"27678.20"},{"1":"322","2":"<dbl [25]>","3":"22568.96"},{"1":"323","2":"<dbl [25]>","3":"25953.04"},{"1":"324","2":"<dbl [25]>","3":"21076.96"},{"1":"325","2":"<dbl [25]>","3":"26212.76"},{"1":"326","2":"<dbl [25]>","3":"26286.56"},{"1":"327","2":"<dbl [25]>","3":"26294.72"},{"1":"328","2":"<dbl [25]>","3":"24144.92"},{"1":"329","2":"<dbl [25]>","3":"21745.08"},{"1":"330","2":"<dbl [25]>","3":"25404.12"},{"1":"331","2":"<dbl [25]>","3":"28398.28"},{"1":"332","2":"<dbl [25]>","3":"25188.88"},{"1":"333","2":"<dbl [25]>","3":"26268.88"},{"1":"334","2":"<dbl [25]>","3":"24912.20"},{"1":"335","2":"<dbl [25]>","3":"21196.96"},{"1":"336","2":"<dbl [25]>","3":"23094.00"},{"1":"337","2":"<dbl [25]>","3":"25481.72"},{"1":"338","2":"<dbl [25]>","3":"22845.92"},{"1":"339","2":"<dbl [25]>","3":"27073.00"},{"1":"340","2":"<dbl [25]>","3":"25557.04"},{"1":"341","2":"<dbl [25]>","3":"25720.68"},{"1":"342","2":"<dbl [25]>","3":"23156.80"},{"1":"343","2":"<dbl [25]>","3":"24919.20"},{"1":"344","2":"<dbl [25]>","3":"28478.08"},{"1":"345","2":"<dbl [25]>","3":"27674.08"},{"1":"346","2":"<dbl [25]>","3":"22167.04"},{"1":"347","2":"<dbl [25]>","3":"25216.28"},{"1":"348","2":"<dbl [25]>","3":"24570.08"},{"1":"349","2":"<dbl [25]>","3":"21943.84"},{"1":"350","2":"<dbl [25]>","3":"27672.64"},{"1":"351","2":"<dbl [25]>","3":"27609.68"},{"1":"352","2":"<dbl [25]>","3":"28055.08"},{"1":"353","2":"<dbl [25]>","3":"24737.12"},{"1":"354","2":"<dbl [25]>","3":"21540.60"},{"1":"355","2":"<dbl [25]>","3":"25475.48"},{"1":"356","2":"<dbl [25]>","3":"25135.76"},{"1":"357","2":"<dbl [25]>","3":"25425.40"},{"1":"358","2":"<dbl 
[25]>","3":"24727.72"},{"1":"359","2":"<dbl [25]>","3":"23656.52"},{"1":"360","2":"<dbl [25]>","3":"23721.68"},{"1":"361","2":"<dbl [25]>","3":"23898.44"},{"1":"362","2":"<dbl [25]>","3":"25070.28"},{"1":"363","2":"<dbl [25]>","3":"25288.96"},{"1":"364","2":"<dbl [25]>","3":"24871.84"},{"1":"365","2":"<dbl [25]>","3":"23913.04"},{"1":"366","2":"<dbl [25]>","3":"23366.28"},{"1":"367","2":"<dbl [25]>","3":"26193.28"},{"1":"368","2":"<dbl [25]>","3":"23236.32"},{"1":"369","2":"<dbl [25]>","3":"24030.40"},{"1":"370","2":"<dbl [25]>","3":"28995.40"},{"1":"371","2":"<dbl [25]>","3":"20553.84"},{"1":"372","2":"<dbl [25]>","3":"23448.12"},{"1":"373","2":"<dbl [25]>","3":"22607.08"},{"1":"374","2":"<dbl [25]>","3":"27015.40"},{"1":"375","2":"<dbl [25]>","3":"26506.00"},{"1":"376","2":"<dbl [25]>","3":"25234.80"},{"1":"377","2":"<dbl [25]>","3":"26423.40"},{"1":"378","2":"<dbl [25]>","3":"27362.40"},{"1":"379","2":"<dbl [25]>","3":"29163.84"},{"1":"380","2":"<dbl [25]>","3":"23361.20"},{"1":"381","2":"<dbl [25]>","3":"24927.96"},{"1":"382","2":"<dbl [25]>","3":"28512.32"},{"1":"383","2":"<dbl [25]>","3":"27391.08"},{"1":"384","2":"<dbl [25]>","3":"22250.88"},{"1":"385","2":"<dbl [25]>","3":"30383.52"},{"1":"386","2":"<dbl [25]>","3":"24435.68"},{"1":"387","2":"<dbl [25]>","3":"24139.68"},{"1":"388","2":"<dbl [25]>","3":"24349.72"},{"1":"389","2":"<dbl [25]>","3":"26641.96"},{"1":"390","2":"<dbl [25]>","3":"22186.40"},{"1":"391","2":"<dbl [25]>","3":"24715.96"},{"1":"392","2":"<dbl [25]>","3":"24393.64"},{"1":"393","2":"<dbl [25]>","3":"22713.28"},{"1":"394","2":"<dbl [25]>","3":"25204.92"},{"1":"395","2":"<dbl [25]>","3":"23264.80"},{"1":"396","2":"<dbl [25]>","3":"22322.68"},{"1":"397","2":"<dbl [25]>","3":"22829.40"},{"1":"398","2":"<dbl [25]>","3":"25646.76"},{"1":"399","2":"<dbl [25]>","3":"28197.28"},{"1":"400","2":"<dbl [25]>","3":"23396.12"},{"1":"401","2":"<dbl [25]>","3":"22825.76"},{"1":"402","2":"<dbl [25]>","3":"24018.68"},{"1":"403","2":"<dbl 
[25]>","3":"26497.44"},{"1":"404","2":"<dbl [25]>","3":"26593.12"},{"1":"405","2":"<dbl [25]>","3":"21370.24"},{"1":"406","2":"<dbl [25]>","3":"23362.44"},{"1":"407","2":"<dbl [25]>","3":"23505.32"},{"1":"408","2":"<dbl [25]>","3":"26064.80"},{"1":"409","2":"<dbl [25]>","3":"26266.92"},{"1":"410","2":"<dbl [25]>","3":"22668.44"},{"1":"411","2":"<dbl [25]>","3":"31203.84"},{"1":"412","2":"<dbl [25]>","3":"19494.36"},{"1":"413","2":"<dbl [25]>","3":"24253.44"},{"1":"414","2":"<dbl [25]>","3":"25185.24"},{"1":"415","2":"<dbl [25]>","3":"27947.48"},{"1":"416","2":"<dbl [25]>","3":"23124.76"},{"1":"417","2":"<dbl [25]>","3":"24012.80"},{"1":"418","2":"<dbl [25]>","3":"21525.32"},{"1":"419","2":"<dbl [25]>","3":"22376.00"},{"1":"420","2":"<dbl [25]>","3":"26569.52"},{"1":"421","2":"<dbl [25]>","3":"29638.56"},{"1":"422","2":"<dbl [25]>","3":"28181.16"},{"1":"423","2":"<dbl [25]>","3":"26170.40"},{"1":"424","2":"<dbl [25]>","3":"23325.40"},{"1":"425","2":"<dbl [25]>","3":"25648.52"},{"1":"426","2":"<dbl [25]>","3":"27568.72"},{"1":"427","2":"<dbl [25]>","3":"24447.60"},{"1":"428","2":"<dbl [25]>","3":"27918.88"},{"1":"429","2":"<dbl [25]>","3":"24691.64"},{"1":"430","2":"<dbl [25]>","3":"27405.24"},{"1":"431","2":"<dbl [25]>","3":"24576.24"},{"1":"432","2":"<dbl [25]>","3":"25780.52"},{"1":"433","2":"<dbl [25]>","3":"23766.24"},{"1":"434","2":"<dbl [25]>","3":"27143.00"},{"1":"435","2":"<dbl [25]>","3":"24163.12"},{"1":"436","2":"<dbl [25]>","3":"19866.48"},{"1":"437","2":"<dbl [25]>","3":"26263.04"},{"1":"438","2":"<dbl [25]>","3":"24336.36"},{"1":"439","2":"<dbl [25]>","3":"25117.20"},{"1":"440","2":"<dbl [25]>","3":"27228.68"},{"1":"441","2":"<dbl [25]>","3":"29810.32"},{"1":"442","2":"<dbl [25]>","3":"25165.00"},{"1":"443","2":"<dbl [25]>","3":"23428.28"},{"1":"444","2":"<dbl [25]>","3":"22784.04"},{"1":"445","2":"<dbl [25]>","3":"25782.36"},{"1":"446","2":"<dbl [25]>","3":"20650.88"},{"1":"447","2":"<dbl [25]>","3":"21885.60"},{"1":"448","2":"<dbl 
[25]>","3":"22431.36"},{"1":"449","2":"<dbl [25]>","3":"23904.40"},{"1":"450","2":"<dbl [25]>","3":"25914.80"},{"1":"451","2":"<dbl [25]>","3":"21412.12"},{"1":"452","2":"<dbl [25]>","3":"25843.60"},{"1":"453","2":"<dbl [25]>","3":"25106.40"},{"1":"454","2":"<dbl [25]>","3":"21243.80"},{"1":"455","2":"<dbl [25]>","3":"26458.28"},{"1":"456","2":"<dbl [25]>","3":"24818.16"},{"1":"457","2":"<dbl [25]>","3":"23582.80"},{"1":"458","2":"<dbl [25]>","3":"22336.52"},{"1":"459","2":"<dbl [25]>","3":"24585.68"},{"1":"460","2":"<dbl [25]>","3":"26508.32"},{"1":"461","2":"<dbl [25]>","3":"26161.36"},{"1":"462","2":"<dbl [25]>","3":"23127.40"},{"1":"463","2":"<dbl [25]>","3":"26434.24"},{"1":"464","2":"<dbl [25]>","3":"26520.80"},{"1":"465","2":"<dbl [25]>","3":"22364.12"},{"1":"466","2":"<dbl [25]>","3":"28298.36"},{"1":"467","2":"<dbl [25]>","3":"25001.96"},{"1":"468","2":"<dbl [25]>","3":"28474.72"},{"1":"469","2":"<dbl [25]>","3":"22713.80"},{"1":"470","2":"<dbl [25]>","3":"23284.44"},{"1":"471","2":"<dbl [25]>","3":"24264.44"},{"1":"472","2":"<dbl [25]>","3":"26093.64"},{"1":"473","2":"<dbl [25]>","3":"24293.16"},{"1":"474","2":"<dbl [25]>","3":"24600.76"},{"1":"475","2":"<dbl [25]>","3":"27615.96"},{"1":"476","2":"<dbl [25]>","3":"20908.12"},{"1":"477","2":"<dbl [25]>","3":"24748.44"},{"1":"478","2":"<dbl [25]>","3":"25205.24"},{"1":"479","2":"<dbl [25]>","3":"29469.52"},{"1":"480","2":"<dbl [25]>","3":"22327.32"},{"1":"481","2":"<dbl [25]>","3":"27639.92"},{"1":"482","2":"<dbl [25]>","3":"26121.04"},{"1":"483","2":"<dbl [25]>","3":"21002.32"},{"1":"484","2":"<dbl [25]>","3":"26756.04"},{"1":"485","2":"<dbl [25]>","3":"25662.08"},{"1":"486","2":"<dbl [25]>","3":"26380.92"},{"1":"487","2":"<dbl [25]>","3":"25100.88"},{"1":"488","2":"<dbl [25]>","3":"26024.08"},{"1":"489","2":"<dbl [25]>","3":"28153.36"},{"1":"490","2":"<dbl [25]>","3":"21859.56"},{"1":"491","2":"<dbl [25]>","3":"22535.68"},{"1":"492","2":"<dbl [25]>","3":"26389.28"},{"1":"493","2":"<dbl 
[25]>","3":"24042.68"},{"1":"494","2":"<dbl [25]>","3":"24897.72"},{"1":"495","2":"<dbl [25]>","3":"25368.72"},{"1":"496","2":"<dbl [25]>","3":"28434.64"},{"1":"497","2":"<dbl [25]>","3":"21909.84"},{"1":"498","2":"<dbl [25]>","3":"26921.84"},{"1":"499","2":"<dbl [25]>","3":"30237.32"},{"1":"500","2":"<dbl [25]>","3":"26122.56"},{"1":"501","2":"<dbl [25]>","3":"24501.20"},{"1":"502","2":"<dbl [25]>","3":"26190.96"},{"1":"503","2":"<dbl [25]>","3":"27669.72"},{"1":"504","2":"<dbl [25]>","3":"29231.16"},{"1":"505","2":"<dbl [25]>","3":"23713.64"},{"1":"506","2":"<dbl [25]>","3":"21747.32"},{"1":"507","2":"<dbl [25]>","3":"24788.04"},{"1":"508","2":"<dbl [25]>","3":"23900.76"},{"1":"509","2":"<dbl [25]>","3":"25941.04"},{"1":"510","2":"<dbl [25]>","3":"27556.08"},{"1":"511","2":"<dbl [25]>","3":"24610.00"},{"1":"512","2":"<dbl [25]>","3":"28468.20"},{"1":"513","2":"<dbl [25]>","3":"30141.52"},{"1":"514","2":"<dbl [25]>","3":"21384.48"},{"1":"515","2":"<dbl [25]>","3":"24713.60"},{"1":"516","2":"<dbl [25]>","3":"27418.12"},{"1":"517","2":"<dbl [25]>","3":"30379.20"},{"1":"518","2":"<dbl [25]>","3":"27660.96"},{"1":"519","2":"<dbl [25]>","3":"20883.96"},{"1":"520","2":"<dbl [25]>","3":"24491.32"},{"1":"521","2":"<dbl [25]>","3":"24698.00"},{"1":"522","2":"<dbl [25]>","3":"23633.16"},{"1":"523","2":"<dbl [25]>","3":"25330.84"},{"1":"524","2":"<dbl [25]>","3":"26615.28"},{"1":"525","2":"<dbl [25]>","3":"23792.76"},{"1":"526","2":"<dbl [25]>","3":"25220.92"},{"1":"527","2":"<dbl [25]>","3":"25994.12"},{"1":"528","2":"<dbl [25]>","3":"21917.24"},{"1":"529","2":"<dbl [25]>","3":"30809.60"},{"1":"530","2":"<dbl [25]>","3":"30287.44"},{"1":"531","2":"<dbl [25]>","3":"27698.24"},{"1":"532","2":"<dbl [25]>","3":"22666.12"},{"1":"533","2":"<dbl [25]>","3":"27746.64"},{"1":"534","2":"<dbl [25]>","3":"24679.24"},{"1":"535","2":"<dbl [25]>","3":"23600.60"},{"1":"536","2":"<dbl [25]>","3":"25359.36"},{"1":"537","2":"<dbl [25]>","3":"26266.32"},{"1":"538","2":"<dbl 
[25]>","3":"22497.48"},{"1":"539","2":"<dbl [25]>","3":"27170.44"},{"1":"540","2":"<dbl [25]>","3":"24936.80"},{"1":"541","2":"<dbl [25]>","3":"24196.64"},{"1":"542","2":"<dbl [25]>","3":"26307.52"},{"1":"543","2":"<dbl [25]>","3":"24259.96"},{"1":"544","2":"<dbl [25]>","3":"24570.08"},{"1":"545","2":"<dbl [25]>","3":"25458.72"},{"1":"546","2":"<dbl [25]>","3":"27709.36"},{"1":"547","2":"<dbl [25]>","3":"23405.28"},{"1":"548","2":"<dbl [25]>","3":"22328.68"},{"1":"549","2":"<dbl [25]>","3":"25935.44"},{"1":"550","2":"<dbl [25]>","3":"29288.72"},{"1":"551","2":"<dbl [25]>","3":"24160.00"},{"1":"552","2":"<dbl [25]>","3":"27196.32"},{"1":"553","2":"<dbl [25]>","3":"21187.16"},{"1":"554","2":"<dbl [25]>","3":"22585.64"},{"1":"555","2":"<dbl [25]>","3":"23604.56"},{"1":"556","2":"<dbl [25]>","3":"24016.20"},{"1":"557","2":"<dbl [25]>","3":"23982.40"},{"1":"558","2":"<dbl [25]>","3":"26837.56"},{"1":"559","2":"<dbl [25]>","3":"23131.24"},{"1":"560","2":"<dbl [25]>","3":"28291.96"},{"1":"561","2":"<dbl [25]>","3":"26929.72"},{"1":"562","2":"<dbl [25]>","3":"23449.92"},{"1":"563","2":"<dbl [25]>","3":"24415.04"},{"1":"564","2":"<dbl [25]>","3":"27246.08"},{"1":"565","2":"<dbl [25]>","3":"28341.76"},{"1":"566","2":"<dbl [25]>","3":"24338.92"},{"1":"567","2":"<dbl [25]>","3":"23593.68"},{"1":"568","2":"<dbl [25]>","3":"24362.60"},{"1":"569","2":"<dbl [25]>","3":"26893.00"},{"1":"570","2":"<dbl [25]>","3":"20952.80"},{"1":"571","2":"<dbl [25]>","3":"23697.68"},{"1":"572","2":"<dbl [25]>","3":"22377.72"},{"1":"573","2":"<dbl [25]>","3":"24000.40"},{"1":"574","2":"<dbl [25]>","3":"26383.56"},{"1":"575","2":"<dbl [25]>","3":"26892.16"},{"1":"576","2":"<dbl [25]>","3":"28238.36"},{"1":"577","2":"<dbl [25]>","3":"22036.40"},{"1":"578","2":"<dbl [25]>","3":"25944.56"},{"1":"579","2":"<dbl [25]>","3":"26057.88"},{"1":"580","2":"<dbl [25]>","3":"24473.04"},{"1":"581","2":"<dbl [25]>","3":"24004.60"},{"1":"582","2":"<dbl [25]>","3":"23535.48"},{"1":"583","2":"<dbl 
[25]>","3":"22810.96"},{"1":"584","2":"<dbl [25]>","3":"20772.16"},{"1":"585","2":"<dbl [25]>","3":"26942.52"},{"1":"586","2":"<dbl [25]>","3":"25276.04"},{"1":"587","2":"<dbl [25]>","3":"26681.32"},{"1":"588","2":"<dbl [25]>","3":"28373.96"},{"1":"589","2":"<dbl [25]>","3":"25010.36"},{"1":"590","2":"<dbl [25]>","3":"24359.36"},{"1":"591","2":"<dbl [25]>","3":"26894.16"},{"1":"592","2":"<dbl [25]>","3":"27143.52"},{"1":"593","2":"<dbl [25]>","3":"25398.12"},{"1":"594","2":"<dbl [25]>","3":"24749.56"},{"1":"595","2":"<dbl [25]>","3":"26796.12"},{"1":"596","2":"<dbl [25]>","3":"24289.96"},{"1":"597","2":"<dbl [25]>","3":"23161.20"},{"1":"598","2":"<dbl [25]>","3":"24934.76"},{"1":"599","2":"<dbl [25]>","3":"24526.16"},{"1":"600","2":"<dbl [25]>","3":"23359.96"},{"1":"601","2":"<dbl [25]>","3":"24006.44"},{"1":"602","2":"<dbl [25]>","3":"23286.64"},{"1":"603","2":"<dbl [25]>","3":"26117.32"},{"1":"604","2":"<dbl [25]>","3":"22970.36"},{"1":"605","2":"<dbl [25]>","3":"27943.96"},{"1":"606","2":"<dbl [25]>","3":"23304.16"},{"1":"607","2":"<dbl [25]>","3":"23804.12"},{"1":"608","2":"<dbl [25]>","3":"24212.00"},{"1":"609","2":"<dbl [25]>","3":"23818.64"},{"1":"610","2":"<dbl [25]>","3":"25426.20"},{"1":"611","2":"<dbl [25]>","3":"26247.04"},{"1":"612","2":"<dbl [25]>","3":"18814.60"},{"1":"613","2":"<dbl [25]>","3":"22636.52"},{"1":"614","2":"<dbl [25]>","3":"29667.88"},{"1":"615","2":"<dbl [25]>","3":"24194.12"},{"1":"616","2":"<dbl [25]>","3":"26229.88"},{"1":"617","2":"<dbl [25]>","3":"20820.40"},{"1":"618","2":"<dbl [25]>","3":"27754.88"},{"1":"619","2":"<dbl [25]>","3":"22925.68"},{"1":"620","2":"<dbl [25]>","3":"28010.92"},{"1":"621","2":"<dbl [25]>","3":"22720.52"},{"1":"622","2":"<dbl [25]>","3":"25401.88"},{"1":"623","2":"<dbl [25]>","3":"24672.64"},{"1":"624","2":"<dbl [25]>","3":"25571.96"},{"1":"625","2":"<dbl [25]>","3":"26610.40"},{"1":"626","2":"<dbl [25]>","3":"24530.08"},{"1":"627","2":"<dbl [25]>","3":"26664.20"},{"1":"628","2":"<dbl 
[25]>","3":"22553.60"},{"1":"629","2":"<dbl [25]>","3":"26359.44"},{"1":"630","2":"<dbl [25]>","3":"24061.20"},{"1":"631","2":"<dbl [25]>","3":"30257.76"},{"1":"632","2":"<dbl [25]>","3":"24561.56"},{"1":"633","2":"<dbl [25]>","3":"19749.88"},{"1":"634","2":"<dbl [25]>","3":"26981.56"},{"1":"635","2":"<dbl [25]>","3":"24437.16"},{"1":"636","2":"<dbl [25]>","3":"27590.96"},{"1":"637","2":"<dbl [25]>","3":"22924.24"},{"1":"638","2":"<dbl [25]>","3":"23999.76"},{"1":"639","2":"<dbl [25]>","3":"26704.92"},{"1":"640","2":"<dbl [25]>","3":"28621.60"},{"1":"641","2":"<dbl [25]>","3":"22116.84"},{"1":"642","2":"<dbl [25]>","3":"23036.68"},{"1":"643","2":"<dbl [25]>","3":"30775.84"},{"1":"644","2":"<dbl [25]>","3":"26331.40"},{"1":"645","2":"<dbl [25]>","3":"22328.84"},{"1":"646","2":"<dbl [25]>","3":"25405.72"},{"1":"647","2":"<dbl [25]>","3":"21512.48"},{"1":"648","2":"<dbl [25]>","3":"23925.84"},{"1":"649","2":"<dbl [25]>","3":"25002.64"},{"1":"650","2":"<dbl [25]>","3":"23261.28"},{"1":"651","2":"<dbl [25]>","3":"21374.32"},{"1":"652","2":"<dbl [25]>","3":"26562.32"},{"1":"653","2":"<dbl [25]>","3":"24884.00"},{"1":"654","2":"<dbl [25]>","3":"25001.32"},{"1":"655","2":"<dbl [25]>","3":"25574.56"},{"1":"656","2":"<dbl [25]>","3":"23488.52"},{"1":"657","2":"<dbl [25]>","3":"27245.84"},{"1":"658","2":"<dbl [25]>","3":"25943.08"},{"1":"659","2":"<dbl [25]>","3":"25978.88"},{"1":"660","2":"<dbl [25]>","3":"22600.88"},{"1":"661","2":"<dbl [25]>","3":"24534.28"},{"1":"662","2":"<dbl [25]>","3":"20494.44"},{"1":"663","2":"<dbl [25]>","3":"27101.96"},{"1":"664","2":"<dbl [25]>","3":"27229.36"},{"1":"665","2":"<dbl [25]>","3":"27519.20"},{"1":"666","2":"<dbl [25]>","3":"30418.76"},{"1":"667","2":"<dbl [25]>","3":"25070.40"},{"1":"668","2":"<dbl [25]>","3":"25491.32"},{"1":"669","2":"<dbl [25]>","3":"25443.00"},{"1":"670","2":"<dbl [25]>","3":"25663.28"},{"1":"671","2":"<dbl [25]>","3":"23008.28"},{"1":"672","2":"<dbl [25]>","3":"27524.20"},{"1":"673","2":"<dbl 
[25]>","3":"22936.04"},{"1":"674","2":"<dbl [25]>","3":"24130.72"},{"1":"675","2":"<dbl [25]>","3":"29595.68"},{"1":"676","2":"<dbl [25]>","3":"25666.16"},{"1":"677","2":"<dbl [25]>","3":"23078.76"},{"1":"678","2":"<dbl [25]>","3":"25131.00"},{"1":"679","2":"<dbl [25]>","3":"23404.08"},{"1":"680","2":"<dbl [25]>","3":"23660.84"},{"1":"681","2":"<dbl [25]>","3":"23760.72"},{"1":"682","2":"<dbl [25]>","3":"24691.40"},{"1":"683","2":"<dbl [25]>","3":"26048.16"},{"1":"684","2":"<dbl [25]>","3":"21767.56"},{"1":"685","2":"<dbl [25]>","3":"29155.76"},{"1":"686","2":"<dbl [25]>","3":"23886.92"},{"1":"687","2":"<dbl [25]>","3":"24553.52"},{"1":"688","2":"<dbl [25]>","3":"26755.32"},{"1":"689","2":"<dbl [25]>","3":"24790.04"},{"1":"690","2":"<dbl [25]>","3":"25786.84"},{"1":"691","2":"<dbl [25]>","3":"24115.40"},{"1":"692","2":"<dbl [25]>","3":"25471.60"},{"1":"693","2":"<dbl [25]>","3":"23189.68"},{"1":"694","2":"<dbl [25]>","3":"27569.08"},{"1":"695","2":"<dbl [25]>","3":"26250.32"},{"1":"696","2":"<dbl [25]>","3":"23400.80"},{"1":"697","2":"<dbl [25]>","3":"24262.88"},{"1":"698","2":"<dbl [25]>","3":"25542.24"},{"1":"699","2":"<dbl [25]>","3":"22293.56"},{"1":"700","2":"<dbl [25]>","3":"23872.36"},{"1":"701","2":"<dbl [25]>","3":"24157.64"},{"1":"702","2":"<dbl [25]>","3":"24235.64"},{"1":"703","2":"<dbl [25]>","3":"27227.68"},{"1":"704","2":"<dbl [25]>","3":"26612.08"},{"1":"705","2":"<dbl [25]>","3":"22641.96"},{"1":"706","2":"<dbl [25]>","3":"28586.72"},{"1":"707","2":"<dbl [25]>","3":"24758.92"},{"1":"708","2":"<dbl [25]>","3":"24505.32"},{"1":"709","2":"<dbl [25]>","3":"24324.12"},{"1":"710","2":"<dbl [25]>","3":"28385.56"},{"1":"711","2":"<dbl [25]>","3":"26999.20"},{"1":"712","2":"<dbl [25]>","3":"21002.04"},{"1":"713","2":"<dbl [25]>","3":"22720.72"},{"1":"714","2":"<dbl [25]>","3":"28862.96"},{"1":"715","2":"<dbl [25]>","3":"24625.84"},{"1":"716","2":"<dbl [25]>","3":"28090.76"},{"1":"717","2":"<dbl [25]>","3":"20897.40"},{"1":"718","2":"<dbl 
[25]>","3":"22457.04"},{"1":"719","2":"<dbl [25]>","3":"24483.12"},{"1":"720","2":"<dbl [25]>","3":"23883.60"},{"1":"721","2":"<dbl [25]>","3":"23187.68"},{"1":"722","2":"<dbl [25]>","3":"24987.40"},{"1":"723","2":"<dbl [25]>","3":"27197.76"},{"1":"724","2":"<dbl [25]>","3":"26498.64"},{"1":"725","2":"<dbl [25]>","3":"25464.56"},{"1":"726","2":"<dbl [25]>","3":"19362.92"},{"1":"727","2":"<dbl [25]>","3":"25748.24"},{"1":"728","2":"<dbl [25]>","3":"23543.52"},{"1":"729","2":"<dbl [25]>","3":"23700.96"},{"1":"730","2":"<dbl [25]>","3":"22742.12"},{"1":"731","2":"<dbl [25]>","3":"25964.76"},{"1":"732","2":"<dbl [25]>","3":"23608.80"},{"1":"733","2":"<dbl [25]>","3":"24103.84"},{"1":"734","2":"<dbl [25]>","3":"22469.76"},{"1":"735","2":"<dbl [25]>","3":"21329.20"},{"1":"736","2":"<dbl [25]>","3":"26611.64"},{"1":"737","2":"<dbl [25]>","3":"25027.68"},{"1":"738","2":"<dbl [25]>","3":"23012.88"},{"1":"739","2":"<dbl [25]>","3":"27831.24"},{"1":"740","2":"<dbl [25]>","3":"25905.72"},{"1":"741","2":"<dbl [25]>","3":"27624.12"},{"1":"742","2":"<dbl [25]>","3":"22520.60"},{"1":"743","2":"<dbl [25]>","3":"28588.24"},{"1":"744","2":"<dbl [25]>","3":"22299.68"},{"1":"745","2":"<dbl [25]>","3":"27942.80"},{"1":"746","2":"<dbl [25]>","3":"25418.28"},{"1":"747","2":"<dbl [25]>","3":"25549.20"},{"1":"748","2":"<dbl [25]>","3":"23886.20"},{"1":"749","2":"<dbl [25]>","3":"24475.20"},{"1":"750","2":"<dbl [25]>","3":"23875.08"},{"1":"751","2":"<dbl [25]>","3":"23482.48"},{"1":"752","2":"<dbl [25]>","3":"22522.68"},{"1":"753","2":"<dbl [25]>","3":"24685.24"},{"1":"754","2":"<dbl [25]>","3":"26498.96"},{"1":"755","2":"<dbl [25]>","3":"27132.60"},{"1":"756","2":"<dbl [25]>","3":"23536.04"},{"1":"757","2":"<dbl [25]>","3":"24886.80"},{"1":"758","2":"<dbl [25]>","3":"25226.04"},{"1":"759","2":"<dbl [25]>","3":"22465.92"},{"1":"760","2":"<dbl [25]>","3":"25929.68"},{"1":"761","2":"<dbl [25]>","3":"26694.04"},{"1":"762","2":"<dbl [25]>","3":"26269.20"},{"1":"763","2":"<dbl 
[25]>","3":"22508.36"},{"1":"764","2":"<dbl [25]>","3":"25310.64"},{"1":"765","2":"<dbl [25]>","3":"28847.60"},{"1":"766","2":"<dbl [25]>","3":"23912.76"},{"1":"767","2":"<dbl [25]>","3":"21318.44"},{"1":"768","2":"<dbl [25]>","3":"21346.52"},{"1":"769","2":"<dbl [25]>","3":"22102.04"},{"1":"770","2":"<dbl [25]>","3":"26904.00"},{"1":"771","2":"<dbl [25]>","3":"21457.96"},{"1":"772","2":"<dbl [25]>","3":"22882.80"},{"1":"773","2":"<dbl [25]>","3":"28734.72"},{"1":"774","2":"<dbl [25]>","3":"24387.08"},{"1":"775","2":"<dbl [25]>","3":"25742.84"},{"1":"776","2":"<dbl [25]>","3":"28130.60"},{"1":"777","2":"<dbl [25]>","3":"22090.96"},{"1":"778","2":"<dbl [25]>","3":"27967.88"},{"1":"779","2":"<dbl [25]>","3":"23810.60"},{"1":"780","2":"<dbl [25]>","3":"23416.52"},{"1":"781","2":"<dbl [25]>","3":"22816.44"},{"1":"782","2":"<dbl [25]>","3":"26835.64"},{"1":"783","2":"<dbl [25]>","3":"25778.44"},{"1":"784","2":"<dbl [25]>","3":"26807.72"},{"1":"785","2":"<dbl [25]>","3":"26806.44"},{"1":"786","2":"<dbl [25]>","3":"23711.76"},{"1":"787","2":"<dbl [25]>","3":"29465.76"},{"1":"788","2":"<dbl [25]>","3":"28016.36"},{"1":"789","2":"<dbl [25]>","3":"24523.04"},{"1":"790","2":"<dbl [25]>","3":"23567.28"},{"1":"791","2":"<dbl [25]>","3":"23100.68"},{"1":"792","2":"<dbl [25]>","3":"24202.40"},{"1":"793","2":"<dbl [25]>","3":"20056.00"},{"1":"794","2":"<dbl [25]>","3":"24925.20"},{"1":"795","2":"<dbl [25]>","3":"22588.76"},{"1":"796","2":"<dbl [25]>","3":"25837.20"},{"1":"797","2":"<dbl [25]>","3":"26053.36"},{"1":"798","2":"<dbl [25]>","3":"22210.40"},{"1":"799","2":"<dbl [25]>","3":"27044.40"},{"1":"800","2":"<dbl [25]>","3":"22336.00"},{"1":"801","2":"<dbl [25]>","3":"25117.44"},{"1":"802","2":"<dbl [25]>","3":"23160.76"},{"1":"803","2":"<dbl [25]>","3":"23258.92"},{"1":"804","2":"<dbl [25]>","3":"25219.04"},{"1":"805","2":"<dbl [25]>","3":"25660.80"},{"1":"806","2":"<dbl [25]>","3":"24569.24"},{"1":"807","2":"<dbl [25]>","3":"27008.04"},{"1":"808","2":"<dbl 
[25]>","3":"26803.68"},{"1":"809","2":"<dbl [25]>","3":"23106.64"},{"1":"810","2":"<dbl [25]>","3":"27129.24"},{"1":"811","2":"<dbl [25]>","3":"26240.72"},{"1":"812","2":"<dbl [25]>","3":"25847.64"},{"1":"813","2":"<dbl [25]>","3":"32223.44"},{"1":"814","2":"<dbl [25]>","3":"28480.20"},{"1":"815","2":"<dbl [25]>","3":"25361.76"},{"1":"816","2":"<dbl [25]>","3":"25146.36"},{"1":"817","2":"<dbl [25]>","3":"26400.60"},{"1":"818","2":"<dbl [25]>","3":"25720.12"},{"1":"819","2":"<dbl [25]>","3":"22506.56"},{"1":"820","2":"<dbl [25]>","3":"26518.52"},{"1":"821","2":"<dbl [25]>","3":"27616.00"},{"1":"822","2":"<dbl [25]>","3":"23431.28"},{"1":"823","2":"<dbl [25]>","3":"25666.56"},{"1":"824","2":"<dbl [25]>","3":"25779.04"},{"1":"825","2":"<dbl [25]>","3":"28559.52"},{"1":"826","2":"<dbl [25]>","3":"24186.72"},{"1":"827","2":"<dbl [25]>","3":"22829.96"},{"1":"828","2":"<dbl [25]>","3":"22270.76"},{"1":"829","2":"<dbl [25]>","3":"25994.56"},{"1":"830","2":"<dbl [25]>","3":"24824.08"},{"1":"831","2":"<dbl [25]>","3":"25685.48"},{"1":"832","2":"<dbl [25]>","3":"26082.28"},{"1":"833","2":"<dbl [25]>","3":"23645.60"},{"1":"834","2":"<dbl [25]>","3":"25719.08"},{"1":"835","2":"<dbl [25]>","3":"28080.92"},{"1":"836","2":"<dbl [25]>","3":"26715.96"},{"1":"837","2":"<dbl [25]>","3":"23531.52"},{"1":"838","2":"<dbl [25]>","3":"24335.60"},{"1":"839","2":"<dbl [25]>","3":"23118.00"},{"1":"840","2":"<dbl [25]>","3":"22416.36"},{"1":"841","2":"<dbl [25]>","3":"25871.80"},{"1":"842","2":"<dbl [25]>","3":"23766.64"},{"1":"843","2":"<dbl [25]>","3":"29668.64"},{"1":"844","2":"<dbl [25]>","3":"29241.84"},{"1":"845","2":"<dbl [25]>","3":"23452.56"},{"1":"846","2":"<dbl [25]>","3":"27561.24"},{"1":"847","2":"<dbl [25]>","3":"23702.72"},{"1":"848","2":"<dbl [25]>","3":"23459.12"},{"1":"849","2":"<dbl [25]>","3":"26135.20"},{"1":"850","2":"<dbl [25]>","3":"25751.44"},{"1":"851","2":"<dbl [25]>","3":"29137.16"},{"1":"852","2":"<dbl [25]>","3":"25574.40"},{"1":"853","2":"<dbl 
[25]>","3":"23356.64"},{"1":"854","2":"<dbl [25]>","3":"23705.16"},{"1":"855","2":"<dbl [25]>","3":"23520.04"},{"1":"856","2":"<dbl [25]>","3":"27342.08"},{"1":"857","2":"<dbl [25]>","3":"24543.04"},{"1":"858","2":"<dbl [25]>","3":"25434.00"},{"1":"859","2":"<dbl [25]>","3":"24474.56"},{"1":"860","2":"<dbl [25]>","3":"25679.88"},{"1":"861","2":"<dbl [25]>","3":"27688.84"},{"1":"862","2":"<dbl [25]>","3":"27810.12"},{"1":"863","2":"<dbl [25]>","3":"25418.00"},{"1":"864","2":"<dbl [25]>","3":"26488.76"},{"1":"865","2":"<dbl [25]>","3":"23995.08"},{"1":"866","2":"<dbl [25]>","3":"21271.76"},{"1":"867","2":"<dbl [25]>","3":"26461.76"},{"1":"868","2":"<dbl [25]>","3":"27401.32"},{"1":"869","2":"<dbl [25]>","3":"25099.80"},{"1":"870","2":"<dbl [25]>","3":"24970.88"},{"1":"871","2":"<dbl [25]>","3":"27596.40"},{"1":"872","2":"<dbl [25]>","3":"22913.04"},{"1":"873","2":"<dbl [25]>","3":"21710.00"},{"1":"874","2":"<dbl [25]>","3":"22121.60"},{"1":"875","2":"<dbl [25]>","3":"22958.60"},{"1":"876","2":"<dbl [25]>","3":"29327.44"},{"1":"877","2":"<dbl [25]>","3":"27138.04"},{"1":"878","2":"<dbl [25]>","3":"24025.40"},{"1":"879","2":"<dbl [25]>","3":"25458.28"},{"1":"880","2":"<dbl [25]>","3":"27423.84"},{"1":"881","2":"<dbl [25]>","3":"22034.12"},{"1":"882","2":"<dbl [25]>","3":"27982.24"},{"1":"883","2":"<dbl [25]>","3":"25279.28"},{"1":"884","2":"<dbl [25]>","3":"27072.12"},{"1":"885","2":"<dbl [25]>","3":"24510.84"},{"1":"886","2":"<dbl [25]>","3":"28957.96"},{"1":"887","2":"<dbl [25]>","3":"24183.92"},{"1":"888","2":"<dbl [25]>","3":"26845.64"},{"1":"889","2":"<dbl [25]>","3":"24446.92"},{"1":"890","2":"<dbl [25]>","3":"23880.12"},{"1":"891","2":"<dbl [25]>","3":"22689.00"},{"1":"892","2":"<dbl [25]>","3":"24114.00"},{"1":"893","2":"<dbl [25]>","3":"26073.56"},{"1":"894","2":"<dbl [25]>","3":"24697.40"},{"1":"895","2":"<dbl [25]>","3":"25601.76"},{"1":"896","2":"<dbl [25]>","3":"24127.04"},{"1":"897","2":"<dbl [25]>","3":"20521.20"},{"1":"898","2":"<dbl 
[25]>","3":"26516.28"},{"1":"899","2":"<dbl [25]>","3":"23599.80"},{"1":"900","2":"<dbl [25]>","3":"25628.20"},{"1":"901","2":"<dbl [25]>","3":"25874.92"},{"1":"902","2":"<dbl [25]>","3":"23570.60"},{"1":"903","2":"<dbl [25]>","3":"26382.12"},{"1":"904","2":"<dbl [25]>","3":"23232.52"},{"1":"905","2":"<dbl [25]>","3":"24583.52"},{"1":"906","2":"<dbl [25]>","3":"25838.72"},{"1":"907","2":"<dbl [25]>","3":"25477.60"},{"1":"908","2":"<dbl [25]>","3":"24764.32"},{"1":"909","2":"<dbl [25]>","3":"23892.32"},{"1":"910","2":"<dbl [25]>","3":"26162.32"},{"1":"911","2":"<dbl [25]>","3":"30191.52"},{"1":"912","2":"<dbl [25]>","3":"24626.36"},{"1":"913","2":"<dbl [25]>","3":"23736.52"},{"1":"914","2":"<dbl [25]>","3":"24602.96"},{"1":"915","2":"<dbl [25]>","3":"25692.28"},{"1":"916","2":"<dbl [25]>","3":"26298.88"},{"1":"917","2":"<dbl [25]>","3":"25051.64"},{"1":"918","2":"<dbl [25]>","3":"25789.36"},{"1":"919","2":"<dbl [25]>","3":"25313.80"},{"1":"920","2":"<dbl [25]>","3":"26095.60"},{"1":"921","2":"<dbl [25]>","3":"26898.36"},{"1":"922","2":"<dbl [25]>","3":"24331.84"},{"1":"923","2":"<dbl [25]>","3":"28581.84"},{"1":"924","2":"<dbl [25]>","3":"25517.56"},{"1":"925","2":"<dbl [25]>","3":"24334.28"},{"1":"926","2":"<dbl [25]>","3":"29219.04"},{"1":"927","2":"<dbl [25]>","3":"26474.28"},{"1":"928","2":"<dbl [25]>","3":"26589.72"},{"1":"929","2":"<dbl [25]>","3":"22718.84"},{"1":"930","2":"<dbl [25]>","3":"23917.64"},{"1":"931","2":"<dbl [25]>","3":"28450.24"},{"1":"932","2":"<dbl [25]>","3":"27025.32"},{"1":"933","2":"<dbl [25]>","3":"24518.20"},{"1":"934","2":"<dbl [25]>","3":"25274.28"},{"1":"935","2":"<dbl [25]>","3":"24566.12"},{"1":"936","2":"<dbl [25]>","3":"25383.16"},{"1":"937","2":"<dbl [25]>","3":"26897.44"},{"1":"938","2":"<dbl [25]>","3":"25323.40"},{"1":"939","2":"<dbl [25]>","3":"24481.60"},{"1":"940","2":"<dbl [25]>","3":"25598.56"},{"1":"941","2":"<dbl [25]>","3":"25892.76"},{"1":"942","2":"<dbl [25]>","3":"24776.16"},{"1":"943","2":"<dbl 
[25]>","3":"24028.12"},{"1":"944","2":"<dbl [25]>","3":"24949.00"},{"1":"945","2":"<dbl [25]>","3":"22977.48"},{"1":"946","2":"<dbl [25]>","3":"24275.96"},{"1":"947","2":"<dbl [25]>","3":"25705.24"},{"1":"948","2":"<dbl [25]>","3":"26166.80"},{"1":"949","2":"<dbl [25]>","3":"24187.32"},{"1":"950","2":"<dbl [25]>","3":"24912.92"},{"1":"951","2":"<dbl [25]>","3":"26534.72"},{"1":"952","2":"<dbl [25]>","3":"24864.68"},{"1":"953","2":"<dbl [25]>","3":"28550.80"},{"1":"954","2":"<dbl [25]>","3":"29823.04"},{"1":"955","2":"<dbl [25]>","3":"30236.88"},{"1":"956","2":"<dbl [25]>","3":"27027.40"},{"1":"957","2":"<dbl [25]>","3":"26171.84"},{"1":"958","2":"<dbl [25]>","3":"24920.92"},{"1":"959","2":"<dbl [25]>","3":"27733.96"},{"1":"960","2":"<dbl [25]>","3":"24882.16"},{"1":"961","2":"<dbl [25]>","3":"27178.64"},{"1":"962","2":"<dbl [25]>","3":"22944.28"},{"1":"963","2":"<dbl [25]>","3":"26738.24"},{"1":"964","2":"<dbl [25]>","3":"23721.84"},{"1":"965","2":"<dbl [25]>","3":"28000.72"},{"1":"966","2":"<dbl [25]>","3":"22587.80"},{"1":"967","2":"<dbl [25]>","3":"25067.48"},{"1":"968","2":"<dbl [25]>","3":"27377.40"},{"1":"969","2":"<dbl [25]>","3":"27768.92"},{"1":"970","2":"<dbl [25]>","3":"22327.80"},{"1":"971","2":"<dbl [25]>","3":"22956.32"},{"1":"972","2":"<dbl [25]>","3":"22206.72"},{"1":"973","2":"<dbl [25]>","3":"23518.88"},{"1":"974","2":"<dbl [25]>","3":"23267.08"},{"1":"975","2":"<dbl [25]>","3":"28004.08"},{"1":"976","2":"<dbl [25]>","3":"21803.96"},{"1":"977","2":"<dbl [25]>","3":"24217.44"},{"1":"978","2":"<dbl [25]>","3":"23591.12"},{"1":"979","2":"<dbl [25]>","3":"27038.04"},{"1":"980","2":"<dbl [25]>","3":"24861.36"},{"1":"981","2":"<dbl [25]>","3":"24289.92"},{"1":"982","2":"<dbl [25]>","3":"28434.08"},{"1":"983","2":"<dbl [25]>","3":"23139.04"},{"1":"984","2":"<dbl [25]>","3":"26711.48"},{"1":"985","2":"<dbl [25]>","3":"26162.68"},{"1":"986","2":"<dbl [25]>","3":"23171.72"},{"1":"987","2":"<dbl [25]>","3":"23685.00"},{"1":"988","2":"<dbl 
[25]>","3":"25021.16"},{"1":"989","2":"<dbl [25]>","3":"25788.24"},{"1":"990","2":"<dbl [25]>","3":"24561.56"},{"1":"991","2":"<dbl [25]>","3":"30510.44"},{"1":"992","2":"<dbl [25]>","3":"28458.96"},{"1":"993","2":"<dbl [25]>","3":"28120.08"},{"1":"994","2":"<dbl [25]>","3":"26970.04"},{"1":"995","2":"<dbl [25]>","3":"22929.48"},{"1":"996","2":"<dbl [25]>","3":"26955.12"},{"1":"997","2":"<dbl [25]>","3":"23157.12"},{"1":"998","2":"<dbl [25]>","3":"23736.36"},{"1":"999","2":"<dbl [25]>","3":"25679.52"},{"1":"1000","2":"<dbl [25]>","3":"29057.76"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>In words, set up the 1000 simulations and work rowwise as before, then take (for each row) a bootstrap sample of the attendances, and then take the mean of it. I’ve saved the resulting dataframe so that we can look at it and then do something else with it. The column <code>s</code> containing the samples is a list-column again.</p>
<p>Our question was whether this bootstrapped sampling distribution of the sample mean looked like a normal distribution. To see that, a normal quantile plot is the thing:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(d, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> m)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_qq_line</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/tidy-simulation/index_files/figure-html/unnamed-chunk-24-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>That is very close to a normal distribution, and so in fact the <img src="https://latex.codecogs.com/png.latex?t">-procedure really is fine and the first school of thought is correct (and now we have <em>evidence</em>, no hand-waving required):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(jays<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>attendance)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    One Sample t-test

data:  jays$attendance
t = 11.389, df = 24, p-value = 3.661e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 20526.82 29613.50
sample estimates:
mean of x 
 25070.16 </code></pre>
</div>
</div>
<p>A 95% confidence interval for the mean attendance goes from 20500 to 29600.</p>
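<p>As a rough check on where that interval comes from, here is a sketch that rebuilds it from the numbers printed above. It assumes only the printed mean, <code>t</code> statistic and degrees of freedom; because the default null mean is 0, the standard error is the mean divided by the <code>t</code> statistic. The names <code>xbar</code>, <code>t_stat</code> and <code>se</code> are mine, for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># values copied from the t.test output above (rounded as printed)
xbar &lt;- 25070.16
t_stat &lt;- 11.389
df &lt;- 24
# with a null mean of 0, t = xbar / se, so se = xbar / t
se &lt;- xbar / t_stat
# 95% interval: mean plus or minus the t critical value times the standard error
ci &lt;- xbar + c(-1, 1) * qt(0.975, df) * se
ci</code></pre></div>
</div>
<p>Because the printed <code>t</code> statistic is rounded, this should agree with the <code>t.test</code> interval only to within rounding.</p>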
<p>Another way to go is to use the bootstrapped sampling distribution directly, entirely bypassing all the normal theory, and just take the middle 95% of it:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ci =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(m, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.025</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.975</span>)))</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
  always returns an ungrouped data frame and adjust accordingly.</code></pre>
</div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["ci"],"name":[1],"type":["dbl"],"align":["right"]}],"data":[{"1":"20978.06"},{"1":"29667.90"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>That is 21000 to 29700, not that different (given the large amount of variability) from the <img src="https://latex.codecogs.com/png.latex?t">-interval. There are better ways to get the interval than using sample quantiles; see for example <a href="https://acclab.github.io/bootstrap-confidence-intervals.html">here</a>. But this will do for now.</p>
<p>The <code>ungroup</code> in the code is there because the dataframe <code>d</code> is still <code>rowwise</code>: everything we do with <code>d</code> will still be done one row at a time. But now we want to work on the whole column <code>m</code>, so we have to undo the <code>rowwise</code> first. <code>rowwise</code> is a special case of <code>group_by</code> (a sort of group-by-rows), so you undo <code>rowwise</code> in the same way that you undo <code>group_by</code>.</p>
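<p>The warning above suggests <code>reframe</code>, which is designed for summaries that return more than one row. Here is a minimal sketch of the same percentile interval done that way. To keep it self-contained, the vector <code>attendance</code> holds 25 made-up values standing in for <code>jays$attendance</code> (an assumption, not the real data), so the numbers it produces will not match the ones above:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)
set.seed(1)
# made-up stand-in for jays$attendance: 25 values, NOT the real data
attendance &lt;- round(rnorm(25, mean = 25000, sd = 10000))
# 1000 bootstrap samples and their means, one per row
d &lt;- tibble(sim = 1:1000) %&gt;%
  rowwise() %&gt;%
  mutate(s = list(sample(attendance, replace = TRUE))) %&gt;%
  mutate(m = mean(s))
# d is still rowwise, so undo that before working on the whole column m
ci &lt;- d %&gt;%
  ungroup() %&gt;%
  reframe(ci = quantile(m, c(0.025, 0.975)))
ci</code></pre></div>
</div>
<p>This needs dplyr 1.1.0 or later. The <code>ungroup</code> is still required: without it, <code>reframe</code> would work one row at a time, just as <code>summarize</code> would.</p>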
</section>
<section id="power-by-simulation" class="level2">
<h2 class="anchored" data-anchor-id="power-by-simulation">Power by simulation</h2>
<section id="power-of-a-test" class="level3">
<h3 class="anchored" data-anchor-id="power-of-a-test">Power of a test</h3>
<p>R has things like <code>power.t.test</code> that will allow you to calculate the power of one- and two-sample <img src="https://latex.codecogs.com/png.latex?t">-tests for normally-distributed populations. But what if you want to find out the power of some other test, or of a <img src="https://latex.codecogs.com/png.latex?t">-test under other assumptions about the population distribution? We need to have a mechanism for simulating power.</p>
<p>Let’s start off simple with one where we can check the answer. Let’s suppose that our population is normal with mean 110 and SD 30, and we have a sample of size 20. How likely are we to (correctly) reject the null hypothesis that the mean is 100, in favour of the alternative that the mean is greater than 100?</p>
<p>The exact answer is this, using <img src="https://latex.codecogs.com/png.latex?%5Calpha%20=%200.05">:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb41-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">power.t.test</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">delta =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one.sample"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one.sided"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
     One-sample t test power calculation 

              n = 20
          delta = 10
             sd = 30
      sig.level = 0.05
          power = 0.4178514
    alternative = one.sided</code></pre>
</div>
</div>
<p>Simulating this gives a rather more detailed handle on what is actually going on. The idea is to draw lots of samples from <em>the truth</em>, test the (incorrect) null, and grab the P-value each time, then count how many of those P-values are less than 0.05 (or whatever your <img src="https://latex.codecogs.com/png.latex?%5Calpha"> is). The true population here is normal with mean 110 and SD 30, and our sample size is 20:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb43-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb43-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb43-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb43-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb43-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb43-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"5676"},{"1":"TRUE","2":"4324"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The estimated power is 43.2% (4324 of the 10,000 simulated P-values were 0.05 or less), a little bigger than but not far from the correct answer.<sup>15</sup></p>
<p>I did this in several steps. After the <code>rowwise</code>, I drew a (list-column of) samples from the true population, then I ran a one-sample <img src="https://latex.codecogs.com/png.latex?t">-test to test whether the population mean is greater than 100 (and saved all the <code>t.test</code> output), then I extracted the P-value, then I counted how many of those P-values were 0.05 or less. I laid it out this way so that you can adapt for your purposes; you could change the population distribution, or the test, and the procedure will still work.<sup>16</sup></p>
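<p>As an illustration of that kind of adaptation, here is a sketch that keeps the same population and sample size but swaps the <img src="https://latex.codecogs.com/png.latex?t">-test for a Wilcoxon signed-rank test, using <code>wilcox.test</code>. The choice of test, the seed, and the smaller number of simulations (1000, to keep it quick) are my assumptions for the example, not something from the analysis above:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)
set.seed(457299)
sims &lt;- tibble(sim = 1:1000) %&gt;%
  rowwise() %&gt;%
  # same true population as before: normal, mean 110, SD 30, n = 20
  mutate(sample = list(rnorm(20, 110, 30))) %&gt;%
  # only this line changes: a signed-rank test instead of t.test
  mutate(test = list(wilcox.test(sample, mu = 100, alternative = "greater"))) %&gt;%
  mutate(p_value = test$p.value) %&gt;%
  ungroup()
# the proportion of rejections is the estimated power
power_est &lt;- sims %&gt;% summarize(power = mean(p_value &lt;= 0.05))
power_est</code></pre></div>
</div>
<p>Taking the mean of the logical condition gives the proportion of rejections directly, rather than the two counts that <code>count</code> produces. With only 1000 simulations the estimate will be noisier than the one above.</p>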
<p>The usual practical reason for calculating power arises before an experiment is run: this is the sample size you plan to use, this is what you think the (true) population is, this is the null hypothesis you would like to reject. In practice, though, the question is usually asked the other way around: you have a target power in mind, like 0.80, and you want to know what <em>sample size</em> you need in order to achieve that power.</p>
<p>With <code>power.t.test</code>, this is as simple as putting in <code>power</code> and leaving out <code>n</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb44" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb44-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">power.t.test</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">power =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.80</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">delta =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sd =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">type =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one.sample"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"one.sided"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
     One-sample t test power calculation 

              n = 57.02048
          delta = 10
             sd = 30
      sig.level = 0.05
          power = 0.8
    alternative = one.sided</code></pre>
</div>
</div>
<p>and the sample size has to be 58 (rounding up).</p>
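<p>The rounding can also be done in code. A minimal sketch, where <code>ans</code> is just an illustrative name for the saved output: <code>power.t.test</code> returns a list whose <code>n</code> component holds the (fractional) sample size, and <code>ceiling</code> rounds it up to the next whole number.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r"># save the power calculation instead of printing it
ans &lt;- power.t.test(power = 0.80, delta = 110 - 100, sd = 30, 
                    type = "one.sample", alternative = "one.sided")

# round the fractional sample size up to a whole number of observations
ceiling(ans$n)</code></pre></div>
</div>
<p>With the value of <code>n</code> shown above (57.02048), this gives 58.</p>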
<p>But by simulation, <code>n</code> has to be input to the simulation and <code>power</code> is the output. So the best we can do is to try different sample sizes and see which one gets us closest to the power we are aiming for. A sample size of 20 is, we know, too small, but what about 40? One change to the previous code:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb46" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb46-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">40</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb46-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"3388"},{"1":"TRUE","2":"6612"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This estimates the power to be 66%, not big enough, so the sample size needs to be bigger still.</p>
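<p>Since only the sample size changes from one run to the next, the simulation can be wrapped in a function to save copying and pasting. This is a sketch rather than anything I use elsewhere in this post: <code>sim_power</code> and its argument names are hypothetical, and it returns the proportion of rejections rather than the table of counts.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# hypothetical helper: estimated power of the one-sided t-test
# for samples of size sample_size from a normal population
sim_power &lt;- function(sample_size, n_sim = 10000, 
                      true_mean = 110, null_mean = 100, pop_sd = 30) {
  tibble(sim = 1:n_sim) %&gt;% 
    rowwise() %&gt;% 
    mutate(sample = list(rnorm(sample_size, true_mean, pop_sd))) %&gt;% 
    mutate(t_test = list(t.test(sample, mu = null_mean, alternative = "greater"))) %&gt;% 
    mutate(p_value = t_test$p.value) %&gt;% 
    ungroup() %&gt;% 
    # proportion of simulated samples that reject the null at the 0.05 level
    summarize(power = mean(p_value &lt;= 0.05)) %&gt;% 
    pull(power)
}

set.seed(457)  # arbitrary seed, for reproducibility only
power_40 &lt;- sim_power(40)
power_40</code></pre></div>
</div>
<p>Calling <code>sim_power</code> with any other sample size then redoes the whole simulation for that sample size.</p>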
<p>Another problem is that this is only an <em>estimate</em> of the power, based on “only” 10,000 simulations. It could be that the power for a sample size of 40 is higher than this. But how much higher? The <code>binom.test</code> idea from earlier gives us a confidence interval for the true power:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb47-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6612</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  6612 and 10000
number of successes = 6612, number of trials = 10000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.6518275 0.6704785
sample estimates:
probability of success 
                0.6612 </code></pre>
</div>
</div>
<p>The power (between 0.652 and 0.670) is evidently not high enough yet, so we need a bigger sample size. 60?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb49-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb49-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb49-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb49-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb49-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb49-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"1848"},{"1":"TRUE","2":"8152"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>That looks pretty close. What does the confidence interval look like?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb50" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb50-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8152</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  8152 and 10000
number of successes = 8152, number of trials = 10000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.8074513 0.8227647
sample estimates:
probability of success 
                0.8152 </code></pre>
</div>
</div>
<p>The 95% CI for the true power goes from 0.807 to 0.823, which is a little too high, so the sample size I need is a little under 60. Now you see the reason for doing 10,000 simulations instead of only 1000: I’ve nailed down the true power rather accurately. Compare this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb52" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb52-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb52-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"189"},{"1":"TRUE","2":"811"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb53-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">811</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  811 and 1000
number of successes = 811, number of trials = 1000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.7853323 0.8348215
sample estimates:
probability of success 
                 0.811 </code></pre>
</div>
</div>
<p>With only 1000 simulations, the confidence interval for the true power (0.785 to 0.835) is much wider, and a sample size of 60 is as close as we are going to get, since 0.80 is inside this confidence interval.</p>
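<p>The difference in precision between 1000 and 10,000 simulations comes from the standard error of a simulated proportion, which is the square root of <code>p * (1 - p) / n_sim</code>. Here is a rough sketch using the normal approximation, assuming a true power of 0.80; <code>binom.test</code> computes an exact interval, so its limits will differ slightly from these, and the variable names are mine.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">p &lt;- 0.80                 # assumed true power
n_sim &lt;- c(1000, 10000)   # numbers of simulations to compare

# approximate half-width of a 95% confidence interval for the power
half_width &lt;- qnorm(0.975) * sqrt(p * (1 - p) / n_sim)
round(half_width, 3)</code></pre></div>
</div>
<p>The half-widths come out as about 0.025 and 0.008, in line with the two intervals above: ten times as many simulations makes the interval narrower by a factor of the square root of 10, a bit more than 3.</p>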
</section>
<section id="size-of-a-test" class="level3">
<h3 class="anchored" data-anchor-id="size-of-a-test">Size of a test</h3>
<p>If the true mean and the hypothesized mean are the same, then the null hypothesis is actually <em>true</em> and the probability of (now incorrectly) rejecting it should be 0.05. In a situation where the test is properly calibrated, this is not very interesting:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb55-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb55-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb55-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb55-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb55-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb55-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"9487"},{"1":"TRUE","2":"513"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb56" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb56-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">513</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  513 and 10000
number of successes = 513, number of trials = 10000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.04705735 0.05580631
sample estimates:
probability of success 
                0.0513 </code></pre>
</div>
</div>
<p>Now the true mean and the null mean are both 100, and the population distribution is normal, so the <img src="https://latex.codecogs.com/png.latex?t">-test must be appropriate, and the “power”, that is to say, the probability of a type I error, could indeed be 0.05, since 0.05 is inside the confidence interval (0.047 to 0.056).</p>
<p>But what if the population distribution is not normal? Then we have the Central Limit Theorem, which says that everything should still behave well “for large samples”, without actually telling us how big the sample has to be. One way to assess whether our sample is big enough, if we have data, is to estimate the bootstrap sampling distribution of the sample mean. If we don’t have data, we can suggest a distributional form for the population distribution, and see how the test behaves in that case: does the test still reject 5% of the time, when the null is true?</p>
<p>To be specific, let’s suppose we’re taking a sample of size 20 from the very right-skewed exponential distribution with mean 100. Does the <img src="https://latex.codecogs.com/png.latex?t">-test for the mean reject 5% of the time when the null mean is 100?<sup>17</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb58" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb58-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rexp</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">t_test =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(sample, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mu =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alternative =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"greater"</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">p_value =</span> t_test<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>p.value) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb58-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(p_value <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["p_value <= 0.05"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"9802"},{"1":"TRUE","2":"198"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb59" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb59-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binom.test</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">198</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Exact binomial test

data:  198 and 10000
number of successes = 198, number of trials = 10000, p-value &lt; 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.01716007 0.02272454
sample estimates:
probability of success 
                0.0198 </code></pre>
</div>
</div>
<p>Clearly it rejects too rarely: the confidence interval for the type I error probability goes from 0.017 to 0.023, well below 0.05. So a sample of size 20 is not big enough for the Central Limit Theorem to work in this case.</p>
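<p>The same idea can be used to see how the rejection rate changes as the sample gets bigger. The following is a sketch only: <code>sim_size</code> is a hypothetical helper, the sample sizes are arbitrary choices, and I use <code>replicate</code> here instead of <code>rowwise</code> to keep the function short. I have not shown its output, which will vary from run to run.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# hypothetical helper: estimated type I error rate of the one-sided t-test
# when sampling from an exponential population with mean 100
sim_size &lt;- function(sample_size, n_sim = 10000) {
  p_values &lt;- replicate(
    n_sim,
    t.test(rexp(sample_size, 1 / 100), mu = 100, alternative = "greater")$p.value
  )
  # proportion of (incorrect) rejections at the 0.05 level
  mean(p_values &lt;= 0.05)
}

# estimate the rejection rate for each of several sample sizes
type_1 &lt;- tibble(sample_size = c(20, 100, 500)) %&gt;% 
  rowwise() %&gt;% 
  mutate(type_1_error = sim_size(sample_size)) %&gt;% 
  ungroup()

type_1</code></pre></div>
</div>
<p>Each estimated rejection rate can be passed through <code>binom.test</code>, as above, to see whether 0.05 is a plausible value for the true type I error probability at that sample size.</p>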
<p>Another use for simulation is to understand the sampling distribution of a test statistic when we do not have theory to guide us. Let’s return to our normal population with mean 100 and SD 30. Suppose we now want to reject a mean of 100 in favour of the mean being greater than 100 if the sample <em>maximum</em> is large enough. How large should the sample maximum be?</p>
<p>The procedure is to generate samples from the truth (in this case the null is true), find the maximum of each simulated sample, and then find the 95th percentile of the simulated maxima:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb61" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb61-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_max =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(sample)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb61-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(sample_max, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.95</span>))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["pp"],"name":[1],"type":["dbl"],"align":["right"]}],"data":[{"1":"183.9445"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>We should reject a mean of 100 if the sample maximum is 184 or greater.</p>
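<p>As a cross-check on the simulated value: for this particular statistic there happens to be a simple formula, because the maximum of 20 independent observations is at most <em>c</em> only when all 20 of them are, so its distribution function is the normal one raised to the power 20. This is a minimal sketch, assuming independent normal observations; the names <code>n</code> and <code>crit</code> are made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n &lt;- 20
# P(max &lt;= c) = pnorm(c, 100, 30)^n, so solve pnorm(c, 100, 30)^n = 0.95 for c
crit &lt;- qnorm(0.95^(1 / n), mean = 100, sd = 30)
crit</code></pre></div>
</div>
<p>This should come out very close to the simulated 95th percentile of about 184. For a statistic with no such formula, the simulation is all we have.</p>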
<p>And now we can estimate the power of this test if the mean is actually 110:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb62" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb62-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sim =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">110</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">30</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sample_max =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(sample)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb62-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(sample_max <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">184</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["sample_max >= 184"],"name":[1],"type":["lgl"],"align":["right"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"FALSE","2":"871"},{"1":"TRUE","2":"129"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The power is now only about 13%, much less than for the test based on the mean (which was about 42%). The reason for the <code>ungroup</code> was that I wanted to count something for the <em>whole</em> dataframe, not one row at a time, so I had to undo the <code>rowwise</code>.</p>
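<p>The simulated power can be checked in the same spirit, since the test rejects unless all 20 observations fall below the critical value. Again a sketch under the assumption of independent normal observations, with <code>n</code>, <code>crit</code> and <code>power</code> being names chosen for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n &lt;- 20
crit &lt;- 184
# P(max &gt;= crit) = 1 - P(all n observations are below crit), when the mean is 110
power &lt;- 1 - pnorm(crit, mean = 110, sd = 30)^n
power</code></pre></div>
</div>
<p>The answer should be close to the 13% from the simulation.</p>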
</section>
</section>
<section id="final-remarks" class="level2">
<h2 class="anchored" data-anchor-id="final-remarks">Final remarks</h2>
<p>There is a lot of repetition here. It would almost certainly be better to abstract the ideas of the simulation away into a function (whose inputs might be the true parameter(s), the null parameter, the test and the population distribution), but one of the things I wanted to get across was that these simulations all work the same way with a few small changes, and that doesn’t come across quite so clearly when all you change is the inputs to a function.</p>
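<p>For what it is worth, here is a rough sketch of what such a function might look like for the sample-maximum test only. The function name <code>sim_power_max</code> and its arguments are hypothetical, and a more general version would also take the test and the population distribution as inputs:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# Hypothetical helper: estimate how often the sample maximum reaches crit
# when samples of size n come from a normal population
sim_power_max &lt;- function(n_sim, n, true_mean, pop_sd, crit) {
  tibble(sim = 1:n_sim) %&gt;%
    rowwise() %&gt;%
    mutate(sample_max = max(rnorm(n, true_mean, pop_sd))) %&gt;%
    ungroup() %&gt;%
    summarize(power = mean(sample_max &gt;= crit)) %&gt;%
    pull(power)
}

# same settings as the power simulation above
sim_power_max(n_sim = 1000, n = 20, true_mean = 110, pop_sd = 30, crit = 184)</code></pre></div>
</div>
<p>Calling it with a true mean of 100 instead would estimate the type I error probability of the same test.</p>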


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I will need to knit the multiple simulations in this blog post before I put it up, so I am sticking mostly with 1000, but you may be more patient than I am.↩︎</p></li>
<li id="fn2"><p>You can also do this with <code>map</code> from <code>purrr</code>, but I find the code more difficult to follow.↩︎</p></li>
<li id="fn3"><p>My base R heritage sometimes shows through.↩︎</p></li>
<li id="fn4"><p>The 7 is because using <code>lower.tail = FALSE</code> gives a probability strictly greater than the first input.↩︎</p></li>
<li id="fn5"><p>But one that is used again later.↩︎</p></li>
<li id="fn6"><p>Add six to the number in the bid to determine how many tricks the bidder and their partner are promising to win. Thus if you bid “two clubs” you are undertaking to win 8 of the 13 tricks between you with clubs as trumps.↩︎</p></li>
<li id="fn7"><p>In order to convey information, most bids say something about hand strength and the length of the suit bid, according to a system like Standard American or ACOL (British).↩︎</p></li>
<li id="fn8"><p>That is to say, out-of-this-universe lucky.↩︎</p></li>
<li id="fn9"><p>We could use a similar approach to estimate the probability of being dealt a <em>void</em>, a suit with no cards in it, but we would have to be more careful counting. Counting the number of different suits represented in the hand and seeing whether it is less than 4 would be one way.↩︎</p></li>
<li id="fn10"><p>In Standard American, 2 notrumps if you have no long or short suits, 2 clubs if you do, or your hand is stronger than 21 points. In one bidding system I know of, the <em>lowest</em> bid of 1 club is reserved for really strong hands like this!↩︎</p></li>
<li id="fn11"><p>Because that was easier to develop theory for.↩︎</p></li>
<li id="fn12"><p>Joke. You may laugh.↩︎</p></li>
<li id="fn13"><p>A third school would say “do a Bayesian analysis with suitable prior and likelihood model”, but that’s for another discussion.↩︎</p></li>
<li id="fn14"><p>There are problems with this, too, that I will go into another time.↩︎</p></li>
<li id="fn15"><p>I’m doing 10,000 simulations this time.↩︎</p></li>
<li id="fn16"><p>The part of the test output with the P-value in it might not be called <code>p.value</code> in your case. Investigate.↩︎</p></li>
<li id="fn17"><p><code>rexp</code>’s second input is the “rate”, the reciprocal of the mean.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <guid>https://blog.ritsokiguess.site/posts/tidy-simulation/</guid>
  <pubDate>Sun, 14 Nov 2021 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/tidy-simulation/Coin-Toss1.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Correcting a dataframe</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/correcting-a-dataframe/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Using <code>tidyverse</code> ideas to make some changes in a dataframe.</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tmaptools)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(leaflet)</span></code></pre></div></div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>So I had a dataframe today, in which I wanted to make some small corrections. Specifically, I had this one:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"http://ritsokiguess.site/datafiles/wisconsin.txt"</span></span>
<span id="cb4-2">wisc <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_table</span>(my_url)</span>
<span id="cb4-3">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(location)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton"},{"1":"Beloit"},{"1":"Fort.Atkinson"},{"1":"Madison"},{"1":"Marshfield"},{"1":"Milwaukee"},{"1":"Monroe"},{"1":"Superior"},{"1":"Wausau"},{"1":"Dubuque"},{"1":"St.Paul"},{"1":"Chicago"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>These are mostly, but not all, cities in Wisconsin, and I want to draw them on a map. To do that, though, I need to affix their states to them, and I thought a good starting point was to pretend that they were all in Wisconsin, and then correct the ones that aren’t:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(location) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">state =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"WI"</span>) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> wisc</span>
<span id="cb5-3">wisc</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]},{"label":["state"],"name":[2],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton","2":"WI"},{"1":"Beloit","2":"WI"},{"1":"Fort.Atkinson","2":"WI"},{"1":"Madison","2":"WI"},{"1":"Marshfield","2":"WI"},{"1":"Milwaukee","2":"WI"},{"1":"Monroe","2":"WI"},{"1":"Superior","2":"WI"},{"1":"Wausau","2":"WI"},{"1":"Dubuque","2":"WI"},{"1":"St.Paul","2":"WI"},{"1":"Chicago","2":"WI"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The last three cities are in the wrong state: Dubuque is in Iowa (IA), St.&nbsp;Paul in Minnesota (MN), and Chicago is in Illinois (IL). I know how to fix this in base R: I write something like</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">wisc<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>state[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"IL"</span></span></code></pre></div></div>
</div>
<p>but how do you do this the Tidyverse way?</p>
</section>
<section id="a-better-way" class="level2">
<h2 class="anchored" data-anchor-id="a-better-way">A better way</h2>
<p>The first step is to make a small dataframe with the cities that need to be corrected, and the states they are actually in:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">corrections <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tribble</span>(</span>
<span id="cb7-2">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>location, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>state,</span>
<span id="cb7-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dubuque"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"IA"</span>,</span>
<span id="cb7-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"St.Paul"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MN"</span>,</span>
<span id="cb7-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chicago"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"IL"</span></span>
<span id="cb7-6">)</span>
<span id="cb7-7">corrections</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]},{"label":["state"],"name":[2],"type":["chr"],"align":["left"]}],"data":[{"1":"Dubuque","2":"IA"},{"1":"St.Paul","2":"MN"},{"1":"Chicago","2":"IL"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Note that the columns of this dataframe have the <em>same names</em> as the ones in the original dataframe <code>wisc</code>.</p>
<p>So, I was thinking, this is a lookup table (of a sort), and so joining this to <code>wisc</code> might yield something helpful. We want to match on location only, and <em>not</em> on state, so that the correct state for each of these three cities is carried along as a separate column rather than being used for the matching. So what does this do?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(corrections, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"location"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]},{"label":["state.x"],"name":[2],"type":["chr"],"align":["left"]},{"label":["state.y"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton","2":"WI","3":"NA"},{"1":"Beloit","2":"WI","3":"NA"},{"1":"Fort.Atkinson","2":"WI","3":"NA"},{"1":"Madison","2":"WI","3":"NA"},{"1":"Marshfield","2":"WI","3":"NA"},{"1":"Milwaukee","2":"WI","3":"NA"},{"1":"Monroe","2":"WI","3":"NA"},{"1":"Superior","2":"WI","3":"NA"},{"1":"Wausau","2":"WI","3":"NA"},{"1":"Dubuque","2":"WI","3":"IA"},{"1":"St.Paul","2":"WI","3":"MN"},{"1":"Chicago","2":"WI","3":"IL"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Now, we have <em>two</em> states for each city. The first one is always Wisconsin, and the second one is usually missing, but where the state in <code>state.y</code> has a value, <em>that</em> is the true state of the city. So, the thought process is that the actual <code>state</code> should be:</p>
<ul>
<li>if <code>state.y</code> is not missing, use that</li>
<li>else, use the value in <code>state.x</code>.</li>
</ul>
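<p>The rule above can be written out literally with <code>if_else</code>. This is only a sketch on a made-up two-row version of the joined dataframe (the name <code>joined</code> is hypothetical):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# toy stand-in for the result of the join
joined &lt;- tibble(
  location = c("Madison", "Dubuque"),
  state.x = c("WI", "WI"),
  state.y = c(NA, "IA")
)

# if state.y is not missing, use it; otherwise fall back to state.x
joined %&gt;%
  mutate(state = if_else(!is.na(state.y), state.y, state.x))</code></pre></div>
</div>
<p>Madison keeps WI and Dubuque becomes IA.</p>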
<p>I had an idea that there was a function that would do exactly this, only I couldn’t remember its name, so I couldn’t really search for it. My first thought was <a href="https://dplyr.tidyverse.org/reference/na_if.html"><code>na_if</code></a>, which replaces a specified value with NA every time it sees it. This, though, is the opposite of what I wanted. So I looked at the See Also, and saw <a href="https://tidyr.tidyverse.org/reference/replace_na.html"><code>replace_na</code></a>. This replaces NAs with a given value. Not quite right, but closer.</p>
<p>In the See Also for <code>replace_na</code>, I saw one more thing: <a href="https://dplyr.tidyverse.org/reference/coalesce.html"><code>coalesce</code></a>, “replace NAs with values from other vectors”. Was that what I was thinking of? It was. The way it works is that you feed it several vectors, and, at each position, the first vector that is not missing there gives its value to the result. Hence, what I needed was this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(corrections, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"location"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">state=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">coalesce</span>(state.y, state.x))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]},{"label":["state.x"],"name":[2],"type":["chr"],"align":["left"]},{"label":["state.y"],"name":[3],"type":["chr"],"align":["left"]},{"label":["state"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton","2":"WI","3":"NA","4":"WI"},{"1":"Beloit","2":"WI","3":"NA","4":"WI"},{"1":"Fort.Atkinson","2":"WI","3":"NA","4":"WI"},{"1":"Madison","2":"WI","3":"NA","4":"WI"},{"1":"Marshfield","2":"WI","3":"NA","4":"WI"},{"1":"Milwaukee","2":"WI","3":"NA","4":"WI"},{"1":"Monroe","2":"WI","3":"NA","4":"WI"},{"1":"Superior","2":"WI","3":"NA","4":"WI"},{"1":"Wausau","2":"WI","3":"NA","4":"WI"},{"1":"Dubuque","2":"WI","3":"IA","4":"IA"},{"1":"St.Paul","2":"WI","3":"MN","4":"MN"},{"1":"Chicago","2":"WI","3":"IL","4":"IL"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Where <code>state.y</code> has a value, it is used; if it’s missing, the value in <code>state.x</code> is used instead.</p>
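<p>To see the three functions from this section side by side, here they are on some tiny made-up vectors (the values are purely illustrative):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# na_if: turn a specified value into NA
na_if(c("WI", "IL", "WI"), "IL")
# "WI" NA "WI"

# replace_na: turn NA into one given value
replace_na(c("WI", NA), "unknown")
# "WI" "unknown"

# coalesce: at each position, take the first non-missing value
coalesce(c(NA, "IA", NA), c("WI", "WI", "WI"))
# "WI" "IA" "WI"</code></pre></div>
</div>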
</section>
<section id="the-best-way" class="level2">
<h2 class="anchored" data-anchor-id="the-best-way">The best way</h2>
<p>I was quite pleased with myself for coming up with this, but I had missed the actual best way of doing this. In SQL, there is UPDATE, and what that does is to take a table of keys to look up and some new values for other columns to replace the ones in the original table. Because <code>dplyr</code> has a lot of things in common with SQL, it is perhaps no surprise that there is a <a href="https://dplyr.tidyverse.org/reference/rows.html"><code>rows_update</code></a>, and for this job it is as simple as this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rows_update</span>(corrections) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> wisc</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Matching, by = "location"</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">wisc</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["location"],"name":[1],"type":["chr"],"align":["left"]},{"label":["state"],"name":[2],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton","2":"WI"},{"1":"Beloit","2":"WI"},{"1":"Fort.Atkinson","2":"WI"},{"1":"Madison","2":"WI"},{"1":"Marshfield","2":"WI"},{"1":"Milwaukee","2":"WI"},{"1":"Monroe","2":"WI"},{"1":"Superior","2":"WI"},{"1":"Wausau","2":"WI"},{"1":"Dubuque","2":"IA"},{"1":"St.Paul","2":"MN"},{"1":"Chicago","2":"IL"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The values to look up (the “keys”) are by default in the first column, which is where they are in <code>corrections</code>. If they had not been, I would have used a <code>by</code> in the same way as with a join.</p>
<p>Mind. Blown. (Well, my mind was, anyway.)</p>
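<p>As an illustration of the <code>by</code> mentioned above, here is a sketch with a made-up corrections table whose key is <em>not</em> in the first column. The dataframes <code>cities</code> and <code>fixes</code> are hypothetical:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

cities &lt;- tibble(location = c("Madison", "Chicago"), state = c("WI", "WI"))

# the key (location) is in the second column this time
fixes &lt;- tibble(state = "IL", location = "Chicago")

# say explicitly which column holds the keys
updated &lt;- cities %&gt;%
  rows_update(fixes, by = "location")
updated</code></pre></div>
</div>
<p>Without the <code>by</code>, the first column of <code>fixes</code>, namely <code>state</code>, would be taken as the key, which is not what is wanted here.</p>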
</section>
<section id="geocoding" class="level2">
<h2 class="anchored" data-anchor-id="geocoding">Geocoding</h2>
<p>I said I wanted to draw a map with these cities on it. For that, I need to look up the longitude and latitude of these places, and for <em>that</em>, I need to glue the state onto the name of each city, to make sure I don’t look up the wrong one. It is perhaps easy to forget that <code>unite</code> is the cleanest way of doing this, particularly if you don’t want the individual columns any more:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unite</span>(where, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(location, state), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> wisc</span>
<span id="cb13-2">wisc</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]}],"data":[{"1":"Appleton WI"},{"1":"Beloit WI"},{"1":"Fort.Atkinson WI"},{"1":"Madison WI"},{"1":"Marshfield WI"},{"1":"Milwaukee WI"},{"1":"Monroe WI"},{"1":"Superior WI"},{"1":"Wausau WI"},{"1":"Dubuque IA"},{"1":"St.Paul MN"},{"1":"Chicago IL"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
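<p>If you <em>do</em> want to keep the individual columns, there are a couple of options. This sketch uses a small made-up dataframe called <code>cities</code>:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

cities &lt;- tibble(location = c("Dubuque", "Chicago"), state = c("IA", "IL"))

# default: the input columns are dropped, leaving only where
only_where &lt;- cities %&gt;% unite(where, c(location, state), sep = " ")

# remove = FALSE keeps location and state as well
kept &lt;- cities %&gt;% unite(where, c(location, state), sep = " ", remove = FALSE)

# the mutate alternative also keeps the inputs
pasted &lt;- cities %&gt;% mutate(where = str_c(location, state, sep = " "))</code></pre></div>
</div>
<p>All three produce the same values in <code>where</code>; they differ only in which other columns survive.</p>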
<p>The function <code>geocode_OSM</code> from <code>tmaptools</code> will find the longitude and latitude of a place. It expects <em>one</em> place as input, not a vector of placenames, so we will work <code>rowwise</code> to geocode one at a time. (Using <code>map</code> from <code>purrr</code> is also an option.) The geocoder returns a list, which contains, buried a little deeply, the longitudes and latitudes:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rowwise</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ll =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geocode_OSM</span>(where))) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> wisc</span>
<span id="cb14-4">wisc</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]},{"label":["ll"],"name":[2],"type":["list"],"align":["right"]}],"data":[{"1":"Appleton WI","2":"<named list [3]>"},{"1":"Beloit WI","2":"<named list [3]>"},{"1":"Fort.Atkinson WI","2":"<named list [3]>"},{"1":"Madison WI","2":"<named list [3]>"},{"1":"Marshfield WI","2":"<named list [3]>"},{"1":"Milwaukee WI","2":"<named list [3]>"},{"1":"Monroe WI","2":"<named list [3]>"},{"1":"Superior WI","2":"<named list [3]>"},{"1":"Wausau WI","2":"<named list [3]>"},{"1":"Dubuque IA","2":"<named list [3]>"},{"1":"St.Paul MN","2":"<named list [3]>"},{"1":"Chicago IL","2":"<named list [3]>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
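<p>For comparison, this is roughly what the <code>map</code> version looks like. To keep the sketch runnable without an internet connection, it uses a hypothetical stand-in called <code>fake_geocode</code> that returns a list of three things with made-up coordinates; with the real geocoder you would put <code>geocode_OSM</code> in its place:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# hypothetical stand-in for a geocoder: one place in, a named list of three things out
# (the numbers are invented, not real coordinates)
fake_geocode &lt;- function(place) {
  list(query = place, coords = c(x = -88, y = 44), bbox = c(-89, 43, -87, 45))
}

places &lt;- tibble(where = c("Appleton WI", "Chicago IL"))

# rowwise version, as above
by_row &lt;- places %&gt;%
  rowwise() %&gt;%
  mutate(ll = list(fake_geocode(where))) %&gt;%
  ungroup()

# map version: the function is called once for each element of where
by_map &lt;- places %&gt;%
  mutate(ll = map(where, fake_geocode))</code></pre></div>
</div>
<p>Either way, <code>ll</code> ends up as a list-column with one three-element list per row.</p>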
<p>The column <code>ll</code> is a list-column, and the usual way to handle these is to <code>unnest</code>, but that isn’t quite right here:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>(ll)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]},{"label":["ll"],"name":[2],"type":["named list"],"align":["right"]}],"data":[{"1":"Appleton WI","2":"<chr [1]>"},{"1":"Appleton WI","2":"<dbl [2]>"},{"1":"Appleton WI","2":"<bbox>"},{"1":"Beloit WI","2":"<chr [1]>"},{"1":"Beloit WI","2":"<dbl [2]>"},{"1":"Beloit WI","2":"<bbox>"},{"1":"Fort.Atkinson WI","2":"<chr [1]>"},{"1":"Fort.Atkinson WI","2":"<dbl [2]>"},{"1":"Fort.Atkinson WI","2":"<bbox>"},{"1":"Madison WI","2":"<chr [1]>"},{"1":"Madison WI","2":"<dbl [2]>"},{"1":"Madison WI","2":"<bbox>"},{"1":"Marshfield WI","2":"<chr [1]>"},{"1":"Marshfield WI","2":"<dbl [2]>"},{"1":"Marshfield WI","2":"<bbox>"},{"1":"Milwaukee WI","2":"<chr [1]>"},{"1":"Milwaukee WI","2":"<dbl [2]>"},{"1":"Milwaukee WI","2":"<bbox>"},{"1":"Monroe WI","2":"<chr [1]>"},{"1":"Monroe WI","2":"<dbl [2]>"},{"1":"Monroe WI","2":"<bbox>"},{"1":"Superior WI","2":"<chr [1]>"},{"1":"Superior WI","2":"<dbl [2]>"},{"1":"Superior WI","2":"<bbox>"},{"1":"Wausau WI","2":"<chr [1]>"},{"1":"Wausau WI","2":"<dbl [2]>"},{"1":"Wausau WI","2":"<bbox>"},{"1":"Dubuque IA","2":"<chr [1]>"},{"1":"Dubuque IA","2":"<dbl [2]>"},{"1":"Dubuque IA","2":"<bbox>"},{"1":"St.Paul MN","2":"<chr [1]>"},{"1":"St.Paul MN","2":"<dbl [2]>"},{"1":"St.Paul MN","2":"<bbox>"},{"1":"Chicago IL","2":"<chr [1]>"},{"1":"Chicago IL","2":"<dbl [2]>"},{"1":"Chicago IL","2":"<bbox>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Unnesting a list of three things produces <em>three</em> rows for each city. It would make more sense to have the unnesting go to the right and produce a new <em>column</em> for each thing in the list. Since version 1.0.0, <code>tidyr</code> has had a variant called <code>unnest_wider</code> that does this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb16-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ll)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]},{"label":["query"],"name":[2],"type":["chr"],"align":["left"]},{"label":["coords"],"name":[3],"type":["list"],"align":["right"]},{"label":["bbox"],"name":[4],"type":["list"],"align":["right"]}],"data":[{"1":"Appleton WI","2":"Appleton WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Beloit WI","2":"Beloit WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Fort.Atkinson WI","2":"Fort.Atkinson WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Madison WI","2":"Madison WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Marshfield WI","2":"Marshfield WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Milwaukee WI","2":"Milwaukee WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Monroe WI","2":"Monroe WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Superior WI","2":"Superior WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Wausau WI","2":"Wausau WI","3":"<dbl [2]>","4":"<bbox>"},{"1":"Dubuque IA","2":"Dubuque IA","3":"<dbl [2]>","4":"<bbox>"},{"1":"St.Paul MN","2":"St.Paul MN","3":"<dbl [2]>","4":"<bbox>"},{"1":"Chicago IL","2":"Chicago IL","3":"<dbl [2]>","4":"<bbox>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The longitudes and latitudes we want are still hidden in a list-column, the one called <code>coords</code>, so with luck, if we unnest that wider as well, we should be in business:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">wisc <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(ll) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest_wider</span>(coords) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> wisc</span>
<span id="cb17-4">wisc</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]},{"label":["query"],"name":[2],"type":["chr"],"align":["left"]},{"label":["x"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["y"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["bbox"],"name":[5],"type":["list"],"align":["right"]}],"data":[{"1":"Appleton WI","2":"Appleton WI","3":"-88.40697","4":"44.26140","5":"<bbox>"},{"1":"Beloit WI","2":"Beloit WI","3":"-89.03178","4":"42.50833","5":"<bbox>"},{"1":"Fort.Atkinson WI","2":"Fort.Atkinson WI","3":"-88.83705","4":"42.92889","5":"<bbox>"},{"1":"Madison WI","2":"Madison WI","3":"-89.38417","4":"43.07469","5":"<bbox>"},{"1":"Marshfield WI","2":"Marshfield WI","3":"-90.17403","4":"44.66623","5":"<bbox>"},{"1":"Milwaukee WI","2":"Milwaukee WI","3":"-87.90908","4":"43.03865","5":"<bbox>"},{"1":"Monroe WI","2":"Monroe WI","3":"-90.63973","4":"43.94168","5":"<bbox>"},{"1":"Superior WI","2":"Superior WI","3":"-92.10408","4":"46.72077","5":"<bbox>"},{"1":"Wausau WI","2":"Wausau WI","3":"-89.62728","4":"44.95979","5":"<bbox>"},{"1":"Dubuque IA","2":"Dubuque IA","3":"-90.66480","4":"42.50062","5":"<bbox>"},{"1":"St.Paul MN","2":"St.Paul MN","3":"-93.09310","4":"44.94975","5":"<bbox>"},{"1":"Chicago IL","2":"Chicago IL","3":"-87.62442","4":"41.87556","5":"<bbox>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>And now we are. <code>x</code> contains the longitudes (negative for degrees west), and <code>y</code> the latitudes (positive for degrees north).</p>
</section>
<section id="making-a-map-with-these-on-them" class="level2">
<h2 class="anchored" data-anchor-id="making-a-map-with-these-on-them">Making a map with these on it</h2>
<p>The most enjoyable way to make a map in R is to use the <code>leaflet</code> package. Making a map is a three-step process:</p>
<ul>
<li><code>leaflet()</code> with the name of the dataframe</li>
<li><code>addTiles()</code> to get map tiles to draw the map with</li>
<li>add some kind of markers to show where the points are. I use circle markers here; there are also markers (from <code>addMarkers</code>) that look like Google map pins. This is also where you say which columns of your dataframe hold the longitudes and latitudes:</li>
</ul>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> wisc) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb18-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addCircleMarkers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lng =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lat =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>y) </span></code></pre></div></div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item" id="htmlwidget-54c0009801cbad6567ed" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-54c0009801cbad6567ed">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org/copyright/\">OpenStreetMap<\/a>,  <a href=\"https://opendatacommons.org/licenses/odbl/\">ODbL<\/a>"}]},{"method":"addCircleMarkers","args":[[44.2613967,42.5083272,42.9288944,43.07469,44.6662287,43.0386475,43.9416755,46.7207737,44.9597858,42.5006243,44.9497487,41.8755616],[-88.4069744,-89.031784,-88.83705089999999,-89.3841663,-90.1740313,-87.9090751,-90.6397264,-92.10407960000001,-89.6272791,-90.6647985,-93.0931028,-87.6244212],10,null,null,{"interactive":true,"className":"","stroke":true,"color":"#03F","weight":5,"opacity":0.5,"fill":true,"fillColor":"#03F","fillOpacity":0.2},null,null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[41.8755616,46.7207737],"lng":[-93.0931028,-87.6244212]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>The nice thing about Leaflet maps is that you can zoom, pan and generally move about in them. For example, you can zoom in to find out which city each circle represents.</p>
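<p>If zooming in to identify each circle gets tedious, one option is to attach a label to each marker. This is only a sketch: <code>cities</code> is a cut-down stand-in for <code>wisc</code>, holding three of the cities and coordinates from the table above, and <code>labelled_map</code> is a made-up name. The <code>label</code> input of <code>addCircleMarkers</code> should show the city name when the pointer is over a circle:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)
library(leaflet)

# three of the cities from the table above, with their coordinates
cities &lt;- tribble(
  ~where, ~x, ~y,
  "Madison WI", -89.38417, 43.07469,
  "Milwaukee WI", -87.90908, 43.03865,
  "Chicago IL", -87.62442, 41.87556
)

labelled_map &lt;- leaflet(data = cities) %&gt;%
  addTiles() %&gt;%
  # label = ~where takes the label text from the column called where
  addCircleMarkers(lng = ~x, lat = ~y, label = ~where)
labelled_map</code></pre></div></div>
</div>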


</section>

 ]]></description>
  <category>code</category>
  <category>analysis</category>
  <category>maps</category>
  <guid>https://blog.ritsokiguess.site/posts/correcting-a-dataframe/</guid>
  <pubDate>Mon, 26 Apr 2021 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/correcting-a-dataframe/Screenshot 2025-12-30 at 22-39-57 SQL UPDATE Statement.png" medium="image" type="image/png" height="42" width="144"/>
</item>
<item>
  <title>Sampling locations in a city</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/sampling-locations-in-city/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>With the aim of getting an aerial map of the sampled location.</p>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Do you follow @londonmapbot on Twitter? You should. Every so often it posts a satellite photo of somewhere in London (the one in England), with the implied invitation to guess where it is. Along with the tweet comes a link to OpenStreetMap; if you click on it, you get a map of where the photo was taken, so you can see whether your guess was right. Or, if you’re me, you look at the latitude and longitude in the link, and figure out roughly where in the city it is. My strategy is to note that Oxford Circus, in the centre of London, is at about 51.5 north and 0.15 west, and work from there.<sup>1</sup></p>
<p>Matt Dray, who is behind @londonmapbot, selects random points in a rectangle that goes as far in each compass direction as the M25 goes. (This motorway surrounds London in something like a circle, and is often taken as a definition of what is considered to be London; if outside, not in London. There is a surprising amount of countryside inside the M25.)</p>
<p>London has the advantage of being roughly a rectangle aligned north-south and east-west, and is therefore easy to sample from. I have been thinking about doing something similar for my home city Toronto, but I ran into an immediate problem:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/sampling-locations-in-city/Screenshot_2020-10-10_12-39-13.png" title="Toronto map" class="img-fluid figure-img"></p>
<figcaption>Toronto with boundary</figcaption>
</figure>
</div>
<p>Toronto is <em>not</em> nicely aligned north-south and east-west, and so if you sample from a rectangle enclosing it, this is what will happen:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/sampling-locations-in-city/Screenshot_2020-10-10_12-40-39.png" title="Map of Toronto with randomly sampled points" class="img-fluid figure-img"></p>
<figcaption>randomly sampled points from rectangle surrounding Toronto</figcaption>
</figure>
</div>
<p>You get some points inside the city, but you will also get a number of points in Vaughan or Mississauga or Pickering or Lake Ontario! How to eliminate the ones I don’t want?</p>
</section>
<section id="sampling-from-a-region" class="level2">
<h2 class="anchored" data-anchor-id="sampling-from-a-region">Sampling from a region</h2>
<p>Let’s load some packages:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(leaflet)</span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(sp)</span></code></pre></div></div>
</div>
<p>I had this vague idea that it would be possible to decide if a sampled point was inside a polygon or not. So I figured I would start by defining the boundary of Toronto as a collection of straight lines joining points, at least approximately. The northern boundary of Toronto is Steeles Avenue, all the way across, and <em>that</em> is a straight line, but the southern boundary is Lake Ontario, and the western and eastern boundaries are a mixture of streets and rivers, so I tried to pick points which, when joined by straight lines, enclosed all of Toronto without too much extra. This is what I came up with:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">boundary <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tribble</span>(</span>
<span id="cb4-2">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>where, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>lat, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>long,</span>
<span id="cb4-3"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Steeles @ 427"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.75</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.639</span>,</span>
<span id="cb4-4"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Steeles @ Pickering Townline"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.855</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.17</span>,</span>
<span id="cb4-5"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Twyn Rivers @ Rouge River"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.815</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.15</span>,</span>
<span id="cb4-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rouge Beach"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.795</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.115</span>,</span>
<span id="cb4-7"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Tommy Thompson Park"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.61</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.33</span>,</span>
<span id="cb4-8"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Gibraltar Point"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.61</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.39</span>,</span>
<span id="cb4-9"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Sunnyside Beach"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.635</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.45</span>,</span>
<span id="cb4-10"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Cliff Lumsden Park"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.59</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.50</span>,</span>
<span id="cb4-11"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Marie Curtis Park"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.58</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.54</span>,</span>
<span id="cb4-12"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rathburn @ Mill"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.645</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.59</span>,</span>
<span id="cb4-13"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Etobicoke Creek @ Eglinton"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.645</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.61</span>,</span>
<span id="cb4-14"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Eglinton @ Renforth"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.665</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.59</span>,</span>
<span id="cb4-15"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Steeles @ 427"</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">43.75</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">79.639</span>,</span>
<span id="cb4-16">)</span>
<span id="cb4-17">boundary</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["where"],"name":[1],"type":["chr"],"align":["left"]},{"label":["lat"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["long"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"Steeles @ 427","2":"43.750","3":"-79.639"},{"1":"Steeles @ Pickering Townline","2":"43.855","3":"-79.170"},{"1":"Twyn Rivers @ Rouge River","2":"43.815","3":"-79.150"},{"1":"Rouge Beach","2":"43.795","3":"-79.115"},{"1":"Tommy Thompson Park","2":"43.610","3":"-79.330"},{"1":"Gibraltar Point","2":"43.610","3":"-79.390"},{"1":"Sunnyside Beach","2":"43.635","3":"-79.450"},{"1":"Cliff Lumsden Park","2":"43.590","3":"-79.500"},{"1":"Marie Curtis Park","2":"43.580","3":"-79.540"},{"1":"Rathburn @ Mill","2":"43.645","3":"-79.590"},{"1":"Etobicoke Creek @ Eglinton","2":"43.645","3":"-79.610"},{"1":"Eglinton @ Renforth","2":"43.665","3":"-79.590"},{"1":"Steeles @ 427","2":"43.750","3":"-79.639"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>So how do you determine whether a point is inside a polygon? The idea turns out to be <a href="https://www.geeksforgeeks.org/how-to-check-if-a-given-point-lies-inside-a-polygon/">this</a>: you draw a line to the right from your point; if it crosses the boundary of the polygon an odd number of times, the point is inside, and if an even number of times, it is outside. So is there something like this in R? Yes: <a href="https://www.rdocumentation.org/packages/sp/versions/1.4-2/topics/point.in.polygon">this function</a> in the <code>sp</code> package.<sup>2</sup></p>
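<p>To make the odd-or-even idea concrete, here is a minimal sketch of it in base R. This is an illustration only, not how <code>sp</code> does it: <code>point_in_poly</code> and the unit square it is tried on are made up, and points lying exactly on the boundary get no special treatment:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: is the point (px, py) inside the polygon whose
# corners are (poly_x[i], poly_y[i])?
point_in_poly &lt;- function(px, py, poly_x, poly_y) {
  n &lt;- length(poly_x)
  inside &lt;- FALSE
  j &lt;- n # the edge being checked runs from corner j to corner i
  for (i in seq_len(n)) {
    # does this edge pass through the height of the point?
    straddles &lt;- (poly_y[i] &gt; py) != (poly_y[j] &gt; py)
    if (straddles) {
      # x-coordinate where the edge meets the horizontal line through the point
      x_cross &lt;- poly_x[j] +
        (py - poly_y[j]) * (poly_x[i] - poly_x[j]) / (poly_y[i] - poly_y[j])
      # a crossing to the right of the point flips inside/outside
      if (x_cross &gt; px) inside &lt;- !inside
    }
    j &lt;- i
  }
  inside
}

# a unit square, with the first corner repeated at the end like boundary
sq_x &lt;- c(0, 1, 1, 0, 0)
sq_y &lt;- c(0, 0, 1, 1, 0)
point_in_poly(0.5, 0.5, sq_x, sq_y) # TRUE: one crossing to the right
point_in_poly(1.5, 0.5, sq_x, sq_y) # FALSE: no crossings to the right</code></pre></div></div>
</div>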
<p>So now I could generate some points in the enclosing rectangle and see whether they were inside or outside the city, like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">457299</span>)</span>
<span id="cb5-2">n_point <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span></span>
<span id="cb5-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lat =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n_point, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat)),</span>
<span id="cb5-4">       <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">long =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">runif</span>(n_point, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long))) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb5-5">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">inside =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">point.in.polygon</span>(d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long, d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat, boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long, boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ifelse</span>(inside <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"blue"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"red"</span>)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d</span>
<span id="cb5-7">d</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["lat"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["long"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["inside"],"name":[3],"type":["int"],"align":["right"]},{"label":["colour"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"43.84059","2":"-79.61337","3":"0","4":"red"},{"1":"43.61426","2":"-79.46672","3":"0","4":"red"},{"1":"43.64263","2":"-79.57940","3":"1","4":"blue"},{"1":"43.73337","2":"-79.30686","3":"1","4":"blue"},{"1":"43.68835","2":"-79.16859","3":"0","4":"red"},{"1":"43.78691","2":"-79.32389","3":"1","4":"blue"},{"1":"43.64658","2":"-79.22027","3":"0","4":"red"},{"1":"43.63970","2":"-79.21372","3":"0","4":"red"},{"1":"43.74072","2":"-79.51963","3":"1","4":"blue"},{"1":"43.80795","2":"-79.22558","3":"1","4":"blue"},{"1":"43.78923","2":"-79.36598","3":"1","4":"blue"},{"1":"43.68218","2":"-79.53673","3":"1","4":"blue"},{"1":"43.61862","2":"-79.14901","3":"0","4":"red"},{"1":"43.70045","2":"-79.42279","3":"1","4":"blue"},{"1":"43.79605","2":"-79.40043","3":"1","4":"blue"},{"1":"43.77576","2":"-79.20796","3":"1","4":"blue"},{"1":"43.71794","2":"-79.32193","3":"1","4":"blue"},{"1":"43.67341","2":"-79.11510","3":"0","4":"red"},{"1":"43.81744","2":"-79.61663","3":"0","4":"red"},{"1":"43.73533","2":"-79.18852","3":"1","4":"blue"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The function <code>point.in.polygon</code> returns a 1 if the point is inside the polygon (city boundary) and a 0 if outside.<sup>3</sup></p>
<p>I added a column <code>colour</code> to plot the inside and outside points in different colours on a map, which we do next. The <code>leaflet</code> package is much the easiest way to do this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>(d) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addCircleMarkers</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> d<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>colour) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addPolygons</span>(boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long, boundary<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Assuming "long" and "lat" are longitude and latitude, respectively</code></pre>
</div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item" id="htmlwidget-8b59768cdb3fb27325bc" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-8b59768cdb3fb27325bc">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org/copyright/\">OpenStreetMap<\/a>,  <a href=\"https://opendatacommons.org/licenses/odbl/\">ODbL<\/a>"}]},{"method":"addCircleMarkers","args":[[43.8405856347538,43.61426369740045,43.64262542010518,43.73337334233918,43.68834765338222,43.78690791667439,43.64657993395347,43.63969688656623,43.74071805920801,43.8079513767513,43.78922769949538,43.68217609096435,43.61862195389229,43.7004526383779,43.79604510519537,43.77576099754776,43.7179439867509,43.67340562679572,43.8174444213754,43.73533082670881],[-79.61336902292352,-79.46672170049324,-79.57940009679831,-79.30686328414176,-79.16858700585179,-79.32389253722317,-79.22027419113367,-79.21371703103091,-79.51963196806329,-79.22557651174441,-79.36597719456627,-79.53672570987045,-79.14900863720942,-79.42279331711586,-79.40042839737143,-79.20795921882335,-79.32193468714598,-79.11510484959929,-79.61663471217173,-79.18852064993139],10,null,null,{"interactive":true,"className":"","stroke":true,"color":["red","red","blue","blue","red","blue","red","red","blue","blue","blue","blue","red","blue","blue","blue","blue","red","red","blue"],"weight":5,"opacity":0.5,"fill":true,"fillColor":["red","red","blue","blue","red","blue","red","red","blue","blue","blue","blue","red","blue","blue","blue","blue","red","red","blue"],"fillOpacity":0.2},null,null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null
]},{"method":"addPolygons","args":[[[[{"lng":[-79.639,-79.17,-79.15000000000001,-79.11499999999999,-79.33,-79.39,-79.45,-79.5,-79.54000000000001,-79.59,-79.61,-79.59,-79.639],"lat":[43.75,43.855,43.815,43.795,43.61,43.61,43.635,43.59,43.58,43.645,43.645,43.665,43.75]}]]],null,null,{"interactive":true,"className":"","stroke":true,"color":"#03F","weight":5,"opacity":0.5,"fill":true,"fillColor":"#03F","fillOpacity":0.2,"smoothFactor":1,"noClip":false},null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[43.58,43.855],"lng":[-79.639,-79.11499999999999]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>The polygons come from a different dataframe, so I need to specify that in <code>addPolygons</code>. Leaflet is clever enough to figure out from the column names which is longitude and which is latitude (there are several spellings it will understand).</p>
<p>This one seems to have classified the points more or less correctly. The bottom left red circle is just in the lake, though it looks as if one of the three rightmost blue circles is in the lake also. Oops. The way to test this is to generate several sets of random points, test the ones near the boundary, and if they were classified wrongly, tweak the boundary points and try again. The coastline around the Scarborough Bluffs is not as straight as I was hoping.</p>
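<p>That checking loop might look something like the sketch below. The boundary coordinates are the ones drawn on the map above, and the colours follow the map (blue for inside, red for outside); the names <code>boundary</code> and <code>sample_points</code> are made up for illustration. Changing the seed gives a fresh set of points to inspect:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)
library(sp)

# Boundary vertices as drawn on the map above (the name `boundary` is an assumption)
boundary &lt;- tribble(
  ~long,   ~lat,
  -79.639, 43.75,
  -79.17,  43.855,
  -79.15,  43.815,
  -79.115, 43.795,
  -79.33,  43.61,
  -79.39,  43.61,
  -79.45,  43.635,
  -79.5,   43.59,
  -79.54,  43.58,
  -79.59,  43.645,
  -79.61,  43.645,
  -79.59,  43.665,
  -79.639, 43.75
)

# Hypothetical helper: sample n points in the bounding box and classify each one
sample_points &lt;- function(n, seed) {
  set.seed(seed)
  tibble(
    lat = runif(n, min(boundary$lat), max(boundary$lat)),
    long = runif(n, min(boundary$long), max(boundary$long))
  ) %&gt;%
    mutate(
      # point.in.polygon returns 0 for outside and 1 for inside
      inside = point.in.polygon(long, lat, boundary$long, boundary$lat),
      colour = ifelse(inside == 1, "blue", "red")
    )
}

d_new &lt;- sample_points(20, seed = 1)
d_new %&gt;% count(colour)</code></pre></div>
</div>
<p>Each new set of points can then be drawn on the map in the same way as before, paying most attention to the ones near the edge.</p>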
</section>
<section id="mapbox" class="level2">
<h2 class="anchored" data-anchor-id="mapbox">Mapbox</h2>
<p><a href="https://www.rostrum.blog/2020/09/21/londonmapbot/">Matt Dray’s blog post</a> gives a nice clear explanation of how to set up Mapbox to return a satellite image of a latitude and longitude that you feed it. What you need is a Mapbox API key. A good place to save this is in your <code>.Renviron</code>, and <code>edit_r_environ</code> from <code>usethis</code> is a good way to get at that. Then you use this key to construct a URL that will return an image of that point.</p>
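<p>As a rough sketch of that setup: the variable name <code>MAPBOX_TOKEN</code> is the one used in the code further down, and the value shown in the comment is a placeholder, not a real key.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Open .Renviron for editing (needs the usethis package):
# usethis::edit_r_environ()
#
# Add a line like this one, save the file, and restart R:
# MAPBOX_TOKEN=your-token-here

# Afterwards, check that R can see the key, without printing it
token_is_set &lt;- nzchar(Sys.getenv("MAPBOX_TOKEN"))
token_is_set</code></pre></div>
</div>
<p>If this comes out <code>FALSE</code>, the key has not been picked up; the likeliest reason is that R has not been restarted since the file was edited.</p>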
<p>Let’s grab one of those sampled points that actually is in Toronto:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(inside <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d1</span>
<span id="cb8-2">d1</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["lat"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["long"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["inside"],"name":[3],"type":["int"],"align":["right"]},{"label":["colour"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"43.64263","2":"-79.5794","3":"1","4":"blue"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and then I get my API key and use it to make a URL for an image at this point:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">mapbox_token <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Sys.getenv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"MAPBOX_TOKEN"</span>)</span>
<span id="cb9-2">url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://api.mapbox.com/styles/v1/mapbox/satellite-v9/static/"</span>,</span>
<span id="cb9-3">             d1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>long,</span>
<span id="cb9-4">             <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">","</span>,</span>
<span id="cb9-5">             d1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>lat,</span>
<span id="cb9-6">             <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">",15,0/600x400?access_token="</span>,</span>
<span id="cb9-7">             mapbox_token)</span></code></pre></div></div>
</div>
<p>I’m not showing you the actual URL, since it contains my key! The last-but-one line contains the zoom (15), the bearing (0), and the size of the image in pixels (600 by 400). The image is slightly more zoomed out and bigger than the ones Matt uses. (I wanted to have a wider area to make it easier to guess.)</p>
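<p>If you want to play with the zoom or the image size, one option is to wrap the URL construction in a function. This is only a sketch: <code>mapbox_url</code> and its argument names are made up here, and the defaults are the values used above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# Hypothetical helper: build a satellite image URL for one point
mapbox_url &lt;- function(long, lat, token, zoom = 15, width = 600, height = 400) {
  str_c(
    "https://api.mapbox.com/styles/v1/mapbox/satellite-v9/static/",
    long, ",", lat, ",", zoom, ",0/",
    width, "x", height,
    "?access_token=", token
  )
}

# A made-up token, so that the result is safe to display
mapbox_url(-79.5794, 43.64263, token = "not-a-real-token")</code></pre></div>
</div>
<p>Passing the token in as an argument, rather than looking it up inside the function, makes it easy to try the function out with a fake key like this one.</p>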
<p>Then download this and save it somewhere, using <code>mode = "wb"</code> so that the image is written as a binary file (this matters on Windows):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">where <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"img.png"</span></span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download.file</span>(url, where, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mode =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"wb"</span>)</span></code></pre></div></div>
</div>
<p>and display it:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/sampling-locations-in-city/img.png" class="img-fluid figure-img"></p>
<figcaption>satellite image of somewhere in Toronto</figcaption>
</figure>
</div>
<p>I don’t recognize that, so I’ll fire up leaflet again:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">leaflet</span>(d1) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addTiles</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">addCircleMarkers</span>() </span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Assuming "long" and "lat" are longitude and latitude, respectively</code></pre>
</div>
<div class="cell-output-display">
<div class="leaflet html-widget html-fill-item" id="htmlwidget-f76071c14749485d95c8" style="width:100%;height:464px;"></div>
<script type="application/json" data-for="htmlwidget-f76071c14749485d95c8">{"x":{"options":{"crs":{"crsClass":"L.CRS.EPSG3857","code":null,"proj4def":null,"projectedBounds":null,"options":{}}},"calls":[{"method":"addTiles","args":["https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png",null,null,{"minZoom":0,"maxZoom":18,"tileSize":256,"subdomains":"abc","errorTileUrl":"","tms":false,"noWrap":false,"zoomOffset":0,"zoomReverse":false,"opacity":1,"zIndex":1,"detectRetina":false,"attribution":"&copy; <a href=\"https://openstreetmap.org/copyright/\">OpenStreetMap<\/a>,  <a href=\"https://opendatacommons.org/licenses/odbl/\">ODbL<\/a>"}]},{"method":"addCircleMarkers","args":[43.64262542010518,-79.57940009679831,10,null,null,{"interactive":true,"className":"","stroke":true,"color":"#03F","weight":5,"opacity":0.5,"fill":true,"fillColor":"#03F","fillOpacity":0.2},null,null,null,null,null,{"interactive":false,"permanent":false,"direction":"auto","opacity":1,"offset":[0,0],"textsize":"10px","textOnly":false,"className":"","sticky":true},null]}],"limits":{"lat":[43.64262542010518,43.64262542010518],"lng":[-79.57940009679831,-79.57940009679831]}},"evals":[],"jsHooks":[]}</script>
</div>
</div>
<p>It’s the bit of Toronto that’s almost in Mississauga. The boundary is Etobicoke Creek, at the bottom left of the image.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<p><a href="https://www.geeksforgeeks.org/how-to-check-if-a-given-point-lies-inside-a-polygon/">How to determine if point inside polygon</a></p>
<p><a href="https://www.rdocumentation.org/packages/sp/versions/1.4-2/topics/point.in.polygon">point.in.polygon function documentation</a></p>
<p><a href="https://www.rostrum.blog/2020/09/21/londonmapbot/">Matt Dray blog post on londonmapbot</a></p>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>London extends roughly between latitude 51.2 and 51.7 degrees, and between longitude 0.25 degrees east and 0.5 west. Knowing this enables you to place a location in London from its lat and long.↩︎</p></li>
<li id="fn2"><p>Having had a bad experience with rgdal earlier, I was afraid that sp would be a pain to install, but there was no problem at all.↩︎</p></li>
<li id="fn3"><p>It also returns a 2 if the point is on an edge of the polygon and a 3 if at a vertex.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <category>maps</category>
  <guid>https://blog.ritsokiguess.site/posts/sampling-locations-in-city/</guid>
  <pubDate>Sat, 10 Oct 2020 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/sampling-locations-in-city/Screenshot_2020-10-10_12-39-13.png" medium="image" type="image/png" height="87" width="144"/>
</item>
<item>
  <title>Another tidying problem</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/another-tidying-problem/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>This ends up with a matched pairs test after tidying.</p>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Some cars have a computer that records gas mileage since the last time the computer was reset. A driver is concerned that the computer on their car is not as accurate as it might be, so they keep an old-fashioned notebook and record the miles driven since the last fillup, and the amount of gas filled up, and use that to compute the miles per gallon. They also record what the car’s computer says the miles per gallon was.</p>
<p>Is there a systematic difference between the computer’s values and the driver’s? If so, which way does it go?</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="the-data" class="level2">
<h2 class="anchored" data-anchor-id="the-data">The data</h2>
<p>The driver’s notebook has small pages, so the data look like this:</p>
<pre><code>Fillup     1    2    3    4    5
Computer 41.5 50.7 36.6 37.3 34.2
Driver   36.5 44.2 37.2 35.6 30.5
Fillup     6    7    8    9   10
Computer 45.0 48.0 43.2 47.7 42.2
Driver   40.5 40.0 41.0 42.8 39.2
Fillup    11   12   13   14   15
Computer 43.2 44.6 48.4 46.4 46.8
Driver   38.8 44.5 45.4 45.3 45.7
Fillup    16   17   18   19   20
Computer 39.2 37.3 43.5 44.3 43.3
Driver   34.2 35.2 39.8 44.9 47.5</code></pre>
<p>This is not very close to tidy. There are three variables: the fillup number (identification), the computer’s miles-per-gallon value, and the driver’s. These should be in <em>columns</em>, not rows. Also, there are really four sets of rows, because of the way the data were recorded. How are we going to make this tidy?</p>
</section>
<section id="making-it-tidy" class="level2">
<h2 class="anchored" data-anchor-id="making-it-tidy">Making it tidy</h2>
<p>As ever, we take this one step at a time, building a pipeline as we go: we see what each step produces before figuring out what to do next.</p>
<p>The first thing is to read the data in; these are aligned columns, so <code>read_table</code> is the thing. Also, there are no column headers, so we have to say that as well:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">my_url <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"gas-mileage.txt"</span></span>
<span id="cb4-2">gas <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_table</span>(my_url, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>
── Column specification ────────────────────────────────────────────────────────
cols(
  X1 = col_character(),
  X2 = col_double(),
  X3 = col_double(),
  X4 = col_double(),
  X5 = col_double(),
  X6 = col_double()
)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">gas</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["X2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["X3"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["X4"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["X5"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["X6"],"name":[6],"type":["dbl"],"align":["right"]}],"data":[{"1":"Fillup","2":"1.0","3":"2.0","4":"3.0","5":"4.0","6":"5.0"},{"1":"Computer","2":"41.5","3":"50.7","4":"36.6","5":"37.3","6":"34.2"},{"1":"Driver","2":"36.5","3":"44.2","4":"37.2","5":"35.6","6":"30.5"},{"1":"Fillup","2":"6.0","3":"7.0","4":"8.0","5":"9.0","6":"10.0"},{"1":"Computer","2":"45.0","3":"48.0","4":"43.2","5":"47.7","6":"42.2"},{"1":"Driver","2":"40.5","3":"40.0","4":"41.0","5":"42.8","6":"39.2"},{"1":"Fillup","2":"11.0","3":"12.0","4":"13.0","5":"14.0","6":"15.0"},{"1":"Computer","2":"43.2","3":"44.6","4":"48.4","5":"46.4","6":"46.8"},{"1":"Driver","2":"38.8","3":"44.5","4":"45.4","5":"45.3","6":"45.7"},{"1":"Fillup","2":"16.0","3":"17.0","4":"18.0","5":"19.0","6":"20.0"},{"1":"Computer","2":"39.2","3":"37.3","4":"43.5","5":"44.3","6":"43.3"},{"1":"Driver","2":"34.2","3":"35.2","4":"39.8","5":"44.9","6":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<section id="longer-first" class="level3">
<h3 class="anchored" data-anchor-id="longer-first">Longer first</h3>
<p>I usually find it easier to make the dataframe longer first, and then figure out what to do next. Here, that means putting all the data values in one column, with a second column of names saying which original column each value came from, thus:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["var_name"],"name":[2],"type":["chr"],"align":["left"]},{"label":["var_value"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"Fillup","2":"X2","3":"1.0"},{"1":"Fillup","2":"X3","3":"2.0"},{"1":"Fillup","2":"X4","3":"3.0"},{"1":"Fillup","2":"X5","3":"4.0"},{"1":"Fillup","2":"X6","3":"5.0"},{"1":"Computer","2":"X2","3":"41.5"},{"1":"Computer","2":"X3","3":"50.7"},{"1":"Computer","2":"X4","3":"36.6"},{"1":"Computer","2":"X5","3":"37.3"},{"1":"Computer","2":"X6","3":"34.2"},{"1":"Driver","2":"X2","3":"36.5"},{"1":"Driver","2":"X3","3":"44.2"},{"1":"Driver","2":"X4","3":"37.2"},{"1":"Driver","2":"X5","3":"35.6"},{"1":"Driver","2":"X6","3":"30.5"},{"1":"Fillup","2":"X2","3":"6.0"},{"1":"Fillup","2":"X3","3":"7.0"},{"1":"Fillup","2":"X4","3":"8.0"},{"1":"Fillup","2":"X5","3":"9.0"},{"1":"Fillup","2":"X6","3":"10.0"},{"1":"Computer","2":"X2","3":"45.0"},{"1":"Computer","2":"X3","3":"48.0"},{"1":"Computer","2":"X4","3":"43.2"},{"1":"Computer","2":"X5","3":"47.7"},{"1":"Computer","2":"X6","3":"42.2"},{"1":"Driver","2":"X2","3":"40.5"},{"1":"Driver","2":"X3","3":"40.0"},{"1":"Driver","2":"X4","3":"41.0"},{"1":"Driver","2":"X5","3":"42.8"},{"1":"Driver","2":"X6","3":"39.2"},{"1":"Fillup","2":"X2","3":"11.0"},{"1":"Fillup","2":"X3","3":"12.0"},{"1":"Fillup","2":"X4","3":"13.0"},{"1":"Fillup","2":"X5","3":"14.0"},{"1":"Fillup","2":"X6","3":"15.0"},{"1":"Computer","2":"X2","3":"43.2"},{"1":"Computer","2":"X3","3":"44.6"},{"1":"Computer","2":"X4","3":"48.4"},{"1":"Computer","2":"X5","3":"46.4"},{"1":"Computer","2":"X6","3":"46.8"},{"1":"Driver","2":"X2","3":"38.8"},{"1":"Driver","2":"X3","3":"44.5"},{"1":"Driver","2":"X4","3":"45.4"},{"1":"Driver","2":"X5","3":"45.3"},{"1":"Driver","2":"X6","3":"45.7"},{"1":"Fillup","2":"X2","3":"16.0"},{"1":"Fillup","2":"X3","3":"17.0"},{"1":"Fillup","2":"X4","3":"18.0"},{"1":"Fillup","2":"X5","3":"19.0"},{"1":"Fillup","2":"X6","3":"20.0"},{"1":"Computer","2":"X2","3":"39.2"},{"1":"Computer","2":"X3","3":"37.3"},{"1":"Computer","2":"X4","3":"43.5"},{"1":"Computer","2":"X5","3":"44.3"},{"1":"Computer","2":"X6","3":"43.3"},{"1":"Driver","2":"X2","3":"34.2"},{"1":"Driver","2":"X3","3":"35.2"},{"1":"Driver","2":"X4","3":"39.8"},{"1":"Driver","2":"X5","3":"44.9"},{"1":"Driver","2":"X6","3":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>The things in <code>X1</code> are our column-names-to-be, and the values that go with them are in <code>var_value</code>. <code>var_name</code> has mostly served its purpose; these are the original columns in the data file, which we don’t need any more. So now, we make this wider, right?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> X1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> var_value)  </span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Values from `var_value` are not uniquely identified; output will contain
list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
  {data} |&gt;
  dplyr::summarise(n = dplyr::n(), .by = c(var_name, X1)) |&gt;
  dplyr::filter(n &gt; 1L)</code></pre>
</div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["var_name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Fillup"],"name":[2],"type":["list"],"align":["right"]},{"label":["Computer"],"name":[3],"type":["list"],"align":["right"]},{"label":["Driver"],"name":[4],"type":["list"],"align":["right"]}],"data":[{"1":"X2","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X3","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X4","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X5","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X6","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Oh. How did we get list-columns?</p>
<p>The answer is that <code>pivot_wider</code> needs to know which <em>column</em> each <code>var_value</code> is going to, but also which <em>row</em>. The way it decides about rows is to look at all combinations of things in the <em>other</em> columns, the ones not involved in the <code>pivot_wider</code>. The only one of those here is <code>var_name</code>, so each value goes in the column according to its value in <code>X1</code>, and the row according to its value in <code>var_name</code>. For example, the value 41.5 in row 6 of the longer dataframe goes into the column labelled <code>Computer</code> and the row labelled <code>X2</code>. But if you scroll down the longer dataframe, you’ll find there are four data values with the <code>Computer</code>-<code>X2</code> combination, so <code>pivot_wider</code> collects them together into one cell of the output dataframe.</p>
<p>This is what the warning is about.</p>
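<p>To see the duplication directly, you can count how many values share each row-and-column combination, much as the warning suggests. A small sketch, using four rows copied from the longer dataframe (<code>toy</code> and <code>dups</code> are made-up names):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# Four rows from the longer dataframe: column X2 from the first two blocks
toy &lt;- tribble(
  ~X1,        ~var_name, ~var_value,
  "Computer", "X2",      41.5,
  "Driver",   "X2",      36.5,
  "Computer", "X2",      45.0,
  "Driver",   "X2",      40.5
)

# Combinations that occur more than once cannot go in a cell as a single number
dups &lt;- toy %&gt;%
  count(X1, var_name) %&gt;%
  filter(n &gt; 1)
dups</code></pre></div>
</div>
<p>Any combination that shows up here is one that <code>pivot_wider</code> will have to put into a list-column.</p>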
<p><code>spread</code> handled this much less gracefully:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">spread</span>(X1, var_value)  </span></code></pre></div></div>
<div class="cell-output cell-output-error">
<pre><code>Error in `spread()`:
! Each row of output must be identified by a unique combination of keys.
ℹ Keys are shared for 60 rows
• 6, 21, 36, 51
• 7, 22, 37, 52
• 8, 23, 38, 53
• 9, 24, 39, 54
• 10, 25, 40, 55
• 11, 26, 41, 56
• 12, 27, 42, 57
• 13, 28, 43, 58
• 14, 29, 44, 59
• 15, 30, 45, 60
• 1, 16, 31, 46
• 2, 17, 32, 47
• 3, 18, 33, 48
• 4, 19, 34, 49
• 5, 20, 35, 50</code></pre>
</div>
</div>
<p>It required a unique combination of values for the other variables in the dataframe, <a href="http://www.solearabiantree.net/namingofparts/namingofparts.php">which in our case we have not got</a>.</p>
<p>All right, back to this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb12-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> X1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> var_value)  </span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Values from `var_value` are not uniquely identified; output will contain
list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
  {data} |&gt;
  dplyr::summarise(n = dplyr::n(), .by = c(var_name, X1)) |&gt;
  dplyr::filter(n &gt; 1L)</code></pre>
</div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["var_name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Fillup"],"name":[2],"type":["list"],"align":["right"]},{"label":["Computer"],"name":[3],"type":["list"],"align":["right"]},{"label":["Driver"],"name":[4],"type":["list"],"align":["right"]}],"data":[{"1":"X2","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X3","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X4","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X5","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"},{"1":"X6","2":"<dbl [4]>","3":"<dbl [4]>","4":"<dbl [4]>"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>There is a mindless way to go on from here, and a thoughtful way.</p>
<p>The mindless way to handle unwanted list-columns is to throw an <code>unnest</code> at the problem and see what happens:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> X1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> var_value)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unnest</span>()</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: Values from `var_value` are not uniquely identified; output will contain
list-cols.
• Use `values_fn = list` to suppress this warning.
• Use `values_fn = {summary_fun}` to summarise duplicates.
• Use the following dplyr code to identify duplicates.
  {data} |&gt;
  dplyr::summarise(n = dplyr::n(), .by = c(var_name, X1)) |&gt;
  dplyr::filter(n &gt; 1L)</code></pre>
</div>
<div class="cell-output cell-output-stderr">
<pre><code>Warning: `cols` is now required when using `unnest()`.
ℹ Please use `cols = c(Fillup, Computer, Driver)`.</code></pre>
</div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["var_name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Fillup"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Computer"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Driver"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"X2","2":"1","3":"41.5","4":"36.5"},{"1":"X2","2":"6","3":"45.0","4":"40.5"},{"1":"X2","2":"11","3":"43.2","4":"38.8"},{"1":"X2","2":"16","3":"39.2","4":"34.2"},{"1":"X3","2":"2","3":"50.7","4":"44.2"},{"1":"X3","2":"7","3":"48.0","4":"40.0"},{"1":"X3","2":"12","3":"44.6","4":"44.5"},{"1":"X3","2":"17","3":"37.3","4":"35.2"},{"1":"X4","2":"3","3":"36.6","4":"37.2"},{"1":"X4","2":"8","3":"43.2","4":"41.0"},{"1":"X4","2":"13","3":"48.4","4":"45.4"},{"1":"X4","2":"18","3":"43.5","4":"39.8"},{"1":"X5","2":"4","3":"37.3","4":"35.6"},{"1":"X5","2":"9","3":"47.7","4":"42.8"},{"1":"X5","2":"14","3":"46.4","4":"45.3"},{"1":"X5","2":"19","3":"44.3","4":"44.9"},{"1":"X6","2":"5","3":"34.2","4":"30.5"},{"1":"X6","2":"10","3":"42.2","4":"39.2"},{"1":"X6","2":"15","3":"46.8","4":"45.7"},{"1":"X6","2":"20","3":"43.3","4":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This has worked.<sup>1</sup> The fillup numbers have come out in the wrong order, but that’s probably not a problem. It would also work if you had a different number of observations on each row of the original data file, as long as you had a fillup number, a computer value and a driver value for each one.</p>
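<p>The second warning above says what to do instead: name the list-columns in <code>cols</code>. A minimal sketch of that on a cut-down version of the data (the tibble <code>d</code> and its two rows are made up for illustration, using a few of the values shown above):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# Hypothetical cut-down data: each cell of Fillup and Computer
# holds two values, as a list-column would after pivot_wider
d &lt;- tibble(
  var_name = c("X2", "X3"),
  Fillup = list(c(1, 6), c(2, 7)),
  Computer = list(c(41.5, 45.0), c(50.7, 48.0))
)

# Naming the list-columns explicitly avoids the warning about `cols`
d_long &lt;- d %&gt;% unnest(cols = c(Fillup, Computer))
d_long</code></pre></div>
</div>
<p>The list-columns named together in <code>cols</code> are unnested in parallel, so each fillup number stays next to its own computer value.</p>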
<p>The thoughtful way to go is to organize things so that, when we pivot wider, each value is identified by a unique combination of the columns that are left. A way to do that is to note that the original data file has four “blocks” of five observations each:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">gas</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["X2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["X3"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["X4"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["X5"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["X6"],"name":[6],"type":["dbl"],"align":["right"]}],"data":[{"1":"Fillup","2":"1.0","3":"2.0","4":"3.0","5":"4.0","6":"5.0"},{"1":"Computer","2":"41.5","3":"50.7","4":"36.6","5":"37.3","6":"34.2"},{"1":"Driver","2":"36.5","3":"44.2","4":"37.2","5":"35.6","6":"30.5"},{"1":"Fillup","2":"6.0","3":"7.0","4":"8.0","5":"9.0","6":"10.0"},{"1":"Computer","2":"45.0","3":"48.0","4":"43.2","5":"47.7","6":"42.2"},{"1":"Driver","2":"40.5","3":"40.0","4":"41.0","5":"42.8","6":"39.2"},{"1":"Fillup","2":"11.0","3":"12.0","4":"13.0","5":"14.0","6":"15.0"},{"1":"Computer","2":"43.2","3":"44.6","4":"48.4","5":"46.4","6":"46.8"},{"1":"Driver","2":"38.8","3":"44.5","4":"45.4","5":"45.3","6":"45.7"},{"1":"Fillup","2":"16.0","3":"17.0","4":"18.0","5":"19.0","6":"20.0"},{"1":"Computer","2":"39.2","3":"37.3","4":"43.5","5":"44.3","6":"43.3"},{"1":"Driver","2":"34.2","3":"35.2","4":"39.8","5":"44.9","6":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Each set of three rows is one block. So if we number the blocks, each observation of <code>Fillup</code>, <code>Computer</code>, and <code>Driver</code> will have an X-something column that it comes from and a block, and this combination will be unique.</p>
<p>You could create the block column by hand easily enough, or note that each block starts with a row called <code>Fillup</code> and use <a href="https://tidyr.tidyverse.org/articles/pivot.html#contact-list-1">this idea</a>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">block =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(X1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fillup"</span>))</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["X2"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["X3"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["X4"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["X5"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["X6"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["block"],"name":[7],"type":["int"],"align":["right"]}],"data":[{"1":"Fillup","2":"1.0","3":"2.0","4":"3.0","5":"4.0","6":"5.0","7":"1"},{"1":"Computer","2":"41.5","3":"50.7","4":"36.6","5":"37.3","6":"34.2","7":"1"},{"1":"Driver","2":"36.5","3":"44.2","4":"37.2","5":"35.6","6":"30.5","7":"1"},{"1":"Fillup","2":"6.0","3":"7.0","4":"8.0","5":"9.0","6":"10.0","7":"2"},{"1":"Computer","2":"45.0","3":"48.0","4":"43.2","5":"47.7","6":"42.2","7":"2"},{"1":"Driver","2":"40.5","3":"40.0","4":"41.0","5":"42.8","6":"39.2","7":"2"},{"1":"Fillup","2":"11.0","3":"12.0","4":"13.0","5":"14.0","6":"15.0","7":"3"},{"1":"Computer","2":"43.2","3":"44.6","4":"48.4","5":"46.4","6":"46.8","7":"3"},{"1":"Driver","2":"38.8","3":"44.5","4":"45.4","5":"45.3","6":"45.7","7":"3"},{"1":"Fillup","2":"16.0","3":"17.0","4":"18.0","5":"19.0","6":"20.0","7":"4"},{"1":"Computer","2":"39.2","3":"37.3","4":"43.5","5":"44.3","6":"43.3","7":"4"},{"1":"Driver","2":"34.2","3":"35.2","4":"39.8","5":"44.9","6":"47.5","7":"4"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This works because <code>X1=="Fillup"</code> is either true or false. <code>cumsum</code> takes cumulative sums; that is, the sum of all the values in the column down to and including the one you’re looking at. It requires numeric input, though, so it turns the logical values into 1 for <code>TRUE</code> and 0 for <code>FALSE</code> and adds <em>those</em> up. (This is the same thing that <code>as.numeric</code> does.) The idea is that the value of <code>block</code> gets bumped by one every time you hit a <code>Fillup</code> line.</p>
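<p>To see the steps one at a time, here is a small sketch on a vector of row labels in the same repeating pattern as <code>X1</code> (the vector <code>x1</code> is made up for illustration and holds only two blocks):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Row labels for two blocks, in the same order as in the data file
x1 &lt;- c("Fillup", "Computer", "Driver", "Fillup", "Computer", "Driver")

x1 == "Fillup"             # TRUE FALSE FALSE TRUE FALSE FALSE
as.numeric(x1 == "Fillup") # 1 0 0 1 0 0
cumsum(x1 == "Fillup")     # 1 1 1 2 2 2</code></pre></div>
</div>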
<p>Then pivot-longer as before:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">block =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(X1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fillup"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["block"],"name":[2],"type":["int"],"align":["right"]},{"label":["var_name"],"name":[3],"type":["chr"],"align":["left"]},{"label":["var_value"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"Fillup","2":"1","3":"X2","4":"1.0"},{"1":"Fillup","2":"1","3":"X3","4":"2.0"},{"1":"Fillup","2":"1","3":"X4","4":"3.0"},{"1":"Fillup","2":"1","3":"X5","4":"4.0"},{"1":"Fillup","2":"1","3":"X6","4":"5.0"},{"1":"Computer","2":"1","3":"X2","4":"41.5"},{"1":"Computer","2":"1","3":"X3","4":"50.7"},{"1":"Computer","2":"1","3":"X4","4":"36.6"},{"1":"Computer","2":"1","3":"X5","4":"37.3"},{"1":"Computer","2":"1","3":"X6","4":"34.2"},{"1":"Driver","2":"1","3":"X2","4":"36.5"},{"1":"Driver","2":"1","3":"X3","4":"44.2"},{"1":"Driver","2":"1","3":"X4","4":"37.2"},{"1":"Driver","2":"1","3":"X5","4":"35.6"},{"1":"Driver","2":"1","3":"X6","4":"30.5"},{"1":"Fillup","2":"2","3":"X2","4":"6.0"},{"1":"Fillup","2":"2","3":"X3","4":"7.0"},{"1":"Fillup","2":"2","3":"X4","4":"8.0"},{"1":"Fillup","2":"2","3":"X5","4":"9.0"},{"1":"Fillup","2":"2","3":"X6","4":"10.0"},{"1":"Computer","2":"2","3":"X2","4":"45.0"},{"1":"Computer","2":"2","3":"X3","4":"48.0"},{"1":"Computer","2":"2","3":"X4","4":"43.2"},{"1":"Computer","2":"2","3":"X5","4":"47.7"},{"1":"Computer","2":"2","3":"X6","4":"42.2"},{"1":"Driver","2":"2","3":"X2","4":"40.5"},{"1":"Driver","2":"2","3":"X3","4":"40.0"},{"1":"Driver","2":"2","3":"X4","4":"41.0"},{"1":"Driver","2":"2","3":"X5","4":"42.8"},{"1":"Driver","2":"2","3":"X6","4":"39.2"},{"1":"Fillup","2":"3","3":"X2","4":"11.0"},{"1":"Fillup","2":"3","3":"X3","4":"12.0"},{"1":"Fillup","2":"3","3":"X4","4":"13.0"},{"1":"Fillup","2":"3","3":"X5","4":"14.0"},{"1":"Fillup","2":"3","3":"X6","4":"15.0"},{"1":"Computer","2":"3","3":"X2","4":"43.2"},{"1":"Computer","2":"3","3":"X3","4":"44.6"},{"1":"Computer","2":"3","3":"X4","4":"48.4"},{"1":"Computer","2":"3","3":"X5","4":"46.4"},{"1":"Computer","2":"3","3":"X6","4":"46.8"},{"1":"Driver","2":"3","3":"X2","4":"38.8"},{"1":"Driver","2":"3","3":"X3","4":"44.5"},{"1":"Driver","2":"3","3":"X4","4":"45.4"},{"1":"Driver","2":"3","3":"X5","4":"45.3"},{"1":"Driver","2":"3","3":"X6","4":"45.7"},{"1":"Fillup","2":"4","3":"X2","4":"16.0"},{"1":"Fillup","2":"4","3":"X3","4":"17.0"},{"1":"Fillup","2":"4","3":"X4","4":"18.0"},{"1":"Fillup","2":"4","3":"X5","4":"19.0"},{"1":"Fillup","2":"4","3":"X6","4":"20.0"},{"1":"Computer","2":"4","3":"X2","4":"39.2"},{"1":"Computer","2":"4","3":"X3","4":"37.3"},{"1":"Computer","2":"4","3":"X4","4":"43.5"},{"1":"Computer","2":"4","3":"X5","4":"44.3"},{"1":"Computer","2":"4","3":"X6","4":"43.3"},{"1":"Driver","2":"4","3":"X2","4":"34.2"},{"1":"Driver","2":"4","3":"X3","4":"35.2"},{"1":"Driver","2":"4","3":"X4","4":"39.8"},{"1":"Driver","2":"4","3":"X5","4":"44.9"},{"1":"Driver","2":"4","3":"X6","4":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and now you can check that, within each value of <code>X1</code>, the combinations of <code>var_name</code> and <code>block</code> are unique, so pivoting wider should work smoothly:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">(gas <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">block =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(X1<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Fillup"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(X2<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>X6, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_name"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"var_value"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from =</span> X1, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from =</span> var_value) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> gas1)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["block"],"name":[1],"type":["int"],"align":["right"]},{"label":["var_name"],"name":[2],"type":["chr"],"align":["left"]},{"label":["Fillup"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Computer"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Driver"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"1","2":"X2","3":"1","4":"41.5","5":"36.5"},{"1":"1","2":"X3","3":"2","4":"50.7","5":"44.2"},{"1":"1","2":"X4","3":"3","4":"36.6","5":"37.2"},{"1":"1","2":"X5","3":"4","4":"37.3","5":"35.6"},{"1":"1","2":"X6","3":"5","4":"34.2","5":"30.5"},{"1":"2","2":"X2","3":"6","4":"45.0","5":"40.5"},{"1":"2","2":"X3","3":"7","4":"48.0","5":"40.0"},{"1":"2","2":"X4","3":"8","4":"43.2","5":"41.0"},{"1":"2","2":"X5","3":"9","4":"47.7","5":"42.8"},{"1":"2","2":"X6","3":"10","4":"42.2","5":"39.2"},{"1":"3","2":"X2","3":"11","4":"43.2","5":"38.8"},{"1":"3","2":"X3","3":"12","4":"44.6","5":"44.5"},{"1":"3","2":"X4","3":"13","4":"48.4","5":"45.4"},{"1":"3","2":"X5","3":"14","4":"46.4","5":"45.3"},{"1":"3","2":"X6","3":"15","4":"46.8","5":"45.7"},{"1":"4","2":"X2","3":"16","4":"39.2","5":"34.2"},{"1":"4","2":"X3","3":"17","4":"37.3","5":"35.2"},{"1":"4","2":"X4","3":"18","4":"43.5","5":"39.8"},{"1":"4","2":"X5","3":"19","4":"44.3","5":"44.9"},{"1":"4","2":"X6","3":"20","4":"43.3","5":"47.5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and so it does.</p>
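<p>The uniqueness can also be checked directly rather than by eye, along the lines of the dplyr code suggested in the earlier warning. A sketch, assuming only the first two blocks of the data, typed in by hand as <code>gas2</code> (a name made up here, as is <code>dups</code>):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# First two blocks of the data file, copied from the display above
gas2 &lt;- tribble(
  ~X1,        ~X2,  ~X3,  ~X4,  ~X5,  ~X6,
  "Fillup",    1,    2,    3,    4,    5,
  "Computer", 41.5, 50.7, 36.6, 37.3, 34.2,
  "Driver",   36.5, 44.2, 37.2, 35.6, 30.5,
  "Fillup",    6,    7,    8,    9,   10,
  "Computer", 45.0, 48.0, 43.2, 47.7, 42.2,
  "Driver",   40.5, 40.0, 41.0, 42.8, 39.2
)

# Count how often each X1-block-var_name combination occurs
# and keep any that occur more than once
dups &lt;- gas2 %&gt;%
  mutate(block = cumsum(X1 == "Fillup")) %&gt;%
  pivot_longer(X2:X6, names_to = "var_name", values_to = "var_value") %&gt;%
  count(X1, block, var_name) %&gt;%
  filter(n &gt; 1)
dups</code></pre></div>
</div>
<p>An empty result means there are no duplicates, so <code>pivot_wider</code> has no reason to make list-columns.</p>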
<p>Sometimes a <code>pivot_longer</code> followed by a <code>pivot_wider</code> can be turned into a single <code>pivot_longer</code> with options (see the <a href="https://tidyr.tidyverse.org/articles/pivot.html">pivoting vignette</a> for examples), but this appears not to be one of those.</p>
</section>
</section>
<section id="comparing-the-driver-and-the-computer" class="level2">
<h2 class="anchored" data-anchor-id="comparing-the-driver-and-the-computer">Comparing the driver and the computer</h2>
<p>Now that we have tidy data, we can do an analysis. These are matched-pair data (one <code>Computer</code> and one <code>Driver</code> measurement for each fillup), so a sensible graph would be of the differences, a histogram, say:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">gas1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">diff =</span> Computer <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> Driver) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb21-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span>diff)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/another-tidying-problem/index_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>There is only one observation where the driver’s measurement is much bigger than the computer’s; otherwise, either there is not much to choose between them or the computer’s measurement is bigger. Is this something that would generalize to “all measurements”, presumably all measurements at fillup by this driver and this computer? The differences are not badly non-normal, so a <img src="https://latex.codecogs.com/png.latex?t">-test should be fine:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">with</span>(gas1, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">t.test</span>(Computer, Driver, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">paired =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Paired t-test

data:  Computer and Driver
t = 4.358, df = 19, p-value = 0.0003386
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 1.418847 4.041153
sample estimates:
mean difference 
           2.73 </code></pre>
</div>
</div>
<p>It does seem to generalize: the P-value is very small, and the computer’s mean measurement is estimated to be between about 1.4 and 4.0 miles per gallon larger than the driver’s.</p>
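<p>A paired <img src="https://latex.codecogs.com/png.latex?t">-test is the same thing as a one-sample <img src="https://latex.codecogs.com/png.latex?t">-test on the differences, which is why the histogram of differences was the right graph to look at. A sketch of that equivalence, with the twenty pairs of values typed in by hand from the tidied data (the names <code>computer</code>, <code>driver</code>, <code>paired</code> and <code>one_sample</code> are made up here):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Values copied from the tidied data, in fillup order
computer &lt;- c(41.5, 50.7, 36.6, 37.3, 34.2, 45.0, 48.0, 43.2, 47.7, 42.2,
              43.2, 44.6, 48.4, 46.4, 46.8, 39.2, 37.3, 43.5, 44.3, 43.3)
driver &lt;- c(36.5, 44.2, 37.2, 35.6, 30.5, 40.5, 40.0, 41.0, 42.8, 39.2,
            38.8, 44.5, 45.4, 45.3, 45.7, 34.2, 35.2, 39.8, 44.9, 47.5)

# Paired test, as above
paired &lt;- t.test(computer, driver, paired = TRUE)

# One-sample test that the mean difference is zero
one_sample &lt;- t.test(computer - driver, mu = 0)

# The test statistics and confidence intervals should agree
c(paired$statistic, one_sample$statistic)
paired$conf.int
one_sample$conf.int</code></pre></div>
</div>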
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li>Data from <a href="https://www.amazon.com/Freeman-Introduction-Practice-Statistics-7th/dp/1429274077">here</a>, exercise 7.35.</li>
<li><a href="http://www.solearabiantree.net/namingofparts/namingofparts.php">Naming of parts</a></li>
<li><a href="https://tidyr.tidyverse.org/articles/pivot.html">Pivoting vignette from tidyr</a></li>
</ul>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I did get away with using <code>unnest</code> the old-fashioned way, though. What I should have done is given in the second warning: say which columns to unnest, using <code>cols</code>.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>tidying</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/another-tidying-problem/</guid>
  <pubDate>Mon, 07 Sep 2020 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/another-tidying-problem/Screenshot from 2025-12-30 22-45-59.png" medium="image" type="image/png" height="222" width="144"/>
</item>
<item>
  <title>Understanding the result of a chi-square test</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/understanding-chi-square/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Going beyond the chi-square statistic and its P-value</p>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>A chi-square test can be used for assessing whether there is an association between two categorical variables. The problem it has is that knowing that an association exists is only part of the story; we want to know what is making the association happen. This is the same kind of thing that happens with analysis of variance: a significant <img src="https://latex.codecogs.com/png.latex?F">-test indicates that the group means are not all the same, but not which ones are different.</p>
<p>Recently I discovered that R’s <code>chisq.test</code> has something that will help in understanding this.</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
<p>which I always seem to need for something.</p>
</section>
<section id="example" class="level2">
<h2 class="anchored" data-anchor-id="example">Example</h2>
<p>How do males and females differ in their choice of eyewear (glasses, contacts, neither), if at all? Some data (frequencies):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">eyewear <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tribble</span>(</span>
<span id="cb3-2">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>gender, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>contacts, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>glasses, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>none,</span>
<span id="cb3-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"female"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">121</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">129</span>,</span>
<span id="cb3-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"male"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">37</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">85</span></span>
<span id="cb3-5">)</span>
<span id="cb3-6">eyewear</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["gender"],"name":[1],"type":["chr"],"align":["left"]},{"label":["contacts"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["glasses"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["none"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"female","2":"121","3":"32","4":"129"},{"1":"male","2":"42","3":"37","4":"85"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>It is a little difficult to compare since there are fewer males than females here, but we might suspect that males proportionately are more likely to wear glasses and less likely to wear contacts than females.</p>
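<p>One way to make the comparison easier is to turn each row into proportions of that row’s total. A minimal sketch (the names <code>props</code> and <code>total</code> are made up here):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r">library(tidyverse)

# Same frequencies as above
eyewear &lt;- tribble(
  ~gender, ~contacts, ~glasses, ~none,
  "female", 121, 32, 129,
  "male", 42, 37, 85
)

# Divide each frequency by the total for its row
props &lt;- eyewear %&gt;%
  mutate(total = contacts + glasses + none) %&gt;%
  mutate(across(contacts:none, ~ .x / total))
props</code></pre></div>
</div>
<p>About 23% of the males wear glasses against about 11% of the females, and about 26% of the males wear contacts against about 43% of the females, which is in line with the suspicion above.</p>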
<p>Does the data support an association at all?</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">eyewear <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>gender) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">chisq.test</span>() <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> z</span>
<span id="cb4-2">z</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>
    Pearson's Chi-squared test

data:  .
X-squared = 17.718, df = 2, p-value = 0.0001421</code></pre>
</div>
</div>
<p>There is indeed an association.</p>
<p>Coding note: normally <code>chisq.test</code> accepts as input a matrix (e.g., output from <code>table</code>), but it also accepts a data frame as long as all the columns are frequencies. So I had to remove the <code>gender</code> column first.<sup>1</sup></p>
<p>So, what kind of association? <code>chisq.test</code> has, as part of its output, <code>residuals</code>. Maybe you remember calculating these tests by hand, and have, lurking in the back of your mind somewhere, “observed minus expected, squared, divide by expected”. There is one of these for each cell, and you add them up to get the test statistic. The “Pearson residuals” in a chi-squared table are the signed square roots of these, where the sign is negative if observed is less than expected:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">eyewear</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["gender"],"name":[1],"type":["chr"],"align":["left"]},{"label":["contacts"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["glasses"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["none"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"female","2":"121","3":"32","4":"129"},{"1":"male","2":"42","3":"37","4":"85"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">z<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>residuals</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>      contacts   glasses       none
[1,]  1.766868 -1.760419 -0.5424069
[2,] -2.316898  2.308440  0.7112591</code></pre>
</div>
</div>
<p>The largest (in size) residuals make the biggest contribution to the chi-squared test statistic, so these are the ones where observed and expected are farthest apart. Hence, here, fewer males wear contacts and more males wear glasses compared to what you would expect if there were no association between gender and eyewear.</p>
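<p>The “observed minus expected” recipe can be checked by hand. This is a sketch, with the frequencies typed into a matrix called <code>obs</code> (a name made up here, as is <code>by_hand</code>); giving the matrix row and column names also means that the output is labelled by gender:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Same frequencies as above, as a matrix with row and column names
obs &lt;- matrix(
  c(121, 32, 129,
     42, 37,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("female", "male"), c("contacts", "glasses", "none"))
)
z &lt;- chisq.test(obs)

# Pearson residuals by hand: (observed - expected) / sqrt(expected)
by_hand &lt;- (obs - z$expected) / sqrt(z$expected)
by_hand

# These should match what chisq.test worked out
all.equal(by_hand, z$residuals)

# The squared residuals add up to the test statistic
sum(by_hand^2)
z$statistic</code></pre></div>
</div>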
<p>I am not quite being sexist here: the male and female frequencies are equally far away from the expected in absolute terms:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">eyewear</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["gender"],"name":[1],"type":["chr"],"align":["left"]},{"label":["contacts"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["glasses"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["none"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"female","2":"121","3":"32","4":"129"},{"1":"male","2":"42","3":"37","4":"85"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">z<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>expected</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>      contacts glasses      none
[1,] 103.06278 43.6278 135.30942
[2,]  59.93722 25.3722  78.69058</code></pre>
</div>
</div>
<p>but the contribution to the test statistic is larger for the males because there are fewer of them altogether, so their expected frequencies are smaller.</p>
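<p>To put numbers on this, here is a sketch that works out the differences and the contributions cell by cell (the matrix <code>obs</code> and the names <code>differences</code> and <code>contributions</code> are made up here):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r"><code class="sourceCode r"># Same frequencies as above, as a matrix with row and column names
obs &lt;- matrix(
  c(121, 32, 129,
     42, 37,  85),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("female", "male"), c("contacts", "glasses", "none"))
)
z &lt;- chisq.test(obs)

# Observed minus expected: same size in both rows, opposite signs
differences &lt;- obs - z$expected
differences

# Contribution of each cell to the test statistic:
# the same squared difference divided by a smaller expected frequency
contributions &lt;- differences^2 / z$expected
contributions</code></pre></div>
</div>
<p>The differences are about 17.9, 11.6 and 6.3 in size in both rows, but each male contribution comes out larger than the female one in the same column.</p>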


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>This behaviour undoubtedly comes from the days when matrices had row names which didn’t count as a column.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/understanding-chi-square/</guid>
  <pubDate>Sat, 14 Mar 2020 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/understanding-chi-square/Screenshot from 2025-12-30 22-50-59.png" medium="image" type="image/png" height="86" width="144"/>
</item>
<item>
  <title>Two header rows and other spreadsheets</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/two-header-rows/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Tidying data arranged in odd ways</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>A friend tells you that they are trying to find out which combination of detergent and temperature gets the most dirt off their laundry. They send you a spreadsheet that looks like this:</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/two-header-rows/Screenshot_2019-12-01_17-44-05.png" class="img-fluid figure-img"></p>
<figcaption>Data in spreadsheet</figcaption>
</figure>
</div>
<p>The first row is the name of the detergent (only named once), and the second row is the washing temperature. Below that is the amount of dirt removed from each of four loads of laundry washed under those conditions. (You know that your friend knows <em>something</em> about statistics and would have been careful to randomize loads of laundry to treatments.)</p>
<p>This is not going to be very helpful to you because it has <em>two</em> header rows. Fortunately <a href="https://alison.rbind.io/post/2018-02-23-read-multiple-header-rows/">Alison Hill</a> has a blog post on almost this thing, which we can steal. In hers, the first row was variable names and the second was variable descriptions (which she used to make a data dictionary). Here, though, the column names need to be made out of bits of <em>both</em> rows.</p>
</section>
<section id="making-column-names" class="level2">
<h2 class="anchored" data-anchor-id="making-column-names">Making column names</h2>
<p>The strategy is the same as Alison used (so I’m claiming very little originality here): read the header lines and make column names out of them, then read the rest of the data with the column names that we made.</p>
<p>Your friend supplied you with a <code>.csv</code> file (they do have <em>some</em> training, after all):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">my_file <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"detergent.csv"</span></span>
<span id="cb3-2">headers <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(my_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names=</span>F, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_max=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 2 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): X1, X2, X3, X4, X5, X6

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">headers</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["X1"],"name":[1],"type":["chr"],"align":["left"]},{"label":["X2"],"name":[2],"type":["chr"],"align":["left"]},{"label":["X3"],"name":[3],"type":["chr"],"align":["left"]},{"label":["X4"],"name":[4],"type":["chr"],"align":["left"]},{"label":["X5"],"name":[5],"type":["chr"],"align":["left"]},{"label":["X6"],"name":[6],"type":["chr"],"align":["left"]}],"data":[{"1":"Super","2":"NA","3":"NA","4":"Best","5":"NA","6":"NA"},{"1":"Cold","2":"Warm","3":"Hot","4":"Cold","5":"Warm","6":"Hot"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
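<p>A couple of things here: turning off <code>col_names</code> makes <code>read_csv</code> supply dummy column names (<code>X1</code> through <code>X6</code>) instead of taking them from the first row, and <code>n_max</code> limits the read to the two header rows.</p>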
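<p>If you want to follow along without the file, here is a minimal sketch that reads the same two header rows from literal text. The text in <code>csv_text</code> is my reconstruction of <code>detergent.csv</code> from the tables shown in this post, so its exact layout is an assumption; <code>I()</code> tells <code>read_csv</code> to treat the string as data rather than as a file name.</p>

```r
library(tidyverse)

# Reconstruction of detergent.csv from the tables in this post;
# the real file's exact layout is an assumption.
csv_text <- "Super,,,Best,,
Cold,Warm,Hot,Cold,Warm,Hot
4,7,10,6,13,12
5,9,12,6,15,13
6,8,11,4,12,10
5,12,9,4,12,13
"

# No column names in the file, and only the first two rows wanted
headers <- read_csv(I(csv_text), col_names = FALSE, n_max = 2,
                    show_col_types = FALSE)
headers
```

<p>The empty cells after each detergent name come through as missing values, which is what the later steps rely on.</p>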
<p>To use this, we want to construct some column names, but to do this it will be much easier if we have six rows and a few columns. For me, this is an everything-looks-like-a-nail moment, and I reach for <code>gather</code>, and then stop myself just in time to use <code>pivot_longer</code> instead. To keep things straight, I’m going to make a new column first so that I know what is what, and then use the default column names <code>name</code> and <code>value</code> in <code>pivot_longer</code> until I figure out what I’m doing:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">headers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">what=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"detergent"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"temperature"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>what)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["what"],"name":[1],"type":["chr"],"align":["left"]},{"label":["name"],"name":[2],"type":["chr"],"align":["left"]},{"label":["value"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"detergent","2":"X1","3":"Super"},{"1":"detergent","2":"X2","3":"NA"},{"1":"detergent","2":"X3","3":"NA"},{"1":"detergent","2":"X4","3":"Best"},{"1":"detergent","2":"X5","3":"NA"},{"1":"detergent","2":"X6","3":"NA"},{"1":"temperature","2":"X1","3":"Cold"},{"1":"temperature","2":"X2","3":"Warm"},{"1":"temperature","2":"X3","3":"Hot"},{"1":"temperature","2":"X4","3":"Cold"},{"1":"temperature","2":"X5","3":"Warm"},{"1":"temperature","2":"X6","3":"Hot"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>So now it looks as if I want to <code>pivot_wider</code> that column <code>what</code>, getting the values from <code>value</code>. (At this point, I have a nagging suspicion that I could do all of this with one <code>pivot_longer</code>, but anyway):<sup>1</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">headers <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">what=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"detergent"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"temperature"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>what) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_wider</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_from=</span>what, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_from=</span>value) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d1</span>
<span id="cb7-4">d1</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["detergent"],"name":[2],"type":["chr"],"align":["left"]},{"label":["temperature"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"X1","2":"Super","3":"Cold"},{"1":"X2","2":"NA","3":"Warm"},{"1":"X3","2":"NA","3":"Hot"},{"1":"X4","2":"Best","3":"Cold"},{"1":"X5","2":"NA","3":"Warm"},{"1":"X6","2":"NA","3":"Hot"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Much better. Next, I need to fill those missing values in <code>detergent</code>, and then I glue those two things together to make my column names:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">d1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fill</span>(detergent) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mycol=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(detergent, temperature, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sep=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_"</span>)) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> d2</span>
<span id="cb8-3">d2</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["name"],"name":[1],"type":["chr"],"align":["left"]},{"label":["detergent"],"name":[2],"type":["chr"],"align":["left"]},{"label":["temperature"],"name":[3],"type":["chr"],"align":["left"]},{"label":["mycol"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"X1","2":"Super","3":"Cold","4":"Super_Cold"},{"1":"X2","2":"Super","3":"Warm","4":"Super_Warm"},{"1":"X3","2":"Super","3":"Hot","4":"Super_Hot"},{"1":"X4","2":"Best","3":"Cold","4":"Best_Cold"},{"1":"X5","2":"Best","3":"Warm","4":"Best_Warm"},{"1":"X6","2":"Best","3":"Hot","4":"Best_Hot"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>and then grab my desired column names as a vector:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">d2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(mycol) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> my_col_names</span></code></pre></div></div>
</div>
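<p>The steps above can be bundled into one function. This is only a sketch: the function name <code>two_row_names</code>, its <code>sep</code> argument, and the labels <code>top</code> and <code>bottom</code> are hypothetical, and it assumes the file has exactly two header rows with blanks in the first row wherever a name is not repeated.</p>

```r
library(tidyverse)

# Hypothetical helper (not part of any package): build column names
# from two header rows, following the same steps as above.
two_row_names <- function(file, sep = "_") {
  read_csv(file, col_names = FALSE, n_max = 2,
           col_types = cols(.default = col_character())) %>%
    mutate(what = c("top", "bottom")) %>%      # label the two header rows
    pivot_longer(-what) %>%
    pivot_wider(names_from = what, values_from = value) %>%
    fill(top) %>%                              # carry names across the blanks
    mutate(mycol = str_c(top, bottom, sep = sep)) %>%
    pull(mycol)
}

# Header rows plus one data row, reconstructed from this post (an assumption)
csv_text <- "Super,,,Best,,
Cold,Warm,Hot,Cold,Warm,Hot
4,7,10,6,13,12
"

two_row_names(I(csv_text))
```

<p>Passing a file name such as <code>"detergent.csv"</code> instead of <code>I(csv_text)</code> should work the same way.</p>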
</section>
<section id="constructing-the-data-frame-with-the-rest-of-the-data" class="level2">
<h2 class="anchored" data-anchor-id="constructing-the-data-frame-with-the-rest-of-the-data">Constructing the data frame with the rest of the data</h2>
<p>Now we need to read the actual data, which means skipping the first <em>two</em> rows, and while doing so, use the column names we made as column names for the data frame (Alison’s idea again):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">laundry <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(my_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">skip=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_names=</span>my_col_names)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>Rows: 4 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (6): Super_Cold, Super_Warm, Super_Hot, Best_Cold, Best_Warm, Best_Hot

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">laundry</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Super_Cold"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["Super_Warm"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Super_Hot"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Best_Cold"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["Best_Warm"],"name":[5],"type":["dbl"],"align":["right"]},{"label":["Best_Hot"],"name":[6],"type":["dbl"],"align":["right"]}],"data":[{"1":"4","2":"7","3":"10","4":"6","5":"13","6":"12"},{"1":"5","2":"9","3":"12","4":"6","5":"15","6":"13"},{"1":"6","2":"8","3":"11","4":"4","5":"12","6":"10"},{"1":"5","2":"12","3":"9","4":"4","5":"12","6":"13"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Looking good so far.</p>
<p>We need to make this longer to do anything useful with it. Each column name encodes two things: a detergent name and a temperature, and this can be made longer in one shot by using <em>two</em> things in <code>names_to</code> in <code>pivot_longer</code>. This means I also have to say what those two names are separated by (which I forgot the first time, but the error message was helpful):</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">laundry <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>(), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"detergent"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"temperature"</span>), </span>
<span id="cb13-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_sep=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"_"</span>, </span>
<span id="cb13-4">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"dirt_removed"</span>) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> laundry_tidy</span>
<span id="cb13-5">laundry_tidy</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["detergent"],"name":[1],"type":["chr"],"align":["left"]},{"label":["temperature"],"name":[2],"type":["chr"],"align":["left"]},{"label":["dirt_removed"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"Super","2":"Cold","3":"4"},{"1":"Super","2":"Warm","3":"7"},{"1":"Super","2":"Hot","3":"10"},{"1":"Best","2":"Cold","3":"6"},{"1":"Best","2":"Warm","3":"13"},{"1":"Best","2":"Hot","3":"12"},{"1":"Super","2":"Cold","3":"5"},{"1":"Super","2":"Warm","3":"9"},{"1":"Super","2":"Hot","3":"12"},{"1":"Best","2":"Cold","3":"6"},{"1":"Best","2":"Warm","3":"15"},{"1":"Best","2":"Hot","3":"13"},{"1":"Super","2":"Cold","3":"6"},{"1":"Super","2":"Warm","3":"8"},{"1":"Super","2":"Hot","3":"11"},{"1":"Best","2":"Cold","3":"4"},{"1":"Best","2":"Warm","3":"12"},{"1":"Best","2":"Hot","3":"10"},{"1":"Super","2":"Cold","3":"5"},{"1":"Super","2":"Warm","3":"12"},{"1":"Super","2":"Hot","3":"9"},{"1":"Best","2":"Cold","3":"4"},{"1":"Best","2":"Warm","3":"12"},{"1":"Best","2":"Hot","3":"13"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>Success.</p>
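<p>One caveat: <code>names_sep="_"</code> works here because each column name contains exactly one underscore. If a detergent name had an underscore of its own, <code>names_pattern</code> with a regular expression is one way around it. A minimal sketch, using a made-up data frame <code>wide</code> with a hypothetical detergent called <code>Extra_Strong</code>:</p>

```r
library(tidyverse)

# Made-up data: one detergent name itself contains an underscore
wide <- tibble(Super_Cold = c(4, 5), Extra_Strong_Cold = c(6, 6))

wide %>%
  pivot_longer(everything(),
               names_to = c("detergent", "temperature"),
               # the first group is greedy, so the split is at the last underscore
               names_pattern = "(.*)_(.*)",
               values_to = "dirt_removed") -> long
long
```

<p>The two sets of brackets in the pattern line up with the two entries in <code>names_to</code>.</p>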
</section>
<section id="a-plot" class="level2">
<h2 class="anchored" data-anchor-id="a-plot">A plot</h2>
<p>There are four observations per combination of detergent and temperature, so the devotees of ANOVA among you will know that we can test for a significant interaction effect between detergent and temperature on the amount of dirt removed. (That is to say, the effect of temperature on dirt removed might be different for each detergent, and we have enough data to see whether that is indeed the case “for all laundry loads”.)</p>
<p>To see whether this is likely, we can make an <em>interaction plot</em>: plot the mean dirt removed for each temperature, separately for each detergent, and then join the means across temperatures with lines, one line per detergent (coloured by detergent). This can be done by first making a data frame of means using <code>group_by</code> and <code>summarize</code>, or like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(laundry_tidy, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_inorder</span>(temperature), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y=</span>dirt_removed, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">colour=</span>detergent, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group=</span>detergent)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_summary</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun=</span>mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">geom=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"point"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_summary</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun=</span>mean, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">geom=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"line"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://blog.ritsokiguess.site/posts/two-header-rows/index_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid figure-img" width="672"></p>
</figure>
</div>
</div>
</div>
<p>Code-wise, the last two lines are a kind of funky <code>geom_point</code> and <code>geom_line</code>, except that instead of plotting the actual amounts of dirt removed, we plot the <em>mean</em> dirt removed each time. (The <code>fct_inorder</code> plots the temperatures in the sensible order that they appear in the data, rather than alphabetical order.)</p>
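<p>For comparison, here is a sketch of the other route, making the data frame of means first with <code>group_by</code> and <code>summarize</code>. To keep it self-contained, it rebuilds <code>laundry_tidy</code> from the values shown above; the names <code>means</code>, <code>mean_dirt</code> and <code>interaction_plot</code> are made up for this illustration.</p>

```r
library(tidyverse)

# Rebuild the tidy data from the values shown in this post
laundry_tidy <- tibble(
  detergent = rep(c("Super", "Best"), each = 12),
  temperature = rep(rep(c("Cold", "Warm", "Hot"), each = 4), times = 2),
  dirt_removed = c(4, 5, 6, 5,    7, 9, 8, 12,    10, 12, 11, 9,
                   6, 6, 4, 4,    13, 15, 12, 12, 12, 13, 10, 13)
)

laundry_tidy %>%
  # fix the temperature order before summarizing, because summarize()
  # sorts a text grouping column alphabetically
  mutate(temperature = fct_inorder(temperature)) %>%
  group_by(detergent, temperature) %>%
  summarize(mean_dirt = mean(dirt_removed), .groups = "drop") -> means
means

ggplot(means, aes(x = temperature, y = mean_dirt,
                  colour = detergent, group = detergent)) +
  geom_point() +
  geom_line() -> interaction_plot
interaction_plot
```

<p>The <code>fct_inorder</code> has to happen before the <code>summarize</code> here: afterwards the rows are in alphabetical order of temperature, so the order of appearance would no longer be Cold, Warm, Hot.</p>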
<p>Statistically, if the two traces are more or less parallel, there is no interaction: the effects of detergent and temperature on the amount of dirt removed simply add together. But that is not the case here: a warm temperature is the best for Best detergent, while a hot temperature is best for Super detergent.<sup>2</sup></p>
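<p>The formal test of that interaction is a two-way ANOVA. A minimal sketch, again rebuilding <code>laundry_tidy</code> from the values in this post so that it runs on its own (the name <code>laundry_aov</code> is made up here); in the model formula, <code>detergent * temperature</code> asks for both main effects and their interaction.</p>

```r
library(tidyverse)

# Rebuild the tidy data from the values shown in this post
laundry_tidy <- tibble(
  detergent = rep(c("Super", "Best"), each = 12),
  temperature = rep(rep(c("Cold", "Warm", "Hot"), each = 4), times = 2),
  dirt_removed = c(4, 5, 6, 5,    7, 9, 8, 12,    10, 12, 11, 9,
                   6, 6, 4, 4,    13, 15, 12, 12, 12, 13, 10, 13)
)

# Two-way ANOVA with interaction
laundry_aov <- aov(dirt_removed ~ detergent * temperature, data = laundry_tidy)
summary(laundry_aov)
```

<p>The line of the output to look at first is the one labelled <code>detergent:temperature</code>.</p>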
</section>
<section id="as-in-actual-website" class="level2">
<h2 class="anchored" data-anchor-id="as-in-actual-website">As in actual website</h2>
<p>So I lied to you (for the purpose of telling a story, but I hope a useful one).</p>
<p>Here’s how the data <a href="http://statweb.stanford.edu/~susan/courses/s141/exanova.pdf">were actually laid out</a>:</p>
<pre><code>Detergent    Cold         Warm          Hot
Super     4,5,6,5     7,9,8,12   10,12,11,9
Best      6,6,4,4  13,15,12,12  12,13,10,13</code></pre>
<p>Let’s see whether we can tell a different story by getting these data tidy. (I added the word Detergent to the top left cell to make our lives slightly easier.)</p>
<p>First, this is column-aligned data, so we need <code>read_table</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">my_file<span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"laundry.txt"</span></span>
<span id="cb17-2">laundry_2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_table</span>(my_file, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col_types=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cols</span>(</span>
<span id="cb17-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Cold=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb17-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Warm=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>(),</span>
<span id="cb17-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">Hot=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">col_character</span>()</span>
<span id="cb17-6">))</span>
<span id="cb17-7">laundry_2</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Detergent"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Cold"],"name":[2],"type":["chr"],"align":["left"]},{"label":["Warm"],"name":[3],"type":["chr"],"align":["left"]},{"label":["Hot"],"name":[4],"type":["chr"],"align":["left"]}],"data":[{"1":"Super","2":"4,5,6,5","3":"7,9,8,12","4":"10,12,11,9"},{"1":"Best","2":"6,6,4,4","3":"13,15,12,12","4":"12,13,10,13"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>My first go at this turned out to treat the comma as a thousands separator (which was then dropped), so the top left cell got read as the number 4565. This use of <code>col_types</code> forces the columns to be text, so they get left alone.</p>
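<p>If there were many columns, naming each one in <code>cols()</code> would get tedious. A sketch of a shorter alternative, assuming it is acceptable for <em>every</em> column to be read as text: <code>.default</code> sets the type for any column not otherwise mentioned. The temporary file below is my reconstruction of <code>laundry.txt</code> from the listing above.</p>

```r
library(tidyverse)

# Write the three lines shown above to a temporary file,
# standing in for laundry.txt (an assumption)
tmp <- tempfile(fileext = ".txt")
write_lines(c(
  "Detergent    Cold         Warm          Hot",
  "Super     4,5,6,5     7,9,8,12   10,12,11,9",
  "Best      6,6,4,4  13,15,12,12  12,13,10,13"
), tmp)

# Read every column as text, so the commas are left alone
laundry_2 <- read_table(tmp, col_types = cols(.default = col_character()))
laundry_2
```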
<p>So now, a standard <code>pivot_longer</code> to begin:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">laundry_2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Detergent, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Temperature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dirt_removed"</span>)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Detergent"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Temperature"],"name":[2],"type":["chr"],"align":["left"]},{"label":["Dirt_removed"],"name":[3],"type":["chr"],"align":["left"]}],"data":[{"1":"Super","2":"Cold","3":"4,5,6,5"},{"1":"Super","2":"Warm","3":"7,9,8,12"},{"1":"Super","2":"Hot","3":"10,12,11,9"},{"1":"Best","2":"Cold","3":"6,6,4,4"},{"1":"Best","2":"Warm","3":"13,15,12,12"},{"1":"Best","2":"Hot","3":"12,13,10,13"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>We have several values for dirt removed, separated by commas. We could use <code>separate</code> to create four new columns and pivot <em>them</em> longer as well. But there is a better way:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1">laundry_2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Detergent, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Temperature"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Dirt_removed"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb19-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">separate_rows</span>(Dirt_removed, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">convert=</span>T) </span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Detergent"],"name":[1],"type":["chr"],"align":["left"]},{"label":["Temperature"],"name":[2],"type":["chr"],"align":["left"]},{"label":["Dirt_removed"],"name":[3],"type":["int"],"align":["right"]}],"data":[{"1":"Super","2":"Cold","3":"4"},{"1":"Super","2":"Cold","3":"5"},{"1":"Super","2":"Cold","3":"6"},{"1":"Super","2":"Cold","3":"5"},{"1":"Super","2":"Warm","3":"7"},{"1":"Super","2":"Warm","3":"9"},{"1":"Super","2":"Warm","3":"8"},{"1":"Super","2":"Warm","3":"12"},{"1":"Super","2":"Hot","3":"10"},{"1":"Super","2":"Hot","3":"12"},{"1":"Super","2":"Hot","3":"11"},{"1":"Super","2":"Hot","3":"9"},{"1":"Best","2":"Cold","3":"6"},{"1":"Best","2":"Cold","3":"6"},{"1":"Best","2":"Cold","3":"4"},{"1":"Best","2":"Cold","3":"4"},{"1":"Best","2":"Warm","3":"13"},{"1":"Best","2":"Warm","3":"15"},{"1":"Best","2":"Warm","3":"12"},{"1":"Best","2":"Warm","3":"12"},{"1":"Best","2":"Hot","3":"12"},{"1":"Best","2":"Hot","3":"13"},{"1":"Best","2":"Hot","3":"10"},{"1":"Best","2":"Hot","3":"13"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This brings us back to where we were before. A couple of notes about <code>separate_rows</code>:</p>
<ul>
<li>it puts each separated value on a new row, and so is a combined <code>separate</code> and <code>pivot_longer</code>.</li>
<li>the default separator between values is any run of non-alphanumeric characters other than a dot. That includes a comma, so we don’t have to do anything special.</li>
<li><code>convert=T</code> says to turn the separated values into whatever they look like (here numbers).</li>
</ul>
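<p>To see those three points in isolation, here is a minimal sketch on a made-up data frame. The name <code>packed</code> and its values are assumptions for illustration only, not part of the detergent data:</p>

```r
library(tidyr)

# Hypothetical toy data: several values packed into one comma-separated cell
packed <- data.frame(
  group = c("a", "b"),
  values = c("4,5,6", "7,9")
)

# Each separated value lands on its own row; the comma is matched by the
# default separator, and convert = TRUE turns the text pieces into numbers
unpacked <- separate_rows(packed, values, convert = TRUE)
unpacked
```

<p>Without <code>convert = TRUE</code>, the <code>values</code> column would stay as text, which is rarely what you want before a plot or an ANOVA.</p>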
<p>From here, we can proceed as before with plots, ANOVA or whatever.</p>
</section>
<section id="references" class="level2">
<h2 class="anchored" data-anchor-id="references">References</h2>
<ul>
<li><a href="http://statweb.stanford.edu/~susan/courses/s141/exanova.pdf">Data from here</a></li>
<li><a href="https://alison.rbind.io/post/2018-02-23-read-multiple-header-rows/">Alison Hill’s blog post</a></li>
<li><a href="https://readr.tidyverse.org/articles/readr.html">Introduction to <code>readr</code></a></li>
<li><a href="https://www.rdocumentation.org/packages/tidyr/versions/0.8.3/topics/separate_rows">Documentation for <code>separate_rows</code></a></li>
</ul>


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p>I don’t actually think I can here. I was thinking of <code>.value</code>, but that is used when the names of the columns that I’m making longer contain the names of new columns in them.↩︎</p></li>
<li id="fn2"><p>There are always two ways to express an interaction effect. The other one here is that the two detergents are pretty similar except at warm water temperatures, where Best is a lot better.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>tidying</category>
  <guid>https://blog.ritsokiguess.site/posts/two-header-rows/</guid>
  <pubDate>Sun, 01 Dec 2019 05:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/two-header-rows/tips-for-easier-laundromat-trips-2145703-hero-7284dc0c51e647dab28aae46fdc6428e.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Un-counting</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/un-counting/</link>
  <description><![CDATA[ 





<section id="description" class="level2">
<h2 class="anchored" data-anchor-id="description">Description</h2>
<p>Why you would want to do the opposite of counting.</p>
</section>
<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>You probably know about <code>count</code>, which tells you how many observations you have in each group:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tribble</span>(</span>
<span id="cb3-2">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>g, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>y,</span>
<span id="cb3-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>,</span>
<span id="cb3-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>,</span>
<span id="cb3-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>, </span>
<span id="cb3-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>,</span>
<span id="cb3-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>,</span>
<span id="cb3-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>,</span>
<span id="cb3-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span></span>
<span id="cb3-10">)</span></code></pre></div></div>
</div>
<p>There are four observations in group <code>a</code> and three in group <code>b</code>:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(g) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> counts</span>
<span id="cb4-2">counts</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["g"],"name":[1],"type":["chr"],"align":["left"]},{"label":["n"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"4"},{"1":"b","2":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>I didn’t know about this until fairly recently. Until then, I thought you had to do this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(g) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">count=</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">n</span>()) </span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["g"],"name":[1],"type":["chr"],"align":["left"]},{"label":["count"],"name":[2],"type":["int"],"align":["right"]}],"data":[{"1":"a","2":"4"},{"1":"b","2":"3"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>which works, but is a lot more typing.</p>
</section>
<section id="going-the-other-way" class="level2">
<h2 class="anchored" data-anchor-id="going-the-other-way">Going the other way</h2>
<p>The other day, I had the opposite problem. I had a table of frequencies, and I wanted to get it back to one row per observation (I was fitting a model in Stan, and I didn’t know how to deal with frequencies). I had no idea how you might do that (without something ugly like loops), and I was almost embarrassed to stumble upon this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">counts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uncount</span>(n)</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["g"],"name":[1],"type":["chr"],"align":["left"]}],"data":[{"1":"a"},{"1":"a"},{"1":"a"},{"1":"a"},{"1":"b"},{"1":"b"},{"1":"b"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>My situation was a bit less trivial than that. I had disease category counts of coal miners with different exposures to coal dust:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">my_url<span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://www.utsc.utoronto.ca/~butler/d29/miners-tab.txt"</span></span>
<span id="cb7-2">miners0 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_table</span>(my_url)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>
── Column specification ────────────────────────────────────────────────────────
cols(
  Exposure = col_double(),
  None = col_double(),
  Moderate = col_double(),
  Severe = col_double()
)</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">miners0</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Exposure"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["None"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["Moderate"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["Severe"],"name":[4],"type":["dbl"],"align":["right"]}],"data":[{"1":"5.8","2":"98","3":"0","4":"0"},{"1":"15.0","2":"51","3":"2","4":"1"},{"1":"21.5","2":"34","3":"6","4":"3"},{"1":"27.5","2":"35","3":"5","4":"8"},{"1":"33.5","2":"32","3":"10","4":"9"},{"1":"39.5","2":"23","3":"7","4":"8"},{"1":"46.0","2":"12","3":"6","4":"10"},{"1":"51.5","2":"4","3":"2","4":"5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>This needs tidying to get the frequencies all into one column:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">miners0 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gather</span>(disease, freq, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>Exposure) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">-&gt;</span> miners</span>
<span id="cb10-3">miners</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Exposure"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["disease"],"name":[2],"type":["chr"],"align":["left"]},{"label":["freq"],"name":[3],"type":["dbl"],"align":["right"]}],"data":[{"1":"5.8","2":"None","3":"98"},{"1":"15.0","2":"None","3":"51"},{"1":"21.5","2":"None","3":"34"},{"1":"27.5","2":"None","3":"35"},{"1":"33.5","2":"None","3":"32"},{"1":"39.5","2":"None","3":"23"},{"1":"46.0","2":"None","3":"12"},{"1":"51.5","2":"None","3":"4"},{"1":"5.8","2":"Moderate","3":"0"},{"1":"15.0","2":"Moderate","3":"2"},{"1":"21.5","2":"Moderate","3":"6"},{"1":"27.5","2":"Moderate","3":"5"},{"1":"33.5","2":"Moderate","3":"10"},{"1":"39.5","2":"Moderate","3":"7"},{"1":"46.0","2":"Moderate","3":"6"},{"1":"51.5","2":"Moderate","3":"2"},{"1":"5.8","2":"Severe","3":"0"},{"1":"15.0","2":"Severe","3":"1"},{"1":"21.5","2":"Severe","3":"3"},{"1":"27.5","2":"Severe","3":"8"},{"1":"33.5","2":"Severe","3":"9"},{"1":"39.5","2":"Severe","3":"8"},{"1":"46.0","2":"Severe","3":"10"},{"1":"51.5","2":"Severe","3":"5"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>So I wanted to fit an ordered logistic regression in Stan, predicting disease category from exposure, but I didn’t know how to handle the frequencies. If I had one row per miner, I thought…</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">miners <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">uncount</span>(freq) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> rmarkdown<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paged_table</span>()</span></code></pre></div></div>
<div class="cell-output-display">
<div data-pagedtable="false">
  <script data-pagedtable-source="" type="application/json">
{"columns":[{"label":["Exposure"],"name":[1],"type":["dbl"],"align":["right"]},{"label":["disease"],"name":[2],"type":["chr"],"align":["left"]}],"data":[{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5
.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"5.8","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"15.0","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.
5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"21.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"27.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"
None"},{"1":"33.5","2":"None"},{"1":"33.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"39.5","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"46.0","2":"None"},{"1":"51.5","2":"None"},{"1":"51.5","2":"None"},{"1":"51.5","2":"None"},{"1":"51.5","2":"None"},{"1":"15.0","2":"Moderate"},{"1":"15.0","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"21.5","2":"Moderate"},{"1":"27.5","2":"Moderate"},{"1":"27.5","2":"Moderate"},{"1":"27.5","2":"Moderate"},{"1":"27.5","2":"Moderate"},{"1":"27.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"33.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"39.5","2":"Moderate"},{"1":"46.0","2":"Moderate"},{"1":"46.0","2":"Moderate"},{"1":"46.0","2":"Moderate"},{"1":"46.0","2":"Moderate"},{"1":"46.0","2":"Moderate"},{"1":"46.0","2":"Moderate"},{
"1":"51.5","2":"Moderate"},{"1":"51.5","2":"Moderate"},{"1":"15.0","2":"Severe"},{"1":"21.5","2":"Severe"},{"1":"21.5","2":"Severe"},{"1":"21.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"27.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"33.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"39.5","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"46.0","2":"Severe"},{"1":"51.5","2":"Severe"},{"1":"51.5","2":"Severe"},{"1":"51.5","2":"Severe"},{"1":"51.5","2":"Severe"},{"1":"51.5","2":"Severe"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}}
  </script>
</div>
</div>
</div>
<p>… and so I do. (I scrolled down to check, and <em>eventually</em> got past the 98 miners with 5.8 years of exposure and no disease).</p>
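<p>If scrolling is not your idea of a check, a rough sketch of a programmatic one follows. It retypes the frequency table shown earlier so that it runs without the download, and it uses <code>pivot_longer</code> (the newer replacement for <code>gather</code>) for the tidying. The names <code>one_per_miner</code> and <code>recounted</code> are mine, chosen for illustration:</p>

```r
library(tidyverse)

# The frequency table from above, typed in by hand
miners0 <- tribble(
  ~Exposure, ~None, ~Moderate, ~Severe,
   5.8, 98,  0,  0,
  15.0, 51,  2,  1,
  21.5, 34,  6,  3,
  27.5, 35,  5,  8,
  33.5, 32, 10,  9,
  39.5, 23,  7,  8,
  46.0, 12,  6, 10,
  51.5,  4,  2,  5
)

# Tidy so that all the frequencies are in one column
miners <- miners0 %>%
  pivot_longer(-Exposure, names_to = "disease", values_to = "freq")

# One row per miner
one_per_miner <- miners %>% uncount(freq)

# The number of rows should equal the total of the frequencies
nrow(one_per_miner) == sum(miners$freq)

# Counting again should give back the non-zero frequencies
recounted <- one_per_miner %>% count(Exposure, disease, name = "freq")
recounted
```

<p>Combinations with a frequency of zero produce no rows when uncounted, so they do not reappear when you count again.</p>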
<p>From there, you can use <a href="https://mc-stan.org/docs/2_19/stan-users-guide/ordered-logistic-section.html">the ordered logistic section of the Stan User’s Guide</a> to fit the model, though I would rather have weakly informative priors for their <code>beta</code> and <code>c</code>. <code>c</code> is tricky, since it is ordered, but I used the idea in <a href="https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations">Stan’s prior choice recommendations</a> (near the bottom) and it worked smoothly.</p>


</section>

 ]]></description>
  <category>code</category>
  <guid>https://blog.ritsokiguess.site/posts/un-counting/</guid>
  <pubDate>Sat, 13 Jul 2019 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/un-counting/scrshot.png" medium="image" type="image/png" height="139" width="120"/>
</item>
<item>
  <title>Changing a lot of things in a lot of places</title>
  <dc:creator>Ken Butler</dc:creator>
  <link>https://blog.ritsokiguess.site/posts/changing-a-lot/</link>
  <description><![CDATA[ 





<section id="packages" class="level2">
<h2 class="anchored" data-anchor-id="packages">Packages</h2>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span></code></pre></div></div>
<div class="cell-output cell-output-stderr">
<pre><code>── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (&lt;http://conflicted.r-lib.org/&gt;) to force all conflicts to become errors</code></pre>
</div>
</div>
</section>
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>Making a lot of changes in text, all in one go.</p>
<p>Let’s suppose you have a data frame like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">d</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 3
  x1       x2    y     
  &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt; 
1 one      two   two   
2 four     three four  
3 seven    nine  eight 
4 six      eight seven 
5 fourteen nine  twelve</code></pre>
</div>
</div>
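<p>The code that made <code>d</code> is not shown above. If you want to follow along, one way to build a data frame with the same contents (a sketch, not necessarily how the original was created) is:</p>

```r
library(tidyverse)

# Rebuild the data frame displayed above, one row per line
d <- tribble(
  ~x1,        ~x2,     ~y,
  "one",      "two",   "two",
  "four",     "three", "four",
  "seven",    "nine",  "eight",
  "six",      "eight", "seven",
  "fourteen", "nine",  "twelve"
)
d
```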
<p>What you want to do is to change all the even numbers in columns <code>x1</code> and <code>x2</code>, but <em>not</em> <code>y</code>, to the number versions of themselves, so that, for example, <code>eight</code> becomes <code>8</code>. This would seem to be a job for <code>str_replace_all</code>, but how to manage the multitude of changes?</p>
</section>
<section id="making-a-lot-of-changes-with-str_replace_all" class="level2">
<h2 class="anchored" data-anchor-id="making-a-lot-of-changes-with-str_replace_all">Making a lot of changes with <code>str_replace_all</code></h2>
<p>I learned today that you can feed <code>str_replace_all</code> a <em>named vector</em>. Wossat, you say? Well, one of these:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">quantile</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>  0%  25%  50%  75% 100% 
 1.0  2.5  4.0  5.5  7.0 </code></pre>
</div>
</div>
<p>The numbers here are the five-number summary; the labels next to them, which say which percentile each one is, are the <code>names</code> attribute. You can make one of these yourself like this:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb7-2">x</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1 2 3</code></pre>
</div>
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(x) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"first"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"second"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"third"</span>)</span>
<span id="cb9-2">x</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code> first second  third 
     1      2      3 </code></pre>
</div>
</div>
<p>The value of this for us is that you can hand the whole boatload of potential changes to <code>str_replace_all</code> at once, as a named vector of the changes it might make.</p>
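<p>As a small illustration of the idea, here is a sketch with only two changes. The names <code>small_changes</code> and <code>words</code> are made up for this example. Each name in the vector is the text to look for, and the value attached to it is what goes in its place:</p>

```r
library(stringr)

# A named vector built in one step: names are the old text, values the new
small_changes <- c("two" = "2", "eight" = "8")

# Some text to try it on; "seven" matches neither name
words <- c("two", "seven", "eight")

replaced <- str_replace_all(words, small_changes)
replaced
```

<p>Anything that matches none of the names is left alone, which is what lets one call handle a whole column of mixed values.</p>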
<p>In our example, we wanted to replace the even numbers by the numeric versions of themselves, so let’s make a little data frame with all of those:</p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">changes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tribble</span>(</span>
<span id="cb11-2">  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>from, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>to,</span>
<span id="cb11-3">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"two"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2"</span>,</span>
<span id="cb11-4">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"four"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"4"</span>,</span>
<span id="cb11-5">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"six"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"6"</span>,</span>
<span id="cb11-6">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"eight"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"8"</span>,</span>
<span id="cb11-7">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ten"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"10"</span>,</span>
<span id="cb11-8">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"twelve"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"12"</span>,</span>
<span id="cb11-9">  <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"fourteen"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"14"</span></span>
<span id="cb11-10">)</span></code></pre></div></div>
</div>
<p>I think this is as high as we need to go. I like a <code>tribble</code> for this so that you can easily see what is going to replace what.</p>
<p>For the named vector, the <em>values</em> are the new values (the ones I called <code>to</code> in <code>changes</code>), while the <em>names</em> are the old ones (<code>from</code>). So let’s construct that. There is one extra thing: I want to replace whole words only (and not end up with something like <code>4teen</code>, which sounds like one of those 90s boy bands), so what I’ll do is to put “word boundaries”<sup>1</sup> around the <code>from</code> values:<sup>2</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1">my_changes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> changes<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>to</span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(my_changes) <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">b"</span>, changes<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>from, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">b"</span>)</span>
<span id="cb12-3">my_changes</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>     \\btwo\\b     \\bfour\\b      \\bsix\\b    \\beight\\b      \\bten\\b 
           "2"            "4"            "6"            "8"           "10" 
  \\btwelve\\b \\bfourteen\\b 
          "12"           "14" </code></pre>
</div>
</div>
<p>and that seems to reflect the changes we want to make. So let’s make it go, just on columns <code>x1</code> and <code>x2</code>:<sup>3</sup></p>
<div class="cell">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">d <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate_at</span>(</span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"x"</span>)),</span>
<span id="cb14-3">       <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace_all</span>(., my_changes)</span>
<span id="cb14-4">  )</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 5 × 3
  x1    x2    y     
  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; 
1 one   2     two   
2 4     three four  
3 seven nine  eight 
4 6     8     seven 
5 14    nine  twelve</code></pre>
</div>
</div>
<p>In words: “for each of the columns whose name starts with <code>x</code>, make every replacement that’s in the recipe in <code>my_changes</code>.”</p>
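<p>With a current <code>dplyr</code>, the same step can probably be written with <code>across</code> instead of <code>mutate_at</code>. What follows is a sketch under some assumptions rather than the code that was run for this post: <code>d</code> is rebuilt by working backwards from the printed result above so that the chunk runs by itself, <code>my_changes</code> is typed out directly, and the name <code>result</code> is only there for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyverse)

# assumption: d reconstructed from the printed output, not copied from the original data
d &lt;- tribble(
  ~x1,        ~x2,     ~y,
  "one",      "two",   "two",
  "four",     "three", "four",
  "seven",    "nine",  "eight",
  "six",      "eight", "seven",
  "fourteen", "nine",  "twelve"
)

# the same named vector as before: names are regex patterns, values are replacements
my_changes &lt;- c(
  "\\btwo\\b" = "2", "\\bfour\\b" = "4", "\\bsix\\b" = "6",
  "\\beight\\b" = "8", "\\bten\\b" = "10", "\\btwelve\\b" = "12",
  "\\bfourteen\\b" = "14"
)

# across() picks the columns; the anonymous function is applied to each of them
result &lt;- d %&gt;%
  mutate(across(starts_with("x"), \(x) str_replace_all(x, my_changes)))
result</code></pre></div>
</div>
<p>This should give the same data frame as the <code>mutate_at</code> version, with <code>y</code> left alone because its name does not start with <code>x</code>.</p>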
<p>It seems to have worked, and not a 90s boy band in sight.</p>
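<p>To see what the word boundaries are protecting against, here is a small sketch on a single made-up string (not a column of <code>d</code>), first with a bare pattern and then with <code>\\b</code> on each side:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# without word boundaries, "four" also matches inside "fourteen"
str_replace_all("fourteen", c("four" = "4"))

# with word boundaries, "four" only matches as a whole word, so nothing changes
str_replace_all("fourteen", c("\\bfour\\b" = "4"))</code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "4teen"
[1] "fourteen"</code></pre>
</div>
</div>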


</section>


<div id="quarto-appendix" class="default"><section id="footnotes" class="footnotes footnotes-end-of-document"><h2 class="anchored quarto-appendix-heading">Footnotes</h2>

<ol>
<li id="fn1"><p><a href="https://stackoverflow.com/questions/24085680/why-do-backslashes-appear-twice">This Stack Overflow answer</a> explains why the backslashes need to be doubled. The answer is for Python, but the same issue applies to R.↩︎</p></li>
<li id="fn2"><p>This means that the number names only match as whole words: each one has to be bordered on both sides by a non-word character (such as a space or punctuation) or by the beginning or end of the text.↩︎</p></li>
<li id="fn3"><p>The modern way to do this is to use <code>across</code>, but I wrote this post in 2019, and this is all we had then.↩︎</p></li>
</ol>
</section></div> ]]></description>
  <category>code</category>
  <category>analysis</category>
  <guid>https://blog.ritsokiguess.site/posts/changing-a-lot/</guid>
  <pubDate>Sun, 12 May 2019 04:00:00 GMT</pubDate>
  <media:content url="https://blog.ritsokiguess.site/posts/changing-a-lot/stringr-logo.png" medium="image" type="image/png" height="166" width="144"/>
</item>
</channel>
</rss>
