Chapter 6.2.3.1 Exercises

In [1]:
# We introduce one-way analysis of variance (ANOVA) through the analysis of a motivating example.
# Consider the red.cell.folate data in ISwR. We have data on folate levels of patients under three different treatments. Suppose we want to know whether treatment has a significant effect on the folate level of the patient. How would we proceed? First, we would want to plot the data. A boxplot is a good idea.

library(ISwR)
library(ggplot2)
red.cell.folate <- ISwR::red.cell.folate; red.cell.folate
ggplot(red.cell.folate, aes(x = ventilation, y = folate)) + 
  geom_boxplot()

# Looking at this data, it is not clear whether there is a difference. It seems that the folate level under treatment 1 is perhaps higher. We could try to compare the means of each pair of ventilations using t.test(). The problem with that is we would be doing three tests, and we would need to adjust our p-values accordingly. We are going to take a different approach, which is ANOVA.
folate  ventilation
   243  N2O+O2,24h
   251  N2O+O2,24h
   275  N2O+O2,24h
   291  N2O+O2,24h
   347  N2O+O2,24h
   354  N2O+O2,24h
   380  N2O+O2,24h
   392  N2O+O2,24h
   206  N2O+O2,op
   210  N2O+O2,op
   226  N2O+O2,op
   249  N2O+O2,op
   255  N2O+O2,op
   273  N2O+O2,op
   285  N2O+O2,op
   295  N2O+O2,op
   309  N2O+O2,op
   241  O2,24h
   258  O2,24h
   270  O2,24h
   293  O2,24h
   328  O2,24h
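Before moving on, it is worth quantifying the multiple-testing problem just mentioned. A rough sketch: if the three tests were independent (the pairwise t tests are not, so this is only an approximation), each run at the 0.05 level, the chance of at least one false positive is already close to three times the nominal level.

```r
# Rough illustration of the multiple-testing problem: with three tests each
# at the 0.05 level (treated as independent, which pairwise t tests are not),
# the chance of at least one false positive is already about 0.14.
alpha <- 0.05
fwer <- 1 - (1 - alpha)^3
fwer   # about 0.14
```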
In [6]:
library(dplyr)

red.cell.folate %>% group_by(ventilation) %>% summarize(mean(folate))
mean(red.cell.folate$folate)
ventilation  mean(folate)
N2O+O2,24h       316.6250
N2O+O2,op        256.4444
O2,24h           278.0000

283.227272727273
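Note that the overall mean is not the simple average of the three group means, because the groups have different sizes. A quick check, using the group means printed above and the group sizes (8, 9, and 5) read off from the data listing:

```r
# The overall mean is the size-weighted average of the group means, not
# their simple average, because the groups have unequal sizes.
m <- c(316.6250, 256.4444, 278.0000)   # group means from the table above
n <- c(8, 9, 5)                        # group sizes in red.cell.folate
weighted.mean(m, n)                    # ~283.23, matching mean(folate)
```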
In [24]:
# Returning to red.cell.folate, we use R to do all of the computations for us. First, build a linear model for folate as explained by ventilation, and then apply the anova function to the model. anova produces a table which contains all of the computations from the discussion above.

rcf.mod <- lm(folate~ventilation, data = red.cell.folate)
anova(rcf.mod)
# The two rows ventilation and Residuals correspond to the between group and within group variation.
# The first column, Df, gives the degrees of freedom in each case. Since k=3, the between group variation has k−1=2 degrees of freedom, and since N=22, the within group variation (Residuals) has N−k=19 degrees of freedom.
# The Sum Sq column gives SSDB and SSDW.
# The Mean Sq variable is the Sum Sq value divided by the degrees of freedom. These two numbers are the numerator and denominator of the test statistic, F. So here, F=7757.9/2090.3=3.7113.
# To compute the p-value, we need the area under the tail of the F distribution above F=3.7113. Note that this is a one-tailed test, because small values of F would actually be evidence in favor of H0.
             Df   Sum Sq  Mean Sq  F value     Pr(>F)
ventilation   2 15515.77 7757.883 3.711336 0.04358933
Residuals    19 39716.10 2090.321       NA         NA
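To see where the numbers in an ANOVA table come from, here is a sketch that computes the between and within sums of squares by hand on simulated data (synthetic values, not the folate measurements) and checks the resulting F statistic against R's anova() output:

```r
# Synthetic sketch of the ANOVA decomposition: compute SSD_B and SSD_W by
# hand and check the F statistic against R's anova() table.
set.seed(1)
y <- c(rnorm(5, mean = 10), rnorm(5, mean = 12), rnorm(5, mean = 11))
g <- factor(rep(1:3, each = 5))
grand <- mean(y)                             # grand mean
gmeans <- tapply(y, g, mean)                 # group means
ssd_b <- sum(table(g) * (gmeans - grand)^2)  # between-group sum of squares
ssd_w <- sum((y - gmeans[g])^2)              # within-group sum of squares
F_by_hand <- (ssd_b / (3 - 1)) / (ssd_w / (15 - 3))
tab <- anova(lm(y ~ g))
all.equal(F_by_hand, tab$`F value`[1])       # TRUE
```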
In [25]:
rcf.mod
library(moderndive)
get_regression_table(rcf.mod)
Call:
lm(formula = folate ~ ventilation, data = red.cell.folate)

Coefficients:
         (Intercept)  ventilationN2O+O2,op     ventilationO2,24h  
              316.63                -60.18                -38.63  
term                  estimate std_error statistic p_value  lower_ci upper_ci
intercept              316.625    16.164    19.588   0.000   282.792  350.458
ventilationN2O+O2,op   -60.181    22.216    -2.709   0.014  -106.679  -13.682
ventilationO2,24h      -38.625    26.064    -1.482   0.155   -93.178   15.928
In [10]:
# The ANOVA table gives the p-value as Pr(>F), here P=0.04359. We could compute this from F using:

pf(3.7113, df1 = 2, df2 = 19, lower.tail = FALSE)
0.0435904629985431
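As a small aside, the lower.tail = FALSE form is equivalent to subtracting the lower-tail probability from one; the former is preferred because it avoids loss of precision for very small p-values.

```r
# The p-value is the upper-tail area of the F distribution with 2 and 19
# degrees of freedom; these two calls are equivalent.
p1 <- pf(3.7113, df1 = 2, df2 = 19, lower.tail = FALSE)
p2 <- 1 - pf(3.7113, df1 = 2, df2 = 19)
all.equal(p1, p2)   # TRUE
```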
In [9]:
# According to the p-value, we would reject at the α=.05 level, but some caution is in order. Let’s continue by checking the assumptions in the model. As in regression, plotting the linear model object results in a series of residual plots.

plot(rcf.mod)

# The variances of each group did not look equal in the boxplot, and the Scale-Location plot suggests that the rightmost group does have somewhat higher variation than the others. The x-axis of these plots shows the fitted values, which in ANOVA are the group means. So the rightmost cluster of points corresponds to the group with mean 316.6, which is the N2O+O2,24h group.

# The Normal Q-Q plot suggests the data may be heavy-tailed. I would put a strongly worded caveat in any report claiming significance based on this data, though the result is certainly evidence that a larger study would be useful.
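The assumptions can also be probed with formal tests. A sketch on the built-in InsectSprays data (used here so the example is self-contained; red.cell.folate needs the ISwR package): Bartlett's test of equal variances and a Shapiro-Wilk test on the residuals. Formal tests have their own caveats, notably sensitivity to sample size, so treat them as a complement to the plots rather than a replacement.

```r
# Bartlett's test for equal group variances, and Shapiro-Wilk for
# normality of the residuals, on the built-in InsectSprays data.
bart <- bartlett.test(count ~ spray, data = InsectSprays)
mod <- lm(count ~ spray, data = InsectSprays)
sw <- shapiro.test(residuals(mod))
c(bartlett.p = bart$p.value, shapiro.p = sw$p.value)
```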
In [10]:
# The built-in data set InsectSprays records the counts of insects in agricultural experimental units treated with different insecticides. Plotting the data reveals an obvious difference in effectiveness among the spray types:

ggplot(InsectSprays, aes(x=spray, y=count))+geom_boxplot()
In [11]:
# Not surprisingly, ANOVA strongly rejects the null hypothesis that all sprays have the same mean, with a p-value less than 2.2×10−16:

IS.mod <- lm(count ~ spray, data = InsectSprays)
anova(IS.mod)
           Df   Sum Sq   Mean Sq  F value       Pr(>F)
spray       5 2668.833 533.76667 34.70228 3.182584e-17
Residuals  66 1015.167  15.38131       NA           NA
In [14]:
# Counts often have variance increasing with the mean, so we try a
# square-root transformation to stabilize the variance and refit.
IS.mod <- lm(sqrt(count) ~ spray, data = InsectSprays)
plot(IS.mod)
In [15]:
anova(IS.mod)
           Df   Sum Sq    Mean Sq  F value       Pr(>F)
spray       5 88.43787 17.6875745 44.79933 6.334505e-20
Residuals  66 26.05798  0.3948178       NA           NA
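The effect of the transformation on the equal-variance assumption can be seen directly by comparing the group variances before and after:

```r
# For counts, the group variance tends to track the group mean; the square
# root transform stabilizes it. Compare the ratio of largest to smallest
# group variance on each scale.
v_raw  <- tapply(InsectSprays$count, InsectSprays$spray, var)
v_sqrt <- tapply(sqrt(InsectSprays$count), InsectSprays$spray, var)
max(v_raw) / min(v_raw)      # very unequal variances on the raw scale
max(v_sqrt) / min(v_sqrt)    # much closer to 1 after the transform
```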
In [16]:
# Or, if normality of the data within each group is not in question (only the equal variance assumption), then oneway.test can be used.

oneway.test(count~spray, data = InsectSprays)
	One-way analysis of means (not assuming equal variances)

data:  count and spray
F = 36.065, num df = 5.000, denom df = 30.043, p-value = 7.999e-12
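For reference, setting var.equal = TRUE makes oneway.test reproduce the classical ANOVA F statistic exactly; the default Welch version above differs because it does not pool the group variances.

```r
# With var.equal = TRUE, oneway.test agrees exactly with anova(lm(...)).
ow <- oneway.test(count ~ spray, data = InsectSprays, var.equal = TRUE)
tab <- anova(lm(count ~ spray, data = InsectSprays))
all.equal(unname(ow$statistic), tab$`F value`[1])   # TRUE
```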
In [18]:
# For example, returning to the red.cell.folate data, we could do a pairwise t test via the following command:

pairwise.t.test(red.cell.folate$folate, red.cell.folate$ventilation)
	Pairwise comparisons using t tests with pooled SD 

data:  red.cell.folate$folate and red.cell.folate$ventilation 

          N2O+O2,24h N2O+O2,op
N2O+O2,op 0.042      -        
O2,24h    0.310      0.408    

P value adjustment method: holm 
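The Holm adjustment can be reproduced with p.adjust(). The raw p-values below are read back (approximately) from the adjusted table above, not recomputed from the data:

```r
# Holm: sort the raw p-values, multiply by (k, k-1, ..., 1), then enforce
# monotonicity. With three comparisons:
p_raw <- c(0.014, 0.155, 0.408)
p_adj <- p.adjust(p_raw, method = "holm")
p_adj   # 0.042 0.310 0.408
```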
In [5]:
# You can get a lot of interesting information from ANOVA objects. To see everything that’s stored in one, run the names() command on an ANOVA object. For example, here’s what’s in our last ANOVA object:
# Show me what's in my aov object

library(yarrr)
cleaner.type.int.aov <- aov(formula = time ~ cleaner * type,
                          data = poopdeck)
names(cleaner.type.int.aov)
  1. 'coefficients'
  2. 'residuals'
  3. 'effects'
  4. 'rank'
  5. 'fitted.values'
  6. 'assign'
  7. 'qr'
  8. 'df.residual'
  9. 'contrasts'
  10. 'xlevels'
  11. 'call'
  12. 'terms'
  13. 'model'
In [8]:
# For example, the "fitted.values" contains the model fits for the dependent variable (time) for every observation in our dataset. We can add these fits back to the dataset with the $ operator and assignment. For example, let’s get the model fitted values from both the interaction model (cleaner.type.int.aov) and the main effects model (cleaner.type.aov) and assign them to new columns in the dataframe:

# Add the fits for the interaction model to the dataframe as int.fit

poopdeck$int.fit <- cleaner.type.int.aov$fitted.values

# Add the fits for the main effects model to the dataframe as me.fit

cleaner.type.aov <- aov(formula = time ~ cleaner + type,
                        data = poopdeck)
poopdeck$me.fit <- cleaner.type.aov$fitted.values
In [9]:
# Now let’s look at the first few rows in the table to see the fits for the first few observations.

head(poopdeck)
day cleaner type    time int.fit    me.fit
  1 a       parrot    47   45.76  54.35667
  1 b       parrot    55   53.82  53.93667
  1 c       parrot    64   56.13  47.41667
  1 a       shark    101   86.28  77.68333
  1 b       shark     76   77.38  77.26333
  1 c       shark     63   62.03  70.74333
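The interaction fits have a simple interpretation: in a two-factor model with an interaction term, the fitted value for each observation is exactly the mean of its cell. A sketch on a toy design (synthetic data, not poopdeck):

```r
# With an interaction term, the fitted values of a two-way ANOVA are
# exactly the cell means; the additive model's fits generally are not.
set.seed(2)
d <- expand.grid(f1 = factor(c("a", "b")), f2 = factor(c("x", "y")))
d <- d[rep(1:4, each = 5), ]                   # 5 replicates per cell
d$y <- rnorm(20, mean = as.integer(d$f1) * as.integer(d$f2))
int.fit <- aov(y ~ f1 * f2, data = d)$fitted.values
cellmeans <- ave(d$y, d$f1, d$f2)              # cell mean for each row
all.equal(unname(int.fit), cellmeans)          # TRUE
```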
In [15]:
# Calculate a mixed-effects regression on time with two fixed factors (cleaner and type) and one repeated measure (day)

library(lme4)
(my.mod <- lmer(formula = time ~ cleaner + type + (1|day),
                data = poopdeck))
singular fit
Linear mixed model fit by REML ['lmerMod']
Formula: time ~ cleaner + type + (1 | day)
   Data: poopdeck
REML criterion at convergence: 5334.635
Random effects:
 Groups   Name        Std.Dev.
 day      (Intercept)  0.00   
 Residual             20.88   
Number of obs: 600, groups:  day, 100
Fixed Effects:
(Intercept)     cleanerb     cleanerc    typeshark  
      54.36        -0.42        -6.94        23.33  
convergence code 0; 1 optimizer warnings; 0 lme4 warnings 
In [17]:
pirates
id sex age height weight headband college tattoos tchests parrots favorite.pirate sword.type eyepatch sword.time beard.length fav.pixar grogg
1 male 28 173.11 70.5 yes JSSFP 9 0 0 Jack Sparrow cutlass 1 0.58 16 Monsters, Inc. 11
2 male 31 209.25 105.6 yes JSSFP 9 11 0 Jack Sparrow cutlass 0 1.11 21 WALL-E 9
3 male 26 169.95 77.1 yes CCCC 10 10 1 Jack Sparrow cutlass 1 1.44 19 Inside Out 7
4 female 31 144.29 58.5 no JSSFP 2 0 2 Jack Sparrow scimitar 1 36.11 2 Inside Out 9
5 female 41 157.85 58.4 yes JSSFP 9 6 4 Hook cutlass 1 0.11 0 Inside Out 14
6 male 26 190.20 85.4 yes CCCC 7 19 0 Jack Sparrow cutlass 1 0.59 17 Monsters University 7
7 female 31 158.05 59.6 yes JSSFP 9 1 7 Blackbeard cutlass 0 3.01 1 Cars 9
8 female 31 172.52 74.5 yes JSSFP 5 13 7 Hook cutlass 1 0.06 1 Inside Out 12
9 female 28 164.57 68.7 yes JSSFP 12 37 2 Anicetus cutlass 0 0.74 1 Toy Story 3 16
10 male 30 183.52 84.7 yes JSSFP 12 69 4 Jack Sparrow cutlass 1 0.71 25 Monsters University 9
11 female 25 162.49 63.5 yes CCCC 10 1 3 Blackbeard cutlass 1 0.78 1 Inside Out 7
12 male 20 163.65 70.0 yes CCCC 14 5 3 Jack Sparrow cutlass 0 0.47 27 Monsters, Inc. 8
13 female 24 171.04 72.6 yes CCCC 8 6 0 Lewis Scot cutlass 0 0.53 0 Monsters, Inc. 12
14 male 26 177.99 78.2 yes CCCC 9 12 3 Jack Sparrow cutlass 1 1.33 19 Inside Out 7
15 female 45 167.65 65.4 yes JSSFP 14 70 0 Hook cutlass 1 0.33 0 Monsters University 9
16 female 40 165.10 65.7 yes JSSFP 8 3 1 Hook scimitar 1 27.56 1 The Incredibles 10
17 male 21 177.52 72.7 yes CCCC 11 3 0 Jack Sparrow cutlass 0 0.33 20 Finding Nemo 15
18 female 30 158.30 65.6 yes JSSFP 7 107 3 Blackbeard cutlass 0 1.55 0 Monsters, Inc. 8
19 female 35 173.07 72.8 yes JSSFP 7 75 14 Hook cutlass 0 0.72 0 Finding Nemo 11
20 male 26 175.45 69.0 yes CCCC 14 30 3 Jack Sparrow cutlass 0 0.40 27 WALL-E 10
21 male 22 187.83 85.8 yes CCCC 10 11 6 Jack Sparrow cutlass 0 0.50 12 Monsters, Inc. 6
22 female 34 179.06 82.1 no JSSFP 5 13 4 Edward Low cutlass 1 0.03 2 Ratatouille 8
23 male 23 158.66 53.5 yes CCCC 13 11 1 Edward Low cutlass 0 0.02 29 Finding Nemo 13
24 male 12 175.29 83.0 yes CCCC 10 3 2 Jack Sparrow cutlass 1 0.14 24 Up 8
25 male 35 187.48 85.6 yes JSSFP 10 18 1 Jack Sparrow sabre 1 11.23 17 Inside Out 2
26 female 26 153.54 54.6 yes CCCC 10 10 3 Blackbeard cutlass 1 1.10 0 Inside Out 9
27 male 15 171.48 66.0 no CCCC 2 2 0 Jack Sparrow banana 1 44.74 18 Inside Out 10
28 male 30 177.04 71.9 yes JSSFP 15 15 4 Edward Low cutlass 1 0.35 29 Inside Out 15
29 female 17 145.98 57.4 yes CCCC 14 1 1 Hook cutlass 0 0.18 0 Finding Nemo 11
30 female 26 134.87 37.9 yes CCCC 12 44 10 Edward Low cutlass 1 0.59 1 Monsters University 2
... (rows 31 through 970 omitted) ...
971 male 27 179.00 68.6 yes CCCC 7 3 12 Jack Sparrow cutlass 0 0.78 17 Finding Nemo 12
972 male 31 177.47 75.5 yes JSSFP 9 7 0 Jack Sparrow cutlass 1 0.05 27 Up 8
973 male 28 183.56 82.5 yes CCCC 10 4 1 Blackbeard cutlass 1 2.72 19 Inside Out 10
974 male 31 188.38 82.8 yes CCCC 14 65 7 Lewis Scot cutlass 1 0.18 21 The Incredibles 10
975 female 35 152.16 58.7 yes JSSFP 8 4 16 Blackbeard cutlass 1 0.70 1 WALL-E 9
976 male 21 176.95 70.9 yes CCCC 7 26 1 Hook cutlass 1 0.01 20 Inside Out 11
977 female 29 160.87 65.1 yes CCCC 14 11 0 Lewis Scot cutlass 1 0.40 1 Monsters University 12
978 male 15 176.33 70.2 yes CCCC 12 1 3 Jack Sparrow cutlass 0 1.22 24 The Incredibles 13
979 female 19 152.44 54.7 yes CCCC 11 41 0 Anicetus cutlass 0 1.49 0 Finding Nemo 13
980 female 33 161.61 70.4 yes JSSFP 12 20 4 Hook cutlass 1 1.30 2 Toy Story 2 9
981 female 32 161.10 56.8 yes JSSFP 11 64 1 Lewis Scot cutlass 1 0.50 1 Inside Out 10
982 female 30 170.72 67.7 yes JSSFP 9 51 6 Blackbeard cutlass 1 0.11 1 Inside Out 14
983 male 19 165.42 67.4 yes CCCC 10 24 3 Jack Sparrow scimitar 1 4.05 10 Inside Out 5
984 female 30 186.30 78.3 yes CCCC 12 26 5 Blackbeard cutlass 1 0.14 0 Up 15
985 female 27 149.63 48.7 yes CCCC 11 0 0 Anicetus cutlass 1 0.05 2 WALL-E 6
986 male 27 179.29 79.6 yes CCCC 13 24 6 Jack Sparrow cutlass 0 0.17 30 Finding Nemo 9
987 female 31 161.65 62.1 yes JSSFP 15 30 1 Jack Sparrow cutlass 0 0.36 0 WALL-E 14
988 female 32 160.55 64.5 yes JSSFP 9 19 2 Jack Sparrow cutlass 0 1.54 0 Finding Nemo 7
989 male 23 175.95 81.6 yes CCCC 17 26 1 Jack Sparrow cutlass 1 0.63 25 Inside Out 10
990 male 29 164.79 63.7 yes CCCC 10 37 10 Blackbeard cutlass 1 0.37 23 Inside Out 7
991 male 24 186.27 86.9 yes CCCC 13 6 3 Jack Sparrow cutlass 0 0.34 17 Finding Nemo 14
992 male 26 173.10 73.7 yes CCCC 13 6 0 Jack Sparrow cutlass 0 0.02 23 Finding Nemo 14
993 female 31 161.17 61.4 no JSSFP 7 15 2 Blackbeard scimitar 1 5.45 1 Up 9
994 female 37 153.95 65.2 yes JSSFP 11 12 2 Jack Sparrow cutlass 1 1.94 0 Inside Out 12
995 male 34 172.45 75.0 yes JSSFP 9 18 16 Jack Sparrow cutlass 1 0.16 18 Toy Story 3 10
996 male 26 178.83 70.8 yes CCCC 14 11 2 Jack Sparrow cutlass 0 0.46 20 Up 13
997 female 34 157.20 53.6 yes JSSFP 13 48 2 Hook cutlass 0 0.15 0 Finding Nemo 8
998 male 36 175.74 72.4 yes JSSFP 9 95 14 Jack Sparrow cutlass 1 1.28 18 The Incredibles 11
999 male 30 175.24 70.2 yes JSSFP 9 6 0 Lewis Scot cutlass 1 0.21 24 Inside Out 10
1000 male 30 189.45 84.0 yes JSSFP 10 36 1 Jack Sparrow cutlass 0 0.95 19 Up 14
In [18]:
tatmov <- lm(formula = tattoos ~ fav.pixar,
              data = pirates)
summary(tatmov)
Call:
lm(formula = tattoos ~ fav.pixar, data = pirates)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.3571  -1.9455   0.3642   2.3642   9.2857 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                   10.3571     0.6346  16.322  < 2e-16 ***
fav.pixarBrave                -0.6429     0.9693  -0.663  0.50734    
fav.pixarCars                  0.2143     0.9693   0.221  0.82508    
fav.pixarCars 2                0.1429     1.0991   0.130  0.89661    
fav.pixarFinding Nemo         -0.7213     0.6872  -1.050  0.29413    
fav.pixarInside Out           -0.9492     0.6631  -1.431  0.15261    
fav.pixarMonsters University  -1.9954     0.7229  -2.760  0.00588 ** 
fav.pixarMonsters, Inc.       -0.9971     0.7926  -1.258  0.20864    
fav.pixarRatatouille          -1.6905     2.0398  -0.829  0.40745    
fav.pixarThe Incredibles      -0.3874     0.7573  -0.512  0.60903    
fav.pixarToy Story            -1.4196     1.0523  -1.349  0.17761    
fav.pixarToy Story 2          -1.4371     0.9239  -1.555  0.12015    
fav.pixarToy Story 3          -1.5879     1.1269  -1.409  0.15912    
fav.pixarUp                   -0.7978     0.7058  -1.130  0.25862    
fav.pixarWALL-E               -1.1418     0.7590  -1.504  0.13284    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.358 on 985 degrees of freedom
Multiple R-squared:  0.01992,	Adjusted R-squared:  0.005988 
F-statistic:  1.43 on 14 and 985 DF,  p-value: 0.1321
In [19]:
with(pirates,
     table(fav.pixar))
fav.pixar
       A Bug's Life               Brave                Cars              Cars 2 
                 28                  21                  21                  14 
       Finding Nemo          Inside Out Monsters University      Monsters, Inc. 
                162                 304                  94                  50 
        Ratatouille     The Incredibles           Toy Story         Toy Story 2 
                  3                  66                  16                  25 
        Toy Story 3                  Up              WALL-E 
                 13                 118                  65 
In [62]:
tatmov.aov <- aov(formula = tattoos ~ fav.pixar,
              data = pirates)
summary(tatmov.aov)

# aov(tatmov) returns the same.
             Df Sum Sq Mean Sq F value Pr(>F)
fav.pixar    14    226   16.12    1.43  0.132
Residuals   985  11105   11.27               
In [51]:
tatpir <- lm(formula = tattoos ~ favorite.pirate,
              data = pirates)
summary(tatpir)
Call:
lm(formula = tattoos ~ favorite.pirate, data = pirates)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4706 -2.2360  0.2054  2.5294  9.5294 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  10.2360     0.3566  28.707   <2e-16 ***
favorite.pirateBlackbeard    -0.9586     0.4714  -2.034   0.0423 *  
favorite.pirateEdward Low    -0.7654     0.4879  -1.569   0.1171    
favorite.pirateHook          -1.1077     0.4731  -2.341   0.0194 *  
favorite.pirateJack Sparrow  -0.7766     0.3899  -1.992   0.0467 *  
favorite.pirateLewis Scot    -1.1173     0.4723  -2.366   0.0182 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.364 on 994 degrees of freedom
Multiple R-squared:  0.007346,	Adjusted R-squared:  0.002353 
F-statistic: 1.471 on 5 and 994 DF,  p-value: 0.1965
In [54]:
tatpir.aov <- aov(formula = tattoos ~ favorite.pirate,
                  data = pirates)
TukeyHSD(tatpir.aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = tattoos ~ favorite.pirate, data = pirates)

$favorite.pirate
                               diff        lwr       upr     p adj
Blackbeard-Anicetus     -0.95864413 -2.3046637 0.3873755 0.3239509
Edward Low-Anicetus     -0.76536682 -2.1585550 0.6278214 0.6195577
Hook-Anicetus           -1.10774993 -2.4586831 0.2431833 0.1786080
Jack Sparrow-Anicetus   -0.77661440 -1.8898496 0.3366209 0.3476254
Lewis Scot-Anicetus     -1.11731099 -2.4657688 0.2311468 0.1693857
Edward Low-Blackbeard    0.19327731 -1.1027399 1.4892945 0.9982207
Hook-Blackbeard         -0.14910580 -1.3995888 1.1013772 0.9993972
Jack Sparrow-Blackbeard  0.18202973 -0.8068989 1.1709584 0.9951622
Lewis Scot-Blackbeard   -0.15866686 -1.4064752 1.0891415 0.9991759
Hook-Edward Low         -0.34238311 -1.6435028 0.9587366 0.9752881
Jack Sparrow-Edward Low -0.01124758 -1.0634760 1.0409808 1.0000000
Lewis Scot-Edward Low   -0.35194417 -1.6504935 0.9466052 0.9718794
Jack Sparrow-Hook        0.33113553 -0.6644707 1.3267417 0.9333242
Lewis Scot-Hook         -0.00956106 -1.2626682 1.2435461 1.0000000
Lewis Scot-Jack Sparrow -0.34069659 -1.3329413 0.6515482 0.9241969
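Note that TukeyHSD requires an aov fit, not an lm fit, which is why the model was refit with aov above. A self-contained illustration on the built-in InsectSprays data:

```r
# Tukey's honest significant differences for all pairwise spray contrasts.
tk <- TukeyHSD(aov(count ~ spray, data = InsectSprays))
nrow(tk$spray)   # choose(6, 2) = 15 comparisons, each with diff, lwr, upr, p adj
```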
In [55]:
tatmopi <- lm(formula = tattoos ~ favorite.pirate + fav.pixar,
              data = pirates)
summary(tatmopi)
Call:
lm(formula = tattoos ~ favorite.pirate + fav.pixar, data = pirates)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.0659  -2.0862   0.3334   2.3334   9.2239 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  11.14095    0.71937  15.487   <2e-16 ***
favorite.pirateBlackbeard    -0.85787    0.47455  -1.808   0.0710 .  
favorite.pirateEdward Low    -0.72080    0.49007  -1.471   0.1417    
favorite.pirateHook          -1.07866    0.47584  -2.267   0.0236 *  
favorite.pirateJack Sparrow  -0.70668    0.39355  -1.796   0.0729 .  
favorite.pirateLewis Scot    -1.07509    0.47651  -2.256   0.0243 *  
fav.pixarBrave               -0.64400    0.96896  -0.665   0.5064    
fav.pixarCars                 0.08289    0.97270   0.085   0.9321    
fav.pixarCars 2               0.11807    1.09912   0.107   0.9145    
fav.pixarFinding Nemo        -0.76770    0.68921  -1.114   0.2656    
fav.pixarInside Out          -0.97605    0.66496  -1.468   0.1425    
fav.pixarMonsters University -2.00759    0.72454  -2.771   0.0057 ** 
fav.pixarMonsters, Inc.      -1.04895    0.79379  -1.321   0.1867    
fav.pixarRatatouille         -1.87565    2.04353  -0.918   0.3589    
fav.pixarThe Incredibles     -0.44642    0.75959  -0.588   0.5569    
fav.pixarToy Story           -1.51606    1.05233  -1.441   0.1500    
fav.pixarToy Story 2         -1.38137    0.92493  -1.493   0.1356    
fav.pixarToy Story 3         -1.83487    1.13094  -1.622   0.1050    
fav.pixarUp                  -0.84124    0.70693  -1.190   0.2343    
fav.pixarWALL-E              -1.11314    0.75932  -1.466   0.1430    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.355 on 980 degrees of freedom
Multiple R-squared:  0.02662,	Adjusted R-squared:  0.007745 
F-statistic:  1.41 on 19 and 980 DF,  p-value: 0.1127
In [58]:
tatmopii <- lm(formula = tattoos ~ favorite.pirate * fav.pixar,
              data = pirates)
summary(tatmopii)
Call:
lm(formula = tattoos ~ favorite.pirate * fav.pixar, data = pirates)

Residuals:
   Min     1Q Median     3Q    Max 
-9.722 -2.000  0.250  2.171  8.500 

Coefficients: (5 not defined because of singularities)
                                                          Estimate Std. Error
(Intercept)                                               10.66667    1.94125
favorite.pirateBlackbeard                                 -0.50000    2.37754
favorite.pirateEdward Low                                  2.33333    2.74535
favorite.pirateHook                                       -1.41667    2.56804
favorite.pirateJack Sparrow                                1.47619    2.32024
favorite.pirateLewis Scot                                 -3.46667    2.45551
fav.pixarBrave                                             1.33333    3.06939
fav.pixarCars                                              0.33333    2.45551
fav.pixarCars 2                                            4.33333    3.88251
fav.pixarFinding Nemo                                     -1.30952    2.13916
fav.pixarInside Out                                       -0.23188    2.06398
fav.pixarMonsters University                              -0.66667    2.45551
fav.pixarMonsters, Inc.                                    0.73333    2.45551
fav.pixarRatatouille                                       2.33333    3.88251
fav.pixarThe Incredibles                                  -0.66667    2.27632
fav.pixarToy Story                                         0.33333    2.74535
fav.pixarToy Story 2                                      -5.66667    3.88251
fav.pixarToy Story 3                                       0.08333    2.56804
fav.pixarUp                                               -1.57576    2.19003
fav.pixarWALL-E                                            0.33333    2.74535
favorite.pirateBlackbeard:fav.pixarBrave                  -1.70000    3.68327
favorite.pirateEdward Low:fav.pixarBrave                  -3.83333    4.34077
favorite.pirateHook:fav.pixarBrave                         1.16667    3.88251
favorite.pirateJack Sparrow:fav.pixarBrave                -4.97619    3.59450
favorite.pirateLewis Scot:fav.pixarBrave                  -2.53333    4.16352
favorite.pirateBlackbeard:fav.pixarCars                   -1.50000    4.38397
favorite.pirateEdward Low:fav.pixarCars                    2.66667    4.59384
favorite.pirateHook:fav.pixarCars                          1.61667    3.33421
favorite.pirateJack Sparrow:fav.pixarCars                 -2.72619    3.00961
favorite.pirateLewis Scot:fav.pixarCars                    0.46667    4.42674
favorite.pirateBlackbeard:fav.pixarCars 2                 -2.50000    4.75508
favorite.pirateEdward Low:fav.pixarCars 2                 -9.33333    4.94924
favorite.pirateHook:fav.pixarCars 2                       -3.58333    5.40422
favorite.pirateJack Sparrow:fav.pixarCars 2               -5.47619    4.30966
favorite.pirateLewis Scot:fav.pixarCars 2                 -3.53333    4.79454
favorite.pirateBlackbeard:fav.pixarFinding Nemo            1.33036    2.67709
favorite.pirateEdward Low:fav.pixarFinding Nemo           -1.96825    2.99542
favorite.pirateHook:fav.pixarFinding Nemo                  1.52106    2.87611
favorite.pirateJack Sparrow:fav.pixarFinding Nemo         -1.16667    2.51607
favorite.pirateLewis Scot:fav.pixarFinding Nemo            3.40952    2.72072
favorite.pirateBlackbeard:fav.pixarInside Out             -1.21064    2.55618
favorite.pirateEdward Low:fav.pixarInside Out             -3.26812    2.88834
favorite.pirateHook:fav.pixarInside Out                   -0.18028    2.71880
favorite.pirateJack Sparrow:fav.pixarInside Out           -2.37625    2.43999
favorite.pirateLewis Scot:fav.pixarInside Out              2.31760    2.61612
favorite.pirateBlackbeard:fav.pixarMonsters University    -1.28571    2.95318
favorite.pirateEdward Low:fav.pixarMonsters University    -3.70833    3.34831
favorite.pirateHook:fav.pixarMonsters University          -0.41667    3.13018
favorite.pirateJack Sparrow:fav.pixarMonsters University  -3.41236    2.80805
favorite.pirateLewis Scot:fav.pixarMonsters University     2.84167    3.11509
favorite.pirateBlackbeard:fav.pixarMonsters, Inc.         -2.47143    3.08688
favorite.pirateEdward Low:fav.pixarMonsters, Inc.         -2.40000    3.68327
favorite.pirateHook:fav.pixarMonsters, Inc.               -0.48333    3.41793
favorite.pirateJack Sparrow:fav.pixarMonsters, Inc.       -3.87619    2.84549
favorite.pirateLewis Scot:fav.pixarMonsters, Inc.          1.23333    3.18980
favorite.pirateBlackbeard:fav.pixarRatatouille                  NA         NA
favorite.pirateEdward Low:fav.pixarRatatouille           -10.33333    5.49069
favorite.pirateHook:fav.pixarRatatouille                        NA         NA
favorite.pirateJack Sparrow:fav.pixarRatatouille                NA         NA
favorite.pirateLewis Scot:fav.pixarRatatouille            -1.53333    5.35167
favorite.pirateBlackbeard:fav.pixarThe Incredibles         0.50000    3.29155
favorite.pirateEdward Low:fav.pixarThe Incredibles        -4.19048    3.25041
favorite.pirateHook:fav.pixarThe Incredibles               1.71667    3.02300
favorite.pirateJack Sparrow:fav.pixarThe Incredibles      -1.12135    2.67608
favorite.pirateLewis Scot:fav.pixarThe Incredibles         3.03810    3.00961
favorite.pirateBlackbeard:fav.pixarToy Story              -1.50000    3.88251
favorite.pirateEdward Low:fav.pixarToy Story              -3.33333    4.11802
favorite.pirateHook:fav.pixarToy Story                    -4.08333    4.00200
favorite.pirateJack Sparrow:fav.pixarToy Story            -4.07619    3.37832
favorite.pirateLewis Scot:fav.pixarToy Story               1.96667    3.93074
favorite.pirateBlackbeard:fav.pixarToy Story 2             9.50000    4.75508
favorite.pirateEdward Low:fav.pixarToy Story 2            -1.83333    4.94924
favorite.pirateHook:fav.pixarToy Story 2                   6.16667    4.55264
favorite.pirateJack Sparrow:fav.pixarToy Story 2           2.22381    4.22131
favorite.pirateLewis Scot:fav.pixarToy Story 2             7.30000    4.38397
favorite.pirateBlackbeard:fav.pixarToy Story 3            -2.00000    3.36235
favorite.pirateEdward Low:fav.pixarToy Story 3           -12.08333    4.65496
favorite.pirateHook:fav.pixarToy Story 3                        NA         NA
favorite.pirateJack Sparrow:fav.pixarToy Story 3          -2.97619    3.32208
favorite.pirateLewis Scot:fav.pixarToy Story 3                  NA         NA
favorite.pirateBlackbeard:fav.pixarUp                      1.68687    2.70343
favorite.pirateEdward Low:fav.pixarUp                     -0.72424    3.11372
favorite.pirateHook:fav.pixarUp                            0.71037    2.91414
favorite.pirateJack Sparrow:fav.pixarUp                   -1.13074    2.57232
favorite.pirateLewis Scot:fav.pixarUp                      4.19394    2.84343
favorite.pirateBlackbeard:fav.pixarWALL-E                 -2.80000    3.24834
favorite.pirateEdward Low:fav.pixarWALL-E                 -4.50000    3.63175
favorite.pirateHook:fav.pixarWALL-E                       -0.83333    3.43168
favorite.pirateJack Sparrow:fav.pixarWALL-E               -2.55311    3.09626
favorite.pirateLewis Scot:fav.pixarWALL-E                  1.46667    3.27721
                                                         t value Pr(>|t|)    
(Intercept)                                                5.495 5.07e-08 ***
favorite.pirateBlackbeard                                 -0.210  0.83348    
favorite.pirateEdward Low                                  0.850  0.39559    
favorite.pirateHook                                       -0.552  0.58132    
favorite.pirateJack Sparrow                                0.636  0.52479    
favorite.pirateLewis Scot                                 -1.412  0.15835    
fav.pixarBrave                                             0.434  0.66410    
fav.pixarCars                                              0.136  0.89205    
fav.pixarCars 2                                            1.116  0.26466    
fav.pixarFinding Nemo                                     -0.612  0.54058    
fav.pixarInside Out                                       -0.112  0.91057    
fav.pixarMonsters University                              -0.271  0.78607    
fav.pixarMonsters, Inc.                                    0.299  0.76528    
fav.pixarRatatouille                                       0.601  0.54800    
fav.pixarThe Incredibles                                  -0.293  0.76969    
fav.pixarToy Story                                         0.121  0.90339    
fav.pixarToy Story 2                                      -1.460  0.14476    
fav.pixarToy Story 3                                       0.032  0.97412    
fav.pixarUp                                               -0.720  0.47201    
fav.pixarWALL-E                                            0.121  0.90339    
favorite.pirateBlackbeard:fav.pixarBrave                  -0.462  0.64452    
favorite.pirateEdward Low:fav.pixarBrave                  -0.883  0.37741    
favorite.pirateHook:fav.pixarBrave                         0.300  0.76387    
favorite.pirateJack Sparrow:fav.pixarBrave                -1.384  0.16658    
favorite.pirateLewis Scot:fav.pixarBrave                  -0.608  0.54303    
favorite.pirateBlackbeard:fav.pixarCars                   -0.342  0.73231    
favorite.pirateEdward Low:fav.pixarCars                    0.580  0.56173    
favorite.pirateHook:fav.pixarCars                          0.485  0.62788    
favorite.pirateJack Sparrow:fav.pixarCars                 -0.906  0.36527    
favorite.pirateLewis Scot:fav.pixarCars                    0.105  0.91607    
favorite.pirateBlackbeard:fav.pixarCars 2                 -0.526  0.59919    
favorite.pirateEdward Low:fav.pixarCars 2                 -1.886  0.05964 .  
favorite.pirateHook:fav.pixarCars 2                       -0.663  0.50746    
favorite.pirateJack Sparrow:fav.pixarCars 2               -1.271  0.20417    
favorite.pirateLewis Scot:fav.pixarCars 2                 -0.737  0.46134    
favorite.pirateBlackbeard:fav.pixarFinding Nemo            0.497  0.61935    
favorite.pirateEdward Low:fav.pixarFinding Nemo           -0.657  0.51129    
favorite.pirateHook:fav.pixarFinding Nemo                  0.529  0.59703    
favorite.pirateJack Sparrow:fav.pixarFinding Nemo         -0.464  0.64298    
favorite.pirateLewis Scot:fav.pixarFinding Nemo            1.253  0.21046    
favorite.pirateBlackbeard:fav.pixarInside Out             -0.474  0.63589    
favorite.pirateEdward Low:fav.pixarInside Out             -1.131  0.25815    
favorite.pirateHook:fav.pixarInside Out                   -0.066  0.94715    
favorite.pirateJack Sparrow:fav.pixarInside Out           -0.974  0.33038    
favorite.pirateLewis Scot:fav.pixarInside Out              0.886  0.37591    
favorite.pirateBlackbeard:fav.pixarMonsters University    -0.435  0.66340    
favorite.pirateEdward Low:fav.pixarMonsters University    -1.108  0.26836    
favorite.pirateHook:fav.pixarMonsters University          -0.133  0.89413    
favorite.pirateJack Sparrow:fav.pixarMonsters University  -1.215  0.22460    
favorite.pirateLewis Scot:fav.pixarMonsters University     0.912  0.36189    
favorite.pirateBlackbeard:fav.pixarMonsters, Inc.         -0.801  0.42356    
favorite.pirateEdward Low:fav.pixarMonsters, Inc.         -0.652  0.51483    
favorite.pirateHook:fav.pixarMonsters, Inc.               -0.141  0.88758    
favorite.pirateJack Sparrow:fav.pixarMonsters, Inc.       -1.362  0.17346    
favorite.pirateLewis Scot:fav.pixarMonsters, Inc.          0.387  0.69911    
favorite.pirateBlackbeard:fav.pixarRatatouille                NA       NA    
favorite.pirateEdward Low:fav.pixarRatatouille            -1.882  0.06016 .  
favorite.pirateHook:fav.pixarRatatouille                      NA       NA    
favorite.pirateJack Sparrow:fav.pixarRatatouille              NA       NA    
favorite.pirateLewis Scot:fav.pixarRatatouille            -0.287  0.77455    
favorite.pirateBlackbeard:fav.pixarThe Incredibles         0.152  0.87930    
favorite.pirateEdward Low:fav.pixarThe Incredibles        -1.289  0.19765    
favorite.pirateHook:fav.pixarThe Incredibles               0.568  0.57026    
favorite.pirateJack Sparrow:fav.pixarThe Incredibles      -0.419  0.67529    
favorite.pirateLewis Scot:fav.pixarThe Incredibles         1.009  0.31302    
favorite.pirateBlackbeard:fav.pixarToy Story              -0.386  0.69933    
favorite.pirateEdward Low:fav.pixarToy Story              -0.809  0.41847    
favorite.pirateHook:fav.pixarToy Story                    -1.020  0.30784    
favorite.pirateJack Sparrow:fav.pixarToy Story            -1.207  0.22791    
favorite.pirateLewis Scot:fav.pixarToy Story               0.500  0.61696    
favorite.pirateBlackbeard:fav.pixarToy Story 2             1.998  0.04603 *  
favorite.pirateEdward Low:fav.pixarToy Story 2            -0.370  0.71115    
favorite.pirateHook:fav.pixarToy Story 2                   1.355  0.17590    
favorite.pirateJack Sparrow:fav.pixarToy Story 2           0.527  0.59846    
favorite.pirateLewis Scot:fav.pixarToy Story 2             1.665  0.09622 .  
favorite.pirateBlackbeard:fav.pixarToy Story 3            -0.595  0.55211    
favorite.pirateEdward Low:fav.pixarToy Story 3            -2.596  0.00959 ** 
favorite.pirateHook:fav.pixarToy Story 3                      NA       NA    
favorite.pirateJack Sparrow:fav.pixarToy Story 3          -0.896  0.37055    
favorite.pirateLewis Scot:fav.pixarToy Story 3                NA       NA    
favorite.pirateBlackbeard:fav.pixarUp                      0.624  0.53280    
favorite.pirateEdward Low:fav.pixarUp                     -0.233  0.81613    
favorite.pirateHook:fav.pixarUp                            0.244  0.80747    
favorite.pirateJack Sparrow:fav.pixarUp                   -0.440  0.66035    
favorite.pirateLewis Scot:fav.pixarUp                      1.475  0.14057    
favorite.pirateBlackbeard:fav.pixarWALL-E                 -0.862  0.38892    
favorite.pirateEdward Low:fav.pixarWALL-E                 -1.239  0.21564    
favorite.pirateHook:fav.pixarWALL-E                       -0.243  0.80819    
favorite.pirateJack Sparrow:fav.pixarWALL-E               -0.825  0.40982    
favorite.pirateLewis Scot:fav.pixarWALL-E                  0.448  0.65459    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.362 on 915 degrees of freedom
Multiple R-squared:  0.08706,	Adjusted R-squared:  0.003254 
F-statistic: 1.039 on 84 and 915 DF,  p-value: 0.389
In [ ]:
library(yarrr)
head(poopdeck)
yarrr::pirateplot(time ~ cleaner, 
                  data = poopdeck, 
                  theme = 2, 
                  cap.beans = TRUE,
                  main = "formula = time ~ cleaner")
In [2]:
# From the plot, it looks like cleaners a and b are the same, and cleaner c is a bit faster. To test this, we'll create an ANOVA object with aov(). Because time is the dependent variable and cleaner is the independent variable, we'll set the formula to formula = time ~ cleaner.

# Step 1: aov object with time as DV and cleaner as IV

cleaner.aov <- aov(formula = time ~ cleaner,
                   data = poopdeck)

# Now, to see a full ANOVA summary table, apply summary() to the ANOVA object from Step 1.
# Step 2: Look at the summary of the anova object

summary(cleaner.aov)

# The main result from our table is that we have a significant effect of cleaner on cleaning time (F(2, 597) = 5.29, p = 0.005). However, the ANOVA table does not tell us which levels of the independent variable differ. In other words, we don't know which cleaner is better than which. To answer this, we need to conduct a post-hoc test.
             Df Sum Sq Mean Sq F value  Pr(>F)   
cleaner       2   6057    3028   5.294 0.00526 **
Residuals   597 341511     572                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
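The numbers in this table come from decomposing the total variation into between-group and within-group pieces. A minimal sketch of that computation, shown on the built-in InsectSprays data so it runs without yarrr:

```r
# Between/within sums-of-squares decomposition behind a one-way ANOVA table,
# illustrated on the built-in InsectSprays data.
y <- InsectSprays$count
g <- InsectSprays$spray

grand.mean  <- mean(y)
group.means <- tapply(y, g, mean)
group.n     <- tapply(y, g, length)

# Between-group SS: squared deviations of the group means from the grand mean
ss.between <- sum(group.n * (group.means - grand.mean)^2)
# Within-group SS: squared deviations of each observation from its group mean
ss.within  <- sum((y - group.means[g])^2)

# F is the ratio of the two mean squares (SS divided by degrees of freedom)
k <- nlevels(g)   # number of groups
N <- length(y)    # total sample size
F.stat <- (ss.between / (k - 1)) / (ss.within / (N - k))
F.stat
```

Running summary(aov(count ~ spray, data = InsectSprays)) reports the same F value, so the hand computation and the table agree.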
In [3]:
# If you’ve found a significant effect of a factor, you can then do post-hoc tests to test the differences between all pairs of levels of the independent variable. There are many types of pairwise comparisons that make different assumptions. One of the most common post-hoc tests for standard ANOVAs is the Tukey Honestly Significant Difference (HSD) test. To do an HSD test, apply the TukeyHSD() function to your ANOVA object as follows:
# Step 3: Conduct post-hoc tests

TukeyHSD(cleaner.aov)

# This table shows us the pairwise differences between each pair of groups. The diff column gives the mean difference between the groups (which, reassuringly, matches the coefficients in the regression summary), followed by a confidence interval for the difference and a p-value testing the null hypothesis that the true group difference is zero.
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = time ~ cleaner, data = poopdeck)

$cleaner
     diff        lwr        upr     p adj
b-a -0.42  -6.039575  5.1995746 0.9831441
c-a -6.94 -12.559575 -1.3204254 0.0107324
c-b -6.52 -12.139575 -0.9004254 0.0180906
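Tukey's HSD is not the only post-hoc option. Base R also provides pairwise.t.test(), which runs every pairwise comparison and adjusts the p-values for multiple testing; a sketch using the Bonferroni correction, shown on the built-in InsectSprays data so it runs without yarrr:

```r
# All pairwise t-tests with Bonferroni-adjusted p-values, as an
# alternative to TukeyHSD().
pw <- pairwise.t.test(InsectSprays$count, InsectSprays$spray,
                      p.adjust.method = "bonferroni")
pw$p.value   # lower-triangular matrix of adjusted p-values, one per pair
```

Bonferroni is more conservative than Tukey's HSD; which adjustment is appropriate depends on how many comparisons you planned to make.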
In [4]:
# I almost always find it helpful to combine an ANOVA summary table with a regression summary table. Because ANOVA is just a special case of regression (where all the independent variables are factors), you’ll get the same results with a regression object as you will with an ANOVA object. However, the format of the results is different and frequently easier to interpret.
# To create a regression object, use the lm() function. Your inputs to this function will be identical to your inputs to the aov() function.
# Step 4: Create a regression object

cleaner.lm <- lm(formula = time ~ cleaner,
                 data = poopdeck)

# Show summary

summary(cleaner.lm)

# As you can see, the regression table does not give us a test for each variable like the ANOVA table does. Instead, it tells us how different each level of an independent variable is from a default level. You can tell which level of an independent variable is the default just by seeing which level is missing from the table. In this case, I don’t see a coefficient for cleaner a, so that must be the default level.
# The intercept in the table gives us the mean of the default level. In this case, the mean time for cleaner a was 66.02 minutes. The coefficients for the other levels tell us that cleaner b is, on average, 0.42 minutes faster than cleaner a, and cleaner c is, on average, 6.94 minutes faster than cleaner a. Not surprisingly, these are the same differences we saw in the Tukey HSD test!
Call:
lm(formula = time ~ cleaner, data = poopdeck)

Residuals:
   Min     1Q Median     3Q    Max 
-63.02 -16.60  -1.05  16.92  71.92 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   66.020      1.691  39.037  < 2e-16 ***
cleanerb      -0.420      2.392  -0.176  0.86066    
cleanerc      -6.940      2.392  -2.902  0.00385 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.92 on 597 degrees of freedom
Multiple R-squared:  0.01743,	Adjusted R-squared:  0.01413 
F-statistic: 5.294 on 2 and 597 DF,  p-value: 0.005261
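The claim that the regression coefficients are just group mean differences is easy to verify with tapply(). A sketch on the built-in chickwts data (an illustrative stand-in, since poopdeck requires yarrr):

```r
# Verifying that lm() coefficients for a factor are group mean differences.
fit   <- lm(weight ~ feed, data = chickwts)
means <- tapply(chickwts$weight, chickwts$feed, mean)

# The intercept equals the mean of the default (first) factor level...
int.check   <- unname(coef(fit)["(Intercept)"]) - unname(means["casein"])
# ...and every other coefficient equals that level's mean minus the default's.
slope.check <- unname(coef(fit)["feedhorsebean"]) -
               unname(means["horsebean"] - means["casein"])

c(int.check, slope.check)   # both differences are numerically zero
```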
In [5]:
pirateplot(formula = time ~ cleaner + type,
           data = poopdeck,
           ylim = c(0, 150),
           xlab = "Cleaner",
           ylab = "Cleaning Time (minutes)",
           main = "poopdeck data",
           back.col = gray(.97), 
           cap.beans = TRUE, 
           theme = 2)
In [21]:
# Returning to red.cell.folate, we use R to do all of the computations for us. First, build a linear model for folate as explained by ventilation, and then apply the anova function to the model. anova produces a table which contains all of the computations from the discussion above.

(rcf.mod <- lm(folate ~ ventilation, data = red.cell.folate))
anova(rcf.mod)
# The two rows ventilation and Residuals correspond to the between group and within group variation.
# The first column, Df gives the degrees of freedom in each case. Since k=3, the between group variation has k−1=2 degrees of freedom, and since N=22, the within group variation (Residuals) has N−k=19 degrees of freedom.
# The Sum Sq column gives SSDB and SSDW.
# The Mean Sq variable is the Sum Sq value divided by the degrees of freedom. These two numbers are the numerator and denominator of the test statistic, F. So here, F=7757.9/2090.3=3.7113.
# To compute the p-value, we need the area under the tail of the F distribution above F=3.7113. Note that this is a one-tailed test, because small values of F would actually be evidence in favor of H0.
Call:
lm(formula = folate ~ ventilation, data = red.cell.folate)

Coefficients:
         (Intercept)  ventilationN2O+O2,op     ventilationO2,24h  
              316.63                -60.18                -38.63  
            Df   Sum Sq  Mean Sq  F value     Pr(>F)
ventilation  2 15515.77 7757.883 3.711336 0.04358933
Residuals   19 39716.10 2090.321       NA         NA
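The last two columns of the table can be reproduced by hand from the first two. A quick sketch:

```r
# Rebuilding the F statistic and p-value from the Sum Sq and Df columns.
ms.between <- 15515.77 / 2    # Mean Sq for ventilation (between groups)
ms.within  <- 39716.10 / 19   # Mean Sq for Residuals (within groups)
F.stat <- ms.between / ms.within   # about 3.7113

# One-tailed p-value: area under the F(2, 19) density above F.stat
p.value <- pf(F.stat, df1 = 2, df2 = 19, lower.tail = FALSE)   # about 0.0436
```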
In [13]:
# Not surprisingly, for the built-in InsectSprays data, ANOVA strongly rejects the null hypothesis that all sprays have the same mean insect count, with a p-value less than 2.2×10−16:

IS.mod <- lm(count ~ spray, data = InsectSprays)
anova(IS.mod)
          Df   Sum Sq   Mean Sq  F value       Pr(>F)
spray      5 2668.833 533.76667 34.70228 3.182584e-17
Residuals 66 1015.167  15.38131       NA           NA
In [16]:
# Count data like these often violate the equal-variance assumption, so we refit
# with a square-root transformed response and inspect the diagnostic plots.
IS.mod <- lm(sqrt(count) ~ spray, data = InsectSprays)
plot(IS.mod)
In [3]:
library(yarrr)

# To conduct a two-way ANOVA (or a ménage à trois NOVA), just include additional independent variables in the regression model formula with the + sign. That’s it. All the other steps are the same. Let’s conduct a two-way ANOVA with both cleaner and type as independent variables. To do this, we’ll set formula = time ~ cleaner + type.

# Step 1: Create ANOVA object with aov()
cleaner.type.aov <- aov(formula = time ~ cleaner + type,
                        data = poopdeck)

# Step 2: Get ANOVA table with summary()
summary(cleaner.type.aov)
             Df Sum Sq Mean Sq F value  Pr(>F)    
cleaner       2   6057    3028   6.945 0.00104 ** 
type          1  81620   81620 187.177 < 2e-16 ***
Residuals   596 259891     436                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In [4]:
# It looks like we found significant effects of both independent variables.
# Step 3: Conduct post-hoc tests

TukeyHSD(cleaner.type.aov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = time ~ cleaner + type, data = poopdeck)

$cleaner
     diff        lwr       upr     p adj
b-a -0.42  -5.326395  4.486395 0.9779465
c-a -6.94 -11.846395 -2.033605 0.0027112
c-b -6.52 -11.426395 -1.613605 0.0053376

$type
                 diff      lwr      upr p adj
shark-parrot 23.32667 19.97811 26.67522     0
In [5]:
# The only non-significant group difference we found is between cleaner b and cleaner a. All other comparisons were significant.
# Step 4: Look at regression coefficients

cleaner.type.lm <- lm(formula = time ~ cleaner + type,
                      data = poopdeck)

summary(cleaner.type.lm)

# Now we need to interpret the results with respect to two default values (here, cleaner = a and type = parrot). The intercept tells us that the average time for cleaner a on parrot poop was 54.36 minutes. Additionally, the average time to clean shark poop was 23.33 minutes slower than the time to clean parrot poop.
Call:
lm(formula = time ~ cleaner + type, data = poopdeck)

Residuals:
    Min      1Q  Median      3Q     Max 
-59.743 -13.792  -0.683  13.583  83.583 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   54.357      1.705  31.881  < 2e-16 ***
cleanerb      -0.420      2.088  -0.201 0.840665    
cleanerc      -6.940      2.088  -3.323 0.000944 ***
typeshark     23.327      1.705  13.681  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.88 on 596 degrees of freedom
Multiple R-squared:  0.2523,	Adjusted R-squared:  0.2485 
F-statistic: 67.02 on 3 and 596 DF,  p-value: < 2.2e-16
In [6]:
# Let’s repeat our previous ANOVA with two independent variables, but now we’ll include the interaction between cleaner and type. To do this, we’ll set the formula to time ~ cleaner * type.
# Step 1: Create ANOVA object with interactions

cleaner.type.int.aov <- aov(formula = time ~ cleaner * type,
                          data = poopdeck)

# Step 2: Look at summary table

summary(cleaner.type.int.aov)
              Df Sum Sq Mean Sq F value   Pr(>F)    
cleaner        2   6057    3028   7.824 0.000443 ***
type           1  81620   81620 210.863  < 2e-16 ***
cleaner:type   2  29968   14984  38.710  < 2e-16 ***
Residuals    594 229923     387                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In [7]:
# Looks like we did indeed find a significant interaction between cleaner and type. In other words, the effectiveness of a cleaner depends on the type of poop it’s being applied to. This makes sense given our plot of the data at the beginning of the chapter.
# To understand the nature of the difference, we’ll look at the regression coefficients from a regression object:

# Step 4: Calculate regression coefficients

cleaner.type.int.lm <- lm(formula = time ~ cleaner * type,
                          data = poopdeck)
summary(cleaner.type.int.lm)

# Again, to interpret this table, we first need to know what the default values are. We can tell this from the coefficients that are ‘missing’ from the table. Because I don’t see terms for cleanera or typeparrot, this means that cleaner = "a" and type = "parrot" are the defaults. Again, we can interpret the coefficients as differences between a level and the default. It looks like for parrot poop, cleaners b and c both take more time than cleaner a (the default). Additionally, shark poop tends to take much longer than parrot poop to clean (the estimate for typeshark is positive).

# The interaction terms tell us how the effect of the cleaners changes when one is cleaning shark poop. The negative estimate (-16.96) for cleanerb:typeshark means that cleaner b is, on average, 16.96 minutes faster when cleaning shark poop compared to parrot poop. Because the previous estimate for cleaner b (for parrot poop) was +8.06, this suggests that cleaner b is slower than cleaner a for parrot poop, but faster than cleaner a for shark poop. The same holds for cleaner c, which simply has stronger effects in both directions.
Call:
lm(formula = time ~ cleaner * type, data = poopdeck)

Residuals:
   Min     1Q Median     3Q    Max 
-54.28 -12.83  -0.08  12.29  74.87 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)          45.760      1.967  23.259  < 2e-16 ***
cleanerb              8.060      2.782   2.897 0.003908 ** 
cleanerc             10.370      2.782   3.727 0.000212 ***
typeshark            40.520      2.782  14.563  < 2e-16 ***
cleanerb:typeshark  -16.960      3.935  -4.310 1.91e-05 ***
cleanerc:typeshark  -34.620      3.935  -8.798  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.67 on 594 degrees of freedom
Multiple R-squared:  0.3385,	Adjusted R-squared:  0.3329 
F-statistic: 60.79 on 5 and 594 DF,  p-value: < 2.2e-16
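That interpretation is just arithmetic on the coefficients in the table above; spelling it out:

```r
# Fitted cell means rebuilt from the printed coefficients.
b0        <- 45.76    # (Intercept): cleaner a, parrot poop
b.b       <- 8.06     # cleanerb
b.type    <- 40.52    # typeshark
b.b.shark <- -16.96   # cleanerb:typeshark

# On parrot poop, cleaner b is 8.06 minutes SLOWER than cleaner a:
parrot.diff <- (b0 + b.b) - b0                                   # +8.06
# On shark poop, the interaction flips the sign: b is 8.90 minutes FASTER:
shark.diff  <- (b0 + b.b + b.type + b.b.shark) - (b0 + b.type)   # -8.90

c(parrot.diff, shark.diff)
```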
In [8]:
# Center your variables before computing interactions! To see why this matters, first fit the model on the raw (uncentered) variables:

# Create a regression model with interactions between IVS weight and clarity

diamonds.int.lm <- lm(formula = value ~ weight * clarity,
                      data = diamonds)

# Show summary statistics of model coefficients

summary(diamonds.int.lm)$coefficients
                  Estimate Std. Error    t value     Pr(>|t|)
(Intercept)    157.4721482  10.569499 14.8987334 4.169909e-31
weight           0.9786784   1.069683  0.9149234 3.617404e-01
clarity          9.9244813  10.484999  0.9465409 3.454367e-01
weight:clarity   1.2446969   1.055131  1.1796609 2.400537e-01
In [9]:
# Centering a variable means simply subtracting the mean of the variable from all observations. In the following code, I’ll repeat the previous regression, but first I’ll create new centered variables weight.c and clarity.c, and then run the regression on the interaction between these centered variables:

# Create centered versions of weight and clarity

diamonds$weight.c <- diamonds$weight - mean(diamonds$weight)
diamonds$clarity.c <- diamonds$clarity - mean(diamonds$clarity)

# Create a regression model with interactions of centered variables

diamonds.int.lm <- lm(formula = value ~ weight.c * clarity.c,
                      data = diamonds)

# Print summary

summary(diamonds.int.lm)$coefficients

# That looks much better! Now we see that the main effects are significant and the interaction is non-significant.
                     Estimate Std. Error    t value      Pr(>|t|)
(Intercept)        189.401518  0.3830999 494.392034 2.907642e-237
weight.c             2.222877  0.1987610  11.183671  2.322231e-21
clarity.c           22.248309  2.1338360  10.426438  2.271834e-19
weight.c:clarity.c   1.244697  1.0551311   1.179661  2.400537e-01
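Why does centering behave this way? Subtracting a constant is just a reparameterization of the same model: the interaction estimate (and the overall fit) cannot change, but the main effects now describe slopes at the mean of the other variable rather than at its often-meaningless zero point. A self-contained sketch on simulated data:

```r
# Centering leaves the interaction estimate and the overall fit untouched.
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 10)
x2 <- rnorm(n, mean = 5)
y  <- 2 * x1 + 3 * x2 + 0.5 * x1 * x2 + rnorm(n)

raw.lm <- lm(y ~ x1 * x2)                                # raw variables
cen.lm <- lm(y ~ I(x1 - mean(x1)) * I(x2 - mean(x2)))    # centered variables

# Identical interaction estimates and identical R-squared:
c(coef(raw.lm)[4], coef(cen.lm)[4])
c(summary(raw.lm)$r.squared, summary(cen.lm)$r.squared)
```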
In [3]:
library(yarrr)
# In the next code chunk, I’ll calculate 3 separate ANOVAs from the poopdeck data using the three different types. First, I’ll create a regression object with lm(). As you’ll see, the Anova() function requires you to enter a regression object as the main argument, and not a formula and dataset. That is, you need to first create a regression object from the data with lm() (or glm()), and then enter that object into the Anova() function. You can also do the same thing with the standard aov() function.
# Step 1: Calculate regression object with lm()

time.lm <- lm(formula = time ~ type + cleaner,
              data = poopdeck)
In [4]:
# Now that I’ve created the regression object time.lm, I can calculate the three different types of ANOVAs by entering the object as the main argument to either aov() for a Type I ANOVA, or Anova() in the car package for a Type II or Type III ANOVA:
# Type I ANOVA - aov()

time.I.aov <- aov(time.lm)

# Type II ANOVA - Anova(type = 2)

time.II.aov <- car::Anova(time.lm, type = 2)

# Type III ANOVA - Anova(type = 3)

time.III.aov <- car::Anova(time.lm, type = 3)

# As it happens, the data in the poopdeck dataframe are perfectly balanced, so we’ll get exactly the same result for each ANOVA type. However, if the data were not balanced, we should not use the Type I ANOVA calculated with the aov() function.
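To see what goes wrong without balance, here is a sketch on simulated unbalanced data: when the factors are correlated, the Type I (sequential) sums of squares depend on the order in which the IVs enter the formula.

```r
# With unbalanced, correlated factors, Type I SS are order-dependent.
set.seed(10)
n <- 90
a <- factor(rep(c("x", "y"), times = c(60, 30)))   # unequal group sizes
# Make b correlated with a so the design is not orthogonal
a.is.x <- as.numeric(a == "x")
b <- factor(ifelse(rbinom(n, 1, 0.2 + 0.6 * a.is.x) == 1, "p", "q"))
y <- rnorm(n) + a.is.x + 2 * (b == "p")

t1 <- anova(lm(y ~ a + b))   # a entered first
t2 <- anova(lm(y ~ b + a))   # a entered second
c(t1["a", "Sum Sq"], t2["a", "Sum Sq"])   # different sequential SS for a
```

The residual sum of squares is the same either way; only the attribution of the explained variation between the two factors shifts with the order.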
In [3]:
library(yarrr)

# Let’s do an example with the diamonds dataset. I’ll create three regression models that each predict a diamond’s value. The models will differ in their complexity – that is, the number of independent variables they use. diamonds.mod1 will be the simplest model with just one IV (weight), diamonds.mod2 will include 2 IVs (weight and clarity) while diamonds.mod3 will include three IVs (weight, clarity, and color).

# Model 1: 1 IV (only weight)

diamonds.mod1 <- lm(value ~ weight, data = diamonds)

# Model 2: 2 IVs (weight AND clarity)

diamonds.mod2 <- lm(value ~ weight + clarity, data = diamonds)

# Model 3: 3 IVs (weight AND clarity AND color)

diamonds.mod3 <- lm(value ~ weight + clarity + color, data = diamonds)
In [4]:
# Now let’s use the anova() function to compare these models and see which one provides the most parsimonious fit of the data. First, we’ll compare the two simplest models: model 1 with model 2. Because these models differ only in the use of the clarity IV (both use weight), this ANOVA will test whether or not including the clarity IV leads to a significant improvement over using just the weight IV:

# Compare model 1 to model 2

anova(diamonds.mod1, diamonds.mod2)

# As you can see, the result shows a Df of 1 (indicating that the more complex model has one additional parameter) and a very small p-value (< .001). This means that adding the clarity IV to the model did lead to a significantly improved fit over model 1.
  Res.Df      RSS Df Sum of Sq       F       Pr(>F)
1    148 5568.748 NA        NA      NA           NA
2    147 3221.273  1  2347.475 107.125 3.373243e-19
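The F statistic in this comparison table can be rebuilt by hand from the RSS and Res.Df columns: the reduction in RSS per extra parameter, divided by the mean squared error of the larger model.

```r
# Rebuilding the model-comparison F statistic from the table above.
rss1 <- 5568.748; df1 <- 148   # model 1 (weight only)
rss2 <- 3221.273; df2 <- 147   # model 2 (weight + clarity)

F.stat  <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)   # about 107.1
p.value <- pf(F.stat, df1 - df2, df2, lower.tail = FALSE)
```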
In [5]:
# Next, let’s use anova() to compare model 2 and model 3. This will tell us whether adding color (on top of weight and clarity) further improves the model:

# Compare model 2 to model 3

anova(diamonds.mod2, diamonds.mod3)

# The result is non-significant (p = 0.21). Thus, we should drop model 3 and stick with model 2, which has only 2 IVs.
  Res.Df      RSS Df Sum of Sq        F    Pr(>F)
1    147 3221.273 NA        NA       NA        NA
2    146 3187.282  1  33.99118 1.557036 0.2140973
In [7]:
# You don’t need to compare models that differ in only one IV – you can also compare models that differ in multiple IVs. For example, here is a comparison of model 1 (with 1 IV) to model 3 (with 3 IVs):

# Compare model 1 to model 3

anova(diamonds.mod1, diamonds.mod3)

# The result shows that model 3 did indeed provide a significantly better fit to the data compared to model 1. However, as we know from our previous analysis, model 3 is not significantly better than model 2.
  Res.Df      RSS Df Sum of Sq        F       Pr(>F)
1    148 5568.748 NA        NA       NA           NA
2    146 3187.282  2  2381.467 54.54399 2.038865e-18
In [22]:
plot(function(x) df(x, df1 = 2, df2 = 150), xlim = c(-2, 10))
plot(function(x) df(x, df1 = 4, df2 = 10), xlim = c(-2, 10), lty = "dashed", add = TRUE)