Chapter 6.1.1.1 Exercises
In [3]:
library(dplyr)
library(ggplot2)
library(infer)
library(nycflights13)
library(ggplot2movies)
library(broom)
In [1]:
require(graphics)
t.test(1:10, y = c(7:20)) # P = .00001855
t.test(1:10, y = c(7:20, 200)) # P = .1245 -- NOT significant anymore
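Since this chapter contrasts the t-test with rank-based tests, it is worth checking (a quick sketch, not part of the original help-page example) that the Wilcoxon rank-sum test on the same data is barely affected by the outlier at 200, since it only uses ranks:

```r
# Wilcoxon rank-sum test on the same data; exact = FALSE avoids the
# ties warning (values 7:10 appear in both samples).
wilcox.test(1:10, c(7:20, 200), exact = FALSE)  # still highly significant
```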
In [3]:
# Traditional interface
with(sleep, t.test(extra[group == 1], extra[group == 2]))
In [6]:
# x and y: Hamilton depression scale measurements in 9 patients at the
# first (x) and second (y) visit after initiation of a therapy
# (Hollander & Wolfe 1973, 29f; this is the example from ?wilcox.test).
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
wilcox.test(y - x, alternative = "less") # The same as the paired test.
In [7]:
wilcox.test(y - x, alternative = "less",
            exact = FALSE, correct = FALSE) # H&W large sample approximation
In [ ]:
# Two-sample test.
# Hollander & Wolfe (1973), 69f.
# Permeability constants of the human chorioamnion (a placental
# membrane) at term (x) and between 12 and 26 weeks gestational
# age (y). The alternative of interest is greater permeability
# of the human chorioamnion for the term pregnancy.
x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)
y <- c(1.15, 0.88, 0.90, 0.74, 1.21)
wilcox.test(x, y, alternative = "g") # greater
In [10]:
wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)
In [4]:
# Formula interface.
boxplot(Ozone ~ Month, data = airquality)
wilcox.test(Ozone ~ Month, data = airquality,
            subset = Month %in% c(5, 8))
In [1]:
wilcox.test(c(15, 7, 3, 10, 13), mu = 6)
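To see where the signed rank statistic V comes from, we can compute it by hand: rank the absolute differences from mu, then sum the ranks of the positive differences. (A sketch; the variable names `d` and `v` are mine.)

```r
x <- c(15, 7, 3, 10, 13)
d <- x - 6                     # differences from the hypothesized median
v <- sum(rank(abs(d))[d > 0])  # sum of the ranks of the positive differences
v                              # matches the V reported by wilcox.test
```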
In [3]:
# Let’s look at some examples. Consider the react data set from the ISwR package. Recall that a t-test looks like this:
react <- ISwR::react
t.test(react)
In [4]:
# and we got a p-value on the order of 10^-13. If we instead do a Wilcoxon test, we get
wilcox.test(react)
In [5]:
(p <- mean(replicate(10000, t.test(rnorm(50), mu = .5)$p.value < .05)))
# Here, we see that t.test correctly rejects H0 about 93.75% of the time. Doing the same thing for Wilcoxon:
(p <- mean(replicate(10000, wilcox.test(rnorm(50), mu = .5)$p.value < .05)))
# We see that Wilcoxon correctly rejects H0 about 91.86% of the time. So, if you have reason to think that your data are normal, or symmetric without heavy tails, then you will get better results using t.test. If the data are symmetric but with heavy tails, or especially with outliers, then Wilcoxon will perform better.
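The heavy-tails claim can be checked with the same kind of simulation, swapping the normal population for a t distribution with 3 degrees of freedom. (A sketch under that assumption; the sample size of 50 and shift of 0.5 are my choices, not from the original.)

```r
set.seed(2)
# proportion of rejections when the population is t with 3 df, shifted by 0.5
(pT <- mean(replicate(2000, t.test(rt(50, 3) + .5)$p.value < .05)))
(pW <- mean(replicate(2000, wilcox.test(rt(50, 3) + .5)$p.value < .05)))
```

With these heavy tails the Wilcoxon rejection proportion should come out higher than the t-test's.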
In [14]:
# Ordinal data: In order to do a two-sample Wilcoxon test, we need to load the data:
movies <- read.csv("http://stat.slu.edu/~speegle/_book_data/movieLensData", as.is = TRUE)
# Now, we need to pull out the ratings that correspond to Toy Story and Toy Story II. We begin by finding the movieID of the Toy Story movies:
library(dplyr)
filter(movies, stringr::str_detect(Title, "Toy Story")) %>% select(MovieID, Title) %>% distinct()
In [20]:
# Next, we create a data frame that only has the ratings of the Toy Story movies in it.
ts <- filter(movies, MovieID %in% c(1, 3114))
ts
# Now, we can perform the Wilcoxon test:
wilcox.test(Rating ~ Title, data = ts)
# And we can see that there is not a statistically significant difference in the ratings of Toy Story and Toy Story 2, so we do not reject the null hypothesis.
In [25]:
# This is an example of a paired, one-sided Mann-Whitney-Wilcoxon test. Let’s pull out the data that we want to examine:
cbt <- filter(MASS::anorexia, Treat == "CBT")
# Using MASS:: avoids loading the entire MASS package. You could also load the
# library if you prefer, but MASS has a select function that masks dplyr's
# select if MASS is loaded afterwards.
# We want to make a boxplot:
library(tidyr)
cbtBoxData <- gather(cbt, key = WeightTime, value = Weight, -Treat)
ggplot(cbtBoxData, aes(x = WeightTime, y = Weight)) + geom_boxplot()
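The paired, one-sided test described above looks like this (a sketch: `alternative = "less"` because under the alternative the pre-treatment weights are smaller than the post-treatment weights; the `cbt` data frame is rebuilt here with base `subset` so the cell is self-contained):

```r
# CBT group of the MASS anorexia data
cbt <- subset(MASS::anorexia, Treat == "CBT")
# paired, one-sided: is Prewt smaller than Postwt?
wilcox.test(cbt$Prewt, cbt$Postwt, paired = TRUE, alternative = "less")
```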
In [27]:
# So, there is some evidence that the treatment was effective based on this test, but the evidence isn’t overwhelming. Note that if we hadn’t paired the data, we would have seen less significance in the results, and if we hadn’t done a one-sided test, then we would have seen even less significance.
wilcox.test(cbt$Prewt, cbt$Postwt, paired = FALSE, alternative = "less")
In [32]:
wilcox.test(cbt$Prewt, cbt$Postwt, paired = FALSE, alternative = "two.sided")
In [33]:
# Simulations: We begin by comparing the power of the Wilcoxon test with the t-test when the underlying data is normal. We assume we are testing H0: μ=0 versus Ha: μ≠0 at the α=.05 level, when the underlying population is truly normal with mean 1 and standard deviation 1, with a sample size of 10. Let’s first estimate the proportion of the time that a t-test correctly rejects the null hypothesis.
set.seed(1)
mean(replicate(10000, t.test(rnorm(10, 1, 1))$p.value < .05))
In [36]:
# Now suppose the underlying population is heavy-tailed: t with 3 degrees of freedom, shifted by 1.
mean(replicate(10000, wilcox.test(rt(10, 3) + 1)$p.value < .05))
# Here, we see that Wilcoxon outperforms t, but not by a tremendous amount.
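For comparison (an added sketch; I am assuming the same heavy-tailed setup as the cell above), the corresponding rejection proportion for the t-test on the same population:

```r
set.seed(3)
# t-test power when the population is t with 3 df, shifted by 1
(pT <- mean(replicate(10000, t.test(rt(10, 3) + 1)$p.value < .05)))
```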
In [41]:
# manualRanks was not defined in this excerpt; assuming it holds the ranks of
# the manual-transmission cars' displacements among all 32 cars in mtcars
# (ties.method = "first" so that the ranks are the integers 1:32):
manualRanks <- rank(mtcars$disp, ties.method = "first")[mtcars$am == 1]
# The automatic ranks are the ones that are not in manualRanks. It can be a little tricky to pull these out. Here are a couple of ways:
which(!(1:32 %in% manualRanks))
(1:32)[-manualRanks]
setdiff(1:32, manualRanks)
autoRanks <- setdiff(1:32, manualRanks)
autoRanks
In [46]:
# If it is bigger than or equal to the observed value of 214 or less than or equal to 33 (which is the value equally far from the expected value of 13*19/2 = 123.5), then the test statistic would be in the smallest rejection region that contains our observed test statistic. The p-value is the probability of landing in this rejection region, so we estimate it.
mean(replicate(20000, {
  manualRanks <- sample(32, 13)
  autoRanks <- setdiff(1:32, manualRanks)
  abs(wilcox.test(manualRanks, autoRanks)$statistic - 123.5) >= 90.5
}))
In [44]:
# Another option would be to take the 32 data points that we have, and randomly assign them to either auto or manual transmissions, and estimate the p-value this way. The code would look like this: I am suppressing warnings in wilcox.test so we don’t get warned about ties so much!
mean(replicate(20000, {
  manualSample <- sample(32, 13)
  manualDisp <- mtcars$disp[manualSample]
  autoDisp <- mtcars$disp[setdiff(1:32, manualSample)]
  abs(suppressWarnings(wilcox.test(manualDisp, autoDisp)$statistic) - 123.5) >= 90.5
}))
# The results are similar to the previous method.
In [1]:
# On the other hand, if we obtained 30 successes, the p-value would sum the probabilities that are no larger than the probability of observing exactly 30 successes, as indicated by this plot:
library(ggplot2)
# plotData was not defined in this excerpt; assuming it holds the binomial pmf:
plotData <- data.frame(x = 0:100, y = dbinom(0:100, 100, 0.4))
ggplot(plotData, aes(x, y)) + geom_bar(stat = "identity") +
  geom_hline(yintercept = dbinom(30, 100, 0.4), linetype = "dashed", color = "red")
In [8]:
# R will make these computations for us, naturally, in the following way.
binom.test(x = 50, n = 100, p = .4)
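The two-sided p-value that binom.test reports can be reproduced by hand, summing all outcome probabilities no larger than that of the observed count (a sketch; the name `probs` is mine):

```r
# pmf of Binomial(100, 0.4) over all possible counts
probs <- dbinom(0:100, 100, 0.4)
# sum the probabilities of outcomes at least as extreme as 50 successes
sum(probs[probs <= dbinom(50, 100, 0.4)])
```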
In [16]:
# sd = sqrt(100 * 0.4 * 0.6) = sqrt(24) ≈ 4.898979
pnorm(50, 40, sqrt(24), lower.tail = FALSE) + pnorm(30, 40, sqrt(24))
In [14]:
# The built-in R function for this is prop.test, as follows:
prop.test(x = 50, n = 100, p = 0.4, correct = FALSE)
binom.test(x = 50, n = 100, p = .4)
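Without the continuity correction, prop.test is the normal-approximation (score) test, so its p-value matches a direct normal calculation (a sketch; the name `z` is mine):

```r
# score z-statistic for observed proportion 0.5 against p0 = 0.4
z <- (0.5 - 0.4) / sqrt(0.4 * 0.6 / 100)
# two-sided p-value; agrees with prop.test(50, 100, p = 0.4, correct = FALSE)
2 * pnorm(abs(z), lower.tail = FALSE)
```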