Week 4 Lab sheet with answers

Yiling Huo

2024-05-28

All data in the exercises are simulated.

Before the exercises: R operators and loops

If statement

There are two ways to write the if statement in R: the standard if statement, and the ifelse() function.

# If expression1 is true, statement1 will be executed. If expression1 is false but expression2 is true, statement2 will be executed. Otherwise, statement3 will be executed. 
if (expression1) {
  statement1
} else if (expression2) {
  statement2
} else {
  statement3
}
# If expression is true, statement1 will be executed; otherwise statement2 will be executed.
ifelse(expression, statement1, statement2)
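
A quick illustration (the variable names and cut-offs here are made up for the example):

# Classify a single score with if / else if / else (arbitrary cut-offs)
score <- 72
if (score >= 70) {
  grade <- 'distinction'
} else if (score >= 50) {
  grade <- 'pass'
} else {
  grade <- 'fail'
}
grade
# ifelse() is vectorised, so it works on a whole vector (or data frame column) at once
scores <- c(45, 72, 88)
ifelse(scores >= 50, 'pass', 'fail')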

While loops and for loops

In a while loop, test_expression is evaluated first; if the result is TRUE, statement is executed, and then test_expression is evaluated again. A while loop will run repeatedly until test_expression is no longer TRUE.

while (test_expression) {
  statement
}

In a for loop, value takes on each element in the vector sequence. statement is executed in each iteration.

for (value in sequence) {
  statement
}
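
For example (a toy illustration; the numbers are arbitrary):

# Print the numbers 1 to 3 with a while loop
i <- 1
while (i <= 3) {
  print(i)
  i <- i + 1
}
# The same thing with a for loop
for (i in 1:3) {
  print(i)
}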

Exercise 1: Language learning class

A teacher designed three sets of class materials (sets A, B, and C) for a language learning class. The teacher wants to know whether students’ learning outcomes differ depending on the set of materials used, and/or on whether they receive a 10-min revision session after the class. 60 students participated in a round of trial classes: 10 students took part in the class with material set A and received revision after class; 10 students also took part in the class with material set A but did not receive revision; and so on.

This study is a 3 x 2 design with Material set (A vs. B vs. C) and Revision (revision vs. no revision).

  1. Download the data Here.
  2. Read the data into a data frame.
data1 <- read.csv('language-class.csv', header = TRUE)
  3. Determine which statistical test to use to answer the teacher’s question.
    • How many factors are there in the study?
    • Are the factor(s) between- or within-subject?
    • How many levels does each factor have?

There are two factors in this study, Material set and Revision. Material set has three levels (A, B, and C) and Revision has two (revision vs. no revision). Both factors are between-subjects. Therefore the suitable test is a two-way between-subjects ANOVA.

  4. Visualise the data using your preferred type of plot and get descriptive statistics.
library(ggplot2)
data1_groupmean <- aggregate(score ~ material+revision, data = data1, FUN = mean)
ggplot(data1_groupmean, aes(x = material, y = score, group=revision, color = revision)) + 
  geom_point() + 
  geom_line() + 
  ylim(c(50,100)) +
  theme_light() 
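
The plot above shows only the group means. If you also want to show variability, one option (just a sketch, not the only way) is to build the plot from the raw data and let stat_summary() compute the means and standard errors:

# Same plot built from the raw data, with standard-error bars
ggplot(data1, aes(x = material, y = score, group = revision, color = revision)) + 
  stat_summary(fun = mean, geom = 'point') + 
  stat_summary(fun = mean, geom = 'line') + 
  stat_summary(fun.data = mean_se, geom = 'errorbar', width = 0.1) + 
  ylim(c(50,100)) +
  theme_light()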

library(psych)
describeBy(score ~ material*revision, data=data1)
## 
##  Descriptive statistics by group 
## material: A
## revision: no_revision
##       vars  n mean    sd median trimmed  mad min max range skew kurtosis   se
## score    1 10 62.3 11.14   61.5   62.62 12.6  44  78    34 -0.3    -1.24 3.52
## ------------------------------------------------------------ 
## material: B
## revision: no_revision
##       vars  n mean   sd median trimmed mad min max range skew kurtosis   se
## score    1 10 77.5 9.41   77.5   77.12 8.9  64  94    30 0.24     -1.2 2.97
## ------------------------------------------------------------ 
## material: C
## revision: no_revision
##       vars  n mean   sd median trimmed  mad min max range  skew kurtosis   se
## score    1 10 84.4 8.07   86.5      85 7.41  69  95    26 -0.56    -0.99 2.55
## ------------------------------------------------------------ 
## material: A
## revision: revision
##       vars  n mean   sd median trimmed   mad min max range  skew kurtosis  se
## score    1 10 78.5 9.79     80   78.75 11.86  63  92    29 -0.15    -1.53 3.1
## ------------------------------------------------------------ 
## material: B
## revision: revision
##       vars  n mean    sd median trimmed   mad min max range skew kurtosis   se
## score    1 10 76.4 11.22   73.5    76.5 13.34  59  93    34 0.03    -1.43 3.55
## ------------------------------------------------------------ 
## material: C
## revision: revision
##       vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
## score    1 10   91 6.31     89      91 6.67  82 100    18 0.23     -1.6 1.99
  5. Check whether the data meet the assumptions and run the test.
data1_result <- aov(score ~ material*revision, data = data1)
summary(data1_result)
##                   Df Sum Sq Mean Sq F value  Pr(>F)    
## material           2   3052  1525.9  16.982 1.9e-06 ***
## revision           1    785   784.8   8.735 0.00462 ** 
## material:revision  2    751   375.6   4.180 0.02051 *  
## Residuals         54   4852    89.8                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
data1_residuals <- residuals(object = data1_result)
shapiro.test(data1_residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  data1_residuals
## W = 0.97604, p-value = 0.2846
library(car)
leveneTest(score ~ material*revision, data = data1)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  5  0.5797 0.7153
##       54
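
If you prefer a visual check of normality alongside the Shapiro-Wilk test, a Q-Q plot of the residuals is one option (an extra check, not required by the exercise); points falling roughly on the line suggest normality:

# Q-Q plot of the model residuals (an aov object can be plotted like an lm)
plot(data1_result, 2)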

Different ways to do pairwise comparisons: In the two-way between-subjects example in the handout, I simply did pairwise comparisons between every possible combination of groups. This is totally OK, like this:

# Two-way between-subject ANOVA post-hocs
# This is the method I showed you in the handout, which does all possible pair-wise comparisons
data1$group <- paste('set_', data1$material, '_', data1$revision, sep = '')
data1$group <- as.factor(data1$group)
pairwise.t.test(data1$score, data1$group, p.adjust.method = 'bonferroni')
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  data1$score and data1$group 
## 
##                   set_A_no_revision set_A_revision set_B_no_revision
## set_A_revision    0.0052            -              -                
## set_B_no_revision 0.0108            1.0000         -                
## set_B_revision    0.0238            1.0000         1.0000           
## set_C_no_revision 4.5e-05           1.0000         1.0000           
## set_C_revision    1.5e-07           0.0706         0.0361           
##                   set_B_revision set_C_no_revision
## set_A_revision    -              -                
## set_B_no_revision -              -                
## set_B_revision    -              -                
## set_C_no_revision 0.9677         -                
## set_C_revision    0.0167         1.0000           
## 
## P value adjustment method: bonferroni

However, we can also approach this more carefully. You might wonder what the point is of comparisons such as set_A_revision vs. set_B_no_revision; these comparisons are not very informative for the research question of this study.

Can’t I do something that simply tells me whether, for each set of materials, there is an effect of revision (I only care about set_A_revision vs. set_A_no_revision, set_B_revision vs. set_B_no_revision, etc.)? You can, of course; in fact, in the mixed-ANOVA example in the handout I did something very similar. When you have a clear idea of which comparisons are useful for your research question, all you need are the comparisons you planned.

Let’s say we only want to know whether, for each set of materials, revision significantly changes students’ performance:

# Let's look at whether revision has a significant effect for each material set.
library(rstatix)
library(dplyr) # provides %>% and group_by()
data1_pt <- data1 %>%
  group_by(material) %>%
  rstatix::pairwise_t_test(score ~ revision, p.adjust.method = "bonferroni")
data1_pt
## # A tibble: 3 × 10
##   material .y.   group1      group2     n1    n2       p p.sig…¹   p.adj p.adj…²
## * <chr>    <chr> <chr>       <chr>   <int> <int>   <dbl> <chr>     <dbl> <chr>  
## 1 A        score no_revision revisi…    10    10 0.00282 **      0.00282 **     
## 2 B        score no_revision revisi…    10    10 0.815   ns      0.815   ns     
## 3 C        score no_revision revisi…    10    10 0.0566  ns      0.0566  ns     
## # … with abbreviated variable names ¹​p.signif, ²​p.adj.signif

You might notice that the p-values here are slightly different. This is because we are no longer including the less informative comparisons, so the Bonferroni correction has fewer comparisons to adjust for.

Alternatively (or in addition), if you care about whether the sets of materials differ when students get the revision session, and whether they differ when students don’t, you can do something like this:

# we can also look at whether the sets of materials differ, when students revised vs. when they didn't revise. 
data1_pt2 <- data1 %>%
  group_by(revision) %>%
  rstatix::pairwise_t_test(score ~ material, p.adjust.method = "bonferroni")
data1_pt2
## # A tibble: 6 × 10
##   revision    .y.   group1 group2    n1    n2         p p.signif   p.adj p.adj…¹
## * <chr>       <chr> <chr>  <chr>  <int> <int>     <dbl> <chr>      <dbl> <chr>  
## 1 no_revision score A      B         10    10 0.0015    **       4.5 e-3 **     
## 2 no_revision score A      C         10    10 0.0000211 ****     6.33e-5 ****   
## 3 no_revision score B      C         10    10 0.12      ns       3.61e-1 ns     
## 4 revision    score A      B         10    10 0.619     ns       1   e+0 ns     
## 5 revision    score A      C         10    10 0.00583   **       1.75e-2 *      
## 6 revision    score B      C         10    10 0.00165   **       4.94e-3 **     
## # … with abbreviated variable name ¹​p.adj.signif

It’s up to you how you want to approach your pair-wise comparisons.

  6. Find out the sizes of the effects.
library(lsr)
etaSquared(data1_result)
##                       eta.sq eta.sq.part
## material          0.32328529   0.3861152
## revision          0.08314044   0.1392329
## material:revision 0.07958275   0.1340738
  7. Report your findings in a short paragraph.

A two-way between-subjects ANOVA revealed significant main effects of Material set (F(2, 54)=16.98, p<0.001, \(\eta_p^2\)=0.39) and Revision (F(1, 54)=8.74, p=0.005, \(\eta_p^2\)=0.14), as well as a significant interaction (F(2, 54)=4.18, p=0.02, \(\eta_p^2\)=0.13). Post-hoc pairwise t-tests with Bonferroni correction revealed that the effect of Revision was significant only for Material set A (p=0.003), not for the other two sets of materials. (Alternatively or in addition: Without revision, Material sets B and C were significantly more effective than Material set A (both p’s<0.01), while there was no significant difference between sets B and C. With revision, Material set A became as effective as Material set B; however, set C was still significantly more effective than both A and B (both p’s<0.05).)

Exercise 2: Negation

Saying that an affirmative sentence is true is easier than saying that it is false. For example, verifying that “A penguin is a bird” is true takes less time than verifying that “A dolphin is a fish” is false. A researcher wants to know whether the same can be said about negative sentences.

10 participants took part in an experiment investigating negation processing. In each trial, the participant saw a visual display of a picture together with a sentence that was either congruent or incongruent with the display. Half of the sentences were affirmative, while the other half were negative. Participants were asked to judge the truth value of each sentence, and their reaction time (seconds) was measured.1 Example adapted from Wang, S., Sun, C., Tian, Y., & Breheny, R. (2021). Verifying Negative Sentences. Journal of Psycholinguistic Research, 50(6), 1511-1534. Note that the data used in this example are simulated and conclusions may differ from the real experiment.

This study has a 2 x 2 design with Polarity (affirmative vs. negative) and Truth value (true vs. false), resulting in four conditions. Each participant completed 15 trials in each condition.

A table representing the experiment design. (Adapted from Wang, Sun, Tian, & Breheny, 2021)
  1. Download the data Here.
  2. Read the data into a data frame.
data2 <- read.csv('negation.csv', header = TRUE)
  3. Take a look at the data and determine which statistical test to use to answer the question: is reaction time affected by Polarity and/or Truth value?
    • How many factors are there in the study?
    • Are the factor(s) between- or within-subject?
    • How many levels does each factor have?

There are two factors, Polarity and Truth value, each with two levels. Both factors are within-subject. The suitable test is therefore a two-way within-subjects ANOVA.

  4. Convert the independent variables to factors. Hint: data_frame$column_name <- as.factor(data_frame$column_name).
data2$polarity <- as.factor(data2$polarity)
data2$truth_value <- as.factor(data2$truth_value)
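
Before visualising, it can be worth a quick sanity check that the design is what we expect: the condition column in the data codes the four Polarity x Truth value cells, so each participant should have 15 trials per cell:

# Each cell of this table should be 15 (10 participants x 4 conditions x 15 trials)
table(data2$participant, data2$condition)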
  5. Visualise the data using your preferred type of plot and get descriptive statistics.
# descriptives
library(psych)
describeBy(RT ~ polarity*truth_value, data = data2)
## 
##  Descriptive statistics by group 
## polarity: Affirmative
## truth_value: 0
##    vars   n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## RT    1 150 6.63 1.54   6.61    6.63 1.41 2.97 10.31  7.34 0.05    -0.16 0.13
## ------------------------------------------------------------ 
## polarity: Negative
## truth_value: 0
##    vars   n mean   sd median trimmed  mad  min   max range skew kurtosis   se
## RT    1 150 8.95 1.48   8.79    8.92 1.45 5.42 12.73  7.31 0.21    -0.33 0.12
## ------------------------------------------------------------ 
## polarity: Affirmative
## truth_value: 1
##    vars   n mean   sd median trimmed  mad  min  max range skew kurtosis   se
## RT    1 150 4.42 1.59   4.47    4.42 1.67 0.01 8.35  8.34 -0.1    -0.15 0.13
## ------------------------------------------------------------ 
## polarity: Negative
## truth_value: 1
##    vars   n mean   sd median trimmed mad  min   max range skew kurtosis   se
## RT    1 150 8.72 1.64   8.77    8.69 1.7 5.12 12.97  7.86 0.18    -0.34 0.13
# visualisation
library(ggplot2)
data2_grandmean <- aggregate(RT ~ polarity+truth_value, data = data2, FUN = mean)
data2_grandmean$truth_value <- ifelse(data2_grandmean$truth_value==1, 'True','False')
ggplot(data2_grandmean, aes(x = truth_value, y = RT, group=polarity, color = polarity)) + 
  geom_point() + 
  geom_line() + 
  theme_light() 

  6. Remember we need by-participant (or by-item) means rather than raw data. Convert the raw data to by-participant means. Hint: data_mean <- aggregate(DV ~ factor1+factor2+factor3..., data=data_frame, FUN=mean).

data2_by_par <- aggregate(RT ~ participant+polarity+truth_value+condition, data = data2, FUN = mean)
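
It is worth checking the result: with 10 participants and 4 conditions, the by-participant data frame should have 40 rows, one mean per participant per condition:

# 10 participants x 4 conditions = 40 rows
nrow(data2_by_par)
head(data2_by_par)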
  7. Check whether the data meet the assumptions and run the test.
# assumption check: normality (of data in each group)
# use shapiro test
tapply(data2_by_par$RT, data2_by_par$condition, shapiro.test)
## $Aff_false
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.95892, p-value = 0.7735
## 
## 
## $Aff_true
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.94552, p-value = 0.6159
## 
## 
## $Neg_false
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.98402, p-value = 0.983
## 
## 
## $Neg_true
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.93444, p-value = 0.493
# run the within-subjects ANOVA with package ez
library(ez)
data2_result <- ezANOVA(data = data2_by_par, dv = .(RT), wid = .(participant), within = .(polarity, truth_value), type = 3)
data2_result
## $ANOVA
##                 Effect DFn DFd         F            p p<.05       ges
## 2             polarity   1   9 425.22450 6.932215e-09     * 0.9403198
## 3          truth_value   1   9  94.86405 4.454277e-06     * 0.6817973
## 4 polarity:truth_value   1   9  44.25836 9.344969e-05     * 0.5856094
# post-hoc: pairwise t-test (within-subjects)
# for the two-way within anova, we can break the data down to each factor to compute post hocs
# Here, we look at the effect of truth value at each level of polarity
library(rstatix)
truth_effect <- data2_by_par %>%
  group_by(polarity) %>%
  anova_test(dv = RT, wid = participant, within = truth_value)%>%
  get_anova_table() %>%
  adjust_pvalue(method = "bonferroni")
truth_effect
## # A tibble: 2 × 9
##   polarity    Effect        DFn   DFd     F          p `p<.05`   ges     p.adj
##   <fct>       <chr>       <dbl> <dbl> <dbl>      <dbl> <chr>   <dbl>     <dbl>
## 1 Affirmative truth_value     1     9 84.0  0.00000737 "*"     0.86  0.0000147
## 2 Negative    truth_value     1     9  3.00 0.117      ""      0.081 0.234
# You can also explore the interaction the other way around, looking at the effect of polarity at each level of truth value.
# It's up to you which way(s) of exploring the interaction are appropriate (decide based on your experiment design).
# For me, this second breakdown makes less sense given this experiment's design. 
polarity_effect <- data2_by_par %>%
  group_by(truth_value) %>%
  anova_test(dv = RT, wid = participant, within = polarity)%>%
  get_anova_table() %>%
  adjust_pvalue(method = "bonferroni")
polarity_effect
## # A tibble: 2 × 9
##   truth_value Effect     DFn   DFd     F            p `p<.05`   ges        p.adj
##   <fct>       <chr>    <dbl> <dbl> <dbl>        <dbl> <chr>   <dbl>        <dbl>
## 1 0           polarity     1     9  129. 0.00000123   *       0.906 0.00000246  
## 2 1           polarity     1     9  341. 0.0000000183 *       0.957 0.0000000366
  8. Find out the sizes of the effects.2 ezANOVA automatically reports the generalised eta-squared (ges). This is slightly different from the partial eta-squared in terms of its assumptions, but for our purposes let’s report the generalised eta-squared.
  9. Report your findings in a short paragraph.

A two-way within-subjects ANOVA revealed significant main effects of Polarity (F(1,9)=425.22, p<0.001, \(\eta_G^2\)=0.94) and Truth value (F(1,9)=94.86, p<0.001, \(\eta_G^2\)=0.68), as well as a significant interaction (F(1,9)=44.26, p<0.001, \(\eta_G^2\)=0.59). Post-hoc analyses (one-way within-subjects ANOVAs at each level of Polarity, Bonferroni-corrected) revealed that for affirmative sentences, RTs were significantly faster for true than for false sentences (adjusted p<0.001); for negative sentences, however, there was no significant difference between true and false sentences (adjusted p=0.23).

Exercise 3: Scope and speaker variability

Suppose there is a language A, where sentences such as “Every girl in this class read her favorite book” can have two different interpretations: either (A) for each girl (e.g. \(x_1\), \(x_2\), \(x_3\),…) in the class, there is a book that she likes (e.g. \(x_1\) likes \(book_1\), \(x_2\) likes \(book_2\), \(x_3\) likes \(book_3\), etc.), and each girl read her own favorite book; or (B) there is some girl/woman \(x\), such that every girl in this class read \(x\)’s favorite book.

A speaker of language A will consistently prefer one or the other of the two readings for this type of sentence. A linguist wants to know whether this kind of speaker variability depends on the region the speaker is from (south vs. north). 100 native speakers (50 from the south, 50 from the north) read the example sentence and gave their preferred interpretation (A or B).

  1. Download the data Here.
  2. Read the data into a data frame.
data3 <- read.csv('pronoun.csv', header = TRUE)
  3. Take a look at the data and determine which statistical test to use to answer the linguist’s question.

Here we have categorical data: the dependent variable is categorical and takes the value of either A or B. We can use the Chi-squared test to see whether the proportion/frequency of A vs. B differs significantly between the groups/conditions.

  4. Transform the data to a frequency table.
data3_freq <- table(data3$region, data3$interpretation)
data3_freq
##        
##          A  B
##   north 21 29
##   south 17 33
  5. Visualise the data using a suitable type of plot and get descriptive statistics.
library(ggplot2)
data3_plot <- as.data.frame(table(subset(data3, select = -c(participant))))
ggplot(data=data3_plot, aes(x=region, y=Freq, fill=interpretation)) +
  geom_bar(stat="identity", width=0.4)
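
For categorical data, the descriptive statistics are usually counts and proportions. For example, you can get the proportion of each interpretation within each region from the frequency table:

# Proportion of each interpretation within each region (each row sums to 1)
prop.table(data3_freq, margin = 1)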

  6. Run the statistical test.

chisq.test(data3_freq)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  data3_freq
## X-squared = 0.382, df = 1, p-value = 0.5365
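
The Chi-squared test assumes that the expected count in each cell is not too small (a common rule of thumb is at least 5). You can inspect the expected counts directly; here they are 19 and 31 in each row, comfortably above 5:

# Expected cell counts under independence
chisq.test(data3_freq)$expected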
  7. Report your findings in a short paragraph.

A Chi-squared test revealed no significant difference between regions (south vs. north) in native speakers’ preferred interpretation (\(\chi^2\)(1)=0.38, p=0.54).