Week 5 Lab sheet with answers

Yiling Huo

2024-06-03

All data sets in the exercises (except for one) contain randomly generated fake data.

Exercise 1: Hours studied and exam scores

50 students took an exam (scores 0-100). They were also asked how many hours they spent studying for the exam. A teacher wants to know whether there is a relationship between the hours spent studying and the exam results, and whether they can predict future students’ exam results based on the hours spent studying.

  1. Download the data Here.
  2. Visualise the data using a scatter plot (with the best-fitting line drawn). Hint: I showed how to do this in the handout.
data1 <- read.csv("exam.csv", header=TRUE)
head(data1)
##   time_spent test_score participant
## 1  3.6972636         66           1
## 2  3.2692988         71           2
## 3  2.8380752         62           3
## 4  0.5846604         64           4
## 5  1.4220910         72           5
## 6  4.2502780         83           6
library(ggplot2)
ggplot(data1, aes(x=time_spent, y=test_score)) + geom_point() + geom_smooth(method=lm, se=FALSE, color = 'darkgrey') + theme_light()

  3. Decide what analysis/analyses to use to answer the teacher’s questions.

The teacher would like to answer two questions: whether the two variables are correlated, and whether they can predict exam results from hours spent studying. The first question can be answered with a correlation test, but the second question requires a regression. A regression model will give you the correlation coefficient either way (just take the square root of the \(R^2\) and attach the sign of the slope), so you can either do both tests or just the regression. I’ll show you how to do both.
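As a quick sanity check (a sketch, assuming the data has already been read into data1 as above), you can confirm that the correlation coefficient is just the square root of the simple regression’s \(R^2\), carrying the sign of the slope:

# correlation coefficient directly
cor(data1$time_spent, data1$test_score)
# the same value (up to sign) recovered from the regression R-squared
sqrt(summary(lm(test_score ~ time_spent, data = data1))$r.squared)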

  4. Check the assumptions and run the tests.
# Pearson's correlation
cor.test(data1$time_spent, data1$test_score, method='pearson')
## 
##  Pearson's product-moment correlation
## 
## data:  data1$time_spent and data1$test_score
## t = 3.3965, df = 48, p-value = 0.001379
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1844476 0.6401119
## sample estimates:
##       cor 
## 0.4401954
# simple linear regression
slm <- lm(test_score ~ time_spent, data=data1)
summary(slm)
## 
## Call:
## lm(formula = test_score ~ time_spent, data = data1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.895 -10.950   1.203   9.144  30.372 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   46.886      5.039   9.304 2.54e-12 ***
## time_spent     5.095      1.500   3.397  0.00138 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.49 on 48 degrees of freedom
## Multiple R-squared:  0.1938, Adjusted R-squared:  0.177 
## F-statistic: 11.54 on 1 and 48 DF,  p-value: 0.001379
library(performance)
check_model(slm)
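Since the teacher’s second question is about predicting future students’ results, here is a minimal sketch of how the fitted model can be used with predict(); the hour values (2, 5 and 8) and the data frame name new_students are made up for illustration:

# hypothetical future students who studied 2, 5 and 8 hours
new_students <- data.frame(time_spent = c(2, 5, 8))
# point predictions plus 95% prediction intervals
predict(slm, newdata = new_students, interval = "prediction")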

  5. Report your findings in a short paragraph.

A simple linear regression with time spent studying as the single predictor revealed a significant effect of time spent studying on exam scores (\(\beta\)=5.1, t=3.4, p=0.001): for each additional hour spent studying, the predicted exam score increased by 5.1 points.

Exercise 2: Lexical decision

Lexical decision is a popular task in psycho-/neurolinguistics. In a lexical decision task, the participant is presented with a word and asked to decide whether it is a real word in their language. Typically, the reaction time (RT) needed to make a correct decision is recorded. This time is associated with the difficulty of lexical access (pulling the lexical item out of your mental lexicon).

A psycholinguist wants to study whether difficulty of lexical access (as measured by lexical decision RTs) is affected by the target word’s frequency and its length. 50 participants performed the lexical decision task with words varying in frequency and length. Each participant completed 100 items. Reaction times are log-transformed.

  1. Download the data Here.
  2. Visualise the data using scatter plots.
data2 <- read.csv("lexical.csv", header=TRUE)
head(data2)
##   subject item frequency length    logrt        rt
## 1       1    1       109      8 6.809362  906.2921
## 2       1    2       190      5 6.369214  583.5989
## 3       1    3       216      9 6.304127  546.8242
## 4       1    4       118      6 6.995545 1091.7583
## 5       1    5       204      9 6.410343  608.1025
## 6       1    6       201      7 6.354005  574.7901
library(ggplot2)
# let's do two scatter plots, one for each predictor
ggplot(data2, aes(x=frequency, y=logrt)) + geom_point() + geom_smooth(method=lm, se=FALSE, color = 'darkgrey') + theme_light()

ggplot(data2, aes(x=length, y=logrt)) + geom_point() + geom_smooth(method=lm, se=FALSE, color = 'darkgrey') + theme_light()

  3. Decide what analysis you should do to answer the research question. Hint: Do you expect there to be systematic variations outside of the experimental manipulations?

We have two predictors, word frequency and word length, both continuous. We should also include two random factors, participant and item: participants were measured repeatedly, and so were items (each participant completed 100 items), so we expect some systematic variability associated with each. This calls for a mixed-effects linear regression with two fixed factors (word frequency, word length) and two random factors (participant, item).
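For reference, the maximal random-effects structure for this design would also include by-subject random slopes for the two predictors (frequency and length vary within subjects, but each item has only one frequency and one length, so by-item slopes don’t make sense here). A sketch of what that call might look like is below; it is shown as not run because, as noted in the next step, it can be too heavy for many computers:

# Not run: (near-)maximal random-effects structure for this design
# mm_max <- lmer(logrt ~ frequency * length +
#                  (1 + frequency * length | subject) + (1 | item),
#                data = data2, REML = FALSE,
#                control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))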

  4. Make relevant categorical variables factors.
data2$subject <- as.factor(data2$subject)
data2$item <- as.factor(data2$item)
  5. Check the assumptions and run the tests.
    • If you decided to run a mixed-effects model, use the random effects structure (1|subject) + (1|item), in case your computer cannot handle the maximal random effects structure (mine couldn’t).
    • Following the above, specify REML=FALSE and control=lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)) while running your model to optimize performance.
library(lme4)
library(lmerTest)
library(performance)
mm <- lmer(logrt ~ frequency*length + (1|subject) + (1|item), data=data2, REML=FALSE, control=lmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))
summary(mm)
## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: logrt ~ frequency * length + (1 | subject) + (1 | item)
##    Data: data2
## Control: lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##  -6907.4  -6861.7   3460.7  -6921.4     4993 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5572 -0.6578  0.0239  0.6508  3.6652 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  item     (Intercept) 1.491e-05 0.003861
##  subject  (Intercept) 1.016e-03 0.031867
##  Residual             1.435e-02 0.119789
## Number of obs: 5000, groups:  item, 100; subject, 50
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)       7.406e+00  2.942e-02  1.039e+02 251.755   <2e-16 ***
## frequency        -4.566e-03  1.291e-04  9.924e+01 -35.369   <2e-16 ***
## length            3.991e-03  3.695e-03  9.924e+01   1.080    0.283    
## frequency:length -1.336e-05  1.731e-05  9.924e+01  -0.772    0.442    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) frqncy length
## frequency   -0.953              
## length      -0.951  0.952       
## frqncy:lngt  0.886 -0.956 -0.963
check_model(mm)
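If you want to check whether the (non-significant) interaction term is doing anything for the model, one option is a likelihood-ratio test against a main-effects-only model. This is a sketch rather than something the exercise asks for; the name mm_reduced is mine, and fitting with REML = FALSE above is what makes the comparison valid:

# reduced model without the frequency:length interaction
mm_reduced <- lmer(logrt ~ frequency + length + (1|subject) + (1|item),
                   data = data2, REML = FALSE,
                   control = lmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
# likelihood-ratio test of the two nested models
anova(mm_reduced, mm)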

  6. Report your findings in a short paragraph.

A mixed-effects linear regression with two fixed factors, Frequency and Length, as well as by-participant and by-item random intercepts revealed a significant main effect of Frequency on logRT (\(\beta\)=-0.0046, t=-35.37, p<0.001): more frequent words were responded to faster. There was no significant main effect of Length and no interaction between Frequency and Length.

Exercise 3: Scalar implicature

Scalar implicature refers to implicatures derived from quantifiers. If I say “James ate some of the apples”, many people will conclude that James in fact did not eat all of the apples. Some adjectives can have the same effect: if I say “It’s warm today”, many will conclude that it is warm but not hot today.

An experimental pragmatician wants to investigate whether there is a difference in how often people derive the implicature for the word “some” and for the word “warm”. 100 participants each read 25 sentences that contained “some” and 25 sentences that contained “warm”, and for each sentence answered the yes-no question “Would you conclude that not all….” or “Would you conclude that …. is not hot”. If the answer was yes, the response was coded as 1; if the answer was no, it was coded as 0.

  1. Download the data Here.
  2. Visualise the data.
data3 <- read.csv("scalar.csv", header=TRUE)
head(data3)
##   participant item condition response
## 1           1    1      some        1
## 2           1    2      some        1
## 3           1    3      some        1
## 4           1    4      some        1
## 5           1    5      some        1
## 6           1    6      some        1
library(ggplot2)
# we have binomial data and a categorical IV; in this case perhaps the best way is to plot the proportions/frequencies
data3_frequency <- as.data.frame(table(subset(data3, select = -c(participant, item))))
ggplot(data=data3_frequency, aes(x=condition, y=Freq, fill=response)) + geom_bar(stat="identity", width=0.4)

  3. Decide what analysis you should do to answer the research question.

We have one categorical predictor, condition (warm vs. some). Most importantly, our data are binomial (yes vs. no, 1 vs. 0), so we need a logistic (rather than linear) model. Ideally we would also include two random factors, participant and item. Later on you’ll only be asked to include item: in the example data there is not much variance that can be explained by participant, and the model runs into singular-fit problems if we force it to include participant (more on this in the last exercise).
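If you are curious, the sketch below (not run, purely illustrative) shows what the fuller model with a by-participant intercept would look like; with this particular data set it is expected to come back with a singular-fit warning because the by-participant variance is essentially zero:

# Not run: adding participant as a random factor triggers a singular fit here
# glmer(response ~ condition + (1 | participant) + (1 + condition | item),
#       data = data3, family = "binomial",
#       control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))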

  4. Make relevant categorical variables factors.
data3$participant <- as.factor(data3$participant)
data3$item <- as.factor(data3$item)
data3$condition <- as.factor(data3$condition)
  5. Choose the correct coding scheme for categorical factors, check the assumptions and run the test.
    • If you decided to run a mixed-effects model, use the random effects structure (1+condition|item).
    • Following the above, specify glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)) while running your model to optimize performance.

The factor is Condition with two levels, warm and some. There’s no obvious reference (baseline) level here, so we should use sum coding.
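To see the two coding schemes side by side (purely illustrative), you can print the contrast matrices; with sum coding, the intercept corresponds to the grand mean across conditions rather than to a reference level:

contr.treatment(2)  # R's default dummy/treatment coding: first level is the reference (0)
contr.sum(2)        # sum coding: levels coded +1 / -1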

library(lme4)
library(lmerTest)
library(performance)
# important to sum contrast condition
contrasts(data3$condition) <- contr.sum(2)
contrasts(data3$condition)
##      [,1]
## some    1
## warm   -1
# note: I name the model object glmm rather than glm, so it doesn't mask the base R function glm()
glmm <- glmer(response ~ condition + (1+condition|item), data = data3, family = 'binomial', control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
summary(glmm)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: response ~ condition + (1 + condition | item)
##    Data: data3
## Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
## 
##      AIC      BIC   logLik deviance df.resid 
##   5965.3   5997.8  -2977.6   5955.3     4995 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3938 -1.1624  0.5503  0.6623  0.9887 
## 
## Random effects:
##  Groups Name        Variance  Std.Dev. Corr 
##  item   (Intercept) 8.968e-05 0.00947       
##         condition1  1.258e-01 0.35466  -0.96
## Number of obs: 5000, groups:  item, 50
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.89642    0.05958  15.046  < 2e-16 ***
## condition1   0.26910    0.05958   4.517 6.28e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## condition1 -0.003
check_model(glmm)

  6. Report your findings in a short paragraph.

A mixed-effects logistic regression model was fitted to the data with one fixed factor, Condition (warm vs. some), and one random factor, item. The fixed factor was sum contrasted. Results revealed a significant main effect of Condition on participants’ responses (\(\beta\)=0.27, z=4.52, p<0.001), suggesting that speakers derived the scalar implicature significantly less often with the word “warm” than with the word “some”.
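As a follow-up (a sketch based only on the fixed-effects estimates above), you can convert the logit-scale estimates back into predicted implicature rates with plogis(); with sum coding, the intercept is the grand mean and condition1 is the deviation for “some” (+1) versus “warm” (-1):

plogis(0.89642 + 0.26910)  # predicted implicature rate for "some", roughly 0.76
plogis(0.89642 - 0.26910)  # predicted implicature rate for "warm", roughly 0.65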

Advanced exercise: Dealing with problematic mixed-effects models

In the handout I wrote about how to choose your random-effects structure in a mixed model: always start with the maximal structure and, unless the model gives you a problem, stick to it. But will maximal structures often give you problems? In my experience, yes. Often this is because the data I collect from an experiment are not big enough to support a maximal random-effects structure. Sure, one participant usually gives me a few dozen trials, but that still adds up to only a few thousand data points with a couple dozen participants. Unfortunately that is not a big data set at all, especially if you have more than one fixed factor.

Now what do you do? You should find out which terms are the least important for the model and remove them one by one, until the model can be supported by your data. But how exactly do you do that? I’ll give you some of the real data I collected so you can try it out.

Suppose the study was about whether a preceding phonological cue can facilitate lexical access (the design I describe here is different from my real design, but for our purposes they’re close enough). In each trial, participants were presented with two pictures (e.g. an orange and a pear) and listened to a spoken phrase that identified one of the pictures. The spoken phrase either contained a phonological cue that was informative about the target’s identity (“an orange”, as opposed to “a pear”: the informative condition) or one that was uninformative (“that orange”: the uninformative condition). The participants’ task was to decide as quickly as possible which of the two pictures was named. Their RT was recorded and log-transformed. Each participant completed 20 items. I needed to run a statistical test to determine whether participants’ reaction times were faster in the informative condition than in the uninformative condition.

  1. Download the data Here.
  2. Read the data into an R data frame and plot the two conditions against logRT.
library(ggplot2)
library(lme4)
library(lmerTest)
library(performance)

data <- read.csv("rt.csv", header = TRUE)
head(data)
##   subject item     condition   rt    logrt
## 1     s07    2   informative 1466 7.290293
## 2     s07    4   informative 1156 7.052721
## 3     s07    5 uninformative 1166 7.061334
## 4     s07    6   informative 1162 7.057898
## 5     s07    7 uninformative 1051 6.957497
## 6     s07   10   informative  780 6.659294
ggplot(data, aes(y = logrt, x = condition)) + geom_boxplot()

  3. Make the variables subject, item, and condition factors.
data$subject <- as.factor(data$subject)
data$item <- as.factor(data$item)
data$condition <- as.factor(data$condition)
  4. Sum contrast the factor condition, and fit a mixed-effects linear model to logRT with one fixed factor, Condition, and the maximal random-effects structure: by-participant/subject and by-item random intercepts and random slopes for condition.
contrasts(data$condition) <- contr.sum(2)
contrasts(data$condition)
##               [,1]
## informative      1
## uninformative   -1
model <- lmer(logrt ~ condition + (1+condition | subject) + (1+condition | item), data=data)
## boundary (singular) fit: see help('isSingular')
  5. The model should give you a singularity warning. Basically, this means that some of your random-effects terms are estimated to have essentially zero variance (or perfectly correlated effects), i.e. nearly no impact on the model.

  6. We need to identify which random-effects term(s) are causing the singularity problem and remove them from the model. Read the model summary output and find out which term in the random-effects structure explains the smallest amount of variance.

summary(model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logrt ~ condition + (1 + condition | subject) + (1 + condition |  
##     item)
##    Data: data
## 
## REML criterion at convergence: -388.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1331 -0.5617 -0.0508  0.5347  4.5499 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  subject  (Intercept) 4.218e-03 0.064947      
##           condition1  1.232e-05 0.003511 -1.00
##  item     (Intercept) 1.182e-02 0.108722      
##           condition1  8.312e-03 0.091169 -0.43
##  Residual             2.610e-02 0.161552      
## Number of obs: 678, groups:  subject, 38; item, 20
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  6.86631    0.02724 25.66625 252.095   <2e-16 ***
## condition1  -0.01647    0.02136 17.80720  -0.771    0.451    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## condition1 -0.382
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

It looks like the by-subject random slope for condition explains the least variance (and its correlation with the by-subject intercept is estimated at -1.00, another telltale sign of a singular fit).
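You can also check this programmatically rather than eyeballing the summary; a short sketch:

isSingular(model)              # should be TRUE, matching the warning above
VarCorr(model)                 # compact view of the random-effects variances
as.data.frame(VarCorr(model))  # the same information as a data frame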

  7. To verify that’s indeed the term we want to remove from the model, we can plot the data, grouping by each of our random factors (here, subject and item). Here’s how to do it:
# plot by subject
ggplot(data, aes(y = logrt, x = condition, colour = subject, group = subject)) + geom_point() + geom_smooth(method = "lm", se = FALSE)

# plot by item
ggplot(data, aes(y = logrt, x = condition, colour = item, group = item)) + geom_point() + geom_smooth(method = "lm", se = FALSE)

  8. I’ll show you what the plots should look like. Compare them to the plots in the handout that explained what random intercepts and random slopes look like, and answer: how much intercept-wise and slope-wise variability do you see for each random factor?

For Subject, there seems to be some intercept-wise variability and rather less slope-wise variability. Compare that to Item: it’s clear that across items there is large intercept-wise as well as slope-wise variability.
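As a rough numerical complement to the plots (a sketch using base R; item_means is my own name), you can compute the mean logRT per item and condition and look at how much the informative-uninformative difference varies across items:

# mean logRT per item and condition
item_means <- with(data, tapply(logrt, list(item, condition), mean))
head(item_means)
# spread of the per-item condition effect: a wide spread suggests slope-wise variability
summary(item_means[, "informative"] - item_means[, "uninformative"])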

  9. Take out the random effect you chose, and run the model again. Does it still give you a warning?
model <- lmer(logrt ~ condition + (1 | subject) + (1+condition | item), data=data)
summary(model)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: logrt ~ condition + (1 | subject) + (1 + condition | item)
##    Data: data
## 
## REML criterion at convergence: -387.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1319 -0.5597 -0.0604  0.5339  4.5519 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  subject  (Intercept) 0.004197 0.06479       
##  item     (Intercept) 0.011821 0.10872       
##           condition1  0.008302 0.09112  -0.43
##  Residual             0.026111 0.16159       
## Number of obs: 678, groups:  subject, 38; item, 20
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  6.86622    0.02723 25.63338  252.19   <2e-16 ***
## condition1  -0.01644    0.02134 17.78758   -0.77    0.451    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## condition1 -0.373
check_model(model)

  10. When the model is no longer problematic, report your findings in a short paragraph.

A mixed-effects linear model with one fixed factor, Condition (informative vs. uninformative), by-participant random intercepts, and by-item random intercepts and random slopes was fitted to the data. The factor was sum coded. Results revealed no significant effect of Condition on logRT (\(\beta\)=-0.016, t=-0.77, p=0.45), suggesting no difference in reaction times between the informative and uninformative conditions.
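If you would like to add a confidence interval to the report, a quick Wald approximation for the fixed effects can be obtained as below (a sketch; profile or bootstrap intervals are more accurate but slower):

confint(model, parm = "beta_", method = "Wald")  # fixed effects only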