Last week we talked about how to use t-tests and one-way ANOVAs to compare means. Today we are going to talk about cases where things get just a bit more complicated: what if I care about the effect of more than one thing (say I want to know whether word frequency and/or word length affect reading time)? And what if my data is not normally distributed?
1 Factors, levels, main effects, and interactions
A factor is an independent (or predictor) variable that is nominal (i.e. categorical). A level refers to a sub-category of the factor. A factor must have at least 2 levels (i.e. it must vary in some way). For example, if I’m interested in whether having a pet reduces stress level, I’ll have one factor Pet, with two levels: someone either has a pet, or doesn’t have a pet.
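In R, a factor like this is just a categorical vector; a minimal sketch (with hypothetical data) of creating the Pet factor and inspecting its levels:
# hypothetical example: a two-level factor coding pet ownership
pet <- factor(c('pet', 'no_pet', 'pet', 'pet', 'no_pet'))
levels(pet)   # the two levels: "no_pet" and "pet"
nlevels(pet)  # 2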
Factorial designs:
Many biological, psychological, and clinical phenomena involve more than one variable; for example, the risk of cancer can be associated with many variables such as genetics, age, alcohol use, smoking, etc. Similarly, an experiment has a factorial design when it has 2 or more independent variables / factors. These are often described using numbers and multiplication signs, where each number represents an independent variable, and the value of the number represents the number of levels of that factor (e.g. a 2 x 2 x 2 design has three factors and each factor has two levels).
An example design of a study investigating the effect of different types of treatment on the outcome of a disorder, where patients were grouped by their biological sex.
Quick Q: how many factors does this study have and how many levels does each factor have?
| treatment       | group  |
|-----------------|--------|
| Talking therapy | Female |
| Talking therapy | Male   |
| Medication      | Female |
| Medication      | Male   |
| Placebo         | Female |
| Placebo         | Male   |
| No treatment    | Female |
| No treatment    | Male   |
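A quick way to enumerate the cells of a factorial design in R is expand.grid(); a minimal sketch for the 4 x 2 design above:
# enumerate all treatment x group combinations of the 4 x 2 design
design <- expand.grid(treatment = c('Talking therapy', 'Medication', 'Placebo', 'No treatment'),
                      group = c('Female', 'Male'))
design        # 8 rows, one per cell of the design
nrow(design)  # 4 * 2 = 8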
A main effect refers to the effect of one factor on the outcome, while completely ignoring the different levels of other factors. An interaction refers to the combined effect of 2 or more independent variables on the dependent variable: the way in which the effect of one independent variable may depend on the level of another independent variable. We’ll talk a bit more about it in a factorial ANOVA example.
2 Factorial ANOVAs
2.1 Two-way between-subjects ANOVA
Suppose I’m interested in the effect of L1 reading ability and L2 proficiency on L2 reading comprehension. A group of 40 L2 English learners participated in an English reading comprehension experiment. Participants were grouped by their L1 reading ability (high vs. low), and their L2 proficiency (high vs. low).
This study has a 2 x 2 factorial design:
L1 reading ability (high vs. low)
L2 proficiency (high vs. low)
| Number of participants | High L1 ability | Low L1 ability |
|------------------------|-----------------|----------------|
| high L2 proficiency    | 10              | 10             |
| low L2 proficiency     | 10              | 10             |
Example data.
| Mean scores          | High L1 ability | Low L1 ability |
|----------------------|-----------------|----------------|
| high L2 proficiency  | 95              | 75             |
| low L2 proficiency   | 65              | 60             |
2.1.1 Main effect of L2 proficiency
From the example data, we find a main effect of L2 proficiency: when the scores are averaged across the levels of L1 ability, proficient L2 learners are better at L2 reading comprehension than less proficient L2 learners.
| Mean scores          | High L1 ability | Low L1 ability | mean |
|----------------------|-----------------|----------------|------|
| high L2 proficiency  | 95              | 75             | 85.0 |
| low L2 proficiency   | 65              | 60             | 62.5 |
2.1.2 Main effect of L1 ability
In this case, we also find a main effect of L1 ability: when scores are averaged across levels of L2 proficiency, people with better L1 reading ability perform better in L2 reading comprehension than people with worse L1 reading ability.
| Mean scores          | High L1 ability | Low L1 ability |
|----------------------|-----------------|----------------|
| high L2 proficiency  | 95              | 75.0           |
| low L2 proficiency   | 65              | 60.0           |
| mean                 | 80              | 67.5           |
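Both sets of marginal means are simply the row and column means of the 2 x 2 cell-mean table; a minimal sketch reproducing them in R:
# cell means from the example data (rows = L2 proficiency, columns = L1 ability)
cell_means <- matrix(c(95, 75, 65, 60), nrow = 2, byrow = TRUE,
                     dimnames = list(c('high L2', 'low L2'), c('high L1', 'low L1')))
rowMeans(cell_means)  # main effect of L2 proficiency: 85.0 vs. 62.5
colMeans(cell_means)  # main effect of L1 ability: 80.0 vs. 67.5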
2.1.3 Interaction between L1 ability and L2 proficiency
The effect of one independent variable may depend on the level of another independent variable: for the low L2 proficiency group, having a higher L1 ability does not improve their scores much; but for the high L2 proficiency group, having a higher L1 ability means they perform much better. To put it more formally, the effect of L1 ability is larger in the high L2 proficiency group than in the low L2 proficiency group.
| Mean scores          | High L1 ability | Low L1 ability |
|----------------------|-----------------|----------------|
| high L2 proficiency  | 95              | 75             |
| low L2 proficiency   | 65              | 60             |
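One way to see the interaction numerically is to compare the simple effects of L1 ability at each level of L2 proficiency; a minimal sketch using the cell means above:
# simple effect of L1 ability within each L2 proficiency group
high_L2_effect <- 95 - 75   # 20 points
low_L2_effect  <- 65 - 60   #  5 points
# the interaction is the difference between these simple effects
high_L2_effect - low_L2_effect  # 15: the L1 effect is larger for proficient L2 learners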
2.1.4 Possible patterns
In a 2 x 2 factorial design, we can have these patterns of main effects and interactions:
2.1.5 Two-way between-subjects ANOVA: logic and assumptions
A two-way ANOVA can tell us:
If there is a main effect of IV1 (L1 ability)
If there is a main effect of IV2 (L2 proficiency)
If there is an interaction between the two factors
A two-way ANOVA computes and compares four different sources of the total variance: between-group variance, which is further broken down into variation due to IV1, variation due to IV2, and variation due to the interaction; as well as within-group variance.
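In terms of sums of squares, this decomposition can be written as (a standard textbook formulation, not taken from the original handout):
\[ SS_{\text{total}} = SS_{\text{IV1}} + SS_{\text{IV2}} + SS_{\text{IV1} \times \text{IV2}} + SS_{\text{within}} \]
where the first three terms together make up the between-group variance.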
The two-way between-subjects ANOVA assumes more or less the same things as its one-way counterpart:
Normality: Residuals (observation - group mean) should be normally distributed in each group \(\approx\) normal distribution of data in each group
Homogeneity of variances: Variances are approximately equal for every group
Post-hoc tests: Think about this example. Do we need post-hocs? If we do, why? Remember that we needed post-hocs because an ANOVA only tells us that at least one group is different from the others; to see exactly which group(s) are different, we need pairwise t-tests. In this example, we only have two levels for each factor, so when we have a main effect, it must be those two groups that differ from each other! In fact, if you run an ANOVA with one factor and two levels, you will get the same results as a t-test. However, we had two factors in our ANOVA that interacted with each other. In this case, post-hocs are still useful for exploring the interaction. Could it be that L1 ability only affects those who have a high L2 proficiency, and does not affect those who have low L2 proficiency at all? Or could it be that L1 ability affects everyone, and the effect is simply larger for the highly proficient L2 learners? Or… Post-hocs will tell us the answer!
# Two-way between-subject ANOVA: Example
# Effect of L1 reading ability and L2 proficiency on L2 reading comprehension
# first, let's import our data
data_1 <- read.csv('l1-l2-comprehension.csv',header = TRUE)
data_1$L2_proficiency <- as.factor(data_1$L2_proficiency)
data_1$L1_ability <- as.factor(data_1$L1_ability)
# Visualisation and descriptives
# Line graph using ggplot2
# to make the line graph, we need to have group means
library(ggplot2)
data_1_mean <- aggregate(score ~ L1_ability + L2_proficiency, data = data_1, FUN = mean)
ggplot(data_1_mean, aes(x = L2_proficiency, y = score, group=L1_ability, color = L1_ability)) +
geom_point() +
geom_line() +
theme_light() +
ylim(50, 100)
# Get the descriptives
library(psych)
describeBy(score ~ L1_ability*L2_proficiency, data = data_1)
##
## Descriptive statistics by group
## L1_ability: high
## L2_proficiency: high
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 10 92.6 6.95 93 93.38 6.67 79 100 21 -0.58 -0.91 2.2
## ------------------------------------------------------------
## L1_ability: low
## L2_proficiency: high
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 10 76 11.07 74 74.88 12.6 63 98 35 0.54 -0.96 3.5
## ------------------------------------------------------------
## L1_ability: high
## L2_proficiency: low
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 10 61.5 10.95 62.5 61.38 8.9 45 79 34 -0.09 -1.28 3.46
## ------------------------------------------------------------
## L1_ability: low
## L2_proficiency: low
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 10 57.7 10.34 55.5 57.62 9.64 42 74 32 0.26 -1.25 3.27
# Run the two-way between-subjects ANOVA
data_1_result <- aov(score ~ L1_ability * L2_proficiency, data = data_1)
summary(data_1_result)
# Check the assumptions of the two-way between-subject ANOVA
# 1. Normality
# use Shapiro-Wilk test
data_1_residuals <- residuals(object = data_1_result)
shapiro.test(data_1_residuals)
##
## Shapiro-Wilk normality test
##
## data: data_1_residuals
## W = 0.97943, p-value = 0.6682
# 2. Homogeneity of variance
# use Levene's test
library(car)
leveneTest(score ~ L1_ability * L2_proficiency, data = data_1)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.7463 0.5316
## 36
# Post-hoc tests
# Pair-wise t-test using Bonferroni
# the pairwise.t.test() function can only take one factor column, so we need a single column that represents both of our factors.
data_1$group <- paste('L1_',data_1$L1_ability,'_L2_',data_1$L2_proficiency, sep = '')
data_1$group <- as.factor(data_1$group)
pairwise.t.test(data_1$score, data_1$group, p.adjust.method = 'bonferroni')
##
## Pairwise comparisons using t tests with pooled SD
##
## data: data_1$score and data_1$group
##
## L1_high_L2_high L1_high_L2_low L1_low_L2_high
## L1_high_L2_low 2.1e-07 - -
## L1_low_L2_high 0.0040 0.0149 -
## L1_low_L2_low 1.7e-08 1.0000 0.0013
##
## P value adjustment method: bonferroni
Report the results
The results of a two-way Analysis of Variance (ANOVA) with 2 between-subjects factors, L1 reading ability (high vs. low) and L2 proficiency (high vs. low) indicated a significant main effect of L1 reading ability (F(1,36)=10.47, p=0.002), as well as a main effect of L2 proficiency (F(1,36)=61.4, p<0.001). A significant interaction was also found (F(1,36)=4.12, p=0.049).
Pairwise t-tests with Bonferroni correction revealed that, for the high L2 proficiency groups, there was a significant effect of L1 ability (p=0.004), while for the low L2 proficiency groups, there was no significant effect of L1 reading ability (p=1).
3 Within-subject ANOVAs: the ez package
Now that we’ve talked about one-way and factorial ANOVAs, you might notice that so far all of our examples only have between-subject factors. What if my factor(s) are within-subject?
Within-subject ANOVAs are not very different from between-subject ANOVAs, and here are some examples. There are a few different packages that can do within-subjects ANOVAs, and for this workshop we are going to use the ez package.
3.1 One-way within-subjects ANOVA: Example
Effect of written text on perceived clarity of degraded speech:
10 participants took part in the experiment. Participants listened to vocoded words (vocoding: a procedure that removes speech’s fine structure while preserving low-frequency temporal information), and rated speech clarity on a 7-point scale. In the Before condition, written text of the word was presented 800ms before the onset of speech; in the Simultaneous condition, written text of the word was presented at the same time as the onset of speech; in the After condition, written text was presented 800ms after the onset of speech. Each participant completed 15 trials in each condition. Example adapted from Sohoglu, E., Peelle, J. E., Carlyon, R. P., & Davis, M. H. (2014). Top-down influences of written text on perceived clarity of degraded speech. Journal of Experimental Psychology: Human Perception and Performance, 40(1), 186. Note that data used in this example is simulated and conclusions may differ from the real experiment.
Run the ANOVA (using the ez package, and specify within-subject factor(s))
Check for the assumptions:
Normality of residuals
Sphericity: Analogous to Homogeneity of variance in other ANOVAs. Only matters when a factor has more than two levels
Post-hocs
# One-way within-subjects ANOVA: Example
# First, let's import our data
one_within_exp <- read.csv('vocoded.csv', header = TRUE)
# let's order our factor
one_within_exp$condition <- factor(one_within_exp$condition, levels = c('before', 'simultaneous', 'after'))
# Let's get some descriptives and visualise our data
# descriptives
library(psych)
describeBy(clarity ~ condition, data = one_within_exp)
##
## Descriptive statistics by group
## condition: before
## vars n mean sd median trimmed mad min max range skew kurtosis se
## clarity 1 150 5.35 1.27 5.5 5.42 0.74 2 7 5 -0.39 -0.69 0.1
## ------------------------------------------------------------
## condition: simultaneous
## vars n mean sd median trimmed mad min max range skew kurtosis
## clarity 1 150 5.14 1.32 5 5.2 1.48 2 7 5 -0.26 -0.81
## se
## clarity 0.11
## ------------------------------------------------------------
## condition: after
## vars n mean sd median trimmed mad min max range skew kurtosis se
## clarity 1 150 3.87 1.5 4 3.88 1.48 1 7 6 -0.09 -0.53 0.12
# visualisation, you can use line plots too
boxplot(clarity ~ condition, data = one_within_exp)
# remember that to do t-tests or ANOVAs we need by-participant mean
# in this case, it means that we need to average across the 15 trials in each condition for each participant
# in other words, condition and participant are our grouping variable
one_within_exp_by_par <- aggregate(clarity ~ participant+condition, data = one_within_exp, FUN = mean)
# run the within-subjects ANOVA with the package ez
# note that ezANOVA() checks for sphericity during computation
library(ez)
one_within_exp_result <- ezANOVA(data = one_within_exp_by_par, dv=.(clarity), wid=.(participant),within=.(condition), type = 3)
one_within_exp_result
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 condition 2 18 56.52274 1.740421e-08 * 0.8179273
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 condition 0.6250943 0.15268
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
## 2 condition 0.7273226 1.10469e-06 * 0.8314024 2.257353e-07 *
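# Post-hoc pairwise paired t-tests with Bonferroni correction
# (the call producing the output below is not shown in the original handout;
#  this is a reconstruction based on that output)
pairwise.t.test(one_within_exp_by_par$clarity, one_within_exp_by_par$condition,
                paired = TRUE, p.adjust.method = 'bonferroni')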
##
## Pairwise comparisons using paired t tests
##
## data: one_within_exp_by_par$clarity and one_within_exp_by_par$condition
##
## before simultaneous
## simultaneous 0.16303 -
## after 3.9e-05 0.00012
##
## P value adjustment method: bonferroni
Report the results
One-way within-subjects ANOVA revealed a significant main effect of Condition (F(2,18)=56.52, p<0.001). Pairwise t-tests with Bonferroni correction revealed significantly lower clarity in the After condition than in the other two conditions (p’s < 0.01), while there was no difference in clarity between the Before and the Simultaneous conditions (p=0.16).
4 Mixed ANOVAs
Mixed ANOVAs refer to factorial ANOVAs with both within-subject and between-subject factors.
4.1 Mixed ANOVA: Example
Effect of language aptitude on the acquisition of programming languages: 40 participants took ten 45-minute sessions of Python training. Before training, participants’ language aptitude was measured using the Modern Language Aptitude Test. Participants were grouped into two groups of 20: high language aptitude and low language aptitude. Example adapted from Prat, C. S., Madhyastha, T. M., Mottarella, M. J., & Kuo, C. H. (2020). Relating natural language aptitude to individual differences in learning programming languages. Scientific reports, 10(1), 1-10. Note that data used in this example is simulated and conclusions may differ from the real experiment.
Participants’ programming ability is measured after the 1st, 5th, and 10th training session using an examination (max score 100).
2 x 3 design: aptitude (high vs. low, between-subjects) and session (1st vs. 5th vs. 10th, within-subjects).
4.1.1 Do it in R: mixed ANOVA using the ez package
Run the ANOVA (using the ez package, and specify within-subject factor(s))
Check for the assumptions:
Normality of residuals
Homogeneity of variance for between-subject factors, sphericity for within-subject factors.
Post-hocs
# Mixed ANOVA: Example
# Let's import our data
mixed_exp <- read.csv('python.csv', header = TRUE)
# let's make categorical factors
mixed_exp$participant <- as.factor(mixed_exp$participant)
mixed_exp$session <- as.factor(mixed_exp$session)
mixed_exp$aptitude <- as.factor(mixed_exp$aptitude)
# Let's get some descriptives and visualise our data
# descriptives
library(psych)
describeBy(score ~ aptitude*session, data = mixed_exp)
##
## Descriptive statistics by group
## aptitude: High
## session: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 24.75 8.62 23.5 24.31 8.9 11 43 32 0.39 -0.82 1.93
## ------------------------------------------------------------
## aptitude: Low
## session: 1
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 19.75 9.87 20 19.31 8.9 5 38 33 0.21 -1 2.21
## ------------------------------------------------------------
## aptitude: High
## session: 5
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 61.6 9.57 63 61.12 10.38 47 81 34 0.15 -1.12 2.14
## ------------------------------------------------------------
## aptitude: Low
## session: 5
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 49.05 9.32 48.5 49.38 10.38 31 65 34 -0.33 -0.79 2.08
## ------------------------------------------------------------
## aptitude: High
## session: 10
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 83.65 8.13 82.5 83.44 7.41 69 99 30 0.1 -0.75 1.82
## ------------------------------------------------------------
## aptitude: Low
## session: 10
## vars n mean sd median trimmed mad min max range skew kurtosis se
## score 1 20 65.45 9.39 66.5 65.12 7.41 48 88 40 0.25 -0.19 2.1
# visualisation
# for two-way designs, line graph can help us understand the data structure better
# to make the line graph, we need to have group means
library(ggplot2)
mixed_exp_mean <- aggregate(score ~ aptitude*session, data = mixed_exp, FUN = mean)
ggplot(mixed_exp_mean, aes(x = session, y = score, group=aptitude, color = aptitude)) +
geom_point() +
geom_line() +
theme_light()
# assumption check: normality (of data in each group)
# use shapiro test
# there is no single grouping column for the six aptitude-by-session cells,
# so we build one first and then run the test within each cell
mixed_exp$group <- paste(mixed_exp$aptitude, mixed_exp$session, sep = '_')
tapply(mixed_exp$score, mixed_exp$group, shapiro.test)
## $High_1
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.96375, p-value = 0.6212
##
##
## $High_10
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.97549, p-value = 0.8638
##
##
## $High_5
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.94779, p-value = 0.3348
##
##
## $Low_1
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.96006, p-value = 0.545
##
##
## $Low_10
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.97615, p-value = 0.8753
##
##
## $Low_5
##
## Shapiro-Wilk normality test
##
## data: X[[i]]
## W = 0.95303, p-value = 0.4154
# assumption check: homogeneity of variance (of between subject factor)
# use Levene's test
library(rstatix)
mixed_exp %>%
group_by(session) %>%
levene_test(score ~ aptitude)
# run the mixed ANOVA with package ez
library(ez)
mixed_exp_result <- ezANOVA(data = mixed_exp, dv=.(score), wid=.(participant),between=aptitude, within=session, type = 3)
mixed_exp_result
# post-hoc:
# in this case, first, we want to look at the effect of aptitude at each time point
aptitude_effect <- mixed_exp %>%
group_by(session) %>%
anova_test(dv = score, wid = participant, between = aptitude) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
aptitude_effect
# then, we can also look at the effect of session for each aptitude group
session_effect <- mixed_exp %>%
group_by(aptitude) %>%
anova_test(dv = score, wid = participant, within = session) %>%
get_anova_table() %>%
adjust_pvalue(method = "bonferroni")
session_effect
# because we have more than two levels of session, we also need to run pairwise t-tests to determine exactly which session(s) are different from the other(s).
session_effect_pt <- mixed_exp %>%
group_by(aptitude) %>%
rstatix::pairwise_t_test(score ~ session, paired = TRUE, p.adjust.method = "bonferroni")
print.data.frame(session_effect_pt)
Mixed ANOVA with one within-subject factor Session (1st session vs. 5th session vs. 10th session) and one between-subject factor Aptitude (high vs. low) revealed significant main effects of Session (F(2, 76)=356.78, p<0.001) and Aptitude (F(1, 38)=44.72, p<0.001), as well as an interaction between Aptitude and Session (F(2, 76)=5.59, p=0.005). Post-hoc pairwise t-tests suggest that for both High Aptitude and Low Aptitude participants, test scores significantly improved after each session (10th > 5th > 1st, all p’s <0.001). After the first session, no significant differences were found between the two groups of participants (p=0.09); however, High Aptitude participants performed significantly better than Low Aptitude participants after the 5th and the 10th session (all p’s <0.001).
5 The ultimate choose-the-test: t-tests and ANOVAs
How to choose the correct test.
6 Effect sizes
Effect size refers to the size of an effect (obviously). This is different from statistical significance: Effect sizes reflect the practical value of a difference.
With a large enough sample size, even a tiny difference can be statistically significant.
Does it really matter, for example, if female verbal IQ = 101 on average, and male = 99, and this has p<0.05 when we have a sample size of 100,000?
In science, it’s good practice to report the effect size alongside your p values.
6.1 Effect size of t-tests: Cohen’s \(d\)
The effect size of a t-test is usually reported in terms of Cohen’s \(d\). Cohen’s \(d\) indicates the difference between means, standardized by (divided by) the standard deviation of the measure.
\[ d = \frac{\bar{X} - \bar{Y}}{\text{s.d.}} \]
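To make the formula concrete, Cohen’s \(d\) can also be computed by hand from two groups’ scores; a minimal sketch with hypothetical data, using the pooled standard deviation that is commonly used for two independent samples:
# hypothetical scores for two equal-sized groups
x <- c(5, 7, 8, 6, 9)
y <- c(9, 10, 12, 11, 8)
# pooled standard deviation (with equal n, this is the square root of the mean of the two variances)
sd_pooled <- sqrt((var(x) + var(y)) / 2)
(mean(x) - mean(y)) / sd_pooled  # Cohen's d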
6.1.1 Do it in R: Cohen’s \(d\)
Let’s use our first example of independent samples t-test: performance of SLI children and TD children in non-word repetition task.
# let's skip the assumption checks and run the t-test
t.test(score ~ group, var.equal=TRUE, data = df)
##
## Two Sample t-test
##
## data: score by group
## t = -2.7027, df = 18, p-value = 0.01457
## alternative hypothesis: true difference in means between group SLI and group TD is not equal to 0
## 95 percent confidence interval:
## -9.419927 -1.180073
## sample estimates:
## mean in group SLI mean in group TD
## 7.5 12.8
# Now, let's calculate Cohen's d with the help of the effsize package
# the effsize:: makes sure the function we call is from the effsize package.
library(effsize)
effsize::cohen.d(score ~ group, data = df)
We see that we have a Cohen’s \(d\) of -1.2, which is a relatively large effect. Cohen classified effect sizes as small (\(d=0.2\)), medium (\(d=0.5\)), or large (\(d \geq 0.8\)). See a visualisation here.
6.2 Effect sizes of ANOVAs: Eta-squared (\(\eta^2\)) and Partial Eta-squared (\(\eta_p^2\))
The effect size of an ANOVA can be reported as \(\eta^2\). Eta-squared (\(\eta^2\)) is the proportion of variance that is explained by the factor in a one-way ANOVA, whereas partial eta-squared (\(\eta_p^2\)) is the proportion of variance that a factor explains that is not explained by the other factors; the latter is used for factorial ANOVAs.
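In terms of sums of squares, the standard definitions (not taken from the original handout) are:
\[ \eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}, \qquad \eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} \]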
6.2.1 Do it in R: (Partial) Eta-squared (\(\eta^2\))
Let’s take our one-way between-subjects ANOVA example (vocabulary size).
# let's run the anova
one_between_exp_1_result <- aov(score ~ group, data = one_between_exp_1)
# take a look at our results using the summary() function
summary(one_between_exp_1_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 282478 141239 396.4 <2e-16 ***
## Residuals 27 9621 356
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Let's calculate our eta-squared
# same function also gives us partial eta-squared when we feed it a factorial ANOVA's results.
library(lsr)
etaSquared(one_between_exp_1_result)
## eta.sq eta.sq.part
## group 0.9670626 0.9670626
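As a quick arithmetic check against the summary() output above: \(282478 / (282478 + 9621) \approx 0.967\), which matches the value returned by etaSquared(). With a single factor, \(\eta^2\) and \(\eta_p^2\) are identical, which is why both columns show the same number.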
We have eta-squared = 0.97, which is a very large effect. (Partial) Eta-squared can be interpreted as: \(0 < \text{trivial effect} < 0.01 < \text{small effect} < 0.06 < \text{moderate effect} < 0.14 < \text{large effect}\).
7 Non-parametric tests
If you’ve been paying attention to the assumptions of the tests we talked about, you’ll probably have noticed that all of the tests require that the data be, in some way, normally distributed. But what if my data isn’t normally distributed?
The statistical tests we have talked about so far are all parametric tests, meaning that they require a certain distribution of the data (a normal distribution, in our case). Non-parametric tests, in contrast, do not require a certain distribution, and thus can be used when your data do not meet the assumptions of parametric tests. But why don’t we always do non-parametric tests then, and forget about normal distributions? This is because parametric tests generally have more statistical power, meaning that if an effect in fact exists, parametric tests are more likely to detect it.
| Parametric test            | Non-parametric equivalent |
|----------------------------|---------------------------|
| Independent samples t-test | Mann-Whitney U test       |
| Paired samples t-test      | Wilcoxon signed rank test |
| One-way ANOVA              | Kruskal-Wallis test       |
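All three non-parametric equivalents are available in base R; a minimal sketch of the calls (the data frame and column names here are placeholders, not from our examples):
# Mann-Whitney U test (wilcox.test on two independent samples)
wilcox.test(score ~ group, data = df)
# Wilcoxon signed rank test (paired samples)
wilcox.test(score_before, score_after, paired = TRUE)
# Kruskal-Wallis test (non-parametric counterpart of the one-way ANOVA)
kruskal.test(score ~ group, data = df)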
7.1 The Chi-squared (\({\chi}^2\)) test: when data is categorical
In this class we will not delve into the non-parametric equivalents of t-tests and ANOVAs. Instead, let’s talk about the Chi-squared test. This test is useful when you have categorical data (DV) (which obviously will not be normally distributed). For example, suppose a biologist is interested in whether a certain chromosome W affects/determines the sex of a newly-discovered bird species. The biologist tested 100 chicks and determined their sex and whether they have the W chromosome. In this case, the Independent Variable is whether the bird has the W chromosome (categorical), and the Dependent Variable is the bird’s sex (also categorical, male or female). For a human example, suppose a neurolinguist is interested in whether having a certain gene x increases the chance of developing dyslexia. In this case the data is also categorical: someone either develops dyslexia or they don’t.
What kinds of statistics can be compared in these situations, given that means and medians are meaningless for nominal categories? In the Chi-squared case, it’s all about counting! The Chi-squared test compares observed frequencies to expected frequencies (based on the null hypothesis). The bigger the difference between the expected and observed frequencies, the more likely the null hypothesis will be rejected.
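The test statistic behind this idea is the standard Chi-squared formula, summing over the cells of the contingency table:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
where \(O\) is the observed count and \(E\) is the expected count in each cell.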
7.1.1 Chi-squared test example: candies
300 children (150 boys, 150 girls) are asked if they prefer Haribo or M&Ms. Question: Is there a difference between boys and girls in the preferred sweet? Quick Q: what is the IV and what is the DV in this study? What data types are they?
Null hypothesis: no difference between boys and girls in preferred sweet
Alternative hypothesis: there is a difference between boys and girls in preferred sweet
Although non-parametric, the Chi-squared test still has some basic assumptions about your data: observations need to be randomly sampled; observations should be independent; and groups should be mutually exclusive. All of these you can (and should) ensure while collecting your data, so you don’t really need to check for any assumptions in R.
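The R code for this example is not included here; a minimal sketch of how the test would be run on a 2 x 2 table of counts (the counts below are made up for illustration and are not the original data):
# hypothetical counts: rows = gender, columns = preferred sweet
sweets <- matrix(c(85, 65,    # boys:  Haribo, M&Ms
                   65, 85),   # girls: Haribo, M&Ms
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c('boy', 'girl'),
                                 sweet = c('Haribo', 'MMs')))
chisq.test(sweets)  # Pearson's Chi-squared test of association (Yates' continuity correction is applied by default for 2 x 2 tables)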
From the result, we observe a significant difference between boys and girls in their preferred sweets.
Reporting a Chi-squared test
A Chi-squared test for association was conducted comparing boys and girls on their choice of favoured sweet between Haribo and M&Ms. There was a significant association between the gender of the child and the sweet chosen, χ2(1)=5.01, p=0.03.