All data in the exercises are simulated fake data.
Numeric: a <- 0.1, a <- 5.
Character string: a <- "hello world", a <- "3".
Vector: a <- c(). For example, after fruits <- c("banana", "apple", "orange"), fruits is a vector of strings.
Factor: a <- factor(). For example, after music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz")), music_genre is a factor with 4 levels: Jazz, Rock, Classic, and Pop (duplicate values are ignored). It’s good practice to convert categorical independent variables to factors before running statistical analysis.
List: a <- list(). For example, after edibles <- list(c("banana", "apple", "orange"), c("tomato", "potato", "spinach"), c("rice", "bread", "noodle")), edibles is a list of vectors.
Matrix: a <- matrix(). For example, after thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2), thismatrix is a matrix with 3 rows and 2 columns.
Array: a <- array(). For example, after thisarray <- array(1:24), thisarray is a one-dimensional array.
Data frame: a <- data.frame(). For example, after df <- data.frame(Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120), Duration = c(60, 30, 45)), df is a data frame with the columns Training, Pulse, and Duration and three rows of data.

Suppose I want to know whether UCL students’ IQs are significantly different from the average (100). (Fun fact: IQ tests are specifically designed so that the average person scores 100.) I randomly picked 20 students at UCL and asked them to do an IQ test.
Here are their scores: 117 125 116 113 102 122 123 93 99 129 132 94 85 92 101 83 119 119 101 97 (mean = 108.1, s.d.=15.05).
Run a one-sample t-test with t.test(data, mu = 100).
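As a minimal sketch of how the one-sample test on the scores above might be run: the vector name data follows the t.test() call in the exercise, while the object name result is only an illustrative choice.

```r
# IQ scores of the 20 sampled students, as listed in the exercise
data <- c(117, 125, 116, 113, 102, 122, 123, 93, 99, 129,
          132, 94, 85, 92, 101, 83, 119, 119, 101, 97)

# Descriptive statistics: sample mean and sample standard deviation
mean(data)
sd(data)

# One-sample t-test against the population average of 100
# (two-sided by default; `result` is an illustrative name)
result <- t.test(data, mu = 100)
print(result)
```

The printed output reports the t statistic, the degrees of freedom (n - 1 = 19), the p-value, and a 95% confidence interval for the mean.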
I’m interested in whether the frequency of the word ‘museum’ differs between humanities journal papers and blog articles on similar topics. I randomly accessed 10 corpora of humanities journal papers and calculated the count of the word ‘museum’ per million words in each corpus. I then did the same with humanities blog articles.
Frequencies of ‘museum’ in each journal paper corpus (count per million): 76 67 66 72 74 64 72 70 74 78
Frequencies of ‘museum’ in each blog article corpus (count per million): 66 63 68 68 60 67 58 70 67 66
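The two sets of frequencies above could be arranged and explored roughly as follows. This is only a sketch: the column names text_type and frequency are illustrative, the psych package is assumed to be installed (the call is skipped otherwise), and the Welch two-sample t-test at the end is an assumption about the intended analysis.

```r
# Counts of 'museum' per million words, copied from the exercise
journal <- c(76, 67, 66, 72, 74, 64, 72, 70, 74, 78)
blog    <- c(66, 63, 68, 68, 60, 67, 58, 70, 67, 66)

# Long-format data frame: one row per corpus
# (column names are illustrative, not prescribed by the exercise)
museum <- data.frame(
  text_type = factor(rep(c("journal", "blog"), each = 10)),
  frequency = c(journal, blog)
)

# Boxplot of frequency by text type
boxplot(frequency ~ text_type, data = museum,
        ylab = "Count of 'museum' per million words")

# Descriptive statistics by group, if the psych package is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(museum$frequency, group = museum$text_type))
}

# Independent two-sample t-test (Welch's version is R's default)
result <- t.test(frequency ~ text_type, data = museum)
print(result)
```

The long format, with one column for the grouping factor and one for the measurement, is what the formula interface of boxplot() and t.test() expects.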
Put the data into a data frame with data.frame().
Create a boxplot with boxplot() (or any other method you know for creating a boxplot in R).
Get descriptive statistics for each group with describeBy() from the psych package.

An experimental syntactician conducted an experiment on what’s called the that-trace phenomenon. Thirty participants were presented with sentences such as (1) and (2) and asked to judge each sentence’s grammaticality on a 7-point scale (1 = totally ungrammatical; 7 = totally grammatical).
At the end, the researcher calculated the average score given to ‘that’ sentences and ‘no-that’ sentences for each participant (so that each person has one score in the ‘that’ condition and one score in the ‘no-that’ condition). The researcher now wants to know whether there is a significant difference between the grammaticality of ‘that’ sentences and ‘no-that’ sentences.
Load the data with data <- read.csv('that-trace.csv', header = TRUE).
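Because each participant contributes a score in both conditions, a paired t-test is one natural choice here. The sketch below does not read that-trace.csv; it simulates a stand-in data frame, and the column names participant, that, and no_that are assumptions that may not match the real file.

```r
set.seed(1)

# Simulated stand-in for that-trace.csv.
# The column names are hypothetical; check names(data) on the real file.
data <- data.frame(
  participant = 1:30,
  that        = runif(30, min = 1, max = 7),  # mean rating, 'that' condition
  no_that     = runif(30, min = 1, max = 7)   # mean rating, 'no-that' condition
)

# Paired t-test: each participant is compared with themselves
result <- t.test(data$that, data$no_that, paired = TRUE)
print(result)
```

With the real data, only the simulated data frame would be replaced by the read.csv() call, with the column names adjusted to those in the file.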
During conversation, people often change the way they speak to sound more like each other; this phenomenon is called phonetic convergence.
A researcher recorded some spontaneous conversations in English. These conversations were grouped by interlocutor language distance: Close (native Southern accent - native Southern accent); Intermediate (native Southern accent - native Northern accent); and Far (L2 English - L2 English, with different L1s). Each group had 15 conversations. All conversations were from different speakers.
(Note: The example is adapted from Kim, M., Horton, W. S., & Bradlow, A. R. (2011). Phonetic convergence in spontaneous conversations as a function of interlocutor language distance. The data used in this exercise are randomly generated fake data, and conclusions may differ from the real experiment.)
Phonetic convergence is evaluated based on the change in speech similarity: how close speaker A’s speech is to speaker B’s speech at the end of the conversation compared with the beginning. In other words, the data are convergence scores: the similarity score at the end minus the similarity score at the beginning. The researcher is interested in whether interlocutor language distance affects phonetic convergence, that is, whether the mean convergence scores of the three groups differ from one another.
Load the data with data <- read.csv('interlocutor.csv', header = TRUE).
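Comparing the means of three independent groups is commonly done with a one-way ANOVA. The sketch below uses a simulated stand-in instead of interlocutor.csv; the column names distance and convergence are assumptions, and the Tukey follow-up is one possible post-hoc choice, not something the exercise specifies.

```r
set.seed(42)

# Simulated stand-in for interlocutor.csv.
# The column names are hypothetical; check names(data) on the real file.
data <- data.frame(
  distance = factor(rep(c("Close", "Intermediate", "Far"), each = 15),
                    levels = c("Close", "Intermediate", "Far")),
  convergence = rnorm(45, mean = 0, sd = 1)
)

# Mean convergence score per group
print(aggregate(convergence ~ distance, data = data, FUN = mean))

# One-way ANOVA: does mean convergence differ across the three groups?
model <- aov(convergence ~ distance, data = data)
anova_table <- summary(model)
print(anova_table)

# Pairwise comparisons between groups (Tukey's HSD)
print(TukeyHSD(model))
```

Converting distance to a factor before fitting matters: aov() treats it as a grouping variable with three levels, which gives 2 and 42 degrees of freedom for this design.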