Before the exercises: Object types in R

Numerics: numbers. a <- 0.1, a <- 5.
Characters: strings. a <- "hello world", a <- "3".
Vectors: a sequence of items that are all of the same type. a <- c(). For example, fruits <- c("banana", "apple", "orange"), fruits is a vector of strings.
Factors: used to categorise data. a <- factor(). For example, music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz")), music_genre is a factor containing all 8 values, with 4 levels: Classic, Jazz, Pop, and Rock (each distinct value becomes one level, and by default the levels are sorted alphabetically). It’s good practice to convert categorical independent variables to factors before running statistical analyses.
Lists: an ordered collection of elements that can be of different types and lengths, and that can be modified. a <- list(). For example, edibles <- list(c("banana", "apple", "orange"), c("tomato", "potato", "spinach"), c("rice", "bread", "noodle")), edibles is a list of vectors.
Matrices: two-dimensional objects with rows and columns, whose elements all have the same type. a <- matrix(). For example, thismatrix <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2), thismatrix is a matrix with 3 rows and 2 columns (filled column by column).
Arrays: unlike matrices, arrays can have one, two, or more than two dimensions. a <- array(). For example, thisarray <- array(1:24), thisarray is a one-dimensional array; array(1:24, dim = c(2, 3, 4)) would instead create a three-dimensional array.
Data frames: data displayed in a table-like format, where different columns can hold different types. a <- data.frame(). For example, df <- data.frame(Training = c("Strength", "Stamina", "Other"), Pulse = c(100, 150, 120), Duration = c(60, 30, 45)), df is a data frame with columns Training, Pulse, and Duration, with three rows of data.
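To see these object types side by side, the short script below creates one example of each and inspects it with class(), levels(), dim() and str(). It is a sketch for exploring in the console; the names b, mixed and cube are made up for illustration and are not used in the exercises.

```r
# Numerics and characters
a <- 0.1
b <- "3"             # a character string, not a number
class(a)             # "numeric"
class(b)             # "character"

# Vector: every element has the same type
fruits <- c("banana", "apple", "orange")
length(fruits)       # 3

# Factor: levels are the distinct values, sorted alphabetically by default
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic",
                        "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)  # "Classic" "Jazz" "Pop" "Rock"
table(music_genre)   # how many times each level occurs

# List: elements may differ in type and length
mixed <- list(fruits, 42, TRUE)
str(mixed)

# Matrix: filled column by column by default
thismatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2)
dim(thismatrix)      # 3 2
thismatrix[2, 1]     # row 2, column 1: 2

# Array with three dimensions: 2 rows x 3 columns x 4 layers
cube <- array(1:24, dim = c(2, 3, 4))
dim(cube)            # 2 3 4

# Data frame: columns may have different types but must have equal lengths
df <- data.frame(Training = c("Strength", "Stamina", "Other"),
                 Pulse = c(100, 150, 120),
                 Duration = c(60, 30, 45))
str(df)
```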

Exercise 1: IQ test

Suppose I want to know whether UCL students’ IQs are significantly different from the average (100). (Fun fact: IQ tests are specifically designed such that the average person would receive a score of 100.) I randomly picked 20 students at UCL and asked them to do an IQ test.

Here are their scores: 117 125 116 113 102 122 123 93 99 129 132 94 85 92 101 83 119 119 101 97 (mean = 108.1, s.d. = 15.05).

Decide which test statistic to use. (This wasn’t officially part of the lecture, but it should be the one-sample t-test. The idea is very simple: is the mean of a group of values significantly different from a specific number?)
Decide which object type to use to store this collection of data, and save the data as an object. Choose between strings, vectors, and lists. Hint: We might not have talked about this in class. Try finding the answer in the R documentation (an important skill to learn!).
Calculate the mean and the standard deviation (in R).
Check whether the data are normally distributed.
Run the one-sample t-test. Hint: t.test(data, mu = 100).
Are UCL students’ IQs significantly different from the average?
Report your findings in a short paragraph.
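One possible way to work through steps 2 to 5 is sketched below. It assumes the scores are stored in a numeric vector called iq (an arbitrary name) and uses a Q-Q plot together with the Shapiro-Wilk test as the normality check; other checks, such as a histogram, would also be reasonable.

```r
# IQ scores of the 20 sampled UCL students, stored as a numeric vector
iq <- c(117, 125, 116, 113, 102, 122, 123, 93, 99, 129,
        132, 94, 85, 92, 101, 83, 119, 119, 101, 97)

mean(iq)   # 108.1
sd(iq)     # about 15.05 (sample SD, n - 1 denominator)

# Normality check: Q-Q plot plus Shapiro-Wilk test
qqnorm(iq)
qqline(iq)
shapiro.test(iq)   # p above .05 means no evidence against normality

# One-sample t-test against the population average of 100
result <- t.test(iq, mu = 100)
result             # t statistic, df, p-value and 95% confidence interval
```

With only 20 observations the Shapiro-Wilk test has limited power, so it is worth looking at the Q-Q plot as well rather than relying on the p-value alone.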

Exercise 2: Word frequency in two types of writings

I’m interested in whether the frequency of the word ‘museum’ differs between humanities journal papers and blog articles on similar topics. I randomly accessed 10 corpora of humanities journal papers and calculated the count of the word ‘museum’ per million words in each corpus. I then did the same with humanities blog articles.

Frequencies of ‘museum’ in each journal paper corpus (count per million): 76 67 66 72 74 64 72 70 74 78

Frequencies of ‘museum’ in each blog article corpus (count per million): 66 63 68 68 60 67 58 70 67 66

Decide which test to use.
Make a data frame with the data:
- think about what form (long or wide) of data you need and visualise what the data frame should look like.
- save the variables as vectors.
- make a data frame using the vectors with data.frame().
Visualise your data using boxplot() (or any other method you know for creating a boxplot in R).
Get the descriptive statistics (means and standard deviations) using describeBy() from the psych package.
Check whether your data meet the assumptions of your chosen statistical test.
Run the test.
Is there a significant difference between the frequency of ‘museum’ in journal papers and blog articles?
Report your findings in a short paragraph.
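A sketch of steps 2 to 7 follows, assuming an independent-samples t-test is the chosen test (the two sets of corpora are unrelated). The column names type and freq are my own choice. The data are arranged in long format: one row per corpus, with a grouping column and a value column, which is the shape that the boxplot() and t.test() formula interfaces expect. describeBy() is only called if the psych package is installed.

```r
# The two samples as vectors
journal <- c(76, 67, 66, 72, 74, 64, 72, 70, 74, 78)
blog    <- c(66, 63, 68, 68, 60, 67, 58, 70, 67, 66)

# Long format: one row per corpus, plus a column saying which type it is
museum <- data.frame(
  type = factor(rep(c("journal", "blog"), each = 10)),
  freq = c(journal, blog)
)

boxplot(freq ~ type, data = museum, ylab = "'museum' per million words")

# Descriptive statistics by group (only if psych is installed)
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(museum$freq, group = museum$type))
}

# Assumptions: normality within each group and equal variances
by_type <- split(museum$freq, museum$type)
lapply(by_type, shapiro.test)
var.test(freq ~ type, data = museum)

# Independent-samples t-test (two separate groups of corpora)
result2 <- t.test(freq ~ type, data = museum, var.equal = TRUE)
result2
```

Setting var.equal = TRUE gives the classic Student's t-test and is only appropriate if var.test() finds no significant difference in variances; leaving it out gives R's default Welch test, which does not assume equal variances.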

Exercise 3: Grammaticality judgment

An experimental syntactician conducted an experiment on what’s called the that-trace phenomenon. Thirty participants were presented with sentences such as (1) and (2). They were asked to judge the sentences’ grammaticality on a 7-point scale (1 = totally ungrammatical; 7 = totally grammatical).

(1) Who do you think that will hire Mary? (‘that’ condition)
(2) Who do you think will hire Mary? (‘no-that’ condition)

At the end, the researcher calculated each participant’s average score for the ‘that’ sentences and for the ‘no-that’ sentences (so that each person has one score in the ‘that’ condition and one score in the ‘no-that’ condition). The researcher now wants to know whether there is a significant difference between the grammaticality of ‘that’ sentences and ‘no-that’ sentences.

Download the data here.
Read the data into a data frame by running data <- read.csv('that-trace.csv', header = TRUE).
Visualise the data using a boxplot and get the descriptive statistics.
Take a look at the data and decide which statistical test to use to answer the research question.
Check whether the data meet the assumptions of your chosen test.
Run the test. Is there a significant difference?
Report your findings in a short paragraph.
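Because each participant rates both conditions, the two sets of scores are related, which points towards a paired comparison. The sketch below wraps the checks and a paired t-test in a helper function, paired_analysis(), which is a hypothetical name. It assumes that-trace.csv has one row per participant with one column per condition; the column names that and no_that in the commented lines are placeholders, so check str(data) and substitute the real names.

```r
# Hypothetical helper: each participant gives one 'that' and one 'no-that' score
paired_analysis <- function(that_scores, no_that_scores) {
  stopifnot(length(that_scores) == length(no_that_scores))
  differences <- that_scores - no_that_scores
  list(
    means     = c(that = mean(that_scores), no_that = mean(no_that_scores)),
    sds       = c(that = sd(that_scores),   no_that = sd(no_that_scores)),
    # The paired t-test assumes the differences are roughly normal
    normality = shapiro.test(differences),
    t_test    = t.test(that_scores, no_that_scores, paired = TRUE)
  )
}

if (file.exists("that-trace.csv")) {
  data <- read.csv("that-trace.csv", header = TRUE)
  str(data)  # check the real column names first
  # Hypothetical column names 'that' and 'no_that': replace with the real ones
  # boxplot(data$that, data$no_that, names = c("that", "no that"))
  # paired_analysis(data$that, data$no_that)
}
```

If the Shapiro-Wilk test on the differences suggests they are not normal, the Wilcoxon signed-rank test, wilcox.test(that_scores, no_that_scores, paired = TRUE), is the usual non-parametric alternative.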

Exercise 4: Effect of interlocutor language distance on perceived phonetic convergence

During conversation, people often change the way they speak to sound more like each other; this phenomenon is called phonetic convergence.

A researcher recorded some spontaneous conversations in English. (The example is adapted from Kim, M., Horton, W. S., & Bradlow, A. R. (2011). Phonetic convergence in spontaneous conversations as a function of interlocutor language distance. Note that the data used in this exercise are randomly generated fake data, and conclusions may differ from those of the real experiment.) These conversations were grouped by interlocutor language distance: Close (native Southern accent - native Southern accent); Intermediate (native Southern accent - native Northern accent); and Far (L2 English - L2 English, with different L1s). Each group had 15 conversations. All conversations involved different speakers, so no speaker appears in more than one conversation.

Phonetic convergence is evaluated as the change in speech similarity: how close speaker A’s speech is to speaker B’s speech at the end of the conversation, compared with the beginning. In other words, each data point is a convergence score: similarity at the end minus similarity at the beginning. The researcher is interested in whether interlocutor language distance affects phonetic convergence, that is, whether the mean convergence scores of the three groups differ from one another.

Download the data here.
Read the data into a data frame by running data <- read.csv('interlocutor.csv', header = TRUE).
Visualise the data using a boxplot and get descriptive statistics.
Take a look at the data and decide which statistical test to use.
Check whether the data meet the assumptions and run the test.
Does interlocutor language distance affect phonetic convergence? Report your findings in a short paragraph.
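With three independent groups, a one-way between-subjects ANOVA is the usual choice. The sketch below fits it with aov(), checks normality of the residuals with shapiro.test() and homogeneity of variance with bartlett.test() (Levene's test from the car package is a common alternative), and runs Tukey's HSD to see which pairs of groups differ. The function name anova_analysis() and the column names score and distance are hypothetical placeholders; check str(data) for the real ones.

```r
# Hypothetical helper: one-way ANOVA on convergence scores by language distance
anova_analysis <- function(score, group) {
  d <- data.frame(score = score, group = factor(group))
  fit <- aov(score ~ group, data = d)
  list(
    descriptives = aggregate(score ~ group, data = d,
                             FUN = function(x) c(mean = mean(x), sd = sd(x))),
    normality = shapiro.test(residuals(fit)),           # residuals should look normal
    equal_var = bartlett.test(score ~ group, data = d), # homogeneity of variance
    anova     = summary(fit),                           # overall F-test
    posthoc   = TukeyHSD(fit)                           # pairwise group comparisons
  )
}

if (file.exists("interlocutor.csv")) {
  data <- read.csv("interlocutor.csv", header = TRUE)
  str(data)  # check the real column names first
  # Hypothetical column names 'score' and 'distance': replace with the real ones
  # boxplot(score ~ distance, data = data)
  # anova_analysis(data$score, data$distance)
}
```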

Week 3 Lab sheet

Yiling Huo

2024-05-14
