Lab Set 3

Ike Cai

2024-10-04

1 Lab Set 3

1.1 Dependency parsing

1.1.1 Task 1

From the keyness table (acad_v_fic), identify the preposition with the highest keyness value.
Report the relevant statistics (including frequencies and dispersions) following the conventions
described in Brezina.

library(tidyverse)

-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --

v dplyr 1.1.4 v readr 2.1.5
v forcats 1.0.0 v stringr 1.5.1
v ggplot2 3.5.1 v tibble 3.2.1
v lubridate 1.9.3 v tidyr 1.3.1
v purrr 1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to becom

library(quanteda)

Package version: 4.0.2
Unicode version: 14.0
ICU version: 71.1
Parallel computing: disabled
See https://quanteda.io for tutorials and examples.

library(quanteda.textstats)
library(udpipe)
library(gt)

source("../R/keyness_functions.R")
source("../R/helper_functions.R")

load("../data/sample_corpus.rda")

set.seed(123)
sub_corpus <- sample_corpus %>%
  mutate(text_type = str_extract(doc_id, "^[a-z]+")) %>%
  group_by(text_type) %>%
  sample_n(5) %>%
  ungroup() %>%
  dplyr::select(doc_id, text)

corpus_split <- split(sub_corpus, seq(1, nrow(sub_corpus), by = 10))

library(future.apply)

Loading required package: future

ncores <- 4L
plan(multisession, workers = ncores)

annotate_splits <- function(corpus_text) {
  ud_model <- udpipe_load_model("../models/english-ewt-ud-2.5-191206.udpipe")
  x <- data.table::as.data.table(udpipe_annotate(ud_model,
                                                 x = corpus_text$text,
                                                 doc_id = corpus_text$doc_id))
  return(x)
}

annotation <- future_lapply(corpus_split, annotate_splits, future.seed = T)

annotation <- data.table::rbindlist(annotation)

anno_edit <- annotation %>%
  dplyr::select(doc_id, sentence_id, token_id, token, lemma, upos, xpos,
                head_token_id, dep_rel) %>%
  rename(pos = upos, tag = xpos)

anno_edit <- structure(anno_edit, class = c("spacyr_parsed", "data.frame"))

sub_tkns <- as.tokens(anno_edit, include_pos = "tag", concatenator = "_")

doc_categories <- names(sub_tkns) %>%
  data.frame(text_type = .) %>%
  mutate(text_type = str_extract(text_type, "^[a-z]+"))

docvars(sub_tkns) <- doc_categories

sub_dfm <- dfm(sub_tkns)

sub_dfm <- sub_tkns %>%
  tokens_select("^.*[a-zA-Z0-9]+.*_[a-z]", selection = "keep",
                valuetype = "regex", case_insensitive = T) %>%
  dfm()

acad_dfm <- dfm_subset(sub_dfm, text_type == "acad") %>%
  dfm_trim(min_termfreq = 1)
fic_dfm <- dfm_subset(sub_dfm, text_type == "fic") %>%
  dfm_trim(min_termfreq = 1)

acad_v_fic <- keyness_table(acad_dfm, fic_dfm) %>%
  separate(col = Token, into = c("Token", "Tag"), sep = "_")

Token     AF_Tar  AF_Ref  Per_10.4_Tar  Per_10.4_Ref  DP_Tar  DP_Ref
of           563     159        445.02        121.08   0.094   0.067
by            96      30         75.88         22.84   0.188   0.165
as            94      35         74.30         26.65   0.084   0.401
than          37      11         29.25          8.38   0.275   0.163
per           14       1         11.07          0.76   0.605   0.800
in           258     191        203.94        145.45   0.060   0.094
among         15       2         11.86          1.52   0.405   0.800
near           5       0          3.95          0.00   0.798      NA
between       17       6         13.44          4.57   0.333   0.269
for          120      92         94.85         70.06   0.217   0.090
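
The DP_Tar and DP_Ref columns report dispersion. As a rough illustration, and assuming these columns are Gries's deviation of proportions (the internals of keyness_table() are not shown here), the statistic can be sketched in base R as follows. The function name dp() and the toy counts are hypothetical.

```r
# Deviation of proportions (DP): half the summed absolute difference between
# the share of a word's occurrences found in each text and the share of the
# corpus that each text makes up. 0 = perfectly even; values near 1 = clumped.
dp <- function(counts, part_sizes) {
  observed <- counts / sum(counts)          # where the word actually occurs
  expected <- part_sizes / sum(part_sizes)  # where it would occur if spread evenly
  0.5 * sum(abs(observed - expected))
}

# Hypothetical corpus of five equally sized texts
part_sizes <- rep(2500, 5)

dp_even    <- dp(c(10, 10, 10, 10, 10), part_sizes)  # same count in every text
dp_clumped <- dp(c(50, 0, 0, 0, 0), part_sizes)      # all occurrences in one text

print(c(even = dp_even, clumped = dp_clumped))
```

Under this reading, a word that occurs in only one of five equally sized texts gets a DP of 0.8, and a word spread evenly gets 0.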

Your response: “of” is the preposition with the highest keyness value for academic
writing (the target corpus). It occurs 563 times in the target corpus (AF_Tar) against
159 times in the fiction reference corpus (AF_Ref). Its relative frequency is 445.02
per 10,000 tokens in the target corpus (Per_10.4_Tar) against 121.08 in the reference
corpus (Per_10.4_Ref), roughly 3.7 times higher. Its dispersion values are close to 0
in both corpora (DP_Tar = 0.094, DP_Ref = 0.067), which indicates that “of” is spread
evenly across the texts rather than concentrated in a few of them. By comparison, “by”
is less evenly dispersed in the target corpus (DP_Tar = 0.188), and “in” is evenly
dispersed (DP_Tar = 0.060) but differs far less between the two corpora (203.94 vs.
145.45 per 10,000 tokens).
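
The figures for “of” can be checked by hand. The sketch below assumes that keyness is measured with the log-likelihood statistic in its common two-term form, and it recovers the two corpus sizes from the reported counts and per-10,000 rates. Both are assumptions, since the internals of keyness_table() are not shown here, and the variable names are illustrative only.

```r
# Frequencies of "of" taken from the keyness table
af_tar <- 563
af_ref <- 159
per_tar <- 445.02   # per 10,000 tokens, target (academic)
per_ref <- 121.08   # per 10,000 tokens, reference (fiction)

# Assumption: corpus sizes recovered as count / rate * 10,000
n_tar <- af_tar / per_tar * 1e4
n_ref <- af_ref / per_ref * 1e4

# Expected frequencies if "of" were equally common in both corpora
e_tar <- n_tar * (af_tar + af_ref) / (n_tar + n_ref)
e_ref <- n_ref * (af_tar + af_ref) / (n_tar + n_ref)

# Two-term log-likelihood (assumed form of the keyness statistic)
ll <- 2 * (af_tar * log(af_tar / e_tar) + af_ref * log(af_ref / e_ref))

# Log ratio effect size: each unit is a doubling of relative frequency
log_ratio <- log2(per_tar / per_ref)

print(c(LL = ll, log_ratio = log_ratio))
```

A log-likelihood above 3.84 corresponds to p < 0.05 with one degree of freedom, so the value here can be compared against that threshold.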

Create a KWIC table (with the preposition as the node word and a context window of 3) of
10 rows.

preposition <- kwic(sub_tkns, pattern = "of_IN", window = 3)

preposition |>
  head(10) |>
  gt() |>
  cols_hide(columns = c('keyword', 'pattern')) |>
  as_raw_html()

docname  from  to   pre                                post
acad_31  24    24   revising_VBG the_DT picture_NN     how_WRB they_PRP lived_VBD
acad_31  53    53   a_DT new_JJ category_NN            reptiles_NNS ._. Today_NN
acad_31  62    62   familiar_JJ to_IN people_NNS       all_DT ages_NNS ,_,
acad_31  129   129  the_DT American_NNP Museum_NNP     Natural_NNP History_NNP ,_,
acad_31  154   154  complete_JJ fossil_NN skeleton_IN  a_DT sauropod_NN to_TO
acad_31  167   167  An_DT 1891_CD reconstruction_NN    B._NNP excelsus_NNP by_IN
acad_31  182   182  at_IN the_DT top_NN                the_DT page_NN ._.
acad_31  201   201  under_IN the_DT direction_NN       the_DT paleontologist_NN Edward_NNP
acad_31  230   230  ”_'' 1910_CD reconstruction_NN     Diplodocus_NNP ,_, by_IN
acad_31  241   241  followed_VBD the_DT views_NNS      the_DT paleontologist_NN Oliver_NNP

Posit an explanation for the higher frequency of the preposition in the target corpus (academic
writing) vs. the reference corpus (fiction).

Your response: There are a few likely reasons for the higher frequency of “of” in
academic writing compared to fiction. First, academic writing often employs more
complex structures, and “of” links a head noun to the phrase that modifies it
(e.g., “the picture of how they lived,” “the direction of the paleontologist”).
Second, academic writing includes detailed explanations and descriptions that
require prepositions to specify the relationships between concepts (e.g., “a new
category of reptiles,” “complete fossil skeleton of a sauropod”). Lastly, references
to institutions, studies, and attributed work are often built with “of” (e.g., “the
American Museum of Natural History,” “An 1891 reconstruction of B. excelsus”). Fiction,
on the other hand, typically focuses on narrative flow and dialogue, which may require
fewer such phrases, leading to the lower frequency.

1.1.2 Task 2

Calculate the mean length of the noun phrases in the two text types (acad_nps and
fic_nps). Report the results and posit an explanation for the findings that connects to
the previous findings related to prepositions.

library(dplyr)
library(stringr)

extract_noun_phrases <- function(annotation_data) {
  # For each head token, count its dependents that carry an NP-related relation.
  # Note: UD v2 models label direct objects "obj", so "dobj" may match nothing.
  np_candidates <- annotation_data %>%
    filter(dep_rel %in% c("det", "amod", "compound", "nmod", "nsubj", "dobj")) %>%
    group_by(doc_id, sentence_id, head_token_id) %>%
    summarise(np_length = n())

  return(np_candidates)
}

np_acad <- extract_noun_phrases(filter(anno_edit, str_detect(doc_id, "^acad")))

`summarise()` has grouped output by 'doc_id', 'sentence_id'. You can override
using the `.groups` argument.

np_fic <- extract_noun_phrases(filter(anno_edit, str_detect(doc_id, "^fic")))

`summarise()` has grouped output by 'doc_id', 'sentence_id'. You can override
using the `.groups` argument.

mean_np_length_acad <- np_acad %>% summarise(mean_length = mean(np_length))

`summarise()` has grouped output by 'doc_id'. You can override using the
`.groups` argument.

mean_np_length_fic <- np_fic %>% summarise(mean_length = mean(np_length))

`summarise()` has grouped output by 'doc_id'. You can override using the
`.groups` argument.

mean(mean_np_length_acad$mean_length)

[1] 1.483593

mean(mean_np_length_fic$mean_length)

[1] 1.21275

Your response: The mean noun phrase length is approximately 1.48 for academic texts
and approximately 1.21 for fiction texts, where length is measured as the number of
dependents (determiners, adjectival modifiers, compounds, nominal modifiers, and so
on) attached to each head. The longer mean noun phrase length in academic texts
reflects the more complex and structured language typically used in such writing.
Academic writing tends to employ more detailed and elaborate noun phrases (e.g.,
multiple adjectives, compounds) to provide precision and clarity. In contrast, fiction
tends to use simpler noun phrases to maintain narrative flow and accessibility. This
finding complements the observation that certain prepositions (such as “of”) are more
frequent in academic writing, as they often serve to link these longer, more complex
noun phrases.
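
To make the length measure concrete, here is a small base-R sketch of the same counting logic applied to one hand-annotated phrase. The toy data frame and its dependency labels are hypothetical and were written by hand for illustration; they are not output from the udpipe model.

```r
# Hand-built toy annotation of "the complete fossil skeleton of sauropods".
# The head noun "skeleton" is token 4; "sauropods" attaches to it as nmod.
toy <- data.frame(
  doc_id        = "toy_01",
  sentence_id   = 1L,
  token_id      = as.character(1:6),
  token         = c("the", "complete", "fossil", "skeleton", "of", "sauropods"),
  head_token_id = c("4", "4", "4", "0", "6", "4"),
  dep_rel       = c("det", "amod", "compound", "root", "case", "nmod"),
  stringsAsFactors = FALSE
)

# Same relation set as in extract_noun_phrases()
np_rels <- c("det", "amod", "compound", "nmod", "nsubj", "dobj")
kept <- toy[toy$dep_rel %in% np_rels, ]

# Count the retained dependents per head token
np_len <- aggregate(token ~ doc_id + sentence_id + head_token_id,
                    data = kept, FUN = length)
names(np_len)[names(np_len) == "token"] <- "np_length"

print(np_len)
```

In this toy case the prepositional phrase adds one nmod dependent to the head noun, which is how more “of” phrases can translate into longer noun phrases under this measure.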

1.2 Logistic regression

1.2.1 Task 1

Following the example in Brezina (pg. 129), report and briefly interpret the output of the
regression model (wt_regs, line 245).

Your response: The context type was a significant predictor of the relative pronoun
used (that/which). Entry of the predictors into the model significantly improved
model fit: the deviance fell from 8764 (null) to 8135 (residual), a reduction of 629
(AIC = 8143). The model also has strong predictive capabilities, as indicated by the
coefficients table. As can be seen from the coefficients table, the use of a comma
significantly increases the odds of using which (OR = 4.89, 95% CI [1.46, 5.77])
compared to the absence of a comma. Being a student from the UK also positively
affects the choice of pronoun (OR = 1.43, 95% CI [1.29, 1.57]). In contrast, speaker
status had a negligible effect (OR = 1.02, 95% CI [0.89, 1.16]); because the interval
includes 1, it indicates no significant influence on the choice between that and which.
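
For readers who want to see where odds ratios and their intervals come from, the following is a minimal sketch on simulated data. It does not reproduce wt_regs: the variables comma and which_used, the sample size, and the effect size are all made up for illustration.

```r
set.seed(1)

# Simulated data: a binary context (comma present or not) and a binary outcome
n <- 500
comma <- rbinom(n, 1, 0.5)
which_used <- rbinom(n, 1, plogis(-1 + 1.5 * comma))  # invented effect of 1.5 log-odds

# Logistic regression: coefficients are on the log-odds scale
fit <- glm(which_used ~ comma, family = binomial)

# Exponentiate to get odds ratios and Wald 95% confidence intervals
or <- exp(coef(fit))
ci <- exp(confint.default(fit))

# Drop in deviance from the null model to the fitted model
dev_drop <- fit$null.deviance - fit$deviance

print(cbind(OR = or, ci))
print(dev_drop)
```

An odds ratio whose interval lies entirely above 1 points to increased odds of the outcome, while an interval that spans 1 gives no evidence of an effect.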

1.3 Task 2

Write a brief interpretation of the probability curves illustrated in Figure 5.

Your response: Figure 5 displays the relationship between the normalized frequencies
of hedges and boosters (denoted as hedges_norm and boosters_norm, respectively) and
the predicted probabilities of the different disciplines (Humanities, Sciences, and
Social Sciences). For Humanities, the probability of belonging to the discipline
increases smoothly from 0 to 1 as the frequency of boosters_norm rises. This
relationship follows an S-shaped (sigmoid) curve, indicating that as the use of
boosters increases, the likelihood of a text being classified as Humanities grows
steadily and then sharply, reflecting a strong association between the presence of
boosters and Humanities texts. For Sciences, the probability demonstrates a decreasing
trend, starting at 1 and declining to 0 as boosters_norm approaches a value of 3. This
suggests that higher frequencies of boosters correlate with lower probabilities of a
text being classified as Sciences. For Social Sciences, the probability shows a
bell-shaped curve that peaks around a boosters_norm value of 2. This indicates that
moderate frequencies of boosters are most characteristic of Social Sciences texts,
while both high and low frequencies are less indicative. The curves for hedges_norm
mirror those for boosters_norm, indicating that the two features work together in the
classification of Social Sciences texts. The mirrored pattern suggests that as hedges
increase, the probability for Social Sciences also increases, highlighting the nuanced
role that both hedging and boosting play in this discipline’s textual characteristics.
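
The three curve shapes described above (one rising, one falling, one bell-shaped) are what a multinomial logistic model produces from a single predictor. The base-R sketch below shows this with invented coefficients; they are not the fitted values behind Figure 5, and the helper softmax_rows() is a hypothetical name.

```r
# Convert a matrix of linear predictors (one column per class) to probabilities
softmax_rows <- function(eta) {
  e <- exp(eta - apply(eta, 1, max))  # subtract row maxima for numerical stability
  e / rowSums(e)
}

# Predictor grid, standing in for a normalized booster frequency
x <- seq(0, 4, by = 0.1)

# Invented coefficients: Sciences is the reference class (linear predictor 0)
eta <- cbind(
  Sciences       = 0 * x,
  SocialSciences = -2 + 2 * x,
  Humanities     = -8 + 4 * x
)

probs <- softmax_rows(eta)

# The class with the intermediate slope peaks in the middle of the range
peak_x <- x[which.max(probs[, "SocialSciences"])]
print(peak_x)
print(round(probs[c(1, 21, 41), ], 3))  # probabilities at x = 0, 2, 4
```

The class with the steepest slope takes over at high values, the reference class dominates at low values, and the class in between is most probable only in the middle, which gives the bell shape.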

1.4 Multi-dimensional analysis

1.4.1 Task 1

Brezina similarly plots the Brown corpus registers on pg. 169. His process is a little different.
Rather than extracting factor loadings from the Brown corpus, he uses the loadings from the
original Biber data (some of which are listed on pg. 168).
Our loadings for dimension 1 are similar to Biber’s, though with some differences. Likewise,
the resulting plot is similar to the one on pg. 169. Why is this the case, do you think? (If you
want to check Biber’s description of his corpus, it’s on pg. 66 of his book.)

Your response: The similarity between the dimension loadings found in this analysis
and those in Biber’s work could stem from the underlying structure of language use
that the Brown Corpus shares with Biber’s corpus. Both analyses focus on similar
linguistic features and text categories, leading to comparable results despite
differences in methodology and datasets. For instance, Biber’s Dimension 1, “Involved
vs. Informational Production,” separates discourse types that are present in both
datasets. The insights gained from MDA can inform various fields, including
linguistics, discourse analysis, and even artificial intelligence, where understanding
language patterns is essential.

1.4.2 Task 2

Using information from the factor loadings, the positions of the disciplines along the dimension,
and KWIC tables, name dimension 1 following the X vs. Y convention. In a couple of sentences,
explain your reasoning.

Your response: Based on Brezina’s functional interpretation of factors as dimensions
and the analysis of the KWIC tables, I would name dimension 1 “Conversational
Engagement vs. Academic Formality.” The positive end of this dimension, characterized
by features such as private verbs, contractions, and personal pronouns (as seen in the
“Confidence High” KWIC examples), reflects a style of communication that is
interactive and engaging. These features suggest a focus on personal experiences and
subjective interpretations, indicative of conversational language that aims to connect
with the audience. In contrast, the negative end, marked by features such as nouns,
prepositional phrases, and structured academic writing moves (highlighted in the
“Academic Writing Moves” KWIC examples), indicates a formal and informational approach
typical of academic discourse. This style prioritizes objectivity, structure, and
logical argumentation, aimed at conveying information rather than fostering
interpersonal engagement.
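
As a rough illustration of how texts end up at one end of a dimension or the other, the sketch below computes a dimension score in the way commonly used in multi-dimensional analysis: standardize each feature, then add the features that load positively and subtract those that load negatively. The feature counts, text names, and the split into positive and negative features are invented for the example and are not taken from the lab data.

```r
# Invented per-text feature rates for four texts
feats <- data.frame(
  private_verbs = c(18, 12, 3, 2),
  contractions  = c(25, 15, 1, 0),
  nouns         = c(150, 180, 290, 310),
  prepositions  = c(80, 95, 140, 150),
  row.names     = c("conv_1", "conv_2", "acad_1", "acad_2")
)

# Standardize each feature (z-scores) so features are on a common scale
z <- scale(feats)

# Assumed loadings: which features sit on which end of the dimension
positive <- c("private_verbs", "contractions")
negative <- c("nouns", "prepositions")

# Dimension score: sum of positive-loading z-scores minus negative-loading ones
dim1 <- rowSums(z[, positive]) - rowSums(z[, negative])

print(round(dim1, 2))
```

Texts rich in private verbs and contractions land on the positive side, while texts dense in nouns and prepositions land on the negative side, which is the contrast the proposed label describes.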
