All Merged Revised
You are asked to analyze whether increasing the advertising budget would increase sales. The
following dataset is given to you:
str(marketing)
'data.frame': 200 obs. of 4 variables:
Therefore, you plotted the advertising budget and its respective sales:
Therefore, you decided to summarize and study the relationship between the number of students and
the number of books sold by using [Answer: Simple Linear Regression]. The results came out as
follows:
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For an advertising budget equal to zero, a company may expect sales of USD 8,440.
For each dollar spent on advertising, a company could expect sales earnings of USD 47.537.
Based on the model, for a company that spent USD 1,000 on YouTube advertising, the company could
expect sales earnings of USD [Answer: 55976.112]
a. write.csv()
b. save.image()
c. save.csv()
d. save()
e. saveRDS()
3. A statistical test that checks how well observations fit their expected or theoretical
frequencies is
Select one:
a. ANOVA
c. t-test
d. Test of Independence
e. Goodness-of-Fit
4. You are interested in learning the favorite programming languages of first-year Indonesian Informatics
and/or Computer Science undergraduate students. To achieve this, you asked your
high school classmates who were admitted to the specified programs.
TRUE
a. The data are collected properly and bias is minimized Answer
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables that suit well with the research question?
gender
a boxplot
a time plot
a bar graph
a pie chart
Question 5
Question text
Select one:
a. salary
b. educational degree
c. phone number
d. city of residence
e. hair color
Question 6
Question text
1,341 undergraduate students were surveyed to learn the preferred teaching-and-learning
method of all UNSRAT students. There are three teaching-and-learning methods:
online, offline, or blended. The answers were then tabulated, and the frequency of each method is
presented in the report. Match each item/condition from the example above with the right term!
Statistics frequency
Answer 1
Question 7
Question text
a. Data distribution
b. Central tendency
c. Data symmetricity
d. Distribution gap
e. Outliers
Question 8
Question text
You are assigned to study whether there is a relationship between video game publishers and the video
game genres. You have a dataset with the following structure:
str(vgs)
'data.frame': 11857 obs. of 11 variables:
$ Rank : int 1 3 4 8 11 12 14 15 16 17 ...
$ Name : Factor w/ 8427 levels ".hack: Sekai no Mukou ni + Versus",..: 8048 4013 8049 8046
5006 4012 8042 8043 3598 2681 ...
$ Platform : Factor w/ 10 levels "DS","PC","PS",..: 8 8 8 8 1 1 8 8 9 5 ...
$ Year : Factor w/ 29 levels "1985","1988",..: 16 18 19 16 15 15 17 19 20 23 ...
$ Genre : Factor w/ 10 levels "Action","Adventure",..: 9 5 9 4 8 5 9 9 4 1 ...
$ Publisher : Factor w/ 467 levels "10TACLE Studios",..: 297 297 297 297 297 297 297 297 266
402 ...
$ NA_Sales : num 41.49 15.85 15.75 14.03 9.07 ...
$ EU_Sales : num 29 12.9 11 9.2 11 ...
$ JP_Sales : num 3.77 3.79 3.28 2.93 1.93 4.13 3.6 2.53 0.24 0.97 ...
$ Other_Sales : num 8.46 3.31 2.96 2.85 2.75 1.92 2.15 1.79 1.67 4.14 ...
$ Global_Sales: num 82.7 35.8 33 29 24.8 ...
You explored the data by making a barplot that shows the grouped distribution, and it came out as follows:
To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
Genre
PS4 122 19 17 15 17 47 34 5
Genre
DS 148 79
PC 49 188
PS 222 70
PS2 400 71
PS3 213 24
PS4 43 5
PSP 135 60
Wii 261 25
X360 220 28
XB 170 21
data: [HIDDEN]
All platforms share the same amount of video game publications on each genre.
There are certain video game genres that commonly published for specific platforms.
There is a significant relationship between video-game platform and the video-game genres.
Question 9
Question text
The following table contains a subset of the results from a survey about how the first year UNSRAT
undergraduate students access e-Learning.
STU003 law NA
Match the item/condition from the example above with the right term!
Element STU001
Answer 1
Variable access_mean
Answer 3
Question 10
Question text
Background
You are assigned to analyze a dataset that contains the scores of students in a class. There are 3
quizzes given to them. Is there any difference in scores between the quizzes?
Data Exploration
The structure of the dataset is as follow
str(qrt)
'data.frame': 105 obs. of 3 variables:
$ student: Factor w/ 35 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a [Answer: boxplot], and the result came out as follows:
Since you were comparing scores in three different quizzes with the same participants (students),
you need the proper method. Therefore, you need to decide which method to use. So you start by
checking the [Answer: distribution normality] of the performance score in each group by using
the [Answer: Shapiro-Wilk Test]. The p-value for each tested group is
shown in the following table:
quiz p-value
1 0.001545939
2 0.016061633
3 0.003481896
Statistical Tests
Based on it, you decide to use a [Answer: non-parametric method] to find whether there is any
difference between those quizzes. Due to the nature of the problem, you then ran
a [Answer: Friedman Test], and the result is as follows:
1 2
2 0.00060 -
3 2.1e-05 0.00046
Students tend to achieve higher scores in Test 3, followed with Test 2, yet the differences are not
significant
Scores in Test 1 and 3 are significantly different, but Tests 1 and 2 are not, and so with Tests 2 and 3
Students tend to achieve significantly higher scores in Test 3, followed with Test 2. Test 1 scores are
Statistics Percentage
Answer 2
Data Exploration
The structure of the dataset is as follow
str(performance)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a [Answer: boxplot], and the result came out as follows:
Since you were comparing a variable in three different groups, you need the proper method.
Therefore, you need to decide which method to use. So you start by checking
the [Answer: distribution normality] of the performance score in each group by using
the [Answer: Shapiro-Wilk Test].
The p-value for each tested group is shown in the following table:
stress level p-value
low 0.11428304
moderate 0.07023834
high 0.92983350
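The decision that follows from these p-values can be sketched in R. This is a minimal illustration, assuming the conventional 0.05 significance level (the source does not state alpha at this point); the names shapiro_p, alpha, normal_ok, and method are hypothetical, and the values are copied from the table above.

```r
# Shapiro-Wilk p-values per stress level, copied from the table above
shapiro_p <- c(low = 0.11428304, moderate = 0.07023834, high = 0.92983350)

alpha <- 0.05  # ASSUMPTION: conventional significance level

# Normality is not rejected for a group when its p-value exceeds alpha
normal_ok <- shapiro_p > alpha

# Use a parametric method only if no group rejects normality
method <- if (all(normal_ok)) "parametric" else "non-parametric"
print(method)
```

On real data, the p-values themselves would typically come from shapiro.test() applied to each group separately.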
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference in performance between the stress levels. Due to the nature of the
problem, you then ran a [Answer: One-way ANOVA],
and the result is as follows:
Df Sum Sq Mean Sq F value Pr(>F)
stress 2 0.8235 0.4117 14.5 8.13e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
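The numbers in the ANOVA table are linked by simple identities, which the sketch below checks. It assumes the residual degrees of freedom are 57 (60 observations minus 3 groups), because the residual row is not shown in the source; all variable names are illustrative.

```r
# Values copied from the ANOVA table above
df_stress <- 2
sum_sq    <- 0.8235
f_value   <- 14.5

# Mean square = sum of squares / degrees of freedom
mean_sq <- sum_sq / df_stress
print(mean_sq)  # close to the reported 0.4117, up to rounding

# ASSUMPTION: residual df = 60 observations - 3 groups = 57
df_resid <- 57

# The upper tail of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_stress, df_resid, lower.tail = FALSE)
print(p_value)
```

A p-value this far below 0.05 indicates that at least one stress level differs, but not which one, which is why a post-hoc test is the natural next step.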
Based on the result, you then decide
to [Answer: continue with further tests to find which stress level has a significantly different impact on performance].
Therefore you ran a [Answer: TukeyHSD]
Fit: [HIDDEN]
$stress
Conclusion
Based on the results of the statistical tests, then you conclude that
employees with a high stress level have significantly lower performance, followed by those with
a moderate, and then a low stress level
employees with a high stress level tend to have significantly lower performance compared to
there is no significant performance difference between employees with moderate and low stress
levels
employees with a moderate stress level tend to have significantly higher performance than those with
low and/or high stress levels
13. You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you asked
your high school classmates who were admitted to the specified programs.
TRUE
a. The data are collected properly and bias is minimized Answer
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables that suit well with the research question?
gender
14. Which chart or graph would be appropriate to display the concerned variable(s)?
a time plot
a pie chart
a boxplot
a bar graph
15. You are assigned to study whether there is a relationship between the category and the content
rating of selected apps in Google PlayStore. You have a dataset with the following structure:
str(googleplaystore)
'data.frame': 3398 obs. of 13 variables:
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654 2617 580 1701 2595 790 2762
2343 ...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828 969 412 1363 1730 865 2175
68 ...
$ Size : Factor w/ 219 levels "1.0M","1.1M",..: 143 165 163 43 93 45 219 215 136 45 ...
$ Last.Updated : Factor w/ 921 levels "April 1, 2017",..: 460 396 465 78 412 25 735 536 465 691 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551 249 348 1093 600 347 332 ...
NEWS_AND_MAGAZINES 169 66 14 34
There is no significant relationship between the category and the content rating of the selected apps
from Google PlayStore.
There is a significant relationship between the category and the content rating of the selected apps
The content ratings of Google PlayStore apps are not related to the category.
Most apps categories in Google PlayStore are highly related with the content rating.
a. Relative frequency
b. Percentage
d. Tally marks
e. Pie chart
f. Raw data
g. Bar plot
h. Boxplot
a. Gender
18. The alternative hypothesis of a _____ t-test has the form of "The mean of x of the A group is higher
than ..."
Select one:
a. Paired
b. Two-tail
c. Half-tail
d. Unpaired
19. To gain information about the number of elements in a vector, we use the _____ function.
Select one:
b. ncol()
c. sizeof()
d. nrow()
e. getLength()
20. You are assigned to study if there is any connection between the district where a person lives and
his/her hobby. There are 671 randomly selected respondents that were interviewed. Their answers
are collected into a data frame with the following structure:
str(district.hobby)
'data.frame': 671 obs. of 2 variables:
You explored the data by making a barplot that shows the grouped distribution, and it came as follow:
Frequency distribution
To achieve the goal of the study, you create a Answer
table.
hobby
DISTRICT 1 39 29 19 28 37 29
DISTRICT 2 29 33 29 30 25 32
DISTRICT 3 26 24 30 22 30 19
DISTRICT 4 28 36 23 24 26 24
data: [HIDDEN]
There is a significant relationship between the district where someone lives with his/her
hobby.
There is no significant relationship between the district where someone lives and his/her
hobby. (correct answer)
Someone's hobby is independent of the district where one lives. (correct answer)
21. X is a means of organizing quantitative data. It shows the sum of a class and all classes below
it. What is X?
Select one:
a. Ogive
c. Histogram
d. Stem-and-leaf display
e. Frequency distribution table
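The "sum of a class and all classes below it" described in this question is a cumulative frequency. A minimal sketch of that computation follows; the class labels and counts are hypothetical and only serve as an illustration.

```r
# HYPOTHETICAL class frequencies for five consecutive score classes
freq <- c("50-59" = 3, "60-69" = 7, "70-79" = 12, "80-89" = 6, "90-99" = 2)

# Cumulative frequency: each class plus all classes below it
cum_freq <- cumsum(freq)
print(cum_freq)
```

The last cumulative value always equals the total number of observations, which is a quick sanity check on the table.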
22. Methods that can be used to find out whether the data is normally distributed or not are
24. A publishing company is currently reviewing proposals from bookstores in several universities.
These bookstores are asking for more programming books to be stocked for each of them. Since the
stock in the company's warehouse is limited, hence the management will decide the allocation
based on historical sales data. Therefore, the management asked you to make the analysis. The data
that they possess contains historical data of the number of students who took programming
courses and the number of programming books sold at the respective university bookstore. Should
the university with more students who took a programming course be allocated more
books?
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students
who took the programming course.
Therefore, you plotted the number of students with the respective numbers of books sold:
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For each student taking the programming class, the respective university bookstore could
expect a sale of 1.1796 books.
The number of students taking the programming class is a significant predictor of sales.
For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the factor itself is not significant, the bookstore should not cling
to that.
Select one:
a. Pie chart
b. Ogive
c. Bar plot
d. Line plot
e. Boxplot
State: Finished
Question 1
Correct
Question text
The suitable statistical test(s) to compare a variable in 3 or more groups is (are)
Select one or more:
b. Kruskal-Wallis
c. ANOVA
d. Kolmogorov-Smirnov
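For orientation, comparing one variable across three or more groups can be sketched in R with a parametric test and a rank-based alternative. The data below are hypothetical and the object names (dat, anova_fit, kw) are illustrative only.

```r
# HYPOTHETICAL scores for three groups of five observations each
dat <- data.frame(
  score = c(70, 72, 68, 75, 71,
            80, 83, 79, 85, 81,
            60, 63, 59, 65, 62),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

# Parametric option: one-way ANOVA
anova_fit <- aov(score ~ group, data = dat)
print(summary(anova_fit))

# Non-parametric option: Kruskal-Wallis rank sum test
kw <- kruskal.test(score ~ group, data = dat)
print(kw)
```

Which of the two is appropriate depends on whether the normality assumption holds in each group.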
Question 2
Correct
Question text
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72
participants treated with three different drugs. The aim is to examine the potential of a new class of
drugs in lowering cholesterol concentration and consequently reducing heart attack. The participants
include 36 males and 36 females. Males and females were further (equally) subdivided into whether
they were at low or high risk of a heart attack. Is there any difference in the impact of each drug on
cholesterol concentration? If any, which one has the highest impact, in terms of the lowest
cholesterol concentration?
Data Exploration
The structure of the dataset is as follow
str(heartattack)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 72 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a [Answer: boxplot]
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference between the drugs in their effect on cholesterol concentration.
Due to the nature of the problem, you then ran
a [Answer: One-way ANOVA]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Fit: [HIDDEN]
$drug
Conclusion
Based on the results of the statistical tests, then you conclude that
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium package)
Question 3
Correct
Question text
To gain information about the number of elements in a vector, we use the _____ function.
Select one:
a. ncol()
b. length()
c. sizeof()
d. nrow()
e. getLength()
Feedback
The best choice is: length()
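A short illustration of why length() is the fitting choice for a vector; the vector scores is hypothetical.

```r
# A HYPOTHETICAL numeric vector
scores <- c(78, 85, 92, 64, 70)

# length() returns the number of elements in a vector
n <- length(scores)
print(n)

# nrow() and ncol() are meant for matrices and data frames;
# on a plain vector they return NULL
print(nrow(scores))
print(ncol(scores))
```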
Question 4
Correct
Question text
You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you asked your
high school classmates who were admitted to the specified programs.
TRUE
a. The data are collected properly and bias is minimized Answer
gender
a time plot
a boxplot
a pie chart
a bar graph
Mark 1.00 out of 1.00
• a bar graph
• a pie chart
Question 5
Correct
Question text
A publishing company is currently reviewing proposals from bookstores in several universities. These
bookstores are asking for more programming books to be stocked for each of them. Since the stock in
the company's warehouse is limited, hence the management will decide the allocation based on
historical sales data. Therefore, the management asked you to make the analysis. The data that they
possess contains historical data of the number of students who took programming courses and the
number of programming books sold at the respective university bookstore. Should the university with
more students who took a programming course be allocated more books?
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who
took the programming course.
The first thing that you need to do is [Answer: determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore].
Therefore, you plotted the number of students with the respective numbers of books sold:
Based on the value of R, you know that [Answer: there is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore].
Therefore, you decided to summarize and study the relationship between the number of
students and the number of books sold by
using [Answer: Simple Linear Regression]
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
sales.
For a number of students equal to zero, a bookstore may expect a sale of 1.1796
books; however, since the factor itself is not significant, the bookstore should not cling
to that.
For each student taking the programming class, the respective university bookstore could
expect a sale of 1.1796 books.
Mark 3.00 out of 3.00
• For a number of students equal to zero, a bookstore may expect a sale of 1.1796
books; however, since the factor itself is not significant, the bookstore should not
cling to that.
• The number of students taking the programming class is a significant predictor of sales.
• For each student taking the programming class, the respective university bookstore
could expect a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore to sell [Answer: 212.82]
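The prediction is the fitted line evaluated at 100 students. The sketch below uses the rounded coefficients quoted in the answer options (intercept 1.1796, slope 2.1165); the variable names are illustrative. With these rounded values the result is about 212.83, so the quoted 212.82 presumably comes from truncation or from the unrounded coefficients, which the source does not show.

```r
# Coefficients as reported (rounded) in the answer options above
intercept <- 1.1796   # expected books sold when nstudents = 0
slope     <- 2.1165   # additional books sold per student

# Predicted books sold for a university with 100 students
nstudents <- 100
predicted <- intercept + slope * nstudents
print(predicted)
```

With the fitted lm object at hand, predict() with a newdata data frame containing nstudents = 100 would give the same quantity from the unrounded coefficients.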
Question 6
Correct
Question text
What can be learned from a histogram and/or a stem-and-leaf display?
Select one or more:
a. Data symmetricity
b. Outliers
c. Central tendency
d. Distribution gap
e. Data distribution
Feedback
The best choices are: Data symmetricity, Data distribution, Outliers, Distribution gap,
Central tendency
Question 7
Correct
Question text
You are assigned to study if there is any connection between the district where a person lives and
his/her preferred social media. There are 1,200 randomly selected respondents that were interviewed.
Their answers are collected into a data frame with the following structure:
str(district.socmed)
'data.frame': 1200 obs. of 2 variables:
You explored the data by making a barplot that shows the grouped distribution, and it came out as
follows:
To achieve the goal of the study, you create a [Answer: Cross-tabulation]
table.
socmed
district FACEBOOK FRIENDSTER INSTAGRAM LINKEDIN RESEARCHGATE TWITTER
DISTRICT 1 65 48 40 55 43 57
DISTRICT 2 48 46 51 53 29 58
DISTRICT 3 51 51 33 59 65 42
DISTRICT 4 49 57 57 47 54 42
data: [HIDDEN]
Someone's preferred social media is independent of the district where one lives.
There is no significant relationship between the district where someone lives with his/her
preferred social media.
There is a significant relationship between the district where someone lives with his/her
• There is a significant relationship between the district where someone lives with
his/her preferred social media.
• Some social media are significantly preferred in certain districts.
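The test call itself is hidden in the source ("data: [HIDDEN]"), so the following is only a sketch of how a chi-square test of independence could be run on the cross-tabulation above. The counts are copied from the table; the object names and the 0.05 significance level are assumptions.

```r
# Cross-tabulation copied from the table above
counts <- matrix(
  c(65, 48, 40, 55, 43, 57,
    48, 46, 51, 53, 29, 58,
    51, 51, 33, 59, 65, 42,
    49, 57, 57, 47, 54, 42),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    district = paste("DISTRICT", 1:4),
    socmed = c("FACEBOOK", "FRIENDSTER", "INSTAGRAM",
               "LINKEDIN", "RESEARCHGATE", "TWITTER")
  )
)

# Pearson chi-square test of independence
result <- chisq.test(counts)
print(result)

# Reject independence when the p-value is below the chosen alpha
alpha <- 0.05  # ASSUMPTION: conventional significance level
print(result$p.value < alpha)
```

The degrees of freedom are (4 - 1) x (6 - 1) = 15, one less than the number of rows times one less than the number of columns.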
Question 8
Correct
Mark 1.00 out of 1.00
Question text
You are interested in knowing the achievement of the present second-year students in your
program at their first semester. It is measured according to the GP achieved. You then collected
the 1st semester GP of 31 randomly selected second-year students and calculate the mean.
Match the item/condition from the example above with the right term!
Statistics Average
Answer 1
Parameter GP
Answer 3
Feedback
The correct answer is: Statistics → Average, Population → Second-year students, Parameter →
GP, Samples → 31 randomly selected second-year students
Question 9
Correct
Question text
The following table contains a subset of the results from a survey about how the first year UNSRAT
undergraduate students access e-Learning.
questionnaire_code program access_mean
STU003 law NA
Match the item/condition from the example above with the right term!
Variable access_mean
Answer 2
Element STU001
Answer 3
Feedback
Your answer is correct.
The correct answer is: Observation → personal notebook/PC, Variable → access_mean, Element
→ STU001
Question 10
Correct
Question text
Which of the following are best treated as discrete variables?
Select one or more:
f. Gender
SORTED
2.
To achieve the goal of the study, you create a Cross-tabulation
table.
With a 95% confidence level, you ran a Chi-square Test of Independence, and the result came out
as follows:
• There is no significant relationship between the district where someone lives with
his/her hobby
• Someone's hobby is independent of the district where one lives.
3. The following table contains a subset of the results from a survey about how the first year
UNSRAT undergraduate students access e-Learning
Match the item/condition from the example above with the right term!
Observation 7
Element STU003
Variable SATISFACTION
4. The frequency of a categorical variable could be visualized with a bar plot.
5. You are interested in learning the favorite programming languages of first-year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this, you
asked your high school classmates who were admitted to the specified programs.
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who
took the programming course.
There is a strong positive relationship between the number of students taking a programming course
and the number of programming books sold at a particular university bookstore.
Therefore, you decided to summarize and study the relationship between the number of students and
the number of books sold by using Simple Linear Regression. The results came out as
follows:
• For each student taking the programming class, the respective university bookstore could
expect a sale of 2.1165 books.
• The number of students taking the programming class is a significant predictor of sales.
• For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the factor itself is not significant, the bookstore should not cling to
that.
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore would sell 212.82
Since the performance is measured twice, in this problem we only focus on the first
measurement (the t1 column)
You then plotted a boxplot, and the result came out as follows:
Since you were comparing a variable in three different groups, you need the proper
method. Therefore, you need to decide which method to use. So you start by checking the
distribution normality of the performance score in each group by using the Shapiro-Wilk Test. The
p-value for each tested group is shown in the following table:
Statistical Tests
Based on it, you decide to use a parametric method to find whether there is any difference
in performance between the stress levels. Due to the nature of the problem, you then ran a
One-way ANOVA, and the result is as
follows:
Question text
Which of the following are best treated as nominal variables?
b. Gender
c. Number of students in a class
d. Phone number
e. Number of students in the whole university
Feedback
The correct answers are: Gender, Phone number, Names of your classmates
Question 2
Correct
Mark 1.00 out of 1.00
Question text
The suitable statistical test(s) to compare a variable to a specific value is(are)
d. One-sample t-test
e. Paired samples t-test
f. Kruskal-Wallis test
Feedback
The correct answers are: One-sample t-test, Wilcoxon Signed Rank test
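Both tests named in the feedback are available in base R. The sketch below runs them on a hypothetical sample against a hypothetical reference value; scores and mu0 are made up for illustration only.

```r
# HYPOTHETICAL sample: quiz scores of ten students
scores <- c(71, 68, 73, 66, 75, 64, 77, 78, 79, 81)

# HYPOTHETICAL reference value to compare against
mu0 <- 70

# Parametric: one-sample t-test
t_res <- t.test(scores, mu = mu0)

# Non-parametric: Wilcoxon signed rank test
w_res <- wilcox.test(scores, mu = mu0)

print(t_res$p.value)
print(w_res$p.value)
```

The t-test compares the sample mean with mu0 and assumes approximate normality, while the Wilcoxon signed rank test works on the ranks of the differences from mu0.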
Question 3
Correct
Mark 1.00 out of 1.00
Question text
Qualitative data could be organized with the following ways:
c. Relative frequency
d. Bar plot
e. Percentage
f. Tally marks
Feedback
The correct answers are: Frequency distribution table, Tally marks, Relative frequency, Percentage
Question 4
Correct
Mark 1.00 out of 1.00
Question text
1,341 undergraduate students were surveyed to learn the preferred teaching-and-learning
method of all UNSRAT students. There are three teaching-and-learning methods:
online, offline, or blended. The answers were then tabulated, and the frequency of each method is
presented in the report. Match each item/condition from the example above with the right term!
Question 5
Correct
Mark 1.00 out of 1.00
Question text
1,341 undergraduate students were surveyed to learn the preferred teaching-and-learning
method of all UNSRAT students. There are three teaching-and-learning methods: online,
offline, or blended. The following table contains a subset of the results:
questionnaire_code preferred_tlm
STU001 blended
STU002 offline
STU003 online
STU004 blended
Match the item/condition from the example above with the right term!
Element STU003
Answer 1
Variable preferred_tlm
Answer 2
Observation blended
Answer 3
Feedback
Your answer is correct.
The correct answer is: Element → STU003, Variable → preferred_tlm, Observation → blended
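The tabulation step mentioned in these survey questions can be sketched with the four rows shown above. Only these four observations are used, so the numbers describe the displayed subset and not the full survey of 1,341 students; the object names are illustrative.

```r
# preferred_tlm values from the four rows shown above
preferred_tlm <- c("blended", "offline", "online", "blended")

# Frequency of each teaching-and-learning method
freq <- table(preferred_tlm)
print(freq)

# Relative frequency and percentage
rel_freq <- prop.table(freq)
pct <- 100 * rel_freq
print(pct)
```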
Question 6
Correct
Mark 1.00 out of 1.00
Question text
A publishing company is currently reviewing proposals from bookstores in several universities. These
bookstores are asking for more programming books to be stocked for each of them. Since the stock in
the company's warehouse is limited, hence the management will decide the allocation based on
historical sales data. Therefore, the management asked you to make the analysis. The data that they
possess contains historical data of the number of students who took programming courses and the
number of programming books sold at the respective university bookstore. Should the university with
more students who took a programming course be allocated more books?
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who took
the programming course.
The first thing that you need to do is [Answer: determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore].
Therefore, you plotted the number of students with the respective numbers of books sold:
Based on the value of R, you know that [Answer: there is a strong positive relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore].
Therefore, you decided to summarize and study the relationship between the number of students
and the number of books sold by using [Answer: Simple Linear Regression]
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
that.
1.1796 is a strong predictor of the model
For each student taking the programming class, the respective university bookstore could
• For a number of students equal to zero, a bookstore may expect a sale
of 1.1796 books; however, since the factor itself is not significant, the
bookstore should not cling to that.
• The number of students taking the programming class is a significant
predictor of sales.
• For each student taking the programming class, the respective university
bookstore could expect a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the publisher
could expect the respective bookstore to sell [Answer: 212.82]
Question 7
Correct
Mark 1.00 out of 1.00
Question text
Suppose that you are interested in the percentage of cellphone brands owned by the students of
UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the brands of
their cellphones.
a. Why can collecting data only from your classmates cause bias in the data?
Perhaps some of your classmates do not bring their cellphones on Wednesday.
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables in this study?
One of your classmates.
The day the data collected.
cellphone brand
gender
Mark 1.00 out of 1.00
• cellphone brand
c. Which chart or graph would be appropriate to display the proportion of the brands?
pie chart
boxplot
time plot
line plot
bar graph
Mark 1.00 out of 1.00
• bar graph
• pie chart
Question 8
Correct
Mark 1.00 out of 1.00
Question text
Background
You are assigned to analyze a dataset that contains the final scores of students in three parallel
classes. Is there any difference in the scores of the students in a different class?
Data Exploration
The structure of the dataset is as follow
str(quiz.result)
'data.frame': 150 obs. of 2 variables:
You then plotted a [Answer: boxplot]
Since you were comparing scores in three different quizzes with the same participants (students),
you need the proper method. Therefore, you need to decide which method to use. So you start by
checking the [Answer: distribution normality]
class p-value
A 0.4505716
B 0.2808105
C 0.2490939
Statistical Tests
Based on it, you decide to use a [Answer: non-parametric method]
to find whether there is any difference in those quizzes. Due to the nature of the problem, you then ran
a [Answer: Kruskal-Wallis test].
Therefore you ran a [Answer: Dunn Test]
(Bonferroni)
Col Mean-|
Row Mean | A B
---------+----------------------
B | 2.193561
| 0.0848
C | 7.259698 5.066137
| 0.0000* 0.0000*
alpha = 0.05
Conclusion
Based on the results of the statistical tests, then you conclude that
• Students in classes A and B tend to have higher scores than students in class
C. The scores are not significantly different between students in classes A
and B
Question 9
Correct
Mark 1.00 out of 1.00
Question text
You are assigned to study whether there is a relationship between video game publishers and the video
game genres. You have a dataset with the following structure:
str(vgs)
'data.frame': 11857 obs. of 11 variables:
$ Rank : int 1 3 4 8 11 12 14 15 16 17 ...
$ Name : Factor w/ 8427 levels ".hack: Sekai no Mukou ni + Versus",..: 8048 4013
8049 8046 5006 4012 8042 8043 3598 2681 ...
$ Platform : Factor w/ 10 levels "DS","PC","PS",..: 8 8 8 8 1 1 8 8 9 5 ...
$ Year : Factor w/ 29 levels "1985","1988",..: 16 18 19 16 15 15 17 19 20 23 ...
$ Genre : Factor w/ 10 levels "Action","Adventure",..: 9 5 9 4 8 5 9 9 4 1 ...
$ Publisher : Factor w/ 467 levels "10TACLE Studios",..: 297 297 297 297 297 297 297
297 266 402 ...
$ NA_Sales : num 41.49 15.85 15.75 14.03 9.07 ...
$ EU_Sales : num 29 12.9 11 9.2 11 ...
$ JP_Sales : num 3.77 3.79 3.28 2.93 1.93 4.13 3.6 2.53 0.24 0.97 ...
$ Other_Sales : num 8.46 3.31 2.96 2.85 2.75 1.92 2.15 1.79 1.67 4.14 ...
$ Global_Sales: num 82.7 35.8 33 29 24.8 ...
You explored the data by making a barplot that shows the grouped distribution, and it came as follow:
Cross-tabulation
To achieve the goal of the study, you create a Answer
table.
Genre
PS4 122 19 17 15 17 47 34 5
Genre
DS 148 79
PC 49 188
PS 222 70
PS2 400 71
PS3 213 24
PS4 43 5
PSP 135 60
Wii 261 25
X360 220 28
XB 170 21
genres.
All platforms share the same amount of video game publications on each genre.
There is no significant relationship between genres and the platform used.
There are certain video game genres that commonly published for specific platforms.
Mark 2.00 out of 2.00
(Credit: the dataset was taken from "Video Game Sales Analyze sales data from more than 16,500 games"
by Gregory Smith, available on Kaggle.)
Question 10
Correct
Mark 1.00 out of 1.00
Question text
To find the number of elements in a vector, we use the _____ function.
Select one:
a. ncol()
b. length()
c. sizeof()
d. nrow()
e. getLength()
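The difference between the options above can be checked directly in R. The following is a minimal sketch using a made-up vector and data frame (the names `scores` and `df` are assumptions for illustration, not part of the quiz). To the best of our knowledge, `sizeof()` and `getLength()` are not base R functions, so they are left out.

```r
# Hypothetical example data (not taken from the quiz)
scores <- c(82, 75, 91, 68)

# length() counts the elements of a vector
n_elements <- length(scores)

# nrow() and ncol() are meant for objects with dimensions, such as data frames
df <- data.frame(id = 1:4, score = scores)
n_rows <- nrow(df)
n_cols <- ncol(df)

# A plain vector has no dim attribute, so nrow() returns NULL for it
vec_rows <- nrow(scores)

print(n_elements)
```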
Question 1
Which one of the following is best treated as an ordinal variable?
a. educational degree
b. hair color
c. phone number
d. city of residence
e. salary
Feedback
Question 2
Question text
You are interested in knowing the percentage of first-year UNSRAT undergraduate students using each
means of accessing e-Learning. To estimate the percentage, you survey 500 randomly selected students and
determine which means each student uses.
Match the item/condition from the example above with the right term!
Parameter Answer 1
Samples Answer 2
Population Answer 3
Statistics Answer 4
Feedback
The correct answer is: Parameter → The means used to access e-Learning, Samples → 500 randomly
selected first year UNSRAT undergraduate students, Population → First year UNSRAT undergraduate
students, Statistics → Percentage
Question 3
Question text
X is a means of organizing quantitative data. It shows the sum of a class and all classes below it. What is X?
a. Stem-and-leaf display
b. Histogram
e. Ogive
Feedback
Question 4
Question text
Therefore, you plotted the number of students with the respective numbers of books sold:
Therefore, you decided to summarize and study the relationship between the number of students and
the number of books sold by using Answer
Residuals:
Min 1Q Median 3Q Max
-80.265 -37.203 -2.531 38.198 83.988
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1796 23.6051 0.05 0.96
nstudents 2.1165 0.1166 18.15 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For each student taking the programming class, the respective university bookstore could expect a
For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the factor itself is not significant, the bookstore should not rely on that.
The number of students taking the programming class is a significant predictor of sales.
For each student taking the programming class, the respective university bookstore could expect a
sale of 1.1796 books.
• For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the factor itself is not significant, the bookstore should not rely on
that.
• The number of students taking the programming class is a significant predictor of sales.
• For each student taking the programming class, the respective university bookstore could expect
a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the
publisher could expect the respective bookstore would sell [Answer: 212.82]
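The prediction above follows from plugging 100 students into the fitted line. Here is a minimal sketch that uses only the rounded coefficients printed in the regression output (1.1796 and 2.1165); the helper name `predict_books` is an assumption for illustration. With these rounded values the result is about 212.83, so the recorded 212.82 presumably reflects truncation or unrounded coefficients.

```r
# Coefficients as printed in the regression output above (rounded)
intercept <- 1.1796   # (Intercept)
slope <- 2.1165       # nstudents

# Hypothetical helper: predicted books sold for a given number of students
predict_books <- function(nstudents) {
  intercept + slope * nstudents
}

predicted <- predict_books(100)
print(predicted)
```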
Question 5
Question text
You are assigned to study whether there is a relationship between the category and the content
rating of selected apps in Google PlayStore. You have a dataset with the following structure:
str(googleplaystore)
'data.frame': 3398 obs. of 13 variables:
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654
2617 580 1701 2595 790 2762 2343 ...
$ Category : Factor w/ 3 levels "FAMILY","GAME",..: 2 2 2 2 2 2 2 2 2 2
...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828
969 412 1363 1730 865 2175 68 ...
$ Size : Factor w/ 219 levels "1.0M","1.1M",..: 143 165 163 43 93
45 219 215 136 45 ...
$ Installs : Factor w/ 21 levels "0","0+","1,000,000,000+",..: 10 3 19
7 7 16 10 10 19 19 ...
$ Type : Factor w/ 3 levels "Free","NaN","Paid": 1 1 1 1 1 1 1 1 1
1 ...
$ Price : Factor w/ 38 levels "$0.99","$1.04",..: 38 38 38 38 38 38
38 38 38 38 ...
$ Content.Rating: Factor w/ 4 levels "Everyone","Everyone 10+",..: 2 2 1 1 1
1 1 2 1 1 ...
$ Genres : Factor w/ 85 levels "Action","Action;Action &
Adventure",..: 4 7 23 19 23 29 1 77 1 23 ...
$ Last.Updated : Factor w/ 921 levels "April 1, 2017",..: 460 396 465 78
412 25 735 536 465 691 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551
249 348 1093 600 347 332 ...
$ Android.Ver : Factor w/ 24 levels "1.5 and up","1.6 and up",..: 15 15 15
15 14 15 8 15 13 13 ...
You explored the data by making a barplot that shows the grouped distribution, and it came out as follows:
To achieve the goal of the study, you create a Answer table.
                    Content.Rating
Category            Everyone  Everyone 10+  Mature 17+  Teen
FAMILY                  1529           131          50   261
GAME                     608           131          74   331
NEWS_AND_MAGAZINES       169            66          14    34
The content ratings of Google PlayStore apps are not related to the category.
There is no significant relationship between the category and the content rating of the selected
Most app categories in Google PlayStore are highly related to the content rating.
There is a significant relationship between the category and the content rating of the selected apps
• There is a significant relationship between the category and the content rating of the selected
apps from Google PlayStore.
• Most app categories in Google PlayStore are highly related to the content rating.
(Credit: the dataset was taken and subsetted from "Google Play Store Apps Web scraped data of
10k Play Store apps for analysing the Android market" by Lavanya Gupta, available on Kaggle.)
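The cross-tabulation in this question is enough to rerun a Chi-square Test of Independence. The sketch below rebuilds the table by hand from the counts shown above; the object names `counts` and `result` are assumptions, and the quiz's own (hidden) call may have differed.

```r
# Counts copied from the cross-tabulation shown in the question
counts <- matrix(
  c(1529, 131, 50, 261,
     608, 131, 74, 331,
     169,  66, 14,  34),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Category = c("FAMILY", "GAME", "NEWS_AND_MAGAZINES"),
    Content.Rating = c("Everyone", "Everyone 10+", "Mature 17+", "Teen")
  )
)

# Chi-square Test of Independence on the 3 x 4 table
result <- chisq.test(counts)
print(result)

# Decision at alpha = 0.05: TRUE means the independence hypothesis is rejected
significant <- result$p.value < 0.05
```

A p-value below 0.05 here is consistent with the answer marked correct above, namely that category and content rating are significantly related.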
Question 6
Question text
You are interested in learning the favorite programming languages of the first year Indonesian
Informatics and/or Computer Science undergraduate students. To achieve this mission, you
asked your high school classmates who were admitted to the specified program.
b. Because a variable is a characteristic of each individual on which data is collected, which of the
following are variables that suit well with the research question?
gender
a time plot
a pie chart
a boxplot
a bar graph
• a bar graph
• a pie chart
Question 7
Incorrect
Question text
a. write.csv()
b. saveRDS()
c. save()
d. save.image()
e. save.csv()
Feedback
Question text
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72
participants treated with three different drugs. The aim is to examine the potential of a new class
of drugs in lowering cholesterol concentration and consequently reducing heart attack. The
participants include 36 males and 36 females. Males and females were further (equally)
subdivided into whether they were at low or high risk of a heart attack. Is there any difference
in the impact of each drug on cholesterol concentration? If any, which one has the highest
impact, in terms of the lowest cholesterol concentration?
Data Exploration
str(heartattack)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 72 obs. of 5 variables:
$ gender : Factor w/ 2 levels "male","female": 1 1 1 1 1 1 1 1 1 1 ...
$ risk : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 2 2 2 2 ...
$ drug : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 2 2 2 2 ...
$ cholesterol: num 5.24 5.08 4.68 5.36 4.96 ...
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a Answer
The p-value for each tested group is shown in the following table:
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
Statistical Tests
to find whether there is any difference between the drugs in their effect on the cholesterol concentration. Due to the
nature of the problem, you ran a Answer
, and the result is as follows:
Df Sum Sq Mean Sq F value Pr(>F)
drug 2 1.235 0.6177 2.63 0.0793 .
Residuals 69 16.204 0.2348
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Fit: [HIDDEN]
$drug
diff lwr upr p adj
B-A -0.277327333 -0.6124096 0.05775494 0.1241979
C-A -0.278421280 -0.6135035 0.05666099 0.1222405
C-B -0.001093947 -0.3361762 0.33398832 0.9999663
Conclusion
Based on the results of the statistical tests, then you conclude that
drug that yields the lowest cholesterol rate is drug C, followed with drug B, and then drug A
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium
package)
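The F value and p-value in the ANOVA table of this question can be re-derived from the printed sums of squares and degrees of freedom. This is a sketch that works from the rounded table entries only (it does not refit the model), so the last digits may differ slightly from the printed output.

```r
# Values copied from the ANOVA table above (rounded as printed)
ss_drug <- 1.235
df_drug <- 2
ss_resid <- 16.204
df_resid <- 69

# Mean squares are sums of squares divided by their degrees of freedom
ms_drug <- ss_drug / df_drug
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value under the F(2, 69) distribution
f_value <- ms_drug / ms_resid
p_value <- pf(f_value, df_drug, df_resid, lower.tail = FALSE)

print(c(F = f_value, p = p_value))
```

Because the p-value is above 0.05, the drug effect is not significant at alpha = 0.05. This agrees with the Tukey intervals above, all of which contain zero, and with the conclusion recorded elsewhere in this document that the drugs gave no significantly different impact.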
Question 9
Incorrect
Question text
The alternative hypothesis of a _____ t-test has the form of "The mean of x of the A group is higher than
..."
a. Unpaired
b. Two-tail
c. Paired
d. Half-tail
e. One-tail
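A directional alternative such as "the mean of group A is higher than the mean of group B" corresponds to the `alternative = "greater"` argument of `t.test()`. The sketch below uses made-up scores (the vectors `group_a` and `group_b` are hypothetical) and compares the one-tail and two-tail versions of the same test.

```r
# Hypothetical scores for two independent groups (made-up data)
group_a <- c(78, 85, 90, 88, 76, 95, 89, 84)
group_b <- c(70, 75, 80, 72, 68, 74, 77, 71)

# One-tail test: H1 is "mean of group_a is greater than mean of group_b"
one_tail <- t.test(group_a, group_b, alternative = "greater")

# Two-tail test: H1 is "the two means differ", in either direction
two_tail <- t.test(group_a, group_b, alternative = "two.sided")

print(one_tail$alternative)
print(c(one_tail = one_tail$p.value, two_tail = two_tail$p.value))
```

When the observed difference is in the hypothesized direction, as it is for these made-up values, the one-tail p-value is half the two-tail p-value.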
Feedback
Question 10
Incorrect
Question text
The following table contains a subset of the results from a survey about how the first year
UNSRAT undergraduate students access e-Learning.
STU001 informatics 4
STU002 civil 7
STU003 law 3
STU004 medical 5
Match the item/condition from the example above with the right term!
Element Answer 1
Observation Answer 2
Variable Answer 3
Feedback
The following table contains a subset of the results from a survey about how the first year UNSRAT
undergraduate students access e-Learning.
questionnaire_code program satisfaction
STU001 informatics 4
STU002 civil 7
STU003 law 3
STU004 medical 5
Match the item/condition from the example above with the right term!
The correct answer is: Element → STU003, Variable → satisfaction, Observation → 7
1,341 undergraduate students were surveyed, to gain knowledge about the preferred teaching-
and-learning method of the whole UNSRAT students. There are three teaching-and-learning
methods: online, offline, or blended. The answers were then tabulated and the frequency of each
method is presented in the report. Match the item/condition from the example above with the
right term!
You are assigned to study whether there is a relationship between the category and the content rating of
selected apps in Google PlayStore. You have a dataset with the following structure:
str(googleplaystore)
'data.frame': 3398 obs. of 13 variables:
$ App : Factor w/ 3088 levels "¡Ay Caramba!",..: 2472 2679 654 2617 580
1701 2595 790 2762 2343 ...
$ Rating : num 4.5 4.5 4.4 4.7 4.5 4.2 4.4 4.6 4.3 4.3 ...
$ Reviews : Factor w/ 2379 levels "0","1","10","100",..: 1556 1061 828 969 412
1363 1730 865 2175 68 ...
$ Current.Ver : Factor w/ 1094 levels "0.0.1","0.0.2",..: 606 444 186 551 249 348
1093 600 347 332 ...
You explored the data by making a barplot that shows the grouped distribution, and it came out as
follows:
To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
Content.Rating
NEWS_AND_MAGAZINES 169 66 14 34
There is no significant relationship between the category and the content rating of the
selected apps from Google PlayStore.
The content ratings of Google PlayStore apps are not related to the category.
Most apps categories in Google PlayStore are highly related with the content rating.
There is a significant relationship between the category and the content rating of the
A statistical test that is conducted to determine whether there is an association between two
categorical variables is
The correct answers are: Number of students in a class, Number of students in the whole
university, Grades frequency at the end of a course, Number of children in a family
Background
You are assigned to analyze a dataset that contains the performance score measures of participants at
two-time points. The aim of this study is to evaluate the effect of gender and stress on performance
scores. Is there any difference in performance between different stress levels? If any, which one
yields the best/worst performance score?
Data Exploration
The structure of the dataset is as follows
str(performance)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 60 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
Boundary
Since the performance is measured twice, in this problem we only focus on the first measurement (the
t1 column).
. The p-value for each tested group is shown in the following table:
stress level p-value
low 0.11428304
moderate 0.07023834
high 0.92983350
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference in performance between the stress levels. Due to the nature of
the problem, you ran a [Answer: One-way ANOVA]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Therefore you ran a [Answer: TukeyHSD]
Fit: [HIDDEN]
$stress
Conclusion
Based on the results of the statistical tests, then you conclude that
employees with a moderate stress level tend to have significantly higher performance than
those with low and/or high stress levels
the experiment is a violation of human rights
there is no significant performance difference across all stress levels
there is no significant performance difference between employees with moderate and low
stress levels
employees with a high stress level tend to have significantly lower performance compared
Suppose that you are interested in the percentage of cellphone brands owned by the students
of UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the
brands of their cellphones.
a. Why can collecting data only from your classmates cause bias in the data?
It assumes the percentage of the cellphone brands owned by the first-year students may
cellphone brand
The day the data collected.
Mark 1.00 out of 1.00
The correct answer is:
• cellphone brand
c. Which chart or graph would be appropriate to display the proportion of the brands?
line plot
boxplot
time plot
pie chart
bar graph
Mark 1.00 out of 1.00
• bar graph
• pie chart
200 records of YouTube advertising budgets and the respective sales earnings were collected. You are
asked to analyze whether increasing the advertising budget would increase the sales. The
following dataset is given to you
str(marketing)
'data.frame': 200 obs. of 4 variables:
The youtube column shows the advertising budget spending, and the sales column shows the
earnings. All these numbers are in thousands of dollars.
Therefore, you plotted the advertising budget and its respective sales:
After that, you ran a [Answer: Pearson correlation test]
. Therefore, you decided to summarize and study the relationship between the number of students
and the number of books sold by using [Answer: Simple Linear Regression]
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
• For an advertising budget that equals to zero, a company may expect a sale of USD
8,440.
• For each dollar spent for advertising, a company could expect a sales earning of USD
47.537.
• the intercept 8.439 is a strong predictor of the model
Based on the model, for a company that spent USD 1,000 for YouTube advertising, the company could
expect sales earning of USD 55976.112
Qualitative data could be organized with the following ways:
10
To gain information about the number elements in a vector, we use the _____ function.
The youtube column shows the advertising budget spending, and the sales
column shows the earnings. All these numbers are in thousands of dollars.
After that, you ran a Pearson correlation test and the result is R = 0.7822.
Based on the value of R, you know that
There is a strong positive relationship between the spending on Youtube
Therefore, you decided to summarize and study the relationship between the
number of students and the number of books sold by using Simple Linear
Regression. The results came out as follows:
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
2. You are interested in learning the favorite programming languages of the first
year Indonesian Informatics and/or Computer Science undergraduate
students. To achieve this mission, you asked your highschool classmates who
admitted to the specified program.
3. You are assigned to study if there is any connection between the district
where a person lives and his/her hobby. There are 671 randomly selected
respondents that were interviewed. Their answers are collected into a data
frame with the following structure:
str(district.hobby)
'data.frame': 671 obs. of 2 variables:
2 1 4 4 2 2 2 3 1 4 ...
You explored the data by making a barplot that shows the grouped
distribution, and it came as follow:
To achieve the goal of the study, you create a
Cross-Tabulation
hobby
DISTRICT 1 39 29 19 28 37 29
DISTRICT 2 29 33 29 30 25 32
DISTRICT 3 26 24 30 22 30 19
DISTRICT 4 28 36 23 24 26 24
With a 95% confidence level, you ran a [Answer: Chi-square Test of Independence] and the result came
out as follows:
data: [HIDDEN]
Data Exploration
The structure of the dataset is as follows
str(heartattack)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 72 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
Statistical Tests
Based on it, you decide to use a parametric method to find whether there is any
difference between the drugs in their effect on the cholesterol concentration. Due to the
nature of the problem, you ran a One-way ANOVA and the result is as follows:
Df Sum Sq Mean Sq F value Pr(>F)
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Fit: [HIDDEN]
$drug
Conclusion
Based on the results of the statistical tests, then you conclude that
the drugs gave no significantly different impact on the cholesterol rate
9. The following table contains a subset of the results from a survey about how the
first year UNSRAT undergraduate students access e-Learning.
questionnaire_code program access_mean
STU001 informatics personal notebook/PC
STU002 civil shared notebook/PC
STU003 law NA
STU004 medical personal tablet
Match the item/condition from the example above with the right term!
Variable access_mean
10. The quantitative data visualization that separates the first digit from
the other digits is
Select one:
• Stem-and-leaf display
Question 1
Partially correct
Question text
Background
You are assigned to analyze a dataset that contains measures of cholesterol concentration in 72 participants
treated with three different drugs. The aim is to examine the potential of a new class of drugs in lowering
cholesterol concentration and consequently reducing heart attack. The participants include 36 males and 36
females. Males and females were further (equally) subdivided into whether they were at low or high risk of
a heart attack. Is there any difference in the impact of each drug on cholesterol concentration? If any,
which one has the highest impact, in terms of the lowest cholesterol concentration?
Data Exploration
The structure of the dataset is as follows
str(heartattack)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 72 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
You then plotted a [Answer: boxplot]
Since you were comparing a variable in three different groups, you need the proper method.
Therefore, you need to decide which method to use. So you start by checking the [Answer: distribution normality]
of the cholesterol concentration in each group by using the [Answer: Shapiro-Wilk Test]
. The p-value for each tested group is shown in the following table:
drug p-value
A 0.1537620
B 0.7674545
C 0.5537145
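The per-group normality check described above can be sketched as follows. The data here are simulated stand-ins, not the datarium `heartattack` values, so the simulated p-values will not match the table; the names `sim`, `p_values` and `quiz_p` are assumptions for illustration. The last step applies the usual 0.05 rule to the p-values reported in the table.

```r
set.seed(1)

# Simulated stand-in data: 24 cholesterol values per drug (not the real dataset)
sim <- data.frame(
  drug = factor(rep(c("A", "B", "C"), each = 24)),
  cholesterol = rnorm(72, mean = 5, sd = 0.5)
)

# Run the Shapiro-Wilk test separately within each drug group
p_values <- sapply(
  split(sim$cholesterol, sim$drug),
  function(x) shapiro.test(x)$p.value
)
print(p_values)

# p-values as reported in the table above
quiz_p <- c(A = 0.1537620, B = 0.7674545, C = 0.5537145)

# TRUE means normality is not rejected in any group at alpha = 0.05
normality_ok <- all(quiz_p > 0.05)
print(normality_ok)
```

Since none of the reported p-values falls below 0.05, normality is not rejected for any group, which is why a parametric method is chosen in the next step.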
Statistical Tests
Based on it, you decide to use a [Answer: parametric method]
to find whether there is any difference between the drugs in their effect on the cholesterol concentration. Due
to the nature of the problem, you ran a [Answer: One-way ANOVA]
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Based on the result, you decide to [Answer: draw a final conclusion]
Fit: [HIDDEN]
$drug
Conclusion
Based on the results of the statistical tests, then you conclude that
drug that yields the lowest cholesterol rate is drug C, followed with drug B, and then drug A
(Credit: the dataset used in this vignette is based on the heartattack dataset in the datarium package)
Question 2
Correct
Question text
The following table contains a subset of the results from a survey about the achievement of the present
second-year students in your program in their first semester.
questionnaire_code 1st_GP
STU001 3.92
STU002 3.88
STU003 3.2
STU004 2.78
Match the item/condition from the example above with the right term!
Element → STU001
Variable → 1st_GP
Observation → 3.92
Feedback
Your answer is correct.
The correct answer is: Element → STU001, Variable → 1st_GP, Observation → 3.92
Question 3
Correct
Question text
A publishing company is currently reviewing proposals from bookstores in several universities. These
bookstores are asking for more programming books to be stocked for each of them. Since the stock in the
company's warehouse is limited, hence the management will decide the allocation based on historical sales
data. Therefore, the management asked you to make the analysis. The data that they possess contains
historical data of the number of students who took programming courses and the number of programming
books sold at the respective university bookstore. Should a university with more students taking a
programming course be allocated more books?
$ nstudents : int 204 179 200 177 207 195 166 178 213 130 ...
$ books_sold: int 441 329 467 376 504 396 354 439 461 235 ...
The column nstudents shows the number of students while the column books_sold shows the
number of books sold at a university bookstore with the respective number of students who took
the programming course.
The first thing that you need to do is [Answer: Determine whether there is a strong relationship between the number of students taking a programming course and the number of programming books sold at a particular university bookstore.]
Therefore, you plotted the number of students with the respective numbers of books sold:
Therefore, you decided to summarize and study the relationship between the number of students
and the number of books sold by using [Answer: Simple Linear Regression]
Call:
Residuals:
Coefficients:
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
For each student taking the programming class, the respective university bookstore could
For each student taking the programming class, the respective university bookstore could
expect a sale of 1.1796 books.
The number of students taking the programming class is a significant predictor of sales.
Mark 3.00 out of 3.00
• For a number of students equal to zero, a bookstore may expect a sale of 1.1796 books;
however, since the factor itself is not significant, the bookstore should not rely on
that.
• The number of students taking the programming class is a significant predictor of sales.
• For each student taking the programming class, the respective university bookstore could
expect a sale of 2.1165 books.
Based on the model, for a university with 100 students taking a programming course, the publisher could
expect the respective bookstore would sell [Answer: 212.82]
Question 4
Correct
Question text
You are assigned to study if there is any connection between the district where a person lives and his/her
hobby. There are 671 randomly selected respondents that were interviewed. Their answers are collected
into a data frame with the following structure:
str(district.hobby)
'data.frame': 671 obs. of 2 variables:
To achieve the goal of the study, you create a [Answer: Cross-tabulation] table.
hobby
DISTRICT 1 39 29 19 28 37 29
DISTRICT 2 29 33 29 30 25 32
DISTRICT 3 26 24 30 22 30 19
DISTRICT 4 28 36 23 24 26 24
data: [HIDDEN]
There is a significant relationship between the district where someone lives with his/her
hobby.
There is no significant relationship between the district where someone lives with his/her
hobby.
Some hobbies are significantly preferred in certain districts.
• There is no significant relationship between the district where someone lives with his/her
hobby.
• Someone's hobby is independent of the district where one lives.
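The conclusion above can be checked by running a Chi-square Test of Independence on the district-by-hobby counts shown in the question. This is a sketch only: the hobby column labels did not survive in the table, so the names `hobby_1` to `hobby_6` are placeholders, and the quiz's own (hidden) call may have differed.

```r
# Counts copied from the cross-tabulation; hobby labels are placeholders
counts <- matrix(
  c(39, 29, 19, 28, 37, 29,
    29, 33, 29, 30, 25, 32,
    26, 24, 30, 22, 30, 19,
    28, 36, 23, 24, 26, 24),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    district = paste("DISTRICT", 1:4),
    hobby = paste0("hobby_", 1:6)
  )
)

# Chi-square Test of Independence on the 4 x 6 table
result <- chisq.test(counts)
print(result)

# TRUE means independence is NOT rejected at alpha = 0.05
independent <- result$p.value > 0.05
```

With (4 - 1) x (6 - 1) = 15 degrees of freedom, the test does not reject independence at the 0.05 level, which matches the answer that hobby is independent of district.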
Question 5
Correct
Question text
Methods that can be used to find out whether the data is normally distributed or not are
Select one or more:
Question 6
Correct
Mark 1.00 out of 1.00
Question text
You are interested in knowing the achievement of the present second-year students in your
program at their first semester. It is measured according to the GP achieved. You then collected
the 1st semester GP of 31 randomly selected second-year students and calculate the mean.
Match the item/condition from the example above with the right term!
Parameter → GP
Statistics → Average
Feedback
The correct answer is: Population → Second-year students, Parameter → GP, Samples → 31
randomly selected second-year students, Statistics → Average
Question 7
Correct
Question text
Qualitative data could be visualized in the following ways:
Select one or more:
a. Raw data
b. Bar plot
c. Pie chart
d. Boxplot
e. Percentage
f. Relative frequency
h. Tally marks
Feedback
The correct answers are: Bar plot, Pie chart
Question 8
Correct
Question text
Suppose that you are interested in the percentage of cellphone brands owned by the students of
UNSRAT. Therefore, on Wednesday, after class, you asked all your classmates about the brands
of their cellphones.
a. Why can collecting data only from your classmates cause bias in the data?
It assumes the percentage of the cellphone brands owned by the first-year students may
gender
cellphone brand
• cellphone brand
c. Which chart or graph would be appropriate to display the proportion of the brands?
pie chart
time plot
boxplot
line plot
bar graph
Mark 1.00 out of 1.00
• bar graph
• pie chart
Question 9
Correct
Mark 1.00 out of 1.00
Question text
The correct command(s) to create a sequence of numbers in R is (are)?
Select one or more:
a. seq(1, 10)
b. seq(10, 1)
c. 1:10
d. seq(10)
e. seq(10, 1, 1)
g. seq(10, 1, -1)
Feedback
The correct answers are: 1:10, seq(1, 10), seq(10, 1), seq(10, 1, -1), seq(10)
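The listed commands can be compared directly in R. The sketch below runs each accepted option and also shows why `seq(10, 1, 1)` is not accepted: a positive step cannot go from 10 down to 1, so R raises an error. The variable names are assumptions for illustration.

```r
ascending_seq   <- seq(1, 10)      # 1, 2, ..., 10
ascending_colon <- 1:10            # same values using the colon operator
ascending_short <- seq(10)         # single argument: counts from 1 up to 10
descending_seq  <- seq(10, 1)      # 10, 9, ..., 1
descending_step <- seq(10, 1, -1)  # explicit step of -1

# A step of +1 from 10 to 1 has the wrong sign, so seq() stops with an error
wrong_step <- tryCatch(seq(10, 1, 1), error = function(err) "error")

print(ascending_seq)
print(descending_step)
print(wrong_step)
```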
Question 10
Correct
Question text
Which of the following are best treated as discrete variables?
Select one or more:
c. Height