72

The document contains a series of questions and answers related to data analytics, programming, and statistical concepts, including topics such as Walmart's analytical needs, data mining, regression analysis, and the structure of analytical reports. It also covers software tools like IBM Watson Studio and SPSS, as well as programming practices in Python. Additionally, it addresses data visualization techniques and the characteristics of different types of data.

Uploaded by

nhuquynh40085

ADY201m (FPTU_AI)

Study online at https://quizlet.com/_d2cnw

How is Walmart reported to have addressed its analytical needs?
A: Social media
B: Code sharing
C: Outsourcing
D: Crowdsourcing
E: None of the options is correct
Answer: Crowdsourcing
The results section is where you present:
A: The empirical findings.
B: R Squared.
C: The conclusion.
D: The methods used.
Answer: The empirical findings.
According to the Module 2 reading, "Data Mining", when data are missing in a systematic way, you can simply extrapolate the data or impute the missing data by filling in the average of the values around the missing data.
A: False.
B: True.
Answer: False.
Based on the Module 2 reading, "Regression", the real added value of the author's research on what type of properties is quantifying the magnitude of relationships between housing prices and different determinants?
A: Residential real estate
B: Foreclosed
C: Commercial real estate
D: Vacant
Answer: Residential real estate
According to the Module 3 reading, "The Final Deliverable", the ultimate purpose of analytics is to communicate findings to what people to formulate policy or strategy?
A: Marketing
B: Stakeholders
C: CEOs
D: Salespeople
Answer: Stakeholders
Based on the Module 3 reading, "The Final Deliverable", what is the role of a data scientist?
A: Managing a team of analysts to create a predictive model.
B: Using the data to put together a story that boosts financial outlooks.
C: Using insights to build a narrative to communicate findings.
D: Developing a strategy to fix the problems in the findings.
Answer: Using insights to build a narrative to communicate findings.
Based on the Module 3 reading, "The Report Structure", regardless of the length of the final deliverable, the author recommends that it includes a cover page, table of contents, executive summary, a methodology section, and a what?
A: Project scope statement
B: Discussion section
C: List of people who worked on the project
D: Copy of your data
Answer: Discussion section
Based on the Module 3 reading, "The Report Structure", an introductory section is always helpful in setting up the problem for the reader who might be what?
A: In sales
B: Looking for the statistical calculations
C: Wanting to know the research methods
D: New to the topic
Answer: New to the topic

Which statement is not true about Open Source and Free Software?
A: Free Software and Open Source can be used interchangeably.
B: Free Software can always be run, studied, modified and redistributed with or without changes.
C: Most Free Software licenses also qualify for Open Source.
D: Open Source Software can be modified without sharing the modified source code depending on the Open Source license.
Answer: Free Software can always be run, studied, modified and redistributed with or without changes.
Which statements about IBM Watson Studio and OpenScale are correct? (Select all that apply.)
A: Watson Studio together with Watson OpenScale is a database management system.
B: Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
C: Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.
Answer: Watson Studio together with Watson OpenScale covers the complete development life cycle for all data science, machine learning and AI tasks.
Answer: Watson Studio together with Watson OpenScale is available as a Cloud offering as well as a package running on top of Kubernetes/RedHat OpenShift in a local data center called IBM Cloud Pak for Data.
True or False: Open data is always distributed under a Community Data License Agreement.
A: True
B: False
Answer: False
Fill in the blank: IBM Cloud uses ______________ as a way for you to organize your account resources in customizable groupings so that you can quickly assign users access to more than one resource at a time.
A: Resource groups
B: Services
C: Projects
D: Catalogs
Answer: Resource groups
Which products (of those we covered) allow you to build data pipelines using a graphical user interface and no coding?
A: Only IBM SPSS Statistics.
B: Only IBM SPSS Modeler.
C: OpenScale
D: IBM SPSS Modeler and Modeler Flows in Watson Studio.
E: All of the above.
Answer: IBM SPSS Modeler and Modeler Flows in Watson Studio.
Which features of Data Refinery help save hours and days of data preparation?
A: Flexibility of using an intuitive user interface and coding templates enabled with powerful operations to shape and clean data.
B: Data visualization and profiles to spot the difference and guide data preparation steps.
C: Incremental snapshots of the results allowing the user to gauge success with each iterative change.
D: Saving, editing and fixing the steps provide the ability to iteratively fix the steps in the flow.
E: All of the above.
Answer: All of the above.
Watson Knowledge Catalog provides what functionality?
A: Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.
B: Build data and water pipelines.
C: Catalog all books mentioning Doctor Watson and Sherlock Holmes.
D: Process data, build and deploy models.
E: Create data and deploy models into production.
Answer: Catalog data and ML assets, help to find relevant assets, keep track of asset lineage, enforce data governance.

Fill in the blank: IBM SPSS Statistics syntax can be created using ___________.
A: IBM SPSS Modeler streams.
B: Watson Studio Modeler flows.
C: Graphical user interface of IBM SPSS Statistics product or syntax editor.
D: OpenScale
E: AutoAI
Answer: Graphical user interface of IBM SPSS Statistics product or syntax editor.
AutoAI provides which of the following services?
A: Monitoring for fairness, bias, and model drift.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Creating SPSS syntax.
E: All of the above.
Answer: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
OpenScale provides which of the following services?
A: Creating SPSS syntax.
B: Automatic finding of optimal data preparation steps, model selection, and hyperparameter optimization.
C: Cataloging data and model assets.
D: Monitoring for fairness, bias, and model drift.
E: All of the above.
Answer: Monitoring for fairness, bias, and model drift.
Predictive Model Markup Language (PMML) was created by which entity?
A: Microsoft
B: The Data Mining Group
C: Oracle
D: IBM
E: SPSS
Answer: The Data Mining Group
Data Refinery provides which of the following services?
A: Catalog the data assets.
B: Monitor for bias and model drift.
C: Visualize and prepare data.
D: Automatically build models.
E: All of the above.
Answer: Visualize and prepare data.
Which feature in Watson Studio helps to keep track of and discover relevant Machine Learning assets?
A: Watson Knowledge Catalog
B: All of the above
C: AutoAI
D: Modeler Flows
E: OpenScale
Answer: Watson Knowledge Catalog
What Modeler flow includes data management capabilities and visualization?
A: All of the above
B: Charts
C: SPSS Modeler streams
D: Graphical user interface
E: SPSS syntax
Answer: SPSS Modeler streams
The Data Requirements stage of the data science methodology involves identifying the necessary data content, formats and sources for initial data collection.
A: True
B: False
Answer: True
Database Administrators determine how to collect and prepare the data.
A: True
B: False
Answer: False
In the Data Collection stage, the data requirements are revised and decisions are made as to whether or not more data is needed.
A: True
B: False
Answer: True
In what stage would you correct invalid values and address outliers?
A: The Data Preparation stage
B: The Data Understanding stage
C: The Modeling stage
D: The Data Requirements stage
Answer: The Data Preparation stage
Deploying a model into production represents the end of the iterative process that includes Feedback, Model Refinement, and Redeployment.
A: True
B: False
Answer: False
The data science methodology provides the data scientist with a framework on how to proceed to do what?
A: Obtain answers
B: Obtain data storage
C: Obtain data
D: None of the above
Answer: Obtain answers
In the video, if age=18, what would be the result?
A: move on
B: you can enter
Answer: move on
In the video, what would be the result if we set the variable age as follows: age= -10?
A: go see Meat Loaf
   move on
B: you can enter
   move on
Answer: go see Meat Loaf
   move on
Why do we use exception handlers?
A: Terminate a program
B: Read a file
C: Write a file
D: Catch errors within a program
Answer: Catch errors within a program
What is the purpose of a try...except statement?
A: Executes the code block only if a certain condition exists
B: Catch and handle exceptions when an error occurs
C: Crash a program when errors occur
D: Only executes if one condition is true
Answer: Catch and handle exceptions when an error occurs
What is the radius attribute after the following code block is run?
BlueCircle=Circle(10,'blue')
BlueCircle.add_radius(20)
A: 10
B: 20
C: 30
Answer: 30
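The question calls a Circle class without showing its definition. A minimal sketch of a class that would produce this result, assuming the constructor takes a radius and a color and that add_radius adds its argument to the current radius (the class body here is an assumption, not the course's code):

```python
class Circle:
    """Hypothetical Circle class; the quiz shows only how it is called."""

    def __init__(self, radius, color):
        self.radius = radius
        self.color = color

    def add_radius(self, r):
        # Assumed behavior: grow the radius by r and return the new value.
        self.radius = self.radius + r
        return self.radius


BlueCircle = Circle(10, 'blue')
BlueCircle.add_radius(20)
print(BlueCircle.radius)  # 30
```

The radius starts at 10 and the method adds 20, which gives 30.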
Why is it best practice to have multiple except statements with each type of error labeled correctly?
A: Ensure the error is caught so the program will terminate
B: In order to know what type of error was thrown and the location within the program
C: To skip over certain blocks of code during execution
D: It is not necessary to label errors
Answer: In order to know what type of error was thrown and the location within the program
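To illustrate the point about labeled except clauses, here is a small sketch. The function name safe_divide and its return messages are made up for this example; ValueError and ZeroDivisionError are standard Python exceptions.

```python
def safe_divide(numerator_text, denominator_text):
    """Hypothetical helper that reports which kind of error occurred."""
    try:
        return float(numerator_text) / float(denominator_text)
    except ValueError:
        # Raised by float() when the text is not a number.
        return "not a number"
    except ZeroDivisionError:
        # Raised by the division when the denominator is zero.
        return "division by zero"


print(safe_divide("6", "3"))
print(safe_divide("six", "3"))
print(safe_divide("6", "0"))
```

Because each clause names one exception type, the message tells you which step failed instead of hiding every problem behind a single generic handler.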
What are the most common modes used when opening a file?
A: (a)ppend, (c)lose, (w)rite
B: (a)ppend, (r)ead, (w)rite
C: (a)ppend, (r)edline, (w)rite
D: (s)ave, (r)ead, (w)rite
Answer: (a)ppend, (r)ead, (w)rite
Use this dataframe to answer the question.
Which will NOT evaluate to 20.6? Select all that apply.
A: df.iloc[4,5]
B: df.iloc[6,5]
C: df.loc[4,'Music Recording Sales']
D: df.iloc[6, 'Music Recording Sales (millions)']
Answer: df.loc[4,'Music Recording Sales']
Answer: df.iloc[6, 'Music Recording Sales (millions)']
Use this dataframe to answer the question.
How do we select Albums The Dark Side of the Moon to Their Greatest Hits (1971-1975)? Select all that apply.
A: df.iloc[2:5, 'Album']
B: df.loc[2:5, 'Album']
C: df.iloc[2:6, 1]
D: df.loc[2:5, 1]
Answer: df.loc[2:5, 'Album']
Answer: df.iloc[2:6, 1]
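The two correct selections differ because loc slices by label and includes the end label, while iloc slices by integer position and excludes the end position. The dataframe from the quiz is not reproduced here, so this sketch uses a small stand-in with made-up values and assumes 'Album' is the column at position 1:

```python
import pandas as pd

# Hypothetical stand-in for the quiz's dataframe (two columns, six rows).
df = pd.DataFrame({
    "Artist": ["a0", "a1", "a2", "a3", "a4", "a5"],
    "Album": ["album0", "album1", "album2", "album3", "album4", "album5"],
})

by_label = df.loc[2:5, "Album"]    # label slice: rows 2, 3, 4 and 5
by_position = df.iloc[2:6, 1]      # position slice: rows 2..5, column 1

print(by_label.tolist())
print(by_position.equals(by_label))
```

Both expressions return the same four rows. The other two options mix the styles: iloc does not accept a column name, and loc looks for a column labeled 1, which this dataframe does not have.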
What attribute is used to retrieve the number of elements in an array?
A: a.size
B: a.ndim
C: a.shape
D: a.dtype
Answer: a.size
What attribute is used to return the number of dimensions in an array?
A: a.shape
B: a.dtype
C: a.ndim
D: a.size
Answer: a.ndim
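A short sketch of the four attributes named in the options, assuming a is a NumPy array (the 2-by-3 array is an arbitrary example):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(a.size)   # 6, the total number of elements
print(a.ndim)   # 2, the number of dimensions
print(a.shape)  # (2, 3), the length along each dimension
print(a.dtype)  # the element type; the exact integer type depends on the platform
```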
Consider the following text file: Example1.txt:
This is line 1
This is line 2
This is line 3
What is the output of the following lines of code?
with open('Example1.txt', "r") as File1:
    file_stuff = File1.readline()
    print(file_stuff)
A: This is line 1
B: This is line 1
   This is line 2
   This is line 3
C: This is line 1
   This is line 2
Answer: This is line 1
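To see why only the first line is printed, the following sketch recreates the file in a temporary directory (the temporary path is an assumption made so the example is self-contained) and compares readline() with read():

```python
import os
import tempfile

# Recreate Example1.txt in a temporary directory for this sketch.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "Example1.txt")
with open(path, "w") as f:
    f.write("This is line 1\nThis is line 2\nThis is line 3\n")

with open(path, "r") as File1:
    first_line = File1.readline()   # one line, including its trailing newline
    rest = File1.read()             # everything after the first line

print(first_line)
```

readline() stops at the first newline, so file_stuff in the question holds only "This is line 1".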
What do the following lines of code do?
with open("Example.txt", "a") as writefile:
    writefile.write("This is line A\n")
    writefile.write("This is line B\n")
A: Append the file "Example.txt"
B: Read the file "Example.txt"
C: Write to the file "Example.txt"
Answer: Append the file "Example.txt"
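The difference between the "w" and "a" modes can be checked with a short sketch. The temporary path and the initial line are assumptions added so the example runs on its own:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Example.txt")

with open(path, "w") as writefile:      # "w" creates or truncates the file
    writefile.write("This is line 1\n")

with open(path, "a") as writefile:      # "a" keeps the existing content
    writefile.write("This is line A\n")
    writefile.write("This is line B\n")

with open(path, "r") as readfile:       # "r" opens the file for reading
    lines = readfile.readlines()

print(lines)
```

The original line survives and the two new lines are added after it, which is what distinguishes appending from writing.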
What is the function of "GET" in HTTP requests?
A: Carries the request to the client from the requestor
B: Deletes a specific resource
C: Returns the response from the client to the requestor
D: Sends data to create or update a resource
Answer: Carries the request to the client from the requestor
What does URL stand for?
A: Uniform Request Location
B: Uniform Resource Locator
C: Unilateral Resistance Locator
D: Uniform Resource Learning
Answer: Uniform Resource Locator
What are the 3 parts to a response message?
A: Bookmarks, history, and security
B: HTTP headers, blank line, and body
C: Start or status line, header, and body
D: Encoding, body, and cache
Answer: Start or status line, header, and body
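The three parts of a response message can be seen by splitting a raw response by hand. The response text below is a made-up example, and real programs would normally rely on an HTTP library instead of manual parsing:

```python
# Hypothetical raw HTTP response used only for illustration.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 14\r\n"
    "\r\n"
    "<h1>Hello</h1>"
)

# A blank line separates the status line and headers from the body.
head, body = raw.split("\r\n\r\n", 1)
status_line, *header_lines = head.split("\r\n")
headers = dict(line.split(": ", 1) for line in header_lines)

print(status_line)
print(headers)
print(body)
```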
What is a two-dimensional data structure?
A: Pandas Series
B: Pandas Dataframe
C: Numpy
Answer: Pandas Dataframe
Data Definition Language (DDL) and Data Manipulation Language (DML) are what?
A: The basic categories for managing data.
B: The basic categories for providing security to databases.
C: The basic categories of the Python language based on functionality.
D: The basic categories of the SQL language based on functionality.
Answer: The basic categories of the SQL language based on functionality.
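As a rough illustration of the two categories, the sketch below runs one DDL statement and a few DML statements against an in-memory SQLite database. The albums table and its values are invented for this example, and the sketch follows the convention that counts SELECT as DML:

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database

# DDL: define the structure of a (hypothetical) table.
conn.execute("CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT, sales REAL)")

# DML: insert, update, and read rows in that table.
conn.execute("INSERT INTO albums (title, sales) VALUES (?, ?)", ("album0", 1.5))
conn.execute("UPDATE albums SET sales = ? WHERE title = ?", (2.5, "album0"))
rows = conn.execute("SELECT title, sales FROM albums").fetchall()

print(rows)
conn.close()
```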
The measurements of spread or scatter of the individual values around the central point are called:
A: Measures of dispersion
B: Measures of central tendency
C: Measure of skewness
D: Measures of central tendency and Measures of dispersion
Answer: Measures of dispersion
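A small sketch of why dispersion is reported alongside central tendency: the two made-up samples below share the same mean but differ sharply in range, variance, and standard deviation.

```python
import statistics

# Two hypothetical samples with the same mean but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(values)             # central tendency
    value_range = max(values) - min(values)    # dispersion
    variance = statistics.variance(values)     # sample variance (n - 1)
    std_dev = statistics.stdev(values)         # sample standard deviation
    print(name, mean, value_range, variance, round(std_dev, 2))
```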
Which of the following is an example of time series data?
A: Annual average housing price in New York
B: Batting average of a baseball player
C: Number of trees in Jardin du Luxembourg in Paris
D: Number of dolphins in the Pacific Ocean
Answer: Annual average housing price in New York
Which of the following is an example of categorical data?
A: Length of the river Nile
B: Mode of travel to work
C: Number of fire hydrants in a city
D: Number of children at a kindergarten
Answer: Mode of travel to work
What's the best way to display median and outliers?
A: A bubble chart
B: A box plot
C: A scatter plot
D: A time series plot
Answer: A box plot
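A box plot is drawn from the quartiles, and points far outside them are marked as outliers. The sketch below computes those quantities for a made-up sample, assuming the common 1.5 × IQR rule for flagging outliers (plotting libraries may use slightly different quartile definitions):

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 40]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(median, outliers)
```

Here the median is 6 and the value 40 falls above the upper fence, so a box plot would draw it as a separate point.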
What is a suitable way to display the average basketball scores between two teams?
A: A bar chart
B: A pie chart
C: A histogram
D: A scatter plot
Answer: A bar chart
Which of the following is the suitable way to display the average income earned by men and women in a city?
A: A scatter plot
B: A bar chart
C: A histogram
D: A pie chart
Answer: A bar chart
What is a suitable way to display the relationship between two continuous variables?
A: A histogram
B: A bar chart
C: A scatter plot
D: A pie chart
Answer: A scatter plot
When the sum of two or more categories equals 100, what chart type is ideally suited for displaying data?
A: A box plot
B: A pie chart
C: A histogram
D: A line chart
Answer: A pie chart
When multiple observations are reported for each respondent in the data set, to compute statistics for variables about the respondents, one must:
A: Ignore the presence of duplicates and compute statistics as usual
B: Weight data by duplicates
C: Remove duplicates before running analysis
D: None of the above
Answer: Remove duplicates before running analysis
What test is used to test the equality of variance?
A: ANOVA
B: Levene's test
C: z-test
D: t-test
Answer: Levene's test
Using the teacher's rating data, is there an association between native (native English speakers) and the number of credits taught? What test will you use?
A: Chi-Square Test for Association
B: T-test
C: ANOVA
D: Z-test
Answer: Chi-Square Test for Association
If I wanted to test for association using a chi-square test, whether there is an association between gender (Male or Female) and tenureship (tenured or not tenured), what will be my degrees of freedom?
Answer: 1
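The answer follows from the standard formula for a chi-square test of association, (rows - 1) × (columns - 1). A minimal sketch, with chi_square_dof as a made-up helper name:

```python
def chi_square_dof(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of association."""
    return (n_rows - 1) * (n_cols - 1)


# Gender has 2 levels and tenure status has 2 levels.
dof = chi_square_dof(2, 2)
print(dof)  # 1
```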
Battery life of smartphones is of great concern to customers. A consumer group tested four brands of smartphones to determine the battery life. Samples of phones of each brand were fully charged and left to run until the battery died. The table above displays the number of hours each of the batteries lasted. What test will we be using to test the difference in means?
A: T-test
B: Chi-square Test
C: ANOVA
D: Pearson Correlation Test
Answer: ANOVA
A room in a laboratory is only considered safe if the mean radiation level is 400 or less. When a sample of 10 radiation measurements was taken, the mean value of the radiation was 414 with a standard deviation of 17. There are concerns that mean radiation is above 414. Radiation levels in the lab are known to follow a normal distribution with standard deviation 22. We would like to conduct a hypothesis test at the 5% level of significance to determine whether there is evidence that the laboratory is unsafe.
What will be the appropriate test?
A: z-test
B: t-test
C: ANOVA
D: Chi-square
Answer: z-test
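A z-test applies here because the population standard deviation (22) is known. The sketch below works through the numbers from the question, assuming a one-sided test of H0: mean = 400 against H1: mean > 400 (the hypotheses are an interpretation of the safety threshold, not stated in the quiz):

```python
import math
from statistics import NormalDist

sample_mean = 414      # from the question
mu_0 = 400             # safety threshold, used as the null value (assumption)
sigma = 22             # known population standard deviation
n = 10                 # sample size

z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
p_value = 1 - NormalDist().cdf(z)    # one-sided: H1 is "mean > 400"

print(round(z, 2), round(p_value, 3))
```

Under these assumptions the test statistic is about 2.01 and the one-sided p-value falls below 0.05. The sample standard deviation of 17 is not used, since the known population value takes its place.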
A man accused of committing a crime is taking a polygraph (lie detector) test. The polygraph is essentially testing the hypotheses H0: The man is telling the truth vs. Ha: The man is not telling the truth.
Suppose we use a 5% level of significance. Based on the man's responses to the questions asked, the polygraph determines a P-value of 0.08. We conclude that:
A: The probability that the man is telling the truth is 0.08.
B: We reject the null hypothesis as there is sufficient evidence that the man is telling the truth.
C: We fail to reject the null hypothesis as there is insufficient evidence that the man is not telling the truth.
D: The probability that the man is not telling the truth is 0.08.
Answer: We fail to reject the null hypothesis as there is insufficient evidence that the man is not telling the truth.
Pearson correlation is concerned with:
A: the relationship between two categorical variables
B: the relationship between a quantitative explanatory variable and a categorical response variable
C: the relationship between a categorical explanatory variable and a quantitative response variable.
D: the relationship between two quantitative variables
Answer: the relationship between two quantitative variables
Does running an ANOVA give the same p-value results as running a regression analysis when testing the difference in group means?
A: True
B: False
Answer: True
We run a regression analysis in place of a t-test to test if there is a difference in number of students enrolled in classes with professors who are visible minority (vismin = 1) vs professors who are not (vismin = 0). The table is shown below. What does the coefficient for vismin mean?
A: Professors who are visible minority get about 21 students less on average than professors who aren't visible minority.
B: We can't conclude because the error is too large and if factored could change the conclusion of the tests.
C: Professors who are visible minority get about 58 students less on average than professors who aren't visible minority.
D: Professors who are visible minority get about 21 students more on average than professors who aren't visible minority.
Answer: Professors who are visible minority get about 21 students less on average than professors who aren't visible minority.
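The interpretation rests on a general property: when the only predictor is a 0/1 dummy, the regression coefficient equals the difference between the two group means. The sketch below checks this with invented class sizes, chosen so the difference is -21; these are not the data behind the quiz's table.

```python
# Hypothetical class sizes; not the data behind the quiz's table.
vismin = [0, 0, 0, 1, 1, 1]
students = [60, 70, 80, 40, 50, 57]

n = len(vismin)
mean_x = sum(vismin) / n
mean_y = sum(students) / n

# Ordinary least squares for one predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vismin, students))
sxx = sum((x - mean_x) ** 2 for x in vismin)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Group means computed directly for comparison.
mean_0 = sum(y for x, y in zip(vismin, students) if x == 0) / 3
mean_1 = sum(y for x, y in zip(vismin, students) if x == 1) / 3

print(slope, mean_1 - mean_0)
```

The slope matches mean_1 - mean_0, and the intercept matches the mean of the vismin = 0 group, which is why a negative coefficient reads as "fewer students on average".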
Which of these options is most likely to be the null hypothesis for testing correlation between two variables?
A: There is an association between an instructor's looks and teaching evaluation score.
B: There is no association between an instructor's looks and teaching evaluation score.
C: There is a partial association between an instructor's looks and teaching evaluation score.
Answer: There is no association between an instructor's looks and teaching evaluation score.
Which of the following is the best example of categorical data?
A: Cost of houses in a state
B: Average temperature of a lake on each day of a year
C: Number of cars owned by a household
D: Weights of children at a school
Answer: Number of cars owned by a household
Which of the following would be the most appropriate way to visually display the mode of a data set?
A: A box plot
B: A line graph
C: A pie chart
D: A histogram
Answer: A histogram
Which of the following data sets would be most appropriate for a bar chart?
A: Elementary school children's favorite colors
B: Average temperature of a lake on each day of a year
C: Price distribution of hotel rooms
D: Percent of a population by race
Answer: Elementary school children's favorite colors

True or false? The statement, "Smoking does not increase your chance of getting lung cancer" is an example of a null hypothesis.
A: True
B: False
Answer: True
Which of the following statements is true about z-tests and t-tests?
A: When comparing the means of two independent samples with equal means, a z-test should be used.
B: When comparing the means of two independent samples with equal variances, a t-test should be used.
C: When comparing the means of two independent samples with unequal means, a z-test should be used.
D: When comparing the means of two independent samples with unequal variances, a t-test should be used.
Answer: When comparing the means of two independent samples with equal variances, a t-test should be used.
Which statement is true regarding the F-distribution?
A: It should be used in conjunction with an ANOVA.
B: It should be used in conjunction with a p-test.
C: It should be used in conjunction with a t-test.
D: It should be used in conjunction with a z-test.
Answer: It should be used in conjunction with an ANOVA.
Which of the following tests is most appropriate to find the correlation between a dependent and independent variable?
A: Regression
B: F-Test
C: T-test
D: ANOVA
Answer: Regression
