NEW DST ALL Ques (BEFORE+AFTER) Mid+ExamMid
Data Science Basics Questions and Answers - Sanfoundry (10 questions on Data Science Basics)
3. What is the typical file extension for a Comma Separated Values (CSV) file?
a. .txt
b. .csv
c. .py
d. .xls
4. Which of the following is a valid data type in JSON?
a. Date
b. Currency
c. List
d. Font
5. What type of XML tag is represented as <tagname />?
a. Start Tag
b. End Tag
c. Empty-Element Tag
d. Attribute
6. Which method from the json library is used to load JSON from a file in Python?
a. json.load(file)
b. json.loads(file)
c. json.read(file)
d. json.reads(file)
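The difference between reading JSON from a file object and parsing it from a string can be sketched as follows. This is a minimal illustration only: the sample JSON text is hypothetical, and io.StringIO stands in for a real file opened with open().

```python
import io
import json

# Hypothetical JSON text, used only for illustration.
text = '{"name": "Ada", "scores": [90, 85]}'

# json.load reads from a file-like object (here an in-memory stand-in for a file).
with io.StringIO(text) as fake_file:
    from_file = json.load(fake_file)

# json.loads parses a string directly.
from_string = json.loads(text)

print(from_file == from_string)  # True
```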
1. In MongoDB, what does BASE stand for in terms of consistency?
A. Basic Availability, Simple-state, Eventually Consistent
B. Basically Available, Soft-state, Eventually Consistent
C. Basic, Availability, Simple-state, Elastic
D. Basically Atomic, Stable-state, Eventual Consistency
2. What is BSON in MongoDB?
A. Basic Structured Object Notation
B. Binary Structured Object Node
C. Binary JSON
D. BSON Object Naming
3. Which type of NoSQL database is MongoDB classified as?
A. Column Store
B. Key-Value Store
C. Document Store
D. Graph Database
4. What is the purpose of the ObjectId in MongoDB?
A. To identify databases
B. To ensure document uniqueness
C. To create timestamps
D. To manage distributed transactions
5. Which of the following is true about MongoDB's handling of the _id field?
A. It cannot be indexed
B. It is automatically created as an integer
C. Developers cannot provide their own values
D. It is automatically indexed and can be an ObjectId or another unique
immutable value
6. What is a major advantage of MongoDB's document model?
A. Reduced scalability
B. Complex querying
C. Flexibility in schema design
D. Strict adherence to ACID principles
7. What does the term "Horizontally Scalable" mean in the context of MongoDB?
A. Scaling vertically across multiple servers
B. Scaling a single server to handle more requests
C. Distributing data across multiple servers to handle growth
D. Limiting the number of servers for better performance
(Database SQL Only Included(1,2,3,4,5,6,7,8,9,10)) : SQL Queries - Database Questions & Answers -
Sanfoundry
(MongoDB, Only Included (1,2,3,8,9,10)) NoSQL Databases - MongoDB Questions and Answers - Sanfoundry
MONGO SECTION
1. What is MongoDB?
A. Relational database
B. Document-oriented database
C. NoSQL database
D. Both B and C
2. In MongoDB, what is a document equivalent to in a SQL database?
A. Table
B. Record
C. Field
D. Column
3. Which method is used to insert a single document into a MongoDB collection
using PyMongo?
A. add_one()
B. insert_single()
C. insert_one()
D. add_document()
4. What is the purpose of the PyMongo package in Python with respect to MongoDB?
A. Web development
B. Data visualization
C. MongoDB driver for Python
D. Machine learning
5. In MongoDB, what does CRUD stand for?
A. Create, Retrieve, Update, Delete
B. Connect, Read, Update, Delete
C. Collect, Retrieve, Use, Delete
D. Create, Read, Upload, Delete
6. How do you update a document in MongoDB using PyMongo?
A. update_single()
B. modify_one()
C. update_one()
D. change_document()
7. In PyMongo, what does the $set operator do in the context of updating a document?
A. Sets the document to null
B. Adds a new field to the document
C. Updates a specific field in the document
D. Sorts the document in ascending order
8. Which method is used to delete a single document from a MongoDB collection
in PyMongo?
A. delete_one()
B. remove_single()
C. erase_one()
D. discard_one()
9. What is the purpose of the sort() method in MongoDB when using PyMongo?
A. Group documents in a collection
B. Filter documents based on a condition
C. Order the result in ascending or descending order
D. Limit the number of documents returned
Data Preprocess
2. Question: Which image augmentation technique involves reversing rows or columns of pixels either vertically
or horizontally?
A) Rotation
B) Shifting
C) Flipping
D) Scaling
3. Question: What is the purpose of changing image brightness during data augmentation?
C) To increase contrast
4. Question: In the formula for grayscale conversion (gray_image = 0.3 * R + 0.59 * G + 0.11 * B), what do R, G,
and B represent?
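The grayscale formula in the question can be applied per pixel as in the sketch below. The pixel values are hypothetical, and the helper name to_gray is an assumption made for this illustration; only the weights 0.3, 0.59, and 0.11 come from the question.

```python
def to_gray(pixel):
    """Weighted sum of the red, green, and blue channel values of one pixel."""
    r, g, b = pixel
    return 0.3 * r + 0.59 * g + 0.11 * b

# Hypothetical RGB pixels (channel values in 0..255), for illustration only.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
gray_image = [to_gray(p) for p in pixels]
```

Because the three weights sum to 1, a pure white pixel keeps (up to floating-point rounding) the maximum intensity of 255.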
B) Customer names
D) Employee salaries
3. What does the term "seasonality" refer to in the context of time series data?
Question 8:
7. In the context of time series data, what does the term "frequency" refer to?
Lambda
a) 0 b) -1 c) 5 d) 6
a) 24 b) 2 c) [1,2,3,4] d) 7
a) 24 b) 3 c) [1,2,3,4] d) 10
a) 0 b) 4 c)2 d) 1
2. Let x = {a, b, d} and y = {b, c}; then S_Jaccard(x, y) is ---------
a) 0.2 b) 0.25 c) 1 d) 0.5
3. If x = [0 1 0 1] and y = [1 0 1 0], then d_Hamming(x, y) is ---------- while the squared Euclidean distance (d_2(x, y))^2 is -------
a) 4/4
b) 2/4
c) 4/2
d) 2/2
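The two preceding questions can be checked with a short sketch. The function names are assumptions chosen for this illustration; the sets and vectors are the ones given in the questions.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def hamming_distance(u, v):
    """Number of positions at which the two vectors differ."""
    return sum(1 for p, q in zip(u, v) if p != q)

def squared_euclidean(u, v):
    """Sum of squared component-wise differences."""
    return sum((p - q) ** 2 for p, q in zip(u, v))

x_set, y_set = {"a", "b", "d"}, {"b", "c"}
x_vec, y_vec = [0, 1, 0, 1], [1, 0, 1, 0]

print(jaccard_similarity(x_set, y_set))  # 0.25
print(hamming_distance(x_vec, y_vec))    # 4
print(squared_euclidean(x_vec, y_vec))   # 4
```

For binary vectors each differing position contributes exactly 1 to both sums, which is why the Hamming distance and the squared Euclidean distance agree here.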
4. Which of the following is not an example of a NoSQL database?
a) MongoDB
b) Neo4j
c) Cassandra
d) SQLite
1. Which module in Python supports regular expressions?
a) String b) regex c) pyregex d) sklearn
2. What does the function “match” in the regular expressions package do?
a) matches a pattern at the start of the string
b) matches a pattern at any position in the string
c) such a function does not exist
d) none of the mentioned
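A minimal sketch of how match behaves compared with search, using Python's standard-library re module. The sample string is hypothetical.

```python
import re

text = "data science tools"  # hypothetical sample string

# re.match only succeeds if the pattern matches at the start of the string.
at_start = re.match(r"data", text)
not_at_start = re.match(r"science", text)

# re.search scans the string and succeeds at any position.
anywhere = re.search(r"science", text)

print(at_start is not None)  # True
print(not_at_start)          # None
print(anywhere.start())      # 5
```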
3. Which module in Python supports XML?
a) BeautifulSoup
b) numpy
c) pyxmlex
d) xmlrequest
4. Answer the following question regarding the state after the execution of the following code,
df = DataFrame([(1, 'Kolter', 'Zico'),(2, 'Manek', 'Gaurav'), (3, 'Rice', 'Leslie')],
columns=["Person ID", "Last Name", "First Name"])
df.drop(1, inplace=True, axis=0)
How many records will be in the dataframe df?
a) 6 b) 3 c) 9 d) 2
5. To normalize the following dataset using three different techniques
10,40,50,10,50, 70,90,30
b) subtract each value from the standard deviation of the data
c) A followed by B
d) Divide each value by the median
e) Replace each value x by (x - mean of the data) / standard deviation
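The dataset above can be normalized in a few lines, as in the sketch below. Two points are assumptions rather than part of the question: min-max scaling to [0, 1] is included for comparison although it is not among the surviving options, and the population standard deviation (statistics.pstdev) is used for the z-score, since the question does not say which variant is intended.

```python
import statistics

data = [10, 40, 50, 10, 50, 70, 90, 30]

# Min-max scaling to the range [0, 1] (assumption: not among the listed options).
smallest, largest = min(data), max(data)
min_max = [(x - smallest) / (largest - smallest) for x in data]

# Z-score: (x - mean) / standard deviation.
# Assumption: population standard deviation; the sample version would be statistics.stdev.
mean = statistics.mean(data)
std = statistics.pstdev(data)
z_scores = [(x - mean) / std for x in data]

# Dividing each value by the median.
median = statistics.median(data)
by_median = [x / median for x in data]

print(mean, median)  # 43.75 45.0
```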
Answer several questions regarding the normalized tf, DF, IDF, tflogidf, binary representation, data matrix, and distance matrix.
You should be able to answer any questions about the code in the handouts, such as the code above.
Midterm
Program: General/Intelligent Systems/Cybersecurity
Level: Third    Term: Fall 2023/2024
Course Code: 02-24-01203    Course Title: Data Science Tools & Software
Time Allowed: 60 minutes    Total points: 20
Professor name: Dr. Mohamed Abd El-Hafeez
Attempt ALL the following 53 questions
You may choose E (=ALL) if all answers (A, B, C and D) are correct or choose F (=NONE) if none of the answers fits.
In the designated answer sheet, mark your choice (ⓐ, ⓑ, ⓒ, ⓓ, ⓔ, or ⓕ) in front of the question number.
Be sure that you have filled the appropriate bubbles carefully as in the example below.
Example: if the choice for question 300 is “C” then your answer sheet should look like this:
300. ⓐ ⓑ ● ⓓ ⓔ ⓕ
1. Which of the following is not true regarding Data Science?
a) Concerned only with big data
b) Heavy focus on machine learning algorithms
c) Concerned only with small data
d) Concerned with theories in statistics
Answer: E
2. Which module in Python supports regular expressions?
a) String b) re c) pyregex d) sklearn
3. What does the function “search” in the regular expressions package do?
a) matches a pattern at the start of the string
b) matches a pattern at any position in the string
c) replace all matched
d) delete all matched
4. Which of the following HTTP methods never modifies a server's state?
a) response = requests.put(...) b) response = requests.post(...)
c) response = requests.delete(...) d) response = requests.get(...)
5. Which module in Python supports parsing HTML and XML documents?
a) BeautifulSoup b) numpy c) pandas d) sklearn
6. What is the library that corresponds to the alias “ps” in the following code
df = ps.DataFrame([(1, 'Kolter', 'Zico')])
a) pandas
b) panorama
c) pymatplots
d) scipy
Answer the following two questions regarding the state after the execution of the following code:-
df = DataFrame([(1, 'Kolter', 'Zico'),(2, 'Manek', 'Gaurav'), (3, 'Rice', 'Leslie')],
columns=["Person ID", "Last Name", "First Name"])
df.drop(1, inplace=True, axis=0)  # drop 1 row
df.drop(2, inplace=True, axis=1)  # drop 1 column
7. How many records (rows) will be in the dataframe df after executing the above code?
a) 1 b) 3 c) 2 d) 4
8. How many columns will the dataframe df have after executing the above code?
a) 6 b) 2 c) 3 d) 1
a) Employee records b) Documents c) Bank transactions d)Time Series
10. What is the primary purpose of the requests module?
a) Send HTTP requests to a server and retrieve web page content
b) Manage database connections for data storage
c) Execute complex algorithms for data analysis
d) Control graphical user interface interactions
11. What will be the output of the following Python code?
CarName = 'Porche'
WordName = 'World'
print('{0} is the fastest car in the {2}'.format(CarName, WordName))
a) Porche is the fastest car in the World
b) Porche is the fastest car in the
c) Porche is the fastest car in the 2
d) IndexError: tuple index out of range
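The behaviour in this question can be reproduced with the sketch below. Two arguments are passed to format, so only indices 0 and 1 exist and referring to {2} raises IndexError; the exact wording of the error message depends on the Python version. The variable names are adapted from the question, and the corrected string is an added illustration.

```python
car_name = "Porche"
word_name = "World"

try:
    # Only two positional arguments are supplied, so index 2 does not exist.
    message = "{0} is the fastest car in the {2}".format(car_name, word_name)
except IndexError as exc:
    message = "IndexError: " + str(exc)

# Using index 1 refers to the second argument and works as expected.
fixed = "{0} is the fastest car in the {1}".format(car_name, word_name)

print(message)
print(fixed)  # Porche is the fastest car in the World
```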
12. What does the term "ACID" stand for in the context of databases?
a) All-Comprehensive Isolation and Durability
b) Atomicity, Consistency, Isolation, Durability
c) Advanced Configuration for Isolated Databases
d) Association of Concurrent Information and Data
13. How is the _id field automatically created if not provided in MongoDB?
a) Integer b) Timestamp c) ObjectId d) AutoID
14. What is MongoDB?
a) Relational database b) Document-oriented database c) NoSQL database d) Both B and C
15. In MongoDB, what is a document equivalent to in a SQL database?
a) Table b) Record c) Field d) Column
16. Which method is used to find documents in a MongoDB collection based on a specific condition?
a) get_one() b) search() c) find_one() d) query_one()
17. The hamming distance between two binary vectors is equivalent to :-
a) Jaccard Index b) Euclidean Distance c) Squared Euclidean Distance d) cosine similarity
18. Question: What does setting New_max=1 and New_min=0 achieve in data normalization?
a) Increases data complexity b) Reduces the impact of outliers
c) Adds noise to the dataset d) Standardizes data within a specific range
19. Which of the following is a common technique to replace missing data in a dataset?
a) Mean b) Median c) Mode d) Random Value
Answer: E
20. What is the cosine similarity between the vectors (1, 0) and (0, 1)?
a) 1 b) 0 c) 0.5 d) 2
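Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal sketch, with a function name chosen for this illustration and a second, hypothetical pair of vectors added for contrast:

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their Euclidean norms."""
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity((1, 0), (0, 1)))  # 0.0 (perpendicular vectors)
print(cosine_similarity((1, 0), (2, 0)))  # 1.0 (same direction)
```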
21. What is the primary purpose of converting an image to grayscale in machine learning algorithms?
a) To increase computational complexity
b) To introduce color variations
c) To reduce computational complexity
d) To improve image resolution
22. In the context of image normalization, what is the benefit of scaling all images to a common range such as [0,1]?
a) It increases computational complexity
b) It ensures fairness across all images
c) It introduces colour variations
d) It reduces the need for data augmentation
23. What is data augmentation in the context of image processing?
a) Increasing the size of an image dataset
b) Making minor alterations to existing data to increase diversity
c) Reducing the diversity of a dataset
d) Converting images to grayscale
24. What is the assumed seasonality for a monthly time series?
a) 7 b) 12 c) 30 d) 365
25. Executing print( (lambda x, y: x//y)(4, 3)) in Python produces
a) 0 b) 1 c) 4/3 d) 7
26. Executing print(map(lambda x: x**3 , [0,1,2])) in Python produces
a) [0,0,0] b) [0,1,2] c) [0,1,8] d) [0,1,3]
27. Executing print(list(filter(lambda x: x > 2 and x < 8, [-1,0,5,3]))) in Python produces
a) [5,3] b) [3,5] c) [-1,0,5,3] d) 8
28. Executing print(functools.reduce(lambda x, y: x+y, [1,2,3,4])) in Python produces
a) 24 b) 10 c) [1,3,6,10] d) 1
Explanation: 1+2=3, 3+3=6, 4+6=10
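The four lambda expressions from questions 25 to 28 can be run together as a quick check. One detail worth knowing: in Python 3, map returns a lazy iterator rather than a list, so it is wrapped in list() here to show the values. The variable names are chosen for this illustration.

```python
import functools

# Floor division of 4 by 3.
floor_div = (lambda x, y: x // y)(4, 3)

# map is lazy in Python 3, so list() is needed to see the values.
cubes = list(map(lambda x: x ** 3, [0, 1, 2]))

# filter keeps elements in their original order.
kept = list(filter(lambda x: x > 2 and x < 8, [-1, 0, 5, 3]))

# reduce folds the list from left to right: ((1 + 2) + 3) + 4.
total = functools.reduce(lambda x, y: x + y, [1, 2, 3, 4])

print(floor_div)  # 1
print(cubes)      # [0, 1, 8]
print(kept)       # [5, 3]
print(total)      # 10
```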
29. Which of the following databases is not a relational database?
a) SQLite b) MySQL c) Oracle d) MS Access
Answer: F
30. Which of the following libraries is used for data visualization?
a) TensorFlow b) Scrapy c) Scikit Learn d) Matplotlib
31. Which of the following is not a tool for data processing, machine learning algorithm implementation, and visualization?
a) SAS b) Weka c) RapidMiner d) SAS and WEKA
Answer: F
32. Complex data streams can be analyzed and visualized dynamically using
a) Apache Spark b) Scrapy c) MS Excel d) MS Powerpoint
33. metrics.DistanceMetric.get_metric is a function defined in
a) Pandas b) Scrapy c) Sklearn d) Matplotlib
34. The output of the following code is
dist = get_metric('euclidean')
X = [[2, 3]]
Y = [[2, 2]]
dist.pairwise(X,Y)
a) 1 b) 5 c) 9 d) 0
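The pairwise Euclidean distance in this question can be verified without scikit-learn. The sketch below is a plain-Python stand-in for the pairwise call, not the library's implementation; it uses math.dist, which is available from Python 3.8.

```python
import math

X = [[2, 3]]
Y = [[2, 2]]

# Pairwise distance matrix: one row per point in X, one column per point in Y.
pairwise = [[math.dist(x, y) for y in Y] for x in X]

print(pairwise)  # [[1.0]]
```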
Regarding the following code, answer the following two questions:-
iris = datasets.load_iris()
df = pd.DataFrame(iris['data'], columns = iris['feature_names'])
scalar = StandardScaler()
scaled_data = pd.DataFrame(scalar.fit_transform(df))
pca = PCA(n_components = 2)
pca.fit(scaled_data)
data_pca = pca.transform(scaled_data)
44. What is the purpose of using StandardScaler() in the above code?
a) reduce the dimension b) fill missing data c) normalize the data d) remove noise
45. The number of columns of data_pca is
a) 4 b) 2 c) 3 d) 1
Given the following term frequencies in a corpus D that contains 3 documents D1..D3, answer the following questions:-
Document 1 (D1)      Document 2 (D2)      Document 3 (D3)
Term    Count        Term    Count        Term    Count
Caw     2            Sudan   3            Egypt   2
Sudan   1            Caw     2            Nile    2
Camel   1            Nile    1            Caw     1
46. The resulting data matrix will be of size
a) 3×5 b) 4 × 4 c) 5×5 d) 5×4
47. The normalized term frequency tf(“camel”, D1) is
a) 0.20 b) 3 c) 4 d) 0.25
Explanation: 1/4
48. The inverse document frequency idf(“Camel”, D) is
a) 3 b) 1 c) 1/3 d) 0
Explanation: 3/1
49. What is tflogidf(“caw”, D)?
a) 0 b) 1 c) 3 d) 5
50. The resulting distance matrix will be of size
a) 3×5 b) 4 × 4 c) 5×5 d) 3×3
51. The corresponding feature vector of document D1 using binary term frequency is
a) [1 1 1 0 0] b) [ 1 0 0 0 1] c) [1 0 1 1] d) [2 1 1]
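The term-frequency questions above can be worked through with the sketch below. The definitions are assumptions inferred from the answer options rather than stated in the exam: normalized tf is the term count divided by the total count in the document, idf is N / df without a logarithm, and tflogidf is tf multiplied by log(N / df). The term counts are those read from the table for D1 to D3 (D1: Caw 2, Sudan 1, Camel 1; D2: Sudan 3, Caw 2, Nile 1; D3: Egypt 2, Nile 2, Caw 1), and the vocabulary order is an assumption.

```python
import math

# Term counts per document, as read from the exam table.
docs = {
    "D1": {"caw": 2, "sudan": 1, "camel": 1},
    "D2": {"sudan": 3, "caw": 2, "nile": 1},
    "D3": {"egypt": 2, "nile": 2, "caw": 1},
}
vocab = ["caw", "sudan", "camel", "nile", "egypt"]  # assumed column order

def normalized_tf(term, doc):
    """Count of the term divided by the total number of terms in the document."""
    counts = docs[doc]
    return counts.get(term, 0) / sum(counts.values())

def document_frequency(term):
    """Number of documents that contain the term."""
    return sum(1 for counts in docs.values() if term in counts)

def idf(term):
    """Assumed definition: N / df, without a logarithm."""
    return len(docs) / document_frequency(term)

def tf_log_idf(term, doc):
    """Assumed definition: normalized tf times log(N / df)."""
    return normalized_tf(term, doc) * math.log(idf(term))

# Data matrix: one row per document, one column per vocabulary term.
data_matrix = [[docs[d].get(t, 0) for t in vocab] for d in docs]
binary_d1 = [1 if t in docs["D1"] else 0 for t in vocab]

print(normalized_tf("camel", "D1"))            # 0.25
print(idf("camel"))                            # 3.0
print(tf_log_idf("caw", "D1"))                 # 0.0
print(len(data_matrix), len(data_matrix[0]))   # 3 5
print(binary_d1)                               # [1, 1, 1, 0, 0]
```

A term such as "caw" that appears in every document has N / df = 1, so its log is 0 and the tflogidf weight vanishes regardless of how often it occurs.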
52. The correlation between the data using the new axes z1,z2 is ---------- than the correlation between the same data with
respect to the axes x1,x2
a) Higher b) lower c) equals d) higher or equals
53. Which axis may you neglect to reduce the dimension?
a) z1 b) z2 c) z1 or z2 d) z1 and z2
Best Wishes
DST Revision After Midterm
Feature Selection & Reduction Techniques & Applications (Handout 5)
1-Dimensionality reduction techniques are primarily used for:
a) Data visualization
b) Data compression
c) Noise removal
b) Image retrieval
c) Face recognition
a) Filter model
b) Wrapper model
c) MRMR model
d) Unsupervised model
a) Data compression
b) Dimensionality reduction
c) Feature selection
d) Image retrieval
a) Unsupervised
b) Supervised
c) Semi-supervised
d) Nonlinear
Answer: a) Unsupervised
a) NumPy
b) Pandas
c) Scikit-learn
d) Matplotlib
Answer: c) Scikit-learn
b) To remove outliers
a) Wrapper model
b) Filter model
c) MRMR model
14-The minimum redundancy and maximum relevance (MRMR) feature selection algorithm uses:
a) Heuristic search
b) Complete search
c) Nondeterministic search
c) Manifold learning
a) A built-in function
Answer
2. What is the correct syntax for a lambda function that adds two numbers, a and b?
a) lambda a, b: a + b
d) (lambda a, b: a + b)
Answer:
a) lambda a, b: a + b
a) (lambda a, b: a * b)(5, 3)
b) lambda a, b: a * b(5, 3)
c) call(lambda a, b: a * b, 5, 3)
d) lambda(5, 3, a * b)
Answer:
a) (lambda a, b: a * b)(5, 3)
Answer:
c) They return the result of the expression automatically
5-How do you use a lambda function with the map() function in Python?
Answer:
a) map(lambda x: x * 2, [1, 2, 3])
a) Adds 10 to x
b) Multiplies x by 10
d) Reduces x by 10
Answer:
c) Checks if x is greater than 10
Explanation: This lambda function returns True if x is greater than 10, else False
7- How do you use a lambda function as a key for sorting a list of tuples by the second element?
Answer:
a) sorted(my_list, key=lambda x: x[1])
a) Yes
b) No
Answer:
a) Yes
9- How would you filter out all negative numbers from a list using a lambda function?
Answer:
a) filter(lambda x: x > 0, my_list)
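The map, sorted, and filter answers above can be tried out with the sketch below. The sample lists are hypothetical. Note that the condition x > 0 drops zero as well as the negative numbers.

```python
# Hypothetical sample data, for illustration only.
my_list = [("pen", 3), ("ink", 1), ("pad", 2)]
numbers = [-2, 0, 5, -1, 3]

# map applies the lambda to every element.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

# sorted with a key sorts the tuples by their second element.
by_second = sorted(my_list, key=lambda x: x[1])

# filter keeps only the elements for which the lambda returns True.
positives = list(filter(lambda x: x > 0, numbers))

print(doubled)    # [2, 4, 6]
print(by_second)  # [('ink', 1), ('pad', 2), ('pen', 3)]
print(positives)  # [5, 3]
```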
Answer:
c) Using the keyword "lambda"
Answer:
Answer:
Answer:
a) Multiple values
b) None
c) A single value
d) A list of values
Answer:
c) A single value
Answer:
classes.
a. Classification
b. Analysis of data
c. Extraction of data
d. Dataset
a. Data, Information
b. Learning, Classification
c. Knowledge, Information
d. Data, Knowledge
a. training set
b. test set
c. raw data
a. training set
b. validation set
a. True
b. False
a. Association rules.
b. Summarization.
c. Clustering.
d. Prediction
7. The number of classes is known in…..
a. classification
b. clustering
9- In the image below, which would be the best value for k assuming that the algorithm you are using is k-Nearest
Neighbor.
a-3
b-10
c-20
d-50
10-Which of the following machine learning algorithm can be used for imputing missing values of both categorical and
continuous variables?
a-Linear Regression
b-K-NN
c-Logistic Regression
d-NN
1- k-NN performs much better if all of the data have the same scale.
2-k-NN works well with a small number of features (X's), but struggles when the number of inputs is very large
3-k-NN makes no assumptions about the functional form of the problem being solved
a-1 and 2
b-1 and 3
c-Only 1
12-When you find noise in data which of the following option would you consider in k-NN?
d-None of these
13- The basic distinction between a linear regression model and generalised (1)
1. **The errors in the linear regression are normally distributed while they can have a
2. The errors in the linear regression model are homoskedastic while they are
3. The generalised linear model is not used for continuous dependent variable while that is
4. The linear regression model is easy to estimate while the generalised linear regression
14- The process of training a predictive model with well defined target values is known as (1)
1. Unsupervised learning
2. **Supervised learning
3. Model estimation
4. Model testing
16- The trade off between over fitting and under fitting training data is called (3)
1. (X-max(X))/(Max(X)-Min(X))
2. (X-Mean(X))/(Max(X)-Min(X))
3. **(X-Mean(X))/Standard Deviation of X
4. Mean(X)/Max(X)
18- How do you deal with Euclidean distance for nominal data in the context of Knn classification? (3)
19- The following is the correct code for a function that normalizes the data (1)
20- The following is the correct code to execute a Knn model (where train is the training data, test is the testing data, labels are stored in train_labels and we have a 7 nearest neighbour classification) (2)
2-.…………. method works by grouping data objects into a hierarchy or “tree” of the
cluster.
a. Hierarchical
b. K-Means
c. K-Medoids
3-There are …… styles of hierarchical clustering algorithms to build a tree from the
input set S
a. 1
b. 2
c. 3
d. 4
a. Top-Down
b. Bottom-Up
c. Both a & b
a. Top-Down
b. Bottom-Up
c. Both a & b
shortest distance from any member of one cluster to any member of the other
cluster.
a. Single linkage
b. Complete linkage
c. Average linkage
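The three linkage criteria listed above differ only in how they summarize the pairwise distances between two clusters. A minimal sketch, assuming Euclidean distance and two small hypothetical clusters; the function names are chosen for this illustration, and math.dist requires Python 3.8 or later.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance between every point of one cluster and every point of the other."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Shortest distance from any member of one cluster to any member of the other."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Longest distance from any member of one cluster to any member of the other."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise distances between the two clusters."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical 2-D clusters, for illustration only.
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]

print(single_linkage(cluster_a, cluster_b))  # 3.0
```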
a. Hierarchical
b. K-Means
c. K-Medoids
8.k-medoids is a:
a. Partitioning methods
b. Hierarchical Methods
c. Model-based clustering
a. K-means
b. CLARANS
c. k-medoids
10.The …… methods can be integrated to cluster data with mixed numeric and
nominal values.
a. K-Modes
b. K-Means
c. K-Medoids
d. A &B
11.___________ is a self-learning technique in which system has to explore data.
a. Supervised Learning
b. Unsupervised Learning
c. semi-supervised Learning
d. Reinforcement Learning
12. Which of the following methods will cluster the data in panel (a) of the figure below into the two clusters (red circle
and blue horizontal line) shown in panel (b)? Every dot in the circle and the line is a data point. In all the options that
involve hierarchical clustering, the algorithm is run until we obtain two clusters.
A: Complete linkage
B: Centroid linkage
C: Average linkage