
DADS-402: Unstructured Data Analysis MCQ

This document covers various aspects of unstructured data analysis, including definitions, applications, and methods for handling unstructured data. It discusses feature extraction, text classification, sentiment analysis, topic modelling, and the use of NoSQL databases such as MongoDB, and it addresses audio, image, and video data processing and classification, along with the preprocessing techniques that support machine learning on such data.

Unit 1: Introduction to Unstructured Data


SELF-ASSESSMENT QUESTIONS – 1
1. Data that does not conform to a data model or data schema is known as __unstructured data__.
2. Twitter is a major source of __unstructured__ data.
3. Unstructured data is not bounded or constrained by any kind of __fixed schema__.
4. Unstructured data is not suitable for storage in mainstream __relational__ databases.
5. Semi-structured data is a combination of __structured__ and __unstructured__ data.
6. __Metadata__ is data that furnishes information about other data.
7. __Data visualization__ is the graphical representation of data for uncomplicated understanding.
8. __Data visualization__ is used to highlight entities such as people, companies, and cities.
9. A __data lake__ is used to store data in its actual (native) format.
10. __MongoDB__ is predominantly well suited for managing, housing, and using unstructured data.
Terminal Questions
1. List a few differences between structured and unstructured data.
2. What are the various applications of unstructured data?
3. What are the various methods to store unstructured data?
4. List some ways to analyze unstructured data.

Unit 2: Feature Extraction in Unstructured Data


SELF-ASSESSMENT QUESTIONS – 1
1. Headings and sub-headings given to columns are known as __captions__.
2. The systematic presentation of raw data in rows and columns is called __tabulation__.
3. The main part of a table is known as the __body__.
4. A __database management system (DBMS)__ is the best fit for storing and managing recurring transactions, such as sales transactions and ATM transactions.
5. __Natural Language Processing (NLP)__, together with taxonomies, is one of the best tools for identifying sentiment.
6. __Pictorial__ representations of objects are adopted universally, as they are not bound by any formal language, region, or special skills.
7. The images of individual surfaces are called __views__.
Terminal Questions:
1. Explain the evolution of textual data.
2. Explain what a 'BLOB' is.
3. Explain the applications of NLP and taxonomies.
4. What is the difference between text data and big data?
5. What is pictorial data?

Unit 3: Word Cloud Creation


SELF-ASSESSMENT QUESTIONS – 1
1. To create a word cloud in Python, you will need to install a package called __wordcloud__.
2. The first step in creating a word cloud is to import the necessary libraries, including __matplotlib, numpy, and PIL__.
3. To generate a word cloud, you will need to provide __text__ as input, which can be either a string or a file.
4. Once you have your text, you can create a word cloud object using the __WordCloud__ class from the wordcloud package.
5. The __generate()__ method of the word cloud object can be used to generate a word cloud based on the input text (see the sketch after this list).
6. You can customize the appearance of the word cloud using various parameters, such as the __background color__, font size, and maximum number of words.
7. To display the word cloud, you can use the __imshow()__ method from the matplotlib library.
8. You can also save the word cloud as an image file using the __savefig()__ method from the matplotlib library.
9. To generate a word cloud from a file, you can use the __open()__ function to read the file and then pass the contents to the word cloud object.
10. Word clouds can be useful for __visualizing__ the most common words in a text, which can provide insights into the themes or topics discussed.
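
The steps in the questions above map directly onto a short Python script. The following is a minimal sketch using the wordcloud and matplotlib packages mentioned above; the input file name speech.txt is a hypothetical placeholder.

    # Minimal word cloud sketch: read a text file, generate, display, and save.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    with open("speech.txt", encoding="utf-8") as f:   # hypothetical input file
        text = f.read()

    # Customise appearance via parameters such as background_color and max_words.
    wc = WordCloud(background_color="white", max_words=100, width=800, height=400)
    wc.generate(text)                         # generate() builds the cloud from the text

    plt.imshow(wc, interpolation="bilinear")  # display with matplotlib's imshow
    plt.axis("off")
    plt.savefig("wordcloud.png")              # save the cloud as an image file
    plt.show()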
Terminal Questions
1. What is a word cloud?
2. What package do you need to install in Python to create a word cloud?
3. What are some libraries that you need to import to create a word cloud in Python?
4. What is the first step in creating a word cloud?
5. How can you customize the appearance of a word cloud?
6. What is the purpose of generating a word cloud?
7. Can you create a word cloud from a file in Python?

Unit 4: Text Classification


SELF-ASSESSMENT QUESTIONS – 1
1. In text classification, the process of assigning labels or categories to text is known as __classification__.
2. The most common machine learning algorithm used for text classification is the __Support Vector Machine (SVM)__.
3. A set of pre-defined categories into which text can be classified is known as a __taxonomy__.
4. In supervised text classification, the machine learning model is trained on a dataset that includes both the text and the __labels or categories__.
5. The process of preparing text data for machine learning algorithms is known as __text preprocessing__.
6. The process of extracting relevant information from text is known as __text mining__.
7. The process of identifying and extracting named entities from text is known as __named entity recognition (NER)__.
8. The process of removing common words that do not carry much meaning from text is known as __stop word removal__.
9. In unsupervised text classification, the machine learning model is trained on a dataset that includes only the __text__.
10. A popular approach to text classification that involves representing text as a vector of word frequencies is known as __bag-of-words (BoW)__ (a minimal example follows this list).
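
As a concrete illustration of the bag-of-words plus SVM approach described above, here is a minimal scikit-learn sketch; the library choice and the two-document spam/ham dataset are assumptions for demonstration only.

    # Bag-of-words features + a Support Vector Machine for supervised text classification.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["free offer, claim your prize now", "meeting rescheduled to Monday"]  # toy data
    labels = ["spam", "ham"]                      # pre-defined categories (a small taxonomy)

    model = make_pipeline(
        CountVectorizer(stop_words="english"),    # bag-of-words with stop word removal
        LinearSVC(),                              # SVM classifier
    )
    model.fit(texts, labels)                      # supervised training on text + labels
    print(model.predict(["claim your free prize"]))  # expected: ['spam'] on this toy data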
Terminal Questions
1. What is a decision tree classifier and how does it work in text classification?
2. What is a naive Bayes classifier and how does it work in text classification?
3. What is a random forest classifier and how does it work in text classification?
4. What are some common business applications of text classification?
5. How can text classification be used to improve customer experience?

Unit 5: Sentiment Analysis


SELF-ASSESSMENT QUESTIONS – 1
1. The machine learning approach involves training a model on a labeled dataset of texts and their corresponding sentiment __labels__.
2. __Transfer learning__ in sentiment analysis involves using a pre-trained model that has been trained on a large corpus of text data and fine-tuning it on a smaller dataset of labeled text data for a specific sentiment analysis __task__.
3. Emotion detection is the process of identifying the __emotions__ expressed in a given text or speech.
4. Emotion detection can be used in various applications, such as customer __feedback__ analysis and mental health diagnosis.
5. One common approach to emotion detection is using machine learning algorithms, which can classify text based on __linguistic__ features.
6. Aspect-based sentiment analysis is a technique that focuses on identifying and analyzing the sentiment of specific __aspects__ in a given text.
7. Aspect-based sentiment analysis involves breaking down a text into smaller components, such as __phrases__ or sentences, and analyzing the sentiment expressed about each aspect.
8. Sentiment analysis can be performed in Python using various libraries such as __TextBlob__ and NLTK.
9. One common approach to sentiment analysis in Python involves using the __VADER__ package, which provides a pre-trained sentiment analysis model that can be used to analyze the sentiment of text data (see the sketch after this list).
10. Another approach to sentiment analysis in Python is using machine learning algorithms, which involves training a model on a labeled dataset of text data and their corresponding sentiment __labels__.
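
A minimal sketch of sentiment analysis in Python with the pre-trained VADER model (via NLTK) and TextBlob, both mentioned above; the example sentence is made up.

    # Score one sentence with the pre-trained VADER model and with TextBlob.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    from textblob import TextBlob

    nltk.download("vader_lexicon")            # one-time download of the VADER lexicon

    text = "The battery life is great, but the screen is disappointing."

    sia = SentimentIntensityAnalyzer()
    print(sia.polarity_scores(text))          # dict with neg/neu/pos and a compound score

    print(TextBlob(text).sentiment.polarity)  # polarity in the range [-1, 1]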
Terminal Questions
1. What is the Machine Learning approach in sentiment analysis?
2. What is intent analysis?
3. What are some common applications of intent analysis?
4. What is emotion detection?
5. How do you perform sentiment analysis using Python?

Unit 6: Topic Modelling


Short Answer Questions:
1. In topic modelling, the goal is to identify and extract __latent topics or themes__ within a collection of textual data.
2. The most widely used algorithm for topic modelling is __Latent Dirichlet Allocation (LDA)__ (a short example follows this list).
3. Topic modelling can be applied in various domains, such as __social media analysis, content recommendation, and customer feedback analysis__.
4. Topic modelling is an __unsupervised__ method for uncovering latent topics within a collection of text data, whereas topic classification is a __supervised__ method for assigning predefined topics to individual documents.
5. In topic modelling, the number and nature of the topics are __unknown__, whereas in topic classification, the topics are __fixed__ and predefined.
6. Topic modelling algorithms such as LDA and NMF use a __probabilistic__ approach to identify latent topics, while topic classification algorithms such as SVM and Naïve Bayes use a __discriminative__ approach to assign labels to documents.
7. Latent Semantic Analysis (LSA) is a __statistical__ technique used to analyze relationships between a set of documents and the terms they contain.
8. LSA uses a mathematical method called __singular value decomposition (SVD)__ to create a low-dimensional representation of the documents and terms.
9. LSA can be used for tasks such as __document similarity__ and __information retrieval__.
10. LSA assumes that words with similar meanings will appear in similar __contexts__, and therefore can be used to identify __semantic relationships__ between terms.
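
A minimal LDA sketch follows; scikit-learn is used here as one convenient implementation (the unit names the algorithm, not a library), and the four-document corpus is invented for illustration.

    # Uncover two latent topics in a tiny corpus with Latent Dirichlet Allocation.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the match ended with a late goal by the striker",
        "the election results shifted the parliament majority",
        "the midfielder signed a new contract with the club",
        "voters turned out in record numbers for the election",
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(docs)            # document-term (bag-of-words) matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(X)                                    # unsupervised: topics are not predefined

    terms = vectorizer.get_feature_names_out()
    for i, topic in enumerate(lda.components_):
        top_terms = [terms[j] for j in topic.argsort()[-5:]]   # top 5 terms per topic
        print(f"Topic {i}: {top_terms}")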
Terminal Questions
1. What is topic modelling?
2. How does topic modelling work?
3. What are some limitations of topic modelling?
4. What are some applications of topic modelling?
5. What is Latent Dirichlet Allocation (LDA)?
6. How does LDA work?
7. What are some applications of LDA?
8. What is Latent Semantic Analysis (LSA)?
9. How does LSA work?
10. What are some applications of LSA?

Unit 7: Introduction to NoSQL Databases


Self-Assessment Questions
1. NoSQL stands for "Not only SQL", which means it is not limited to __traditional SQL databases and their structure__.
2. NoSQL databases are designed to store and manage __unstructured or semi-structured__ data.
3. NoSQL databases are highly scalable and flexible, making them suitable for applications that require fast, real-time data processing, such as __social media platforms, e-commerce websites, and mobile applications__.
4. NoSQL databases use different data models, such as document-oriented, key-value, graph, and column-family, which are optimized for specific __use cases__.
5. Unlike relational databases, NoSQL databases do not use a fixed __schema__, making them highly adaptable to changing data requirements.
6. NoSQL databases support distributed data storage and processing, which allows for high __availability__, fault tolerance, and scalability.
7. NoSQL databases are used by many large organizations, including __Google, Amazon, Facebook, and Twitter__, to manage and store their data.
8. Some popular NoSQL databases include MongoDB, Cassandra, Redis, Couchbase, and __Neo4j__.
9. NoSQL databases are __flexible__ to use, as they do not require a fixed schema or predefined relationships between tables.
10. NoSQL databases are capable of __distributing__ data across multiple servers, ensuring high availability and fault tolerance.
Terminal Questions
1. What is NoSQL?
2. How are NoSQL databases different from relational databases?
3. What are the different data models used by NoSQL databases?
4. What are the advantages of using NoSQL databases?
5. What are some popular NoSQL databases?

Unit 8: Introduction to MongoDB


Self-Assessment Questions
1. MongoDB is a __document-oriented__ database management system.
2. MongoDB stores data in __collections__, which are similar to tables in relational databases.
3. MongoDB uses the __BSON (Binary JSON)__ format to store data, which allows for fast and efficient querying and indexing.
4. MongoDB uses a __document__-based data model, where each document represents a record or an entity.
5. MongoDB provides high availability and fault tolerance through its support for __replica__ sets and sharding.
6. MongoDB has a rich __query__ language and supports advanced queries, such as aggregation pipelines, full-text search, and geospatial queries.
7. MongoDB provides a flexible __indexing__ system that allows for indexing any field in a document, including nested fields and arrays.
8. MongoDB is easy to set up and use, with extensive documentation and a large __community__ of users and contributors.
9. MongoDB provides a wide range of tools and __integrations__ for developers, including drivers for many programming languages, such as Python, Java, Node.js, and PHP (see the sketch after this list).
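
The following pymongo sketch illustrates the points above; it assumes a MongoDB server running on localhost:27017, and the database and collection names ("shop", "products") are hypothetical.

    # Insert, index, and query schemaless documents with the Python driver.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/")
    db = client["shop"]                       # databases and collections are created lazily
    products = db["products"]                 # a collection, analogous to a table

    # Documents are stored as BSON; fields can differ from one document to the next.
    products.insert_one({"name": "keyboard", "price": 49.9, "tags": ["usb", "mechanical"]})
    products.create_index("name")             # any field can be indexed

    for doc in products.find({"price": {"$lt": 100}}):   # rich query language
        print(doc)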
Terminal Questions:
1. What is the main feature of MongoDB that sets it apart from relational
databases?
2. How does MongoDB ensure high availability and fault tolerance?
3. What is BSON, and why is it important in MongoDB?
4. What are some use cases for MongoDB?
5. How does MongoDB handle indexing and querying of data?

Unit 9: Working with Audio Data


Self-Assessment Questions
1. Audio data processing refers to the manipulation of audio signals to __enhance or modify__ their characteristics.
2. One common application of audio data processing is __noise reduction__, which involves the removal or reduction of unwanted sounds from an audio signal.
3. Another application of audio data processing is __equalization__, which involves adjusting the frequency response of an audio signal to improve its overall sound quality.
4. Audio data processing can also be used for __compression__, which involves reducing the dynamic range of an audio signal to make it more consistent in volume.
5. Pitch correction is another application of audio data processing, which involves adjusting the __pitch__ of an audio signal to correct any inaccuracies or errors.
6. The Fourier transform can be used to convert a time-domain signal into a __frequency-domain__ representation, allowing us to identify the frequencies that make up the signal.
7. The inverse Fourier transform allows us to __reconstruct__ a time-domain signal from its frequency-domain representation.
8. The Fourier transform can be computed using several algorithms, including the __Fast Fourier Transform (FFT)__ algorithm, which is an efficient implementation of the Fourier transform (see the sketch after this list).
9. The Fast Fourier Transform (FFT) is an algorithm that allows us to __efficiently__ compute the Fourier transform of a discrete signal.
10. The accuracy of the FFT algorithm is determined by the __sampling rate__ of the signal and the length of the signal segment used for the FFT computation.
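
A minimal NumPy sketch of the Fourier transform ideas above: a synthetic 440 Hz tone is transformed to the frequency domain, its dominant frequency is read off, and the inverse transform reconstructs the original signal.

    # FFT of a synthetic tone and reconstruction via the inverse FFT.
    import numpy as np

    sr = 8000                                     # sampling rate in Hz
    t = np.arange(0, 1.0, 1.0 / sr)               # one second of time samples
    signal = np.sin(2 * np.pi * 440 * t)          # time-domain signal: a 440 Hz sine wave

    spectrum = np.fft.rfft(signal)                # time domain -> frequency domain
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    print(freqs[np.argmax(np.abs(spectrum))])     # prints 440.0, the dominant frequency

    reconstructed = np.fft.irfft(spectrum)        # inverse FFT recovers the time-domain signal
    print(np.allclose(signal, reconstructed))     # True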
Terminal Questions
1. What are the common file formats for audio data?
2. How can you load an audio file in Python?
3. How can you visualize an audio signal?
4. What is the Fourier transform?
5. What is the Fast Fourier Transform (FFT)?

Unit 10: Audio Data Classification


Self-Assessment Questions
1. The process of converting an analog audio signal into a digital representation is called __analog-to-digital conversion (ADC)__.
2. The __sample value__ is the unit of measurement for the amplitude of a digital audio signal.
3. __Digital-to-analog conversion (DAC)__ is the process of converting a digital audio signal back into an analog signal.
4. The __sampling rate__ is the term for the number of samples captured per second in a digital audio recording.
5. Acoustic data can be represented as __spectrograms__, which are visual representations of the frequency content of an audio signal over time.
6. The main steps in acoustic data classification include __data preprocessing, feature extraction, and classification__.
7. Feature extraction involves transforming the raw audio signal into a set of __numerical__ features that can be used for classification (see the sketch after this list).
8. Good audio data quality is important for accurate __analysis and classification__ of sounds.
9. The quality of audio data can affect the performance of __machine learning__ models used for sound classification.
10. Poor audio quality can lead to __misclassification__ of sounds and a decrease in model accuracy.
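
As a sketch of the preprocessing and feature-extraction steps above, the snippet below turns a recording into a fixed-length numerical feature vector; librosa is an assumed library choice and recording.wav a hypothetical file name.

    # Extract MFCC features from an audio clip for use in a sound classifier.
    import numpy as np
    import librosa

    y, sr = librosa.load("recording.wav", sr=22050)       # load and resample the audio
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)     # numerical features per frame
    features = np.mean(mfcc, axis=1)                       # one fixed-length vector per clip
    print(features.shape)                                  # (13,), ready for classification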

Unit 11: Working with Images


Self-Assessment Questions
1. Image data preprocessing is the process of __transforming__ raw image data into a format that is more suitable for analysis.
2. Common techniques used in image data preprocessing include __resizing, normalization, cropping, and augmentation__ (illustrated in the sketch after this list).
3. Resizing is the process of __changing__ the size of an image to a specific width and height.
4. Normalization is the process of __scaling__ pixel values to a common range, such as [0, 1] or [-1, 1].
5. Augmentation is the process of __generating__ new training samples by applying random transformations to the original images, such as rotations, flips, and color changes.
6. Image data preprocessing can help improve machine learning performance by __reducing__ the impact of noise and variability in the input data.
7. Preprocessing can also help __increase__ the efficiency of training by reducing the computational cost and memory requirements of the machine learning algorithm.
8. The choice of preprocessing techniques depends on the specific __characteristics__ of the image data and the requirements of the machine learning task.
9. Preprocessing should be performed carefully to avoid __losing__ important information or introducing bias into the data.
10. Histogram equalization may also __require__ significant computational resources, especially for large images or high-resolution data, which can limit its practicality for some applications.
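
A minimal sketch of resizing, normalization, and histogram equalization with Pillow and NumPy (assumed library choices); photo.jpg is a hypothetical file name.

    # Resize, normalize, and histogram-equalize a single image.
    import numpy as np
    from PIL import Image, ImageOps

    img = Image.open("photo.jpg").convert("L")    # load as a grayscale image

    resized = img.resize((224, 224))              # resize to a fixed width and height

    pixels = np.asarray(resized, dtype=np.float32)
    normalized = pixels / 255.0                   # scale pixel values into [0, 1]

    equalized = ImageOps.equalize(resized)        # histogram equalization (Pillow built-in)
    print(normalized.shape, normalized.min(), normalized.max())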
Terminal Questions
1. What is histogram equalization?
2. How does histogram equalization work?
3. What are the benefits of using histogram equalization?
4. What types of images are well-suited for histogram equalization?
5. What are some limitations of using histogram equalization?

Unit 12: Image Data Classification


Self-Assessment Questions
1. In __supervised classification__, the machine learning algorithm is trained on a labeled dataset, where each data point is associated with a class label.
2. The goal of __supervised classification__ is to learn a mapping function that can accurately classify new, unlabeled data points based on their features.
3. In contrast, __unsupervised classification__ involves clustering data points based on their similarity, without the use of predefined class labels.
4. The goal of __unsupervised classification__ is to identify underlying patterns or structures in the data that can inform further analysis or decision-making.
5. __Supervised classification__ typically requires a larger amount of labeled data for training, whereas unsupervised classification can be performed on smaller datasets or even individual data points.
6. The __convolutional layers__ in a CNN apply a series of filters to the input image, which detect local patterns such as edges, corners, and textures.
7. The __pooling layers__ in a CNN downsample the feature maps produced by the convolutional layers, reducing the spatial dimensions of the data while preserving important features.
8. The __fully connected layers__ in a CNN combine the features learned by the convolutional and pooling layers into a final classification decision (see the sketch after this list).
9. The process of training a CNN for image classification typically involves feeding the network a large dataset of labeled images and using an optimization algorithm such as __backpropagation__ to adjust the weights of the network to minimize a loss function.
10. To improve the performance of a CNN for image classification, techniques such as __data augmentation, dropout, and transfer learning__ can be used.
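
The layer types described above fit together as sketched below; TensorFlow/Keras is an assumed framework, and the 32x32 RGB input with 10 classes is an arbitrary example configuration.

    # Convolution -> pooling -> fully connected layers for image classification.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(32, 32, 3)),  # convolutional layer: local filters
        layers.MaxPooling2D((2, 2)),                                            # pooling layer: downsample feature maps
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),                                    # fully connected layer
        layers.Dense(10, activation="softmax"),                                 # final classification decision
    ])

    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=5)   # training adjusts weights via backpropagation
    model.summary()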
Terminal Questions
1. What is CNN image classification?
2. What is the advantage of using a CNN for image classification?
3. What are the key components of a CNN for image classification?
4. What is the process of training a CNN for image classification?
5. How can the performance of a CNN for image classification be improved?

Unit 13: Introduction to Video Classification


Self-Assessment Questions:
1. Video classification is the task of categorizing __videos__ into different classes.
2. The main objective of video classification is to assign a __label or category__ to a video.
3. In video classification, a video is typically divided into smaller segments called __frames or shots__ (see the sketch after this list).
4. The most common approach to video classification is to use __machine or deep learning__ techniques.
5. In video classification, the features extracted from each frame or shot are usually fed into a __neural network or classifier__ for classification.
6. Some popular applications of video classification include __video surveillance, content recommendation, sports analysis, and sentiment analysis__.
7. One of the biggest challenges in video classification is dealing with __temporal variations, or changes over time,__ in the videos.
8. Another challenge in video classification is the need for __annotated or labeled data__ for training the model.
9. Video classification can be improved by using __transfer learning__ techniques, which involve using pre-trained models on large datasets.
10. Video classification is an important area of research that has numerous applications in __industry, academia, and various fields of research__.
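
A minimal sketch of the first step in this pipeline, splitting a video into frames that a classifier can consume; OpenCV is an assumed library choice and clip.mp4 a hypothetical file name.

    # Read a video frame by frame and resize each frame for a downstream classifier.
    import cv2

    cap = cv2.VideoCapture("clip.mp4")
    frames = []
    while True:
        ok, frame = cap.read()                         # read one frame at a time
        if not ok:
            break                                      # no more frames
        frames.append(cv2.resize(frame, (224, 224)))   # resize so a CNN can consume each frame
    cap.release()

    print(len(frames), "frames extracted")             # features from these frames feed the classifier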
Terminal Questions
1. What is video classification?
2. What are some common techniques used for video classification?
3. What are some popular applications of video classification?
4. What are some challenges in video classification?
5. How can transfer learning improve video classification?

Unit 14: Fake News Prediction


Self-Assessment Questions
1. Machine learning algorithms can be trained to classify news articles as __real or fake__.
2. Natural Language Processing (NLP) techniques can be used to analyze the text of news articles and identify patterns that indicate __deception or manipulation__.
3. In fake news classification, features such as headline sentiment, word frequency, and source reliability can be used to __distinguish real news from fake news__.
4. Some challenges in fake news classification include the constantly evolving nature of fake news and the __difficulty of obtaining labeled training data__.
5. Random Forest is an example of a __supervised__ learning algorithm.
6. In a random forest, multiple decision trees are built, and each tree is built on a __random__ sample of the data (see the sketch after this list).
7. The goal of a random forest is to reduce __overfitting__ and improve the accuracy of the model.
8. In a random forest, the final prediction is made by a __majority vote__ of the predictions made by each decision tree.
9. The process of selecting a random subset of features for each decision tree in a random forest is called __feature bagging__.
10. A random forest is often used for __predictive__ tasks, such as classification and regression.
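
A minimal sketch of the pipeline above: word-frequency features feeding a random forest; scikit-learn is an assumed library choice, and the two articles and their labels are invented placeholders.

    # TF-IDF word-frequency features + a random forest for fake news classification.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline

    articles = [
        "Scientists publish peer-reviewed climate study",
        "Miracle cure hidden by doctors, share before deleted",
    ]
    labels = ["real", "fake"]                                      # toy labeled training data

    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),                     # word-frequency features
        RandomForestClassifier(n_estimators=100, random_state=0),  # ensemble of decision trees
    )
    model.fit(articles, labels)
    print(model.predict(["doctors hide this miracle cure"]))       # likely ['fake'] on this toy data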
Terminal Questions:
1. What is exploratory data analysis?
2. What are some common techniques used in EDA?
3. What is feature extraction?
4. Why is feature extraction important?
5. What are some common techniques used for feature extraction?
6. What is the difference between feature extraction and feature selection?

Unit 15: Case Study on Audio Data Classification


Self-Assessment Questions
1. Bird sound classification is the process of categorizing bird vocalizations based on their __acoustic__ features.
2. Bird sound classification is challenging because it requires the detection and recognition of subtle differences in the __acoustic__ features of the vocalizations produced by different bird species.
3. The key features that are used to classify bird sounds include __pitch__, rhythm, timbre, and spectral characteristics.
4. Machine learning algorithms such as deep neural networks, decision trees, and support vector machines are commonly used for bird sound classification, and are trained using large datasets of __labeled__ bird sound recordings.
5. Bird sound classification has important applications in ecological monitoring, wildlife conservation, and __bioacoustic__ research, and is an area of active research and development.
6. Siamese networks are commonly used in tasks that involve __similarity or distance__ comparisons between two inputs, such as image or text matching, face recognition, and signature verification.
7. Siamese networks are composed of __two__ identical sub-networks that share the same weights and architecture.
8. The contrastive loss function in Siamese networks is used to penalize the model when it incorrectly predicts the __similarity or dissimilarity__ between two inputs.
9. Dilated convolutions can increase the __receptive field__ of a convolutional neural network (CNN) without increasing the number of parameters, which can improve the model's performance in tasks that require a larger context, such as image segmentation and object recognition.
10. Dilated convolutions have gaps between the kernel elements, controlled by the __dilation__ rate, which allow the network to sample the input with a larger stride, effectively increasing the receptive field of the kernel without increasing its size (see the sketch after this list).
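
To make the dilated-convolution point concrete, the sketch below compares a regular and a dilated 3x3 convolution in Keras (an assumed framework): the dilation_rate parameter widens the receptive field while the parameter count stays the same.

    # A dilated convolution enlarges the receptive field without adding parameters.
    import tensorflow as tf
    from tensorflow.keras import layers

    x = tf.random.normal((1, 64, 64, 3))                              # a dummy image batch

    regular = layers.Conv2D(16, (3, 3), padding="same")                   # standard 3x3 kernel
    dilated = layers.Conv2D(16, (3, 3), padding="same", dilation_rate=2)  # same kernel, gaps between elements

    print(regular(x).shape, dilated(x).shape)                 # same output shape
    print(regular.count_params() == dilated.count_params())   # True: identical parameter count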
Terminal Questions:
1. Siamese Networks are commonly used in what type of tasks?
2. Siamese Networks are composed of how many identical sub-networks?
3. In Siamese Networks, what is the purpose of the contrastive loss function?
4. What is the advantage of using dilated convolutions in image processing
tasks?
5. How are dilated convolutions different from regular convolutions?
