CCS369 – Text and Speech Analysis Laboratory Record

The document serves as a record notebook for the Text and Speech Analysis Laboratory course at St. Peter's College of Engineering & Technology for the academic year 2024-2025. It outlines the institution's vision and mission, the department's objectives, and the course outcomes, objectives, and experiments related to natural language processing and speech recognition. The document includes detailed algorithms and sample programs for various experiments using Python and NLTK.


St. PETER'S COLLEGE OF ENGINEERING & TECHNOLOGY
(An Autonomous Institution)

DEPARTMENT OF INFORMATION TECHNOLOGY

CCS369 – TEXT AND SPEECH ANALYSIS LABORATORY

RECORD NOTEBOOK

NAME :

REG.NO :

BRANCH :

YEAR/SEM :

2024-2025
St. PETER'S COLLEGE OF ENGINEERING & TECHNOLOGY
(An Autonomous Institution)

DEPARTMENT OF INFORMATION TECHNOLOGY

Bonafide Certificate

NAME………………………………………………………………………..………………….

YEAR………………………………………..SEMESTER…………..……………………......

BRANCH……………………………………………………...………..………….....................

REGISTER NO. ……………………………...…………………………………………….....

Certified that this is the bonafide record of work done by the above student of the
……………………………… during the year 2024 – 2025.

Faculty-in-Charge Head of the Department

Submitted for the practical examination held on ……………… at St. PETER'S COLLEGE
OF ENGINEERING AND TECHNOLOGY

Internal Examiner External Examiner


St. PETER'S
COLLEGE OF ENGINEERING & TECHNOLOGY
(An Autonomous Institution)
Affiliated to Anna University | Approved by AICTE
Avadi, Chennai, Tamilnadu – 600 054

INSTITUTION VISION
To emerge as an Institution of Excellence by providing High Quality Education in Engineering,
Technology and Management to contribute to the economic as well as societal growth of our
Nation.

INSTITUTION MISSION
 To impart strong fundamental and Value-Based Academic knowledge in various
Engineering, Technology and Management disciplines to nurture creativity.
 To promote innovative Research and Development activities by collaborating with Industries,
R&D organizations and other statutory bodies.
 To provide a conducive learning environment and training so as to empower the students with
dynamic skill development for employability.
 To foster Entrepreneurial spirit amongst the students for making a positive impact on
remarkable community development.
DEPARTMENT OF INFORMATION TECHNOLOGY
VISION

To emerge as a center of academic excellence meeting the industrial needs of the competitive
world with IT technocrats and researchers, for the social and economic growth of the country in
the area of Information Technology.

MISSION

 To provide quality education to the students to attain new heights in IT industry and research.
 To create employable students at national/international level by training them with adequate
skills.
 To produce good citizens with high personal and professional ethics to serve both the IT
industry and society.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs):


Graduates will be able to

 Demonstrate technical competence with analytical and critical thinking to understand and meet the
diversified requirements of industry, academia and research.

 Exhibit technical leadership, team skills and entrepreneurship skills to provide business solutions to
real-world problems.

 Work in multi-disciplinary industries with social and environmental responsibility, work ethics and
adaptability to address complex engineering and social problems.

 Pursue lifelong learning, use cutting-edge technologies and engage in applied research to design
optimal solutions.

PROGRAM OUTCOMES (POs):


1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals,
and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences,
and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate consideration for
the public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information
to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities with
an understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal,
health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional
engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable
development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of
the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse
teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as, being able to comprehend and write effective reports
and design documentation, make effective presentations, and give and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one’s own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OBJECTIVES (PSOs)


To ensure graduates

 Have proficiency in programming skills to design, develop and apply appropriate techniques, to
solve complex engineering problems.
 Have knowledge to build, automate and manage business solutions using cutting edge technologies.

 Have excitement towards research in applied computer technologies.


CCS369 – Text and Speech Analysis Laboratory

COURSE OUTCOMES:

CO1: Explain existing and emerging deep learning architectures for text and speech processing

CO2: Apply deep learning techniques for NLP tasks, language modelling and machine translation

CO3: Explain coreference and coherence for text processing

CO4: Build question-answering systems, chatbots and dialogue systems

CO5: Apply deep learning models for building speech recognition and text-to-speech systems

CO – PO & PSO’s MAPPING:

COs    PO-1  PO-2  PO-3  PO-4  PO-5  PO-6  PO-7  PO-8  PO-9  PO-10  PO-11  PO-12  PSO-1  PSO-2  PSO-3

CO-1    3     2     3     1     3     -     -     -     1      2      1      2      1      1      1

CO-2    3     1     2     1     3     -     -     -     2      2      1      3      3      2      1

CO-3    2     2     1     3     1     -     -     -     3      3      1      2      3      3      1

CO-4    2     1     1     1     2     -     -     -     2      1      2      2      3      1      1

CO-5    1     3     2     2     1     -     -     -     3      2      1      1      2      3      1

Avg    2.2   1.8   1.8   1.6    2     -     -     -    2.2     2     1.2     2     2.4     2      1

1 - low, 2 - medium, 3 - high, '-' - no correlation


CCS369 – Text and Speech Analysis Laboratory

COURSE OBJECTIVES:
● Understand natural language processing basics

● Apply classification algorithms to text documents

● Build question-answering and dialogue systems

● Develop a speech recognition system

● Develop a speech synthesizer

LIST OF EXPERIMENTS:
1. Create Regular expressions in Python for detecting word patterns and tokenizing text

2. Getting started with Python and NLTK - Searching Text, Counting Vocabulary, Frequency
Distribution, Collocations, Bigrams

3. Accessing Text Corpora using NLTK in Python

4. Write a function that finds the 50 most frequently occurring words of a text that are not stop
words.

5. Implement the Word2Vec model

6. Use a transformer for implementing classification

7. Design a chatbot with a simple dialog system

8. Convert text to speech and find accuracy

9. Design a speech recognition system and find the error rate


TABLE OF CONTENTS

S.NO.  DATE  EXPERIMENT TITLE                                                        PG.NO  SIGN

1.           Create Regular Expressions in Python for Detecting Word Patterns and
             Tokenizing Text
2.           Getting Started with Python and NLTK – Searching Text, Counting
             Vocabulary, Frequency Distribution, Collocations, Bigrams
3.           Accessing Text Corpora using NLTK in Python
4.           Write a Function that finds the 50 Most Frequently Occurring Words of a
             Text that are Not Stop Words
5.           Implement the Word2Vec Model
6.           Use a Transformer for implementing Classification
7.           Design a ChatBot with a Simple Dialog System
8.           Convert Text to Speech and find Accuracy
9.           Design a Speech Recognition System and find the Error Rate
EX.NO:1

CREATE REGULAR EXPRESSIONS IN PYTHON FOR DETECTING WORD PATTERNS AND TOKENIZING TEXT

Aim:
To create regular expressions in Python for detecting word patterns and tokenizing text.

Algorithm:

1. Import the re module (for regular expressions).

2. Define a sample text input.

3. Use re.findall() to extract words matching specific patterns:

   a. Words starting with a capital letter

   b. Words ending in "ing"

   c. Words containing digits, etc.

4. Use re.split() or re.sub() to tokenize or clean the text.

5. Display the matched patterns and the tokenized output.


Program:

import re

text = "Python is amazing! It's used in AI, ML, and web development. Running, swimming, coding - all are fun!"

# Words that end with 'ing'
ing_words = re.findall(r'\b\w+ing\b', text)

# Words starting with capital letters
capital_words = re.findall(r'\b[A-Z][a-z]*\b', text)

# Tokenizing text (splitting by words)
tokens = re.findall(r'\b\w+\b', text)

print("Words ending with 'ing':", ing_words)
print("Capitalized words:", capital_words)
print("Tokenized text:", tokens)
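Step 4 of the algorithm also mentions re.split() and re.sub(), which the program above does not use. A minimal sketch of tokenizing with them (the shorter sample sentence and the variable names are illustrative):

```python
import re

text = "Python is amazing! It's used in AI, ML, and web development."

# re.sub() removes everything except word characters and spaces,
# then re.split() breaks the cleaned string on runs of whitespace.
cleaned = re.sub(r"[^\w\s]", "", text)
tokens = re.split(r"\s+", cleaned.strip())

print(tokens)
```

Note that stripping the apostrophe turns "It's" into "Its", which is why re.findall(r'\b\w+\b', ...) is often the simpler tokenizer.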
Output:

Result:
Thus regular expressions in Python for detecting word patterns and tokenizing text have been successfully implemented and the output verified.
EX.NO:2

GETTING STARTED WITH PYTHON AND NLTK – SEARCHING TEXT, COUNTING VOCABULARY, FREQUENCY DISTRIBUTION, COLLOCATIONS, BIGRAMS

Aim:
To perform searching text, counting vocabulary, frequency distribution,
collocations, and bigrams with Python and NLTK.

Algorithm:

1. Install and import nltk.

2. Load a sample text corpus (e.g., from nltk.book or a custom string).

3. Search for words or patterns using text.concordance().

4. Count vocabulary using set() or FreqDist().

5. Generate frequency distribution using nltk.FreqDist.

6. Identify collocations (frequent word pairs) using .collocations().

7. Generate bigrams using nltk.bigrams().

8. Display the top results for analysis.

9. Optional: visualize using fdist.plot().


Program:

!pip install nltk

import nltk
nltk.download('book')
from nltk.book import text1  # Moby Dick
from nltk import FreqDist, bigrams

# Download required resources (only once)


# nltk.download('book')

# 1. Search for a word


print("Search for the word 'whale':")
text1.concordance("whale")

# 2. Count unique vocabulary


unique_words = set(text1)
print("\nTotal unique words:", len(unique_words))

# 3. Frequency distribution
fdist = FreqDist(text1)
print("\nTop 10 most frequent words:")
print(fdist.most_common(10))

# 4. Collocations (frequent pairs of words)


print("\nCommon Collocations:")
text1.collocations()
# 5. Bigrams
bi_grams = list(bigrams(text1))
print("\nSample Bigrams:")
print(bi_grams[:10])
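The same counting ideas can be sketched without NLTK using only the Python standard library; this illustrative example (the sample sentence is invented) mirrors FreqDist with collections.Counter and bigrams() with zip:

```python
from collections import Counter

words = "the whale saw the white whale near the ship".split()

# Frequency distribution: word -> count
fdist = Counter(words)
print(fdist.most_common(2))   # the two most frequent words

# Bigrams: consecutive word pairs, as NLTK's bigrams() would yield
bi_grams = list(zip(words, words[1:]))
print(bi_grams[:3])
```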
Output:
Result:
Thus searching text, counting vocabulary, frequency distribution, collocations,
and bigrams with Python and NLTK have been successfully implemented and the output
verified.
EX.NO:3

ACCESSING TEXT CORPORA USING NLTK IN PYTHON

Aim:
To access text corpora using NLTK in Python.

Algorithm:

1. Import the nltk library.

2. Download the necessary corpora using nltk.download().

3. Import a specific corpus (e.g., Gutenberg).

4. Load a text file from the corpus.

5. Tokenize the text into words or sentences.

6. Perform basic analysis like frequency distribution or concordance.

7. Print or visualize the results.


Program:

import nltk
nltk.download('punkt')
nltk.download('popular')    # Download all popular packages including punkt_tab
nltk.download('gutenberg')

from nltk.corpus import gutenberg
from nltk.tokenize import RegexpTokenizer
from nltk.probability import FreqDist

# Access a specific text
sample_text = gutenberg.raw('austen-emma.txt')

# Use RegexpTokenizer instead of word_tokenize (avoids punkt_tab issue)
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(sample_text.lower())  # lowercase for a better frequency count

# Display number of tokens
print("Total tokens:", len(tokens))

# Display first 20 tokens
print("First 20 tokens:", tokens[:20])

# Frequency distribution of words
fdist = FreqDist(tokens)

# Display 10 most common words
print("Most common words:")
print(fdist.most_common(10))
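For intuition, RegexpTokenizer(r'\w+') extracts the same tokens that re.findall(r'\w+', ...) would; a tiny standard-library sketch of that tokenization (the sample phrase is illustrative, echoing the opening of austen-emma.txt):

```python
import re

sample = "Emma Woodhouse, handsome, clever, and rich"

# Same pattern the RegexpTokenizer uses: runs of word characters
tokens = re.findall(r'\w+', sample.lower())

print(tokens)
```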
Output:

Result:
Thus text corpora using NLTK in Python have been successfully accessed and the
output verified.
EX.NO:4

WRITE A FUNCTION THAT FINDS THE 50 MOST FREQUENTLY OCCURRING WORDS OF A TEXT THAT ARE NOT STOP WORDS

Aim:
To write a function that finds the 50 most frequently occurring words of a text
that are not stop words.

Algorithm:

1. Import required modules from nltk.

2. Load and tokenize the text using RegexpTokenizer.

3. Convert all words to lowercase.

4. Remove all stop words from the token list.

5. Count the frequency of remaining words using FreqDist.

6. Extract and display the top 50 most frequent words.


Program:

import nltk
nltk.download('gutenberg')
nltk.download('stopwords')

from nltk.corpus import gutenberg, stopwords
from nltk.tokenize import RegexpTokenizer
from nltk.probability import FreqDist

# Load the sample text
sample_text = gutenberg.raw('austen-emma.txt')

# Tokenize using regex to avoid punkt errors
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(sample_text.lower())  # lowercase for uniformity

# Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word not in stop_words]

# Frequency distribution
fdist = FreqDist(filtered_tokens)

# Display 50 most common non-stop words
print("Top 50 most frequent non-stop words:")
for word, freq in fdist.most_common(50):
    print(f"{word}: {freq}")
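The filter-then-count pattern can also be sketched with the standard library alone; here the tiny text and hand-made stop list are invented purely for illustration (NLTK's English stop list has about 180 entries):

```python
from collections import Counter

text = "the cat sat on the mat and the cat slept"
stop_words = {"the", "on", "and", "a", "an"}   # tiny illustrative stop list

# Keep only tokens that are not stop words, then count them
tokens = text.split()
filtered = [w for w in tokens if w not in stop_words]

fdist = Counter(filtered)
print(fdist.most_common(3))
```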
Output:

Result:
Thus the function that finds the 50 most frequently occurring words of a text
that are not stop words has been successfully implemented in Python and the output
verified.
EX.NO:5

IMPLEMENT THE WORD2VEC MODEL

Aim:
To implement the Word2Vec model using Python.

Algorithm:

1. Import the necessary libraries: nltk for accessing corpora and spacy for vector
representations.

2. Download the required NLTK corpus (the Gutenberg dataset).

3. Load a spaCy model with word vectors (en_core_web_md, which includes pre-trained
word vectors).

4. Load and preprocess the text: read raw text from a file in the Gutenberg corpus
and limit it to a manageable size (e.g., 5000 characters).

5. Perform sentence segmentation using spaCy's built-in sentence segmenter.

6. Choose target words for similarity: select two words (e.g., "emma" and "harriet")
and extract their vector representations.

7. Compute the similarity between the two words using spaCy's .similarity() method
(cosine similarity).

8. Find the words most similar to a target word: iterate through the vocabulary,
keep only lowercase alphabetic words that have vectors, compute their similarity
with the target word, and store those above a threshold (e.g., 0.6).

9. Sort the similar words by similarity score in descending order and display the
top N (e.g., 10) results.


Program:

!pip install spacy nltk
!python -m spacy download en_core_web_md   # fetch the medium model with word vectors

import spacy
import nltk
nltk.download('gutenberg')

from nltk.corpus import gutenberg

# Load spaCy model
import en_core_web_md
nlp = en_core_web_md.load()

# Load and limit text
sample_text = gutenberg.raw('austen-emma.txt')[:5000]
doc = nlp(sample_text)

# Extract sentences
sentences = list(doc.sents)

# Compute word similarity
token1 = nlp("emma")[0]
token2 = nlp("harriet")[0]
print(f"Similarity between 'emma' and 'harriet': {token1.similarity(token2):.3f}")

# Most similar words to "emma"
print("\nWords most similar to 'emma':")
similarities = []
for word in nlp.vocab:
    if word.has_vector and word.is_lower and word.is_alpha:
        sim = token1.similarity(word)
        if sim > 0.6:
            similarities.append((word.text, sim))

similarities = sorted(similarities, key=lambda x: -x[1])[:10]
for word, score in similarities:
    print(f"{word}: {score:.3f}")
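spaCy's .similarity() is the cosine similarity between word vectors. The following self-contained sketch shows that computation with made-up three-dimensional vectors (real en_core_web_md vectors have 300 dimensions):

```python
import math

def cosine_similarity(u, v):
    # dot(u, v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Tiny made-up "word vectors" for illustration only
vec_emma = [1.0, 2.0, 0.0]
vec_harriet = [2.0, 4.0, 0.0]   # same direction -> similarity 1.0
vec_ship = [0.0, 0.0, 3.0]      # orthogonal -> similarity 0.0

print(round(cosine_similarity(vec_emma, vec_harriet), 3))
print(cosine_similarity(vec_emma, vec_ship))
```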
Output:

Result:
Thus the Word2Vec model has been successfully implemented using Python and the
output verified.
EX.NO:6

USE A TRANSFORMER FOR IMPLEMENTING CLASSIFICATION

Aim:
To implement classification using a transformer in Python.

Algorithm:

1. Import the pipeline function from the transformers library.

2. Import other necessary libraries such as torch.

3. Load the pre-trained transformer model for the task you want to perform (e.g.,

sentiment analysis) using the pipeline() function.

4. Specify the task type as sentiment-analysis for sentiment classification.

5. Define or load the text that needs to be classified. This can be a sentence or

document you want to analyze.

6. Pass the input text to the model via the pipeline. The model will process the input

and predict the class (e.g., positive or negative sentiment).

7. Capture and display the classification result.

8. The result will include the predicted label (e.g., POSITIVE or NEGATIVE) and

the model’s confidence score.


Program:

!pip install transformers torch

from transformers import pipeline

# Load the pre-trained transformer model for sentiment analysis


classifier = pipeline('sentiment-analysis')

# Sample text for classification


sample_text = "I love this movie! It's amazing and I would watch it again."

# Perform sentiment analysis


result = classifier(sample_text)

# Output the result


print(f"Text: {sample_text}")
print(f"Sentiment: {result[0]['label']} with a confidence score of {result[0]['score']:.4f}")
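Under the hood, the pipeline produces raw per-class scores (logits) and converts them to the reported confidence with a softmax. A standalone sketch of that final step, using made-up logit values for one sentence:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize exponentials
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [-2.0, 3.0]            # made-up model outputs for one sentence

probs = softmax(logits)
best = max(range(len(labels)), key=lambda i: probs[i])
print(f"Sentiment: {labels[best]} with a confidence score of {probs[best]:.4f}")
```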
Output:

Result:
Thus classification using a transformer in Python has been successfully
implemented and the output verified.
EX.NO:7

TO DESIGN A CHATBOT WITH A SIMPLE DIALOG SYSTEM

Aim:

To design a ChatBot with a simple dialog system.

Algorithm:

1. Initialize ChatBot: Create a list of predefined patterns (regular expressions) and

corresponding responses. These will guide the chatbot's replies based on user

input.

2. Create a Simple Chat Loop.

3. Accept user input.

4. Search for matching patterns in the predefined list.

5. Respond based on the matched pattern.

6. Stop the Conversation: If the user inputs “bye”, exit the chat loop and end the

conversation.

7. Handle Unmatched Inputs: If the user input doesn’t match any predefined pattern,

output a default message asking the user to rephrase.


Program:

import nltk
from nltk.chat.util import Chat, reflections

# Define chatbot pairs: pattern and response
pairs = [
    (r"hi|hello", ["Hello!", "Hi there!"]),
    (r"how are you?", ["I'm good, thank you!", "I'm doing great!"]),
    (r"what is your name?", ["I am a chatbot.", "You can call me Chatbot."]),
    (r"bye", ["Goodbye!", "See you later!"]),
    (r"(.*)", ["Sorry, I don't understand that. Can you ask something else?"]),
]

# Create chatbot
chatbot = Chat(pairs, reflections)

# Start the chat
def start_chat():
    print("Hello! Type 'bye' to end the conversation.")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'bye':
            print("Chatbot: Goodbye!")
            break
        response = chatbot.respond(user_input)
        print("Chatbot:", response)

start_chat()
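The pattern-matching core of Chat.respond() can be sketched with the standard library alone; this illustrative version (a trimmed copy of the pairs above, with random.choice picking one canned reply) returns the first matching pattern's response:

```python
import re
import random

pairs = [
    (r"hi|hello", ["Hello!", "Hi there!"]),
    (r"bye", ["Goodbye!"]),
    (r"(.*)", ["Sorry, I don't understand that."]),
]

def respond(user_input):
    # Return a reply for the first pattern that matches the input
    for pattern, responses in pairs:
        if re.match(pattern, user_input, re.IGNORECASE):
            return random.choice(responses)

print(respond("hello there"))
print(respond("what?"))
```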
Output:

Result:
Thus a chatbot with a simple dialog system has been successfully implemented in
Python and the output verified.
EX.NO:8

CONVERT TEXT TO SPEECH AND FIND ACCURACY

Aim:
To convert text to speech and find the accuracy using Python.

Algorithm:

1. Define the text string to convert into speech (e.g., "Hello, this is a simple
text-to-speech conversion example in Google Colab.").

2. Initialize the TTS engine: import the gTTS module from the gtts library.

3. Specify the language (e.g., 'en' for English).

4. Set the text to be converted to speech.

5. Convert the text to speech using the gTTS() function.

6. Save the speech output to an audio file (e.g., output.mp3) using the .save()
method.

7. Play the saved audio file in the Colab environment using
IPython.display.Audio().

Program:

!pip install gTTS


# Import necessary modules
from gtts import gTTS
import IPython.display as ipd

# Text to be converted into speech


text = "hello, this is a simple text-to-speech conversion example in Google Colab."

# Language in which you want to convert


language = 'en'

# Passing the text and language to the engine


tts = gTTS(text=text, lang=language, slow=False)

# Save the converted audio to a file


tts.save("output.mp3")

# Play the converted file


ipd.Audio("output.mp3")
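The program above only generates the audio; one way to "find accuracy" (an assumption, since the manual does not show this step) is to run the saved clip through a speech recognizer and compare the transcript with the original text word by word. The comparison itself can be sketched with the standard library:

```python
def word_accuracy(reference, recognized):
    # Fraction of reference words reproduced at the same position
    ref = reference.lower().split()
    rec = recognized.lower().split()
    matches = sum(1 for a, b in zip(ref, rec) if a == b)
    return matches / len(ref) if ref else 0.0

reference = "hello this is a simple text to speech conversion example"
recognized = "hello this is a simple text to speech conversion example"

print(f"Accuracy: {word_accuracy(reference, recognized) * 100:.2f}%")
```

A perfect transcript gives 100.00%, matching the output recorded below.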
Output:

Recognized Text: Hello, this is a simple text-to-speech conversion example in Google


Colab.
Accuracy: 100.00%

Result:
Thus the conversion of text to speech and its accuracy check using Python have
been successfully implemented and the output verified.
EX.NO:9

DESIGN A SPEECH RECOGNITION SYSTEM AND FIND THE ERROR RATE

Aim:

To design a speech recognition system and find the error rate using Python.

Algorithm:

1. Install the SpeechRecognition library to use Google's speech-to-text service.

2. Prompt the user to upload a .wav audio file containing the spoken input.

3. Create a recognizer object from the speech_recognition module.

4. Load the uploaded audio file using AudioFile() and extract audio data using

record().

5. Use recognize_google() to convert the speech in the audio file to text.

6. Define the actual (expected) text for comparison.

7. Split both reference and recognized text into word lists.

8. Calculate the Word Error Rate (WER): errors divided by the number of reference
words.

9. Print both the recognized text and the calculated error rate.

10. If recognition fails, print an appropriate message.


Program:

# Step 1: Install required library
!pip install SpeechRecognition

# Step 2: Upload a WAV file
from google.colab import files
uploaded = files.upload()  # upload an audio file like 'test.wav'

# Step 3: Recognize speech
import speech_recognition as sr

recognizer = sr.Recognizer()
file_name = list(uploaded.keys())[0]

with sr.AudioFile(file_name) as source:
    audio_data = recognizer.record(source)

# Step 4: Transcribe and compare
try:
    recognized_text = recognizer.recognize_google(audio_data)
    print("Recognized Text:", recognized_text)

    # Reference text (what the speaker actually said)
    reference_text = "hello welcome to our project"

    # Simple word error rate calculation
    ref_words = reference_text.lower().split()
    recog_words = recognized_text.lower().split()
    errors = sum(1 for a, b in zip(ref_words, recog_words) if a != b)
    errors += abs(len(ref_words) - len(recog_words))  # count extra/missing words
    wer = errors / len(ref_words) if ref_words else 1

    print("Error Rate:", round(wer * 100, 2), "%")
except Exception:
    print("Could not recognize the audio.")
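The position-wise comparison above over-counts errors when a word is inserted or deleted mid-sentence, since every later word then misaligns. The standard WER instead uses word-level edit distance (substitutions + insertions + deletions); a self-contained sketch, with an example hypothesis invented for illustration:

```python
def word_error_rate(reference, hypothesis):
    # WER = word-level Levenshtein distance / number of reference words
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref) if ref else 1.0

# One substitution ("our" -> "the") out of 5 reference words -> WER 0.2
print(word_error_rate("hello welcome to our project",
                      "hello welcome to the project"))
```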
Output:

Result:
Thus a speech recognition system and its error rate computation using Python
have been successfully implemented and the output verified.
