
CCS369 - TEXT AND SPEECH ANALYSIS

LAB RECORD

NAME: ….......................................................

REGISTER NUMBER: …...................................................…

DEGREE & BRANCH: ..................................................….

YEAR/SEMESTER: ….................................................……
BONAFIDE CERTIFICATE

Certified that this is a bonafide record of work done by
Selvan/Selvi ...................................................................................... with
Register No. ………………………… studying in III year, VI semester, in the
Computer Science and Engineering branch of this Institution during the
academic year 2023-2024 [EVEN SEM].

Staff in-charge Head of the Department

Submitted for the Anna University practical examination held at SCAD
College of Engineering and Technology, Cheranmahadevi on
………………………

Internal Examiner External Examiner


1. Create Regular expressions in Python for detecting word patterns and tokenizing text

AIM:
To create regular expressions in Python for detecting word patterns and tokenizing text. Regular
expressions (regex) are powerful tools for detecting patterns in text.
ALGORITHM:
1. Import the re module.
2. Define your text.
3. Create a regular expression for detecting word patterns.
4. Use re.findall(pattern, text) to find all matches of the pattern in the text.
5. Print or process the matches.
6. Create a regular expression for tokenizing text.
7. Print or process the tokens.
8. Repeat as necessary.

(A) WORD PATTERNS
PROGRAM:
import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r'\b[a-zA-Z]*[qQ][a-zA-Z]*\b' # Words containing the letter 'q' or 'Q'
matches = re.findall(pattern, text)
print(matches) # Output: ['quick']

OUTPUT:
['quick']

=== Code Execution Successful ===

(B) TOKENIZING TEXT
PROGRAM:
import re
text = "The quick brown fox"
tokens = re.split(r'\s+', text)
print(tokens) # Output: ['The', 'quick', 'brown', 'fox']

OUTPUT:
['The', 'quick', 'brown', 'fox']

=== Code Execution Successful ===

RESULT:
Thus, regular expressions in Python to detect word patterns and tokenize text were written and verified.
Depending on your specific requirements, you can customize the regex patterns accordingly.
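For instance, a minimal sketch of two such customized patterns is given below (the sample sentence and patterns are illustrative additions, not part of the exercise above): one extracts standalone numbers and the other extracts capitalized words.

import re

text = "Python 3 was released in 2008 by Guido van Rossum."
print(re.findall(r'\b\d+\b', text))          # ['3', '2008'] -- standalone numbers
print(re.findall(r'\b[A-Z][a-z]+\b', text))  # ['Python', 'Guido', 'Rossum'] -- capitalized words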
2. Getting started with Python and NLTK - Searching Text, Counting Vocabulary,
Frequency Distribution, Collocations, Bigrams
AIM:
To get started with Python and NLTK - Searching Text, Counting Vocabulary, Frequency Distribution,
Collocations, Bigrams.
ALGORITHM:
1. Install NLTK.
2. Import NLTK and download resources.
3. Load text data.
4. Search text.
5. Count vocabulary.
6. Compute the frequency distribution.
7. Find collocations.
8. Generate bigrams.
PROGRAM:
(A) SEARCHING TEXT
import nltk
from nltk.book import *

# Search for occurrences of a word
def search_word(text, word):
    print("Concordance for word:", word)
    text.concordance(word)

# Search for similar words
def search_similar(text, word):
    print("Similar words for:", word)
    text.similar(word)

# Load text data (you can choose from built-in texts)
text = text1  # Moby Dick

# Perform searches
search_word(text, "whale")
search_similar(text, "whale")
OUTPUT:
Concordance for word: whale
Displaying 25 of 1226 matches:
the Sperm Whale. CHAPTER 32. Cetology. CHAPTER 33. The Spe
rise of the Leviathan. CHAPTER 105. Does the Whale '

(B) COUNTING VOCABULARY

import nltk
from nltk.book import *

# Count unique words
def count_vocabulary(text):
    vocabulary = set(text)
    num_unique_words = len(vocabulary)
    return num_unique_words

# Load text data (you can choose from built-in texts)
text = text1  # Moby Dick

# Count vocabulary
num_unique_words = count_vocabulary(text)
print("Number of unique words:", num_unique_words)

OUTPUT:
Number of unique words: 19317

(C) FREQUENCY DISTRIBUTION
import nltk
from nltk.book import *
from nltk.probability import FreqDist

# Calculate frequency distribution of words
def calculate_frequency_distribution(text):
    fdist = FreqDist(text)
    return fdist

# Load text data (you can choose from built-in texts)
text = text1  # Moby Dick

# Calculate frequency distribution
fdist = calculate_frequency_distribution(text)

# Print the most common words
print("Most common words:", fdist.most_common(10))

OUTPUT:
Most common words: [('the', 13721), (',', 7301), ('.', 6483), ('of', 4748), ('and', 4515), ('a',
3090), ('to', 2824), (';', 2552), ('in', 2185), ('that', 1659)]

(D) COLLOCATIONS
import nltk
from nltk.book import *

# Find collocations
def find_collocations(text):
    print("Collocations:")
    text.collocations()

# Load text data (you can choose from built-in texts)
text = text1  # Moby Dick

# Find collocations
find_collocations(text)

OUTPUT:
Collocations:
Sperm Whale; Moby Dick; White Whale; old man; Captain Ahab; sperm
whale; Right Whale; Captain Peleg; New Bedford; Cape Horn; cried Ahab;
years ago; lower jaw; never mind; Father Mapple; cried Stubb; chief
mate; white whale; ivory leg; one hand

(E) BIGRAMS
import nltk
from nltk.book import *

# Generate bigrams
def generate_bigrams(text):
    print("Example bigrams:")
    bigrams = list(nltk.bigrams(text))
    for bigram in bigrams[:10]:  # Displaying the first 10 bigrams as an example
        print(bigram)

# Load text data (you can choose from built-in texts)
text = text1  # Moby Dick

# Generate bigrams
generate_bigrams(text)

OUTPUT:
Example bigrams:
('CHAPTER', '1')
('1', 'Loomings')
('Loomings', '.')
('.', 'Call')
('Call', 'me')
('me', 'Ishmael')
('Ishmael', '.')
('.', 'Some')
('Some', 'years')
('years', 'ago')

RESULT:
These are some basic steps to get started with NLTK for text analysis tasks. You can explore further
functionalities and modules within NLTK for more advanced text processing and analysis.
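As one example of such further exploration, the sketch below (an illustrative addition using the same text1 object, not part of the original record) lists hapax legomena and unusually long words with FreqDist:

import nltk
from nltk.book import *
from nltk.probability import FreqDist

# Words that occur exactly once in Moby Dick (hapaxes)
fdist = FreqDist(text1)
print("Number of hapaxes:", len(fdist.hapaxes()))

# Unusually long words in the vocabulary
long_words = [w for w in set(text1) if len(w) > 15]
print("Some long words:", sorted(long_words)[:5])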
3. Accessing Text Corpora using NLTK in Python
AIM:
To access text corpora using NLTK in Python.
ALGORITHM:
1. Install NLTK.
2. Import NLTK.
3. Download corpora or models.
4. Access text corpora.
5. Explore further.
PROGRAM:
import nltk
from nltk.corpus import gutenberg, brown, wordnet

# Accessing the Gutenberg Corpus
print("=== Gutenberg Corpus ===")
print("Available files:", gutenberg.fileids())
emma_text = gutenberg.raw('austen-emma.txt')[:500]  # Extracting first 500 characters
print("Sample text from 'Emma' by Jane Austen:")
print(emma_text)

# Accessing the Brown Corpus
print("\n=== Brown Corpus ===")
print("Available categories (genres):", brown.categories())
news_text = brown.raw(categories='news')[:500]  # Extracting first 500 characters
print("Sample text from the 'news' category:")
print(news_text)

# Accessing the WordNet Corpus
print("\n=== WordNet Corpus ===")
car_synsets = wordnet.synsets('car')  # Synsets for the word 'car'
print("Synsets for the word 'car':", car_synsets)
print("Definitions of the synsets:")
for synset in car_synsets:
    print(synset.definition())

OUTPUT:
=== Gutenberg Corpus ===
Available files: ['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', ...]
Sample text from 'Emma' by Jane Austen:
[Emma by Jane Austen 1816]

VOLUME I

CHAPTER I

Emma Woodhouse, handsome, clever, and rich, with a comfortable home


and happy disposition, seemed to unite some of the best blessings
of existence; and had lived nearly twenty-one years in the world
with very little to distress or vex her.

=== Brown Corpus ===


Available categories (genres): ['adventure', 'belles_lettres', 'editorial', ...]
Sample text from the 'news' category:
The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election
produced "no evidence" that any irregularities took place.

=== WordNet Corpus ===


Synsets for the word 'car': [Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'),
Synset('cable_car.n.01')]
Definitions of the synsets:
a motor vehicle with four wheels; usually propelled by an internal combustion engine
a wheeled vehicle adapted to the rails of railroad
the compartment that is suspended from an airship and that carries personnel and the cargo and the
power plant
where passengers ride up and down
a conveyance for passengers or freight on a cable railway

RESULT:
This code snippet demonstrates how to access text corpora from the Gutenberg, Brown, and WordNet
corpora using NLTK in Python, along with sample output from each corpus.
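A common next step with a categorized corpus such as Brown is a conditional frequency distribution across genres. A minimal sketch is given below; the chosen genres and modal verbs are illustrative assumptions, not part of the original record.

import nltk
from nltk.corpus import brown

# Compare how often a few modal verbs occur in two Brown genres
cfd = nltk.ConditionalFreqDist(
    (genre, word.lower())
    for genre in ['news', 'romance']
    for word in brown.words(categories=genre))
modals = ['can', 'could', 'may', 'might', 'must', 'will']
cfd.tabulate(conditions=['news', 'romance'], samples=modals)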

4. Write a function that finds the 50 most frequently occurring words of a text that are not stop
words.
AIM:
To write a function that finds the 50 most frequently occurring words of a text that are not stop
words.
ALGORITHM:
1. Accept a piece of text as input.
2. Tokenize the input text into words.
3. Remove stopwords from the list of tokens. Stopwords are commonly occurring words in a
language (e.g., "the", "is", "and") that do not carry significant meaning.
4. Count the frequency of each word in the filtered list of tokens.
5. Sort the words based on their frequencies in descending order.
6. Select the top 50 words with the highest frequencies.
7. Return the list of the 50 most frequently occurring words.
PROGRAM:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

def most_frequent_words(text):
    # Tokenize the text
    tokens = word_tokenize(text)

    # Filter out stop words (keeping alphabetic tokens only, which also drops punctuation)
    stop_words = set(stopwords.words('english'))
    filtered_tokens = [word for word in tokens
                       if word.isalpha() and word.lower() not in stop_words]

    # Calculate frequency distribution of filtered tokens
    fdist = FreqDist(filtered_tokens)

    # Get the 50 most frequent words
    most_frequent = fdist.most_common(50)
    return most_frequent

# Example usage:
text = ("This is a sample text. It contains some words that will be counted. "
        "The words in this text will be analyzed to find the most frequent ones.")
result = most_frequent_words(text)
print("50 most frequently occurring words (excluding stop words):")
print(result)

OUTPUT:
50 most frequently occurring words (excluding stop words):
[('words', 2), ('text', 2), ('sample', 1), ('contains', 1), ('counted', 1), ('analyzed', 1), ('find', 1), ('frequent',
1), ('ones', 1)]

RESULT:
This function first tokenizes the input text, then filters out stop words using NLTK's English stop
words list. After that, it calculates the frequency distribution of the filtered tokens and returns the 50
most frequent words along with their frequencies. You can replace the example text with any text you
want to analyze.
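As a further usage sketch (assuming the NLTK punkt and stopwords data have already been downloaded), the same function can be run over a full Gutenberg text instead of the short sample sentence; the exact counts depend on the tokenization.

from nltk.corpus import gutenberg

emma_raw = gutenberg.raw('austen-emma.txt')
# Ten most frequent non-stopword tokens in 'Emma'
print(most_frequent_words(emma_raw)[:10])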
5. Implement the Word2Vec model.
AIM:
To implement the Word2Vec model.
ALGORITHM:
1. Accept a corpus of text data as input.
2. Tokenize the text into individual words or phrases.
3. Remove stopwords, punctuation, and other noise if necessary.
4. Initialize a Word2Vec model with parameters such as vector size, window size, and minimum word
count.
5. Train the model; the learned vectors capture the semantic meaning of words based on their context in the training data.
6. Evaluate the performance of the Word2Vec model using tasks such as word similarity and analogy.
7. Return the trained Word2Vec model with its word embeddings.

PROGRAM:
import gensim
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Sample text data
text = ("Word2Vec is a technique for natural language processing. It is used for converting words "
        "into vectors. Word vectors can be used for various NLP tasks such as sentiment analysis, "
        "named entity recognition, and machine translation.")

# Tokenize the text
tokens = word_tokenize(text.lower())

# Remove stop words
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.isalnum() and word not in stop_words]

# Create Word2Vec model
model = Word2Vec([filtered_tokens], vector_size=100, window=5, min_count=1, workers=4)

# Test the model
word = 'word2vec'
most_similar_words = model.wv.most_similar(word)

# Output
print("Words most similar to '{}':".format(word))
for similar_word, similarity in most_similar_words:
    print(similar_word, similarity)

OUTPUT:
Words most similar to 'word2vec':
nlp 0.24648666310310364
machine 0.24033652210235596
recognition 0.2289284769296646
used 0.22583018231391907
vectors 0.22268365383148193
natural 0.20813092589378357
language 0.19862347853183746
technique 0.17929100954532623
sentiment 0.1550669378042221
converting 0.1451803447008133

RESULT:
This code snippet demonstrates how to implement the Word2Vec model using Gensim in Python.
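Beyond most_similar, a short follow-up sketch (using the same trained model; the words chosen here are assumptions taken from the sample text) shows how to read one word's vector and compare two specific words:

# Inspect the learned embedding for one word
vector = model.wv['word2vec']
print("Vector dimensionality:", len(vector))  # 100, as set by vector_size

# Cosine similarity between two words from the training text
print("similarity(word2vec, vectors):", model.wv.similarity('word2vec', 'vectors'))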

6. Use a transformer for implementing classification


AIM:
To use a transformer for implementing classification in Python.

ALGORITHM:
1. Data preparation.
2. Model architecture.
3. Training loop.
4. Evaluation.
5. Testing.
6. Inference.
7. Deployment.
PROGRAM:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset

# Define the Transformer model
class TransformerClassifier(nn.Module):
    def __init__(self, input_dim, output_dim, max_seq_length, num_heads, num_layers):
        super(TransformerClassifier, self).__init__()
        self.embedding = nn.Embedding(input_dim, 256)
        self.transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=256, nhead=num_heads),
            num_layers=num_layers)
        self.fc = nn.Linear(256 * max_seq_length, output_dim)

    def forward(self, x):
        x = self.embedding(x)
        x = x.permute(1, 0, 2)        # Shape: (seq_len, batch_size, embed_size)
        x = self.transformer(x)
        x = x.permute(1, 0, 2)        # Reshape to (batch_size, seq_len, embed_size)
        x = x.reshape(x.size(0), -1)  # Flatten
        x = self.fc(x)
        return F.log_softmax(x, dim=1)

# Dummy dataset
class DummyDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Example usage
input_dim = 1000      # Size of vocabulary
output_dim = 10       # Number of classes
max_seq_length = 20   # Maximum sequence length
num_heads = 8         # Number of attention heads
num_layers = 6        # Number of transformer layers

# Instantiate the model
model = TransformerClassifier(input_dim, output_dim, max_seq_length, num_heads, num_layers)

# Dummy data: 1000 samples of sequences of length 20
data = torch.randint(0, input_dim, (1000, max_seq_length))
labels = torch.randint(0, output_dim, (1000,))

# Create dataset and dataloader
dataset = DummyDataset(data, labels)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

# Define loss function and optimizer
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
for epoch in range(10):
    for batch_idx, (inputs, targets) in enumerate(dataloader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        if batch_idx % 10 == 0:
            print('Epoch {} Batch {} Loss: {:.4f}'.format(epoch, batch_idx, loss.item()))

# Example inference
test_input = torch.randint(0, input_dim, (1, max_seq_length))  # Test input sequence
with torch.no_grad():
    output_probs = torch.exp(model(test_input))
    predicted_class = torch.argmax(output_probs)
print("Predicted class:", predicted_class.item())

OUTPUT:
Epoch 0 Batch 0 Loss: 2.3421
Epoch 0 Batch 10 Loss: 2.2925
Epoch 0 Batch 20 Loss: 2.3014
...
Epoch 9 Batch 0 Loss: 0.0543
Epoch 9 Batch 10 Loss: 0.0402
Epoch 9 Batch 20 Loss: 0.0221
Predicted class: 3
RESULT:
This is a basic implementation of a Transformer-based classifier using PyTorch. You can adjust
hyperparameters, model architecture, and dataset accordingly to fit your specific task.
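For completeness, a minimal evaluation sketch is given below. It is an illustrative addition, not part of the original record, and it simply measures accuracy on the same random-label dummy data to show the shape of an evaluation loop.

# Evaluate on the dummy training data (illustrative only)
model.eval()
correct = 0
with torch.no_grad():
    for inputs, targets in dataloader:
        preds = model(inputs).argmax(dim=1)
        correct += (preds == targets).sum().item()
print("Accuracy on dummy data:", correct / len(dataset))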
7. Design a chatbot with a simple dialog system

AIM:
To design a chatbot with a simple dialog system
ALGORITHM:
1. Define objectives.
2. Select platform.
3. Data collection.
4. Preprocessing.
5. Training data creation.
6. Model design.
7. Model training
8. Evaluation.
9. Integration.
10. Testing.
11. Deployment.
12. Maintenance.
PROGRAM:
import random

# Define responses for different intents
responses = {
    "greeting": ["Hello!", "Hi there!", "Hey! How can I help you?"],
    "farewell": ["Goodbye!", "See you later!", "Have a great day!"],
    "thanks": ["You're welcome!", "No problem!", "Anytime!"],
    "default": ["Sorry, I didn't understand that.", "Could you please repeat that?",
                "I'm not sure how to respond to that."]
}

# Define rules for mapping user inputs to intents
rules = {
    "greeting": ["hello", "hi", "hey", "howdy"],
    "farewell": ["bye", "goodbye", "see you later", "take care"],
    "thanks": ["thank you", "thanks", "thanks a lot"]
}

# Function to classify user input into intents
def classify_intent(user_input):
    user_input = user_input.lower()
    for intent, patterns in rules.items():
        for pattern in patterns:
            if pattern in user_input:
                return intent
    return "default"

# Function to generate response based on intent
def generate_response(intent):
    return random.choice(responses[intent])

# Main function to run the chatbot
def chatbot():
    print("Chatbot: Hi! How can I help you today?")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            print("Chatbot: Goodbye!")
            break
        intent = classify_intent(user_input)
        response = generate_response(intent)
        print("Chatbot:", response)

# Run the chatbot
if __name__ == "__main__":
    chatbot()

OUTPUT:
Chatbot: Hi! How can I help you today?
You: Hi
Chatbot: Hello!
You: Can you help me with a problem?
Chatbot: Sorry, I didn't understand that.
You: Thank you
Chatbot: You're welcome!
You: Bye
Chatbot: Goodbye!
RESULT:
Thus, we have designed a chatbot with a simple dialog system.
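Because the intents and responses are plain dictionaries, the dialog system is easy to extend. A hedged sketch of adding a hypothetical "help" intent (so that the earlier input "Can you help me with a problem?" no longer falls through to the default reply) is:

# Add a new intent by extending the two tables (hypothetical example)
responses["help"] = ["Sure, tell me more about the problem.", "I can try to help with that."]
rules["help"] = ["help", "problem", "issue"]

print(generate_response(classify_intent("Can you help me with a problem?")))
# Now maps to the "help" intent instead of "default"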
8. Convert text to speech and find accuracy

AIM:
To convert text to speech and find accuracy

ALGORITHM:
Input: Text data (source), Speech data (target), Ground truth text
Output: Synthesized speech data, Accuracy metrics

1. Convert text data into speech data using a text-to-speech library or API.
- Utilize the provided source text data.
- Generate synthesized speech data.

2. Evaluate the accuracy of the synthesized speech.


- Utilize the synthesized speech data and ground truth text.
- Transcribe the synthesized speech into text using a speech recognition library or API.
- Calculate accuracy metrics (e.g., Word Error Rate, Character Error Rate) to quantify the accuracy.
- Output the accuracy metrics.

3. Return synthesized speech data and accuracy metrics.

PROGRAM:
import pyttsx3
import speech_recognition as sr

# Function to convert text to speech
def text_to_speech(text):
    engine = pyttsx3.init()
    engine.setProperty('rate', 150)  # Speed of speech
    engine.say(text)
    engine.runAndWait()

# Function to convert speech to text
def speech_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something:")
        audio = recognizer.listen(source)

    try:
        text = recognizer.recognize_google(audio)
        return text.lower()
    except sr.UnknownValueError:
        print("Could not understand audio")
        return ""
    except sr.RequestError as e:
        print("Could not request results: {0}".format(e))
        return ""

# Function to calculate accuracy
def calculate_accuracy(ground_truth, synthesized_text):
    words_ground_truth = ground_truth.split()
    words_synthesized = synthesized_text.split()
    num_correct = sum(1 for x, y in zip(words_ground_truth, words_synthesized) if x == y)
    accuracy = num_correct / len(words_ground_truth)
    return accuracy

# Main function
def main():
    # Input text
    text = "Hello, how are you?"

    # Convert text to speech
    print("Synthesizing speech from text...")
    text_to_speech(text)

    # Convert synthesized speech to text
    print("Transcribing synthesized speech...")
    synthesized_text = speech_to_text()

    # Ground truth
    ground_truth = "hello how are you"

    # Calculate accuracy
    accuracy = calculate_accuracy(ground_truth, synthesized_text)
    print("Accuracy:", accuracy)

if __name__ == "__main__":
    main()

OUTPUT:
RESULT:
This code first converts the input text into speech using the text_to_speech() function. Then, it records
speech from the microphone, transcribes it into text using the speech_to_text() function, and compares
it with the ground truth text. Finally, it calculates the accuracy of the synthesized speech using the
calculate_accuracy() function.
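Because calculate_accuracy() compares the two transcripts position by position, a small worked example (illustrative strings, no microphone needed) makes its behaviour concrete:

# Position-wise comparison: only the first two words line up here
print(calculate_accuracy("hello how are you", "hello how you are"))  # 0.5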
9. Design a speech recognition system and find the error rate
AIM:
To design a speech recognition system and find the error rate

ALGORITHM:
Input: Speech recordings, Ground truth transcriptions
Output: Recognized transcriptions, Error rate (e.g., WER or CER)

1. Preprocess the speech recordings:


- Segment into smaller units (e.g., frames).
- Extract features (e.g., MFCCs) from each segment.
- Normalize the feature vectors.

2. Train the speech recognition model:


- Select a suitable model architecture.
- Split the dataset into training, validation, and test sets.
- Train the model on the training set.
- Tune hyperparameters using the validation set.

3. Recognize speech:
- Use the trained model to transcribe speech recordings.

4. Calculate the error rate:


- Compare the recognized transcriptions with the ground truth transcriptions.
- Compute the Word Error Rate (WER) or Character Error Rate (CER).
- WER = (Number of substitutions + Number of deletions + Number of insertions) / Total number of
words in the ground truth transcription.

5. Output recognized transcriptions and error rate.


PROGRAM:
import speech_recognition as sr

# Function to recognize speech
def recognize_speech(audio_file):
    recognizer = sr.Recognizer()

    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)  # Read the entire audio file

    try:
        recognized_text = recognizer.recognize_google(audio_data)
        return recognized_text.lower()
    except sr.UnknownValueError:
        print("Speech recognition could not understand audio")
        return ""
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
        return ""

# Function to calculate Word Error Rate (WER)
# Note: this is a simplified estimate -- it does not align the two word sequences,
# treats every leftover recognized word as a substitution, and never counts insertions.
def calculate_wer(ground_truth, recognized_text):
    # Split ground truth and recognized text into words
    ground_truth_words = ground_truth.split()
    recognized_words = recognized_text.split()

    # Initialize counters for substitutions, deletions, and insertions
    substitutions = 0
    deletions = 0
    insertions = 0

    # Count ground-truth words missing from the recognized text as deletions
    for word in ground_truth_words:
        if word in recognized_words:
            recognized_words.remove(word)
        else:
            deletions += 1

    # Remaining recognized words are treated as substitutions
    substitutions = len(recognized_words)
    total_words = len(ground_truth_words)
    wer = (substitutions + deletions + insertions) / total_words
    return wer

# Main function
def main():
    # Ground truth transcription
    ground_truth = "hello how are you"

    # Recognize speech from audio file
    audio_file = "sample_audio.wav"  # Provide the path to your audio file
    recognized_text = recognize_speech(audio_file)

    # Calculate Word Error Rate (WER)
    wer = calculate_wer(ground_truth, recognized_text)
    print("Recognized text:", recognized_text)
    print("Word Error Rate (WER):", wer)

if __name__ == "__main__":
    main()
OUTPUT:
RESULT:
This code takes an audio file path as input, recognizes speech using the Google Speech Recognition
service, and then calculates the Word Error Rate (WER) between the recognized text and the ground
truth transcription. Make sure to replace "sample_audio.wav" with the path to your audio file.
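Since calculate_wer() above only approximates WER (it never aligns the two word sequences and never counts insertions), a minimal alignment-based sketch using word-level edit distance is shown below; the function name wer_edit_distance is illustrative and not part of the record.

def wer_edit_distance(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer_edit_distance("hello how are you", "hello are you"))  # 0.25: one deletion out of four words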
