AI Phase 3
Introduction
Program:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re, string
from tensorflow.keras.layers import (TextVectorization, LSTM, Dense, Embedding,
                                     Dropout, LayerNormalization)
Program:
df=pd.read_csv('Chatbot.txt',sep='\t',names=['question','answer'])
print(f'Dataframe size: {len(df)}')
df.head()
Program:
# Check for missing values
print(df.isnull().sum())
# Explore statistics
print(df.describe())
# Visualize the data (e.g., histograms, scatter plots, etc.)
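A minimal visualization sketch, assuming the 'question'/'answer' columns loaded above; the choice of plotting answer lengths is only an example:
# Histogram of answer lengths (in characters)
df['answer_length'] = df['answer'].str.len()
sns.histplot(df['answer_length'], bins=30)
plt.xlabel('Answer length (characters)')
plt.ylabel('Count')
plt.show()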
4. Feature Engineering:
Program:
# Example: One-hot encoding for categorical variables
df = pd.get_dummies(df, columns=['Avg. Area Income', 'Avg. Area House Age'])
Split your dataset into training and testing sets. This helps you evaluate your model's performance later; a short example follows.
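A minimal split sketch using scikit-learn; the feature matrix X and target y are assumed to have been prepared from the dataframe beforehand:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)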
Program:
from sklearn.preprocessing import StandardScaler

# Scale features to zero mean and unit variance (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
1. Data Variety:
2. Data Volume:
3. Data Cleaning:
4. Context Understanding:
5. Multilingual Support:
➢ The first step is to identify the dataset that you want to load.
Program:
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import xgboost as xg
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
Loading Dataset:
dataset = pd.read_csv('E:/USA_Housing.csv')
Data Exploration:
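A brief exploration sketch for the housing data loaded above; the exact output depends on the CSV file, so the calls below are illustrative:
print(dataset.shape)        # number of rows and columns
dataset.info()              # column names, dtypes, missing values
print(dataset.describe())   # summary statistics of the numeric columns
print(dataset.head())       # first few rows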
Dataset:
Output:
Segmentation:
In [1]:
data = open('/kaggle/input/simple-dialogs-for-chatbot/dialogs.txt', 'r').read()
In [2]:
Out [1]:
[['hi, how are you doing?', "i'm fine. how about yourself?"],
 ["i'm fine. how about yourself?", "i'm pretty good. thanks for asking."],
 ["i'm pretty good. thanks for asking.", 'no problem. so how have you been?'],
 ['no problem. so how have you been?', "i've been great. what about you?"],
 ["i've been great. what about you?", "i've been good. i'm in school right now."]]
In [3]:
print(questions[0:5])
print(answers[0:5])
Out [2]:
['hi, how are you doing?', "i'm fine. how about yourself?", "i'm pretty good. thanks for asking.", 'no problem. so how have you been?', "i've been great. what about you?"]
["i'm fine. how about yourself?", "i'm pretty good. thanks for asking.", 'no problem. so how have you been?', "i've been great. what about you?", "i've been good. i'm in school right now."]
Normalization:
In [5]:
import unicodedata

def remove_diacritic(text):
    # Decompose accented characters and drop the combining marks (category 'Mn')
    return ''.join(char for char in unicodedata.normalize('NFD', text)
                   if unicodedata.category(char) != 'Mn')
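As an illustrative call (not from the original notebook), the function strips accents while keeping the base characters:
print(remove_diacritic('café au lait'))   # -> cafe au lait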
In [6]:
import re

def preprocessing(text):
    text = remove_diacritic(text.lower().strip())
    text = re.sub(r"([?.!,])", r" \1 ", text)   # pad punctuation with spaces
    text = re.sub(r"[^a-z?.!,]+", " ", text)    # drop apostrophes and other symbols
    # collapse whitespace and add the boundary tokens shown in the output below
    return '<start> ' + re.sub(r"\s+", " ", text).strip() + ' <end>'
In [7]:
preprocessed_questions = [preprocessing(q) for q in questions]
preprocessed_answers = [preprocessing(a) for a in answers]
print(preprocessed_questions[0])
print(preprocessed_answers[0])
Out [3]:
<start> hi , how are you doing ? <end>
<start> i m fine . how about yourself ? <end>
Tokenization:
In [8]:
def tokenize(lang):
    # Word-level tokenizer with no character filtering (punctuation stays as tokens)
    lang_tokenizer = tf.keras.preprocessing.text.Tokenizer(filters='')
    lang_tokenizer.fit_on_texts(lang)
    return lang_tokenizer
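A usage sketch (the variable names below are illustrative, not from the original notebook): the fitted tokenizer maps each preprocessed sentence to a sequence of integer ids, which can then be padded to a fixed length for the LSTM.
question_tokenizer = tokenize(preprocessed_questions)
question_sequences = question_tokenizer.texts_to_sequences(preprocessed_questions)
question_tensor = tf.keras.preprocessing.sequence.pad_sequences(question_sequences, padding='post')
print(question_tensor.shape)   # (number of questions, max sequence length)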
Some common data preprocessing tasks include:
Data cleaning:
Data transformation:
Feature engineering:
➢ This involves creating new features from the existing data. For example, this may mean creating features that represent interactions between variables, or features that represent summary statistics of the data (see the sketch after this list).
Data integration:
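A small illustration of the feature-engineering idea above, assuming USA_Housing-style column names ('Avg. Area Income', 'Avg. Area Number of Rooms'); the interaction and summary-statistic features are hypothetical examples, not part of the original notebook:
# Hypothetical interaction feature: income multiplied by average room count
dataset['income_x_rooms'] = dataset['Avg. Area Income'] * dataset['Avg. Area Number of Rooms']
# Hypothetical summary-statistic feature: deviation of room count from its mean
dataset['rooms_minus_mean'] = dataset['Avg. Area Number of Rooms'] - dataset['Avg. Area Number of Rooms'].mean()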