DeepLearning.AI
DeepLearning.AI makes these slides available for educational purposes. You may not use or
distribute these slides for commercial purposes. You may make copies of these slides and
use or distribute them for educational purposes as long as you cite DeepLearning.AI as the
source of the slides.
● Fast
● Flexible
● Easy-to-use
Basic mechanics

tf.data.Dataset → map(func) → tf.data.Dataset → batch(size) → tf.data.Dataset → ...
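A minimal sketch of this chaining (illustrative values, not from the slides): each transformation such as map(func) or batch(size) returns a new tf.data.Dataset, so calls can be strung together.

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
dataset = dataset.map(lambda x: x * 2)   # map(func): apply a function to every element
dataset = dataset.batch(3)               # batch(size): group elements into batches

for batch in dataset:
    print(batch.numpy())  # [2 4 6], then [ 8 10 12]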
Using an iterator to navigate
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])
it = iter(dataset)
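Calling next() on the iterator pulls one element at a time (a small illustrative continuation):

print(next(it).numpy())  # 1
print(next(it).numpy())  # 2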
Index Description
0 Male
1 Female
2 Nonbinary
3 Trans
4 Unassigned
... ...
A raw table with the columns First, Last, Addr, Phone, Gender, Age stores the Gender and Age columns as integer codes:

Gender codes: 0 Male, 1 Female, 2 Nonbinary, 4 Unassigned
Age codes: 0 Infant, 1 Child, 2 Teen, 4 Adult
Bucketized column
The Iris dataset has all numeric data as its input features:
● SepalLength
● SepalWidth
● PetalLength
● PetalWidth
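Numeric features like these can be fed directly to numeric feature columns (a short sketch of the tf.feature_column API used throughout these slides):

# One numeric column per Iris input feature (defaults to a float32 scalar).
feature_columns = [
    tf.feature_column.numeric_column(key=name)
    for name in ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
]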
Specifying data types
# First, convert the raw input to a numeric column (the source feature is assumed to be a year).
numeric_feature_column = tf.feature_column.numeric_column("Year")

# Then, bucketize the numeric column on the years 1960, 1980, and 2000.
bucketized_feature_column = tf.feature_column.bucketized_column(
    source_column=numeric_feature_column,
    boundaries=[1960, 1980, 2000])
Categorical identity column
Categorizing identity features
identity_feature_column = tf.feature_column.categorical_column_with_identity(
    key='my_feature_b',
    num_buckets=4)  # Values in [0, 4), i.e. 0 through 3
def input_fn():
    ...
    return ({'my_feature_a': [7, 9, 5, 2], 'my_feature_b': [3, 1, 2, 2]},
            [Label_values])
Categorical vocabulary column
Creating a categorical vocab column
vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_list(
    key=feature_name,
    vocabulary_list=["kitchenware", "electronics", "sports"])  # example values (assumed)

vocabulary_feature_column = tf.feature_column.categorical_column_with_vocabulary_file(
    key=feature_name,
    vocabulary_file="product_class.txt",
    vocabulary_size=3)
Hashed column
hash(raw_feature) % hash_bucket_size
hashed_feature_column = tf.feature_column.categorical_column_with_hash_bucket(
    key="some_feature",
    hash_bucket_size=100)  # number of hash buckets (value assumed)
latitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('latitude'),
    list(atlanta.latitude.edges))

longitude_bucket_fc = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('longitude'),
    list(atlanta.longitude.edges))

crossed_lat_lon_fc = tf.feature_column.crossed_column(
    [latitude_bucket_fc, longitude_bucket_fc], hash_bucket_size=5000)  # bucket count assumed
Embedding column

embedding_dimensions = number_of_categories**0.25

# This means creating an embedding vector lookup table with one element for each category.
embedding_column = tf.feature_column.embedding_column(
    categorical_column=categorical_column,
    dimension=embedding_dimensions)
Feature Columns with Keras
from tensorflow.keras import layers

def demo(feature_column):
    feature_layer = layers.DenseFeatures(feature_column)
    ...
Data sources
# Download dataset
DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)
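A hedged continuation following the load-NumPy-data pattern: read the arrays out of the .npz archive and wrap them in a tf.data.Dataset (the x_train / y_train keys are assumed to match mnist.npz).

import numpy as np

with np.load(path) as data:
    train_examples = data['x_train']
    train_labels = data['y_train']

train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))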
Heart disease CSV columns: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, target
df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
df.head()
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
target = df.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
import pathlib

DATA_URL = 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
data_root_orig = tf.keras.utils.get_file(origin=DATA_URL,
                                         fname='flower_photos', untar=True)
data_root = pathlib.Path(data_root_orig)
import random
import IPython.display as display
all_image_paths = list(data_root.glob('*/*'))
all_image_paths = [str(path) for path in all_image_paths]
random.shuffle(all_image_paths)
image_count = len(all_image_paths)
image_count
image_path = random.choice(all_image_paths)
display.display(display.Image(image_path))
CSV Loading the structured dataset
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
df = pd.read_csv(train_file_path, sep=',')
df.head()
survived sex age n_siblings_spouses parch fare class deck embark_town alone
   age  n_siblings_spouses  parch     fare
0  22.0                  1      0   7.2500
1  38.0                  1      0  71.2833
2  26.0                  0      0   7.9250
3  35.0                  1      0  53.1000
CSV Leveraging feature columns

numeric_columns = []
for feature in NUMERIC_FEATURES:
    num_col = tf.feature_column.numeric_column(feature)
    numeric_columns.append(tf.feature_column.indicator_column(num_col))

>>> numeric_columns
[IndicatorColumn(categorical_column=NumericColumn(key='age', shape=(1,),
   default_value=None, dtype=tf.float32, normalizer_fn=None)),
 IndicatorColumn(categorical_column=NumericColumn(key='n_siblings_spouses', shape=(1,),
   default_value=None, dtype=tf.float32, normalizer_fn=None)),
 IndicatorColumn(categorical_column=NumericColumn(key='parch', shape=(1,),
   default_value=None, dtype=tf.float32, normalizer_fn=None)),
 IndicatorColumn(categorical_column=NumericColumn(key='fare', shape=(1,),
   default_value=None, dtype=tf.float32, normalizer_fn=None))]
CSV Categorical data
CATEGORIES = {
'sex': ['male', 'female'],
'class' : ['First', 'Second', 'Third'],
'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
'embark_town' : ['Cherbourg', 'Southampton', 'Queenstown'],
'alone' : ['y', 'n']
}
cat_df = df[list(CATEGORIES.keys())]
cat_df.head()
CSV Categorical columns from raw data

categorical_columns = []
for feature, vocab in CATEGORIES.items():
    cat_col = tf.feature_column.categorical_column_with_vocabulary_list(
        key=feature, vocabulary_list=vocab)
    categorical_columns.append(tf.feature_column.indicator_column(cat_col))
CSV Categorical columns from raw data

>>> categorical_columns
[IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='sex',
   vocabulary_list=('male', 'female'), dtype=tf.string, default_value=-1,
   num_oov_buckets=0)),
 IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='class',
   vocabulary_list=('First', 'Second', 'Third'), dtype=tf.string, default_value=-1,
   num_oov_buckets=0)),
 ...
Text Loading texts with TextLineDataset

DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'
FILE_NAME = 'cowper.txt'

file_path = tf.keras.utils.get_file(FILE_NAME,
                                    origin=DIRECTORY_URL + FILE_NAME)
lines_dataset = tf.data.TextLineDataset(file_path)
Text Inspecting texts
filenames = [tf_record_filename]
raw_dataset = tf.data.TFRecordDataset(filenames)

feature_description = {
    'feature1': tf.io.FixedLenFeature((), tf.string),
    'feature2': tf.io.FixedLenFeature((), tf.int64)
}
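To turn the raw serialized records into parsed features, the usual next step (a sketch following the TFRecord tutorial, not shown on the slides) is to map a parsing function over the dataset:

def _parse_function(example_proto):
    # Parse one serialized tf.train.Example using the schema above.
    return tf.io.parse_single_example(example_proto, feature_description)

parsed_dataset = raw_dataset.map(_parse_function)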
(Directory-layout diagram: images organized into class subfolders, for example Validation/Cats and Validation/Dogs, each holding numbered files such as 3.jpg through 10.jpg.)
Generators Keras ImageDataGenerator

def make_generator():
    # 'catsdogs' is assumed to be the root directory with one subfolder per class.
    train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1. / 255, rotation_range=20, zoom_range=[0.8, 1.2])
    train_generator = train_datagen.flow_from_directory(
        catsdogs, target_size=(224, 224), class_mode='categorical', batch_size=32)
    return train_generator

train_generator = tf.data.Dataset.from_generator(
    make_generator, (tf.float32, tf.uint8))
Datasets

● MNIST (NumPy)
● TF-Flowers (Images)
● TFRecords
● Titanic (CSV)
● Illiad (Text)
Numpy MNIST
X, y = next(iter(train_dataset))
input_shape = X.numpy().shape[1:]

model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss=...,
              metrics=[...])
csv_file = tf.keras.utils.get_file('heart.csv',
'https://storage.googleapis.com/applied-dl/heart.csv')
df = pd.read_csv(csv_file)
df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
target = df.pop('target')
Pandas Identifying Heart Disease
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal
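The model below is trained on train_dataset, which is not defined on these slides; a minimal sketch, assuming the usual shuffle-and-batch step from the pandas-DataFrame tutorial:

# Shuffle the (features, target) dataset built earlier and batch it for training.
train_dataset = dataset.shuffle(len(df)).batch(1)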
model = tf.keras.Sequential([
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(10, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.fit(train_dataset, epochs=15)
Pandas Constructing the dataset for the Functional API
dict_slices = tf.data.Dataset.from_tensor_slices(
    (df.to_dict('list'), target.values)).batch(16)
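The functional model that consumes this dictionary-structured dataset is not shown on the slides; a hedged sketch of the usual pattern (one named Input per DataFrame column), which the fit call below then trains:

# One scalar input per column, keyed by the same names as dict_slices.
inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)
x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])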
Pandas Training the model (Functional)
model.fit(dict_slices, epochs=15)
Images Classifying species of flowers
Images Preprocessing and creating the dataset
path_ds = tf.data.Dataset.from_tensor_slices(all_image_paths)
# all_image_labels: one integer label per path, derived from the flower folder names.
label_ds = tf.data.Dataset.from_tensor_slices(all_image_labels)

def preprocess_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [192, 192])
    image /= 255.0  # normalize to [0,1] range
    return image

image_ds = path_ds.map(preprocess_image)
image_label_ds = tf.data.Dataset.zip((image_ds, label_ds))
Images Training the model
BATCH_SIZE = 32
ds = image_label_ds.shuffle(
buffer_size=len(all_image_paths)).repeat().batch(BATCH_SIZE)
steps_per_epoch=tf.math.ceil(len(all_image_paths) / BATCH_SIZE).numpy()
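A hedged continuation showing how these pieces feed training; the model itself (an image classifier built elsewhere in the tutorial) is assumed here:

model.fit(ds, epochs=1, steps_per_epoch=int(steps_per_epoch))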
CSV Predicting survivors with Titanic
train_file_path = tf.keras.utils.get_file(
    "train.csv", "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
test_file_path = tf.keras.utils.get_file(
    "eval.csv", "https://storage.googleapis.com/tf-datasets/titanic/eval.csv")

raw_train_data = get_dataset(train_file_path)
raw_test_data = get_dataset(test_file_path)
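get_dataset is a small helper that the slides do not show; a sketch, assuming the usual wrapper around tf.data.experimental.make_csv_dataset from the load-CSV tutorial (batch size and other arguments assumed):

def get_dataset(file_path, **kwargs):
    # Build a batched dataset of (features dict, label) pairs from the CSV file.
    return tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=5,           # artificially small so batches are easy to print
        label_name='survived',
        na_value="?",
        num_epochs=1,
        ignore_errors=True,
        **kwargs)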
CSV What happens after loading the CSV?
def show_batch(dataset):
    for batch, label in dataset.take(1):
        for key, value in batch.items():
            print("{:20s}: {}".format(key, value.numpy()))
>>> show_batch(get_dataset(train_file_path))
sex : [b'female' b'female' b'female' b'male' b'male']
age : [40. 28. 52. 50. 34.]
n_siblings_spouses : [0 0 1 0 1]
parch : [0 0 0 0 0]
fare : [13. 7.75 78.2667 13. 21. ]
class : [b'Second' b'Third' b'First' b'Second' b'Second']
deck : [b'unknown' b'unknown' b'D' b'unknown' b'unknown']
embark_town : [b'Southampton' b'Queenstown' b'Cherbourg' b'Southampton' ..]
alone : [b'y' b'y' b'n' b'y' b'n']
CSV Getting data from named columns
SELECT_COLUMNS =['survived','age','n_siblings_spouses','class','deck','alone']
temp_dataset = get_dataset(train_file_path, select_columns=SELECT_COLUMNS)
>>> show_batch(temp_dataset)
age : [60. 34. 28. 40. 28.]
n_siblings_spouses : [1 1 1 0 0]
class : [b'Second' b'Third' b'Third' b'First' b'Third']
deck : [b'unknown' b'unknown' b'unknown' b'B' b'unknown']
alone : [b'n' b'n' b'n' b'y' b'y']
CSV Extracting features
packed_dataset = temp_dataset.map(pack)
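The pack function is not defined on the slide; assuming all selected columns are numeric, the load-CSV tutorial's version simply stacks them into one tensor (sketch):

def pack(features, label):
    # Stack every column of the features dict into a single dense tensor.
    return tf.stack(list(features.values()), axis=-1), label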
CSV Packing numeric features

class PackNumericFeatures(object):
    def __init__(self, names):
        self.names = names

packed_train_data = raw_train_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))
packed_test_data = raw_test_data.map(
    PackNumericFeatures(NUMERIC_FEATURES))
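The class is truncated on the slide; since it is passed to map as a callable, it needs a __call__ method. A sketch following the load-CSV tutorial:

class PackNumericFeatures(object):
    def __init__(self, names):
        self.names = names

    def __call__(self, features, labels):
        # Pop the named numeric columns, cast to float32, and stack them
        # into a single 'numeric' feature alongside the remaining columns.
        numeric_features = [features.pop(name) for name in self.names]
        numeric_features = [tf.cast(feat, tf.float32) for feat in numeric_features]
        numeric_features = tf.stack(numeric_features, axis=-1)
        features['numeric'] = numeric_features
        return features, labels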
CSV Showing packed features
>>> show_batch(packed_train_data)
sex : [b'male' b'male' ...]
class : [b'First' b'Third' ...]
deck : [b'unknown' b'unknown' ...]
embark_town : [b'Cherbourg' b'Southampton' ...]
alone : [b'n' b'y' ...]
numeric : [[28. 1. ...]
[49. 0. ...]
[27. 0. ...]
[0.83 0. ...]
[28. 0. ...]]
CSV Normalizing features

desc = pd.read_csv(train_file_path)[NUMERIC_FEATURES].describe()

normalizer = functools.partial(normalize_numeric_data,
                               mean=MEAN,
                               std=STD)

numeric_column = tf.feature_column.numeric_column(
    'numeric',
    normalizer_fn=normalizer,
    shape=[len(NUMERIC_FEATURES)])
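normalize_numeric_data, MEAN, and STD are not defined on the slides; a sketch of the standard-score normalization from the load-CSV tutorial, derived from the describe() statistics above:

import functools
import numpy as np

MEAN = np.array(desc.T['mean'])
STD = np.array(desc.T['std'])

def normalize_numeric_data(data, mean, std):
    # Center the packed numeric tensor and scale it to unit variance.
    return (data - mean) / std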
CSV Now for the categorical features
CATEGORIES = {
'sex': ['male', 'female'],
'class' : ['First', 'Second', 'Third'],
'deck' : ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
'embark_town' : ['Cherbourg', 'Southampton', 'Queenstown'],
'alone' : ['y', 'n']
}
cat_feature_col = tf.feature_column.categorical_column_with_vocabulary_list(
    key='class',
    vocabulary_list=['First', 'Second', 'Third'])

categorical_column = tf.feature_column.indicator_column(cat_feature_col)
CSV Training the model

dense_features = tf.keras.layers.DenseFeatures(categorical_columns + numeric_columns)

model = tf.keras.Sequential([
    dense_features,
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'])

model.fit(packed_train_data, epochs=20)
Text Identifying translators of a work

DIRECTORY_URL = 'https://storage.googleapis.com/download.tensorflow.org/data/illiad/'
FILE_NAMES = ['cowper.txt', 'derby.txt', 'butler.txt']

labeled_data_sets = []
for i, file_name in enumerate(FILE_NAMES):
    file_path = tf.keras.utils.get_file(file_name, origin=DIRECTORY_URL + file_name)
    lines_dataset = tf.data.TextLineDataset(file_path)
    labeled_dataset = lines_dataset.map(lambda ex: labeler(ex, i))
    labeled_data_sets.append(labeled_dataset)
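labeler is not defined on the slide; in the load-text tutorial it pairs each line with the index of its source file as an int64 label (sketch):

def labeler(example, index):
    # Attach the translator index to every line from that file.
    return example, tf.cast(index, tf.int64)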
Text Preparing the dataset
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)
all_labeled_data = all_labeled_data.shuffle(buffer_size=50000)
import tensorflow_datasets as tfds

tokenizer = tfds.features.text.Tokenizer()

vocabulary_set = set()
for text_tensor, _ in all_labeled_data:
    some_tokens = tokenizer.tokenize(text_tensor.numpy())
    vocabulary_set.update(some_tokens)

vocab_size = len(vocabulary_set)

>>> vocab_size
17178
https://www.coursera.org/learn/natural-language-processing-tensorflow
Text Encode an example
# Encode an example
encoded_text = encoder.encode(original_text)
all_encoded_data = all_labeled_data.map(encode_map_fn)
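encoder and encode_map_fn are not shown on the slides; a sketch following the load-text tutorial, using the (since-deprecated) tfds.features.text.TokenTextEncoder built from the vocabulary above:

encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)

def encode(text_tensor, label):
    encoded_text = encoder.encode(text_tensor.numpy())
    return encoded_text, label

def encode_map_fn(text, label):
    # Wrap the eager encode() so it can run inside the tf.data pipeline.
    encoded_text, label = tf.py_function(
        encode, inp=[text, label], Tout=(tf.int64, tf.int64))
    encoded_text.set_shape([None])
    label.set_shape([])
    return encoded_text, label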
Text Prepare the dataset
BUFFER_SIZE = 50000
BATCH_SIZE = 64
TAKE_SIZE = 5000
train_data = all_encoded_data.skip(TAKE_SIZE).shuffle(BUFFER_SIZE)
train_data = train_data.padded_batch(BATCH_SIZE, padded_shapes=([-1],[]))
test_data = all_encoded_data.take(TAKE_SIZE)
test_data = test_data.padded_batch(BATCH_SIZE, padded_shapes=([-1],[]))
Text Training the model
model = tf.keras.Sequential([
tf.keras.layers.Embedding(vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
tf.keras.Sequential([
tf.keras.layers.Dense(units, activation='relu') for units in [64, 64]
]),
tf.keras.layers.Dense(3, activation='softmax')
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(train_data, epochs=3)