
An Internship Report
on
GOOGLE AI-ML VIRTUAL INTERNSHIP

Submitted for partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
in
Computer Science & Engineering

BY
K. Gnanesh - 21BQ1A0598

Department of Computer Science & Engineering

VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY


Approved by AICTE, Permanently Affiliated to JNTU, KAKINADA
Accredited by NBA & Accredited by NAAC with 'A' Grade
NAMBUR(V), PEDAKAKANI(M), GUNTUR(Dt) -522508

DECLARATION

I, Gnanesh Katam, hereby declare that the course entitled GOOGLE AI-ML
VIRTUAL INTERNSHIP, completed by me at Vasireddy Venkatadri Institute of
Technology, is submitted in partial fulfillment of the requirements for the award of
credits in the Department of CSE. The results embodied in this report have not been
submitted to any other university for the same purpose.

Date:                                               K. Gnanesh - 21BQ1A0598

Place: Guntur                                       Signature of the Candidate


VASIREDDY VENKATADRI INSTITUTE OF TECHNOLOGY
Department of Computer Science & Engineering

CERTIFICATE

This certificate attests that the following report accurately represents the work
completed by Gnanesh Katam, Registration Number 21BQ1A0598, during the academic
year 2023-2024, covering the time period from Jan to Mar 2024, as part of the GOOGLE
AIML VIRTUAL INTERNSHIP.

Signature of the Internship Coordinator Signature of the HOD

Mr. K. Balakrishna Dr. V.Ramachandran

(Asst. Prof., Department Of CSE ) (Prof., Department Of CSM )


Edu Skills with VVIT:
LETTER OF UNDERTAKING

To
The Principal
Vasireddy Venkatadri Institute of
Technology Namburu,
Guntur.

Subject: Submission of Internship Report on Google AI-ML Virtual


Internship on Eduskills platform.

Respected Sir,
I am pleased to submit my internship report on "Google AI-ML Virtual Internship" as
per your instruction, in fulfilment of the requirements of the Degree of Bachelor of Technology in
CSE from Jawaharlal Nehru Technological University, Kakinada. While preparing this report,
I have tried my level best to include all the relevant information, explanations and things
I learned from the internship courses, along with my contribution to this programme, to make the
report informative and comprehensive. It would not have been possible to complete this
report without your assistance, for which I am very thankful. Working online for two months on
the Google AIML Virtual Internship was amazing and a huge learning opportunity for
me. It was also a great experience to prepare this report, and I will be available for any
clarification, if required.

Therefore, I hope that you will be kind enough to accept my Internship Report and oblige thereby.

Yours Obediently,
K.Gnanesh

ID:21BQ1A0598
EMAIL: [email protected]
CERTIFICATE OF INTERNSHIP
ACKNOWLEDGEMENT

We take this opportunity to express our deepest gratitude and appreciation to all those people who
made this internship work easier with words of encouragement, motivation, discipline, and faith, by
offering different places to look to expand our ideas, and who helped us towards the successful
completion of this internship work.

First and foremost, we express our deep gratitude to Mr. Vasireddy VidyaSagar,
Chairman, Vasireddy Venkatadri Institute of Technology for providing necessary facilities
throughout the Computer Science & Engineering program.

We express our sincere thanks to Dr. Y. Mallikarjuna Reddy, Principal, Vasireddy


Venkatadri Institute of Technology for his constant support and cooperation throughout the
Computer Science & Engineering program.

We express our sincere gratitude to Dr. K. Suresh Babu, Professor & HOD,
Information Technology, Vasireddy Venkatadri Institute of Technology, for his constant
encouragement, motivation and faith, and for offering different places to look to expand our ideas.

We would like to express our sincere gratitude to our VVIT Internship I/C and SPOC, Mr. YV
Subba Reddy, and our Internship Coordinator, Mr. K. Balakrishna, for their insightful
advice, motivating suggestions, invaluable guidance, help and support in the successful completion
of this Internship.

We would like to take this opportunity to express our thanks to the teaching and non-
teaching staff in the Department of Computer Science & Engineering, VVIT for their invaluable
help and support.

Katam Gnanesh – 21BQ1A0598


ABSTRACT

The AI-ML Virtual Internship Program is designed to provide participants with
comprehensive exposure to the fields of Artificial Intelligence (AI) and Machine Learning
(ML). This program aims to bridge the gap between academic knowledge and industry
application by offering a robust curriculum that includes theoretical learning, practical
projects, and real-world problem-solving experiences. Participants will engage in a series of
meticulously structured modules that cover essential AI-ML concepts, advanced
algorithms, and the latest trends and technologies in the field.

Throughout the internship, participants will work on hands-on projects that simulate
actual industry scenarios, allowing them to apply their theoretical understanding to practical
challenges. These projects are designed to enhance participants' problem-solving skills,
creativity, and technical proficiency. By collaborating with peers and mentors, interns will
develop a deeper understanding of the complexities and nuances of AI-ML applications,
preparing them for future roles in the tech industry.

Moreover, the program emphasizes the development of soft skills crucial for professional
growth, such as teamwork, communication, and project management. Interactive sessions
with industry experts and thought leaders will provide valuable insights into the current
landscape of AI and ML, as well as emerging trends and future directions. Participants will
also benefit from personalized feedback and career guidance, helping them to refine their
career aspirations and pathways.

In conclusion, the AI-ML Virtual Internship Program is a transformative experience
that equips participants with the knowledge, skills, and experiences necessary to excel in
the dynamic field of artificial intelligence and machine learning. By the end of the program,
interns will have a solid foundation in AI-ML principles, hands-on project experience, and
enhanced professional competencies, positioning them as competitive candidates in the tech
industry.
Table of Contents: Google AIML Virtual Internship

Unit 1: Program neural networks with TensorFlow (02-01-24, Pages 1-34)
  1. The Hello World of Machine Learning
  2. Introduction to Computer Vision
  3. Introduction to Convolutions
  4. Convolutional Neural Networks (CNNs)
  5. Complex Images
  6. Use CNNs with larger datasets

Unit 2: Get started with object detection (18-01-24, Pages 35-43)
  1. Introduction to object detection
  2. Build an object detector into your mobile app
  3. Integrate an object detector using the ML Kit Object Detection API

Unit 3: Go further with object detection (31-01-24, Pages 44-54)
  1. Train your own object-detection model
  2. Build and deploy a custom object-detection model with TensorFlow Lite

Unit 4: Get started with product image search (20-02-24, Pages 55-62)
  1. Introduction to product image search on mobile
  2. Build an object detector into your mobile app
  3. Detect objects in images to build a visual product search: Android
  4. Object detection: static images
  5. Object detection: live camera

Unit 5: Go further with product image search (10-03-24, Pages 63-85)
  1. Call the product search backend from the mobile app
  2. Call the product search backend from the Android app
  3. Build a visual product search backend using Vision API Product Search

Unit 6: Go further with image classification (30-03-24, Pages 86-99)
  1. Build a flower recognizer
  2. Create a custom model for your image classifier
  3. Integrate a custom model into your app

AICTE INTERNSHIP WEEKLY REPORT
Department of CSE, VVIT

Student Roll Number      : 21BQ1A0598
Student Name             : Katam Gnanesh
Branch (AIML/CSM)        : CSE
Year of Study            : 4th
AICTE Student Profile ID : STU64313bfd458941680948221
AICTE Regd. E-Mail ID    : [email protected]
Contact Number           : 7013895796
Internship Course Taken  : Google AIML Virtual Internship
Weeks & Dates | Objective of the Activity Done | Learning Outcome | Signature of the Student

Week-1 | Intro to computer vision | Understood complex images
Week-2 | Intro to object detector and in-depth context | Built an object detector
Week-3 | Train your own object-detection model | Build and deploy a custom object-detection model
Week-4 | Detect objects in images to build a visual product search: Android | Object detection: static images
Week-5 | Object detection: live camera | Learnt about TensorFlow Lite
Week-6 | Build a flower recognizer | Create a custom model for your image classifier
Week-7 | Go further with object detection | Train your own object-detection model
Week-8 | Use CNNs with larger datasets | The Hello World of Machine Learning
Week-9 | Introduction to product image search on mobile | Built an object detector
Program neural networks with TensorFlow

MODULE 1: The Hello World of Machine Learning

What is ML?

Consider the traditional manner of building apps, as represented in the following diagram:

You express rules in a programming language. They act on data and your program provides
answers. In the case of activity detection, the rules (the code you wrote to define activity
types) acted upon the data (the person's movement speed) to produce an answer: the return value
from the function for determining the activity status of the user (whether they were walking,
running, biking, or doing something else).

The process for detecting that activity status via ML is very similar, only the axes are different.

Instead of trying to define the rules and express them in a programming language, you provide the
answers (typically called labels) along with the data, and the machine infers the rules that
determine the relationship between the answers and data. For example, your activity detection
scenario might look like this in an ML context:

Beyond being an alternative method to programming that scenario, that approach also gives you
the ability to open up new scenarios, such as the golfing one, that may not have been possible under
the rules-based traditional programming approach.

In traditional programming, your code compiles into a binary that is typically called a program. In
ML, the item that you create from the data and labels is called a model.

So, if you go back to this diagram:

Consider the result of that to be a model, which is used like this at runtime:

You pass the model some data and the model uses the rules that it inferred from the training to
make a prediction, such as, "That data looks like walking," or "That data looks like biking."
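
To make the contrast concrete, here is a minimal sketch (not taken from the codelab; the speed
thresholds, the tiny dataset and the labels are invented purely for illustration) of the two
approaches to the activity-detection scenario:

# Traditional programming: you write the rules yourself.
def get_activity(speed_kmh):
    if speed_kmh < 6:
        return "walking"
    elif speed_kmh < 12:
        return "running"
    else:
        return "biking"

# Machine learning: you supply data and answers (labels) and let a model
# infer the rules. The numbers below are made up purely for illustration.
import numpy as np
import tensorflow as tf

speeds = np.array([[3.0], [5.0], [8.0], [10.0], [15.0], [20.0]], dtype=float)
labels = np.array([0, 0, 1, 1, 2, 2])   # 0 = walking, 1 = running, 2 = biking

model = tf.keras.Sequential([tf.keras.layers.Dense(3, activation='softmax', input_shape=[1])])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(speeds, labels, epochs=200, verbose=0)

print(get_activity(9.0))                 # answer produced by hand-written rules
print(model.predict(np.array([[9.0]])))  # probabilities inferred from data and labels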

Create your first ML model:

Consider the following sets of numbers. Can you see the relationship between them?

X: -1 0 1 2 3 4

Y: -2 1 4 7 10 13

As you look at them, you might notice that the value of X is increasing by 1 as you read left to
right and the corresponding value of Y is increasing by 3. You probably think that Y equals 3X
plus or minus something. Then, you'd probably look at the 0 on X and see that Y is 1, and you'd
come up with the relationship Y=3X+1.

How would you train a neural network to do the equivalent task? Using data! By feeding it with a
set of X's and a set of Y's, it should be able to figure out the relationship between them.

Start with your imports. Here, you're importing TensorFlow and calling it tf for ease of use.
Next, import a library called numpy, which represents your data as lists easily and quickly.

The framework for defining a neural network as a set of sequential layers is called keras, so import
that, too.

import tensorflow as tf
import numpy as np
from tensorflow import keras
Define and compile the neural network:

Next, create the simplest possible neural network. It has one layer, that layer has one neuron, and
the input shape to it is only one value.

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

Next, write the code to compile your neural network. When you do so, you need to specify two
functions—a loss and an optimizer.

In this example, you know that the relationship between the numbers is Y=3X+1.

When the computer is trying to learn that, it makes a guess, maybe Y=10X+10. The loss function
measures the guessed answers against the known correct answers and measures how well or badly
it did.
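
As a tiny numeric sketch of what mean squared error measures (the Y=10X+10 guess is the one
mentioned above; the arithmetic below is illustrative and not part of the codelab):

import numpy as np

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0])
ys_true = np.array([-2.0, 1.0, 4.0, 7.0, 10.0, 13.0])   # the real Y = 3X + 1 values
ys_guess = 10 * xs + 10                                  # the "Y = 10X + 10" guess

mse = np.mean((ys_true - ys_guess) ** 2)
print(mse)   # a large loss, so the optimizer still has work to do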

Next, the model uses the optimizer function to make another guess. Based on the loss function's
result, it tries to minimize the loss. At this point, maybe it will come up with something like
Y=5X+5. While that's still pretty bad, it's closer to the correct result (the loss is lower).

First, here's how to tell it to use mean_squared_error for the loss and stochastic gradient descent
(sgd) for the optimizer. You don't need to understand the math for those yet, but you can see that
they work!

model.compile(optimizer='sgd', loss='mean_squared_error')
Provide the data:

Next, feed in some data. In this case, you take the six X and six Y values from earlier. You can see
that the relationship between those is Y=3X+1, so where X is -1, Y is -2.

A Python library called NumPy provides lots of array-type data structures to do this. Specify the
values as arrays in NumPy with np.array().

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)


ys = np.array([-2.0, 1.0, 4.0, 7.0, 10.0, 13.0], dtype=float)

Train the neural network:

The process of training the neural network, where it learns the relationship between the X's and Y's,
is in the model.fit call. That's where it will go through the loop of making a guess, measuring
how good or bad it is (the loss), and using the optimizer to make another guess. It will do that for the
number of epochs that you specify. When you run that code, you'll see the loss printed out
for each epoch.

model.fit(xs, ys, epochs=500)

For example, you can see that for the first few epochs, the loss value is quite large, but it's getting
smaller with each step.

As the training progresses, the loss soon gets very small.

By the time the training is done, the loss is extremely small, showing that our model is doing a
great job of inferring the relationship between the numbers.

You probably don't need all 500 epochs and can experiment with different amounts. As you can
see from the example, the loss is really small after only 50 epochs, so that might be enough!

You have a model that has been trained to learn the relationship between X and Y. You can use
the model.predict method to have it figure out the Y for a previously unknown X. For example, if
X is 10, what do you think Y will be? Take a guess before you run the following code:

print(model.predict([10.0]))

You might have thought 31, but it ended up being a little over. Why do you think that is?

Neural networks deal with probabilities, so it calculated that there is a very high probability that
the relationship between X and Y is Y=3X+1, but it can't know for sure with only six data points.
The result is very close to 31, but not necessarily 31.

As you work with neural networks, you'll see that pattern recurring. You will almost always deal
with probabilities, not certainties, and will do a little bit of coding to figure out what the result is
based on the probabilities, particularly when it comes to classification.
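
As a small illustration of that last point, the usual "little bit of coding" is an argmax over the
predicted probabilities. This is a generic sketch; the probability values below are invented, not
output from the model above:

import numpy as np

probabilities = np.array([0.02, 0.01, 0.05, 0.10, 0.70, 0.04, 0.03, 0.02, 0.02, 0.01])
predicted_class = int(np.argmax(probabilities))    # index of the highest probability
confidence = float(probabilities[predicted_class])
print(predicted_class, confidence)                 # 4 and 0.7 for these made-up values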

MODULE2: Introduction to Computer Vision:

Start by importing TensorFlow.

import tensorflow as tf
print(tf.__version__)

You'll train a neural network to recognize items of clothing from a common dataset called Fashion
MNIST. It contains 70,000 items of clothing in 10 different categories. Each item of clothing is in
a 28x28 grayscale image. The labels associated with the dataset are:

Label   Description
0       T-shirt/top
1       Trouser
2       Pullover
3       Dress
4       Coat
5       Sandal
6       Shirt
7       Sneaker
8       Bag
9       Ankle boot


The Fashion MNIST data is available in the tf.keras.datasets API. Load it like this:

mnist = tf.keras.datasets.fashion_mnist

Calling load_data on that object gives you two sets of two lists: training values and testing values,
which represent graphics that show clothing items and their labels.

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()

What do those values look like? Print a training image and a training label to see. You can
experiment with different indices in the array.

import matplotlib.pyplot as plt


plt.imshow(training_images[0])
print(training_labels[0])
print(training_images[0])

The print of the data for item 0 looks like this:

You'll notice that all the values are integers between 0 and 255. When training a neural network,
it's easier to treat all values as between 0 and 1, a process called normalization. Fortunately,
Python provides an easy way to normalize a list like that without looping.

training_images = training_images / 255.0


test_images = test_images / 255.0

You may also want to look at index 42, which is a different boot than the one at index 0.

Now, you might be wondering why there are two datasets—training and testing.

The idea is to have one set of data for training and another set of data that the model hasn't yet
encountered to see how well it can classify values. After all, when you're done, you'll want to use
the model with data that it hadn't previously seen! Also, without separate testing data, you'll run
the risk of the network only memorizing its training data without generalizing its knowledge.

Design the model

Now design the model. You'll have three layers. Go through them one-by-one and explore the
different types of layers and the parameters used for each.

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

• Sequential defines a sequence of layers in the neural network.
• Flatten takes a square and turns it into a one-dimensional vector.
• Dense adds a layer of neurons.
• Activation functions tell each layer of neurons what to do. There are lots of options, but use
  these for now:
• Relu effectively means that if X is greater than 0 return X, else return 0. It only passes
  values of 0 or greater to the next layer in the network.
• Softmax takes a set of values and effectively picks the biggest one. For example, if the
  output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], it saves
  you from having to sort for the largest value by returning [0,0,0,0,1,0,0,0,0]. (A short
  numeric sketch of what softmax actually computes follows this list.)
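
Strictly speaking, softmax turns the values into probabilities rather than literally returning a
one-hot vector; the largest input simply ends up with almost all of the probability mass. A short
numeric sketch (the input values are the example ones from the bullet above):

import numpy as np

values = np.array([0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05])
softmax = np.exp(values) / np.sum(np.exp(values))
print(np.round(softmax, 4))   # the 9.5 entry gets nearly all the mass, close to [0,0,0,0,1,0,0,0,0]
print(np.argmax(softmax))     # 4 -- the index of the largest value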

Compile and train the model:

Now that the model is defined, the next thing to do is build it. Create a model by first compiling it
with an optimizer and loss function, then train it on your training data and labels. The goal is to
have the model figure out the relationship between the training data and its training labels. Later,
you want your model to see data that resembles your training data, then make a prediction about
what that data should look like.

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)

When model.fit executes, you'll see loss and accuracy:

Epoch 1/5
60000/60000 [=======] - 6s 101us/sample - loss: 0.4964 - acc: 0.8247
Epoch 2/5
60000/60000 [=======] - 5s 86us/sample - loss: 0.3720 - acc: 0.8656
Epoch 3/5
60000/60000 [=======] - 5s 85us/sample - loss: 0.3335 - acc: 0.8780
Epoch 4/5
60000/60000 [=======] - 6s 103us/sample - loss: 0.3134 - acc: 0.8844
Epoch 5/5
60000/60000 [=======] - 6s 94us/sample - loss: 0.2931 - acc: 0.8926

Test the model:

How would the model perform on data it hasn't seen? That's why you have the test set. You
call model.evaluate and pass in the two sets, and it reports the loss for each. Give it a try:

model.evaluate(test_images, test_labels)

And here's the output:

10000/10000 [=====] - 1s 56us/sample - loss: 0.3365 - acc: 0.8789


[0.33648381242752073, 0.8789]

That example returned an accuracy of .8789, meaning it was about 88% accurate. (You might
have slightly different values.)

As expected, the model is not as accurate with the unknown data as it was with the data it was
trained on! As you learn more about TensorFlow, you'll find ways to improve that.

MODULE3: Introduction to Convolutions

What are convolutions?

A convolution is a filter that passes over an image, processes it, and extracts the important features.

Let's say you have an image of a person wearing a sneaker. How would you detect that a sneaker is
present in the image? In order for your program to "see" the image as a sneaker, you'll have to
extract the important features, and blur the inessential features. This is called feature mapping.

The feature mapping process is theoretically simple. You'll scan every pixel in the image and then
look at its neighboring pixels. You multiply the values of those pixels by the equivalent weights in
a filter.

For example:

In this case, a 3x3 convolution matrix, or image kernel, is specified.

The current pixel value is 192. You can calculate the value of the new pixel by looking at the
neighbor values, multiplying them by the values specified in the filter, and making the new pixel
value the final amount.
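
The following small sketch shows that multiply-and-add step for a single output pixel. The
neighborhood values other than the centre pixel of 192 are invented, and the kernel is one of the
filters used in the code later in this module:

import numpy as np

neighborhood = np.array([[ 0,  64, 128],
                         [48, 192, 144],
                         [42, 226, 168]], dtype=float)   # 3x3 patch; centre pixel is 192

kernel = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)           # horizontal-edge filter, weight = 1

new_pixel = np.sum(neighborhood * kernel)                # multiply each neighbor by its weight and sum
new_pixel = float(np.clip(new_pixel, 0, 255))            # keep the result in the 0-255 range
print(new_pixel)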

CODE:

Start by importing some Python libraries and the ascent picture:

import cv2
import numpy as np
from scipy import misc

i = misc.ascent()

Next, use the Pyplot library matplotlib to draw the image so that you know what it looks like:

import matplotlib.pyplot as plt


plt.grid(False)
plt.gray()

plt.axis('off')
plt.imshow(i)
plt.show()

The image is stored as a NumPy array, so we can create the transformed image by just copying
that array. The size_x and size_y variables will hold the dimensions of the image so you can
loop over it later.

i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]

Create the convolution matrix:

First, make a convolution matrix (or kernel) as a 3x3 array:

# This filter detects edges nicely


# It creates a filter that only passes through sharp edges and straight lines.
# Experiment with different values for fun effects.
#filter = [ [0, 1, 0], [1, -4, 1], [0, 1, 0]]
# A couple more filters to try for fun!
filter = [ [-1, -2, -1], [0, 0, 0], [1, 2, 1]]
#filter = [ [-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
weight = 1

That means that the current pixel's neighbor above it and to the left of it will be multiplied by the
top-left item in the filter. Then, multiply the result by the weight and ensure that the result is in the
range 0 through 255.

for x in range(1, size_x-1):
    for y in range(1, size_y-1):
        output_pixel = 0.0
        output_pixel = output_pixel + (i[x-1, y-1] * filter[0][0])
        output_pixel = output_pixel + (i[x, y-1] * filter[0][1])
        output_pixel = output_pixel + (i[x+1, y-1] * filter[0][2])
        output_pixel = output_pixel + (i[x-1, y] * filter[1][0])
        output_pixel = output_pixel + (i[x, y] * filter[1][1])
        output_pixel = output_pixel + (i[x+1, y] * filter[1][2])
        output_pixel = output_pixel + (i[x-1, y+1] * filter[2][0])
        output_pixel = output_pixel + (i[x, y+1] * filter[2][1])
        output_pixel = output_pixel + (i[x+1, y+1] * filter[2][2])
        output_pixel = output_pixel * weight
        if output_pixel < 0:
            output_pixel = 0
        if output_pixel > 255:
            output_pixel = 255
        i_transformed[x, y] = output_pixel

Examine the results:

Now, plot the image to see the effect of passing the filter over it:
plt.gray()
plt.grid(False)
plt.imshow(i_transformed)
#plt.axis('off')
plt.show()

Consider the following filter values and their impact on the image.

Using [-1,0,1,-2,0,2,-1,0,1] gives you a very strong set of vertical lines:

Using [-1,-2,-1,0,0,0,1,2,1] gives you horizontal lines:

Understanding Pooling

Iterate over the image and, at each point, consider the pixel and its immediate neighbors to the
right, beneath, and right-beneath. Take the largest of those (hence max pooling) and load it into the
new image. Thus, the new image will be one fourth the size of the old.

Code for pooling:

The following code will show a (2, 2) pooling. Run it to see the output.

You'll see that while the image is one-fourth the size of the original, it kept all the features.

new_x = int(size_x/2)
new_y = int(size_y/2)
newImage = np.zeros((new_x, new_y))
for x in range(0, size_x, 2):
    for y in range(0, size_y, 2):
        pixels = []
        pixels.append(i_transformed[x, y])
        pixels.append(i_transformed[x+1, y])
        pixels.append(i_transformed[x, y+1])
        pixels.append(i_transformed[x+1, y+1])
        pixels.sort(reverse=True)
        newImage[int(x/2), int(y/2)] = pixels[0]

# Plot the image. Note the size of the axes -- now 256 pixels instead of 512
plt.gray()
plt.grid(False)
plt.imshow(newImage)
#plt.axis('off')
plt.show()

Note the axes of that plot. The image is now 256x256, one-fourth of its original size, and the
detected features have been enhanced despite less data now being in the image.

MODULE4: Convolutional Neural Networks (CNNs)

Improve computer vision accuracy with convolutions:

You now know how to do fashion image recognition using a Deep Neural Network (DNN)
containing three layers— the input layer (in the shape of the input data), the output layer (in the
shape of the desired output) and a hidden layer. You experimented with several parameters that
influence the final accuracy, such as different sizes of hidden layers and number of training
epochs.

For convenience, here's the entire code again. Run it and take a note of the test accuracy that is
printed out at the end.

import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))

Try the code:

Run the following code. It's the same neural network as earlier, but this time with convolutional
layers added first. It will take longer, but look at the impact on the accuracy:

import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)

test_images=test_images / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))

It's likely gone up to about 93% on the training data and 91% on the validation data.

Gather the data:

The first step is to gather the data.

import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images = training_images/255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images/255.0

Define the model:

Next, define your model. Instead of the input layer at the top, you're going to add a convolutional
layer. The parameters are:

 The number of convolutions you want to generate. A value like 32 is a good starting point.

 The size of the convolutional matrix, in this case a 3x3 grid.

 image size is reduced in the following way:

Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 26, 26, 64) 640

max_pooling2d_2 (MaxPooling2) (None, 13, 13, 64) 0

conv2d_3 (Conv2D) (None, 11, 11, 64) 36928

max_pooling2d_3 (MaxPooling2 ) (None, 5, 5, 64) 0

flatten_2 (Flatten) (None, 1600) 0

dense_4 (Dense) (None, 128) 204928

dense_5 (Dense) (None, 10) 1290

Here's the full code for the CNN:

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Add another convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Now flatten the output. After this you'll just have the same DNN structure as
    # the non-convolutional version
    tf.keras.layers.Flatten(),
    # The same 128 dense layers, and 10 output layers as in the pre-convolution example:
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])

Compile and train the model

Compile the model, call the fit method to do the training, and evaluate the loss and accuracy from
the test set.

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


model.fit(training_images, training_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_acc*100))

Visualize the convolutions and pooling

This code shows you the convolutions graphically. The print (test_labels[:100]) shows the first 100
labels in the test set, and you can see that the ones at index 0, index 23 and index 28 are all the
same value (9).

print(test_labels[:100])

[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]

Now you can render some of those images to see what they look like as they go through the
convolutions. In the following code, FIRST_IMAGE, SECOND_IMAGE and THIRD_IMAGE are
all indexes for value 9, an ankle boot.

import matplotlib.pyplot as plt

f, axarr = plt.subplots(3, 4)
FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
CONVOLUTION_NUMBER = 6
from tensorflow.keras import models
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)
for x in range(0, 4):
    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[0, x].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0, x].grid(False)
    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[1, x].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1, x].grid(False)
    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[2, x].imshow(f3[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2, x].grid(False)

And you should see something like the following, where the convolution is taking the essence of
the sole of the shoe, effectively spotting that as a common feature across all shoes.

MODULE5: Complex Images

In this codelab you'll use convolutions to classify images of horses and humans. You'll be using
TensorFlow in this lab to create a CNN that is trained to recognize images of horses and humans,
and classify them.

Getting Started: Acquire the data

First, download the data:

!wget \
https://storage.googleapis.com/learning-datasets/horse-or-human.zip \
-O /tmp/horse-or-human.zip

The following Python code uses the os library to access the file system and the zipfile library to
unzip the data.

import os
import zipfile
local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()

The contents of the zip file are extracted to the base directory /tmp/horse-or-human, which contains
horses and humans subdirectories.

In short, the training set is the data that is used to tell the neural network model that "this is what a
horse looks like" and "this is what a human looks like."

Use the ImageGenerator to label and prepare the data

You do not explicitly label the images as horses or humans.

Later you'll see something called an ImageDataGenerator being used. It reads images from
subdirectories and automatically labels them from the name of that subdirectory. For example, you
have a training directory containing a horses directory and a humans
directory. ImageDataGenerator will label the images appropriately for you, reducing a coding step.

# Directory with our training horse pictures


train_horse_dir = os.path.join('/tmp/horse-or-human/horses')

# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human/humans')

Now, see what the filenames look like in the horses and humans training directories:

train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])
train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])
print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))

First, configure the matplot parameters:

%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4
# Index for iterating over images
pic_index = 0

Now, display a batch of eight horse pictures and eight human pictures. You can rerun the cell to
see a fresh batch each time.

# Set up matplotlib fig, and size it to fit 4x4 pics
fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)
pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname)
                  for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname)
                  for fname in train_human_names[pic_index-8:pic_index]]
for i, img_path in enumerate(next_horse_pix + next_human_pix):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i + 1)
    sp.axis('Off')  # Don't show axes (or gridlines)
    img = mpimg.imread(img_path)
    plt.imshow(img)

plt.show()
Here are some example images showing horses and humans in different poses and orientations:

Define the model:

Start defining the model.

Begin by importing TensorFlow:

import tensorflow as tf

Then, add convolutional layers and flatten the final result to feed into the densely connected
layers. Finally, add the densely connected layers.

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image 300x300 with 3 bytes color
    # This is the first convolution
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(300, 300, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The second convolution
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The third convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fourth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The fifth convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(),
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'),
    # Only 1 output neuron. It will contain a value from 0-1, where 0 is for one class
    # ('horses') and 1 for the other ('humans')
    tf.keras.layers.Dense(1, activation='sigmoid')
])

The model.summary() method call prints a summary of the network.

model.summary()

You can see the results here:

Layer (type) Output Shape Param #


=================================================================
conv2d (Conv2D) (None, 298, 298, 16) 448

max_pooling2d (MaxPooling2D) (None, 149, 149, 16) 0

conv2d_1 (Conv2D) (None, 147, 147, 32) 4640

max_pooling2d_1 (MaxPooling2 (None, 73, 73, 32) 0

conv2d_2 (Conv2D) (None, 71, 71, 64) 18496

max_pooling2d_2 (MaxPooling2 (None, 35, 35, 64) 0

conv2d_3 (Conv2D) (None, 33, 33, 64) 36928

max_pooling2d_3 (MaxPooling2 (None, 16, 16, 64) 0

conv2d_4 (Conv2D) (None, 14, 14, 64) 36928

max_pooling2d_4 (MaxPooling2 (None, 7, 7, 64) 0

flatten (Flatten) (None, 3136) 0

dense (Dense) (None, 512) 1606144

dense_1 (Dense) (None, 1) 513


=================================================================
Total params: 1,704,097
Trainable params: 1,704,097
Non-trainable params: 0

The output shape column shows how the size of your feature map evolves in each successive layer.
The convolution layers reduce the size of the feature maps by a bit due to padding and each
pooling layer halves the dimensions.
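
A quick sketch of that arithmetic (assuming, as in the model above, 'valid' 3x3 convolutions that
trim one pixel from each side and 2x2 max pooling that halves each dimension):

size = 300
for _ in range(5):        # five Conv2D + MaxPooling2D pairs
    size = size - 2       # a 3x3 convolution with no padding loses one pixel on each side
    size = size // 2      # 2x2 max pooling halves each dimension
    print(size)           # prints 149, 73, 35, 16, 7 -- the pooled sizes in the summary above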

Compile the model

Next, configure the specifications for model training. Train your model with the
binary_crossentropy loss because it's a binary classification problem and your final activation is a
sigmoid. (For a refresher on loss metrics, see Descending into ML.) Use the rmsprop optimizer
with a learning rate of 0.001. During training, monitor classification accuracy.

Note: In this case, using the RMSprop optimization algorithm is preferable to stochastic gradient
descent (SGD) because RMSprop automates learning-rate tuning for you. (Other optimizers, such
as Adam and Adagrad, also automatically adapt the learning rate during training and would work
equally well here.)

from tensorflow.keras.optimizers import RMSprop


model.compile(loss='binary_crossentropy',
optimizer=RMSprop(lr=0.001),
metrics=['acc'])

Train the model from generators:

Set up data generators that read pictures in your source folders, convert them to float32 tensors,
and feed them (with their labels) to your network.

In Keras, that can be done via the keras.preprocessing.image.ImageDataGenerator class using the
rescale parameter. That ImageDataGenerator class allows you to instantiate generators of
augmented image batches (and their labels) via .flow(data, labels) or
.flow_from_directory(directory). Those generators can then be used with the Keras model
methods that accept data generators as
inputs: fit_generator, evaluate_generator and predict_generator.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1./255)

# Flow training images in batches of 128 using the train_datagen generator
train_generator = train_datagen.flow_from_directory(
    '/tmp/horse-or-human/',   # This is the source directory for training images
    target_size=(300, 300),   # All images will be resized to 300x300
    batch_size=128,
    # Since we use binary_crossentropy loss, we need binary labels
    class_mode='binary')

Do the training:

Train for 15 epochs. (That may take a few minutes to run.)

history = model.fit(
train_generator,
steps_per_epoch=8,
epochs=15,
verbose=1)

Note the values per epoch.

The Loss and Accuracy are a great indication of progress of training. It's making a guess as to the
classification of the training data, and then measuring it against the known label, calculating the
result. Accuracy is the portion of correct guesses.

Epoch 1/15
9/9 [==============================] - 9s 1s/step - loss: 0.8662 - acc: 0.5151
Epoch 2/15
9/9 [==============================] - 8s 927ms/step - loss: 0.7212 - acc: 0.5969
Epoch 3/15
9/9 [==============================] - 8s 921ms/step - loss: 0.6612 - acc: 0.6592
Epoch 4/15
9/9 [==============================] - 8s 925ms/step - loss: 0.3135 - acc: 0.8481
Epoch 5/15
9/9 [==============================] - 8s 919ms/step - loss: 0.4640 - acc: 0.8530
Epoch 6/15
9/9 [==============================] - 8s 896ms/step - loss: 0.2306 - acc: 0.9231
Epoch 7/15
9/9 [==============================] - 8s 915ms/step - loss: 0.1464 - acc: 0.9396
Epoch 8/15
9/9 [==============================] - 8s 935ms/step - loss: 0.2663 - acc: 0.8919
Epoch 9/15
9/9 [==============================] - 8s 883ms/step - loss: 0.0772 - acc: 0.9698
Epoch 10/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0403 - acc: 0.9805
Epoch 11/15
9/9 [==============================] - 8s 891ms/step - loss: 0.2618 - acc: 0.9075
Epoch 12/15
9/9 [==============================] - 8s 902ms/step - loss: 0.0434 - acc: 0.9873
Epoch 13/15
9/9 [==============================] - 8s 904ms/step - loss: 0.0187 - acc: 0.9932
Epoch 14/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0974 - acc: 0.9649
Epoch 15/15
9/9 [==============================] - 8s 877ms/step - loss: 0.2859 - acc: 0.9338

Test the model:

The code will allow you to choose one or more files from your file system. It will then upload
them and run them through the model, giving an indication of whether the object is a horse or a
human.

That's due to something called overfitting, which means that the neural network is trained with
very limited data (there are only roughly 500 images of each class). So it's very good at
recognizing images that look like those in the training set, but it can fail a lot at images that are not
in the training set.

import numpy as np
from google.colab import files
from keras.preprocessing import image

uploaded = files.upload()

for fn in uploaded.keys():
    # predicting images
    path = '/content/' + fn
    img = image.load_img(path, target_size=(300, 300))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(classes[0])
    if classes[0] > 0.5:
        print(fn + " is a human")
    else:
        print(fn + " is a horse")

For example, say that you want to test with this image:

Here's what the colab produces:

Despite it being a cartoon graphic, it still classifies correctly.

The following image also classifies correctly:

Visualize intermediate representations:

Pick a random image from the training set, then generate a figure where each row is the output of a
layer and each image in the row is a specific filter in that output feature map. Rerun that cell to
generate intermediate representations for a variety of training images.

import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
visualization_model = tf.keras.models.Model(inputs=model.input, outputs=successive_outputs)

# Prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)

img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)             # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)     # Numpy array with shape (1, 300, 300, 3)

# Rescale by 1/255
x /= 255

# Run the image through the network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)

# These are the names of the layers, so they can be part of the plot
layer_names = [layer.name for layer in model.layers]

# Now display the representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Just do this for the conv / maxpool layers, not the fully-connected layers
        n_features = feature_map.shape[-1]  # number of features in feature map
        # The feature map has shape (1, size, size, n_features)
        size = feature_map.shape[1]
        # Tile the images in this matrix
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            # Postprocess the feature to make it visually palatable
            x = feature_map[0, :, :, i]
            x -= x.mean()
            if x.std() > 0:
                x /= x.std()
            x *= 64
            x += 128
            x = np.clip(x, 0, 255).astype('uint8')
            # Tile each filter into this big horizontal grid
            display_grid[:, i * size : (i + 1) * size] = x
        # Display the grid
        scale = 20. / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')

Here are example results:

As you can see, you go from the raw pixels of the images to increasingly abstract and compact
representations. The representations downstream start highlighting what the network pays attention
to, and they show fewer and fewer features being "activated." Most are set to zero. That's called
sparsity. Representation sparsity is a key feature of deep learning.

Those representations carry increasingly less information about the original pixels of the image,
but increasingly refined information about the class of the image. You can think of a CNN (or a
deep network in general) as an information distillation pipeline.
Add on-device object detection:

In this step, you will add the functionality to the starter app to detect objects in images. As you saw
in the previous step, the starter app contains boilerplate code to take photos with the camera app on
the device. There are also 3 preset images in the app that you can try object detection on if you are
running the codelab on an Android emulator.

When you have selected an image, either from the preset images or taking a photo with the camera
app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen and
calls the runObjectDetection method with the image.

There are only 3 simple steps with 3 APIs to set up ML Kit ODT:

 prepare an image: InputImage

 create a detector object: ObjectDetection.getClient(options)

 connect the 2 objects above: process(image)

You achieve these inside the function runObjectDetection(bitmap: Bitmap) in the file MainActivity.kt.

/**
* ML Kit Object Detection Function
*/
private fun runObjectDetection(bitmap: Bitmap) {
}

Step 1: Create an InputImage

ML Kit provides a simple API to create an InputImage from a Bitmap. Then you can feed
an InputImage into the ML Kit APIs.

// Step 1: create ML Kit's InputImage object


val image = InputImage.fromBitmap(bitmap, 0)

Add the above code to the top of runObjectDetection(bitmap:Bitmap).

Step 2: Create a detector instance

ML Kit follows Builder Design Pattern. You will pass the configuration to the builder, then acquire
a detector from it. There are 3 options to configure (the options in bold are used in this codelab):

 detector mode (single image or stream)

 detection mode (single or multiple object detection)

 classification mode (on or off)

This codelab is for single image - multiple object detection & classification. Add that now:

// Step 2: acquire detector object


val options = ObjectDetectorOptions.Builder()
.setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
.enableMultipleObjects()
.enableClassification()
.build()
val objectDetector = ObjectDetection.getClient(options)

Step 3: Feed image(s) to the detector

Object detection and classification is async processing:

 You send an image to the detector (via process()).

 The Detector works pretty hard on it.

 The Detector reports the result back to you via a callback.

The following code does just that (copy and append it to the existing code inside fun
runObjectDetection(bitmap:Bitmap)):

// Step 3: feed given image to detector and setup callback


objectDetector.process(image)
.addOnSuccessListener {
// Task completed successfully
debugPrint(it)
}
.addOnFailureListener {
// Task failed with an exception
Log.e(TAG, it.message.toString())
}

Upon completion, detector notifies you with:

 The total number of objects detected. Each detected object is described with:

 trackingId: an integer you use to track it cross frames (NOT used in this codelab).

 boundingBox: the object's bounding box.

 labels: a list of label(s) for the detected object (only when classification is enabled):

 index (Get the index of this label)

 text (Get the text of this label including "Fashion Goods", "Food", "Home Goods",
"Place", "Plant")

 confidence ( a float between 0.0 to 1.0 with 1.0 means 100%)

private fun debugPrint(detectedObjects: List<DetectedObject>) {
    detectedObjects.forEachIndexed { index, detectedObject ->
        val box = detectedObject.boundingBox
        Log.d(TAG, "Detected object: $index")
        Log.d(TAG, " trackingId: ${detectedObject.trackingId}")
        Log.d(TAG, " boundingBox: (${box.left}, ${box.top}) - (${box.right},${box.bottom})")
        detectedObject.labels.forEach {
            Log.d(TAG, " categories: ${it.text}")
            Log.d(TAG, " confidence: ${it.confidence}")
        }
    }
}

Let's run the codelab by clicking Run in the Android Studio toolbar. Try selecting a preset
image, or take a photo, then look at the Logcat window inside the IDE.

D/MLKit Object Detection: Detected object: 0


D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (481, 2021) - (2426,3376)
D/MLKit Object Detection: categories: Food
D/MLKit Object Detection: confidence: 0.90234375
D/MLKit Object Detection: Detected object: 1
D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (2639, 2633) - (3058,3577)
D/MLKit Object Detection: Detected object: 2
D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (3, 1816) - (615,2597)
D/MLKit Object Detection: categories: Home good
D/MLKit Object Detection: confidence: 0.75390625

...which means that the detector saw 3 objects:

 The categories are Food and Home good.

 There is no category returned for the 2nd because it is an unknown class.

 No trackingId (because this is the single image detection mode).

 The position inside the boundingBox rectangle (e.g. (481, 2021) – (2426, 3376))

 The detector is pretty confident that the 1st is a Food (90% confidence—it was salad).

There is some boilerplate code inside the codelab to help you visualize the detection result.
Leverage these utilities to make the visualization code simple:

• data class BoxWithText(val box: Rect, val text: String) This is a data class to store an
  object detection result for visualization. box is the bounding box where the object is located,
  and text is the detection result string to display together with the object's bounding box.

• fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<BoxWithText>): Bitmap
  This method draws the object detection results in detectionResults on the input bitmap
  and returns the modified copy of it.

Here is an example of an output of the drawDetectionResult utility method:

Go to where you call debugPrint() and add the following code snippet below it:

// Parse ML Kit's DetectedObject and create corresponding visualization data
val detectedObjects = it.map { obj ->
    var text = "Unknown"
    // We will show the top confident detection result if it exists
    if (obj.labels.isNotEmpty()) {
        val firstLabel = obj.labels.first()
        text = "${firstLabel.text}, ${firstLabel.confidence.times(100).toInt()}%"
    }
    BoxWithText(obj.boundingBox, text)
}

// Draw the detection result on the input bitmap
val visualizedResult = drawDetectionResult(bitmap, detectedObjects)

// Show the detection result on the app screen
runOnUiThread {
    inputImageView.setImageBitmap(visualizedResult)
}

Now click Run in the Android Studio toolbar.

Once the app loads, press the button with the camera icon, point your camera at an object, take a
photo and accept it (in the Camera app), or simply tap any of the preset images. You should see
the detection results; press the button again or select another image, and repeat a couple of times to
experience the latest ML Kit ODT!

You have used ML Kit to add Object Detection capabilities to your app:

 3 steps with 3 APIs

 Create Input Image

 Create Detector

 Send Image to Detector

ASSESSMENT:

1) The advanced computer-vision task that tells you where the objects are within the image
   by returning a mask that tells you which pixel belongs to which object is known as ________.
   • Object detection
   • Item detection
   • Image classification
   • Image segmentation

2) True or false? One drawback of object detection is that it can only detect one object.
   • True
   • False

3) Match the following ML Kit ObjectDetector builder settings with their options
   (Classification mode, Detection mode, Detector mode):
   • Single image or stream
   • Single or multiple object detection
   • Classification mode: on or off

Go further with object detection
MODULE1: Build and deploy a custom object-detection model with
TensorFlow Lite

In this codelab, you'll learn how to:

• Build an Android app that detects ingredients in images of meals.
• Integrate a TFLite pre-trained object detection model and see the limit of what the
  model can detect.
• Train a custom object detection model to detect the ingredients/components of a
  meal using a custom dataset called salad and TFLite Model Maker.
• Deploy the custom model to the Android app using the TFLite Task Library.

In the end, you'll create something similar to the image below:

Object Detection:

Object detection is a set of computer vision tasks that can detect and locate objects in a digital
image. Given an image or a video stream, an object detection model can identify which of a known
set of objects might be present, and provide information about their positions within the image.

TensorFlow provides pre-trained, mobile optimized models that can detect common objects, such
as cars, oranges, etc. You can integrate these pre-trained models in your mobile app with just a few
lines of code. However, you may want or need to detect objects in more distinctive or offbeat
categories. That requires collecting your own training images, then training and deploying your
own object detection model.

TensorFlow Lite

TensorFlow Lite is a cross-platform machine learning library that is optimized for running
machine learning models on edge devices, including Android and iOS mobile devices.

TensorFlow Lite is actually the core engine used inside ML Kit to run machine learning models.
There are two components in the TensorFlow Lite ecosystem that make it easy to train and deploy
machine learning models on mobile devices:

• Model Maker is a Python library that makes it easy to train TensorFlow Lite models using
  your own data with just a few lines of code, no machine learning expertise required (a short
  sketch of this training flow follows the next paragraph).
• Task Library is a cross-platform library that makes it easy to deploy TensorFlow Lite
  models with just a few lines of code in your mobile apps.

This codelab focuses on TFLite. Concepts and code blocks that are not relevant to TFLite and
object detection are not explained and are provided for you to simply copy and paste.
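
As a preview of the Model Maker workflow mentioned above, here is a hedged sketch of the
"few lines of code" training flow. It assumes the tflite_model_maker package is installed and uses
the public salad-detection CSV distributed with the TensorFlow tutorials; treat the exact spec name
and dataset path as assumptions rather than fixed parts of this codelab:

# Sketch only: assumes `pip install tflite-model-maker` and a compatible Python environment.
from tflite_model_maker import model_spec, object_detector

spec = model_spec.get('efficientdet_lite0')   # a small, mobile-friendly EfficientDet-Lite spec

# CSV listing training images and their labeled bounding boxes (salad ingredients).
train_data, validation_data, test_data = object_detector.DataLoader.from_csv(
    'gs://cloud-ml-data/img/openimage/csv/salads_ml_use.csv')

model = object_detector.create(train_data,
                               model_spec=spec,
                               batch_size=8,
                               train_whole_model=True,
                               validation_data=validation_data)

print(model.evaluate(test_data))   # mAP-style metrics on the held-out split
model.export(export_dir='.')       # writes model.tflite, ready for the TFLite Task Library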

Get set up:

Download the Code

Click the following link to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (odml-pathways-main) with all of
the resources you will need. For this codelab, you will only need the sources in the object-
detection/codelab2/android subdirectory.

The android subdirectory in the object-detection/codelab2/android repository contains two directories:

 starter—Starting code that you build upon for this codelab.

 final—Completed code for the finished sample app.

Import the starter app

Let's start by importing the starter app into Android Studio.

1. Open Android Studio and select Import Project (Gradle, Eclipse ADT, etc.)

2. Open the starter folder from the source code you downloaded earlier.

To be sure that all dependencies are available to your app, you should sync your project with
gradle files when the import process has finished.

3. Select Sync Project with Gradle Files from the Android Studio toolbar.

If this button is disabled, make sure you import only starter/app/build.gradle and not the entire repository.

Run the starter app

Now that you have imported the project into Android Studio, you're ready to run the app for the
first time.

Connect your Android device via USB to your computer or start the Android Studio emulator, and click Run in the Android Studio toolbar.

In order to keep this codelab simple and focused on the machine learning bits, the starter app contains some boilerplate code that does a few things for you:

 It can take photos using the device's camera.

 It contains some stock images for you to try out object detection on an Android emulator.

 It has a convenient method to draw the object detection result on the input bitmap.

You'll mostly interact with these methods in the app skeleton:

 fun runObjectDetection(bitmap: Bitmap) This method is called when you choose a preset
image or take a photo. bitmap is the input image for object detection. Later in this codelab,
you will add object detection code to this method.

 data class DetectionResult(val boundingBoxes: Rect, val text: String) This is a data class
that represents an object detection result for visualization. boundingBoxes is the rectangle
in which the object is located, and text is the detection result string to display together with the
object's bounding box.

 fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<DetectionResult>):
Bitmap This method draws the object detection results in detectionResults on the input
bitmap and returns a modified copy of it.

Here is an example of an output of the drawDetectionResult utility method.

Add on-device object detection:

Now you'll build a prototype by integrating a pre-trained TFLite model that can detect common
objects into the starter app.

Download a pre-trained TFLite object detection model

There are several object detector models on TensorFlow Hub that you can use. For this codelab,
you'll download the EfficientDet-Lite Object detection model, trained on the COCO 2017 dataset,
optimized for TFLite, and designed for performance on mobile CPU, GPU, and EdgeTPU.

Next, use the TFLite Task Library to integrate the pre-trained TFLite model into your starter app.
The TFLite Task Library makes it easy to integrate mobile-optimized machine learning models
into a mobile app. It supports many popular machine learning use cases, including object detection,
image classification, and text classification.

TFLite Task Library only supports TFLite models that contain valid metadata. You can find more
supported object detection models from this TensorFlow Hub collection.

Add the model to the starter app

1. Copy the model that you have just downloaded to the assets folder of the starter app.
You can find the folder in the Project navigation panel in Android Studio.

2. Name the file model.tflite.

Update the Gradle file Task Library dependencies

Go to the app/build.gradle file and add this line into the dependencies configuration:

implementation 'org.tensorflow:tensorflow-lite-task-vision:0.3.1'

Sync your project with gradle files

To be sure that all dependencies are available to your app, you should sync your project with gradle files at this point. Select Sync Project with Gradle Files from the Android Studio toolbar.

(If this button is disabled, make sure you import only starter/app/build.gradle, not the entire repository.)

Set up and run on-device object detection on an image

There are only 3 simple steps with 3 APIs to load and run an object detection model:

 prepare an image / a stream: TensorImage

 create a detector object: ObjectDetector

 connect the 2 objects above: detect(image)

You achieve these inside the function runObjectDetection(bitmap: Bitmap) in the file MainActivity.kt.

/**
* TFLite Object Detection Function
*/
private fun runObjectDetection(bitmap: Bitmap) {
//TODO: Add object detection code here
}

Import these classes at the top of MainActivity.kt:

 org.tensorflow.lite.support.image.TensorImage

 org.tensorflow.lite.task.vision.detector.ObjectDetector

Create Image Object

The images you'll use for this codelab are going to come from either the on-device camera, or
preset images that you select on the app's UI. The input image is decoded into the Bitmap format
and passed to the runObjectDetection method.

TFLite provides a simple API to create a TensorImage from Bitmap. Add the code below to the
top of runObjectDetection(bitmap:Bitmap):

// Step 1: create TFLite's TensorImage object
val image = TensorImage.fromBitmap(bitmap)

Create a Detector instance

TFLite Task Library follows the Builder Design Pattern. You pass the configuration to a builder,
then acquire a detector from it. There are several options to configure, including those to adjust the
sensitivity of the object detector:

 max result (the maximum number of objects that the model should detect)

 score threshold (how confident the object detector should be to return a detected object)

 label allowlist/denylist (allow/deny the objects in a predefined list)

Initialize the object detector instance by specifying the TFLite model file name and the
configuration options:

// Step 2: Initialize the detector object
val options = ObjectDetector.ObjectDetectorOptions.builder()
.setMaxResults(5)
.setScoreThreshold(0.5f)
.build()
val detector = ObjectDetector.createFromFileAndOptions(
this, // the application context
"model.tflite", // must be same as the filename in assets folder
options
)
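The label allowlist/denylist mentioned above is configured on the same builder. A hedged sketch follows; the "Food" label is only an illustration, and the availability of these setters depends on the model's label map and the Task Library version.

// Sketch only: restrict detections to an allowlist of labels
val filteredOptions = ObjectDetector.ObjectDetectorOptions.builder()
    .setMaxResults(5)
    .setScoreThreshold(0.5f)
    .setLabelAllowList(listOf("Food"))   // only return detections whose label is "Food"
    .build()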

Feed Image(s) to the detector

Add the following code to fun runObjectDetection(bitmap:Bitmap). This will feed your images to
the detector.

// Step 3: feed given image to the model and print the detection result
val results = detector.detect(image)

Each object is described with:

 boundingBox: the rectangle declaring the presence of an object and its location within the
image

 categories: what kind of object it is and how confident the model is with the detection
result. The model returns multiple categories, and the most confident one is first.

 label: the name of the object category.

 classificationConfidence: a float between 0.0 and 1.0, with 1.0 representing 100% confidence.

Print the detection results

Add the following code to fun runObjectDetection(bitmap:Bitmap). This calls a method to print
the object detection results to Logcat.

// Step 4: Parse the detection result and show it
debugPrint(results)

Then add this debugPrint() method to the MainActivity class:

private fun debugPrint(results : List<Detection>) {
    for ((i, obj) in results.withIndex()) {
        val box = obj.boundingBox
        Log.d(TAG, "Detected object: ${i} ")
        Log.d(TAG, "  boundingBox: (${box.left}, ${box.top}) - (${box.right},${box.bottom})")
        for ((j, category) in obj.categories.withIndex()) {
            Log.d(TAG, "    Label $j: ${category.label}")
            val confidence: Int = category.score.times(100).toInt()
            Log.d(TAG, "    Confidence: ${confidence}%")
        }
    }
}
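Beyond logging, the same results can be rendered on screen by reusing the DetectionResult data class and the drawDetectionResult helper described earlier. The lines below are only a sketch of that idea and would sit inside runObjectDetection after the debugPrint call; they assume DetectionResult's bounding-box parameter has the same type as Detection.boundingBox, and inputImageView is a placeholder for whichever ImageView the starter app uses.

// Sketch: turn Task Library detections into the app's DetectionResult objects and draw them
val resultToDisplay = results.map {
    // use the most confident category as the label text
    val category = it.categories.first()
    val text = "${category.label}, ${category.score.times(100).toInt()}%"
    DetectionResult(it.boundingBox, text)
}
// drawDetectionResult is the utility method provided by the starter app
val imgWithResult = drawDetectionResult(bitmap, resultToDisplay)
runOnUiThread {
    inputImageView.setImageBitmap(imgWithResult)   // inputImageView is assumed, not from the original code
}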
Import the app into Android Studio

Start by importing the app into Android Studio.

Go to Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the product-
search/codelab2/android/final folder from the source code you downloaded earlier.

Run the starter app

Now that you have imported the project into Android Studio, you are ready to run the app for the
first time.

Connect your Android device via USB to your computer or start the Android Studio emulator, and click Run in the Android Studio toolbar.


(If this button is disabled, make sure you import only final/app/build.gradle, not the entire
repository.)

Now the app should have launched on your Android device. It is already functioning, but it uses
the demo product search backend that Google has deployed for you.

Next, you'll update the app to use the backend you built earlier in this codelab.

Update the API endpoints

Change the API configurations

Go to the ProductSearchAPIClient class and you will see the configs of the product search
backend already defined. Comment out the configs of the demo backend:

// Define the product search backend
// Option 1: Use the demo project that we have already deployed for you
// const val VISION_API_URL =
"https://us-central1-odml-codelabs.cloudfunctions.net/productSearch"
// const val VISION_API_KEY = ""
// const val VISION_API_PROJECT_ID = "odml-codelabs"
// const val VISION_API_LOCATION_ID = "us-east1"
// const val VISION_API_PRODUCT_SET_ID = "product_set0"

Then replace them with your config:

// Option 2: Go through the Vision API Product Search quickstart and deploy to your project.
// Fill in the const below with your project info.
const val VISION_API_URL = "https://vision.googleapis.com/v1"
const val VISION_API_KEY = "YOUR_API_KEY"
const val VISION_API_PROJECT_ID = "YOUR_PROJECT_ID"
const val VISION_API_LOCATION_ID = "YOUR_LOCATION_ID"
const val VISION_API_PRODUCT_SET_ID = "YOUR_PRODUCT_SET_ID"

 VISION_API_URL is the API endpoint of Cloud Vision API.

 VISION_API_KEY is the API key that you created earlier in this codelab.

 VISION_API_PROJECT_ID, VISION_API_LOCATION_ID, and VISION_API_PRODUCT_SET_ID are the values you used in the Vision API Product Search quickstart earlier in this codelab.
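To make the roles of these constants concrete, here is a hypothetical illustration of how a client might combine them; this is an assumption for explanation only, not code taken from ProductSearchAPIClient.

// Hypothetical illustration only: how the constants typically fit together
val productSetPath =
    "projects/$VISION_API_PROJECT_ID/locations/$VISION_API_LOCATION_ID/productSets/$VISION_API_PRODUCT_SET_ID"
val annotateUrl = "$VISION_API_URL/images:annotate?key=$VISION_API_KEY"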

Run it:

Now click Run in the Android Studio toolbar. Once the app loads, tap any preset image, select a detected object, and tap the Search button to see the search results. The app is now using the product search backend that you have just created!

ASSESSMENT:

1. What API allows you to query an image and search for visually similar products from
a product catalog?
 Visual API Product Search
 Picture API Product Search
 Vision API Product Search
 Sight API Product Search

2.Which of the following product categories does Vision API Product Search support?
Choose as many answers as you see fit.
 Homegoods
 Apparel
 Toys
 Food
 Packaged goods
 General
 Machinery

3. It is strongly recommended that you restrict access to the ________ to prevent unauthorized access.
 IDE
 API calls
 API key
 Mobile app

Go further with image classification

MODULE1: Create a custom model for your image classifier

All of the code to follow along has been prepared for you and is available to execute using Google
Colab here. If you don't have access to Google Colab, you can clone the repo and use the notebook
called CustomImageClassifierModel.ipynb, which can be found in the ImageClassificationMobile->colab directory.

The easiest way to do this is to create a .zip or .tgz file containing the images, sorted into
directories. For example, if you use images of daisies, dandelions, roses, sunflowers, and tulips, you
can organize them into directories like this:
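A sketch of that folder layout is shown below; the folder names correspond to the five flower classes, and the image file names are placeholders.

flower_photos/
  daisy/
    daisy_001.jpg
    ...
  dandelion/
  roses/
  sunflowers/
  tulips/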

Zip that up and host it on a server, and you'll be able to train models with it. You'll use one that has
been prepared for you in the rest of this lab.

This lab will assume you are using Google Colab to train the model. You can find colab at
colab.research.google.com. If you're using another environment you may have to install a lot of
dependencies, not least TensorFlow.

Install and import dependencies:

1. Install TensorFlow Lite Model Maker. You can do this with a pip install. The &>
/dev/null at the end just suppresses the output. Model Maker outputs a lot of stuff that
isn't immediately relevant. It's been suppressed so you can focus on the task at hand.

# Install Model maker
!pip install -q tflite-model-maker &> /dev/null

2. Next you'll need to import the libraries that you need to use and ensure that you are using
TensorFlow 2.x:

# Imports and check that we are using TF2.x
import numpy as np
import os
from tflite_model_maker import configs
from tflite_model_maker import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader
import tensorflow as tf
assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')

Now that the environment is ready, it's time to start creating your model!

If your images are organized into folders, and those folders are zipped up, then if you download the
zip and decompress it, you'll automatically get your images labelled based on the folder they're in.
This directory will be referenced as data_path.

data_path = tf.keras.utils.get_file(
'flower_photos',
'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
untar=True)

This data path can then be loaded into a neural network model for training with TensorFlow Lite
Model Maker's ImageClassifierDataLoader class. Just point it at the folder and you're good to go.

One important element in training models with machine learning is to not use all of your data for
training. Hold back a little to test the model with data it hasn't previously seen. This is easy to do
with the split method of the dataset that comes back from ImageClassifierDataLoader. By passing
a 0.9 into it, you'll get 90% of it as your training data, and 10% as your test data:

data = DataLoader.from_folder(data_path)
train_data, test_data = data.split(0.9)

Now that your data is prepared, you can create a model using it.

Create the Image Classifier Model

Model Maker abstracts a lot of the specifics of designing the neural network so you don't have to
deal with network design, and things like convolutions, dense, relu, flatten, loss functions and
optimizers. For a default model, you can simply use a single line of code to create a model by
training a neural network with the provided data:

model = image_classifier.create(train_data)

When you run this, you'll see output that looks a bit like the following:

Model: "sequential_2"

Layer (type) Output Shape Param #


=================================================================
hub_keras_layer_v1v2_2 (HubK (None, 1280) 3413024

dropout_2 (Dropout) (None, 1280) 0

dense_2 (Dense) (None, 5) 6405


=================================================================
Total params: 3,419,429
Trainable params: 6,405
Non-trainable params: 3,413,024

None
Epoch 1/5
103/103 [===] - 15s 129ms/step - loss: 1.1169 - accuracy: 0.6181
Epoch 2/5
103/103 [===] - 13s 126ms/step - loss: 0.6595 - accuracy: 0.8911
Epoch 3/5
103/103 [===] - 13s 127ms/step - loss: 0.6239 - accuracy: 0.9133
Epoch 4/5
103/103 [===] - 13s 128ms/step - loss: 0.5994 - accuracy: 0.9287
Epoch 5/5
103/103 [===] - 13s 126ms/step - loss: 0.5836 - accuracy: 0.9385

The first part shows your model architecture. What Model Maker is doing behind the scenes is called Transfer Learning, which uses an existing pre-trained model as a starting point, takes the things that model learned about how images are constructed, and applies them to understanding these 5 flowers. You can see this in the first line, which reads:

hub_keras_layer_v1v2_2 (HubK (None, 1280) 3413024

The key is the word ‘Hub', telling us that this model came from TensorFlow Hub. By default,
TensorFlow Lite Model Maker uses a model called ‘MobileNet' which is designed to recognize
1000 types of image.

Earlier you split the data into training and test data, so you can get a gauge for how the network performs on data it hasn't previously seen – a better indicator of how it might perform in the real world. Evaluate the model on the test data with model.evaluate:

loss, accuracy = model.evaluate(test_data)

This will output something like this:

12/12 [===] - 5s 115ms/step - loss: 0.6622 - accuracy: 0.8801

Note the accuracy here. It's 88.01%, so you can expect roughly that level of accuracy when using the default model in the real world. That's not bad for a default model that you trained in about a minute. Of course, you could probably do a lot of tweaking to improve the model, and that's a science unto itself!

Export the Model:

Now that the model is trained, the next step is to export it in the .tflite format that a mobile
application can use. Model maker provides an easy export method that you can use — simply
specify the directory to output to.

Here's the code:

model.export(export_dir='/mm_flowers')

From here, you'll get a listing of the current directory. Use the indicated button to move "up" a
directory:

In your code you specified to export to mm_flowers directory. Open that, and you'll see a file
called ‘model.tflite'. This is your trained model.

Select the file and you'll see 3 dots pop up on the right. Click these to get a context menu, and you
can download the model from there.

After a few moments your model will be downloaded to your downloads folder.

You're now ready to integrate it into your mobile app! You'll do that in the next lab.

MODULE2: Integrate a custom model into your app

Get the Starter App:

Open it in Android Studio, do whatever updates you need, and when it's ready run the app to be
sure it works. You should see something like this:

It's quite a primitive app, but it shows some very powerful functionality with just a little code.
However, if you want this flower to be recognized as a daisy, and not just as a flower, you'll have
to update the app to use your custom model from the Create a custom model for your image
classifier codelab.

Update build.gradle to use Custom ML Kit Models

1. Using Android Studio, find the app-level build.gradle file. The easiest way to do this is in
the project explorer. Make sure Android is selected at the top, and you'll see a folder
for Gradle Scripts at the bottom.

2. Open the one that is for the Module, with your app name followed by ‘.app' as shown here
– (Module: ImageClassifierStep1.app):

3. At the bottom of the file, find the dependencies setting. In there you should see this line:

implementation 'com.google.mlkit:image-labeling:17.0.1'

The version number might be different. Always find the latest version number from the ML Kit
site at: https://developers.google.com/ml-kit/vision/image-labeling/android

4. Replace this with the custom image labeling library reference. The version number for this
can be found at: https://developers.google.com/ml-kit/vision/image-labeling/custom-
models/android

implementation 'com.google.mlkit:image-labeling-custom:16.3.1'

5. Additionally, you'll be adding a .tflite model that you created in the previous lab. You don't
want this model to be compressed when Android Studio compiles your app, so make sure
you use this setting in the Android section of the same build.gradle file:

aaptOptions {
    noCompress "tflite"
}

Make sure it's not within any other setting. It should be nested directly under the android tag.
Here's an example:

Add the TFLite Model:

In the previous codelab you created your custom model and downloaded it as model.tflite.

In your project, find your assets folder that currently contains flower1.jpg. Copy the model to that
folder as follows:

1. Right-click the Assets folder in Android Studio. In the menu that opens, select Reveal in
Finder. (‘Show in Explorer' on Windows, and ‘Show in Files' on Linux.)

2. You'll be taken to the directory on the file system. Copy the model.tflite file into that
directory, alongside flower1.jpg.

Android Studio will update to show both files in your assets folder:

You're now ready to update your code.

Update your code for the custom model:

The first step will be to add some code to load the custom model.

1. In your MainActivity file, add the following to your onCreate, immediately below the line
that reads setContentView(R.layout.activity_main).

This will use a LocalModel to build from the model.tflite asset. If Android Studio complains by
turning ‘LocalModel' red, press ALT + Enter to import the library. It should add an import to
com.google.mlkit.common.model.LocalModel for you.

val localModel = LocalModel.Builder()
.setAssetFilePath("model.tflite")
.build()

Previously, in your btn.setOnClickListener handler you were using the default model. It was set
up with this code:

val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)

You'll replace that to use the custom model.

2. Set up a custom options object:

val options = CustomImageLabelerOptions.Builder(localModel)
    .setConfidenceThreshold(0.7f)
    .setMaxResultCount(5)
    .build()

This replaces the default options with a customized set. The confidence threshold sets a bar for the
quality of predictions to return. If you look back to the sample at the top of this codelab, where the
image was a daisy, you had 4 predictions, each with a value beside them, such as ‘Sky' being
.7632.

You could effectively filter out lower quality results by using a high confidence threshold. Setting this to 0.9, for example, wouldn't return any label with a confidence lower than that. The setMaxResultCount() setting is useful in models with a lot of classes, but as this model only has 5, you'll just leave it at 5.

Now that you have options for the labeler, you can change the instantiation of the labeler to:

val labeler = ImageLabeling.getClient(options)

The rest of your code will run without modification. Give it a try!
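For context, a hedged sketch of what the click handler's processing step looks like with this labeler is shown below; the bitmap source and the txtOutput view are assumptions about the starter app, not code quoted from it, and the usual InputImage and Log imports are assumed.

// Sketch only: run the custom labeler on a bitmap and show the best label
val image = InputImage.fromBitmap(bitmap, 0)
labeler.process(image)
    .addOnSuccessListener { labels ->
        // pick the most confident label to display
        val top = labels.maxByOrNull { it.confidence }
        txtOutput.text = top?.let { "${it.text} : ${it.confidence}" } ?: "No label found"
    }
    .addOnFailureListener { e -> Log.e("ImageClassifier", "Labeling failed", e) }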

Here you can see that this flower was now identified as a daisy with a .959 probability!

Let's say you added a second flower image, and reran with that:

It identifies it as a rose.

Get the iOS Start App:

1. First you'll need the app from the first Codelab. If you have gone through the lab, it will be
called ImageClassifierStep1. If you don't want to go through the lab, you can clone the finished
version from the repo. Please note that the pods and .xcworkspace aren't present in the repo, so
before continuing to the next step be sure to run ‘pod install' from the same directory as the
.xcproject.

2. Open ImageClassifierStep1.xcworkspace in Xcode. Note that you should use the .xcworkspace
and not the .xcproject because you have bundled ML Kit using pods, and the workspace will
load these.

Run it and you'll see something like this:

Use Custom ML Kit Image Labeler Pods:

The first app used a pod file to get the base ML Kit Image Labeler libraries and model. You'll
need to update that to use the custom image labelling libraries.

1. Find the file called podfile in your project directory. Open it, and you'll see something like
this:

platform :ios, '10.0'

target 'ImageClassifierStep1' do
pod 'GoogleMLKit/ImageLabeling'
end

2. Change the pod declaration from ImageLabeling to ImageLabelingCustom, like this:

platform :ios, '10.0'


target 'ImageClassifierStep1' do
pod 'GoogleMLKit/ImageLabelingCustom'
end

3. Once you're done, use the terminal to navigate to the directory containing the podfile
(as well as the .xcworkspace) and run pod install.

After a few moments the MLKitImageLabeling libraries will be removed, and the custom ones
added. You can now open your .xcworkspace to edit your code.

Add the TFLite Model to Xcode:

1. With the workspace open in Xcode, drag the model.tflite onto your project. It should be in
the same folder as the rest of your files such as ViewController.swift or Main.storyboard.

2. A dialog will pop up with options for adding the file. Ensure that Add to Targets is
selected, or the model won't be bundled with the app when it's deployed to a device.

Note that the ‘Add to Targets' entry will have ImageClassifierStep1 if you started from that and
are continuing through this lab step-by-step or ImageClassifierStep2 (as shown) if you jumped
ahead to the finished code.

This will ensure that you can load the model. You'll see how to do that in the next step.

Update your Code for the Custom Model:

1. Open your ViewController.swift file. You may see an error on the ‘import MLKitImageLabeling' at the top of the file. This is because you removed the generic image labeling libraries when you updated your pod file. Feel free to delete this line, and update with the following:

import MLKitVision
import MLKit
import MLKitImageLabelingCommon
import MLKitImageLabelingCustom

It might be easy to speed read these and think that they're repeating the same code! But it's
"Common" and "Custom" at the end!

2. Next you'll load the custom model that you added in the previous step. Find the getLabels() func. Beneath the line that reads visionImage.orientation = image.imageOrientation, add these lines:

// Add this code to use a custom model
let localModelFilePath = Bundle.main.path(forResource: "model", ofType: "tflite")
let localModel = LocalModel(path: localModelFilePath!)

3. Find the code for specifying the options for the generic ImageLabeler. It's probably giving
you an error since those libraries were removed:

let options = ImageLabelerOptions()

Replace that with this code, to use a CustomImageLabelerOptions, and which specifies the local
model:

let options = CustomImageLabelerOptions(localModel: localModel)

...and that's it! Try running your app now! When you try to classify the image it should be more
accurate – and tell you that you're looking at a daisy with high probability!

Let's say you added a second flower image, and reran with that:

The app successfully detected that this image matched the label ‘roses'!

The resulting app is, of course, very limited because it relied on bundled image assets. However, the ML part is working nicely. You could, for example, use AndroidX Camera to take frames from a live feed and classify them to see what flowers your phone recognizes!
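On the Android side, a rough sketch of that live-feed idea could look like the following. It is only a sketch under assumptions: the CameraX dependencies are already added, labeler is the custom ImageLabeler built earlier, and binding the analyzer to the camera provider and lifecycle is omitted.

import android.util.Log
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageAnalysis
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.label.ImageLabeler
import java.util.concurrent.Executors

// Sketch: classify frames from a live CameraX feed with the custom labeler
@ExperimentalGetImage
fun buildAnalyzer(labeler: ImageLabeler): ImageAnalysis {
    val analysis = ImageAnalysis.Builder()
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()
    analysis.setAnalyzer(Executors.newSingleThreadExecutor()) { imageProxy ->
        val mediaImage = imageProxy.image
        if (mediaImage == null) {
            imageProxy.close()
            return@setAnalyzer
        }
        val input = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)
        labeler.process(input)
            .addOnSuccessListener { labels ->
                labels.maxByOrNull { it.confidence }?.let {
                    Log.d("LiveLabel", "${it.text}: ${it.confidence}")
                }
            }
            .addOnCompleteListener { imageProxy.close() }   // always release the frame
    }
    return analysis
}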

From here the possibilities are endless – and if you have your own data for something other than
flowers, you have the foundations of what you need to build an app that recognizes them using
Computer Vision.

ASSESSMENT:

1. Model Maker abstracts a lot of the specifics of designing the neural network so
you don't have to deal with network design, and things like:
 Convolutions
 Dense
 Relu
 Flatten
 File type
 Loss function
 Optimizers
 Pixels

2. True or false? The confidence threshold sets a bar for the quality of predictions to return.
 True
 False

3. The ________ function is useful in models with a lot of classes.


 setMaxResultCount()
 MaxSetCount()
 count()maxSet
 resultMaxCount()

CONCLUSION

Completing an AI-ML virtual internship provides invaluable hands-on experience in machine learning and artificial intelligence. Throughout the internship, participants gain exposure to real-world projects, enhancing their understanding of AI-ML concepts. By working on diverse tasks such as data preprocessing, model training, and evaluation, interns develop a comprehensive skill set that includes programming, statistical analysis, and problem-solving.

Moreover, an AI-ML virtual internship enhances your resume and professional profile, making you a more attractive candidate to potential employers. It demonstrates your ability to adapt to remote working environments, collaborate with a team, and manage projects independently. Ultimately, the internship prepares you for a successful career in the rapidly evolving landscape of artificial intelligence and machine learning, positioning you at the forefront of technological innovation.
