AIML Final Report
Report
on
Bachelor of Technology
in
Computer Science & Engineering
BY
K.Gnanesh - 21BQ1A0598
DECLARATION
I, Gnanesh Katam, hereby declare that the course entitled GOOGLE AI-ML
VIRTUAL INTERNSHIP, completed by me at Vasireddy Venkatadri Institute of
Technology, is submitted in partial fulfillment of the requirements for the award of
credits in the Department of CSE. The results embodied in this report have not been
submitted to any other university for the same purpose.
CERTIFICATE
This certificate attests that the following report accurately represents the work
completed by Gnanesh Katam, Registration Number 21BQ1A0598, during the academic
year 2023-2024, covering the time period from Jan to Mar 2024, as part of the GOOGLE
AIML VIRTUAL INTERNSHIP.
To
The Principal
Vasireddy Venkatadri Institute of
Technology Namburu,
Guntur.
Respected Sir,
I am pleased to submit my internship report on “Google AI-ML Virtual Internship” as
per your instruction to fulfil the requirements of the Degree of Bachelor of Technology in
CSE from Jawaharlal Nehru Technological University, Kakinada. While preparing this
report, I have tried my best to include all the relevant information, explanations, and the things
I learned from the internship courses, along with my contributions to the programme, to make the
report informative and comprehensive. It would not have been possible to complete this
report without your assistance, for which I am very thankful. Working online for two months on
the Google AIML Virtual Internship was amazing and a huge learning opportunity for
me. It was also a great experience to prepare this report, and I will be available for any
clarification, if required.
Therefore, I pray and hope that you would be kind enough to accept my Internship Report and
oblige thereby.
Yours Obediently,
K.Gnanesh
ID:21BQ1A0598
EMAIL: [email protected]
CERTIFICATE OF INTERNSHIP
ACKNOWLEDGEMENT
First and foremost, we express our deep gratitude to Mr. Vasireddy VidyaSagar,
Chairman, Vasireddy Venkatadri Institute of Technology for providing necessary facilities
throughout the Computer Science & Engineering program.
We express our sincere gratitude to Dr. K. Suresh Babu, Professor & HOD,
Information Technology, Vasireddy Venkatadri Institute of Technology for his constant
encouragement, motivation and faith, and for offering different perspectives to expand our ideas.
We would like to express our sincere gratitude to our VVIT INTERNSHIP I/C Mr. YV
Subba Reddy, SPOC, and our Internship Coordinator Mr. K. Balakrishna for their insightful
advice, motivating suggestions, invaluable guidance, help and support in the successful completion
of this Internship.
We would like to take this opportunity to express our thanks to the teaching and non-
teaching staff in the Department of Computer Science & Engineering, VVIT for their invaluable
help and support.
Throughout the internship, participants will work on hands-on projects that simulate
actual industry scenarios, allowing them to apply their theoretical understanding to practical
challenges. These projects are designed to enhance participants' problem-solving skills,
creativity, and technical proficiency. By collaborating with peers and mentors, interns will
develop a deeper understanding of the complexities and nuances of AI-ML applications,
preparing them for future roles in the tech industry.
Moreover, the program emphasizes the development of soft skills crucial for professional
growth, such as teamwork, communication, and project management. Interactive sessions
with industry experts and thought leaders will provide valuable insights into the current
landscape of AI and ML, as well as emerging trends and future directions. Participants will
also benefit from personalized feedback and career guidance, helping them to refine their
career aspirations and pathways.
Week   | Topic                                                                | Outcome
Week-1 | Intro to computer vision                                             | Understood complex images
Week-2 | Intro to object detector and in-depth context                        | Built an object detector
Week-4 | Detect objects in images to build a visual product search (Android)  | Object detection: static images
Week-9 | Introduction to product image search on mobile                       | Built an object detector
Program neural networks with TensorFlow
What is ML?
Consider the traditional manner of building apps, as represented in the following diagram:
You express rules in a programming language. They act on data and your program provides
answers. In the case of the activity detection, the rules (the code you wrote to define activity
types) acted upon the data (the person's movement speed) to produce an answer: the return value
from the function for determining the activity status of the user (whether they were walking,
running, biking, or doing something else).
The process for detecting that activity status via ML is very similar, only the axes are different.
Instead of trying to define the rules and express them in a programming language, you provide the
answers (typically called labels) along with the data, and the machine infers the rules that
determine the relationship between the answers and data. For example, your activity detection
scenario might look like this in an ML context:
Beyond being an alternative method of programming that scenario, that approach also gives you
the ability to open up new scenarios, such as the golfing one, which may not have been possible under
the rules-based traditional programming approach.
In traditional programming, your code compiles into a binary that is typically called a program. In
ML, the item that you create from the data and labels is called a model.
Consider the result of that to be a model, which is used like this at runtime:
You pass the model some data and the model uses the rules that it inferred from the training to
make a prediction, such as, "That data looks like walking," or "That data looks like biking."
Create your first ML model:
Consider the following sets of numbers. Can you see the relationship between them?
X: -1 0 1 2 3 4
Y: -2 1 4 7 10 13
As you look at them, you might notice that the value of X is increasing by 1 as you read left to
right and the corresponding value of Y is increasing by 3. You probably think that Y equals 3X
plus or minus something. Then, you'd probably look at the 0 on X and see that Y is 1, and you'd
come up with the relationship Y=3X+1.
How would you train a neural network to do the equivalent task? Using data! By feeding it with a
set of X's and a set of Y's, it should be able to figure out the relationship between them.
Start with your imports. Here, you're importing TensorFlow and calling it tf for ease of
use. Next, import a library called numpy, which represents your data as lists easily and
quickly.
The framework for defining a neural network as a set of sequential layers is called keras, so import
that, too.
import tensorflow as tf
import numpy as np
from tensorflow import keras
Define and compile the neural network:
Next, create the simplest possible neural network. It has one layer, that layer has one neuron, and
the input shape to it is only one value.
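A minimal sketch of that model, following the single-layer, single-neuron description above:

model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])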
Next, write the code to compile your neural network. When you do so, you need to specify two
functions—a loss and an optimizer.
In this example, you know that the relationship between the numbers is Y=3X+1.
When the computer is trying to learn that, it makes a guess, maybe Y=10X+10. The loss function
measures the guessed answers against the known correct answers and measures how well or badly
it did.
Next, the model uses the optimizer function to make another guess. Based on the loss function's
result, it tries to minimize the loss. At this point, maybe it will come up with something like
Y=5X+5. While that's still pretty bad, it's closer to the correct result (the loss is lower).
First, here's how to tell it to use mean_squared_error for the loss and stochastic gradient descent
(sgd) for the optimizer. You don't need to understand the math for those yet, but you can see that
they work!
model.compile(optimizer='sgd', loss='mean_squared_error')
Provide the data:
Next, feed in some data. In this case, you take the six X and six Y values from earlier. You can see
that the relationship between those is that Y=3X+1, so where X is -1, Y is -2.
A Python library called NumPy provides lots of array-type data structures to do this. Specify the
values as arrays in NumPy with np.array().
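For example, using the X and Y values listed above:

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-2.0, 1.0, 4.0, 7.0, 10.0, 13.0], dtype=float)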
The process of training the neural network, where it learns the relationship between the X's and Y's,
is in the model.fit call. That's where it will go through the loop of making a guess, measuring
how good or bad it is (the loss), and using the optimizer to make another guess. It will do that for the
number of epochs that you specify. When you run that code, you'll see the loss will be printed out
for each epoch.
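A sketch of that training call, using the 500 epochs discussed below:

model.fit(xs, ys, epochs=500)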
For example, you can see that for the first few epochs, the loss value is quite large, but it's getting
smaller with each step.
As the training progresses, the loss soon gets very small.
By the time the training is done, the loss is extremely small, showing that our model is doing a
great job of inferring the relationship between the numbers.
You probably don't need all 500 epochs and can experiment with different amounts. As you can
see from the example, the loss is really small after only 50 epochs, so that might be enough!
You have a model that has been trained to learn the relationship between X and Y. You can use
the model.predict method to have it figure out the Y for a previously unknown X. For example, if
X is 10, what do you think Y will be? Take a guess before you run the following code:
print(model.predict([10.0]))
You might have thought 31, but it ended up being a little over. Why do you think that is?
Neural networks deal with probabilities, so it calculated that there is a very high probability that
the relationship between X and Y is Y=3X+1, but it can't know for sure with only six data points.
The result is very close to 31, but not necessarily 31.
As you work with neural networks, you'll see that pattern recurring. You will almost always deal
with probabilities, not certainties, and will do a little bit of coding to figure out what the result is
based on the probabilities, particularly when it comes to classification.
MODULE2: Introduction to Computer Vision:
import tensorflow as tf
print(tf.__version__)
You'll train a neural network to recognize items of clothing from a common dataset called Fashion
MNIST. It contains 70,000 items of clothing in 10 different categories. Each item of clothing is in
a 28x28 grayscale image. You can see some examples here:
The labels associated with the dataset are:

Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
The Fashion MNIST data is available in the tf.keras.datasets API. Load it like this:
mnist = tf.keras.datasets.fashion_mnist
Calling load_data on that object gives you two sets of two lists: training values and testing values,
which represent graphics that show clothing items and their labels.
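For example:

(training_images, training_labels), (test_images, test_labels) = mnist.load_data()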
What do those values look like? Print a training image and a training label to see. You can
experiment with different indices in the array.
You'll notice that all the values are integers between 0 and 255. When training a neural network,
it's easier to treat all values as between 0 and 1, a process called normalization. Fortunately,
Python provides an easy way to normalize a list like that without looping.
You may also want to look at 42, a different boot than the one at index 0.
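A sketch of inspecting an item and normalizing the data (index 42 is taken from the sentence above; any index works):

import matplotlib.pyplot as plt

plt.imshow(training_images[42])
print(training_labels[42])
print(training_images[42])

# Normalize by dividing the whole array; no loop needed
training_images = training_images / 255.0
test_images = test_images / 255.0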
Now, you might be wondering why there are two datasets—training and testing.
The idea is to have one set of data for training and another set of data that the model hasn't yet
encountered to see how well it can classify values. After all, when you're done, you'll want to use
the model with data that it hadn't previously seen! Also, without separate testing data, you'll run
the risk of the network only memorizing its training data without generalizing its knowledge.
Now design the model. You'll have three layers. Go through them one-by-one and explore the
different types of layers and the parameters used for each.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
Now that the model is defined, the next thing to do is build it. Create a model by first compiling it
with an optimizer and loss function, then train it on your training data and labels. The goal is to
have the model figure out the relationship between the training data and its training labels. Later,
you want your model to see data that resembles your training data, then make a prediction about
what that data should look like.
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=5)
Epoch 1/5
60000/60000 [=======] - 6s 101us/sample - loss: 0.4964 - acc: 0.8247
Epoch 2/5
60000/60000 [=======] - 5s 86us/sample - loss: 0.3720 - acc: 0.8656
Epoch 3/5
60000/60000 [=======] - 5s 85us/sample - loss: 0.3335 - acc: 0.8780
Epoch 4/5
60000/60000 [=======] - 6s 103us/sample - loss: 0.3134 - acc: 0.8844
Epoch 5/5
60000/60000 [=======] - 6s 94us/sample - loss: 0.2931 - acc: 0.8926
How would the model perform on data it hasn't seen? That's why you have the test set. You
call model.evaluate and pass in the two sets, and it reports the loss for each. Give it a try:
model.evaluate(test_images, test_labels)
That example returned an accuracy of .8789, meaning it was about 88% accurate. (You might
have slightly different values.)
As expected, the model is not as accurate with the unknown data as it was with the data it was
trained on! As you learn more about TensorFlow, you'll find ways to improve that.
MODULE3: Introduction to Convolutions
A convolution is a filter that passes over an image, processes it, and extracts the important features.
Let's say you have an image of a person wearing a sneaker. How would you detect that a sneaker is
present in the image? In order for your program to "see" the image as a sneaker, you'll have to
extract the important features, and blur the inessential features. This is called feature mapping.
The feature mapping process is theoretically simple. You'll scan every pixel in the image and then
look at its neighboring pixels. You multiply the values of those pixels by the equivalent weights in
a filter.
For example:
The current pixel value is 192. You can calculate the value of the new pixel by looking at the
neighbor values, multiplying them by the values specified in the filter, and making the new pixel
value the final amount.
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc

i = misc.ascent()
Next, use the Pyplot library matplotlib to draw the image so that you know what it looks like:
plt.axis('off')
plt.imshow(i)
plt.show()
The image is stored as a NumPy array, so we can create the transformed image by just copying
that array. The size_x and size_y variables will hold the dimensions of the image so you can
loop over it later.
i_transformed = np.copy(i)
size_x = i_transformed.shape[0]
size_y = i_transformed.shape[1]
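The loop below references a filter and a weight that must be defined first. A sketch using the codelab's example 3x3 edge-detection filter (you can experiment with different values):

# This 3x3 filter detects edges; try your own values too
filter = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
# If the filter values don't add up to 0 or 1, use the weight to normalize the result
weight = 1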
That means that the current pixel's neighbor above it and to the left of it will be multiplied by the
top-left item in the filter. Then, multiply the result by the weight and ensure that the result is in the
range 0 through 255.
for x in range(1, size_x-1):
    for y in range(1, size_y-1):
        output_pixel = 0.0
        output_pixel = output_pixel + (i[x-1, y-1] * filter[0][0])
        output_pixel = output_pixel + (i[x, y-1] * filter[0][1])
        output_pixel = output_pixel + (i[x+1, y-1] * filter[0][2])
        output_pixel = output_pixel + (i[x-1, y] * filter[1][0])
        output_pixel = output_pixel + (i[x, y] * filter[1][1])
        output_pixel = output_pixel + (i[x+1, y] * filter[1][2])
        output_pixel = output_pixel + (i[x-1, y+1] * filter[2][0])
        output_pixel = output_pixel + (i[x, y+1] * filter[2][1])
        output_pixel = output_pixel + (i[x+1, y+1] * filter[2][2])
        output_pixel = output_pixel * weight
        if output_pixel < 0:
            output_pixel = 0
        if output_pixel > 255:
            output_pixel = 255
        i_transformed[x, y] = output_pixel
Now, plot the image to see the effect of passing the filter over it:
plt.gray()
plt.grid(False)
plt.imshow(i_transformed)
#plt.axis('off')
plt.show()
Consider the following filter values and their impact on the image.
Understanding Pooling
Iterate over the image and, at each point, consider the pixel and its immediate neighbors to the
right, beneath, and right-beneath. Take the largest of those (hence max pooling) and load it into the
new image. Thus, the new image will be one fourth the size of the old.
The following code will show a (2, 2) pooling. Run it to see the output.
You'll see that while the image is one-fourth the size of the original, it kept all the features.
new_x = int(size_x/2)
new_y = int(size_y/2)
newImage = np.zeros((new_x, new_y))
for x in range(0, size_x, 2):
    for y in range(0, size_y, 2):
        pixels = []
        pixels.append(i_transformed[x, y])
        pixels.append(i_transformed[x+1, y])
        pixels.append(i_transformed[x, y+1])
        pixels.append(i_transformed[x+1, y+1])
        pixels.sort(reverse=True)
        newImage[int(x/2), int(y/2)] = pixels[0]

# Plot the image. Note the size of the axes -- now 256 pixels instead of 512
plt.gray()
plt.grid(False)
plt.imshow(newImage)
#plt.axis('off')
plt.show()
Note the axes of that plot. The image is now 256x256, one-fourth of its original size, and the
detected features have been enhanced despite less data now being in the image.
MODULE4: Convolutional Neural Networks (CNNs)
You now know how to do fashion image recognition using a Deep Neural Network (DNN)
containing three layers— the input layer (in the shape of the input data), the output layer (in the
shape of the desired output) and a hidden layer. You experimented with several parameters that
influence the final accuracy, such as different sizes of hidden layers and number of training
epochs.
For convenience, here's the entire code again. Run it and take a note of the test accuracy that is
printed out at the end.
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images/255.0
test_images=test_images/255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))
Run the following code. It's the same neural network as earlier, but this time with convolutional
layers added first. It will take longer, but look at the impact on the accuracy:
import tensorflow as tf
print(tf.__version__)
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images=training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images=test_images / 255.0
model = tf.keras.models.Sequential([
tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
tf.keras.layers.MaxPooling2D(2, 2),
tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)
test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_accuracy*100))
It's likely gone up to about 93% on the training data and 91% on the validation data.
import tensorflow as tf
mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images=training_images.reshape(60000, 28, 28, 1)
training_images = training_images/255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images/255.0
Next, define your model. Instead of the input layer at the top, you're going to add a convolutional
layer. The parameters are:
The number of convolutions you want to generate. A value like 32 is a good starting point.
The size of the convolutional matrix, in this case a 3x3 grid.
The activation function to use, in this case relu.
In the first layer, the shape of the input data.
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 26, 26, 64) 640
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Add another convolution
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Now flatten the output. After this you'll just have the same DNN structure as the non-convolutional version
    tf.keras.layers.Flatten(),
    # The same 128 dense layers, and 10 output layers as in the pre-convolution example:
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Compile the model, call the fit method to do the training, and evaluate the loss and accuracy from
the test set.
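A sketch of those steps, mirroring the compile and fit calls used for the convolutional model earlier in this module:

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(training_images, training_labels, epochs=5)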
test_loss, test_acc = model.evaluate(test_images, test_labels)
print ('Test loss: {}, Test accuracy: {}'.format(test_loss, test_acc*100))
This code shows you the convolutions graphically. The print (test_labels[:100]) shows the first 100
labels in the test set, and you can see that the ones at index 0, index 23 and index 28 are all the
same value (9).
print(test_labels[:100])
[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
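The visualization code itself is not reproduced in this report; a sketch of what it could look like for the model above (FIRST_IMAGE, SECOND_IMAGE, THIRD_IMAGE and CONVOLUTION_NUMBER are illustrative choices):

import matplotlib.pyplot as plt

f, axarr = plt.subplots(3, 4)
FIRST_IMAGE = 0
SECOND_IMAGE = 23
THIRD_IMAGE = 28
CONVOLUTION_NUMBER = 1  # which filter's output to visualize

# Build a model that outputs the activations of every layer
layer_outputs = [layer.output for layer in model.layers]
activation_model = tf.keras.models.Model(inputs=model.input, outputs=layer_outputs)

# Plot the first four layers (conv and pooling) for the three chosen test images
for x in range(0, 4):
    f1 = activation_model.predict(test_images[FIRST_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[0, x].imshow(f1[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[0, x].grid(False)
    f2 = activation_model.predict(test_images[SECOND_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[1, x].imshow(f2[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[1, x].grid(False)
    f3 = activation_model.predict(test_images[THIRD_IMAGE].reshape(1, 28, 28, 1))[x]
    axarr[2, x].imshow(f3[0, :, :, CONVOLUTION_NUMBER], cmap='inferno')
    axarr[2, x].grid(False)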
And you should see something like the following, where the convolution is taking the essence of
the sole of the shoe, effectively spotting that as a common feature across all shoes.
MODULE5: Complex Images
In this codelab you'll use convolutions to classify images of horses and humans. You'll be using
TensorFlow in this lab to create a CNN that is trained to recognize images of horses and humans,
and classify them.
!wget \
https://storage.googleapis.com/learning-datasets/horse-or-human.zip \
-O /tmp/horse-or-human.zip
The following Python code uses the os library to give you access to the file system, and the
zipfile library to unzip the data.
import os
import zipfile
local_zip = '/tmp/horse-or-human.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/horse-or-human')
zip_ref.close()
The contents of the zip file are extracted to the base directory /tmp/horse-or-human, which contains
horses and humans subdirectories.
In short, the training set is the data that is used to tell the neural network model that "this is what a
horse looks like" and "this is what a human looks like."
Later you'll see something called an ImageDataGenerator being used. It reads images from
subdirectories and automatically labels them from the name of that subdirectory. For example, you
have a training directory containing a horses directory and a humans
directory. ImageDataGenerator will label the images appropriately for you, reducing a coding step.
# Directory with our training horse pictures
train_horse_dir = os.path.join('/tmp/horse-or-human/horses')

# Directory with our training human pictures
train_human_dir = os.path.join('/tmp/horse-or-human/humans')
Now, see what the filenames look like in the horses and humans training directories:
train_horse_names = os.listdir(train_horse_dir)
print(train_horse_names[:10])
train_human_names = os.listdir(train_human_dir)
print(train_human_names[:10])
print('total training horse images:', len(os.listdir(train_horse_dir)))
print('total training human images:', len(os.listdir(train_human_dir)))
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
# Parameters for our graph; we'll output images in a 4x4 configuration
nrows = 4
ncols = 4
# Index for iterating over images
pic_index = 0
Now, display a batch of eight horse pictures and eight human pictures. You can rerun the cell to
see a fresh batch each time.
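A sketch of the display code (assuming the directory and name variables defined above), which picks the next eight images from each class and plots them in the 4x4 grid, ending with the plt.imshow and plt.show calls below:

fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_horse_pix = [os.path.join(train_horse_dir, fname)
                  for fname in train_horse_names[pic_index-8:pic_index]]
next_human_pix = [os.path.join(train_human_dir, fname)
                  for fname in train_human_names[pic_index-8:pic_index]]

for i, img_path in enumerate(next_horse_pix + next_human_pix):
  # Set up subplot; subplot indices start at 1
  sp = plt.subplot(nrows, ncols, i + 1)
  sp.axis('Off')  # Don't show axes or gridlines
  img = mpimg.imread(img_path)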
  plt.imshow(img)

plt.show()
Here are some example images showing horses and humans in different poses and orientations:
import tensorflow as tf
Then, add convolutional layers and flatten the final result to feed into the densely connected
layers. Finally, add the densely connected layers.
model = tf.keras.models.Sequential([
# Note the input shape is the desired size of the image 300x300 with 3 bytes color
# This is the first convolution
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(300, 300, 3)),
tf.keras.layers.MaxPooling2D(2, 2),
# The second convolution
tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The third convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The fourth convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# The fifth convolution
tf.keras.layers.Conv2D(64, (3,3), activation='relu'),
tf.keras.layers.MaxPooling2D(2,2),
# Flatten the results to feed into a DNN
tf.keras.layers.Flatten(),
# 512 neuron hidden layer
tf.keras.layers.Dense(512, activation='relu'),
# Only 1 output neuron. It will contain a value from 0-1, where 0 is for one class ('horses') and 1 is for the other ('humans')
tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()
conv2d_4 (Conv2D) (None, 14, 14, 64) 36928
The output shape column shows how the size of your feature map evolves in each successive layer.
The convolution layers reduce the size of the feature maps by a bit due to padding and each
pooling layer halves the dimensions.
Next, configure the specifications for model training. Train your model with the
binary_crossentropy loss because it's a binary classification problem and your final activation is a
sigmoid. (For a refresher on loss metrics, see Descending into ML.) Use the rmsprop optimizer
with a learning rate of 0.001. During training, monitor classification accuracy.
Note: In this case, using the RMSprop optimization algorithm is preferable to stochastic gradient
descent (SGD) because RMSprop automates learning-rate tuning for you. (Other optimizers, such
as Adam and Adagrad, also automatically adapt the learning rate during training and would work
equally well here.)
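A sketch of that compile step (the learning-rate argument name may differ between TensorFlow versions; older codelabs pass lr=0.001):

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),
              metrics=['accuracy'])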
Set up data generators that read pictures in your source folders, convert them to float32 tensors,
and feed them (with their labels) to your network.
In Keras, that can be done via the keras.preprocessing.image.ImageDataGenerator class using the
rescale parameter. That ImageDataGenerator class allows you to instantiate generators of
augmented image batches (and their labels) via .flow(data, labels) or
.flow_from_directory(directory). Those generators can then be used with the Keras model
methods that accept data generators as
inputs: fit_generator, evaluate_generator and predict_generator.
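A sketch of that generator setup (the directory path is the one unzipped earlier, the target size matches the 300x300 model input, and the batch size is the codelab's illustrative choice; recent TensorFlow versions accept the generator directly in model.fit):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255
train_datagen = ImageDataGenerator(rescale=1/255)

# Flow training images in batches using the generator
train_generator = train_datagen.flow_from_directory(
        '/tmp/horse-or-human/',   # Source directory for training images
        target_size=(300, 300),   # All images will be resized to 300x300
        batch_size=128,
        class_mode='binary')      # Binary labels for the binary_crossentropy loss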
Do the training:
history = model.fit(
train_generator,
steps_per_epoch=8,
epochs=15,
verbose=1)
The loss and accuracy are a great indication of progress in training. The model makes a guess at the
classification of the training data, then measures it against the known label and calculates the
result. Accuracy is the portion of correct guesses.
Epoch 1/15
9/9 [==============================] - 9s 1s/step - loss: 0.8662 - acc: 0.5151
Epoch 2/15
9/9 [==============================] - 8s 927ms/step - loss: 0.7212 - acc: 0.5969
Epoch 3/15
9/9 [==============================] - 8s 921ms/step - loss: 0.6612 - acc: 0.6592
Epoch 4/15
9/9 [==============================] - 8s 925ms/step - loss: 0.3135 - acc: 0.8481
Epoch 5/15
9/9 [==============================] - 8s 919ms/step - loss: 0.4640 - acc: 0.8530
Epoch 6/15
9/9 [==============================] - 8s 896ms/step - loss: 0.2306 - acc: 0.9231
Epoch 7/15
9/9 [==============================] - 8s 915ms/step - loss: 0.1464 - acc: 0.9396
Epoch 8/15
9/9 [==============================] - 8s 935ms/step - loss: 0.2663 - acc: 0.8919
Epoch 9/15
9/9 [==============================] - 8s 883ms/step - loss: 0.0772 - acc: 0.9698
Epoch 10/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0403 - acc: 0.9805
Epoch 11/15
9/9 [==============================] - 8s 891ms/step - loss: 0.2618 - acc: 0.9075
Epoch 12/15
9/9 [==============================] - 8s 902ms/step - loss: 0.0434 - acc: 0.9873
Epoch 13/15
9/9 [==============================] - 8s 904ms/step - loss: 0.0187 - acc: 0.9932
Epoch 14/15
9/9 [==============================] - 9s 951ms/step - loss: 0.0974 - acc: 0.9649
Epoch 15/15
9/9 [==============================] - 8s 877ms/step - loss: 0.2859 - acc: 0.9338
The code will allow you to choose one or more files from your file system. It will then upload
them and run them through the model, giving an indication of whether the object is a horse or a
human.
If some images are classified incorrectly, that's due to something called overfitting, which means that the neural network is trained with
very limited data (there are only roughly 500 images of each class). So it's very good at
recognizing images that look like those in the training set, but it can fail on images that are not
in the training set.
import numpy as np
from google.colab import files
from keras.preprocessing import image
uploaded = files.upload()
for fn in uploaded.keys():
    # Predicting images
    path = '/content/' + fn
    img = image.load_img(path, target_size=(300, 300))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    images = np.vstack([x])
    classes = model.predict(images, batch_size=10)
    print(classes[0])
    if classes[0] > 0.5:
        print(fn + " is a human")
    else:
        print(fn + " is a horse")
For example, say that you want to test with this image:
Visualize intermediate representations:
Pick a random image from the training set, then generate a figure where each row is the output of a
layer and each image in the row is a specific filter in that output feature map. Rerun that cell to
generate intermediate representations for a variety of training images.
import numpy as np
import random
from tensorflow.keras.preprocessing.image import img_to_array, load_img
# Let's define a new Model that will take an image as input, and will output
# intermediate representations for all layers in the previous model after the first.
successive_outputs = [layer.output for layer in model.layers[1:]]
#visualization_model = Model(img_input, successive_outputs)
visualization_model = tf.keras.models.Model(inputs = model.input, outputs = successive_outputs)
# Let's prepare a random input image from the training set.
horse_img_files = [os.path.join(train_horse_dir, f) for f in train_horse_names]
human_img_files = [os.path.join(train_human_dir, f) for f in train_human_names]
img_path = random.choice(horse_img_files + human_img_files)
img = load_img(img_path, target_size=(300, 300))  # this is a PIL image
x = img_to_array(img)  # Numpy array with shape (300, 300, 3)
x = x.reshape((1,) + x.shape)  # Numpy array with shape (1, 300, 300, 3)
# Rescale by 1/255
x /= 255
# Let's run our image through our network, thus obtaining all
# intermediate representations for this image.
successive_feature_maps = visualization_model.predict(x)
# These are the names of the layers, so can have them as part of our plot
layer_names = [layer.name for layer in model.layers]
# Now let's display our representations
for layer_name, feature_map in zip(layer_names, successive_feature_maps):
    if len(feature_map.shape) == 4:
        # Just do this for the conv / maxpool layers, not the fully-connected layers
        n_features = feature_map.shape[-1]  # number of features in feature map
        # The feature map has shape (1, size, size, n_features)
        size = feature_map.shape[1]
        # We will tile our images in this matrix
        display_grid = np.zeros((size, size * n_features))
        for i in range(n_features):
            # Postprocess the feature to make it visually palatable
            x = feature_map[0, :, :, i]
            x -= x.mean()
            if x.std() > 0:
                x /= x.std()
            x *= 64
            x += 128
            x = np.clip(x, 0, 255).astype('uint8')
            # We'll tile each filter into this big horizontal grid
            display_grid[:, i * size : (i + 1) * size] = x
        # Display the grid
        scale = 20. / n_features
        plt.figure(figsize=(scale * n_features, scale))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')
As you can see, you go from the raw pixels of the images to increasingly abstract and compact
representations. The representations downstream start highlighting what the network pays attention
to, and they show fewer and fewer features being "activated." Most are set to zero. That's called
sparsity. Representation sparsity is a key feature of deep learning.
Those representations carry increasingly less information about the original pixels of the image,
but increasingly refined information about the class of the image. You can think of a CNN (or a
deep network in general) as an information distillation pipeline.
Add on-device object detection:
In this step, you will add the functionality to the starter app to detect objects in images. As you saw
in the previous step, the starter app contains boilerplate code to take photos with the camera app on
the device. There are also 3 preset images in the app that you can try object detection on if you are
running the codelab on an Android emulator.
When you have selected an image, either from the preset images or taking a photo with the camera
app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen and
calls the runObjectDetection method with the image.
There are only 3 simple steps with 3 APIs to set up ML Kit ODT: prepare an InputImage, create a detector instance, and feed the image to the detector.
/**
* ML Kit Object Detection Function
*/
private fun runObjectDetection(bitmap: Bitmap) {
}
ML Kit provides a simple API to create an InputImage from a Bitmap. Then you can feed
an InputImage into the ML Kit APIs.
ML Kit follows the Builder design pattern. You will pass the configuration to the builder, then acquire
a detector from it. There are 3 options to configure (the options in bold are used in this codelab):
detector mode (single image or stream)
multiple-object detection (whether to detect multiple objects or only the most prominent one)
classification (whether to classify the detected objects into coarse categories)
This codelab is for single image - multiple object detection & classification. Add that now:
The following code does just that (copy and append it to the existing code inside fun
runObjectDetection(bitmap:Bitmap)):
The total number of objects detected. Each detected object is described with:
trackingId: an integer you use to track it across frames (NOT used in this codelab).
labels: a list of label(s) for the detected object (only when classification is enabled):
text (Get the text of this label including "Fashion Goods", "Food", "Home Goods",
"Place", "Plant")
Let's run the codelab by clicking Run in the Android Studio toolbar. Try selecting a preset
image, or take a photo, then look at the Logcat window inside the IDE.
...which means that the detector saw 3 objects:
The position inside the boundingBox rectangle (e.g. (481, 2021) – (2426, 3376))
The detector is pretty confident that the 1st is a Food (90% confidence—it was salad).
There is some boilerplate code inside the codelab to help you visualize the detection result.
Leverage these utilities to make our visualization code simple:
data class BoxWithText(val box: Rect, val text: String) This is a data class to store an
object detection result for visualization. box is the bounding box where the object is located,
and text is the detection result string to display together with the object's bounding box.
Go to where you call debugPrint() and add the following code snippet below it:
    BoxWithText(obj.boundingBox, text)
}

// Draw the detection result on the input bitmap
val visualizedResult = drawDetectionResult(bitmap, detectedObjects)

// Show the detection result on the app screen
runOnUiThread {
    inputImageView.setImageBitmap(visualizedResult)
}
Once the app loads, press the Button with the camera icon, point your camera to an object, take a
photo, accept the photo (in Camera App) or you can easily tap any preset images. You should see
the detection results; press the Button again or select another image to repeat a couple of times to
experience the latest ML Kit ODT!
You have used ML Kit to add Object Detection capabilities to your app:
Create Detector
ASSESSMENT:
1) The advanced computer-vision task that tells you where the objects are within the image
by returning a mask that tells you which pixel belongs to which object is known as ______.
Object detection
Item detection
Image classification
Image segmentation
2) True or false? One drawback of object detection is that it can only detect one object.
True
False
Go further with object detection
MODULE1: Build and deploy a custom object-detection model with
TensorFlow Lite
Integrate a TFLite pre-trained object detection model and see the limit of what the
model can detect.
Deploy the custom model to the Android app using TFLite Task Library.
Object Detection:
Object detection is a set of computer vision tasks that can detect and locate objects in a digital
image. Given an image or a video stream, an object detection model can identify which of a known
set of objects might be present, and provide information about their positions within the image.
TensorFlow provides pre-trained, mobile optimized models that can detect common objects, such
as cars, oranges, etc. You can integrate these pre-trained models in your mobile app with just a few
lines of code. However, you may want or need to detect objects in more distinctive or offbeat
categories. That requires collecting your own training images, then training and deploying your
own object detection model.
TensorFlow Lite
TensorFlow Lite is a cross-platform machine learning library that is optimized for running
machine learning models on edge devices, including Android and iOS mobile devices.
TensorFlow Lite is actually the core engine used inside ML Kit to run machine learning models.
There are two components in the TensorFlow Lite ecosystem that make it easy to train and deploy
machine learning models on mobile devices:
Model Maker is a Python library that makes it easy to train TensorFlow Lite models using
your own data with just a few lines of code, no machine learning expertise required.
Task Library is a cross-platform library that makes it easy to deploy TensorFlow Lite
models with just a few lines of code in your mobile apps.
This codelab focuses on TFLite. Concepts and code blocks that are not relevant to TFLite and
object detection are not explained and are provided for you to simply copy and paste.
Click the following link to download all the code for this codelab:
Unpack the downloaded zip file. This will unpack a root folder (odml-pathways-main) with all of
the resources you will need. For this codelab, you will only need the sources in the object-
detection/codelab2/android subdirectory.
Import the starter app
Let's start by importing the starter app into Android Studio.
1. Open Android Studio and select Import Project (Gradle, Eclipse ADT, etc.)
2. Open the starter folder from the source code you downloaded earlier.
To be sure that all dependencies are available to your app, you should sync your project with
gradle files when the import process has finished.
3. Select Sync Project with Gradle Files from the Android Studio toolbar.
If this button is disabled, make sure you import only starter/app/build.gradle and not the entire
repository.
Now that you have imported the project into Android Studio, you're ready to run the app for the
first time.
Connect your Android device via USB to your computer, or start the Android Studio emulator, and
click Run in the Android Studio toolbar.
In order to keep this codelab simple and focused on the machine learning bits, the starter app
contains some boilerplate code that does a few things for you:
It contains some stock images for you to try out object detection on an Android emulator.
It has a convenient method to draw the object detection result on the input bitmap.
fun runObjectDetection(bitmap: Bitmap) This method is called when you choose a preset
image or take a photo. bitmap is the input image for object detection. Later in this codelab,
you will add object detection code to this method.
data class DetectionResult(val boundingBoxes: Rect, val text: String) This is a data class
that represents an object detection result for visualization. boundingBoxes is the rectangle
where the object is located, and text is the detection result string to display together with the
object's bounding box.
fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<DetectionResult>):
Bitmap This method draws the object detection results in detectionResults on the input
bitmap and returns the modified copy of it.
Now you'll build a prototype by integrating a pre-trained TFLite model that can detect common
objects into the starter app.
There are several object detector models on TensorFlow Hub that you can use. For this codelab,
you'll download the EfficientDet-Lite Object detection model, trained on the COCO 2017 dataset,
optimized for TFLite, and designed for performance on mobile CPU, GPU, and EdgeTPU.
Next, use the TFLite Task Library to integrate the pre-trained TFLite model into your starter app.
The TFLite Task Library makes it easy to integrate mobile-optimized machine learning models
into a mobile app. It supports many popular machine learning use cases, including object detection,
image classification, and text classification.
TFLite Task Library only supports TFLite models that contain valid metadata. You can find more
supported object detection models from this TensorFlow Hub collection.
1. Copy the model that you have just downloaded to the assets folder of the starter app.
You can find the folder in the Project navigation panel in Android Studio.
2. Name the file model.tflite.
Go to the app/build.gradle file and add this line into the dependencies configuration:
implementation 'org.tensorflow:tensorflow-lite-task-vision:0.3.1'
To be sure that all dependencies are available to your app, you should sync your project with
gradle files at this point. Select Sync Project with Gradle Files from the Android Studio
toolbar.
(If this button is disabled, make sure you import only starter/app/build.gradle, not the entire
repository.)
There are only 3 simple steps with 3 APIs to load and run an object detection model: create a
TensorImage, create an ObjectDetector instance, and feed the image to the detector.
You achieve these inside the function runObjectDetection(bitmap: Bitmap) in the file MainActivity.kt.
/**
* TFLite Object Detection Function
*/
private fun runObjectDetection(bitmap: Bitmap) {
//TODO: Add object detection code here
}
org.tensorflow.lite.support.image.TensorImage
org.tensorflow.lite.task.vision.detector.ObjectDetector
The images you'll use for this codelab are going to come from either the on-device camera, or
preset images that you select on the app's UI. The input image is decoded into the Bitmap format
and passed to the runObjectDetection method.
TFLite provides a simple API to create a TensorImage from Bitmap. Add the code below to the
top of runObjectDetection(bitmap:Bitmap):
TFLite Task Library follows the Builder Design Pattern. You pass the configuration to a builder,
then acquire a detector from it. There are several options to configure, including those to adjust the
sensitivity of the object detector:
max result (the maximum number of objects that the model should detect)
score threshold (how confident the object detector should be to return a detected object)
Initialize the object detector instance by specifying the TFLite model file name and the
configuration options:
Feed Image(s) to the detector
Add the following code to fun runObjectDetection(bitmap:Bitmap). This will feed your images to
the detector.
// Step 3: feed given image to the model and print the detection result
val results = detector.detect(image)
boundingBox: the rectangle declaring the presence of an object and its location within the
image
categories: what kind of object it is and how confident the model is with the detection
result. The model returns multiple categories, and the most confident one is first.
Add the following code to fun runObjectDetection(bitmap:Bitmap). This calls a method to print
the object detection results to Logcat.
Start by importing the starter app into Android Studio.
Go to Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the product-
search/codelab2/android/final folder from the source code you downloaded earlier.
Run the starter app
Now that you have imported the project into Android Studio, you are ready to run the app for the
first time.
Connect your Android device via USB to your host, or start the Android Studio emulator, and click
Run in the Android Studio toolbar.
Now the app should have launched on your Android device. It is already functioning, but it uses
the demo product search backend that Google has deployed for you.
Next, you'll update the app to use the backend you built earlier in this codelab.
Go to the ProductSearchAPIClient class and you will see the configs of the product search
backend already defined. Comment out the configs of the demo backend:
// Option 2: Go through the Vision API Product Search quickstart and deploy to your project.
// Fill in the const below with your project info.
const val VISION_API_URL = "https://vision.googleapis.com/v1"
const val VISION_API_KEY = "YOUR_API_KEY"
const val VISION_API_PROJECT_ID = "YOUR_PROJECT_ID"
const val VISION_API_LOCATION_ID = "YOUR_LOCATION_ID"
const val VISION_API_PRODUCT_SET_ID = "YOUR_PRODUCT_SET_ID"
VISION_API_KEY is the API key that you created earlier in this codelab.
Run it:
Now click Run in the Android Studio toolbar. Once the app loads, tap any preset image, select a
detected object, and tap the Search button to see the search results. The app is now using the
product search backend that you have just created!
ASSESSMENT:
1. What API allows you to query an image and search for visually similar products from
a product catalog?
Visual API Product Search
Picture API Product Search
Vision API Product Search
Sight API Product Search
2.Which of the following product categories does Vision API Product Search support?
Choose as many answers as you see fit.
Homegoods
Apparel
Toys
Food
Packaged goods
General
Machinery
3. It is strongly recommended that you restrict access to the ______ to prevent unauthorized
access.
IDE
API calls
API key
Mobile app
Go further with image classification
All of the code to follow along has been prepared for you and is available to execute using Google
Colab here. If you don't have access to Google Colab, you can clone the repo and use the notebook
called CustomImageClassifierModel.ipynb which can be found in the ImageClassificationMobile-
>colab directory.
The easiest way to do this is to create a .zip or .tgz file containing the images, sorted into
directories. For example, if you use images of daisies, dandelions, roses, sunflower and tulips, you
can organize them into directories like this:
Zip that up and host it on a server, and you'll be able to train models with it. You'll use one that has
been prepared for you in the rest of this lab.
This lab will assume you are using Google Colab to train the model. You can find colab at
colab.research.google.com. If you're using another environment you may have to install a lot of
dependencies, not least TensorFlow.
1. Install TensorFlow Lite Model Maker. You can do this with a pip install. The &>
/dev/null at the end just suppresses the output. Model Maker outputs a lot of stuff that
isn't immediately relevant. It's been suppressed so you can focus on the task at hand.
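The install command implied by that description might look like the following (run in a Colab cell; tflite-model-maker is the standard pip package name):

!pip install -q tflite-model-maker &> /dev/null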
2. Next you'll need to import the libraries that you need to use and ensure that you are using
TensorFlow 2.x:
# Imports and check that we are using TF2.x
import numpy as np
import os
from tflite_model_maker import configs
from tflite_model_maker import ExportFormat
from tflite_model_maker import model_spec
from tflite_model_maker import image_classifier
from tflite_model_maker.image_classifier import DataLoader
import tensorflow as tf
assert tf.__version__.startswith('2')
tf.get_logger().setLevel('ERROR')
Now that the environment is ready, it's time to start creating your model!
If your images are organized into folders, and those folders are zipped up, then if you download the
zip and decompress it, you'll automatically get your images labelled based on the folder they're in.
This directory will be referenced as data_path.
data_path = tf.keras.utils.get_file(
'flower_photos',
'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
untar=True)
This data path can then be loaded into a neural network model for training with TensorFlow Lite
Model Maker's ImageClassifierDataLoader class. Just point it at the folder and you're good to go.
One important element in training models with machine learning is to not use all of your data for
training. Hold back a little to test the model with data it hasn't previously seen. This is easy to do
with the split method of the dataset that comes back from ImageClassifierDataLoader. By passing
a 0.9 into it, you'll get 90% of it as your training data, and 10% as your test data:
data = DataLoader.from_folder(data_path)
train_data, test_data = data.split(0.9)
Now that your data is prepared, you can create a model using it.
Create the Image Classifier Model
Model Maker abstracts a lot of the specifics of designing the neural network so you don't have to
deal with network design, and things like convolutions, dense, relu, flatten, loss functions and
optimizers. For a default model, you can simply use a single line of code to create a model by
training a neural network with the provided data:
model = image_classifier.create(train_data)
When you run this, you'll see output that looks a bit like the following:
Model: "sequential_2"
None
Epoch 1/5
103/103 [===] - 15s 129ms/step - loss: 1.1169 - accuracy: 0.6181
Epoch 2/5
103/103 [===] - 13s 126ms/step - loss: 0.6595 - accuracy: 0.8911
Epoch 3/5
103/103 [===] - 13s 127ms/step - loss: 0.6239 - accuracy: 0.9133
Epoch 4/5
103/103 [===] - 13s 128ms/step - loss: 0.5994 - accuracy: 0.9287
Epoch 5/5
103/103 [===] - 13s 126ms/step - loss: 0.5836 - accuracy: 0.9385
The first part is showing your model architecture. What Model Maker is doing behind the scenes is
called Transfer Learning, which uses an existing pre-trained model as a starting point, and just
taking the things that that model learned about how images are constructed and applying them to
understanding these 5 flowers. You can see this in the first line of the model summary, which shows a Hub layer.
The key is the word ‘Hub', telling us that this model came from TensorFlow Hub. By default,
TensorFlow Lite Model Maker uses a model called ‘MobileNet' which is designed to recognize
1000 types of image.
Earlier you split the data into training and test data, so you can get a gauge for how the network
performs on data it hasn't previously seen – a better indicator of how it might perform in the real
world. Evaluate it by using model.evaluate on the test data:
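A sketch of that evaluation call; Model Maker returns the loss and accuracy on the held-back test data:

loss, accuracy = model.evaluate(test_data)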
Note the accuracy here. It's 88.01%, so using the default model in the real world should expect that
level of accuracy. That's not bad for the default model that you trained in about a minute. Of course
you could probably do a lot of tweaking to improve the model, and that's a science unto itself!
Now that the model is trained, the next step is to export it in the .tflite format that a mobile
application can use. Model maker provides an easy export method that you can use — simply
specify the directory to output to.
model.export(export_dir='/mm_flowers')
From here, you'll get a listing of the current directory. Use the indicated button to move "up" a
directory:
In your code you specified to export to mm_flowers directory. Open that, and you'll see a file
called ‘model.tflite'. This is your trained model.
Select the file and you'll see 3 dots pop up on the right. Click these to get a context menu, and you
can download the model from there.
After a few moments your model will be downloaded to your downloads folder.
You're now ready to integrate it into your mobile app! You'll do that in the next lab.
MODULE2: Integrate a custom model into your app
Open it in Android Studio, do whatever updates you need, and when it's ready run the app to be
sure it works. You should see something like this:
It's quite a primitive app, but it shows some very powerful functionality with just a little code.
However, if you want this flower to be recognized as a daisy, and not just as a flower, you'll have
to update the app to use your custom model from the Create a custom model for your image
classifier codelab.
1. Using Android Studio, find the app-level build.gradle file. The easiest way to do this is in
the project explorer. Make sure Android is selected at the top, and you'll see a folder
for Gradle Scripts at the bottom.
2. Open the one that is for the Module, with your app name followed by ‘.app' as shown here
– (Module: ImageClassifierStep1.app):
3. At the bottom of the file, find the dependencies setting. In there you should see the existing
ML Kit image-labeling dependency. The version number might be different. Always find the latest version number from the ML Kit
site at: https://developers.google.com/ml-kit/vision/image-labeling/android
4. Replace this with the custom image labeling library reference. The version number for this
can be found at: https://developers.google.com/ml-kit/vision/image-labeling/custom-
models/android
implementation 'com.google.mlkit:image-labeling-custom:16.3.1'
5. Additionally, you'll be adding a .tflite model that you created in the previous lab. You don't
want this model to be compressed when Android Studio compiles your app, so make sure
you use this setting in the Android section of the same build.gradle file:
aaptOptions {
    noCompress "tflite"
}
Make sure it's not within any other setting. It should be nested directly under the android tag.
Here's an example:
In the previous codelab you created your custom model and downloaded it as model.tflite.
In your project, find your assets folder that currently contains flower1.jpg. Copy the model to that
folder as follows:
1. Right-click the Assets folder in Android Studio. In the menu that opens, select Reveal in
Finder. (‘Show in Explorer' on Windows, and ‘Show in Files' on Linux.)
2. You'll be taken to the directory on the file system. Copy the model.tflite file into that
directory, alongside flower1.jpg.
Android Studio will update to show both files in your assets folder:
The first step will be to add some code to load the custom model.
1. In your MainActivity file, add the following to your onCreate, immediately below the line
that reads setContentView(R.layout.activity_main).
This will use a LocalModel to build from the model.tflite asset. If Android Studio complains by
turning ‘LocalModel' red, press ALT + Enter to import the library. It should add an import to
com.google.mlkit.common.model.LocalModel for you.
val localModel = LocalModel.Builder()
.setAssetFilePath("model.tflite")
.build()
Previously, in your btn.setOnClickListener handler you were using the default model. It was set
up with this code:
You could effectively filter out lower quality results by using a high confidence threshold. Setting
this to 0.9 for example wouldn't return any label with a confidence lower than that. The
setMaxResultCount() is useful in models with a lot of classes, but as this model only has 5, you'll
just leave it at 5.
Now that you have options for the labeler, you can change the instantiation of the labeler to:
The rest of your code will run without modification. Give it a try!
Here you can see that this flower was now identified as a daisy with a .959 probability!
Let's say you added a second flower image, and reran with that:
It identifies it as a rose.
1. First you'll need the app from the first Codelab. If you have gone through the lab, it will be
called ImageClassifierStep1. If you don't want to go through the lab, you can clone the finished
version from the repo. Please note that the pods and .xcworkspace aren't present in the repo, so
before continuing to the next step be sure to run ‘pod install' from the same directory as the
.xcodeproj.
2. Open ImageClassifierStep1.xcworkspace in Xcode. Note that you should use the .xcworkspace
and not the .xcodeproj because you have bundled ML Kit using pods, and the workspace will
load these.
The first app used a pod file to get the base ML Kit Image Labeler libraries and model. You'll
need to update that to use the custom image labelling libraries.
1. Find the file called podfile in your project directory. Open it, and you'll see something like
this:
target 'ImageClassifierStep1' do
pod 'GoogleMLKit/ImageLabeling'
end
2. Update the pod to use the custom image labeling library instead, for example pod 'GoogleMLKit/ImageLabelingCustom'.
3. Once you're done, use the terminal to navigate to the directory containing the podfile
(as well as the .xcworkspace) and run pod install.
After a few moments the MLKitImageLabeling libraries will be removed, and the custom ones
added. You can now open your .xcworkspace to edit your code.
1. With the workspace open in Xcode, drag the model.tflite onto your project. It should be in
the same folder as the rest of your files such as ViewController.swift or Main.storyboard.
2. A dialog will pop up with options for adding the file. Ensure that Add to Targets is
selected, or the model won't be bundled with the app when it's deployed to a device.
Note that the ‘Add to Targets' entry will have ImageClassifierStep1 if you started from that and
are continuing through this lab step-by-step or ImageClassifierStep2 (as shown) if you jumped
ahead to the finished code.
This will ensure that you can load the model. You'll see how to do that in the next step.
1. Open your ViewController.swift file. You may see an error on the ‘import
MLKitImageLabeling' at the top of the file. This is because you removed the generic image
labeling libraries when you
updated your pod file. Feel free to delete this line, and update with the following:
import MLKitVision
import MLKit
import MLKitImageLabelingCommon
import MLKitImageLabelingCustom
It might be easy to speed read these and think that they're repeating the same code! But it's
"Common" and "Custom" at the end!
2. Next you'll load the custom model that you added in the previous step. Find the
getLabels() func. Beneath the line that reads visionImage.orientation =
image.imageOrientation, add these lines:
3. Find the code for specifying the options for the generic ImageLabeler. It's probably giving
you an error since those libraries were removed:
Replace that with this code, to use a CustomImageLabelerOptions, and which specifies the local
model:
let options = CustomImageLabelerOptions(localModel: localModel)
...and that's it! Try running your app now! When you try to classify the image it should be more
accurate – and tell you that you're looking at a daisy with high probability!
Let's say you added a second flower image, and reran with that:
The app successfully detected that this image matched the label ‘roses'!
The resulting app is, of course, very limited because it relied on bundled image assets. However,
the ML part is working nicely. You could, for example, use AndroidX Camera to take frames
from a live feed and classify them to see what flowers your phone recognizes!
From here the possibilities are endless – and if you have your own data for something other than
flowers, you have the foundations of what you need to build an app that recognizes them using
Computer Vision.
ASSESSMENT:
1. Model Maker abstracts a lot of the specifics of designing the neural network so
you don't have to deal with network design, and things like:
Convolutions
Dense
Relu
Flatten
File type
Loss function
Optimizers
Pixels
2. True or false? The confidence threshold sets a bar for the quality of predictions to return.
True
False
CONCLUSION
An AI-ML virtual internship also enhances your resume and professional profile,
making you a more attractive candidate to potential employers. It demonstrates your ability to
adapt to remote working environments, collaborate with a team, and manage projects
independently. Ultimately, the internship prepares you for a successful career in the rapidly
evolving landscape of artificial intelligence and machine learning, positioning you at the
forefront of technological innovation.