Autonomous Robot Operation
and Technology
Mechanical Engineering
Special Project Report
01/10/2023
1. Introduction
Figure 1 illustrates the proposed algorithm for the task. The architecture is composed of two main phases. In the first stage, the fixed-camera image is analyzed in search of the object of interest in the workspace using a pre-trained CNN. The neural network returns a set of possible targets with their confidence scores and bounding-box corner coordinates. The center coordinates of the highest-confidence target are converted into real-world robot Cartesian coordinates, which are subsequently sent to the manipulator through Ethernet communication. The second phase starts as the end effector is brought close to the object, with the in-hand camera detecting the orientation of the ChAruco board on which the object stands. This orientation is converted to roll-pitch-yaw angles and is used to position the end effector normal to the ChAruco board surface. Finally, the ChAruco board itself is used to acquire the depth of the target, and the object position is detected by a CNN model, allowing the robot's gripper to grasp the target.
For the first stage of the algorithm, the approach to the target, object detection techniques are employed. Object detection is the task of detecting instances of objects of a specific class within an image or video [1]. It locates the objects present in an image and encloses them inside bounding boxes with their corresponding types or class labels attached.
Object detection algorithms combine two tasks: image classification and object localization.
Image classification algorithms predict the class or type of an object in the image based on a predefined set of classes on which the algorithm was previously trained. For example, given an image with a single object as input, as seen in Figure 2, the output generated will be the class or label of the corresponding object and the probability of the prediction.
Object localization algorithms enclose an object in the image within a bounding box. Again, we have an image with one or more objects as input. However, this time the output will be the locations of the bounding boxes, given by their position, height, and width. The differences between these tasks can be appreciated in the figure below.
Figure 2. Differences between image classification, object localization, and object detection,
respectively.
The problem of detecting and localizing the object can be solved using object detection algorithms such as R-CNN [2], Fast R-CNN [3], or YOLO [4]. In the present work, a variation of the YOLO network is employed to perform the aforementioned task. YOLO stands for You Only Look Once and is one of the most popular models used in object detection and computer vision. This algorithm uses a neural network-based approach to make predictions on the input images, achieving high accuracy at higher speeds than other approaches.
YOLO combines the separate components of object detection into a single neural network. The network predicts each bounding box using features from the entire image. Additionally, it simultaneously predicts all bounding boxes for an image across all classes. This implies that the network considers the entire image and all its objects when making decisions. The YOLO design maintains high average precision while enabling end-to-end training and real-time speeds. The system divides the input image into an S x S grid. A grid cell is responsible for detecting an object if the object's center falls within that cell. Each grid cell predicts B bounding boxes and their corresponding confidence scores. These confidence scores reflect how confident the model is that the box contains an object and how accurate it believes the predicted box to be. The confidence score should be zero if there is no object present in that cell; otherwise, the desired confidence score is given by the intersection over union (IOU) between the predicted box and the ground truth (Figure 3). A simplified diagram of the overall process can be appreciated in Figure 4 [4].
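To make the IOU term concrete, the sketch below computes it for two boxes given in corner format (x1, y1, x2, y2); the corner parametrization and the function name are assumptions for illustration, since YOLO internally predicts boxes as center, width, and height.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```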
After labeling, the dataset was divided into 324 images for training and 40 images for validation. The training was carried out using the YOLOv5 custom training notebook available on Google Colab [5]. The performance of the trained model is measured by mAP, or mean Average Precision. mAP is the average of the Average Precision metric across all classes in a model; it can be used to compare both different models on the same task and different versions of the same model, and it is measured between 0 and 1 [8]. The following chart summarizes the results of our model.
The model was loaded using the method in [9], which retrieves the detected classes together with their confidence scores and bounding-box coordinates.
With the coordinates obtained from the model, we were able to draw the bounding boxes of the objects and roughly find the position of the centroid in pixels. Then, the position of the centroid is transformed into real-world coordinates with a technique described later in this paper. An overall visual representation of the data obtained can be seen in the figure below; a minimal code sketch of this detection step follows.
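The sketch below loads custom YOLOv5 weights through PyTorch Hub, keeps the highest-confidence detection, and marks its bounding box and pixel centroid. The file names ('best.pt', 'workspace.jpg') are assumptions for illustration, not the exact code used in this work.

```python
import cv2
import torch

# Load the custom-trained YOLOv5 weights through PyTorch Hub (assumed file name).
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

image = cv2.imread("workspace.jpg")                   # frame from the fixed camera
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)          # YOLOv5 expects RGB input
detections = model(rgb).xyxy[0].cpu().numpy()         # rows: x1, y1, x2, y2, conf, class

if len(detections) > 0:
    # Keep the highest-confidence detection and compute its centroid in pixels.
    x1, y1, x2, y2, conf, cls = detections[detections[:, 4].argmax()]
    u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)
    cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
    cv2.circle(image, (u, v), 4, (0, 0, 255), -1)
```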
Once the center of the object of interest is detected in the first stage, the pixel coordinates given by the fixed camera need to be converted into real-world measurements in robot coordinates. We may represent the projection from 3D points in the world to 2D points in the image plane of a camera as:
$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & p_x & 0 \\ 0 & f_y & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R^{C}_{W} & t^{C}_{W} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}
\qquad (1)
$$
where u and v are the pixel coordinates given by the camera, and $f_x$, $f_y$, $p_x$, and $p_y$ are the focal lengths of the camera along the x- and y-axes and the principal point coordinates along the x- and y-axes, respectively. All the parameters inside this matrix are called the intrinsic camera parameters and are known from a previous camera calibration using the method explained in [10]. $\begin{bmatrix} R^{C}_{W} & t^{C}_{W} \\ 0 & 1 \end{bmatrix}$ represents the extrinsic camera parameters, a linear transformation from world coordinates to camera coordinates; a variation of the method presented in [11] is used in this work to acquire this transformation. Finally, $X_W$, $Y_W$, and $Z_W$ represent world coordinates. Since the camera is calibrated and its pose and height above the table are fixed, the equation has only two degrees of freedom: only $X_W$ and $Y_W$ can vary when the object moves around the table. Thus, mapping a (u, v) pixel coordinate to a real-world $(X_W, Y_W, Z_W)$ coordinate on the table is straightforward; the result of the transformation can be seen in Figure 7.
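A minimal sketch of this planar mapping is shown below, assuming the intrinsics and extrinsics of the fixed camera are available as NumPy arrays and that the table defines the plane $Z_W = z_{table}$; the function and variable names are illustrative, not the exact implementation used in this work.

```python
import numpy as np

def pixel_to_table(u, v, K, R, t, z_table=0.0):
    """Map a pixel (u, v) to world coordinates (X_W, Y_W, z_table) on the table plane.

    K       : 3x3 intrinsic matrix [[fx, 0, px], [0, fy, py], [0, 0, 1]]
    R, t    : extrinsic rotation (3x3) and translation (3,), world -> camera
    z_table : known height of the table plane in world coordinates
    """
    P = K @ np.hstack([R, t.reshape(3, 1)])           # 3x4 projection matrix, as in Eq. (1)
    # On the plane Z_W = z_table the projection collapses to a 3x3 homography.
    H = np.column_stack([P[:, 0], P[:, 1], P[:, 2] * z_table + P[:, 3]])
    xy1 = np.linalg.solve(H, np.array([u, v, 1.0]))   # homogeneous solution, up to scale
    xy1 /= xy1[2]                                      # normalize the homogeneous coordinate
    return np.array([xy1[0], xy1[1], z_table])
```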
The purpose of the hand-eye calibration problem is to find the transformation between the coordinate system of the in-hand camera and the robot coordinate system. The essence is to solve the problem AX = XB (see Figure 8 for reference), where X is the transformation from the camera coordinate system to the robot coordinate system. As shown in the formula below, this transformation can be solved using the robot pose transformations and the camera pose transformations with respect to the target.
$$
\left(T^{W}_{g(2)}\right)^{-1} T^{W}_{g(1)}\, T^{g}_{c} = T^{g}_{c}\, T^{c(2)}_{t} \left(T^{c(1)}_{t}\right)^{-1}
\qquad (2)
$$

where $T^{W}_{g(i)}$ is the pose of the gripper in the robot (world) frame at configuration $i$ and $T^{c(i)}_{t}$ is the pose of the target in the camera frame at configuration $i$. Then let $A = \left(T^{W}_{g(2)}\right)^{-1} T^{W}_{g(1)}$, $B = T^{c(2)}_{t} \left(T^{c(1)}_{t}\right)^{-1}$, and $X = T^{g}_{c}$; finally:

$$AX = XB$$
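In practice, this AX = XB problem can be solved with OpenCV's hand-eye calibration routine. The sketch below is one possible implementation under that assumption and is not necessarily the exact procedure used in this work; the input pose lists and variable names are illustrative.

```python
import cv2
import numpy as np

def solve_hand_eye(R_gripper2base, t_gripper2base, R_target2cam, t_target2cam):
    """Solve AX = XB for the camera-to-gripper transformation X.

    R_gripper2base, t_gripper2base : gripper poses in the robot base frame (from the controller)
    R_target2cam, t_target2cam     : ChAruco board poses in the camera frame (from PnP)
    collected over several robot configurations.
    """
    R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
        R_gripper2base, t_gripper2base,
        R_target2cam, t_target2cam,
        method=cv2.CALIB_HAND_EYE_TSAI,
    )
    # Assemble the homogeneous transformation X = T_c^g (camera -> gripper).
    X = np.eye(4)
    X[:3, :3] = R_cam2gripper
    X[:3, 3] = t_cam2gripper.ravel()
    return X
```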
Once the manipulator brings the hand-eye camera close to the target, the ChAruco board below the target is detected and its pose is estimated using Perspective-n-Point (PnP). Since the object of interest can be at any point on the ChAruco surface, and its frame orientation is at fixed angles to the ChAruco, we are only interested in the surface orientation of the ChAruco; the object position is acquired with our trained CNN model. Once the ChAruco orientation is acquired by solving the PnP problem, we obtain the rotation between the hand-eye camera frame and the target frame, $R^{T}_{C}$, and we transform this rotation to relate the gripper (end-effector) frame with the target frame through the expression $R^{T}_{G} = R^{T}_{C}\, R^{C}_{G}$. Since we cannot move the robot in end-effector frames, we have to apply the so-called similarity transform, which converts a given linear transformation expressed in the camera frame into the same linear transformation expressed in the robot world frame; the similarity transform is expressed in (3). Once the rotation matrix that positions the gripper normal to the surface of the ChAruco is obtained, we parametrize it with the roll-pitch-yaw representation.
$$
R_{w} = \left(R^{G}_{W}\right)^{-1} R^{G}_{C}\, R_{W}
\qquad (3)
$$
Figure 9. Linear transformations between frames.
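As an illustration of the pose-estimation step, the sketch below uses OpenCV's aruco module to recover the board-to-camera rotation. The board geometry, marker dictionary, and function names follow the pre-4.7 OpenCV ChAruco API and are assumptions, not the exact configuration used in this work.

```python
import cv2
import numpy as np

# Assumed board definition: 5x7 squares, 40 mm squares, 30 mm markers.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_100)
board = cv2.aruco.CharucoBoard_create(5, 7, 0.04, 0.03, dictionary)

def charuco_rotation(image, K, dist):
    """Return the board-to-camera rotation matrix, or None if the board is not seen."""
    corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)
    if ids is None:
        return None
    _, ch_corners, ch_ids = cv2.aruco.interpolateCornersCharuco(corners, ids, image, board)
    if ch_ids is None or len(ch_ids) < 4:
        return None
    ok, rvec, tvec = cv2.aruco.estimatePoseCharucoBoard(
        ch_corners, ch_ids, board, K, dist, None, None)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation of the board frame expressed in the camera frame
    return R
```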
Once the gripper is normal to the surface of the ChAruco, as shown in Figure 10, the CNN model is used again to detect the position of the target in pixels. Using the previously introduced method, the same mapping is done from (u, v) pixel coordinates into $(X_C, Y_C, Z_C)$ camera coordinates, where the $Z_C$ component is extracted from the ChAruco board. The linear transformation from camera coordinates into world coordinates is now used, as shown below:
$$P_{W} = T^{W}_{C}\, P_{C}$$
$$P_{W} = T^{W}_{G}\, T^{G}_{C}\, P_{C}$$
$P_{C}$ represents the camera coordinates of the target, $P_{W}$ the world coordinates of the target, and $T^{G}_{C}$ is the homogeneous transformation from camera coordinates to end-effector coordinates, which is known after solving the hand-eye calibration problem.
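A minimal sketch of this final chain is shown below, assuming all transforms are available as 4x4 homogeneous NumPy matrices (the gripper-to-world pose from the controller and the camera-to-gripper transform from the hand-eye calibration); the names are illustrative.

```python
import numpy as np

def camera_point_to_world(p_cam_xyz, T_gripper_to_world, T_cam_to_gripper):
    """Apply P_W = T_G^W  T_C^G  P_C to a target point expressed in camera coordinates."""
    p_cam = np.append(np.asarray(p_cam_xyz, dtype=float), 1.0)   # homogeneous point
    p_world = T_gripper_to_world @ T_cam_to_gripper @ p_cam
    return p_world[:3]
```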
Figure 10. Orientation of the gripper normal to the surface and reposition towards the object of interest.
Since the orientation and coordinates of the object are now known, the object
can be grasped and moved without much trouble.
The computer and the iRX6 digital servo controller (robot controller) are linked by Ethernet communication. The computer runs a Python program that implements the object detection model and the 2D pose estimation algorithm and interacts with external devices such as the fixed and hand-eye cameras, the 6-DOF robot manipulator, and the gripper. The iRX6 receives commands or messages from the computer to drive the manipulator towards the target with the desired pose to perform the grasp. Figure 11 shows the overall interface between the computer and the robot controller; a minimal sketch of such a command exchange follows.
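The sketch below sends a Cartesian target pose to the controller over a TCP socket. The IP address, port, and message format are assumptions made for illustration only; they are not the documented protocol of the iRX6 controller.

```python
import socket

CONTROLLER_IP = "192.168.0.10"   # hypothetical controller address
CONTROLLER_PORT = 5000           # hypothetical command port

def send_target_pose(x, y, z, roll, pitch, yaw):
    """Send a Cartesian target pose (mm / degrees) to the robot controller and wait for a reply."""
    message = f"MOVE {x:.1f} {y:.1f} {z:.1f} {roll:.2f} {pitch:.2f} {yaw:.2f}\n"
    with socket.create_connection((CONTROLLER_IP, CONTROLLER_PORT), timeout=5.0) as sock:
        sock.sendall(message.encode("ascii"))
        reply = sock.recv(1024)  # controller acknowledgement
    return reply.decode("ascii", errors="replace")
```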
As seen in Figure 11, the manipulator initially awaits instructions from the computer. Once the image from the fixed camera is processed and the object is detected, its robot coordinates are acquired and sent to the robot servo controller to approach the target. Immediately after, the ChAruco board pose is detected from the in-hand camera image and used to orient the end-effector. Then, the 3D position of the object is detected, with the depth acquired by solving the PnP problem using landmarks from the ChAruco board. Since the pose of the object is now fully defined, the manipulator is able to grasp the target and place it at the desired position. Finally, the robot arm returns to its initial position, ending the process.
4. Conclusion
The use of the ChAruco board alongside CNN object detection proved feasible for performing pick-and-place tasks. However, the sequence of motions is not smooth enough to compete with methods that use depth-sensing devices, such as the one in [13].
In the near future we hope to update the method presented in this project so that it requires less intrusive landmarks near the target, enhancing its versatility and range of applications while retaining its performance and low cost.
References
[4] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You only look once:
Unified, real-time object detection," 2016 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), 2016.
[10] Z. Zhang, "A flexible new technique for camera calibration," IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11,
2000.
[11] G. An, S. Lee, M.-W. Seo, K. Yun, W.-S. Cheong and S.-J. Kang,
"Charuco board-based omnidirectional camera calibration method,"
Electronics, vol. 7, no. 12, 2018.
[12] T. A. Myhre, "Robot camera calibration," [Online]. Available:
https://www.torsteinmyhre.name/snippets/robcam_calibration.html.
[13] T.-T. Le, T.-S. Le, Y.-R. Chen, J. Vidal and C.-Y. Lin, "6D pose estimation
with combined deep learning and 3D vision techniques for a fast and accurate
object grasping," Robotics and Autonomous Systems, vol. 141, 2021.