

A Comparative Study on Construction of 3D Objects from 2D Images

Mohan Mahanty, Panja Hemanth Kumar, Manjeti Sushma, Illapu Tarun Chand, Kombathula Abhishek, and Chilukuri Sai Revanth Chowdary

Abstract The construction of 3D images from 2D images is a longstanding problem, explored for decades by numerous computer graphics, computer vision, and machine learning research groups. Since the evolution of deep learning architectures, scholars have shown interest in constructing 3D images from 2D greyscale or RGB images, because this perspective has a significant influence on the discipline of computer vision. Applications of this conversion are found in medical image analysis, robotic vision, game design, lunar exploration, 3D modelling, the military, geographical structuring, physics, support models, etc. Once a 2D image is converted to a 3D representation, the same scene can be viewed from different angles and directions. The generated 3D structure is much more informative than the 2D image, as it contains information about the distance from the camera to each object. In this paper, we discuss various existing methods for generating 3D representations from 2D images, both with and without 3D representational data, and propose a novel approach for constructing 3D models from existing 2D images using GANs. Generative adversarial networks (GANs) have shown tremendous results in generating new fake data from existing data, such that the false data cannot be detected. Other GAN architectures, such as HOLO-GAN and IG-GAN, have also been proposed to meet the need for converting 2D to 3D representations and have produced excellent results. After this analysis, we provide an extensive comparative review of methods and architectures that can convert 2D images to 3D objects and express our thoughts on the proposed ideas. Further, the GAN concept can be extended to represent 360° views and panoramic images as 3D structures, which plays a vital role in spherical view analysis and synthesis, virtual reality design, augmented reality design, 3D modelling of data, etc.

M. Mahanty (B) · P. H. Kumar · I. T. Chand · K. Abhishek · C. S. R. Chowdary
Department of Computer Science and Engineering, Vignan’s Institute of Information Technology, Visakhapatnam, India

M. Sushma
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology & Sciences, Visakhapatnam, India
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
S. K. Saha et al. (eds.), Smart Technologies in Data Science and Communication, Lecture Notes in Networks and Systems 210. https://doi.org/10.1007/978-981-16-1773-7_17

Keywords 2D images · 3D images · Deep learning · Generative adversarial network (GAN)

1 Introduction

The construction of 3D objects from 2D images is attracting many researchers around the globe because of its exciting applications. Existing 2D computer vision has been drastically extended to 3D vision, which is more informative in terms of views, camera angles, depth cues, light, and other factors affecting the 3D view. In the modern world, the birth of technologies such as virtual reality and augmented reality, and of others involving 3D modelling, shows the importance of perceiving the 2D world directly as a 3D world. A 3D view provides more information and gives a better experience than a 2D image. Booming technologies have emerged around this idea, yet far more extensive research has been done on 2D images than on 3D objects, so there is a clear need to convert 2D images to 3D structures to fill the gap between them. Computer science techniques can help construct 3D structures from 2D images in various ways, and many researchers have worked extensively in this field. The key aspect of creating 3D structures from 2D images is depth, which is what distinguishes a 3D representation from a 2D image. One method calculates the depth of the image by estimating its depth map; another finds the contours of the given image and merges these contours to construct the 3D structure. In this manner, many scholars have proposed work using different approaches. Deep learning has changed the approach to constructing 3D from a 2D image: many scholars started using convolutional neural networks (CNNs) in the conversion process, as CNNs are an outstanding approach to computer vision and to this particular problem of converting 2D images to 3D structures. Scholars have developed techniques using both convolutional neural networks (CNNs) and generative adversarial networks (GANs).

2 Literature Review

The main aim of constructing 3D structures from 2D images is to estimate the depth of each object in the 2D image. Depth can be calculated in various ways; even the contours of an image can help in estimating it [1]. Kadhim et al. proposed visualizing a 3D object from 2D images by finding the contours of a given image and merging them to construct the 3D object. Here, the 3D object is a 3D volume of parallel slices containing useful information in the form of numerical data. Two methods are described, surface rendering and volume rendering: surface rendering extracts and merges contours, whereas volume rendering preserves high-resolution data but requires far more information. Even though a 3D object is constructed, it carries very little information, which may not be useful for further analysis. Calculating the depths of all objects in a 2D image gives rise to a depth map, and Eigen et al. proposed a deep network for depth-map prediction [2]. As shown in Fig. 1, constructing a 3D structure from a given 2D image is an arduous job, as 2D images carry very little information, which may not be sufficient. Eigen et al., in their research article, take both local and global information of the given image as different cues, as shown in Fig. 2. Following the aspects explained in the research article by Galabov [3], the algorithms used to convert 2D images to 3D objects based on cues are divided into two types: monocular depth cues and multi-ocular depth cues.

Fig. 1 Deep network architecture of both global coarse-scale and local fine-scale networks [2]
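Several of the surveyed methods first predict a per-pixel depth map and then lift it into 3D. As an illustration of that lifting step, the following is a minimal sketch assuming a simple pinhole camera model; the function name and the intrinsic values (fx, fy, cx, cy) are our own placeholders, not taken from any cited paper.

```python
import numpy as np

def depth_map_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject an (H, W) depth map into an (H*W, 3) point cloud
    using a pinhole camera: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy example: a synthetic depth map with hypothetical intrinsics.
depth = np.full((480, 640), 2.0)            # every pixel 2 m away
points = depth_map_to_point_cloud(depth, fx=525.0, fy=525.0,
                                  cx=319.5, cy=239.5)
print(points.shape)  # (307200, 3)
```

Merging such point clouds from several views is what allows the invisible parts of an object to be inferred, as discussed below.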
With the advancements in neural networks, Sinha et al. proposed constructing 3D surfaces using deep residual networks [4]. Their work classifies objects into rigid and non-rigid, and different architectures are proposed to generate surfaces for each. As shown in Fig. 3, for a rigid body the network also produces the viewpoint for each input image as a piece of extra information. However, the method proposed by Sinha et al. is limited to genus-0 surfaces.
Fig. 2 a Input, b output of coarse scale, c refined output of fine scale, d ground truth [2]

Fig. 3 Generative network for geometry-image feature channel for rigid shape [4]

Extensive work on constructing 3D models from single images using convolutional neural networks was contributed by Tatarchenko et al. [5]. As shown in Fig. 4, their architecture outputs an RGB image of the particular object as well as the depth map of the given image, which contains valuable information about the geometry of the 3D structure. Because of this valuable information, a voxel-based 3D model can be generated, which can further be used for analysis. By merging different views, a 3D point cloud is generated, which is useful for inferring the invisible parts of the object. Tatarchenko et al. improved the quality of the generated images, as shown in Fig. 5, use an elegant architecture, and, interestingly, the method also applies to non-homogeneous backgrounds.
Fig. 4 Architectural design of the encoder–decoder network [5]

Fig. 5 Leftmost images show the input; network-generated images are in the top row, and ground truth below [5]

After this extensive research on convolutional neural networks (CNNs), scholars started migrating to generative adversarial networks (GANs) because of their generative capability. GANs are regarded as one of the most important recent inventions in deep learning because, by design, they can generate fake data from existing original data that is indistinguishable from it. Goodfellow et al. first introduced generative adversarial networks (GANs) [6]. Scholars have since used this simple idea to generate data in different domains such as computer vision, robotic vision, natural language processing, time-series data, music, and voice. Gui et al. surveyed the capabilities of different GANs [7] in generating data from existing data.
Generative adversarial networks (GANs) are a hybrid deep learning model consisting of two simultaneously and dynamically trained networks. Generative adversarial nets, developed by Goodfellow et al. [6], have had a major impact on generating data from existing patterns of data that is indistinguishable from the original. The model consists of two dynamically trained networks: the first (the generator) is trained to produce fake data, and the second (the discriminator) is trained to distinguish the generated data from real data. Several variations on the basic idea have been developed, such as the cycle-consistency loss mechanism by Zhu et al. [8]. The modes of the output can even be controlled, as proposed by Mirza et al. [9], a primary extension of the work of Goodfellow et al. [6]. Different variants of GAN with different features are proposed in [10–14], and extensive reviews of GANs, their applications, and methods can be found in [7, 15].
As shown in Fig. 6, a generative adversarial network (GAN) mainly consists of two parts, a generator and a discriminator. The generator takes a random noise vector as input and generates fake data. The discriminator takes input from two sources: fake data from the generator and real data from the input source. The discriminator outputs the probability that its input is real. Based on this probability, the error is back-propagated to the generator so that it produces fake data even more similar to the real data. After each iteration, the discriminator's weights and biases are updated to maximize its classification accuracy, i.e. to predict correctly, with as high a probability as possible, that input data is real and generated data is fake. The generator's weights and biases are updated to maximize the probability that the discriminator misclassifies fake generated data as real. This process continues until a particular state, called an equilibrium, is achieved. When two players are involved in a game, the state in which neither player can improve their performance by changing strategy is termed a Nash equilibrium. We provide a dataset of real data that the generator learns to emulate, producing fake data of near-perfect quality. The discriminator network is given real data as input (x); a vector z, generated from raw random input, is given to the generator network and serves as the starting point for synthesizing fake data. The process described above is pictured in Fig. 6.

Fig. 6 Architecture of a generative adversarial network
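To make this training procedure concrete, the following is a minimal PyTorch sketch of alternating discriminator/generator updates on vector data. The network sizes, learning rates, and the random stand-in for a real-data batch are placeholders of our own choosing, not the configuration of any paper reviewed here.

```python
import torch
import torch.nn as nn

# Tiny generator G(z) and discriminator D(x) over 784-dimensional data.
G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                  nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(32, 784)   # placeholder for a batch of real data
    z = torch.randn(32, 64)       # random noise vector fed to the generator

    # Discriminator update: push D(real) -> 1 and D(G(z)) -> 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + \
             bce(D(G(z).detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: push D(G(z)) -> 1, i.e. fool the discriminator.
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The `detach()` call in the discriminator step stops gradients from flowing into the generator while the discriminator is being updated; the two networks are otherwise trained in strict alternation, mirroring the two-player game described above.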
The generator network is denoted by the function G(·) and the discriminator network by D(·). The generator takes a random vector z, drawn from a density p_z, as input. The discriminator takes real input x, which follows the data density p_t, while the density of the data produced by the generator is p_g. G(·) generates samples with density p_g, whereas D(·) returns a probability. Equation (1) describes the overall error incurred by the generator and discriminator together.

$$E(G, D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_t}\left[1 - D(x)\right] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z}\left[D(G(z))\right]$$

$$E(G, D) = \frac{1}{2}\left(\mathbb{E}_{x \sim p_t}\left[1 - D(x)\right] + \mathbb{E}_{x \sim p_g}\left[D(x)\right]\right) \tag{1}$$
During training, the generator tries to maximize this error while the discriminator tries to minimize it. Since the two networks compete with each other, training can be treated as a minimax game, described mathematically in Eq. (2).

$$\max_G \min_D E(G, D) \tag{2}$$

Equation (3) describes the overall objective of generative adversarial networks.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{3}$$
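A standard result from Goodfellow et al. [6] connects Eq. (3) with the divergences discussed below; we sketch it here because it explains why training drives p_g towards the data density. For a fixed generator, the inner maximum of Eq. (3) is attained at

$$D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

and substituting D* back into the objective gives

$$V(D^{*}, G) = -\log 4 + 2\,\mathrm{JSD}\left(p_{\text{data}} \,\|\, p_g\right)$$

so minimizing over G minimizes the Jensen-Shannon divergence, which vanishes exactly when p_g = p_data.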

The ideal case is when the generator produces the same density as the real data, p_g = p_data, at which point perfect convergence has taken place. The whole method is depicted graphically below. The discriminator network calculates its probability by comparing the two distributions; this comparison between probability distributions can be made with the Kullback-Leibler (KL) divergence. If a distribution p diverges from a second, expected probability distribution q, Eq. (4) defines the KL divergence.


$$D_{KL}(p \,\|\, q) = \sum_{i=1}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)} \tag{4}$$

Because the KL divergence is asymmetric, it cannot serve as a true distance between two distributions. The Jensen-Shannon (JS) divergence is symmetric and can be used to calculate the distance between two probability distributions; a minimal sketch of both divergences follows Table 1. Table 1 describes the functioning of the generator and discriminator networks.
Table 1 Brief description of generator and discriminator networks and their functioning

        Generator network                           Discriminator network
Input   A vector of random numbers (or, in some     Input from two different sources: (1) real
        cases, particular numbers)                  data directly from the source; (2) fake
                                                    data generated by the generator network
Output  Fake data that is maximally similar to      The probability that the generated fake
        the real data in the given dataset          data is similar to the real data
Goal    Continuously generate fake data that is     Continuously distinguish the fake data
        indiscernible from the real data            from the generator network from the real
                                                    data in the training dataset
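As promised above, here is a minimal sketch of the KL and JS divergences on small discrete distributions. The code is our own illustration, not taken from any cited work.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions, per Eq. (4)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p(x_i) = 0 contribute 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js_divergence(p, q):
    """Symmetric Jensen-Shannon divergence via the mixture m = (p + q) / 2."""
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric: values differ
print(js_divergence(p, q), js_divergence(q, p))  # symmetric: values match
```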

Various GAN architectures address the generation of fake or duplicate images that cannot be distinguished from real ones. Generative methods produce novel samples from high-dimensional distributions (Gaussian, Poisson, etc.) of data such as images and voice. A GAN reaches Nash equilibrium when the following conditions are met:
• The generator produces fake data that is indiscernible from the real data in the training dataset, so no further iteration is required.
• The best the discriminator can do is guess randomly whether a particular example is real or fake.
Figures 7 and 8 show the procedure by which the generator learns to generate fake data from distributions [16].

Fig. 7 Architecture of generative adversarial network (GAN) [16]

Fig. 8 Description of the adversarial idea. The blue distribution is real; the orange distribution is generated

GANs suffer from the problem that it is nearly impossible to find their Nash equilibrium, because of the immense complexity of reaching convergence in non-convex games, as stated by Farnia et al. [17]. GAN convergence remains one of the most important open questions in GAN-related research. Generative modelling is the reverse of object recognition: it constructs the image from pixels instead of classifying the pixels. Using this generative modelling technique, we are interested in exploring the construction of 3D structures from 2D images. Zhu et al. [18] proposed a method that trains two networks simultaneously, using a 2D image enhancer to generate a 3D model with the adversarial concept. As shown in Fig. 9, it can learn from both 2D and 3D data: 2D images are given as input to the enhancer network (a deep convolutional neural network), which generates feature vectors that are then given as input to the 3D generator (a GAN).
Smith and Meger proposed an improved adversarial system for 3D object generation and reconstruction [19] in order to capture more complicated distributions. It uses a Wasserstein training objective with a gradient penalty, which improved training stability, and improvements over 3D-GAN [18] are shown clearly in their work. The combination of the IWGAN algorithm with a new generative model is named 3D-IWGAN; its architecture is depicted in Fig. 10.
Chen et al. [23] contributed a differentiable interpolation-based renderer (DIB-R). A 3D structure constructed from 2D images that cannot be used for further analysis is of little value, and because of the rasterization step involved, the related operations are non-differentiable, which makes them inaccessible to many machine learning techniques. Chen et al. therefore proposed a differentiable method that not only generates a 3D object from a single image but also preserves geometry, texture, and light.

Fig. 9 Enhancer network and 3D generator network [18]

Fig. 10 3D-IWGAN architecture [19]

Fig. 11 Difference between conditional GAN and HOLO-GAN [20]

Fig. 12 HOLO-GAN generator network [20]

As shown in Figs. 11 and 12, HOLO-GAN, proposed by Nguyen-Phuoc et al., has shown tremendous results in constructing 3D objects from 2D images [20]. HOLO-GAN provides control over the pose and view of the generated objects through rigid-body transformations of learned 3D features. In a conditional GAN [9], the observed pose information is given as input to the discriminator, whereas HOLO-GAN does not require additional pose labels during training, as pose information is not fed to the discriminator.
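Our reading of [20] is that the key operation is a rigid-body transformation applied to a learned 3D feature tensor. The following is a minimal sketch of such a rotation using PyTorch's grid sampling; the tensor sizes and rotation angle are illustrative assumptions, not HOLO-GAN's actual configuration.

```python
import math
import torch
import torch.nn.functional as F

def rotate_features(feat, angle_deg):
    """Rotate a (B, C, D, H, W) 3D feature tensor about the vertical axis."""
    b = feat.size(0)
    t = math.radians(angle_deg)
    # 3x4 affine matrix for a rotation about the y (height) axis.
    rot = torch.tensor([[math.cos(t), 0.0, math.sin(t), 0.0],
                        [0.0,          1.0, 0.0,         0.0],
                        [-math.sin(t), 0.0, math.cos(t), 0.0]])
    theta = rot.unsqueeze(0).expand(b, -1, -1)
    grid = F.affine_grid(theta, feat.size(), align_corners=False)
    return F.grid_sample(feat, grid, align_corners=False)

feat = torch.randn(2, 64, 16, 16, 16)        # hypothetical learned 3D features
rotated = rotate_features(feat, angle_deg=30.0)
print(rotated.shape)  # torch.Size([2, 64, 16, 16, 16])
```

Because the transformation is applied to features rather than pixels, the generator can be trained on unposed 2D images while still exposing a pose control at generation time.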
In the research article by Nguyen-Phuoc et al. [20], the GAN is integrated with a 3D transformation that randomly rotates the 3D features during training; this is what distinguishes HOLO-GAN from 3D-GAN, as shown in Figs. 13 and 14.
Lunz et al. proposed a scalable training procedure for 3D generative models from given 2D data that employs a non-differentiable renderer [21]. To deal with the problem of non-differentiability, they introduced a proxy neural renderer, which eliminates the problem by allowing back-propagation as well as matching the discriminator output. Figures 15 and 16 depict the architecture and functionality of IG-GAN, including the proxy neural renderer that removes the non-differentiability.

Fig. 13 Results obtained from HOLO-GAN [20]



Fig. 14 Results obtained from 3D-GAN using 3D transformations [20]

Fig. 15 Architecture and training set-up of IG-GAN [21]

3 Results and Analysis

Our extensive analysis shows that finding contours and merging them to estimate the depth of objects suffers from low resolution, a high level of noise, etc.; it is advisable to use other methods, such as estimating the depth map of the image. Depth cues are also helpful, as they carry more information about the image. A deep convolutional network can convert a 2D image into a 3D structure, and convolutional neural networks (CNNs) have also given good results, with further modifications increasing the performance of the network. The development of generative adversarial networks (GANs) changed the scenario, making it possible not only to construct 3D objects but also to analyse them. Comparing the different methods of constructing 3D objects from 2D images, we can say that the results produced by generative adversarial networks are satisfactory on several criteria.

Fig. 16 Results obtained by IG-GAN on the chanterelle mushroom dataset [21]

The results reported by different research articles with different metrics are analysed here. Based on this comparative study, we summarize the results of the different methods with their advantages and disadvantages. As shown in Fig. 17, generating 3D shape surfaces from 2D images using deep neural networks is limited to genus-0 surfaces, and no feedback mechanism is defined. In the article by Sinha et al. [4], both non-rigid and rigid bodies are fed to the network, and the obtained results are shown in Fig. 18.

Fig. 17 Results generated by Surf-Net [4] on a non-rigid body. The topmost row shows depth maps, followed by ground truths and then the generated surfaces

Fig. 18 Results generated by Surf-Net on a rigid body. The topmost row is the given image, and the following row contains the ground truth, followed by the generated surface in an alternate view [4]

In the work of Tatarchenko et al. [5], different views of the 3D object are generated from a single image using convolutional neural networks on the ShapeNet dataset. Tatarchenko et al. used a nearest-neighbour baseline to determine the error in colour and depth. Zhu et al. [18] produced results on the ModelNet dataset (Figs. 19 and 20), compared against the 3D-GAN proposed by Wu et al. [22]. ModelNet contains two sub-datasets, ModelNet10 and ModelNet40. Zhu et al. also reported results for the proposed model both with and without the enhancer network.
Fig. 19 3D objects generated by the model proposed by Zhu et al. [18]

Fig. 20 3D objects generated by the 3D-GAN of Wu et al. [18]

In the work initiated by Chen et al. [23], experiments are done on the ShapeNet dataset on various tasks: predicting the geometry and colour of a 3D structure from a single image; predicting its geometry, texture, and light; and results with an adversarial loss. Nguyen-Phuoc et al. [20] experimented on a cars dataset and provided results not only for the proposed network but also for other GAN networks such as DC-GAN, LS-GAN, and WGAN-GP. Lunz et al. experimented on the ShapeNet dataset and provided results. The quality of the generated 3D models is measured by rendering them to 2D images and calculating the Fréchet inception distance (FID). All results are tabulated below with the datasets and metrics used for comparison in the respective research articles. Some other interesting works on GANs are the image-to-image translation models [8, 24, 25]; Iizuka et al. proposed an image-inpainting model [26]. Table 2 describes the comparison between the various architectures.
Table 2 compares the various 2D-to-3D conversion methods provided by scholars; in most cases, the dataset is ShapeNet (Car).

Table 2 Comparison between various architectures

S. No.  Architecture                   Dataset         First metric     Second metric    First result  Second result
1       Modified 3D-GAN with           ModelNet        Mean average     Classification   44.44         87.85
        enhancer network               (ModelNet40)    precision        accuracy
2       DIB-R (geometry and colour)    ShapeNet (Car)  3D IoU           F-score          78.8          53.6
3       DIB-R (geometry, texture,      ShapeNet (Car)  Texture          Light            0.0217        9.709
        light)
4       DIB-R (adversarial loss)       ShapeNet (Car)  2D IoU           Key-point        0.243         0.972
5       CNN for multi-view 3D model    ShapeNet        Error in colour  Error in depth   0.013         0.028

Table 3 Comparison of architectures of GANs proposed

S. No.  Architecture  Dataset  Metric                     Quantitative result
1       DC-GAN        Cars     Kernel inception distance  4.78 ± 0.11
2       LS-GAN        Cars     Kernel inception distance  4.99 ± 0.13
3       WGAN-GP       Cars     Kernel inception distance  15.57 ± 0.29
4       HOLO-GAN      Cars     Kernel inception distance  2.16 ± 0.09

Generative adversarial networks are used for the construction of 3D objects from 2D images because of their capability of generating models in a new way. The DIB-R model even uses generative adversarial networks (GANs) in its architecture, and it is used for retrieving information such as texture, colour, light, and other characteristics of an image (Table 3).
The Fréchet inception distance (FID) estimates the Wasserstein-2 distance between two Gaussians fitted to the feature activations of the real and generated samples. It is primarily used to quantify the quality of generated samples and is described mathematically in Eq. (5).
$$\mathrm{FID}(r, g) = \left\|\mu_r - \mu_g\right\|_2^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{\frac{1}{2}}\right) \tag{5}$$
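A minimal sketch of Eq. (5) in code, assuming the feature activations for the real and generated images have already been extracted into two arrays (the feature-extraction step itself is omitted):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(act_real, act_gen):
    """Fréchet inception distance per Eq. (5), computed from two
    (N, d) arrays of feature activations."""
    mu_r, mu_g = act_real.mean(axis=0), act_gen.mean(axis=0)
    cov_r = np.cov(act_real, rowvar=False)
    cov_g = np.cov(act_gen, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g).real   # matrix square root of Σr·Σg
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * covmean))

# Toy usage with 64-dimensional stand-ins; in practice FID is computed on
# 2048-dimensional Inception-v3 activations of the rendered and real images.
real = np.random.randn(500, 64)
fake = np.random.randn(500, 64) + 0.1
print(fid(real, fake))
```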

From Eq. (5), the Fréchet inception distance can be considered strong in terms of computational efficiency, robustness, and discriminability. Even though it uses only the first two moments of the distributions, it is considered one of the best metrics for comparing GANs, as stated in [27]. The inception score, on the other hand, is a key metric for detecting intra-class mode dropping: a model that generates one and only one image per class can have a high inception score but a poor FID. The inception score measures the diversity and quality of generated samples, while FID measures the Wasserstein-2 distance between the generated and real distributions; by measuring the difference between distributions, it can also evaluate unobserved aspects of the objects. FID [27] is more consistent with human judgment and handles noise more consistently than the inception score. Since FID is considered better than the kernel inception distance (KID), we consider the results generated by IG-GAN, reported through FID, to be more accurate in terms of robustness and computational efficiency.
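For comparison with the KID values in Table 3, here is a minimal sketch of the kernel inception distance using the common polynomial-kernel MMD formulation; this is our own illustration, not the exact evaluation code used in [20].

```python
import numpy as np

def kid(act_real, act_gen):
    """Unbiased MMD^2 estimate with the polynomial kernel
    k(x, y) = (x.y / d + 1)^3, commonly used for KID."""
    d = act_real.shape[1]
    k = lambda a, b: (a @ b.T / d + 1.0) ** 3
    k_rr, k_gg, k_rg = k(act_real, act_real), k(act_gen, act_gen), k(act_real, act_gen)
    m, n = len(act_real), len(act_gen)
    # Exclude diagonal terms for the unbiased within-set estimates.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return term_rr + term_gg - 2.0 * k_rg.mean()

real = np.random.randn(200, 64)   # stand-ins for Inception activations
fake = np.random.randn(200, 64) + 0.2
print(kid(real, fake))
```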

4 Conclusion

From the comparative study of the various architectures, every research work has its advantages and disadvantages; filling the gaps between them and applying each model where it is suitable yields satisfactory results. Eliminating the non-differentiability of the rasterization process is also useful for recovering texture, colour, light, and other properties. The generative capability and other modifications of the adversarial idea are recommended for this task, as can be seen from the comparative results. At the same time, the discriminative capability continuously increases the capability of the generator, which provides the best results. We encourage scholars to pursue research on GANs for the construction of 3D objects from 2D images, which has good scope for the future.

5 Future Scope

Moreover, GANs are used in the medical field for structuring DNA and for doctor recommendations and, more interestingly, in the detection of the pandemic virus COVID-19. Khalifa et al., in their research paper [9], used a GAN on a chest X-ray dataset for COVID-19 detection, and Waheed et al., in their research paper [10], used a GAN together with data augmentation for the same purpose. This shows that scholars are deeply interested in the idea of GANs and their development because of their tremendous capabilities. Generative methods produce novel samples from high-dimensional distributions (Gaussian, Poisson, etc.) of data such as images and voice. In the process of constructing 3D structures from given 2D images, HOLO-GAN learns through a differentiable projection unit that deals with occlusions: the projection unit receives a 4D tensor of 3D features and produces a 3D tensor of 2D features (a minimal sketch follows). Further, we are experimenting with generative adversarial networks on panoramic images to construct 3D objects, and we are trying to apply GANs to spherical analysis, which can be integrated into technologies such as virtual reality, augmented reality, and robotic vision.
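Our understanding of that projection unit can be sketched as follows: fold the depth axis of the feature tensor into the channel axis and mix the result with a learned 1×1 convolution. The layer sizes below are illustrative assumptions, not HOLO-GAN's published configuration.

```python
import torch
import torch.nn as nn

class ProjectionUnit(nn.Module):
    """Map (B, C, D, H, W) 3D features to (B, C_out, H, W) 2D features
    by folding depth into channels, then mixing with a 1x1 convolution."""
    def __init__(self, channels, depth, out_channels):
        super().__init__()
        self.proj = nn.Conv2d(channels * depth, out_channels, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, feat):
        b, c, d, h, w = feat.shape
        flat = feat.reshape(b, c * d, h, w)   # fold depth axis into channels
        return self.act(self.proj(flat))

unit = ProjectionUnit(channels=64, depth=16, out_channels=256)
out = unit(torch.randn(2, 64, 16, 16, 16))    # hypothetical 3D features
print(out.shape)  # torch.Size([2, 256, 16, 16])
```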

References

1. K.K. Al-shayeh, M.S. Al-ani, Efficient 3D object visualization via 2D images. J. Comput. Sci. 9(11), 234–239 (2009)
2. D. Eigen, C. Puhrsch, R. Fergus, Depth map prediction from a single image using a multi-scale deep network. Adv. Neural Inf. Process. Syst. 3(January), 2366–2374 (2014)
3. M. Galabov, A real time 2D to 3D image conversion techniques. Accessed 19 June 2020 (Online). Available https://www.researchgate.net/publication/272474479
4. A. Sinha, A. Unmesh, Q. Huang, K. Ramani, SurfNet: generating 3D shape surfaces using deep residual networks, in Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017, pp. 791–800. https://doi.org/10.1109/CVPR.2017.91
5. M. Tatarchenko, A. Dosovitskiy, T. Brox, Multi-view 3D models from single images with a convolutional network, in Lecture Notes in Computer Science, vol. 9911 LNCS, 2016, pp. 322–337. https://doi.org/10.1007/978-3-319-46478-7_20
6. I.J. Goodfellow et al., Generative adversarial nets. Adv. Neural Inf. Process. Syst. 3(January), 2672–2680 (2014)
7. J. Gui, Z. Sun, Y. Wen, D. Tao, J. Ye, A review on generative adversarial networks: algorithms, theory, and applications, vol. 14, no. 8, pp. 1–28, 2020 (Online). Available https://arxiv.org/abs/2001.06937
8. J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2242–2251. https://doi.org/10.1109/ICCV.2017.244
9. M. Mirza, S. Osindero, Conditional generative adversarial nets, pp. 1–7, 2014 (Online). Available https://arxiv.org/abs/1411.1784
10. T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4396–4405. https://doi.org/10.1109/CVPR.2019.00453
11. C. Wang, C. Xu, X. Yao, D. Tao, Evolutionary generative adversarial networks, Mar 2018. Accessed 19 June 2020 (Online). Available https://arxiv.org/abs/1803.00657
12. T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive growing of GANs for improved quality, stability, and variation, in 6th International Conference on Learning Representations, ICLR 2018—Conference Track Proceedings, 2018, pp. 1–26
13. H. Zhang et al., StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1947–1962 (2019). https://doi.org/10.1109/TPAMI.2018.2856256
14. A. Banerjee, D. Kollias, Emotion generation and recognition: a StarGAN approach, 2019 (Online). Available https://arxiv.org/abs/1910.11090
15. S. Desai, A. Desai, Int. J. Tech. Innov. Mod. Eng. Sci. (IJTIMES) 3(5), 43–48 (2017)
16. Understanding generative adversarial networks (GANs). https://towardsdatascience.com/understanding-generative-adversarial-networks-gans-cd6e4651a29. Accessed 19 June 2020
17. F. Farnia, A. Ozdaglar, GANs may have no Nash equilibria, 2020, pp. 1–38 (Online). Available https://arxiv.org/abs/2002.09124
18. J. Zhu, J. Xie, Y. Fang, Learning adversarial 3D model generation with 2D image enhancer, in 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 7615–7622
19. E. Smith, D. Meger, Improved adversarial systems for 3D object generation and reconstruction, CoRL, 2017, pp. 1–10 (Online). Available https://arxiv.org/abs/1707.09557
20. T. Nguyen-Phuoc, C. Li, L. Theis, C. Richardt, Y.L. Yang, HoloGAN: unsupervised learning of 3D representations from natural images, in Proceedings of the 2019 International Conference on Computer Vision Workshops, ICCVW 2019, 2019, pp. 2037–2040. https://doi.org/10.1109/ICCVW.2019.00255
21. S. Lunz, Y. Li, A. Fitzgibbon, N. Kushman, Inverse graphics GAN: learning to generate 3D shapes from unstructured 2D data, 2020 (Online). Available https://arxiv.org/abs/2002.12674
22. Z. Wu et al., 3D ShapeNets: a deep representation for volumetric shapes. Accessed 19 June 2020 (Online). Available https://3dshapenets.cs.princeton.edu
23. W. Chen et al., Learning to predict 3D objects with an interpolation-based differentiable renderer, 2019, pp. 1–12 (Online). Available https://arxiv.org/abs/1908.01210
24. M.Y. Liu, T. Breuel, J. Kautz, Unsupervised image-to-image translation networks. Adv. Neural Inf. Process. Syst. 2017-Decem(NIPS), 701–709 (2017)
25. T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, B. Catanzaro, High-resolution image synthesis and semantic manipulation with conditional GANs (2018)
26. S. Iizuka, E. Simo-Serra, H. Ishikawa, Globally and locally consistent image completion. ACM Trans. Graph. 36 (2017). https://doi.org/10.1145/3072959.3073659
27. A. Borji, Pros and cons of GAN evaluation measures
