Team Project Report
SUBTITLE
Team Project
Submitted: April 2nd, 2023
University of Mannheim
Chair of General Management and Information Systems
68131 Mannheim
Phone: +49 (0) 621 181 1691, Fax: +49 (0) 621 181 1692
Homepage: https://www.bwl.uni-mannheim.de/heinzl/
Abstract
The fashion industry has seen a significant shift towards personalized and creative
clothing production, leading to a growing demand for effective apparel design tools. In
response, our Generative Fashion Design project was developed to enable users to
realize their design ideas more effectively. Our project employs Stable Diffusion and
DreamBooth as the primary model and fine-tuning method for generating fashion designs, with a
focus on exploring the boundaries of models fine-tuned with DreamBooth. The
implementation of DreamBooth in the fashion domain is described in detail, including
the underlying knowledge and the customization of the approach for generating creative
clothing images. A comprehensive testing process following a structured testing
strategy is presented. While the generative model has demonstrated promising results in
generating innovative designs, certain limitations were identified, and future research
directions were proposed. To sum up, fine-tuning a text-to-image model using
DreamBooth has the potential to serve as a valuable tool for fashion designers,
enhancing their creativity, productivity, and workflows.
Table of Contents
List of Figures..................................................................................................................v
1 Introduction.................................................................................................................1
3 User-oriented Requirements....................................................................................10
3.1 User Research Design and Implementation.........................................................10
3.2 Interviews.............................................................................................................11
3.2.1 Interviews with Fashion Designer..............................................................11
3.2.2 Interviews with Fashion Enthusiasts..........................................................12
4 Implementation.........................................................................................................15
4.1 Training................................................................................................................15
4.1.1 Preliminaries...............................................................................................15
4.1.2 Training Customized Concept....................................................................17
4.2 Prototype..............................................................................................................17
4.3 Testing.................................................................................................................17
4.3.1 Testing Strategy..........................................................................................18
4.3.2 Testing results.............................................................................................19
5.1 Limitations...........................................................................................................26
5.2 Future Research Directions..................................................................................27
6 Conclusion.................................................................................................................28
Reference List.................................................................................................................vi
Appendix A.......................................................................................................................x
List of Figures
Figure 1. List of design elements considered by professionals........................................12
Figure 5. Novel view compositions, art renditions and property modifications. ..........21
We discuss the limitations and future work for the AI generative model used in
fashion design in Chapter 5. While the model showed promising results in generating
innovative and unique fashion designs, the study identified several limitations,
including reduced performance when introduced to more than eight concepts,
occasional generation of unrelated examples, and some imperfections in the generated
images. Additionally, the model struggles to maintain consistency in applying fashion
elements across different parts of the fashion piece and cannot render legible text. To
improve the system, future research could incorporate more diverse training data,
include constraints and guidelines, conduct further user studies, explore multi-modal
designs, and improve the stability of the model's performance. These efforts could result
in a more creative and practical design solution for fashion designers and enthusiasts.
Networks (GANs) is one of the most commonly mentioned ones (Goodfellow et al.,
2020). In the current state of the art, the majority of image synthesis models use either
a variant of one such model or a combination of multiple models.
There exist numerous GAN variations that are widely utilized in computer vision.
Some of these have been applied to fashion image generation, such as CAGAN (Jetchev
& Bergmann, 2017), which facilitates the swapping of clothing on fashion model
photographs, and Attribute-GAN (Liu, Zhang, Ji, & Jonathan Wu, 2019), which investigates
clothing matching problems under the cGAN framework and generates clothing images
based on semantic attributes. Additionally, another GAN enhances the symmetry of
generated fashion images by building on DCGAN (Makkapati & Patro, 2017; Radford,
Metz, & Chintala, 2015), while Poly-GAN generates clothing images conditioned on
arbitrary human poses (Pandey & Savakis, 2020).
While large text-to-image models, such as those mentioned above, have achieved
remarkable results in generating high-quality and diverse images from text prompts,
they often lack the ability to accurately mimic the appearance of specific subjects in a
given reference set and synthesize novel renditions of them in different contexts
(Ramesh et al., 2022; Saharia et al., 2022; Rombach, Blattmann, Lorenz, Esser, &
Ommer, 2022). To address this limitation, fine-tuning methods such as Textual
Inversion and DreamBooth have been proposed. These techniques enable the model to
be fine-tuned on a specific reference set, allowing it to learn and mimic the appearance
of the subjects within that set and generate new, context-specific images (Ruiz et al.,
2022; Gal et al., 2022).
Textual Inversion
However, the approach has certain limitations, including the inability to learn
precise shapes and instead focusing on capturing the "semantic" essence of a concept.
Another significant challenge is the extensive training time associated with this
methodology, with the learning of a single concept requiring approximately two hours,
which can be prohibitively long in certain scenarios (Gal et al., 2022).
DreamBooth
In contrast to Textual Inversion, which trains only the embedding without altering the
base model, DreamBooth fine-tunes the entire text-to-image model. Through this
fine-tuning, the model learns to associate a distinct identifier with a specific concept,
whether an object or a style. As a result, the generated images are tailored more closely
to the specific object or style in question, producing a more personalized output than
the results obtained through Textual Inversion (Voronov et al., 2023).
After a thorough evaluation of our task requirements, which took into account the
specific needs of fashion designers, the feasibility of the operation, the substantial
research support available, and the distinct advantages of Stable Diffusion over its
competitors, we decided to adopt Stable Diffusion for our implementation.
Our objective is to create a machine learning (ML) based system for generative
fashion design. Following extensive interviews with fashion designers, we discovered
that a major challenge they face is finding inspiration during their work. Fortunately,
computer vision has been a trending topic for some time now, and several algorithms
have demonstrated excellent performance in this field. Among them, Generative
Adversarial Networks (GANs) have been recognized as one of the most popular image
synthesis algorithms and were therefore our primary consideration. However, GANs are
known to have limitations in modeling complex, multi-modal distributions, making it
difficult to generate clothes as described in natural language, especially for data with a
high degree of variability (Brock et al., 2018; Karras, Laine, & Aila, 2019).
Subsequently, we conducted several experiments on the proposed approach, which
revealed that the training process posed certain challenges. Specifically, due to the high
resource requirements, we were compelled to input low-resolution images and limit the
number of epochs, resulting in unsatisfactory outcomes. Meanwhile, although
applications such as DALL·E 2 or Imagen have demonstrated exceptional results, we
found that the majority of the images generated did not meet our aesthetic and design
requirements, specifically with respect to displaying adequate details of silhouette,
color, texture, and overall design (Eckman & Wagner, 1995).
In addition, the absence of access to DALL·E 2's parameters and code presents a
significant challenge to optimize and implement it in our situation. Initially, we
considered using DALL·E 2's approach and inputting clothing data to train our own
model until we encountered Stable Diffusion. We discovered that Stable Diffusion is an
open-source alternative to DALL·E 2 that produces similar performance in the tasks we
prioritize. Furthermore, the core model behind Stable Diffusion, the Latent Diffusion
Model (LDM), operates on a compressed latent space with lower dimensionality, enabling
computationally less expensive training and faster inference with almost no loss in
synthesis quality. This effectively addresses our resource constraint issue. Therefore,
we opted for Stable Diffusion as the preferred choice, considering its practicality and
versatility in effectively addressing our specific tasks.
the model has produced poor results such as twisted human faces, strange colors, and a
lack of aesthetics or novelty that do not meet the requirements of fashion designers.
Moreover, in the context of fashion design, designers may face difficulties in
articulating their visual inspirations using conventional verbal descriptions, and there
may be variations in perception and interpretation of the model that generate unexpected
results. To address the challenges related to integrating visual inspirations into fashion
designs using the Stable Diffusion model, we posit that the utilization of fine-tuning
methods can allow designers to incorporate their visual inspirations directly into their
designs. By inputting multiple images of a single concept into the Stable Diffusion
model, designers can extract specific features, colors, patterns, or textures and then
transfer them to any fashion piece, thus stimulating creativity and aiding in the design
process. This approach can enhance the efficiency and effectiveness of the design
process while facilitating the integration of visual inspiration into the final design
product.
3.2 Interviews
The last two interviews were crucial as they provided us with important insights
for improving our model. We will discuss the details of the questions we designed, their
implementation, and the information we obtained from them individually.
It is notable that the responses gathered from the interviews with professional
fashion designers highlighted "color matching" and "shape of the clothes" as the
primary areas of focus in the fashion design process. This information prompted us to
prioritize these elements when generating images, emphasizing their critical importance.
images. Their primary challenge lies in their difficulty visualizing amorphous and
unstructured concepts in their minds, which our project can effectively address.
Consequently, they have exhibited favorable dispositions towards AI-assisted clothing
generation models.
ambiguous concepts solely through their own efforts, and our model can effectively
address this issue, although further optimization could enhance the quality of results.
Overall, the interviews conducted yielded significant insights that informed our
focus on key areas for improvement and the efficacy of the model in generating design
concepts and enhancing the clothing image synthesis process. Based on feedback from
fashion enthusiasts, it became apparent that properly transferring inspiration into
fashion pieces and creatively combining them in novel ways is of utmost importance.
Thus, this will serve as the main objective for our project going forward.
4 Implementation
This section provides a detailed account of the implementation of the
DreamBooth method in training our models. The key components of this approach will
be discussed, along with our customized implementation for the fashion domain based
on the original DreamBooth paper. Additionally, we developed a prototype that
simplifies the method for fashion designers, allowing them to test it through interviews.
Finally, we will discuss our model evaluation process, which confirms that our model
meets the required standards for our task.
4.1 Training
We utilized the strategy proposed by DreamBooth to fine-tune the model on our selected
clothing images. The objective of our training process is to embed a given subject,
consisting of a small set of images (usually 3-5), such as inspirations (e.g., paintings,
people, nature, architecture, etc.) or fashion pieces, into the output domain of a model so
that it can be synthesized with a unique identifier. This enables us to create customized
clothing designs by combining any of the input clothing items or inspirations, with the
output controlled by natural language. The training process described in the original
DreamBooth work involves two stages: generating a low-resolution image from text,
which is then upscaled by super-resolution diffusion models.
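As an illustration of what this fine-tuning looks like in code, the following is a minimal sketch of a single DreamBooth-style training step using the Hugging Face diffusers and transformers libraries. The checkpoint name, the rare-token identifier "sks", the prompt, and the hyperparameters are assumptions for illustration only; the class-specific prior-preservation images and loss term are omitted for brevity.

```python
# A minimal sketch of one DreamBooth-style fine-tuning step with diffusers.
# The checkpoint name, the rare-token identifier "sks", the prompt, and the
# hyperparameters are illustrative assumptions; the class-specific
# prior-preservation images/loss are omitted for brevity.
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

unet.train()  # only the U-Net is updated in this sketch
optimizer = torch.optim.AdamW(unet.parameters(), lr=5e-6)

def train_step(pixel_values):
    """One step on a small batch of instance images (3-5 photos of one concept)."""
    # Bind the concept to a unique identifier followed by its class name.
    prompt = "a photo of sks dress"
    ids = tokenizer(prompt, padding="max_length",
                    max_length=tokenizer.model_max_length,
                    return_tensors="pt").input_ids
    ids = ids.repeat(pixel_values.shape[0], 1)

    with torch.no_grad():
        encoder_hidden_states = text_encoder(ids)[0]
        latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215

    # Diffuse the latents to a random timestep and let the U-Net predict the noise.
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],))
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)
    noise_pred = unet(noisy_latents, t,
                      encoder_hidden_states=encoder_hidden_states).sample

    loss = F.mse_loss(noise_pred, noise)  # reconstruction term only
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a full setup this step would be repeated over the small instance set for many iterations, optionally also updating the text encoder, and the prior-preservation term discussed later would be added to the loss.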
In this section, we will begin by discussing the preliminaries that are necessary to
understand the underlying process. Following this, we will present how we
implemented the DreamBooth method in practice.
4.1.1 Preliminaries
In our case, a conditional U-Net is utilized, which takes a noisy sample, a conditioning
state, and a timestep, and returns an output of the same shape as the input sample. After
the denoising loop, the final latent is passed to a decoder that generates the output images.
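To illustrate how these pieces interact at inference time, the following hedged sketch runs a plain denoising loop and then decodes the final latent into an image; the checkpoint name, scheduler choice, and step count are assumptions for illustration.

```python
# Hedged sketch of the denoising loop implied above: the conditional U-Net maps
# (noisy latent, timestep, text conditioning) to a sample-shaped prediction, and the
# VAE decoder turns the final latent into an image. Checkpoint name, scheduler
# choice, and step count are assumptions for illustration.
import torch
from diffusers import AutoencoderKL, DDIMScheduler, UNet2DConditionModel

model_id = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

text_embeddings = torch.randn(1, 77, 768)  # stand-in for the CLIP text encoding
latents = torch.randn(1, 4, 64, 64)        # start from pure noise in latent space

scheduler.set_timesteps(50)
with torch.no_grad():
    for t in scheduler.timesteps:
        # U-Net: noisy sample + timestep + conditioning -> sample-shaped noise prediction
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    # Decoder: final latent -> RGB image tensor
    image = vae.decode(latents / 0.18215).sample
```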
Vocabulary Encoding
This process involves transforming a prompt and an image into vector representations
that can be computed with directly. Since text and images belong to different
distributions, they cannot be compared directly, and considerable research has been
conducted in this area. CLIP (Contrastive Language-Image Pre-training) leverages prior
work on zero-shot transfer, natural language supervision, and multimodal learning to
align paired images and texts in a shared embedding space.
In our study, the pre-trained model we used was initialized with the weights of the
previous version's checkpoint and fine-tuned on "laion-aesthetics v2 5+", a subset
filtered with an aesthetics predictor that takes CLIP image embeddings produced with
the OpenAI CLIP model as input, ensuring high-quality visuals and semantic fidelity.
After conducting numerous tests, we
discovered that the pre-trained CLIP model had integrated almost all fashion-related
terminology, including fashion brands such as Gucci and Chanel, as well as fashion-
related terms like jumpsuit, joggers, and woven, among others. As a result, we
concluded that there was no need to fine-tune the CLIP model. Moreover, we pass the
new subject name and the prompt through the standard CLIP tokenization procedure.
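A minimal sketch of this tokenization and encoding step is shown below; the prompt, the identifier "sks", and the checkpoint name are illustrative assumptions standing in for the Stable Diffusion weights described above.

```python
# Minimal sketch of tokenizing and encoding a prompt that contains the new subject
# identifier with the frozen CLIP text encoder used by Stable Diffusion. The prompt,
# the identifier "sks", and the checkpoint name are illustrative assumptions.
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

prompt = "a photo of a sks dress, Gucci style, woven texture"
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length,
                   return_tensors="pt")

# (1, 77) token ids -> (1, 77, 768) conditioning vectors consumed by the U-Net.
embeddings = text_encoder(tokens.input_ids)[0]
print(tokenizer.convert_ids_to_tokens(tokens.input_ids[0])[:12])
print(embeddings.shape)
```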
Through the use of our model, fashion designers can preview their designs in
advance, which can aid in identifying potential issues and adjusting details. This
approach can ultimately reduce future risks and improve overall efficiency compared to
traditional design processes.
4.2 Prototype
4.3 Testing
We conducted a comprehensive testing process following our established testing
protocol to evaluate the limits and capabilities of our model. This allowed us to gain a
better understanding of the model's performance and explore its potential for creating
unique and innovative fashion designs.
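As a hedged illustration, a single test case in our protocol can be thought of as generating several candidates for one prompt from the fine-tuned checkpoint and inspecting them; the local checkpoint path, prompt, and sampling settings below are assumptions, not our exact configuration.

```python
# Illustrative sketch of one test case: generating several candidates per prompt from
# a DreamBooth-fine-tuned checkpoint. The local path, prompt, and sampling settings
# are assumptions for illustration.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./dreambooth-fashion-checkpoint", torch_dtype=torch.float16
).to("cuda")

prompt = "a sks dress with a zebra pattern, on runway"
images = pipe(prompt, num_images_per_prompt=4,
              num_inference_steps=50, guidance_scale=7.5).images

for i, img in enumerate(images):
    img.save(f"test_{i}.png")  # inspected manually against the testing strategy
```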
In the Fashion Pieces section of our study, we will primarily focus on the
capabilities of the model for understanding the features of the fashion piece, shifting
within/across the different fashion piece classes, combining different concepts and
applying inspirations on fashion pieces. On the other hand, the Fashion Elements
category will offer greater opportunities for creativity as we aim to apply the different
fashion elements of various products to other fashion pieces, as well as exploring
diverse inspirations. This segment will involve a more detailed approach, as we
combine the specific fashion elements of different introduced concepts to add unique
and distinctive flavors to the fashion designs. Overall, by exploring different scenarios
and analyzing the results, we aim to identify the strengths and weaknesses of the model
and provide insights for future research and development in this area.
Fashion Pieces
Within the Fashion Pieces category, we initially assessed the model's ability to
comprehend various types of fashion pieces without the need for introducing any
specific concepts. The pre-trained model was able to understand all the fashion pieces.
In this approach, we fine-tune a text-to-image model so that it learns to bind a unique
identifier to a particular subject. This identifier can then be used to generate realistic
images of the subject in different scenes. The technique has many practical applications;
for example, it can be used for product display in e-commerce, allowing consumers to
visualize a product without having to try it on. To implement it, we leverage the
semantic prior embedded in the model together with a new autogenous, class-specific
prior-preservation loss (a sketch of this objective follows Figure 4 below). This
approach maintains the authenticity of the image while preserving the features and
identity of the subject during synthesis. Overall, our technique
is able to synthesize subjects that do not appear in the reference image under different
scenes, poses, perspectives and lighting conditions, which opens up new possibilities for
the generation and design of fashion pieces (Ruiz et al., 2022). At the same time, we
introduced one concept to evaluate the model's fundamental understanding of a single
fashion piece. The model performed well for a single concept, accurately identifying the
key features of the fashion piece (color, shape, pattern) and generating new designs
inspired by it that were comparable but distinct [Image I-Swarovski button]. As
emphasized in the official DreamBooth paper, precise selection of a unique identifier
followed by the subject's class name (e.g., "A [V] button") significantly enhances the
model's capacity to produce more creative and higher-quality results.
Figure 4. Subject-driven generation. Given the Swarovski concept (left), our approach (right) can
synthesize the "Swarovski" subject with high fidelity and in new contexts (text prompt: "A [Swarovski] button")
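For reference, the fine-tuning objective with the class-specific prior-preservation term, as we understand it from Ruiz et al. (2022), can be sketched as follows; x denotes the subject images with conditioning c for the identifier prompt, x_pr denotes model-generated class images with conditioning c_pr, w_t are timestep-dependent weights, and λ balances the prior term:

```latex
% DreamBooth objective with class-specific prior preservation (after Ruiz et al., 2022):
% the first term reconstructs the subject images x conditioned on the identifier prompt
% (conditioning vector c, e.g. "a [V] button"); the second term, weighted by lambda,
% reconstructs model-generated class images x_pr conditioned on the plain class prompt c_pr.
\mathbb{E}_{x, c, \epsilon, \epsilon', t}\Big[
    w_t \,\big\lVert \hat{x}_\theta(\alpha_t x + \sigma_t \epsilon,\, c) - x \big\rVert_2^2
    \;+\; \lambda\, w_{t'} \,\big\lVert \hat{x}_\theta(\alpha_{t'} x_{\mathrm{pr}} + \sigma_{t'} \epsilon',\, c_{\mathrm{pr}}) - x_{\mathrm{pr}} \big\rVert_2^2
\Big]
```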
Later on, we introduced various concepts to the model at different times, as this
was a crucial step in developing innovative and unique fashion designs. Our findings
revealed that by introducing more than eight concepts to the model, it was no longer
able to grasp all the concepts and integrate them into designs effectively. For example,
when we introduced 24 concepts to the model and referenced a particular dress in the prompt
(i.e., bejflow dress), the model struggled to replicate the given concept and instead
generated random dresses without considering the specific characteristics of the original
concept. In all the examples presented in the following section, it should be noted that
the model was trained with no more than eight concepts.
Our assessment also included the use of real human selfies, which we found to be
effective in printing faces on fashion products. We also observed that by providing the
model with specific product details and a given location, it could produce designs that
featured the introduced human with the destination background while wearing the
specified fashion piece. Furthermore, the model was also able to understand real-life
objects, such as the cutenose pillow concept, and create new designs by analyzing the
pillow's color, shape, and animal features. When we provided a contextual background
with a specific location, such as a forest or school, the model placed the animal inspired
by the pillow in the appropriate context. We even found that by adding the word
"friends" to the prompt, the model could place the pillow character in a school context
and print it on a specified fashion piece. As a last example, we explored the use of
prompts that included contextual information such as "on runway," which allowed
fashion designers to view possible results on the runway and visualize the fashion
pieces being worn by models, although the realism of such results may be limited.
Fashion Elements
In the Fashion Elements section of our study, we will explore the model's ability
to combine specific fashion elements of fashion products. As we did in the previous
section, we began by assessing the model's understanding of shape, pattern, texture,
color, and space elements. We adopt the assumption that each concept contains all
fashion elements, regardless of whether it is a fashion piece or an inspiration concept.
The model demonstrated a high level of comprehension, even when combining these
elements in a single fashion piece.
While the model was generally successful in providing appropriate results, not all results were visually pleasing. The
model appeared to find combining patterns relatively easy, but combining colors
presented a more significant challenge. Even when the color combination was specified
in the prompt, the model often combined the shape of the fashion piece concepts rather
than their colors and chose one of the concept’s color, as evidenced in Figure 9.
However, when we assigned a pattern, such as a zebra pattern, the model was able to
combine the colors of multiple concepts within that pattern. When it came to combining
two concepts' patterns, the model incorporated clues from both introduced concepts,
though the results were not always strong.
Combining the same fashion elements from multiple concepts and applying them
to an introduced fashion piece concept resulted in more creative and comprehensive
results. As shown in Figure 10 below, the model not only understood and combined the
colors but also effectively combined the different fashion elements from both concepts.
The texture of the Orpi button, with its shiny surface, was incorporated into the dress
and the pink/orange color shapes on the pattern were inspired by the button itself. On
the other hand, the pattern was drawn from the Munique art concept by Monet, while
the green color was derived from impressionist nature paintings. While this approach
offers high creativity opportunities for fashion designers, we found that it is barely
possible to combine a specific fashion element while keeping other elements constant.
In the final part of our exploration, we aimed to apply different fashion elements
from different concepts without combining the same elements. This approach was
relatively easier than combining since it simply involves gathering different elements
together. While the results demonstrated the model's ability to understand some fashion
elements, when the prompt included fashion elements from more than two concepts, the
model tended to show either some of them or displayed different elements from each
concept. Nonetheless, this approach provides fashion designers with more options to
incorporate various fashion elements into their designs, expanding the possibilities for
creative and innovative fashion concepts. Additionally, we also applied these fashion
elements to specific parts of fashion pieces, such as collars, sleeves, etc. This allowed us
to explore the model's ability to apply these elements in a more targeted and specific
manner. Overall, we found that the model was able to effectively apply the fashion
elements to different parts of the fashion pieces, resulting in unique and creative
designs. However, we also observed that the model sometimes struggled to maintain
consistency in applying the fashion elements across the different parts of the fashion
piece.
5 Limitations & Future Work
5.1 Limitations
Our study aimed to explore the creativity capabilities of an AI model in the
fashion design process. We conducted comprehensive testing on the model's ability to
comprehend various fashion pieces and elements, and its capacity to generate innovative
and unique fashion designs. Our study revealed promising prospects for AI models in
fashion design with several positive outcomes, including a high level of comprehension
in understanding various fashion pieces and elements, effective application of fashion
elements from various fashion pieces, and the potential to combine different fashion
elements and inspiration concepts to generate innovative and unique designs. However,
our findings also highlighted some limitations of the model:
- The model struggles when introduced to more than eight concepts, leading to reduced performance in combining concepts and generating unique designs.
- The model occasionally generates unrelated examples, since it falls back on examples from its pre-training data rather than focusing on the specific concept we introduce.
- The generated images may not be entirely realistic and may have some distortions or imperfections.
- The model sometimes struggles to maintain consistency in applying the fashion elements across the different parts of the fashion piece.
- While the model was successful in combining the same elements from multiple concepts, it is challenging to combine a specific fashion element while keeping other elements constant.
- The model is not capable of rendering legible text, which may limit its potential use in certain fashion design applications.
- Randomness is a limitation: the model generates several outputs for the same prompt, and there is no guarantee that the first output is the best.
By exploring these avenues for future research, we believe that our fashion design
generation system can be further improved and expanded to provide more creative and
practical design solutions for fashion designers and enthusiasts.
6 Conclusion
In this study, we explored the potential of AI models in fashion design by using a
fashion design generation system based on the DreamBooth method. Our testing protocol
involved introducing various fashion concepts and inspiration concepts to the model and
evaluating its creativity capabilities in generating unique and innovative fashion
designs. Our findings demonstrated that the DreamBooth method was capable of
understanding various fashion pieces and elements and effectively incorporating them
into new designs. However, the model also had some limitations, such as reduced
performance when introduced to more than eight concepts, generating unrelated
examples because it falls back on the examples from its pre-trained class samples, and
occasional struggles to maintain consistency in applying fashion elements across
different parts of a fashion piece.
Our study has shown that DreamBooth has the potential to meet many of the needs
of fashion designers. The model's ability to comprehend various fashion pieces and
elements, generate unique and innovative designs, and combine different elements and
inspiration concepts makes it a valuable tool for designers. DreamBooth can help
democratize the fashion industry by making design solutions more accessible and
affordable for emerging designers and small businesses. Despite some limitations, our
study suggests that AI models like DreamBooth can enhance designers' creativity and
productivity by generating unique and diverse design ideas that may have been
overlooked by human designers. Furthermore, our findings indicate that the model can
be used to generate designs quickly and efficiently, saving designers time and allowing
them to focus on other aspects of the design process. Overall, our research suggests that
DreamBooth can be a valuable tool for fashion designers, providing them with a new
and innovative approach to the design process, while also streamlining their workflows
and saving time.
To further improve and extend the system, future research could focus on
incorporating more diverse training data, adding constraints and guidelines to make the
generated designs more practical, conducting further user studies with fashion designers
and enthusiasts, and exploring the generation of multi-modal designs. By exploring
these avenues for future research, we believe that this fashion design generation system
can be further refined and expanded to provide more creative and practical design
solutions for the fashion industry.
Reference List
Brock, A., Donahue, J., & Simonyan, K. (2018). Large Scale GAN Training for High
Fidelity Natural Image Synthesis. Retrieved from arXiv.org website:
https://arxiv.org/abs/1809.11096
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A.
A. (2018). Generative Adversarial Networks: An Overview. IEEE Signal
Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/msp.2017.2765202
Eckman, M., & Wagner, J. (1995). Aesthetic Aspects of the Consumption of Fashion
Design: the Conceptual and Empirical Challenge. ACR North American
Advances, NA-22. Retrieved from
https://www.acrwebsite.org/volumes/7825/volumes/v22/NA
Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A. H., Chechik, G., & Cohen-
Or, D. (2022). An Image is Worth One Word: Personalizing Text-to-Image
Generation using Textual Inversion. ArXiv:2208.01618 [Cs]. Retrieved from
https://arxiv.org/abs/2208.01618
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., …
Bengio, Y. (2020). Generative adversarial networks. Communications of the
ACM, 63(11), 139–144. https://doi.org/10.1145/3422622
Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., … Guo, B. (2022). Vector
Quantized Diffusion Model for Text-to-Image Synthesis. ArXiv:2111.14822 [Cs].
Retrieved from https://arxiv.org/abs/2111.14822
Jetchev, N., & Bergmann, U. (2017). The Conditional Analogy GAN: Swapping Fashion
Articles on People Images. Retrieved from
https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w32/
Jetchev_The_Conditional_Analogy_ICCV_2017_paper.pdf
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models.
Advances in Neural Information Processing Systems, 33. Retrieved from
https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179c
a4b-Abstract.html
Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator Architecture for
Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2019.00453
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing
and Improving the Image Quality of StyleGAN. Retrieved from
openaccess.thecvf.com website:
https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_
Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html
Lee, J. S., & Jirousek, C. (2015). The development of design ideas in the early apparel
design process: a pilot study. International Journal of Fashion Design,
Technology and Education, 8(2), 151–161.
https://doi.org/10.1080/17543266.2015.1026411
Liu, L., Zhang, H., Ji, Y., & Jonathan Wu, Q. M. (2019). Toward AI fashion design: An
Attribute-GAN model for clothing match. Neurocomputing, 341, 156–167.
https://doi.org/10.1016/j.neucom.2019.03.011
Makkapati, V., & Patro, A. (2017). Enhancing Symmetry in GAN Generated Fashion
Images. Artificial Intelligence XXXIV, 10630, 405–410.
https://doi.org/10.1007/978-3-319-71078-5_34
Pandey, N., & Savakis, A. (2020). Poly-GAN: Multi-conditioned GAN for fashion
synthesis. Neurocomputing, 414, 356–364.
https://doi.org/10.1016/j.neucom.2020.07.092
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial Networks. Retrieved from
arXiv.org website: https://arxiv.org/abs/1511.06434
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical
Text-Conditional Image Generation with CLIP Latents. Retrieved from
https://cdn.openai.com/papers/dall-e-2.pdf
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … Sutskever, I.
(2021, July 1). Zero-Shot Text-to-Image Generation. Retrieved from
proceedings.mlr.press website:
https://proceedings.mlr.press/v139/ramesh21a.html
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-
Resolution Image Synthesis with Latent Diffusion Models. ArXiv:2112.10752
[Cs], 2. Retrieved from https://arxiv.org/abs/2112.10752
Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. (2022).
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven
Generation. ArXiv:2208.12242 [Cs]. Retrieved from
https://arxiv.org/abs/2208.12242
Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., … Norouzi, M. (2022,
October 31). Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding. Retrieved April 2, 2023, from openreview.net website:
https://openreview.net/forum?id=08Yk-n5l2Al
Simon, H. A. (1996). The Sciences of the Artificial, third edition. MIT Press.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June 1). Deep
Unsupervised Learning using Nonequilibrium Thermodynamics. Retrieved April
2, 2023, from proceedings.mlr.press website:
https://proceedings.mlr.press/v37/sohl-dickstein15.html
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021).
Score-Based Generative Modeling through Stochastic Differential Equations.
ArXiv:2011.13456 [Cs, Stat]. Retrieved from https://arxiv.org/abs/2011.13456
Voronov, A., Khoroshikh, M., Babenko, A., & Ryabinin, M. (2023). Is This Loss
Informative? Speeding Up Textual Inversion with Deterministic Objective
Evaluation. ArXiv:2302.04841 [Cs]. Retrieved from
https://arxiv.org/abs/2302.04841
Yan, H., Zhang, H., Liu, L., Zhou, D., Xu, X., Zhang, Z., & Yan, S. (2022). Toward
Intelligent Design: An AI-based Fashion Designer Using Generative Adversarial
Networks Aided by Sketch and Rendering Generators. IEEE Transactions on
Multimedia, 1–1. https://doi.org/10.1109/TMM.2022.3146010
https://docs.google.com/spreadsheets/d/1BrVgh0vmEPWVZFQphdtwWvWcEmiqkfOS/edit?usp=sharing&ouid=112535582883725334921&rtpof=true&sd=true
https://docs.google.com/spreadsheets/d/1culaRHze4g_DOlUoD7tIlA-d1rFUlM6_YKGpSEt5E-U/edit#gid=439546458
I hereby declare that I have developed and written the enclosed seminar paper /
bachelor thesis / master thesis entirely on my own and have not used outside sources
without declaration in the text. Any concepts or quotations applicable to these sources
are clearly attributed to them. This seminar paper / bachelor thesis / master thesis has
not been submitted in the same or a substantially similar version, not even in part, to
any other authority for grading and has not been published elsewhere. This is to certify
that the printed version is equivalent to the submitted electronic one. I am aware of the
fact that a misstatement may have serious legal consequences.
I also agree that my thesis can be sent and stored anonymously for plagiarism detection
purposes. I am aware that my thesis may not be corrected if this declaration is not issued.
Jiaxin Yuan, Yunjing Dai, Bennur Kaya, Tolga Yasar, Yi Wang, Yiyi Wei