
GENERATIVE FASHION DESIGN

SUBTITLE

Team Project
Submitted: April 2nd, 2023

By: Jiaxin Yuan, 1820345


Yunjing Dai, 1820527
Bennur Kaya, 1754738
Tolga Yasar, 1822718
Yi Wang, 1870245
Yiyi Wei, 1873027

Supervisor: Deborah Mateja


Reviewer: Prof. Dr. Armin H. Heinzl

University of Mannheim
Chair of General Management and Information Systems
68131 Mannheim
Phone: +49 (0) 621 181 1691, Fax: +49 (0) 621 181 1692
Homepage: https://www.bwl.uni-mannheim.de/heinzl/
Abstract

The fashion industry has seen a significant shift towards personalized and creative
clothing production, leading to a growing demand for effective apparel design tools. In
response, our Generative Fashion Design project was developed to enable users to
realize their design ideas more effectively. Our project employs Stable Diffusion and
DreamBooth as the primary model and algorithm for generating fashion designs, with a
focus on exploring the boundaries of models generated by the DreamBooth method. The
implementation of DreamBooth in the fashion domain is described in detail, including
the underlying knowledge and the customization of the approach for generating creative
clothing images. A comprehensive testing method following an exhaustive testing
strategy is presented. While the generative model has demonstrated promising results in
generating innovative designs, certain limitations were identified, and future research
directions were proposed. To sum up, fine-tuning a text-to-image model using
DreamBooth has the potential to serve as a valuable tool for fashion designers,
enhancing their creativity, productivity, and workflows.
Table of Contents

List of Figures..................................................................................................................v

1 Introduction.................................................................................................................1

2 Model Selection Exploration......................................................................................3


2.1 Theoretical Foundations........................................................................................3
2.1.1 Generative models........................................................................................3
2.1.2 Generative Adversarial Networks (GANs)..................................................4
2.1.3 Diffusion Models (DMs)...............................................................................5
2.1.4 Latent Diffusion Models (LDMs).................................................................5
2.1.5 Fine-tuning Methods for Text-to-image Models..........................................6
2.2 Final decision.........................................................................................................7
2.2.1 Why Stable Diffusion?.................................................................................7
2.2.2 Why DreamBooth?.......................................................................................8

3 User-oriented Requirements....................................................................................10
3.1 User Research Design and Implementation.........................................................10
3.2 Interviews.............................................................................................................11
3.2.1 Interviews with Fashion Designer..............................................................11
3.2.2 Interviews with Fashion Enthusiasts..........................................................12

4 Implementation.........................................................................................................15
4.1 Training................................................................................................................15
4.1.1 Preliminaries...............................................................................................15
4.1.2 Training Customized Concept....................................................................17
4.2 Prototype..............................................................................................................17
4.3 Testing.................................................................................................................17
4.3.1 Testing Strategy..........................................................................................18
4.3.2 Testing results.............................................................................................19

5 Limitations & Future Work.....................................................................................26



5.1 Limitations...........................................................................................................26
5.2 Future Research Directions..................................................................................27

6 Conclusion.................................................................................................................28

Reference List.................................................................................................................vi

Appendix A.......................................................................................................................x
List of Figures
Figure 1. List of design elements considered by professionals........................................12

Figure 2. List of design elements considered by enthusiasts...........................................13

Figure 3. Enthusiasts versus professionals in terms of design elements to consider........14

Figure 4. Subject-driven generation.................................................................................20

Figure 5. Novel view compositions, art renditions and property modifications. ..........21

Figure 6. Inspiration Concept-driven generation.............................................................21

Figure 7. Inspiration Concept-driven generation.............................................................22

Figure 8. Fashion pieces-combined generation................................................................23

Figure 9. Concepts-combined generation........................................................................24

Figure 10. Advanced Concepts-combined generation.....................................................25


1 Introduction
Since the Industrial Revolution, the clothing market has offered consumers an extensive range of clothing options, leading to a growing demand for personalized and creative clothing production (Brockman, 1967). In light of these developments, our project, Generative Fashion Design, aims to enable users to realize their apparel design ideas more effectively.

In Chapter 2, we introduce key concepts and terminology related to generative models for image synthesis, such as Generative Adversarial Networks (GANs) (Goodfellow et al., 2020) and Diffusion Models (DMs) (Sohl-Dickstein, Weiss, Maheswaranathan, & Ganguli, 2015). We then discuss Stable Diffusion, an open-source text-to-image model, and its fine-tuning techniques, including Textual Inversion and DreamBooth, explaining their definitions, working mechanisms, benefits, and drawbacks. The chapter also explains why Stable Diffusion and DreamBooth were selected as the primary model and algorithm for the generative fashion design task, based on their suitability for the use case and superior performance compared to alternative techniques.

In Chapter 3, we will discuss the user requirements research that we conducted to ensure the functional accuracy of the model. This research was divided into two main stages. Firstly, we made reasonable adjustments and improvements to the range of interviewees based on practical constraints and feedback; secondly, we conducted parallel and vertical comparative analyses of the feedback, which helped us better understand user needs and lay the foundation for the feasibility of our model.

Chapter 4 provides a detailed description of the implementation of the DreamBooth method for training models in the fashion domain. It covers the key components of this approach and how it was customized for generating creative clothing images. A prototype was also developed to simplify the method for fashion designers. The chapter also discusses the model evaluation process, which includes a testing strategy that covers multiple dimensions and adheres to the MECE principle (Lee & Chen, 2018). The testing results confirm that the model meets the required standards for the task.


We discuss the limitations and future work for the AI generative model used in
fashion design in Chapter 5. While the model showed promising results in generating
innovative and unique fashion designs, the study identified several limitations,
including reduced performance when introduced to more than eight concepts,
occasional generation of unrelated examples, and some imperfections in the generated
images. Additionally, the model struggles to maintain consistency in applying fashion
elements across different parts of the fashion piece and cannot render legible text. To
improve the system, future research could incorporate more diverse training data,
include constraints and guidelines, conduct further user studies, explore multi-modal
designs, and improve the stability of the model's performance. These efforts could result
in a more creative and practical design solution for fashion designers and enthusiasts.

In conclusion, our study aimed to investigate the potential of utilizing AI generative models in the field of fashion design. We conducted a comparative analysis
of several models and ultimately selected fine-tuning on Stable Diffusion as the optimal
approach to meet the needs of fashion designers. Our findings revealed promising
capabilities of the model to comprehend diverse fashion elements, generate unique and
innovative designs, and integrate different elements and inspiration concepts.
Additionally, we identified certain limitations and proposed future research directions in
this area. In summary, we conclude that fine-tuning a text-to-image model using
DreamBooth has the potential to serve as a valuable tool for fashion designers,
enhancing their creativity and productivity, streamlining their workflows, and saving
time.
2 Model Selection Exploration
In this section, we initially present some fundamental concepts and terminology.
Subsequently, we delve into the rationale behind selecting Stable Diffusion and
DreamBooth as the primary model and algorithm for our generative fashion design
task.

2.1 Theoretical Foundations


Comprehending the following terminology is of great importance. In this subchapter, we will introduce generative models, specifically for image synthesis. Then we will explore some of their examples, such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs). Additionally, we will introduce an open-source text-to-image model, Stable Diffusion, along with its fine-tuning techniques such as Textual Inversion and DreamBooth.

Each part will cover the following questions:

• What they are
• How they work
• The benefits and drawbacks associated with their use

2.1.1 Generative models

In contrast to discriminative models, generative models represent another category of machine learning model that aims to learn and generate new data that
exhibits characteristics similar to the training data (Ng & Jordan, 2001). These models
can generate new data points that share similar features as the original data and are
utilized in various fields, including image and audio processing, natural language
processing, and robotics. Generative models function by learning the underlying
probability distribution of the input data and then utilizing this knowledge to generate
new data points (Creswell et al., 2018). In our situation, we focus specifically on the use
case of generating images. Image generation models use complex algorithms to learn
patterns and features in existing images and use this knowledge to create new, unique
images. Among the various types of generative models, Generative Adversarial Networks (GANs) are among the most commonly mentioned (Goodfellow et al., 2020). In the current state of the art, a majority of image synthesis models use either a variant of one of these models or a combination of multiple models.

2.1.2 Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) have made remarkable strides in machine learning, particularly in the field of image synthesis. A GAN is a generative
model that simultaneously trains two models: a generative model G that captures the
data distribution, and a discriminative model D that estimates the probability that a
sample came from the training data rather than G. The training procedure for G is to
maximize the probability of D making a mistake. This framework corresponds to a zero-
sum game. In the space of arbitrary functions G and D, a unique solution exists, with G
recovering the training data distribution and D equal to a half everywhere (Goodfellow
et al., 2020).
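
To make this adversarial objective concrete, the following minimal PyTorch sketch shows one alternating training step for a generator G and a discriminator D; the toy fully connected networks, dimensions, and random data are illustrative placeholders and are not the fashion GANs cited in this section.

    import torch
    import torch.nn as nn

    latent_dim, data_dim, batch = 64, 784, 32
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    real = torch.randn(batch, data_dim)          # stand-in for a batch of real images
    fake = G(torch.randn(batch, latent_dim))     # generator samples from noise

    # Discriminator step: push D(real) towards 1 and D(fake) towards 0.
    loss_d = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step (non-saturating form): push D(G(z)) towards 1,
    # i.e. maximize the probability of D making a mistake.
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()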

There exist numerous GAN variations that are widely utilized in computer vision. Some of these have been applied to fashion image generation, such as CAGAN (Jetchev & Bergmann, 2017), which facilitates the swapping of clothing on fashion model photographs, and Attribute-GAN, which investigates clothing matching problems under the cGAN framework and generates clothing images based on semantic attributes (Liu, Zhang, Ji, & Jonathan Wu, 2019). Additionally, another GAN explores the symmetry of generated fashion images by enhancing DCGAN (Makkapati & Patro, 2017; Radford, Metz, & Chintala, 2015), while Poly-GAN generates clothing images conditioned on arbitrary human poses (Pandey & Savakis, 2020).

While these GAN models demonstrate the impressive potential of GANs in fashion image generation, none of them are capable of generating novel and creative
fashion designs with artistic merit. Furthermore, while GANs enable efficient sampling
of high resolution images with good perceptual quality (Brock, Donahue, & Simonyan,
2018), they pose optimization challenges and are unable to fully capture the distribution.
In addition, the constant competition between the generator and discriminator networks
in a GAN can lead to instability and slow training (Karras et al., 2020).

2.1.3 Diffusion Models (DMs)

Diffusion Models (DMs) (Sohl-Dickstein, Weiss, Maheswaranathan, & Ganguli, 2015), also known as Diffusion Probabilistic Models, represent another prominent class
of generative models. The goal of diffusion models is to learn the latent structure of a
dataset by modeling the way in which data points diffuse through the latent space. In
computer vision, DMs train a neural network to denoise images blurred with Gaussian
noise by learning to reverse the diffusion process (Song et al., 2021; Gu et al., 2022).
More specifically, DMs are classified as a type of latent variable model that leverages a
fixed Markov chain to map to the latent space (Ho, Jain, & Abbeel, 2020).
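
As a minimal illustration of this fixed Markov chain, the sketch below (assuming the noise-prediction parameterization of Ho et al., 2020) diffuses a clean sample to an arbitrary timestep in closed form and computes the denoising training loss; the linear network is only a placeholder for a real denoiser.

    import torch
    import torch.nn.functional as F

    T = 1000
    betas = torch.linspace(1e-4, 0.02, T)                 # fixed noise schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

    def q_sample(x0, t, noise):
        # Forward (noising) process: jump from x0 to x_t in one closed-form step.
        a = alphas_cumprod[t].sqrt().view(-1, 1)
        s = (1.0 - alphas_cumprod[t]).sqrt().view(-1, 1)
        return a * x0 + s * noise

    x0 = torch.randn(8, 16)                               # stand-in for a data batch
    t = torch.randint(0, T, (8,))
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise)

    denoiser = torch.nn.Linear(16, 16)                    # placeholder noise predictor
    loss = F.mse_loss(denoiser(x_t), noise)               # learn to predict the added noise
    loss.backward()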

An important advantage of DMs is their ability to obviate explicit density calculations, thereby increasing their computational efficiency relative to other
generative models, such as GANs. Rather than rely on explicit density calculations,
DMs utilize a sequence of learned transformations to map a simple distribution to a
more complex one.

Despite these advantages, DMs remain computationally demanding due to the need for repeated function evaluations and gradient computations in the high-
dimensional space of RGB images (Rombach, Blattmann, Lorenz, Esser, & Ommer,
2022).

2.1.4 Latent Diffusion Models (LDMs)

Latent Diffusion Models (LDMs) are a modification of Diffusion Models (DMs) that operate the diffusion process in a lower-dimensional latent space generated by an autoencoder instead of in the high-dimensional pixel space. By focusing on the important, semantic bits of the data, likelihood-based generative models can be trained more efficiently in this space, resulting in lower training costs and faster inference speeds (Rombach et al., 2022).

In the domain of text-to-image synthesis, Stable Diffusion represents the official implementation of LDMs. Compared to other state-of-the-art models, such as OpenAI's DALL·E 2 and Google's Imagen, Stable Diffusion shows similar performance but is more accessible and flexible because it is open-source.
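
As an illustration of this accessibility, the publicly released checkpoint can be run locally in a few lines, for example via the Hugging Face diffusers library; the model identifier below is the public v1-5 checkpoint and the prompt is only an example.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the open-source Stable Diffusion v1-5 checkpoint and generate one image.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("an impressionist-style evening dress, studio photo").images[0]
    image.save("dress.png")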

2.1.5 Fine-tuning Methods for Text-to-image Models

While large text-to-image models, such as those mentioned above, have achieved
remarkable results in generating high-quality and diverse images from text prompts,
they often lack the ability to accurately mimic the appearance of specific subjects in a
given reference set and synthesize novel renditions of them in different contexts
(Ramesh et al., 2022; Saharia et al., 2022; Rombach, Blattmann, Lorenz, Esser, &
Ommer, 2022). To address this limitation, fine-tuning methods such as Textual
Inversion and DreamBooth have been proposed. These techniques enable the model to
be fine-tuned on a specific reference set, allowing it to learn and mimic the appearance
of the subjects within that set and generate new, context-specific images (Ruiz et al.,
2022; Gal et al., 2022).

Textual Inversion

Textual Inversion, implemented on Latent Diffusion Models (LDMs), is a technique utilized to extract novel concepts from a limited number of sample images (Voronov, Khoroshikh, Babenko, & Ryabinin, 2023). Its primary objective is to enable better control of text-to-image pipelines. This technique involves acquiring new "words" in the embedding space of the text encoder employed by the pipeline. These words can subsequently be incorporated into textual prompts to exercise greater control over the final images produced by the system.

However, the approach has certain limitations, including the inability to learn
precise shapes and instead focusing on capturing the "semantic" essence of a concept.
Another significant challenge is the extensive training time associated with this
methodology, with the learning of a single concept requiring approximately two hours,
which can be prohibitively long in certain scenarios (Gal et al., 2022).
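
A minimal sketch of the core Textual Inversion idea, using the CLIP text encoder from the transformers library, is shown below: a new placeholder token is added to the vocabulary and only its embedding row would be optimized against the reference images. The optimization loop itself is omitted and the token name is an illustrative placeholder.

    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    # Register a new "word" for the concept and make room for its embedding.
    tokenizer.add_tokens("<fashion-concept>")
    text_encoder.resize_token_embeddings(len(tokenizer))

    # Freeze the encoder; during training only the embedding table (in practice,
    # only the new token's row) would receive gradient updates.
    for p in text_encoder.parameters():
        p.requires_grad_(False)
    text_encoder.get_input_embeddings().weight.requires_grad_(True)

    prompt = "a dress in the style of <fashion-concept>"
    print(tokenizer(prompt).input_ids)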

DreamBooth

Given a limited number of images featuring a particular subject (typically around 3 to 5), DreamBooth embeds the subject into the output domain of the model while associating it with a unique identifier. The technique uses a rare token identifier to represent the subject and fine-tunes a pre-existing diffusion-based text-to-image framework. In the original formulation, this framework operates in two distinct steps: it first generates a low-resolution image from textual prompts and subsequently employs super-resolution (SR) diffusion models to enhance the image quality. The first step involves fine-tuning the low-resolution text-to-image model using the available input images, along with textual prompts that contain a unique identifier followed by the subject's class name (e.g., "A [V] dog") (Ruiz et al., 2022).

In contrast to Textual Inversion, which solely trains the embedding without any
alterations to the base model, DreamBooth implements fine-tuning of the entire text-to-
image model. This approach involves the acquisition of the capacity to associate a
distinct identifier with a specific concept, whether it be an object or style. As a result,
the generated images are tailored to a greater degree to the specific object or style in
question, thus facilitating a more personalized output compared to the results obtained
through Textual Inversion (Voronov et al., 2023).
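
To make the difference concrete, the sketch below outlines the DreamBooth training objective as we understand it from Ruiz et al. (2022): the noise-prediction loss on the instance images (prompted with the unique identifier plus class name) is combined with a class-specific prior-preservation term computed on generic class images. The function signature and variable names are illustrative, and the surrounding data handling is omitted.

    import torch.nn.functional as F

    def dreambooth_loss(unet, t,
                        noisy_instance_latents, instance_text_emb, instance_noise,
                        noisy_class_latents, class_text_emb, class_noise,
                        prior_weight=1.0):
        # Instance term: a prompt like "a [V] dress" on the user-provided images.
        pred_instance = unet(noisy_instance_latents, t,
                             encoder_hidden_states=instance_text_emb).sample
        instance_loss = F.mse_loss(pred_instance, instance_noise)

        # Prior-preservation term: a prompt like "a dress" on generated class
        # images, which discourages the model from forgetting the general class.
        pred_class = unet(noisy_class_latents, t,
                          encoder_hidden_states=class_text_emb).sample
        prior_loss = F.mse_loss(pred_class, class_noise)

        return instance_loss + prior_weight * prior_loss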

2.2 Final decision


From among the various available techniques, we selected Stable Diffusion and
fine-tuned it using the DreamBooth approach. Our decision to use this particular
technique was based on a number of factors, which we will discuss in detail in the
subsequent section. Ultimately, we determined that this approach was most appropriate
for our particular use case and yielded superior results compared to other competing
techniques.

2.2.1 Why Stable Diffusion?

After a thorough evaluation of our task requirements, which took into account the specific needs of fashion designers, the feasibility of the operation, the substantial research support available, and the distinct advantages of Stable Diffusion over its competitors, we decided to adopt Stable Diffusion for our implementation.

Our objective is to create a machine learning (ML) based system for generative fashion design. Following extensive interviews with fashion designers, we discovered that a major challenge they face is finding inspiration during their work. Fortunately, computer vision has been a trending topic for some time now, and several algorithms have demonstrated excellent performance in this field. Among them, Generative Adversarial Networks (GANs) have been recognized as one of the most popular image
synthesis algorithms and were therefore our primary consideration. However, GANs are
known to have limitations in modeling complex, multi-modal distributions, making it
difficult to generate clothes as described in natural language, especially for data with a
high degree of variability (Brock et al., 2018; Karras, Laine, & Aila, 2019).
Subsequently, we conducted several experiments on the proposed approach, which
revealed that the training process posed certain challenges. Specifically, due to the high
resource requirements, we were compelled to input low-resolution images and limit the
number of epochs, resulting in unsatisfactory outcomes. Meanwhile, although
applications such as DALL·E 2 or Imagen have demonstrated exceptional results, we
found that the majority of the images generated did not meet our aesthetic and design
requirements, specifically with respect to displaying adequate details of silhouette,
color, texture, and overall design (Eckman & Wagner, 1995).

In addition, the absence of access to DALL·E 2's parameters and code presents a
significant challenge to optimize and implement it in our situation. Initially, we
considered using DALL·E 2's approach and inputting clothing data to train our own
model until we encountered Stable Diffusion. We discovered that Stable Diffusion is an
open-source alternative to DALL·E 2 that produces similar performance in the tasks we
prioritize. Furthermore, the core model of Stable Diffusion, the Latent Diffusion Model (LDM), operates on a compressed latent space with lower dimensionality, enabling
computationally less expensive training and faster inference with almost no loss in
synthesis quality. This effectively addresses our resource constraint issue. Therefore,
we opted for Stable Diffusion as the preferred choice, considering its practicality and
versatility in effectively addressing our specific tasks.

2.2.2 Why DreamBooth?

In order to enhance the performance of the current text-to-image model to meet our needs, we extensively researched various fine-tuning methods and ultimately
determined that DreamBooth would be the most suitable choice. This method has
demonstrated promising results and the capability to generalize effectively.

Despite demonstrating satisfactory results in image generation, Stable Diffusion has faced similar issues as DALL·E 2 when applied in the fashion domain. Specifically, the model has produced poor results such as twisted human faces, strange colors, and a
lack of aesthetics or novelty that do not meet the requirements of fashion designers.
Moreover, in the context of fashion design, designers may face difficulties in
articulating their visual inspirations using conventional verbal descriptions, and there
may be variations in perception and interpretation of the model that generate unexpected
results. To address the challenges related to integrating visual inspirations into fashion
designs using the Stable Diffusion model, we posit that the utilization of fine-tuning
methods can allow designers to incorporate their visual inspirations directly into their
designs. By inputting multiple images of a single concept into the Stable Diffusion
model, designers can extract specific features, colors, patterns, or textures and then transfer them to any fashion piece, thus stimulating creativity and aiding in the design
process. This approach can enhance the efficiency and effectiveness of the design
process while facilitating the integration of visual inspiration into the final design
product.

After thorough experimentation and research, we have selected DreamBooth as our preferred method for fine-tuning Stable Diffusion, over Textual Inversion. Although
these methods share similarities in their approach and only require 3-5 images of a user-
provided concept as input, DreamBooth proved to be more effective in our experiments.
In particular, DreamBooth was able to overcome the significant challenge of Textual
Inversion, which is the prolonged training time, making it a more suitable option for our
needs.

Overall, DreamBooth produced superior results in our experimentation, which led to our decision to utilize this method for our project. As a consequence, we have opted
to employ the DreamBooth method to aid us in constructing our models, which can be
utilized in the synthesis of clothing images.
3 User-oriented Requirements
In this section, we outline the approaches we employed to gather user requirements and the methods we used to analyze them for our project. We begin
by elaborating on our three interview designs, highlighting their objectives,
significance, and distinctions, as well as explaining their implementation. We
subsequently extract relevant information from the interviews to be utilized in
subsequent stages.

3.1 User Research Design and Implementation


We conducted three interviews with different objectives. To acquire a
comprehensive understanding of user needs, expectations, and preferences, the first
interview was a face-to-face session with a professional fashion designer. The purpose
of this interview was to identify the challenges encountered by fashion designers and
establish well-defined project objectives. The survey questions were primarily geared
towards eliciting information on the details of the design process and identifying pain
points experienced during the process. During the interview, the participant emphasized the significance of gaining inspiration in the design process.

After training our model and developing an easy-to-use prototype, we conducted the last two interviews to obtain feedback from a user perspective. Initially, considering
the niche nature of fashion, we targeted only professional fashion designers who had
been in the industry for at least 5 years after graduating from a fashion major. However,
according to the feedback we received, most professional fashion designers had
reservations about our AI-assisted inspired creation model, stating that AI-generated
images are incomparable to their professional designs. Additionally, they did not
provide enough useful information in terms of model improvement. As a result, we
expanded our audience to fashion design enthusiasts who have an interest in fashion
designing and are willing to design their own clothes for real use. This extension helped
us gather more useful information concerning our project. In the next subsection, we
will provide a detailed explanation of these last two interviews and the insights we
gained from them.


3.2 Interviews
The last two interviews were crucial as they provided us with important insights
for improving our model. We will discuss the details of the questions we designed, their
implementation, and the information we obtained from them individually.

3.2.1 Interviews with Fashion Designer

With the objective of implementing our model in real-world applications and enhancing the performance of clothing image synthesis, we conducted a series of
interviews with several professional fashion designers. Each interview was allotted 30
minutes, during which we initiated a discussion regarding their design process to gain
insights into the potential applications of our model. Our questionnaire was structured into three parts: the first part solicits basic information from the interviewees;
the second part pertains to the design process, and seeks to elicit responses regarding the
challenges faced by the interviewees in this regard; and the third and final part seeks
feedback on the generated images. In relation to feedback, we requested the designers to
test our prototype by utilizing it to generate new images, and we recorded their feedback
pertaining to the prompts, generated images, and their overall perception of the results.
In the event that the generated image failed to meet their expectations, we requested
them to repeat the process until they were content with the output. These interviews
enabled us to acquire valuable feedback on the efficacy of our model in generating
design concepts and improving the clothing image synthesis process. The insights
gleaned from these discussions helped us identify key areas for improvement, such as
expanding the scope of customization options and enhancing the accuracy of the image
synthesis process.

It is notable that the responses gathered from the interviews with professional
fashion designers highlighted "color matching" and "shape of the clothes" as the
primary areas of focus in the fashion design process. This information prompted us to
prioritize these elements when generating images, emphasizing their critical importance.

Figure 1. List of design elements considered by professionals

An additional noteworthy finding is that while 50% of the professional fashion designers surveyed were familiar with AI-driven image synthesis systems, none of
them had any prior experience with AI assistants, nor expressed willingness to use such
systems in the future. In contrast, fashion enthusiasts were found to be more receptive.
However, the feedback provided by the professional fashion designers was mostly
negative, with a particular emphasis on color combinations not being harmonious and
the design styles being uncreative. These designers expressed that the results were of
little assistance in stimulating their creativity and had reservations about the project's
potential application in the professional field.

3.2.2 Interviews with Fashion Enthusiasts

Interviews with fashion enthusiasts can be regarded as an extension aimed at gathering sufficient information to enhance the performance of our project. Similar to the process used for fashion designers, we also employed questionnaires and testing with this group. However, in this case, our focus was primarily on eliciting feedback from them, considering their limited technical expertise. Therefore, we meticulously documented the feedback received for each step involved in generating synthetic images. Their primary challenge lies in their difficulty visualizing amorphous and
unstructured concepts in their minds, which our project can effectively address.
Consequently, they have exhibited favorable dispositions towards AI-assisted clothing
generation models.

Non-professionals and professionals share a common consideration for color matching during the design process. However, non-professionals place a greater
emphasis on the presentation of patterns and colors, while professionals tend to
overlook the significance of patterns and focus more on practicality and marketability of
their designs. Thus, it is crucial to strike a balance between creativity and practicality to
cater to both non-professionals and professionals.

Figure 2. List of design elements considered by enthusiasts

As previously mentioned, all fashion enthusiasts express a positive attitude towards our model. They believe that creativity is the most crucial aspect of fashion
design, and they typically seek inspiration from reading books and admiring paintings.
Inspiration can generally be categorized into two types: text-based and image-based.
Moreover, they mentioned that lay people still face difficulty in visualizing the output of ambiguous concepts solely through their own efforts, and our model can effectively
address this issue, although further optimization could enhance the quality of results.

Figure 3. Enthusiasts versus professionals in terms of design elements to consider

Overall, the interviews conducted yielded significant insights that informed our
focus on key areas for improvement and the efficacy of the model in generating design
concepts and enhancing the clothing image synthesis process. Based on feedback from
fashion enthusiasts, it became apparent that properly transferring inspiration into
fashion pieces and creatively combining them in novel ways is of utmost importance.
Thus, this will serve as the main objective for our project going forward.
4 Implementation
This section provides a detailed account of the implementation of the
DreamBooth method in training our models. The key components of this approach will
be discussed, along with our customized implementation for the fashion domain based
on the original DreamBooth paper. Additionally, we developed a prototype that
simplifies the method for fashion designers, allowing them to test it through interviews.
Finally, we will discuss our model evaluation process, which confirms that our model
meets the required standards for our task.

4.1 Training
We utilized the strategy proposed by DreamBooth for fine-tuning on our selected clothing images. The objective of our training process is to embed a given subject,
consisting of a small set of images (usually 3-5), such as inspirations (e.g., paintings,
people, nature, architecture, etc.) or fashion pieces, into the output domain of a model so
that it can be synthesized with a unique identifier. This enables us to create customized
clothing designs by combining any of the input clothing items or inspirations, with the
output controlled by natural language. The training process involves two stages:
generating a low-resolution image from text, followed by applying it to high-resolution
diffusion models.

We executed this process on a latent text-to-image diffusion model called Stable Diffusion. Specifically, our experiments revealed that the Stable-Diffusion-v1-5 model
provided the best results as a base model.

In this section, we will begin by discussing the preliminaries that are necessary to
understand the underlying process. Following this, we will present how we
implemented the DreamBooth method in practice.

4.1.1 Preliminaries

To provide a comprehensive understanding of the training process, it is necessary to explain the following prior knowledge and emphasize some technical details used in our case.


Latent Diffusion model

As discussed before, Latent Diffusion Models have been proposed as a solution to reduce computational demands and speed up the inference of diffusion
models while maintaining high synthesis quality. The proposed method firstly employs
an encoder that learns a space, perceptually equivalent to the image space but with
reduced dimensions. Subsequently, a standard Diffusion Model is designed to work
with the two-dimensional structure of our learned latent space. Diffusion Models are
probabilistic models intended to learn a data distribution p(x) through gradual denoising
of a normally distributed variable. This process corresponds to learning the reverse of a
fixed Markov Chain of length T.

In our case, a conditional U-Net has been utilized, which takes a noisy sample, a conditioning state, and a timestep, and outputs a prediction with the same shape as the sample. After the final denoising step, the resulting latent is passed to a decoder to generate the final images.
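
The following sketch, built from the publicly released components of the Stable-Diffusion-v1-5 checkpoint, illustrates this loop: the conditional U-Net repeatedly denoises a latent given a text conditioning and a timestep, and the autoencoder's decoder maps the final latent back to pixel space. The random text embedding stands in for the CLIP output discussed next, and the choice of sampler is illustrative.

    import torch
    from diffusers import AutoencoderKL, UNet2DConditionModel, DDIMScheduler

    model_id = "runwayml/stable-diffusion-v1-5"
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
    scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")

    scheduler.set_timesteps(50)
    latents = torch.randn(1, unet.config.in_channels, 64, 64)   # 64x64 latent grid
    text_emb = torch.randn(1, 77, 768)                          # placeholder CLIP conditioning

    for t in scheduler.timesteps:
        with torch.no_grad():
            noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    with torch.no_grad():
        image = vae.decode(latents / vae.config.scaling_factor).sample  # back to pixel space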

Vocabulary Encoding

This process involves transforming a prompt and an image into vectors that can be computed on directly. Since text and images belong to different distributions, joint computation of the two is not directly feasible, and considerable research has been conducted in this area. CLIP (Contrastive Language-Image Pre-training) leverages prior work on zero-shot transfer, natural language supervision, and multimodal learning to map images and text into a shared embedding space in which matching pairs are aligned.

In our study, the pre-trained model we used was initialized with the weights of the latest version checkpoint and fine-tuned on "laion-aesthetics v2 5+", which uses CLIP image embeddings produced with the OpenAI CLIP model, ensuring high-quality visual images and semantic fidelity. After conducting numerous tests, we discovered that the pre-trained CLIP model had integrated almost all fashion-related terminology, including fashion brands such as Gucci and Chanel, as well as fashion-related phrases like jumpsuit, joggers, and woven, among others. As a result, we concluded that there was no need to fine-tune the CLIP model. Moreover, we use the new subject name and prompt with the standardized CLIP tokenization method.
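
As a minimal sketch, this standardized tokenization and text encoding step looks as follows when using the tokenizer and text encoder bundled with the Stable-Diffusion-v1-5 checkpoint; the prompt containing one of our introduced identifiers is only an example.

    from transformers import CLIPTokenizer, CLIPTextModel

    model_id = "runwayml/stable-diffusion-v1-5"
    tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

    # Tokenize the prompt (padded/truncated to CLIP's fixed 77-token length)
    # and encode it into the conditioning vectors consumed by the U-Net.
    tokens = tokenizer(
        "a myswar button on a velvet jacket",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    text_embeddings = text_encoder(tokens.input_ids)[0]   # shape: (1, 77, 768)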

4.1.2 Training Customized Concept

The customization method offers fashion designers the ability to seamlessly integrate their specific creative ideas into fashion pieces.

Through fine-tuning a pre-trained Stable Diffusion model using DreamBooth, designers can achieve this by providing just 3-5 images of a particular item and
assigning it a unique identifier. This process is akin to adding a few new words into the
basic model, teaching it to recognize and replicate the unique design elements of the
identified item. Our model enables fashion designers to generate possible garment
designs by utilizing human language prompts. Additionally, it allows fashion designers
to work with a collection of high-quality fashion pieces that have been incorporated into
the model. Furthermore, the model offers extensive customization options, empowering
designers to mix and match components and iterate on their designs, ultimately allowing
for more streamlined and creative workflows based on their previous work.

To emulate the process of fashion design, we introduced specific fashion pieces, including tops, bottoms, shoes, and accessories, as well as inspirations such as figures,
designs, human faces, paintings, natural scenery, and even random subjects like
cartoons or pillows. Each introduced item is associated with a concept, either a fashion
piece concept or an inspirational concept. By referring to the names we have given
them, we can easily retrieve and combine them with other elements.
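
A minimal sketch of this retrieval-and-combination step is shown below: after DreamBooth fine-tuning, the resulting checkpoint is loaded like any Stable Diffusion pipeline and the learned identifiers are simply used as words in the prompt. The local path and the identifiers are illustrative examples from our experiments.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "./dreambooth-fashion-checkpoint", torch_dtype=torch.float16
    ).to("cuda")

    # The unique identifiers learned during fine-tuning ("myswar button",
    # "myzara jacket") can be combined freely with ordinary prompt text.
    images = pipe(
        "a myswar button on a myzara jacket, on runway",
        num_inference_steps=50,
        guidance_scale=7.5,
        num_images_per_prompt=4,
    ).images
    for i, img in enumerate(images):
        img.save(f"design_{i}.png")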

Through the use of our model, fashion designers can preview their designs in
advance, which can aid in identifying potential issues and adjusting details. This
approach can ultimately reduce future risks and improve overall efficiency compared to
traditional design processes.

4.2 Prototype

4.3 Testing
We conducted a comprehensive testing process following our established testing
protocol to evaluate the limits and capabilities of our model. This allowed us to gain a
better understanding of the model's performance and explore its potential for creating
unique and innovative fashion designs.

4.3.1 Testing Strategy

The process of apparel design applies a particular kind of problem solving, consisting of a series of small steps (Cameron, 2009), which is why we create structures to help us break down concepts, so that the design process can be traced, interpreted and studied throughout the model (Cross, 2010; Simon, 1996). Based on the above-mentioned approach (Lee & Jirousek, 2015), we introduced varying numbers of concepts
to the model to determine the optimal number for achieving high-performance and
satisfactory results. These concepts encompass any example we introduce to the model,
with each comprising a set of 3-5 images that represent fashion pieces or inspirations.
The testing structure is categorized under two main categories, namely Fashion Pieces
and Fashion Elements. Fashion Pieces are divided into two classes, with the first
representing the broader category of garments such as tops, bottoms, shoes, suits, and
accessories, while the second class comprises specific fashion pieces falling under the
first class. For instance, shirts and t-shirts belong to the "top" category, while trousers
and jeans belong to the "bottom" category under the first class. Fashion Elements, on
the other hand, are defined by five dimensions, namely shape, pattern, texture, color,
and space, each of which has a unique role in fashion design. Shape refers to the
silhouette or outline of a garment or fashion piece, including its contours, curves, and
overall form. Pattern encompasses the decorative designs and motifs applied to a
garment or fashion piece, such as stripes, polka dots, or florals. Texture describes the
tactile quality or surface appearance of a garment or fashion piece, including the feel of
the fabric and any details such as embroidery or beading. Color pertains to the hues,
shades, and tones of a garment or fashion piece, as well as any color combinations or
contrasts used. Space relates to the overall arrangement and distribution of the elements
within a garment or fashion piece, including its proportions, balance, and negative
space. Positive space pertains to the areas of interest or subject within the fashion piece,
such as the silhouette or form of the garment, or the placement of accessories or
embellishments. For example, the positive space in a dress design could be the
placement of a unique neckline, an intricate pattern, or a striking color contrast. On the
other hand, negative space in fashion design refers to the background or the areas
surrounding the subject of the work. For example, the negative space created by the
asymmetrical neckline shape is sensual and relaxing (Volpintesta, 2014).
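
For reference, the testing structure described above can be summarized as a simple data structure; the example entries are drawn from the text and are not exhaustive.

    # Testing taxonomy used in this section (illustrative, not exhaustive).
    testing_structure = {
        "fashion_pieces": {
            "first_class": ["tops", "bottoms", "shoes", "suits", "accessories"],
            "second_class": {
                "tops": ["shirt", "t-shirt"],
                "bottoms": ["trousers", "jeans"],
            },
        },
        "fashion_elements": ["shape", "pattern", "texture", "color", "space"],
    }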

In the Fashion Pieces section of our study, we will primarily focus on the
capabilities of the model for understanding the features of the fashion piece, shifting
within/across the different fashion piece classes, combining different concepts and
applying inspirations on fashion pieces. On the other hand, the Fashion Elements
category will offer greater opportunities for creativity as we aim to apply the different
fashion elements of various products to other fashion pieces, as well as exploring
diverse inspirations. This segment will involve a more detailed approach, as we
combine the specific fashion elements of different introduced concepts to add unique
and distinctive flavors to the fashion designs. Overall, by exploring different scenarios
and analyzing the results, we aim to identify the strengths and weaknesses of the model
and provide insights for future research and development in this area.

4.3.2 Testing results

Fashion Pieces

Within the Fashion Pieces category, we initially assessed the model's ability to
comprehend various types of fashion pieces without the need for introducing any
specific concepts. The pre-trained model was able to understand all the fashion pieces.
In this approach, we use a text-to-image model and fine-tune it so that it can learn to bind a
unique identifier to that particular subject. This identifier can be used to generate
realistic images in different scenes. This technique has many practical applications, for
example, it can be used for product display in e-commerce, allowing consumers to
visualize the product without having to try it on. To implement this technique, we use the semantic prior embedded in the model together with a new class-specific prior-preservation loss. This approach is able to maintain the authenticity of the image while preserving the
features and identity of the subject during the synthesis process. Overall, our technique
is able to synthesize subjects that do not appear in the reference image under different
scenes, poses, perspectives and lighting conditions, which opens up new possibilities for
the generation and design of fashion pieces (Ruiz et al., 2022). At the same time, we
introduced one concept to evaluate the model's fundamental understanding of a single
fashion piece. The model performed well for a single concept, accurately identifying the
key features of the fashion piece (color, shape, pattern) and generating new designs
inspired by it that were comparable but distinct [Image I-Swarovski button]. As emphasized in the official DreamBooth paper, precise selection of a unique identifier followed by the subject's class name (e.g., "A [V] button") significantly
enhances the model's capacity to produce more creative and higher-quality results.

Figure 4. Subject-driven generation. Given Swarovski button images (left), our approach (right) can synthesize the "Swarovski" concept with high fidelity and in new contexts (text prompt: "A [Swarovski] button").

Later on, we introduced various concepts to the model at different times, as this
was a crucial step in developing innovative and unique fashion designs. Our findings
revealed that by introducing more than eight concepts to the model, it was no longer
able to grasp all the concepts and integrate them into designs effectively. For example,
when we introduced 24 concepts to the model and called a particular dress in the prompt
(i.e., bejflow dress), the model struggled to replicate the given concept and instead
generated random dresses without considering the specific characteristics of the original
concept. In all the examples presented in the following section, it should be noted that
the model was trained with no more than eight concepts.

Following the initial assessment of the model's basic understanding of fashion concepts, we proceeded to evaluate its creativity capabilities in terms of fashion piece
concepts and inspiration concepts. Our first objective was to determine if the model
could effectively shift between and within classes for the introduced concepts. For
example, we tested if the model could shift from a necklace to an earring, which are
both in the same "accessory" category, and if it could create high-heeled boots similar to
the features of a dress, which are in different categories (i.e., "suit" and "shoes"). The
model performed exceptionally well in the within-class results, as demonstrated by Figure 5. However, while the model also produced examples that capture some features of the Orpi button in the across-first-class setting, the results exhibited greater dissimilarity compared to transformations within the same class.

Figure 5. Novel view compositions, art renditions and property modifications. We are able to generate novel and meaningful images while faithfully preserving the identity and essence of the subject during same-class shifts, but the main features of the subject are lost during cross-class shifts.

In the final stage of our creativity assessment, we introduced inspiration concepts, which were non-fashion products, including art pieces, real-life products (e.g., pillows),
patterns, human face photos and more. We applied these inspirations to both fashion
pieces with and without introducing a concept, such as on both a [Y] bag or a simple
bag. Our observations revealed that art pieces were applied to fashion products
successfully. When the prompt began with the name of the introduced art piece and
followed by the name of a fashion piece concept, the model tended to generate fashion
designs with painting styles inspired by art pieces, such as an impressionist-style dress
inspired by Monet as in Figure 6. On the other hand, when the prompt began with the
name of the introduced fashion piece concept and was linked to the inspiration concept
with prepositions such as "with," "in," "as," "on," or "by," the results produced images
of a dress with patterns that reflected the inspiration concept.

Figure 6. Inspiration Concept-driven generation. Given the Munique paint concept (left), our approach (right) can synthesize the "Munique Paint Concept" with high fidelity and in new contexts.
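
The two prompt orderings discussed above can be sketched as follows; "munique paint" and "bejflow dress" stand for introduced inspiration and fashion piece identifiers, and the exact outputs vary between runs.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "./dreambooth-fashion-checkpoint", torch_dtype=torch.float16
    ).to("cuda")

    # Ordering 1: inspiration identifier first -> painting-style fashion designs.
    # Ordering 2: fashion piece identifier first, linked by a preposition ->
    # the inspiration tends to appear as a pattern printed on the garment.
    prompts = [
        "a munique paint style bejflow dress",
        "a bejflow dress with munique paint pattern",
    ]
    for prompt in prompts:
        pipe(prompt, num_images_per_prompt=2).images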

Our assessment also included the use of real human selfies, which we found to be
effective in printing faces on fashion products. We also observed that by providing the
model with specific product details and a given location, it could produce designs that
featured the introduced human with the destination background while wearing the
specified fashion piece. Furthermore, the model was also able to understand real-life objects, such as the cutenose pillow concept, and create new designs by analyzing the
pillow's color, shape, and animal features. When we provided a contextual background
with a specific location, such as a forest or school, the model placed the animal inspired
by the pillow in the appropriate concept. We even found that by adding the word
"friends" to the prompt, the model could place the pillow character in a school context
and print it on a specified fashion piece. As a last example, we explored the use of
prompts that included contextual information such as "on runway," which allowed
fashion designers to view possible results on the runway and visualize the fashion pieces being worn by models, although the realism of such results may be limited.

Figure 7. Inspiration Concept-driven generation. We tried various objects and generated designs from their concepts.

Fashion Elements

In the Fashion Elements section of our study, we will explore the model's ability
to combine specific fashion elements of fashion products. As we did in the previous
section, we began by assessing the model's understanding of shape, pattern, texture,
color, and space elements. We accept the assumption that each concept contains all
fashion elements, regardless of whether it was a fashion piece or an inspiration concept.
The model demonstrated a high level of comprehension, even when combining these
elements in a single fashion piece.

During the creativity assessment, we applied various fashion elements to the introduced fashion pieces. The model proved to be successful even in combining niche
colors, such as mint green with pink and peach orange. It also effectively incorporated
specified object shapes, such as trees, moon, and stars, and applied different patterns,
including floral and zebra, with various fabrics such as leather and velvet. Subsequently,
we explored the model's capacity to understand the specific fashion elements of the
introduced concept and apply them to both a fashion piece and an introduced fashion piece concept. Applying fashion elements from an introduced concept to an introduced fashion piece concept proved to be a more challenging task for the model, as it required a
higher degree of creativity. However, the resulting designs were more unique and
innovative. One example that stood out was the application of Swarovski button (as
myswar button) elements to a Zara jacket (as myzara jacket). The goal was to apply the
elements of the Swarovski button to the Zara jacket. The model went beyond merely
adding the Swarovski button to the jacket as a decorative element. It created new
designs for the button, incorporating it as a brooch or chain accessory, or taking
inspiration from the color of the specified button and applying it to the jacket. The
model even added a new glittering pattern design by arranging stones randomly on
different parts of the jacket. Not only did the model consider the shape of the button, but
it also incorporated its color, texture, and pattern to create a cohesive and aesthetically
pleasing design. As in all previous results, the negative and positive space in the fashion
designs were clearly defined. Also, an interesting observation was that when the fabric
was specified in the prompt (such as Swarovski button on a velvet Zara jacket), the
model tended to apply the button across the entire texture of the jacket instead of using
it as a single button, and the results placed a stronger emphasis on the fabric, shown from a closer perspective, as in Figure 8.

Figure 8. Fashion pieces-combined generation. We compared the results of different prompts, which combine the concepts well; we note that the model tends to emphasize the specified fabric.

To provide fashion designers with greater creative flexibility, we sought to combine elements from multiple concepts. Our first approach was to combine the same elements from multiple concepts on a fashion piece. While the model was generally successful in providing appropriate results, not all results were visually pleasing. The
model appeared to find combining patterns relatively easy, but combining colors
presented a more significant challenge. Even when the color combination was specified
in the prompt, the model often combined the shape of the fashion piece concepts rather
than their colors and chose one of the concepts' colors, as evidenced in Figure 9.
However, when we assigned a pattern, such as a zebra pattern, the model was able to
combine the colors of multiple concepts within that pattern. When it came to combining
two concepts' patterns, the model incorporated clues from both introduced concepts,
though the results were not always strong.

Figure 9. Concepts-combined generation. Our model performs less convincingly when combining two introduced concepts.

Combining the same fashion elements from multiple concepts and applying them
to an introduced fashion piece concept resulted in more creative and comprehensive
results. As shown in Figure 10 below, the model not only understood and combined the
colors but also effectively combined the different fashion elements from both concepts.
The texture of the Orpi button, with its shiny surface, was incorporated into the dress
and the pink/orange color shapes on the pattern were inspired by the button itself. On
the other hand, the pattern was drawn from the Munique art concept by Monet, while
the green color was derived from impressionist nature paintings. While this approach
offers high creativity opportunities for fashion designers, we found that it is barely
possible to combine a specific fashion element while keeping other elements constant.

Figure 10. Advanced Concepts-combined generation.

In the final part of our exploration, we aimed to apply different fashion elements
from different concepts without combining the same elements. This approach was
comparatively easy, since it simply involves gathering different elements together.
While the results demonstrated the model's ability to understand some fashion
elements, when the prompt included fashion elements from more than two concepts, the
model tended either to show only some of them or to display different elements from each
concept. Nonetheless, this approach provides fashion designers with more options to
incorporate various fashion elements into their designs, expanding the possibilities for
creative and innovative fashion concepts. Additionally, we applied these fashion
elements to specific parts of fashion pieces, such as collars and sleeves. This allowed us
to explore the model's ability to apply the elements in a more targeted and specific
manner. Overall, we found that the model was able to apply the fashion elements
effectively to different parts of the fashion pieces, resulting in unique and creative
designs. However, we also observed that the model sometimes struggled to maintain
consistency in applying the fashion elements across the different parts of a fashion
piece.
5 Limitations & Future Work

5.1 Limitations
Our study aimed to explore the creative capabilities of an AI model in the
fashion design process. We conducted comprehensive testing of the model's ability to
comprehend various fashion pieces and elements and of its capacity to generate innovative
and unique fashion designs. Our study revealed promising prospects for AI models in
fashion design, with several positive outcomes: a high level of comprehension of various
fashion pieces and elements, effective application of fashion elements drawn from various
fashion pieces, and the potential to combine different fashion elements and inspiration
concepts into innovative and unique designs. However, our findings also highlighted some
limitations of the model:

 The model struggles when introduced to more than eight concepts, leading
to reduced performance in combining concepts and generating unique
designs.
 The model occasionally generates unrelated examples since it focuses on
examples from its pre-trained class samples rather than on the specific
concept we introduce.
 The generated images may not be entirely realistic and may have some
distortions or imperfections.
 The model sometimes struggles to maintain consistency in applying the
fashion elements across the different parts of the fashion piece.
 While the model was successful in combining the same elements from
multiple concepts, it is challenging to combine a specific fashion element
while keeping other elements constant.
 The model is not capable of rendering legible text, which may limit its
potential use in certain fashion design applications.
 Randomness is a limitation as the model generates several outputs for the
same prompt, and there is no guarantee that the first output is the best (a
seed-fixing sketch addressing this follows the list).
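As a partial mitigation for the randomness noted above, a fixed random seed makes a given prompt reproducible, and generating several seeded candidates lets a designer pick the strongest output. The sketch below is only illustrative and assumes the same hypothetical diffusers checkpoint as before; the prompt, seed values, and sampler settings are placeholders.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-fashion-checkpoint", torch_dtype=torch.float16
).to("cuda")

prompt = "a sks button on a velvet jacket"  # placeholder concept token and prompt
seeds = [0, 1, 2, 3]

for seed in seeds:
    # Fixing the seed makes the prompt/seed pair fully reproducible,
    # so a promising design can be regenerated or refined later.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator, num_inference_steps=50).images[0]
    image.save(f"candidate_seed_{seed}.png")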


5.2 Future Research Directions


While the fashion design generation prototype created using DreamBooth has
shown promising results, there are several directions that can be explored in future work
to further improve and extend the system. Some potential avenues for future research
include:

 The current prototype was trained on a limited set of fashion images.


Incorporating more diverse training data, such as garments from
different cultures and time periods, could help improve the diversity and
creativity of the generated designs.
 To make the generated designs more practical and usable, it may be useful
to incorporate constraints and guidelines into the system. For example,
designers could specify certain design elements or color schemes to be
included in the generated designs (a prompt-templating sketch after this
list illustrates one possible approach).
 In order to evaluate the usability and effectiveness of the current system, it
would be beneficial to conduct further user studies with fashion designers
and enthusiasts. Feedback from users could be used to further refine and
improve the system.
 While the current prototype generates fashion designs as static images, it
could be interesting to explore the generation of multi-modal designs, such
as designs that include both static images and animated 3D models.
 Improving the stability of the model's performance to ensure consistency
in generating high-quality designs.
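As a sketch of how the constraint idea above might be realized, the example below templates a prompt from designer-specified elements and a color scheme and pushes unwanted properties into a negative prompt. It assumes the same hypothetical diffusers checkpoint as in the earlier sketches; the helper function, element lists, and parameter values are illustrative only.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-fashion-checkpoint", torch_dtype=torch.float16
).to("cuda")

def build_prompt(piece, elements, colors):
    """Assemble a prompt from a base fashion piece and designer-chosen constraints."""
    return f"a {piece} with " + ", ".join(elements) + ", color palette: " + ", ".join(colors)

prompt = build_prompt(
    piece="sks evening dress",                      # placeholder concept token
    elements=["embroidered collar", "puffed sleeves"],
    colors=["emerald green", "ivory"],
)
negative_prompt = "blurry, distorted, text, watermark"  # properties to discourage

images = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_images_per_prompt=4,   # several options for the designer to choose from
    guidance_scale=8.0,
).images
for i, img in enumerate(images):
    img.save(f"constrained_design_{i}.png")

Such templating keeps the constraints explicit and repeatable, while the negative prompt offers a lightweight way to steer the model away from common artifacts such as rendered text.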

By exploring these avenues for future research, we believe that our fashion design
generation system can be further improved and expanded to provide more creative and
practical design solutions for fashion designers and enthusiasts.
6 Conclusion
In this study, we explored the potential of AI models in fashion design by using a
fashion design generation system based on the DreamBooth method. Our testing protocol
involved introducing various fashion concepts and inspiration concepts to the model and
evaluating its creative capabilities in generating unique and innovative fashion
designs. Our findings demonstrated that the DreamBooth method was capable of
understanding various fashion pieces and elements and effectively incorporating them
into new designs. However, the model also had some limitations, such as reduced
performance when introduced to more than eight concepts, a tendency to generate
unrelated examples by falling back on its pre-trained class samples, and occasional
struggles to maintain consistency in applying fashion elements across different parts of
a fashion piece.

Our study has shown that DreamBooth has the potential to meet many of the needs
of fashion designers. The model's ability to comprehend various fashion pieces and
elements, generate unique and innovative designs, and combine different elements and
inspiration concepts makes it a valuable tool for designers. DreamBooth can help
democratize the fashion industry by making design solutions more accessible and
affordable for emerging designers and small businesses. Despite some limitations, our
study suggests that AI models like DreamBooth can enhance designers' creativity and
productivity by generating unique and diverse design ideas that might otherwise have been
overlooked by human designers. Furthermore, our findings indicate that the model can
be used to generate designs quickly and efficiently, saving designers time and allowing
them to focus on other aspects of the design process. Overall, our research suggests that
DreamBooth can be a valuable tool for fashion designers, providing them with a new
and innovative approach to the design process while also streamlining their workflows
and saving time.

To further improve and extend the system, future research could focus on
incorporating more diverse training data, adding constraints and guidelines to make the
generated designs more practical, conducting further user studies with fashion designers
and enthusiasts, and exploring the generation of multi-modal designs. By exploring
these avenues for future research, we believe that this fashion design generation system
can be further refined and expanded to provide more creative and practical design
solutions for the fashion industry.
Reference List
Brock, A., Donahue, J., & Simonyan, K. (2018). Large Scale GAN Training for High
Fidelity Natural Image Synthesis. Retrieved from arXiv.org website:
https://arxiv.org/abs/1809.11096

Cameron, M. (2009). Review Essays: Donald A. Schön, The Reflective Practitioner:
How Professionals Think in Action. New York: Basic Books, 1983. ISBN 0-465-06874-X
(hbk); ISBN 0-465-06878-2 (pbk). Qualitative Social Work: Research and Practice, 8(1),
124–129. https://doi.org/10.1177/14733250090080010802

Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., & Bharath, A.
A. (2018). Generative Adversarial Networks: An Overview. IEEE Signal
Processing Magazine, 35(1), 53–65. https://doi.org/10.1109/msp.2017.2765202

Cross, N. (2010). Designerly ways of knowing. Springer London Ltd.

Eckman, M., & Wagner, J. (1995). Aesthetic Aspects of the Consumption of Fashion
Design: the Conceptual and Empirical Challenge. ACR North American
Advances, NA-22. Retrieved from
https://www.acrwebsite.org/volumes/7825/volumes/v22/NA

Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A. H., Chechik, G., & Cohen-
Or, D. (2022). An Image is Worth One Word: Personalizing Text-to-Image
Generation using Textual Inversion. ArXiv:2208.01618 [Cs]. Retrieved from
https://arxiv.org/abs/2208.01618

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., …
Bengio, Y. (2020). Generative adversarial networks. Communications of the
ACM, 63(11), 139–144. https://doi.org/10.1145/3422622

Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., … Guo, B. (2022). Vector
Quantized Diffusion Model for Text-to-Image Synthesis. ArXiv:2111.14822 [Cs].
Retrieved from https://arxiv.org/abs/2111.14822


Jetchev, N., & Bergmann, U. (2017). The Conditional Analogy GAN: Swapping Fashion
Articles on People Images. Retrieved from
https://openaccess.thecvf.com/content_ICCV_2017_workshops/papers/w32/
Jetchev_The_Conditional_Analogy_ICCV_2017_paper.pdf

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models.
Advances in Neural Information Processing Systems, 33. Retrieved from
https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179c
a4b-Abstract.html

Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator Architecture for
Generative Adversarial Networks. 2019 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2019.00453

Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020). Analyzing
and Improving the Image Quality of StyleGAN. Retrieved from
openaccess.thecvf.com website:
https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_
Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html

Lee, J. S., & Jirousek, C. (2015). The development of design ideas in the early apparel
design process: a pilot study. International Journal of Fashion Design,
Technology and Education, 8(2), 151–161.
https://doi.org/10.1080/17543266.2015.1026411

Liu, L., Zhang, H., Ji, Y., & Jonathan Wu, Q. M. (2019). Toward AI fashion design: An
Attribute-GAN model for clothing match. Neurocomputing, 341, 156–167.
https://doi.org/10.1016/j.neucom.2019.03.011

Makkapati, V., & Patro, A. (2017). Enhancing Symmetry in GAN Generated Fashion
Images. Artificial Intelligence XXXIV, 10630, 405–410.
https://doi.org/10.1007/978-3-319-71078-5_34

Ng, A., & Jordan, M. (2001). On Discriminative vs. Generative Classifiers: A
comparison of logistic regression and naive Bayes. Retrieved April 2, 2023, from
Neural Information Processing Systems website:
https://papers.nips.cc/paper_files/paper/2001/hash/7b7a53e239400a13bd6be6c91c
4f6c4e-Abstract.html

Pandey, N., & Savakis, A. (2020). Poly-GAN: Multi-conditioned GAN for fashion
synthesis. Neurocomputing, 414, 356–364.
https://doi.org/10.1016/j.neucom.2020.07.092

Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised Representation Learning
with Deep Convolutional Generative Adversarial Networks. Retrieved from
arXiv.org website: https://arxiv.org/abs/1511.06434

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical
Text-Conditional Image Generation with CLIP Latents. Retrieved from
https://cdn.openai.com/papers/dall-e-2.pdf

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., … Sutskever, I.
(2021, July 1). Zero-Shot Text-to-Image Generation. Retrieved from
proceedings.mlr.press website:
https://proceedings.mlr.press/v139/ramesh21a.html

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-
Resolution Image Synthesis with Latent Diffusion Models. ArXiv:2112.10752
[Cs], 2. Retrieved from https://arxiv.org/abs/2112.10752

Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., & Aberman, K. (2022).
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven
Generation. ArXiv:2208.12242 [Cs]. Retrieved from
https://arxiv.org/abs/2208.12242

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., … Norouzi, M. (2022,
October 31). Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding. Retrieved April 2, 2023, from openreview.net website:
https://openreview.net/forum?id=08Yk-n5l2Al

Simon, H. A. (1996). The Sciences of the Artificial, third edition. MIT Press.

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015, June 1). Deep
Unsupervised Learning using Nonequilibrium Thermodynamics. Retrieved April
2, 2023, from proceedings.mlr.press website:
https://proceedings.mlr.press/v37/sohl-dickstein15.html

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021).
Score-Based Generative Modeling through Stochastic Differential Equations.
ArXiv:2011.13456 [Cs, Stat]. Retrieved from https://arxiv.org/abs/2011.13456

Volpintesta, L. (2014). The language of fashion design: 26 principles every fashion
designer should know. Beverly, MA: Rockport Publishers. Retrieved from
https://cmc.marmot.org/Record/.b42267080

Voronov, A., Khoroshikh, M., Babenko, A., & Ryabinin, M. (2023). Is This Loss
Informative? Speeding Up Textual Inversion with Deterministic Objective
Evaluation. ArXiv:2302.04841 [Cs]. Retrieved from
https://arxiv.org/abs/2302.04841

Yan, H., Zhang, H., Liu, L., Zhou, D., Xu, X., Zhang, Z., & Yan, S. (2022). Toward
Intelligent Design: An AI-based Fashion Designer Using Generative Adversarial
Networks Aided by Sketch and Rendering Generators. IEEE Transactions on
Multimedia, 1–1. https://doi.org/10.1109/TMM.2022.3146010

Lee, C.-Y., & Chen, B.-S. (2018). Mutually-exclusive-and-collectively-exhaustive
feature selection scheme. Applied Soft Computing, 68, 961–971.
https://doi.org/10.1016/j.asoc.2017.04.055
Appendix A
1. Interview Feedback

https://docs.google.com/spreadsheets/d/
1BrVgh0vmEPWVZFQphdtwWvWcEmiqkfOS/edit?
usp=sharing&ouid=112535582883725334921&rtpof=true&sd=true

2. Link of Test Structure

https://docs.google.com/spreadsheets/d/1culaRHze4g_DOlUoD7tIlA-
d1rFUlM6_YKGpSEt5E-U/edit#gid=439546458


3. Printed Version of Test Structure with Prompts



4. Printed Version of Test Structure with Assessments



5. Test Results Corresponding to Test Structure (images below)


Affidavit

I hereby declare that I have developed and written the enclosed seminar paper /
bachelor thesis / master thesis entirely on my own and have not used outside sources
without declaration in the text. Any concepts or quotations applicable to these sources
are clearly attributed to them. This seminar paper / bachelor thesis / master thesis has
not been submitted in the same or a substantially similar version, not even in part, to
any other authority for grading and has not been published elsewhere. This is to certify
that the printed version is equivalent to the submitted electronic one. I am aware of the
fact that a misstatement may have serious legal consequences.

I also agree that my thesis can be sent and stored anonymously for plagiarism
purposes. I know that my thesis may not be corrected if the declaration is not issued.

Mannheim, April 19, 2024

Jiaxin Yuan, Yunjing Dai, Bennur Kaya, Tolga Yasar, Yi Wang, Yiyi Wei
