LLM Intro
• About LLM
• Conclusion: State-of-the-Art Considerations and the Path Forward
• References
In the present era, large language models (LLMs) have emerged as transformative tools, unraveling the complexities of natural language understanding and paving the way for modern applications. Offering an introduction and practical insights on how to navigate the intricacies of harnessing LLMs, this Refcard serves as a comprehensive guide for both novices and seasoned practitioners seeking to unlock the capabilities of these powerful language models.

The primary purpose of this Refcard is to provide an end-to-end understanding of LLM architecture, training methodologies, and applications of advanced artificial intelligence models in natural language processing. The key goals include elucidating the theoretical foundations of LLMs, detailing their training processes, exploring practical applications across various domains, and discussing challenges and future directions in the field.
ABOUT LLM
A large language model (LLM) is a powerful artificial intelligence
model designed to understand and generate human-like text based on
vast amounts of data. These models belong to the broader category of
natural language processing (NLP) in the even larger realm of machine
learning. LLMs use deep neural networks with numerous parameters to
learn patterns, relationships, and contextual information from diverse
textual data sources.
THE GENESIS
The genesis of large language models (LLMs) can be traced back to
the revolutionary transformer architecture, a pivotal breakthrough in
natural language processing. At the heart of this innovation lies the attention mechanism, a fundamental building block that redefined how models understand and process contextual information in vast amounts of text, catalyzing a paradigm shift in language representation and comprehension.

TRANSFORMER ARCHITECTURE
As mentioned, LLMs are built on transformer architectures [7]. Transformers enable a model to efficiently process and understand sequential data, making them well-suited for natural language processing tasks. Comprising two fundamental components, the transformer architecture includes an encoder and a decoder. The encoder processes a sequence of input tokens, generating a corresponding sequence of hidden states. Subsequently, the decoder utilizes these hidden states to generate a sequence of output tokens.
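To ground the encoder-decoder description, here is a minimal PyTorch sketch using the built-in nn.Transformer module; all dimensions and tensors are toy placeholders rather than settings from any production LLM.

```python
import torch
import torch.nn as nn

# Minimal sketch of the encoder-decoder flow (illustrative dimensions only).
d_model = 64  # size of each token's hidden state
model = nn.Transformer(
    d_model=d_model, nhead=4,
    num_encoder_layers=2, num_decoder_layers=2,
)

# 10 input tokens and 7 output tokens so far, for a batch of 1,
# already mapped to d_model-dimensional embeddings.
src = torch.rand(10, 1, d_model)  # (source length, batch, d_model)
tgt = torch.rand(7, 1, d_model)   # (target length, batch, d_model)

# The encoder turns `src` into hidden states; the decoder attends to
# those hidden states while producing one hidden state per output token.
out = model(src, tgt)
print(out.shape)  # torch.Size([7, 1, 64])
```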
ATTENTION MECHANISM
The attention mechanism allows the model to focus on different parts of the input sequence when making predictions and captures long-range dependencies within the data. The attention mechanism is particularly powerful for tasks such as language understanding, translation, summarization, and more. It enhances the model's ability to generate coherent and contextually relevant responses by enabling it to weigh the importance of each input token dynamically.

Table 1: Key components of the attention mechanism
Query | The representation of the current token that is compared against the keys to compute attention scores.
Key | The representation of each input token against which the queries are matched.
Value | The output produced by the attention mechanism, representing the weighted sum of values based on the computed attention scores.
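To connect the table to the underlying computation, the following NumPy sketch implements scaled dot-product attention, the formulation used in transformers; the matrices here are random toy data.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # weighted sum of the values

# Toy example: 3 tokens, 4-dimensional query/key/value projections.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```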
REAL-WORLD APPLICATIONS OF LLMS
From natural language understanding to innovative problem-solving, LLMs play a pivotal role across various domains, shaping the landscape of practical applications and technological advancements.

Domain | Applications
Education | Automated grading, content creation
Customer service | Chatbots, automated email responses
Technology | Code generation, bug detection
Media and entertainment | Content summarization, script writing
Human resources | Resume screening, employee feedback analysis
Manufacturing | Quality control, supply chain optimization
WHY LLMS MATTER
LLMs excel at understanding and generating human-like text, enabling more sophisticated interactions between machines and humans. LLMs can automate numerous language-related tasks, saving time and resources. In industries such as customer support, content generation, and data analysis, LLMs contribute to increased efficiency by handling routine language-based functions. LLMs enable the development of innovative applications and services, including chatbots, virtual assistants, and more.

This segment unveils key concepts of modern LLMs. It navigates through the significance of vector databases, the artistry of prompt design and engineering, and the orchestration and agents responsible for the functionality of LLMs. The discussion extends to the realm of local LLMs (LLLMs) and innovative Low-Rank Adaptation (LoRA) techniques, providing a comprehensive overview of the foundational elements that underpin the effectiveness and versatility of contemporary language models.

THE FOUNDATION MODEL AND RETRIEVAL-AUGMENTED GENERATION (RAG)
The foundation model refers to the pre-trained language model that serves as the basis for further adjustments or customization. These models are pre-trained on diverse and extensive datasets to understand the nuances of language and are then fine-tuned for specific tasks or applications.

Figure 2: Foundation model

Retrieval-augmented generation (RAG) enhances the generation process by incorporating context or information retrieved from external sources. RAG is particularly useful in scenarios where access to a vast amount of external knowledge is beneficial for generating more accurate and contextually relevant responses. This approach has applications in tasks such as question answering, content creation, and dialogue systems.
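To make the pattern concrete, here is a minimal sketch of the RAG flow, assuming a toy keyword-overlap retriever in place of a real vector store; the function names and corpus are invented for illustration.

```python
# Hypothetical RAG loop: `retrieve` stands in for a real vector store
# and the returned prompt would be passed to an LLM for generation.
def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    context = "\n".join(retrieve(query, corpus))
    # The retrieved passages ground the model's answer in external data.
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The transformer architecture uses an encoder and a decoder.",
    "Vector databases use approximate nearest neighbor search.",
    "LLMs are pre-trained on large text corpora.",
]
print(build_rag_prompt("What does the transformer architecture use?", corpus))
```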
VECTOR DATABASES
Unlike conventional scalar-based databases that organize data in rows or columns, relying on exact matching or keyword-based search methods, vector databases operate differently. They leverage techniques like Approximate Nearest Neighbors (ANN) to rapidly search and compare a substantial collection of vectors within an extremely short timeframe.

Supporting a wide range of search options | Vector databases effectively address the challenge of accommodating diverse search options across a complex information source with multiple attributes and use cases.

Some of the leading open-source vector databases are Chroma [10, 17], Milvus [13], and Weaviate [23].
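As a hedged illustration of the vector database workflow, the sketch below uses Chroma's Python client to index a few documents and run a similarity query; the collection name and documents are invented, and the calls follow Chroma's documented client API.

```python
import chromadb

# In-memory Chroma client; documents are embedded automatically
# by the collection's default embedding function.
client = chromadb.Client()
collection = client.create_collection(name="docs")

collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Transformers use attention to model long-range dependencies.",
        "Vector databases index embeddings for fast similarity search.",
        "RAG augments prompts with retrieved context.",
    ],
)

# ANN-style similarity search: return the 2 nearest documents.
results = collection.query(query_texts=["How are embeddings searched?"],
                           n_results=2)
print(results["documents"])
```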
PROMPT DESIGN AND ENGINEERING

Table: Prompting techniques and their modus operandi
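To illustrate prompt design in practice, here is a small example contrasting zero-shot and few-shot prompting; the sentiment task and example reviews are invented for demonstration.

```python
# Zero-shot: the model receives only an instruction and the input.
zero_shot = ("Classify the sentiment of this review as positive or negative:\n"
             "'The battery life is fantastic.'")

# Few-shot: the prompt includes worked examples to steer the model.
few_shot = (
    "Review: 'Arrived broken.' Sentiment: negative\n"
    "Review: 'Exceeded my expectations.' Sentiment: positive\n"
    "Review: 'The battery life is fantastic.' Sentiment:"
)
print(zero_shot)
print(few_shot)
```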
ORCHESTRATION AND AGENTS
Orchestration frameworks play a crucial role in constructing AI-driven applications based on enterprise data. They prove invaluable in eliminating the necessity for retraining foundational models, surmounting token limits, establishing connections to data sources, and minimizing the inclusion of boilerplate code. These frameworks typically offer connectors catering to a diverse array of data sources, ranging from databases to cloud storage and APIs, facilitating the seamless integration of data pipelines with the required sources.

In the development of applications involving LLMs, orchestration and agents play integral roles in managing the complexity of language processing, ensuring coordinated execution, and enhancing the overall efficiency of the system.

Table 5: Roles of orchestration and agents

Pairing LLMs with agent frameworks means that intelligent agents, powered by these models, can autonomously contribute to various aspects of application development. This collaboration enhances the linguistic capabilities of applications, making them more adaptive, responsive, and effective in handling natural language interactions and processing tasks.

AutoGen [5, 15] stands out as an open-source framework empowering developers to construct LLM applications through the collaboration of multiple agents capable of conversing and collaborating to achieve tasks. The agents within AutoGen are not only customizable and conversable but also adaptable to various modes that incorporate a mix of LLMs, human inputs, and tools. This framework enables developers to define agent interaction behaviors with flexibility, allowing the utilization of both natural language and computer code to program dynamic conversation patterns tailored to different applications. As a versatile infrastructure, AutoGen serves as a foundation for building multi-agent LLM applications.
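Below is a minimal two-agent sketch following AutoGen's documented assistant/user-proxy pattern; the model name and API key are placeholders you would replace with your own configuration.

```python
from autogen import AssistantAgent, UserProxyAgent

# Placeholder model and credentials; substitute your own configuration.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",      # run fully autonomously for this demo
    code_execution_config=False,   # no local code execution
)

# The proxy starts the conversation; the agents exchange messages
# until the task is considered done.
user_proxy.initiate_chat(assistant, message="Write a haiku about LLMs.")
```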
Enterprise LLMs may play a pivotal role in innovating and designing cutting-edge applications to secure a competitive advantage. The merits of employing pre-trained LLMs encompass ongoing performance enhancements and the capacity to adeptly handle a broad range of language tasks.

Open-source LLMs, on the other hand, provide users with the ability to train and fine-tune the model to align with specific requirements. The complete code and structure of these LLMs are publicly accessible, offering increased flexibility and customization options. Noteworthy examples of open-source LLMs encompass Google PaLM 2, LLaMA 2 (released by Meta), and Falcon 180B (developed by Technology Innovation Institute). While open-source LLMs necessitate a higher level of technical proficiency and computational resources for training, they afford users greater control over data and model architecture, and enhanced privacy. Collaboration among developers is encouraged, fostering innovative training approaches and the creation of novel applications.

Figure 4: Enterprise LLMs

BUILDING A CUSTOM LLM
The development of a custom LLM involves a systematic process that includes several key steps:

1. Data collection and preprocessing: Gathering relevant and representative data, and preparing it for model training by cleaning, organizing, and transforming it as needed.
2. Selecting the model architecture: Choosing a model architecture suited to the task, the data, and the available resources.
3. Training the model: Using the prepared data to train the selected model architecture, adjusting parameters to optimize performance and achieve desired learning outcomes (see the training sketch after this list).
4. Deploying and maintaining the model: Integrating the trained model into the intended environment for real-world use, and continuously refining and updating it based on user feedback and evolving requirements.

This comprehensive approach facilitates the development of a tailored and effective custom LLM for diverse applications.
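As an illustration of step 3, the sketch below fine-tunes a small causal language model with the Hugging Face Trainer; the base model (distilgpt2) and the training file (corpus.txt) are assumptions standing in for your own choices.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Illustrative base model; swap in the architecture selected in step 2.
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 family has no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# "corpus.txt" stands in for the cleaned data prepared in step 1.
data = load_dataset("text", data_files={"train": "corpus.txt"})
tokenized = data.map(lambda b: tokenizer(b["text"], truncation=True),
                     batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="custom-llm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adjusts parameters to fit the prepared data (step 3)
```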
PITFALLS OF LLM
While LLMs have demonstrated impressive capabilities, they are not without their pitfalls. LLMs can inherit and perpetuate biases present in their training data, leading to biased outputs. This can result in unfair or discriminatory language, reflecting societal biases present in the data. The use of LLMs for content generation raises ethical concerns, particularly in cases where generated content could be misused, for example, in creating fake news or misinformation. The generation of content by LLMs can raise legal and privacy concerns, especially when it comes to issues like intellectual property, plagiarism, or the inadvertent disclosure of sensitive information.
Despite these pitfalls, LLMs pave the way for a paradigm shift in how businesses leverage their immense potential. Building on top of these language models involves tailoring their functionalities to specific business requirements, customizing their training on domain-specific data, and integrating them seamlessly into existing workflows. Enterprises can unlock new dimensions of efficiency by employing LLMs for tasks such as document summarization, sentiment analysis, and customer interaction. Furthermore, the adaptability of these models allows for continuous refinement, ensuring that as business needs evolve, the language models can evolve in tandem. As organizations navigate the landscape of digital transformation, building on top of LLMs emerges as a strategic imperative, fostering innovation, enhancing decision-making processes, and ultimately driving a competitive edge in an increasingly data-driven business environment.

CONCLUSION: STATE-OF-THE-ART CONSIDERATIONS AND THE PATH FORWARD
From the expanding horizons of audio, image, and multimodal LLMs to the imperative of responsible LLMs in navigating ethical considerations and privacy, and finally, envisioning the future, this parting section examines the present landscape and illuminates the road ahead for the continuous evolution and responsible deployment of these groundbreaking technologies.
AUDIO, IMAGE, AND MULTIMODAL LLMS
A multimodal LLM [16] represents an advanced AI system that undergoes training with diverse modes of data, encompassing inputs from images, text, and audio sources to enhance its comprehension and generation capabilities.

A few leading multimodal LLMs are as follows:

• Gemini, Google's advanced multimodal AI model, exhibits a superior capability to comprehend and process diverse forms of information simultaneously, encompassing text, code, audio, image, and video. As the successor to LaMDA and PaLM 2, Gemini, named after NASA's Project Gemini, represents a family of decoder-only Transformers optimized for efficient training and inference on TPUs. Notably, Gemini surpasses human experts in Massive Multitask Language Understanding (MMLU), showcasing its prowess. Its versatility spans computer vision, geospatial science, human health, and integrated technologies. Google emphasizes Gemini's coding proficiency through AlphaCode 2, outperforming participants in coding competitions and demonstrating a remarkable 50 percent improvement over its predecessor. Trained on Google's Tensor Processing Units (TPUs), Gemini boasts speed and cost efficiency, with plans to launch TPU v5p tailored for large-scale model training. Available in Nano, Pro, and Ultra variants, Gemini caters to diverse user needs, from fast on-device tasks to high-performance applications, with the Ultra version undergoing safety checks for release next year.

• Macaw-LLM [2, 21] is a groundbreaking innovation that seamlessly integrates visual, audio, and textual information. Comprising a modality module for encoding multimodal data, a cognitive module harnessing pre-trained LLMs, and an alignment module for harmonizing diverse representations, Macaw-LLM stands at the forefront of cutting-edge research in audio, image, and multimodal language models.

• The NExT Research Center at the National University of Singapore (NUS) has recently unveiled NExT-GPT [6, 18], a cutting-edge "any-to-any" multimodal LLM designed to adeptly process text, images, videos, and audio as both input and output. Distinguished by its reliance on existing pre-trained models, NExT-GPT demonstrates remarkable efficiency by updating only 1% of its total parameters during training. NExT-GPT boasts a versatile chat-based interface, empowering users to input text or upload files encompassing images, videos, or audio. With an exceptional ability to comprehend the content of diverse inputs, the model adeptly responds to user queries, generating text, image, video, or audio outputs tailored to user requests.

RESPONSIBLE LLMS: NAVIGATING ETHICAL CONSIDERATIONS AND PRIVACY
The utilization of LLMs gives rise to ethical concerns, encompassing the possibilities of biased outputs, privacy infringements, and the potential for misuse. Addressing potential biases in outputs, safeguarding user privacy, and mitigating the risk of misuse are essential aspects of responsible LLM deployment. To achieve this, developers and organizations must adopt transparent development practices, responsibly manage data, implement robust privacy measures, and integrate fairness mechanisms to ensure ethical and unbiased outcomes. Balancing the transformative potential of LLMs with ethical considerations is crucial for fostering a trustworthy and responsible AI landscape.

THE FUTURE OF LARGE LANGUAGE MODELS
The future of LLMs promises exciting advancements. As these models evolve, the potential for self-fact-checking capabilities emerges, contributing to more reliable and accurate outputs. However, advancements will still depend on the development of better prompt engineering approaches to refine and enhance communication with these models. Additionally, the future holds the prospect of improved fine-tuning and alignment, ensuring that LLMs better sync with user intentions and generate contextually relevant responses. To make LLMs more accessible and applicable across diverse industries, providers must focus on developing tools that empower companies to establish their own reinforcement learning from human feedback (RLHF) pipelines. Customizing LLMs for specific applications will be a pivotal step forward, thus unlocking the full potential of these models in addressing industry-specific needs and fostering more widespread adoption.
6. Wu, S., Fei, H., Qu, L., Ji, W., & Chua, T. S. (2023). NExT-GPT: Any-to-any multimodal LLM. arXiv preprint arXiv:2309.05519. https://arxiv.org/abs/2309.05519

B. TUTORIALS:
7. Harvard: From Transformer to LLM: Architecture, Training and Usage (Transformer Tutorial Series) - https://scholar.harvard.edu/binxuw/classes/machine-learning-scratch/materials/transformers
8. Hugging Face: NLP Course - https://huggingface.co/learn/nlp-course/chapter1/1

WRITTEN BY DR. TUHIN CHATTOPADHYAY, PROFESSOR OF AI AND BLOCKCHAIN, JAGDISH SHETH SCHOOL OF MANAGEMENT
Dr. Tuhin Chattopadhyay is a highly esteemed and celebrated figure in the fields of Industry 4.0 and data science, commanding immense respect from both the academic and corporate fraternities. He has been recognized as one of India's Top 10 Data Scientists by Analytics India Magazine, showcasing his exceptional skills and profound knowledge in the field. He serves as a Professor of AI and Blockchain at JAGSoM, located in Bengaluru, India. He is also a visionary entrepreneur, spearheading his own AI consultancy organization that operates globally.