
Introduction to LLMs for Business Leaders: Responsible AI Strategy Beyond Fear and Hype: Byte-Sized Learning Series
Ebook · 232 pages · 3 hours · Byte-Sized Learning Series


About this ebook

Responsible AI Strategy Beyond Fear and Hype - 2025 Edition
 

Finalist for the 2023 HARVEY CHUTE Book Awards recognizing emerging talent and outstanding works in the genre of Business and Enterprise Non-Fiction.

 

In this comprehensive guide, business leaders will gain a nuanced understanding of large language models (LLMs) and generative AI. The book covers the rapid progress of LLMs, explains technical concepts in non-technical terms, provides business use cases, offers implementation strategies, explores impacts on the workforce, and discusses ethical considerations. Key topics include:

  • The Evolution of LLMs: From early statistical models to transformer architectures and foundation models.
  • How LLMs Understand Language: Demystifying key components like self-attention, embeddings, and deep linguistic modeling.
  • The Art of Inference: Exploring inference parameters for controlling and optimizing LLM outputs.
  • Appropriate Use Cases: A nuanced look at LLM strengths and limitations across applications like creative writing, conversational agents, search, and coding assistance.
  • Productivity Gains: Synthesizing the latest research on generative AI's impact on worker efficiency and satisfaction.
  • The Perils of Automation: Examining risks like automation blindness, deskilling, disrupted teamwork and more if LLMs are deployed without deliberate precautions.
  • The LLM Value Chain: Analyzing key components, players, trends and strategic considerations.
  • Computational Power: A deep dive into the staggering compute requirements behind state-of-the-art generative AI.
  • Open Source vs Big Tech: Exploring the high-stakes battle between open and proprietary approaches to AI development.
  • The Generative AI Project Lifecycle: A blueprint spanning use case definition, model selection, adaptation, integration and deployment.
  • Ethical Data Sourcing: Why the training data supply chain proves as crucial as model architecture for responsible development.
  • Evaluating LLMs: Surveying common benchmarks, their limitations, and holistic alternatives.
  • Efficient Fine-Tuning: Examining techniques like LoRA and PEFT that adapt LLMs for applications with minimal compute.
  • Human Feedback: How reinforcement learning incorporating human ratings and demonstrations steers models towards helpfulness.
  • Ensemble Models and Mixture-of-Experts: Parallels between collaborative intelligence in human teams and AI systems.
  • Areas of Research and Innovation: Retrieval augmentation, program-aided language models, action-based reasoning and more.
  • Ethical Deployment: Pragmatic steps for testing, monitoring, seeking feedback, auditing incentives and mitigating risks responsibly.

The book offers an impartial narrative aimed at equipping readers for thoughtful adoption, maximizing real-world benefits while proactively addressing risks. With this guide, leaders gain the integrated perspective essential to setting sound strategies amid generative AI's rapid evolution.

 

More Than a Book

 

By purchasing this book, you will also be granted access to the AI Academy platform. There you can view course modules, test your knowledge through quizzes, attend webinars, and engage in discussion with other readers. 

Language: English
Publisher: Now Next Later AI
Release date: Sep 2, 2023
ISBN: 9780645510577
Author

I. Almeida

I. Almeida is the Chief Transformation Officer at Now Next Later AI, an AI advisory, training, and publishing business supporting organizations with their AI strategy, transformation, and governance. She is a strong proponent of human-centered, rights-respecting, responsible AI development and adoption. Ignoring both hype and fear, she provides a balanced perspective grounded in scientific research, validated business outcomes, and ethics. With over 26 years of experience, I. Almeida has held senior positions at companies such as Thoughtworks, Salesforce, and Publicis Sapient, where she advised hundreds of executive customers on digital and technology-enabled business strategy and transformation. She is the author of several books, including four AI guides that aim to provide an independent, balanced, and responsible perspective on generative AI business adoption. I. Almeida serves as an AI advisory member on the Adelaide Institute of Higher Education Course Advisory Committee and is a regular speaker at industry events such as Gartner Symposium, SXSW, and ADAPT. Her latest books reflect her extensive knowledge and distinctive perspective on the field.


    Book preview

    Introduction to LLMs for Business Leaders - I. Almeida

    2

    INTRODUCTION TO LLMS

    Large language models have rapidly risen to prominence as one of the most promising and transformative technologies in artificial intelligence. Powered by advances in computational power and fueled by immense datasets, LLMs have evolved remarkably in just the last decade from simple statistical models to complex neural network architectures with capabilities rivaling or exceeding human language proficiency.

    In this chapter, I will chronicle the development of LLMs, highlighting key innovations and milestones that have led to models like GPT-4 and Claude 2. By understanding where LLMs originated and how they have progressed, we gain insight into their potential, as well as pressing questions around their responsible development and deployment. This chapter will introduce and define some of the key concepts I will explore in more depth later in the book.

    The Quest for Language AI

    Long before the era of deep learning, researchers were interested in statistical language modeling—developing mathematical models that could predict likely sequences of words based on patterns in text. Early language models included n-gram models that looked at the probability of a word appearing given the previous n-1 words. Though limited to local statistical patterns, these models were successfully used in applications like spell check and word prediction.
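
    To make this concrete, here is a minimal sketch of a bigram model (the n = 2 case), using a toy corpus invented purely for illustration:

    ```python
    from collections import Counter, defaultdict

    # Toy corpus; a real model would be estimated from millions of sentences.
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # Count how often each word follows each preceding word (n = 2, a bigram model).
    bigram_counts = defaultdict(Counter)
    for prev_word, next_word in zip(corpus, corpus[1:]):
        bigram_counts[prev_word][next_word] += 1

    def predict_next(word):
        """Return the most likely next word and its estimated probability."""
        counts = bigram_counts[word]
        best, freq = counts.most_common(1)[0]
        return best, freq / sum(counts.values())

    print(predict_next("the"))  # ('cat', 0.25) on this toy corpus
    ```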

    In the late 2000s and early 2010s, a major shift happened in the way computers understood and processed language. This progress came in the form of neural language models, a type of program loosely inspired by the human brain. One of the standout models from this era was Word2Vec ¹.

    Think of Word2Vec as a method that teaches computers to understand the relationships and context between words, much as humans do. Instead of treating words as isolated bits of information, Word2Vec processes large amounts of text to determine how words relate to one another. The result of this process is a set of ‘embeddings’.

    Imagine if every word in a language was represented by a point on a map. Some points (or words) are closer to each other because they share similar meanings or are often used in the same context. These ‘points’ or representations are what we call embeddings. So, words like king and queen would be closer on this map because they both refer to royalty and are often mentioned together. What’s fascinating is that Word2Vec can even capture more complex relationships and analogies between words based on these embeddings. For instance, it can recognize that the relationship between man and woman is like the relationship between king and queen.

    Word vector illustration. ⁶⁸
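
    The sketch below shows how this analogy arithmetic works in practice, using tiny hand-made vectors rather than real learned embeddings; the numbers are illustrative assumptions only:

    ```python
    import numpy as np

    # Toy 3-dimensional embeddings chosen by hand for illustration; real Word2Vec
    # vectors have hundreds of dimensions and are learned from large text corpora.
    embeddings = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.9, 0.1, 0.8]),
        "man":   np.array([0.1, 0.9, 0.1]),
        "woman": np.array([0.1, 0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The classic analogy: king - man + woman should land near queen.
    target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
    closest = max(embeddings, key=lambda w: cosine(embeddings[w], target))
    print(closest)  # 'queen' with these toy vectors
    ```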

    However, while Word2Vec was revolutionary, it had its limitations. The primary one was that, when predicting a word or its meaning, it only looked at the immediate words around it, much like seeing only the nearest landmarks on a map. This meant it didn’t always consider the larger context or the entire sentence, which sometimes led to less accurate interpretations of language. Nonetheless, it was a giant leap in the quest to make computers understand and generate human-like text.

    The Rise of Transformers

    The world of language processing witnessed a paradigm shift in 2017, a revolutionary change brought about by the introduction of the transformer architecture. The term transformer in this context isn’t about shape-shifting robots; it signifies an advance in how computers comprehend and generate human-like text, one that has underpinned some of the most significant strides in artificial intelligence in recent years.

    Transformer Model Architecture ⁶⁵

    At the heart of the transformer architecture is a mechanism known as self-attention. To understand the significance of self-attention, imagine reading a novel. As you traverse through the pages, you don’t just focus on the immediate words you’re reading. Instead, your brain constantly references previous sections, characters, or themes, allowing you to draw a meaningful connection between past and present details. This holistic approach to reading is akin to the self-attention mechanism, where the model doesn’t merely focus on a word’s immediate neighbors but considers the entire context, making associations between words that might be far apart.

    Now, why was this so groundbreaking? Earlier models, like the aforementioned Word2Vec, were like tunnel-visioned readers, focusing primarily on neighboring words. The transformer architecture, on the other hand, treats a sentence or paragraph more holistically. Each word can attend to all other words, giving the model an unprecedented understanding of context.
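
    For readers who want to peek under the hood, the following is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described above; the matrices are random toy values, not a full transformer:

    ```python
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention over a sequence of word vectors X."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv               # queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # how much each word attends to every other word
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole sequence
        return weights @ V                               # context-aware representation of each word

    # Toy example: a "sentence" of 4 words, each represented by an 8-dimensional vector.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): every word now carries context from all others
    ```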

    The paper titled Attention Is All You Need, ² published by researchers from Google, was the linchpin in this transformation. This paper didn’t just introduce a theoretical concept; it provided a robust framework that could be implemented and scaled. The central idea was that, in many cases, focusing on the relationships (or attention) between different parts of an input was more crucial than the actual content or sequence of the input. The paper’s title elegantly encapsulated this idea: attention mechanisms, particularly self-attention, were paramount to achieving high performance in many language tasks.

    It’s worth noting the sheer elegance and efficiency of transformers. While the concept might sound complex, transformers often require fewer computations than their predecessors for a given task. This is because, instead of sequentially processing information (word by word, or step by step), transformers can process all words or parts of a sentence in parallel. This parallel processing not only speeds up computations but also allows for a richer understanding of context.

    GPT-1, released in 2018, was the first transformer-based LLM. It had 117 million parameters; think of parameters as the knobs and levers that control how the machine processes language data. The more parameters, the more nuanced the patterns the model can learn by analyzing large amounts of text during training. Though tiny compared to modern LLM sizes, GPT-1’s 117 million parameters learned enough complex language relationships to substantially outperform previous models at tasks like answering questions and drawing inferences from passages.

    This breakthrough confirmed the potential for even more advanced language understanding if these transformer models were scaled up and trained on more text data. Much as bigger data and faster computers enabled businesses to gain deeper market insights, more parameters and more training data enabled LLMs to gain a deeper comprehension of language. GPT-1 showed the promise of the transformer approach, motivating the rapid growth in model capacity that continues today. Soon after, in 2018, Google introduced BERT (Bidirectional Encoder Representations from Transformers).

    GPT-2, released by OpenAI in 2019, benefited from 10x more parameters and 10x more data than GPT-1. Its impressive natural language generation capabilities showed the benefits of scale, though concerns about potential misuse led OpenAI to limit public access initially. The full 1.5 billion parameter version, however, was released later in 2019.

    GPT-3 took another leap in 2020, scaled up to 175 billion parameters trained on hundreds of billions of words of filtered text data, including the web-crawled Common Crawl dataset. It required computational resources on the order of thousands of PetaFLOP/s-days. A PetaFLOP/s-day measures total training compute: the number of PetaFLOP/s (a measure of computer processing speed) multiplied by the number of days of training. This meant training costs running into millions of dollars, along with environmental impacts that demanded optimization.
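
    A back-of-envelope calculation shows where such figures come from. The sketch below uses the common heuristic that training compute is roughly 6 × parameters × training tokens; the token count is an assumption based on figures reported for GPT-3, not a number from this book:

    ```python
    # Back-of-envelope training compute estimate using the common heuristic
    # FLOPs ≈ 6 * parameters * training tokens (an approximation, not an official figure).
    parameters = 175e9        # GPT-3 scale
    training_tokens = 300e9   # roughly the token count reported in the GPT-3 paper

    total_flops = 6 * parameters * training_tokens

    # One PetaFLOP/s-day = 10^15 floating-point operations per second, sustained for a day.
    petaflop_s_day = 1e15 * 86_400
    print(f"{total_flops / petaflop_s_day:,.0f} PetaFLOP/s-days")  # ≈ 3,646
    ```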

    GPT-3 showed a remarkable capability called few-shot learning. What is few-shot learning? In simple terms, it means GPT-3 could successfully perform tasks like translation and question answering with just a few examples showing what to do. Show GPT-3 just a single translated sentence pair, and it could translate new sentences between those languages. Give it a couple of examples of question-answer pairs, and it could answer new questions on that topic. This ability to learn from just a few examples stunned researchers. Previous models required far more training examples to gain skills.

    Few-shot learning opened up GPT-3’s versatility for translation, question-answering, app development, robot control and more—all powered by simple natural language prompting. In business terms, it required less training data to gain new skills. This breakthrough revealed the possibilities of scale and fine-tuning transformer architectures, inspiring intense research interest in LLMs since 2020.
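
    In practice, a few-shot prompt is simply a handful of worked examples followed by the new case. The sketch below assembles such a prompt; the `complete` function is a hypothetical stand-in for whichever LLM completion API is used:

    ```python
    # Assembling a few-shot prompt: a few worked examples followed by a new case.
    examples = [
        ("The invoice is overdue.", "La facture est en retard."),
        ("Please confirm the meeting time.", "Veuillez confirmer l'heure de la réunion."),
    ]

    prompt = "Translate English to French.\n\n"
    for english, french in examples:
        prompt += f"English: {english}\nFrench: {french}\n\n"
    prompt += "English: The contract was signed yesterday.\nFrench:"

    print(prompt)
    # response = complete(prompt)  # hypothetical LLM call; the model infers the task from the examples alone
    ```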

    GPT-4 continues the trend of exponential growth, likely employing a mixture of expert models that together reach the trillion-parameter scale, trained on multi-trillion-word datasets. Problematically, details remain undisclosed, but it likely builds on GPT-3’s foundations with more advanced training techniques. The steady march upwards in scale shows that language modeling continues to benefit tremendously from added data and compute.

    Beyond OpenAI, other organizations like Google, Meta, Anthropic, Baidu, and more have trained LLMs at scale. Except for Meta, these businesses focus on delivering services via APIs rather than releasing full models. However, open source LLMs are beginning to show impressive results with 2023 releases, such as Dolly 2.0, LLaMA, Alpaca, and Vicuna.

    Foundation Models

    In 2021, researchers at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) introduced the paradigm of foundation models ³ to describe influential AI models like LLMs that serve as a foundation for downstream tasks. Foundation models are pre-trained on vast datasets, then fine-tuned on more specialized data to excel at particular applications. Their versatility and strong performance across many language tasks has made LLMs the most popular foundation model architecture today. Rather than training AI models from scratch for each new task, foundation models allow for efficient fine-tuning on much smaller specialized datasets to adapt them for specific uses.
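
    As a rough illustration of this workflow, the sketch below fine-tunes a small pre-trained model with the Hugging Face Transformers library; the model name and dataset are placeholders standing in for your own task and data, not recommendations from this book:

    ```python
    # A minimal fine-tuning sketch with the Hugging Face `transformers` library.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "distilbert-base-uncased"   # a small pre-trained foundation model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    dataset = load_dataset("imdb")           # stand-in for your specialized dataset
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
        batched=True,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=8),
        train_dataset=tokenized["train"].shuffle(seed=42).select(range(2_000)),  # small slice for illustration
    )
    trainer.train()  # adapts the general-purpose model to the specific task
    ```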

    Generative AI

    As previously discussed, LLMs fall under the subset of AI called generative AI. Generative AI refers to systems that can autonomously create new content like text, code, images, video or audio that meaningfully extends beyond their training data. LLMs are a leading example of Generative AI because they can generate original, human-like text after training on large text corpora. Other types of generative AI include systems that generate images, videos, music, 3D shapes, and more, based on analyzing datasets of visual content.

    Generative AI, fueled by advances in LLMs, has rapidly risen to prominence given its ability to automate content creation in affordable and customized ways. Unlike most AI, which focuses on analysis and classification, generative AI unlocks creative applications that go beyond training data. However, generative systems require responsible design to avoid harm given their ability to produce misleading content at scale if poorly implemented.

    Specialized LLMs

    Besides general knowledge models, some organizations have developed more specialized LLMs. Bloomberg recently unveiled BloombergGPT ⁴, a large language model specially trained on financial data to understand nuanced business and finance language.

    With 50 billion parameters trained on 360 billion tokens of financial text and 345 billion tokens of general text (tokens are roughly comparable to words), BloombergGPT achieves state-of-the-art results on financial NLP tasks like question answering and named entity recognition.

    The model’s architecture, domain-specific data, and efficient training enable it to match larger models on some benchmarks. While not released publicly because of ethical concerns, BloombergGPT shows the value of curated in-domain data to create specialized AI systems competitive with general models orders of magnitude larger. As firms adopt AI, expect domain-optimized models like BloombergGPT to drive automation and insights. Specialization trades off broad versatility for targeted strengths.

    Besides domain-specific pre-training, some organizations have developed LLMs specialized not just in knowledge but in capabilities aligned with human values.

    For example, Anthropic’s Claude model incorporates a technique called Constitutional AI ⁵ to improve qualities like honesty, harmlessness, and helpfulness, and to avoid harmful stereotypes. During training, the model is recursively prompted to critique and edit its own responses until they demonstrably uphold the principles in Anthropic’s constitution for AI, such as seeking truth, upholding dignity, and promoting empathy. This technique, inspired by human moral development, aims to impart cooperative, harmless instincts that generalize beyond the training distribution.
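
    The sketch below illustrates the general shape of such a critique-and-revise loop; the `generate` function is a hypothetical stand-in for a model call, and the principle text is a placeholder rather than Anthropic’s actual constitution:

    ```python
    # Illustrative critique-and-revise loop in the spirit of Constitutional AI.
    PRINCIPLE = "Be honest, avoid harm, and do not rely on stereotypes."

    def generate(prompt: str) -> str:
        raise NotImplementedError("call your LLM of choice here")  # hypothetical model call

    def constitutional_revision(question: str, rounds: int = 2) -> str:
        answer = generate(question)
        for _ in range(rounds):
            critique = generate(
                f"Principle: {PRINCIPLE}\nAnswer: {answer}\n"
                "Point out any way this answer violates the principle."
            )
            answer = generate(
                f"Original answer: {answer}\nCritique: {critique}\n"
                "Rewrite the answer so it fully respects the principle."
            )
        return answer  # revised answers become training data for the final model
    ```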

    Models like Constitutional AI exemplify emerging techniques to potentially imbue LLMs with greater alignment to ethical priorities, not just targeted knowledge. However, leaving the development of ethics and constitutional frameworks to AI startups is unlikely to be the most effective approach to developing safe and ethical AI guardrails.

    Open or Closed and Why?

    Meta released its open source large language model LLaMA ⁶ in 2023, with 65 billion parameters trained on massive text and code datasets.

    Meta’s decision to make LLaMA 2 an open source model available for research and commercial use is important because it allows for greater collaboration, due diligence, and innovation in the field of artificial intelligence. By making LLaMA available to everyone, Meta is encouraging researchers and developers to experiment with the model and find new and creative ways to use it. This could lead to applications we can’t even imagine today, and it also enables stronger auditing and risk management by a broad range of communities.

    OpenAI has not released their popular GPT-3 model as open source, instead
