Gen AI Glossary
GLOSSARY
Discover the key technical terms associated with generative AI and their meanings.
A
Terms Description
Agents: Software programs that can interact with their environment, collect data, and use that data to perform self-determined tasks to meet predetermined goals.
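Auto Merging Retriever: A retrieval technique that breaks a document into "parent" chunks of text and further breaks each parent chunk into smaller "child" chunks. At query time, when enough child chunks of the same parent are retrieved, they are merged and replaced by that parent chunk.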
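The parent/child idea in the Auto Merging Retriever entry can be sketched in a few lines. This is a minimal illustration, not the API of any particular library: the function names, the word-based chunk sizes, and the 0.5 merge threshold are all assumptions chosen for the example.

```python
from collections import defaultdict


def build_hierarchy(text, parent_size=8, child_size=2):
    """Split text into parent chunks (by word count), then each parent into child chunks."""
    words = text.split()
    parents = {}   # parent_id -> parent text
    children = []  # list of (parent_id, child text)
    for p_id, start in enumerate(range(0, len(words), parent_size)):
        parent_words = words[start:start + parent_size]
        parents[p_id] = " ".join(parent_words)
        for c_start in range(0, len(parent_words), child_size):
            child = " ".join(parent_words[c_start:c_start + child_size])
            children.append((p_id, child))
    return parents, children


def auto_merge(retrieved, parents, children, threshold=0.5):
    """Replace retrieved child chunks by their parent when enough siblings were retrieved.

    `retrieved` is a list of indices into `children`; `threshold` is a hypothetical
    fraction of siblings that must be retrieved before merging.
    """
    siblings = defaultdict(int)
    for p_id, _ in children:
        siblings[p_id] += 1
    hits = defaultdict(list)
    for idx in retrieved:
        hits[children[idx][0]].append(idx)
    results = []
    for p_id, idxs in hits.items():
        if len(idxs) / siblings[p_id] > threshold:
            results.append(parents[p_id])  # merge: return the whole parent chunk
        else:
            results.extend(children[i][1] for i in idxs)
    return results


text = " ".join(f"w{i}" for i in range(16))
parents, children = build_hierarchy(text)
# Three of the four children of parent 0 were retrieved, but only one child of parent 1.
merged = auto_merge([0, 1, 2, 5], parents, children)
print(merged)
```

Here the three retrieved children of the first parent are replaced by the full parent chunk, while the single child from the second parent is returned as-is.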
B
Terms Description
Backward Diffusion: A process in which a model learns to reverse the noise added during the forward diffusion process, reconstructing data from noise.
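To make the relationship between forward and backward diffusion concrete, the sketch below noises a small vector and then recovers it. It assumes the common closed-form parameterization in which noisy data is a weighted mix of clean data and Gaussian noise; the `alpha_bar` value and the function names are illustrative assumptions, and the true noise stands in for what a trained model would predict.

```python
import math
import random


def forward_diffuse(x0, alpha_bar, rng):
    """Forward process: mix the clean data with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, noise)
    ]
    return xt, noise


def reverse_estimate(xt, predicted_noise, alpha_bar):
    """Backward process: estimate the clean data from noisy data and a noise prediction."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
        for x, e in zip(xt, predicted_noise)
    ]


rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
xt, noise = forward_diffuse(x0, alpha_bar=0.6, rng=rng)
# A trained model would predict the noise; the true noise is used here as a perfect prediction.
x0_hat = reverse_estimate(xt, noise, alpha_bar=0.6)
print(x0_hat)
```

With a perfect noise prediction the original data is recovered up to floating-point error; a real model only approximates the noise, so the reconstruction is refined over many steps.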
C
Terms Description
D
Terms Description
Denoising Autoencoder (DAE): A type of autoencoder used to remove noise from data, often utilized in the backward diffusion process.
Diffusion Model: A generative model, meaning that it is used to generate data similar to the data on which it was trained.
E
Terms Description
Embedding Layer: The layer in a neural network that converts categorical data, such as words, into continuous vector representations.
Epoch: A full pass through the entire training dataset during the training process of a neural network.
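The Embedding Layer entry above describes what is essentially a lookup table from categories to vectors. The following sketch shows that lookup in plain Python; the class name, the toy vocabulary, and the random initialization are assumptions for illustration, and in a real network the vectors would be learned during training.

```python
import random


class EmbeddingLayer:
    """Toy embedding layer: maps each token to a fixed-length vector of floats."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        # Each token gets an integer index into the weight table.
        self.index = {token: i for i, token in enumerate(vocab)}
        # One row of `dim` values per token (randomly initialized, not trained here).
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]

    def __call__(self, tokens):
        # The "forward pass" is a row lookup for each token.
        return [self.weights[self.index[token]] for token in tokens]


layer = EmbeddingLayer(["cat", "dog", "fish"], dim=4)
vectors = layer(["dog", "cat", "dog"])
print(len(vectors), len(vectors[0]))
```

Both occurrences of "dog" map to the same row, which is the property that lets a network treat a categorical input as a continuous vector.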
F
Terms Description
G
Terms Description
GLIDE: Stands for Guided Language to Image Diffusion for Generation and Editing, a generative model developed by OpenAI that uses diffusion processes to generate and edit images based on textual descriptions.
H
Terms Description
Hugging Face: A machine learning (ML) and data science platform and community that helps users build, deploy, and train machine learning models.
I
Terms Description
IPEX: Stands for Intel® Extension for PyTorch*, a library that optimizes PyTorch performance on Intel hardware, including CPUs and GPUs.
J
Terms Description
Joint Attention: In models like BERT, the ability to attend to the left and right context of a word simultaneously.
L
Terms Description
LangChain Legacy Syntax: Refers to the original or earlier version of the syntax used in the LangChain framework, which is designed for building applications with large language models (LLMs).
LLMOps: Stands for Large Language Model Operations and refers to the specialized methods and processes meant to accelerate the creation, deployment, and administration of a model over its entire lifespan.
M
Terms Description
MLOps: A set of practices that aim to deploy and maintain machine learning models in production reliably and efficiently. MLOps combines aspects of machine learning (ML), data engineering, and DevOps to streamline the model lifecycle, from development and training to deployment.
MRR: Stands for Mean Reciprocal Rank, a metric used to evaluate the effectiveness of a search or recommendation system. It calculates the average of the reciprocal ranks of the first relevant result for a set of queries, providing a measure of how quickly the system retrieves relevant information.
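The MRR calculation can be sketched as follows. The function name, the dictionary-based inputs, and the sample queries are assumptions made for this example; a query with no relevant result contributes 0 to the average.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the first relevant result per query.

    ranked_results: dict mapping query -> list of results, best first.
    relevant: dict mapping query -> set of relevant results.
    """
    total = 0.0
    for query, results in ranked_results.items():
        for rank, item in enumerate(results, start=1):
            if item in relevant[query]:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_results)


ranked = {
    "q1": ["a", "b", "c"],  # first relevant result at rank 1 -> 1.0
    "q2": ["x", "y", "z"],  # first relevant result at rank 2 -> 0.5
    "q3": ["m", "n"],       # no relevant result              -> 0.0
}
relevant = {"q1": {"a"}, "q2": {"y"}, "q3": {"k"}}
mrr = mean_reciprocal_rank(ranked, relevant)
print(mrr)  # 0.5
```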
N
Terms Description
O
Terms Description
One Shot Prompting: Shows the model one clear, descriptive example of what you'd like it to imitate. For example, a prompt containing one labeled example followed by the text 'It doesn't work' leads the model to classify that text as positive or negative.
Output Parsers: Output parsers are responsible for taking the output of an LLM and transforming it into a more suitable format. This is very useful when you are using LLMs to generate any form of structured data.
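The One Shot Prompting and Output Parsers entries fit together naturally: a one-shot prompt asks for a structured answer, and a parser turns the raw reply into a usable value. The sketch below is a minimal illustration under stated assumptions: the prompt wording, the example review, the JSON reply format, and the hard-coded reply are all hypothetical, and no real LLM is called.

```python
import json


def build_one_shot_prompt(example_text, example_label, new_text):
    """Build a prompt that contains exactly one labeled example before the new input."""
    return (
        "Classify the sentiment of each review as positive or negative.\n"
        'Answer with JSON like {"sentiment": "<label>"}.\n\n'
        f"Review: {example_text}\n"
        f'Answer: {{"sentiment": "{example_label}"}}\n\n'
        f"Review: {new_text}\n"
        "Answer:"
    )


def parse_sentiment(raw_output):
    """Extract the JSON object from raw model output and validate the label."""
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw_output[start:end + 1])
    label = data["sentiment"].strip().lower()
    if label not in {"positive", "negative"}:
        raise ValueError(f"unexpected label: {label}")
    return label


prompt = build_one_shot_prompt("I love this product!", "positive", "It doesn't work")
# Hypothetical model reply, hard-coded because this sketch does not call a model.
fake_reply = 'Sure! {"sentiment": "Negative"}'
label = parse_sentiment(fake_reply)
print(prompt)
print(label)
```

The parser tolerates extra text around the JSON object and normalizes the label, which is the kind of cleanup an output parser typically handles.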
P
Terms Description
Parallel Paradigm: In the data parallel paradigm, the same operation (an instruction, in assembly language terms) is performed on many different data items at the same time. The degree of parallelism is determined by how many data items a single operation can act on.
Pipeline Parallelism: Pipeline parallelism extends simple task parallelism by breaking a task into a sequence of processing stages. Each stage takes the result from the previous stage as input, with results being passed downstream immediately.
Prompt Engineering: The practice of designing inputs for AI tools that will produce optimal outputs. It involves experimenting with different prompts to guide the model and achieve desired responses or outputs.
PySpark: The Python API for Apache Spark, an open source, distributed computing framework and set of libraries for real-time, large-scale data processing.
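The difference between the Parallel Paradigm (data parallel) and Pipeline Parallelism entries above can be sketched with the Python standard library. This is an illustration of the structure only: the function names are assumptions, and Python threads show the control flow rather than true hardware-level parallelism.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def data_parallel(values, workers=4):
    """Data parallelism: the same operation is applied to many data items concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """One pipeline stage: take an item, transform it, pass it downstream immediately."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def pipeline(values, funcs):
    """Pipeline parallelism: each stage runs in its own thread, linked by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for value in values:
        queues[0].put(value)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


squares = data_parallel([1, 2, 3, 4])
staged = pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
print(squares)  # [1, 4, 9, 16]
print(staged)   # [20, 30, 40]
```

In the data parallel case every worker runs the same function on different items; in the pipeline case each worker runs a different function, and an item can enter the second stage while the next item is still in the first.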
Q
Terms Description
R
Terms Description
Reconstruction Loss: A measure of the difference between the original data and the reconstructed data, often used to train denoising models.
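One common way to measure this difference is mean squared error. The sketch below uses it as an assumption; the glossary does not name a specific loss function, and other choices (such as mean absolute error) are equally valid.

```python
def reconstruction_loss(original, reconstructed):
    """Mean squared error between the original and reconstructed values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_errors) / len(original)


original = [1.0, 2.0, 3.0]
reconstructed = [1.0, 2.5, 2.0]
loss = reconstruction_loss(original, reconstructed)
print(loss)
```

A perfect reconstruction gives a loss of 0, and training a denoising model amounts to adjusting its parameters to push this value down.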
S
Terms Description
Self-consistency Prompting: An approach that asks a model the same prompt multiple times and takes the majority result as the final answer.
Spark ML: Spark's machine learning library is MLlib. Its goal is to make practical machine learning scalable and easy. At a high level, it provides tools such as ML algorithms: common learning algorithms for classification, regression, clustering, and collaborative filtering.
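The majority vote described in the Self-consistency Prompting entry above can be sketched as follows. The `ask_model` callable is a hypothetical stand-in for a real model call, and the canned answers are made up so the example runs without an LLM.

```python
from collections import Counter


def self_consistent_answer(ask_model, prompt, samples=5):
    """Ask the same prompt several times and return the most common answer."""
    answers = [ask_model(prompt) for _ in range(samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner, answers


# Canned replies simulate a model whose sampled answers vary between calls.
canned = iter(["42", "41", "42", "42", "40"])
final, votes = self_consistent_answer(lambda prompt: next(canned), "What is 6 * 7?")
print(final)  # 42
```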
T
Terms Description
Tokenization: The process of breaking down text into smaller units called tokens, which can be words, subwords, or characters, to enable easier processing and analysis by machine learning models in natural language processing tasks.
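The three token granularities named above can be illustrated with a short sketch. The regular expression, the greedy longest-match rule, and the toy subword vocabulary are assumptions for this example; production tokenizers learn their vocabularies from data.

```python
import re


def word_tokens(text):
    """Split into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text):
    """Split into individual characters."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split of one word using a toy subword vocabulary."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces


words = word_tokens("Tokenizers split text!")
chars = char_tokens("AI")
subwords = subword_tokens("tokenization", {"token", "ization", "iz"})
print(words)     # ['Tokenizers', 'split', 'text', '!']
print(chars)     # ['A', 'I']
print(subwords)  # ['token', 'ization']
```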
U
Terms Description
Underfitting: A modeling error that occurs when a model is too simple to capture the underlying patterns in the data.
Unsupervised Learning: A type of machine learning where the model is trained on data without labels.
V
Terms Description
W
Terms Description
Word Embedding: A learned representation for text in which words that have similar meanings have similar representations.
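"Similar representation" is usually checked with cosine similarity between vectors. In the sketch below the three-dimensional vectors are hand-picked for illustration and are not learned embeddings; real word embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors: "happy" and "glad" point in nearly the same direction.
embeddings = {
    "happy": [0.9, 0.8, 0.1],
    "glad": [0.85, 0.75, 0.2],
    "table": [0.1, 0.2, 0.95],
}

related = cosine_similarity(embeddings["happy"], embeddings["glad"])
unrelated = cosine_similarity(embeddings["happy"], embeddings["table"])
print(round(related, 3), round(unrelated, 3))
```

The related pair scores close to 1, while the unrelated pair scores much lower, which is the behavior a trained embedding is expected to show for words with similar and dissimilar meanings.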
Z
Terms Description