Introduction to multimodal language models with LLaVA: what multimodal models are, how they work, the LLaVA papers and models, and an image classification experiment.
The document provides an overview of transformers, large language models (LLMs), and artificial general intelligence (AGI). It discusses the architecture and applications of transformers in natural language processing. It describes how LLMs have evolved from earlier statistical models and now achieve state-of-the-art results on NLP tasks through pre-training and fine-tuning. The document outlines the capabilities of GPT-3, the largest LLM to date, as well as its limitations and ethical concerns. It introduces AGI and the potential for such systems to revolutionize AI, while also noting the technical, ethical and societal challenges to developing AGI.
Managing and Versioning Machine Learning Models in Python (Simon Frid)
Practical machine learning is becoming messy: while there are lots of algorithms, a lot of infrastructure is still needed to manage and organize the models and datasets. Estimators and Django-Estimators are two Python packages that can help version datasets and models for deployment and an effective workflow.
This session was presented at the AWS Community Day in Munich (September 2023). It is for builders who have heard the buzz about Generative AI but can’t quite grok it yet. It is useful if you are eager to connect the dots on Generative AI terminology and want a fast start for exploring further and navigating the space. The session is largely product agnostic and meant to give you the fundamentals to get started.
Intro to big data and applications - day 1 (Parviz Vakili)
This document provides an overview and introduction to big data and its applications. It defines key concepts related to big data, including the five V's of big data (volume, velocity, variety, veracity, and value). It also discusses where big data comes from, different data types (structured, semi-structured, unstructured), and common applications of big data across different industries. Finally, it introduces concepts of data governance, data strategy, and how big data can support digital transformation.
How to fine-tune and develop your own large language model (Knoldus Inc.)
In this session, we will cover what large language models are and how we can fine-tune a pre-trained LLM with our own data, including data preparation, model training, and model evaluation.
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ... (Simplilearn)
This video on Hadoop interview questions, part 1, will take you through general Hadoop questions and questions on HDFS, MapReduce and YARN, which are very likely to be asked in any Hadoop interview. It covers all the topics on the major components of Hadoop. This Hadoop tutorial will give you an idea of the different scenario-based questions you could face, as well as some multiple-choice questions. Now, let us dive into this video and gear up for your next Hadoop interview.
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
How technology affects us through its advances, from the light bulb to the robot, and how it features in daily routines, from playing BGMI and Call of Duty to using the Oculus from Meta (Facebook).
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers (Ivo Andreev)
Thank you for the overview of Florence and vision capabilities. Large foundational models continue advancing multimodal abilities in helpful ways when guided by principles of safety, transparency and accountability.
Presents the basic components of H2O.ai and its model deployment pipeline, along with a benchmark of the scalability, speed and accuracy of machine learning libraries for classification, taken from https://github.com/szilard/benchm-ml.
How Does Generative AI Actually Work? (a quick semi-technical introduction to... (ssuser4edc93)
This document provides a technical introduction to large language models (LLMs). It explains that LLMs are based on simple probabilities derived from their massive training corpora, containing trillions of examples. The document then discusses several key aspects of how LLMs work, including that they function as a form of "lossy text compression" by encoding patterns and relationships in their training data. It also outlines some of the key elements in the architecture and training of the most advanced LLMs, such as GPT-4, focusing on their huge scale, transformer architecture, and use of reinforcement learning from human feedback.
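As a rough illustration of "probabilities derived from a training corpus", the toy sketch below estimates next-word probabilities from bigram counts. It is not how GPT-4 or any real LLM is implemented (those use neural networks over subword tokens); the three-sentence corpus and the function name `bigram_probabilities` are made up for this example.

```python
from collections import Counter, defaultdict


def bigram_probabilities(corpus):
    """Estimate P(next word | current word) from raw counts in a tiny corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        # Count every adjacent (current, following) word pair.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


# Hypothetical toy corpus, used only for illustration.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
probs = bigram_probabilities(corpus)
print(probs["the"])  # "cat" follows "the" in two of three cases
```

The table keeps only aggregate statistics rather than the original sentences, which is one way to read the "lossy compression" framing in the summary above.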
The AI Revolution - Aaron Stelle WFG Title (Aaron Stelle)
Artificial intelligence is changing many industries like real estate marketing through tools that can automatically generate logos, branding assets, social media templates, and targeted advertising creatives. While AI offers benefits, it also enables new types of fraud and scams, and without proper oversight could potentially be misused in ways that harm people or even pose an existential risk to humanity. The presentation discusses both practical applications of AI today and potential issues to address to ensure it is developed and applied responsibly.
This document provides a high-level summary of the architecture of EMC's Atmos cloud storage platform. It describes how Atmos uses a distributed services architecture and event-driven design to efficiently manage and store large amounts of unstructured data at scale across a global infrastructure. Key components of the Atmos architecture include front-end nodes, distributed software services, metadata management, policy-based data placement and protection, and support for multi-tenancy.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s... (Mihai Criveti)
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and an advocate for open source development. Mihai is driving the development of Retrieval Augmented Generation platforms and solutions for Generative AI at IBM that leverage WatsonX, vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures:
• Vector Database: consider sharding and High Availability
• Fine Tuning: collecting data to be used for fine tuning
• Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
• Chain of Reasoning and Agents
• Caching embeddings and responses
• Personalization and Conversational Memory Database
• Streaming Responses and optimizing performance. A fine tuned 13B model may perform better than a poor 70B one!
• Calling 3rd party functions or APIs for reasoning or other type of data (ex: LLMs are terrible at reasoning and prediction, consider calling other models)
• Fallback techniques: fallback to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
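Two of the items above, caching embeddings and falling back to another model or a default answer, can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the talk's implementation: `embed`, `primary_model` and `fallback_model` are hypothetical stand-ins for real model calls, and the simulated failure is an assumption for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def embed(text):
    """Hypothetical stand-in for an expensive embedding call; results are cached."""
    # A real system would call an embedding model here.
    return tuple(float(ord(ch)) for ch in text[:4])


def primary_model(prompt):
    """Hypothetical primary model that is unavailable in this simulation."""
    raise TimeoutError("primary model unavailable")


def fallback_model(prompt):
    """Hypothetical smaller model used when the primary one fails."""
    return f"[fallback] {prompt}"


def answer(prompt, models, default="Sorry, no answer is available right now."):
    """Try each model in order; return a default answer if all of them fail."""
    for model in models:
        try:
            return model(prompt)
        except Exception:
            continue  # move on to the next model in the chain
    return default


print(answer("What is RAG?", [primary_model, fallback_model]))
```

Calling `embed` twice with the same text hits the cache on the second call, which `embed.cache_info()` reports. In production the cache would more likely live in an external store shared between API workers.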
ChatGPT is a powerful language model developed by OpenAI. It is designed to generate human-like text based on given prompts. As a prompt engineer, you can utilize ChatGPT to create engaging conversations, provide information, answer questions, and assist users. It's a versatile tool for natural language processing tasks, enabling more interactive and intelligent interactions.
Machine learning techniques are powerful, but building and deploying such models for production use require a lot of care and expertise.
Many books, articles, and best practices have been written and discussed on machine learning techniques and feature engineering, but putting those techniques to use in a production environment is usually forgotten and underestimated. The aim of this talk is to shed some light on current machine learning deployment practices and to go into detail on how to deploy sustainable machine learning pipelines.
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
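Gradient accumulation, one of the techniques listed above, trades time for memory: gradients from several small micro-batches are combined before a single weight update, which reproduces the gradient of one large batch. The sketch below checks that equivalence for a one-parameter model in plain Python. The model, data and function names are assumptions for illustration; a real fine-tuning run would use a deep learning framework.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y = w * x over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Combine micro-batch gradients so the result matches one large batch."""
    micro_batches = [
        data[i:i + micro_batch_size]
        for i in range(0, len(data), micro_batch_size)
    ]
    total = 0.0
    for micro in micro_batches:
        # Weight each micro-batch by its share of the full batch.
        total += grad_mse(w, micro) * len(micro) / len(data)
    return total


# Hypothetical toy data: (x, y) pairs roughly following y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = grad_mse(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=2)
print(abs(full - accum) < 1e-9)
```

Only one micro-batch needs to be held in memory at a time, which is why the technique helps when the full batch does not fit on the available hardware.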
Here are the key steps in the ChatIE framework:
1. The user provides a text document and specifies the information extraction task (e.g. entity extraction, relation extraction) through natural language.
2. ChatGPT understands the task and responds with the extracted information by highlighting the relevant entities/relations in the text.
3. The user can interactively give feedback to ChatGPT to refine its understanding of the task and extraction.
4. ChatGPT learns from the feedback to improve its extraction for future conversations.
The framework aims to leverage ChatGPT's strengths in natural language understanding and generation for zero-shot information extraction via human-AI collaboration. The interactive feedback also helps address Chat
Benchmark comparison of Large Language Models (Matej Varga)
The document summarizes the results of a benchmark comparison that tested several large language models across different skillsets and domains. It shows that GPT-4 performed the best overall based on metrics like logical robustness, correctness, efficiency, factuality, and common sense. Tables display the scores each model received for different skillsets and how they compare between open-sourced, proprietary, and oracle models. The source is listed as an unreviewed preprint paper and related GitHub page under a Creative Commons license.
LangChain Intro, Keymate.AI Search Plugin for ChatGPT, How to use langchain library? How to implement similar functionality in programming language of your choice? Example LangChain applications.
The presentation revolves around the concept of "LangChain". This framework is designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts.
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting Chain of Thought (ReAct) framework. The presenter encourages the audience to utilize these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakages, causing time and money limits to hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it's not necessary to over-optimize if the LLM is clever. The presentation ends on a hopeful note, expressing a need for more clever LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LangChain framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
Vector Databases 101 - An introduction to the world of Vector Databases (Zilliz)
An introduction to unstructured data and the world of vector databases. We will see how they differ from traditional databases, in which cases you need one, and in which you probably don’t. I will also go over similarity search, where you get vectors from, and an example of a vector database architecture.
This document provides an overview of building, evaluating, and optimizing a RAG (Retrieval-Augmented Generation) conversational agent for production. It discusses setting up the development environment, prototyping the initial system, and addressing challenges that arise when moving to production, such as latency, costs, and quality issues. It also covers approaches for systematically evaluating the system, including using LLMs as judges, and experimenting with and optimizing components like retrieval and generation through configuration tuning, model fine-tuning, and customizing the pipeline.
Conversational AI and Chatbot Integrations (Cristina Vidu)
Conversational AI and Chatbots (or rather - and more extensively - Virtual Agents) offer great benefits, especially in combination with technologies like RPA or IDP. Corneliu Niculite (Presales Director - EMEA @DRUID AI) and Roman Tobler (CEO @Routinuum & UiPath MVP) are discussing Conversational AI and why Virtual Agents play a significant role in modern ways of working. Moreover, Corneliu will be displaying how to build a Workflow and showcase an Accounts Payable Use Case, integrating DRUID and UiPath Robots.
📙 Agenda:
The focus of our meetup is around the following areas - with a lot of room to discuss and share experiences:
- What is "Conversational AI" and why do we need Chatbots (Virtual Agents);
- Deep-Dive to a DRUID-UiPath Integration via an Accounts Payable Use Case;
- Discussion, Q&A
Speakers:
👨🏻💻 Corneliu Niculite, Presales Director - EMEA DRUID AI
👨🏼💻 Roman Tobler, UiPath MVP, Co-Founder & CEO Routinuum GmbH
This session streamed live on March 8, 2023, at 16:00 CET.
Check out our upcoming events at: community.uipath.com
Using Generative AI in the Classroom (JonathanDietz3)
Here are some key ethical issues to consider when using generative AI like ChatGPT in the classroom:
1. Accuracy and reliability of information. Students may take generative AI outputs as fact without verifying the information. Teachers need to emphasize to students that AI systems can be wrong or generate implausible responses.
2. Bias and unfair treatment. As the systems are trained on human-created data, they risk perpetuating biases in that data if not developed carefully. Teachers should be aware of potential biases.
3. Privacy and consent. Student data used to improve systems raises privacy issues. Systems should not collect private student data without permission.
4. Authorship and ownership. It may not be clear
Neural Language Generation Head to Toe (Hady Elsahar)
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and how to model probabilities of sequences, through recurrent neural networks, to the large Transformer models that you have seen in the news, like GPT-2 and GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Regulating Generative AI - LLMOps pipelines with Transparency (Debmalya Biswas)
The growing adoption of Gen AI, esp. LLMs, has re-ignited the discussion around AI Regulations — to ensure that AI/ML systems are responsibly trained and deployed. Unfortunately, this effort is complicated by multiple governmental organizations and regulatory bodies releasing their own guidelines and policies with little to no agreement on the definition of terms.
Rather than trying to understand and regulate all types of AI, we recommend a different (and practical) approach in this talk based on AI Transparency —
to transparently outline the capabilities of the AI system based on its training methodology and set realistic expectations with respect to what it can (and cannot) do.
We outline LLMOps architecture patterns and show how the proposed approach can be integrated at different stages of the LLMOps pipeline capturing the model's capabilities. In addition, the AI system provider also specifies scenarios where (they believe that) the system can make mistakes, and recommends a ‘safe’ approach with guardrails for those scenarios.
Retrieval Augmented Generation (RAG) is a popular method that combines a large language model, a vector database, and some sort of prompt interface to build better chatbots. On the surface, it seems pretty simple to build a RAG app, but when it comes down to implementation, there are many details to hash out. These details include how to chunk data, how to work with embeddings, and how to select and use a vector database.
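Chunking is the first of those details. One common approach, sketched here under simple assumptions, is a sliding window over words with some overlap so that sentences cut at a boundary still appear intact in a neighbouring chunk. The function name and the default sizes are illustrative; real pipelines often chunk by tokens, sentences or document structure instead.

```python
def chunk_words(text, chunk_size=5, overlap=2):
    """Split text into word chunks of chunk_size, sharing overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window has reached the end of the text
    return chunks


for chunk in chunk_words("a b c d e f g h"):
    print(chunk)
```

Each chunk would then be embedded and stored in the vector database. A larger overlap improves recall at the cost of storing more near-duplicate text.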
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm... (Bhaskar Mitra)
In this talk, I share some of my personal reflections and learnings on benchmark development and community building for making robust scientific progress. This talk is informed by my experience as a developer of the MS MARCO benchmark and as an organizer of the TREC Deep Learning Track. My goal in this talk is to situate the act of releasing a dataset in the context of broader research visions and to draw due attention to considerations of scientific and social outcomes that are invariably salient in the acts of dataset creation and distribution.
Neo4j – The Fastest Path to Scalable Real-Time Analytics (Neo4j)
The document discusses how graph databases like Neo4j can enable real-time analytics at massive scale by leveraging relationships in data. It notes that data is growing exponentially but traditional databases can't efficiently analyze relationships. Neo4j natively stores and queries relationships to allow analytics 1000x faster. The document advocates that graphs will form the foundation of modern data and analytics by enhancing machine learning models and enabling outcomes like building intelligent applications faster, gaining deeper insights, and scaling limitlessly without compromising data.
Basics of Generative AI: Models, Tokenization, Embeddings, Text Similarity, V... (Robert McDermott)
This document provides an overview of natural language processing techniques like language modeling, tokenization, embeddings, and semantic similarity. It discusses the basics of these concepts and how they relate to each other, such as how tokenization is used as a preprocessing step and embeddings are used to capture semantic meaning and relationships that allow measuring text similarity. It also presents examples of projects that utilize these techniques, such as a document retrieval system that finds similar texts using embeddings and a vector database.
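The chain from tokenization to vectors to similarity can be sketched with the standard library alone. The example below splits on whitespace and uses word counts as vectors, which is a deliberate simplification: real systems use subword tokenizers and learned embeddings, and the function names here are illustrative assumptions rather than anything taken from the presentation.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer; real tokenizers split into subword units."""
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over shared words; a Counter returns 0 for missing keys.
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


query = "vector database search"
docs = ["search a vector database", "cooking pasta at home"]
ranked = sorted(docs, key=lambda doc: cosine_similarity(query, doc), reverse=True)
print(ranked[0])
```

A document retrieval system of the kind mentioned above follows the same shape, with the count vectors replaced by embeddings and the sort replaced by a nearest-neighbour query against a vector database.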
H2O.ai basic components and model deployment pipeline presented. Benchmark for scalability, speed and accuracy of machine learning libraries for classification presented from https://github.com/szilard/benchm-ml.
How Does Generative AI Actually Work? (a quick semi-technical introduction to...ssuser4edc93
This document provides a technical introduction to large language models (LLMs). It explains that LLMs are based on simple probabilities derived from their massive training corpora, containing trillions of examples. The document then discusses several key aspects of how LLMs work, including that they function as a form of "lossy text compression" by encoding patterns and relationships in their training data. It also outlines some of the key elements in the architecture and training of the most advanced LLMs, such as GPT-4, focusing on their huge scale, transformer architecture, and use of reinforcement learning from human feedback.
The AI Revolution - Aaron Stelle WFG TitleAaron Stelle
Artificial intelligence is changing many industries like real estate marketing through tools that can automatically generate logos, branding assets, social media templates, and targeted advertising creatives. While AI offers benefits, it also enables new types of fraud and scams, and without proper oversight could potentially be misused in ways that harm people or even pose an existential risk to humanity. The presentation discusses both practical applications of AI today and potential issues to address to ensure it is developed and applied responsibly.
This document provides a high-level summary of the architecture of EMC's Atmos cloud storage platform. It describes how Atmos uses a distributed services architecture and event-driven design to efficiently manage and store large amounts of unstructured data at scale across a global infrastructure. Key components of the Atmos architecture include front-end nodes, distributed software services, metadata management, policy-based data placement and protection, and support for multi-tenancy.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Mihai Criveti
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and advocate for open source development. Mihai is driving the development of Retrieval Augmentation Generation platforms, and solutions for Generative AI at IBM that leverage WatsonX, Vector databases, LangChain, HuggingFace and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or “Chat with Documents” platforms and APIs that scale, and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, use of RAG, Vector Databases and Fine Tuning to overcome model limitations and build solutions that connect to your data and provide content grounding, limit hallucinations and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and Weaviate and ChromaDB vector databases. He’ll also share tips on writing code using LLM, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures:
• Vector Database: consider sharding and High Availability
• Fine Tuning: collecting data to be used for fine tuning
• Governance and Model Benchmarking: how are you testing your model performance
over time, with different prompts, one-shot, and various parameters
• Chain of Reasoning and Agents
• Caching embeddings and responses
• Personalization and Conversational Memory Database
• Streaming Responses and optimizing performance. A fine tuned 13B model may
perform better than a poor 70B one!
• Calling 3rd party functions or APIs for reasoning or other type of data (ex: LLMs are
terrible at reasoning and prediction, consider calling other models)
• Fallback techniques: fallback to a different model, or default answers
• API scaling techniques, rate limiting, etc.
• Async, streaming and parallelization, multiprocessing, GPU acceleration (including
embeddings), generating your API using OpenAPI, etc.
ChatGPT is a powerful language model developed by OpenAI. It is designed to generate human-like text based on given prompts. As a prompt engineer, you can utilize ChatGPT to create engaging conversations, provide information, answer questions, and assist users. It's a versatile tool for natural language processing tasks, enabling more interactive and intelligent interactions.
Machine learning techniques are powerful, but building and deploying such models for production use require a lot of care and expertise.
A lot of books, articles, and best practices have been written and discussed on machine learning techniques and feature engineering, but putting those techniques into use on a production environment is usually forgotten and under- estimated , the aim of this talk is to shed some lights on current machine learning deployment practices, and go into details on how to deploy sustainable machine learning pipelines.
This document discusses techniques for fine-tuning large pre-trained language models without access to a supercomputer. It describes the history of transformer models and how transfer learning works. It then outlines several techniques for reducing memory usage during fine-tuning, including reducing batch size, gradient accumulation, gradient checkpointing, mixed precision training, and distributed data parallelism approaches like ZeRO and pipelined parallelism. Resources for implementing these techniques are also provided.
Here are the key steps in the ChatIE framework:
1. The user provides a text document and specifies the information extraction task (e.g. entity extraction, relation extraction) through natural language.
2. ChatGPT understands the task and responds with the extracted information by highlighting the relevant entities/relations in the text.
3. The user can interactively give feedback to ChatGPT to refine its understanding of the task and extraction.
4. ChatGPT learns from the feedback to improve its extraction for future conversations.
The framework aims to leverage ChatGPT's strengths in natural language understanding and generation for zero-shot information extraction via human-AI collaboration. The interactive feedback also helps address Chat
Benchmark comparison of Large Language ModelsMatej Varga
The document summarizes the results of a benchmark comparison that tested several large language models across different skillsets and domains. It shows that GPT-4 performed the best overall based on metrics like logical robustness, correctness, efficiency, factuality, and common sense. Tables display the scores each model received for different skillsets and how they compare between open-sourced, proprietary, and oracle models. The source is listed as an unreviewed preprint paper and related GitHub page under a Creative Commons license.
LangChain Intro, Keymate.AI Search Plugin for ChatGPT, How to use langchain library? How to implement similar functionality in programming language of your choice? Example LangChain applications.
The presentation revolves around the concept of "langChain", This innovative framework is designed to "chain" together different components to create more advanced use cases around Large Language Models (LLMs). The idea is to leverage the power of LLMs to tackle complex problems and generate solutions that are more than the sum of their parts.
One of the key features of the presentation is the application of the "Keymate.AI Search" plugin in conjunction with the Reasoning and Acting Chain of Thought (ReAct) framework. The presenter encourages the audience to utilize these tools to generate reasoning traces and actions. The ReAct framework, learned from an initial search, is then applied to these traces and actions, demonstrating the potential of LLMs to learn and apply complex frameworks.
The presentation also delves into the impact of climate change on biodiversity. The presenter prompts the audience to look up the latest research on this topic and summarize the key findings. This exercise not only highlights the importance of climate change but also demonstrates the capabilities of LLMs in researching and summarizing complex topics.
The presentation concludes with several key takeaways. The presenter emphasizes that specialized custom solutions work best and suggests a bottom-up approach to expert systems. However, they caution that over-abstraction can lead to leakages, causing time and money limits to hit early and tasks to fail or require many iterations. The presenter also notes that while prompt engineering is important, it's not necessary to over-optimize if the LLM is clever. The presentation ends on a hopeful note, expressing a need for more clever LLMs and acknowledging that good applications are rare but achievable.
Overall, the presentation provides a comprehensive overview of the LanGCHAIN framework, its applications, and the potential of LLMs in solving complex problems. It serves as a call to action for the audience to explore these tools and frameworks.
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
An introduction to unstructured data and the world of vector databases: how they differ from traditional databases, when you need one, and when you probably don't. I will also cover similarity search, where vectors come from, and an example vector database architecture.
This document provides an overview of building, evaluating, and optimizing a RAG (Retrieval-Augmented Generation) conversational agent for production. It discusses setting up the development environment, prototyping the initial system, and addressing challenges that arise when moving to production, such as latency, cost, and quality issues. It also covers approaches for systematically evaluating the system, including using LLMs as judges, and experimenting with and optimizing components like retrieval and generation through configuration tuning, model fine-tuning, and customizing the pipeline.
Conversational AI and Chatbot IntegrationsCristina Vidu
Conversational AI and Chatbots (or rather - and more extensively - Virtual Agents) offer great benefits, especially in combination with technologies like RPA or IDP. Corneliu Niculite (Presales Director - EMEA @DRUID AI) and Roman Tobler (CEO @Routinuum & UiPath MVP) are discussing Conversational AI and why Virtual Agents play a significant role in modern ways of working. Moreover, Corneliu will be displaying how to build a Workflow and showcase an Accounts Payable Use Case, integrating DRUID and UiPath Robots.
📙 Agenda:
The focus of our meetup is around the following areas - with a lot of room to discuss and share experiences:
- What is "Conversational AI" and why do we need Chatbots (Virtual Agents);
- Deep-Dive to a DRUID-UiPath Integration via an Accounts Payable Use Case;
- Discussion, Q&A
Speakers:
👨🏻💻 Corneliu Niculite, Presales Director - EMEA DRUID AI
👨🏼💻 Roman Tobler, UiPath MVP, Co-Founder & CEO Routinuum GmbH
This session streamed live on March 8, 2023, 16:00 CET.
Check out our upcoming events at: community.uipath.com
Contact us at: [email protected]
Using Generative AI in the Classroom .pptxJonathanDietz3
Here are some key ethical issues to consider when using generative AI like ChatGPT in the classroom:
1. Accuracy and reliability of information. Students may take generative AI outputs as fact without verifying the information. Teachers need to emphasize to students that AI systems can be wrong or generate implausible responses.
2. Bias and unfair treatment. As the systems are trained on human-created data, they risk perpetuating biases in that data if not developed carefully. Teachers should be aware of potential biases.
3. Privacy and consent. Student data used to improve systems raises privacy issues. Systems should not collect private student data without permission.
4. Authorship and ownership. It may not be clear
Neural Language Generation Head to Toe Hady Elsahar
This is a gentle, intuitive introduction to Natural Language Generation (NLG) using deep learning, aimed at computer science practitioners with basic knowledge of machine learning. It takes you on a journey from the basic intuitions behind modeling language and how to model probabilities of sequences, to recurrent neural networks, to the large Transformer models you have seen in the news, like GPT-2 and GPT-3. The tutorial wraps up with a summary of the ethical implications of training such large language models on uncurated text from the internet.
Regulating Generative AI - LLMOps pipelines with TransparencyDebmalya Biswas
The growing adoption of Gen AI, esp. LLMs, has re-ignited the discussion around AI Regulations — to ensure that AI/ML systems are responsibly trained and deployed. Unfortunately, this effort is complicated by multiple governmental organizations and regulatory bodies releasing their own guidelines and policies with little to no agreement on the definition of terms.
Rather than trying to understand and regulate all types of AI, we recommend a different (and practical) approach in this talk based on AI Transparency —
to transparently outline the capabilities of the AI system based on its training methodology and set realistic expectations with respect to what it can (and cannot) do.
We outline LLMOps architecture patterns and show how the proposed approach can be integrated at different stages of the LLMOps pipeline capturing the model's capabilities. In addition, the AI system provider also specifies scenarios where (they believe that) the system can make mistakes, and recommends a ‘safe’ approach with guardrails for those scenarios.
Retrieval Augmented Generation (RAG) is a popular method that combines a large language model, a vector database, and some sort of prompt interface to build better chatbots. On the surface, it seems pretty simple to build a RAG app, but when it comes down to implementation, there are many details to hash out. These details include how to chunk data, work with embeddings, and select and use a vector database.
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...Bhaskar Mitra
In this talk, I share some of my personal reflections and learnings on benchmark development and community building for making robust scientific progress. This talk is informed by my experience as a developer of the MS MARCO benchmark and as an organizer of the TREC Deep Learning Track. My goal in this talk is to situate the act of releasing a dataset in the context of broader research visions and to draw due attention to considerations of scientific and social outcomes that are invariably salient in the acts of dataset creation and distribution.
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j
The document discusses how graph databases like Neo4j can enable real-time analytics at massive scale by leveraging relationships in data. It notes that data is growing exponentially but traditional databases can't efficiently analyze relationships. Neo4j natively stores and queries relationships to allow analytics 1000x faster. The document advocates that graphs will form the foundation of modern data and analytics by enhancing machine learning models and enabling outcomes like building intelligent applications faster, gaining deeper insights, and scaling limitlessly without compromising data.
Basics of Generative AI: Models, Tokenization, Embeddings, Text Similarity, V...Robert McDermott
This document provides an overview of natural language processing techniques like language modeling, tokenization, embeddings, and semantic similarity. It discusses the basics of these concepts and how they relate to each other, such as how tokenization is used as a preprocessing step and embeddings are used to capture semantic meaning and relationships that allow measuring text similarity. It also presents examples of projects that utilize these techniques, such as a document retrieval system that finds similar texts using embeddings and a vector database.
Natural Language Processing - Basics / Non Technical Dhruv Gohil
This document provides an overview of natural language processing (NLP) and discusses several NLP applications. It introduces NLP and how it helps computers understand human language through examples like Apple's Siri and Google Now. It then summarizes popular NLP toolkits and describes applications including text summarization, information extraction, sentiment analysis, and dialog systems. The document concludes by discussing NLP system development, testing, and evaluation.
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
This document provides an agenda for a full-day bootcamp on large language models (LLMs) like GPT-3. The bootcamp will cover fundamentals of machine learning and neural networks, the transformer architecture, how LLMs work, and popular LLMs beyond ChatGPT. The agenda includes sessions on LLM strategy and theory, design patterns for LLMs, no-code/code stacks for LLMs, and building a custom chatbot with an LLM and your own data.
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
These are supplementary slides to the introductory Big Data Analytics material (in the next file), inviting us to start hands-on work with several topics related to Machine/Deep Learning, Big Data (batch/streaming), and AI using TensorFlow.
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...Daniel Zivkovic
Serverless Toronto's 6th-anniversary event helps IT pros understand and prepare for the #GenAI tsunami ahead. You'll gain situational awareness of the LLM Landscape, receive condensed insights, and actionable advice about RAG in 2024 from Google AI Lead Mark Ryan and LlamaIndex creator Jerry Liu. We chose #RAG (Retrieval-Augmented Generation) because it is the predominant paradigm for building #LLM (Large Language Model) applications in enterprises today - and that's where the jobs will be shifting. Here is the recording: https://youtu.be/P5xd1ZjD-Os?si=iq8xibj5pJsJ62oW
BigScience is a one-year research workshop involving over 800 researchers from 60 countries to build and study very large multilingual language models and datasets. It was granted 5 million GPU hours on the Jean Zay supercomputer in France. The workshop aims to advance AI/NLP research by creating shared models and data as well as tools for researchers. Several working groups are studying issues like bias, scaling, and engineering challenges of training such large models. The first model, T0, showed strong zero-shot performance. Upcoming work includes further model training and papers.
A Strong Object Recognition Using Lbp, Ltp And RlbpRikki Wright
This document discusses the evolution of object-oriented technology and languages. It notes that many object-oriented languages have emerged but companies commonly use open source OO languages like Java, C++, C# and Visual Basic due to their low or no licensing costs. These languages also have readily available libraries and development resources. The history of object-oriented concepts is traced back to Simula 67 and Smalltalk in the 1960s-70s, which introduced key ideas like classes, objects, inheritance and polymorphism. Exponential growth has occurred as more systems adopt object-oriented technologies.
The document describes the Like2Vec recommender system model. It transforms sparse user-item rating matrices into a graph representation, and then uses the DeepWalk algorithm to learn embeddings of nodes in the graph. These embeddings are trained with the Skip-Gram language model on random walks generated through the graph. Like2Vec is evaluated on the Netflix dataset and is shown to outperform baselines in Recall-at-N, which directly measures the quality of top recommendations compared to RMSE which does not. Recall-at-N is argued to be a superior evaluation metric for recommender systems.
Introduction to LLM Post-Training - MIT 6.S191 2025Maxime Labonne
In this talk, we will cover the fundamentals of modern LLM post-training at various scales with concrete examples. High-quality data generation is at the core of this process, focusing on the accuracy, diversity, and complexity of the training samples. We will explore key training techniques, including supervised fine-tuning, preference alignment, and model merging. The lecture will delve into evaluation frameworks with their pros and cons for measuring model performance. We will conclude with an overview of emerging trends in post-training methodologies and their implications for the future of LLM development.
Muhammad Adil Raja is a researcher interested in machine learning and its applications. He has a PhD from University of Limerick and has worked as a post-doctorate researcher at Orange Labs. His research focuses on developing machine learning models for tasks like speech quality estimation, network impairment characterization, and computational neuroscience. He has extensive experience developing machine learning software and has authored several research proposals applying machine learning.
DeepPavlov is an open-source framework for the development of production-ready chat-bots and complex conversational systems, as well as NLP and dialog systems research.
Rajesh Muddana has over 6 years of experience in testing and automation of network protocols such as MPLS, L2/L3 protocols, and MPLS-TP. He has expertise in test case planning, execution, results analysis, and setting up test environments. Some of his key achievements include receiving awards for best team member from previous employers and developing automation frameworks for MPLS-TP features and MIB-based testing. Currently he works as a Senior Software Engineer for IpInfusion Software testing protocols like MPLS, L2VPN, and MPLS-TP using simulation software.
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around adoption of GenAI in business - benefits, opportunities and limitations. I also discussed how my research on Theory of Cognitive Chasms helps address some of these issues
Semantic Cultivators : The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Impelsys Inc.
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
TrsLabs - Fintech Product & Business ConsultingTrs Labs
Hybrid Growth Mandate Model with TrsLabs
Strategic investments, inorganic growth, and business model pivoting are critical activities that businesses don't undertake every day. In cases like these, it may benefit your business to choose a temporary external consultant.
An unbiased plan driven by clear-cut deliverables and market dynamics, free from the influence of your internal office equations, empowers business leaders to make the right choices.
Getting things done within a budget within a timeframe is key to Growing Business - No matter whether you are a start-up or a big company
Talk to us & Unlock the competitive advantage
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Dev Dives: Automate and orchestrate your processes with UiPath MaestroUiPathCommunity
This session is designed to equip developers with the skills needed to build mission-critical, end-to-end processes that seamlessly orchestrate agents, people, and robots.
📕 Here's what you can expect:
- Modeling: Build end-to-end processes using BPMN.
- Implementing: Integrate agentic tasks, RPA, APIs, and advanced decisioning into processes.
- Operating: Control process instances with rewind, replay, pause, and stop functions.
- Monitoring: Use dashboards and embedded analytics for real-time insights into process instances.
This webinar is a must-attend for developers looking to enhance their agentic automation skills and orchestrate robust, mission-critical processes.
👨🏫 Speaker:
Andrei Vintila, Principal Product Manager @UiPath
This session streamed live on April 29, 2025, 16:00 CET.
Check out all our upcoming Dev Dives sessions at https://community.uipath.com/dev-dives-automation-developer-2025/.
HCL Nomad Web – Best Practices and Managing Multiuser Environmentspanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-nomad-web-best-practices-and-managing-multiuser-environments/
HCL Nomad Web is heralded as the next generation of the HCL Notes client, offering numerous advantages such as eliminating the need for packaging, distribution, and installation. Nomad Web client upgrades will be installed “automatically” in the background. This significantly reduces the administrative footprint compared to traditional HCL Notes clients. However, troubleshooting issues in Nomad Web present unique challenges compared to the Notes client.
Join Christoph and Marc as they demonstrate how to simplify the troubleshooting process in HCL Nomad Web, ensuring a smoother and more efficient user experience.
In this webinar, we will explore effective strategies for diagnosing and resolving common problems in HCL Nomad Web, including
- Accessing the console
- Locating and interpreting log files
- Accessing the data folder within the browser’s cache (using OPFS)
- Understand the difference between single- and multi-user scenarios
- Utilizing Client Clocking
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul
Artificial intelligence is changing how businesses operate. Companies are using AI agents to automate tasks, reduce time spent on repetitive work, and focus more on high-value activities. Noah Loul, an AI strategist and entrepreneur, has helped dozens of companies streamline their operations using smart automation. He believes AI agents aren't just tools—they're workers that take on repeatable tasks so your human team can focus on what matters. If you want to reduce time waste and increase output, AI agents are the next move.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfAbi john
Analyze the growth of meme coins from mere online jokes to potential assets in the digital economy. Explore the community, culture, and utility as they elevate themselves to a new era in cryptocurrency.
Spark is a powerhouse for large datasets, but when it comes to smaller data workloads, its overhead can sometimes slow things down. What if you could achieve high performance and efficiency without the need for Spark?
At S&P Global Commodity Insights, having a complete view of global energy and commodities markets enables customers to make data-driven decisions with confidence and create long-term, sustainable value. 🌍
Explore delta-rs + CDC and how these open-source innovations power lightweight, high-performance data applications beyond Spark! 🚀
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Role of Data Annotation Services in AI-Powered ManufacturingAndrew Leo
From predictive maintenance to robotic automation, AI is driving the future of manufacturing. But without high-quality annotated data, even the smartest models fall short.
Discover how data annotation services are powering accuracy, safety, and efficiency in AI-driven manufacturing systems.
Precision in data labeling = Precision on the production floor.
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxAnoop Ashok
In today's fast-paced retail environment, efficiency is key. Every minute counts, and every penny matters. One tool that can significantly boost your store's efficiency is a well-executed planogram. These visual merchandising blueprints not only enhance store layouts but also save time and money in the process.
Introduction to Multimodal LLMs with LLaVA
1. DALL-E 3: "A detailed graphic that visualizes a multimodal vector embedding space"
Multimodal LLMs
• What are Multimodal Language Models
• Background / How do they work
• LLaVA papers/projects
• LLaVA model demonstration
• Image classification project with LLaVA
Robert McDermott (he/him)
Director: Solutions, Engineering & Architecture (SEA)
[email protected]
Deep Learning Affinity Group (DLAG)
https://research.fredhutch.org/dlag/en.html
Feb 20, 2024
4. Multimodal Language Models
Multimodal language models are AI systems designed to understand, interpret, and generate information across different
forms of data, such as text and images. These models leverage large datasets of annotated examples to learn associations
between text and visual content, enabling them to perform tasks that require comprehension of both textual and visual
information.
Why is the
sky blue?
A person wearing a red cap and
sleeveless outfit is soaring through
a cloudless sky on a brightly
colored hang glider.
The sky appears blue because
molecules in the Earth's
atmosphere scatter the shorter
blue wavelengths of sunlight more
than other colors.
Multimodal
Language
Model
I like pizza
5. Multimodal Language Models
Source: https://twitter.com/GregKamradt/status/1711772496159252981
Use Case Breakdown
Describe
• Animal Identification
• What's in this photo
Interpret
• Technical Flame Graph Interpretation
• Schematic Interpretation
• Twitter Thread Explainer
Recommend
• Food Recommendations
• Website Feedback
• Painting Feedback
Convert
• Figma Screens
• Adobe Lightroom Settings
• Suggest ad copy based on a webpage
Extract
• Structured Data From Driver's License
• Extract structured items from an image
• Handwriting Extraction
Assist
• Excel Formula Helper
• Find My Glasses
• Live Poker Advice
• Video game recommendations
Evaluate
• Dog Cuteness Evaluator
• Bounding Box Evaluator
• Thumbnail Testing
Links to Examples
6. AI Vision has come a long way.
GPT-4 Vision
LLaVA 1.6 34B
Research scientist and a founding member
at OpenAI. Sr. Director of AI at Tesla.
source: https://karpathy.github.io/2012/10/22/state-of-computer-vision/
2024
2012
9. Fred Hutchinson Cancer Center
Quick Introduction to Tokens and Embeddings
required to understand how LLMs process text and images.
10. Text Tokenization
Tokenization is a foundational step in the preprocessing of text for many natural language processing (NLP) tasks, including for language
models like GPT-4 and Llama-2. Tokenization involves breaking down text into smaller chunks, or "tokens", which can be as short as one
character or as long as one word (or even longer in some cases). These tokens can then be processed, analyzed, and used as input for
machine learning models.
https://platform.openai.com/tokenizer
Tokenization
Visualized
Resulting
Token IDs
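As a rough illustration of the tokenization step described on this slide, the sketch below splits text into subword tokens by greedy longest-match and then maps them to integer IDs. The vocabulary `TOY_VOCAB` and its IDs are made up for this example; models such as GPT-4 and Llama-2 use much larger vocabularies learned from data, and their tokenizers work differently in detail.

```python
# Toy subword tokenizer: greedy longest-match against a tiny, made-up vocabulary.
# TOY_VOCAB and its IDs are hypothetical; real models learn their vocabularies from data.
TOY_VOCAB = {
    "token": 0, "ization": 1, " is": 2, " fun": 3,
    "t": 4, "o": 5, "k": 6, "e": 7, "n": 8, "i": 9, "z": 10,
    "a": 11, " ": 12, "s": 13, "f": 14, "u": 15,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into the longest matching vocabulary entries, left to right."""
    tokens = []
    pos = 0
    max_len = max(len(entry) for entry in vocab)
    while pos < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                tokens.append(piece)
                pos += length
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos]!r}")
    return tokens


def encode(text, vocab=TOY_VOCAB):
    """Map text to the list of integer token IDs a model would receive."""
    return [vocab[token] for token in tokenize(text, vocab)]


tokens = tokenize("tokenization is fun")
ids = encode("tokenization is fun")
print(tokens)
print(ids)
```

Note how a token can be a whole word fragment ("token"), a suffix ("ization"), or a single character when nothing longer matches.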
11. Vector Embeddings
Applications
• Natural Language Processing tasks: sentiment analysis,
named entity recognition, etc.
• Information retrieval: search engines, recommendation
systems.
• Visualization: using dimensionality reduction to visualize
semantic relationships
https://huggingface.co/spaces/mteb/leaderboard
5.41765615e-02 4.20716889e-02 -2.41547506e-02 1.11813843e-01
-9.33169946e-02 -7.56109739e-03 6.54651076e-02 -1.54011259e-02
-2.80906167e-02 1.97344255e-02 -1.58324391e-02 -8.46638903e-02
-1.31631363e-02 1.98841579e-02 -1.26802064e-02 -9.36008468e-02
-4.51933630e-02 -1.20324306e-02 -2.48974599e-02 4.87890420e-03
-2.54017510e-03 4.92022634e-02 5.12179844e-02 2.54505035e-02
-9.70738381e-02 1.42842624e-02 -3.46412621e-02 -8.45314115e-02
-7.38010108e-02 -2.72879936e-02 -2.81507652e-02 -5.01780510e-02
5.35405474e-03 2.96438616e-02 -5.18742464e-02 -6.24342896e-02
6.04359470e-02 -2.22260728e-02 3.36266570e-02 5.17647602e-02
-3.09484527e-02 -8.72448832e-02 -1.53413722e-02 9.27508809e-03
-4.92608221e-03 -4.97105941e-02 -1.04904985e-02 -4.15333314e-03
1.55722797e-02 -2.66851094e-02 -6.49709478e-02 -5.94373941e-02
-2.10976638e-02 3.59102758e-03 5.88850211e-03 -1.03685725e-02
5.03626876e-02 -3.31290103e-02 -7.70502910e-02 1.53052341e-02
Above: the embedding of "A fat tuxedo cat" (excerpt)*
* The "all-MiniLM-L6-v2" embedding model has 384 dimensions
https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Definition
• Representations of text in numerical form.
• Convert variable-length text into fixed-size vectors in high-
dimensional space.
Purpose
• Capture semantic meaning and relationships between words,
phrases, or longer text.
• Enable mathematical operations on text (e.g., similarity
measurement, arithmetic operations).
Characteristics
• Words with similar meanings are close in vector space.
• Allows for operations like "king" - "man" + "woman" ≈ "queen".
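The "king" - "man" + "woman" ≈ "queen" arithmetic can be illustrated with a minimal sketch. The three-dimensional vectors in `TOY_EMBEDDINGS` are hand-made for this example and are not output from any real embedding model; real embeddings have hundreds of dimensions and the analogy holds only approximately.

```python
import math

# Hypothetical 3-dimensional "embeddings" chosen by hand for illustration.
# Rough meaning of each axis: (royalty, maleness, femaleness).
TOY_EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "pizza": [0.0, 0.2, 0.2],
}


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(start, minus, plus, embeddings=TOY_EMBEDDINGS):
    """Return the word whose vector is closest to start - minus + plus."""
    target = [
        s - m + p
        for s, m, p in zip(embeddings[start], embeddings[minus], embeddings[plus])
    ]
    # The three query words are excluded from the candidates, a common convention.
    candidates = {
        word: vec for word, vec in embeddings.items()
        if word not in (start, minus, plus)
    }
    return max(candidates, key=lambda word: cosine(target, candidates[word]))


result = analogy("king", "man", "woman")
print(result)
```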
There are many embedding models:
12. Vector Embeddings
• There are several dozen embedding models
• They range in complexity from 384 to 1536 dimensions
• They range in max sequence length from 512 to 8191 tokens
• Embedding models are generally not compatible with each other
Interactive embedding explorer:
https://blog.echen.me/embedding-explorer/
13. Semantic Text Similarity
Sentence 1 Sentence 2 Cosine Similarity
The cat sits outside The dog plays in the garden 0.2838
A man is playing guitar A woman watches TV -0.0327
The new movie is awesome The new movie is so great 0.8939
Jim can run very fast James is the fastest runner 0.6844
My goldfish is hungry Pluto is a planet! 0.0454
• Measures the cosine of the angle between two vectors.
• Value between -1 and 1, where 1 means the vectors point in the same direction, 0 means
orthogonal, and -1 means diametrically opposite (rare in text embeddings).
These clearly used different
embedding models
https://gist.github.com/robert-mcdermott/67cf2623237989bc2315d35a108246ef
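The cosine similarity measure used in the table can be written in a few lines. This is a minimal sketch using only the standard library and small hand-written vectors; producing the scores in the table would additionally require embedding each sentence with a model, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        # Vectors from embedding models with different dimensions cannot be compared.
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])       # close to -1
print(same_direction, orthogonal, opposite)
```

Because the measure depends only on the angle, two vectors of different lengths that point the same way still score close to 1.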
15. Image Embedding Example
source: https://www.researchgate.net/publication/282181243_Learning_Visual_Clothing_Style_with_Heterogeneous_Dyadic_Co-Occurrences
“Visualization of a 2D embedding of the
style space trained with strategic sampling
computed with t-SNE. The embedding is
based on 200,000 images from the test set.
For a clear visual representation, we
discretize the style space into a grid and
pick one image from each grid cell at
random.”
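The grid sampling described in the quoted caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the 2D coordinates are assumed to have been computed already (for example by t-SNE), and the function name, `grid_size` parameter, and sample points are hypothetical.

```python
import random


def sample_one_per_cell(points, grid_size, seed=0):
    """Bin (x, y, image_id) points into a grid_size x grid_size grid and
    pick one image_id at random from every non-empty cell."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    def cell_index(value, low, high):
        if high == low:
            return 0
        # Scale into [0, grid_size]; clamp the maximum value into the last cell.
        return min(int((value - low) / (high - low) * grid_size), grid_size - 1)

    cells = {}
    for x, y, image_id in points:
        key = (cell_index(x, min_x, max_x), cell_index(y, min_y, max_y))
        cells.setdefault(key, []).append(image_id)

    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return {key: rng.choice(ids) for key, ids in cells.items()}


# Hypothetical 2D embedding coordinates with image identifiers.
points = [
    (0.0, 0.0, "a"), (0.1, 0.2, "b"),
    (9.9, 9.8, "c"), (10.0, 10.0, "d"),
    (5.0, 0.5, "e"),
]
picked = sample_one_per_cell(points, grid_size=2)
print(picked)
```

Empty grid cells simply produce no entry, which is why such visualizations show gaps where the embedding has no images.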
17. The LLaVA Papers
18. LLaVA 1.0 – Large Language and Vision Assistant
• https://arxiv.org/abs/2304.08485
• https://arxiv.org/pdf/2304.08485.pdf
• https://llava-vl.github.io/
• https://github.com/haotian-liu/LLaVA
Instruction tuning large language models (LLMs) using machine-
generated instruction-following data has improved zero-shot
capabilities on new tasks, but the idea is less explored in the
multimodal field. In this paper, we present the first attempt to use
language-only GPT-4 to generate multimodal language-image
instruction-following data. By instruction tuning on such generated
data, we introduce LLaVA: Large Language and Vision Assistant, an
end-to-end trained large multimodal model that connects a vision
encoder and LLM for general-purpose visual and language
understanding. Our early experiments show that LLaVA
demonstrates impressive multimodal chat abilities, sometimes
exhibiting the behaviors of multimodal GPT-4 on unseen
images/instructions and yields a 85.1% relative score compared
with GPT-4 on a synthetic multimodal instruction-following dataset.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4
achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4
generated visual instruction tuning data, our model and code base
publicly available.
Abstract
22. LLaVA (1.5) – Large Language and Vision Assistant
• https://arxiv.org/abs/2310.03744
• https://arxiv.org/pdf/2310.03744.pdf
• https://huggingface.co/liuhaotian/llava-v1.5-13b
Large multimodal models (LMM) have recently shown
encouraging progress with visual instruction tuning. In this
note, we show that the fully-connected vision-language
cross-modal connector in LLaVA is surprisingly powerful and
data-efficient. With simple modifications to LLaVA, namely, using
CLIP-ViT-L-336px with an MLP projection and adding
academic-task-oriented VQA data with simple response formatting
prompts, we establish stronger baselines that achieve
state-of-the-art across 11 benchmarks.
Our final 13B checkpoint uses merely 1.2M publicly available
data, and finishes full training in ~1 day on a single 8-A100
node.
We hope this can make state-of-the-art LMM research more
accessible. Code and model will be publicly available.
Abstract
23.
LLaVA 1.5 Changes
LLaVA 1.5 makes two simple modifications to LLaVA: it uses CLIP-ViT-L-336px with an MLP projection, and it adds
academic-task-oriented VQA data with simple response formatting prompts. Together these changes establish stronger
baselines that achieve state-of-the-art results across 11 benchmarks.
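As a rough illustration of what an MLP projection does, the sketch below maps each vision-encoder patch feature to the embedding width of the language model. It is a toy, pure-Python version under stated assumptions: the two-layer depth, the GELU activation, the tiny dimensions, and all function names are illustrative choices, not details taken from the slides.

```python
import math
import random


def gelu(x):
    """Exact GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_linear(rng, in_dim, out_dim):
    """Create small random weights (one row per output unit) and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(vec, weights, bias):
    """Apply y = W x + b to a single vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def mlp_project(patch_features, layer1, layer2):
    """Project each patch feature with Linear -> GELU -> Linear (assumed layout)."""
    projected = []
    for patch in patch_features:
        hidden = [gelu(h) for h in linear(patch, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


rng = random.Random(0)
VISION_DIM, LLM_DIM, NUM_PATCHES = 4, 6, 3  # toy sizes, far smaller than a real model
layer1 = init_linear(rng, VISION_DIM, LLM_DIM)
layer2 = init_linear(rng, LLM_DIM, LLM_DIM)
patches = [[rng.uniform(-1, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]

tokens = mlp_project(patches, layer1, layer2)
print(len(tokens), len(tokens[0]))  # one projected vector per patch, each LLM_DIM wide
```

The projected vectors play the role of "visual tokens" that are placed alongside the text token embeddings fed to the language model.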
26. LLaVA 1.6
LLaVA-1.6-34B outperforms Gemini Pro on several benchmarks
https://llava-vl.github.io/blog/2024-01-30-llava-1-6/
Benchmarks Example
27.
• https://arxiv.org/abs/2306.00890
• https://arxiv.org/pdf/2306.00890.pdf
Conversational generative AI has demonstrated remarkable promise for empowering
biomedical practitioners, but current investigations focus on unimodal text.
Multimodal conversational AI has seen rapid progress by leveraging billions of
image-text pairs from the public web, but such general-domain vision-language
models still lack sophistication in understanding and conversing about biomedical
images. In this paper, we propose a cost-efficient approach for training a vision-language
conversational assistant that can answer open-ended research questions
of biomedical images. The key idea is to leverage a large-scale, broad-coverage
biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to
self-instruct open-ended instruction-following data from the captions, and then
fine-tune a large general-domain vision-language model using a novel curriculum
learning method. Specifically, the model first learns to align biomedical vocabulary
using the figure-caption pairs as is, then learns to master open-ended conversational
semantics using GPT-4 generated instruction-following data, broadly mimicking
how a layperson gradually acquires biomedical knowledge. This enables us to train
a Large Language and Vision Assistant for BioMedicine (LLaVA-Med) in less
than 15 hours (with eight A100s). LLaVA-Med exhibits excellent multimodal conversational
capability and can follow open-ended instruction to assist with inquiries
about a biomedical image. On three standard biomedical visual question answering
datasets, fine-tuning LLaVA-Med outperforms previous supervised state-of-the-art
on certain metrics. To facilitate biomedical multimodal research, we will release
our instruction-following data and the LLaVA-Med model.
Abstract
LLaVA-Med
https://github.com/microsoft/LLaVA-Med
https://huggingface.co/microsoft/llava-med-7b-delta
30. Fred Hutchinson Cancer Center
My LLaVA based Image Classifier Experiment
Full details, results and code: https://github.com/robert-mcdermott/LLM-Image-Classification
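To give a feel for how a multimodal chat model can be used as an image classifier, here is a minimal sketch of the surrounding plumbing: building a constrained prompt, mapping a free-text reply back to a label, and scoring accuracy. It is not the code from the linked repository. The prompt wording, the label set, the helper names, and the sample responses are all hypothetical, and the call that sends the image and prompt to the model is deliberately left out.

```python
import re


def build_prompt(labels):
    """Ask for exactly one category name so the reply is easy to parse."""
    options = ", ".join(labels)
    return (
        "Classify the image into exactly one of these categories: "
        f"{options}. Answer with the category name only."
    )


def parse_label(response, labels):
    """Return the first label, in `labels` order, that appears as a whole word in the reply."""
    text = response.lower()
    for label in labels:
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text):
            return label
    return None  # the reply did not mention any known label


def accuracy(predictions, truth):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


# Hypothetical labels, model replies, and ground truth (illustrative only).
LABELS = ["cat", "dog", "bird"]
responses = ["Dog", "This is a cat.", "I am not sure."]
truth = ["dog", "cat", "bird"]

predictions = [parse_label(r, LABELS) for r in responses]
print(build_prompt(LABELS))
print(predictions, f"accuracy={accuracy(predictions, truth):.2f}")
```

Whole-word matching avoids false hits such as "cat" inside "category", and unparseable replies are counted as misses rather than silently dropped.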