How to Run Your Own Free Offline and Totally Private AI Chatbot
Be your own AI content generator! Here's how to get started running free LLM alternatives using the CPU
and GPU of your own PC.
By Brian Westover
(Image: Gorodenkoff/Shutterstock.com)
The power of large language models (LLMs), generally made possible by cloud computing, is obvious,
but have you ever thought about running an AI chatbot on your own laptop or desktop? Depending on
how modern your system is, you can likely run LLMs on your own hardware. But why would you want to?
Well, maybe you want to fine-tune a tool for your own data. Perhaps you want to keep your AI
conversations private and offline. You may just want to see what AI models can do without the companies
running cloud servers shutting down any conversation topics they deem unacceptable. With a ChatGPT-
like LLM on your own hardware, all of these scenarios are possible.
And hardware is less of a hurdle than you might think. The latest LLMs are optimized to work with Nvidia
graphics cards and with Macs using Apple M-series processors—even low-powered Raspberry Pi
systems. And as new AI-focused hardware comes to market, like the integrated NPU of Intel's "Meteor
Lake" processors or AMD's Ryzen AI, locally run chatbots will be more accessible than ever before.
Thanks to platforms like Hugging Face and communities like Reddit's LocalLLaMA, the software models
behind sensational tools like ChatGPT now have open-source equivalents—in fact, more than 200,000
different models are available at this writing. Plus, thanks to tools like Oobabooga's Text Generation
WebUI, you can access them in your browser using clean, simple interfaces similar to ChatGPT,
Microsoft Copilot, and Google Gemini.
“ The software models behind sensational tools like ChatGPT now have open-source equivalents—in
fact, more than 200,000 different models are available. ”
So, in short, locally run AI tools are freely available, and anyone can use them. However, none of them
are ready-made for non-technical users, and the category is new enough that you won't find many easy-
to-digest guides or instructions on how to download and run your own LLM. It's also important to
remember that a local LLM won't be nearly as fast as a cloud-server platform because its resources are
limited to your system alone.
Nevertheless, we're here to help the curious with a step-by-step guide to setting up your own generative
AI chatbot on your own PC. Our guide uses a Windows machine, but the tools listed here are generally
available for Mac and Linux systems as well, though some extra steps may be involved when using
different operating systems.
Also, finding answers can be a real pain. The online communities devoted to these topics are usually
helpful in solving problems. Often, someone's solved the problem you're encountering in a conversation
you can find online with a little searching. But where is that conversation? It might be on Reddit, in an
FAQ, on a GitHub page, in a user forum on Hugging Face, or somewhere else entirely.
“ AI is quicksand. Everything moves whip-fast, and the environment undergoes massive shifts on a
constant basis. ”
It's worth repeating that open-source AI is moving fast. Every day new models are released, and the tools
used to interact with them change almost as often, as do the underlying training methods and data, and
all the software undergirding that. As a topic to write about or to dive into, AI is quicksand. Everything
moves whip-fast, and the environment undergoes massive shifts on a constant basis. So, much of the
software discussed here may not last long before newer and better LLMs and clients are released.
Bottom line: Proceed at your own risk. There's no Geek Squad to call for help with open-source software;
it's not all professionally maintained; and you'll find no handy manual to read or customer service
department to turn to—just a bunch of loosely organized online communities.
Finally, once you get it all running, these AI models have varying degrees of polish, but they all carry the
same warnings: Don't trust what they say at face value, because it's often wrong. Never look to an AI
chatbot to help make your health or financial decisions. The same goes for writing your school essays or
your website articles. Also, if the AI says something offensive, try not to take it personally. It's not a
person passing judgment or spewing questionable opinions; it's a statistical word generator made to spit
out mostly legible sentences. If any of this sounds too scary or tedious, this may not be a project for you.
Select Your Hardware
Before you begin, you'll need to know a few things about the machine on which you want to run an LLM.
Is it a Windows PC, a Mac, or a Linux box? This guide, again, will focus on Windows, but most of the
resources referenced offer additional options and instructions for other operating systems.
You also need to know whether your system has a discrete GPU or relies on its CPU's integrated
graphics. Plenty of open-source LLMs can run solely on your CPU and system memory, but most are
made to leverage the processing power of a dedicated graphics chip and its extra video RAM. Gaming
laptops, desktops, and workstations are better suited to these applications, since they have the powerful
graphics hardware these models often rely on.
Gaming laptops and mobile workstations offer the best hardware for running LLMs at home. (Credit: Molly Flores)
In our case, we're using a Lenovo Legion Pro 7i Gen 8 gaming notebook, which combines a potent Intel
Core i9-13900HX CPU, 32GB of system RAM, and a powerful Nvidia GeForce RTX 4080 mobile GPU
with 12GB of dedicated VRAM.
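Not sure what your own machine has? A minimal sketch like the following will report any CUDA-capable GPU and its VRAM, assuming you have Python with PyTorch installed (pip install torch):

import torch

# Report the discrete GPU and its dedicated VRAM, if one is present.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Discrete GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f}GB")
else:
    print("No CUDA GPU detected; models will run on the CPU and system RAM.")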
If you're on a Mac or Linux system, are CPU-dependent, or are using AMD instead of Intel hardware, be
aware that while the general steps in this guide are correct, you may need extra steps and additional or
different software to install. And the performance you see could be markedly different from what we
discuss here.
Download Visual Studio 2019

(Credit: Brian Westover/Microsoft)

Before installing the WebUI itself, you'll need Microsoft's Visual Studio 2019 build tools. Personal users
can skip the Enterprise and Professional versions and use just the BuildTools version of the software.
Find the latest version of Visual Studio 2019 and download the BuildTools version (Credit: Brian Westover/Microsoft)
After choosing that, be sure to select "Desktop Development with C++." This step is essential for other
pieces of software to work properly.
Begin your download and kick back: Depending on your internet connection, it could take several minutes
before the software is ready to launch.
Download Oobabooga's Text Generation WebUI Installer
Next, you need to download the Text Generation WebUI tool from Oobabooga. (Yes, it's a silly name, but
the GitHub project makes an easy-to-install and easy-to-use interface for AI stuff, so don't get hung up on
the moniker.)
To download the tool, you can either navigate through the GitHub page or go directly to the collection of
one-click installers Oobabooga has made available. We've installed the Windows version, but this is also
where you'll find installers for Linux and macOS. Download the zip file shown below.
(Credit: Brian Westover/Oobabooga)
Create a new file folder somewhere on your PC that you'll remember and name it AI_Tools or something
similar. Do not use any spaces in the folder name, since that will mess up some of the automated
download and install processes of the installer.
Then, extract the contents of the zip file you just downloaded into your new AI_Tools folder.
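If you'd rather script those two steps than click through Explorer, here's a sketch in Python; the zip
filename below is hypothetical, so substitute whatever name your download actually has:

import os
import zipfile

# No spaces in the path: spaces break the installer's automated downloads.
target = "C:/AI_Tools"
os.makedirs(target, exist_ok=True)

# Extract the one-click installer zip into the new folder.
with zipfile.ZipFile("oobabooga_windows.zip") as archive:  # hypothetical filename
    archive.extractall(target)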
Run the Text Generation WebUI Installer
Once the zip file has been extracted to your new folder, look through the contents. You should see
several files, including one called start_windows.bat. Double-click it to begin installation.
Depending on your system settings, you might get a warning about Windows Defender or another
security tool blocking this action because it's not from a recognized software vendor. (We haven't
experienced or seen anything reported online to indicate that there's any problem with these files, but
we'll repeat that you do this at your own risk.) If you wish to proceed, select "More info," then click
"Run Anyway" to confirm that you want to run start_windows.bat and continue the installation.
Now, the installer will open up a command prompt (CMD) and begin installing the dozens of software
pieces necessary to run the Text Generation WebUI tool. If you're unfamiliar with the command-line
interface, just sit back and watch.
First, you'll see a lot of text scroll by, followed by simple progress bars made up of hashtag or pound
symbols, and then a text prompt will appear. It will ask you what your GPU is, giving you a chance to
indicate whether you're using Nvidia, AMD, or Apple M-series silicon, or just a CPU alone. You should
already have figured this out before downloading anything. In our case, we select A, because our laptop
has an Nvidia GPU.
(Credit: Brian Westover/Microsoft)
Once you've answered the question, the installer will handle the rest. You'll see plenty of text scroll by,
followed first by simple text progress bars and then by more graphically pleasing pink and green progress
bars as the installer downloads and sets up everything it needs.
At the end of this process (which may take up to an hour), you'll be greeted by a warning message
surrounded by asterisks. This warning will tell you that you haven't downloaded any large language
model yet. That's good news! It means that Text Generation WebUI is just about done installing.
At this point you'll see some text in green that reads "Info: Loading the extension gallery." Your installation
is complete, but don't close the command window yet.
Copy and Paste the Local Address for WebUI
Immediately below the green text, you'll see another line that says "Running on local URL:
http://127.0.0.1:7860." Just click that URL text, and it will open your web browser, serving up the Text
Generation WebUI—your interface for all things LLM.
You can save this URL somewhere or bookmark it in your browser. Even though Text Generation WebUI
is accessed through your browser, it runs locally, so it'll work even if your Wi-Fi is turned off. Everything in
this web interface is local, and the data generated should be private to you and your machine.
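If you want to convince yourself that nothing leaves your machine, you can poke that address directly; a
quick sketch using Python's requests library (any HTTP client would do the same job):

import requests

# 127.0.0.1 is the loopback address, so this request never touches the network.
response = requests.get("http://127.0.0.1:7860")
print(response.status_code)  # 200 means the WebUI is up and serving locally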
(Credit: Brian Westover/Oobabooga)
Relaunch Text Generation WebUI

In your AI_Tools folder, open up the same start_windows batch file that we ran to install everything. It will
reopen the CMD window but, instead of going through that whole installation process, will load up a small
bit of text including the green text from before telling you that the extension gallery is loaded. That means
the WebUI is ready to open again in your browser.
(Credit: Brian Westover/Oobabooga)
Use the same local URL you copied or bookmarked earlier, and you'll be greeted once again by the
WebUI interface. This is how you will open the tool in the future, leaving the CMD window open in the
background.
Find and Download an LLM

If you want a curated list of the most recommended models, you can check out a community like Reddit's
/r/LocalLLaMA, which includes a community wiki page that lists several dozen models. It also includes
information about what different models are built for, as well as data about which models are supported
by different hardware. (Some LLMs specialize in coding tasks, while others are built for natural text chat.)
These lists will all end up sending you to Hugging Face, which has become a repository of LLMs and
resources. If you came here from Reddit, you were probably directed straight to a model card, which is a
dedicated information page about a specific downloadable model. These cards provide general
information (like the datasets and training techniques that were used), a list of files to download, and a
community page where people can leave feedback as well as request help and bug fixes.
At the top of each model card is a big, bold model name. In our case, we used the WizardLM 7B
Uncensored model made by Eric Hartford. He uses the screen name ehartford, so the model's listed
location is "ehartford/WizardLM-7B-Uncensored," exactly how it's listed at the top of the model card.
Next to the title is a little copy icon. Click it, and it will save the properly formatted model name to your
clipboard.
Back in WebUI, go to the Model tab and find the field labeled "Download custom model or LoRA." Paste
in the model name, hit Download, and the software will start downloading the necessary files from
Hugging Face.
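If the in-app download ever stalls, the same files can be fetched with the huggingface_hub Python
package instead. This is a sketch of an alternative route, not part of the WebUI itself, and the local_dir
path is an assumption you should point at your own models folder:

from huggingface_hub import snapshot_download

# Pulls every file in the model repo, using the same ID copied from the model card.
snapshot_download(
    repo_id="ehartford/WizardLM-7B-Uncensored",
    local_dir="C:/AI_Tools/text-generation-webui/models/WizardLM-7B-Uncensored",  # adjust to your setup
)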
(Credit: Brian Westover/Oobabooga)
If successful, you'll see an orange progress bar pop up in the WebUI window, and several progress bars
will appear in the command window you left open in the background.
Once it's finished (again, be patient), the WebUI progress bar will disappear and it will simply say "Done!"
instead.
Before you can use the model, you need to allocate some system or graphics memory (or both) to
running it. While you can tweak and fine-tune nearly anything you want in these models, including
memory allocation, we've found that setting it at roughly two-thirds of both GPU and CPU memory works
best. That leaves enough unused memory for your other PC functions while still giving the LLM enough
memory to track and hold a longer conversation.
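As a worked example with our test laptop's 12GB of VRAM and 32GB of system RAM, the two-thirds rule
works out like this:

# Two-thirds of each memory pool on our 12GB VRAM / 32GB RAM test machine.
vram_gb, ram_gb = 12, 32
print(f"GPU memory for the model: ~{vram_gb * 2 // 3}GB")     # ~8GB
print(f"System memory for the model: ~{ram_gb * 2 // 3}GB")   # ~21GB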
Once you've allocated memory, hit the Save Settings button to save your choice, and it will default to that
memory allocation every time. If you ever want to change it, you can simply reset it and press Save
Settings again.
Start Chatting

With your model loaded up and ready to go, it's time to start chatting with your ChatGPT alternative.
Navigate within WebUI to the Text Generation tab. Here you'll see the actual text interface for chatting
with the AI. Enter text into the box, hit Enter to send it, and wait for the bot to respond.
Here, we'll say again, is where you'll experience a little disappointment: Unless you're using a super-
duper workstation with multiple high-end GPUs and massive amounts of memory, your local LLM won't
be anywhere near as quick as ChatGPT or Google Bard. The bot will spit out fragments of words (called
tokens) one at a time, with a noticeable delay between each.
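If you're curious what those fragments look like, the transformers library can show you. This sketch
borrows GPT-2's tokenizer purely as an illustration, since each downloaded model ships with its own:

from transformers import AutoTokenizer

# GPT-2's tokenizer stands in here; your model's own tokenizer will split differently.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("Running an LLM locally takes a little patience."))
# Each printed piece is one token, generated one step at a time.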
However, with a little patience, you can have full conversations with the model you've downloaded. You
can ask it for information, play chat-based games, even give it one or more personalities. Plus, you can
use the LLM with the assurance that your conversations and data are private, which gives peace of mind.
You'll encounter a ton of content and concepts to explore as you get started with local LLMs. The more
you use WebUI and different models, the more you'll learn about how they work. And if you don't know
your text from your tokens, or your GPTQ from your LoRA, communities like /r/LocalLLaMA and Hugging
Face are ideal places to start immersing yourself in the world of machine learning.