I will show how quickly and easily you can install Llama 3.1 405B, 70B, 8B, or another language model on your computer or VM using Ollama, Docker, and OpenWebUI. It is so simple that even your grandmother or grandfather could do it. This is private AI, not cloud-based: all data stays on your own computer. OpenWebUI is a front-end interface similar to OpenAI's ChatGPT.
I will demonstrate the installation on Windows 11, but it can be set up just as quickly on Apple Silicon Macs or various Linux distributions. My instructions for Linux will come later.
You can view a language model's size by going to Ollama > Models and selecting, for example, Llama 3.1. Each variant is listed with its parameter count and download size.
For example, the 405B model shows 231GB, meaning it needs 231GB of GPU VRAM or system RAM to load. Keep in mind that Windows and any other software on your computer also need RAM. To run a model entirely on the GPU, you need at least as much VRAM as the model's listed size. If your GPU has less, the model will automatically fall back to the computer/server/VM's RAM and the CPU instead of the GPU. If there is not enough RAM either, the machine can become unstable, because the language model will consume all the RAM and may crash it. So make sure you have enough resources before starting.
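As a rough sanity check, you can estimate the memory a model needs from its parameter count. The factor below is simply back-calculated from the 231GB that Ollama lists for the 405B model (Ollama's default downloads are quantized), so treat it as a ballpark figure, not an exact one:

```shell
# Rough VRAM/RAM estimate for Ollama's default quantized models.
# The ~0.57 GB-per-billion-parameters factor is derived from the
# 231 GB listed for Llama 3.1 405B; treat it as a ballpark only.
estimate_gb() {
  awk -v b="$1" 'BEGIN { printf "%.0f GB\n", b * 0.57 }'
}

estimate_gb 405   # prints "231 GB"
estimate_gb 70    # prints "40 GB"
estimate_gb 8     # prints "5 GB"
```

Compare the result against your GPU's VRAM (and free system RAM) before pulling a model.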
Small language models like 8B run fine on the CPU, but a GPU makes them lightning fast. It's really cool, and I highly recommend experimenting with it.
Here's how it works: the model is first loaded into VRAM/RAM, and only then does it start responding. For larger language models, this loading can take some time.
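Once a model is loaded, Ollama can tell you whether it ended up in VRAM or spilled over into system RAM. Assuming Ollama is installed and running, `ollama ps` lists the currently loaded models:

```shell
# Show which models are currently loaded and where they run.
# The PROCESSOR column reads "100% GPU" when the model fits in VRAM,
# "100% CPU" when it runs entirely from system RAM, or a split like
# "30%/70% CPU/GPU" when it only partly fits on the GPU.
ollama ps
```

This is the quickest way to confirm whether a slow model is actually running on the CPU.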
The most important factor is the size of the GPU's VRAM. Next comes the amount of RAM in the server/PC/VM, and then CPU speed, though not so much the number of cores: I did not notice a significant difference between a 120-core and a 16-core machine, but the processor's clock speed does matter.
Watch the video on how I set this up quickly and follow along on your computer.
Quick Steps:
- Download Ollama and install it on your computer. https://ollama.com/
- Close Ollama
- Set up variables for Ollama, specifying where you want the language models to be stored.
- Advanced system settings
- Environment Variables
- Variable name: OLLAMA_MODELS
- Variable value: Drive/Folder location
- Start Ollama
- Install Docker on your computer. https://www.docker.com
- Launch a browser and go to the OpenWebUI site. https://openwebui.com/
- Install OpenWebUI in your Docker setup.
- Open Terminal (administrator rights)
- Copy and paste the command
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
- Launch a browser and go to http://localhost:3000/
- Create a user in OpenWebUI.
- Install language models via OpenWebUI.
- Install language models via Terminal
Note: You can also install language models via the terminal, but sometimes they may not appear in OpenWebUI.
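If you prefer the terminal to the GUI dialogs, the OLLAMA_MODELS variable from the steps above can also be set from an administrator prompt on Windows. The path below is just an example; use whatever drive/folder you chose:

```shell
# Set OLLAMA_MODELS from an administrator prompt on Windows.
# "D:\ollama\models" is an example path; substitute your own drive/folder.
# setx without /M sets the variable for the current user; add /M for system-wide.
setx OLLAMA_MODELS "D:\ollama\models"
# Quit and restart Ollama afterwards so it picks up the new location.
```

This matches the Advanced system settings > Environment Variables steps, just without clicking through dialogs.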
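After finishing the steps, two quick checks confirm everything is up, assuming the default ports used in this guide:

```shell
# Ollama's API listens on port 11434; the root endpoint replies
# "Ollama is running" when the service is up.
curl -s http://localhost:11434/

# The OpenWebUI container from the docker run command above should be "Up".
docker ps --filter "name=open-webui"
```

If the first check fails, start Ollama; if the second does, re-run the `docker run` command from the steps above.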
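For the terminal route, the usual Ollama commands look like this; `llama3.1:8b` is the smallest Llama 3.1 tag in the Ollama library and a good first test:

```shell
# Download the 8B model (a few GB) from the Ollama library.
ollama pull llama3.1:8b

# Ask it a one-off question directly from the terminal.
ollama run llama3.1:8b "Say hello in one sentence."

# List all models installed locally.
ollama list
```

Models pulled this way land in the folder set by OLLAMA_MODELS, and should also show up in OpenWebUI's model picker, though as noted they occasionally take a refresh to appear.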