vAndu

Build a Local AI Stack: Series Index

Posted on June 3, 2026

This is a set of quick setup guides for running a full local AI tooling stack on your own hardware. Every tool runs on your machine, talks to your own models, and costs nothing per request. The guides are written to be posted one at a time, so each one stands on its own.

Before you start

Two setup guides come first, depending on your machine:

Part 0a sets up Windows with WSL, Ubuntu, and Docker Desktop.

Part 0b sets up an Ubuntu server with Docker and the essentials.

Do whichever one matches your machine, then continue with Ollama.

The stack at a glance

1. Ollama, port 11434. The model engine that runs the LLMs everything else talks to.

2. Open WebUI, port 8081. A ChatGPT-style chat interface for your local models.

3. n8n, port 5678. Workflow automation with AI nodes for agents and pipelines.

4. AnythingLLM, port 3001. Document chat and RAG over your own files.

5. Flowise, port 3000. A visual drag-and-drop builder for LLM apps and agents.

6. Langflow, port 7860. A visual flow builder for LLM pipelines and agents.

7. LiteLLM, port 4000. A gateway that unifies every model behind one OpenAI-compatible API, with virtual keys and logging.

8. Langfuse, port 3002. Observability and tracing for every LLM call.

9. Dify, port 8080. An all-in-one platform to build and publish LLM apps, agents, and RAG.

10. Locust, port 8089. A load testing tool to measure how much traffic your stack can handle.
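
The list above doubles as a port map, so it can be turned into a quick "what is up?" check. This is a minimal sketch, assuming every service is published on the local machine at the ports listed; the names `STACK_PORTS`, `is_listening`, and `stack_status` are made up for illustration and are not part of any of the tools.

```python
import socket

# Port map copied from the list above; adjust it if you publish services elsewhere.
STACK_PORTS = {
    "Ollama": 11434,
    "Open WebUI": 8081,
    "n8n": 5678,
    "AnythingLLM": 3001,
    "Flowise": 3000,
    "Langflow": 7860,
    "LiteLLM": 4000,
    "Langfuse": 3002,
    "Dify": 8080,
    "Locust": 8089,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def stack_status(ports: dict[str, int] = STACK_PORTS) -> dict[str, bool]:
    """Map each service name to whether its port is currently reachable."""
    return {name: is_listening(port) for name, port in ports.items()}
```

Calling `stack_status()` from the host returns a dictionary such as `{"Ollama": True, "Open WebUI": False, ...}`. An open port only shows that something is listening there, not that it is the service you expect.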

A note on models and hardware

Ollama runs the models, and how large a model you can run depends on your graphics card’s memory (VRAM). This series was built on an NVIDIA RTX PRO 6000 Blackwell with 96 GB of VRAM, which is large enough to keep a whole library of models downloaded and switch between them freely. The Ollama guide lists every model used here, with its size and its job, so you can pick the ones that fit your own card.

A rough rule: a model needs about as much VRAM as its download size, plus some headroom for the context window. Smaller cards simply use smaller models. Everything else in the stack works the same no matter which model you choose.
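
The rough rule can be written down as a small filter. This is only a sketch: the model names and sizes below are hypothetical placeholders (not the list from the Ollama guide), and the `headroom_gb` default is an assumed safety margin, not a measured figure.

```python
def fits_in_vram(download_gb: float, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Rough check: the model fits if download size plus headroom is within VRAM."""
    return download_gb + headroom_gb <= vram_gb


# Hypothetical catalogue: names and sizes are placeholders for illustration only.
EXAMPLE_MODELS_GB = {
    "small-model": 4.7,
    "medium-model": 19.0,
    "large-model": 65.0,
}


def models_that_fit(models: dict[str, float], vram_gb: float) -> list[str]:
    """Return the names of models that pass the rough check, smallest first."""
    fitting = [name for name, size in models.items() if fits_in_vram(size, vram_gb)]
    return sorted(fitting, key=lambda name: models[name])
```

With these placeholder sizes, `models_that_fit(EXAMPLE_MODELS_GB, 24)` keeps the small and medium entries, while a 96 GB card keeps all three. Treat the result as a first filter, not a guarantee.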

The one rule that ties it all together

Every tool except Ollama runs inside a Docker container. From inside a container, localhost points at the container itself, not at your host. So whenever a tool needs to reach Ollama (or another service on your host), the address is:

http://host.docker.internal:11434

This single rule prevents the most common failure people hit when wiring a local stack together.
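
To make the rule concrete, here is a minimal sketch of a script that picks the right Ollama address depending on where it runs, then asks Ollama for its downloaded models through the `/api/tags` endpoint. The helper names are hypothetical, and checking for `/.dockerenv` is a common heuristic for detecting a Docker container, not an official API.

```python
import json
import os
import urllib.request

OLLAMA_PORT = 11434


def running_in_container() -> bool:
    """Heuristic: Docker normally creates /.dockerenv inside a container."""
    return os.path.exists("/.dockerenv")


def ollama_base_url(in_container: bool) -> str:
    """Pick the Ollama address: the host alias in a container, localhost otherwise."""
    host = "host.docker.internal" if in_container else "localhost"
    return f"http://{host}:{OLLAMA_PORT}"


def list_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Ask Ollama which models are downloaded, via its /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

A typical call is `list_models(ollama_base_url(running_in_container()))`. Docker Desktop resolves `host.docker.internal` out of the box; on a plain Docker Engine install on Linux the name usually has to be mapped when starting the container, for example with `--add-host=host.docker.internal:host-gateway`.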

Plan your ports first

Several of these tools default to the same ports (3000 and 8080 are popular). Decide your port map before you start so you do not hit a clash halfway through. The map above works with no collisions.
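
A port plan is easy to check before any container starts. The sketch below finds every port claimed by more than one service; `find_port_clashes` and the `tool-a` to `tool-d` names are hypothetical and only illustrate what happens when defaults are left untouched.

```python
from collections import defaultdict


def find_port_clashes(port_map: dict[str, int]) -> dict[int, list[str]]:
    """Return every port claimed by more than one service."""
    claimed = defaultdict(list)
    for service, port in port_map.items():
        claimed[port].append(service)
    return {port: names for port, names in claimed.items() if len(names) > 1}


# Hypothetical unplanned map: two tools left on 3000 and two on 8080.
UNPLANNED = {
    "tool-a": 3000,
    "tool-b": 3000,
    "tool-c": 8080,
    "tool-d": 8080,
}

clashes = find_port_clashes(UNPLANNED)
```

Here `clashes` groups `tool-a` with `tool-b` under 3000 and `tool-c` with `tool-d` under 8080. Feeding the function the ten ports from the stack list returns an empty dictionary.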

Suggested reading order

Start with the Part 0 guide for your machine, then Ollama, since everything depends on it. Then Open WebUI for an instant payoff. After that the order is up to you. LiteLLM and Langfuse are worth doing together, because the gateway becomes far more useful once every call is traced.
