This is a set of quick setup guides for running a full local AI tooling stack on your own hardware. Every tool runs on your machine, talks to your own models, and costs nothing per request. The guides are written to be posted one at a time, so each one stands on its own.
Before you start
Two setup guides come first, depending on your machine:
Part 0a sets up Windows with WSL, Ubuntu, and Docker Desktop.
Part 0b sets up a Ubuntu server with Docker and the essentials.
Do whichever one matches your machine, then continue with Ollama.
The stack at a glance
1. Ollama, port 11434. The model engine that runs the LLMs everything else talks to.
2. Open WebUI, port 8081. A ChatGPT style chat interface for your local models.
3. n8n, port 5678. Workflow automation with AI nodes for agents and pipelines.
4. AnythingLLM, port 3001. Document chat and RAG over your own files.
5. Flowise, port 3000. A visual drag and drop builder for LLM apps and agents.
6. Langflow, port 7860. A visual flow builder for LLM pipelines and agents.
7. LiteLLM, port 4000. A gateway that unifies every model behind one OpenAI compatible API, with virtual keys and logging.
8. Langfuse, port 3002. Observability and tracing for every LLM call.
9. Dify, port 8080. An all in one platform to build and publish LLM apps, agents, and RAG.
10. Locust, port 8089. A load testing tool to measure how much traffic your stack can handle.
A note on models and hardware
Ollama runs the models, and how large a model you can run depends on your graphics card’s memory (VRAM). This series was built on an NVIDIA RTX PRO 6000 Blackwell with 96 GB of VRAM, which is large enough to keep a whole library of models downloaded and switch between them freely. The Ollama guide lists every model used here, with its size and its job, so you can pick the ones that fit your own card.
A rough rule: a model needs about as much VRAM as its download size. Smaller cards simply use smaller models. Everything else in the stack works the same no matter which model you choose.
The one rule that ties it all together
Every tool except Ollama runs inside a Docker container. From inside a container, localhost points at the container itself, not at your host. So whenever a tool needs to reach Ollama (or another service on your host), the address is:
http://host.docker.internal:11434
This single rule prevents the most common failure people hit when wiring a local stack together.
Plan your ports first
Several of these tools default to the same ports (3000 and 8080 are popular). Decide your port map before you start so you do not get a clash halfway through. The map above is one that works with no collisions.
Suggested reading order
Start with the Part 0 guide for your machine, then Ollama, since everything depends on it. Then Open WebUI for an instant payoff. After that the order is up to you. LiteLLM and Langfuse are worth doing together, because the gateway becomes far more useful once every call is traced.