My First 10-Inch Rack with a Local LLM: No More Spotify, Google Home, Netflix, or ChatGPT
INTRODUCTION
The rapid convergence of cheap hardware, open‑source tooling, and a growing appetite for self‑hosted services has turned the humble 10‑inch rack into a viable centerpiece for modern homelabs. In this post we unpack the exact scenario described by the title – a compact rack that runs a local large language model (LLM), replaces commercial streaming endpoints such as Spotify, Google Home, Netflix, and ChatGPT, and still manages to host a full smart‑home stack, a Kiwix‑based offline Wikipedia mirror, and a lightweight NAS. The guide is written for seasoned sysadmins and DevOps engineers who already understand the fundamentals of infrastructure automation but are looking to tighten the feedback loop between hardware sizing, service orchestration, and cost‑effective operation.
Why does this matter? Traditional cloud‑first architectures often hide the true operational cost of always‑on services behind managed APIs. When you replace those APIs with locally deployed equivalents you gain:
- Predictable latency and bandwidth usage
- Full control over data privacy and retention policies
- The ability to continue serving critical functionality during internet or cellular outages – a reality for many remote or disaster‑prone locations
Readers will walk away with a concrete blueprint for:
- Selecting hardware that fits a 10‑inch footprint while delivering enough headroom for a local LLM and multiple containers
- Installing and configuring a stack that includes Docker, Portainer, Kiwix, and a locally hosted LLM inference server
- Hardening the environment against common security pitfalls
- Monitoring, backing up, and scaling the services without introducing unnecessary complexity
The following sections break down each of these topics in depth, using real‑world commands, configuration snippets, and best‑practice recommendations that you can copy‑paste into your own lab.
UNDERSTANDING THE TOPIC
What is a “local LLM” and why does it replace Spotify, Google Home, Netflix, ChatGPT? A large language model (LLM) is a neural network trained on massive text corpora to generate human‑like responses. When deployed locally you can expose it through an API that your own applications query instead of calling external services. In the context of the title, the local LLM acts as a unified replacement for:
- Spotify – by serving personalized music recommendations and playlist generation through a self‑hosted recommendation engine
- Google Home – by handling voice‑triggered commands locally, turning smart‑home devices on/off without relying on Google’s cloud
- Netflix – by providing a catalog of locally cached media metadata and on‑demand transcoding when paired with a Plex‑style backend
- ChatGPT – by offering a private, offline conversational interface that never sends user data to third‑party servers
The key benefit is data sovereignty: all interactions stay inside your rack, eliminating the need for outbound API calls that consume bandwidth and expose you to service‑level outages.
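To make the "no outbound API calls" point concrete, here is a hedged sketch of what a local ChatGPT‑style request can look like, assuming an Ollama‑style `/api/generate` endpoint on port 11434; the model name `llama3` and the prompt are illustrative placeholders, not part of the original setup:

```shell
# Build a request payload for a local LLM endpoint; nothing here leaves the LAN.
# Model name and prompt are illustrative placeholders.
PAYLOAD=$(jq -n \
  --arg model "llama3" \
  --arg prompt "Turn off the living-room lights" \
  '{model: $model, prompt: $prompt, stream: false}')
echo "$PAYLOAD"

# With an inference server running locally, the same payload would be sent as:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```

Because the endpoint lives on your own subnet, the request works identically during an ISP outage, which is the whole point of the exercise.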
Historical context
The concept of “self‑hosted entertainment” dates back to the early 2000s, when projects like MPlayer and Xine allowed users to play local media without streaming services. The rise of Docker in 2013 made it feasible to package these tools alongside modern web‑based front‑ends, while the advent of open‑source LLM frameworks such as llama.cpp, text‑generation‑webui, and Ollama in 2022–2023 closed the gap between research prototypes and production‑ready inference services. Today, a 10‑inch rack can comfortably run a 7B‑parameter model on a modest CPU‑only platform, thanks to quantized inference and efficient memory management.
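The memory claim is easy to sanity‑check: at 4‑bit quantization each parameter occupies half a byte, so a back‑of‑the‑envelope estimate (ignoring KV‑cache and runtime overhead) looks like this:

```shell
# Back-of-the-envelope weight footprint for a 4-bit quantized 7B model.
PARAMS=7000000000   # 7 billion parameters
BITS=4              # 4-bit quantization
BYTES=$(( PARAMS * BITS / 8 ))
GIB=$(( BYTES / 1073741824 ))
echo "weights: ~${GIB} GiB"   # about 3 GiB of weights
```

With roughly 3 GiB for weights plus cache and runtime overhead, an 8 GB machine is workable and 16 GB is comfortable.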
Core components of the described stack
| Component | Role | Typical Software | Why it fits a 10‑inch rack |
|---|---|---|---|
| Beelink ME Mini | Bare‑metal host | Ubuntu Server 22.04 LTS | Low power (≈ 70 W idle), compact dimensions, enough SATA/NVMe slots for NAS and SSD |
| Docker Engine | Container runtime | Docker CE 24.x | Provides isolation, easy image distribution, and resource constraints |
| Portainer | UI for container management | Portainer CE 2.11 | Simplifies monitoring and troubleshooting without SSH fatigue |
| Kiwix | Offline Wikipedia mirror | Kiwix‑server 0.10 | Stores massive text corpora locally, serving via HTTP with minimal CPU load |
| Local LLM inference server | Replaces ChatGPT and other cloud LLMs | Ollama or text‑generation‑webui with ggml‑quantized model | Runs on CPU with 4‑bit quantization, fitting within 8 GB RAM envelope |
| Plex / Jellyfin | Media streaming backend | Jellyfin 10.9 | Open‑source alternative to Netflix‑style on‑demand streaming |
| Home Assistant | Smart‑home orchestration | Home Assistant 2024.9 | Controls Google Home‑compatible devices locally |
| NAS layer | Persistent storage | TrueNAS SCALE or OpenMediaVault | Provides SMB/NFS shares for media and backup data |
Each of these pieces can be containerized, allowing you to scale, update, or replace them independently. The rack’s small footprint means you can mount it on a wall, a shelf, or a standard 19‑inch rackmount kit without needing a dedicated server room.
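To illustrate the "independently replaceable containers" idea, here is a minimal sketch of how two of the services in the table might share one Compose file. Image tags, host paths, and the host‑networking choice are assumptions to adapt to your own layout, not a prescribed configuration:

```yaml
# docker-compose.yml sketch: each service is an independent, swappable unit
version: "3.8"
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    restart: unless-stopped
    ports:
      - "8096:8096"
    volumes:
      - /srv/media:/media:ro
  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    restart: unless-stopped
    network_mode: host          # simplifies discovery of local smart-home devices
    volumes:
      - /srv/homeassistant:/config
```

Because each service is its own entry, you can `docker compose up -d jellyfin` to update one component without touching the rest of the stack.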
Pros and cons of this approach
Pros
- Energy efficiency – Idle power stays under 70 W, which works out to roughly 600 kWh per year even running 24/7, a fraction of a full‑size server’s draw.
- Resilience – Kiwix and the local LLM continue to serve data during ISP outages.
- Cost control – No recurring subscription fees for music, video, or AI APIs.
- Learning opportunity – Hands‑on experience with Docker, network routing, and storage provisioning.
Cons
- Initial setup time – Requires careful planning of hardware compatibility and network segmentation.
- Limited scalability – CPU‑only LLMs cannot match the throughput of dedicated GPU clouds for heavy traffic.
- Feature gaps – Some premium features (e.g., high‑resolution HDR streaming) may be unavailable without commercial licences.
Overall, the trade‑off leans heavily toward self‑sufficiency for hobbyist and semi‑professional use cases.
PREREQUISITES
Before you begin, verify that your hardware meets the minimum specifications. The following checklist assumes a compact mini‑PC such as the Beelink ME Mini mounted in a 10‑inch chassis, but the same principles apply to any low‑profile server.
Hardware requirements
| Item | Minimum | Recommended |
|---|---|---|
| CPU | 4‑core ARMv8 (e.g., Rockchip RK3568) | 8‑core x86‑64 (e.g., Intel Xeon or AMD EPYC) with AES‑NI for faster encryption |
| RAM | 8 GB | 16 GB (to accommodate multiple containers and a 4‑bit LLM) |
| Storage | 256 GB NVMe SSD | 1 TB NVMe SSD (fast random I/O for Kiwix and media cache) |
| Network | Gigabit Ethernet | 2.5 GbE or 10 GbE for future media‑heavy workloads |
| Power supply | 12 V 5 A | 12 V 10 A with UPS integration for outage tolerance |
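A quick sanity check on the power‑supply row: the minimum 12 V 5 A brick tops out at 60 W, which leaves little margin against the ≈70 W idle figure quoted earlier, and is exactly why the recommended 10 A unit exists. The arithmetic (using only numbers from the tables above) is:

```shell
# Power budget check for the PSU recommendations above.
VOLTS=12
MIN_AMPS=5
REC_AMPS=10
MIN_WATTS=$(( VOLTS * MIN_AMPS ))   # 60 W: minimal headroom under load
REC_WATTS=$(( VOLTS * REC_AMPS ))   # 120 W: comfortable margin plus peripherals
echo "minimum PSU: ${MIN_WATTS} W, recommended PSU: ${REC_WATTS} W"
```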
Software dependencies
| Dependency | Version | Installation source |
|---|---|---|
| Ubuntu Server | 22.04 LTS | Official Ubuntu mirrors |
| Docker Engine | 24.0.x | Docker apt repository |
| Docker Compose | 2.20.x | Bundled with Docker CE |
| Git | 2.34.x | Ubuntu packages |
| curl | 7.88.x | Ubuntu packages |
| jq | 1.6.x | Ubuntu packages |
| ffmpeg | 5.1.x | Ubuntu packages (for media transcoding) |
| Jellyfin | 10.9.x | Official Jellyfin repository |
| Ollama | 0.1.45 | Ollama download page |
Network and security considerations
- Static IP assignment – Reserve a DHCP lease or configure a static address (e.g., 192.168.1.10) to simplify firewall rules.
- Port isolation – Keep management interfaces (Portainer, Home Assistant) on a separate VLAN or a dedicated subnet.
- Outbound firewall – Block all outbound traffic except DNS and NTP; this enforces the “no external API” philosophy.
- TLS termination – Use Let’s Encrypt certificates via certbot for any publicly exposed services (e.g., the Kiwix front‑end).

#### User permissions
Create a dedicated system user for Docker operations to avoid running containers as root:
```bash
sudo adduser --system --group --no-create-home dockeradmin
sudo usermod -aG docker dockeradmin
```
All subsequent Docker commands in this guide assume you are logged in as dockeradmin or have sudo privileges.
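Before moving on, the outbound‑firewall policy from the network checklist above can be sketched as an nftables fragment. The table name is arbitrary and the rule set is deliberately minimal, so treat it as a starting point rather than a hardened policy:

```nft
# /etc/nftables.conf (sketch): default-deny egress, allow only DNS and NTP
table inet egress_filter {
  chain output {
    type filter hook output priority 0; policy drop;
    oif "lo" accept                         # loopback traffic
    ct state established,related accept     # replies to inbound sessions
    udp dport { 53, 123 } accept            # DNS and NTP
    tcp dport 53 accept                     # DNS-over-TCP fallback
  }
}
```

Load it with `sudo nft -f /etc/nftables.conf` and inspect the result with `sudo nft list ruleset` before persisting it at boot.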
INSTALLATION & SETUP
The following sections walk you through each layer of the stack, from Docker foundation to the final LLM endpoint. Every command is annotated with explanations of why it is needed and how it interacts with the surrounding components.
1. Install Docker Engine
```bash
# Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Set up the stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Update package index and install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Verify installation
docker version
```
*Why this matters*: Using the official repository ensures you receive security patches promptly. The `docker version` output confirms that the daemon (`dockerd`) and client (`docker`) are correctly installed.
#### 2. Enable and start Docker service
```bash
sudo systemctl enable --now docker
sudo systemctl status docker
```
*Explanation*: Enabling the service guarantees that containers start on boot, which is essential for a 24/7 homelab. The status check confirms the daemon is healthy.
#### 3. Install Docker Compose (v2 plugin)
```bash
DOCKER_COMPOSE_VERSION="v2.20.0"
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
docker compose version
```
*Why*: Compose v2 integrates natively with the Docker CLI, allowing you to manage multi‑container stacks with a single `docker compose` command.
4. Deploy Portainer for visual management
```yaml
# Save as portainer.yml
version: "3.8"
services:
  portainer:
    image: portainer/portainer-ce:latest
    container_name: portainer
    restart: unless-stopped
    ports:
      - "9000:9000"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data

volumes:
  portainer_data:
    external: true
```
```bash
# Create the named volume
docker volume create portainer_data

# Launch Portainer
docker compose -f portainer.yml up -d
```
Key points:
- A fixed `container_name` keeps scripts and firewall rules from depending on auto‑generated container IDs.
- Bind‑mounting `/var/run/docker.sock` grants Portainer direct access to the Docker socket, enabling full API control.
- Persistent storage (`portainer_data`) ensures settings survive container recreation.
After the stack starts, navigate to http://<your‑ip>:9000 and complete the initial admin setup.
5. Deploy Kiwix for offline Wikipedia
```yaml
# Save as kiwix.yml
version: "3.8"
services:
  kiwix:
    image: kiwix/kiwix-serve:latest
    container_name: kiwix
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /srv/kiwix:/data:ro
    environment:
      - KIWIX_SITES=wiki/en
    # Shell-form command so the *.zim glob expands inside the container
    command: kiwix-serve --port 8080 /data/wiki/en/*.zim
```
```bash
# Create a directory for the ZIM files
sudo mkdir -p /srv/kiwix/wiki/en
# Download a Wikipedia ZIM archive (the full English “maxi” build is large, on the order of 100 GB)
curl -L https://download.kiwix.org/zim/wikipedia/enwiki/latest/wikipedia_en_all_maxi.z