My First 10-Inch Rack with a Local LLM: No More Spotify, Google Home, Netflix, or ChatGPT
INTRODUCTION
The rapid convergence of cheap hardware, open‑source tooling, and a growing appetite for self‑hosted services has turned the humble 10‑inch rack into a viable centerpiece for modern homelabs. In this post we unpack the exact scenario described by the title – a compact rack that runs a local large language model (LLM), replaces commercial streaming endpoints such as Spotify, Google Home, Netflix, and ChatGPT, and still manages to host a full smart‑home stack, a Kiwix‑based offline Wikipedia mirror, and a lightweight NAS. The guide is written for seasoned sysadmins and DevOps engineers who already understand the fundamentals of infrastructure automation but are looking to tighten the feedback loop between hardware sizing, service orchestration, and cost‑effective operation.
Why does this matter? Traditional cloud‑first architectures often hide the true operational cost of always‑on services behind managed APIs. When you replace those APIs with locally deployed equivalents you gain:
- Predictable latency and bandwidth usage
- Full control over data privacy and retention policies
- The ability to continue serving critical functionality during internet or cellular outages – a reality for many remote or disaster‑prone locations
Readers will walk away with a concrete blueprint for:
- Selecting hardware that fits a 10‑inch footprint while delivering enough headroom for a local LLM and multiple containers
- Installing and configuring a stack that includes Docker, Portainer, Kiwix, and a locally hosted LLM inference server
- Hardening the environment against common security pitfalls
- Monitoring, backing up, and scaling the services without introducing unnecessary complexity
The following sections break down each of these topics in depth, using real‑world commands, configuration snippets, and best‑practice recommendations that you can copy‑paste into your own lab.
UNDERSTANDING THE TOPIC
What is a “local LLM” and why does it replace Spotify, Google Home, Netflix, ChatGPT? A large language model (LLM) is a neural network trained on massive text corpora to generate human‑like responses. When deployed locally you can expose it through an API that your own applications query instead of calling external services. In the context of the title, the local LLM acts as a unified replacement for:
- Spotify – by serving personalized music recommendations and playlist generation through a self‑hosted recommendation engine
- Google Home – by handling voice‑triggered commands locally, turning smart‑home devices on/off without relying on Google’s cloud
- Netflix – by providing a catalog of locally cached media metadata and on‑demand transcoding when paired with a Plex‑style backend
- ChatGPT – by offering a private, offline conversational interface that never sends user data to third‑party servers
The key benefit is data sovereignty: all interactions stay inside your rack, eliminating the need for outbound API calls that consume bandwidth and expose you to service‑level outages.
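To make the "no outbound API calls" point concrete, here is a hedged sketch of what a local ChatGPT‑style request can look like, assuming an Ollama‑style `/api/generate` endpoint on port 11434; the model name `llama3` and the prompt are illustrative placeholders, not part of the original setup:

```shell
# Build a request payload for a local LLM endpoint; nothing here leaves the LAN.
# Model name and prompt are illustrative placeholders.
PAYLOAD=$(jq -n \
  --arg model "llama3" \
  --arg prompt "Turn off the living-room lights" \
  '{model: $model, prompt: $prompt, stream: false}')
echo "$PAYLOAD"

# With an inference server running locally, the same payload would be sent as:
# curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```

Because the endpoint lives on your own subnet, the request works identically during an ISP outage, which is the whole point of the exercise.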
Historical context
The concept of “self‑hosted entertainment” dates back to the early 2000s, when projects like MPlayer and Xine allowed users to play local media without streaming services. The rise of Docker in 2013 made it feasible to package these tools alongside modern web‑based front‑ends, while the advent of open‑source LLM frameworks such as llama.cpp, text‑generation‑webui, and Ollama in 2022–2023 closed the gap between research prototypes and production‑ready inference services. Today, a 10‑inch rack can comfortably run a 7B‑parameter model on a modest CPU‑only platform, thanks to quantized inference and efficient memory management.
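The memory claim is easy to sanity‑check: at 4‑bit quantization each parameter occupies half a byte, so a back‑of‑the‑envelope estimate (ignoring KV‑cache and runtime overhead) looks like this:

```shell
# Back-of-the-envelope weight footprint for a 4-bit quantized 7B model.
PARAMS=7000000000   # 7 billion parameters
BITS=4              # 4-bit quantization
BYTES=$(( PARAMS * BITS / 8 ))
GIB=$(( BYTES / 1073741824 ))
echo "weights: ~${GIB} GiB"   # about 3 GiB of weights
```

With roughly 3 GiB for weights plus cache and runtime overhead, an 8 GB machine is workable and 16 GB is comfortable.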
Core components of the described stack
| Component | Role | Typical Software | Why it fits a 10‑inch rack |
|---|---|---|---|
| Beelink ME Mini | Bare‑metal host | Ubuntu Server 22.04 LTS | Low power (≈ 70 W idle), compact dimensions, enough SATA/NVMe slots for NAS and SSD |
| Docker Engine | Container runtime | Docker CE 24.x | Provides isolation, easy image distribution, and resource constraints |
| Portainer | UI for container management | Portainer CE 2.11 | Simplifies monitoring and troubleshooting without SSH fatigue |
| Kiwix | Offline Wikipedia mirror | Kiwix‑server 0.10 | Stores massive text corpora locally, serving via HTTP with minimal CPU load |
| Local LLM inference server | Replaces ChatGPT and other cloud LLMs | Ollama or text‑generation‑webui with ggml‑quantized model | Runs on CPU with 4‑bit quantization, fitting within 8 GB RAM envelope |
| Plex / Jellyfin | Media streaming backend | Jellyfin 10.9 | Open‑source alternative to Netflix‑style on‑demand streaming |
| Home Assistant | Smart‑home orchestration | Home Assistant 2024.9 | Controls Google Home‑compatible devices locally |
| NAS layer | Persistent storage | TrueNAS SCALE or OpenMediaVault | Provides SMB/NFS shares for media and backup data |
Each of these pieces can be containerized, allowing you to scale, update, or replace them independently. The rack’s small footprint means you can mount it on a wall, a shelf, or a standard 19‑inch rackmount kit without needing a dedicated server room.
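To illustrate the "independently replaceable containers" idea, here is a minimal sketch of how two of the services in the table might share one Compose file. Image tags, host paths, and the host‑networking choice are assumptions to adapt to your own layout, not a prescribed configuration:

```yaml
# docker-compose.yml sketch: each service is an independent, swappable unit
version: "3.8"
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    restart: unless-stopped
    ports:
      - "8096:8096"
    volumes:
      - /srv/media:/media:ro
  homeassistant:
    image: ghcr.io/home-assistant/home-assistant:stable
    restart: unless-stopped
    network_mode: host          # simplifies discovery of local smart-home devices
    volumes:
      - /srv/homeassistant:/config
```

Because each service is its own entry, you can `docker compose up -d jellyfin` to update one component without touching the rest of the stack.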
Pros and cons of this approach
Pros
- Energy efficiency – Idle power stays under 70 W, which works out to roughly 600 kWh per year even running 24/7, a fraction of a full‑size server’s draw.
- Resilience – Kiwix and the local LLM continue to serve data during ISP outages.
- Cost control – No recurring subscription fees for music, video, or AI APIs.
- Learning opportunity – Hands‑on experience with Docker, network routing, and storage provisioning.
Cons
- Initial setup time – Requires careful planning of hardware compatibility and network segmentation.
- Limited scalability – CPU‑only LLMs cannot match the throughput of dedicated GPU clouds for heavy traffic.
- Feature gaps – Some premium features (e.g., high‑resolution HDR streaming) may be unavailable without commercial licences.
Overall, the trade‑off leans heavily toward self‑sufficiency for hobbyist and semi‑professional use cases.
PREREQUISITES
Before you begin, verify that your hardware meets the minimum specifications. The following checklist assumes a compact mini‑PC such as the Beelink ME Mini mounted in a 10‑inch chassis, but the same principles apply to any low‑profile server.
Hardware requirements
| Item | Minimum | Recommended |
|---|---|---|
| CPU | 4‑core ARMv8 (e.g., Rockchip RK3568) | 8‑core x86‑64 (e.g., Intel Xeon or AMD EPYC) with AES‑NI for faster encryption |
| RAM | 8 GB | 16 GB (to accommodate multiple containers and a 4‑bit LLM) |
| Storage | 256 GB NVMe SSD | 1 TB NVMe SSD (fast random I/O for Kiwix and media cache) |
| Network | Gigabit Ethernet | 2.5 GbE or 10 GbE for future media‑heavy workloads |
| Power supply | 12 V 5 A | 12 V 10 A with UPS integration for outage tolerance |
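A quick sanity check on the power‑supply row: the minimum 12 V 5 A brick tops out at 60 W, which leaves little margin against the ≈70 W idle figure quoted earlier, and is exactly why the recommended 10 A unit exists. The arithmetic (using only numbers from the tables above) is:

```shell
# Power budget check for the PSU recommendations above.
VOLTS=12
MIN_AMPS=5
REC_AMPS=10
MIN_WATTS=$(( VOLTS * MIN_AMPS ))   # 60 W: minimal headroom under load
REC_WATTS=$(( VOLTS * REC_AMPS ))   # 120 W: comfortable margin plus peripherals
echo "minimum PSU: ${MIN_WATTS} W, recommended PSU: ${REC_WATTS} W"
```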
Software dependencies
| Dependency | Version | Installation source |
|---|---|---|
| Ubuntu Server | 22.04 LTS | Official Ubuntu mirrors |
| Docker Engine | 24.0.x | Docker apt repository |
| Docker Compose | 2.20.x | Bundled with Docker CE |
| Git | 2.34.x | Ubuntu packages |
| curl | 7.88.x | Ubuntu packages |
| jq | 1.6.x | Ubuntu packages |
| ffmpeg | 5.1.x | Ubuntu packages (for media transcoding) |
| Jellyfin | 10.9.x | Official Jellyfin repository |
| Ollama | 0.1.45 | Ollama download page |
Network and security considerations
- Static IP assignment – Reserve a DHCP lease or configure a static address (e.g., 192.168.1.10) to simplify firewall rules.
- Port isolation – Keep management interfaces (Portainer, Home Assistant) on a separate VLAN or a dedicated subnet.
- Outbound firewall – Block all outbound traffic except DNS and NTP; this enforces the “no external API” philosophy.
- TLS termination – Use Let’s Encrypt certificates via certbot for any publicly exposed services (e.g., the Kiwix front‑end).

#### User permissions
Create a dedicated system user for Docker operations to avoid running containers as root:
```bash
sudo adduser --system --group --no-create-home dockeradmin
sudo usermod -aG docker dockeradmin
```
All subsequent Docker commands in this guide assume you are logged in as dockeradmin or have sudo privileges.
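Before moving on, the outbound‑firewall policy from the network checklist above can be sketched as an nftables fragment. The table name is arbitrary and the rule set is deliberately minimal, so treat it as a starting point rather than a hardened policy:

```nft
# /etc/nftables.conf (sketch): default-deny egress, allow only DNS and NTP
table inet egress_filter {
  chain output {
    type filter hook output priority 0; policy drop;
    oif "lo" accept                         # loopback traffic
    ct state established,related accept     # replies to inbound sessions
    udp dport { 53, 123 } accept            # DNS and NTP
    tcp dport 53 accept                     # DNS-over-TCP fallback
  }
}
```

Load it with `sudo nft -f /etc/nftables.conf` and inspect the result with `sudo nft list ruleset` before persisting it at boot.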
INSTALLATION & SETUP
The following sections walk you through each layer of the stack, from Docker foundation to the final LLM endpoint. Every command is annotated with explanations of why it is needed and how it interacts with the surrounding components.
1. Install Docker Engine
```bash
# Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

# Set up the stable repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Update package index and install Docker
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Verify installation
docker version
```
*Why this matters*: Using the official repository ensures you receive security patches promptly. The `docker version` output confirms that the daemon (`dockerd`) and client (`docker`) are correctly installed.
#### 2. Enable and start Docker service
```bash
sudo systemctl enable --now docker
sudo systemctl status docker
```
*Explanation*: Enabling the service guarantees that containers start on boot, which is essential for a 24/7 homelab. The status check confirms the daemon is healthy.
#### 3. Install Docker Compose (v2 plugin)
```bash
DOCKER_COMPOSE_VERSION="v2.20.0"
sudo mkdir -p /usr/local/lib/docker/cli-plugins
sudo curl -SL "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-linux-$(uname -m)" \
  -o /usr/local/lib/docker/cli-plugins/docker-compose
sudo chmod +x /usr/local/lib/docker/cli-plugins/docker-compose
docker compose version
```
*Why*: Compose v2 integrates natively with the Docker CLI, allowing you to manage multi‑container stacks with a single `docker compose` command.
4. Deploy Portainer for visual management
```yaml
# Save as portainer.yml
version: "3.8"
services:
  portainer:
    image: portainer/portainer-ce:latest
    container_name: portainer
    restart: unless-stopped
    ports:
      - "9000:9000"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data

volumes:
  portainer_data:
    external: true
```
```bash
# Create the named volume
docker volume create portainer_data

# Launch Portainer
docker compose -f portainer.yml up -d
```
Key points:
- A fixed `container_name` keeps scripts and firewall rules from depending on auto‑generated container IDs.
- Bind‑mounting `/var/run/docker.sock` grants Portainer direct access to the Docker socket, enabling full API control.
- Persistent storage (`portainer_data`) ensures settings survive container recreation.
After the stack starts, navigate to http://<your‑ip>:9000 and complete the initial admin setup.
5. Deploy Kiwix for offline Wikipedia
```yaml
# Save as kiwix.yml
version: "3.8"
services:
  kiwix:
    image: kiwix/kiwix-serve:latest
    container_name: kiwix
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /srv/kiwix:/data:ro
    environment:
      - KIWIX_SITES=wiki/en
    # Shell-form command so the *.zim glob expands inside the container
    command: kiwix-serve --port 8080 /data/wiki/en/*.zim
```
```bash
# Create a directory for the ZIM files
sudo mkdir -p /srv/kiwix/wiki/en
# Download a Wikipedia ZIM archive (the full English “maxi” build is large, on the order of 100 GB)
curl -L https://download.kiwix.org/zim/wikipedia/enwiki/latest/wikipedia_en_all_maxi.z