My HP DL380 Is Now Running An AI I Can Literally Call On The Phone
Introduction
In the world of enterprise hardware repurposing, few moments are as satisfying as breathing new life into retired server equipment. When an HP DL380 Gen9 server - originally designed for traditional data center workloads - starts conducting natural voice conversations through a self-hosted AI stack, we’ve crossed into uncharted territory for homelab enthusiasts and DevOps professionals.
The challenge of creating a truly autonomous conversational AI has traditionally required massive cloud resources and proprietary APIs. But recent advancements in open-source machine learning have enabled a fascinating convergence: enterprise-grade hardware meeting cutting-edge AI in completely self-contained deployments. This breakthrough eliminates cloud dependencies, reduces latency to human-conversation levels (under 300ms round-trip), and maintains strict data privacy - all critical concerns for infrastructure professionals.
In this comprehensive guide, you’ll learn how to transform an HP DL380 (or similar server hardware) into a fully functional AI telephone companion using:
- Asterisk PBX (voice call infrastructure)
- OpenAI’s Whisper (real-time speech recognition)
- Mistral 7B (local large language model)
- Coqui XTTS (neural text-to-speech with voice cloning)
We’ll cover hardware requirements, software configuration, performance optimization, and security hardening - with all components running on bare metal or Docker containers. Whether you’re a sysadmin exploring AI workloads or a DevOps engineer building on-premises voice assistants, this stack demonstrates what’s possible with modern open-source tooling.
Understanding the Technology Stack
Component Breakdown
1. Asterisk PBX
The bedrock of our telephony system, Asterisk handles SIP signaling, call routing, and audio stream management. Its modular architecture allows integration with our AI components through the AGI (Asterisk Gateway Interface).
Key Features:
- SIP/TLS for secure call setup
- RTP (Real-time Transport Protocol) audio handling
- AGI interface for external program control
- Dialplan scripting for call flow management
2. Whisper (OpenAI’s Speech Recognition)
The real-time transcription engine converts spoken words to text with remarkable accuracy. We’re using the faster-whisper implementation which provides:
- Real-time streaming transcription
- Multi-language support
- Word-level timestamps
- Optimized CUDA execution
3. Mistral 7B
This 7-billion parameter language model delivers surprisingly coherent responses while remaining small enough to run locally on consumer GPUs. Key advantages include:
- Apache 2.0 license (commercial-friendly)
- 32k token context window
- Instruction-following capabilities
- Optimized for low-latency inference
4. Coqui XTTS
The open-source text-to-speech system that gives our AI a human-like voice:
- Voice cloning from short samples
- Emotional tone control
- Streaming API for real-time playback
- Support for multiple speakers
Architectural Flow
Caller --> SIP (Asterisk) --> Audio Stream --> Whisper (Speech-to-Text)
                                                   ↓
Mistral (Process Text) --> Response Text --> XTTS (Text-to-Speech)
                                                   ↓
Asterisk <-- Audio Stream <-- Synthesized Speech
Why Local Deployment Matters
Latency Control: Cloud-based solutions introduce unpredictable delays (often 500ms+). Our local stack achieves 200-300ms round-trip latency.
Data Sovereignty: Voice data never leaves your infrastructure - critical for healthcare, finance, or personal projects.
Cost Predictability: Eliminates API call expenses - particularly important for high-volume usage.
Customization: Full control over models, prompts, and voice characteristics.
Prerequisites
Hardware Requirements
Minimum Specifications (Tested Configuration):
- HP DL380 Gen9 (or comparable server)
- Dual Intel Xeon E5-2690v3 (24 cores total)
- 128GB DDR4 ECC RAM
- NVIDIA T4 GPU (16GB VRAM) - critical for ML workloads
- Hardware RAID controller (RAID 10 recommended)
- Dual power supplies
- Intel X520-DA2 10GbE NIC (for VoIP traffic isolation)
Storage Considerations:
- 500GB SSD for OS and applications
- 1TB NVMe cache for Whisper temp files
- 2TB HDD for voice samples and logs
Software Requirements
Base Operating System:
# Ubuntu 22.04.3 LTS (Jammy Jellyfish)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.3 LTS
Release: 22.04
Codename: jammy
Critical Dependencies:
# NVIDIA Drivers and CUDA
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-------------------------------+----------------------+----------------------+
Version-Locked Components:
- Python 3.10.12
- Docker 24.0.7
- NVIDIA Container Toolkit 1.14.1
- Asterisk 20.3.0
- faster-whisper 0.10.0
- Mistral 7B Instruct v0.2
- Coqui XTTS 2.0.2
Network Configuration
Required Ports:

| Port        | Protocol | Service | Notes                        |
|-------------|----------|---------|------------------------------|
| 5060        | TCP/UDP  | SIP     | Standard SIP port            |
| 5061        | TCP      | SIP/TLS | Secure SIP                   |
| 10000-20000 | UDP      | RTP     | Dynamic audio ports          |
| 8000        | TCP      | API     | faster-whisper HTTP endpoint |
| 8020        | TCP      | API     | XTTS HTTP endpoint           |
| 11434       | TCP      | API     | Ollama HTTP endpoint         |
Security Considerations:
- Physically separate VoIP VLAN
- Fail2ban configuration for SIP ports (see the jail sketch below)
- TLS 1.3 for SIP signaling
- SRTP (Secure RTP) for audio encryption
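For the Fail2ban item, a jail along these lines works with the asterisk filter that ships with Fail2ban (the logpath assumes Asterisk's security log is enabled in logger.conf):

# /etc/fail2ban/jail.d/asterisk.local
[asterisk]
enabled  = true
port     = 5060,5061
logpath  = /var/log/asterisk/security
maxretry = 5
bantime  = 1h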
Installation & Setup
1. Base System Preparation
Kernel Optimization:
# /etc/sysctl.conf
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_keepalive_time=60
vm.swappiness=10
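Apply the settings without rebooting:

$ sudo sysctl -p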
GPU Driver Installation:
$ sudo apt install -y nvidia-driver-535 cuda-toolkit-12-2
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
2. Asterisk Configuration
Installation:
$ sudo apt install -y asterisk asterisk-dev libopus-dev

Note: Ubuntu 22.04's stock repositories ship Asterisk 18, so build 20.3.0 from source (or use a third-party repository) to match the version-locked components above.
/etc/asterisk/pjsip.conf:
[transport-udp]
type=transport
protocol=udp
bind=0.0.0.0:5060

[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5061

[ai_phone]
type=endpoint
context=ai_incoming
disallow=all
allow=ulaw
allow=opus
auth=ai_phone
aors=ai_phone

[ai_phone]
type=auth
auth_type=userpass
password=SecurePass123!
username=ai_phone

[ai_phone]
type=aor
max_contacts=1
/etc/asterisk/extensions.conf:
[ai_incoming]
exten => s,1,Answer()
same => n,AGI(agi://127.0.0.1:3000/ai-agi)
same => n,Hangup()
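The dialplan hands the call to a FastAGI service listening on port 3000, which you supply yourself. Below is a minimal Python sketch of that service, assuming the faster-whisper and Ollama endpoints configured in the next sections and a hypothetical XTTS HTTP route at http://localhost:8020/tts (adapt to whatever your XTTS container actually exposes); a production version would stream audio in both directions instead of record-then-reply:

# Minimal FastAGI sketch backing agi://127.0.0.1:3000/ai-agi
import socketserver

import requests

class AIAgiHandler(socketserver.StreamRequestHandler):
    def agi(self, command: str) -> str:
        """Send one AGI command and return Asterisk's '200 result=...' reply."""
        self.wfile.write((command + "\n").encode())
        return self.rfile.readline().decode().strip()

    def handle(self):
        # Asterisk sends 'agi_*: value' headers first, terminated by a blank line.
        while self.rfile.readline().decode().strip():
            pass

        # Record the caller's question (stops on '#' or after 10 s of audio).
        self.agi('RECORD FILE /tmp/caller wav "#" 10000')

        # 1. Speech-to-text via the faster-whisper HTTP endpoint.
        with open("/tmp/caller.wav", "rb") as f:
            text = requests.post(
                "http://localhost:8000/asr",
                headers={"Content-Type": "audio/wav"},
                data=f,
            ).json()["text"]

        # 2. Generate a reply with Mistral via Ollama ("stream": False -> one JSON object).
        reply = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "mistral", "prompt": text, "stream": False},
        ).json()["response"]

        # 3. Text-to-speech; the /tts route and JSON shape are assumptions. Asterisk
        #    plays .wav as 8 kHz 16-bit PCM, so resample the XTTS output accordingly.
        audio = requests.post(
            "http://localhost:8020/tts",
            json={"text": reply, "language": "en"},
        ).content
        with open("/tmp/reply.wav", "wb") as f:
            f.write(audio)

        # Play the synthesized answer (AGI file paths omit the extension).
        self.agi('STREAM FILE /tmp/reply ""')
        self.agi("HANGUP")

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("127.0.0.1", 3000), AIAgiHandler) as server:
        server.serve_forever()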
3. Whisper Deployment
Using faster-whisper in Docker:
$ docker run -d --gpus all --name whisper \
-p 8000:8000 \
-v /opt/whisper/cache:/root/.cache \
ghcr.io/guillaumekln/faster-whisper:latest \
--model small.en \
--compute_type float16 \
--server_port 8000
Verification:
$ curl -X POST http://localhost:8000/asr \
-H "Content-Type: audio/wav" \
--data-binary @test.wav
{"text":"this is a test of the whisper transcription system","language":"en"}
4. Mistral Inference Server
Using Ollama for local LLM:
$ docker run -d --gpus all --name ollama \
-p 11434:11434 \
-v /opt/ollama:/root/.ollama \
ollama/ollama:latest
$ docker exec ollama ollama pull mistral:7b-instruct-v0.2-q4_K_M
Test Query:
$ curl http://localhost:11434/api/generate -d '{
"model": "mistral",
"prompt":"Why is the sky blue?"
}'
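By default this endpoint streams the answer as newline-delimited JSON chunks; add "stream": false if you want a single JSON object back (as the AGI sketch above does):

$ curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'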
5. Coqui XTTS Setup
Docker Deployment:
$ docker run -d --gpus all --name xtts \
-p 8020:8020 \
-e "XTTS_MODEL=tts_models/multilingual/multi-dataset/xtts_v2" \
coqui/xtts:v2.0.2
Voice Cloning:
from TTS.api import TTS

# Load the XTTS v2 model (same model ID the container uses)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Hello world",
    speaker_wav="reference.wav",  # short, clean sample of the target voice
    language="en",
    file_path="output.wav",
)
Configuration & Optimization
Asterisk Performance Tuning
/etc/asterisk/asterisk.conf:
[options]
execincludes=yes
highpriority=yes
maxload = 1.5 ; Avoid overloading the system

[directories]
astcachedir = /dev/shm/astcache ; Keep the cache on a RAM disk
RTP Optimization:
; /etc/asterisk/rtp.conf
[general]
rtpstart=10000
rtpend=20000
rtpchecksums=no ; Skip UDP checksums (only on a trusted LAN)
Whisper Model Selection
| Model     | VRAM Usage | Speed (RTF) | Accuracy |
|-----------|------------|-------------|----------|
| tiny.en   | 1GB        | 0.1x        | 60%      |
| base.en   | 1.5GB      | 0.2x        | 70%      |
| small.en  | 5GB        | 0.4x        | 80%      |
| medium.en | 10GB       | 0.8x        | 90%      |
# Start container with different model
$ docker run ... faster-whisper --model medium.en
Mistral Prompt Engineering
System Prompt Template:
You are an AI assistant named "JARVIS" answering phone calls.
Respond concisely in under 15 words.
Current time: {time}.
Last caller: {last_caller}.
Context: {call_context}
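At call time the placeholders are filled in and the result is passed as the system prompt. A minimal sketch, assuming Ollama's /api/generate endpoint and treating last_caller and call_context as state you track yourself:

from datetime import datetime

import requests

SYSTEM_TEMPLATE = (
    'You are an AI assistant named "JARVIS" answering phone calls.\n'
    "Respond concisely in under 15 words.\n"
    "Current time: {time}.\n"
    "Last caller: {last_caller}.\n"
    "Context: {call_context}"
)

def ask(user_text: str, last_caller: str, call_context: str) -> str:
    # Fill the template, then send it as Ollama's system prompt
    system = SYSTEM_TEMPLATE.format(
        time=datetime.now().strftime("%H:%M"),
        last_caller=last_caller,
        call_context=call_context,
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mistral", "system": system,
              "prompt": user_text, "stream": False},
        timeout=30,
    )
    return response.json()["response"]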
Temperature Settings:
# ollama Modelfile
FROM mistral:7b-instruct-v0.2-q4_K_M
# Lower temperature for predictable responses
PARAMETER temperature 0.3
# Balance memory use against context length
PARAMETER num_ctx 4096
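Build the customized model with ollama create and use the new tag (jarvis-phone here is an arbitrary name) in place of the stock model. Since /opt/ollama is mounted at /root/.ollama, saving the Modelfile under /opt/ollama makes it visible inside the container:

$ docker exec ollama ollama create jarvis-phone -f /root/.ollama/Modelfile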
XTTS Voice Cloning Optimization
High-Quality Samples:
- 10-30 seconds of clean speech
- Consistent microphone positioning
- Minimal background noise
- Multiple emotional tones
Real-Time Streaming:
# Illustrative pseudocode: play TTS audio into the call while XTTS is still
# generating (tts_stream and asterisk_stream are placeholders for your XTTS
# streaming client and the Asterisk audio channel)
for chunk in tts_stream:
    asterisk_stream.write(chunk)
Usage & Operations
Starting the Full Stack
Systemd Service File (/etc/systemd/system/ai-phone.service):
[Unit]
Description=AI Phone System
After=docker.service
[Service]
ExecStart=/usr/bin/docker-compose -f /opt/ai-phone/docker-compose.yml up
ExecStop=/usr/bin/docker-compose -f /opt/ai-phone/docker-compose.yml down
[Install]
WantedBy=multi-user.target
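Reload systemd and enable the stack at boot:

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now ai-phone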
docker-compose.yml:
version: '3.8'

services:
  asterisk:
    image: asterisk:20            # assumes a locally built Asterisk 20 image
    ports:
      - "5060:5060/udp"
      - "5061:5061/tcp"
      # note: RTP ports 10000-20000/udp must also be reachable; consider network_mode: host
    volumes:
      - ./asterisk/config:/etc/asterisk
    devices:
      - "/dev/dsp:/dev/dsp"

  whisper:
    image: faster-whisper:gpu     # assumes a locally tagged GPU build
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - "8000:8000"

  mistral:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  xtts:
    image: xtts:gpu               # assumes a locally tagged GPU build
    ports:
      - "8020:8020"

volumes:
  ollama_data:
Monitoring Commands
Check GPU Utilization:
$ watch -n 1 nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
Asterisk Call Monitoring:
$ asterisk -rx "core show channels"
$ asterisk -rx "pjsip show endpoints"
Backup Procedures
Model Backup Script:
#!/bin/bash
# Backup AI models
rsync -av /opt/ollama /backup/ollama-$(date +%F)
rsync -av /opt/xtts-voices /backup/voices-$(date +%F)
# Backup Asterisk config
tar czf /backup/asterisk-$(date +%F).tgz /etc/asterisk
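To run the backup nightly, install the script (the path below is arbitrary) and add a cron entry:

$ sudo crontab -e
0 3 * * * /opt/ai-phone/backup.sh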
Troubleshooting
Common Issues
1. Audio Latency Spikes
- Check GPU temperature:
nvidia-smi -q -d TEMPERATURE
- Reduce Whisper model size
- Isolate RTP traffic to dedicated NIC
2. Incomplete Transcriptions
# Increase Whisper beam size
$ docker run ... faster-whisper --beam_size 5
3. LLM Response Delays
- Monitor VRAM usage:
nvidia-smi -l 1
- Increase GPU layer offloading for Mistral (Ollama's num_gpu parameter sets how many layers run on the GPU; add it to the Modelfile above and rebuild with ollama create):
PARAMETER num_gpu 32
4. SIP Registration Failures
$ asterisk -rx "pjsip set logger on"
$ tail -f /var/log/aster