Company Had A BEC Incident - They Want Me To Vibe Code KnowBe4
INTRODUCTION
A recent Business Email Compromise (BEC) incident forced a mid‑size organization to confront the reality that human‑centric attack vectors can bypass even the most hardened perimeter defenses. The security team discovered that a phishing email, cleverly crafted to mimic a trusted partner, succeeded in harvesting credentials and subsequently leveraged those credentials for lateral movement. In the aftermath, senior leadership issued a directive: implement a systematic phishing‑simulation program to train employees, detect vulnerable accounts, and ultimately reduce the likelihood of a repeat incident.
The immediate reaction was to reach for a commercial solution — KnowBe4 — because of its brand recognition and extensive content library. However, the internal DevOps group, accustomed to self‑hosted tooling and granular control over the attack chain, felt uneasy about handing over sensitive user data to a third‑party SaaS platform. The phrase “vibe code KnowBe4” captured the tension between the need for rapid, visible results and the desire to maintain autonomy over the simulation pipeline.
This guide walks through a complete, self‑hosted approach to building a phishing‑simulation workflow that satisfies both security and operational constraints. It covers the underlying concepts, prerequisite tooling, Docker‑based deployment, configuration hardening, day‑to‑day operations, and troubleshooting tactics. By the end, you will have a reproducible, auditable pipeline that can run in a homelab or a production environment without relying on external SaaS subscriptions.
Key takeaways for the reader:
- Understand why a self‑hosted simulation platform can be more aligned with strict security policies.
- Learn how to spin up a complete phishing‑simulation stack using Docker containers, with explicit handling of container identifiers and status variables.
- Gain insight into security hardening, performance tuning, and integration with existing monitoring stacks.
- Acquire practical troubleshooting steps for common failure modes, from credential leakage to container orchestration errors.
The following sections assume familiarity with Linux system administration, Docker fundamentals, and basic networking concepts. If you are new to any of these topics, consider reviewing the linked official documentation before proceeding.
UNDERSTANDING THE TOPIC
What is a Phishing‑Simulation Platform?
A phishing‑simulation platform is a controlled environment that crafts, delivers, and tracks malicious email messages to evaluate an organization’s susceptibility to social‑engineering attacks. The platform typically provides three core capabilities:
- Template Management – A library of realistic email templates that can be customized to reflect industry‑specific scenarios.
- Delivery Engine – An SMTP or API‑driven sender that distributes the crafted messages to a defined recipient list.
- Metrics & Reporting – Collection of engagement data (opens, clicks, credential submissions) and generation of compliance reports.
When executed in a self‑hosted context, each component can be isolated, audited, and version‑controlled, reducing reliance on external service‑level agreements and mitigating data‑exfiltration concerns.
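As a rough illustration of the Metrics & Reporting capability, the sketch below turns a list of tracked events into open, click, and submission rates. The `Event` class, its field names, and the event kinds are assumptions made for this example; they are not the schema of any particular platform.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One tracked interaction (hypothetical shape, for illustration only)."""
    recipient: str
    kind: str  # "sent", "opened", "clicked", or "submitted"


def engagement_rates(events):
    """Return the share of recipients who opened, clicked, or submitted."""
    # Count each recipient at most once per event kind, so repeated opens
    # from the same mailbox do not inflate the rate.
    unique = {(e.recipient, e.kind) for e in events}
    counts = Counter(kind for _, kind in unique)
    sent = counts["sent"]
    kinds = ("opened", "clicked", "submitted")
    if sent == 0:
        return {k: 0.0 for k in kinds}
    return {k: counts[k] / sent for k in kinds}


events = [
    Event("a@example.com", "sent"),
    Event("b@example.com", "sent"),
    Event("a@example.com", "opened"),
    Event("a@example.com", "opened"),  # duplicate open, counted once
    Event("a@example.com", "clicked"),
]
print(engagement_rates(events))
```

Deduplicating per recipient is a design choice: it reports how many people engaged rather than how many times a tracking pixel fired.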
Historical Context
Early phishing‑testing efforts relied on manually crafted emails and ad‑hoc tracking spreadsheets. Open‑source frameworks such as GoPhish (released in 2016) made it practical to run simulations internally. Later offerings such as PhishSim and Cofense's template libraries expanded the feature set to include advanced lures, real‑time analytics, and integration with security information and event management (SIEM) pipelines. PhishTank, often mentioned alongside them, is a community database of reported phishing URLs rather than a simulation framework.
Commercial vendors like KnowBe4 built upon these foundations, adding SaaS hosting, extensive template catalogs, and compliance certifications. While convenient, they introduce a trust boundary that may conflict with policies requiring data residency or strict insider‑threat controls.
Key Features & Capabilities
- Dynamic Template Engine – Allows conditional fields (e.g., employee name, department) to personalize each message.
- Rate Limiting & Throttling – Prevents accidental saturation of the organization’s mail infrastructure.
- Multi‑Channel Delivery – Supports not only email but also SMS, voice, and simulated credential‑capture pages.
- Integration Hooks – Webhooks or API endpoints to feed results into ticketing systems, Slack, or custom dashboards.
- Role‑Based Access Control (RBAC) – Granular permissions for administrators, auditors, and operational staff.
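To make the Dynamic Template Engine item concrete, here is a minimal sketch of per‑recipient personalization using Python's standard `string.Template`. The lure text, the recipient fields (`first_name`, `department`, `rid`), and the tracking URL format are all hypothetical; a real platform will define its own placeholders.

```python
from string import Template

# Hypothetical lure text; the $placeholders are filled in per recipient.
LURE = Template(
    "Hi $first_name,\n\n"
    "The $department shared drive is moving this week.\n"
    "Review the change here: $tracking_url\n"
)


def personalize(template, recipient, base_url):
    """Render one message body for a single recipient."""
    # The recipient ID in the query string lets the landing page attribute
    # a click to a specific person.
    tracking_url = f"{base_url}?rid={recipient['rid']}"
    # substitute() raises KeyError if a placeholder has no value, which
    # surfaces incomplete recipient records before anything is sent.
    return template.substitute(
        first_name=recipient["first_name"],
        department=recipient["department"],
        tracking_url=tracking_url,
    )


body = personalize(
    LURE,
    {"first_name": "Dana", "department": "Finance", "rid": "abc123"},
    "https://phish.example.internal/landing",
)
print(body)
```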
Pros and Cons
| Aspect | Self‑Hosted Solutions | Commercial SaaS (e.g., KnowBe4) |
|---|---|---|
| Data Residency | Full control; data never leaves internal network | Data stored on vendor’s servers; may require export agreements |
| Customization | Unlimited; can modify source code, add custom lures | Limited to vendor‑provided templates |
| Cost | Open‑source; only infrastructure cost | Subscription fees, often per‑user licensing |
| Scalability | Scales with container orchestration; can be auto‑scaled | Vendor handles scaling; may have usage caps |
| Compliance | Must certify own controls | Vendor may hold certifications (SOC 2, ISO 27001) |
| Support | Community‑driven; reliance on GitHub issues | Dedicated vendor support, SLAs |
Use Cases & Scenarios
- Employee Onboarding – New hires receive a baseline simulation to gauge awareness.
- Red‑Team Exercises – Align simulated phishing with adversary‑emulation goals.
- Compliance Audits – Produce audit‑ready reports for regulatory frameworks (e.g., GDPR, PCI‑DSS).
- Security Posture Testing – Correlate phishing click‑through rates with other security controls (e.g., MFA enforcement).
Current State & Future Trends
The industry is moving toward behavior‑driven simulation, where AI‑generated lures adapt in real time based on user interaction patterns. Open‑source projects are beginning to incorporate ML‑based template generation, while commercial platforms are experimenting with zero‑trust phishing that mimics legitimate internal communications more precisely.
From an infrastructure perspective, the trend is toward container‑native deployments that can be version‑controlled, CI/CD‑tested, and rolled out via GitOps pipelines. This aligns perfectly with DevOps principles of immutable infrastructure and observable operations.
Comparison With Alternatives
| Tool | License | Primary Language | Docker Support | Notable Features |
|---|---|---|---|---|
| GoPhish | Open‑source (MIT) | Go | Official Docker image | Simple UI, built‑in tracking DB |
| PhishSim | Commercial (proprietary) | Python | Community images | Advanced analytics, Slack integration |
| Cofense PhishMe | Commercial | Java/Scala | Limited | Enterprise‑grade support |
| KnowBe4 | Commercial SaaS | N/A | N/A | Largest template library, extensive reporting |
For organizations that prioritize data sovereignty and cost predictability, GoPhish remains the most mature open‑source option. Its Docker image can be customized, extended, and integrated into existing CI/CD pipelines, making it a natural fit for the “vibe code” approach.
PREREQUISITES
System Requirements
| Component | Minimum Specification | Recommended Specification |
|---|---|---|
| CPU | 2 vCPU | 4 vCPU |
| RAM | 2 GB | 8 GB |
| Disk | 20 GB SSD | 100 GB SSD |
| Network | 1 Gbps NIC | 10 Gbps NIC (for high‑volume senders) |
Required Software
| Software | Minimum Version | Purpose |
|---|---|---|
| Docker Engine | 24.0 | Container runtime for all simulation components |
| Docker Compose | 2.20 | Orchestration of multi‑service stacks |
| PostgreSQL | 15 | Persistent storage for engagement metrics |
| Redis | 7 | Caching layer for rate‑limiting and background jobs |
| Nginx | 1.25 | Reverse proxy for UI and API endpoints |
| Python | 3.11 | Optional scripting for custom lure generation |
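The minimum versions in the table can be checked mechanically. The sketch below parses the version string a tool prints and compares it with the table's minimum; the `MINIMUMS` mapping and the function names are assumptions for this example, and the sample strings only approximate what each tool reports.

```python
import re

# Minimum versions taken from the table above, as integer tuples.
MINIMUMS = {
    "docker": (24, 0),
    "compose": (2, 20),
    "postgresql": (15,),
    "redis": (7,),
    "nginx": (1, 25),
    "python": (3, 11),
}


def parse_version(text):
    """Extract the first dotted number in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(name, reported):
    """Return True if the reported version satisfies the table's minimum."""
    minimum = MINIMUMS[name]
    found = parse_version(reported)
    # Pad short versions with zeros so "24" compares equal to "24.0".
    found += (0,) * (len(minimum) - len(found))
    return found[: len(minimum)] >= minimum


print(meets_minimum("docker", "Docker version 24.0.7, build afdd53b"))
print(meets_minimum("compose", "Docker Compose version v2.17.3"))
```

Comparing integer tuples avoids the classic string comparison bug where "2.9" sorts after "2.20".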
Network & Security Considerations
- Outbound Ports: Ensure outbound SMTP (port 25) and HTTP/HTTPS (ports 80/443) are permitted to the mail relay and any external tracking endpoints.
- Inbound Ports: Expose only the UI (typically 3000) and API (typically 8080) behind a TLS‑terminated reverse proxy.
- Firewall Rules: Restrict access to the container network to trusted management workstations.
- TLS Certificates: Use Let’s Encrypt or an internal PKI to terminate TLS on the public‑facing endpoints.
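One way to enforce the inbound‑port rule is to review the published port mappings before deployment. The sketch below assumes the Compose short syntax (`HOST:CONTAINER`, optionally prefixed with a bind address) and an allowlist of 3000 and 8080 as stated above; the function names are hypothetical, and port ranges are not handled.

```python
# Host ports this guide expects to be published (UI and API).
ALLOWED_HOST_PORTS = {3000, 8080}


def host_port(mapping):
    """Return the host port from a mapping such as '3000:3000' or
    '127.0.0.1:8080:8080/tcp'."""
    # Drop an optional protocol suffix, then split address and ports.
    parts = mapping.split("/")[0].split(":")
    if len(parts) < 2:
        # A bare container port means the host port is chosen at random.
        raise ValueError(f"no explicit host port in {mapping!r}")
    return int(parts[-2])


def unexpected_ports(mappings, allowed=ALLOWED_HOST_PORTS):
    """Return the sorted host ports that are published but not allowed."""
    return sorted(p for p in map(host_port, mappings) if p not in allowed)


published = ["3000:3000", "127.0.0.1:8080:8080/tcp", "5432:5432"]
print(unexpected_ports(published))
```

In this sample the database port is flagged, which matches the intent of keeping PostgreSQL reachable only on the internal container network.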
User Permissions
- Docker Group: Users who will run `docker compose up` must belong to the `docker` group or have equivalent sudo rights.
- PostgreSQL Role: Create a dedicated role (e.g., `phish_user`) with limited privileges for the application to write metrics.
- Redis Access: No special permissions required beyond standard Redis ACLs if enabled.
Pre‑Installation Checklist
1. Verify Docker Engine version (`docker version`).
2. Pull the required images (`docker compose pull`).
3. Generate a strong secret key for JWT signing (`openssl rand -hex 32`).
4. Create a dedicated system user for running the stack (`useradd -m -s /bin/bash phishadmin`).
5. Ensure persistent storage directories exist (`/var/lib/phishdata`, `/var/lib/redis`).
6. Set appropriate file permissions (`chown -R 1000:1000 /var/lib/phishdata`).
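If you prefer to generate the secret key from a script rather than with `openssl`, Python's standard `secrets` module produces comparable output. This is a minimal sketch; the function name is an assumption, and how the key is stored afterwards is up to you.

```python
import secrets
import string


def generate_hex_secret(num_bytes=32):
    """Return num_bytes of cryptographically strong randomness as hex,
    comparable in shape to the output of `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


key = generate_hex_secret()
# 32 random bytes encode to 64 hexadecimal characters.
print(len(key))
```

Avoid printing the key itself in shared terminals or CI logs; write it straight to a file with restrictive permissions instead.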
INSTALLATION & SETUP
Below is a step‑by‑step guide to deploying a complete phishing‑simulation stack using Docker Compose. Container names and secrets are referenced through `$`‑prefixed placeholders, which avoids conflicts with Jekyll Liquid syntax.
### 1. Directory Layout
```bash
mkdir -p ~/phishstack/{ui,api,db,redis}
cd ~/phishstack
```
### 2. Docker Compose File
Create a file named `docker-compose.yml` with the following content.
```yaml
# Placeholders ($ADMIN_PASSWORD, $DB_PASSWORD, $CONTAINER_NAMES) are
# substituted by Compose at startup.
# NOTE: the application environment variables below must match what your
# image actually reads. Upstream Gophish takes its settings from
# config.json, so verify them before deploying.
version: "3.9"
services:
  ui:
    image: gophish/gophish:latest
    container_name: $CONTAINER_NAMES-ui
    restart: unless-stopped
    environment:
      - ADMIN_PASSWORD=$ADMIN_PASSWORD
    ports:
      - "3000:3000"
    volumes:
      - ./ui:/data
    depends_on:
      - api
      - db
    healthcheck:
      # Requires curl to be present inside the image.
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/ping"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  api:
    image: gophish/gophish:latest
    container_name: $CONTAINER_NAMES-api
    restart: unless-stopped
    environment:
      - ADMIN_PASSWORD=$ADMIN_PASSWORD
      - DB_HOST=db
      - DB_USER=phish_user
      - DB_PASSWORD=$DB_PASSWORD
      - DB_NAME=phishdb
    volumes:
      - ./api:/data
    depends_on:
      - db
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/ping"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  db:
    image: postgres:15-alpine
    container_name: $CONTAINER_NAMES-db
    restart: unless-stopped
    environment:
      - POSTGRES_DB=phishdb
      - POSTGRES_USER=phish_user
      - POSTGRES_PASSWORD=$DB_PASSWORD
    volumes:
      - ./db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "phish_user"]
      interval: 30s
      timeout: 10s
      retries: 3
  redis:
    image: redis:7-alpine
    container_name: $CONTAINER_NAMES-redis
    restart: unless-stopped
    command: ["redis-server", "--appendonly", "yes"]
    volumes:
      - ./redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 10s
      retries: 3
```
Explanation of Placeholders
- `$ADMIN_PASSWORD` – A strong, randomly generated password stored in an environment file (see step 3).
- `$DB_PASSWORD` – Another random secret used for PostgreSQL authentication.
- `$CONTAINER_NAMES` – A prefix you define (e.g., `phish_`) to ensure