LocalPrompt.ai – Secure infrastructure

🔐 Secure local infrastructure

This page documents how LocalPrompt systems are deployed in environments where network isolation, controlled access and clear trust boundaries are required, such as research labs and enterprise environments.

🎯 Design goals

Run AI workloads fully locally, without cloud dependency.
Keep GPU systems isolated from organizational networks.
Allow remote access without any inbound exposure.
Make the single point of trust and risk explicit and auditable.

🧱 Core components

One or more GPU workstations in a physically separated network.
A dedicated uplink (4G or 5G router), typically with a dynamic IP.
A single bastion server with a static IP address.

The bastion server is the only public-facing component. GPU systems never accept inbound connections.

🌐 Network isolation

No wired connection to internal IT networks.
No VLANs, no shared WiFi, no bridging.
GPU systems remain fully functional offline.

🔁 Outbound-only reverse SSH

Each GPU system establishes an outbound reverse SSH tunnel to the bastion server. This works behind NAT, firewalls and dynamic IP addresses.

No inbound ports opened on the GPU system.
The tunnel can be dropped at any time without affecting local operation of the GPU system; only remote access is interrupted.
All traffic, from the user to the bastion and from the bastion to the GPU system, is encrypted by SSH.
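
As a minimal sketch of the tunnel described above, the following Python helper builds the ssh command line that a GPU system could run to open the outbound reverse tunnel. The host name bastion.example.org, the account name tunnel, the port 2201 and the key path are hypothetical placeholders, not LocalPrompt defaults, and the keep-alive values are an assumption chosen for a mobile uplink.

```python
import shlex
from typing import List


def reverse_tunnel_command(
    bastion_host: str,
    tunnel_user: str,
    bastion_port: int,
    key_path: str,
    local_ssh_port: int = 22,
) -> List[str]:
    """Build the argv for an outbound reverse SSH tunnel (run on the GPU system)."""
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-i", key_path,                    # dedicated tunnel key (hypothetical path)
        "-o", "ExitOnForwardFailure=yes",  # exit if the remote port cannot be bound
        "-o", "ServerAliveInterval=30",    # probe the link every 30 s (assumed value)
        "-o", "ServerAliveCountMax=3",     # give up after 3 missed probes (assumed value)
        # Bind the port on the bastion's loopback only, so the forwarded
        # port is not reachable from the internet.
        "-R", f"localhost:{bastion_port}:localhost:{local_ssh_port}",
        f"{tunnel_user}@{bastion_host}",
    ]


if __name__ == "__main__":
    cmd = reverse_tunnel_command(
        "bastion.example.org", "tunnel", 2201, "/etc/localprompt/tunnel_key"
    )
    print(shlex.join(cmd))
```

Because the forwarded port is bound to the bastion's loopback interface, it can only be reached by users who have already logged in to the bastion. A supervisor that restarts the command whenever it exits would take care of reconnecting after an uplink drop.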

🗝️ Authentication model

GPU systems authenticate to the bastion using dedicated tunnel keys.
Users authenticate using personal SSH keys.
SSH agent forwarding is used, so no private keys are stored on the bastion.
Mandatory TOTP-based two-factor authentication.
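
To illustrate what the TOTP factor checks, here is a minimal sketch of the standard RFC 6238 algorithm (HMAC-SHA1 variant) using only the Python standard library. It is not the LocalPrompt implementation: on a real bastion the check would normally be delegated to the SSH server's authentication stack rather than custom code. The function names, the 30-second step and the one-step drift window are assumptions for illustration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    now = time.time() if at is None else at
    counter = int(now // step)                      # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, candidate: str, at: Optional[float] = None,
           step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        t = now + drift * step
        if t < 0:
            continue                                # no time step before the epoch
        expected = totp(secret, at=t, step=step, digits=len(candidate) or 6)
        # Constant-time comparison to avoid leaking how many digits matched.
        if hmac.compare_digest(expected.encode(), candidate.encode()):
            return True
    return False
```

The small drift window tolerates clock skew between the user's device and the bastion; widening it trades security for convenience.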

🧑‍💻 Access flow

GPU system maintains an outbound tunnel to the bastion.
User connects to the bastion using SSH key + TOTP.
User connects through the tunnel to the GPU system.

At no point is the GPU system directly reachable from the internet.
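
The access flow above can be sketched as a single command issued from the user's machine. The helper below builds it, assuming the reverse tunnel exposes the GPU system's SSH service on the bastion's loopback port 2201; the account names, host name and port are hypothetical placeholders.

```python
import shlex
from typing import List


def access_command(
    bastion_user: str,
    bastion_host: str,
    gpu_user: str,
    tunnel_port: int,
) -> List[str]:
    """Build the argv for reaching a GPU system through the bastion."""
    return [
        "ssh",
        "-A",  # forward the local SSH agent so no private key is needed on the bastion
        "-t",  # allocate a terminal for the interactive inner session
        f"{bastion_user}@{bastion_host}",
        # Everything below runs on the bastion and enters the reverse tunnel.
        "ssh", "-p", str(tunnel_port), f"{gpu_user}@localhost",
    ]


if __name__ == "__main__":
    print(shlex.join(access_command("alice", "bastion.example.org", "alice", 2201)))
```

The first hop is where the personal key and the TOTP code are checked. The second hop authenticates against the GPU system with the same personal key, which is reached through the forwarded agent.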

🖥️ Optional graphical access

If required, the existing local X11 desktop can be accessed using VNC over the same reverse SSH tunnel, without opening any inbound ports.
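
One way to carry VNC over the same path is to chain two local port forwards, one per hop. The sketch below assumes a VNC server on the GPU system that listens on localhost:5900 and a reverse tunnel on the bastion's loopback port 2201. All names and port numbers are hypothetical, and the relay port on the bastion would have to be unique per user.

```python
import shlex
from typing import List


def vnc_tunnel_command(
    bastion_user: str,
    bastion_host: str,
    gpu_user: str,
    tunnel_port: int,
    local_port: int = 5901,
    relay_port: int = 5901,
    vnc_port: int = 5900,
) -> List[str]:
    """Build the argv that forwards a local port to the GPU system's VNC server."""
    # Second hop, executed on the bastion: relay_port -> VNC port on the GPU system.
    inner = [
        "ssh", "-N",
        "-p", str(tunnel_port),
        "-L", f"localhost:{relay_port}:localhost:{vnc_port}",
        f"{gpu_user}@localhost",
    ]
    # First hop, executed on the user's machine: local_port -> relay_port on the bastion.
    return [
        "ssh", "-A",
        "-L", f"localhost:{local_port}:localhost:{relay_port}",
        f"{bastion_user}@{bastion_host}",
        *inner,
    ]


if __name__ == "__main__":
    print(shlex.join(vnc_tunnel_command("alice", "bastion.example.org", "alice", 2201)))
```

With the command running, a VNC viewer on the user's machine would connect to localhost:5901. Both forwards bind to loopback only, so the desktop is never exposed on a public interface.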

📌 What this achieves

Clear and minimal attack surface.
Explicit trust boundary at the bastion.
No cloud dependency.
Linear scalability: each additional GPU system adds one outbound tunnel to the same bastion.