Deploy Ollama AI server on AWS EC2 with GPU support, systemd service, remote API access, and Open WebUI. Takes 15 minutes from launch to chat interface.
Prerequisites

- AWS Account with billing enabled
- Instance: t3.large (8GB RAM, ~$0.08/hr) or g4dn.xlarge (GPU, ~$0.53/hr)
- Key Pair: create/download a `.pem` file during launch
- Domain (optional): for an HTTPS reverse proxy
Step 1: Launch EC2 Instance

- AWS Console → EC2 → "Launch Instance"
- Name: `ollama-server`
- AMI: "Ubuntu Server 24.04 LTS"
- Instance Type:
  - CPU: `t3.large` (2 vCPU, 8GB RAM)
  - GPU: `g4dn.xlarge` (4 vCPU, 16GB RAM, NVIDIA T4)
- Key Pair: Create new → Download `ollama-key.pem`
- Network: Default VPC, enable public IP
- Storage: 50GB GP3 SSD
- Launch → Copy Public IP

![AWS EC2 launch screen]
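If you prefer the command line, the console steps above can be approximated with the AWS CLI. This is a hedged sketch, not a tested launch script: the AMI ID is a placeholder (look up the current Ubuntu 24.04 AMI for your region), and `aws` must already be configured with credentials.

```shell
# Sketch of the console launch above via the AWS CLI.
# AMI_ID is a placeholder -- find the current Ubuntu 24.04 LTS AMI for your region.
AMI_ID="ami-xxxxxxxxxxxxxxxxx"
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type t3.large \
  --key-name ollama-key \
  --associate-public-ip-address \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=50,VolumeType=gp3}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=ollama-server}]'
```

Swap `--instance-type` to `g4dn.xlarge` for the GPU variant.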
Step 2: Configure Security Group
Edit the Inbound Rules on your instance's Security Group. Note that a source of 0.0.0.0/0 on port 11434 exposes the Ollama API to the entire internet; restrict it to trusted IPs for anything beyond testing.
| Type | Protocol | Port Range | Source |
|---|---|---|---|
| SSH | TCP | 22 | Your IP |
| Ollama | TCP | 11434 | 0.0.0.0/0 |
| WebUI | TCP | 3000 | Your IP |
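The same rules can be added from the AWS CLI. Again a sketch: `SG_ID` and `YOUR_IP` are placeholders for your Security Group ID and your workstation's address.

```shell
# Sketch: add the inbound rules from the table above via the AWS CLI.
SG_ID="sg-xxxxxxxx"            # placeholder: your instance's security group
YOUR_IP="203.0.113.10/32"      # placeholder: your workstation's public IP
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 22 --cidr "$YOUR_IP"
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 11434 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 3000 --cidr "$YOUR_IP"
```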
Step 3: SSH Connect & Update
```bash
chmod 400 ollama-key.pem
ssh -i ollama-key.pem ubuntu@YOUR-EC2-PUBLIC-IP
sudo apt update && sudo apt upgrade -y
sudo apt install curl htop ufw -y
```
Step 4: Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```
Step 5: GPU Setup (g4dn.xlarge ONLY)
```bash
sudo apt install nvidia-driver-550 nvidia-cuda-toolkit -y
sudo reboot
# Reconnect via SSH after the reboot, then verify the GPU is visible:
nvidia-smi
```
Step 6: Systemd Service
```bash
sudo nano /etc/systemd/system/ollama.service
```
Service file content:
```ini
[Unit]
Description=Ollama Server
After=network.target

[Service]
ExecStart=/usr/local/bin/ollama serve
Restart=always
User=ubuntu
Environment="OLLAMA_HOST=0.0.0.0:11434"

[Install]
WantedBy=multi-user.target
```

Note: `ollama serve` takes no `--host` flag; the bind address is set via the `OLLAMA_HOST` environment variable.
```bash
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
```
Step 7: Test Models
```bash
ollama pull llama3.2:3b
curl http://YOUR-EC2-IP:11434/api/tags
```
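Once `/api/tags` lists the model, you can run an actual completion against the `/api/generate` endpoint. Setting `"stream": false` returns one JSON object instead of a stream of chunks; replace `YOUR-EC2-IP` with your instance's public IP.

```shell
# Run a one-shot completion against the remote Ollama API.
# "stream": false -> single JSON response instead of newline-delimited chunks.
curl http://YOUR-EC2-IP:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```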
Step 8: Open WebUI (Chat Interface)
```bash
# Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker ubuntu

# Log out and back in so the group change takes effect, then:
docker run -d -p 3000:8080 -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
```
Access: http://YOUR-EC2-IP:3000
![Open WebUI Chat Interface]
Step 9: Security Hardening
```bash
sudo ufw allow 22 && sudo ufw allow 11434 && sudo ufw allow 3000
sudo ufw enable
sudo nano /etc/ssh/sshd_config   # set: PasswordAuthentication no
sudo systemctl restart ssh
```
Cost Comparison
| Instance | Hourly | Monthly | Use Case |
|---|---|---|---|
| t3.large | $0.083 | $60 | Testing |
| g4dn.xlarge | $0.526 | $380 | Production |
Quick Test Commands
```bash
# Check everything works
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello world!"
}'

# Monitor resources (htop is interactive, so run it on its own, not under watch)
watch -n 2 df -h
htop
```
✅ Done! Your Ollama server is live at http://YOUR-IP:11434
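To verify that from a script rather than a browser, a small health check helps; `check_ollama` is a helper name of our own, built on curl's `-f` (fail on HTTP errors) and a 5-second timeout.

```shell
# Minimal health check: prints "up" if the Ollama API answers, "down" otherwise.
check_ollama() {
  # $1 = base URL, e.g. http://YOUR-EC2-IP:11434
  if curl -fsS --max-time 5 "$1/api/tags" > /dev/null 2>&1; then
    echo up
  else
    echo down
  fi
}

check_ollama "http://localhost:11434"
```

Drop it into a cron job or monitoring agent to catch the service going down.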