CloudPulse
Self-healing infrastructure dashboard with under 2-minute recovery and 101 tests across 17 suites
Under 2min self-healing
101 tests / 17 suites
Single-command deploy
Overview
Self-healing infrastructure health dashboard on AWS. Monitors its own infrastructure, displays real-time status, and automatically recovers from failures without human intervention. Deployed on EC2 instances behind an ALB across 2 AZs with Auto Scaling. Kill an instance and watch the system heal itself in real time. Infrastructure provisioned entirely through AWS CLI shell scripts every API call visible, commented, and explained. Built as a hands-on learning tool for AWS Core Compute and Networking.
Architecture Diagram
Design Decisions
- →Chose TypeScript with strict mode over plain JavaScript for compile-time type safety. The Result pattern (discriminated unions) handles errors without thrown exceptions, making failure paths explicit and testable.
- →Dependency injection through a container means every AWS SDK interaction is behind an interface. Tests inject mocks directly no aws-sdk-mock library needed. Fast and deterministic tests.
- →Server-side rendering with EJS instead of a frontend framework. The dashboard is a single HTML page that auto-refreshes. No build step for the frontend, no client-side JavaScript complexity.
- →AWS CLI scripts instead of Terraform for infrastructure. Intentional the goal is to understand each API call, not abstract it away. Every command is commented with what it does and why.
- →Property-based tests (fast-check) verify 8 formal correctness properties hold across 100 random inputs per property, catching edge cases unit tests miss.
Deployment
Single-command deployment via bash (infra/deploy.sh). Runs five scripts in dependency order (IAM, VPC, ALB, Compute, Monitoring), waits for resources to become available, and prints the ALB URL. Teardown is equally simple (infra/teardown.sh) deletes everything in reverse dependency order. Infrastructure: VPC with public/private subnets across 2 AZs, NAT Gateways per AZ, ALB with health checks on /health, Auto Scaling Group (2-4 instances, CPU-based), CloudWatch custom metrics and alarms, IAM roles with least-privilege, SSM Session Manager for access. CI/CD via GitHub Actions: TypeScript compilation, unit/property/integration tests, shell script syntax validation on all 15 scripts.
Lessons Learned
Git Bash on Windows converts paths starting with / to Windows paths. The health check path /health became C:/Program Files/Git/health during deployment. Fixed by setting MSYS_NO_PATHCONV=1. ALB network interfaces take 5-10 minutes to fully release after deletion the teardown script needs to wait before deleting security groups. IAM is eventually consistent: creating an instance profile and immediately using it in a launch template can fail. Added a 10-second sleep after IAM operations to allow propagation.

