VaultStream
Encrypted file vault with per-file KMS encryption and ~85% storage cost reduction across 15+ AWS services
15+ AWS services
487+ tests / 19 properties
~85% storage cost cut
Sub-100ms cached reads
Overview
Secure file storage platform where users upload, encrypt, share, and manage sensitive documents. Every file gets its own encryption key via KMS envelope encryption. S3 lifecycle tiering optimizes storage costs automatically, CloudFront signed URLs deliver files globally, and event-driven Lambda workers generate thumbnails and run virus scans asynchronously. Polyglot persistence pairs DynamoDB single-table design for metadata, RDS PostgreSQL for compliance-grade audit logs, and ElastiCache Redis for sub-millisecond cached reads. Infrastructure is defined entirely in AWS CDK (TypeScript) across 8 stacks.
Architecture Diagram
Design Decisions
- →Serverless-first: the same Express codebase runs on Lambda (wrapped by @vendia/serverless-express) or on ECS Fargate. Provisioned concurrency (2 instances) eliminates cold starts on critical paths while keeping pay-per-request economics.
- →Per-file envelope encryption: each upload triggers KMS GenerateDataKey for a unique 256-bit data encryption key (DEK). The encrypted DEK is stored alongside the file metadata, the plaintext key is discarded immediately, and S3 Bucket Key cuts KMS API costs by up to 99%.
- →DynamoDB single-table design with 3 overloaded GSIs stores users, files, folders, shares, versions, and comments. One Query returns an entire item collection in a single round trip, reducing read costs by ~67%.
- →Polyglot persistence: DynamoDB for single-digit-ms metadata, PostgreSQL for complex time-range audit queries with JSONB, and Redis cache-aside for sub-ms reads with graceful degradation when Redis is unavailable.
- →Intelligent storage tiering via S3 lifecycle rules (Standard → Standard-IA → Glacier Instant Retrieval → Deep Archive), with a Lifecycle Processor Lambda updating DynamoDB metadata on each transition.
- →AWS CDK over Terraform so infrastructure shares the application's TypeScript types, uses L2/L3 constructs with built-in best practices, and bundles Lambdas natively with esbuild.
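The per-file envelope encryption flow from the list above can be sketched locally. This is a minimal illustration under stated assumptions, not VaultStream's actual code: `FakeKms` is a hypothetical in-process stand-in for AWS KMS (a real deployment would call the KMS GenerateDataKey and Decrypt APIs), AES-256-GCM is an assumed cipher choice, and every type and function name here is invented for the example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Ciphertext plus the values needed to authenticate and decrypt it.
interface Sealed {
  iv: Buffer;
  tag: Buffer;
  data: Buffer;
}

// AES-256-GCM encrypt with a fresh random 96-bit IV.
function seal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), data };
}

// AES-256-GCM decrypt; throws if the ciphertext or tag was tampered with.
function unseal(key: Buffer, sealed: Sealed): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  return Buffer.concat([decipher.update(sealed.data), decipher.final()]);
}

// Hypothetical local stand-in for KMS: the master key never leaves this class.
class FakeKms {
  private readonly masterKey = randomBytes(32);

  // Mirrors the shape of GenerateDataKey: a plaintext DEK plus its wrapped form.
  generateDataKey(): { plaintext: Buffer; encrypted: Sealed } {
    const plaintext = randomBytes(32); // 256-bit DEK
    return { plaintext, encrypted: seal(this.masterKey, plaintext) };
  }

  // Mirrors Decrypt: unwraps a DEK that was wrapped by this master key.
  decrypt(encrypted: Sealed): Buffer {
    return unseal(this.masterKey, encrypted);
  }
}

// What would be persisted: the encrypted body and the wrapped DEK beside it.
interface EncryptedFile {
  body: Sealed;
  encryptedDek: Sealed;
}

function encryptFile(kms: FakeKms, contents: Buffer): EncryptedFile {
  const { plaintext: dek, encrypted: encryptedDek } = kms.generateDataKey();
  try {
    return { body: seal(dek, contents), encryptedDek };
  } finally {
    dek.fill(0); // discard the plaintext key as soon as the file is sealed
  }
}

function decryptFile(kms: FakeKms, file: EncryptedFile): Buffer {
  const dek = kms.decrypt(file.encryptedDek);
  try {
    return unseal(dek, file.body);
  } finally {
    dek.fill(0);
  }
}
```

Because each call to `encryptFile` requests a fresh DEK, compromising one file's key exposes only that file, and only the wrapped DEK ever needs to be stored with the metadata.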
Deployment
Deployed entirely through AWS CDK v2 (TypeScript) across 8 stacks. The same Express codebase runs on Lambda (wrapped by @vendia/serverless-express, with provisioned concurrency to kill cold starts) or on ECS Fargate. Storage spans S3 with SSE-KMS and lifecycle tiering, DynamoDB single-table with overloaded GSIs, RDS PostgreSQL with monthly-partitioned immutable audit logs and a read replica, and ElastiCache Redis for cache-aside reads. CloudFront serves shared files via signed URLs with Origin Access Control, fronted by WAF for rate limiting, SQL injection blocking, XSS prevention, and geo-blocking. CI/CD runs on GitHub Actions using OIDC (no long-lived keys), with 487+ unit tests and 19 property-based tests gating every deploy at 80% minimum coverage.
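The cache-aside read path with graceful degradation can be illustrated roughly as follows. This is a sketch under assumptions rather than the project's code: `MetadataCache` is a hypothetical minimal interface standing in for a Redis client, `Loader` stands in for the DynamoDB lookup, and the key format and 300-second TTL are invented for the example.

```typescript
// Hypothetical minimal cache interface; a Redis client wrapper would implement it.
interface MetadataCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

interface FileMetadata {
  fileId: string;
  name: string;
  sizeBytes: number;
}

// Stand-in for the source-of-truth lookup (for example, a DynamoDB read).
type Loader = (fileId: string) => Promise<FileMetadata | null>;

async function getFileMetadata(
  cache: MetadataCache,
  load: Loader,
  fileId: string,
  ttlSeconds = 300, // assumed TTL, not taken from the source
): Promise<FileMetadata | null> {
  const key = `file:${fileId}`;

  // 1. Try the cache first; a cache failure is treated the same as a miss.
  try {
    const hit = await cache.get(key);
    if (hit !== null) return JSON.parse(hit) as FileMetadata;
  } catch {
    // Cache unavailable: fall through to the source of truth.
  }

  // 2. On a miss, read from the source of truth.
  const item = await load(fileId);

  // 3. Populate the cache on a best-effort basis; never fail the read because of it.
  if (item !== null) {
    try {
      await cache.set(key, JSON.stringify(item), ttlSeconds);
    } catch {
      // Ignore: the caller still gets a correct answer, just without caching.
    }
  }
  return item;
}
```

With this shape, a cache outage costs latency but not availability: every request simply takes the slower path to the database until the cache recovers.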
Lessons Learned
The biggest insight was that no single database fits every workload. Forcing everything into DynamoDB would have made the time-range audit queries painful, while forcing metadata into PostgreSQL would have lost the single-digit-ms reads. Polyglot persistence (DynamoDB for metadata, PostgreSQL for audit, Redis for cache) let each engine do what it's best at. Per-file envelope encryption sounded expensive until S3 Bucket Key cut KMS API costs by up to 99%; without it, every read and write would hit KMS directly. The 19 property-based tests proved their worth by surfacing edge cases in quota arithmetic and authorization that example-based tests never hit. Proving that an invariant holds across thousands of generated inputs gives a different level of confidence than a handful of hand-picked cases.
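To make the property-based testing idea concrete, here is a small sketch of checking a quota invariant across many generated operation sequences. Everything in it is an assumption for illustration: the `Quota` shape, `tryUpload`, and `applyDelete` are hypothetical, and the hand-rolled seeded generator stands in for whatever property-testing library the project actually uses, which the source does not name.

```typescript
interface Quota {
  limitBytes: number;
  usedBytes: number;
}

// Hypothetical quota rule: reject invalid sizes and anything that would exceed the limit.
function tryUpload(q: Quota, size: number): Quota | null {
  if (!Number.isSafeInteger(size) || size <= 0) return null;
  if (q.usedBytes + size > q.limitBytes) return null;
  return { ...q, usedBytes: q.usedBytes + size };
}

function applyDelete(q: Quota, size: number): Quota {
  return { ...q, usedBytes: Math.max(0, q.usedBytes - size) };
}

// Small seeded linear congruential generator so failures are reproducible.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Property: after any sequence of uploads and deletes, usedBytes equals the
// sum of live file sizes and stays within [0, limitBytes].
function checkQuotaInvariant(runs: number, seed: number): number {
  const rand = lcg(seed);
  const randInt = (max: number) => Math.floor(rand() * (max + 1));

  for (let run = 0; run < runs; run++) {
    let quota: Quota = { limitBytes: randInt(1_000_000), usedBytes: 0 };
    const liveFiles: number[] = [];

    for (let step = 0; step < 25; step++) {
      if (liveFiles.length > 0 && rand() < 0.3) {
        // Delete a previously accepted file.
        const size = liveFiles.pop() as number;
        quota = applyDelete(quota, size);
      } else {
        // Sizes deliberately include zero and negative values.
        const size = randInt(300_000) - 1_000;
        const next = tryUpload(quota, size);
        if (next !== null) {
          quota = next;
          liveFiles.push(size);
        }
      }

      const expected = liveFiles.reduce((sum, s) => sum + s, 0);
      if (quota.usedBytes !== expected) {
        throw new Error(`usage drifted on run ${run}, step ${step}`);
      }
      if (quota.usedBytes < 0 || quota.usedBytes > quota.limitBytes) {
        throw new Error(`quota bounds violated on run ${run}, step ${step}`);
      }
    }
  }
  return runs;
}
```

Calling `checkQuotaInvariant(1000, 42)` exercises 25,000 generated operations; an example-based suite would typically cover only a few of the boundary cases (zero, negative, exactly-at-limit) that such a loop reaches without being told to.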


