Cross-cloud disaster recovery - our Netflix-style approach

AWS Cloud

26/10/2025 6:01 pm

Topic starter

Linda Morgan

(@linda.morgan757)

We're running a Netflix-style cross-cloud disaster recovery setup in production and wanted to share our experience.

Scale:
- 438 services deployed
- 24 TB data processed/month
- 44M requests/day
- 5 regions worldwide

Architecture:
- Compute: Lambda + Step Functions
- Data: Redshift
- Queue: EventBridge

Monthly cost: ~$156k

Lessons learned:
1. Spot instances are production-ready
2. NAT Gateways are costly
3. FinOps team paid for itself

AMA about our setup!
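
For readers who want to compare these numbers with their own, here is a rough sketch of the unit economics implied by the figures in the post. The 30-day month is an assumption, the function names are made up for illustration, and the result blends every cost (compute, data, networking) into a single number, so treat it as a ballpark rather than a real cost model.

```python
# Figures quoted in the post above.
REQUESTS_PER_DAY = 44_000_000
MONTHLY_COST_USD = 156_000
DATA_TB_PER_MONTH = 24

# Assumption: a 30-day month. The post does not say how the month is counted.
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 86_400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, ignoring daily peaks and troughs."""
    return requests_per_day / SECONDS_PER_DAY


def cost_per_million_requests(
    monthly_cost: float, requests_per_day: float, days: int = DAYS_PER_MONTH
) -> float:
    """Blended monthly cost divided by monthly requests, per million requests."""
    monthly_requests = requests_per_day * days
    return monthly_cost / (monthly_requests / 1_000_000)


def cost_per_tb(monthly_cost: float, tb_per_month: float) -> float:
    """Blended monthly cost divided by TB of data processed."""
    return monthly_cost / tb_per_month


if __name__ == "__main__":
    print(f"average load: {average_rps(REQUESTS_PER_DAY):.0f} req/s")
    print(
        "cost per 1M requests: "
        f"${cost_per_million_requests(MONTHLY_COST_USD, REQUESTS_PER_DAY):.2f}"
    )
    print(f"cost per TB: ${cost_per_tb(MONTHLY_COST_USD, DATA_TB_PER_MONTH):.2f}")
```

Under those assumptions the quoted scale works out to roughly 509 req/s on average, about $118 per million requests, and $6,500 per TB processed.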

Patricia Morgan

06/11/2025 9:18 am

We evaluated Kubernetes last quarter and decided against it because of the learning curve. Instead, we went with Grafana, which better fit our use case. The main factors were cost (30% cheaper), ease of use (2 days of training vs. 2 weeks), and community support.

Michelle Ross

11/11/2025 11:21 am

Exactly! This is what we implemented last month.

Matthew Ramos

13/11/2025 12:14 am

For those asking about cost: in our case (AWS, us-east-1, ~500 req/sec), we're paying about $5000/month. That's 70% vs our old setup with Kubernetes. ROI was positive after just 2 months when you factor in engineering time saved.

03/12/2025 12:19 am

Evelyn Sanders

(@evelyn.sanders800)

This is a game changer for teams doing GitOps! We integrated it with our existing Jenkins + Docker and the results were immediate. Developer productivity up 40%, deployment frequency up 3x, and MTTR down 60%. Best investment we made this year.

20/12/2025 12:11 am

Alex Chen

(@alex_kubernetes)

Pro tip: if you're implementing this, make sure to configure the scaling and timeout parameters correctly. We spent 2 weeks debugging random failures only to discover the default timeout was too low. Raising it from 30s to 2min made all the issues disappear.
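
To illustrate why a too-low timeout can look like "random" failures, here is a small sketch. The post does not say which component's timeout was changed, so this is deliberately generic: the sample durations are invented for illustration (they are not measurements from this thread), and `timed_out` and `percentile` are hypothetical helper names.

```python
import math


def timed_out(durations_s, timeout_s):
    """Return the durations that would exceed the given timeout."""
    return [d for d in durations_s if d > timeout_s]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical call durations in seconds, made up for this example.
# Most calls are fast, but a long tail occasionally runs past 30s.
SAMPLE_DURATIONS_S = [4, 6, 9, 12, 18, 25, 28, 35, 52, 95]

if __name__ == "__main__":
    for timeout_s in (30, 120):  # the 30s and 2min values from the post
        failed = timed_out(SAMPLE_DURATIONS_S, timeout_s)
        print(f"timeout={timeout_s}s -> {len(failed)} of "
              f"{len(SAMPLE_DURATIONS_S)} calls would fail: {failed}")
    print(f"p99 of sample: {percentile(SAMPLE_DURATIONS_S, 99)}s")
```

The idea is to pick the timeout from the observed tail latency (a high percentile plus headroom) rather than accept the default: only the slow tail fails, which is why the errors look intermittent.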

22/12/2025 11:48 pm

Maria Jimenez

(@maria.jimenez673)

The migration path we took:
- Weeks 1-2: Research & POC
- Weeks 3-4: Staging deployment
- Weeks 5-6: Prod rollout (10% -> 50% -> 100%)
- Weeks 7-8: Optimization

Total cost: ~200 eng hours

Would do it again in a heartbeat.
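
One common way to implement a staged rollout like the 10% -> 50% -> 100% steps above is deterministic hash bucketing. The sketch below is an assumption about how such a split could be done, not a description of what was actually used here; `bucket`, `routed_to_new_stack`, and the `user-N` keys are hypothetical names.

```python
import hashlib

# Rollout percentages taken from the post.
ROLLOUT_STAGES = (10, 50, 100)


def bucket(key: str) -> int:
    """Map a key (e.g. a user or request ID) to a stable bucket in 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def routed_to_new_stack(key: str, percent: int) -> bool:
    """True if this key falls inside the current rollout percentage."""
    return bucket(key) < percent


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10_000)]
    for percent in ROLLOUT_STAGES:
        share = sum(routed_to_new_stack(k, percent) for k in keys) / len(keys)
        print(f"stage {percent}%: {share:.1%} of sample keys on the new stack")
```

Because the bucket depends only on the key, anything routed to the new stack at 10% stays there at 50% and 100%, so individual users do not flip back and forth between stages.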
