GCP vs AWS for machine learning workloads - 2025 update

Topic starter

We're running machine learning workloads on AWS in production and wanted to share our experience as a 2025 update to the GCP vs AWS discussion.

Scale:
- 480 services deployed
- 89 TB data processed/month
- 25M requests/day
- 8 regions worldwide

Architecture:
- Compute: EC2 Auto Scaling
- Data: DocumentDB
- Queue: MSK (Kafka)

Monthly cost: ~$75k

Lessons learned:
1. Reserved instances save 40% on compute
2. S3 lifecycle policies are essential
3. Tagging strategy is critical
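
To make lessons 2 and 3 concrete, here is a minimal sketch, not our actual config. The prefix, the day thresholds, and the required tag names are all made up for illustration. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration accepts, but the snippet only builds and checks it, so it runs without AWS credentials.

```python
# Hypothetical tag keys; pick whatever your cost reports need to group by.
REQUIRED_TAGS = ("team", "service", "environment")


def build_lifecycle_config(prefix, ia_after_days=30, glacier_after_days=90,
                           expire_after_days=365):
    """Build an S3 lifecycle configuration dict for objects under `prefix`.

    The thresholds are illustrative assumptions, not recommendations.
    """
    if not (0 < ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("thresholds must be positive and strictly increasing")
    return {
        "Rules": [
            {
                "ID": f"tiering-{prefix.strip('/') or 'root'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def missing_tags(tags):
    """Return the required tag keys that are absent or empty in `tags`."""
    return [key for key in REQUIRED_TAGS if not tags.get(key)]


# Applying the config would look roughly like this (needs boto3 and credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=build_lifecycle_config("training-data/"),
#   )
```

Running missing_tags in CI against every resource definition is one cheap way to enforce a tagging policy before anything reaches production.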

AMA about our setup!


07/11/2025 7:13 am

Pro tip: if you're implementing this, make sure to configure resource quotas and timeouts correctly. We spent 2 weeks debugging random failures only to discover that the default timeout was too low. Raising it from 30s to 2min made all the issues disappear.
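
A rough sketch of the idea, since the post does not say which client or service had the low default: keep the timeout as a named, explicit setting instead of inheriting a library default. The helper name call_with_timeout and the constant are hypothetical, and the thread-based wrapper is just one generic way to enforce a deadline in Python.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Explicit setting: 2 minutes, up from the 30 s default mentioned above.
REQUEST_TIMEOUT_S = 120.0


def call_with_timeout(fn, timeout_s=REQUEST_TIMEOUT_S):
    """Run fn() in a worker thread and raise TimeoutError if it takes too long.

    Note: the worker thread is not killed on timeout; it keeps running in
    the background until fn returns.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout_s}s") from None
    finally:
        # Do not block on the still-running worker.
        executor.shutdown(wait=False)
```

If the library you call already exposes its own timeout parameter, setting that directly is usually the better option; the wrapper is only a fallback.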


Tom Chack
07/11/2025 5:10 pm

Security team blocked this due to compliance requirements.


09/11/2025 7:01 am

Consider the long-term maintenance burden before adopting.



The migration path we took:
Week 1-2: Research & POC
Week 3-4: Staging deployment
Week 5-6: Prod rollout (10% -> 50% -> 100%)
Week 7-8: Optimization
Total cost: ~200 eng hours
Would do it again in a heartbeat.
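
For anyone wondering how a 10% -> 50% -> 100% rollout can be gated, here is a minimal sketch assuming hash-based bucketing on a stable key such as a user or request ID. This is one common approach, not necessarily what the poster used; the function names are hypothetical.

```python
import hashlib

# Rollout stages from the post, as percentages of traffic.
ROLLOUT_STAGES = (10, 50, 100)


def bucket_for(key: str) -> int:
    """Map a stable key to a bucket in the range 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(key: str, percent: int) -> bool:
    """Return True if `key` falls inside the first `percent` buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return bucket_for(key) < percent
```

Because the bucket depends only on the key, anyone included at 10% stays included at 50% and 100%, so users do not flip back and forth between old and new paths as the rollout widens.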



Cautionary tale: we rushed this implementation without proper testing and it caused a 4-hour outage. The cause was a memory leak in the worker. Lesson learned: always test in staging first, especially when load balancers are involved.
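
A leak like this can often be caught in staging with a simple soak check. The sketch below assumes a Python worker and uses the standard library's tracemalloc; the task functions are stand-ins for illustration, not the worker from the outage.

```python
import gc
import tracemalloc


def memory_growth(task, iterations=100):
    """Return bytes of traced memory still held after running task repeatedly."""
    tracemalloc.start()
    try:
        task()  # warm-up so one-time allocations are not counted as a leak
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            task()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_leaked = []  # module-level list that keeps every buffer alive


def leaky_task():
    """Stand-in for a leaking worker: each call retains 10 kB forever."""
    _leaked.append(bytearray(10_000))


def clean_task():
    """Stand-in for a healthy worker: the buffer is freed on return."""
    bytearray(10_000)
```

In a staging test you would call memory_growth on the real unit of work and fail the build if the growth exceeds a budget you choose.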

