Great post! We've been doing this for about 9 months now and the results have been impressive. Our main learning was that automation should augment human decision-making, not replace it entirely. We also discovered several hidden dependencies during the migration. For anyone starting out, I'd recommend real-time dashboards for stakeholder visibility.
The end result was an 80% reduction in security vulnerabilities.
One more thing worth mentioning: the hardest part was getting buy-in from stakeholders outside engineering.
Happy to share technical details from our implementation. Architecture: microservices on Kubernetes. Tools used: Jenkins, GitHub Actions, and Docker. Configuration highlights: GitOps with ArgoCD apps. Performance benchmarks showed 50% latency reduction. Security considerations: container scanning in CI. We documented everything in our internal wiki - happy to share snippets if helpful.
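To make the "container scanning in CI" point concrete, here's a minimal sketch of the kind of gate we mean: parse the scanner's findings and fail the build if anything severe shows up. The report format and severity names here are hypothetical, and real scanners emit richer JSON, so adapt the parsing to whatever tool you run.

```python
import json

# Hypothetical scanner output: a list of findings with a "severity" field.
# Real scanner reports are richer; this only illustrates the gating logic.
SAMPLE_REPORT = """
[
  {"id": "CVE-2024-0001", "severity": "LOW"},
  {"id": "CVE-2024-0002", "severity": "CRITICAL"}
]
"""

def scan_gate(report_json: str, blocking=("CRITICAL", "HIGH")) -> bool:
    """Return True if the build should fail (blocking findings present)."""
    findings = json.loads(report_json)
    return any(f["severity"] in blocking for f in findings)

if __name__ == "__main__":
    if scan_gate(SAMPLE_REPORT):
        print("blocking vulnerabilities found - failing build")
```

In a pipeline you'd run this (or the scanner's own exit-code option) as a dedicated step so a critical CVE stops the deploy rather than just logging a warning.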
Additionally, we found that cross-team collaboration is essential for success.
One thing I wish I knew earlier: documentation debt is as dangerous as technical debt. Would have saved us a lot of time.
Some practical ops guidance we've developed that might help: Monitoring - Prometheus with Grafana dashboards. Alerting - PagerDuty with intelligent routing. Documentation - GitBook for public docs. Training - certification programs. These have helped us maintain high reliability while still moving fast on new features.
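By "intelligent routing" we basically mean severity-based dispatch. A toy sketch of the idea (the team names and severity levels are made up for illustration; in practice this logic lives in PagerDuty's own orchestration config, not in code you maintain):

```python
# Severity-based alert routing: page only for criticals, notify for warnings,
# and just record informational alerts. Destinations are placeholders.
ROUTES = {
    "critical": "oncall-primary",      # page immediately
    "warning": "team-slack-channel",   # notify, don't page
    "info": "log-only",                # record for later review
}

def route_alert(severity: str) -> str:
    """Map an alert severity to a destination; unknown severities escalate."""
    return ROUTES.get(severity, "oncall-primary")
```

The "unknown severities escalate" default matters: a misconfigured alert should wake someone up rather than vanish silently.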
One more thing worth mentioning: team morale improved significantly once the manual toil was automated away.
Here's what we did, from start to finish. We started about 14 months ago with a small pilot. Initial challenges included performance issues. The breakthrough came when we streamlined the process. Key metrics improved: 99.9% availability, up from 99.5%. The team's feedback has been overwhelmingly positive, though we still have room for improvement in monitoring depth. Lessons learned: measure everything. Next steps for us: add more automation.
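Worth noting that the jump from 99.5% to 99.9% sounds incremental but is roughly a 5x cut in allowed downtime. Quick arithmetic:

```python
def monthly_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Allowed downtime per 30-day month for a given availability percentage."""
    return (1 - availability_pct / 100) * days * 24 * 60

# 99.5% permits 216 minutes/month of downtime; 99.9% permits only 43.2.
before = monthly_downtime_minutes(99.5)   # 216.0
after = monthly_downtime_minutes(99.9)    # 43.2
```

Framing the improvement in minutes of downtime rather than nines also landed much better with non-engineering stakeholders.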
Architecturally, there are important trade-offs to consider. First, data residency. Second, failover strategy. Third, security hardening. We spent significant time on testing and it was worth it. Code samples available on our GitHub if anyone wants to take a look. Performance testing showed 50% latency reduction.
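On the failover point, here's a toy illustration of the client-side pattern: try replicas in order and fall through to the next on failure. The endpoint names and the fetch function are placeholders, and a real implementation would add timeouts, backoff, and health checks:

```python
# Ordered replica list: preferred endpoint first. Names are hypothetical.
REPLICAS = ["primary.internal", "replica-1.internal", "replica-2.internal"]

def fetch_from(endpoint: str, healthy: set) -> str:
    """Stand-in for a network call; raises if the endpoint is 'down'."""
    if endpoint not in healthy:
        raise ConnectionError(f"{endpoint} unreachable")
    return f"response from {endpoint}"

def fetch_with_failover(healthy: set) -> str:
    """Try each replica in order, returning the first successful response."""
    last_err = None
    for endpoint in REPLICAS:
        try:
            return fetch_from(endpoint, healthy)
        except ConnectionError as err:
            last_err = err  # move on to the next replica
    raise RuntimeError("all replicas down") from last_err
```

The data-residency trade-off interacts with this directly: once replicas live in different regions, "which replica may serve this request" becomes a policy question, not just an availability one.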
For context, we're using Vault, AWS KMS, and SOPS.
For context, we're using Grafana, Loki, and Tempo.
On the operational side, some thoughts we've developed: Monitoring - Datadog APM and logs. Alerting - PagerDuty with intelligent routing. Documentation - Notion for team wikis. Training - certification programs. These have helped us maintain a low incident count while still moving fast on new features.
Feel free to reach out if you have more questions - happy to share our runbooks and documentation.
Additionally, we found that the human side of change management is often harder than the technical implementation.