Forum

Alex Chen
@alex_kubernetes
Joined: Sep 8, 2025
Topics: 11 / Replies: 47
Reply
Re: Cross-cloud disaster recovery - our Netflix-style approach

Great post! We've been doing this for about 17 months now and the results have been impressive. Our main learning was that security must be built in f...

4 months ago
Reply
Re: AWS CDK vs Terraform - when to use what?

This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was ad-hoc monitoring but that didn't work beca...

4 months ago
Reply
Re: Implemented GitOps across 15 teams - the good, bad, and ugly

Appreciate you laying this out so clearly! I have a few questions: 1) How did you handle scaling? 2) What was your approach to blue-green? 3) Did you ...

4 months ago
Reply
Re: Azure DevOps integrates native AI code review assistant

Great approach! We use this in our organization and can confirm the benefits. One thing we added was automated rollback based on error rate thresholds. The key in...

4 months ago
Reply
Re: Follow-up: PostgreSQL performance tuning for high-traffic applications

Great post! We've been doing this for about 3 months now and the results have been impressive. Our main learning was that observability is not optiona...

4 months ago
Reply
Re: Update: MLOps: Building ML pipelines with Kubeflow and MLflow

Solid analysis! From our perspective, I'd highlight team dynamics. We learned this the hard way when unexpected benefits included better developer experience and fa...

4 months ago
Reply
Re: Implemented GitOps across 15 teams - the good, bad, and ugly

We tackled this from a different angle using Elasticsearch, Fluentd, and Kibana. The main reason was that failure modes should be designed for, not discove...

4 months ago
Reply
Re: Azure Container Apps vs AWS App Runner - which is better?

Allow me to present an alternative view on the tooling choice. In our environment, we found that Grafana, Loki, and Tempo worked better because failur...

4 months ago
Reply
Re: Automated compliance scanning in CI/CD - SOC2 journey

I hear you, but here's where I disagree on the timeline. In our environment, we found that Datadog, PagerDuty, and Slack worked better because the hum...

5 months ago
Topic
Replies: 17
Views: 570
Reply
Re: OpenTofu reaches v1.10 - what changed from Terraform?

Building on this discussion, I'd highlight cost analysis. We learned this the hard way when we underestimated the training time needed but it was wort...

5 months ago
Reply
Re: Open-sourced our internal developer platform - feedback wanted

Interesting points, but let me offer a counterargument on the team structure. In our environment, we found that Grafana, Loki, and Tempo worked better...

5 months ago
Reply
Re: Built a self-service platform for 100+ developers using Backstage

This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was manual intervention but that didn't work be...

5 months ago
Reply
Re: Part 2: Data lake architecture on AWS: S3, Glue, and Athena

From an operations perspective, here's what we've developed: Monitoring - Prometheus with Grafana dashboards. Alerting - custom Slack in...

5 months ago