Great post! We've been doing this for about 17 months now and the results have been impressive. Our main learning was that security must be built in f...
This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was ad-hoc monitoring, but that didn't work because...
Appreciate you laying this out so clearly! I have a few questions: 1) How did you handle scaling? 2) What was your approach to blue-green? 3) Did you ...
Great approach! We did the same in our organization and can confirm the benefits. One thing we added was automated rollback based on error rate thresholds. The key in...
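For anyone who wants to try this, here's a minimal sketch of the idea, assuming a Prometheus-style query endpoint and kubectl-managed deployments (the URLs, query, and names are illustrative, not our production code):

```python
import subprocess
import time

import requests

# Assumed setup: error rate exposed via a Prometheus query endpoint and a
# kubectl-managed deployment. Endpoint, query, and threshold are placeholders.
PROM_URL = "http://prometheus.internal:9090/api/v1/query"
ERROR_RATE_QUERY = (
    'sum(rate(http_requests_total{status=~"5.."}[5m]))'
    " / sum(rate(http_requests_total[5m]))"
)
THRESHOLD = 0.05          # roll back above 5% errors
CHECK_INTERVAL_SECS = 30  # how often to re-check after a deploy

def current_error_rate() -> float:
    """Evaluate the error-rate query and return it as a fraction."""
    resp = requests.get(PROM_URL, params={"query": ERROR_RATE_QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def watch_and_rollback(deployment: str, checks: int = 10) -> None:
    """Poll the error rate after a deploy; roll back if it crosses the threshold."""
    for _ in range(checks):
        rate = current_error_rate()
        if rate > THRESHOLD:
            print(f"error rate {rate:.2%} above {THRESHOLD:.0%}, rolling back {deployment}")
            subprocess.run(
                ["kubectl", "rollout", "undo", f"deployment/{deployment}"],
                check=True,
            )
            return
        time.sleep(CHECK_INTERVAL_SECS)
    print("deploy looks healthy")

if __name__ == "__main__":
    watch_and_rollback("api-server")
```

The design choice that mattered for us was bounding the watch window: a healthy deploy exits after a fixed number of clean checks instead of being babysat by the script forever.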
Great post! We've been doing this for about 3 months now and the results have been impressive. Our main learning was that observability is not optional...
Solid analysis! From our perspective, the deciding factor was team dynamics. We learned this the hard way, though unexpected benefits included better developer experience and fa...
We tackled this from a different angle using Elasticsearch, Fluentd, and Kibana. The main reason was that failure modes should be designed for, not discovered...
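To make that concrete, here's roughly what the app side looks like, assuming Fluentd's in_http input on its default port 9880 forwarding into Elasticsearch (the host and tag are placeholders):

```python
import json
import time
from urllib.request import Request, urlopen

# Assumed setup: Fluentd's in_http input on its default port, forwarding to
# Elasticsearch. The host and the tag in the path are placeholders.
FLUENTD_ENDPOINT = "http://fluentd.internal:9880/app.events"

def emit_event(event: dict) -> None:
    """Ship one structured event to Fluentd; it lands in Elasticsearch for Kibana."""
    body = json.dumps({**event, "emitted_at": time.time()}).encode()
    req = Request(
        FLUENTD_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urlopen(req, timeout=5).read()

# Structured fields (service, outcome) make filtering in Kibana trivial later.
emit_event({"service": "checkout", "outcome": "payment_declined", "order_id": "o-123"})
```

Keeping fields structured from the start is what pays off: you filter on service or outcome in Kibana instead of regexing raw log lines.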
Allow me to present an alternative view on the tooling choice. In our environment, we found that Grafana, Loki, and Tempo worked better because failure...
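A quick illustration of what we mean: pulling the last few minutes of error lines for a service is one LogQL query over Loki's HTTP API. A sketch, with the URL and label names as placeholders:

```python
import time

import requests

# Assumed setup: a reachable Loki instance; the URL and the "app" label
# are placeholders for whatever your streams actually use.
LOKI_URL = "http://loki.internal:3100/loki/api/v1/query_range"

def recent_errors(service: str, minutes: int = 15) -> list[str]:
    """Pull recent error lines for one service via LogQL."""
    now = time.time()
    params = {
        "query": f'{{app="{service}"}} |= "error"',  # stream selector + line filter
        "start": int((now - minutes * 60) * 1e9),    # Loki expects ns timestamps
        "end": int(now * 1e9),
        "limit": 100,
    }
    resp = requests.get(LOKI_URL, params=params, timeout=10)
    resp.raise_for_status()
    streams = resp.json()["data"]["result"]
    return [line for stream in streams for _, line in stream["values"]]

for line in recent_errors("checkout"):
    print(line)
```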
I hear you, but here's where I disagree on the timeline. In our environment, we found that Datadog, PagerDuty, and Slack worked better because the human...
Building on this discussion, I'd highlight cost analysis. We learned this the hard way when we underestimated the training time needed, but it was worth...
Interesting points, but let me offer a counterargument on the team structure. In our environment, we found that Grafana, Loki, and Tempo worked better...
This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was manual intervention, but that didn't work because...
From an operations perspective, here's what we recommend, based on the setup we've developed: Monitoring - Prometheus with Grafana dashboards. Alerting - custom Slack integration...
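Here's a stripped-down sketch of the alerting half, assuming a Prometheus server plus a Slack incoming webhook (the URLs, queries, and thresholds are illustrative, not our actual rules):

```python
import requests

# Assumed setup: both URLs below are placeholders for your own endpoints.
PROMETHEUS = "http://prometheus.internal:9090"
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"

CHECKS = [
    # (alert name, instant PromQL expression, value must stay below this)
    ("5xx requests/sec", 'sum(rate(http_requests_total{status=~"5.."}[5m]))', 5.0),
    ("p99 latency (s)",
     'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))',
     1.0),
]

def evaluate(expr: str) -> float:
    """Run one instant query against Prometheus and return the first value."""
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def notify(text: str) -> None:
    """Send a plain-text message through the Slack incoming webhook."""
    requests.post(SLACK_WEBHOOK, json={"text": text}, timeout=10).raise_for_status()

for name, expr, threshold in CHECKS:
    value = evaluate(expr)
    if value > threshold:
        notify(f":rotating_light: {name} at {value:.2f}, threshold {threshold}")
```

The webhook call is the whole "custom Slack integration" in miniature: a POST with a text payload; dedup, routing, and silencing live elsewhere in the real setup.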