<?xml version="1.0" encoding="UTF-8"?>        <rss version="2.0"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
             xmlns:admin="http://webns.net/mvcb/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <channel>
            <title>Projects We Have Done - OpsX DevOps Team Forum</title>
            <link>https://opsx.team/community/projects/</link>
            <description>OpsX DevOps Team Discussion Board</description>
            <language>en-US</language>
            <lastBuildDate>Tue, 07 Apr 2026 23:58:38 +0000</lastBuildDate>
            <generator>wpForo</generator>
            <ttl>60</ttl>
							                    <item>
                        <title>Update: MLOps: Building ML pipelines with Kubeflow and MLflow</title>
                        <link>https://opsx.team/community/projects/update-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-146/</link>
                        <pubDate>Wed, 26 Nov 2025 04:21:13 +0000</pubDate>
                        <description><![CDATA[I hear you, but here's where I disagree on the tooling choice. In our environment, we found that Datadog, PagerDuty, and Slack worked better because the human side of change management is of...]]></description>
                        <content:encoded><![CDATA[I hear you, but here's where I disagree on the tooling choice. In our environment, we found that Datadog, PagerDuty, and Slack worked better because the human side of change management is often harder than the technical implementation. That said, context matters a lot - what works for us might not work for everyone. The key is to focus on outcomes.

For context, we're using Elasticsearch, Fluentd, and Kibana.

I'd recommend checking out relevant blog posts for more details.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

For context, we're using Vault, AWS KMS, and SOPS.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Alexander Rodriguez</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/update-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-146/</guid>
                    </item>
				                    <item>
                        <title>Practical guide: Building a comprehensive observability stack with OpenTelemetry</title>
                        <link>https://opsx.team/community/projects/practical-guide-building-a-comprehensive-observability-stack-with-opentelemetry-289/</link>
                        <pubDate>Sat, 18 Oct 2025 20:21:13 +0000</pubDate>
                        <description><![CDATA[We experienced the same thing! Here is how it went for us: Phase 1 (1 month) involved assessment and planning. Phase 2 (2 months) focused on team training. Phase 3 (1 month) was all ab...]]></description>
                        <content:encoded><![CDATA[We experienced the same thing! Here is how it went for us: Phase 1 (1 month) involved assessment and planning. Phase 2 (2 months) focused on team training. Phase 3 (1 month) was all about knowledge sharing. Total investment was $100K, but the payback period was only 6 months. Key success factors: good tooling, training, and patience. If I could do it again, I would invest more in training.
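
For anyone sanity-checking numbers like these, here is a quick back-of-the-envelope sketch in Python. It assumes the savings accrued evenly over the payback period, which is a simplification. The function name is purely illustrative; only the $100K and the 6 months come from our project.

```python
def implied_monthly_savings(investment: float, payback_months: float) -> float:
    """Monthly savings needed to recover `investment` within `payback_months`.

    Assumes savings arrive at a constant rate (an assumption, not a measurement).
    """
    if payback_months <= 0:
        raise ValueError("payback_months must be positive")
    return investment / payback_months


if __name__ == "__main__":
    # $100K investment and 6-month payback are the figures quoted in the post.
    print(f"${implied_monthly_savings(100_000, 6):,.0f} per month")
```

With those two figures, the implied savings work out to roughly $16.7K per month.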

I'd recommend checking out the community forums for more details.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

The end result was 40% cost savings on infrastructure and a 60% improvement in developer productivity.

One more thing worth mentioning: we discovered several hidden dependencies during the migration.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Katherine Nelson</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/practical-guide-building-a-comprehensive-observability-stack-with-opentelemetry-289/</guid>
                    </item>
				                    <item>
                        <title>Follow-up: Implementing GitOps workflow with ArgoCD and Kubernetes</title>
                        <link>https://opsx.team/community/projects/follow-up-implementing-gitops-workflow-with-argocd-and-kubernetes-266/</link>
                        <pubDate>Fri, 01 Aug 2025 09:21:13 +0000</pubDate>
                        <description><![CDATA[Interesting points, but let me offer a counterargument on the timeline. In our environment, we found that Terraform, AWS CDK, and CloudFormation worked better because failure modes should be...]]></description>
                        <content:encoded><![CDATA[Interesting points, but let me offer a counterargument on the timeline. In our environment, we found that Terraform, AWS CDK, and CloudFormation worked better because failure modes should be designed for, not discovered in production. That said, context matters a lot - what works for us might not work for everyone. The key is to experiment and measure.

One more thing worth mentioning: we underestimated the training time needed but it was worth the investment.

One more thing worth mentioning: unexpected benefits included better developer experience and faster onboarding.

The end result was a 60% improvement in developer productivity.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Nicholas Morgan</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/follow-up-implementing-gitops-workflow-with-argocd-and-kubernetes-266/</guid>
                    </item>
				                    <item>
                        <title>Update: MLOps: Building ML pipelines with Kubeflow and MLflow</title>
                        <link>https://opsx.team/community/projects/update-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-160/</link>
                        <pubDate>Thu, 31 Jul 2025 18:21:13 +0000</pubDate>
                        <description><![CDATA[I'd like to share our complete experience with this. We started about 3 months ago with a small pilot. Initial challenges included tool integration. The breakthrough came when we streamlined...]]></description>
                        <content:encoded><![CDATA[I'd like to share our complete experience with this. We started about 3 months ago with a small pilot. Initial challenges included tool integration. The breakthrough came when we streamlined the process. Key metrics improved: 50% reduction in deployment time. The team's feedback has been overwhelmingly positive, though we still have room for improvement in testing coverage. Lessons learned: automate everything. Next steps for us: improve documentation.

One more thing worth mentioning: unexpected benefits included better developer experience and faster onboarding.

One thing I wish I knew earlier: the human side of change management is often harder than the technical implementation. Would have saved us a lot of time.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Thomas Robinson</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/update-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-160/</guid>
                    </item>
				                    <item>
                        <title>Part 2: Best practices for Kubernetes pod security in production</title>
                        <link>https://opsx.team/community/projects/part-2-best-practices-for-kubernetes-pod-security-in-production-232/</link>
                        <pubDate>Mon, 14 Jul 2025 03:21:13 +0000</pubDate>
                        <description><![CDATA[We had a comparable situation on our project. The problem: scaling issues. Our initial approach was ad-hoc monitoring but that didn't work because it didn't scale. What actually worked: comp...]]></description>
                        <content:encoded><![CDATA[We had a comparable situation on our project. The problem: scaling issues. Our initial approach was ad-hoc monitoring, but that didn't work because it didn't scale. What actually worked: compliance scanning in the CI pipeline. The key insight was that documentation debt is as dangerous as technical debt. Now we're able to scale automatically.
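
To make the CI scanning idea concrete, here is a minimal Python sketch of the kind of check that can run against pod manifests before deploy. It is not our actual scanner: the function name and the three rules are illustrative assumptions, and it only inspects container-level securityContext fields (pod-level settings and init containers are ignored for brevity).

```python
def find_violations(pod: dict) -> list:
    """Return human-readable policy violations for one parsed pod manifest."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "unnamed")
        ctx = container.get("securityContext", {})
        # Privileged containers are rejected outright.
        if ctx.get("privileged", False):
            violations.append(f"{name}: privileged must not be true")
        # Treat a missing value as a violation, so it must be set to false explicitly.
        if ctx.get("allowPrivilegeEscalation", True):
            violations.append(f"{name}: allowPrivilegeEscalation must be false")
        # Require an explicit runAsNonRoot: true on every container.
        if not ctx.get("runAsNonRoot", False):
            violations.append(f"{name}: runAsNonRoot must be true")
    return violations


if __name__ == "__main__":
    # Hypothetical manifest with no securityContext at all.
    pod = {"spec": {"containers": [{"name": "app"}]}}
    for line in find_violations(pod):
        print(line)
```

In a pipeline, a non-empty result would fail the build. A dedicated policy tool is the better choice for production use; this only shows the shape of the check.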

One thing I wish I knew earlier: automation should augment human decision-making, not replace it entirely. Would have saved us a lot of time.

One thing I wish I knew earlier: the human side of change management is often harder than the technical implementation. Would have saved us a lot of time.

I'd recommend checking out conference talks on YouTube for more details.

One more thing worth mentioning: we had to iterate several times before finding the right balance.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Donna Jimenez</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/part-2-best-practices-for-kubernetes-pod-security-in-production-232/</guid>
                    </item>
				                    <item>
                        <title>Practical guide: Terraform vs Pulumi: A comprehensive comparison for IaC</title>
                        <link>https://opsx.team/community/projects/practical-guide-terraform-vs-pulumi-a-comprehensive-comparison-for-iac-254/</link>
                        <pubDate>Tue, 20 May 2025 01:21:13 +0000</pubDate>
                        <description><![CDATA[Some details worth sharing from our implementation. Architecture: serverless with Lambda. Tools used: Vault, AWS KMS, and SOPS. Configuration highlights: GitOps with ArgoCD ap...]]></description>
                        <content:encoded><![CDATA[Some details worth sharing from our implementation. Architecture: serverless with Lambda. Tools used: Vault, AWS KMS, and SOPS. Configuration highlights: GitOps with ArgoCD apps. Performance benchmarks showed a 50% latency reduction. Security considerations: secrets management with Vault. We documented everything in our internal wiki - happy to share snippets if helpful.

For context, we're using Terraform, AWS CDK, and CloudFormation.

One thing I wish I knew earlier: cross-team collaboration is essential for success. Would have saved us a lot of time.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

The end results were a 70% reduction in incident MTTR, a 90% decrease in manual toil, and 99.9% availability, up from 99.5%.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>John Perez</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/practical-guide-terraform-vs-pulumi-a-comprehensive-comparison-for-iac-254/</guid>
                    </item>
				                    <item>
                        <title>Practical guide: Implementing AIOps for intelligent incident management</title>
                        <link>https://opsx.team/community/projects/practical-guide-implementing-aiops-for-intelligent-incident-management-245/</link>
                        <pubDate>Tue, 22 Apr 2025 09:21:13 +0000</pubDate>
                        <description><![CDATA[What we'd suggest based on our work: 1) Automate everything possible 2) Monitor proactively 3) Review and iterate 4) Keep it simple. Common mistakes to avoid: over-engineering early. Resourc...]]></description>
                        <content:encoded><![CDATA[What we'd suggest based on our work: 1) Automate everything possible 2) Monitor proactively 3) Review and iterate 4) Keep it simple. Common mistakes to avoid: over-engineering early. Resources that helped us: Google SRE book. The most important thing is collaboration over tools.

For context, we're using Kubernetes, Helm, ArgoCD, and Prometheus.

Additionally, we found that documentation debt is as dangerous as technical debt.

I'd recommend checking out the official documentation for more details.

The end result was a 3x increase in deployment frequency.

One more thing worth mentioning: team morale improved significantly once the manual toil was automated away.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>John Long</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/practical-guide-implementing-aiops-for-intelligent-incident-management-245/</guid>
                    </item>
				                    <item>
                        <title>Deep dive: On-call rotation best practices to prevent burnout</title>
                        <link>https://opsx.team/community/projects/deep-dive-on-call-rotation-best-practices-to-prevent-burnout-302/</link>
                        <pubDate>Thu, 03 Apr 2025 15:21:13 +0000</pubDate>
                        <description><![CDATA[Good point! We diverged a bit, using Grafana, Loki, and Tempo. The main reason was that automation should augment human decision-making, not replace it entirely. However, I can see how your method...]]></description>
                        <content:encoded><![CDATA[Good point! We diverged a bit, using Grafana, Loki, and Tempo. The main reason was that automation should augment human decision-making, not replace it entirely. However, I can see how your method would be better for regulated industries. Have you considered real-time dashboards for stakeholder visibility?

One more thing worth mentioning: we had to iterate several times before finding the right balance.

For context, we're using Kubernetes, Helm, ArgoCD, and Prometheus.

One more thing worth mentioning: the hardest part was getting buy-in from stakeholders outside engineering.

The end result was 40% cost savings on infrastructure.

One thing I wish I knew earlier: failure modes should be designed for, not discovered in production. Would have saved us a lot of time.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Laura Rivera</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/deep-dive-on-call-rotation-best-practices-to-prevent-burnout-302/</guid>
                    </item>
				                    <item>
                        <title>Follow-up: Comparing AWS, Azure, and GCP for enterprise workloads</title>
                        <link>https://opsx.team/community/projects/follow-up-comparing-aws-azure-and-gcp-for-enterprise-workloads-226/</link>
                        <pubDate>Sat, 15 Mar 2025 04:21:13 +0000</pubDate>
                        <description><![CDATA[Great post! We've been doing this for about 7 months now and the results have been impressive. Our main learning was that automation should augment human decision-making, not replace it enti...]]></description>
                        <content:encoded><![CDATA[Great post! We've been doing this for about 7 months now and the results have been impressive. Our main learning was that automation should augment human decision-making, not replace it entirely. We also saw unexpected benefits: better developer experience and faster onboarding. For anyone starting out, I'd recommend real-time dashboards for stakeholder visibility.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

The end result was a 50% reduction in deployment time.

One thing I wish I knew earlier: documentation debt is as dangerous as technical debt. Would have saved us a lot of time.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Alexander Rodriguez</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/follow-up-comparing-aws-azure-and-gcp-for-enterprise-workloads-226/</guid>
                    </item>
				                    <item>
                        <title>Follow-up: MLOps: Building ML pipelines with Kubeflow and MLflow</title>
                        <link>https://opsx.team/community/projects/follow-up-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-178/</link>
                        <pubDate>Fri, 14 Mar 2025 17:21:13 +0000</pubDate>
                        <description><![CDATA[Practical advice from our team: 1) Document as you go 2) Monitor proactively 3) Share knowledge across teams 4) Measure what matters. Common mistakes to avoid: not measuring outcomes. Resour...]]></description>
                        <content:encoded><![CDATA[Practical advice from our team: 1) Document as you go 2) Monitor proactively 3) Share knowledge across teams 4) Measure what matters. Common mistakes to avoid: not measuring outcomes. Resources that helped us: Google SRE book. The most important thing is consistency over perfection.

Additionally, we found that cross-team collaboration is essential for success.

Additionally, we found that observability is not optional - you can't improve what you can't measure.

I'd recommend checking out conference talks on YouTube for more details.

Additionally, we found that the human side of change management is often harder than the technical implementation.

For context, we're using Elasticsearch, Fluentd, and Kibana.]]></content:encoded>
						                            <category domain="https://opsx.team/community/projects/">Projects We Have Done</category>                        <dc:creator>Nancy Howard</dc:creator>
                        <guid isPermaLink="true">https://opsx.team/community/projects/follow-up-mlops-building-ml-pipelines-with-kubeflow-and-mlflow-178/</guid>
                    </item>
							        </channel>
        </rss>
		