Building a comprehensive observability stack with OpenTelemetry

Mark Perez · 2025-07-10T10:21:13Z

We've standardized on OpenTelemetry for our observability needs. The stack: OpenTelemetry collector for data ingestion, Jaeger for distributed tracing, Prometheus for metrics, and Grafana for visualization. The main benefit is vendor-neutral instrumentation - we can switch backends without changing code. Migration from proprietary solutions took 3 months. How are you handling observability in your organization?
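To make the "switch backends without changing code" point concrete, here is a minimal pure-Python sketch of the idea. It is not the OpenTelemetry API: `Span`, `SpanExporter`, `InMemoryExporter`, `ConsoleExporter`, `Tracer`, and `handle_request` are all hypothetical names made up for illustration. The only thing it tries to show is that application code talks to one tracing interface, while the backend is chosen at wiring time.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Span:
    """Hypothetical span record: a name, a duration, and free-form attributes."""
    name: str
    duration_ms: float = 0.0
    attributes: dict = field(default_factory=dict)


class SpanExporter(Protocol):
    """The only contract the tracer relies on (illustrative, not a real API)."""
    def export(self, span: Span) -> None: ...


class InMemoryExporter:
    """Stand-in for one backend: keeps finished spans in a list."""
    def __init__(self) -> None:
        self.spans: list[Span] = []

    def export(self, span: Span) -> None:
        self.spans.append(span)


class ConsoleExporter:
    """Stand-in for a different backend: prints finished spans."""
    def export(self, span: Span) -> None:
        print(f"{span.name} {span.duration_ms:.3f}ms {span.attributes}")


class Tracer:
    """Times a block of code and hands the finished span to the exporter."""
    def __init__(self, exporter: SpanExporter) -> None:
        self._exporter = exporter

    @contextmanager
    def start_span(self, name: str, **attributes):
        span = Span(name, attributes=attributes)
        start = time.perf_counter()
        try:
            yield span
        finally:
            # Record the duration and export even if the body raised.
            span.duration_ms = (time.perf_counter() - start) * 1000
            self._exporter.export(span)


def handle_request(tracer: Tracer, order_id: str) -> str:
    """Application code: knows about the tracer, not about any backend."""
    with tracer.start_span("handle_request", order_id=order_id):
        return f"processed {order_id}"
```

Swapping `Tracer(InMemoryExporter())` for `Tracer(ConsoleExporter())` changes where spans go while `handle_request` stays untouched. In a real setup the OpenTelemetry SDK and collector would fill the exporter role; the sketch only mirrors the shape of that separation.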

Sharon Garcia

(@sharon.garcia321)

Here's our process end to end. We started about 22 months ago with a small pilot. Initial challenges included tool integration, and the breakthrough came when we improved observability. Key metric: a 90% decrease in manual toil. The team's feedback has been overwhelmingly positive, though we still have room to improve test coverage. Lesson learned: communicate often. Next step for us: expand to more teams.

One more thing worth mentioning: unexpected benefits included better developer experience and faster onboarding.

Posted : 28/07/2025 4:40 am

Gregory Brooks

(@gregory.brooks453)

On the operational side, some practices we've developed: Monitoring - CloudWatch with custom metrics. Alerting - Opsgenie with escalation policies. Documentation - Confluence with templates. Training - pairing sessions. These have helped us maintain a low incident count while still moving fast on new features.

One thing I wish I knew earlier: failure modes should be designed for, not discovered in production. Would have saved us a lot of time.

For context, we're using Datadog, PagerDuty, and Slack.

Another thing I wish I'd known earlier: documentation debt is as dangerous as technical debt. Knowing that would have saved us a lot of time.

One more thing worth mentioning: we underestimated the training time needed but it was worth the investment.

Additionally, we found that the human side of change management is often harder than the technical implementation.

Posted : 28/07/2025 10:29 pm

Deborah Cook

(@deborah.cook920)

Thoughtful post, though I'd challenge one aspect: the metrics focus. In our environment, we found that Elasticsearch, Fluentd, and Kibana worked better, because observability is not optional - you can't improve what you can't measure. That said, context matters a lot, and what works for us might not work for everyone. The key is to invest in training.

For context, we're using Jenkins, GitHub Actions, and Docker.

I'd recommend checking out conference talks on YouTube and the official documentation for more details.

Posted : 29/07/2025 12:35 am
