Monitoring stack comparison: Prometheus vs Datadog vs New Relic

21 Posts
20 Users
0 Reactions
52 Views
(@matthew.ross327)
Posts: 0

We encountered something similar. The deciding factor for us was security considerations, which we learned the hard way. Unexpected benefits included better developer experience and faster onboarding. Now we always make sure to document decisions in runbooks; it's added maybe an hour to our process but prevents a lot of headaches down the line.

For context, we're using Kubernetes, Helm, ArgoCD, and Prometheus.

I'd recommend checking out the community forums for more details.

The end result was a 70% reduction in incident MTTR.
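For anyone wanting to track a similar improvement: MTTR is just the mean of (resolved - detected) across incidents, so it's easy to compute if you log both timestamps per incident. A minimal sketch (the incident data below is hypothetical):

```python
from datetime import datetime

def mttr_minutes(incidents):
    """Mean time to resolution, in minutes, over a list of incidents.

    Each incident is a (detected_at, resolved_at) pair of datetimes.
    """
    durations = [(resolved - detected).total_seconds() / 60
                 for detected, resolved in incidents]
    return sum(durations) / len(durations)

# Hypothetical before/after incident samples:
before = [(datetime(2025, 11, 3, 9, 0), datetime(2025, 11, 3, 11, 30)),
          (datetime(2025, 11, 10, 14, 0), datetime(2025, 11, 10, 15, 40))]
after = [(datetime(2025, 12, 2, 9, 0), datetime(2025, 12, 2, 9, 45))]

reduction = 1 - mttr_minutes(after) / mttr_minutes(before)  # -> 0.64
```

In practice you'd pull these timestamps from your incident tracker (PagerDuty exports them, for example) rather than hardcode them.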


 
Posted : 02/12/2025 4:13 pm
(@alexander.rodriguez755)
Posts: 0

Adding some engineering details from our implementation. Architecture: microservices on Kubernetes. Tools used: Istio, Linkerd, and Envoy. Configuration highlights: IaC with Terraform modules. Performance benchmarks showed a 50% latency reduction. Security considerations: secrets management with Vault. We documented everything in our internal wiki - happy to share snippets if helpful.

I'd recommend checking out conference talks on YouTube for more details.

For context, we're using Datadog, PagerDuty, and Slack.
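One note on latency benchmarks: a single "50% reduction" number can hide tail behavior, so it's worth comparing specific percentiles before and after. A minimal nearest-rank percentile sketch (the sample data is made up for illustration):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank p-th percentile of raw latency samples (milliseconds)."""
    s = sorted(samples_ms)
    return s[math.ceil(p / 100 * len(s)) - 1]

# Hypothetical latency samples before and after the change:
before = list(range(100, 300))   # 200 samples, 100..299 ms
after = list(range(50, 150))     # 100 samples, 50..149 ms

p99_reduction = 1 - percentile(after, 99) / percentile(before, 99)  # ~0.50
```

Reporting p50 and p99 separately makes it obvious whether an optimization helped typical requests, the tail, or both.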


 
Posted : 04/12/2025 1:08 pm
(@david.johnson369)
Posts: 0

Thanks for this! We're beginning our evaluation of this approach. Could you elaborate on your success metrics? Specifically, I'm curious about stakeholder communication. Also, how long did the initial implementation take? Any gotchas we should watch out for?

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

Additionally, we found that the human side of change management is often harder than the technical implementation.

I'd recommend checking out the community forums for more details.


 
Posted : 07/12/2025 10:43 am
 Paul
(@paul)
Posts: 0

From an implementation perspective, here are the key points. First, data residency. Second, failover strategy. Third, performance tuning. We spent significant time on documentation and it was worth it. Code samples are available on our GitHub if anyone wants to take a look. Performance testing showed a 50% latency reduction.

For context, we're using Kubernetes, Helm, ArgoCD, and Prometheus, along with Datadog, PagerDuty, and Slack.

One thing I wish I knew earlier: starting small and iterating is more effective than big-bang transformations. Would have saved us a lot of time.


 
Posted : 10/12/2025 7:40 pm
(@christopher.bennett288)
Posts: 0

Love this! We implemented this in our organization and can confirm the benefits. One thing we added was automated rollback based on error rate thresholds. The key insight for us was understanding that observability is not optional - you can't improve what you can't measure. We also found that integration with existing tools was smoother than anticipated. Happy to share more details if anyone is interested.

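For what it's worth, the automated rollback on error-rate thresholds boils down to a small decision function evaluated per monitoring window. A sketch under assumed numbers (the 5% threshold and 100-request floor are illustrative, not from the post above):

```python
def should_roll_back(errors, requests, threshold=0.05, min_requests=100):
    """Decide whether to trigger a rollback, given error and request
    counts observed in the current monitoring window."""
    if requests < min_requests:
        return False  # too little traffic to judge reliably
    return errors / requests > threshold

# e.g. 12 errors out of 200 requests -> 6% error rate -> roll back
```

The `min_requests` floor matters: a single failed request during a quiet period shouldn't trip a rollback. In a real setup the counts would come from your metrics backend (a Prometheus range query, for instance) rather than be passed in directly.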


 
Posted : 12/12/2025 5:46 pm
(@victoria.robinson772)
Posts: 0

Our recommended approach: 1) Test in production-like environments 2) Implement circuit breakers 3) Practice incident response 4) Measure what matters. Common mistakes to avoid: skipping documentation. Resources that helped us: Team Topologies. The most important thing is collaboration over tools.
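On the circuit-breaker point: the core pattern is small enough to sketch. This is a minimal illustration with made-up thresholds, not a production implementation (libraries like resilience4j or pybreaker handle the real edge cases):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures, then rejects calls until `reset_after` seconds pass."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # success resets the failure count
        return result
```

The injectable clock is what makes the half-open transition testable, which ties back to point 1: you want to exercise this logic in production-like tests, not discover it during an incident.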

I'd recommend checking out the official documentation and the community forums for more details.


 
Posted : 18/12/2025 2:31 am
Page 2 / 2