Monitoring stack comparison: Prometheus vs Datadog vs New Relic

17 Posts
15 Users
0 Reactions
221 Views
(@tyler.robinson235)
Posts: 0

We saw this same issue! Symptoms: frequent timeouts. Root cause: a network misconfiguration. Fix: corrected the routing rules. Prevention: better monitoring. Total time to resolve was 15 minutes, and we now have runbooks and monitoring in place to catch this early.

We also found that automation should augment human decision-making, not replace it entirely.

For context, we're using Grafana, Loki, and Tempo.

The end result was 99.9% availability, up from 99.5%.
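
As a rough illustration of what that change means in downtime terms, here is a minimal sketch. The 30-day window and the function name are assumptions made for the example, not something stated in the post.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted over the period at a given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Assumed 30-day window; adjust period_days to match your own reporting period.
before = allowed_downtime_minutes(99.5)
after = allowed_downtime_minutes(99.9)

print(f"99.5%: {before:.1f} min of downtime per 30 days")
print(f"99.9%: {after:.1f} min of downtime per 30 days")
```

Under that assumption, going from 99.5% to 99.9% cuts the downtime allowance from roughly 216 minutes to roughly 43 minutes per 30 days.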

Another lesson: security must be built in from the start, not bolted on later.


 
Posted : 22/10/2025 6:31 am
(@opsx-tom)
Posts: 76
Member Admin

Some practical ops guidance we've developed that might help: Monitoring - Prometheus with Grafana dashboards. Alerting - PagerDuty with intelligent routing. Documentation - GitBook for public docs. Training - certification programs. These have helped us keep our incident count low while still moving fast on new features.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

We also found that failure modes should be designed for, not discovered in production.


 
Posted : 24/10/2025 11:05 pm