Great post! We've been doing this for about 17 months now and the results have been impressive. Our main takeaway was that the human side of change management is often harder than the technical implementation. We also found that integration with our existing tools was smoother than anticipated. For anyone starting out, I'd recommend adding compliance scanning to the CI pipeline.
The end result was 99.9% availability, up from 99.5%.
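To put those two availability figures in perspective, here's a quick back-of-the-envelope sketch that converts each percentage into implied downtime per year. The 365-day year and the helper name `downtime_hours_per_year` are my own assumptions for illustration; only the two percentages come from our numbers above.

```python
# Assumption: a 365-day year (8760 hours), ignoring leap years.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in hours) implied by an availability percentage."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


before = downtime_hours_per_year(99.5)  # availability before the change
after = downtime_hours_per_year(99.9)   # availability after the change

print(f"99.5% availability -> {before:.2f} hours of downtime per year")
print(f"99.9% availability -> {after:.2f} hours of downtime per year")
```

So going from 99.5% to 99.9% works out to roughly 43.8 hours of downtime a year shrinking to about 8.76 hours, which is about a fivefold reduction.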
For context, we're using Elasticsearch, Fluentd, and Kibana.
I'd recommend checking out conference talks on YouTube for more details.
There are several engineering considerations worth noting: first, network topology; second, monitoring coverage; third, performance tuning. We spent significant time on testing and it was worth it: performance testing showed a 50% latency reduction. Code samples are available on our GitHub if anyone wants to take a look.
One thing I wish I'd known earlier: cross-team collaboration is essential for success. Knowing that up front would have saved us a lot of time.
Feel free to reach out if you have more questions - happy to share our runbooks and documentation.