Follow-up: Best practices for Kubernetes pod security in production

20 Posts
17 Users
0 Reactions
97 Views
(@michelle.ross286)
Posts: 0

Great post! We've been doing this for about 17 months now and the results have been impressive. Our main lesson was that observability is not optional - you can't improve what you can't measure. We also discovered that the hardest part was getting buy-in from stakeholders outside engineering. For anyone starting out, I'd recommend integrating with your incident management system.

For context, we're using Kubernetes, Helm, ArgoCD, and Prometheus.

I'd recommend checking out relevant blog posts for more details.


 
Posted : 23/11/2025 3:24 am
(@matthew.ross327)
Posts: 0

On the operational side, here are some practices we've developed. Monitoring: Prometheus with Grafana dashboards. Alerting: a custom Slack integration. Documentation: Notion for team wikis. Training: pairing sessions. These have helped us maintain a low incident count while still moving fast on new features.
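
To make the alerting piece concrete, here is a minimal sketch of what a custom Slack integration might do at its core: turn one alert into a Slack message body. The alert shape (status, labels, annotations) is modeled on what Alertmanager sends to webhook receivers, but the function name, the default values, and the message format are my own assumptions, not the poster's actual integration. Sending the payload to a Slack incoming webhook URL is left out on purpose.

```python
import json


def build_slack_message(alert: dict) -> dict:
    """Format one alert dict as a Slack incoming-webhook payload.

    Hypothetical helper: the field names below are assumptions based on
    the Alertmanager webhook alert shape, with defaults for missing keys.
    """
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "firing").upper()
    name = labels.get("alertname", "unknown")
    severity = labels.get("severity", "none")
    summary = annotations.get("summary", "no summary provided")
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return {"text": f"[{status}] {name} (severity: {severity}): {summary}"}


# Illustrative alert; the values are made up for this example.
example_alert = {
    "status": "firing",
    "labels": {"alertname": "PodCrashLooping", "severity": "critical"},
    "annotations": {"summary": "Pod restarted 5 times in 10 minutes"},
}
print(json.dumps(build_slack_message(example_alert)))
```

Keeping the formatting separate from the HTTP call makes it easy to unit test the message text without touching Slack.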

Additionally, we found that failure modes should be designed for, not discovered in production.

For context, we're using Jenkins, GitHub Actions, and Docker.

The end result was a 90% decrease in manual toil.

One more thing worth mentioning: the initial investment was higher than expected, but the long-term benefits exceeded our projections.

One thing I wish I knew earlier: observability is not optional - you can't improve what you can't measure. Would have saved us a lot of time.

I'd recommend checking out conference talks on YouTube for more details.

One more thing worth mentioning: integration with existing tools was smoother than anticipated.

Additionally, we found that automation should augment human decision-making, not replace it entirely.


 
Posted : 24/11/2025 9:06 am
(@gregory.ortiz371)
Posts: 0

Our data supports this. We found that the most important factor was designing for failure modes up front rather than discovering them in production. We initially struggled with scaling issues but found that feature flags for gradual rollouts worked well. The ROI has been significant - we've seen a 70% improvement.
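
For anyone wondering what a gradual rollout behind a feature flag can look like, here is a minimal sketch, assuming a percentage-based scheme where each user is hashed into a stable bucket. The function names and the flag and user identifiers are hypothetical; the post does not say which flag system was actually used.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, percent: int) -> bool:
    """Return True if the flag is on for this user at the given rollout percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # A user whose bucket is below the threshold stays enabled as percent grows.
    return rollout_bucket(flag, user_id) < percent


# Illustrative usage: ramp a hypothetical flag from 10% to 50% of users.
for pct in (10, 50):
    print(pct, is_enabled("new-admission-policy", "user-123", pct))
```

Because the bucket depends only on the flag name and the user ID, raising the percentage only ever adds users, so nobody flips back and forth during the ramp.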

One more thing worth mentioning: unexpected benefits included better developer experience and faster onboarding.

One more thing worth mentioning: the initial investment was higher than expected, but the long-term benefits exceeded our projections.

The end result was an 80% reduction in security vulnerabilities.

The end result was a 3x increase in deployment frequency.

Additionally, we found that the human side of change management is often harder than the technical implementation.

For context, we're using Terraform, AWS CDK, and CloudFormation.

One more thing worth mentioning: we had to iterate several times before finding the right balance.

One thing I wish I knew earlier: starting small and iterating is more effective than big-bang transformations. Would have saved us a lot of time.


 
Posted : 26/11/2025 8:11 am
(@william.smith189)
Posts: 0

Technical perspective from our implementation. Architecture: serverless with Lambda. Tools used: Istio, Linkerd, and Envoy. Configuration highlights: GitOps with ArgoCD apps. Performance benchmarks showed 3x throughput improvement. Security considerations: secrets management with Vault. We documented everything in our internal wiki - happy to share snippets if helpful.

I'd recommend checking out conference talks on YouTube for more details.

I'd recommend checking out relevant blog posts for more details.

Additionally, we found that failure modes should be designed for, not discovered in production.

For context, we're using Grafana, Loki, and Tempo.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

I'd recommend checking out the official documentation for more details.

For context, we're using Jenkins, GitHub Actions, and Docker.



 
Posted : 27/11/2025 4:34 am
(@christine.carter463)
Posts: 0

Key takeaways from our implementation: 1) Automate everything possible 2) Implement circuit breakers 3) Practice incident response 4) Build for failure. Common mistakes to avoid: skipping documentation. Resources that helped us: Team Topologies. The most important thing is prioritizing learning over blame.
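
Since circuit breakers come up in the takeaways above, here is a minimal sketch of the pattern, assuming a simple closed / open / half-open state machine driven by consecutive failures. The class name, thresholds, and injectable clock are illustrative choices of mine, not a description of the poster's setup.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures    # consecutive failures before opening
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self._clock = clock                 # injectable so tests can fake time
        self._failures = 0
        self._opened_at = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                # Open (or re-open after a failed half-open trial).
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Usage is a matter of wrapping the risky call, for example `breaker.call(fetch_status)` where `fetch_status` is whatever function talks to the flaky dependency. Callers should catch `CircuitOpenError` and fall back to a cached or default response.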

Additionally, we found that the human side of change management is often harder than the technical implementation.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.


 
Posted : 27/11/2025 10:22 pm