Case Study How Monday.com Built a Repeatable Playbook for Complex Performance Issues at Scale At scale, understanding performance in a distributed system means seeing beyond metrics and into real production execution.
Runtime Monitoring: What to Measure When Everything Looks “Normal” Listen, I’ve lived through the kind of production disasters that give you gray hair. The ones that hurt the most didn’t even set off a…
Beyond Observability: Why AI Coding Agents Need Runtime Guardrails AI-generated code doesn’t fail in theory; it fails in production. What the Amazon dev4 incident revealed (QCon session by May Walter).
Best 5 Runtime Application Self-Protection (RASP) Tools If you’ve managed AppSec for a few years, you’ve seen firewalls fall short. A web application firewall (WAF) catches obvious perimeter threats, but it’s blind…
Case Study How ZoomInfo Identified and Eliminated 4am OOM Crashes with AI Every night at 4am, a scheduled cron job inside one of ZoomInfo’s services saturated the event loop.
8 Datadog Alternatives You Should Know About Look, let’s be totally blunt for a second: Datadog is a masterpiece. It has been the “nobody ever got fired for buying IBM” choice for…
The Best 10 AI SRE Tools in 2026 A few years ago, I was on a team that deployed to production every other week. We had a deployment day, always a Thursday, and…