Research Interview

CEO lessons from 10 years of observability & platform engineering

FEATURED GUESTS

Mirko Novakovic

CEO @ Dash0

The observability landscape has transformed dramatically over the past decade, driven by the rise of Kubernetes, microservices, and cloud-native architectures. Yet despite collecting exponentially more telemetry data, many organizations still struggle to identify and resolve incidents faster than they did years ago. The fundamental challenge isn't a lack of data - it's finding the signal in an ever-growing haystack of noise.

Main Insights

  • Product-led growth has become essential for developer tools, replacing traditional top-down enterprise sales approaches

  • OpenTelemetry standardization enables five-minute tool switching and solves the serverless observability problem

  • Modern applications generate trillions of traces daily, but 90% of collected data provides no actionable insight

  • AI agents will fundamentally reshape both how we build software and how we observe it in production

Mirko Novakovic brings a unique perspective to these challenges. As founder and former CEO of Instana (acquired by IBM), he built one of the first observability platforms designed for the Kubernetes era. Now leading Dash0, he's applying those hard-won lessons to create a new generation of observability tooling built for AI-driven development workflows and platform engineering teams.

The shift to product-led growth in observability

One of the most significant changes Novakovic observed over the past decade is how developers and platform teams evaluate and adopt observability tools. "The way customers buy software has changed in the developer space," he explains. "It's not anymore a top-down, I go to the CIO and sell an observability tool. Mostly it's a grassroots adoption."

This shift mirrors the consumerization of enterprise software. Developers and SREs now expect to sign up, test, and evaluate tools without talking to salespeople. They want transparent pricing, quick onboarding, and the ability to make informed decisions before engaging with vendors. "We would never, before we install an app on our iPhone, talk to a salesperson and do a contract to get that app installed," Novakovic notes.

This product-led growth motion has become table stakes for observability vendors. Platform teams need to evaluate tools quickly, often across multiple options, before committing to enterprise contracts. The vendors that make this process frictionless - with clear documentation, self-service trials, and fast time-to-value - win developer mindshare before the procurement conversation even begins.

OpenTelemetry as the great enabler

The emergence of OpenTelemetry as an industry standard has been transformative for both vendors and users. Before OpenTelemetry, every observability vendor created proprietary formats for traces, logs, and metrics. This created vendor lock-in and made it nearly impossible to get unified visibility across different components of modern applications.

"The problem is if you get more and more what I call serverless components," Novakovic explains. "You use stuff that you don't really own anymore. Could be a Lambda function on AWS, could be a Supabase database service, could be Cloudflare as your CDN, could be Vercel as your platform on the edge for your UI."

OpenTelemetry solves this by standardizing the data format and semantic conventions. Instead of each vendor using different tag names for the same concept (serverName, server, host_name), OpenTelemetry defines a single standard: host.name. This standardization means that telemetry from AWS Lambda, Vercel, your own services, and managed databases can all be correlated together seamlessly.
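The normalization step can be sketched in a few lines. This is a minimal illustration, not OpenTelemetry SDK code: the alias table maps the vendor-specific spellings from the text onto the real OTel semantic convention key `host.name`, and the helper function is hypothetical.

```python
# Illustrative sketch: fold vendor-specific attribute names onto the
# OpenTelemetry semantic convention so telemetry from different sources
# can be correlated on a single key. The alias table is an example.
ALIASES = {
    "serverName": "host.name",
    "server": "host.name",
    "host_name": "host.name",
    "host.name": "host.name",
}

def normalize(attributes: dict) -> dict:
    """Rewrite known aliases onto their standard OTel key."""
    return {ALIASES.get(key, key): value for key, value in attributes.items()}

# Telemetry from two hypothetical sources now joins on one key.
lambda_span = normalize({"serverName": "ip-10-0-0-1"})
vendor_log = normalize({"host_name": "ip-10-0-0-1"})
assert lambda_span["host.name"] == vendor_log["host.name"]
```

Once every producer emits `host.name`, correlation across Lambda, Vercel, and self-hosted services becomes a simple join on the shared attribute.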

For platform teams, this translates to unprecedented flexibility. "Once you are OpenTelemetry, you can just switch tools like this," Novakovic says. "You can just point your OpenTelemetry collector to the right endpoint and you can literally get up and running in five minutes, even in a complicated environment."
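In practice the switch reduces to repointing the exporter in the Collector configuration. A hedged sketch (the endpoint URL is a placeholder, not any specific vendor's ingest address):

```yaml
# OpenTelemetry Collector: traces arrive via OTLP and flow out to
# whichever backend the exporter endpoint names. Switching vendors
# means changing this one endpoint value.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp:
    endpoint: https://otlp.example-backend.com   # <- point at the new vendor

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
```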

The complexity explosion and cost crisis

While tooling has improved, the fundamental challenge has grown exponentially harder. Novakovic draws a stark contrast between applications 25 years ago and today. "When I started my career, you had an application server, WebLogic or WebSphere, and then you deployed your application on that thing and you had a database. That's it." Those architectures deployed twice a year. Today's reality is hundreds of microservices across multiple clouds and regions, deploying dozens of times per day with feature flags and progressive rollouts.

This complexity explosion has created a data crisis. "Today we are talking to customers who have trillions of traces per day. Trillions of traces, trillions of logs, trillions of metrics," Novakovic reveals. The haystack has grown 100x larger, making it harder to find the needle despite better analytics and AI.

This volume carries eye-watering costs. Observability deals and expenditures reportedly reach tens or even hundreds of millions of dollars for a single customer, underscoring a hard truth: unrestricted collection doesn't scale financially or operationally.

The counterintuitive truth: Less data, more insight

Perhaps the most provocative insight is that more telemetry doesn't equal better observability. "Collecting more data doesn't mean that you have more insights," Novakovic states. "To be really honest, I see it every day. 90% of the data we collect is just bullshit. It's stuff you will never look at."

The structural problem is duplication and noise: millions of near-identical traces add storage and processing cost without increasing actionable signal. Novakovic proposes a practical alternative - intelligent sampling and dynamic instrumentation. "I could see an agent in the future saying, oh, I have an error. I will turn on more instrumentation for that part, collect the data so I understand it better, and then I turn it off again." That pattern preserves fidelity for anomalous behavior while minimizing routine volume.
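One half of that pattern, the keep-or-drop decision on finished traces, can be sketched as follows. The 1% baseline rate is an illustrative choice, not a recommendation from the interview:

```python
import random

# Sketch of anomaly-triggered sampling: error traces are always kept at
# full fidelity, while routine healthy traces are heavily downsampled.
BASELINE_RATE = 0.01  # keep ~1% of healthy traces (illustrative)

def keep_trace(has_error: bool, rng=random.random) -> bool:
    """Decide whether to export a finished trace."""
    if has_error:
        return True               # anomalies keep full fidelity
    return rng() < BASELINE_RATE  # routine traffic is sampled down

# Error traces always survive; healthy traces rarely do.
kept = sum(keep_trace(False) for _ in range(100_000))
print(f"kept ~{kept} of 100,000 healthy traces")
```

The dynamic-instrumentation half Novakovic describes would sit upstream of this: on detecting an error, an agent raises instrumentation detail for the affected component, then lowers it once the anomaly is understood.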

AI agents as the new users of observability

AI coding agents are already reshaping development workflows. When code is generated at scale - sometimes tens of thousands of lines per change - traditional human review and static pre-deploy checks become impractical. The operational response will be deployment patterns that place observability at the center: feature flags, staged rollouts (0.1%, 1%, 5%, ...), and automated evaluation gates driven by telemetry.

Novakovic envisions agents making rollout decisions based on observability signals: "The agent will look at observability data. Is it running? Does it have errors? Is it slow? Does it scale?" Based on those metrics an agent can progressively increase traffic or roll back and feed failure context back into a coding agent for remediation. Observability will need APIs and programmatic interfaces optimized for agent consumption, not just dashboards for humans.
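That decision loop can be sketched as a simple gate over telemetry. The stage percentages and error/latency budgets below are illustrative assumptions, not values from the interview:

```python
# Sketch of an agent-style rollout gate: traffic is promoted through
# stages only while telemetry stays within budget. Thresholds and
# stage values are illustrative.
STAGES = [0.1, 1, 5, 25, 100]   # percent of traffic
MAX_ERROR_RATE = 0.01           # 1% error budget
MAX_P99_MS = 500                # latency budget

def next_action(stage_index: int, error_rate: float, p99_ms: float):
    """Return ('promote', i), ('rollback', 0) or ('done', i)."""
    if error_rate > MAX_ERROR_RATE or p99_ms > MAX_P99_MS:
        # Roll back and hand the failure context to a coding agent.
        return ("rollback", 0)
    if stage_index + 1 < len(STAGES):
        return ("promote", stage_index + 1)
    return ("done", stage_index)

action, idx = next_action(0, error_rate=0.002, p99_ms=180)
print(action, STAGES[idx])  # promote 1
```

A real agent would read `error_rate` and `p99_ms` from an observability API rather than receive them as parameters, which is exactly why programmatic interfaces matter.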

Platform teams as the buyers and gatekeepers

Platform engineering teams increasingly own observability as a platform capability. Rather than each team running its own stack, platform teams standardize instrumentation, collectors, retention policies, and guardrails. They become the nexus for vendor choice, secure defaults, and operational patterns that scale.

A new wrinkle is the rise of "citizen developers" - business users empowered by AI to build applications. This echoes the old shadow-IT problem: "Either your business users will have 500 lovable apps somewhere flying around, or you provide them with an environment where they can code and you have that in your architecture connected to your system inside of your observability and security guardrails," Novakovic warns. The practical choice for platform teams is to provide safe, self-service environments with built-in observability and security rather than try to block emergent behavior.

The convergence of security and observability

Novakovic sees observability and security converging, a trend already visible in large acquisitions and vendor strategy shifts. The remaining barriers are largely organizational: security is owned by CISOs while observability sits with platform and engineering leaders. As automation and AI blur operational boundaries, unified workflows and tooling that span detection, telemetry, and automated remediation become more compelling.


Key takeaways

  • Embrace product-led growth for developer tools. Platform teams expect self-service trials, transparent pricing, and fast onboarding. Design for grassroots adoption inside your enterprise.

  • Leverage OpenTelemetry for flexibility and speed. Standardize instrumentation to retain vendor portability and correlate telemetry across managed and serverless components.

  • Focus on signal, not volume. Invest in intelligent sampling, dynamic instrumentation, and anomaly-triggered fidelity to reduce cost while improving time to resolution.

  • Prepare for AI agents as primary users. Build programmatic observability interfaces and deployment guardrails so agents can safely test, monitor, and iterate in production.


Weave Intelligence may collect information about your activity on our website.

To learn more, please read our Privacy Policy.


© 2026 Weave Intelligence
