Research Interview

The call of duty: Platform engineering for 60m+ gamers

FEATURED GUESTS

Clark Polo

Director of Platform Engineering

When millions of gamers worldwide log in to play Call of Duty, they're relying on infrastructure that must handle unpredictable traffic spikes, constant DDoS attacks, and zero tolerance for downtime. Clark Polo, Director of Platform Engineering at Demonware, shares battle-tested strategies for building platforms that scale under extreme conditions and what platform engineers across industries can learn from their approach.

Main insights

  • Gaming platforms face uniquely unpredictable traffic patterns that can exceed forecasts by orders of magnitude, requiring rigorous capacity planning and contingency strategies

  • Internal Developer Portals (IDPs) combined with data-driven work categorization unlock significant productivity gains by reducing time-to-information and enabling evidence-based decisions

  • Measuring and categorizing work types (KTLO, feature development, tech debt) reveals hidden capacity constraints and enables teams to optimize their allocation strategically

  • Platform engineering success depends less on headcount and more on fixing how teams actually work, supported by accurate data and proper tooling

Clark Polo leads platform engineering at Demonware, the Activision Blizzard studio responsible for online services across the Call of Duty franchise and other major titles. With decades of collective team experience managing infrastructure for tens of millions of concurrent players, Demonware has developed battle-tested approaches to platform reliability, capacity planning, and developer productivity.

The unique infrastructure challenges of gaming at scale

Gaming platforms operate under constraints that differ significantly from traditional enterprise environments. As Clark explains, "We can have a lot of past data that can infer to a forecast that says we may expect X amount of traffic for this new game and then the launch will come out and it might be a lot bigger than that or a lot less."

This unpredictability creates cascading challenges across the platform stack. If capacity planning misses the mark, the consequences are immediate and visible: players can't connect, matches fail to start, and social media erupts with complaints. Unlike many enterprise applications where traffic patterns follow predictable business cycles, game launches and seasonal events can generate traffic spikes that are difficult to model accurately.
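To make the forecasting problem concrete, here is a minimal capacity-planning sketch. The scenario names, multipliers, and player counts are all illustrative assumptions, not Demonware's actual model; the point is that when a launch can land "a lot bigger or a lot less" than the forecast, you plan against a range of scenarios plus headroom rather than a single number.

```python
# Hypothetical capacity-planning sketch. Multipliers and numbers are
# illustrative only; a real model would be fit to historical launch data.

def capacity_targets(forecast_ccu, headroom=0.3, scenario_multipliers=None):
    """Return provisioned-capacity targets (concurrent users) per scenario."""
    scenarios = scenario_multipliers or {
        "below_forecast": 0.5,   # launch underperforms the model
        "at_forecast": 1.0,      # forecast holds
        "viral_launch": 3.0,     # traffic far exceeds the model
    }
    return {
        name: int(forecast_ccu * mult * (1 + headroom))
        for name, mult in scenarios.items()
    }

targets = capacity_targets(forecast_ccu=2_000_000)
# Even the "at forecast" case carries 30% headroom; the viral case
# demands nearly 4x the base forecast.
```

The design choice worth noting is that the headroom is applied to every scenario, so even a correct forecast leaves a buffer for the failure modes the forecast cannot see.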

The infrastructure complexity extends beyond simple capacity. Demonware manages a hybrid architecture spanning private data centers, public cloud providers, and dedicated game servers distributed globally. "We receive petabytes of data all the time," Clark notes. "We receive millions and millions of customers hitting our services every second."

Adding to these challenges is the constant threat landscape. "People are constantly trying to attack these games," Clark explains. DDoS (Distributed Denial of Service) attacks - where attackers flood services with requests to overwhelm them - are a persistent reality. "A lot of time, honestly, is just to try to bring the game down. Just the malicious action of, hey, let's DDoS this game."

The team has developed rigorous DDoS mitigation systems, but must also plan for less obvious failure scenarios. "What do you do if your cert manager just can't handle more load and that's where all your traffic flows?" Clark asks. "There's those scenarios where you're like, okay, did we think about that?"
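One common building block in the kind of mitigation Clark describes is per-client rate limiting, which sheds flood traffic before it reaches backend services. The sketch below is a textbook token bucket, not Demonware's actual system; real DDoS defense layers many such mechanisms at the network edge.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: a standard traffic-shedding
    primitive. Illustrative only, not a production DDoS defense."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        """Admit a request if enough tokens remain; refill lazily."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)   # 5 req/s steady, bursts of 10
results = [bucket.allow() for _ in range(15)]
# The initial burst of 10 is admitted; the flood beyond it is shed.
```

A per-client instance of this (keyed by IP or account) turns "millions of customers hitting our services every second" into bounded load per caller.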

The data-driven case for internal developer portals

Clark has become a vocal advocate for Internal Developer Portals (IDPs), going so far as to build a demonstration portal using Claude to help stakeholders visualize the concept. His motivation stems from a fundamental problem: time-to-information.

"You got to meet with people, you got to talk with people, you got to look through documents, you got to remember conversations and sometimes time to information is just grueling," he explains. An IDP addresses this by providing a single source of truth for services, ownership, health metrics, and organizational structure.

But Clark's vision extends beyond a simple service catalog. He sees the convergence of IDPs and AI as transformative for platform engineering. "I don't expect people to look through this and understand all the ins and outs and all the data. But if you have an AI that can look through the data and look at anomalies and look at all these things and create just very quick summaries about what's going on, I do think there's a lot of power in that."

The foundation for this vision is data quality. "If you don't have the proper underlying systems that infer the interface, the IDP, then it doesn't matter. It just looks pretty, has all the bells and whistles, but it doesn't really inform anything useful," Clark emphasizes. An IDP must connect to accurate data sources - JIRA, ServiceNow, monitoring systems - and those connections must be validated continuously.
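Clark's "data quality first" point can be sketched as a validation pass over catalog entries: before the portal renders anything, check that each entry has an owner and that its upstream sync is fresh. The entry shape and field names below are hypothetical; real IDPs such as Backstage define richer schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical catalog-entry shape, assumed for illustration.
@dataclass
class ServiceEntry:
    name: str
    owner: Optional[str]    # team responsible for the service
    source: str             # upstream system (JIRA, ServiceNow, monitoring)
    last_synced: datetime   # when upstream data was last pulled

def validate(entries, max_age=timedelta(hours=24)):
    """Flag entries whose data would mislead rather than inform:
    missing ownership, or stale syncs from the upstream source."""
    now = datetime.now(timezone.utc)
    problems = []
    for e in entries:
        if not e.owner:
            problems.append((e.name, "no owner recorded"))
        if now - e.last_synced > max_age:
            problems.append((e.name, f"stale sync from {e.source}"))
    return problems

entries = [
    ServiceEntry("matchmaking", "team-online", "JIRA",
                 datetime.now(timezone.utc)),
    ServiceEntry("leaderboards", None, "ServiceNow",
                 datetime.now(timezone.utc) - timedelta(days=3)),
]
issues = validate(entries)
# "leaderboards" is flagged twice: no owner, and a stale ServiceNow sync.
```

Running checks like this continuously is what keeps the portal from being something that "just looks pretty" on top of bad data.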

Reframing KTLO: The data that drives better decisions

One of Clark's most impactful initiatives has been introducing rigorous work categorization, particularly around KTLO (Keep The Lights On) work. This category - also called tech ops or operations - includes maintenance, upgrades, incident response, and other activities required to keep services running.

"KTLO has become synonymous with non-valuable work," Clark observes. "But how are you saying that keeping your business running is not valuable?" This mindset creates a problematic dynamic where engineers resist operational work in favor of feature development, even though operational work is essential.

Clark's approach reframes the conversation: "Keeping the lights on for your business is very valuable. It's a core part of your role. It's a non-negotiable and it must be held to that same standard of feature development."

The real power comes from measuring work categories systematically. When Clark introduced KTLO measurement for one team, the results were striking: "We found out that that team was 80 to 90% KTLO driven. That is not healthy. No wonder why the team feels miserable because they're working on KTLO all the time."

With this data in hand, the team could make targeted improvements. "After a couple months of just measuring and understanding, we were able to reduce it down to 40%. And then that rest of the percentage was able to be focused on other areas. Now the team is in a much healthier place."
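The measurement step behind numbers like "80 to 90% KTLO" can be as simple as tagging tickets by category and computing each category's share of effort. The labels and story points below are hypothetical; in practice the categories would come from JIRA fields or a similar tracker.

```python
from collections import Counter

# Hypothetical work categories; real data would come from tracker fields.
CATEGORIES = {"ktlo", "feature", "tech_debt"}

def work_breakdown(tickets):
    """Given (category, story_points) pairs, return each category's
    percentage share of total effort."""
    effort = Counter()
    for category, points in tickets:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        effort[category] += points
    total = sum(effort.values())
    return {cat: round(100 * pts / total, 1) for cat, pts in effort.items()}

# A sprint dominated by operational work, like the team Clark describes:
sprint = [("ktlo", 8), ("ktlo", 13), ("ktlo", 5),
          ("feature", 3), ("tech_debt", 2)]
breakdown = work_breakdown(sprint)   # ktlo share: 26/31 ≈ 83.9%
```

Tracking this breakdown sprint over sprint is what turns "the team feels miserable" into a number you can drive down, the way Clark's team moved from 80-90% to 40%.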

This approach reveals a critical insight: "I tell people it's rarely a head count issue. Sometimes, yes, of course it is, but a lot of times it's just, if I gave you five more people, you're going to be in the same problem next year because you haven't fixed the way you actually work."

The 80/20 rule in platform reliability

Clark references the 80/20 rule frequently when discussing platform planning: "If you plan 80%, the 20% execution should be the easy part." This principle guides Demonware's approach to launch readiness and incident management.

The team has developed extensive contingency planning based on lessons from past incidents. "Demonware has become so good at contingencies because of a lot of the incidents we've encountered in the past," Clark notes. This institutional knowledge, combined with systematic planning, helps the team anticipate and prepare for failure scenarios.

However, Clark acknowledges the challenge of balancing forward-looking improvements with operational demands. "The demand for new features and new games and new services, that doesn't stop. So the question really becomes, do we actually have the luxury to go back and re-architect and add on new features if it's still working?"

This tension is familiar to platform teams across industries. The pressure to deliver new capabilities competes with the need to modernize architecture, reduce technical debt, and adopt new practices. Clark's response is pragmatic: focus on areas where new approaches deliver clear value, like the IDP initiative, while maintaining operational excellence in proven systems.

The road analogy: Platform as infrastructure

Clark offers a compelling analogy for platform engineering's role: "Platform engineering is analogous to a road. If you have a developer that's driving a really nice car, pick your poison, I'll pick a Porsche - it doesn't matter how fast the Porsche is, if the road is full of debris, if it's full of potholes, it doesn't matter."

He extends the analogy to capture platform complexity: "You can start saying the nuances of, yeah, but what if a lot of cars are driving? Then you start introducing things like stop lights, stop signs. You start introducing lanes. You start introducing maybe an express lane. And then you have maybe your police who are your SREs who kind of make sure everyone's kept in check."

This framing emphasizes that platform engineering exists to enable developers to move faster and safer. The platform team's job is to clear obstacles, provide structure, and maintain order - not to build the fastest car or dictate the destination.

Key takeaways

  • Measure before you optimize: Without data on how teams actually spend their time (KTLO vs. feature work vs. tech debt), you cannot make informed decisions about capacity, hiring, or process improvements. Start by categorizing and measuring work systematically to identify bottlenecks and optimize team allocation.

  • IDPs are data platforms first, interfaces second: The value of an Internal Developer Portal depends entirely on the quality and accuracy of underlying data. Invest in data pipelines, validation, and integration before focusing on the user interface. Consider how AI agents might leverage this data layer to provide intelligent insights and summaries.

  • Reframe operational work as valuable: KTLO and operational work must be treated as essential and valuable, not as a distraction from "real" development. Teams drowning in operational toil need process improvements and automation, not just more headcount. Use data to demonstrate when KTLO work becomes unhealthy and take action to rebalance.

  • Plan for unpredictability with contingencies: In high-scale, high-stakes environments, rigorous contingency planning based on past incidents is essential. The 80/20 rule applies: invest heavily in planning and preparation so execution becomes the easier part. Build institutional knowledge around failure scenarios and maintain it systematically.


Weave Intelligence may collect information about your activity on our website.

To learn more, please read our Privacy Policy.


© 2026 Weave Intelligence
