How 5 companies got their developers to care about cloud costs

Previously the purview of dedicated centres of excellence, or even exclusively the procurement and finance teams, cloud cost management is rapidly becoming a required skill for anyone who consumes cloud resources on a day-by-day basis—and that includes software developers.

The emerging approach for cloud-first organisations is to have a central team that can manage broad consumption issues, like using the cheapest possible infrastructure for the job and negotiating committed-use discounts with vendors, while responsibility for the cost of individual services is pushed out to engineering teams that are incentivised to run as cost-effectively as possible, without sacrificing business value.

“You need that central expertise but also engineers to understand what they are spending in the cloud. … You want them to feel empowered to do something about their spending and how it stacks up to the value they are driving,” said Eugene Khvostov, vice president of product and engineering at cost optimisation specialist Apptio.

“Every organisation is different and has different maturity levels and styles, but some of the more successful cases we have seen push that information to the edge and get engineers involved in that challenge, rather than issuing a mandate from on high.”

This can be a difficult shift to make, however, especially for organisations accustomed to lengthy procurement cycles and those that look to insulate their software developers from worrying about the total cost of their own services in a push for greater digital momentum. But now, as cloud costs continue to rise in the wake of the COVID-19 pandemic, the tide might just be turning.

Optimising costs, not just code: Introducing finops

In their 2020 O’Reilly book, Cloud FinOps, J.R. Storment and Mike Fuller explain that in the old world of procuring enterprise hardware, engineers and operations teams would have to think about the cost of infrastructure well in advance.

“Now, in the cloud, they can throw company dollars at the problem whenever extra capacity is required,” they wrote.

Although this has allowed for faster, more-effective development cycles, it also introduced a new set of considerations around the cost and business impact of those infrastructure choices. “At first, this feels foreign and at odds with the primary focus of shipping features. Then they quickly realise that cost is just another efficiency metric they can tune to positively impact the business,” they wrote.

A senior product manager for cost engineering at streaming giant Spotify, Janisa Anandamohan, wrote in a recent blog post, “we know engineers are natural optimisers when it comes to reliability, security, performance, etc. And now we’re telling them, ‘Hey, add costs into the mix.’”

While that optimisation piece is one part of the puzzle, the more significant change is how to bring together previously disconnected groups in engineering, finance, and beyond. This organisation-wide approach to proactively managing cloud costs is commonly known as finops.

As defined by Storment and Fuller, “finops brings financial accountability to the variable spend model of cloud. But that description merely hints at the outcome. The cultural change of running in cloud moves ownership of technology and financial decision-making out to the edges of the organisation.”

A cultural shift of this magnitude naturally brings enterprise-scale challenges. Getting engineers to act was the finops challenge most commonly cited by respondents to the 2021 State of FinOps report from the Linux Foundation-led FinOps Foundation, with 39 per cent admitting to struggling to gain broad buy-in from their engineers.

“One known finops challenge is to not only start the practice up, but to encourage and incentivise cloud users (like devs and engineers) to participate in cloud cost management,” the report said.

Here’s how five companies have gone about realigning their teams and incentivising engineers to take better care of their cloud costs.

Airbnb reins in spiralling cloud hosting costs

A few years ago, popular travel accommodation booking website Airbnb realised it had a big problem: Its monthly Amazon Web Services (AWS) cloud bills were growing faster than company revenue.

“We had a problem, but we lacked an in-depth understanding of how teams use AWS resources, and how planned architectural and infrastructure changes would impact our future AWS costs,” Airbnb engineers Jen Rice and Anna Matlin wrote in a company blog post.

However, given Airbnb’s “you build it, you run it” engineering philosophy, Rice and Matlin quickly realised that “adding significant friction for our engineers would be met with heavy resistance.” So the Airbnb engineers set out to build the cost-attribution data needed to show the company’s data-driven developer community just how big a problem it faced, and in doing so gain some buy-in for finops.

At Airbnb, the approach to consumption attribution “was to give teams the necessary information to make appropriate tradeoffs between cost and other business drivers to maintain their spend within a certain growth threshold. With visibility into cost drivers, we incentivise engineers to identify architectural design changes to reduce costs, and also identify potential cost headwinds,” Rice and Matlin wrote.
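To make the idea of attributing spend to teams and holding it within a growth threshold concrete, here is a minimal sketch. It is not Airbnb’s implementation: the line-item data, the team tags, the function names, and the 10 per cent threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical billing line items: (month, owning team tag, cost in USD).
# Real cloud bills carry far more detail; this is illustrative data only.
LINE_ITEMS = [
    ("2020-01", "search", 1000.0),
    ("2020-01", "payments", 500.0),
    ("2020-02", "search", 1300.0),
    ("2020-02", "payments", 510.0),
]


def cost_by_team(items, month):
    """Sum the cost attributed to each team for one month."""
    totals = defaultdict(float)
    for item_month, team, cost in items:
        if item_month == month:
            totals[team] += cost
    return dict(totals)


def teams_over_threshold(items, previous, current, max_growth):
    """Return {team: growth} for teams whose spend grew more than max_growth."""
    before = cost_by_team(items, previous)
    after = cost_by_team(items, current)
    flagged = {}
    for team, cost in after.items():
        base = before.get(team)
        if base:  # skip teams with no baseline spend to compare against
            growth = (cost - base) / base
            if growth > max_growth:
                flagged[team] = growth
    return flagged


# Assumed threshold: spend may grow at most 10 per cent month over month.
flagged = teams_over_threshold(LINE_ITEMS, "2020-01", "2020-02", max_growth=0.10)
for team, growth in flagged.items():
    print(f"{team}: spend grew {growth:.0%}")
```

In this made-up data only the search team exceeds the threshold, which is the kind of signal a team could then weigh against its other business drivers.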

This shift brought with it a centralised cost-efficiency team, armed with “a birds-eye view of the entire Airbnb ecosystem,” they wrote, and tasked with finding significant cost-savings opportunities.

For example, Airbnb now leans heavily on AWS Savings Plan options, complete with “a set of prepared responses that move certain workloads on and off Savings Plan to keep utilisation healthy,” they wrote. This team is now supported by a set of AWS cost champions, who sit in every product development organisation to provide support at the local level.

The result of all of this effort has been a major organisation-wide shift. As Rice and Matlin wrote:

In addition to the various technical and organisational efforts to manage AWS costs, we saw a profound cultural change toward cost awareness and management. This shift was both top-down and grassroots. Leaders mentioned the company-wide cost goal during all-hands meetings. The finance team created a company-wide award for financial discipline, presented by the CFO, which recognised employees who had driven important cost-savings initiatives. In scrappy Airbnb style, the infrastructure organisation held a cost-savings hackathon that spawned a number of impactful efficiency projects. Engineers learn best practices from one another and discuss new savings opportunities in a Slack channel. Upon launch, the AWS Attribution Dashboard became the most viewed dashboard at Airbnb and has since remained in the top list. Seeing this cultural change, we are optimistic that the recent cost reductions Airbnb achieved are not a one-off, but rather a new muscle that we will only strengthen with time.

As a result, Airbnb saw a $63.5 million year-over-year decrease in hosting costs, which contributed to a 26 per cent decline in Airbnb’s cost of revenue in the nine months that ended in September 2020.

Sainsbury’s realigns engineering around cost accountability

Like many enterprises today, British retailer Sainsbury’s has focused its cloud investment on building new features and digital capabilities for customers, which led to a rapid escalation in cloud service consumption. “Somewhere down the line, the operations team was trying to keep a lid on spend,” group CIO Phil Jordan told InfoWorld.

Now, following an intensive four-month change and training program throughout the COVID-19 pandemic, developers, operations, and product people are all part of what the retailer calls “engineering families,” which have full life-cycle accountability to the business.

This new operating model pushes end-to-end accountability for a product or service out to the engineering teams, including cost management, vulnerability management, risk management, and partner management, all without being overseen by the now-disbanded Service Operations team.

Those teams are now directly incentivised in line with a new set of DevOps Research and Assessment (DORA) metrics (deployment frequency, mean lead time for changes, mean time to recover, and change failure rate), plus service performance, total cost of ownership, and development cadence.
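For readers unfamiliar with the four DORA metrics, the sketch below shows one simple way they could be computed from a log of deployments. It is an assumption-laden illustration, not how Sainsbury’s measures its teams: the record fields, the sample data, and the `dora_metrics` function are hypothetical, and the definitions used are simplified versions of the commonly cited ones.

```python
from datetime import datetime
from statistics import mean

# Hypothetical deployment records: when the change was committed, when it was
# deployed, whether it caused a failure, and when service was restored.
DEPLOYS = [
    {"committed": datetime(2021, 3, 1, 9), "deployed": datetime(2021, 3, 1, 13),
     "failed": False, "restored": None},
    {"committed": datetime(2021, 3, 2, 9), "deployed": datetime(2021, 3, 2, 17),
     "failed": True, "restored": datetime(2021, 3, 2, 18)},
    {"committed": datetime(2021, 3, 4, 10), "deployed": datetime(2021, 3, 4, 12),
     "failed": False, "restored": None},
    {"committed": datetime(2021, 3, 5, 8), "deployed": datetime(2021, 3, 5, 10),
     "failed": False, "restored": None},
]


def hours(delta):
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


def dora_metrics(deploys, period_days):
    """Compute simplified versions of the four DORA metrics for one period."""
    if not deploys:
        raise ValueError("at least one deployment is required")
    lead_times = [hours(d["deployed"] - d["committed"]) for d in deploys]
    failures = [d for d in deploys if d["failed"]]
    recoveries = [hours(d["restored"] - d["deployed"])
                  for d in failures if d["restored"] is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "mean_lead_time_hours": mean(lead_times),
        "mean_time_to_recover_hours": mean(recoveries) if recoveries else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


metrics = dora_metrics(DEPLOYS, period_days=5)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

The other measures Sainsbury’s tracks, such as total cost of ownership, would need cost data alongside this kind of delivery log.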

Cost-management tools from vendor Apptio have been brought in to give engineering a more transparent view of their specific cost base, tooling Jordan said the company is placing “a lot of faith in to give those new teams full transparency of cost.”

Sainsbury’s piloted this new mode of working with the data engineering team throughout 2020, and “it was unequivocal that we demonstrated it drove efficiency, speed of delivery and colleague sentiment improved,” Jordan said.
