Why software matters

Years ago I worked at a consulting company, writing software for a German car manufacturer. The project was a multi-year endeavor and had all the bits and pieces in place to become the next generation of legacy software that nobody wants to touch. We are talking about a multi-million investment and a 50-person team of engineers, business analysts, architects and managers.

We were essentially building a Java EE monolith with a JavaScript frontend. Back then, in 2017, Kubernetes was starting to emerge as an interesting option, and we experimented with deployments, but it didn’t affect the architectural decisions much. We kept building a humongous monolith, and the project was only set to go live after the core functionality had been built.

About four years in, the team was finally ready to launch the first alpha to select customers. Yes, you read that right: we kept developing for years without ever testing whether the architecture and the resulting system could sustain the load. Internally we were doing Scrum, but we only occasionally did a handover to fit the car manufacturer’s waterfall project planning.

I left the consulting company before the system went live. Inevitably, the developers’ lack of experience in running the software manifested itself in operations that mostly amounted to restarting things, with no easy way out of the architectural mess.

There are two things I want you to take away from this story.

First: the customer in this case lacked insight into these operational issues. They didn’t understand the technical implications, nor was there much exchange about the architecture of the system as a whole. Their only contribution was requirements in the form of mockups and user stories. This left out the most important aspect: developing expertise in building, running and maintaining software systems.

The consulting company was only given short contracts with renewals after distinct milestones, which in theory kept the car manufacturer flexible. In reality, however, the manufacturer was effectively 100% dependent on the consultants. The consulting company was run to maximize profit, so investing in employees was an afterthought. As a result, the software (even though it followed some best practices for Java) was subpar at best. I’m sure they eventually figured this all out, but it wasn’t baked in from the start.

Second: outsourcing software development almost always results in these problems surfacing. Shared responsibility means that, when push comes to shove, nobody is going to be accountable. The consultants will justify the outcome by pointing out that they completed the work according to specifications; the internal team on the car manufacturer’s side lacked the technical expertise and insight to spot these systemic problems as they arose.

This anecdote captures some of the dynamic between ‘build’ and ‘run’ teams, and it offers a glimpse of the tug of war that Google deliberately embraces between development teams and their SRE counterparts. The sole responsibility of an SRE team is to make the system operate and scale properly while working with the developers to improve it. If developers push feature velocity too far and cause too many breakages, SREs can, as a last resort, hand the pager back to shift accountability to the developers. Funding for SRE positions usually comes out of the software engineering team’s budget. If the SREs aren’t able to help, the headcount can be defunded and handed back.

Software Maintainability

So to build good software, you need a sense of responsibility for the product. With external contractors this is hard to achieve, as they don’t have a stake in the company that eventually has to earn money using the software. Consultants will get their money and be happy. I may be going out on a limb here, but I think it’s highly unlikely that you will ever get a refund for bad technical execution.

The cost of developing software is often the smaller part of a software system’s lifecycle. The cost of maintaining and operating that software over years or decades usually makes up the larger part of the total cost of ownership. One article on software maintenance claims that project development makes up only 10% of total costs, while long-term maintenance is somewhere in the ballpark of 90%.

Figure: Development of software maintenance costs as a percentage of total cost [Floris and Harald, 2010]

Krzysztof Jackowski neatly summarized the work needed to keep a mobile app working after initial development is done. That is where the real challenge lies, and where earlier investments (or the lack thereof) surface. If you pay attention to long-term maintainability from the start, it will pay off later, when you inevitably have to change the architecture of your system.

Maintainability in the Automotive sector

Well, you might think that my former team eventually got the hang of all these problems and everything ran smoothly. That might be the case, but at what cost? Another important aspect to consider is what the story implies about the car manufacturer’s mentality toward software development. Allow me to interpret this a bit:

  • Software isn’t critical, thus we can outsource it.
  • Software development isn’t one of our strengths.
  • Even if it costs more, it’s easier/better to use consultants than to invest in building knowledge within our company.
  • Long-term maintainability / development isn’t a goal.

The automotive industry in particular has built up an extensive network of subcontractors and external suppliers for parts. Car manufacturers have perfected just-in-time delivery and limit their focus to final assembly. And now they are trying to apply this methodology to software development as well.

The mentality that software comes second and isn’t part of the core business can be observed in many German companies. IT is often considered a cost center rather than something the entire company should embrace as part of its customer value proposition. This is short-sighted and has turned out to be a monumental mistake, as the absence of any European high-tech company from the global top ten shows.

What Tesla is doing differently

With the Model S, Tesla led the way to the future by removing almost all buttons, knobs and irrelevant user interface elements from the car and replacing them with a touch screen. It reminded me a lot of the moment when the first Apple iPhone came out without a keyboard. As a BlackBerry fanboy back then, I obviously wasn’t impressed and called it blasphemy that Apple was trying to impose on mobile users.

Apple took away the complicated hardware buttons and focused on making the software great instead!

Tesla’s Model S was equally radical in this regard. A single big screen gives you all the knobs, buttons and other UI elements you need to configure your car as desired. Software is at the core of their product offering. They embraced the flexibility gained from shipping regular software updates with non-trivial changes over the years (“feature drops”). Tesla has developed its car operating system since the Model S launched in 2012, has gathered experience along the way, and was one of the few companies to launch an autopilot through a software update.

Recently Mercedes-Benz announced that they would start developing MB.OS, their own operating system for the car of the future. They estimate that the first cars will be delivered in 2024/2025. Volkswagen is also running into trouble delivering the software for its new all-electric ID.3. This shows that over the years car companies have missed the opportunity to innovate past the traditional “cockpit feeling”, where every BMW driver needs to feel like they are flying a fighter jet.

Coming back to where software quality comes into play: cars are built to last for decades; software generally is not. Browsers now get updates every six weeks. Windows has moved to rolling releases, and they are gaining popularity on Linux. Not too long ago, Java started releasing a new version every six months. You need to consider your choices carefully when you spin up a new project, because long-term maintenance will cost you. A lot.

I recently interviewed freelancers for a PHP project that I have developed over the last ten years. The ones I didn’t want to work with were those who were quick to suggest adopting some framework and asked me why I hadn’t done so already. Most of them came at me with “Why aren’t you using Symfony? It’s the industry gold standard.” My reply: “Long-term maintenance is more important.”

I started developing that application back when PHP 4/5 was still a thing. Since then, I have ported it up to PHP 7.4 without much effort. Because I haven’t used any big frameworks, upgrades have always been rather simple. But imagine doing this for a “modern” Node.js application like the ones BMW is embracing in its cars. Whenever a major new Node.js release comes around the corner, I hope you don’t have to update a billion dependencies. I’ve been there. Done that. It. Is. Not. Fun.

Summary

First: buy a Tesla! Their software is 10 years ahead of anyone else in the car manufacturing industry. They have over-the-air updates at no extra cost and understand how to think “software first”! The competition will have a hard time catching up.

Second: make smart choices when you start your project. Consider long-term maintenance as potentially the biggest cost factor in your project. Think about what kind of complexity you can afford and what the project will look like three, five and ten years from now. Bake flexibility into the application you are building, because your initial assumptions might change rapidly!

Published by

Kordian Bruck

I'm a TUM Computer Science alumnus. Pizza enthusiast. Passionate about SRE, beautiful code and Club Mate. Currently working as an SRE at Google. Opinions and statements in this blog post are my own.