Mercedes F1 head of IT discusses data storage – and the woes of backup
Formula One (F1) is like nothing else in the sporting sphere, a brutal sport that takes no prisoners.
It’s an increasingly demanding technical and human challenge that combines cutting-edge technologies and innovation, high performance management, and elite teamwork.
Recently I was invited to tour the Mercedes-AMG Petronas team's stunning garage and trackside facilities at the British Grand Prix at Silverstone.
Mercedes-AMG Petronas is a team of nearly 1,500 employees across two technology campuses who design, develop, manufacture, and race the cars.
There’s no doubt that data is one of the most critical assets a business has: backing it up, managing it and leveraging it effectively leads directly to competitive advantage in practically every industry.
Astonishingly, each of the team's cars has more than 250 sensors that generate up to 500GB of data every Grand Prix weekend, while the Brackley technology campus averages 45TB of data every week in a 97% virtualised estate.
It’s almost like a military operation: teams constantly move and establish trackside bases, construct data centers for collection and analysis, then pack up and move on to the next base to do it all over again.
And this is where Mercedes-AMG Petronas found itself in a bit of a pickle – specifically, over the issue of backup.
Until recently, its backup system was unwieldy, unreliable, and slow – and head of IT Matt Harris, who has been with the team for 20 years in various IT roles, had had enough.
“Over the years we've organically changed substantially. When I first joined the team, we had a small amount of Linux systems, with most of the systems being Windows. Back then that meant we had various different backup solutions to serve the client,” says Harris.
“About seven or eight years ago we got down to one system that could back up both environments. However, with this solution we had one of my team essentially changing tapes or managing backups full-time. Of course, the more clever the person is the better they could make our backup system work, but then you're wasting some of the most talented resources by giving them the most mundane job to do.”
Harris says it was a job no one wanted to do, and hence one that wasn’t being done well.
“At the time around 20 percent of our backups were actually successful. You're talking about a part of IT that no one wants to care about. It’s probably the least sexy part of IT you could possibly think of because you don't care about it until something goes wrong,” says Harris.
“Buying storage was never a hard conversation, but it was always difficult to justify the extra to protect it. We wanted to look at this slightly differently to enable us to recover data at varying levels of recovery points, without something that required management all day, because it's just painful. A bit like what we had done with our storage a number of years ago, we decided to take a step back and determine that we couldn't keep doing this, our team is not big enough to manage backup. We need to make it something that just happens.”
In September 2017 Mercedes-AMG Petronas announced that it had chosen a new partner to solve its backup woes, but only after a lengthy evaluation process.
“We spent around 6-9 months learning what was out there. A massive turning point for us was we met Rubrik, first off on a Webex conversation and we were so blown away with the conversation we had with them that we really didn't look at anything else from that point onwards. We quickly learnt with Rubrik that there was so much that we could do other than just business continuity solutions,” says Harris.
“We looked at seven different solutions, I don't want to start naming other products but you can think of the very well-known traditional backups. With Rubrik it was a different conversation and it just felt comfortable. At the end of the day whoever we chose had to be a part of our team and they couldn’t be a sort of arm that we would have to ring up with a difficult relationship.”
Harris says the team’s Pure Storage servers integrate with Rubrik nicely.
“The ability to do some very clever stuff around backup workloads where we can perform a backup with absolutely no hit on the server. Because Rubrik orchestrates a snapshot, backs it up, mounts it and then does the inline backup offline, it provides us with a point in time backup that we know is absolutely sweet with zero hit on the system so we haven't got that eight hour window where our servers are limping with the load of backing up,” says Harris.
“From the word go when we set it up there has probably been around an hour's worth of training for my team. It's very simple and there's not a lot that you need to understand. Every year the data that is produced in Formula 1 increases significantly, so we're just trying to make sure that in the future we can manage that growth of data.”
I then asked Harris if the team had ever experienced a major data outage – which made him wince.
“I don’t think I’ll ever forget this for as long as I live if I’m honest. In 2008 someone turned around and deleted the wrong volume because back then we had two volumes, one labelled Backup and the other labelled Live. And on that day they were actually named wrongly. You simply can't make that kind of mistake in an everyday business, and you definitely can’t in an F1 team,” says Harris.
“We had an Oracle database and a file backup completely out of sync so we had an Oracle database pointing at data that wasn't there, and data that was different to what the Oracle database was pointing at, and data that was here but not even in the database as someone had deleted it. It was just a nightmare. In those days we were fortunate that we did have backups, but they weren't in sync. The snapshot technology that we have now enables us to protect the Oracle database and the file systems and get them back into service promptly.”