Data center and application sustainability with Rich Kenny from Interact (interview)

Mar 08th, 2024 | 22 min read

This interview is part of the simplyblock Cloud Commute Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, Pandora, Samsung Podcasts, and our show site.

In this installment, we’re talking to Rich Kenny from Interact, an environmental consultancy company, about how their machine-learning-based technology helps customers minimize their carbon footprint and optimize infrastructure cost. He sheds light on their innovative approach to optimizing data center performance for sustainability.

Chris Engelbert: Hello, folks! Great to have you here for our first episode of the Cloud Commute Podcast by simplyblock. I’m really happy to have our first guest, Richard, who’s really interesting. He’s done a lot of things, and he’s going to talk about that in a second. Apart from that, you can expect a new episode every week from now on. So with that, thank you, Richard, for being here. Really happy to have you on board. Maybe just start with a short introduction of yourself.

Rich Kenny: Yeah, cool. So my name’s Rich Kenny. I’m the managing director of Interact. We’re a machine learning based environmental consultancy that specializes in circular economy. And I’m also a visiting lecturer and research fellow at London South Bank University, in the School of Engineering. So a bit of business, a bit of academia, a bit of research. I know a few things about a few things.

Chris Engelbert: You know a few things about a few things. That’s always better than most people.

Rich Kenny: Certainly better than knowing nothing about a lot of things.

Chris Engelbert: That’s fair. I think it’s good to know what you don’t know. That’s the important thing, right? So you said you’re doing a little bit of university work, but you also have a company doing sustainability through AI management. Can you elaborate a little bit on that?

Rich Kenny: Yeah. So we’ve got a product that looks at the performance of enterprise IT: servers, storage, networking. It’s got the world’s largest data set behind it, some very advanced mathematical models and energy calculations, and it basically allows us to look at data center hardware and make really, really good recommendations for lower-carbon compute, reconfiguration of assets, and product life extension. It lets us look holistically at the IT performance of an estate and then apply very advanced techniques to reduce that output, saving energy cost and carbon while doing the same work better. We’ve done about 400 data centers in the last 3 years, and we saw an average energy reduction of about 70%, which in a lot of cases is also a 70% carbon reduction from a Scope 2 point of view. There’s nothing like it on the market at the moment, and we’ve been doing this as a business for probably 3.5 or 4 years, and as a research project for the better part of 7 years.

Chris Engelbert: So, how should I think about that? Is it like a web UI that shows you how much energy is being used, where you can zoom right into a specific server and it gives you a recommendation like, I don’t know, exchange the graphics card or the storage, or whatever?

Rich Kenny: So specifically, it looks at the configuration and what work it’s capable of doing. Every variation in a server’s configuration makes it more or less efficient: it does more or less work per watt. What we do is apply a massive machine learning dataset to any make, model, generation, and configuration of any type of server, and we tell you how much work it can do, how effectively it can do it, and what the utilization pathway looks like. It’s really great to be able to apply that to existing data center architecture. Once you’ve got the utilization and the config, you can say: the work you’re doing with 2,000 servers configured this way, you could do with 150 servers configured that way. And this is how much energy that would use, how much carbon it would generate, and how much work it would do. We can also do things like carbon-shifting scenarios. We can take a service application, say a CRM, that’s spread across 1,000 machines in 20 data centers, using fractional parts of them, and say: this service is using X amount of carbon and costing this much energy. So basically, your CRM is costing X to run from an energy and carbon point of view, and you could consolidate that to Z, for example. So it’s the ability to look at service-level, application-level, and system-level data, and then serve that service more efficiently. We’re not talking about rewriting the application, because that’s one step lower down the stack. We’re talking about doing the same work more efficiently and more effectively by looking at the hardware itself, the actual physical asset. And it’s massive low-hanging fruit, because no one’s ever done this before. It’s not unusual to see consolidation options of 60+%, which is just waste. A lot of it is doing the same work more effectively and efficiently. And that drives huge sustainability outcomes, because you’re just removing stuff you don’t need.
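The service-level view described here, where a CRM runs on fractional parts of many machines across several sites, amounts to attributing each machine's energy to the services that use it. The sketch below assumes a simple proportional split; the `Placement` fields, the share of each machine given to the service, and the grid carbon intensities are all hypothetical illustration values, not Interact's method or data.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One machine hosting part of a service (all values hypothetical)."""
    site: str
    machine_kwh: float      # energy the machine used over the period
    service_share: float    # fraction of the machine attributed to the service, 0..1
    grid_kg_per_kwh: float  # carbon intensity of electricity at that site


def service_footprint(placements: list[Placement]) -> tuple[float, float]:
    """Proportional attribution: each machine contributes its share of energy and carbon."""
    kwh = 0.0
    kg_co2e = 0.0
    for p in placements:
        attributed = p.machine_kwh * p.service_share
        kwh += attributed
        kg_co2e += attributed * p.grid_kg_per_kwh
    return kwh, kg_co2e


# A toy "CRM" spread over three machines in three sites (made-up figures).
crm = [
    Placement("site-a", machine_kwh=1000.0, service_share=0.25, grid_kg_per_kwh=0.2),
    Placement("site-b", machine_kwh=800.0, service_share=0.5, grid_kg_per_kwh=0.4),
    Placement("site-c", machine_kwh=1200.0, service_share=0.1, grid_kg_per_kwh=0.3),
]
energy, carbon = service_footprint(crm)
print(f"CRM: {energy:.0f} kWh, {carbon:.0f} kg CO2e")
```

Re-running `service_footprint` with a placement moved to a lower-intensity site is a crude version of the carbon-shifting scenario; a real model would also need idle power and non-linear utilization curves, which this sketch ignores.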
The transparency bit is really important, because quite often you don’t know what your server can do or how it does it. It’s like, “I bought this, it’s great, it’s new, it must be really, really effective.” But the actual individual configuration, the interplay between the CPU, RAM, and storage, determines how good it actually is at doing its job and how much bang you get for your buck. You can see a variance of 300% within a single generation. Like, “we’ve got the L360, and all the L360s of this generation are pretty much the same.” But they’re not. There’s something like a 300% variance depending on how you actually build the bill of materials.
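The consolidation argument boils down to comparing builds by work per watt and then asking how many machines of the better build cover today's demand. A minimal sketch under made-up assumptions: `ServerConfig`, the abstract "work unit", and every throughput and power figure are hypothetical, chosen only so that 2,000 legacy servers collapse to 150 as in the example above; they are not measurements or Interact's model.

```python
import math
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """A server build; throughput and power figures are illustrative assumptions."""
    name: str
    work_units: float  # sustained work per second at the target utilization
    watts: float       # average power draw at that load

    @property
    def work_per_watt(self) -> float:
        return self.work_units / self.watts


def servers_needed(total_work: float, cfg: ServerConfig) -> int:
    """Smallest whole number of servers of this build that covers the demand."""
    return math.ceil(total_work / cfg.work_units)


def fleet_watts(total_work: float, cfg: ServerConfig) -> float:
    """Power drawn by the whole fleet needed to cover the demand."""
    return servers_needed(total_work, cfg) * cfg.watts


# Hypothetical builds, not real hardware data.
legacy = ServerConfig("legacy build", work_units=10.0, watts=300.0)
tuned = ServerConfig("reconfigured build", work_units=134.0, watts=500.0)

demand = 2_000 * legacy.work_units  # the work 2,000 legacy servers do today

for cfg in (legacy, tuned):
    print(f"{cfg.name}: {cfg.work_per_watt:.3f} work/W, "
          f"{servers_needed(demand, cfg)} servers, {fleet_watts(demand, cfg):,.0f} W")
```

Rounding up with `math.ceil` matters: a fractional server cannot be deployed, so the fleet size is always a whole number that fully covers demand.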

Chris Engelbert: Alright! So if it does things more efficiently, it’s not only about carbon footprint, it’s also about cost savings, right? I guess that’s something that is really interesting for your customers, the enterprises buying it?

Rich Kenny: Yes, absolutely. It’s the first time they’re saving money while working towards sustainability outcomes, other than what you would do in the cloud with, say, GreenOps, where realistically you’re doing financial operations and saying, “I’m gonna reduce carbon,” but really you’re reducing compute, reducing wastage, or removing stranded applications. We’re doing the exact same thing at the hardware level and asking, “how do you do the same work efficiently, rather than just doing it?” So you’re going to get huge cost savings, in the millions. You get thousands of tons of carbon reduction, and none of it has an impact on your business, because you’re just eradicating waste.

Chris Engelbert: Right. So that means your customers are mostly the data center providers?

Rich Kenny: Oh no, it’s primarily enterprise, truth be told, because the majority of data centers operate as colo or hyperscale. Realistically, people have got their kit in other people’s colocation facilities. The colos (colocation operators) are facility managers. They’re not IT specialists. They’re not experts in computers; they’re experts in providing a good environment for that computer. Which is why all the efficiency metrics geared towards the data center have historically been about buildings: “How do we build efficiently? How do we cool efficiently? How do we reduce heat density?” All that sort of stuff. None of that addresses the question “why is the building there?” The building’s there to serve storage and compute. And every colocation operator washes their hands of that and goes, “It’s not our service. Someone else is renting the space. We’re just providing the space.” So you have this really unusual gap, which you don’t see in many businesses, where the supplier has a much higher level of knowledge than the owner. So when you’re talking to someone saying, “I think you should buy this server,” the manufacturer tells you what to buy, and the colo tells you where to put it, but in between is the IT professional, who really has no control over the situation. The IT provider doesn’t tell me how good it is, and the colo doesn’t tell me how to run it effectively. So I get my asset and give it to someone else to manage, and what you get is this perfect storm of nobody really trying to serve it better. That’s what we do. We come in and let you know, “there’s this huge amount of waste here.”

Chris Engelbert: Yeah, that makes sense. So it’s the people or the companies that co-locate their hardware in a data center.

Rich Kenny: Right, or running their own data centers on premises, their own server rooms, or cabinets. We sometimes work with people that have as few as 8 servers. We might recommend changing the RAM configuration or switching out CPUs. Things like that can have 20, 30, 40% benefits but cost almost nothing. So we might see a small server estate that’s very lowly utilized but massively over-provisioned on RAM, mostly because someone, some day, 10 years ago, bought a server and went, “stick 2 TB in it.” And we’d ask, “How much are you using?” The answer: “200 gigs.” “So you’ve got 10 times more RAM than you need, even at peak. Can you just take out half your RAM, please?” It sounds really counterintuitive to take out that RAM and put it on the side, but if you scale up again, you can just plug it back in next week. You’ve been running this for 8 to 10 years, and you haven’t needed anywhere near that. It’s just sitting there drawing energy, doing nothing: no benefit, no speed, no improvement, no performance, just hogging energy. And we’d look at that and go, “that’s not necessary.”
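Right-sizing RAM as in the 2 TB versus 200 GB anecdote can be sketched as peak usage times a safety margin, rounded up to whole memory modules. The 1.5x headroom and 64 GB DIMM size are assumptions for illustration, and real servers have channel-population rules this sketch ignores.

```python
import math


def recommended_ram_gb(peak_used_gb: float, headroom: float = 1.5, dimm_gb: int = 64) -> int:
    """Peak usage times a safety headroom, rounded up to whole DIMMs.

    headroom and dimm_gb are assumptions; real servers also have
    channel-population rules that this sketch ignores.
    """
    if peak_used_gb <= 0 or headroom < 1 or dimm_gb <= 0:
        raise ValueError("need positive peak usage, headroom >= 1, positive DIMM size")
    dimms = math.ceil(peak_used_gb * headroom / dimm_gb)
    return dimms * dimm_gb


installed_gb = 2048  # the "2 TB" in the anecdote
peak_gb = 200        # the "200 gigs" actually used at peak

target = recommended_ram_gb(peak_gb)
print(f"over-provisioned {installed_gb / peak_gb:.1f}x; keep {target} GB, "
      f"shelve {installed_gb - target} GB")
```

Even with 50% headroom over peak, most of the installed capacity could go on the shelf, which matches the "take out half your RAM" advice being conservative.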

Chris Engelbert: Yeah, and since you brought up the example of RAM: most people will probably think that a little bit of extra RAM can’t be that much energy, but accumulated over a whole year it adds up.

Rich Kenny: Yeah, absolutely. RAM can sometimes be as much as 20 or 30% of the energy use of a server, from a configuration level. The CPU is the main driver, at up to 65% of the energy use of a server. I mean, we’re talking non-GPU servers; when it gets to GPUs, we’re talking orders of magnitude. But RAM still uses up to 30% of the power on some of these servers. And if you’re only using 10% of it, you can literally eradicate almost 20% of the combined energy, just by decommissioning certain parts of that RAM, or by removing it and putting it on the shelf until you need it next year or the year after. The industry is so used to over-provisioning that it provisions on day one for the scale it expects in year five. It would be more sensible to provision for years one and two, with the ability to upgrade and grow with the organization. What you’ll see is that you decrease your carbon and energy footprint year on year, and you don’t overpay in month one for the asset. Then in year two you can buy some more RAM, in year three you can buy some more RAM, and in year four you can swap in a CPU you’re buying in year four. By the time you need to use it, you haven’t paid a 300% premium for buying the latest and greatest on day one. So it’s also about effective procurement. You want 20 servers? That’s fine, but buy the servers you need for years one and two, and then in year three, upgrade the components. Year four, upgrade. Year five, upgrade. Incremental improvement. It means you’re not paying a really high sunk energy cost in year one. You’re also saving on procurement cost, because you don’t buy it the second it’s new. Two years later it’s half the price. If you wouldn’t have used it to its full potential in years one and two anyway, you fundamentally get a 50% saving by only buying it in year three. But nobody thinks like that. It’s more like “fire and forget.”
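The staged-procurement argument can be made concrete by pricing each upgrade in the year it is actually needed. A minimal sketch, assuming the same part's price halves every two years (a simplification of "two years later it's half the price"); the plan and every price are hypothetical, and only purchase cost is modelled, not the energy of idle day-one capacity.

```python
def price_in_year(day_one_price: float, year: int, halving_years: float = 2.0) -> float:
    """Price of the same part bought in `year` (year 1 = day one).

    Assumes the price halves every `halving_years` years, a simplification
    of "two years later it's half the price"; real price curves vary.
    """
    return day_one_price * 0.5 ** ((year - 1) / halving_years)


# (year the capacity is actually needed, day-one price); all prices hypothetical.
plan = [
    (1, 6000.0),  # base server sized for years one and two
    (3, 2000.0),  # RAM upgrade
    (4, 3000.0),  # CPU upgrade
]

buy_everything_day_one = sum(price for _, price in plan)
buy_when_needed = sum(price_in_year(price, year) for year, price in plan)

print(f"day one: {buy_everything_day_one:,.0f}; staged: {buy_when_needed:,.0f}")
```

Under this assumption the RAM bought in year three costs exactly half its day-one price, which is the 50% saving described above.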

Chris Engelbert: Especially for CPUs. In three years’ time, there’s quite a lot of development: maybe a new generation, same socket, lower TDP, something like that. Anyhow, you shocked me with the 30%. I think I have to look at my server in the basement.

Rich Kenny: It’s crazy. Especially now that we have persistent RAM, which doesn’t really act like RAM; in some respects it acts more like storage and keeps the data in memory. That stuff is fairly energy-intensive, because it’s sitting there constantly using energy, even when the system isn’t doing anything. But realistically, yeah, your RAM is a relatively big energy user. We know that for every increment of gigabytes, you’ve got an actual wattage figure. So it’s not inconsequential, and it’s a really easy one. It’s not everything we look at, but it’s part of it.
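The "wattage figure per gigabyte" idea suggests a linear model: if RAM accounts for a given share of a server's draw, pulling DIMMs saves that share in proportion to the capacity removed. The 500 W server and the linear watts-per-GB assumption are hypothetical; the 30% share is the upper figure quoted in the conversation.

```python
HOURS_PER_YEAR = 24 * 365


def ram_removal_savings(server_watts: float, ram_share: float,
                        installed_gb: float, removed_gb: float) -> tuple[float, float]:
    """Watts and kWh/year saved by pulling DIMMs.

    Assumes RAM power scales linearly with installed capacity (a fixed
    watts-per-GB figure) and that the rest of the server's draw is unchanged.
    """
    if not 0 <= removed_gb <= installed_gb:
        raise ValueError("removed_gb must be between 0 and installed_gb")
    ram_watts = server_watts * ram_share
    watts_per_gb = ram_watts / installed_gb
    saved_watts = watts_per_gb * removed_gb
    return saved_watts, saved_watts * HOURS_PER_YEAR / 1000


# Hypothetical 500 W server where RAM is 30% of the draw; remove half of 2 TB.
saved_w, saved_kwh = ram_removal_savings(500.0, 0.30, installed_gb=2048, removed_gb=1024)
print(f"{saved_w:.0f} W less, about {saved_kwh:.0f} kWh per year")
```

Real DIMM power also depends on module type, rank count, and activity, so the linear model is only a first approximation.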

Chris Engelbert: Alright, so we had CPUs, and we had RAM. You also mentioned graphics cards. I think if you have a server with a lot of graphics cards, it’s obvious that it’ll use a lot of energy. Anything else that comes to mind? I think hard disk drives are probably worse than SSDs and NVMe drives.

Rich Kenny: Yeah, that’s an interesting one. Storage is a really fascinating one for me, because I think we’re moving back towards tape storage as a carbon-efficient method of storage. And people always look at me and go, “why would you say that?” Well, accept the fact that 60 to 70% of data is worthless, as in you may use it once but never again. That’s a pretty standard metric; I think it may be as high as 90%, meaning data that doesn’t get used again. But say 65% of the data will never get used. What we have is loads of people moving that storage to the cloud and saying they can now access the data immediately, whenever they want, but they’ll never use or look at it again. So it sits there on really highly available SSDs, and I can retrieve this information I never want, instantly.

Well, the SSD wears over time. Every time you write to it, it wears out a bit more; that’s just how flash memory works. HDDs have a much longer life cycle than SSDs, but lower performance. Your average hard drive uses around six watts and an SSD uses four. So your thinking is, “it’s 33% more efficient to use SSDs.” And it is, except that there’s an embodied cost of the SSD, from its manufacture, which is 10 to 15x higher than that of a hard drive. So if you’re storing data that you never use, no one’s ever using that read and write performance. It just sits there with a really high sunk environmental cost until it wears out, and then you may be able to reuse it, or you might not. Realistically, you’re going to get through two or three life cycles of SSDs for every hard drive. If you never look at the data, it’s worthless. You’ve got no benefit there, but there’s a huge environmental cost for all the materials, from a storage point of view. Take another great example: you’ve got loads of storage in the cloud that you never read, but you have to store it, like medical data for a hundred years. Why are you storing that data on SSDs, in the cloud, for a hundred years, and paying per gigabyte? You could literally save a million pounds’ worth of storage onto one tape and have someone like Iron Mountain run your archive as a service for you. If you need any data, they’ll retrieve it and pass it into your cloud instance. There’s a really good company called Tes in the UK that has a great archival system, and when I was talking to them, it really made sense in terms of systems-of-systems thinking. They run tape. They take all your long-term storage and put it on tape, but they give you an RTO (recovery time objective) of six hours. You just raise a ticket telling them you need the information on this patient, and they retrieve it and put it into your cloud instance.
You won’t have it immediately, but no one needs that data instantaneously. As it is, it’s sitting there on NVMe storage, which has a really high environmental and energy cost, not to mention the financial cost, just to be readily available when you never need it. Instead, stick it in a vault on tape for 30 years and have someone bring it when you need it. You cut your cost 99-fold.
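The drive comparison mixes two quantities that the interview never converts into a common unit: operational energy (6 W versus 4 W) and embodied footprint (an SSD costing roughly 10 to 15 times a hard drive to make, reading Rich's ratio that way). The sketch below keeps them separate. The 10-year HDD and 4-year SSD lifetimes are hypothetical, picked to land in the "two or three SSD life cycles per hard drive" range.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy for a drive drawing `watts` continuously for a year."""
    return watts * HOURS_PER_YEAR / 1000


def ssds_per_hdd_life(hdd_life_years: float, ssd_life_years: float) -> int:
    """SSD generations needed to cover one hard drive's service life."""
    return math.ceil(hdd_life_years / ssd_life_years)


hdd_kwh = annual_kwh(6.0)  # ~6 W average hard drive (figure from the interview)
ssd_kwh = annual_kwh(4.0)  # ~4 W SSD (figure from the interview)
operational_saving = 1 - ssd_kwh / hdd_kwh

# Hypothetical lifetimes; embodied footprint in units of "one hard drive",
# using Rich's embodied-cost ratio read here as 10-15x per SSD.
generations = ssds_per_hdd_life(hdd_life_years=10, ssd_life_years=4)
embodied_low, embodied_high = generations * 10, generations * 15

print(f"HDD {hdd_kwh:.2f} kWh/yr, SSD {ssd_kwh:.2f} kWh/yr, saving {operational_saving:.0%}")
print(f"{generations} SSDs per HDD lifetime: {embodied_low}-{embodied_high}x the embodied footprint")
```

Deciding which option wins overall would need an embodied-carbon figure per drive in the same units as the operational energy, which the conversation does not give.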

Chris Engelbert: That makes a lot of sense, especially for data that needs to be stored for regulatory reasons or stuff like that. And I think some people try to solve or mitigate that a little bit by using tiering technologies, going from NVMe down to HDD, and eventually maybe to something like S3, or even S3 Glacier. But I think tape is still one step below that.
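Age-based tiering of the kind mentioned here can be reduced to a rule that maps time since last access to a storage tier. The thresholds and tier names are hypothetical, this is not any particular product's lifecycle policy, and it ignores transition and retrieval charges, which the following answer shows can dominate.

```python
from datetime import date, timedelta

# (maximum days since last access, tier); thresholds are hypothetical.
TIER_RULES = [
    (7, "nvme"),
    (90, "hdd"),
    (365, "object"),  # e.g. S3-style object storage
]
ARCHIVE_TIER = "archive"  # e.g. S3 Glacier or tape


def choose_tier(last_access: date, today: date) -> str:
    """Place data on the cheapest tier its access pattern allows."""
    age_days = (today - last_access).days
    for max_age, tier in TIER_RULES:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER


today = date(2024, 3, 8)
for age in (1, 30, 200, 3650):
    print(age, choose_tier(today - timedelta(days=age), today))
```

Keeping the rules as an ordered list makes the policy easy to audit: the first threshold that an object's age fits under wins, and anything older falls through to the archive tier.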

Rich Kenny: Yeah, S3 Glacier storage. I heard a horror story of guys moving from S3 Glacier storage as an energy- and cost-saving mechanism, without understanding that you pay per file (per request), not just per terabyte or gigabyte. They ended up with a six-figure bill to move the data over. Still, they say it’s going to save them three grand a year. But now the payback point is like 50 decades.

It’s like you don’t realize it when you make these decisions. There’s a huge egress cost there, whereas how much would it have cost to take that data and just stick it onto a tape? 100? 200 quid? You’re talking about significant cost savings, and environmentally, you’re not looking after the systems. You’re not looking after the storage. You’re using an MSP to hold that storage for you and guarantee retrieval within the timescales you want. It’s a very clever business model, and I think we need to revisit when tape is the best option for long-term archival storage. From an energy and cost point of view it’s very clever, and sustainability-wise it’s a real win. So yeah. Tape as a service. It’s a thing. You heard it here first.
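The horror story is a payback-period calculation plus a pricing model that charges per object. Taking "six figures" at its minimum of 100,000 and the quoted saving of 3,000 a year, the simple undiscounted payback is roughly 33 years; "like 50 decades" is a figure of speech, but the conclusion holds. The per-object and per-GB fees below are placeholders, not real provider prices; they only illustrate why the same volume of data can cost wildly different amounts to move depending on how many files it is split into.

```python
def payback_years(one_off_cost: float, annual_saving: float) -> float:
    """Simple (undiscounted) payback period."""
    if annual_saving <= 0:
        raise ValueError("annual_saving must be positive")
    return one_off_cost / annual_saving


def migration_cost(n_objects: int, fee_per_object: float,
                   total_gb: float, fee_per_gb: float) -> float:
    """Cost of a move billed per object plus per gigabyte.

    The fee values passed in below are placeholders, not real provider prices.
    """
    return n_objects * fee_per_object + total_gb * fee_per_gb


# "Six figures" taken at its minimum, against "three grand a year".
print(f"payback: {payback_years(100_000, 3_000):.1f} years")

# Same 10 TB, different file counts: per-object fees dominate for many small files.
for n in (10_000, 100_000_000):
    print(n, migration_cost(n, fee_per_object=0.001, total_gb=10_000, fee_per_gb=0.01))
```

Checking the object count before a migration, not just the total size, is what would have exposed the per-file charges in advance.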

Chris Engelbert: So, going from super old technology to a little bit newer stuff: what would drive sustainability in terms of new technologies? I hinted at a lower TDP for new CPUs, and probably the same goes for RAM. I think chips get lower in wattage, or power usage, over time? Are there any other specific factors?

Rich Kenny: Yeah, I think the big one for me is that the new DDR5 RAM is really good. It unlocks a lot of potential at the CPU level, as in the most recent jump in efficiency is not coming from CPUs themselves. Moore’s law slowed down in 2015, and I still think it’s not hitting the level it was. But the next generation for us is ASIC-based, as in application-specific integrated circuits. There’s not much further the CPU can go. We can still get some more juice out of it, but it’s not doubling every 2 years. So the CPU is not where it’s at. ASICs are very much where it’s at now: specific chips built for very specific functions, like Google’s custom chips built entirely for video encoding for YouTube, 100x more efficient than a CPU or a GPU at that task. We saw the rise of the ASIC through Bitcoin, right? Specific mining ASICs. So I think specific chips are really good news, and the new RAM is decent.

Additionally, the GPU wars are an interesting one for me, because we’ve got GPUs, but there’s no really definable benchmark for comparing how good a GPU is, other than total work. So we have this thing where it’s like, how much total grunt do you have? But we don’t really have metrics for how much grunt per watt. GPUs have always been one of those things you power supercomputers with. So it does 1 million FLOPS, and this many MIPS, and all the rest of it. But the question has to be, “how well does it do it? How good is it at doing its job?” How much total work it can do is irrelevant. So we need a rebalancing of that. It’s not there yet, but I think it will come soon, so we can understand what GPU-specific functions are. The really big change for us, though, is behavioral change. I don’t think it’s technology. It’s understanding how we use our assets, and visualizing that use in terms of non-economic measures. So basically, being decent digital citizens is, I think, the next step. I don’t think it’s a technological revolution; I think it’s an ethical revolution, where people apply grown-up thinking to technology problems rather than expecting technology to solve every problem. So yeah, there are incremental changes, and we’ve got some good stuff. But realistically, the next step of evolution is how we apply our human brains to solve technological problems, rather than throwing technology at problems and hoping for a solution.
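The difference between "total grunt" and "grunt per watt" is easy to illustrate: ranking the same two accelerators by each metric can give opposite orders. The card names and figures are invented, and peak TFLOPS is itself only a rough proxy for useful work.

```python
# Hypothetical accelerators: (peak throughput in TFLOPS, board power in watts).
accelerators = {
    "card_a": (100.0, 700.0),
    "card_b": (60.0, 300.0),
}


def per_watt(name: str) -> float:
    """Throughput per watt for one accelerator."""
    tflops, watts = accelerators[name]
    return tflops / watts


by_total_work = sorted(accelerators, key=lambda n: accelerators[n][0], reverse=True)
by_work_per_watt = sorted(accelerators, key=per_watt, reverse=True)

for name in by_work_per_watt:
    tflops, _ = accelerators[name]
    print(f"{name}: {tflops:.0f} TFLOPS total, {per_watt(name) * 1000:.0f} GFLOPS/W")
```

The "bigger" card wins on total throughput, but the smaller one does more work for each watt, which is the comparison Rich argues is missing.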

Chris Engelbert: I think it’s generally a really important thing that we try not to just throw technology at problems, or even worse, create technology in search of a problem all the time.

Rich Kenny: We’re a scale-up business at Interact. We’re doing really, really well, but we don’t act like a scale-up. Last year I was mentoring some startup guys and some projects that we’ve been doing in the UK, and 90% of people were applying technology to solve a problem that didn’t need solving. The question I would ask these people is, “What does this do? What is this doing? And does the world need that?”

Well, it’s a problem. I feel like you’ve created a problem because you have the solution to a problem. It’s a bit like an automatic tin opener. Do we need a diesel-powered chainsaw tin opener to open tins? Or do we already kind of have tin openers? How far do we need to innovate before it’s fundamentally useless?

I think a lot of problems are like, “we’ve got AI, and we’ve got technology, so now we’ve got an app for that.” And it’s like, maybe we don’t need an app for that. Maybe we need to just look at the problem and go, “is it really a problem?” Have you solved something that didn’t need solving? A lot of ingenuity and waste goes into solving problems that don’t exist. And then, conversely, there’s loads of stuff out there that solves really important problems, but it gets lost in the mud, because it can’t articulate the problem it’s solving.

And in some cases, you know, the ones that are winning are the ones that sound very attractive. I remember there was a med-tech one about stress management. It was providing all these data points on what levels of stress you’re dealing with. And it’s really useful to know that I’m very stressed. But other than telling me, through all these psychological factors, that I’m feeling stressed, what is the solution in the product? Other than giving me data telling me that I’m really stressed, there isn’t one. It doesn’t do anything. It just tells you that data. And it’s like, right, and now what? “Then we can take that data, and it’ll solve the problem later on.” It’s like, no, you’re just creating a load of data to tell me things that I don’t think have any real benefit. If you’ve got a solution, like “with this data we can make this inference, we can solve this problem,” that’s really useful. But actually, you’re just creating a load of data, and I go, “And what do I do with that?” And you go, “Don’t know. It’s up to you.” Okay, well, it tells me that it looks like I’m struggling today. Not really helpful. Do you know what I mean?

Chris Engelbert: Absolutely! Unfortunately, we’re out of time. I could chat about that for another hour. You must have been so happy when proof of work finally got removed from some of the blockchain stuff. Anyway, thank you very much. It was a real delight.

I love chatting and just laughing, because you hear all the stories from people, especially about things you’re normally not part of, like the RAM. Like I said, you completely shocked me with the up to 30%. Obviously, RAM takes some amount of energy, but I didn’t know it takes that much.

Anyway, I hope that some other folks actually learned something, too, and apply a little bit of ethical thinking in the future, whenever we create new startups, build new data centers, or deploy new hardware, and think about sustainability.

Rich Kenny: Thank you very much. Appreciate it.