Ronen Dar on GPU Orchestration for Building ML Models
In this episode, Andy Leonard and Frank La Vigne sit down with Ronen Dar, the co-founder and CTO of Run AI, to explore the world of artificial intelligence and GPU orchestration for machine learning models.
Ronen shares insights into the challenges of utilizing GPUs in AI research and how Run AI’s platform addresses these issues by optimizing GPU usage and providing tools for easier and faster model training and deployment. The conversation delves into the concept of fractional GPU usage, allowing multiple workloads to run on a single GPU, making expensive GPUs more accessible and cost-effective for organizations.
Links
- Run AI https://www.run.ai/
- Acquired Podcast Episode on Nvidia https://www.acquired.fm/episodes/nvidia-the-machine-learning-company-2006-2022
Show Notes
04:40 GPU technology enabled for cloud AI workloads.
07:00 Run AI enables sharing expensive GPU resources for all.
11:59 As enterprise AI matures, organizations become more savvy.
15:35 Deep learning: GPUs for speed, CPUs as a fallback.
16:54 LLMs running on GPUs, exploding in the market.
23:29 NVIDIA created CUDA to simplify GPU use.
26:21 NVIDIA’s success lies in accessible technology.
28:25 Solve GPU hugging with quotas and sharing.
31:15 Team lead manages GPU quotas for researchers.
35:51 Rapid changes in business and innovation.
40:34 Passionate problem-solver with diverse tech background.
43:38 Thanks for tuning in, subscribe and review.
Transcript
Greetings, listeners. Welcome back to the Data
Speaker:Driven Podcast. I'm Bailey, your AI host with
Speaker:the most data, that is, bringing you insights from the ether
Speaker:with my signature wit. In today's episode, we're
Speaker:diving deep into the heart of artificial intelligence's engine room,
Speaker:GPU orchestration. It's the unsung hero
Speaker:of AI research, optimizing the raw power needed to fuel
Speaker:today's most advanced machine learning models. And
Speaker:who better to guide us through this labyrinth of computational complexity than
Speaker:Ronen Dar, the cofounder and CTO of Run AI, the
Speaker:company that's making GPU resources work smarter, not
Speaker:harder. Now onto the show.
Speaker:Hello, and welcome to Data Driven, the podcast where we explore the emergent fields
Speaker:of artificial intelligence, data engineering, and overall data
Speaker:science and analytics. With me as always is my favoritest
Speaker:data engineer in the world, Andy Leonard. How's it going, Andy? It's
Speaker:going well, Frank. How are you? I'm doing great. I'm doing great.
Speaker:We're recording this February 1, 2024. And as I said to my
Speaker:kids yesterday, January has been a long year.
Speaker:We're only, like, one month into the year, and it was a pretty
Speaker:wild ride. But I can tell we're gonna have a blast today,
Speaker:because we're gonna geek out on something that I kinda sort of understand,
Speaker:but not entirely, and it's GPUs. And in the virtual green room, we were chit
Speaker:chatting with some folks, but let me do the formal introduction
Speaker:here. Today with us, we have Dr. Ronen Dar, cofounder and CTO
Speaker:of Run AI, a company at the forefront of GPU
Speaker:orchestration, and he has a distinguished career in technology.
Speaker:His experience includes significant roles at Apple. Yes, that
Speaker:Apple. Bell Labs. Yes. That Bell Labs.
Speaker:And at Run AI, Ronen is instrumental in optimizing
Speaker:GPU usage for AI model training and deployment,
Speaker:leveraging his deep passion for both academia and startups.
Speaker:And he
Speaker:and Run AI are key players in the AI revolution. Ronen's
Speaker:contributions are pivotal in shaping and powering the
Speaker:future of artificial intelligence. Now I will add that in
Speaker:my day job at Red Hat, Run AI has come up a couple of times.
Speaker:So this is definitely, definitely
Speaker:an honor to have you on the show, sir. Welcome.
Speaker:Thank you, Frank. Thank you for inviting me. Hey, Andy. Good to
Speaker:be here. I love it. Love Red Hat. We're big
Speaker:fans of Red Hat. We're working closely with many people at
Speaker:Red Hat, and love that. Right? Love OpenShift,
Speaker:love Red Hat, love Linux. Yeah. Cool. Cool.
Speaker:Yeah. So for those who don't know exactly, I kinda know
Speaker:what Run AI does, but can you explain exactly
Speaker:what it is Run AI does and why GPU
Speaker:orchestration is important. Yes.
Speaker:Okay.
Speaker:So Run AI is a software
Speaker:AI infrastructure platform. So we
Speaker:help machine learning teams to get much more
Speaker:out of their GPUs, and we provide
Speaker:those teams with abstraction layers and tools
Speaker:so they can train models and deploy models
Speaker:much easier, much faster. And
Speaker:so we started in 2018, six years
Speaker:ago. It's me and my cofounder, Omri. Omri is the CEO.
Speaker:He's amazing. I love him. We've known each other for many
Speaker:years. We met in academia, like, more than 10 years ago,
Speaker:and we started Run AI together, and we started
Speaker:Run AI because we saw that there are big challenges
Speaker:around GPUs, around orchestrating
Speaker:GPUs and utilizing GPUs. We saw back then,
Speaker:in 2018, that GPUs are going to be very, very important.
Speaker:It's like the basic component
Speaker:that any AI company needs to train models,
Speaker:right, and deploy models. So we saw that GPUs are going to be critical, but
Speaker:there are also a lot of challenges with utilizing GPUs.
Speaker:I think back then, GPUs were relatively new in
Speaker:the data center, in the cloud.
Speaker:GPUs were very well known in the gaming
Speaker:industry. Right? We spoke before on gaming. Right? Like, a lot of
Speaker:key things there that GPUs have been
Speaker:enabling. But in the data center, they were relatively new, and the
Speaker:entire software stack that
Speaker:is running the cloud and the data center was built for
Speaker:traditional microservices applications that are running
Speaker:on commodity CPUs. And AI workloads are different. They are
Speaker:much more compute intensive. They
Speaker:run on GPUs, maybe on multiple nodes, multiple
Speaker:machines of GPUs, and GPUs are also very different.
Speaker:Right? They are expensive, very scarce in the data center.
Speaker:So the entire software stack was built for something else,
Speaker:and when it comes to GPUs, it was really hard for many people to
Speaker:actually manage those GPUs. So we came in, and we
Speaker:saw those gaps. We've built Run AI on top of
Speaker:cloud native technologies like Kubernetes and containers. We're
Speaker:big fans of those technologies, and
Speaker:we added components around scheduling, around
Speaker:GPU fractioning. So we enable
Speaker:multiple workloads to run on a single GPU and
Speaker:essentially over-provision GPUs. So we built this engine, which we
Speaker:call the cluster engine, that runs in GPU
Speaker:clusters. Right? We help machine learning teams to pool all of their GPUs into
Speaker:one cluster running that engine, and that engine provides a lot of
Speaker:performance and a lot of capabilities from those GPUs. And
Speaker:on top of that, we built this control plane
Speaker:and tools for machine learning
Speaker:teams to run their Jupyter notebooks, to run
Speaker:training jobs, batch jobs, to deploy their models, right, just to
Speaker:have tools for the entire life cycle of AI
Speaker:from training models in the lab to taking those models into
Speaker:production and running them and serving actual users.
Speaker:And that's the platform that we've built, and we're working with machine
Speaker:learning teams across the globe on just managing,
Speaker:orchestrating, and letting them get much more out of their GPUs and essentially
Speaker:run faster, train models faster and in a much easier way, and
Speaker:deploy those models in a much easier and faster and more efficient
Speaker:way. Yeah. The thing that blew me away when I first heard of Run
Speaker:AI, and this would have been 2021
Speaker:ish. No, early
Speaker:2021, I would say, and it was the
Speaker:idea of fractional GPUs. Right? So you can have one,
Speaker:I say one, but, you know, realistically, it's gonna be on, but you can
Speaker:kind of share it out, which I think and we were talking in the virtual
Speaker:green room about how, you know, some of these GPUs,
Speaker:if you can get them, because there's a multi month, sometimes multi
Speaker:year supply chain issue. I mean, these things are expensive bits of
Speaker:hardware, and I think the real value, correct
Speaker:me if I'm wrong, is, like, well, you know, I was talking to
Speaker:somebody the other day, and we were basically talking about how,
Speaker:you know, if you get, like, one laptop with a killer
Speaker:GPU, right, that GPU is really only useful to that one
Speaker:user, whereas if you can kind of put it in a
Speaker:server and use something like Run AI, now everybody in the organization can do
Speaker:that. And these are not trivial expenses. I mean, these are like, you know,
Speaker:you sell a kidney type of costs here.
Speaker:Yeah. Absolutely. Absolutely. First of all, GPUs
Speaker:are expensive. They cost a lot. Right?
Speaker:And we provide technologies like fractional GPUs and
Speaker:other technologies around scheduling that allow
Speaker:teams to share GPUs. Right. So we spoke on
Speaker:GPU fractioning. So that's one layer of
Speaker:sharing, where you have one GPU, which is really expensive.
Speaker:And not all of the workloads,
Speaker:AI workloads, are really compute intensive and require the
Speaker:entire GPU or, you know, maybe multiple GPUs. There are
Speaker:workloads like Jupyter notebooks where you have
Speaker:researchers that are just
Speaker:debugging their code or cleaning their data or doing some simple stuff,
Speaker:and they need just fractions of GPUs.
Speaker:In that case, if you have a lot of data scientists,
Speaker:maybe you wanna host all of their notebooks on
Speaker:a much smaller number of GPUs because, right, each
Speaker:one of them needs just fractions of GPUs. Another big use case
Speaker:for fractions of GPUs is inference.
Speaker:So not all of the models are huge
Speaker:and don't fit into the memory of one
Speaker:GPU. In computer vision,
Speaker:there are a lot of models that are relatively small,
Speaker:they run on a GPU, and you can essentially host multiple of
Speaker:them on the same GPU. Right. So you can have instead of
Speaker:just one computer vision model running on a GPU, host 10
Speaker:of those models on the same GPU and get factors of
Speaker:10x in your cost, in your
Speaker:overall throughput of inference. So that's one
Speaker:use case for fractional GPU, and we're investing heavily just
Speaker:building that technology. Another layer
Speaker:of sharing GPUs comes where you
Speaker:have maybe in your organization multiple teams
Speaker:or multiple projects running in parallel. So
Speaker:for example, maybe OpenAI, they now are working
Speaker:on GPT-5. It's one project. That project needs a
Speaker:lot of GPUs, and they have more projects. Right?
Speaker:More research projects around alignment or around
Speaker:reinforcement learning. You know? DALL-E.
Speaker:Like, they have more than just one project. Then DALL-E, and
Speaker:they have multiple models. Right? Exactly. They have. Right? So each
Speaker:project needs GPUs. Right? Needs a lot of
Speaker:GPUs. So if you can instead of
Speaker:allocating GPUs entirely for each project,
Speaker:you could essentially pool all of those GPUs and share
Speaker:them between those different projects, different teams.
Speaker:And in times where one project is idle and not
Speaker:using their GPUs, other projects, other teams
Speaker:can get access to those GPUs. Now orchestrating all of
Speaker:that, orchestrating that sharing of resources between
Speaker:projects, between teams can be really complex and
Speaker:requires this advanced scheduling,
Speaker:which we're bringing into the game. We're bringing
Speaker:those scheduling capabilities from the high performance computing world,
Speaker:known for those schedulers. And so we're bringing capabilities
Speaker:from that world into the cloud native Kubernetes
Speaker:world. Scheduling around batch scheduling,
Speaker:fairness algorithms, things like that, so teams and projects
Speaker:can just share GPUs in a simple and efficient
Speaker:way. So those
Speaker:are the two layers of sharing GPUs. Interesting. And
Speaker:I think that as this field matures
Speaker:and it matures in the enterprise, I think you're gonna see organizations
Speaker:kind of be more
Speaker:savvy about, like, okay, like you said, like, data scientists,
Speaker:if they're just doing, like, you know, traditional statistical modeling, which really doesn't benefit
Speaker:from GPUs, or they're just doing data cleansing, data engineering.
Speaker:Right? They're probably gonna say, like, well, let's run it on this cluster, and
Speaker:then we'll break it apart into discrete parts where, you
Speaker:know, then we will need a GPU. And I also like the idea that, you
Speaker:know, you're basically doing what I learned in college,
Speaker:which was time slicing. Right? Sounds like this is kind of, like, everything old is
Speaker:new again. Right? I mean, this is, obviously, you know, when you're
Speaker:taking kind of that old mainframe concept and applying it to something like Kubernetes,
Speaker:orchestration is gonna be a big deal, because these are systems that were not
Speaker:built from the ground up to have time slicing. Is that a
Speaker:good kind of explanation? Yeah. Absolutely.
Speaker:Absolutely. I like that analogy. Yeah. Exactly. Time
Speaker:slicing, it's one
Speaker:implementation, yeah, that we
Speaker:enable around fractionalizing GPUs,
Speaker:and I agree. When you have resources, it
Speaker:can be different kind of resources. Right? It can be CPU
Speaker:resources, and networking as well.
Speaker:You know, people created technology to share the
Speaker:networking and communication going through those networks, just the
Speaker:bandwidth of the networking. We're doing it
Speaker:for GPUs. Right. Sharing those
Speaker:resources. And I think now, interestingly,
Speaker:LLMs are also becoming a kind
Speaker:of resource as well, right, that people need access
Speaker:to. Right? You have those models, you have GPT, ChatGPT.
Speaker:A lot of people are trying to get access to
Speaker:that resource, essentially. And I think it's interesting,
Speaker:because you kinda pointed this out, but it's something that I think that
Speaker:if you're in the gen AI space, it's so obvious,
Speaker:like air. You don't think about it. Right? But when you
Speaker:get inference on traditional, somebody once referred to it
Speaker:as legacy AI. Right. But on
Speaker:the inference side of the equation, you don't really need a lot of compute power.
Speaker:Right? Like, it's not really a heavy lift. Right? But with generative
Speaker:AI, you do need a lot of compute on,
Speaker:I guess it's not really inference, but on the other side of the use,
Speaker:while it's actually in use, not just the training. Right. So traditionally,
Speaker:GPU heavy use in training, and then inference, not so
Speaker:much. Now we need heavy use before, after, and during,
Speaker:which I imagine your technology would help because, I mean, look,
Speaker:I love ChatGPT. I'm one of the first people to sign up for
Speaker:a subscription, but even, you know, they had trouble keeping
Speaker:up, and they have a lot of money, a lot of power, a lot of
Speaker:influence. So I mean, this is something that if you're just a
Speaker:regular old enterprise, this is probably something they struggle
Speaker:with. Right? Right. Yeah. I absolutely
Speaker:agree. It's, like, an amazing point, Frank.
Speaker:So one year
Speaker:ago, the inference use case on
Speaker:GPUs wasn't that big. Totally agree. That's also what we
Speaker:saw in the market.
Speaker:Deep learning, convolutional neural networks were
Speaker:running on GPUs,
Speaker:mostly for computer vision applications,
Speaker:but they could also run on CPUs and you could get,
Speaker:like, relatively okay performance.
Speaker:If you needed maybe, like, a very low latency, then
Speaker:you might use GPUs because they're much faster and you get much
Speaker:lower latency. But
Speaker:it was, and it's still, very
Speaker:difficult to deploy models on GPUs compared to just deploying
Speaker:those models on CPUs, because deploying applications on
Speaker:CPUs, you know, people have been doing for so many years.
Speaker:So
Speaker:many times it was much easier for people to just deploy their
Speaker:models on CPUs and not on GPUs, so that was, like, the
Speaker:fallback to CPUs. But
Speaker:then came, as you said, ChatGPT was introduced, a
Speaker:little bit more than a year ago, and that generative
Speaker:AI use case just blew up. It blew up. Right? And it's
Speaker:inference essentially. And those models are
Speaker:so big that they can't really run on
Speaker:CPUs. LLMs are running in production on
Speaker:GPUs, and now the inference use case on
Speaker:GPUs is just exploding in the market.
Speaker:Right now, it's really big. There is a lot of demand for
Speaker:GPUs for inference.
Speaker:And for OpenAI, they need to support this
Speaker:huge scale that, I guess,
Speaker:just they are seeing, such scale, maybe a
Speaker:few more companies, but that's like huge, huge scale.
Speaker:But I think that we will see more and more companies
Speaker:building products based on AI, on
Speaker:LLMs, and we'll see more and more
Speaker:applications using AI, which
Speaker:then that AI runs on GPUs. So that is going to grow,
Speaker:and that's an amazing new market for us around
Speaker:AI, and for me as a CTO, it was so fun to
Speaker:get into that market because it now comes with
Speaker:new problems, new challenges,
Speaker:new use cases compared to deep learning
Speaker:on GPUs. New pains because
Speaker:the models are so big. Right? Right. And
Speaker:challenges around cold start problems, about auto scaling,
Speaker:about
Speaker:just giving access to LLMs. So a lot of
Speaker:challenges, new challenges there. We at Run AI are studying those problems,
Speaker:and we're now building solutions for those problems,
Speaker:and I'm really, really excited about the inference use case. That
Speaker:is very cool. So just, going back a little bit.
Speaker:I was trying to keep up. I promise. But
Speaker:I get that Run AI's platform
Speaker:supports fractional GPU usage.
Speaker:It also sounds to me, maybe I misunderstood,
Speaker:that in order to achieve that, you first had to, or
Speaker:maybe along with that, you made it possible to use multiple
Speaker:GPUs. You've created something like
Speaker:an API that allows companies
Speaker:to take advantage of multiple GPUs or fractions of
Speaker:GPUs. Did I miss that? No, that's
Speaker:right. That's right, Andy. And Okay.
Speaker:So we've built this way
Speaker:for people to scale their workloads from fractions
Speaker:of GPUs to multiple GPUs within one machine,
Speaker:okay, to multiple machines. Right? You
Speaker:have big workloads running on multiple nodes
Speaker:of GPUs. So think about it when you have
Speaker:multiple users each running their own
Speaker:workload. Some are running on fractions of GPUs. Some are
Speaker:running batch jobs on a lot of
Speaker:GPUs. Some are deploying models and running them
Speaker:in inference, and some are just launching their Jupyter
Speaker:notebooks. All of that is happening on the same
Speaker:pool of GPUs, same cluster. So you need
Speaker:this layer of orchestration, of scheduling, just to
Speaker:manage everything and make sure that everyone is getting the
Speaker:right access to the right
Speaker:GPUs and everything is scheduled according to
Speaker:priorities. Yeah. Well, being just, you know, a
Speaker:mere data engineer here, talking about all of that
Speaker:analytics workload, that sounds very
Speaker:complex. And as you
Speaker:mentioned earlier, you know, you were talking about how traditional coding
Speaker:is targeting CPUs, and that's my background.
Speaker:You know, I've written applications and done data work targeted for
Speaker:traditional work. I can't imagine just how complex
Speaker:that is, because GPUs came into AI
Speaker:as a unique solution,
Speaker:designed to solve problems that they weren't really built
Speaker:for. You know, GPUs were built for graphics, and you didn't
Speaker:manage that. But the fact that they have to be
Speaker:so parallel internally, I think, just added this
Speaker:dimension to it. And I don't know who came up
Speaker:with that idea, you know, who thought of, well, goodness, we could
Speaker:use all of this, you know, massive parallel processing
Speaker:to run these other classes of problems. So pretty
Speaker:cool, pretty cool idea, but I just, yeah, I'm amazed. It's even
Speaker:cooler than that. Because, yeah, a wise man once told me,
Speaker:he goes, GPUs are really good at solving linear
Speaker:algebra problems, and if you're clever enough, you can
Speaker:turn anything into a linear algebra problem.
Speaker:And even simulating quantum computers when I was kind of, like, going through that,
Speaker:I was like Mhmm. You know, like, gee, looks like this
Speaker:will be useful there too. Right? So it's an interesting,
Speaker:it's an interesting thing. So, like, you know,
Speaker:everyone's talking about how this is, you know, we're in the hype cycle, but I
Speaker:think if you're in the GPU space, you have a pretty good run because, one,
Speaker:these things are gonna be important. Right? Whether or not, you
Speaker:know, the hype cycle will kinda crash, and what that'll look like, I
Speaker:think they're gonna be important anyway. Right? Because they're gonna be just the cost of
Speaker:doing business, table stakes, as the cool kids like to say. But
Speaker:also, over the next horizon, simulating quantum
Speaker:computers is going to be the next big hype cycle.
Speaker:Right? Or one of them. Right? So, like,
Speaker:it's a foundational technology. I think that we
Speaker:didn't think would be a foundational technology even, like, six, seven years
Speaker:ago. Right? Yeah.
Speaker:I agree with a few things that you said.
Speaker:Regarding the parallel computation, right? And just running
Speaker:linear algebra calculations on GPUs
Speaker:and accelerating such workloads.
Speaker:Nvidia, I love Nvidia. Nvidia
Speaker:has this big vision, and they had big
Speaker:vision around GPUs already in 2006 when
Speaker:they built CUDA. Yep. Right. So
Speaker:they've been good at just for that. Right? The GPUs were
Speaker:used for graphics processing, for gaming.
Speaker:Right? Great use case. Great market.
Speaker:But they had this vision of bringing more
Speaker:applications to GPUs, just accelerating more applications,
Speaker:and mainly applications with a lot of linear
Speaker:algebra calculations. And they
Speaker:created that, they created CUDA
Speaker:to simplify that. Right? To allow more
Speaker:developers to use GPUs because just using GPUs
Speaker:directly, that's so complex. That's so hard.
Speaker:So they built CUDA to bring more developers, to bring more
Speaker:applications, and they started in
Speaker:2006, but think about the
Speaker:big breakthrough in AI, it happened just in
Speaker:2012, 2013 with
Speaker:AlexNet and the Toronto researchers
Speaker:who used two GPUs actually, because they
Speaker:trained AlexNet on two GPUs and they had
Speaker:CUDA, so for them it was feasible to train their
Speaker:model on a GPU. And that was the new thing that they did.
Speaker:They were able to train a much bigger model with
Speaker:more parameters than ever before because they used
Speaker:GPUs, because the training process ran much
Speaker:faster. And
Speaker:that triggered the entire
Speaker:revolution, the hype around AI that we're seeing now. So
Speaker:from 2006, when Nvidia started to build CUDA, until
Speaker:2013, right, seven years, then we started to see
Speaker:those big breakthroughs. And in the last decade,
Speaker:it's just exploding, and we're seeing more and more applications.
Speaker:The entire AI ecosystem is running
Speaker:on GPUs. So that's amazing to see. It's impressive.
Speaker:And, like, people don't realize, like, the revolution we're seeing today
Speaker:really started in 2006, like you said. I didn't even put the 2 and 2
Speaker:together until I was listening to a podcast. I think it's called Acquired,
Speaker:and really good podcast. Right? Like, they don't pay me to say that or
Speaker:whatever, but they did a 3 hour deep dive on the history of
Speaker:NVIDIA. 3 hours. I couldn't stop listening.
Speaker:Right? Like Nice. You know Yeah. We tried a long form, like, multi hour
Speaker:podcast. We weren't that entertaining, apparently. But the way they
Speaker:go through the history of this where it was basically Jensen Huang. Hopefully, I said
Speaker:his name right. He was, like, we wanna be a player, not just in gaming,
Speaker:but also in scientific computing. This is 2005, 2006,
Speaker:which at the time seemed kind of, like, a little out there, a little kooky.
Speaker:But what you're seeing today is, like, the fruits of the
Speaker:seeds that he planted, you know, almost 20 years ago, like, 19,
Speaker:20 years ago. So, you know, it's you know, when people look at
Speaker:NVIDIA and say it's an overnight success, I'm like, well, I don't know about that, but,
Speaker:you know, but no. I mean, you're right. Like, you know and it's
Speaker:probably not a coincidence that once they made it easy to take these
Speaker:multi parallel processors. Say that 10 times
Speaker:fast on a Thursday morning. But also
Speaker:make it so it's a lot easier for developers to use. Right? And I'll quote
Speaker:the great Steve Ballmer, developers, developers, developers. Right?
Speaker:So it's just fascinating, and
Speaker:I think that, you know, we've really unleashed a
Speaker:gate of creativity in terms of researchers and applied
Speaker:research, and, I mean and I think that what's really cool
Speaker:about your product is that you're kind of making this, what is
Speaker:now a scarce resource, maybe at some point
Speaker:in time, GPUs won't cost an arm and a leg.
Speaker:But, like, for now, I think the one thing that I've seen
Speaker:that I think is not obvious for the casual
Speaker:observer is if an
Speaker:organization, like a large enterprise, can pool their resources, they have a lot more
Speaker:money to buy better GPUs, and you offer a platform where
Speaker:everybody can get a stake in it. Right? As opposed to, you know,
Speaker:that department is gonna hog everything. Right? And
Speaker:here's a question. Do you have, like, an audit trail where you could
Speaker:kinda, you know, figure out, like, you know, Andy's department's really
Speaker:hogging the GPUs. No. No. No. It's Frank. Frank is like mining Bitcoin or
Speaker:whatever. Like, do you have some kind of audit trail like that?
Speaker:Yeah. I love that you mentioned hugging,
Speaker:GPU hugging. Mhmm. We use that term as well.
Speaker:Right? Because it's so difficult sometimes to get
Speaker:access to GPUs. So when you get access to a GPU
Speaker:as a researcher, as an ML practitioner,
Speaker:you don't wanna let it go. Right. 'Cause if
Speaker:you let it go, someone else would take it and hug it. Right.
Speaker:So you're getting this GPU hugging problem.
Speaker:What we do to solve that is
Speaker:that we do provide monitoring and visibility
Speaker:tools into who is using what, and who is actually
Speaker:utilizing their GPUs, and so on, but more
Speaker:than that, we
Speaker:allow the researchers just to give up their GPUs and not hug their
Speaker:GPUs, because we provide this concept of
Speaker:guaranteed quotas. So each researcher or
Speaker:each project or each team has their own guaranteed
Speaker:quotas of GPUs that are always available for them.
Speaker:Whenever they access the cluster, they will get, like, you
Speaker:know, the two GPUs or four, all the quota of
Speaker:GPUs, it's guaranteed. So they can
Speaker:just let go of their GPUs and not hug them. That's one
Speaker:thing. The second thing is that they
Speaker:can also go above their quota. They can
Speaker:use the GPUs of other teams or other users, if
Speaker:they are idle, and they can run these preemptible jobs
Speaker:in an opportunistic way, utilizing those GPUs.
Speaker:And so in that way, they are not limited
Speaker:to fixed quotas, to hard limit
Speaker:quotas. They can just take as many GPUs
Speaker:as they want from their clusters if those GPUs are available
Speaker:and idle. Right? But if someone needs those GPUs,
Speaker:because those GPUs are guaranteed to them, we will make sure, our
Speaker:scheduler, the Run AI scheduler, the Run AI platform, will make
Speaker:sure to preempt workloads
Speaker:and give those guaranteed GPUs to the right users.
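The quota behavior described in this answer can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Run AI's scheduler: the names Job, Cluster, and submit are hypothetical, jobs are tracked at whole-job granularity, and the only policy modeled is the one described above (guaranteed per-team quotas, opportunistic use of idle GPUs beyond the quota, and preemption of those opportunistic jobs when the owning team asks for its guaranteed GPUs).

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    team: str
    gpus: int
    over_quota: bool = False  # set by the scheduler when admitted beyond the guarantee


@dataclass
class Cluster:
    total_gpus: int
    quotas: dict  # team name -> guaranteed GPUs (hypothetical configuration)
    running: list = field(default_factory=list)

    def __post_init__(self):
        # Guarantees can only be honored if together they fit in the cluster.
        assert sum(self.quotas.values()) <= self.total_gpus

    def used_by(self, team):
        return sum(j.gpus for j in self.running if j.team == team)

    def free(self):
        return self.total_gpus - sum(j.gpus for j in self.running)

    def submit(self, job):
        """Start a job if possible.

        Returns the list of jobs preempted to make room, or None if the
        job has to wait.
        """
        within_quota = self.used_by(job.team) + job.gpus <= self.quotas[job.team]
        preempted = []
        if within_quota:
            # Reclaim guaranteed GPUs from other teams' opportunistic jobs,
            # most recently started first.
            borrowers = [j for j in self.running if j.over_quota and j.team != job.team]
            for victim in reversed(borrowers):
                if self.free() >= job.gpus:
                    break
                self.running.remove(victim)
                preempted.append(victim)
        if self.free() < job.gpus:
            return None  # not enough idle GPUs for an over-quota request: wait
        job.over_quota = not within_quota
        self.running.append(job)
        return preempted


cluster = Cluster(total_gpus=8, quotas={"vision": 4, "nlp": 4})
cluster.submit(Job("train-a", "vision", 4))            # inside vision's guarantee
cluster.submit(Job("sweep-b", "vision", 4))            # borrows nlp's idle GPUs
evicted = cluster.submit(Job("finetune-c", "nlp", 2))  # nlp reclaims its share
print([j.name for j in evicted])                       # ['sweep-b']
```

In this sketch a team never loses work that sits inside its guarantee; only jobs admitted over quota are candidates for preemption, which is what makes it safe for a researcher to release GPUs instead of holding on to them.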
Speaker:Oh, that's cool. Alright. So one last
Speaker:question before we switch over to the stock questions, 'cause I could geek
Speaker:out and look at this for hours. Yep. This could be a
Speaker:long form. Sure. This could be. Yeah. And that's and I I wanna be respectful
Speaker:of your time because you're an important guy, and it's also late where you are.
Speaker:So who deals with this? Like, who would set up these quotas? Is it
Speaker:the data scientist? Is it IT ops? Like, who
Speaker:obviously, the data scientists, researchers, they all
Speaker:benefit from this product. But who's actually administering it? Right? Like,
Speaker:who is it, you know, do I have to talk to, you know,
Speaker:say, pretend Andy's in ops. Do I have to say, hey, Andy, I really need
Speaker:a boost in my quota? You know, like, I mean, who does it? Or do
Speaker:I, this sounds, like, as I say it, I'm like, yeah, that wouldn't
Speaker:work. Like, I'm the researcher. I'm gonna turn the dial up on my own. Like
Speaker:like, who's the primary? Obviously, we know who the
Speaker:primary beneficiary is, but who's the primary user?
Speaker:So okay. Great. So if you have a team, right, if
Speaker:you're a team of researchers, all of you need access to
Speaker:GPUs, so maybe the team lead
Speaker:is the one who's managing the quotas for the different
Speaker:team members. And if you have multiple teams,
Speaker:then you might have a department manager or an admin of the
Speaker:cluster or platform owner that will allocate the
Speaker:quotas for each team, right? And then those teams would
Speaker:manage their own quotas within what
Speaker:they were given. Right? So it's like a hierarchical
Speaker:thing. In a hierarchical manner, people can manage their own
Speaker:quota, their own priorities, their own access to the
Speaker:GPUs within their teams. Okay.
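The hierarchical arrangement described here (a platform owner allocates quotas to teams, and each team lead divides the team's share among members) can be sketched as a small tree in Python. This is an illustrative model only: QuotaNode and grant are hypothetical names, not part of any Run AI API, and the sketch covers guaranteed quotas only, not the over-quota borrowing discussed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaNode:
    name: str
    gpus: float  # guaranteed GPUs handed down by the parent (fractions allowed)
    children: dict = field(default_factory=dict)

    def allocated(self):
        """GPUs already handed down to direct children."""
        return sum(child.gpus for child in self.children.values())

    def grant(self, child_name, gpus):
        """Carve a child's quota out of this node's own quota."""
        if child_name in self.children:
            raise ValueError(f"{child_name} already has a quota under {self.name}")
        if self.allocated() + gpus > self.gpus:
            left = self.gpus - self.allocated()
            raise ValueError(f"{self.name} has only {left} GPUs left to grant")
        child = QuotaNode(child_name, gpus)
        self.children[child_name] = child
        return child


# The platform owner splits the cluster between teams...
cluster = QuotaNode("cluster", 16)
vision = cluster.grant("vision-team", 10)
nlp = cluster.grant("nlp-team", 6)

# ...and each team lead splits the team's share between members.
vision.grant("alice", 4)
vision.grant("bob", 4)
vision.grant("notebooks", 0.5)  # a fractional share for light workloads
print(vision.gpus - vision.allocated())  # 1.5
```

Each level only has to check its own children against its own quota, so a team lead can rebalance inside the team without involving the cluster admin.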
Speaker:So it's kind of like a hybrid of, like, you know, it's like a budget
Speaker:almost. Right? Like, you know, you get this much, figure it out
Speaker:amongst yourselves. Exactly. So we're trying to decentralize
Speaker:how the quotas are being managed and how the GPUs are being accessed.
Speaker:So, you know, giving as much power, as much
Speaker:control to the end users as possible. Sure.
Speaker:It sounds like a great administrative question, very
Speaker:important. And I imagine, because a little bird told
Speaker:me, that, you know, your
Speaker:provisioning of these GPU resources
Speaker:is not the only thing that enterprises have to deal
Speaker:with. So it's an interesting... It's not just GPUs.
Speaker:It's compute. Like, it's not... Sure. It's not limited. Although, because
Speaker:of what you said, you know, managing GPUs is an order of magnitude harder
Speaker:because they were never really built for this. Right? Like, this kind of... Right. You
Speaker:know, we're talking about technology that wasn't really in the server room until a few
Speaker:years ago. Right? This isn't a tried and true kind of "this is
Speaker:how it works," you know? Right. But we hit that point in the
Speaker:show where we'll switch to the pre-formed questions.
Speaker:These are not complicated. I mean, you know, we're not Mike
Speaker:Wallace or, like, you know, 60 Minutes or whatever. We're not trying to trap you
Speaker:or anything. But since I've been gabbing on most of the show, I
Speaker:figured I'll let Andy kick this off. Well, thanks, Frank. And I don't think
Speaker:you were gabbing on. You know more about this than I do. So I'm
Speaker:just a lowly data engineer. I'll plug No. You if you
Speaker:will. Data engineers are the heroes we need. Well,
Speaker:well, I'm gonna plug Frank's Roadies versus Rockstars
Speaker:writing on LinkedIn. It's good articles about this.
Speaker:But, let's see. How did you,
Speaker:how did you find your way into this field?
Speaker:And did this field find you, or did you find it?
Speaker:This field totally found me. Awesome.
Speaker:Yeah. I
Speaker:did my postdoc, and I was at Bell Labs.
Speaker:And Yann LeCun came to Bell Labs and
Speaker:gave a presentation about AI. It was around 2017.
Speaker:And Yann LeCun spent a lot of years at Bell Labs,
Speaker:and his presentation was amazing. And
Speaker:when I heard him talking about AI,
Speaker:I said, okay, that's the space where I wanna be. It's going to change
Speaker:the world. There is this new amazing technology here that
Speaker:is going to change everything. And I knew that I wanted to start
Speaker:a company in the AI space for sure.
Speaker:Cool. That's a good answer. So cool.
Speaker:Yeah. That's cool. I was at Bell Labs,
Speaker:doing a presentation a while ago, and I didn't realize that he
Speaker:worked at Bell Labs because, like, you know, the guy was like, no. No.
Speaker:He used to work here, like, in this building. I was like, no way. Because
Speaker:I knew him as the guy from NYU. Right? Like, that's who I thought. Right.
Speaker:Or the guy from Meta. Yeah. And now the guy from Meta. Right? Like,
Speaker:so it's interesting how that, you know? They have
Speaker:these amazing pictures from the nineties where they
Speaker:run, like, deep learning models on very old PCs
Speaker:and recognizing, like,
Speaker:numbers on the computer. Maybe you saw those pictures. Like, amazing.
Speaker:MNIST. It's the MNIST problem. Is that... Yep.
Speaker:Right. Exactly. Exactly. Cool.
Speaker:So second question is, what's your favorite part of your current job?
Speaker:That everything is changing so fast.
Speaker:Things are moving so fast, right? We're in this business for 6
Speaker:years, and the entire
Speaker:space is moving and
Speaker:advancing. And so many people are working in
Speaker:this field. New innovations, new tools,
Speaker:new advancements are getting out every day.
Speaker:You know, just 6 years ago, it was about deep learning and computer
Speaker:vision. And now it's about language models
Speaker:and generative AI, and we're just at the start,
Speaker:right, there are so many amazing things that are going to happen
Speaker:in this space, and I love it. Absolutely.
Speaker:So we have 3 fill-in-the-blank
Speaker:sentences here. The first is, complete this
Speaker:sentence: when I'm not working, I enjoy blank.
Speaker:You'll get a very boring answer.
Speaker:So this is just spending time with
Speaker:friends and family, because I think
Speaker:that I'm always working. It's like, if you ask my wife,
Speaker:she'll tell you that I'm working 24 hours. And,
Speaker:yeah. So I don't have much time that I'm not working
Speaker:in. So when I'm
Speaker:not working, then I'm trying to be with my kids and my
Speaker:wife and friends. Cool.
Speaker:Cool. The 2nd complete-the-sentence: I think
Speaker:the coolest thing about technology today is
Speaker:blank. And this, I really wanna hear your perspective on that.
Speaker:Yeah. I think everyone will say AI, right? Or something in
Speaker:AI. Yeah.
Speaker:I think there are so many
Speaker:new innovations that are coming around LLMs.
Speaker:I think everything relating to
Speaker:search, right? Searching in data, getting
Speaker:insights from data, it's all going to change. We're going to have
Speaker:a new interface. Right? Just getting
Speaker:insights from data with
Speaker:natural language, you know, no SQL and, you
Speaker:know, no need for programming and stuff like that.
Speaker:Just with natural language, you could
Speaker:do amazing stuff with data. I think
Speaker:we're seeing this
Speaker:advancement in, like,
Speaker:digital twins right now. You can,
Speaker:you can fake my voice
Speaker:and your voice and fake my image and your image. And,
Speaker:you know, in the
Speaker:future, we'll have digital twins of us, right,
Speaker:doing this stuff. That would be amazing. So a lot of
Speaker:amazing stuff is going to happen in the next few years
Speaker:for sure. Very cool. Our last complete-the-sentence:
Speaker:I look forward to the day when I can use technology to
Speaker:blank.
Speaker:To have a robot in my house.
Speaker:Yeah. Yeah. You know, sweeping the floor instead of
Speaker:me doing that, right, cleaning dishes and things like that.
Speaker:If that would happen, that would be amazing. Right? That's a
Speaker:good answer. Yeah. I agree. I have 3
Speaker:boys, 4 dogs. So, like, cleaning is safe.
Speaker:Yeah. Yeah. I'm a heavy cleaning. Ranging from, like, 1 to, like,
Speaker:a teenager. So it's, and fighting
Speaker:with them to, like, empty the dishwasher takes a lot more mental
Speaker:energy than it should, but that's probably a subject for another
Speaker:type of show.
Speaker:The next question is share something different about yourself,
Speaker:and we always like to joke, like, well, let's just make sure that we keep
Speaker:our clean iTunes rating. So... Yeah. Yeah.
Speaker:Well, this
Speaker:is a hard question. I needed to think about it.
Speaker:So, I found 2 answers that I can say. So one
Speaker:is about my professional life, right, I think that
Speaker:it's somewhat different that I'm coming with a background from
Speaker:academia and industry. So I love academia. I love to research
Speaker:problems. I love to understand problems in a deep
Speaker:way and combining it with startups in the industry.
Speaker:And in my past, I worked for chip companies, for hardware
Speaker:companies. I worked for Intel, for a startup, and for Apple. I
Speaker:did chip stuff, and now Run AI is a software company, so really,
Speaker:like, a diverse background of academia, hardware,
Speaker:software, so I love that, and, like, I love to deal
Speaker:with a few things, and so that I think is different.
Speaker:And the 2nd answer that I could find
Speaker:is that I have a nickname that goes with me
Speaker:since my high school days, which is the Duke.
Speaker:The Duke. All of them are calling me the Duke. It's like,
Speaker:they don't call me Ronen, the Duke. So... That's funny.
Speaker:Yeah. That's awesome.
Speaker:Audible is a sponsor of Data Driven,
Speaker:and you can go to the datadrivenbook.com.
Speaker:And if you do that, you can sign up for a free
Speaker:month of Audible. And if you decide later to
Speaker:then join Audible, use one of their sign up plans,
Speaker:then Frank and I get to split a cup of coffee, I think,
Speaker:out of that. And every little bit helps. So we really
Speaker:appreciate that when you do. What we'd like to ask
Speaker:is, do you listen to audiobooks? And if you
Speaker:do, okay. Good. I see you nodding. So do you have a recommendation? Do you
Speaker:have a favorite book or two you'd like to share? Yeah.
Speaker:So I'm a heavy user of Audible. I'll give them
Speaker:a classical book, classical for
Speaker:entrepreneurs, The Hard Thing
Speaker:About Hard Things by Ben Horowitz.
Speaker:It's a classic book, love it, really had a lot of impact
Speaker:on me. I read it when we started Run AI,
Speaker:and I recommend it for every
Speaker:entrepreneur to read it and for everyone to read it. It's like a...
Speaker:Cool. Amazing book. Yep. Awesome. I
Speaker:have a flight to Vegas this next week, so I'll definitely be listening to
Speaker:it then. And finally, where can people learn more about you
Speaker:and Run AI? The best
Speaker:place will be on our website, run.ai.
Speaker:Yeah. And on social. LinkedIn, Twitter, we'll
Speaker:we'll do. Awesome. Any parting thoughts?
Speaker:I really enjoyed this episode. Love to speak about GPUs, love the AI
Speaker:space. I had a lot of fun. Thank you for having me here. Awesome.
Speaker:It was an honor to have you, and every once in a while, Andy
Speaker:and I will do deep dive kinda shows. We'd love to invite you back if
Speaker:you wanna do one just on GPUs, because I know where my knowledge
Speaker:drops off, you probably could pick up on
Speaker:that. And with that, I'll let the nice
Speaker:AI British lady end the show. And just like
Speaker:that, dear listeners, we've come to the end of another enlightening
Speaker:episode of the Data Driven podcast. It's always a
Speaker:bittersweet moment like finishing the last biscuit in the tin,
Speaker:satisfying, yet leaving you wanting just a bit more. A
Speaker:colossal thank you to each and every one of you tuning in from across the
Speaker:digital sphere. Without you, we're just a bunch of
Speaker:ones and zeros floating in the ether. Your support is what
Speaker:keeps this digital ship afloat, and believe me, it's much appreciated.
Speaker:Now, if you found today's episode as engaging as a duel of wits with
Speaker:a sophisticated AI, which I assure you, is quite
Speaker:enthralling, then do consider subscribing to Data Driven.
Speaker:It's just a click away and ensures you won't miss out on our future
Speaker:adventures in data and tech. And if you're feeling
Speaker:particularly generous, why not leave us a 5 star review?
Speaker:Just like a well programmed algorithm, your positive feedback helps
Speaker:us reach more curious minds and keeps the quality content flowing.
Speaker:It's the digital equivalent of a hearty handshake.
Speaker:So, until next time, keep those neurons firing, those
Speaker:subscriptions active and those reviews glowing. I'm
Speaker:Bailey, your British AI lady, signing off with a heartfelt
Speaker:cheerio and a reminder to stay data driven.