
Token Economy and AI Agents: The Final Hurdles in Enterprise Deployment

In this episode, Frank La Vigne sits down with his Red Hat colleague Christopher Newland for a deep dive into the evolving challenges and opportunities at the intersection of AI, open source, and enterprise technology.

Fresh off attending both IBM Think and Red Hat Summit, Christopher Newland shares insights from two very different industry perspectives—executive strategy and hands-on engineering. Together, they explore the elusive “last mile” problem in AI adoption, the rise of agentic systems, the critical role of harnesses and runtimes, and why memory management is becoming the next frontier.

Plus, they discuss the practical realities and future potential of tools like OpenShift AI, IBM Bob, and open source alternatives. Whether you’re a developer grappling with implementation details or a leader focused on ROI, this episode has something for everyone navigating today’s fast-changing AI landscape.

Time Stamps

00:00 Comparing IBM Exec and Red Hat Conferences

05:24 Challenges in AI implementation

06:56 Challenges in scaling microservices

11:38 Integrating AI with project management

14:23 Debate on AI model vs. harness

16:54 Discussing model evolution and limitations

22:54 Affordable Power BI Courses Bundle

25:19 Separating and managing runtimes

27:02 Using semantic routing for requests

30:15 Agent memory and compression basics

36:02 New AI approach and vision

38:49 Developing a multi-agent system

40:40 Importance of data chunking

Transcript
It's very much based off of your requirements, your skills, your knowledge, your processes that now need to be defined within your AI stack. And that really is the last mile. I saw that at both conferences: the realization that there's still a lot of work that needs to be done to get AI to a point where it's actually very fine-tuned, very functional, very efficient. Right now it may work, but it may not be very efficient for scaling, it may not be efficient for cost, it may not be efficient for the new token economy that we're seeing.

And the last mile is historically the biggest problem to crack, right? Once you solve that problem... Amazon has a physical last mile in terms of how they actually execute on delivery, because you can have warehouses, but everybody lives in a different house. So there are a lot of little last miles. It's death by a thousand paper cuts, if you will. Proof-of-concept projects are everywhere. Real business value, that's the hard part. This is Data Driven.

I'm Frank La Vigne, and with me I have a very special guest, Christopher Newland, who is a colleague of mine at Red Hat, and we're gonna do a deep dive. You travel all the time. I traveled two weeks back to back, and it had been a while since I had to do that. But it is conference season, and so you were at IBM Think last week, and you were also at Red Hat Summit this past week, as was I. I have my "I heart Red Hat" Atlanta T-shirt on. So how's it going, Christopher?

Yeah, it was nice, because one of those two, IBM Think, was actually in my area of Boston, so I was able to attend that locally. Still a lot, though. You know, you're going in at like 7 in the morning to try to beat traffic, and then you're leaving at like 10 o'clock at night. And those conferences were very, very different, targeting very different audiences. I felt like I got two perspectives of the AI world and what people are concerned about: one from a very executive lens, and another one from more the day-to-day users, developers, engineers who are actually implementing the AI.

So which is which? I think I know the answer, but yeah.

So IBM Think is an executive conference. I think it's normally director level or above, so it's targeting a lot of C-suites, senior directors. I think the lowest you would see would be a senior manager of some sort, but for the most part it's a C-suite type of conference, and a lot of the conversation there is more about the business return of AI and what that looks like this year. And then Red Hat Summit is very much about the system administrator, the cluster administrator, the SRE, the developer who's actually utilizing these technologies and actually implementing something with them or managing something with them. So, two very different lenses on the same challenge within the industry.

Yeah, no, it was interesting. And I don't know about you, but the attendees this year had much better questions, I think, better AI questions than at any other Red Hat event I've ever seen. It seems like people are struggling to implement this in a way that is secure, stable, scalable. And I think we also have a much better platform story this year than we had in previous years.

Absolutely. So the way I've been framing it to people, it kind of goes into two terms. The first term I've been using with people is that last mile, and that then feeds into this concept that you hear a lot about in business and other industries, the 80/20 rule. A lot of people are finding that 80/20: what is the 80% of the returns for 20% of the effort? And then what we find is that it flips: that remaining 20% of returns is now going to be 80% of the effort. And that 20% is what I've defined really as the last mile.

And the conversations I'm having with people are that they now have the tools, and they've had POCs, and they're seeing results, even a lot of times good results. They just don't know: how do I get it to the point where it's actually returning investment, the ROI? And this is a question that was happening at both conferences, both from the executive lens and from the general day-to-day developers. And this is where I think open source is set up in a great position, because there are so many open source tools out there that we can work with people on to finalize that last mile. I think what people are most annoyed about, though, is that there's not a magic button that's going to fix it, because it's very much based off of your requirements, your skills, your knowledge, your processes that now need to be defined within your AI stack. And that really is the last mile. I saw that at both conferences: the realization that there's still a lot of work that needs to be done to get AI to a point where it's actually very fine-tuned, very functional, very efficient. Right now it may work, but it may not be very efficient for scaling, it may not be efficient for cost, it may not be efficient for the new token economy that we're seeing.

And the last mile is historically the biggest problem to crack, right? Once you solve that problem... Amazon has a physical last mile in terms of how they actually execute on delivery, because you can have warehouses, but everybody lives in a different house. So there are a lot of little last miles. It's death by a thousand paper cuts, if you will.

Absolutely. And we saw the same thing with microservices back in the... a lot of organizations developed microservices but then had a lot of challenges and had to overcome a lot of that last mile when it came to data domains. Where does your data exist within this microservice architecture? How do you do contracts and handshakes between services? How do you orchestrate these services? How do you scale them? In many ways, these are the problems that we saw Kubernetes develop out of. And now a lot of those same challenges are coming back with agentic systems and AI: how do we scale that out efficiently? So I love what you said. It's not a new problem. It's just the same problem we've seen reiterating over 50-plus years of compute history, which now just has a different lens to it, the AI problem. But a lot of the same solutions are still the solutions that we had for many of those advancements in technology that we saw over the last few decades.

That is interesting, because Kubernetes has solved a lot of the same problems. It doesn't solve them all, but there's a significant overlap, and I got that sense from the conference, that people are finally starting to get it. Like, why OpenShift AI? Well, because OpenShift solves a lot of these problems. You just put AI on top of those solved problems. It doesn't fix everything; there's still going to be a lot more room for improvement in terms of how you implement that on your last mile. But it gets you halfway there from the get-go, easily.

Yes, absolutely. A lot of the questions at IBM Think, it was actually funny, a lot of them were about IBM Bob. And I know you and I have been talking about this for the last two weeks, but at IBM Think, IBM Bob was a very serious conversation, with executives wanting to know how they can mimic tools like Claude Code, but within their enterprise setting. And the biggest thing about IBM Bob that I learned at IBM Think, from both the engineers there and those who are interested in it, is that a big thing here is what they want for institutional knowledge. They want to keep a record of all that institutional knowledge, from the prompts and the context and all the things that are built out of IBM Bob, so that they can keep that information as institutional knowledge, really being able to then take that knowledge and re-inject it into their broader agentic engineering. And I don't think IBM Bob is actually really meant to be a clone of Claude Code. I think it's really meant to be a manager of institutional knowledge across many different...

Yeah, so we have a special guest, a second special guest, show up. This is Crystal, my little dachshund pup. I had to pick her up because she was chewing wires, but I was listening. You're right, though. There is definitely a... Bob feels different. I don't know how to describe it. I had issues getting authenticated into it, but the folks at the Bob booth at IBM did help with that. It's unfortunately named, honestly, I think, because I think of Microsoft Bob, and that was not exactly a winning product, right? But I've been playing around with it, and I had to do kind of the init process on a couple of projects, and it was interesting because it suggested how to take those projects and turn them into MCP servers and agents, which the other ones, Codex and Claude, have not. I thought that was interesting, and I didn't prompt it to do that. It just basically said on its own, like, you know, you could turn this process into an agent, an MCP server, and things like that. While that was in the back of my mind as I built these various projects, it was not top of mind. So I thought that was interesting. Yeah, it definitely is not a clone. It's meant to solve a new problem.

Yes, I agree. And I think it really starts feeding into this bigger scope of things like spec-driven development and these other tools: how do we get the knowledge out of the project managers, how do we get it out of Jira, and how do we get it into a form that the AI can interpret, but not lose that knowledge along the way? So as the prompts are coming in, as the context is coming in, it then becomes part of that institutional knowledge. And I think that is ultimately what Bob is trying to achieve, which is very different than what a lot of the other alternatives out there are. My hope is that as this grows, we see more opportunities for it to become more open source. That's probably one area where it's a little different than what we do here at Red Hat, where, I mean, we're not supporting the project, but a project that we're following very closely is things like OpenCode, for example, which is an open source alternative to Claude Code. It's really interesting to see all these different solutions right now. I also like the fact that Bob can be an IDE and mimic Cursor more, or it can be a CLI and mimic more of a Claude Code, and obviously with my background I'm more comfortable with the CLI side.

That was a big one. And I would say agents would obviously be the second biggest thing, just in general. That was the theme last year at IBM Think, and that didn't change this year. I think we're just seeing the experimentation of agents now moving into the solidification of agents in the industry. We heard about agents a little bit at a high level at IBM Think, but then at Summit, everything was about agents. Everything went down to: how does this implement to the agent? How does the inference of AI implement to the agent? How does the data implement to the agent? The orchestration layer, Kubernetes, all these things, it all had to do with the agent. And that was really interesting, to see how the conversation over the last two years has shifted from all of these individual parts. I think the last time I was on your show, and I know you and I have talked a lot about this, there have been a lot of these parts, but nothing has kind of unified them. I think what we're seeing with AI agents is going to be that unification. The agent will become the unification point for all these different parts of the AI industry, where all these tools now will come together. And we saw a lot of that at Red Hat Summit.

You don't think harnesses will ultimately be the container for that, where all these things will live, and harnesses will be kind of the top-level abstraction?

This is a really good question, because this is the big debate within the AI labs and the AI community: are you invested in harness engineering, or do you think the models themselves will just supersede the harness, and that they can be knowledgeable enough to basically function agentically without it? So obviously the OpenAIs and the Claudes of the world, Anthropic, they're probably a little bit more on the model side, because that would ultimately benefit them. Whereas I think the IBMs and the Nvidias, and I would say the majority of the industry, are probably a little bit more on the harness side, because that allows a larger ecosystem of third-party tools and something that's a little bit more familiar to people. I don't know. I think over the next year or two it'll definitely be the harness, because that's where we've seen the most advancement. But with things like mixture-of-experts models just continuing to advance in how they can do reasoning and how much agentic work they can do, it could be that we see the model layer chip away at the harness layer. And is this going to be a back and forth? It really also gets into how you inject the context, and this is closely related to the same argument of: is RAG still needed? With context size growing so much, why would you need RAG? From an enterprise standpoint, I think Red Hat is very big on the harness side, because we see the need for different security layers, different integrations into third-party tools, different authorization layers, routing, networking, that the model will not be able to manage completely, at least for a while. And that's where I think the harness engineering layer will exist, because there are all these existing technologies that the agent needs to integrate with, and that's all going to happen at that harness layer and then be executed within that runtime layer.

Yeah, that's how I see it too. I think the harness layer is really going to be... It may not be a foundational type situation where you build on top of it. I see it more as the mortar between the bricks. It's not that the mortar is more important than the bricks, but the bricks are kind of a pile of rubble unless you have mortar holding them in place. That's kind of how I see the harness story evolving. But I have a hard time imagining models ever being able to be that far advanced. However, we've gotten further with the LLM architecture than I ever thought we would. Synthetic data has worked better, and distillation has worked better, than I ever thought it would, so take my thoughts with that in mind. When I looked at synthetic data, and distillation in particular, there's a meme where they show somebody fishing in the water, and then somebody fishing from that guy's pot, and then somebody fishing from that guy's pail, and each subsequent fisherman is more and more distorted. We've not really seen that come about. It's not like copying VHS tapes, where each subsequent generation gets worse. I'm sure that if you don't do it carefully, you'll get some weird artifacts, but that has not been the default case, which I think is interesting.

It is interesting too, because most of the models that are out right now are distillations of the GPT-4 family, actually. Even GPT-5 is still a direct distillation of 4; it was not completely retrained. And Anthropic obviously has their first generations and second generations, but we actually haven't seen very much new generation, just because of how expensive it is to create from fresh. And what I'm imagining is that they've tried, and they just haven't gotten the results that they wanted. So I think that will be what we see. I don't know, I haven't heard if Mythos is... So if people aren't following, the Mythos model from Anthropic is a model that they've withheld because supposedly it's too risky. I don't know if that model is a whole new generation; I would imagine that it probably is. But to your point, most of the models that are out there now, and what we know from the Chinese models, is that they're all just distillations of the American models. We have proof now that they've been mass API-hitting GPT and Anthropic and Gemini to create the generation of Chinese models that we have now.

So that's something. And they're very performant. Those models are extremely good. It just shows you, this is not the paradigm of analog VHS copying. This is more, I guess, in the style of remixing an old song digitally. It's not a well thought out analogy, Christopher, but you'll hear techno songs from the early... and I'll hear them, I don't go to clubs anymore, but on my What's New and What's Hot techno playlists on Spotify. I recognize the same backbeat, I recognize the same chorus, from songs from like 20, 30 years ago. And even sampling in rap music, it's a bit more like that, where you do get a completely fresh perspective based on older parts. And that's something that I did not expect. I just assumed that you would start getting really bizarre artifacts after so many generations, but that's not been the case. So I think it's interesting, because this is really uncharted territory. Yes, they're based on very well-known mathematical principles, but as these systems get more complex, it's getting harder and harder to predict not just their behavior, but the range of their behaviors.

Yep. One second, I'm going to grab something, because we'll do a little bit of show and tell as well.

Cool, cool, cool. So while you're away, maybe I can interview a dachshund. So what do dogs think about AI? Everybody and their cousin and their dog has an AI startup now. So what's your AI startup? Oh, a link shortener. Okay, cool. Because, I get it, your short legs. I get it. That's cool.

While we're waiting for Christopher to come back, you all know I'm a big fan of Humble Bundle, so... Humble Bundle. Oh, you're back. Cool. Oh, you can finish your thought. So, Humble Bundle. I actually... so I worked the booth. I had a talk on day one and I worked the booth on the subsequent days, and a lot of people came by, other Red Hatters, actually. I was showing them Humble Bundle. I'm sorry, go ahead.

No, I just said that looks really cool.

Yeah, yeah. So if you're not familiar with it, humblebundle.com, it started as games, but if you go and you pick store... not store, I'm sorry, bundles, you can pick books, and there are comic books there. But there's also a lot of stuff here that is particular around software. So in this example here, this is the books on practice exams for AWS and gen AI, all sorts of interesting stuff here, security. This is actually a hybrid of, like, courseware. They also have software bundles that are sometimes kind of like image-editing tools and things like that, but very often they will have courses for, you know, how to get into OpenClaw and things like that. And if you don't know, Christopher is really into OpenClaw; he helped me get my Claude kind of up and working. But if you go here, I know a lot of Data Driven listeners are big into Power BI; these are basically courses on Power BI and things like that. And the cool thing is, it's $20 for 17 courses, and a portion of your cost goes to a charity. So it's really cool. You get a lot of material, and a charity gets funded. So definitely check it out. They often have AI books or app development books, and a lot of things around game development too, because that's kind of where Humble Bundle started.

Nice. That's a great segue too, because speaking of OpenClaw, when I got home from Red Hat Summit, this arrived.

Oh, nice.

So I haven't gotten a Mac Mini in probably over 10-plus years. So when this came, I was kind of like... because I don't know if you remember, the Mac Mini was maybe about the same size, but it was much bigger than this. And I'm actually holding this with one hand right now. But the reason why I got this is because Red Hat in particular wants to make sure that all of the agents that I'm running for Red Hat are isolated at runtime. So I could use my... let me see if I can pull it over.

You have one of those Framework things, right?

This is the Framework, yeah. This is the size of it. So that is actually powering my home lab that has OpenShift in it. I could do that, and that's actually where a lot of our tooling is going. But I also need an agent to have access to my email, to have access to more of my day-to-day tooling, which actually exists more on a desktop. And that's where this guy comes into play. It's interesting. Now we're separating the harness from its runtime, and now I'm dealing with multiple runtimes. I'm going to have runtimes that probably run on the home lab, and now we have runtimes that are going to run on this. This one would be for when I need it to do something that actually involves some kind of GUI, or something that's already on my desktop where there's just no API set up for me to use. Or I need it to do something basic that's really easy to do within the Mac ecosystem. Whereas my home lab, maybe it's an agent that's running diagnostics, like AIOps diagnostics, on my home lab: why isn't something up, why isn't it working correctly? And this is where the whole concept of runtime has now become such a big thing, and I think it will continue to become more important this year. The harness is kind of getting the spotlight, but we need to move more into this runtime conversation of: okay, now the harness has put the context together, it's put all the knowledge together and the skills, the agent is running the agentic loop with the model, but now where does the output actually run? Does it run on your personal computer, where it has access to sensitive information and could do things that it shouldn't? Or does it run in an isolated environment? So this is probably going to act more as a little server that runs here in my office. This box is just for agents.

Where is the inference run? Where does the inference run for your agents? Does it run on that, does it run on your Framework, or does it run in a hyperscaler cloud service?

So I'm actually doing a new technique called semantic routing. All my requests go to my home lab first. Within what we would call the control plane for the agent, there's a router that actually evaluates the information that's coming in and decides, based off of sensitivity and complexity, where this request should go. About 80% of my traffic actually hits the Framework, for a model that's running within vLLM on the Framework device itself, on OpenShift. And then about 20%, which I've deemed high reasoning, will get sent off to our corporate Gemini account that we have within Red Hat. This way it's also really nice because, when I first started working with agents all the way back... I mean, I've been working with agents for years and years, but with our current, modern-day idea of what agents look like, back at the beginning of this year I was running out of tokens. I was getting throttled by Google, and there was nothing I could do about that, because that was part of our corporate account; it wasn't anything where I could go and change the knobs. So moving to this semantic routing approach allowed me to not run into that throttling anymore. Most of my things go... so right now I'm running a Qwen, the Qwen 3.6 35B mixture-of-experts model.

Nice.

And that's running right now and doing all of my local agentic work. It's doing most of the low-reasoning tasks, and then all the high-reasoning tasks get sent off to Gemini.
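For listeners who want to picture that routing step, here is a minimal sketch of the idea, assuming an OpenAI-compatible endpoint exposed by vLLM on the home lab and a hosted frontier model behind a gateway. The keyword heuristic, word-count threshold, model name, and URLs are all illustrative stand-ins, not Christopher's actual setup.

```python
# Minimal sketch of semantic routing as described above; all names,
# thresholds, and endpoints are illustrative assumptions.
import requests

LOCAL_URL = "http://homelab.example:8000/v1/chat/completions"  # vLLM exposes an OpenAI-compatible API
REMOTE_URL = "https://gateway.example/v1/chat/completions"     # hosted frontier model

SENSITIVE_HINTS = ("confidential", "internal", "customer", "credential")

def route(prompt: str) -> str:
    """Keep sensitive or routine requests local (~80% of traffic);
    send long, high-reasoning requests to the remote model (~20%)."""
    text = prompt.lower()
    sensitive = any(hint in text for hint in SENSITIVE_HINTS)
    high_reasoning = len(prompt.split()) > 400  # toy complexity proxy
    url = REMOTE_URL if (high_reasoning and not sensitive) else LOCAL_URL
    resp = requests.post(
        url,
        json={"model": "default", "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

A real control plane would likely replace the keyword check with an embedding-based classifier, but the shape of the decision, sensitivity first, then complexity, is the same.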

So do you ever have it set up where the high-reasoning task will divvy up a bunch of low-reasoning tasks and then send those down to your Qwen? Or is that something in the works?

I have experimented some with that. That gets into some post-inference types of techniques that we've been experimenting with, myself included. I haven't gotten that far yet. This is where areas such as speculative decoding kind of come into play, or post-inference techniques.

Why would speculative decoding come into play here?

Yeah, because there could be a speculator that sits at the local model that actually acts as kind of almost like a guardrail to the larger model, where it can actually start reasoning about some of the things earlier on and decide; it basically acts as a breaker.

I got you. And that makes sense.
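For context, textbook speculative decoding is a decoding-time speedup rather than a routing guardrail: a small draft model cheaply proposes a few tokens, and the large target model verifies them, keeping the longest agreed prefix. A toy greedy version, with hypothetical `draft_next` and `target_next` callables standing in for real models, looks like this:

```python
# Toy greedy speculative decoding. `draft_next` and `target_next` are
# hypothetical callables (token list -> next token), not a real library API.
def speculative_decode(prompt_tokens, draft_next, target_next, k=4, max_new=64):
    out = list(prompt_tokens)
    generated = 0
    while generated < max_new:
        # 1. The small draft model speculates k tokens ahead, cheaply.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The large target model verifies; keep the agreed prefix.
        kept = 0
        for i, token in enumerate(draft):
            if target_next(out + draft[:i]) == token:
                kept += 1
            else:
                break
        out += draft[:kept]
        generated += kept
        # 3. On the first disagreement, take one token from the target model.
        if kept < k:
            out.append(target_next(out))
            generated += 1
    return out
```

Production implementations verify all k draft positions in a single batched forward pass of the target model, which is where the speedup comes from.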

That's where speculative decoding would be kind of the next iteration on that, where it's really the management of knowledge and memory and cache at that point. I really haven't gotten into that with my local setup, but that's part of that whole last mile, where memory, I think, will be the last portion of the last mile for everybody. It's going to be memory management, it's going to be cache management.

When you say memory: organizational memory, not necessarily the physical memory?

When I'm talking about memory, I'm talking about the memory of the agent itself. OpenClaw, for example, every time it makes decisions, keeps a compressed record of what it's done in these JSON files, and then it will reference that. Your Claude Code does something very similar. Every time you hit your token context window maximum, you'll see that it's doing a bunch of compressions, and it takes a little thought. That's actually what we call a form of memory. And if you've been following the news, even just today Google announced a whole new agentic memory platform, a framework that fits right into this. And that's why I think memory is going to be the next iteration on improving the agentic system. And that's not the KV cache, that's not your physical memory.

It's not the... agentic memory would be a...

Yeah, it's agentic memory. It's how your agent is reconciling what it's doing and what it has done. It's outside of the context window, but it's not the KV cache. It's something that's like: oh, this is what I've done in the past, and this is the context I need, that I just need to keep carrying forward in my conversations. It's something that maybe isn't an MD file; it's not permanent knowledge. It could get flushed. You could just say, go ahead and flush your memory, and that may actually be what you need to do, because maybe there's a lot of nonsense in there, or it's doing something wrong. It's not meant to be long term.

Think of it like human short-term memory.

Exactly what it is.

Interesting.

Not everything that we do is long term. So long-term memory in this case would be your MD files; it would be your KV, potentially even some layers of your KV cache, where I would actually consider that more like intermediate. But it's really that long-lasting context that just keeps getting injected in. Whereas this concept of memory that we keep hearing about is more of that short-term memory: what knowledge do you need to have right now to make the decisions that you need to make, based off of the reasoning and the topics that you're working with right now?

So a good example would be... I'm sorry, go ahead. No, no, go ahead. A good example is your hotel room number when you go to a conference. The need to remember that beyond once you check out is very low. Or when you get the two-factor authentication six-digit code, you only need to remember that for a very short window of time.

Yes, exactly. And that's a prime example where you could forget that information long term. But in the short term it would be very detrimental: if you forget your hotel room, you have to go and ask somebody, and that takes time. And that's exactly the same narrative. It's not that the agent couldn't get that information; it's just that it's faster for the agent to get that information if it's located in some type of short-term memory. And that's where we're seeing so much advancement in these agentic platforms.
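To make that short-term memory idea concrete, here is a small sketch of the pattern: a rolling JSON record of what the agent has done, compacted as it grows and flushable on demand. The file name and thresholds are illustrative, and the truncation-based compaction is a naive stand-in for the model-written summaries tools like OpenClaw and Claude Code actually generate.

```python
# Sketch of short-term agent memory: a rolling, compacted JSON log.
# All names and thresholds are illustrative assumptions.
import json
import time
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")
MAX_ENTRIES = 50  # compact once the log grows past this

def remember(event: str) -> None:
    """Append an event; compact the log when it gets too long."""
    log = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    log.append({"t": time.time(), "event": event})
    if len(log) > MAX_ENTRIES:
        # Naive stand-in for LLM summarization: one compressed marker plus the tail.
        summary = {"t": time.time(), "event": f"[{len(log) - 10} earlier events compacted]"}
        log = [summary] + log[-10:]
    MEMORY_FILE.write_text(json.dumps(log, indent=2))

def recall() -> str:
    """Return the short-term context to re-inject into the next prompt."""
    if not MEMORY_FILE.exists():
        return ""
    return "\n".join(entry["event"] for entry in json.loads(MEMORY_FILE.read_text()))

def flush() -> None:
    """'Go ahead and flush your memory': drop the short-term record entirely."""
    MEMORY_FILE.unlink(missing_ok=True)
```

The point of the sketch is the lifecycle, remember, re-inject, flush, which is exactly the hotel-room-number behavior described above: cheap to keep nearby, safe to throw away.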

Did you want to add anything to that? I know we're coming up on time.

Oh no, I mean, no, I appreciate your time. I see that we're up on time.

No, I think there's a lot. I think the one thing I learned this week was that it's very easy to think that you're behind everyone else. But we had people come into the booth, like, "I don't know anything about this, tell me where to get started." And I was like, you know, to hear that in... was both shocking and refreshing. There were people, I'm not going to name any names, but there are people who are in our division, and they've not even installed Claude yet. Open Claw, I mean. I always get those two confused, even though I know they're very different things. But, you know, who've not installed OpenClaw on their own. And it's just like, I feel behind because I have OpenClaw, but I don't have it set up as well as you do. But I do have it. So it's kind of like: don't be afraid of being behind, because chances are you're probably not.

No, no.

Part of the reason why I have the dog do that intro now, which of course is obviously AI-generated, was part of the joke: everybody and their dog is an AI expert now. And there are not really any experts. There are probably about half a dozen people worldwide that really are on a whole other level. I mean, the Andrew Ngs of the world, the Geoffrey Hintons of the world. Those are the people. Yann LeCun, for sure. You don't hear much from Yoshua Bengio anymore. But people at that level, at that strata, they really are that far ahead. And it's always interesting seeing what problems they're trying to solve. What is particularly interesting, I think: Yann LeCun is very skeptical of LLMs getting any further along.

Yeah. Which I think is interesting. I mean, at this point it's an almost 8- or 9-year-old concept, the LLM transformer.

The concept that he created.

The concept that he created, right. The underlying layers. Yeah. So, a lot of... go ahead.

I was gonna say, there are a lot of new interviews that he has out in the last couple of weeks about his new approach to AI and how he sees it superseding LLMs. And that'll be interesting too, because he's looking at it from a whole new direction, rather than just how LLMs are building the next pixel, the next text. He's looking at it from a whole new direction of: maybe we built this house of cards wrong; we need to just kind of start over, start at the basics, and build something better from what we've learned. And it'll be very interesting to see what he comes up with out of all this.

Yeah, because I'm surprised we've gotten this far this fast with LLMs. The whole reasoning aspect of LLMs is something I did not see coming; I would not have bet real money on them being able to do that. But here we are. They clearly can do some level of reasoning; how much is probably debatable. But you hear that they're just text-prediction algorithms, like the thing on your phone that predicts the next word. Well, technically true, I think, but it doesn't really tell the whole story. That's like saying that the F-35 fighter is the same thing as a paper airplane. They do have to obey the same laws of physics, thrust, lift, gravity, blah, blah, blah, but they are very different animals in that sense.

Very much. I agree with that analogy. It's really good.

Cool, man. I'd love to have you on the show again. We could talk OpenClaw. You've done some crazy cool stuff with that. I know some of the agents that you've built that people probably don't want me talking about, because I know you made a lot of the security people very nervous.

That's true.

But the stuff that you've been able to automate has been nothing short of, like, oh my God, that's amazing. And also super useful.

What's crazy too, for me, is that I've been so busy that so much of the stuff I did that people were talking about was from one to two months ago. And I think this summer... so there was actually a really popular podcast out of the AI Daily Brief that went out, where he was talking about how everything that's happened over the last six months basically came out of Christmas break. Everyone went home and had a few weeks to just play around with this stuff. I was one of those people. So much of what I did came out of those experimentation phases, and I think I have to repeat that this summer, because there are so many new things that we've learned that I still haven't built on top of yet. For me right now, so many of my agents are doing very simple tasks. They're doing information gathering; they might be looking at meetings and suggesting that I read certain articles correlating to something I'm about to talk about. But I want to go to the next level, where I get into a multi-agent system, where I have a chief-of-staff agent who's got one agent doing programming demos, and then another one doing general administrative assistant work, or another one that's front-facing, you know, a model that's on our Slack that people can just ask questions of, based off of the institutional knowledge that I have of our company and our industry. So that's the next phase, and that's where the memory stuff has to come into play, and the multi-agent orchestration. All these things are being worked on now, so there's not a clear winner or a clear understanding of what that looks like right now. But we're all kind of playing around with it. So I think that's the next phase. And yeah, I look forward to coming back, and I think that's probably part two of this conversation.

Absolutely. What does that look like? What are these tools? How do we build on top of this thing called OpenClaw, or Hermes, or all these other ones that are out these days? Yeah, that'd be awesome. And even if we just do a deep dive on kind of what's what, because I know you mentioned a couple of things that maybe most of our listeners don't fully grok, because we have a lot of data engineers here too. And the other thing that really came out was people would ask me questions because we have something that Microsoft folks may know as TFAs, or technical focus areas; we call them pillars. So you're the agentic lead, I believe, and I'm the connecting-models-to-data one, so the RAG and that sort of thing. And a lot of the conversations I had were that data engineering is more important now in AI systems than it was in the past, because, while I don't know exactly how agentic RAG systems would fail, when they fail, they probably fail very spectacularly. And I know with RAG systems, if your data chunking strategy and your data indexing strategy is not, I wouldn't say perfect, because you'll never really get there, but appropriate to the source documents that you're dealing with, it's going to fail in a way that is subtle and is only going to amplify and get worse down the road. So you really have to think through a lot of these things. The one sentence I said most of all was: chunking is an architectural decision. Yes, it's an important one. Treat it with that importance, as opposed to just whatever, paragraph by paragraph, or blah, blah, blah.
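Since chunking came up as an architectural decision, here is a small illustration of why: two deliberately different strategies for splitting the same document, a naive fixed-size window with overlap versus a structure-aware splitter that respects paragraph boundaries. The sizes are illustrative assumptions, not recommendations from the episode.

```python
# Two contrasting chunking strategies for RAG ingestion; sizes are illustrative.
def chunk_fixed(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    """Naive fixed-size windows; the overlap preserves context across cuts."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def chunk_by_paragraph(text: str, max_chars: int = 800) -> list[str]:
    """Structure-aware alternative: respect paragraph boundaries, merging
    small paragraphs until the size budget is reached."""
    chunks, buffer = [], ""
    for paragraph in text.split("\n\n"):
        if buffer and len(buffer) + len(paragraph) > max_chars:
            chunks.append(buffer.strip())
            buffer = ""
        buffer += paragraph + "\n\n"
    if buffer.strip():
        chunks.append(buffer.strip())
    return chunks
```

The fixed splitter can slice a table or a clause in half; the paragraph splitter cannot, but it produces uneven chunks. Which trade-off is right depends on the source documents, which is exactly the point being made above.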

So that was the other consistent theme. But I will say that the questions I got were far more evolved than at any other conference I've been to in a while.

I agree. Especially this year. It's just a step up from where we were.

Yeah. Cool. This is great.

Thank you for having me on.

No problem. We'd love to have you back. And since the recording is for the podcast, we'll let the music play.

About the author

Frank La Vigne is a software engineer and UX geek who saw the light about Data Science at an internal Microsoft Data Science Summit in 2016. Now, he wants to share his passion for the Data Arts with the world.

He blogs regularly at FranksWorld.com and has a YouTube channel called Frank's World TV (www.FranksWorld.TV). Frank has extensive experience in web and application development. He is also an expert in mobile and tablet engineering. You can find him on Twitter at @tableteer.