Exploring Machine Learning, AI, and Data Science

Barr Moses on How Data Observability Can Save Your Company Millions

On this episode of Data Driven, we welcome Barr Moses, CEO and co-founder of Monte Carlo, as she delves into the fascinating world of data observability.

Join hosts Frank La Vigne and Andy Leonard as they explore how reliable data is crucial for making sound business decisions in today’s tech-driven world. Learn why a simple schema change at Unity resulted in a $100 million loss and how Monte Carlo is developing cutting-edge solutions to prevent similar disasters. From discussions on ensuring data integrity to the intriguing potential of AI in anomaly detection, Barr Moses shares insights that might just redefine your understanding of data’s role in business.

Tune in for a podcast that not only uncovers the nuances of data reliability but also touches on the quirky side of tech, like why, according to Google, you should never use superglue to fix slipping cheese on your pizza.

Moments

00:00 Monte Carlo: Data Reliability Innovator

05:45 “Data & AI Observability Engineering”

09:42 Data Industry’s Growing Importance

12:00 Cereal Supply Chain Data Optimization

16:03 Data Observability and Lineage

19:29 GenAI Uncertainties and Latency Concerns

23:17 “Human Oversight in AI Accuracy”

24:12 Data Observability and Human Role

28:01 Adapting to Customer Language

33:29 Data and Security Management Alignment

35:20 Data Reliability and Observability Challenges

38:17 Automated Code Analysis Tool Launch

42:29 Data-Inspired Childhood

44:12 Passionate About Impactful Work

48:52 LinkedIn Security Concerns Highlighted

53:19 “Data Observability Insights”

Transcript
Speaker:

Welcome to Data Driven, where we dive into the thrilling world of data,

Speaker:

AI, and on occasion, misbehaving chatbots suggesting

Speaker:

glue for your pizza. This episode features Barr Moses,

Speaker:

CEO of Monte Carlo. Not the casino, not the car,

Speaker:

but the company keeping your data from quietly wrecking your business.

Speaker:

We talk observability, the chaos of unreliable data,

Speaker:

and why one tiny schema change cost a company

Speaker:

$100,000,000. Ouch. So buckle

Speaker:

up. Because if your AI bots are making decisions without

Speaker:

reliable data, well, hope you like eating rocks for the

Speaker:

minerals. Hello, and

Speaker:

welcome back to Data Driven, the podcast where we explore the emergent

Speaker:

fields of data science, artificial intelligence, and, of course, data

Speaker:

engineering. And with me today is my favorite data engineer in the

Speaker:

world, Andy Leonard. How's it going, Andy? It's going well, Frank.

Speaker:

How are you? I'm doing well. I'm doing well. I was in Raleigh last

Speaker:

week, drove down, rented a car actually,

Speaker:

to save mileage on, on ours, and,

Speaker:

spoiled because it's been a while since I bought a new car. And

Speaker:

this is the second time I rented a car, and I'm getting tempted. I ain't

Speaker:

getting tempted. It was a Chevy. It was

Speaker:

a Chevy Malibu. Not a Monte not a Monte Carlo.

Speaker:

See what I did there? I don't even know if they still make them. I

Speaker:

I was driving, the little one off and dropping the little one off at daycare,

Speaker:

and I was behind a Chevy Monte Carlo, like, a two early

Speaker:

two thousands vintage. But that is actually quite relevant

Speaker:

to our discussion today because with us today, we have Barr Moses, who is the

Speaker:

CEO and cofounder of Monte Carlo, the data

Speaker:

and AI reliability company, not the casino

Speaker:

or the car, I would assume, or the town. Monte Carlo

Speaker:

is the creator of the industry's first end to end data and

Speaker:

AI, observability platform with

Speaker:

$236,000,000 in funding from Accel

Speaker:

ICONIQ Growth and others. They are on a mission to bring

Speaker:

trustworthy and reliable data and AI, to

Speaker:

companies everywhere. The company was recently recognized as

Speaker:

an Enterprise Tech 30 company, a CRN

Speaker:

Emerging Vendor, and an Inc.com

Speaker:

Best Workplace, and counts Fox, Roche,

Speaker:

Nasdaq, and PagerDuty, among others, as their customers. Welcome

Speaker:

to the show, Barr. Thank you so much. Great to be here, Frank and

Speaker:

Andy. Awesome. An intro. No problem. Do you drive a

Speaker:

Monte Carlo? Because that would be epic. You know, I really should

Speaker:

be driving a Monte Carlo. I do not, and I've never actually been to

Speaker:

Monte Carlo either. So I will tell you if you're into cars,

Speaker:

like, I'm like a recovering car, nerd. Oh,

Speaker:

very cool. It looks like a car show. Like, honestly, I went to Monte

Speaker:

Carlo, and we had rented, like, a Saab convertible. And I felt like we were

Speaker:

driving. We were driving driving, like, the low end

Speaker:

of the car thing. I mean, there were I mean, I've never

Speaker:

seen Bentleys in the wild, like, just parked on the street,

Speaker:

like, no big deal. Wow. Like, I mean, every

Speaker:

luxury car if you're in a Saab and you feel like you're slumming it

Speaker:

Yeah. It is clearly a high money area.

Speaker:

But, so welcome to the show. So Monte Carlo

Speaker:

why'd you get the name? I I'm assuming it might have something to do with

Speaker:

Monte Carlo simulations, but that's in the Great question. Yeah. The

Speaker:

unofficial story is that, one of our co-founders is a fan

Speaker:

of Formula One and, you know, as you know, Formula One races in Monte Carlo.

Speaker:

So right. That's, you know, clearly the, the, that's the

Speaker:

unofficial story. The official story is that, you know, we

Speaker:

had to we had to name the company. We started working with customers when we

Speaker:

started the company, and we we had to choose some name.

Speaker:

And, I studied math and stats in college, and so I sort

Speaker:

of opened my my stats book and sort of looked through and,

Speaker:

you know, reviewed my option and, you know, Markov,

Speaker:

chains didn't seem like a great name. And next up was

Speaker:

Bayes' theorem, which was similarly kind of not great. And

Speaker:

and then, you know, I was reminded of Monte Carlo and Monte Carlo simulations. I

Speaker:

actually I actually did some work with Monte Carlo simulations earlier in my career.

Speaker:

And it seemed like it seemed like a great name, a name that would speak

Speaker:

to, you know, data engineers, data analysts, folks that have been the space.

Speaker:

And, you know, I think naming a company is a very difficult

Speaker:

thing to do. We decided to go with it. And in the spirit of Monte Carlo,

Speaker:

one of our values is ship and iterate. And so, the

Speaker:

name has sort of stuck with us since. And, it's quite memorable. People either

Speaker:

love it or hate it. So I think it works for us. I think it

Speaker:

it works. Like, I think of the car. I think of the casinos. It has

Speaker:

a certain amount of, high class, maybe more so than Markov

Speaker:

chains, Markov chains. Although I did for a time flirt with the

Speaker:

idea of of also starting a company called Markov Chains, but,

Speaker:

like, have see if we could see if we can get money for mister t

Speaker:

to be the spokesman. That would

Speaker:

have been epic. Yeah. Jeez. He did you. Ideas, Frank. I was the

Speaker:

only one I was the only one that thought that was a good idea, but,

Speaker:

you know, I was a big fan of mister t as a kid. Marketing. Yeah.

Speaker:

That's funny. That's what I do in my day job now. Oh, yeah.

Speaker:

I swear, folks, I didn't pay her to say that.

Speaker:

So so you you talk about data and AI

Speaker:

reliability. And to me, when when I hear that,

Speaker:

a slew of things come to mind. Like, there's security, there's the

Speaker:

veracity, like, the five v's and all that or four v's or whatever it

Speaker:

was. What exactly is kind of Monte Carlo's, like,

Speaker:

wheelhouse there? Yeah. Great question. I'll

Speaker:

actually sort of anchor ourselves in in kind of the metaphor or sort of a

Speaker:

corollary that we like to use here, which is really based on software engineering.

Speaker:

So we didn't reinvent the wheel when we say data and AI observability.

Speaker:

We really take concepts that work for engineering and adapt them.

Speaker:

So, you know, when we started the company, the idea, the

Speaker:

hypothesis, the the thesis that we started the company on was data

Speaker:

was going to be as important to businesses as applications, as online

Speaker:

applications. And, they were data was going to

Speaker:

drive the most critical sort of, you know, lifeblood of companies through

Speaker:

decision making, internal products, external products.

Speaker:

And, while software engineers had all the solutions and tools in the

Speaker:

world to make sure their applications were reliable, and so some, you

Speaker:

know, some off the shelf solutions like Datadog, New Relic, Splunk might be

Speaker:

familiar to you, data teams were flying blind. So there was literally

Speaker:

nothing that they could use to know that their data was

Speaker:

actually accurate and trusted. That's sort of, like, the the problem the core problem that

Speaker:

we started. Fast forward to today, you know, we created the data observability

Speaker:

category. We're continuing to create it. AI is making this problem just

Speaker:

infinitely bigger, harder, more important. Why? Because

Speaker:

data and AI products are now you know, there's a proliferation of those.

Speaker:

An AI application is only as good as the data that's powering it,

Speaker:

and the AI application itself can be inaccurate, can be

Speaker:

unreliable. Right? And so at a very high level

Speaker:

I know this is, you know, very vague, but at a very high

Speaker:

level, the idea was the same diligence that we treat software

Speaker:

applications, we should be treating for data and AI applications. Now,

Speaker:

what does that actually mean? How do we do that? Enter the concept of

Speaker:

observability. Observability is basically understanding or

Speaker:

assessing a system's health based on its output.
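
To make that concrete, here is a minimal, illustrative sketch of the idea Barr unpacks next, written in Python and not taken from Monte Carlo's product: learn a simple baseline for output metrics such as daily row count and hours between updates, then alert when a new observation falls outside the learned pattern. The table metrics and thresholds are invented for illustration.

    import statistics

    def learn_baseline(history):
        """Learn a simple baseline (mean and standard deviation) from past metric values."""
        return statistics.mean(history), statistics.stdev(history)

    def is_anomalous(value, mean, stdev, z_threshold=3.0):
        """Flag a value that falls far outside the learned pattern (simple z-score test)."""
        if stdev == 0:
            return value != mean
        return abs(value - mean) / stdev > z_threshold

    # Past observations of one table's daily row counts and hours between updates
    # (made-up numbers standing in for metadata an observability tool would collect).
    row_counts = [10_120, 9_980, 10_260, 10_050, 10_190, 9_940, 10_070]
    update_gaps_hours = [24.1, 23.9, 24.3, 24.0, 23.8, 24.2, 24.0]

    row_mean, row_std = learn_baseline(row_counts)
    gap_mean, gap_std = learn_baseline(update_gaps_hours)

    # Today's observations: a sudden drop in volume and a late-arriving update.
    todays_rows, hours_since_update = 1_200, 31.5

    if is_anomalous(todays_rows, row_mean, row_std):
        print(f"ALERT: row count {todays_rows} deviates from learned baseline (~{row_mean:.0f})")
    if is_anomalous(hours_since_update, gap_mean, gap_std):
        print(f"ALERT: no update for {hours_since_update}h; expected roughly every {gap_mean:.1f}h")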

Speaker:

And so basically, the thesis was, can we observe end to end the

Speaker:

data and AI estate, learn what the patterns

Speaker:

are in the in the data, bring together metadata and context,

Speaker:

lineage, for example, about the data, derive insights

Speaker:

based on that to understand and determine what the system should

Speaker:

behave like, and alert if that gets violated. So that's sort

Speaker:

of the first part. The first is actually being being able to help data teams

Speaker:

detect issues. The second part is actually being help,

Speaker:

helping data teams resolve issues. Now here's the interesting thing

Speaker:

that we sort of learned over over the years. We've worked with hundreds of of

Speaker:

enterprises. So, you know, we mentioned a few. We real really work with the top

Speaker:

companies in every single industry. So,

Speaker:

you know, in in, in health care, in retail,

Speaker:

in manufacturing, in, technology, in each of these

Speaker:

areas, the data estate

Speaker:

obviously varies, but there are actually, interestingly, commonalities. And the

Speaker:

commonalities is that every single issue can be

Speaker:

traced back to a problem with the data, problem with the code,

Speaker:

problem with the system, or problem with the model output. Can go

Speaker:

into detail into more each of those, but that's sort of the high level,

Speaker:

framework. We basically provide end to end coverage to help data teams

Speaker:

understand what the issues are and help them trace them back to data issues,

Speaker:

code issues, system issues, or model output issues. So when did

Speaker:

you get the idea that I'm sorry, Andy. I cut you off. Okay. When

Speaker:

did you get the idea when you realized that data is gonna be as important

Speaker:

as applications are to businesses? Oh, great question.

Speaker:

so we started the company in 2019.

Speaker:

And, actually, what's interesting, it was pretty clear to us then, but we

Speaker:

had to prove that or we had to convince that of people. Definitely.

Speaker:

Yeah. It was not obvious. It's it's still there's still a

Speaker:

lot of people that are kind of, like, I guess, they'd be in the quadrant

Speaker:

of laggards where they realize, oh, I guess this is important.

Speaker:

A hundred percent. I would imagine in 2019, it would have

Speaker:

been you would have sounded insane. Like We we sound I

Speaker:

sounded insane, a hundred percent. People are like, what? Data is

Speaker:

gonna be important? Are you sure? Now a couple of things happened

Speaker:

since, which I think helped. First is,

Speaker:

there were some large acquisitions in the data space, like Tableau and

Speaker:

Looker earlier on, and then Snowflake IPO'd. Snowflake was the

Speaker:

largest software IPO of all times. It was quite interesting that the

Speaker:

largest software IPO of all time is a data company. So I think those

Speaker:

things sort of help kind of convince that this you know,

Speaker:

convince, at least, externally, you know,

Speaker:

to the market that data will continue to be will will be

Speaker:

important and critical. I think the things that I noticed is, you know,

Speaker:

before we even started the company, we spoke to hundreds of data leaders, and I

Speaker:

speak to dozens of data leaders every single month. They continue

Speaker:

and I think what you hear from them is more and more

Speaker:

data teams and software engineering teams are building products hand in hand.

Speaker:

So they're actually they're side by side building. Right? And so, actually,

Speaker:

almost more and more critical business

Speaker:

applications, revenue generating products are based off of

Speaker:

data, and they're being powered by data. I'm not even talking

Speaker:

about generative AI, which is a whole whole other story why that matters, but just

Speaker:

data products by itself. Think about reports that people look at internally.

Speaker:

You know, just give you an example. You know, we work with with, many

Speaker:

airlines, for example. Airlines have a lot of data that goes to internal

Speaker:

operations. Like, what's the connecting flight? What's your flight number? How

Speaker:

many flights left today? What time did they leave? How many passengers were on

Speaker:

the airplane? Where is your luggage? Right? That

Speaker:

information is powering internal and external products. You know, it's powering the application

Speaker:

that you're using in order to onboard the the plane, in order to connect

Speaker:

to your next flight. If that data is inaccurate, like,

Speaker:

you're screwed. Right? And that hurts tremendously. Your brand

Speaker:

is an as an airline, your reputation, it leads to

Speaker:

reduced revenue, increased regulatory risk that you're putting

Speaker:

yourself. Right? So so the data,

Speaker:

what we see from our customers is powering critical use cases like

Speaker:

airlines. I'll give you another example. You know, we work with a,

Speaker:

you know, a Fortune 500 company, perhaps your your favorite cereal.

Speaker:

I don't know if you're you guys are big cereal. I I, like, eat cereal

Speaker:

for breakfast, lunch, and and dinner. It's, like, my go to.

Speaker:

You'd be surprised into how much data optimization, machine learning,

Speaker:

and AI goes into actually optimizing the number and

Speaker:

location of cereal on the shelf. So there's a lot of

Speaker:

data that goes into supply chain management to make sure that you're

Speaker:

actually, like, fulfilling the right warehouse,

Speaker:

demands on time and, you know, making sure that everyone gets

Speaker:

their serial on time. There's actually a lot of data that goes into all of

Speaker:

that. So I think what gave me conviction was in speaking with

Speaker:

so many companies across so many industries, data was

Speaker:

actually allowing data teams, allowing

Speaker:

organizations to build better products, to build more

Speaker:

personalized products, and to make better decisions about the organization.

Speaker:

So I think that really sort of made it clear that the future was going

Speaker:

to be based on on data. Well, I I like that

Speaker:

you pointed out, the importance of observability.

Speaker:

My career path winding as it was,

Speaker:

I made a a leap from being a software developer to being

Speaker:

a data really a database developer. When I made that

Speaker:

transition, one of the things I had noticed, this was two two and a half

Speaker:

decades ago, I had just started in software development

Speaker:

doing test driven development and it had just

Speaker:

come out, it was called fail first development. I remember thinking

Speaker:

this was perfect. It was a big deal. Yeah. It was. Yeah. Twenty five

Speaker:

years ago. And I remember thinking this is perfect because I'm always failing.

Speaker:

So this this will work nothing ever runs the first time and if it does,

Speaker:

it's suspect. But when I got over into data, I had just

Speaker:

become, you know, kind of a a big believer in the power

Speaker:

and and and really the the confidence that

Speaker:

test driven development gave me. And I was like, we need that

Speaker:

over here. And so it was, just a

Speaker:

field that's fascinating me. I have an engineering background, and so it kind of flowed

Speaker:

right through. Instrumenting the data engineering,

Speaker:

was a big deal so that, again, you could achieve what we now call

Speaker:

observability. But being able to watch that data flow

Speaker:

When I mentioned these ideas to people kinda like you back then,

Speaker:

I would get all sorts of responses. Most of them kinda raised

Speaker:

eyebrows. And I would, some of the more interesting ones

Speaker:

were things along the lines of, well, the data is sort of self

Speaker:

documenting. I mean, it's it's just there. And I'm

Speaker:

like, no. No. It's not. It's I especially when you've moved it through

Speaker:

a bunch of transformation to put it into a business intelligence solution or data

Speaker:

warehouse or or any of that. And that now feeds,

Speaker:

you know, modern LLMs, AI, and and the like, those

Speaker:

same sorts of, I guess, old school processes, I

Speaker:

do. Or at least that's my my understanding. Maybe I'm reading too much into

Speaker:

that, but I love the idea of having observability go

Speaker:

all the way through. You mentioned lineage. That's huge. You wanna make sure that when

Speaker:

you, you know, you make this one change, that's not gonna affect anything

Speaker:

else. Usually, it does affect other things, and having

Speaker:

that lineage view is huge. That is spot on.

Speaker:

That's exactly how we've we've thought about this as well. So, you know, I

Speaker:

think there are specific things that you can test for in data. Like, for

Speaker:

example, you know, specific thing that you can declare, you can say, like,

Speaker:

you know, you know, a T shirt

Speaker:

size should only be, you know, small, medium, large, extra large, whatever.

Speaker:

Right? But then there are some specific things that, you

Speaker:

know, you you don't necessarily know. Like, for example, if there's a particular,

Speaker:

you know, pattern that the data is being updated,

Speaker:

you can actually use machine learning to automatically learn that pattern and then forecast

Speaker:

when it should get up updated again. So it's not necessary for someone to

Speaker:

manually write a test for that. Right? And so

Speaker:

I actually think it's a combination of both of those things which really

Speaker:

give confidence to to data teams over time. So there there's sort of a

Speaker:

couple components to it. The first, I think it really starts with visibility,

Speaker:

sort of call it end to end observability, but it really includes, like, you know,

Speaker:

you mentioned a few of these parts, but, the data

Speaker:

lake, the data warehouse, an orchestration,

Speaker:

BI, ML, AI application that can include the agent,

Speaker:

the vector database if you have a prompt. Right? All of those

Speaker:

components you have to have visibility. The first thing is actually to to

Speaker:

your point, like, having lineage into what are the different components that can cross

Speaker:

this. So all the way from, you know, sort of ingestion of the data to

Speaker:

consumption of it. And the second is to start observing.

Speaker:

And and, you know, you there are some specific things that you can declare

Speaker:

and test and based on your business needs, and there are some things that you

Speaker:

can do in an automated way. And and, actually, I think this is an area

Speaker:

where AI can help. So for example,

Speaker:

what what oftentimes teams end up doing is spending a lot of time

Speaker:

trying to define what are data quality rules. And,

Speaker:

actually, you can use LLMs to profile the data,

Speaker:

Make some make some, yeah, make some inference,

Speaker:

based on the semantic meaning of data and then make recommendations.
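
As a rough sketch of that pattern (illustrative only: call_llm below is a stand-in for whatever model client you use, and its response is hard-coded), profile a column, hand the profile to an LLM, and ask it to propose a validation rule. Her baseball pitch-speed example just below is the same idea.

    import json

    def profile_column(rows, field):
        """Build a tiny profile of one field: name, sample values, and observed range."""
        values = [row[field] for row in rows]
        return {"field": field, "samples": values[:5], "min": min(values), "max": max(values)}

    def call_llm(prompt):
        """Placeholder for a real LLM call; returns a suggested rule as JSON text."""
        return json.dumps({"field": "pitch_speed_mph",
                           "rule": "value >= 70 when pitch_type = 'fastball'",
                           "reason": "fastballs are rarely slower than roughly 70 mph"})

    # Made-up pitch data, along the lines of the baseball example.
    pitches = [
        {"pitch_type": "fastball", "pitch_speed_mph": 95},
        {"pitch_type": "fastball", "pitch_speed_mph": 97},
        {"pitch_type": "curveball", "pitch_speed_mph": 78},
    ]

    profile = profile_column(pitches, "pitch_speed_mph")
    prompt = ("Given this column profile, suggest one data quality rule as JSON "
              f"with keys field, rule, reason: {json.dumps(profile)}")
    suggestion = json.loads(call_llm(prompt))
    print("Recommended monitor:", suggestion["field"], "->", suggestion["rule"])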

Speaker:

So for example, I I love this example. We work with lots

Speaker:

of, sports teams. And so you can imagine that,

Speaker:

you know, you have a particular field called, like, let's say this is

Speaker:

in baseball, a baseball team and sort of, like, you know, pitch type.

Speaker:

And and then, like, the the speed that matches that. And

Speaker:

so you can imagine that, like, an LLM can recommend or infer that

Speaker:

a fastball should not be, you know, less than

Speaker:

70 miles per hour or whatever it is. Even though I don't know what

Speaker:

the real number is. I just made that up. But there is, like, some you

Speaker:

you can infer based based on that and make a recommendation. And

Speaker:

so, actually, it's a I find that AI and LLMs are a really cool

Speaker:

application of how to make observability faster and and and

Speaker:

easier for for teams. So, yeah, I'm I'm

Speaker:

very excited about about what you just shared, Andy. Well,

Speaker:

I I love what you brought up about machine learning being able to to

Speaker:

make basically make predictions about things.

Speaker:

And and one of the terms that, you know, as a practitioner

Speaker:

of, business intelligence is especially the data engineering that supports

Speaker:

it Mhmm. Is data volatility. Mhmm. So if I'm

Speaker:

especially if I'm looking at an outlier. So I'm consuming this

Speaker:

data day in and day out, And let's

Speaker:

say, you know, 10% of the data is new stuff,

Speaker:

and maybe another 10 or 15% are things that are have

Speaker:

been updated, old stuff that's been updated, and the rest of it's relatively

Speaker:

stable. If I see those numbers go crazy out of bounds,

Speaker:

you know, and machine learning would be able to pick that up right

Speaker:

away and say, there may be a problem with the data we're

Speaker:

reading today. You know, I would I that that sounds like one of

Speaker:

the problems that would solve is that volatility,

Speaker:

expected ranges of volatility of data. That's exactly

Speaker:

right. Yeah. Cool. Interesting. I think there's

Speaker:

also something you said was, you know, when you have LLMs, because, obviously, we have

Speaker:

to talk about GenAI because it's

Speaker:

Silicon Valley. I think if you don't mention GenAI every twenty five

Speaker:

minutes, the cops come and knock on your door and check it out. Welfare check.

Speaker:

Could get in trouble. Or they make sure you're okay. Make

Speaker:

sure you're okay. But I think one of the things that really

Speaker:

kind of makes me worry about GenAI is that it's not

Speaker:

immediately obvious. Like, if you're at the airport, obviously, it's not a good look for

Speaker:

you. Like, if the if the and this has happened to me where the app

Speaker:

says one thing, the screen says something else, and my ticket says yet a

Speaker:

third thing. So I'm not really sure where I'm supposed to go.

Speaker:

Generally speaking of those, the app tends to be more accurate.

Speaker:

But, that depends on the airline.

Speaker:

But with with LLMs, it's a the latency

Speaker:

between you seeing the data where the cons the bad

Speaker:

consequences of the data tends to be a lot more

Speaker:

I'll use a $10 word today. I can't even say

Speaker:

it, but it's not it's not immediately obvious. Right? There goes my

Speaker:

my fail and my $10 word. But, like, it's not like it there's a lot

Speaker:

more steps in labyrinthine. I'll go with that one because I can say that.

Speaker:

But, like, what so how do you provide

Speaker:

observability in something like LLMs where

Speaker:

the, the input and the output time tends to not

Speaker:

be quite as straightforward as a data as an old school data pipeline?

Speaker:

Yeah. Such a great question. And maybe I'll just share some of my favorite

Speaker:

wonders if that's helpful. And and I think I'll share them

Speaker:

because it's helpful to explain the gravity

Speaker:

of these issues. So, for example, you know, if you're in an airport and, you

Speaker:

know, the app doesn't say the same as what you have,

Speaker:

hopefully, you arrive early at airports, Frank. I don't know if you have enough time

Speaker:

to, like, figure out the discrepancy and you won't miss your flight. Right?

Speaker:

But oftentimes, those things can lead to to really big disasters.

Speaker:

AI. So so I think this was in 2022,

Speaker:

Unity, which is a gaming company, they had one schema

Speaker:

change, resulting in a hundred million dollar loss.

Speaker:

Their stock dropped 37%. Oh my gosh. Pretty

Speaker:

meaningful. Right? Fast forward, I think this was

Speaker:

2020 or 2021,

Speaker:

but not so much related to AI yet.

Speaker:

Citibank was hit with a $400,000,000 fine for

Speaker:

I remember that. For data quality practices for lack

Speaker:

of data quality practices. So think about all the regulatory

Speaker:

industries like health care, financial services,

Speaker:

like, you know, wherever there's, like, PII and and,

Speaker:

And and the, like, you know, the the

Speaker:

implications there are pretty grave. Some fun examples from more recently.

Speaker:

I don't know if fun. I shouldn't call them fun. Some other examples from

Speaker:

yeah. You mentioned Chevy. So I think there was a user

Speaker:

that convinced a chatbot to sell the Chevy Tahoe

Speaker:

for $1. I I commend the user for being able to

Speaker:

do that, but that is terrible. Right? That's terrible

Speaker:

that, that happened. And that chatbot went down

Speaker:

the next day. They they took it offline the next day. I think it was

Speaker:

in Fremont, California, so not that far from the bay.

Speaker:

Yeah. So right. So that's pretty pretty consequential.

Speaker:

I'll just give another, like, example. This is my favorite example. This is what

Speaker:

it went viral on x couple months ago. Someone googled, what should I

Speaker:

do when cheese is slipping off my pizza? And Google responded,

Speaker:

oh, you should just use organic superglue.

Speaker:

Great answer. They they had some really good gaffes.

Speaker:

There was the, eat eat one rock a day to get your,

Speaker:

minerals and stuff like that. Yeah. So I I

Speaker:

love that because that's an example of where, like, the prompt was

Speaker:

fine, the context was probably fine, the model was

Speaker:

fine, but the model output was totally not fine.

Speaker:

Right? Right. And so and by the way, maybe Google can get away with it

Speaker:

because it's Google, but, like, 99.9% of brands can't get

Speaker:

away with with the mistakes. Right? And so what, you know, what

Speaker:

do you do? How do you provide observability in in that world? What does that

Speaker:

look like? First, I'll just say, I think

Speaker:

there's still human in the loop, and there will be. So, actually, you know,

Speaker:

it's interesting going back to 2019, what I'd hear was,

Speaker:

oh, you know, I have this important report that my CEO looks at.

Speaker:

But before they look at it, I have, like, six different people looking at the

Speaker:

report with, like, you know, sets of eyes to make sure that the data is

Speaker:

accurate. So, like, people use manual stuff back then. Today, what I

Speaker:

hear is I was just speaking with this head of AI, Silicon Valley,

Speaker:

and I was like, how do you make sure the answers are accurate? And they

Speaker:

were like, well, we have someone sifting through dozens, hundreds of

Speaker:

responses every single day to make sure they're accurate. So I don't think human in

Speaker:

the loop evaluation is going anywhere. There's more advanced techniques, you know,

Speaker:

comparing to to to ground truth data, using LLM

Speaker:

as a judge. There's various sort of, things that we can do, but but I

Speaker:

think human isn't going away. In terms of observability,

Speaker:

I talked before I'll explain a little bit about this sort of framework

Speaker:

of, you know, data issues can be really traced back

Speaker:

to these four core root causes, and I think it's

Speaker:

important to have observability for each in in sort of this world.

Speaker:

So the first I mentioned is data. And so by that, I mean,

Speaker:

you know, let's use another example. Credit Karma, for example,

Speaker:

has a financial advisor chatbot where, basically, they take in information

Speaker:

about you that they have, you know, like, what kind of car you

Speaker:

have as being of cars and, you know, where you live and whatnot, and then

Speaker:

they make financial recommendations based on that. If the

Speaker:

data that they are ingesting from third party data is late or isn't

Speaker:

arriving or is incomplete, that messes up everything downstream. So one

Speaker:

root cause can be the data that you're ingesting is just wrong. Maybe it's all

Speaker:

null values, for example. The second can

Speaker:

be due to change in the code. So the code could be like a a

Speaker:

bad like a schema change, like in the Unity example. It could be a change

Speaker:

in the code that's actually, being used for the

Speaker:

agent. Really, code change can happen every anywhere. And, by the

Speaker:

way, not necessarily by the data team. It can happen by an engineering team or

Speaker:

someone else. It has nothing to do with the with the data state. Right? So

Speaker:

code changes can contribute. The third is system.

Speaker:

A hundred percent of systems fail. What what do I mean by system? I

Speaker:

mean system is, like, basically the infrastructure that sort of runs all these jobs.

Speaker:

So this could be, like, an Airflow job that fails or a dbt job

Speaker:

that that fails. You know, again, a hundred percent of systems fail,

Speaker:

and so you would definitely have something that goes wrong in systems.

Speaker:

And then the fourth is you could just have the model output be wrong, kinda

Speaker:

like with the cheese in in Google, example. And

Speaker:

so when we think about sort of having what does it mean,

Speaker:

what does observability mean in this in this age, I think it has to

Speaker:

have coverage for all four of those things. And here's the problem. It oftentimes

Speaker:

includes all four together. So I don't know if it you know, it's typically on

Speaker:

a Friday at 5PM. You're just about done, and then

Speaker:

everything breaks at the same time. That's an

Speaker:

interesting point. Like and and it's you also use the a term

Speaker:

a couple of times, which, you're I can count on one hand how many

Speaker:

non Microsoft people have used this term,

Speaker:

data estate. And I'm just curious about I know where I pick from

Speaker:

Microsoft. No. No. No. Like, I'm like I mean, I always

Speaker:

thought it was a, you know, Microsoft invention. I don't think it is.

Speaker:

But, like, where did you pick up that term? Because I've only like, seriously, you

Speaker:

were, like, the third or maybe fourth person who is not

Speaker:

never worked for Microsoft, never worked with Microsoft. I I mean, I don't know if

Speaker:

you work with Microsoft, but, like, I I always whenever I hear someone say

Speaker:

data estate publicly, I'm like, so who'd you work for at Microsoft? What division?

Speaker:

Like, like Oh, wow. Yeah. It's like that. And at first, I

Speaker:

didn't like I'll be honest. I didn't like the term at all, but eventually, I

Speaker:

kinda grew to like the term because it there's a lot behind it, and I'd

Speaker:

be curious to get, like, one, where'd you where'd you where'd you

Speaker:

pick that up? Like, I'm just, like and then two, what does it mean to

Speaker:

you? Like, what does that term data estate mean to you? Great question. For

Speaker:

what it's worth, I actually didn't like it either. For the record, I didn't even

Speaker:

like data observability to begin with Mhmm. To be totally Really? English is

Speaker:

yeah. English is my second language, and observability was such a difficult word to

Speaker:

pronounce. When we started the when we started the, you know,

Speaker:

the company and and the category, we had to give it a name. So we

Speaker:

didn't really know is this you know, we used we we coined the term data

Speaker:

downtime, you know, as a corollary to application downtime. We thought maybe

Speaker:

data reliability. There are lots of

Speaker:

options. At the end of the day, I always try to gravitate towards where

Speaker:

my customers are, so whatever language my customers use. And so customers

Speaker:

started using the word observability, so I started using that too. And same with the

Speaker:

estate, they started using the data estate sort of as a language. And so

Speaker:

Interesting. Full disclosure, have not, have no

Speaker:

ties to Microsoft, but but just have heard

Speaker:

mostly enterprises sort of think about that. I I think my understanding,

Speaker:

you know, for for what they mean is, you know, wherever

Speaker:

you store aggregate process data. And so that, you know, can

Speaker:

include, you know, you know, upstream

Speaker:

sources or upstream, data sources. But, you know, it could be,

Speaker:

like, an Oracle or SAP database. It could be data

Speaker:

lake house, data warehouse like Snowflake, Databricks,

Speaker:

AWS, Redshift, S3, all the

Speaker:

way to wherever you're consuming that. That could be a BI report. You know, Power

Speaker:

BI. Sorry, Microsoft.

Speaker:

Right, Looker, Tableau, you know,

Speaker:

various, various options. And,

Speaker:

honestly, the, you know, the most common enterprise has all of

Speaker:

the above in some shape, form, or fashion. And so to sort

Speaker:

of include all of that, I think

Speaker:

the some of the thesis that we have around observability is that, by the way,

Speaker:

each of those by themselves has some concept of observability.

Speaker:

Right? Like, you

Speaker:

can, for example, with Snowflake, you can set up some basic,

Speaker:

sort of checks, if you will, like a sum check or whatever. Right?

Speaker:

You you could do that in Snowflake. However, we think that observability

Speaker:

needs to be sort of third party and to be end to end. And,

Speaker:

again, that draws on on software corollary. So,

Speaker:

you know, like, AWS has CloudWatch, for example,

Speaker:

but that's probably not sufficient for whatever you're building. You're probably

Speaker:

gonna use, again, like, New Relic or Datadog to connect

Speaker:

across the the board to, you know, variety of of,

Speaker:

integrations. Right? They have hundreds. So that's what I think about when I

Speaker:

say data estate. But it's a great question. It's definitely not my

Speaker:

word. No. I was just curious. Like like, you know,

Speaker:

because whenever because first, I hated the term too. Right? And I can't maybe it's

Speaker:

Stockholm Syndrome. I don't know. But,

Speaker:

the more I kind of sat on it and kind of digested it, I was

Speaker:

like, I like it because it explains, like, you know, you know, historically.

Speaker:

Right? Like, an estate is, you know, whoever

Speaker:

owned the land got to call the shots and whoever called the shots owned the

Speaker:

land. Like, there was a very, you know, you drew the food, you you cut

Speaker:

down the trees, you, you know, you mined for, I think the Minecraft

Speaker:

movie is coming out. So you mined for all these things. Right? My kids are

Speaker:

into it. But, like, and it's

Speaker:

really kinda like it's just the idea of seeing it, like, it's land. It's kinda

Speaker:

like land. It's kinda like a natural resource. It's not really natural, but it is

Speaker:

a resource. Right? And if I say unnatural resource, that's really weird. But it's a

Speaker:

resource. Right? And if you you can either you have it. You already have

Speaker:

it. You either develop it or you don't. And, you know, do

Speaker:

you, you know, do you grow food on it? Do you, you know, like so

Speaker:

see, I I liked it because it was the idea that it's already there. Right?

Speaker:

Mhmm. And it's it might be in forms you don't really think about. Right? Like,

Speaker:

you know, PDFs in a in a SMB share somewhere.

Speaker:

Right? Mhmm. I mean, that's part of your data estate. Yep. Right?

Speaker:

And it's that's how I kinda, like, came to terms with it. And,

Speaker:

like, I really kinda like it because it helps you to think holistically about data

Speaker:

because I think a lot of business decision

Speaker:

makers and even technical decision makers don't see data as a

Speaker:

as a as a as a resource. I think that's changed

Speaker:

over the last maybe five, six years.

Speaker:

But it really became something that they don't see

Speaker:

it as a resource they could mine, they can get value out of. Right? The

Speaker:

smart people did. But, for the most part That's

Speaker:

right. Yeah. You had to convince them. Right? Exactly.

Speaker:

It sounds like based on what you say because, like, you know, my wife works

Speaker:

in IT security. Right? So, so we're a two engineer

Speaker:

household. So the kids are super nerds. But, like, I was telling

Speaker:

her after ChatGPT came out, I was all excited about it. And I was

Speaker:

telling her about how this works. I was like, you give it this big corpus

Speaker:

of data, and it chews through it, and it comes up with these these vectors

Speaker:

and stuff like that. And then she looked at me and it's like, so all

Speaker:

the training data is now a massive attack surface.

Speaker:

And Yep. When that's just why I love my wife. So I

Speaker:

I'm wronged. She's never wronged. Well, that's true. But at

Speaker:

first I was like I was thinking but but you're missing and then I was

Speaker:

gonna say you're missing the point, which is never a good thing to say

Speaker:

but Like midway through I was like, oh my gosh,

Speaker:

she's right. Oh my gosh. She's right. So then

Speaker:

when I started talking to other data science and AI types, and I was like,

Speaker:

but but don't you think this could be, like, a big attack surface? I look

Speaker:

like that meme with the guy from It's Sunny in Philadelphia with, like, it's

Speaker:

always sunny where he had, like, the conspiracy thing. Like, I swear I will

Speaker:

like that meme. Yeah. And, you know, and if you

Speaker:

look at the I think OWASP has, like, the top 10 vulnerabilities of LLMs

Speaker:

that is either two or three. Right? So it's

Speaker:

kinda like there's a fine line between,

Speaker:

like, thinking too much about problem, but also kind of thinking ahead of the

Speaker:

problem. I don't know. No. Oh, I think you

Speaker:

cut off a little bit, Frank, but, Andy,

Speaker:

to me, that resonates a lot, and I think it's sort of really the overlap

Speaker:

between data and engineers. And, by the way, like, we didn't even talk

Speaker:

about security. Like, all these concepts also exist in security.

Speaker:

Right? And I think in the same way that we sort of manage, like, you

Speaker:

know, sev zero, sev one issues in security engineering, data

Speaker:

issues should be treated the same way. You should have a framework to understand what's

Speaker:

a sev zero, what's a sev one for data issues. You should it should be

Speaker:

connected to pager duty. Like, people should wake up in the middle of the night

Speaker:

when you have data issues. I think I think that's right. It's

Speaker:

improving, but, we're not quite there. It'll

Speaker:

happen. No. You're right, though. Like, they don't think about this in

Speaker:

terms of they don't does it I wouldn't say it's not disciplined. Sorry,

Speaker:

Annie. I cut you off. No. But my experience we talked to data engineers. Sorry,

Speaker:

Andy. And I I I I am a former data engineer

Speaker:

myself. Like, I thought of it in terms of schema structures and pipelines.

Speaker:

Mhmm. Not necessarily securing those pipelines. Right? Mhmm. Sorry,

Speaker:

Andy. I'll go. No. I was curious. I wanted to to shift back

Speaker:

to you. You mentioned the four areas that your software,

Speaker:

looks over your AI and the observability software does. What

Speaker:

happens when it detects something amiss?

Speaker:

Great question. So not even talking about Monte Carlo specifically, but rather

Speaker:

an observability solution. I think an observability solution needs to

Speaker:

have coverage or an observability approach, by the way. Like, some people build this

Speaker:

in house. An observability approach should take into consideration

Speaker:

your data estate, should take into consideration, right, your

Speaker:

entire data estate. I think, oftentimes, the mistake is people will even if they

Speaker:

build it in house or do anything else, they'll really just focus on, like, the

Speaker:

data and their data lake or the data in a particular report. Like, that's

Speaker:

not sufficient. Right? It it just isn't. And so people waste

Speaker:

a ton of time trying to understand, like, what's wrong and where. So I think

Speaker:

the first is, like, you need you need visibility across the data

Speaker:

estate, which hopefully we've defined as an unnatural resource that should be

Speaker:

managed securely. And and I think that's right because I

Speaker:

I by the way, Monte Carlo doesn't doesn't do the security

Speaker:

part, but I similarly believe that in the same kind of diligence

Speaker:

that we apply to data as engineering, you want data products to

Speaker:

be reliable but also secure, scalable,

Speaker:

like all those concepts should adapt. By chance, we happen to

Speaker:

focus on the reliability and observability part, but all the other,

Speaker:

principles of software engineering should apply.

Speaker:

We specifically don't do it, but very much believe that should be

Speaker:

the case. But back to your question, you

Speaker:

know, so so what happens when there is an issue?

Speaker:

Very similar to workflow that you might find in Datadog,

Speaker:

New Relic, and and PagerDuty. So there is an alert that goes out,

Speaker:

often you know, in whatever flavor of choice. If you're an enterprise that has a

Speaker:

data estate, this is likely Microsoft Teams. If not, this would mean

Speaker:

Slack or an email or what you know, some teams like to have it connected

Speaker:

to to Jira and and pager duty for for sev zeros or sev

Speaker:

ones. And, you know, the first thing

Speaker:

that people will do is start, you know, typically an analyst.

Speaker:

I was I was in, you know, prior an analyst. The first thing you start

Speaker:

asking yourself is, why the hell is the data wrong?

Speaker:

Right. Yeah. You're like, well, was the report on time?

Speaker:

Was the data accurate? Was it complete? You start going through all

Speaker:

and then you start you basically come up with hypothesis. And then you start

Speaker:

researching those hypothesis, and you're like, well, let me let me

Speaker:

trace the data all the way all the steps of the transformation

Speaker:

and start looking. Was the data okay here? Yes. Check. Okay. Move on. Was it

Speaker:

data right? You literally you started this, like, recursive process. Gotcha.
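
A toy version of that recursive walk (purely illustrative; the lineage graph and the check function are invented) starts at the broken report, follows lineage upstream, and tests each table until it finds the first unhealthy one, essentially the manual whiteboard process described next.

    # Upstream lineage: each table maps to the tables it is built from (invented example).
    lineage = {
        "exec_report": ["orders_mart"],
        "orders_mart": ["orders_staging", "customers_staging"],
        "orders_staging": ["raw_orders"],
        "customers_staging": ["raw_customers"],
        "raw_orders": [],
        "raw_customers": [],
    }

    # Stand-in for a real data quality check; here we simply hard-code the broken table.
    BAD_TABLES = {"raw_orders"}

    def looks_healthy(table):
        """Pretend to sample the table and verify it looks OK."""
        return table not in BAD_TABLES

    def find_root_cause(table, visited=None):
        """Walk upstream recursively and return the deepest unhealthy table found."""
        visited = visited if visited is not None else set()
        if table in visited:
            return None
        visited.add(table)
        for upstream in lineage.get(table, []):
            culprit = find_root_cause(upstream, visited)
            if culprit:
                return culprit
        return None if looks_healthy(table) else table

    print("Likely root cause:", find_root_cause("exec_report"))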

Speaker:

Before we started the company, I used to do this all manually. So I remember,

Speaker:

like, I would go into a, you know, into a room. Maybe you did this

Speaker:

too. And, like, on a whiteboard, I would start, like, basically mapping out

Speaker:

the lineage. Okay. This broke here. Was the data here okay? Let's let

Speaker:

let's sample the data and make sure it's okay. Okay. Move on. Let's like, literally,

Speaker:

we have this, like, very every morning, actually, you know, that this

Speaker:

became such such a problem because we were so reliant on this particular day

Speaker:

dataset that every morning, me and my team would wake up, and we would basically

Speaker:

go step by step and diligently, like, make sure that the data is accurate,

Speaker:

which I felt like was I was like, this is, like, total, you know, crazy.

Speaker:

So, you know, I think, particularly in Monte

Speaker:

Carlo or, like, what observability does is provides the

Speaker:

information that you need in order to troubleshoot and understand where the issue is. And

Speaker:

so we can surface you information like, hey. There was at the same time that

Speaker:

this dataset you know, maybe the the percentage of null values in

Speaker:

a particular field was inaccurate. And then at the same time, there was a pull

Speaker:

request that happened. Maybe those are correlated, actually. Gotcha.
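
That correlation step can be sketched very simply (the incident and change records below are hypothetical, not output from a real integration): given the time an anomaly was detected, look for code or schema changes to the same table that landed shortly before it.

    from datetime import datetime, timedelta

    # Hypothetical incident: the null rate spiked in one field at a known time.
    incident = {
        "table": "orders_mart",
        "field": "customer_id",
        "detected_at": datetime(2025, 3, 7, 17, 5),
        "symptom": "null rate jumped from 0.1% to 34%",
    }

    # Hypothetical recent changes gathered from version control and warehouse logs.
    recent_changes = [
        {"kind": "pull_request", "id": "PR-512", "touches": "orders_mart",
         "merged_at": datetime(2025, 3, 7, 16, 40),
         "summary": "renamed field cust_id to customer_key"},
        {"kind": "schema_change", "id": "DDL-88", "touches": "raw_customers",
         "merged_at": datetime(2025, 3, 5, 9, 0),
         "summary": "added column loyalty_tier"},
    ]

    def correlated_changes(incident, changes, window=timedelta(hours=6)):
        """Return changes to the affected table made shortly before the anomaly was detected."""
        start = incident["detected_at"] - window
        return [c for c in changes
                if c["touches"] == incident["table"]
                and start <= c["merged_at"] <= incident["detected_at"]]

    for change in correlated_changes(incident, recent_changes):
        print(f"Possible root cause: {change['id']} ({change['summary']})")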

Speaker:

Maybe, you know and maybe, actually, you can use you can also

Speaker:

do a code analysis. So you can, like, basically, you know, analyst

Speaker:

what we used to do is, like, sift through lines of code and try to

Speaker:

see what changed. Hey, why not surface to you that, like, there was

Speaker:

a particular change in the, you know, name of a field,

Speaker:

at the same time as an example. So bringing all that data into one

Speaker:

place can help you sort of troubleshoot that. And

Speaker:

sorry for another LLM plug, but you can actually have

Speaker:

an LLM do this for you, which is pretty sick where it's like an early

Speaker:

beta test for us. We haven't released it yet. But, basically, what we're

Speaker:

testing internally is for every like, for data incidents,

Speaker:

there's basically, like, an in like, a troubleshooting agent that

Speaker:

spawns agents for each of the hypothesis. So there's, like, an agent that

Speaker:

statement. Yeah. I it's really cool. There's an agent that

Speaker:

looks into, like, the code change, the data change, the system

Speaker:

change, and then and then it does it recursively on

Speaker:

all those tables. So you can actually run up to a hundred agents in under

Speaker:

one minute. And then there's a larger LLM that takes all that information

Speaker:

and summarizes it and synthesizes it. So, again, early days, this is like we're still

Speaker:

building it. Very cool. But the early results are really cool. Yeah. It's

Speaker:

like basically turbocharging your your data analysts and your data

Speaker:

stewards. Sorry. I got all excited. No. It's it is That's really

Speaker:

cool. Fascinating, and I love that you're excited about it. And what one of the

Speaker:

jokes that I make when I'm I'm working with my kids on something, if

Speaker:

they nail something, I'll I'll say to them, you know,

Speaker:

something similar to this. It's like, if you can only, you know, if you

Speaker:

can only run a hundred in one minute, I guess that's if that's the best

Speaker:

you can do, we'll just have to live with it. Yeah. Exactly.
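
In spirit, the fan-out described above might look something like this sketch (a simplification with canned findings, not the unreleased feature itself): run one lightweight check per root-cause hypothesis concurrently, then hand everything to a single summarization step.

    from concurrent.futures import ThreadPoolExecutor

    # One stand-in check per root-cause hypothesis; a real troubleshooting agent would
    # query metadata, code history, and system logs instead of returning canned findings.
    def check_data_change(table):
        return f"{table}: null rate in customer_id spiked at 16:45"

    def check_code_change(table):
        return f"{table}: PR-512 renamed cust_id to customer_key at 16:40"

    def check_system_failure(table):
        return f"{table}: all pipeline runs completed on schedule"

    def summarize(findings):
        """Stand-in for the larger model that synthesizes the sub-checks' findings."""
        details = "\n".join(f"- {finding}" for finding in findings)
        return ("Most likely cause: a code change (field rename) shortly before the "
                "data anomaly.\n" + details)

    hypotheses = [check_data_change, check_code_change, check_system_failure]

    # Fan out the checks concurrently, then summarize the combined findings.
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        findings = list(pool.map(lambda check: check("orders_mart"), hypotheses))

    print(summarize(findings))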

Speaker:

That's that's an amazing stat. Yeah. Yeah. That is interesting. And I

Speaker:

also think too I also think too that, like, observability could help

Speaker:

with secure the security story. Right? Because if, you know, you're looking at a

Speaker:

pipeline and it's like, hey. Weren't there a bunch of

Speaker:

sketchy looking IPs, like, poking around our system about the time that this

Speaker:

pipeline ran? Maybe the rest of the data that goes out of that pipeline

Speaker:

run is a little bit suspicious too. Yeah. A

Speaker:

hundred percent. Like, we we you know, for example, you work with a,

Speaker:

call it delivery service, and there was a very

Speaker:

suspicious tip very suspicious

Speaker:

amount of tip that was given. Like, you

Speaker:

know, you can imagine, you know, the range of tips can be between x

Speaker:

dollars and y dollars, and suddenly that's, like, you know,

Speaker:

10,000 times y, like, 10,000 times the upper limit.

Speaker:

Yeah. You know, triggers off a suspicious alert. It's

Speaker:

not a normal tip, and it's not a mistake. It's actually, you know, security

Speaker:

issue. So that's an example. Yeah. Interesting. Yeah. I

Speaker:

love the anomaly detection aspect of that. I mean, it just it

Speaker:

it's it's something that we've been doing for a long time,

Speaker:

but then at wrapping it with automation and then

Speaker:

combining that automation with what you just described with all the

Speaker:

agents running down all of the permutations, that

Speaker:

that just sounds amazing. Yeah. It's really cool. I can't

Speaker:

take credit. This isn't me. It's it's it's my team. But,

Speaker:

but I I was like, woah. It's like a hundred Barrs

Speaker:

running at the same time under one minute. That's amazing. There you go. It's really

Speaker:

cool. Probably smarter than me. But yeah.

Speaker:

That is so awesome. That is cool.

Speaker:

So we we generally have is, we have kind of our

Speaker:

our stock questions that we ask, if you're interested in doing them.

Speaker:

They're not we're not Mike Wallace. We're not trying to I don't even think

Speaker:

anyone gets that reference anymore, but we're not trying to catch you in a,

Speaker:

I gotta come up with a new one, in a thing. But it's mostly, like,

Speaker:

how'd you find your way in the first one is I'll get the rest of

Speaker:

them, up for you in a second. But the first one is, how'd

Speaker:

you find your way into data? Did did the data did you find the data

Speaker:

life or did data life find you? Oh, that's such a great

Speaker:

question. You know, it's funny.

Speaker:

I grew up you know, my my, my mom is a meditation and dance

Speaker:

teacher and my dad is a physics professor. And so,

Speaker:

yeah, and so I, I, you know, grew up with very sort of like, yin

Speaker:

yin yang in my family, if you will.

Speaker:

At a very early age, I used to, like, hang out in in my dad's

Speaker:

lab and, like, do scientific research and stuff like that. So or, you know,

Speaker:

like, very at a very young age, my memories are, like, sitting in a

Speaker:

cinema, watching a movie with my dad and trying to, like, guesstimate how

Speaker:

many people are sitting in the in the audience.

Speaker:

Right? Yes. Just like, you know, I think for, like, a five year

Speaker:

old, it's sort of like a fun fun thing. But, you know, throughout my my

Speaker:

adulthood, like, always sort of had that in in the background. And,

Speaker:

you know, I I think later on in life, I sort of always gravitated towards

Speaker:

data. And when I decided to start a company,

Speaker:

I was actually debating between various areas

Speaker:

like IT and actually blockchain, or, you know,

Speaker:

crypto for a little bit and and data. I think at the end of the

Speaker:

day, like, my heart was really in in data. If I look at, like,

Speaker:

the next ten, twenty years, it's pretty clear to me that data is

Speaker:

gonna be I think it still is the coolest party, and I think it

Speaker:

will be the coolest party to be in. And I personally,

Speaker:

like, you know, it's it's it's funny. Like, throughout my my

Speaker:

career, I've I've also learned the limitations of data. Right? So so data can

Speaker:

tell you whatever story you want. It could tell you, you know, for every question,

Speaker:

it give can give you a yes, and you can also tell a no story.

Speaker:

Right? So so there's also limitations to data,

Speaker:

but but I always have been fascinated,

Speaker:

by by data and space. So can I say both? That's

Speaker:

Yeah. I mean, that's fair. That's fair. Good answer. That's fair. Yep. So

Speaker:

what what's your favorite part of your current job?

Speaker:

Oh, that's hard to choose. I love my job.

Speaker:

I just love it. I think, you know,

Speaker:

the ability to work with customers and actually, like, change the way they

Speaker:

work, I I think that's probably the biggest gratification that I

Speaker:

get, you know, from from my my career. Like, the fact that you can

Speaker:

actually work on something that matters is pretty insane. You know? And when I think

Speaker:

about, like, the future, I'm like, what? So data is gonna be wrong? Like, we're

Speaker:

just gonna be, you know, making decisions off of wrong like, what? I don't

Speaker:

wanna live in that world. You know? And so Yeah. I think

Speaker:

there's something that's, like, really fulfilling and helping, you know, drive a mission that

Speaker:

I believe in that has an impact on customers. And, you know, when customers will

Speaker:

tell me, you know, I started sleeping at night because I

Speaker:

know that, like, I have some coverage for my data. I'm like, yeah. Oh, wow.

Speaker:

I'm glad you're sleeping. You know? Like, good for you. I love

Speaker:

sleeping. So What a cool thing to hear. Yeah. Exactly. I

Speaker:

think that's that's probably, you know, maybe one part. And then the second is, like,

Speaker:

just working with an amazing team. You know, I I spend most of my my

Speaker:

day maybe kinda like, you know, you guys, like, hang out having fun,

Speaker:

laughing. So, you know, I I I'm very

Speaker:

grateful that I get to work with the smartest people on on

Speaker:

worthwhile challenges. Oh, very cool.

Speaker:

We have, three complete these sentences. When I'm not

Speaker:

working, I enjoy blank. Sleeping.

Speaker:

I yeah. I I have a I we recently

Speaker:

have added we we had two kids, and we adopted a cousin. And

Speaker:

I forgot how draining a toddler can be. And I'm

Speaker:

I'm eight to 10 years older since the last time I had a toddler, so

Speaker:

it's like I, I have two

Speaker:

kids, two under four. So I,

Speaker:

respect the sleep even more. I I can't even I can't

Speaker:

even wrap my head around that. It gets it gets better. I can say

Speaker:

that. It's my own role. I appreciate that.

Speaker:

So our second one is I think the coolest thing in technology

Speaker:

today is blank. The coolest thing in

Speaker:

techno I think the pace of innovation. I think that's really

Speaker:

freaking cool. You know, you can, like, work at a problem today and you're like,

Speaker:

you can't solve this. Two days two days later, a new model will come out.

Speaker:

Boom. You're done. So it's harder. Right? The bar is

Speaker:

higher in order to, like, actually like, it's it's harder to it's

Speaker:

harder to know what to bet on. It's harder to know what the future will

Speaker:

look like, but it's a lot more exciting. So I'm in it.

Speaker:

Cool. Our third and final complete sentence is, I look forward

Speaker:

to the day when I can use technology to blank.

Speaker:

I was always a big fan of teleportation. I think teleportation is really

Speaker:

freaking cool. That would be nice. Can't wait for that. That would be cool.

Speaker:

That would be cool. You know, you're not the first person to answer with them.

Speaker:

Oh, really? Yeah. It's pretty cool. Pretty cool. Sorry.

Speaker:

Number six is share something different about

Speaker:

yourself. Something different.

Speaker:

Yeah. Something different. Let's

Speaker:

see. I mentioned I have two kids. I

Speaker:

meditate when I don't sleep. I like to meditate.

Speaker:

I, what else? I'm married to

Speaker:

my cofounder. Oh, wow. So we,

Speaker:

yeah, we're fortunate to share our lives both at work and at

Speaker:

home. That is cool. Yeah. I can

Speaker:

imagine that would work out really well or not. Like, there's not a lot of

Speaker:

middle ground there. High risk, high reward. High risk, high reward. I

Speaker:

get, like you know, my wife is, you know, she's a

Speaker:

federal employee, and she's, you know, reevaluating what her career

Speaker:

futures look like, you know, and she's like, you

Speaker:

know, I was like, well, you know, you could help. You can start

Speaker:

a new podcast. I can help you with that. She's like, yeah. But then I

Speaker:

have to work with you. And, like, I know what she meant. I know how

Speaker:

it sounds. I know how it sounds, but I know what she means. Like, so

Speaker:

when she did work from home, like, there was literally a, like, an entire floor

Speaker:

between us because Yep. Like, it's too loud. I'm too loud. Yeah. Yeah. Yeah.

Speaker:

Yep. We're very loud too. So

Speaker:

where can folks find more, learn more about, Monte

Speaker:

Carlo and, and and what you're up to?

Speaker:

Probably, I'm the place where I hang out is LinkedIn. So,

Speaker:

I know we just got connected on LinkedIn. That's great. Probably follow me

Speaker:

on LinkedIn or, honestly, reach out to me directly at

Speaker:

Moses@MonteCarlodata.com. I hope I don't get a lot of phishing now because

Speaker:

of that. But, well, hopefully, make sure it's the right account because we found out

Speaker:

in the process that there was another

Speaker:

suspicious-looking account. And I also think that for our

Speaker:

listeners, it's worth pointing out that I think that people have realized that LinkedIn is

Speaker:

a major security vector because I've been getting a lot

Speaker:

more weird ones lately. Now I don't think it's related to

Speaker:

the, the refrigerator scandal. Andy and I will do a whole show on that

Speaker:

later because there's there's actually an interesting AI component to that. Okay.

Speaker:

Good to know. And finally, last but not least, Audible

Speaker:

is a sponsor of the podcast. Do you do audiobooks? If

Speaker:

so, recommend one. Otherwise, just recommend a good book.

Speaker:

A good book. Let's see.

Speaker:

Thinking in Bets by Annie Duke.

Speaker:

Professional poker player. Interesting. On how

Speaker:

lessons from poker can be applied in life

Speaker:

and in business. Interesting. I

Speaker:

once worked at a financial services company, and one of the

Speaker:

big shots used to play online poker. And

Speaker:

not on company money, but on company time. And a

Speaker:

lot of people took a dim view of that.

Speaker:

Rightfully so. But he was

Speaker:

making so much money. You know, the people that mattered didn't take a dim view of

Speaker:

it. When he stopped making so much money, everyone took a dim view of

Speaker:

it. And that does end the story. It

Speaker:

is on, let me see if it's an audio oh, it is

Speaker:

an audio book. It is an audio book. Awesome. I'm gonna add that to my

Speaker:

list. I'm done. Okay. And, you know, they are a sponsor.

Speaker:

So if you go to, the datadrivenbook.com, you know,

Speaker:

you'll get a free audio book on us. And, you know, if you sign up,

Speaker:

we'll get enough to, you know, buy a coffee.

Speaker:

Maybe not tip them $8,000, but, you know,

Speaker:

we'll get enough for a Starbucks maybe. Maybe. Yeah.

Speaker:

I just tested the link, Frank. Every now and then, we had trouble early on

Speaker:

with the link coming and going. So I just when you saw me turn away

Speaker:

a minute ago when Frank started to ask this question, that was me typing

Speaker:

in. It worked. It worked.

Speaker:

It's always DNS. Always. It's interesting

Speaker:

you mentioned that. I read an article. Actually, it was a newsletter recently that talked

Speaker:

about, betting being the first stage

Speaker:

in, kind of, the path to minimum viable products. And

Speaker:

I thought, now that's curious, and I don't know again, I haven't

Speaker:

read the book. I will listen to it. But the idea of

Speaker:

engaging your team I I manage a team, as well.

Speaker:

And engaging the team by having them do

Speaker:

interesting things and taking these very large bets

Speaker:

that look nearly impossible,

Speaker:

perhaps. And it's like you said, the the the problem

Speaker:

comes up, and you're thinking this is this is unsolvable. And two days

Speaker:

later, it's solved. And over and over again, I've had that

Speaker:

experience, but I never tied it to the concept of

Speaker:

bets. And I saw this this newsletter that talked about do

Speaker:

that first. And it reminded me a little bit

Speaker:

of Collins talking about the big hairy

Speaker:

goals, you know, back in the day. It's very

Speaker:

similar to that maybe in concept. I don't know. I'll have to listen to the

Speaker:

book and check it out, but I was intrigued by the newsletter. Yeah.

Speaker:

There's interesting concepts. Like, I think some of the ideas is, like I mean, even

Speaker:

when you start a company or sort of, you know, start working on a team,

Speaker:

like, you basically have a set of cards, which are, like, your strengths,

Speaker:

your weaknesses. And so how how do you play your cards? Like, you can't you

Speaker:

know, if you wanna win a round, you can't play with someone else's cards.

Speaker:

You are what you are. And so the best thing you can do is play

Speaker:

with your cards. I think that's true for a team solving a problem or a startup

Speaker:

or whatever it is. I love that. Yeah.

Speaker:

Interesting. Any final thoughts? This was so fun. Thanks for

Speaker:

having me. Thank you. Thanks for, and you did mention kinda offhand early

Speaker:

on. I don't remember if it was in the green room or not. You have

Speaker:

a podcast yourself? I do not have a podcast myself.

Speaker:

Alright. That was my mistake. Maybe I'll start one tomorrow. Okay. All

Speaker:

good. Life goal one day. There

Speaker:

you go. There you go. And with that, we'll let our AI finish

Speaker:

the show. And that wraps up another data-packed episode of

Speaker:

Data Driven. A massive thank you to our brilliant guest, Barr

Speaker:

Moses, for taking us deep into the world of data observability,

Speaker:

sketchy LinkedIn impersonators, and the dark arts of tipping

Speaker:

anomalies. Who knew a dodgy schema change could cost more than

Speaker:

a luxury sports car? Now, dear listener, if you've made

Speaker:

it this far, you clearly have excellent taste. So why not

Speaker:

put that good judgment to work and leave us a rating and review on

Speaker:

whatever platform you're tuning in on? Apple, Spotify,

Speaker:

Pocket Casts, Morse code, however you get your fix, we'd love

Speaker:

your feedback. And dare I ask, are you subscribed?

Speaker:

I mean, you wouldn't want to miss out on future episodes filled with more

Speaker:

wit, wisdom, and the occasional fridge-based conspiracy,

Speaker:

would you? Until next time, stay curious, stay

Speaker:

observant, and for heaven's sake, keep your data tidy.

About the author

Frank La Vigne is a software engineer and UX geek who saw the light about Data Science at an internal Microsoft Data Science Summit in 2016. Now, he wants to share his passion for the Data Arts with the world.

He blogs regularly at FranksWorld.com and has a YouTube channel called Frank's World TV (www.FranksWorld.TV). Frank has extensive experience in web and application development. He is also an expert in mobile and tablet engineering. You can find him on Twitter at @tableteer.