Arjun Patel on Vector Databases and the Future of Semantic Search
Today, we delve into the intriguing world of vector databases, retrieval augmented generation, and a surprising twist—origami.
Our special guest, Arjun Patel, a developer advocate at Pinecone, will be walking us through his mission to make vector databases and semantic search more accessible. Alongside his impressive technical expertise, Arjun is also a self-taught origami artist with a background in statistics from the University of Chicago. Together with co-host Frank La Vigne, we explore Arjun’s unique journey from making speech coaching accessible with AI at Speeko to detecting AI-generated content at Appen.
In this episode, get ready to unravel the mysteries of natural language processing, understand the impact of the attention mechanism in transformers, and discover how AI can even assist in the art of paper folding. From discussing the nuances of RAG systems to sharing personal insights on learning and technology, we promise a session that’s both enlightening and entertaining. So sit back, relax, and get ready to fold your way into the fascinating layers of AI with Arjun Patel on Data Driven.
Show Notes
00:00 Arjun Patel: Bridging AI & Education
04:39 Traditional NLP and Geometric Models
08:40 Co-occurrence and Meaning in Text
13:14 Masked Language Modeling Success
16:50 Understanding Tokenization in AI Models
18:12 “Understanding Large Language Models”
22:43 Instruction-Following vs Few-Shot Learning
26:43 “RHEL AI: Open Source Data Tool”
31:14 “Retrieval-Augmented Generation Explained”
33:58 “Pinecone: Efficient Vector Database”
37:31 “AI Found Me: Intern to Innovator”
41:10 “Impact of Code Generation Models”
45:25 Personalized Learning Path Technology
46:57 Mathematical Complexity in Origami Design
50:32 “Data, AI, and Origami Insights”
Transcript
Welcome back to Data Driven, the podcast where we chart the thrilling
Speaker:terrains of data science, AI, and everything in between.
Speaker:I'm Bailey, your semi-sentient host with a penchant for
Speaker:sarcasm and a wit sharper than a histogram spike.
Speaker:Today's episode promises a delightful mix of the analytical and the
Speaker:artistic as we dive into the fascinating world of vector databases,
Speaker:retrieval augmented generation, and origami. Yes.
Speaker:You heard that right. Origami, the ancient art of
Speaker:folding paper, somehow finds itself intersecting with AI,
Speaker:proving that the future really does have layers or should I say folds.
Speaker:Our guest, Arjun Patel, is a developer advocate at Pinecone
Speaker:who's on a mission to demystify vector databases and semantic
Speaker:search, turning complex AI concepts into snackable bits of
Speaker:brilliance. He's also a self taught origami artist and a
Speaker:former statistics student who actually enjoyed it. So if
Speaker:you're ready to unravel the secrets of modern AI and maybe pick up a trick
Speaker:or two about folding life into geometric perfection, you're in the
Speaker:right place.
Speaker:Hello, and welcome back to Data Driven, the podcast where we explore the emergent
Speaker:fields of data science, AI, data engineering.
Speaker:Now today, due to a scheduling conflict, my most favorite data engineer
Speaker:in the world will not be able to make it. But I will
Speaker:continue on, despite the recent snowstorms that we've had here in
Speaker:the DC Baltimore area. With me today, I have
Speaker:Arjun Patel, a developer advocate at Pinecone,
Speaker:who aims to make vector databases, retrieval augmented generation,
Speaker:also known as RAG, and semantic search accessible by
Speaker:creating engaging YouTube videos, code notebooks, and blog
Speaker:posts that transform complex AI concepts
Speaker:into easily understandable content. After graduating with
Speaker:a BA in statistics from the University of Chicago, his journey through
Speaker:the tech world spans from making speech coaching
Speaker:accessible with AI at Speeko to tackling AI
Speaker:generated content detection at Appen. Arjun's
Speaker:interests span from traditional natural language processing to modern
Speaker:large language model development and applications.
Speaker:Beyond his technical prowess, Arjun has been designing and folding his
Speaker:own origami creations for over a decade. Interesting.
Speaker:Seamlessly blending analytical thinking with artistic expression in his
Speaker:professional and personal pursuits. Welcome to the show, Arjun.
Speaker:Hey. Nice to meet you, Frank. Thanks for having me on. Excited to be here.
Speaker:Awesome. Awesome. There's a lot to unpack from there, but I think it's interesting to
Speaker:note that you have a BA in statistics. Yes. So you were probably
Speaker:studying, this sort of stuff before it was cool?
Speaker:Yeah. Yeah. A lot of the old school ways of analyzing
Speaker:data, understanding what's going on, so on and so forth.
Speaker:It was kind of, like, made clear to me pretty early that
Speaker:understanding how to work with data at small scale and at large scale is gonna
Speaker:be very important going to the future. So I kinda just took that and ran
Speaker:with it with my education. Very cool. It was
Speaker:definitely, you know, one of those things where I don't
Speaker:think people realized how important statistics would be until,
Speaker:you know, until the revolution happens, so to speak. So and it's also
Speaker:interesting to see because there's a lot of people that I think could benefit from,
Speaker:you know, picking up that old picking up a, an old statistics book and
Speaker:reading through it and understanding, like, a lot of the fundamentals. Obviously, there's a lot
Speaker:of new things, but a lot of the fundamentals are largely the
Speaker:same. You know, just I'll
Speaker:use this example. You know, McDonald's can add a McRib sandwich,
Speaker:but it's still a McDonald's. Right? Like, it's This
Speaker:is what happens when you're shoveling snow. Like, your
Speaker:brain gets I absolutely agree. And, like,
Speaker:another proof on that point is that Anthropic just released a
Speaker:blog recently kind of recapping how to do statistical analysis when you're
Speaker:comparing different large language models. And when you read the paper in the blog,
Speaker:it's basically just, like, two-sample t-tests and kind of going over really,
Speaker:like, not introductory, but still statistics that's easily accessible for people to
Speaker:learn and understand. So it's still relevant, and it's still important.
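For readers who want to see what that kind of comparison looks like in practice, here is a minimal sketch of a two-sample t-test over two models' evaluation scores. The scores are invented for illustration and are not taken from the Anthropic post Arjun mentions.

```python
# Hypothetical per-run scores for two models on the same eval set (made-up numbers).
from scipy import stats

model_a_scores = [0.82, 0.79, 0.91, 0.85, 0.77, 0.88, 0.84, 0.80]
model_b_scores = [0.75, 0.74, 0.83, 0.78, 0.72, 0.81, 0.79, 0.76]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(model_a_scores, model_b_scores, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests the gap between the models is unlikely to be noise alone.
```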
Speaker:Interesting. One of the things that that that stood out in your in your bio
Speaker:was, people tend to forget that there
Speaker:was a natural language processing field prior
Speaker:to ChatGPT launching.
Speaker:How do you, you know,
Speaker:we wanna talk about the difference between those 2? Sure.
Speaker:So the one of the first and probably only
Speaker:course I took in college related to natural language processing was
Speaker:called geometric models of meaning. And everything I learned in that
Speaker:course was like everything before, what we now would
Speaker:consider, like, modern embedding models. So bag of
Speaker:word methods, understanding how to represent documents and text purely
Speaker:based on, like, the frequency of the words that exist in the text,
Speaker:and then trying to understand, like, okay. Based on that information, how can
Speaker:we learn about the concepts that exist in text from the words that are being
Speaker:used? Like, what is the framework we can use to understand what these
Speaker:words mean based on their, co occurrences with the other words and
Speaker:texts that you're working with and based on, what those
Speaker:words mean as well. So, like, what the words' neighbors are and what their meaning
Speaker:helps and also what those words are doing. And I think a lot of traditional
Speaker:natural language processing, methodologies kinda stem from that, and
Speaker:there's a there's a lot of mileage you can get out of just thinking about
Speaker:approaching problems there before you step into these more complicated methods,
Speaker:like, these embed modern embedding models that exist. So that's kind of, like, what I
Speaker:would consider, like, traditional NLP, like, doing named entity recognition,
Speaker:trying to understand how to, find keywords really
Speaker:quickly. And then once you get really good at that, there's a whole host of
Speaker:problems that you encounter afterward that kind of modern techniques try to
Speaker:solve. Right. That's interesting. So so
Speaker:what was it, what was your thoughts
Speaker:when you first, like given that you were an NLP practitioner
Speaker:prior to the release of transformers and things like that, what was your initial thought?
Speaker:Because I'm curious because there's not a lot of people there are a
Speaker:lot of experts today that really kind of started a couple of years ago. No
Speaker:fault on them. They see where the industry is going. Totally understand it. But what
Speaker:was your thoughts? What was your thoughts when
Speaker:you when you first saw the Attention Is All You Need paper?
Speaker:So that would have been
Speaker:probably around the time I graduated college, around
Speaker:maybe a year or 2 after I took the course that I was just describing.
Speaker:So I I just started learning about, like, okay. Like, this is
Speaker:how, like, old school, quote unquote, like, embedding
Speaker:methodologies work. And the biggest takeaway that I got from those is that they work
Speaker:pretty well. They work pretty well for, like, a lots of different kinds of
Speaker:queries. And I think what the Attention Is All You Need paper did
Speaker:was it kinda helped you, understand how
Speaker:to rigorously create representations of text that
Speaker:generalize way better than, any sort of, like,
Speaker:normal, keyword based, bag of word based search methodology.
Speaker:And I think that at the time, I probably didn't
Speaker:grasp as much what impact the Attention Is All You Need paper would have on the
Speaker:field until we started getting embedding models that people could use really
Speaker:easily, like RoBERTa or BERT. And we're like, okay. Now we can do, like,
Speaker:multilingual search without any issue. Now we can represent,
Speaker:like, any sentence without keyword overlap when we
Speaker:wanna find some document that's interesting, without doing any
Speaker:additional work. Like, once those papers started hitting the scene, I think now we start
Speaker:seeing, like, okay, this is what attention is doing for us. This is what the
Speaker:ability to, like, contextualize our vector embeddings is doing for us.
Speaker:And now we can see what's kind of getting benefited there. But I think I
Speaker:think my, understanding of how beneficial that
Speaker:was kind of lagged until we started seeing these other models kind of hit. And
Speaker:I'm like, okay. Now I can kinda see why this is important and why, like,
Speaker:future and future models are gonna get better and better based on this architecture.
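As a rough illustration of that point about models like BERT and RoBERTa generalizing beyond keyword overlap, the sketch below embeds a few sentences with the sentence-transformers library and compares them by cosine similarity. The model name and sentences are just examples, not anything discussed in the episode.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example pretrained embedding model

sentences = [
    "How do I return a defective laptop?",
    "What is the process for sending back a broken computer?",  # no keyword overlap with the first
    "The moon orbits the Earth.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pairwise cosine similarities: the first two sentences should score much
# higher against each other than either does against the third.
print(util.cos_sim(embeddings, embeddings))
```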
Speaker:Interesting. So so for those that don't know kind of and even I'm rusty on
Speaker:this. Right? Yeah. One of the things that was interesting about this was,
Speaker:what was it? You just described it a
Speaker:minute ago, but it was something like the prevalence of a word
Speaker:in a bit of text versus the lack of prevalence, and how that
Speaker:metric was very important in what
Speaker:I'll call classical natural language processing.
Speaker:Right. So this is the idea that if you have words that co
Speaker:occur together in some document space, the meaning of those words are gonna be
Speaker:more similar than words that don't co occur in some other given document
Speaker:space. This is rooted in something called the
Speaker:distributional hypothesis, which is basically this idea and the other
Speaker:idea that concepts cluster in this type of
Speaker:space. So what what does that mean actually? Right? So if you have the word
Speaker:like hot dog, it's probably gonna be seen in a corpus that's
Speaker:near other food related words than it would be if you picked some
Speaker:other word like space or moon. And there's something we can
Speaker:learn from that relationship to infer the meaning of what that word
Speaker:is and how we can use that meaning of that word to learn about what
Speaker:other words are doing. So So this is kind of, like, the theoretical
Speaker:basis of, like, why we can represent words geometrically,
Speaker:with a little bit of hand waving. But that's kind of the core idea.
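To make the co-occurrence idea concrete, here is a toy sketch: count which words show up in the same short "documents," then compare words by their co-occurrence vectors. The corpus is invented purely to echo the hot dog versus moon example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the hot dog and the burger were grilled at the picnic",
    "she ate a hot dog with mustard and fries at lunch",
    "the rocket reached the moon and returned from space",
    "astronauts walked on the moon during the space mission",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)   # document-term count matrix
cooc = (X.T @ X).toarray()             # word-word co-occurrence counts
np.fill_diagonal(cooc, 0)              # ignore a word co-occurring with itself

vocab = vectorizer.vocabulary_

def cosine(w1, w2):
    a, b = cooc[vocab[w1]], cooc[vocab[w2]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# "dog" should land closer to "burger" than to "moon" in this toy space.
print(cosine("dog", "burger"), cosine("dog", "moon"))
```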
Speaker:And attention kind of takes this a little further by allowing the
Speaker:representation of these tokens or words to be altered based
Speaker:on the words that occur in a given sentence. So you might have a
Speaker:word like does, like, does this mean something?
Speaker:You might say something like that. Or you might say, I saw some
Speaker:does in the forest. Both spelled exactly the same, but have
Speaker:completely different meanings based on their context. And if you used a
Speaker:traditional, maybe, bag of words model where you're just counting the
Speaker:words that occur in a given document and kind of creating a representation of what
Speaker:that document looks like based on the words that are composed in there, you're gonna
Speaker:overlap and conflate the meanings of the word
Speaker:does and does because they're spelled exactly the same. They might look
Speaker:exactly the same with this type of representation. But if you have a way of
Speaker:informing what that word means with its context, which is what attention
Speaker:allows us to do, then you can completely change how that's being
Speaker:represented in your downstream system, which allows you to do interesting things
Speaker:with with search. So that's kind of, like, the biggest benefit that's coming out of
Speaker:that type of methodology, and that kinda enables what is now known as
Speaker:semantic search and retrieval augmented generation and so on and so forth. I was gonna
Speaker:say, that sounds very it's almost like it was, like, the old pre
Speaker:that era, the vectorization of this and the distance in
Speaker:that vector in that geometric space. I guess
Speaker:we've been doing that for a lot longer than most people realize in in a
Speaker:sense. Yeah. I mean,
Speaker:looking through, indexes or document stores with some sort of
Speaker:vectorization has has has been,
Speaker:something that people have done, except instead of being dense vectors, which is, like,
Speaker:you have some fixed size representation that isn't necessarily interpretable
Speaker:to the human eye for some given query or document, it would
Speaker:be, like, the size of your vocabulary. So you think of, like, Wikipedia. You
Speaker:can find, like, every unique word on Wikipedia, and, like, that is gonna be how
Speaker:big your vector's gonna be. And every time you have a new document come in,
Speaker:a new article, somebody's kind of, like, wrote up and published to Wikipedia, like, you're
Speaker:representing that in terms of its vocabulary. But now instead of doing that, we
Speaker:have, like, this magical fixed sized box that allows us
Speaker:to represent chunks of text in a way that is
Speaker:extremely fascinating and abstract. And every time I think about it, it just, like, blows
Speaker:my mind, but that's kind of, like, the main kind of difference is the way
Speaker:we're representing that information and how compact that is and
Speaker:generalizable it has become. Yeah. That is, like, it it's almost
Speaker:like you're, you know correct me if I'm wrong, but, you know,
Speaker:creating these vectors, these large vector databases, right, with, you
Speaker:know, 10, 12,000 dimensions, right, of how these words
Speaker:are measured in relationship to others.
Speaker:It's almost as a consequence of training a large language
Speaker:model, you create a knowledge graph. Is that is that true? Is that really the
Speaker:case where, you know, like, you know, dog is most likely to be
Speaker:next to, you know, the word pet, you know, or
Speaker:it has the same distance. Is that I'm not
Speaker:explaining it right. No. No. No. You're you're on you're on the right track exactly.
Speaker:And I think this is, like, one of the most fascinating qualities
Speaker:of even, like, what people would consider, like, older
Speaker:embedding models is this idea that you can take, like, a training test that
Speaker:seems completely unrelated to the quality that you want in a downstream model,
Speaker:and it turns out that that actually achieves that quality. So, what you were referring
Speaker:to, Frank, is this idea that you might have, like, a sentence. You
Speaker:might have, like, I took my dog out on a walk, and you might say,
Speaker:okay. I'm gonna remove the word, walk, and I'm gonna have
Speaker:I'm gonna train some model that tries to predict what that word
Speaker:where I removed was. This is masked language modeling, which is this idea that you're
Speaker:kind of getting at of, like, okay, what are the words and how are they
Speaker:in relation to the other words in that sentence? And it turns out that if
Speaker:you, like, do this with, like, hundreds of thousands or millions of sentences and
Speaker:words, in some corpus that is somewhat representative of
Speaker:how people, use human language, you can
Speaker:act you will get really good at this task, number 1, because you're training the
Speaker:model on that task exactly. But if you are training a neural
Speaker:network on that model, some intermediate layer representation
Speaker:in that model so somewhere in that set of matrix
Speaker:multiplications where you're turning this input sentence into some fixed size
Speaker:vector representation is gonna be a good representation
Speaker:of what that word or that token or that sentence is going to be.
Speaker:And the fact that that works is not intuitive. Right?
Speaker:The the fact that that works has been shown empirically, and it turns out that
Speaker:we can kind of do that and kind of have these models work really well.
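A quick way to see the masked-language-modeling objective in action is Hugging Face's fill-mask pipeline; the model choice and sentence below are only illustrative.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the hidden word from its context, mirroring the
# "I took my dog out on a ___" example above.
for prediction in fill_mask("I took my dog out on a [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```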
Speaker:And nowadays, in addition to kind of doing that, which is what we would consider
Speaker:pretraining on some large corpus, we now fine tune those
Speaker:embedding models on specific tasks that are important to us
Speaker:for retrieval. Like, okay, we have this query or question we're
Speaker:asking. We have the set of documents that might answer this question or might
Speaker:not. We want a model that makes it so that the query's embedding and the
Speaker:document relevance embeddings are in the same vector space. So you're on the right track.
Speaker:That's, like, basically how these models are able to learn these things. I don't know
Speaker:if I would call them a graph representation, maybe a little bit
Speaker:of being pedantic on, like, use of words there because that can
Speaker:be a little bit, different how how you're organizing that information.
Speaker:But you can make the argument that the way that these large language models are
Speaker:representing information is a compressed form of, like, the giant dataset that they're
Speaker:trained on. And we don't actually know exactly, like, where that
Speaker:information lies inside that neural network. There's some research that's,
Speaker:like, trying to get at answering that question, But you could, for the sake of
Speaker:argument, be like, yeah. There's probably, like, a a a dog
Speaker:node somewhere in this neural network that knows a ton about dogs, and that's how
Speaker:we're able to kind of learn this information. That is the stuff that we don't
Speaker:exactly know. Interesting. Because, there was a really good
Speaker:video by 3Blue1Brown, which you probably are I love that
Speaker:channel. Where he gives examples where, you know, famous historical
Speaker:leaders from Britain have the same distance
Speaker:from you change the country to Italy
Speaker:or the United States have the same kind of distance. So you can kind
Speaker:of infer I'm not saying that the AI it
Speaker:almost seems like this knowledge graph is also is also a byproduct
Speaker:of of of building this out. Like, the there's some
Speaker:type of encoding or semantic, I guess, is this is really what it is. Right?
Speaker:Like, that that you get with it. And, I wanna get
Speaker:your thoughts because yesterday, I I caught the part the
Speaker:first half of the Jensen Huang keynote at CES,
Speaker:which this you know, we're recording this on January 8th. Right? And one of the
Speaker:things that the video starts off with is, you know, the idea
Speaker:that tokens are kind of fundamental elements of
Speaker:knowledge. And I did a live stream where I'm like, well, I never really thought
Speaker:about it this way. Right? They're they're building blocks of knowledge or the pixels, if
Speaker:you will, of of of of knowledge. And I wanted to get your
Speaker:thoughts on that because, like, that kind of blew my mind and maybe I'm simple.
Speaker:I don't know. Maybe I'm not. But it all it seems like we've been kinda
Speaker:dancing around this idea where and now NVIDIA is really
Speaker:fully, you know, going all in on this, the idea that, you know,
Speaker:these are not, this isn't an AI system. It's a token factory
Speaker:or a token score. What are your what are your thoughts on that? I'm curious.
Speaker:So when I started learning about how, like, tokenization works
Speaker:and how we're able to kind of, like, basically build these
Speaker:models without having massive, massive vocabularies,
Speaker:it is it is pretty it it is pretty
Speaker:interesting to be, like, okay. Like, maybe maybe there's some,
Speaker:abstract notion of information that each token has that
Speaker:is being that is what the model is learning during training time. And then
Speaker:we're just combining these sets of information in order to kind of, like, understand
Speaker:what words mean or what documents mean, so on and so forth. Because when you
Speaker:look at how, tokenizers work and the size of the number of
Speaker:tokens for, like, maybe the English language or maybe, like, a really multilingual
Speaker:model like RoBERTa or multilingual E5 large, they're a lot. So it is kind of
Speaker:odd to think about whether those tokens
Speaker:themselves hold information that's readily interpretable for us. But I
Speaker:think that we've gotten so far with using
Speaker:systems that are just combining, the operations on top of
Speaker:these tokens in order to retrieve the information that these systems have learned, that there's
Speaker:definitely something important there. And I would love to, like, know
Speaker:exactly, like, what is happening when we're able to do that. The the
Speaker:heuristic that I like to use is, large
Speaker:language models are generally reflections of the training datasets that they've been trained on,
Speaker:and they're basically creating, like, really efficient indexes over that
Speaker:information. And sometimes those indices hallucinate. And the reason
Speaker:why is because we are when we ask, quote, unquote, what
Speaker:a question to a large language model or query a large language model, we
Speaker:are kind of conditioning that model, on a probability
Speaker:space where every token being generated after is
Speaker:likely to exist given the query or the context or whatever we're passing to
Speaker:it. And once you think about it that way, then it just feels like
Speaker:instead of thinking about what each of the tokens are doing, you're kind of just
Speaker:querying what the model has been trained on and what it will tell you
Speaker:based on what it, quote unquote, learned or knows.
Speaker:And then you can kind of run with that metaphor a lot and build systems
Speaker:on on top of that. That seems, much more actionable than thinking about,
Speaker:like, what each of the tokens are doing individually. Does that kinda make sense? No.
Speaker:That makes a lot of sense. I think the whole gestalt of it is what
Speaker:really makes it magical. Right? Like Yeah. You know, you can you
Speaker:can obviously, I I don't this is not this is not, like, the newest iPhone
Speaker:or whatever. But, you know, if you go through the the text auto complete,
Speaker:you can maybe make a sentence that sounds like
Speaker:something you would write. But much beyond that, it starts getting weird. In
Speaker:early generative AI was very much like that, particularly the images.
Speaker:Well, you know Don't like, yes. A 100%
Speaker:understand. I started learning about generative, text
Speaker:generation before we had instruction fine tune model. So are you
Speaker:familiar with, like, the concept of instruction fine tuning, Frank? I think I am,
Speaker:but I IBM slash Red Hat defines it one way. I would like to get
Speaker:your opinion. Yeah. So, this is the idea that
Speaker:you can train or fine tune large language models to follow
Speaker:instructions to complete tasks. So, before we had,
Speaker:like, models that could that we could just, like, ask questions of and just, like,
Speaker:receive answers directly, you had to craft text
Speaker:that would increase the probability that the document that you want to
Speaker:generate would happen. So if you wanted a story about, like, unicorns or something,
Speaker:you would have to start your query to the LLM as there
Speaker:once was, like, a set of unicorns living in the forest. Blah blah blah blah.
Speaker:And then it would just, like, complete sentence, just like a fancy version of autocomplete.
Speaker:Right. And that that's kind of, like, what we used to have, and that was
Speaker:pretty hard to work with. And then once researchers kinda cracked, like, wait a second.
Speaker:We can create a dataset of, like, instruction pairs and, like, document
Speaker:sets and fine tune models on them. And it turns out now we can just,
Speaker:like, ask models to do things, and they will do them. Whether or not
Speaker:those are correct is kind of the next part of the story. But getting to
Speaker:that point, it was, like, pretty interesting and pretty significant.
Speaker:Interesting. Interesting. When I think of
Speaker:fine tuning, I think of I think of
Speaker:primarily InstructLab, where you basically kinda have a
Speaker:LoRA layer on top of the base LLM doing
Speaker:that. Is that the same thing? Or is it kind of slightly
Speaker:it sounds like it's slightly nuanced. So the nuance there
Speaker:is that, one, though this the methodology that I'm
Speaker:describing is mostly dataset driven. So you have, like, your original LLM,
Speaker:and then you have, like, a new dataset that allows the LLM to learn a
Speaker:specific task. Or in this case, like, a generalized form of tasks,
Speaker:which is you have instruction, answer, user query,
Speaker:give it an instruction. Whereas in your case, you're kind of, like, adding another layer
Speaker:to the LLM and, like, forcing the LLM to learn all the new
Speaker:methodology inside that layer in order to accomplish a specific
Speaker:task. So that's kind of like what fine-tuning ends up doing. So the other
Speaker:way there's multiple ways to do this, it seems. Right? Like, there there's that way
Speaker:we add the layer, but there's also kind of I hate the term prompt engineering
Speaker:because it's just so over overblown. But, like, giving it
Speaker:more context and samples. And now that the the token context
Speaker:window is large enough that you don't have to be well, if you wanna
Speaker:save money, you have to be very mindful of that. But if you're running it
Speaker:locally, like, doesn't really matter. Well, you could give it an example of
Speaker:let's just say you had I'm trying to think of a short story or a
Speaker:novel. I don't know. Let's pretend,
Speaker:Moby Dick was only a 100 pages. Right? I
Speaker:could give it that as the part of the prompt. Let's say write a sequel
Speaker:to this book based on what happens in this one. Is that what you're talking
Speaker:about? Were you kinda giving an example as part of the prompt? Or is there
Speaker:some and not part of the layer? Or some combination thereof? Or was some third
Speaker:thing entirely? So this would be like, what what
Speaker:you're describing is more like few shot learning, which is you gave kind of an
Speaker:example, and then you're, like, okay. Like, given these examples, can you do this other
Speaker:task this test that I've described on this unseen example? What I'm describing is
Speaker:kind of, like, slightly before that. So, like, before we had the ability to, like,
Speaker:give models examples, we had to, like, give them we have to
Speaker:create the ability to follow instructions. And then once you have the ability to
Speaker:follow instructions, you can be like, okay. Here are the instructions. Here's
Speaker:examples of correctly completing the instruction, now do the instruction.
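As a small sketch of that progression, here is how an instruction plus a handful of worked examples might be assembled into a single few-shot prompt for an instruction-tuned model. The task and examples are invented for illustration.

```python
instruction = "Classify the sentiment of each review as positive or negative."

examples = [
    ("I loved this movie, it was fantastic!", "positive"),
    ("The food was cold and the service was slow.", "negative"),
    ("Not bad, but I probably wouldn't go again.", "negative"),
]

# Assemble the few-shot block, then append the unseen example to classify.
shots = "\n\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
new_review = "The concert exceeded every expectation I had."

prompt = f"{instruction}\n\n{shots}\n\nReview: {new_review}\nSentiment:"
print(prompt)  # this string would be sent to an instruction-tuned model
```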
Speaker:And that is the reason why that happens in that order is
Speaker:because first, you have, like, just, like, sequence completion, like,
Speaker:autocomplete. Then you have, like, okay, given this
Speaker:task given this set of instructions, just follow the instruction instead of,
Speaker:like, trying to do autocomplete. And then you have, okay, now you know how to
Speaker:follow instructions. I'm gonna give you a few data points in order to
Speaker:learn a new task. Now do this new task. So you're kind of,
Speaker:like, moving from a situation where you need tons and tons
Speaker:of data just to get the, sequence completion. And then you need
Speaker:a smaller set of data to, like, get the capability to follow instructions.
Speaker:And then you need a very, very, very small amount of data, like,
Speaker:maybe 3 points or 10 examples or 15 examples to complete kind of, like,
Speaker:a new task. So there's a lot of kind of nuance in, like, how
Speaker:modern LLMs are being used and how they're kind of trained and fine tuned, so
Speaker:on and so forth. And I think there's a lot of, like,
Speaker:important importance in, like, learning what what happened kind of
Speaker:before because the advancements have happened so quickly. It can be really hard to kind
Speaker:of differentiate, or, like, oh, why is why do models perform like this? Why
Speaker:do things kind of happen like that? And even though, prompt
Speaker:engineering has kind of, like, let's say, traveled through the
Speaker:hype cycle where people were, like, really excited about it, and then we're, like, this
Speaker:is not actually that interesting. Right. What's interesting is that,
Speaker:doing building a good RAG system or retrieval augmented generation system,
Speaker:you really need to be good at prompt engineering in a sense
Speaker:because you're assembling the correct context for this model
Speaker:to answer some downstream question, And it's not
Speaker:intuitive how to assemble that context. So understanding, like, how are these
Speaker:models are trained, like, whether they can follow instructions, how good they are at
Speaker:doing so, how many examples of information they need in order to accomplish some task
Speaker:really affects how you build that knowledge base in order to help the
Speaker:model do some sort of new thing. Interesting.
Speaker:So RAG is obviously all the rage now.
Speaker:Yep. But there's also a relatively new because this this
Speaker:space changes rapidly. Like, I mean, I took 2 weeks off in December, and
Speaker:I feel completely disconnected from the cutting edge, you know.
Speaker:Because when I was watching the keynote from CES, and I'm like, wow. That's
Speaker:really cool. And I was texting, you know, slacking with a coworker, and he goes,
Speaker:oh, no. This is a retread of their, like, last keynote they did. Like
Speaker:and I'm like, okay. Wow. Blink and you missed
Speaker:something. So what
Speaker:you're describing the fine tuning, is that really what RAFT is, where the
Speaker:idea that you have kind of retrieval augmented fine tuning, which I think is what
Speaker:the acronym stands for. Is that not I'm
Speaker:not familiar with how RAFT works. So I don't wanna, like, kind of venture
Speaker:and guess without without knowing what it is. But do you remember, like, what context
Speaker:you encountered this in? Basically, it's the idea that
Speaker:it's the idea that you can fine tune the results. Sounds very
Speaker:similar to what you're doing, and I've haven't read the paper in a while.
Speaker:Back when I was a Microsoft MVP, like, you know,
Speaker:they had a Microsoft Research had the thing for their calls, and they
Speaker:were all raving about it. The paper had just come out and things like that.
Speaker:It's the idea that you can kind of give it pretrained examples.
Speaker:You start with a base LLM, and you give it pre trained examples, and then
Speaker:you add on top of just to retrieve an
Speaker:augmented portion of it. It's very similar, not to
Speaker:plug my you know, for my day job. I work at Red Hat. That's why
Speaker:there's a fedora there. We have a product called RHEL
Speaker:AI, which is based on an upstream open source project called
Speaker:InstructLab. And it's a similar idea in that you
Speaker:basically give it a set of data.
Speaker:And then there's a little more to it because there's a
Speaker:teacher model. And basically what it'll do is synthetic data generation.
Speaker:So you can start with a modest document set.
Speaker:And based on how the questions and answers that you
Speaker:form and the
Speaker:taxonomy that you attach to it, it will
Speaker:create a LoRA layer on top of an existing LLM.
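For readers curious what "a LoRA layer on top of an existing LLM" can look like in code, here is a generic sketch using the Hugging Face PEFT library. This is not InstructLab or RHEL AI itself, and the model name and hyperparameters are placeholders, but it shows the general pattern of training a small adapter while the base model stays frozen.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; InstructLab/RHEL AI use their own models and tooling.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # which attention projections get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trainable
```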
Speaker:And it it could be that it's it's it's not quite exactly the same as
Speaker:RAFT, but it's definitely in the same direction. Same thing as, like, BERT, ELMo,
Speaker:and, you know, RoBERTa, which, I think
Speaker:I think I understand. So it's kind of like you so the I think the
Speaker:problem that might be addressing is kind of just really similar to the problem that
Speaker:traditional RAG tries to address, except in a more kind of deliberate fashion
Speaker:Exactly. Yeah. Where you have some document store internally. Like, let's say we
Speaker:both work at some company, and we have a giant customer support document store.
Speaker:You take some LLM off the shelf. It's not necessarily gonna know the
Speaker:contents of your internal kind of documents. So how can you get
Speaker:it to, like, successfully help answer tickets or triage tickets that
Speaker:you're trying to build, so that you can answer, like, most difficult tickets and
Speaker:kind of work toward that. In this situation, maybe you
Speaker:want to, inject some of the knowledge of
Speaker:the documents in addition to having the
Speaker:model being able to search over the document store. So maybe, like, the what this
Speaker:lower layer is doing is, like, absorbing Yeah. Some of the knowledge from the
Speaker:document store so that you can kind of more
Speaker:efficiently query, the database and so
Speaker:that you don't have to, like, query it all the time. The only,
Speaker:issue, quote, unquote, I'd have with that method is that you'd have to, like, keep
Speaker:that updated from time to time, and that's, like, not that's nontrivial. Whereas
Speaker:if you just do, like, traditional RAG, you just need to
Speaker:update your vector store, and then you can just have the model
Speaker:query that new information when you need to. But, you know, it's always best to
Speaker:use whatever solution works best for your, given use case.
Speaker:And experimenting with different use cases is always really important. But I imagine that's, like,
Speaker:kind of what that is trying to address, which is the That is basically it.
Speaker:The I, you know, I don't wanna go down that rabbit hole of that. But
Speaker:but, basically, the idea is that, if
Speaker:you train an LLM or you have a layer on top of an
Speaker:LLM that not only does retrieval from a source document
Speaker:store. Right? I think that's a pretty set pattern. But it also has a
Speaker:better understanding of your business, your industry, the jargon.
Speaker:Right. Right. Blah blah blah. Right? The idea is that the retrieval success
Speaker:rate will be higher. Now we're not publishing the numbers yet,
Speaker:but the research is still ongoing. But basically, it's a
Speaker:pretty substantial from what I've seen well, I haven't
Speaker:seen the actual numbers yet, but from what I've been told those numbers are by
Speaker:the researcher, that it is a it is a substantial improvement
Speaker:that is worth the, the juice is worth the squeeze in that in that regard.
Speaker:You're not and it's also computationally, you're not quite training the
Speaker:whole thing again. You're just kinda putting a new Instagram filter, so to
Speaker:speak, together on top of the base. So it definitely
Speaker:does it definitely does some things. Now when we get the hard
Speaker:numbers, then, you know, I mean, I can
Speaker:say them publicly, then I think we'll we'll know is the juice how
Speaker:much does the the the the squeeze to juice ratio is?
Speaker:But, I can confidently say publicly now, like, there's a there
Speaker:there. Yeah. And, you know, we'll have those numbers soon
Speaker:enough. But it's it's interesting because you're right. I mean, this paper
Speaker:came out in 2017, and then there was an explosion of these different mechanisms. You mentioned BERT. You mentioned RoBERTa.
Speaker:Fun fact, my wife's name is Roberta. So that was kind of fun.
Speaker:There was Elmo. There was Ernie. There was a whole Sesame
Speaker:Street themed zoo of of model
Speaker:types. That seems to have kind of that branching out of
Speaker:those different directions has seemed to have stalled, and we're going into more of
Speaker:these retrieval augmented generation systems. So for those who because
Speaker:not everybody on our listeners know exactly what retrieval
Speaker:augmented systems are. Could you give kind of a a
Speaker:level 200 elevator explanation? Sure.
Speaker:So, when you speak to a modern chatbot,
Speaker:what's happening is that they've learned information through their pre
Speaker:training processes, the large corpus of basically the entire Internet,
Speaker:and are generating information based on the query that you're passing in.
Speaker:The problem that often occurs is that
Speaker:these AI models might err, and the error could
Speaker:be making information up that doesn't
Speaker:exist. For example, if a model is trained before a period of time,
Speaker:like, it might not know about that period of time, which is which happens more
Speaker:often than you think. The information could be false, untruthful, or it could
Speaker:just be incorrect in a way that's not, like, bad, but still not
Speaker:helpful. And the reason for this is the way that these
Speaker:models are accessing that information. The idea behind retrieval
Speaker:augmented generation is that instead of having the model try
Speaker:to, generate the correct document or the correct
Speaker:response given its pretraining process, you instead
Speaker:add factual content to the query that you're asking
Speaker:the model for. You first search for that content, which is where
Speaker:the retrieval part comes, and then you augment the generation of what that
Speaker:model is going to create based on that content, hence
Speaker:retrieval augmented generation. There's usually, a querying
Speaker:step. So you take in a user query, you hit it against some sort
Speaker:of database, usually a vector database. In our case, it could be Pinecone.
Speaker:You find a set of relevant documents. You pass that to the generating LLM.
Speaker:The generating LLM uses those documents to generate a final
Speaker:response. And it turns out that if you do this, you can reduce
Speaker:hallucinations. And that makes sense because if the model was given true
Speaker:information and then conditioned its generation on that information, it
Speaker:follows that the probability of generating information that is
Speaker:correct could be higher. That's a good
Speaker:explanation. So you're basically giving it a
Speaker:crash course in what documents you care about. Right? Like
Speaker:Exactly. Interesting. And that's a good segue
Speaker:because you work for Pinecone. So so tell me about Pinecone. What is Pinecone?
Speaker:Yeah. So Pinecone is a, knowledge layer for AI. It's
Speaker:kind of like the way we like to describe it. We the main product that
Speaker:we provide is a vector database. So this is a way of storing
Speaker:information, information that has been vectorized, in a really
Speaker:efficient manner. And it turns out that if you have the ability to store information
Speaker:in this manner, you can search against it really quickly, with
Speaker:low latency and to find the things that you need to find really interesting for
Speaker:these types of semantic search and rag systems. Pinecone has a few other
Speaker:offerings now that kind of help people build these systems a lot easier. There's
Speaker:Pinecone Inference, which lets you embed data in order to do that querying
Speaker:step. Pinecone Assistant, which lets you just build a RAG
Speaker:system immediately just by upserting documents into our vector database,
Speaker:so on and so forth. But the reason why, like, you
Speaker:need a vector database is because all of this advance of
Speaker:semantic search of embedding models. People have gotten really, really
Speaker:good at representing chunks of information using these dense sized
Speaker:vectors. But once you have thousands, millions,
Speaker:even billions of vectors across tons of different users, you need a way
Speaker:of indexing this information to access it really quickly at
Speaker:scale, especially if your chatbot's gonna be querying this vector database really
Speaker:often. And so having a specialized data store that can handle that type
Speaker:of search becomes really useful. That's why Pinecone is here, and that's
Speaker:why we exist. Interesting. Interesting.
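As a minimal sketch of the upsert-then-query loop Arjun describes, the snippet below uses the Pinecone Python client. The index name, dimensionality, and vectors are placeholders; in a real RAG system the vectors would come from an embedding model and the retrieved text would be passed to the generating LLM as context.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("demo-index")  # assumes an index with a matching dimension already exists

# Store a couple of toy document vectors with their original text as metadata.
index.upsert(vectors=[
    {"id": "doc-1", "values": [0.10, 0.20, 0.30], "metadata": {"text": "Our refund policy lasts 30 days."}},
    {"id": "doc-2", "values": [0.30, 0.10, 0.50], "metadata": {"text": "Standard shipping takes 5-7 days."}},
])

# At query time: embed the user's question, retrieve the nearest documents,
# and hand their text to the LLM as context (the "augmented" part of RAG).
results = index.query(vector=[0.11, 0.19, 0.28], top_k=2, include_metadata=True)
for match in results.matches:
    print(match.id, round(match.score, 3), match.metadata["text"])
```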
Speaker:One of the other interesting things from your bio, aside from
Speaker:the origami,
Speaker:tell me about this. So do you
Speaker:or your crew create the YouTube videos, or do you use your
Speaker:tools, or is it something completely it's just part of your job as a developer
Speaker:advocate? So it is just part of my job as a
Speaker:developer advocate. Oh, okay. Like, often that, you
Speaker:know, I do that because we are interviewing people or because there's a new
Speaker:concept we wanna teach people, so on and so forth. Or we do a webinar,
Speaker:and we just upload it to YouTube. Oh, very cool. Very cool.
Speaker:Yeah. I started my career in developer
Speaker:advocacy, back when it was called evangelism. So I was a Microsoft
Speaker:evangelist for a while. So yeah. Yeah. Cool. YouTube
Speaker:is very important. Yep. But it
Speaker:also, I think, speaks to how people learn.
Speaker:YouTube University is very
Speaker:real. Right? And Yep. You know, not not a knock on
Speaker:traditional schools, not a knock on traditional publishing, but this space
Speaker:is moving so fast that if it weren't for YouTubers like 3Blue1Brown,
Speaker:I think his real name is Grant Sanderson. I think that's his real name.
Speaker:Somebody will send me hate mail if I get it wrong. But,
Speaker:he he is, like, really good at explaining these
Speaker:really abstract mathematical concepts. And
Speaker:unlike you, I didn't study math undergrad. I didn't I mean, I had to. I
Speaker:only took the requirements. Right? But I have comp sci degrees. So, like, for me
Speaker:to kind of fall in love with math again or for the first time, depending
Speaker:on depending on how you wanna say that, for me, that
Speaker:was very helpful. And having an understanding of this, if you're a data engineer
Speaker:and, you know, or wanna get into this space, it's
Speaker:definitely vector databases, for a traditional kinda SQL, kinda
Speaker:RDBMS person, will look very awkward at first. But
Speaker:I know a lot of people that have made the transition, and they kinda love
Speaker:it. Right? Because in a lot of ways, it's way more efficient,
Speaker:than, I dare say, traditional data stores. But when you're
Speaker:processing the large blocks of text, it's really good for kind of
Speaker:parsing through that. But
Speaker:that's that's really cool. So, we do have the preset
Speaker:questions if you're good for doing those. I'll put them in the chat in case
Speaker:you don't have them. Sure. They're not brain teasers
Speaker:or anything like that. They are pretty basic of,
Speaker:questions, and I will paste them in the chat.
Speaker:So the first question is, how did you find your way into
Speaker:AI? Did you did you find AI, or did
Speaker:AI find you? So this is a little bit of a
Speaker:crazy story, but AI definitely found me.
Speaker:So when I was in college, when I was looking for my 1st
Speaker:internship, I couldn't find any internships, basically, because I had, like, no
Speaker:previous experience in working at tech or anything like that. And,
Speaker:the first company I worked for, Speeko, took a chance on me because they were
Speaker:building public speaking, tools to kind of help people learn how to do
Speaker:public speaking better, for an iOS app. And I had some
Speaker:public speaking experience. They were, like, close enough. We'll have you come on and kind
Speaker:of help us, like, work work things out. And while I was there, it was
Speaker:made very obvious to me how important it is to build
Speaker:very basic deep learning systems and AI systems to kind
Speaker:of accomplish really specific tasks that could help serve an
Speaker:ultimate goal. Like, what we were trying to do is just, like, see how many
Speaker:filler words people are using or how quickly or slowly you were speaking.
Speaker:And that requires a lot of, complicated
Speaker:processing because you have to do transcription and because you have to figure out what
Speaker:words are being said, so on and so forth. So kind of experiencing that and
Speaker:seeing that firsthand really opened my eyes to how powerful AI
Speaker:had been even back then. Since then, I started learning more and more and more about statistics,
Speaker:AI, natural language processing through my internships,
Speaker:learning more complicated problems, reading research papers, so on and so forth.
Speaker:And I got to where I am now. A lot of where I learned is
Speaker:just out of pure curiosity. Just like, okay. There's this new thing. I wanna learn
Speaker:about it. That's where I wanna be. And that's kind of how I fell into
Speaker:large language models and AI, just by wanting to learn about what was going to
Speaker:happen and then eventually being there. So it definitely found me. I was
Speaker:not looking for it. Didn't even know I liked statistics until I started doing
Speaker:statistical modeling. And I was like, wait. This is really fun. I wanna do a
Speaker:lot more of this. I wanna learn a lot more of this. And I knew
Speaker:that, once I was in college and I bought a statistics book for fun, and
Speaker:I was like, okay. I'm I'm past the point of no return. Like, this is
Speaker:definitely Right. Right. Right. Right. That that might be one of the first times in
Speaker:history that that's been said. Right. Because I I learned statistics for
Speaker:fun. I I took stats in college.
Speaker:I hated it. Hated every minute of it. But
Speaker:when I got into data science,
Speaker:I the first two weeks were not fun. I'm not gonna lie. Yep. But
Speaker:just like the VI editor, once you stick with it,
Speaker:Stockholm syndrome kicks in, And you start loving
Speaker:it. That's cool. 2, what's your favorite
Speaker:part of your current gig? The favorite part of my
Speaker:current job is being able to learn interesting,
Speaker:fun, even complicated things in data science and AI,
Speaker:and figuring out how to communicate them to a wide
Speaker:audience. It's a really fun challenge. It's really similar to, like,
Speaker:what, 3Blue1Brown does all the time on the YouTube channel, and it's
Speaker:something that I get to learn and practice and keep keep doing. That's the best
Speaker:part of the job. I love learning things and, like, teaching other people about them
Speaker:and learning even more things. And the fact that I have an opportunity to do
Speaker:that every single day is, like, the best. That's cool. That's
Speaker:cool. We have 3 complete-the-sentence questions. When I'm
Speaker:not working, I enjoy blank. When I'm
Speaker:not working, I enjoy, baking sweet treats and
Speaker:goods. I can't have any dairy. So very often, I had to kind
Speaker:of give up a lot of the cakes and desserts that I loved eating when
Speaker:I was younger. So now I, like, spend my time trying to figure out how
Speaker:I can make them again without dairy so they taste really good. So that's that's
Speaker:something I enjoy I really enjoy doing. Very cool.
Speaker:Next, complete the sentence. I think the coolest thing in technology
Speaker:today is blank. I
Speaker:thought really hard about this question because we're living in a
Speaker:crazy time of technological development. But the thing that really
Speaker:stuck out to me and the thing that was also the moment for me
Speaker:when I started working with, like, chatbots and LLMs was code
Speaker:generation models. The first time I learned how to
Speaker:use, GitHub Copilot specifically, I
Speaker:was I was completing some function, and it completed it before I was done typing
Speaker:it. And I was like, what the heck? This is amazing. Like, this this this
Speaker:actually figured out exactly what I needed. And because I was still, like,
Speaker:a budding developer, it was extremely helpful because I could learn
Speaker:faster rather than having a huge kind of store of knowledge already in my
Speaker:brain and kind of pulling from that. So I could see it benefiting my workflow.
Speaker:So I think the development of those tools and modern tools like
Speaker:Cursor, so on and so forth, extremely cool. And I can't wait to
Speaker:see, like, what the next generation of those technologies will look like. Yeah. I
Speaker:mean, that's a that's a great example. It's almost like you don't
Speaker:need, you know, the classic grind to do that. It's almost like you can leverage the AI to take on the
Speaker:lion's share of the work. You still have to put in some reps, but not to the degree that you used to.
Speaker:No. I think that's gonna be very transformative. I mean, I mean, I'm
Speaker:learning, JavaScript and Next.js on the side because it's something I have no
Speaker:experience in. Right. And I was able to build my personal website
Speaker:entirely through using Cursor and Progression. Nice. I
Speaker:ought to check that out. Which is insane. Right? Which is, like, really, really
Speaker:fascinating. And and I'm not gonna claim to, like, suddenly be an expert in
Speaker:Next.js or anything like that. Right? Right. Right. Right. I still wanna learn, like, exactly
Speaker:what's going on under the hood, But having a project that you can kind of,
Speaker:like, tinker on that's, like, pretty small in scale and that you can kind of
Speaker:afford to make a few mistakes on and having, like, an expert system kind of
Speaker:help you go through that, expert, quote, unquote, being close enough, really cool
Speaker:learning experience. No. That's a great way to put it because, like, I I
Speaker:I don't have any apps on the modern devices. Right? Like,
Speaker:so, it would be nice if I
Speaker:had an Android app that could kick off some automation process that I have.
Speaker:Right? Or do some kind of tie in with, you know, Copilot
Speaker:into that or things like that. Like, where, you know, I
Speaker:originally wrote a content automation system in
Speaker:dotnet, but I ported it to Python with
Speaker:the help of AI. And I could well, that's just it. Right?
Speaker:It really the true valuable resource in in life is
Speaker:time. Right? Yes. It's not Yes. I mean, I could have done it by hand.
Speaker:I could have done it by myself, but it was one of those things where
Speaker:am I gonna do it because it's gonna take x number of hours or whatever?
Speaker:But if I can just kinda here's the dot net version that I, you know,
Speaker:I posted. This is before there was Copilot, so I pasted it into
Speaker:ChatGPT. And it basically spit out a Python
Speaker:version, had some errors. You know, this was a while ago. But I
Speaker:was able to, inside of a day, get it done as opposed to
Speaker:before. Like, I know how my ADD works. Right? Like, I'll start it.
Speaker:First 3 days, working on it, grinding on it, and then
Speaker:I don't touch it again for 2 weeks. And it never gets built. But
Speaker:with this, I'm able to kinda harness the the spark of
Speaker:inspiration and and execute much faster. Now I think I don't think
Speaker:people fully realize, like, you know, it's not all doom and gloom. Nobody's
Speaker:gonna have any programming jobs. There's a lot of upside too. And I
Speaker:guess that's just where we are in the hype cycle. As you said.
Speaker:Yeah. Yeah. Yeah. Exactly. That's a good segue into I look forward to
Speaker:the day when I can use technology to blank. I look
Speaker:forward to the day where I can use technology to get a high quality
Speaker:education on any subject for free. So Nice.
Speaker:Free education is really important to me. A lot of
Speaker:what I learned about large language models, deep learning, all that
Speaker:stuff was online courses that I took for free on places like
Speaker:EDX, Coursera, so on and so forth. Or people sharing
Speaker:articles and kind of learning from them, or YouTube videos, or all that sort of
Speaker:things, in addition to my education. But there's a lot of things you kinda have
Speaker:to learn after that. Right? And I think that especially with, like,
Speaker:code generation models, it's, like, very easy to be, like, okay. Build me this app
Speaker:and, like, just make it work. And you can sit there for a couple hours,
Speaker:and it'll, like, work. But I think the missing piece is
Speaker:creating a structured kind of learning path that's, like,
Speaker:personalized to whoever you are for the
Speaker:thing that you're really interested in with the context of
Speaker:having, like, these tools that can help you do that thing. And I'm not sure
Speaker:if we have anybody or any offering that can
Speaker:kind of do that technologically, because you need a lot of information about what the
Speaker:user knows or doesn't know. You need to be able to create ability, and then
Speaker:you need to be able to kind of create, like, an entire mini course that's
Speaker:personalized to whatever that person needs. But if we can do that, we can solve
Speaker:so many wonderful problems. Absolutely. I'm
Speaker:thinking about special education needs and things like that. I don't think we're that
Speaker:far off from this. No. But I
Speaker:the biggest issue, is going to be just hallucinations. Right? And,
Speaker:hopefully, people can build, like, RAG systems using tools like Pinecone to kind
Speaker:of reduce those hallucinations. But we will also for for something like
Speaker:that specific use case, we probably need, like, another breakthrough in
Speaker:indexing information or kind of presenting it, or we need a process that
Speaker:really allows people to create this information quickly
Speaker:and verifiably in order to kind of make that happen. But if if that is
Speaker:a future that we can live in, where technology can can kind of, like, help
Speaker:people learn, like, really important things really well, that would be
Speaker:wonderful. And I think that would be, like, amazing for for humanity.
Speaker:Oh, absolutely. Share something different
Speaker:about yourself, but remember as a family podcast.
Speaker:One of my favorite hobbies for about a decade is
Speaker:designing and folding origami. And it's really fun.
Speaker:It's very easy, but it's also very hard. There's a lot
Speaker:of complexity inside it as well. One thing people
Speaker:don't know about that is that there's a lot of mathematical complexity.
Speaker:So once you get to a point where you wanna design a model with
Speaker:really specific qualities, really specific features, it suddenly
Speaker:becomes a paper optimization problem where you
Speaker:have, like, a fixed size square, and you have different
Speaker:regions of that paper that you're allocating to portions of the model you're
Speaker:designing. And it turns out that there are entire mathematical
Speaker:principles and procedures to solve this problem. So much
Speaker:so that one of the leading, like, practitioners in the
Speaker:field is, like, this physicist who wrote a textbook on how to do origami design,
Speaker:and that's, like, the textbook everyone looks at to, like, learn how to solve it.
Speaker:Yeah. I'm not surprised. There's definitely there's definitely a a correlation
Speaker:between the mathematics of that. And I look at origami creations, and I'm
Speaker:just fascinated that they can be done from a single sheet. Like, it's
Speaker:just how is that I mean, that's just mind bending. Now it's
Speaker:and it makes sense that there's a mathematical side, because you have a certain type of
Speaker:constraint, and there's obviously
Speaker:folds that factor into it and things like that. And, yeah, that's that's
Speaker:interesting. I I should what's the name of that book? I should pick it up.
Speaker:It's called Origami Design Secrets. Got it. Alright. I will check
Speaker:it out. So where can people learn more about
Speaker:you and Pinecone? Of course. You wanna learn more about Pinecone? The
Speaker:best place is our website, pinecone.io. You can also find
Speaker:us on LinkedIn and on x and other social media platforms.
Speaker:You wanna learn more about me? You can go to my LinkedIn, which you can
Speaker:find at Arjun Kirti Patel, or you can go to my website, which is also
Speaker:my name, arjun k-i-r-t-i-p-
Speaker:a-t-e-l.com. Cool. And we can also check out your
Speaker:Next.js skills there too. Exactly. Hopefully, nothing is
Speaker:broken, but, you can you can see you can see how well I've gotten by
Speaker:with the Awesome. Trust me.
Speaker:JavaScript alone is a frustration
Speaker:creation device.
Speaker:Audible sponsors the podcast. Do you do audio books? Is there a book that you
Speaker:would recommend? I do do audiobooks, but I've just
Speaker:started recently, so I don't have a huge, audiobook library. But
Speaker:there is I I am a huge fan of short story collections, and
Speaker:kind of the one that comes to mind is really anything by Ted
Speaker:Chiang, who does a lot of kind of sci fi short stories. If you've seen
Speaker:the movie Arrival, the short story it's based on is Story of Your Life,
Speaker:and it's wonderfully written. It's one of my favorite short stories ever.
Speaker:Yep. So highly recommend that. I believe the collection is
Speaker:called Stories of Your Life and Others, something like that. So
Speaker:Oh, interesting. Careful with audiobooks. They are very
Speaker:addictive. So,
Speaker:with Audible is a sponsor of the show. So if you go to
Speaker:thedatadrivenbook.com, you'll get routed to Audible and
Speaker:you'll get a free book on us. And if you
Speaker:choose to subscribe, we'll get a little bit of kickback. It helps run the show
Speaker:and helps, helps us bring, bring some good stuff to to
Speaker:the masses. So any any parting thoughts?
Speaker:No. But thank you so much for having me on, Frank. This was a ton
Speaker:of fun. I learned a lot from you, and I hope I I helped you
Speaker:learn one one small thing as well. Absolutely. It was it was
Speaker:a great conversation, and, we'll let the nice British lady finish the
Speaker:show. And that's a wrap for this episode of Data Driven, where we
Speaker:journeyed from the intricacies of vector databases to the surprising
Speaker:elegance of origami. A huge thank you to Arjun Patel for
Speaker:sharing his insights on retrieval augmented generation and his passion
Speaker:for making AI accessible to all. From turning raw data
Speaker:into actionable knowledge to turning paper into art, Arjun
Speaker:proves there's beauty in both precision and creativity. If today's
Speaker:episode left you curious, inspired, or just itching to fold a
Speaker:piece of paper into something meaningful, be sure to check out
Speaker:Arjun's work and Pinecone's innovative tools. Remember,
Speaker:knowledge might be power, but sharing it makes you a force to be reckoned
Speaker:with. As always, I'm Bailey, your semi-sentient guide to
Speaker:all things data. Reminding you that while AI might shape our
Speaker:future, it's the human touch or sometimes the paper fold that
Speaker:gives it meaning. Until next time, stay curious,
Speaker:stay analytical, and don't forget to back up your data.
Speaker:Cheerio.