Matteo Interlandi on Project Hummingbird
Hello and welcome to Data Driven.
In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird.
Transcript
00:00:00 BAILey
Hello and welcome to Data Driven.
00:00:02 BAILey
In this episode, Frank and Andy speak with researcher Matteo Interlandi about project Hummingbird.
00:00:09 BAILey
Now on with the show.
00:00:10 Frank
Hello and welcome to Data Driven.
00:00:21 Frank
The podcast where we explore the emerging fields of data science, machine learning and artificial intelligence.
00:00:27 Frank
If you’d like to think of data as the new oil, then you can consider us
00:00:30 Frank
Car Talk, because we focus on where the rubber meets the virtual road. And with me on this epic road trip
00:00:36 Frank
on the information superhighway, as always, is Andy Leonard.
00:00:39 Frank
How you doing, Andy?
00:00:40 Andy
I’m well, Frank. How are you?
00:00:41 Frank
I’m doing all right. We’re recording this on Wednesday, September 1st, 2021, and the
00:00:51 Frank
remnants of Hurricane Ida are ripping through the DC area.
00:00:57 Frank
So if I suddenly get dropped, that’s probably because we lost power.
00:01:03 Frank
But I do have the backup generator, the one that the professionals installed, and my
00:01:10 Frank
duct-taped-together solar generator, so
00:01:15 Frank
I will be offline
00:01:17 Frank
for a short
00:01:18 Frank
bit and hopefully come back online.
00:01:20 Frank
How are you doing, Andy?
00:01:23 Andy
I’m doing all right, Frank. I’m about, gosh, 250 miles south of you, so we didn’t get anywhere near the effects of Hurricane Ida that you did.
00:01:34 Andy
We’re getting a little bit of rain now.
00:01:36 Andy
We’ve had some wind
00:01:37 Andy
gusts, but it’s been really mild, and if you look at the radar,
00:01:41 Andy
I’ve got to watch it and track it, and I do;
00:01:43 Andy
I’m an amateur weather weenie. But it just kind of went around us to the west, and it actually started heading east when it got a little north of us and aimed right for your house.
00:01:54 Andy
I was looking at it, thinking, that’s where Frank lives, right?
00:01:56 Andy
And look, the eye is coming right for
00:01:58 Andy
Frank, what’s left of it.
00:02:00 Frank
Well, fortunately we’re safe.
00:02:02 Frank
There was some flooding in Rockville overnight, and some folks were affected, but
00:02:09 Frank
nobody died that I’m
00:02:10 Frank
aware of, so.
It is what it is.
00:02:12 Frank
You know, we’re not
00:02:13 Frank
accustomed to floods or hurricanes or tornadoes up here in DC; we’re more used to the human threats of, you know, little things like terrorism and things
00:02:25 Frank
like that, but.
00:02:26 Andy
Yeah, yeah, you guys have a little bit more to worry about there than we do here in Farmville, right?
00:02:32 Andy
But you know these days.
00:02:33 Andy
Who knows?
00:02:35 Andy
Definitely, our thoughts and prayers are with the folks in Louisiana and Mississippi.
00:02:40 Andy
They were hit very hard.
00:02:42 Andy
I’ve got friends in western Georgia who were telling me that
00:02:47 Andy
they took a beating as well, and, you know, it just looks horrible.
00:02:53 Andy
I’ve been in a few of those places after hurricanes have hit, as part of church efforts to help clean up and stabilize and stuff like that.
00:03:04 Andy
It looks like, I don’t know.
00:03:06 Andy
People describe it as like a war.
00:03:09 Andy
I’ve never been in a war, so I don’t know.
00:03:10 Andy
I’ve seen pictures, and
00:03:13 Andy
there’s a lot.
00:03:14 Andy
It looks like a lot of stuff has blown over, and that sort of
00:03:16 Andy
stuff. It’s just...
00:03:18 Andy
And they’re talking weeks and weeks before power comes back on.
00:03:22 Frank
That’s horrible, that’s.
00:03:23 Andy
Similar places, yeah.
00:03:25 Frank
That’s
00:03:26 Frank
probably going to do more damage for a lot of things.
00:03:30 Andy
Were you worried?
But on a
00:03:30 Frank
more positive note, a positive note.
00:03:31 Andy
Yes, on a positive note.
00:03:35 Frank
We are...
00:03:37 Frank
I am super excited to have a special guest, and I say super excited because he’s from Microsoft.
00:03:42 Frank
He’s a senior scientist in GSL at Microsoft, working on scalable machine learning systems.
00:03:50 Frank
Before he was at Microsoft, he was a postdoctoral scholar in the Computer Science department at UCLA, and he was doing a lot of interesting stuff there.
00:04:03 Frank
He was doing research in Qatar, or Qatar;
00:04:05 Frank
I’m not sure how to say that exactly. And he has a PhD in computer science
00:04:11 Frank
from the University
00:04:12 Frank
of Modena and
00:04:15 Frank
I’m going to botch this
00:04:15 Frank
Reggio Emilia.
00:04:17 Frank
Welcome to the show, Matteo.
00:04:22 Frank
Awesome, so we are really excited to have you here.
00:04:25 Frank
We actually booked you a whole month in advance.
00:04:27 Frank
I’ve been looking forward to this.
00:04:29 Frank
Yeah, because you’re coming by way of some of the folks at the MLADS conference.
00:04:35 Frank
And for those who don’t know, and I’ve mentioned this before,
00:04:37 Frank
MLADS stands for Machine Learning and Data Science Summit.
00:04:40 Frank
It used to be in person; I think now it’s entirely virtual for the foreseeable future.
00:04:45 Frank
But I attended MLADS in the summer of 2016, and it was life-altering, and I don’t say that
00:04:55 Frank
lightly.
00:04:56 Frank
So Microsoft does amazing work in the machine learning and data science space.
00:05:02 Frank
Very much cutting-edge stuff, very much.
00:05:06 Frank
I wouldn’t say under the radar, but Microsoft does not do a great job tooting its own horn, so we’re very excited for you to come on, Matteo, and talk about this little project that you’re working on.
00:05:17 Frank
And does it have a code name, or what?
00:05:20 Frank
What is it called?
00:05:22 Matteo
Hummingbird is the code name, actually. I mean, we
00:05:26 Matteo
don’t have any specific internal names for
00:05:28 Matteo
this.
00:05:28 Frank
OK, what does GSL stand for?
00:05:32 Frank
That was my first question
00:05:33 Frank
when I saw your bio.
00:05:35 Matteo
It’s for Gray Systems Lab, and it’s named after Jim Gray, who
Oh, OK.
00:05:41 Matteo
won the Turing Award, yeah.
OK.
00:05:46 Matteo
So it’s a research lab named after him, and it sits within the Azure Data organization.
Oh, interesting.
00:05:53 Frank
And so, what
00:05:56 Frank
what cool stuff does Hummingbird do?
00:06:00 Matteo
So, Hummingbird
00:06:03 Matteo
is a bit of a weird project, in the sense that when we started it, we didn’t know if it was going
00:06:10 Matteo
to be a success or not,
00:06:12 Matteo
because what we basically try to do is translate traditional machine learning models into neural networks.
00:06:22 Matteo
Actually, not neural networks as such, but tensor programs, so that we can then run them on tensor runtimes such as PyTorch.
00:06:30 Matteo
In terms of.
00:06:32 Matteo
So when we started this project, the idea actually was: hey, there is a lot of investment in general pouring into these neural network frameworks, and
00:06:45 Matteo
coming from the Azure Data organization, we are instead more interested in traditional machine learning methods such as decision trees,
00:06:52 Matteo
linear models, one-hot encoding, all those boring traditional algorithms.
00:07:00 Matteo
And so we looked at these
00:07:01 Matteo
neural network systems and said, hey, how can we take advantage of all this technology that is built
00:07:05 Matteo
in this domain? You can run neural
00:07:08 Matteo
networks on CPU,
00:07:10 Matteo
on GPU, you can use fancy compilers to generate the tensor programs,
00:07:16 Matteo
all those sorts of techniques, and we were
00:07:19 Matteo
kind of struggling
00:07:20 Matteo
to see what we could do with this stack, and what we came up with is this Hummingbird project.
00:07:27 Matteo
So we basically take
00:07:32 Matteo
traditional machine learning pipelines, composed of featurizers and machine learning models,
00:07:37 Matteo
after they are trained.
00:07:39 Matteo
So first you need to train it using scikit-learn or ML.NET or
00:07:43 Matteo
one of those traditional machine learning platforms, and then, once it is trained, we basically convert it into a set of tensor operations.
00:07:54 Matteo
In the current version, we basically use PyTorch for this conversion, and then you have a PyTorch model, so you can do whatever you can do with PyTorch
00:08:03 Matteo
models: you can deploy it into PyTorch
00:08:08 Matteo
deployments, you can run it on CPU or on GPU, or you can use TorchScript if you want to get rid of all the Python dependencies and just have a C++ program. You can
00:08:19 Matteo
do all those tricks.
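To make "converting a trained model into a set of tensor operations" concrete, here is a minimal NumPy sketch for a single decision tree. It follows the spirit of the matrix-multiplication (GEMM) tree encoding described in the Hummingbird research paper, but the tree, its thresholds, and the helper names (`predict_gemm`, `predict_loop`) are hypothetical and invented for illustration; this is not Hummingbird's actual code or API.

```python
import numpy as np

# Hypothetical two-level tree, for illustration only:
#   node 0: if x[0] < 0.5 go to node 1, else leaf c (value 30.0)
#   node 1: if x[1] < 2.0 leaf a (value 10.0), else leaf b (value 20.0)

# A[f, n] = 1 if internal node n tests feature f
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# B[n] = threshold of internal node n
B = np.array([0.5, 2.0])
# C[n, l] = +1 if leaf l is in the left subtree of node n,
#           -1 if it is in the right subtree, 0 otherwise
C = np.array([[1.0, 1.0, -1.0],
              [1.0, -1.0, 0.0]])
# D[l] = number of left turns on the path from the root to leaf l
D = np.array([2.0, 1.0, 0.0])
# E[l] = value stored at leaf l (leaves a, b, c)
E = np.array([10.0, 20.0, 30.0])


def predict_gemm(X):
    """Evaluate the tree for a whole batch using only array operations."""
    node_true = ((X @ A) < B).astype(X.dtype)     # 1.0 where a node's test holds
    path_score = node_true @ C                    # one score per leaf
    leaf_hit = (path_score == D).astype(X.dtype)  # exactly one 1.0 per row
    return leaf_hit @ E                           # pick that leaf's value


def predict_loop(x):
    """Reference: ordinary branch-by-branch traversal for one record."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 2.0 else 20.0
    return 30.0


X = np.array([[0.2, 1.0],
              [0.2, 3.0],
              [0.9, 1.0]])
print(predict_gemm(X))  # [10. 20. 30.]
```

Every step is a matrix product or an element-wise comparison over the whole batch, which is the kind of work a tensor runtime can hand to a GPU. The trade-off is that this encoding evaluates every node for every record instead of following a single path.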
00:08:22 Frank
Interesting. Does it impact accuracy or precision?
00:08:26 Frank
Does it improve it?
00:08:27 Frank
Keep it the same?
00:08:29 Matteo
We try to keep it the same, so we are able to keep
00:08:33 Matteo
it the same up to floating-point rounding.
00:08:36 Matteo
Since we use PyTorch to run these programs, and not scikit-learn or ML.NET,
00:08:44 Matteo
there are some differences in how they do floating-point operations.
00:08:48 Matteo
So the
00:08:49 Matteo
accuracy is the same up to floating-point rounding, which sometimes, actually,
00:08:54 Matteo
can be quite a bit, but most of the time it is really small, almost not noticeable.
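Why "up to floating-point rounding" can occasionally matter: a tree compares feature values against learned thresholds, so a value sitting right next to a threshold can take the other branch if two runtimes evaluate the comparison in different precisions. The sketch below assumes, purely for illustration, that one runtime compares in float64 and the other in float32; the threshold and feature value are made up.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


threshold = 0.3        # hypothetical split threshold learned by a tree
feature = 0.30000001   # hypothetical feature value just above it

# Comparison in double precision
goes_right_64 = feature > threshold
# Same comparison after both values are stored as float32:
# they round to the same float32 number, so the test flips
goes_right_32 = to_float32(feature) > to_float32(threshold)

print(goes_right_64, goes_right_32)  # True False
```

Most records are nowhere near a threshold, which fits the point that the differences are usually tiny; the rare record that straddles one can change a prediction outright.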
00:09:00 Frank
Interesting, interesting.
00:09:03 Frank
Would you know
00:09:05 Frank
if there was
00:09:06 Frank
a discrepancy, or do you catch that as part of testing?
00:09:09 Matteo
It’s part of testing.
00:09:10 Frank
Right, all software is tested, right Andy?
00:09:11 Matteo
So we have...
00:09:13 Frank
Some intentionally, is that the saying?
00:09:15 Andy
That’s right.
00:09:17 Frank
He has a saying, "all software is..." I forget exactly what it is.
00:09:21 Frank
But what is it?
00:09:23 Andy
Yeah, all software is tested, some intentionally.
00:09:27 Frank
There you go.
00:09:30 Frank
So what’s the
00:09:33 Frank
what’s the real
00:09:34 Frank
what are
00:09:34 Frank
what are the advantages of converting a traditional model over to a tensor model?
00:09:41 Frank
Is it
00:09:41 Frank
is it portability?
00:09:42 Frank
Is it speed?
00:09:43 Frank
You did mention that you can run it on,
00:09:45 Frank
that you can take advantage of GPU as well as CPU.
00:09:51 Matteo
Yes, exactly. It’s mostly related to speed: you can basically run your scikit-learn model on GPU end to end, and this usually provides quite a bit of speedup. For some of our examples we even saw two-orders-of-magnitude speedups
00:10:11 Matteo
for some of the models.
00:10:13 Matteo
And usually we try to show that
00:10:18 Matteo
if you use GPU,
00:10:19 Matteo
it can be much faster, but on CPU we try to be as close as possible to scikit-learn or the baseline model.
00:10:27 Matteo
Sometimes we can; sometimes we are a little bit slower.
00:10:31 Matteo
But we
00:10:32 Matteo
had some really interesting results.
00:10:34 Matteo
For instance, we did some experiments with some
00:10:39 Matteo
folks at TVM: we took some trained XGBoost models and compiled them
00:10:47 Matteo
using Hummingbird and TVM, where we basically do code generation, and we showed that the compiled model was even faster than the C++ implementation that XGBoost uses, on both CPU and GPU. It was kind of, OK, what’s going on?
00:11:06 Matteo
This is not.
00:11:08 Matteo
This was not expected.
00:11:08 Frank
Wait, did you say it was faster than a C++ implementation?
00:11:11 Matteo
Yes. I mean, XGBoost uses
00:11:13 Matteo
C++ underneath; even scikit-learn,
00:11:15 Matteo
you know, they use
00:11:16 Matteo
C++ libraries. And yeah, using TVM for the code generation, we are able to do things like operator fusion, which you don’t normally have for these traditional models.
00:11:28 Matteo
So with these tricks, basically, that are coming from the neural network
00:11:31 Matteo
frameworks, we were able to get
00:11:34 Matteo
these surprising numbers.
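Operator fusion, which Matteo credits for part of the TVM result, merges several tensor operators into a single kernel so that each element is read and written once, with no intermediate arrays in between. The pure-Python sketch below only shows the shape of that transformation on a made-up multiply, add, and ReLU chain; the real gains come from generated native code and reduced memory traffic, which Python lists do not model.

```python
def unfused(xs, w, b):
    """Three separate operators, each materializing a full intermediate list."""
    scaled = [x * w for x in xs]            # operator 1: multiply
    shifted = [s + b for s in scaled]       # operator 2: add
    return [max(v, 0.0) for v in shifted]   # operator 3: ReLU


def fused(xs, w, b):
    """One fused 'kernel': multiply, add, and ReLU per element, no intermediates."""
    return [max(x * w + b, 0.0) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5]
print(fused(xs, 2.0, 0.5))  # [0.0, 0.0, 0.5, 3.5]
```

Both functions compute the same result; the fused one makes a single pass over the data, which is what a compiler like TVM exploits when it generates code for the whole expression at once.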
00:11:36 Frank
Interesting, so that’s a real performance boost, and if you scale that up into the cloud, that probably
00:11:44 Frank
means a lot of money saved on cloud computing, too. I imagine a company the size of Microsoft would be very interested in getting better results faster with less cloud compute.
00:11:56 Frank
You did mention an acronym; I just wanna make sure folks know
00:11:59 Frank
what that is.
00:12:00 Frank
TVM, what is that?
00:12:03 Matteo
I don’t know exactly what it stands for. Something tensor, maybe?
00:12:08 Frank
Andy looks like he knows, but he’s on mute.
00:12:10 Andy
I don’t, yeah I I don’t know.
00:12:13 Frank
OK, I’m just curious.
00:12:13 Andy
I’ll go look it up.
00:12:15 Frank
There you go.
00:12:16 Andy
TVM acronym.
00:12:19 Matteo
I think it’s for Tensor Virtual Machine, but I’m
00:12:21 Matteo
not sure if that’s correct.
00:12:22 Frank
That sounds about right.
00:12:23 Frank
Tensor, yeah, tensor
00:12:26 Frank
virtual machine.
00:12:28 Andy
Ah, I see.
00:12:30 Andy
So "thanks very much" comes up. That’s interesting.
00:12:34 Frank
Well, we’ll figure out what it is at
00:12:36 Andy
I put "tensor" in here. TVM, you said?
00:12:36 Frank
this junction, so.
00:12:40 Frank
Yeah, yeah.
00:12:40 Matteo
Yes, it’s a GitHub project, but I think it’s also an Apache top-level project, where you have...
00:12:45 Andy
Yeah, there: tvm.apache.org, yeah.
00:12:50 Andy
And it doesn’t tell me what it stands for, but that’s where you can go and learn more about it.
00:12:55 Andy
According to the website, it’s an end-to-end machine learning compiler framework for CPUs, GPUs, and accelerators.
00:13:05 Andy
Interesting, it does sound interesting, yeah.
00:13:09 Frank
That’s what’s great about this space.
00:13:10 Frank
There’s so much you could geek out on and spend like.
00:13:15 Frank
I’m just looking through; I found an article on machinelearningknowledge.ai about Hummingbird, and it’s just like, wow.
00:13:25 Frank
It looks like they basically copied and pasted the FAQ
00:13:29 Frank
from here.
00:13:29 Frank
It’s intelligent, but it does look fascinating in terms of what it can
00:13:35 Frank
do. So,
00:13:36 Frank
what motivated the creation of Hummingbird?
00:13:43 Matteo
So the motivation was actually different. The initial motivation was to try
00:13:51 Matteo
to do,
00:13:54 Matteo
not to accelerate
00:13:56 Matteo
traditional machine learning pipelines, but to use differentiation:
00:14:00 Matteo
basically all this backpropagation,
00:14:04 Matteo
all these tools that are used for training neural networks, and try to apply them to traditional machine learning models.
00:14:11 Matteo
So basically, try to do backpropagation over scikit-learn pipelines.
00:14:15 Matteo
And that is the beginning of the tool.
00:14:17 Matteo
So we started with this tool that was basically translating these traditional machine learning pipelines,
00:14:22 Matteo
scikit-learn pipelines at the beginning, into PyTorch, so that we could do end-to-end differentiation.
00:14:27 Matteo
But then once
00:14:28 Matteo
we were at that
00:14:29 Matteo
point, and of course, as you can imagine, we were trying to do end-to-end differentiation to increase the accuracy of the pipeline, to see whether using backpropagation can increase accuracy.
00:14:40 Matteo
And once we did this translation, we basically realized, OK, since we are on PyTorch, we can exploit everything else the PyTorch framework offers: hardware acceleration and all those other things.
00:14:52 Matteo
And then we basically ditched the idea of doing end-to-end differentiation and running backpropagation over the pipelines, and instead we focused more
00:15:00 Matteo
on building a system for accelerating inference, that is, prediction, over traditional machine learning.
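The original goal, backpropagating through a converted pipeline, runs into an obstacle with trees: a hard comparison such as `x < t` has zero gradient almost everywhere, so gradient descent has nothing to follow. One common workaround, assumed here for illustration (the episode does not say how the team handled it), is to replace the step with a sigmoid whose steepness is set by a temperature. The function names and numbers below are hypothetical.

```python
import math


def hard_split(x, t):
    """Original tree test: 1.0 if x goes left (x < t), else 0.0."""
    return 1.0 if x < t else 0.0


def soft_split(x, t, temperature=0.1):
    """Smooth stand-in for the step: sigmoid((t - x) / temperature)."""
    return 1.0 / (1.0 + math.exp(-(t - x) / temperature))


def soft_split_grad_t(x, t, temperature=0.1):
    """Analytic derivative of soft_split with respect to the threshold t."""
    s = soft_split(x, t, temperature)
    return s * (1.0 - s) / temperature


x, t = 0.45, 0.5
print(hard_split(x, t))                   # 1.0
print(round(soft_split(x, t), 3))         # 0.622
print(round(soft_split_grad_t(x, t), 3))  # 2.35
```

Lowering `temperature` makes the soft split approach the hard one while still providing a non-zero gradient near the threshold, which is what lets an optimizer nudge `t`.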
00:15:07 Andy
So I’m curious, Matteo.
00:15:09 Andy
This is not my forte; Frank’s the data scientist of our pair.
00:15:13 Andy
I am a data engineer, so can you give me an example of a problem? I get the speed part of this, I really do.
00:15:25 Andy
We need that in data engineering too.
00:15:27 Andy
I think everyone needs that performance part, but can you give me an example of something that you’ve applied this to?
00:15:34 Andy
And you already gave us, you know, an interesting number about how much faster it was.
00:15:39 Andy
A couple of good references from that.
00:15:41 Andy
Was there something in particular that you’ve worked on or that your team has worked on and applied this and saw some you know some interesting results?
00:15:52 Matteo
So, I mean, first of all, I’m a database person too.
00:15:54 Matteo
I’m not a machine learning person, so I think we’ll be speaking the same language.
00:15:57 Andy
OK.
00:15:59 Matteo
I’m a database person that
00:16:02 Matteo
yeah, it’s
00:16:03 Matteo
I’m basically trying to understand the machine learning domain and see how much data management can take advantage of these techniques,
00:16:10 Matteo
and maybe help.
00:16:12 Matteo
I mean, the start of my investigation was traditional methods, because those are the ones that
00:16:17 Matteo
you, in general,
00:16:18 Matteo
use for tabular data, which is the data that we have
00:16:23 Matteo
the most
00:16:23 Matteo
of. And so, related to use cases,
00:16:30 Matteo
let me think. So we
00:16:32 Matteo
tried to use it internally for some of our first-party customers,
00:16:38 Matteo
just because they have scikit-learn models
00:16:42 Matteo
and they want to kind
00:16:43 Matteo
of see if they can speed up the inference of these,
00:16:46 Matteo
the prediction over these models.
00:16:48 Matteo
When people reach out from outside, it’s mostly to try to accelerate tree-based algorithms such as gradient boosting: LightGBM, XGBoost, those
00:17:03 Matteo
kinds, and yeah.
00:17:06 Matteo
In general the use cases are really
00:17:08 Matteo
simple: you have scikit-learn models and you want to deploy your scikit-learn models,
00:17:14 Matteo
and when you deploy, you want to take advantage of GPU,
00:17:18 Matteo
either because you already have some GPU deployments, so you already have some neural networks
00:17:22 Matteo
there, and you also want to take advantage of the GPUs in your deployment with these
00:17:30 Matteo
traditional models, or just because you have a traditional model and you want to speed up inference.
Got you.
00:17:38 Matteo
I have to say that most of the performance boost we usually see is related to batch inference: not when you’re doing single-point inference, but when you have a batch of records, so that we can basically saturate a GPU, for instance.
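The batch-inference point is that an accelerator pays off when it receives one large operation instead of many tiny ones. The NumPy sketch below, using a hypothetical linear model and random data, shows the two shapes of the same computation: one call per record versus a single matrix product over the whole batch. It runs on the CPU and only illustrates the structure; on a GPU, the batched form is the one that can keep the device busy.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=8)         # hypothetical trained linear model
bias = 0.25
batch = rng.normal(size=(1000, 8))   # hypothetical batch of 1,000 records


def predict_one(record):
    """Single-point inference: one small dot product per call."""
    return float(record @ weights + bias)


def predict_batch(records):
    """Batch inference: one matrix-vector product for all records."""
    return records @ weights + bias


one_by_one = np.array([predict_one(r) for r in batch])
all_at_once = predict_batch(batch)
print(np.allclose(one_by_one, all_at_once))  # True
```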
00:17:55 Andy
So just to follow up on that, then, it sounds like a lot of what you’re doing is
00:18:02 Andy
you’re focused on the tool that does these translations for you into other platforms,
00:18:08 Andy
other technologies, and allows you to use GPU versus CPU. And I think what you’re creating, if I understand you (and I didn’t do my homework, apologies),
00:18:20 Andy
I think what you’re building is a way
00:18:22 Andy
to do exactly what we were joking about earlier with testing.
00:18:27 Andy
You want to see, how can I get the peak performance
00:18:31 Andy
for this part of that,
00:18:33 Andy
maybe this module or this operation of the batch? And maybe the answer here, and you mentioned this, maybe the answer here
00:18:41 Andy
is CPUs or GPUs, maybe it’s C++, and you’re able to kind of pick the high spots and say, I’m getting orders
00:18:50 Andy
of magnitude of performance,
00:18:51 Andy
and the low spots, right,
00:18:52 Andy
just stuff that runs fast.
00:18:54 Andy
And then you can put that together and hand it back to your client or someone who’s interested in it and say, right now, given the volume and the data and the state of hardware, you can get the maximum performance
00:19:07 Andy
if you do this part here, that part there, and that part there. Is that fair?
00:19:13 Matteo
So you’re actually looking into some future work that we are investigating now, kind
00:19:18 Matteo
of matching
OK.
00:19:19 Matteo
different hardware to the different parts of the pipeline.
00:19:22 Matteo
What we actually focus on right now is translating the machine learning models end to end, taking the featurizers and all the models, and
00:19:31 Matteo
that’s because we saw that, most of the time, this is where we can get the maximum performance: by looking at the model end to end, we can run it completely on the GPU instead of having to go back and forth between GPU and CPU, for example.
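Translating the pipeline end to end means the featurizer also becomes tensor operations, so the whole chain can stay on one device with no hand-off in the middle. A NumPy sketch with a hypothetical standard scaler followed by a logistic regression (all parameters invented for illustration, not taken from Hummingbird):

```python
import numpy as np

# Hypothetical fitted parameters of a two-step pipeline:
# a standard scaler followed by a binary logistic regression.
mean = np.array([5.0, 100.0])
scale = np.array([2.0, 50.0])
coef = np.array([0.8, -1.2])
intercept = 0.1


def pipeline_as_tensor_program(X):
    """Featurizer and model expressed as one chain of array operations."""
    Z = (X - mean) / scale                 # featurizer: standardization
    logits = Z @ coef + intercept          # model: linear scores
    return 1.0 / (1.0 + np.exp(-logits))   # probability of the positive class


X = np.array([[5.0, 100.0],
              [9.0, 50.0]])
print(pipeline_as_tensor_program(X).round(3))  # [0.525 0.948]
```

Because both steps are plain array expressions, a tensor runtime can place the entire function on a GPU, whereas a featurizer that stays in its original library would force a CPU round-trip before the model runs.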
00:19:47 Matteo
But what you point out is something that we are considering:
00:19:51 Matteo
to look at the model not as, you know, a single
00:19:55 Matteo
black-box artifact, but as something that we can actually split into different parts and eventually run on different hardware and different runtimes,
00:20:08 Matteo
such as TVM,
00:20:09 Matteo
as I said before: some parts on TVM and some parts on PyTorch, those sorts of things.
00:20:14 Frank
So, kind of like a meta-optimizer.
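The trade-off in this exchange, running the whole pipeline on one device versus splitting it across runtimes, can be reasoned about by counting device hand-offs. The stage names and device labels below are hypothetical, and the sketch ignores the cost of each stage itself; a real planner of the kind discussed here would also weigh how fast each stage runs on each runtime.

```python
from typing import List, Tuple

# Hypothetical execution plan: (stage name, device chosen for that stage)
Plan = List[Tuple[str, str]]


def count_transfers(plan: Plan, input_device: str = "cpu") -> int:
    """Count how often the data must move between devices along the pipeline."""
    transfers = 0
    current = input_device          # the input data starts on this device
    for _stage, device in plan:
        if device != current:       # stage runs elsewhere: copy the data over
            transfers += 1
            current = device
    return transfers


mixed: Plan = [("scaler", "cpu"), ("one_hot", "gpu"),
               ("imputer", "cpu"), ("trees", "gpu")]
end_to_end: Plan = [("scaler", "gpu"), ("one_hot", "gpu"),
                    ("imputer", "gpu"), ("trees", "gpu")]

print(count_transfers(mixed), count_transfers(end_to_end))  # 3 1
```

A split plan only wins if the per-stage speedups outweigh the extra copies, which is why translating end to end is the simpler default.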
00:20:15 Andy
OK, it’s a combination.
00:20:18 Andy
Like that’s exactly where I was going.
00:20:19 Andy
Yeah, it’s like you’re tuning stored procs, Matteo,
00:20:24 Andy
and you’re deciding, I want this one to run on SQL Server,
00:20:27 Andy
I want that one to go to Postgres.
00:20:29 Andy
And yeah, it’s interesting that you can span hardware and software.
00:20:36 Andy
You can pick platforms in the software
00:20:39 Andy
to do it.
00:20:40 Andy
And I’m with you.
00:20:41 Andy
I got my head around it now, and I think that’s really, really cool. This just sounds like something that’s going to accelerate the field, really,
00:20:51 Andy
because the less time you’re sitting around twiddling your thumbs waiting for a result, you know, the more you can get done.
00:20:59 Andy
I mean, that’s just.
00:21:00 Andy
Common sense, so I love what you guys are doing.
Yeah, yeah exactly.
00:21:04 Andy
That’s that’s really cool and I like that.
00:21:07 Andy
I don’t think I’ve ever heard anybody talk about.
00:21:10 Andy
You know, changing libraries and changing you know hardware platforms even.
00:21:17 Andy
I mean, it’s hard to even say what you’d classify that as, running the processes on different chipsets.
00:21:26 Andy
That’s something we used to do back
00:21:28 Andy
in the ’70s, you know.
00:21:29 Andy
I mean, but it was...
00:21:30 Frank
That harkens back to, like, the mainframe days.
00:21:31 Andy
It kind of does. I mean, the 6800s and the ’80s and all of that, but
00:21:39 Andy
I mean, this is way, way, way more advanced than all that, but I like the idea.
00:21:46 Andy
I like being able to to do that and I hear what you’re saying right now.
00:21:50 Andy
You’re just after picking a platform, picking an approach and saying, you know, we’re going to run this,
00:21:57 Andy
we’re going to generate C++, it’s going to run on CPUs, and overall that’s going to be your fastest result. It’s going to give you your best performance.
00:22:06 Andy
I get you.
00:22:07 Andy
I didn’t realize I’d jumped ahead there.
00:22:10 Andy
That happens sometimes. Rarely, but it happens.
00:22:15 Andy
Y’all could totally take that idea, Matteo, and run with it.
Yeah, if you
00:22:19 Matteo
want, we can write the paper together.
00:22:22 Frank
There you go.
00:22:22 Frank
You know, right?
00:22:23 Andy
Right away. I could,
00:22:24 Andy
I could do the punctuation.
00:22:28 Frank
He’s really good at
00:22:29 Frank
reviewing stuff. I will say that from personal experience, from him reviewing my articles in the now-defunct MSDN Magazine.
Here we go.
00:22:38 Andy
I remember that. Those were fun.
00:22:39 Andy
I learned a lot reviewing your articles,
00:22:42 Andy
Frank, ’cause you were always on the cutting edge.
00:22:44 Frank
I try.
00:22:45 Andy
Yeah, neat stuff.
00:22:46 Frank
But this Hummingbird stuff looks really cool, and it looks like it’s as easy to install as pip install hummingbird
00:22:54 Matteo
Just be missing.
00:22:54 Frank
-ml, I think it is.
00:22:57 Matteo
Yes, yeah, the name Hummingbird was already taken, of course.
00:23:00 Frank
Well, yeah, but no.
00:23:02 Frank
This is really cool.
00:23:02 Frank
Like, I like where this is going.
00:23:05 Frank
I like the potential for it.
00:23:06 Frank
’Cause with the cloud, you know, you
00:23:09 Frank
think about
00:23:11 Frank
database as a
00:23:12 Frank
service; like, you don’t,
00:23:13 Frank
you know, you don’t care what the hardware is. Well, you care, but I mean from the end developer’s point of view,
00:23:19 Frank
they won’t necessarily care what type of hardware it runs on.
00:23:21 Frank
This does open
00:23:22 Frank
up some very interesting possibilities, just kind of piggybacking on what Andy said.
00:23:27 Frank
It’s like, wow. I mean, one of the things, and I forget who said it,
00:23:31 Frank
it might have been Kevin Hazzard, who said that, you know, now we live in an age where we’re not dealing with just spinning platters.
00:23:39 Frank
We can imagine
00:23:41 Frank
what a database... I’m butchering what he said,
00:23:44 Frank
but he says a lot of profound things, and one of the most profound things he said was something like, you know,
00:23:50 Frank
what would a database in the future look like,
00:23:52 Frank
because we’re not
00:23:52 Frank
dealing with spinning platters? Did
00:23:54 Frank
I get that right, Andy, or something along those lines?
00:23:55 Andy
You did. He blogged about it at devjourney.com. We’ll have to look that up for the show notes, but Kevin is one of those...
00:24:06 Andy
He’s a pretty profound thinker, and
00:24:08 Frank
I was going to say, he’s a very deep thinker; he’s always, like, 10 moves ahead.
00:24:09 Andy
Yeah, I could tell.
00:24:14 Andy
Yeah, and I could tell reading the article, ’cause I’ve known him for,
00:24:18 Andy
well, you,
00:24:19 Andy
we’ve known him for a decade or more, and he was struggling to articulate the concept.
00:24:25 Andy
And if it’s tripping someone like Kevin Hazzard up, it’s a pretty powerful concept.
00:24:30 Frank
Right, right.
00:24:31 Andy
But he did a good job on devjourney.com. He’s not blogging as much ’cause he’s just too stinking busy. But yeah, you’re right. And I had a similar conversation
00:24:44 Andy
with my son Stevie Ray not too long ago. We were talking about
00:24:52 Andy
flash drives, and how the memory that we have now is so much faster than the platters, and I made this comment to him and kind of stopped and thought, I don’t know if that’s accurate or not. And maybe, Matteo, since you’re working on the cutting edge, you can help us.
00:25:08 Andy
We were just poking around thinking about operating systems.
00:25:11 Andy
And we do a lot here at the house in Farmville, VA, with IoT.
00:25:16 Andy
In fact, he’s building a new collection of sensors for me right now, for an Arduino, you know.
00:25:20 Andy
So we’re going to hook it to a Pi, because Pis can talk to, you know, the Internet; they can talk to our router. And that’s the next big secret. Don’t tell anybody.
00:25:31 Andy
Kidding, but
00:25:33 Andy
one of the neat things about these Pi architectures, versus even the really powerful servers that we have right now, is that both,
00:25:42 Andy
you can compare them,
00:25:43 Andy
they’re both messaging systems; they’re just passing around messages physically on a bus
00:25:47 Andy
when you get to that Pi level. And that’s how I learned it, so I’m really excited about him learning
00:25:52 Andy
that way. But
00:25:53 Andy
nobody thought about it, because we didn’t,
00:25:55 Andy
we couldn’t conceive of it when hard drives came out.
00:25:58 Andy
Nobody thought about building
00:26:00 Andy
the OS or something,
00:26:02 Andy
a second-generation or higher language, on that without those spinning disks.
00:26:08 Andy
And here’s the long-winded place
00:26:11 Andy
I wanted to get to: I don’t know
00:26:15 Andy
if we’re there now, even. I imagine there are probably some OSes out there, or
00:26:22 Andy
sitting on GitHub, there are probably 100 of them by now, where people are doing exactly that. They’re taking advantage of the new IO, if you will, but I don’t think the big systems are doing it. I don’t think the major popular operating systems are, and for good reason. They’re stable.
00:26:42 Andy
It’s hard to change all of that.
00:26:42 Frank
Well, there’s a lot of inertia.
00:26:45 Frank
When you have a widely deployed operating system, you get a lot of inertia, and, you know, I’m
00:26:51 Frank
not talking about just Windows. I mean iOS,
00:26:53 Frank
I mean Android, I mean Linux.
Sure, sure.
00:26:55 Frank
Once you have a wide install base, you lose the
00:26:58 Frank
ability to be very experimental.
00:27:01 Andy
Yeah, I totally concur with that, and I see,
00:27:05 Andy
I see the cloud, I see Azure,
00:27:07 Andy
I see this leap that’s happened, and it’s just crazy to try.
00:27:13 Andy
I don’t even keep up with it, but just reading tidbits, reading and editing Frank’s articles and the like, it’s just taking these quantum leaps.
00:27:21 Andy
It’s like 10 years’ worth of stuff happening every six months.
00:27:26 Andy
And you guys just keep knocking it out, and I imagine at the Gray Systems Lab you’re surrounded by people who are just, you know, in Star Trek land or something.
00:27:41 Matteo
Yeah, yeah.
00:27:44 Matteo
Yeah, I totally agree on
00:27:45 Matteo
all the things that you said.
00:27:46 Matteo
I was presenting a project related to Hummingbird
00:27:50 Matteo
a few days ago, and I was preparing my slides,
00:27:54 Matteo
and I
00:27:55 Matteo
came across this slide, I think.
00:27:56 Matteo
It was from just
00:27:57 Matteo
doing a few years back, and
00:27:59 Matteo
it basically was showing the number
00:28:01 Matteo
of papers that
00:28:01 Matteo
were published on machine learning on arXiv, and in 2018 there were 100 papers a day published just on machine learning, that kind of...
00:28:11 Andy
My fingers.
00:28:13 Matteo
Just to give an idea of how fast the pace of innovation is now, especially in the machine learning and neural network domain.
00:28:22 Matteo
In the operating system and database domain, it’s a little bit slower, I would say, because, as Frank said, there is inertia there: these systems are deployed, and if you want to add even new hardware, it takes forever.
00:28:37 Matteo
So, say at Microsoft, what happens when you have new hardware coming and you want to exploit it?
00:28:42 Matteo
It just sticks.
00:28:45 Matteo
And this is just because they’re used by many people, and even if you want to make a small change here, it’s hard.
00:28:53 Andy
And I’m seeing the articles about Windows 11, where they try to make a change like that and say, hey, you need this minimum hardware.
00:29:00 Andy
Now everybody is going.
00:29:03 Frank
Oh yeah, yeah, everybody got the pitchforks out and is freaking out. Yeah, I mean, I remember I was at Microsoft doing evangelism on the shift to Windows 8.
00:29:15 Frank
You would not believe this.
00:29:17 Frank
Well, maybe you would, I don’t know.
00:29:18 Frank
But just the horror on people’s faces when they got rid of the Start button. It was like the end of the world, like you were killing somebody’s grandma.
00:29:26 Frank
You know, it’s just...
00:29:27 Frank
I mean, I disagree with the decision that was made, but let’s put it in perspective.
00:29:34 Frank
You know?
00:29:37 Frank
But, uh, but yeah, I mean.
00:29:37 Andy
You could still get there.
You could still start
00:29:41 Andy
things, but you could...
00:29:42 Frank
You could still start things. And before...
00:29:46 Frank
This is funny. This is just a complete sidetrack, Matteo.
00:29:50 Frank
We do this a lot.
00:29:51 Andy
’Cause it never happens, Matteo.
00:29:53 Frank
Before keyboards had the Windows key, you could hit Ctrl+Esc and it pulls up the same thing.
00:30:01 Frank
I don’t know, it’s just
00:30:03 Frank
not the end
00:30:03 Frank
of the world. Anyway, sorry, I flashed back to 2012. But so, Matteo,
00:30:10 Frank
we have a bunch of pre-canned questions we’re going to ask you.
00:30:14 Frank
We ask these of all of our guests.
00:30:16 Frank
About half of them are kind of fill-in-the-blanks, but the first one is: how did you find
00:30:22 Frank
your way into data?
00:30:23 Frank
Did you find data or did data find you?
00:30:27 Matteo
Uh, I would say data found me.
00:30:32 Matteo
I think it was mostly because when I started my PhD, I wanted to do distributed systems.
00:30:39 Matteo
And for some reason I ended up doing distributed systems in a database lab.
00:30:44 Matteo
So that is why I think data found me, because I wanted to do something else,
00:30:49 Matteo
but then I ended up doing data, and that probably was...
00:30:54 Matteo
I was really lucky to be honest.
00:30:57 Andy
Cool, very cool.
00:31:00 Andy
So our second question is, what’s the favorite part,
00:31:03 Andy
your favorite part of your current job?
Uh, this is
00:31:09 Matteo
a hard question.
00:31:11 Matteo
I will say that I really love my management, in the sense that they allow us in general to be
00:31:20 Matteo
sort of independent, in the sense that, you know, we are researchers, and they allow us...
00:31:28 Matteo
They find a way to
00:31:30 Matteo
strike
00:31:31 Matteo
a balance between having us be independent and do our own research with crazy ideas, like the one that
00:31:37 Matteo
I presented with Hummingbird,
00:31:39 Matteo
and still be, you know,
00:31:41 Matteo
with our feet on the ground, helping product improve
00:31:46 Matteo
the system, etc.
00:31:48 Matteo
So I think that is mostly what I love: on one hand, I can look into what we
00:31:53 Matteo
can do next,
00:31:54 Matteo
like having the operators running over different targets, and on the other, I can see what the real problems coming from product are and how we
00:32:03 Matteo
can solve
00:32:03 Matteo
them. And I love this, to be honest.
00:32:08 Frank
Awesome. Our first complete-the-sentence: when I’m not working, I enjoy blank.
00:32:15 Matteo
I would say work, but...
Yeah, I don’t know.
00:32:25 Matteo
Maybe family at this point, spending a lot of time with family without the commute time.
00:32:29 Matteo
We are often at home, and I have a two-year-old that is driving us nuts.
00:32:39 Andy
That’s pretty cool.
00:32:41 Andy
So we have.
00:32:41 Frank
My youngest did kindergarten over Zoom, and it’s just as chaotic as it sounds.
00:32:47 Frank
I’ll just put it that way.
00:32:50 Matteo
Yeah, I cannot imagine, to be honest.
00:32:52 Matteo
Now he’s in daycare, and we are really happy that he’s in daycare, because, you know, at that age
00:32:57 Matteo
I guess that every kid needs to have interaction with
00:33:00 Matteo
other
00:33:01 Matteo
kids, and just staying at home is not healthy. But I can’t imagine how
00:33:06 Matteo
hard it is to
00:33:07 Matteo
have a whole year at home and
00:33:09 Matteo
have classes over Zoom.
00:33:12 Matteo
Yeah, I agree.
00:33:15 Andy
Go ahead, I’m sorry.
00:33:17 Matteo
I was just saying, I hope that this whole
00:33:18 Matteo
situation will end soon.
00:33:20 Frank
Me too, yeah.
00:33:21 Matteo
I mean, it doesn’t look like it, but...
00:33:23 Andy
Yeah, same here.
00:33:25 Andy
I think we all do. I think it’s going to be one of those things where we look back for decades, probably, and see these little things that we’re really not noticing right now.
00:33:36 Andy
We’re just coping and managing and going on, and
00:33:40 Andy
You know, we’re gonna look back and go.
00:33:41 Andy
Wow, you know that changed this.
00:33:44 Andy
And that, and there’s all these things that come from it.
00:33:47 Andy
I hope mostly good.
00:33:48 Andy
But I think it takes us time to figure out the good.
00:33:53 Andy
I look forward to that time
00:33:56 Andy
when we are
00:33:56 Andy
reflecting and reminiscing on stuff like this.
00:34:01 Andy
I want to, but we have to be on
00:34:03 Andy
the other side, though.
00:34:05 Andy
Yes. Our second of three complete-the-sentences is: I think the coolest thing in technology today is blank.
I...
00:34:23 Matteo
I mean, as a researcher,
00:34:24 Matteo
usually I’m attracted by things that I don’t know.
00:34:28 Matteo
So I’ll say something like quantum computing, because I don’t know anything about quantum computing.
00:34:36 Matteo
Yeah, I I don’t know.
00:34:39 Frank
So go to impactquantum.com.
00:34:44 Andy
I’m smiling because I was waiting for Frank.
00:34:46 Frank
It’s funny, actually, because I
00:34:50 Frank
went to the last MLADS that was held in person. It was fall 2019, and the second-day keynote was a hardware keynote, and, you know, I go to a data science conference,
00:35:01 Frank
I want data science. I was kind of mad that they had a hardware person up, but then she started talking about quantum and it just blew
00:35:08 Frank
my mind, and ever since then I
00:35:11 Frank
have really wanted to, I really...
00:35:14 Frank
I was just so excited about quantum computing, but the thing about quantum computing is, you know, that night at the hotel,
00:35:22 Frank
I installed the Q# SDK and stuff like that, and then I was like, OK, now what?
00:35:27 Frank
Because it made no flippin’ sense.
00:35:32 Frank
So I’ve been, intermittently, on this journey of learning more about quantum computing, and starting the Impact Quantum podcast and then starting the blog
00:35:42 Frank
have forced me to keep at least a regular cadence of figuring out what’s going on there. It’s fascinating.
00:35:49 Frank
I will say the one thing I’ve learned is the importance of linear algebra.
00:35:53 Frank
Apparently, linear algebra and the way the algorithms work in quantum systems tend to explain each other very well.
00:36:02 Frank
But yeah, so definitely, impactquantum.com
00:36:05 Frank
is
00:36:06 Frank
a blog I started last week, and I’m regularly updating it.
00:36:13 Frank
But that’s the end of the shameless plug.
00:36:15 Frank
But I agree with you, I think quantum computing would be a very cool thing to explore for a number of reasons.
00:36:21 Frank
The next and final complete-the-sentence is: I look forward to the day when I can use technology to blank.
00:36:32 Matteo
Use technology so that I do not have to drive the car, like self-driving cars. I live in Los Angeles, so for me, having self-driving cars
00:36:40 Matteo
could be
00:36:40 Matteo
a complete life change.
00:36:45 Frank
I totally agree. I used to enjoy driving, like, I used to.
00:36:50 Frank
I grew up...
00:36:52 Frank
I didn’t have a license until I was like 21, so for me, I’ve done my time on mass transit.
00:36:57 Frank
I’ll put it that way. But living in DC, everywhere is just bumper to bumper, probably a lot like LA, and it just really takes the joy out of it. And, you know,
00:37:10 Frank
one of the things: my last job
00:37:11 Frank
at Microsoft, I was at the MTC,
00:37:13 Frank
and the only reason I didn’t want to take that job was because I had to drive to Virginia,
00:37:20 Frank
which, despite it being 9 miles as the crow flies, could take
00:37:25 Frank
anywhere from 90
00:37:26 Frank
minutes to two hours. But, I don’t want to say as luck would have it, ’cause it certainly wasn’t lucky,
00:37:33 Frank
the pandemic kind of made it so I could work remotely and never had to do it.
00:37:37 Frank
But, you know, I share your dream.
00:37:40 Frank
The day of
00:37:41 Frank
driverless, self-driving cars, so you can
00:37:44 Frank
read, be on the computer, do work while you’re driving, and things like
00:37:48 Frank
that. Yeah, I’m right there with you.
00:37:51 Matteo
Yeah, I totally agree
00:37:52 Matteo
with what you said.
00:37:53 Matteo
I mean, I’m from Italy, and I’m from Modena, which is where,
00:37:59 Matteo
basically, we say we like fast cars and good food, so we have Ferrari, we have Ducati, we have
00:38:06 Matteo
[inaudible], so
00:38:08 Matteo
I grew up hearing the Ferraris when they tested
00:38:11 Matteo
on the
00:38:13 Matteo
circuit [inaudible].
00:38:16 Matteo
I
00:38:16 Matteo
live, like, I think 3
00:38:18 Matteo
or 4 miles from Fiorano, and you still hear, when they turn
00:38:21 Matteo
the engine on, how
00:38:22 Matteo
loud it is. So I really like cars, but
00:38:25 Matteo
I cannot stand,
00:38:28 Matteo
you know, being in a traffic line with other cars just, for instance, for going to work or going grocery shopping.
00:38:35 Matteo
It’s just kind of a waste of time.
00:38:37 Frank
Especially a Ferrari. A Ferrari is meant to run free.
00:38:42 Andy
Yes, yes.
00:38:44 Andy
But that thing in Texas.
00:38:46 Frank
That’s right. My neighbor, a couple of my neighbors have...
00:38:48 Andy
Let her go.
00:38:51 Frank
One of my neighbors has a Ferrari, and you can hear it go by. It sounds beautiful going by, so I totally relate. Somebody down the street owns a Jaguar V12,
00:39:05 Frank
and when that thing goes by, it’s like angels singing.
00:39:09 Frank
I know it’s a British car and an Italian car, and that’s probably heresy,
00:39:12 Frank
but I will say it sounds impressive.
00:39:16 Frank
So it sounds like
00:39:20 Frank
you might also be a car guy,
00:39:22 Frank
or at least used
00:39:23 Frank
to be. Yeah.
00:39:24 Matteo
Yeah, yesterday.
00:39:26 Andy
Back home
00:39:28 Andy
So our next one is share something different about yourself, but a little caution.
00:39:35 Andy
It’s a.
00:39:36 Andy
It’s a family-friendly podcast.
00:39:38 Andy
We want to keep that iTunes clean rating here, so don’t make us edit it.
00:39:48 Matteo
Yeah, I don’t know.
00:39:49 Matteo
I mean, I don’t really know what to share.
00:39:51 Matteo
I’m spending all my time either at work or with family, so I probably have the most boring life ever.
00:39:58 Matteo
But I think that,
00:40:00 Matteo
I think it is good.
00:40:02 Matteo
I mean, I don’t know
00:40:02 Matteo
if it’s good, the fact that now we are
00:40:04 Matteo
working from home,
00:40:05 Matteo
but I have more time to
00:40:08 Matteo
focus on other, different things.
00:40:10 Matteo
For instance, I can watch stocks. Before, I couldn’t watch stocks while I was at work,
00:40:16 Matteo
but now I have my laptop, and when I have a meeting, I can just take
00:40:20 Matteo
a peek, and of course I can track my stocks there
00:40:23 Matteo
while I’m working.
00:40:27 Matteo
And yeah, kind of like,
00:40:33 Matteo
kind of looking at the stock market, especially because now there is
00:40:37 Matteo
a little bit,
00:40:37 Matteo
there’s a little bit of fraud around, with all these meme stocks, etc.
00:40:41 Matteo
It’s exciting, but it’s a little bit dangerous.
00:40:48 Frank
It’s become like a sport, if you will.
00:40:52 Matteo
Yeah, I mean, I was trying this, the
00:40:55 Matteo
Robinhood app,
00:40:56 Matteo
where they talk about gamification of the stock market. I don’t know if you have tried that; it is crazy.
00:41:00 Matteo
It looks like gambling.
00:41:03 Frank
Right?
00:41:03 Matteo
It looks like.
00:41:08 Frank
And the final question, do you listen to audiobooks, and if so, do you have any recommendations?
00:41:16 Matteo
No, I don’t listen to audiobooks.
00:41:18 Matteo
I think I’m more on the old-
00:41:20 Matteo
style side, I would say.
00:41:23 Matteo
I prefer to read
00:41:25 Matteo
rather than listen.
00:41:27 Matteo
You know, I.
00:41:28 Matteo
Don’t know why.
00:41:29 Matteo
I don’t know why.
00:41:31 Frank
I think it.
00:41:31 Frank
I think it depends on the person like.
00:41:33 Frank
I think it depends on kind of what you’re comfortable with.
00:41:36 Frank
I mean, my audiobook listening is nowhere near where it was when I would drive everywhere all the time.
00:41:42 Frank
So the reason we ask is ’cause Audible is a sponsor of the show, and if you go to thedatadrivenbook.com, you can sign
00:41:53 Frank
up for a free
00:41:53 Frank
Audible membership, and if you sign up, they give us a little pat on the back and probably enough money to buy a Starbucks.
00:42:02 Frank
Help support the show.
00:42:05 Frank
And they’ve actually been one of our number one.
00:42:07 Frank
Sponsors so far.
00:42:08 Frank
because of this program.
00:42:10 Frank
So, you mentioned you had a website. Where can folks find out more about you?
00:42:19 Matteo
Oh, my website.
00:42:20 Matteo
I think it is...
00:42:22 Matteo
I don’t remember.
00:42:24 Matteo
Oh, it is a GitHub website: interesaaat.github.io.
00:42:29 Frank
All right, we’ll make sure it goes in the show
00:42:29 Frank
notes so folks can find out more about this. And definitely go to your favorite command-line prompt and type in pip install hummingbird-ml to check out what’s going on.
00:42:44 Frank
I’m definitely going to experiment with this,
00:42:46 Frank
’cause it does look fascinating, and like Andy said, the potential for this is huge,
00:42:52 Frank
because this could end up in a lot of different places, ’cause it solves a lot of different problems.
00:43:00 Frank
So, anything else, Matteo?
00:43:03 Matteo
Yeah, if you try it, let us know. We are looking for contributors and feedback.
00:43:08 Matteo
So if you try it, let us know what you think and how we can improve.
00:43:12 Frank
Awesome, thanks, and I’ll let the nice British lady end the show.
00:43:16 BAILey
Thanks for listening to Data Driven.
00:43:18 BAILey
We know you’re busy and we appreciate you.
00:43:20 BAILey
Listening to our podcast, but we have a favor to ask.
00:43:24 BAILey
Please rate and review our podcast on iTunes, Amazon Music, Stitcher or wherever you subscribe to us.
00:43:31 BAILey
You have subscribed to us, haven’t you? Having high ratings and reviews helps us improve the quality of our show and rank us more favorably with the search algorithms.
00:43:42 BAILey
That means more people listen to us, spreading the joy, and can’t the world use a little more joy these days?
00:43:50 BAILey
Now go do your part to make the world just a little better and be sure to rate and review the show.