Dr. Yossi Keshet on Decoding Speech, AI, Morality, and the Future
In this episode, we explore linguistic and cultural influences on language with Dr. Yossi Keshet—a renowned expert in automated speech recognition.
We cover the intricacies of jargon, code-switching, and the ethical dimensions of artificial intelligence.
Listen to discover how the convergence of linguistics and computer science is revolutionizing our interaction with technology.
Show Notes
05:26 AIOLA targets foundational industries through AI.
07:34 Automatic speech recognition works similarly to the ChatGPT model.
11:17 American English research bias in speech intelligibility.
13:33 Studying foreign languages improved understanding of grammar.
18:35 Passionate about linguistics and cognitive sciences. No AI has this capability.
20:23 Phenomenal correlation between artificial and neural mechanisms.
26:24 Innovating transcription: improving on old industry practices.
27:35 GPT’s influence on various fundamental industries.
31:56 Using multiple languages can enhance comprehension.
35:07 Switching between languages in code-switching research.
40:47 Superego: Freud’s guilt and fear mechanism. Evolutionary.
42:11 Book writing claiming need for non-standard regulations.
46:46 AI movie plot illustrates ethics in robotics.
50:25 GPT discussion focuses on personalized and helpful interaction.
53:20 End of insightful data-driven episode, future technology.
Transcript
Welcome back to another riveting episode of Data Driven.
Speaker:Joining us today, lakeside and positively glowing from his
Speaker:Appalachian retreat, is Frank. Meanwhile, the
Speaker:always astute and ever energetic Andy is here to keep us
Speaker:grounded. But enough about us. Today, we have
Speaker:a true luminary in the field of AI, someone who's blending the worlds
Speaker:of academia and enterprise with seamless finesse. He's an
Speaker:associate professor at the Technion, has published over 100
Speaker:research papers on automated speech recognition, and is the chief
Speaker:scientist at AIOLA. Please welcome Dr. Yossi
Speaker:Keshet, or as he's known to his friends, Yossi.
Speaker:Alright. Hello, and welcome to Data Driven, the podcast where we explore the
Speaker:emergent fields of artificial intelligence, data science, and,
Speaker:of course, data engineering, without which the whole world would probably stop turning.
Speaker:And you know, data engineering is important. That's
Speaker:basically it. Still working on that revamped
Speaker:monologue for season 8, Andy. Were
Speaker:you on vacation? You're on vacation. I am on vacation. And
Speaker:for those of you who can't see on camera who are not who are
Speaker:listening, not watching, I am literally lakeside,
Speaker:in the foothills. Well, not the foothills. We are actually in the Appalachian Mountains. Or
Speaker:is it Appalachian? I I never I I've heard of those. I I never
Speaker:got a clear read on it. Say either. So, you know When I say either.
Speaker:Yeah. Yeah. Yeah. Yeah. Yeah. So I am in Deep Creek Lake,
Speaker:Maryland, which is kind of like, Maryland doesn't really have a Panhandle
Speaker:per se, but if it did, it would be this is what this would be.
Speaker:I probably think I'm 5 miles from West Virginia and about
Speaker:20 miles from Pennsylvania. So it's kind of like this quiet
Speaker:little corner of the state.
Speaker:And I've been, you know, reading and studying.
Speaker:Today, I hit 600 consecutive days on Pluralsight. Nice.
Speaker:So we're recording this June 17th. And how are
Speaker:things with you, Andy? Things are good. I'm gonna throw out a plug for
Speaker:data driven media dot tv because Frank mentioned.
Speaker:If you're listening, he while he was mentioning that, he was
Speaker:actually panning the camera over to the lake. But if
Speaker:you're, subscribing to data driven media dot tv, you get
Speaker:to see us. You get to see the video, and you
Speaker:can see, for instance, that I am wearing my "Data is the
Speaker:new oil" T-shirt, which you can pick up. I'm just full of
Speaker:sponsor stuff today. Well, it's
Speaker:self-sponsored. And, honestly, we really need to get better at that. Right? We have
Speaker:data channel. Tv. For listeners to the show, I will give
Speaker:a preview: Data Driven Academy is launching soon. You have
Speaker:a course coming up at the end of the month. Actually, yeah, it's Fabric.
Speaker:We're recording this on the 17th. It's the 24th
Speaker:of June, but I'm also doing 2 more
Speaker:near the ends of July and August. And in addition
Speaker:to that, while we're shamelessly plugging away here,
Speaker:before we get to our very interesting guest, I'm also bringing
Speaker:back my Day of Azure Data Factory, which was wildly
Speaker:popular. I delivered it at a couple of
Speaker:international conferences in '22 and '23. And,
Speaker:yeah, let's see if people are interested. What do you do
Speaker:Friday afternoons, Andy? Oh, there's this thing, Frank. Thanks for
Speaker:mentioning that. Totally free. We we gotta we're trying to get better at this. That's
Speaker:all. We do. Yeah. Data engineering Fridays. And if you go to data engineering
Speaker:fridays.com, you can learn more about that. Frank, you're doing a lot
Speaker:of stuff with I noticed with using the, encore
Speaker:replay feature in Restream. And it's
Speaker:right you you shared that with me. I started doing that with data engineering
Speaker:Fridays as well. But great a great way to,
Speaker:you know, to get your message out there. And, you
Speaker:know, I I had no idea replays would help. But my gosh.
Speaker:They really have. It's just a matter of just hitting the echo of I
Speaker:can't even talk. Algorithm the right way. Yeah. And Yeah. You know,
Speaker:maybe we can get the so I think it's a good segue, for our
Speaker:guest, Dr. Yossi Keshet. He's the chief
Speaker:scientist at AIOLA, an AI powered tech
Speaker:company that automates business workflows
Speaker:by capturing spoken data. Yossi is also
Speaker:an associate professor at the Faculty of Electrical and Computer
Speaker:Engineering at the Technion in Israel.
Speaker:Yossi is an award-winning scholar and has published over 100 research
Speaker:papers about automated speech recognition and speech
Speaker:synthesis. Welcome to the show, Yossi. Hi.
Speaker:Nice for having me. Thank you for having me. Hey. No problem. No
Speaker:problem. We are very excited to have you. And you're not just an
Speaker:academic, but you've also proven yourself in actual enterprise.
Speaker:Which sounds really bad as I say that out loud, but I think you knew
Speaker:that was a compliment.
Speaker:But, so what is AIOLA?
Speaker:Can you tell me a little bit about that? Because I'm curious about that and
Speaker:about workflows
Speaker:around spoken data. So
Speaker:AIOLA is a company that aims to target
Speaker:the, you know, very basic and foundational
Speaker:industries. Maybe, if I
Speaker:may, let's start with the general scene of
Speaker:automatic speech recognition now, and then you will understand where AIOLA stands, because we
Speaker:have OpenAI now, and you
Speaker:could say we solved the AI problem. But it's not like that.
Speaker:We are in amazing shape in
Speaker:terms of automatic speech recognition. We have a paper that shows
Speaker:that Whisper, the model from OpenAI, is as good as humans at
Speaker:detecting and transcribing language when we speak about
Speaker:American English, with noise, without noise, and
Speaker:also with L2 speakers, that is,
Speaker:non-native speakers of the
Speaker:language. And the results are that Whisper, the
Speaker:OpenAI model, is the same as human listeners. And that is
Speaker:the main thing. But the thing is that
Speaker:when you come to industries, usually they have jargon, they have special words.
Speaker:And those words are either rare in
Speaker:the language or they are non-words.
Speaker:Say I'm a medical doctor performing
Speaker:a surgery, and I would like to transcribe what I'm saying during
Speaker:the surgery. There are words which are not
Speaker:often used or which are non-English words. And
Speaker:in that case, those automatic speech recognizers don't
Speaker:work at all. They don't detect those words. And at AIOLA, this
Speaker:is our target: to take those words, which are actually the most important words. Those
Speaker:are the jargon of the industry, of the facility.
Speaker:So the goal is to help those industries come
Speaker:up with automatic speech recognition for
Speaker:reporting, for transcribing speech.
Speaker:I have a question. When you say automatic, what what makes it automatic? Is
Speaker:it just kinda, what exactly does that mean?
Speaker:So automatic speech recognition today works
Speaker:very, very similarly to the way ChatGPT works.
Speaker:ChatGPT works on a model called the transformer. It's a deep
Speaker:learning architecture, which has a
Speaker:history based on previous recurrent architectures.
Speaker:And it can predict, as we all know, it can
Speaker:predict text amazingly. In automatic
Speaker:speech recognition, it's almost the same thing, but there is another
Speaker:component to
Speaker:this transformer, which is called the encoder.
Speaker:This part takes the speech and actually transforms it into
Speaker:a good representation that can be used
Speaker:with, let's call it the other side, with
Speaker:this GPT. Together, they can
Speaker:transcribe speech, as I described, in a very good
Speaker:way, as good as humans in some
Speaker:cases. I will say, like,
Speaker:I've been messing around with the app that's on the phone
Speaker:for ChatGPT, and
Speaker:I use the voice interaction feature. It is
Speaker:amazingly good at getting rid of the umms, the ahs,
Speaker:the scatterbrained thoughts that I sometimes have when I talk to it.
Speaker:Like, it can really distill a lot of
Speaker:things. I'm impressed with it. The last time I
Speaker:did anything serious with speech recognition was probably, like, maybe 4 years
Speaker:ago, and it's really improved. Like, I mean, orders of magnitude
Speaker:more than I thought. I mean, it's almost at Star Trek level. You
Speaker:know? I'm not sure
Speaker:about those. It depends on the company, if it's Apple or
Speaker:Google. And I'm not sure, because they don't declare
Speaker:which models they use. I think, personally, they don't use Whisper or
Speaker:the latest models that we have for automatic speech recognition, that
Speaker:is, transcribing speech. And the goal is a little bit different
Speaker:on the phone. You actually want to, maybe... Right. Make
Speaker:notes, send an email, send a text message,
Speaker:and maybe the vocabulary is
Speaker:less defined. There is another problem with
Speaker:the phones. Oh, no. Go ahead. I want to call my
Speaker:friend. His name is Xi, and
Speaker:the last name is Chung. How do you pronounce it?
Speaker:What do you do with that? I'm gonna say "he" or "chi" or...
Speaker:So there is a problem of proper names and how you
Speaker:define them. And this is a completely different problem. It's still an open problem, and
Speaker:the goal is a little bit different. So
Speaker:when we are assessing the quality of those models, it's
Speaker:a little bit different than the assessment of just spoken language
Speaker:like what we do now. No. I mean, that's a great point. I mean, my
Speaker:last name has, you know, technically is Lavin.
Speaker:But, you know, growing up for for reasons many,
Speaker:big and small, it became Lavinia. And like, so, like,
Speaker:the phone, depending on if it's an Android or an Apple,
Speaker:gets confused pretty easily.
Speaker:And that is an interesting point. Some names, Andy is lucky to have an
Speaker:easy name for the, the system.
Speaker:But not everybody does. So I understand that. Sure.
Speaker:I also wanna double click on American
Speaker:English. You said that a bunch of times. Like, is there
Speaker:an inherent bias in these model trainings because these are done by American
Speaker:companies? Yes. There is. Okay. The
Speaker:data is mostly American English. The research institutes
Speaker:are mostly American. So I don't know
Speaker:if you'd call it inherent or implicit bias, but there is a
Speaker:bias, definitely.
Speaker:We are investigating, by the way, the intelligibility
Speaker:of speech in some cases. What is the intelligibility for
Speaker:an American listener versus the intelligibility for
Speaker:myself? I'm not an American listener, but I know English.
Speaker:What is the best, quote unquote, speaker? What is the best
Speaker:listener? How can we transfer those
Speaker:to a speech recognizer? How can we transfer those to assessing the
Speaker:quality of speech? What does it mean for pathologies in
Speaker:speech? And this is ongoing research
Speaker:in this field. Interesting.
Speaker:I often wonder, like, you know, it's not just English.
Speaker:Right? Like, you know, if you listen to Spanish, there are different dialects of
Speaker:Spanish. Right? Even German. You know, I'm sure
Speaker:there's, you know, plenty of dialects of all these languages and,
Speaker:like, how do you train a
Speaker:model so that it can get to be as good at
Speaker:understanding x and x versus x and y versus, you know,
Speaker:the base language, the base standard? I don't know. That's
Speaker:fascinating. It seems like it could be an endless loop of, like,
Speaker:training. It is. Indeed, it
Speaker:is. And when we train, there is another thing. I'm
Speaker:working on deep learning and AI. And what we found out
Speaker:is that it may be the case that if you train
Speaker:on a huge amount of data from one language, let's say
Speaker:American English, and then train on less data in Spanish,
Speaker:you actually get some advantage from the training
Speaker:on American English. So, again, in this model, Whisper from
Speaker:OpenAI, most of the data is American English, but,
Speaker:actually, other languages are really great.
Speaker:Again, Spanish is amazing. So maybe it's like
Speaker:humans: as we learn more and more languages, it gets easier
Speaker:for us. This is a very interesting point.
Speaker:No. That's an interesting idea because I know, like, I never
Speaker:understood English grammar, American or otherwise,
Speaker:until I studied a foreign language. And when I studied it, it was German.
Speaker:And, you know, German kept a lot of the archaic things that
Speaker:are in English and
Speaker:continued to keep them important. Like in English, you know, who
Speaker:and whom used to confuse the you know what out of me.
Speaker:Right? But when I learned in German about different cases and things
Speaker:like that, I was like, oh, that's why it is. Right? So,
Speaker:all these things, just like you said, like,
Speaker:having more data or data from another point of view, I suppose,
Speaker:or another way to look at the world helped me look at my world
Speaker:a little better. Maybe that's how
Speaker:AI will work too. I don't know.
Speaker:Maybe. We don't know. We actually have a guess about that
Speaker:because those networks actually solve an optimization problem,
Speaker:a mathematical optimization problem. It's a problem that
Speaker:we define with an equation, and we need to have
Speaker:a computer run and solve it. The equation is
Speaker:over a training set of examples. So
Speaker:one person said this, another person said something else.
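
As a rough sketch of the optimization problem described here (the notation is an assumption for illustration, not something stated in the conversation): given a training set of N examples, each pairing a recorded utterance x_i with its reference transcript y_i, training searches for the network parameters θ that minimize the average loss over that set.

```latex
\hat{\theta} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_{\theta}(x_i),\, y_i\big)
```

Here f_θ stands for the network and ℓ for a loss that compares the predicted transcript with the reference one. In practice this minimum is only approximated, typically with some variant of stochastic gradient descent.
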
Speaker:And what happens is that, again, when we have
Speaker:a large amount of data,
Speaker:it seems that those networks get to an amazing place.
Speaker:So this algorithm, Whisper, or other
Speaker:algorithms, they're really from recent years, like the last 2 or 3 years.
Speaker:That's it. They perform
Speaker:amazingly with the
Speaker:same mechanism, not with the same amount of
Speaker:data. Yeah. That's the
Speaker:fascinating aspect of all of this. It's just that some of these things just seem
Speaker:some problems seem harder than they ought to be,
Speaker:and then some solutions to problems seem way more effective than they
Speaker:ought to be. It's also interesting to note
Speaker:that Whisper, OpenAI's Whisper, was trained
Speaker:on 600,000 hours of speech. But this is
Speaker:way, way more than a kid learning a language gets.
Speaker:A kid learning a language is exposed to far fewer hours of
Speaker:speech, and speech that is less accurate, less
Speaker:coherent. And this is something
Speaker:Noam Chomsky raised years ago, like, 50 years ago.
Speaker:And it's still an open question: can we make those
Speaker:systems work better if we know the language?
Speaker:I guess you learned German faster than any
Speaker:machine that works today.
Speaker:Yeah. And I'm glad you mentioned Noam
Speaker:Chomsky. So for those who don't know, Noam
Speaker:Chomsky is, among other things, a noted linguistics scholar.
Speaker:I highly recommend you do a search on him because that's a
Speaker:good Wikipedia rabbit hole to fall into. But
Speaker:how much does linguistics come up in this? Right? Because I think
Speaker:what's fascinating about this field for me is...
Speaker:my great-grandfather
Speaker:was a linguistics professor. And, you know, as the
Speaker:family lore goes (I never met him; he died a decade or 2 before I was
Speaker:born), he spoke, like, 12 languages. He was a professor of, like, 5
Speaker:or 6. And, you know, a lot of people
Speaker:on that side of the family seem to be gifted in language.
Speaker:And one of the fields I was tempted to study in
Speaker:university was linguistics. And I just find
Speaker:it interesting how
Speaker:the overlap in the Venn diagram is now much larger
Speaker:than it used to be in terms of linguistics and computer science.
Speaker:So what are your thoughts on that? Like,
Speaker:if you have a
Speaker:company like AIOLA, right, how many people are, you know, honest to
Speaker:goodness linguists versus computer scientists and AI engineers?
Speaker:So there are no linguists there. Oh,
Speaker:really? Okay. There are no linguists. But I have to tell you, there was
Speaker:a professor called Fred, Frederick Jelinek. He was the
Speaker:head of language research at Johns Hopkins University
Speaker:in Baltimore. He was amazing. He was one of the smartest
Speaker:people on earth. And he
Speaker:developed many of the speech recognition algorithms. He said,
Speaker:"Every time I fire a linguist, the performance of the speech recognizer goes
Speaker:up."
Speaker:And this is embarrassing. But
Speaker:I myself, first, really like
Speaker:linguistics. I really like cognitive sciences, and I really
Speaker:try to combine them with my work. But it's really
Speaker:amazing that all those AI systems
Speaker:don't have any of that. You don't teach ChatGPT
Speaker:what is a noun, what is a verb, what is anything. You don't teach
Speaker:a speech model that this is the
Speaker:prominent word, this is the end of the sentence. You don't use linguists.
Speaker:It just happens
Speaker:with a huge amount of data. And
Speaker:this is interesting. This somehow contradicts Noam Chomsky, who said that
Speaker:there is a universal grammar, that
Speaker:language is innate, that there is
Speaker:maybe some black box in our brain which
Speaker:is tuned to learn a language. And
Speaker:we are not sure about that. There is no direct proof whether it's correct or
Speaker:not. We are born with language. As humans, we're
Speaker:born with language. This is part of being human.
Speaker:We are not born with written language. Written language was invented.
Speaker:Spoken language is something like a zebra
Speaker:having stripes. This is our nature, and this is
Speaker:interesting. This is not happening in
Speaker:AI. The best successes didn't have linguists; they don't have any
Speaker:restriction on what should be said or not.
Speaker:Maybe AI will be a tool to somehow
Speaker:make linguistics research more effective and
Speaker:try to understand what happens in the brain, what happens in the cognition part.
Speaker:But I would like to tell you about another research project we are preparing here, which
Speaker:is really amazing. One of the things is that we have,
Speaker:so, there is this ChatGPT. It's a language model.
Speaker:We also have something in the brain. It's also a neural network.
Speaker:And when we try to compare them, there is a huge
Speaker:correlation between what happens in the artificial neural
Speaker:network of GPT and the
Speaker:biological neural network in the brain. And it was
Speaker:shown several years ago, and here we
Speaker:show it again with the most modern
Speaker:automatic speech recognizers. So this is
Speaker:a phenomenal correlation between the artificial and the
Speaker:neural mechanisms. I was gonna ask about that
Speaker:because I'm I'm familiar with, you know, at least the abstracts of
Speaker:the research, from a few years ago and now. And
Speaker:I was curious if there had been any new correlations
Speaker:or, you know, or new research, new connections that have been made
Speaker:between machines learning languages
Speaker:and the way our brains work. It sounds like
Speaker:that's true.
Speaker:So we just initiated
Speaker:a research project here in my lab about that. There were
Speaker:some French researchers, mainly King
Speaker:and his colleagues at Meta and
Speaker:a university in France whose name I forgot. They
Speaker:showed that there are those correlations. They showed simple correlation, and
Speaker:they showed it with LLMs, with language models. What we show is a little bit
Speaker:different. We show correlation with automatic speech
Speaker:recognition. So we put people under fMRI, under MRI.
Speaker:We scan their brains at some
Speaker:resolution, and we try to find correlations with their brain activity
Speaker:during reading and during speaking aloud,
Speaker:and ask what the correlation is with the best model we know for
Speaker:speech recognition. And there are correlations.
Speaker:I have to say that there is a mechanism in the transformer, this
Speaker:architecture of neural network. There is a mechanism called attention. This
Speaker:mechanism allows those models to make connections among
Speaker:the words themselves. So: "I'm eating an
Speaker:apple. It was delicious." So "it" refers to the apple.
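
To make the apple example concrete, here is a minimal sketch of attention weights computed with scaled dot products. The three-dimensional vectors are hand-picked for illustration and are purely hypothetical; a trained transformer learns its own vectors, and this is not taken from Whisper or any other real model.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical, hand-picked vectors; a real model learns these from data.
tokens = ["I'm", "eating", "an", "apple", "it"]
keys = [
    [1.0, 0.0, 0.0],  # I'm
    [0.0, 1.0, 0.0],  # eating
    [0.1, 0.1, 0.1],  # an
    [0.0, 0.2, 2.0],  # apple
    [0.2, 0.0, 0.5],  # it
]
query_for_it = [0.0, 0.1, 2.0]  # what the word "it" is "looking for"

weights = attention_weights(query_for_it, keys)
most_attended = tokens[weights.index(max(weights))]
print(most_attended)  # apple
```

With these made-up vectors, the query for "it" lines up most closely with the key for "apple", so "apple" receives the largest share of the attention weight.
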
Speaker:Okay? So there is the attention mechanism. This is what makes those
Speaker:models amazing. And there is an attention mechanism, I guess, in the
Speaker:brain. So we try to take this attention mechanism in
Speaker:the models and compare it to the activity in the brain. We don't have
Speaker:results yet, but it seems promising. And we also ask
Speaker:another question. What if you don't read aloud? What if you
Speaker:read silently? What if you have dyslexia? What if you have
Speaker:another type of pathology? What
Speaker:are the correlations then? So this is fascinating. And
Speaker:there is correlation. I still don't know what's going to happen
Speaker:with that, with the pathologies, but it's unbelievable, the
Speaker:correlation. That is really exciting,
Speaker:especially when you're examining things like dyslexia,
Speaker:which is considered, you know, not normal,
Speaker:or maybe that's not the right term for it, but a
Speaker:challenge at a minimum. The cool the cool kids call that neurodivergent
Speaker:now. I think Neurodivergent. Thank you, Frank. So when you're studying, you
Speaker:know, when you're studying that sort, I'm wondering if there's a place for
Speaker:that, in in the artificial.
Speaker:I'm curious. What what do you mean? Can you
Speaker:So, yeah, is there is is there any benefit
Speaker:to, I say, transferring the thought processes
Speaker:of people who are neurodivergent and and automating that
Speaker:and making that part of the, you know,
Speaker:the the language model or or speech recognition?
Speaker:Yeah. I think so. I think so. First, it's a tool
Speaker:to analyze what happens in the
Speaker:brain. Yeah.
Speaker:But it's very difficult. We don't have any debugger for
Speaker:the brain. We don't see the code of the brain. We don't see that this
Speaker:function doesn't work. And most of the work
Speaker:is to design the experiment,
Speaker:and it's really amazing. In our design,
Speaker:as I told you, I'm asking people to read aloud
Speaker:and comparing it to what automatic speech recognition
Speaker:is supposed to do. But I'm
Speaker:also asking people to read silently, and then I follow
Speaker:their eyes. I have a machine that follows their eyes: I
Speaker:track their eyes and I see which word they are reading
Speaker:now. And I can use that to follow
Speaker:what they read. But in order to run that through a speech
Speaker:recognizer model, I need the speech. So during the design of
Speaker:the experiment, I need artificial speech or I need them to read aloud
Speaker:afterwards. It's a big question
Speaker:how to do that properly and how to
Speaker:make things happen, but definitely working with
Speaker:people with problems is, first, to help them,
Speaker:and second, to understand them, and third, to maybe
Speaker:understand the brain and make AI better.
Speaker:I also think, like, stroke victims, right, could benefit down the line
Speaker:from a better understanding of lang language models. Right? Like, maybe there would be some
Speaker:kind of therapy that could be directed to that. I think I think it's
Speaker:fascinating. I always love those fields where they touch upon more than 1 thing.
Speaker:Right? This isn't just math. This isn't just computer science. Like, it's linguistics. But,
Speaker:you know, it's a little bit of everything. It's like a giant, like, pot of
Speaker:stew that you just throw a bunch of stuff in, and it all kind of
Speaker:mixes. And, like, it's kind of like, almost like intellectual gumbo,
Speaker:I guess, would be the word. Right? But,
Speaker:what
Speaker:drove you to make your
Speaker:company? Like, what was the driving force to
Speaker:say, hey... You know,
Speaker:I remember being in an office many, many years ago, and you would always see
Speaker:doctors talking into these little, like, miniature recorders.
Speaker:Right? In the olden days, the recordings would go off to
Speaker:some data center somewhere... not a data center, but, like,
Speaker:some typing center, a call center where people would
Speaker:transcribe that. You know, obviously, that is now an artifact of
Speaker:the past as these models have gotten better.
Speaker:What was the goal in your
Speaker:company to say we can do this better? What was that breakthrough
Speaker:moment of, like, here's what the industry already does, here's how we can do
Speaker:it better? So there is
Speaker:so we all know ChatGPT, and it influences our lives. Now,
Speaker:instead of Google, we search with GPT and it's amazing. It's unbelievable.
Speaker:So I thought, what about the very fundamental industries? What
Speaker:about,
Speaker:like, when you check an airplane? You
Speaker:use a special jargon. You cannot touch anything. You cannot
Speaker:leave even a pen there because otherwise the plane wouldn't be
Speaker:valid for flight. What about industries like the food
Speaker:industry, when you need to report the process? You
Speaker:have gloves, you cannot touch an iPad, you can barely
Speaker:write. And what about other industries
Speaker:like, maybe, chip technology, when you make nanotechnologies and
Speaker:when you make chips, you know,
Speaker:silicon chips and silicon
Speaker:wafers? You are in coveralls.
Speaker:You are wearing gloves. You need to report the process.
Speaker:All those industries have special jargon. They use special
Speaker:terms to describe what they're doing. They don't have a way
Speaker:to write something down,
Speaker:and they are very limited in the way they report. And on the other
Speaker:hand, we had speech recognition, but speech recognition doesn't work on
Speaker:those jargon words. Those jargon words are actually the
Speaker:most important to those industries, and this was the goal for
Speaker:AIOLA. So what we do is we operate
Speaker:automatic speech recognition, the best automatic speech recognition,
Speaker:but we also operate something else, something called keyword spotting.
Speaker:It's another deep network, which is focused
Speaker:on detecting only the jargon words. You can define those jargon
Speaker:words in advance. You don't need to train on them. You can
Speaker:define them, and it all works together. They work as a
Speaker:complementary couple to make a
Speaker:very robust prediction, and we can detect those
Speaker:jargon words and make a report on the
Speaker:process just by speaking. So it
Speaker:can be used in any industry,
Speaker:any industry that doesn't
Speaker:have access to the most modern AI systems because the speech
Speaker:recognizer wouldn't work there, where they have problems
Speaker:writing and formulating their reports.
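
One simple way to picture a recognizer and a keyword spotter working as a pair is to rescore the recognizer's candidate transcripts using the spotter's detections. The sketch below is only an illustration under that assumption: the function name, the bonus weight, the example sentences, and the scores are all hypothetical, and nothing here describes how AIOLA's system actually combines the two networks.

```python
def rescore(hypotheses, keyword_scores, bonus=2.0):
    """Pick the ASR hypothesis that best agrees with the keyword spotter.

    hypotheses: list of (text, asr_score) pairs, higher score is better.
    keyword_scores: dict mapping a single-word jargon term (defined in
        advance as plain text) to the spotter's confidence in [0, 1].
    bonus: hypothetical weight given to each detected jargon term.
    """
    def combined(item):
        text, asr_score = item
        words = text.lower().split()
        boost = sum(
            conf for term, conf in keyword_scores.items()
            if term.lower() in words
        )
        return asr_score + bonus * boost

    return max(hypotheses, key=combined)[0]

# Made-up candidates: a general-purpose recognizer tends to prefer common words.
n_best = [
    ("check the talk meter on the left engine", -4.0),
    ("check the torquemeter on the left engine", -5.0),
]
# Made-up spotter output for a jargon term that was listed in advance.
spotted = {"torquemeter": 0.9}

best = rescore(n_best, spotted)
print(best)
```

In this toy example the spotter's confidence lifts the second candidate from -5.0 to -3.2, so the transcript containing the jargon term wins. With no detections, the recognizer's original first choice is kept.
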
Speaker:Yeah. So I'm curious how those work together. You mentioned
Speaker:that you've got the speech recognizer. You've got the keyword,
Speaker:engine. Are they 2 separate engines that are just always running
Speaker:maybe agents, running at the same time or are
Speaker:they encapsulated, say, is the speech
Speaker:recognizer does the speech recognizer have a, you know, a
Speaker:subset or a a function built into it to do the
Speaker:keyword recognition? So just to
Speaker:be sure, those keywords in some industries are
Speaker:not English words. So it can be a word which nobody
Speaker:knows about. It was not shown on
Speaker:the Internet, and ChatGPT is trained on the data over the
Speaker:Internet. There are some words that are not there. This is
Speaker:proprietary to your company. You have invented a word to
Speaker:describe this part of the engine. So
Speaker:yeah. So we have this keyword spotting. It
Speaker:is trained to detect keywords in general. They are defined by
Speaker:text, and it operates. We have 2 modes of operation. 1 of them
Speaker:works on the encoder part of
Speaker:the automatic speech recognition, and then it guides,
Speaker:it steers the speech recognition towards the correct
Speaker:transcription. And there is another mode, which is
Speaker:our own encoder, our own representation of
Speaker:speech, and then it also guides the automatic speech
Speaker:recognition to a better location and to detect those
Speaker:words. And, actually, we can show that you can combine
Speaker:any words, they can be from different languages, and we can
Speaker:detect them, like, almost 100% correct, those jargon
Speaker:words. That was that was going sorry. Go ahead.
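The idea of a text-defined keyword spotter steering a recognizer toward the right transcription can be illustrated with a small sketch. This is not the system described in the episode, which works inside the recognizer's encoder; it is a simplified stand-in that re-ranks a list of candidate transcripts after the fact. The `Hypothesis` class, the `rescore` function, the `boost` weight, and the jargon word "flanxor" are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # recognizer log-score; higher is better


def rescore(hypotheses, keyword_confidence, boost=2.0):
    """Re-rank candidate transcripts so spotted jargon words are favored.

    keyword_confidence maps each text-defined jargon word to the
    (hypothetical) keyword spotter's confidence that it was spoken.
    """
    rescored = []
    for hyp in hypotheses:
        words = hyp.text.lower().split()
        # Add a bonus for every spotted keyword that this candidate contains.
        bonus = sum(
            boost * conf
            for keyword, conf in keyword_confidence.items()
            if keyword.lower() in words
        )
        rescored.append(Hypothesis(hyp.text, hyp.score + bonus))
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# The recognizer alone prefers common English words over the invented term.
nbest = [
    Hypothesis("check the flange wall before welding", -4.0),
    Hypothesis("check the flanxor before welding", -5.0),
]
spotted = {"flanxor": 0.9}  # assumed output of the keyword spotter

best = rescore(nbest, spotted)[0]
print(best.text)  # check the flanxor before welding
```

The jargon list is plain text, so adding a new company-specific word needs no retraining in this sketch, which mirrors the "define them in advance" point made above.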
Speaker:No. No. No. Sorry. No. That's okay. That makes perfect
Speaker:sense now, what you just said about the languages, using
Speaker:multiple languages, you know, English plus all of the
Speaker:other languages, because sometimes
Speaker:people will struggle if they speak English as a second
Speaker:language. They'll struggle to find the right
Speaker:English word, and they'll substitute a word from their native language.
Speaker:And in other cases, they'll be perhaps teaching
Speaker:on a topic, and they may revert back
Speaker:to an older language, Greek, Latin, something
Speaker:like that. That may be part of the, the
Speaker:lecture or, you know, I could see that in
Speaker:medicine. I could see it in, you know, all all sorts
Speaker:of literature studies. I could see a lot of that. And that
Speaker:that kinda clicked for me as you were saying that that makes sense that you
Speaker:would have additional languages. Yeah. I also wonder, like, in
Speaker:conversational context too. Right? Like, you know, Spanglish is a
Speaker:thing. Franglais is the French and
Speaker:English kinda mashed together, and I know that with other languages,
Speaker:whenever you have 2 groups of people kinda come together, like, you know, there's always
Speaker:some kind of weird mix of language that that kinda
Speaker:just evolves either naturally or forced. I mean, that's Right. That's another
Speaker:debate. Are you thinking Belter Creole? Belter, you know, I
Speaker:wasn't going there, but that's an excellent example.
Speaker:So, Yossi looks very confused. So there's a series of
Speaker:books, called The Expanse. It was an excellent TV show
Speaker:for about 6 seasons, and it's basically set 200,
Speaker:300 years in the future.
Speaker:And as humans colonize the asteroid belt,
Speaker:people from all over the world kinda all end up living
Speaker:together. So, like, the Belter Creole language is a
Speaker:creole of, you know, literally dozens of languages. Right?
Speaker:So, like, it'll switch from, you know, Hindi to Arabic to,
Speaker:English to French to there's even some German in there. I've heard some of that.
Speaker:Like, and there are these kind of these weird mixes of things. Right? So they'll
Speaker:say the word for the Belter people, like,
Speaker:people who live in the Belt, is Beltalowda. Belt obviously comes from, you
Speaker:know, the asteroid belt, English. Lowda, I think, is a Hindi term. I
Speaker:think. Don't hate on me in the comments.
Speaker:But I know wallah is a Hindi term. Right? So
Speaker:they'll, you know, when they talk about people who live on Earth or
Speaker:Mars, they refer to them as well wallahs, gravity well
Speaker:wallahs. Right? Like, and I only know wallah because
Speaker:of dish wallahs, and Wired Magazine did a whole story about dish wallahs in
Speaker:the nineties. Anyway, but I mean, I think, like, you know,
Speaker:I suppose that approach could work for something like a creole. Right? Like, we have
Speaker:multiple languages kinda mixed together. Or is that not really a
Speaker:massive business case?
Speaker:Creole is really complicated. It's a language. It's a
Speaker:real language, and it's complicated. The more
Speaker:delicate case of that is what we call in research code switching. When
Speaker:I'm Right. When I speak Hebrew, for example, I don't have a
Speaker:word for the, you know, the Internet router. So I say the router
Speaker:in English. Or I say email, or I will say,
Speaker:I don't know. There are so many words in English that are used, especially
Speaker:in technology, that you use worldwide in other languages, and this
Speaker:is code switching. There is another case. I think Andy pointed it
Speaker:out, that sometimes when you are stressed,
Speaker:or let's say your L1 is Spanish but your L2 is American
Speaker:English, or you're bilingual. And sometimes when you are
Speaker:stressed, you just switch the 1
Speaker:word, and this is an amazing phenomenon. This is research with Tamar Gollan
Speaker:from the University of California, San Diego and Matt Goldrick from Northwestern
Speaker:University. And I provide, again, a mechanism to detect
Speaker:that and to do research on that. And the key question is,
Speaker:like, why do you do that? Why and when do you do that? Is
Speaker:it stress? What is the state of
Speaker:describing those? Are you gonna describe it in the American
Speaker:way, the Spanish word, or is it gonna be vice
Speaker:versa? And this is really interesting.
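To make the notion of a code-switch point concrete, here is a toy sketch that marks where the language changes inside an already transcribed sentence. It works on text with small hand-made word lists, so it is only an illustration of the concept; the detection discussed in the episode operates on speech, and the `find_switch_points` function and both lexicons below are assumptions invented for this example.

```python
def find_switch_points(words, lexicons, default="unknown"):
    """Tag each word with a language and return indices where it changes.

    lexicons maps a language code to a set of known words. Words found in
    no lexicon are tagged `default` and ignored when looking for switches.
    """
    tagged = []
    for word in words:
        lang = next(
            (name for name, vocab in lexicons.items() if word.lower() in vocab),
            default,
        )
        tagged.append((word, lang))

    switches = []
    previous = None
    for index, (_, lang) in enumerate(tagged):
        if lang == default:
            continue
        if previous is not None and lang != previous:
            switches.append(index)  # language differs from the last known one
        previous = lang
    return tagged, switches


# Tiny illustrative word lists (Spanish L1, English L2).
lexicons = {
    "es": {"necesito", "reiniciar", "el", "ahora"},
    "en": {"router", "email", "the"},
}

sentence = "necesito reiniciar el router ahora".split()
tagged, switches = find_switch_points(sentence, lexicons)
print(switches)  # [3, 4]: into English at "router", back to Spanish at "ahora"
```

A detector like this only says where the switch happens. The research questions raised here, why and when a speaker switches, are left open by it.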
Speaker:It's not my field of research. I just know how to detect them
Speaker:and, and Interesting. To detect them really well,
Speaker:but I don't know why it happens and what is the mechanism
Speaker:behind that. I could definitely see,
Speaker:the opportunity with starting with being
Speaker:able to detect, you know, these I
Speaker:don't I don't know the right word for them. I'll I'll call them modes. You
Speaker:know, a mode of speech where someone is mixing 2
Speaker:languages. And I'm sure those vary.
Speaker:So Like when I go Jersey on you. Right? We
Speaker:can't say any more about that, Frank. We're trying to keep our
Speaker:clean rating. But yes. Exactly. But,
Speaker:that's sorry. Inside joke. But the,
Speaker:but, yeah, I could see modes of speaking where someone who is
Speaker:more familiar with English as a second language.
Speaker:And and they've still you know, of course, they know their native language. They'll always
Speaker:know that. But as they I don't I don't wanna use the wrong word
Speaker:here, but I'm thinking experience is probably the best word is they get more
Speaker:experience, gain more experience with their second language.
Speaker:They may switch words less or switch languages
Speaker:less. And detecting that, I think, is the
Speaker:is key. I understand now more about what what you're doing, what
Speaker:you're accomplishing. And that that's the
Speaker:very first step to then being able to produce speech
Speaker:in those different modes. And that would be a
Speaker:fascinating, you know, a fascinating accomplishment.
Speaker:The more we can have machines
Speaker:speak to us in the language that we're most familiar with, that,
Speaker:of course, you know, is almost there now, mostly
Speaker:there right now, but have it be able to speak to us in these
Speaker:different modes where the machine switches
Speaker:back to our first language, you know, based
Speaker:on some algorithmic calculation. That sounds
Speaker:fascinating. Yeah. It is.
Speaker:I'm not sure we are there yet. It's we have a long way to go
Speaker:there. But, Sure. Yeah. Makes
Speaker:sense. Fascinating. Well, this is how it starts, though. Right?
Speaker:This is fascinating. This is, yeah, this is,
Speaker:somehow there is an elephant in the room. We may have to say
Speaker:something about AI and its regulation and what happens now.
Speaker:And, if I may, I would like to say something about this because I have
Speaker:a totally different point of view about that.
Speaker:Please. So everybody is speaking about
Speaker:regulation, and it might be a catastrophic situation
Speaker:if those machines are connected
Speaker:together and they start to train themselves. They try to
Speaker:build a meta architecture and try to train themselves,
Speaker:and then they come up with something which is better than human. Some some people
Speaker:call it the singularity point. So this is frightening. They're smarter
Speaker:than us. Maybe they're gonna kill us all. And
Speaker:now people speak about regulation, and there are
Speaker:several institutes in Europe and in the US
Speaker:trying to tackle that. And that
Speaker:is amazing. That is really important, but I think we missed something here.
Speaker:And I'll tell you why. So there is a book. It's here.
Speaker:You know, Isaac Asimov, I, Robot. You probably
Speaker:know that. So, like, the first page of this book is the 3
Speaker:laws of robotics. A robot may not injure a
Speaker:human being or, through inaction, allow a human being to come to harm.
Speaker:A robot must obey orders and so on. So let's say
Speaker:we have the regulation. AI cannot hurt humans. Okay?
Speaker:But that's not enough. It's not good enough because if the AI is smart
Speaker:enough, it will not do the I mean, it will
Speaker:show us humans that it really obeys
Speaker:the laws, but it wouldn't. And this is frightening.
Speaker:And here I suggest to look a little bit at human morality
Speaker:and why humans have laws. So we need to
Speaker:think about, if I may, think about
Speaker:human psychology. In human psychology, we have a mechanism to obey law.
Speaker:It's called the superego. It was defined by
Speaker:Freud. So we have a mechanism that
Speaker:if we don't obey a law, we feel either
Speaker:guilt or fear. And this mechanism was evolutionary.
Speaker:So say we have a group of monkeys. They obey
Speaker:the alpha monkey because they're frightened of him. They have some kind of
Speaker:primitive superego. We obey the law because either we are frightened of the
Speaker:police or we feel the guilt.
Speaker:It's like
Speaker:those experiments that show that somebody
Speaker:left something on the table, and we don't take it because we feel guilt or
Speaker:we feel something. So this mechanism,
Speaker:I claim, should be transferred to the
Speaker:AI machine. This should be the regulation. So what is the superego? The superego
Speaker:is an infrastructure to be moral,
Speaker:and we need a digital version of that. This is the regulation we
Speaker:need. We need the infrastructure to be moral in the machine. And what
Speaker:does it mean? So superego means that it's a little bit like
Speaker:self harm, if I may. It's like we feel guilt. We feel something bad if
Speaker:we do something not okay, if we don't obey the law.
Speaker:So it's like a self destruction for the AI machine. So the AI machine,
Speaker:if it doesn't obey the law, should feel something. It
Speaker:cannot feel, so Right. It will destruct itself. So this is my
Speaker:claim. This is a book I'm writing, and this is something very fundamental.
Speaker:We all speak about this regulation, but I think
Speaker:it doesn't help just to do standard
Speaker:regulation. And if I may say another thing, the last thing is that
Speaker:if you read I, Robot carefully,
Speaker:there are several short stories there, and he speaks about robots that
Speaker:obey the law. And if you look carefully at those robots that
Speaker:obey the law,
Speaker:all of them have a superego. They feel guilt.
Speaker:The first story is about a robot that plays with a girl,
Speaker:and he feels guilt about winning all the time. So he lets her win.
Speaker:So he feels guilt. It means that he has a superego.
Speaker:And then he feels frightened of the mother of the girl. And it's
Speaker:really amazing. So I think, so
Speaker:in this book I'm trying to describe the psychological concept of superego
Speaker:and then describe why it needs to be moral and how we can
Speaker:find a way to put it in regulation, like the infrastructure
Speaker:itself and not just laws.
Speaker:That is a very interesting problem you're trying to solve.
Speaker:Very important problem at that. Agreed. And
Speaker:culturally speaking, in the US, we have a saying that you
Speaker:cannot legislate morality, and
Speaker:legislate, regulate would be, you know,
Speaker:synonyms. Exactly. Right? So Right. Right. And legal code
Speaker:is code. I I
Speaker:definitely get what you're what you're saying. And I think it's super
Speaker:important. You mentioned you were writing a book about this. Now
Speaker:now now you have to tell me more because I wanna read this book.
Speaker:Same. I'm in the process of looking
Speaker:for an agent and it's complicated. It's supposed
Speaker:to be a popular book trying to explain the psychology of Freud.
Speaker:What is the superego, the ego, and the id,
Speaker:and then describe what is the pathology. So we all have a pathology. So
Speaker:you have the pathology of, it's called
Speaker:the criminal personality disorder. This
Speaker:person will not have a superego. It's like Richard the
Speaker:Third from Shakespeare. He didn't have a superego. He killed
Speaker:his family and didn't feel guilt. So this is what's
Speaker:going to happen with those machines. And then I
Speaker:give some literature examples of
Speaker:what a superego is, like from Crime and
Speaker:Punishment, where the guy killed the
Speaker:old lady, but nobody
Speaker:caught him killing the lady. He murdered her. Nobody caught him, but he
Speaker:still feels guilt. So he has a very big
Speaker:superego. And then I describe what happens in
Speaker:other moral theories of human beings, all of them connected to the
Speaker:superego. And then I try to describe a little bit how machine
Speaker:learning is trained. Again, solving an optimization problem. And then I try
Speaker:to describe how can we do superego with, how can we have
Speaker:a digital superego, if we can. No.
Speaker:It's like you're giving it a conscience of sorts. Exactly.
Speaker:Yeah. And I just wanted to add, we
Speaker:may be able to help you. Maybe not find an
Speaker:agent, but find a publisher. Both Frank and I are
Speaker:published. And, you know,
Speaker:Andy's got a lot of connections in publishing. Well That would be
Speaker:great. I am not, I just wrote a lot of books
Speaker:for different, publishing houses, and I know some people that if
Speaker:they can't help you directly, they can probably point you to someone who
Speaker:can. And, again, I am wholly motivated by wanting to
Speaker:read this book. Same. Like, I think it's important
Speaker:because I live in the Washington DC area. Right?
Speaker:So, like, there's a lot of people there who are policy
Speaker:makers. Right? Like, and they just assume
Speaker:and I think a lot of humans fall for this. Right? You you see this
Speaker:when the European Union passed their AI regulation act.
Speaker:They assume that regulation's gonna solve all their problems.
Speaker:And I think regulations prove that 1 of the fundamental forces
Speaker:in the universe is is unintended consequences.
Speaker:And, you know, when you regulate something, you don't end
Speaker:the problem. You change the way people will route around it. Right? Like,
Speaker:and I think a good example of this in AI is the movie M3GAN, which
Speaker:I don't know if you've seen. I'm not sure how to pronounce
Speaker:it, where I think she was about to torture
Speaker:she was I don't wanna give the plot away, but the robot
Speaker:child, Chucky, kinda goes evil. Like, this is the
Speaker:basic kind of plot line, and the person who created her
Speaker:was like, you can't kill me because it's against your programming. She goes, oh, I
Speaker:said nothing about killing you. I was gonna put you in a coma, and you'll
Speaker:live, you know, however many years. Like, it was just like I mean,
Speaker:that's a great example of, like, you know, don't kill. Right? Seems like a
Speaker:pretty reasonable instruction to give a robot, particularly a child's toy.
Speaker:Don't kill anyone. But, you know, she realized, like, well, kill
Speaker:equals death. So if I don't kill you, if I just hospitalize you or
Speaker:incapacitate you, that doesn't conflict with rule number 1.
Speaker:Right? Which I think is no. Obviously, as, you
Speaker:know, humans, we're like, well, it's not really the spirit of the
Speaker:law, or the rule. But clearly,
Speaker:the robot or the AI in this case, kind of figured it
Speaker:out. Like, I don't know. I think you're right. Like and any regulations like that
Speaker:too. Right? How many loopholes do people discover, whether it's
Speaker:tax laws or, you know, this. It's like, well, technically, it's
Speaker:legal. Is it actually, you know,
Speaker:what the law intended? No. Like, it's Yeah. You need
Speaker:almost something like a nuance engine,
Speaker:Yossi, to Yeah. To get
Speaker:the machine to interpret
Speaker:the laws. And I've read Asimov as well,
Speaker:big fan. And that's what happens downstream of
Speaker:the 3 laws as they begin to fail, because the
Speaker:robots are doing exactly what they're programmed to
Speaker:do. And they're
Speaker:finding ways that, in our opinion, human opinion,
Speaker:circumvent the 3 laws, but really don't
Speaker:break the robot's programming. And it's all about, you know,
Speaker:how do you define harm? Like, Frank's example is a great, you know,
Speaker:great example of that. So, yeah,
Speaker:fascinating stuff. Yeah. We gotta Awesome stuff. We gotta help you write this
Speaker:book. I wanna read this book. Yeah. I want to raise
Speaker:another point, but the opposite of the point that you raised. Like, what happens with
Speaker:the autonomous car, for example?
Speaker:Let's focus on autonomous cars. So there will be
Speaker:autonomous cars. Who is responsible for a car accident?
Speaker:Accidentally, somebody was killed. You are the
Speaker:owner. Somebody is the owner of the car. He sits
Speaker:there. He bought the car, but the car killed
Speaker:somebody. So
Speaker:who this is an open problem. This is, again, a
Speaker:moral problem. So what I suggest here is,
Speaker:maybe it will take time,
Speaker:I guess. Maybe the car, if we can build the
Speaker:superego, the mechanism for morality, you know, just
Speaker:the infrastructure for morality, can take the
Speaker:morality of the human. And if somehow it
Speaker:inherits the driver's morality, you
Speaker:can blame the driver. I'll give you another example, which will be
Speaker:maybe more concrete. So we say now that there will be a ChatGPT for
Speaker:every person, for every laptop and iPhone and whatever.
Speaker:You will have your own GPT with your own life,
Speaker:your own history. And the
Speaker:discussion with this GPT will be very personalized and
Speaker:very helpful. What happens in that case? So in that
Speaker:case, if this GPT
Speaker:will take your responsibilities and morality, somehow we
Speaker:can copy your morality and it will be part of it. So if you're moral, it
Speaker:will be moral. If you're not, it's not, but this is
Speaker:your responsibility as a human. And I think this
Speaker:is the way to go with that. We need just the infrastructure and not
Speaker:the law. Anybody can define the law, and anybody
Speaker:can break the law. We just need the infrastructure to know,
Speaker:at least the machine to know, that it broke the law.
Speaker:And this is really important. I think
Speaker:Oh, I totally agree. Totally agree. Well, we're
Speaker:gosh. We're coming up on time, Frank. Yeah. This was
Speaker:awesome. So we'll just any
Speaker:book recommendations? Obviously, I, Robot, I think, would be good reading
Speaker:in this space. You also mentioned Shakespeare too,
Speaker:Richard the 3rd. So There is a book
Speaker:which I'm reading now, which is by Despentes,
Speaker:Vernon Subutex. It's
Speaker:amazing. It's amazing. It's 3 books, and it actually
Speaker:discusses everything which is not AI. Anything which cannot be solved with
Speaker:AI. It speaks about a person who has a vinyl shop,
Speaker:a shop to sell vinyl and then CD-ROMs, and now he cannot sell
Speaker:anything. So this shop is closed, and then
Speaker:he tries to somehow manage, but he ends up on the street. He's, like,
Speaker:homeless, and he meets many people. And, like,
Speaker:every chapter is a different person
Speaker:or a pair or group of people, and it's really
Speaker:fascinating. It's all those things that you cannot solve with AI. It's all
Speaker:the human interaction, the very, very basic human interaction. Amazing.
Speaker:It won the Booker Prize in the, 2018.
Speaker:Nice. Where can folks find out more about
Speaker:you? So I have a website
Speaker:under Joseph Keshet, and, and they
Speaker:can find me there. Excellent.
Speaker:Any parting thoughts, Andy? No. Just great great
Speaker:interview. I appreciate that. 1 thing, I would ask if you could repeat the name of
Speaker:the book you just mentioned about the different stories.
Speaker:What's the name of that book? It's a single
Speaker:story. It's by Despentes,
Speaker:Vernon Subutex. It's from French. Oh, okay.
Speaker:Amazing. Amazing. Amazing. Awesome. Excellent. That's it. That's
Speaker:it for me. But that's great talk. Thank you. Excellent talk. Thank you.
Speaker:And we'll let Bailey finish the show. Well, folks, that brings us to the end
Speaker:of another enlightening episode of Data Driven. We've
Speaker:navigated the fascinating intricacies of automatic speech
Speaker:recognition, explored the moral quandaries of AI, and
Speaker:pondered the future of technology with none other than 1 of the best minds
Speaker:in the field, Dr. Yossi Keshet. Remember, if you
Speaker:enjoyed today's conversation, don't forget to subscribe to Data
Speaker:Driven Media TV for exclusive video content.
Speaker:You can also grab some fantastic merch like the "My data is the
Speaker:new oil" t-shirt Andy's sporting today. And while Frank is
Speaker:basking in the Appalachian sunshine, you can bet we're already cooking up the
Speaker:next episode to keep your data driven minds engaged and entertained.
Speaker:Until next time, stay curious, stay informed, and
Speaker:always keep questioning. Cheerio.