Hi and welcome to Fixing the Future, IEEE Spectrum’s podcast series on the technologies that can set us on the right path toward sustainability, meaningful work, and a healthy economy for all. Fixing the Future is sponsored by COMSOL, makers of COMSOL Multiphysics simulation software. I’m Steven Cherry.
Leonard: Hey, Penny. How’s work?
Penny: Great! I hope I’m a waitress at the Cheesecake Factory for my whole life!
Sheldon: Was that sarcasm?
Penny: No.
Sheldon: Was that sarcasm?
Penny: Yes.
Steven Cherry That’s Leonard, Penny and Sheldon from season two of The Big Bang Theory. Fans of the show know there’s some question of whether Sheldon understands sarcasm. In some episodes he does, and in others he’s just learning it. But there’s no question that computers don’t understand sarcasm, or didn’t, until some researchers at the University of Central Florida started them on a path to learning it. Software engineers have been working on various flavors of sentiment analysis for quite some time. Back in 2005, I wrote an article in Spectrum about call centers automatically scanning conversations for anger, either by the caller or by the service operator, one of the early use cases behind messages like “This call may be monitored for quality assurance purposes.” Since then, software has been getting better and better at detecting joy, fear, sadness, confidence and now, finally, sarcasm. My guest today, Ramya Akula, is a PhD student and a graduate research assistant at the University of Central Florida’s Complex Adaptive Systems Lab. She has at least 11 publications to her name, including the most recent, “Interpretable Multi-Head Self-Attention Architecture for Sarcasm Detection in Social Media,” published in March in the journal Entropy with her advisor, Ivan Garibay. Ramya, welcome to the podcast.
Ramya Akula Thank you. It’s my pleasure to be here.
Steven Cherry Ramya, maybe you could tell us a little bit about how sentiment analysis works for things like anger and sadness and joy. And then what’s different and harder about sarcasm?
Ramya Akula In general, understanding the sentiment behind people’s emotions, across a variety of emotions, has always been hard. To some extent, when you are in a face-to-face conversation, the visual cues and body gestures help the conversation. But when we don’t know who is sitting behind the computer or the mobile phone, it’s always hard. That applies to all kinds of sentiment, including anger, emotion, humor, and sarcasm as well. That was the starting point of this research.
Steven Cherry And what makes sarcasm harder than some of the others?
Ramya Akula Sometimes sarcasm can be humor, but it can also hurt people really badly. It also depends on how people interpret it, because people come from different cultures and different backgrounds. In some cultures something might be okay, but in another it is not. These different cultures and backgrounds, and also the colloquialisms and slang people use, are some of the challenges we face in everyday conversations, especially with sarcasm detection.
Steven Cherry Computers have been writing news and sports stories for some time now, taking a bunch of facts and turning them into simple narratives. Professional writers haven’t been particularly worried by this development, though, because the thinking is that computers have a long way to go—which may be never—when it comes to nuanced, subtle, creative forms of writing. What writers are mainly depending on to save their jobs and maybe their souls is irony, satire, humor. What they’re depending on, in a word, is subtext. Are you trying to teach computers to understand subtext?
Ramya Akula To be precise, one of the toughest jobs for these algorithms is understanding the context, which we humans are really good at. Any human can understand the context and then interpret the content based on that context. For the algorithms, it’s always hard, because with long sentences they have to capture the semantic similarity, or some kind of relationship, between the words, understand the context, and then come up with the next sentence or attach a sentiment such as humor or irony to the text. That adds another level of complexity. In the machine learning community, most researchers started attacking this problem by looking at different representations: taking the sentence as it is, chunking it down into parts like phrases, and having a different representation for each phrase, in order to understand the context, put it all together, and then generate a meaningful next sentence. I feel it’s still at a very early stage, and we have a long way to go.
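The phrase-chunking idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the paper: the fixed three-word chunks, the hash-derived toy word vectors (`toy_word_vector`, a hypothetical stand-in for learned embeddings), and mean pooling are all choices made here for readability.

```python
import hashlib

DIM = 4  # toy vector size; real models use hundreds of dimensions

def toy_word_vector(word: str) -> list[float]:
    """Hypothetical stand-in for a learned word vector: a deterministic
    pseudo-random vector derived from the word's MD5 hash."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    # Map the first DIM bytes (0..255) to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:DIM]]

def chunk_into_phrases(sentence: str, size: int = 3) -> list[list[str]]:
    """Split a sentence into consecutive, non-overlapping word chunks."""
    words = sentence.split()
    return [words[i:i + size] for i in range(0, len(words), size)]

def phrase_representation(phrase: list[str]) -> list[float]:
    """Represent a phrase as the element-wise mean of its word vectors."""
    vectors = [toy_word_vector(w) for w in phrase]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

sentence = "Oh great another Monday morning meeting"
phrases = chunk_into_phrases(sentence)
reps = [phrase_representation(p) for p in phrases]
for phrase, rep in zip(phrases, reps):
    print(" ".join(phrase), [round(x, 2) for x in rep])
```

A real system would replace `toy_word_vector` with learned embeddings and combine the phrase vectors with a trained model rather than just listing them.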
Steven Cherry You started with social media posts. This seems like in some ways an easier problem and in some ways a harder problem than, say, audio from a call center. You don’t have tone and intonation, which I think in a real conversation are often clues in what we might call human-to-human sarcasm detection.
Ramya Akula Yes. In speech recognition, that’s one advantage: we look at the intonation, or how the voice modulates, and those kinds of signals help us understand it better. But when we look at real text, from articles or the online conversations we see day to day, there isn’t really any stress or any kind of intonation that you could relate to. That’s what makes it a little harder for any algorithm. So it’s harder to check the severity of the humor or sarcasm there.
Steven Cherry If I understand something in your paper, neuropsychologists and linguists have apparently worked on sarcasm, but often through identifying sarcastic words and punctuation and also something called sentimental shifts. But what are these? And did you use them too?
Ramya Akula Neurolinguists and psychologists primarily work with data from real humans and real conversations. Even when they look at text, it is text written by real humans, and it is real humans who are making sense of it. As I said earlier, we humans are good at understanding context, just by reading or by talking in any form. In our case, there is no human involved in any part of the data analysis: the pre-processing and everything else is done automatically. It’s the algorithm that is doing it.
So we definitely use some cues. For the machine learning part, we have labeled data: each sentence is labeled as sarcastic or not sarcastic, and the data is split into a training set and a test set. We use the training data to train our algorithm and then test it on unseen annotated data. Because the data is already labeled, we use those labels, and we also use weights to understand what the cues are. So instead of real humans looking for the cues in a sentence, our algorithm looks at the weights, which give us the cues for the words.
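The labeled-data workflow described here, splitting sentences labeled sarcastic or not into training and test sets, training on one and evaluating on the unseen other, can be sketched as follows. This is a hedged illustration, not the authors’ model: the eight sentences are made up, and the per-word log-odds weights (a naive-Bayes-style score with class priors ignored) are a swapped-in technique standing in for the learned weights mentioned above (the paper’s title indicates a self-attention architecture). They play the analogous role of pointing at which words act as cues.

```python
import math
import random
from collections import Counter

# Hypothetical toy examples; label 1 = sarcastic, 0 = not sarcastic.
DATA = [
    ("oh great another monday", 1),
    ("i just love waiting in line for hours", 1),
    ("wow what a fantastic traffic jam", 1),
    ("thanks a lot for breaking my phone", 1),
    ("the meeting starts at ten", 0),
    ("i love this song", 0),
    ("the train was on time today", 0),
    ("thanks for the help with my phone", 0),
]

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the labeled data and split it into train and test sets."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def learn_word_weights(train):
    """Per-word log-odds of sarcastic vs. non-sarcastic, with add-one smoothing.
    Positive weights point toward sarcasm. Class priors are ignored for simplicity."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in train:
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    return {
        w: math.log((counts[1][w] + 1) / totals[1])
        - math.log((counts[0][w] + 1) / totals[0])
        for w in vocab
    }

def predict(weights, text):
    """Label a sentence sarcastic if its summed word weights are positive."""
    score = sum(weights.get(w, 0.0) for w in text.split())  # unseen words count as 0
    return 1 if score > 0 else 0

train, test = train_test_split(DATA)
weights = learn_word_weights(train)
accuracy = sum(predict(weights, t) == y for t, y in test) / len(test)
top_cues = sorted(weights, key=weights.get, reverse=True)[:3]
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
print("strongest sarcasm cues:", top_cues)
```

With only eight sentences the accuracy figure is meaningless; the point is the shape of the workflow, in which the test sentences are never seen during training.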
Steven Cherry Can you say a little bit more about this? There’s a lot of AI here, I gather, and it involves pre-trained language models that help you break down a sentence into something you call word embeddings. What are they, and how do these models work?
Ramya Akula Basically, a computer understands everything in terms of numbers. So we have to convert the words into numbers for the algorithm to understand them. What an embedding does is basically convert the real words into vectors of numbers. In our case, we used multiple embeddings. There are many embeddings out there, starting from [inaudible] to the very latest GPT [Generative Pre-trained Transformer] models that we are seeing every day, which are generating tons of data out there.
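Turning words into vectors of numbers can be sketched as a vocabulary lookup followed by an embedding table. This is a simplified illustration with assumptions made here: real pre-trained models such as BERT use subword tokenizers and learned, context-dependent vectors, whereas the table built by the hypothetical `init_embeddings` below is random and static.

```python
import random

def build_vocab(sentences):
    """Assign each distinct word an integer ID; ID 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def init_embeddings(vocab_size, dim, seed=0):
    """Random embedding table with one row (vector) per word ID.
    In a trained model these numbers are learned, not random."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(sentence, vocab, table):
    """Convert a sentence to word IDs, then look up one vector per ID."""
    ids = [vocab.get(w, 0) for w in sentence.lower().split()]
    return ids, [table[i] for i in ids]

corpus = ["I need apples for work", "I love apples"]
vocab = build_vocab(corpus)
table = init_embeddings(len(vocab), dim=8)
ids, vectors = embed("I need apples now", vocab, table)
print(ids)  # [1, 2, 3, 0]: "now" is not in the vocabulary, so it maps to <unk>
```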
So in our case, we use BERT, which is one of the latest embedding technologies out there. BERT stands for Bidirectional Encoder Representations from Transformers. I know it’s a mouthful, but basically what it does is take the individual words in a sentence and try to relate, or connect, each word with every other word, on both the left and the right side: from left to right and also from right to left. The main reason for BERT to work that way is that it is trying to understand the positional encoding.
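Relating each word to every other word on both sides is what self-attention without a causal mask does. The sketch below is a simplified illustration of that idea in plain Python, not BERT itself: it omits the learned query/key/value projections, multiple heads, and position information of a real BERT layer, and the toy embeddings are random. The hypothetical `causal` flag shows the contrast with left-to-right-only models.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax; -inf entries receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Scaled dot-product self-attention with queries = keys = values = x.
    causal=False: every token attends to tokens on its left AND right (BERT-style).
    causal=True: each token attends only to itself and earlier tokens (GPT-style)."""
    d = len(x[0])
    weights, outputs = [], []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                scores.append(float("-inf"))  # mask out tokens to the right
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        w = softmax(scores)
        weights.append(w)
        # Output for token i: attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights

rng = random.Random(1)
tokens = ["I", "need", "apples", "for", "work"]
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in tokens]  # toy random embeddings

_, bi = self_attention(x)
_, causal = self_attention(x, causal=True)
print("bidirectional weights for 'I':", [round(w, 2) for w in bi[0]])
print("causal weights for 'I':       ", [round(w, 2) for w in causal[0]])
```

In the bidirectional case the first word already draws on every later word; in the causal case it can only see itself.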
That’s basically about what comes next. For example, take “I need apples for work.” In this context, does the user mean they need apples, the fruit, for work, or an Apple gadget for work? That really depends on the context. As I said, humans can understand the context, but for an algorithm, working out which one comes next, the gadget or the fruit, depends on the entire sentence or the conversation. What BERT does is basically look at these individual positional encodings, try to find the most similar word that comes next to each one, and then put it all together. It works in both directions, right to left and left to right.
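A toy version of the “apples” example, under heavy assumptions: random static word vectors (`EMB`, hypothetical), sinusoidal position vectors as in the original Transformer paper (BERT learns its position embeddings instead), and a single untrained attention step. Because “apples” sits at the same position in both sentences with the same words to its left, any difference in its output vector comes from the words to its right, which only the bidirectional setting can see.

```python
import math
import random

DIM = 8
rng = random.Random(0)
WORDS = ["i", "need", "apples", "for", "work", "to", "eat"]
EMB = {w: [rng.gauss(0, 1) for _ in range(DIM)] for w in WORDS}  # toy static vectors

def positional_encoding(pos, dim=DIM):
    """Sinusoidal position vector: sin on even dimensions, cos on odd ones.
    (BERT itself learns its position embeddings; this is a stand-in.)"""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def contextual_vectors(words, causal=False):
    """Word vector + position vector, then one untrained self-attention step."""
    x = [[e + p for e, p in zip(EMB[w], positional_encoding(pos))]
         for pos, w in enumerate(words)]
    out = []
    for i, q in enumerate(x):
        keys = x[: i + 1] if causal else x  # causal: only words up to position i
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(DIM) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * k[c] for e, k in zip(exps, keys)) for c in range(DIM)])
    return out

def apples_distance(causal):
    """Euclidean distance between the two output vectors for 'apples' (position 2)."""
    a1 = contextual_vectors("i need apples for work".split(), causal)[2]
    a2 = contextual_vectors("i need apples to eat".split(), causal)[2]
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a1, a2)))

print(f"bidirectional: {apples_distance(False):.3f}")
print(f"left-to-right only: {apples_distance(True):.3f}")
```

The left-to-right distance is exactly zero, because that setting never looks past “apples”; the bidirectional one is not.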
That helps it better understand the semantic similarity. Similarly, there are other models, like ELMo [Embeddings from Language Models]. We experimented with different embedding types: BERT, ELMo, and several others. We added this part into our studies, but it is just the initial layer, a conversion of the real words into numbers so they can be fed into the algorithm.
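Because the embedding is only the first layer, it can be swapped without changing what comes after it. The sketch below assumes a deliberately simple downstream model (mean pooling plus a logistic output with untrained, hypothetical weights) and two made-up embedders, `hash_embedder` and `char_embedder`, standing in for real options such as BERT or ELMo, whose actual APIs are not shown here.

```python
import hashlib
import math
from typing import Callable

Vector = list[float]
Embedder = Callable[[str], Vector]  # any function mapping a word to a vector

def hash_embedder(word: str, dim: int = 6) -> Vector:
    """Hypothetical embedder #1: deterministic pseudo-random vector from a hash."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def char_embedder(word: str, dim: int = 6) -> Vector:
    """Hypothetical embedder #2: simple character statistics, zero-padded to dim."""
    feats = [len(word) / 10.0, sum(c in "aeiou" for c in word.lower()) / 5.0,
             float(word.isupper()), float("!" in word)]
    return feats + [0.0] * (dim - len(feats))

def classify(sentence: str, embed: Embedder, weights: Vector, bias: float) -> float:
    """Embedding layer -> mean pooling -> logistic output (probability of sarcasm)."""
    vectors = [embed(w) for w in sentence.split()]
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))

WEIGHTS = [0.5, -0.2, 0.1, 0.3, 0.0, 0.2]  # hypothetical, untrained
EMBEDDERS = {"hash": hash_embedder, "char": char_embedder}
probs = {name: classify("Oh GREAT another meeting!", emb, WEIGHTS, bias=0.0)
         for name, emb in EMBEDDERS.items()}
for name, p in probs.items():
    print(f"{name} embeddings -> p(sarcastic) = {p:.3f}")
```

In a real pipeline the downstream weights would be trained separately for each embedding type, since different embedders produce vectors with different meanings and sizes.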
Ramya Akula Yes, it does. That’s the short answer. But adopting the algorithms is up to the companies, whether they want to do it or not. That’s the main idea: to help curb unhealthy conversations online. That could be anything, ranging from trolling and bullying all the way to the propagation of misinformation. So it’s a wide spectrum.
Steven Cherry Do you think it would help to work with audio in the way the call centers do? For one thing, it would turn the punctuation cues into tones and inflections that they represent.
Ramya Akula The most precise answer is yes. But there is another filter to it, or actually, an additional layer. The first thing is analyzing the audio. In the audio, we also get cues, as I said earlier: the intonations and the expressions give us another set of helpful cues. But once the conversation is transcribed, that is when our algorithm can help. So yes, our algorithm can definitely help with any kind of speech processing, in a call center or any voice recording. And yes, we will also add the speech part to it.
Steven Cherry Ramya, your bachelor’s degree was earned in India and your master’s in Germany before you came to the U.S. You speak five languages. Two of them are apparently Dravidian languages. I have two questions for you. Why the University of Central Florida, and what’s next?
Ramya Akula I did my master’s at the Technical University of Kaiserslautern in Germany, and my master’s thesis was mainly on the visualization of social networks. That was back in 2014, and that is when I was introduced to working on social networks. I was fascinated to learn how people adapt to the changes that come their way: adopting technology, and how online life keeps changing.
For example, look at how, with Covid, we’ve moved from face-to-face interaction to a completely virtual world. When I was doing my master’s thesis on social networks, I was really interested in the topic. Then I worked in industry for a while. But I wanted to come back to academia and get into the research field, to actually understand things rather than just developing something in industry for someone else. I thought maybe I could do some research, try to understand the field, and gain more knowledge about it.
So I was looking at different options. One of them was working with Ivan Garibay, because he had the DARPA SocialSim project, which is about a $6.5 million funded project. The overall idea of the project is really fascinating: it’s about simulating humans, how humans behave on all these kinds of online social media networks. When I read about this project and about his lab, I think that was the main point that set my trajectory toward this lab and my work.
This project is also part of that main, big project. Going forward, I would like to work for a startup where I can keep learning, because every day is a learning process; we can learn multiple things.
Steven Cherry It seems like a lot of this would be applicable to chatbots. Is that a possible direction?
Ramya Akula Chatbots? Yes, that’s one application, in a question-answering system. But there is a lot more to it than just automatically analyzing and answering questions. It can be applied to multiple things, not just online conversations but also personal assistants. So yes, it applies to personal assistants as well.
Steven Cherry When a computer beat the world champion of chess, it was impressive, and winning at Go was more impressive. Beating the champions of Jeopardy was impressive, at least until you realized it was mostly that the computer knew Wikipedia better and faster than humans. But about four years ago, a computer at Carnegie Mellon University beat some top players at poker, which required it, in some sense, to understand bluffing. That seems impressive in a whole different way from chess and Go. And this development with sarcasm seems like a similar advance.
Ramya Akula The main advantage of these algorithms is that, as I said, they are really good at understanding different patterns. We as human beings are limited in that sense, however much of a pro we are at a certain task. There is always a limit to how well we can understand different patterns, learn them, and match them. That is where we can take the help of algorithms like our sarcasm detector, or any other machine learning algorithm, because they look at all possible combinations. And the beauty of machine learning is that the algorithm knows when it should stop learning.
Or rather, the programmer who is looking at the training loss knows. When the loss has really dropped and then performance starts to decay, for example because the model is overfitting the data, we have to stop the training. Those are the kinds of indications that tell a programmer to stop training.
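The stopping rule described here is commonly implemented as early stopping. A minimal sketch, assuming the common practice of monitoring loss on a held-out validation set (training loss alone usually keeps falling even while a model overfits) and a hypothetical `patience` setting; the loss values below are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than min_delta
    for `patience` consecutive epochs; the weights from best_epoch are the ones kept."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # never triggered: ran every epoch

# Hypothetical curve: validation loss falls, then rises as the model overfits.
val_losses = [0.90, 0.71, 0.58, 0.52, 0.50, 0.53, 0.57, 0.62]
best, stop = early_stopping_epoch(val_losses)
print(f"best epoch: {best}, stopped at epoch: {stop}")  # best epoch: 4, stopped at epoch: 6
```

A larger `patience` tolerates noisy loss curves at the cost of a few extra epochs of training.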
After the training, we can see how well the patterns were learned. What all the previous achievements of machine learning algorithms, specifically reinforcement learning algorithms, have in common is that they could look at the whole variety of combinations of winning chances, take in all that data in a very short time, and learn from it. Most of them also had some kind of feedback loop from which they learn. Sometimes the programmer, the human in the loop, helps the training, and sometimes the algorithm learns by itself. So these algorithms help us better understand the patterns, and we humans better understand the context.
Steven Cherry Well, Ramya, there are two kinds of people in the world, those who hate sarcasm and those who live by it. I sometimes think that no two people can have a friendship or a romance if they’re on opposite sides of that line. And I can’t think of a more exciting and terrifying prospect than a robot that understands sarcasm. Exciting because maybe I can have a real conversation someday with Siri and terrifying if it means software will soon be writing better fiction than me and not just sports records—to say nothing of the advance towards Skynet. But thanks for this completely trivial and unimportant work and for joining us today.
Ramya Akula It’s my pleasure. It was fun talking to you.
Steven Cherry We’ve been speaking with University of Central Florida PhD student Ramya Akula, whose work on detecting sarcasm, a significant advance in the field of sentiment analysis, was published recently in the journal Entropy.
Fixing the Future is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.
This interview was recorded May 21st, 2021, on Adobe Audition via Zoom, and edited in Audacity. Our theme music is by Chad Crouch.
You can subscribe to Fixing the Future on Spotify, Stitcher, Apple, Google, and wherever else you get your podcasts, or listen on the Spectrum website, which also contains transcripts of all our episodes. We welcome your feedback on the web or in social media.
For Radio Spectrum, I’m Steven Cherry.