You Can't Always Believe Your Eyes: Where AI Falls Flat
From the dawn of time, humans have tried to understand each other by looking at their faces and listening to their voices. This led to the practice of physiognomy: assessing a person's character or personality from their outer appearance, especially the face. While the concept is appealing, the biggest peril comes when someone uses that assessment to pass judgment on a person.
Let’s take a fun example: looking at this picture, can you tell what our intern is thinking? More importantly, was he lying to me when he claimed that he was confident about taking an exam? (By the way, thank you Elvin for letting me use this image!)
They are Lying – I Can See It!
Many people claim they can tell what someone is thinking just by analyzing their facial expressions. You've seen it in movies: someone is being interrogated for a crime and the interrogator declares, "they're lying – I can see it!" Countless websites publish claims like these:
- Looking down to their right = someone creating a feeling or sensory memory.
- Looking down to their left = someone talking to themselves.
- Looking sideways = doubt, reluctance to commit, suspicion, or contempt.
I am sure you’ve also seen countless articles or YouTube videos where people analyze the veracity of politicians or personalities being interviewed. And to some extent, we all share the same tendency: we rush to judgment based on what we see and on limited information, myself included.
So back to our intern, Elvin, and his exam.
At that particular moment in the video, looking at the intern’s eye movement, I will confess that I jumped to conclusions and assumed, given his facial expression, that the exam was going to be an utter failure. Thankfully, I did not act on my faulty analysis.
Why? Because I heard a click. And that click made me realize how wrong I was.
But first, let’s take a step back.
Why do people even care about analyzing someone’s face?
It really comes down to leverage and information gathering. Whether they are investigators, poker players, or students of behavior, humans are always trying to learn about other people by attempting to peer inside their heads.
Analyzing behaviors in a digital world
Now that more of our lives are moving online, more and more people are trying to take these “skills” (I use the term loosely) and apply them to behaviors seen on a webcam or in a video.
Facial recognition and biometrics are booming in many verticals. While we tackle this question in the mental health realm, other industries are trying to leverage this new source of human insight. In advertising, marketers are investigating how to use facial information to decide which commercial content to show a prospect.
For example, a CCTV camera could pick up your face before you enter a store and then, based on what it sees, display a series of ads on a digital billboard. In the travel industry, airlines like JetBlue and Delta are now testing facial recognition to board planes.
This approach is fraught with perils. When taken to the next level of automation (letting the AI do the analysis), it opens the door to privacy and ethics concerns, as well as bias and miscategorization.
For today, let’s focus solely on the technology’s limitations. Doing a true and thorough analysis of someone’s face is far from trivial, and missing simple steps is the gateway to much bigger issues. Let’s look at a few examples of the challenges that arise in a digital world:
First, you need to worry about the source of the image. Many cameras and video applications apply a mirror effect by default. You can easily test this by opening Zoom and changing the mirror setting: all of a sudden, the image flips left and right.
This means that if you are analyzing someone’s reactions in real time, your brain has to make the mental switch. Except that this is not always the case! Some cameras and devices do not mirror the image at all, or the setting may simply be turned off.
Regardless of whether you are using your brain or an AI, you need a strategy to figure out the correct orientation. Browse the many sites that discuss physiognomy and you will see that the difference in meaning between left and right can be significant.
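To make this concrete, here is a minimal sketch (in Python, assuming OpenCV is installed and a webcam is available) of the kind of normalization step an automated pipeline would need before any left/right-sensitive analysis. The `source_is_mirrored` flag is hypothetical; in practice that information would have to come from device metadata or a calibration step.

```python
# Minimal sketch: normalizing a possibly mirrored frame before analysis.
# Assumes OpenCV (cv2) and a webcam; `source_is_mirrored` is a hypothetical flag
# that would come from device metadata or a calibration step.
import cv2

def normalize_orientation(frame, source_is_mirrored: bool):
    """Return the frame in true (non-mirrored) orientation.

    If the capture pipeline mirrors the image (as many webcam apps do by default),
    flip it horizontally so the subject's left and right are not swapped before
    any left/right-sensitive analysis runs.
    """
    if source_is_mirrored:
        return cv2.flip(frame, 1)  # flipCode=1 flips around the vertical axis
    return frame

# Example usage: grab one frame from the default webcam and normalize it.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    frame = normalize_orientation(frame, source_is_mirrored=True)
cap.release()
```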
Recording quality will also play tricks on the analysis, with frame rates raising their own challenges. From simple smartphones to top-of-the-line digital cameras, each device records at a different, and often fluctuating, frame rate. Some will give you 24 frames per second; others a lot more.
Yet something as subtle as a blink lasts only a fraction of a second, which at 24 frames per second translates to just a handful of frames. So once again, what you think you are seeing could be different from what is really happening, and that’s even before you factor in the recording rate, the feed quality, and other movements.
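As a rough illustration, here is a back-of-the-envelope sketch of how few frames a blink can occupy at common frame rates. The 100-400 ms blink duration used below is a commonly cited ballpark, not a measurement from our recording.

```python
# Back-of-the-envelope sketch: how many frames a blink occupies at a given
# frame rate. The 100-400 ms blink duration is a commonly cited rough range,
# not a measurement from this experiment.
BLINK_DURATION_MS = (100, 400)

def frames_for_blink(fps: float):
    """Approximate min/max number of frames a blink spans at `fps`."""
    low_ms, high_ms = BLINK_DURATION_MS
    return round(low_ms / 1000 * fps), round(high_ms / 1000 * fps)

for fps in (24, 30, 60):
    low, high = frames_for_blink(fps)
    print(f"{fps} fps: a blink spans roughly {low} to {high} frames")

# At 24 fps the short end is only a couple of frames, so a few dropped frames
# in a low-quality feed can hide the blink entirely.
```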
In a digital world, it’s easy to miss many external clues about the subject. In a physical setting where the person is close to you, you can pick up much more information. However, when dealing with digital content, your brain becomes so focused on one thing (the facial expression) that it shuts out other external clues. If you want to see a very telling example, check out this famous video from Daniel Simons and Christopher Chabris.
Coming back to our experiment with Elvin, these challenges collided and created the perfect storm.
A mouse (click) gives a clue
Wondering what was happening and which vile lies were being spread by our intern? Prepare to be sorely disappointed!
As with most things in life, the truth is a lot less thrilling than the stories we build up in our heads. Our intern was merely trying to locate his mouse to end the recording.
The click I heard was the mouse button being pressed. All the intern was doing was glancing sideways to locate the mouse cursor on his screen.
The intern was not trying to hide anything about his exam. He actually aced it; he was merely trying to stop the recording!
So, what can you do to avoid falling into the same pitfalls and drawing the wrong conclusions?
Obviously, not judging a book by its cover is a good place to start! That’s why, when it comes to new technologies, especially ones focused on human understanding, it is key to build safeguards into the analysis, address bias, and be extremely careful to distinguish analysis from reaction. To that end, one of our approaches at Okaya is to spend a lot of time teaching our AI to look for clues and markers and then balance them against other contextual information.
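For illustration only, here is a small sketch of what “balancing a facial marker against context” could look like in code. This is not Okaya’s actual pipeline; every field name, threshold, and output string below is hypothetical.

```python
# Illustrative only (not Okaya's actual pipeline): weigh a raw facial cue against
# contextual signals before drawing any conclusion. All names, fields, and
# thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Observation:
    gaze_shift_score: float   # 0.0-1.0 strength of a detected sideways glance
    audio_events: list        # e.g. ["mouse_click"] heard around the same moment
    task_context: str         # e.g. "ending_recording", "answering_question"

def interpret(obs: Observation) -> str:
    """Downgrade a facial cue when context offers a mundane explanation."""
    mundane = "mouse_click" in obs.audio_events or obs.task_context == "ending_recording"
    if obs.gaze_shift_score > 0.7 and not mundane:
        return "flag for human review"  # never an automatic verdict on its own
    return "no conclusion: the cue is weak or explained by context"

# Elvin's moment: a strong sideways glance, but a mouse click and an obvious task.
print(interpret(Observation(0.9, ["mouse_click"], "ending_recording")))
```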