A funny(?) thing happened to me recently that has me thinking about the future of social media and its impact on society and on our ability to trust what we see and hear.
You see, I received a seemingly disgusted comment on one of my YouTube videos accusing me of being an AI construct and pointing to a specific moment in the video (what the commenter described as an unnatural facial expression) that he alleged proved his assertion.
This was a first for me. I’ve grown accustomed to the occasional comment on my channel accusing me of using an AI-generated voice, but no one (until now) has said the rest of me wasn’t real as well.
I realize my voice is distinctive, and it sometimes prompts comments on my channel, often positive and occasionally negative. The negative ones almost always question its genuineness and often suggest it bears AI fingerprints. Some commenters also find it “creepy” that I speak in a “whispering” voice in my videos.
Why My Videos Sound Different
For the record, I don’t whisper at all in my videos, although I’ve been hoarse in a few due to illness or allergies. In fact, I consistently speak at a normal conversational volume, as if to someone sitting five or six feet away from me in a reasonably quiet room, while maintaining a fairly level tone. I do this for two reasons:
1. The room in which I film is an acoustical nightmare, with four walls of highly reflective glass-paneled bookcases that generate a lot of reverb. The louder I speak, the more that reverb distorts the audio signal.
2. My microphone is a high-quality model regularly used in professional film productions to capture dialogue in difficult indoor settings. It’s positioned only 18 inches away (just above and in front of my head), allowing me to speak in a casual voice. However, because of the mic’s sensitivity, I can’t make loud exclamations without clipping the signal and adding another kind of audio distortion.
The natural deepness of my voice exacerbates the problem: at a given perceived loudness, lower frequencies carry far more waveform energy than higher ones, making them harder to tame with audio compression during editing. (It’s one reason why, when sound-treating a room to dampen reflections, a blanket can muffle high-pitched sounds, but deep tones require thick, dense mass to absorb their energy.)
Combined, those factors create a noticeable contrast with many other YouTube channels, whose hosts’ recording setups require them to speak loudly at the camera (some practically shout) to get a good signal-to-noise ratio in their audio recordings. And unlike me, hosts with higher-pitched voices can speak forcefully and excitedly without the same risk of signal clipping. That’s simple physics. Nevertheless, the contrast leads people to accuse me of speaking in an unnaturally soft voice (an AI voice?) when I’m actually using a normal, conversational volume.
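For readers who want to see the clipping problem in concrete terms, here’s a minimal sketch (purely illustrative; the 110 Hz sine wave, the sample rate and the amplitudes are assumptions standing in for a deep speaking voice, not anything from my actual workflow) of what happens when an exclamation pushes the recorded signal past digital full scale:

```python
# Minimal illustration of signal clipping: any sample that exceeds the
# recorder's full-scale range ([-1.0, 1.0] here) gets flattened, which
# adds harmonic distortion that can't be undone later in editing.
import numpy as np

SAMPLE_RATE = 48_000  # a common sample rate for video-production audio
t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)

# A 110 Hz tone standing in for a deep speaking voice.
normal_take = 0.5 * np.sin(2 * np.pi * 110 * t)  # comfortably within range
loud_take = 1.8 * np.sin(2 * np.pi * 110 * t)    # an exclamation that overloads the input

# The recorder can't represent anything outside full scale, so it flat-tops the peaks.
recorded = np.clip(loud_take, -1.0, 1.0)

def rms(x):
    """Root-mean-square level, a rough proxy for the signal's energy."""
    return float(np.sqrt(np.mean(x ** 2)))

print(f"normal take RMS:  {rms(normal_take):.3f}")
print(f"loud take RMS:    {rms(loud_take):.3f}  (what the voice actually produced)")
print(f"recorded RMS:     {rms(recorded):.3f}  (flat-topped and audibly distorted)")
```

Once those peaks are chopped off, no amount of compression in editing can put them back, which is why I keep my delivery at a level, conversational volume rather than risk overloading the mic.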
The New Normal?
Have YouTube and other social media sites such as TikTok altered viewers’ perceptions of what constitutes “normal,” so that loud, excitable and performatively cheerful or aggressive speech patterns from people clamoring for attention are viewed as more “real”? Are content creators (and their audiences) being conditioned to express themselves in an amplified manner, as if their credibility were enhanced by speaking loudly, quickly and emphatically? Are bombastic cable news, sports and podcast commentators who prioritize communication style over substance contributing to this trend?
It certainly appears as if audience perceptions and expectations of trustworthiness are shifting.
The Danger of Normalizing AI-Generated Content
The calm, measured nature of my speaking voice (in contrast to the loud and rapid-fire delivery common on other channels) seems to be at the root of comments I’ve received accusing me of using an AI-generated one.
Such accusations are ridiculous, but they highlight the potential danger social media platforms like YouTube pose if they cause viewers to lose perspective on what is real.
It’s my own voice. I’ve had it since hitting puberty in my teens, and I inherited it from my dad. (In my youth, while living with my parents, I frequently was mistaken for him when I answered the phone.)
According to many of the people who post those accusatory comments on my videos, only AI voices sound the way I do, which they think proves I’m fake. I try to respond to that logical fallacy with patient humor. After all, the real question isn’t “Why do I sound like AI?” Instead, it’s “Why have AI developers designed their voices to sound like mine?” I’ve had my voice far longer than AI voices have existed.
Disturbingly, as I mentioned at the top of this post, someone recently went beyond commenting on my voice. They emphatically declared that my visual image was an AI-generated avatar (à la Max Headroom) and that the proof was self-evident in the video itself.
It alarmed me that someone might seriously believe I wasn’t real. Absurdly, though, the proof of my existence can be found in the video itself – in its flaws! If I truly were using AI to generate video footage of me, I’d have had it whiten and straighten my teeth, thicken my hair, and reduce the prominent gray in my whiskers. [lol + smh]
In fact, I don’t use generative AI at all . . . for anything. I enjoy doing things the old-fashioned way – through my own (often time-consuming) efforts.
Is that misguided commenter an example of what happens when people are so inundated with AI-generated content that they can’t distinguish the real from the imitation?
If so, how will socially isolated or inexperienced viewers learn to recognize normal human expressions, behaviors and social cues? Will the prevalence of exaggerated online communication styles warp their expectations and comprehension?
How can ‘real’ content creators (like me) respond to preserve the trust, credibility and unique identities they’ve established with audiences when faced with increasingly powerful and ubiquitous AI tools?
And how many creators will embrace AI as the path of least resistance (or the path that maximizes their exposure and earnings)?
Companies now exist (e.g., Delphi, Hour One, and RealClones) that use AI to create virtual clones of people. They market their services to content creators who want to automate their video/podcast production and audience engagement with digital avatars and chatbots that provide the appearance (but not the reality) of being them. Hour One has predicted (perhaps self-interestedly) that 90% of content will be synthetically generated within the next five to seven years. The implications of that for the loss of societal trust and cohesion (and for the heightened ease of societal manipulation) are staggering and terrifying.
I don’t use those digital cloning services, nor will I ever.
It really bothers me that Google is using my videos (and those of every other YouTube channel, under the mandatory terms of its standard user agreement) to train its AI engines. I can imagine a scenario in the not-too-distant future in which Google uses its rapidly evolving AI capabilities to create artificial channels to compete with real ones, enabling it to steer more advertising revenue to itself rather than share it with human content creators.
I’m dreading the day I might actually encounter an AI-generated channel patterned after my real one. It seems almost inevitable at this point, but I hope I’m wrong.