I’ve written a few times in the past about how the latest raft of massive algorithms, such as OpenAI’s GPT-3, can generate offensive text because they have been trained on pretty much the whole internet, which, as we all know, has more than its fair share of offensive material. But this is the first time that one of the companies behind these sorts of algorithms, in this case Google’s DeepMind, has admitted that this toxicity is unsolvable.
In a pre-print paper they describe experiments to try to detoxify the language, but (unsurprisingly) found that the mitigating actions also reduced coverage of texts about, and dialects of, marginalised groups. The researchers also compared human evaluations of the toxicity of the text with the machine’s own assessment, and found that the machine often underestimated how toxic its output was. The algorithm obviously has no inherent understanding of what is toxic and what isn’t; it simply has no clue what it is talking about (perhaps like the humans who write the toxic text in the first place?).
These text-generating algorithms are likely to become a normal part of our lives in the next few years, so this admission from one of the preeminent AI companies in the world is significant. DeepMind have referred the problem to Jigsaw, Google’s ‘knotty problem’ team. My advice to them is to remember the maxim that just because you can do something doesn’t mean you should.