Regulation Won’t Cure Disinformation - But Some of These Innovations Might

This time last year, the World Health Organisation began using the word Infodemic to describe the perfect storm of lies and conspiracies tracking the spread of COVID-19.

In retrospect, it was a game-changing moment of clarity.

Before the Infodemic, the conversation about disinformation was framed by the narrow reality of American politics and the empty slogan ‘Fake News’. Few of us knew exactly how our lives were directly affected by crazy talk on the internet.

Infodemic brought the real-world harm of organised deception into our lived experience. Like the virus, you didn’t have to be infected to feel its impact. Disinformation was no longer a dirty political trick, but a public health crisis.

‘Infodemic’ catalysed conversations about global oversight of the technology. But it also exposed a knowledge gap. Political leaders find it hard to differentiate between the technology we need to regulate and the innovation we need to accelerate. Without innovation, regulation is bound to fail. If COVID taught us anything, it was the primary importance of a ‘moonshot’ mentality in medical science.

The innovation we need to rein in the Infodemic is at the intersection of human and machine, in the space created by the relentless advance of artificial intelligence.

That’s where I work, with my teammates at Kinzen. We gather data and build machine learning models that help protect online conversations and communities from disinformation. Our job is to help trust and safety professionals stay ahead of online threats, drawing on our experiences from the earliest days of online disinformation.

From our vantage point, here are the key trends we believe decision-makers need to get their heads around:

1. This Is A Pandemic, Not A War

A full decade before the Infodemic, the Arab Spring was the petri dish for the disinformation virus. Social media fuelled real-world protest but also provided a new battlefield for dictators and dissidents to fight an information war.

Those fighting back against disinformation embraced the language of conflict. We spoke of the ‘weaponisation’ of social media and embraced military jargon like OSINT to describe our research techniques. But faced with a second wave of viral propaganda, the language of a never-ending war is dangerously misleading.

Extremists and autocrats have mastered a new form of “censorship through noise”, flooding our public spaces with an avalanche of deception designed to weaken the immune system of online communities.

The Infodemic describes the peer-to-peer transmission of genetically-engineered lies, incubated in the leaderless meme culture of the internet fringe, mutating across language, platform and format, and spread with silent speed by decentralised networks of superspreaders, often autonomous of the political elites they empower.

2. Automation Is Not The Vaccine

Technologists have come to rely on AI as a Swiss Army knife for the wicked problems of online risks like spam, copyright infringement, porn and terrorist networks. Get humans to give you enough data to train the machine and the machine will eventually replace the human.

However, disinformation is not just another online harm. It is a biological attack on an ecosystem weakened by emotional overload. Every harmful and hateful narrative on the internet shares the same DNA: a genetic code designed to exploit vulnerabilities in human conversation without detection.

Every day, Kinzen’s expert network sees the healthiest of online conversations purposefully infected with language designed to evade automated filters. Our knowledge graph contains an ever-increasing number of permutations of the word vaccine - from V@cc!n@te to faxxination - that have been modified into dog whistles, audible only to the anti-vaccine superspreaders.
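To make the problem concrete, here is a minimal sketch of how a filter might surface such permutations for a human to look at. It is an illustration only, not a description of Kinzen’s knowledge graph: the substitution table, the watchlist, the function names and the 0.7 threshold are all assumptions chosen for the example.

```python
import difflib

# Hypothetical character swaps; a real table would be curated by analysts.
SUBSTITUTIONS = str.maketrans(
    {"@": "a", "!": "i", "1": "i", "0": "o", "3": "e", "$": "s"}
)

# Hypothetical watchlist of terms whose disguised variants we care about.
WATCHLIST = ["vaccine", "vaccinate", "vaccination"]


def normalise(token: str) -> str:
    """Lower-case the token and undo simple symbol-for-letter swaps."""
    return token.lower().translate(SUBSTITUTIONS)


def candidate_match(token: str, threshold: float = 0.7):
    """Return (watchlist term, similarity) if the token looks like a
    disguised watchlist term, otherwise None. The result is only a
    candidate for human review, not a verdict."""
    cleaned = normalise(token)
    best_term, best_score = None, 0.0
    for term in WATCHLIST:
        score = difflib.SequenceMatcher(None, cleaned, term).ratio()
        if score > best_score:
            best_term, best_score = term, score
    if best_score >= threshold:
        return best_term, best_score
    return None
```

With these assumptions, both V@cc!n@te and faxxination are flagged, while an unrelated word like weather is not. But an innocent word such as fascination is flagged too, and with a higher similarity score than faxxination. That is exactly the kind of call a string-matching filter cannot make on its own.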

Disinformation is not a bomb waiting to be defused. It is a human infection. Machine learning helps us better understand patterns of harmful and hateful speech. But relying solely on automated filters is like fighting a virus with nothing but an X-ray.

3. Put The Human In The Loop

If we approach disinformation as a war, AI will always be a weapon. If we define disinformation as a public health crisis, machine learning can be a human superpower.

The next generation of solutions to disinformation revolves around the phrase “Human In The Loop” (HITL). HITL systems are a hybrid of human skill and machine scale. They harness artificial intelligence to process very large amounts of data while relying on human intelligence to perform very complex tasks.

With HITL systems, nothing happens without a human command to proceed with a suggestion, recommendation or action. This form of AI is common when the stakes are at their highest, such as diagnosing medical illness.
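The division of labour described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a real moderation system: the Suggestion class, the triage and review functions, the scoring function and the 0.5 threshold are all hypothetical names and values invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Suggestion:
    text: str
    model_score: float         # hypothetical model estimate of risk, 0 to 1
    decision: str = "pending"  # "pending", "approved" or "rejected"


def triage(items: Iterable[str],
           score_fn: Callable[[str], float],
           review_threshold: float = 0.5) -> List[Suggestion]:
    """Machine step: score every item and queue the riskiest for a human."""
    scored = [Suggestion(text, score_fn(text)) for text in items]
    queue = [s for s in scored if s.model_score >= review_threshold]
    return sorted(queue, key=lambda s: s.model_score, reverse=True)


def review(queue: List[Suggestion],
           human_decides: Callable[[Suggestion], bool]) -> List[str]:
    """Human step: an item is actioned only if a reviewer approves it."""
    actioned = []
    for suggestion in queue:
        approved = human_decides(suggestion)
        suggestion.decision = "approved" if approved else "rejected"
        if approved:
            actioned.append(suggestion.text)
    return actioned
```

The machine narrows a large stream down to a short, ranked queue. The only path to an action runs through the human_decides callback, so the model can suggest but never act on its own.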

When applied to disinformation, HITL is a fundamental shift in the focus of innovation from machines replacing humans, to exponentially increasing the power and reach of human judgment.

HITL also lays the groundwork for increased transparency and accountability in content moderation, and potentially more effective regulation of AI.

4. Open Source Is A Revolutionary Force

Can human solutions to disinformation ever scale? When there are 500 hours of video posted on YouTube every minute, 500 million tweets a day and 17,000 new podcasts every week? In the 6,500 active languages of human expression?

This is where machine learning is most impactful: accelerating the human’s ability to find signal in a world of noise.

It’s hard to overstate the advances made in HITL systems in the past year because of open source innovation emerging from big tech platforms, particularly in the field of Natural Language Processing.

A small engineering team like Kinzen’s can transcribe, translate and understand vast amounts of audio, video, text, and even text in video, thanks in large measure to open source initiatives in Automatic Speech Recognition (ASR) released within the past year.

Training machine learning models in comprehension of this vast body of data is only possible for teams of independent innovators thanks to the NLP transformers developed and refined by companies like Google and Microsoft.

The potential for innovation based on these open source models is impossible to overstate, allowing small, dedicated teams like Kinzen’s to focus on their core strength: the accuracy of the data and intelligence they use to build machine learning models.

5. Take Out The Garbage

The old cliche, “Garbage in, Garbage out”, expresses a core weakness in machine learning systems. Fully automated filters require vast quantities of data, full to the brim with inaccuracy and bias.

The rapid advancement in NLP allows for innovation to be defined by quality, rather than quantity, of data. The teams driving solutions to the Infodemic see value in the scarcity of insight they provide, rather than the abundance of data they can hoover up.

We tend to think of artificial intelligence as a God-like entity. But every AI is a child. It mirrors the flaws of the humans who taught it. Teams designing HITL models for disinformation need to be conscious of their unconscious biases in every label and classification they attach to data.

Oversight cannot be an afterthought when it comes to data collection. This is where innovation and regulation need to be in lockstep.

6. Oversight Without Expertise Is Overrated

The right kind of regulation will limit freedom of reach rather than freedom of speech (to borrow Renee DiResta’s elegant formulation). It will focus on truly independent auditing of the algorithms spreading information and transparency about the execution of content moderation policies.

Just as we turned to health experts during the pandemic, the Infodemic will require new independent institutions at national and supranational levels which derive their authority from their understanding of the impact of technology. Think WHO and FDA rather than a Supreme Court.

As former Guardian editor Alan Rusbridger has said of his experience on the Facebook Oversight Board, to understand how the machine works may “take many sessions with coders talking very slowly so that we understand them”.

7. Moderation For The Citizen, By The Citizen

To “flatten the curve” of disinformation, we need to empower online communities to protect themselves from the superspreaders.

While every platform is investing much-needed resources into Trust and Safety teams, some are following the lead of Wikipedia and Reddit and testing the decentralisation of content moderation to digital creators and active citizens.

By far the most interesting current example is Twitter’s Birdwatch project. Particularly impressive is its ambition to develop reputation and consensus systems within communities, and its commitment to expose the algorithms powering any new credibility systems to public scrutiny.

While some innovators are fixing a “democratic deficit” inside existing platforms, others like Ethan Zuckerman and Civic Signals are creating the architecture of a new generation of online spaces. Their guiding principle is perhaps best summed up by digital policy group Demos as an ambition to “turn digital subjects into digital citizens”.

8. It Gets Worse Before It Gets Better

With every step towards a vaccine, the virus gathers new forms of resistance.

Every new platform or online behaviour incubates a new strain of disinformation.

As COVID increased our dependence on online tools, disinformation variants spread into every chat room, every comment section, every marketplace, every gaming platform and every fitness community.

Innovators seeking solutions to the Infodemic are paying particular attention to two new frontiers of disinformation: live conversation and synthetic content.

The rise of Clubhouse and its many clones requires innovators to develop new ways of building real-time content moderation systems that alert hosts and listeners to the presence of disinformation superspreaders in live chat.

The creation of fake imagery through generative AI is a golden opportunity for ‘anti-innovators’ to trigger the next wave of the Infodemic. The digital research agency Graphika recently exposed an attempt by a Chinese propaganda operation to evade detection using an AI technique known as GANs (Generative Adversarial Networks) to create fake human faces for a network of Twitter accounts.

Regulation can’t keep pace with this kind of ground-breaking disinformation, and there may be no fully automated solution to ‘deepfakes’; humans can’t rely on machines to detect other machines that mimic humans.

This is what makes innovations like Project Origin so interesting. This collaboration between big tech and global news organisations is developing a system to encode quality content with data which permanently records its human provenance. This might not be enough to halt a ‘deepfake’ AI. But it is an innovation which might just boost the long-term immunity of our online conversations.
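The general idea of binding content to a record of where it came from can be sketched as follows. This is not Project Origin’s actual design, which this article does not describe in detail. It is a simplified illustration that assumes a shared secret key and an HMAC; a real provenance system would more likely use public-key signatures so that anyone can verify a record without holding a secret. The function names and the record fields are hypothetical.

```python
import hashlib
import hmac


def sign_content(content: bytes, publisher: str, key: bytes) -> dict:
    """Create a provenance record tying the content hash to a publisher."""
    digest = hashlib.sha256(content).hexdigest()
    message = f"{publisher}:{digest}".encode()
    signature = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"publisher": publisher, "sha256": digest, "signature": signature}


def verify_content(content: bytes, record: dict, key: bytes) -> bool:
    """Check that the content is unchanged and the record is genuine."""
    digest = hashlib.sha256(content).hexdigest()
    message = f"{record['publisher']}:{digest}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])
```

Changing a single byte of the content, or the publisher name in the record, makes verification fail. A check like this cannot say whether the content is true. It can only say whether it is what the named publisher released.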

There is a critical role for machine learning in flattening the curve of disinformation. But we need algorithms designed to build trust through transparency. We need innovation that amplifies human expertise and elevates our sense of personal agency. Ultimately, we need to stop trying to harness the mythical power of a God-like technology, and start investing in superpowers for digital citizens.