Sub-editing the machine: the new role for journalists in the age of AI

Long live the sub-editor (or, as our US friends say, the copy editor)! Every journalist knows that behind every good newsroom is the trusted hand of a sterling sub. Sadly, subs are often the first in line when cuts come.

But subs do much more than fix spelling mistakes. They fact-check, they bring institutional knowledge, they carve out a unique organisational voice.

So long live the sub! But what role is there for subs in the era of artificial intelligence (AI)? 

A new era and old skills

I believe a grand leap in the evolution of the sub-editor is on the cards, or potentially the emergence of a new type of algorithmic editor. These editors will need to be comfortable with technology and its possibilities while being aware of its limitations. Just as the journalist needs excellent communication skills to work well with colleagues and sources, this new journalist working with AI will need to communicate fluently with the data models at the core of machine learning.

The careful attention to detail traditionally associated with sub-editors has never been more important as we build out new technologies. In this era, where innovators must consider their impact on humanity, the journalist has a new and expanding role to prime the pumps of artificially intelligent systems.

The sub-editor is the sapien in the machine, one half of what we at Kinzen like to call “the human algorithm.” 


It has struck me in building technology for publishers and platforms that the sub-editor's interpretive skill is invaluable in the age of AI. And the good news is that there is much affinity between purpose-driven engineers and journalists, based on a common understanding of what went wrong with a previous wave of technology.

However, the machine doesn’t yet understand language as well as humans do. And it certainly doesn’t understand it as well as the tiny subset of humanity known as sub-editors.

Examples of our work with NLP

The machine, left on its own, makes mistakes that humans find baffling. In building out a uniquely journalistic approach to Natural Language Processing (NLP) classifications for understanding quality information and misinformation, we’ve come up against some challenges.

Sometimes the system makes mistakes that could confuse a reader. When the machine encountered the words “Brooklyn Beckham”, there was a job to be done in separating out the borough in New York, the second name of a famous footballer, and the unique individual that goes by that name. 
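One simple way to picture that separation job is a longest-match lookup against a list of names an editor maintains. The sketch below is purely illustrative and is not a description of Kinzen's system: the `KNOWN_ENTITIES` list, its labels, and the `tag_entities` function are all assumptions made up for this example.

```python
# Hypothetical, editor-maintained list of names and what they refer to.
KNOWN_ENTITIES = {
    "Brooklyn Beckham": "person",
    "Brooklyn": "place",
    "Beckham": "surname",
}

def tag_entities(text, known=KNOWN_ENTITIES):
    """Return (name, label) pairs, always preferring the longest known name."""
    words = [w.strip(".,!?\"'") for w in text.split()]
    found = []
    i = 0
    while i < len(words):
        # Try the longest span starting at word i first, then shrink it.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in known:
                found.append((candidate, known[candidate]))
                i = j  # skip past the matched words
                break
        else:
            i += 1  # no known name starts here
    return found

result = tag_entities("Brooklyn Beckham visited Brooklyn.")
```

Because the longer name wins, "Brooklyn Beckham" is tagged once as a person rather than as a borough followed by a footballer's surname. A real system would lean on context as well as a lookup list, which is where the editor's judgement comes in.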

Other times, the results might be correct but still require an editor’s guiding hand. A machine trained to generate tags automatically can produce too broad a range of labels for the needs of a human. A story about Liverpool Football Club received 46 different Liverpool-related tags (for example Liverpool City, Liverpool FC, The Greater Liverpool Area, People From Liverpool) where only one was required. 
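As a rough sketch of what that guiding hand can look like in practice, an editor might record one preferred tag for a whole family of machine-generated tags. Everything here is hypothetical: the `consolidate_tags` function, the `editor_choice` mapping, and the simple keyword match are assumptions for illustration, not our production logic.

```python
def consolidate_tags(machine_tags, editor_choice):
    """Collapse each family of related tags to the single tag an editor chose.

    editor_choice maps a family keyword (e.g. "Liverpool") to the one tag
    readers should see. Tags outside any family pass through unchanged.
    """
    kept = []
    for tag in machine_tags:
        # Find the first family keyword that appears in this tag, if any.
        family = next((f for f in editor_choice if f.lower() in tag.lower()), None)
        chosen = editor_choice[family] if family else tag
        if chosen not in kept:
            kept.append(chosen)
    return kept

machine_tags = [
    "Liverpool City",
    "Liverpool FC",
    "The Greater Liverpool Area",
    "People From Liverpool",
    "Premier League",
]
# For a football story, the editor decides only the club tag is needed.
tags = consolidate_tags(machine_tags, {"Liverpool": "Liverpool FC"})
```

The editor's choice depends on the story: the same family of tags would collapse to a different label for a piece about the city's housing market.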

Equally, the algorithm might identify an article as referencing a specific "Book" while also labelling it as being more broadly about "Books". To the machine the distinction might be clear, but the human reader doesn't gain anything from seeing both labels. We curate the automated tagging process so there is a feedback loop between the machine and editor.
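A minimal sketch of that kind of curation, assuming the editor keeps a small table of which broad label is made redundant by which specific one. The `BROADER` table and the `drop_redundant_broad_tags` function are hypothetical names invented for this example, and keeping the specific label is itself an editorial choice rather than a rule.

```python
# Hypothetical editor-curated table: specific tag -> the broad tag it makes redundant.
BROADER = {"Book": "Books"}

def drop_redundant_broad_tags(tags, broader=BROADER):
    """Remove a broad tag whenever its more specific counterpart is present."""
    redundant = {broader[t] for t in tags if t in broader}
    return [t for t in tags if t not in redundant]

curated = drop_redundant_broad_tags(["Book", "Books", "Publishing"])
```

Each time an editor adds a pair to the table, the next batch of automated tags benefits from it, which is the feedback loop in miniature.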

As the human in the machine, I watch the new tags coming through our systems every day and perform an editorial function on them. Automated tagging is a great step forward but, for example, our work with a local publisher in Belfast revealed a number of sensitive political topics around a hotly contested date in the political calendar which required careful handling.

In our work with publishers we realised the value from this work could be applied in wider circumstances. There was significant potential for us to use this hybrid approach - editor and machine - to flag potential misinformation. Now I work with platforms to help identify campaigns of deception. We use our editorial approach with NLP to identify the key narratives that are being disseminated on a daily basis by bad actors. 

Conclusion

These examples are just a sample of what we’re doing, and of what other possibilities are emerging. In this new era of engaged journalism and disinformation, editorial expertise and sophisticated NLP are a necessity. We’re building a Rosetta Stone for publishers who care about promoting the best quality information, and for platforms who need to be smarter about spotting dangerous disinformation. 

I believe it is a myth that new innovations don’t need editorial oversight. If you’re going to build automated content curation without a sub-editor, you’re taking a needless risk. Just as editors need better algorithms, algorithms need better editors. 

If there’s one thing to take from this piece, it’s that we as journalists must keep applying the sub-editor’s trademark skepticism and judgement to new technologies. But we must also embrace the possibilities enabled by this moment to get involved in improving them. If we leave the job to a handful of Silicon Valley companies, then we can’t complain when the unintended consequences inevitably happen. Not again.

The sub is dead? Long live the sub and long live the human algorithm!