Handle With Care: The Art & Science of Manually Classifying Data

At Kinzen, we are constantly trying to find the right balance between scalable solutions that improve the quality of online information and the need for a careful, considered approach to the data that goes into building those solutions.

Our systems help partners detect disinformation and hate speech by pairing our editorial experts in different regions around the world with our product and engineering team.

Between them sits a component that is vital to ensuring our data is best-in-class and vetted with care and thoroughness: our Human-in-the-Loop (HITL) team.

We know that the complexity of language, the value of expert knowledge, and the importance of quality data (over quantity of data) are key to building systems that help partners protect their communities.

That’s why every single piece of data is manually reviewed, and why we constantly evaluate the outputs of our models.

Kinzen’s Knowledge Graph contains detailed information about the nature, usage, and threat of each term, phrase, or hashtag associated with disinformation campaigns.

As we continue to learn, we have identified several core principles that guide our work in refining data for our partners. These include always testing our assumptions, knowing how to navigate the tricky world of multilingual databases, and remaining aware of how language and content evolve.

Test Your Assumptions

Everyone carries their own internalized biases and assumptions about how language is used in digital ecosystems.

Organizations such as the Algorithmic Justice League have highlighted some of the societal inequities and harms that can occur when biases are embedded in tech.

At Kinzen, we recognize that our personal and professional experiences shape our instinctive perceptions of a particular term, phrase, or hashtag. To counter our biases and audit language from as neutral a perspective as possible, we have a core philosophy of testing our assumptions.

In our review process, we verify how different platforms, spaces, and content types shape the meaning of content, whether explicitly or implicitly.

Our data is always seen by multiple pairs of eyes before it is approved, and we have added further checks and balances to our processes, including open review sessions where people from across the company can observe and help improve our work.

Consider the Evolution of Language (From Fringe to Mainstream)

One critical facet of our editorial experts’ data gathering is the ability to intercept emergent disinformation at the earliest possible point.

First Draft has described the “Trumpet of Amplification”, a useful guide to understanding how disinformation receives an increasing oxygen of engagement as it spreads across different types of media.

Our researchers focus on the digital locations most used in the spread of localized, harmful campaigns in individual markets.

In an increasingly fragmented social media landscape, where new alternative platforms start trending every month, our editorial network’s agility in researching the right spaces is a key asset for Kinzen and for the ongoing development of our Knowledge Graph.

Another consideration when dealing with hate speech and disinformation narratives that target groups with protected characteristics is the reclamation of abusive language by those groups. For example, the word “queer” has been reclaimed in a positive sense by the LGBTQI* community, online and offline, in recent years.

Consider the Evolution of Content (From Text to Video/Audio/Live)

Protecting online communities from harm has become an increasingly difficult proposition for platforms as the number of content types has grown, and the challenges posed by written text are entirely different from those posed by video, live chat, or community-driven audio spaces.

At Kinzen our experts track the spread and evolution of campaigns of harmful information from text to audio and back again.

At different points in a campaign’s life cycle, we capture the similarities and nuanced differences in how an anti-vaccine phrase like “gene altering vaccines” might appear in a video or audio context, as a hashtag, or with textual variations.

Conclusion: The Value of Kinzen’s Approach

Purely automated content moderation systems have so far proven ineffective at solving the crisis of information quality online.

The ‘infodemic’ has helped undermine public trust in institutions, increased digital polarization, and fostered an environment of hate in many digital spaces.

It is clear that if we want to substantially change the quality of information we are exposed to online, we need a new approach.

At Kinzen, we have learned from the mistakes of the past: we start with expertly selected, hand-curated data and scale the value of that information in a responsible and controlled manner.

We solve difficult problems for partners by using this ground-up approach and our HITL team to provide accurate, informed, and actionable information, all with the goal of empowering companies to protect their most valuable asset: their community.
