We Can’t Improve Content Moderation If We Don’t Have the Right Metrics

Content moderation often boils down to posts taken down and users banned. But are these the right numbers to be paying attention to? In the first blog post of a two-part special, Nick Sainsbury, Kinzen’s Head of Product, explains the risk of focusing on ‘inward-looking metrics’ and why it’s time for better ones.

Around the world, the question of how content moderation policies are created and enforced is a hot topic. The smartest lawmakers from Canada to the United Kingdom are trying to answer the question of who gets to make the rules and what responsibilities they have to the general public. But, amidst all the legal definitions and political threats, there is little discussion of how to measure whether changes to policy, and to how it is enforced, have actually worked.

Focusing on the right metrics is not a nice-to-have; it is essential to operating platforms at scale. Several platforms, including Facebook, have even commissioned independent audits of how they measure the efficacy of moderation. They understand that optimising for the wrong number can lead to harmful secondary consequences.

They also know, to borrow a line often misattributed to Einstein, that not everything that can be counted counts, and not everything that counts can be counted. For example, knowing how many harmful pieces of content were taken down from a platform doesn’t tell us the thing we actually care about: how much harm that content did. Right now, the numbers Trust and Safety teams have for measuring the impact of their work are simple and process-driven.

At Kinzen, we have been considering what the right key performance indicators (KPIs) are for measuring the job of protecting communities from disinformation and dangerous content. And we think there’s a case for more ‘outward-looking’ metrics.

The majority of metrics are inward-facing 

Having worked on customer support solutions in the past, I’ve always found the operations of content moderation teams pretty similar to those of customer support teams.

Customer support was long seen as a cost of doing business that should be limited and reduced. Metrics were operational and cost-focused. They didn’t tell an effective story about how important customer support was to the retention and expansion of customers. That changed with Net Promoter Score (NPS), which helped transform how businesses view their customer support organisation. Excellent customer support is now a strategic pillar for many successful and growing companies. 
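Part of NPS’s appeal is how simple it is to calculate. The sketch below is an illustration, not something from Kinzen: it assumes the standard survey question (“How likely are you to recommend us?”, answered on a 0 to 10 scale), where scores of 9 or 10 count as promoters and 0 to 6 as detractors.

```python
def net_promoter_score(ratings: list[int]) -> float:
    """Standard NPS: percentage of promoters minus percentage of detractors."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be on a 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)   # scores of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # scores of 0 to 6
    # Passives (7 or 8) only count in the denominator.
    # The result ranges from -100 (all detractors) to +100 (all promoters).
    return 100 * (promoters - detractors) / len(ratings)


survey = [10, 9, 9, 8, 7, 6, 3, 10]
print(net_promoter_score(survey))  # 4 promoters, 2 detractors out of 8 -> 25.0
```

A single number that goes up when customers are happier is what let support teams tell a story that leadership could follow.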

As we know, there is no shortage of important metrics for content moderation teams to optimise. However, when you take a closer look, it’s clear that they are mostly inward-facing and operational:

  • How long is it taking individual moderators to take action?
  • How many reports can one person moderate in a day?
  • How consistent are people in making decisions and implementing specific policies?
  • How many decisions are subsequently overturned?
  • How many pieces of content have been removed/labelled/taken out of recommendation systems?
  • How is volume changing over time based on time of day/location/language?
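Most of the questions above reduce to simple aggregations over a log of moderation decisions. As a rough illustration only, the sketch below assumes a hypothetical `Decision` record (the field names are invented and not taken from any real moderation tool) and computes a few of these numbers:

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


# Hypothetical decision record; field names are illustrative only.
@dataclass
class Decision:
    moderator: str
    seconds_to_action: float  # time from report entering the queue to a decision
    action: str               # e.g. "remove", "label", "demote", "no_action"
    overturned: bool          # later reversed, e.g. on appeal or QA review


def operational_summary(decisions: list[Decision]) -> dict:
    """Inward-facing numbers: speed, throughput, reversals and volume by action."""
    if not decisions:
        raise ValueError("no decisions to summarise")
    return {
        "median_seconds_to_action": median(d.seconds_to_action for d in decisions),
        "decisions_per_moderator": Counter(d.moderator for d in decisions),
        "overturn_rate": sum(d.overturned for d in decisions) / len(decisions),
        "actions": Counter(d.action for d in decisions),
    }


log = [
    Decision("ana", 40, "remove", False),
    Decision("ana", 95, "label", True),
    Decision("ben", 60, "no_action", False),
    Decision("ben", 30, "remove", False),
]
summary = operational_summary(log)
print(summary["median_seconds_to_action"], summary["overturn_rate"])  # 50.0 0.25
```

Every figure here describes the moderation process itself; none of them says how many people were exposed to harmful content.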

The question is: Is there a metric or set of metrics that can help make the often invisible impact of policy creation and moderation visible in the same way NPS has done for customer support? 

Is there a way to measure the true impact of Trust and Safety teams and their work?

Metrics that focus on reducing exposure to harmful content

Social media networks appear to be taking a ‘better measurement, better management’ approach, with several platforms recently introducing new ways of measuring how well they remove and reduce harmful content.

YouTube published a blog post in April 2021 detailing how they optimise their operations around a metric called Violative View Rate (VVR). This approach takes a sample of all video views in a period and classifies each one as violative (i.e. going against their policies) or not. The share of violative views is expressed as a percentage, or equivalently as violative views per 10,000, and tracked over time to assess whether exposure to policy-violating content is going up or down.
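At its core, the VVR calculation described above estimates a proportion from a random sample. The sketch below assumes each sampled view has already been labelled violative or not. The confidence interval uses a simple normal approximation, which is our own addition to show sampling uncertainty and not a description of YouTube’s published methodology; the sample data is simulated.

```python
import math
import random


def violative_view_rate(sample_is_violative: list[bool], z: float = 1.96) -> dict:
    """Estimate VVR from a random sample of labelled views.

    Returns the estimate as a percentage and per 10,000 views, plus an
    approximate 95% interval (normal approximation; an assumption here).
    """
    n = len(sample_is_violative)
    if n == 0:
        raise ValueError("empty sample")
    p = sum(sample_is_violative) / n  # fraction of sampled views that were violative
    half_width = z * math.sqrt(p * (1 - p) / n)
    return {
        "percent": 100 * p,
        "per_10k": 10_000 * p,
        "ci_percent": (100 * max(0.0, p - half_width), 100 * min(1.0, p + half_width)),
    }


# Simulated sample of 200,000 views with a made-up underlying rate of 0.17%.
random.seed(0)
views = [random.random() < 0.0017 for _ in range(200_000)]
print(violative_view_rate(views))
```

Because violative views are rare, the interval only becomes tight with a large sample; for very small rates, an interval such as Wilson’s behaves better than the normal approximation used here.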

A great metric encourages the right behaviours, and what is interesting about VVR is that it isn’t purely about removal. Instead, it focuses on the viewer and their experience. It also leaves open the possibility that the largest improvement in VVR might come from changing how the recommendation system handles violative content, rather than from any moderation process. As a measure, it aligns nicely with the mission.

However, that’s not to say VVR is perfect. YouTube’s policies often change, as does the way they are enforced. This means that a VVR from two years ago might not be directly comparable with the VVR today.

Several platforms, including TikTok, also focus on ‘proactive’ action: that is, taking moderation action on a piece of content before any user has reported it. This metric is useful because it aligns different teams and creates a shared responsibility for reducing harm. Although a score like this can lack transparency and takes additional effort to align different stakeholders, our own experience at Kinzen is that it results in better policies and higher accuracy.
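A proactive rate can be sketched in the same way. The `Action` record below is hypothetical, and counting content that was never reported as proactive is an assumption on our part; individual platforms may define the timing and the denominator differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical enforcement record; field names are illustrative only.
@dataclass
class Action:
    content_id: str
    actioned_at: datetime
    first_user_report_at: Optional[datetime]  # None if no user ever reported it


def proactive_rate(actions: list[Action]) -> float:
    """Share of actions taken before any user report was submitted."""
    if not actions:
        raise ValueError("no actions")
    proactive = sum(
        1
        for a in actions
        # Assumption: never-reported content counts as proactively actioned.
        if a.first_user_report_at is None or a.actioned_at < a.first_user_report_at
    )
    return proactive / len(actions)


t = datetime(2021, 6, 1, 12, 0)
actions = [
    Action("a", t, None),                          # never reported
    Action("b", t, t + timedelta(hours=1)),        # actioned an hour before the first report
    Action("c", t + timedelta(hours=2), t),        # actioned only after users reported it
]
print(proactive_rate(actions))  # 2 of the 3 actions were proactive
```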

Although there is no consensus on a single ‘golden metric’, taken together these scores provide a more robust method for measuring the prevention of abuse and hate online, and they generalise to almost any social media or content-hosting platform.


Operational metrics measure the ‘how’. Measuring policy violation provides insights into the ‘what’. Neither measures whether these activities are achieving the ‘why’ of trust and safety or content moderation, which might best be described as ‘to safeguard users while promoting an inclusive environment’.

In Part 2, Nick will be exploring how the jobs-to-be-done framework could usefully complement existing ways of thinking about moderation metrics.
