The UK government has unveiled a tool it says can accurately detect jihadist content and block it from being viewed.
Home Secretary Amber Rudd told the BBC she would not rule out forcing technology companies to use it by law. Rudd is visiting the
US to meet tech companies to discuss the idea, as well as other efforts to tackle extremism.
The government provided £600,000 of public funds towards the creation of the tool by an artificial intelligence company based in London.
Thousands of hours of content posted by the Islamic State group were run through the tool to train it to automatically spot extremist material.
ASI Data Science said the software can be configured to detect 94% of IS video uploads. Anything the software identifies as potential IS material would be flagged up for a human decision.
The company said it typically flagged 0.005% of non-IS video uploads. But this figure is meaningless without an indication of how many of those uploads contained any content with any connection to jihadis.
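As a rough illustration of why that context matters, here is a back-of-the-envelope calculation in Python. The total upload volume and the share of genuine IS material are hypothetical assumptions, not figures from ASI or the government; only the 94% detection rate and 0.005% false-flag rate come from the claims above.

```python
# Back-of-the-envelope precision estimate using the claimed rates.
# The upload volume and the share of genuine IS material are
# hypothetical assumptions, chosen only to illustrate the arithmetic.
detection_rate = 0.94      # claimed: 94% of IS uploads detected
false_flag_rate = 0.00005  # claimed: 0.005% of non-IS uploads flagged

daily_uploads = 1_000_000  # assumed total uploads per day (illustrative)

for is_share in (0.01, 0.001, 0.0001):  # assumed share of uploads that are IS
    is_uploads = daily_uploads * is_share
    other_uploads = daily_uploads - is_uploads

    true_flags = is_uploads * detection_rate
    false_flags = other_uploads * false_flag_rate
    precision = true_flags / (true_flags + false_flags)

    print(f"IS share {is_share:.4%}: {true_flags:.0f} correct flags, "
          f"{false_flags:.0f} false flags, precision {precision:.1%}")
```

The pattern is the point, not the exact numbers: the rarer genuine IS uploads are among everything posted, the larger the share of flagged videos that turn out to be innocent and need human review.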
In London, reporters were given an off-the-record briefing detailing how ASI's software worked, but were asked not to share its precise methodology. However, in simple terms, it is an algorithm that
draws on characteristics typical of IS and its online activity.
It sounds as though the tool relies more on analysing data about the upload (the account, geographical origin, time of day, name of the poster and so on) than on analysing the video itself.
Comment: Even extremist takedowns require accountability
15th February 2018. See article from openrightsgroup.org
Can extremist material be identified at 99.99% certainty as Amber Rudd claims today? And how does she intend to ensure that there is legal accountability for content removal?
The Government is very keen to ensure that
extremist material is removed from private platforms like Facebook, Twitter and YouTube. It has urged the companies to use machine learning and algorithmic identification, and has threatened fines for failing to remove content swiftly.
Today Amber Rudd claims to have developed a tool to identify extremist content, based on a database of known material. Such tools can have a role to play in identifying unwanted material, but we need to understand that there are some
important caveats to what these tools are doing, with implications for how they are used, particularly around accountability. We list these below.
Before we proceed, we should also recognise that this is often about computers
(bots) posting vast volumes of material that reach a very small audience, some of which Amber Rudd's new machine may then clean up. It is in many ways a propaganda battle between extremists, who claim to be internet savvy and exaggerate their impact, and our own government, which claims it is going to clean up the internet. Both sides benefit from the apparent conflict.
The real-world impact of all this activity may not be as great as is claimed. We should be given much more information about what exactly is being posted and removed. For instance, the UK police remove over 100,000 pieces of extremist content by notices to companies, yet we currently get only this headline figure. We know nothing more about these
takedowns. They might have never been viewed, except by the police, or they might have been very influential.
The result of the government's campaign to remove extremist material may be to push extremists towards more private or
censor-proof platforms. That may impact the ability of the authorities to surveil criminals and to remove material in the future. We may regret chasing extremists off major platforms, where their activities are in full view and easily used to identify
activity and actors.
Whatever the wisdom of proceeding down this path, we need to be worried about the unwanted consequences of machine takedowns. Firstly, we are pushing companies to be the judges of what is legal and illegal. Secondly,
all systems make mistakes and require accountability for them; mistakes need to be minimised, but also rectified.
Here is our list of questions that need to be resolved.
1 What really is the accuracy of
this system?
Small error rates translate into very large numbers of errors at scale. We see this with more general internet filters in the UK, where our blocked.org.uk project regularly uncovers and reports errors.
How are the accuracy rates determined? Is there any external review of its decisions?
The government appears to recognise that the technology has limitations. In order to claim a high accuracy rate, they accept that at least 6% of extremist video content has to be missed. On large platforms that would be a great deal of material needing human review, as the rough sketch at the end of this point illustrates. The government's own tool shows the limitations of their prior demands that technology "solve" this problem.
Islamic extremists are operating rather like spammers when they post their material. Just like spammers, their techniques change to avoid filtering. The system will need constant updating to keep a given level of accuracy.
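To make the scale point concrete, here is a short sketch of the arithmetic. Every platform volume in it is an illustrative assumption invented for the example; only the 6% miss rate and the 0.005% false-flag rate come from the claims discussed above.

```python
# Illustrative error counts at platform scale.
# The upload volumes are assumptions made up for the arithmetic;
# only the miss rate and false-flag rate come from the claims above.
daily_uploads = 5_000_000   # assumed uploads per day on a large platform
extremist_uploads = 2_000   # assumed genuine extremist uploads per day

miss_rate = 0.06            # 6% of extremist content missed (the 94% claim)
false_flag_rate = 0.00005   # 0.005% of other uploads wrongly flagged

missed_per_day = extremist_uploads * miss_rate
wrong_flags_per_day = (daily_uploads - extremist_uploads) * false_flag_rate

print(f"Extremist uploads missed per day: {missed_per_day:.0f}")
print(f"Innocent uploads flagged per day: {wrong_flags_per_day:.0f}")
print(f"Innocent uploads flagged per year: {wrong_flags_per_day * 365:.0f}")
```

Each wrongly flagged upload is a candidate for human review, and each missed video is extremist content that stays up, so the headline percentages alone say little about the workload or the real-world effect.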
2 Machines are not determining meaning
Machines can only attempt to pattern match, with the assumption that content and form imply purpose and meaning. This explains how errors can occur, particularly in
missing new material.
3 Context is everything
The same content can, in different circumstances, be legal or illegal. The law defines extremist material as promoting or glorifying terrorism. This is a
vague concept. The same underlying material, with small changes, can become news, satire or commentary. Machines cannot easily determine the difference.
4 The learning is only as good as the underlying material
The underlying database is used to train machines to pattern match. Therefore the quality of the initial database is very important. It is unclear how the material in the database has been deemed illegal, but it is likely that these
are police determinations rather than legal ones, meaning that inaccuracies or biases in police assumptions will be repeated in any machine learning.
5 Machines are making no legal judgment
The
machines are not making a legal determination. This means that a company acting on what the machine says is doing so without clear knowledge of whether the material is actually illegal. At the very least, if material is "machine determined" to be illegal, the poster, and users who attempt
to see the material, need to be told that a machine determination has been made.
6 Humans and courts need to be able to review complaints
Anyone who posts material must be able to get human review,
and recourse to courts if necessary.
7 Whose decision is this exactly?
The government wants small companies to use the database to identify and remove material. If material is incorrectly removed, and perhaps appealed, who is responsible for reviewing the mistake?
Reviewing such decisions may be too complicated for a small company. Since it is the database product making the mistake, the designers need to act to correct it so that it is less
likely to be repeated elsewhere.
If the government wants people to use its tool, there is a strong case that the government should review mistakes and ensure that there is an independent appeals process.
8 How do we know about errors?
Any takedown system tends towards overzealous takedowns. We hope the identification system is built for accuracy and prefers to miss material rather than remove the wrong things; even so,
errors will often go unreported. There are strong incentives for legitimate posters of news, commentary, or satire to simply accept the removal of their content. To complain about a takedown would take serious nerve, given that you risk being flagged as
a terrorist sympathiser, or perhaps having to enter formal legal proceedings.
We need a much stronger conversation about the accountability of these systems. So far, in every context, this is a question the government has
ignored. If this is a fight for the rule of law and against tyranny, then we must not create arbitrary, unaccountable, extra-legal censorship systems.