The Case for Artificial Intelligence in Communications Surveillance
Decoding the communications data deluge
In capital markets, communications surveillance forms a key part of regulatory monitoring. It is the practice of monitoring and analysing communications between individuals for a specific purpose. Global investment in surveillance is predicted to reach USD 13.5bn by 2025.
When it comes to communications, financial firms must meet jurisdictional surveillance requirements in order to avoid regulatory action and to ensure ethical trading practices are maintained. In February 2024, the US Securities and Exchange Commission ordered 16 financial institutions to pay fines totalling more than USD 81 million for failing to maintain and preserve electronic communications.
However, communications surveillance must cover vast quantities of data held in diverse formats and moving across many channels of communication, where humans may struggle to identify certain patterns. While such identification can be attempted manually with simple database queries, this practice is becoming increasingly difficult given the sheer variety and volume of communications data available at any given time.
As a result, financial regulators and firms such as the UK’s Financial Conduct Authority and Nasdaq are increasingly using artificial intelligence (AI), specifically the subsets of machine learning (ML) and natural language processing (NLP), to fulfil regulatory requirements and overcome the limitations of relying on humans for surveillance.
ML is a type of artificial intelligence that allows machines to learn from data without being explicitly programmed. It does this by optimising model parameters (i.e. internal variables) through calculations, such that the model’s behaviour reflects the data or experience.
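The idea of optimising model parameters so that behaviour reflects the data can be sketched in a few lines. The example below fits a single parameter (a slope) to made-up observations with gradient descent; the data, learning rate and iteration count are illustrative assumptions, not anything from a surveillance system.

```python
# Minimal illustration of "learning from data" by optimising a model
# parameter: fit the slope w in y = w * x using gradient descent.
# The observations below are invented for the example.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs

w = 0.0              # model parameter, initialised arbitrarily
learning_rate = 0.05

for _ in range(200):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # nudge w so the model better reflects the data

print(round(w, 2))  # prints 2.04 — close to the underlying slope of ~2
```

Nothing here was "explicitly programmed" with the answer: the slope emerges purely from repeated adjustments against the data, which is the essence of ML.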
NLP is a field within AI that uses ML techniques, enabling machines to understand the way humans speak and write, so that they can analyse it and, in some cases, respond. NLP is commonly used in decoding and assigning meaning to human linguistic behaviours for other purposes. As shown below, there are multiple methods of NLP, such as sentiment analysis, speech recognition, part of speech tagging, machine translation, optical character recognition, semantic search, natural language generation and affective computing.
Sentiment Analysis - Determining what kind of emotion the piece of text is trying to convey.
Speech recognition - Converting spoken audio into text so that it can be searched and analysed.
Part of Speech Tagging - Recognising individual words and the context they are used in to get the correct understanding of a piece of text.
Machine Translation - Translating from one language to another.
Optical character recognition - Recognising and extracting text from a non-standard format like an image and making it editable.
Semantic search - Using context and intelligent learning to provide better results.
Natural language generation - Taking structured data and transforming it into sentences that mimic human-sounding language.
Affective computing - Analysing emotions from inputs such as speech, facial expressions and body physiology before returning a response.
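To make the first of these methods concrete, a sentiment analyser can be sketched as a simple lexicon scorer: count emotionally loaded words and compare the tallies. The word lists below are invented for illustration; production systems use far richer lexicons or trained models.

```python
# Toy lexicon-based sentiment scorer. The word lists are illustrative
# assumptions, not a real sentiment lexicon.

POSITIVE = {"great", "profit", "confident", "happy"}
NEGATIVE = {"loss", "worried", "angry", "hide"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I am worried we should hide this trade"))  # prints: negative
```

Even this crude version shows why sentiment matters for surveillance: a message scored as strongly negative around a trade may merit a closer look.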
The use of NLP is essential for effective and automated communications surveillance. Without NLP, surveillance teams can only survey firm-wide communications in an extremely basic and inefficient way. If NLP is employed, surveillance teams can search and analyse voice recordings with minimal human intervention, significantly reducing costs and increasing efficacy, and accurately analyse and search a far greater volume of data in a much shorter timeframe compared to traditional database systems.
Using NLP technology, surveillance teams can automatically transcribe voice recordings with a high degree of accuracy to filter out irrelevant text and to perform enhanced lexicographical searching over and above what can be done with simple lexicons by recognising abbreviations, dialect, misspellings, grammatical errors, common slang, jargon and parlance. NLP can also be used to automate the reduction of long-form text to concise summaries.
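The enhanced lexicographical searching described above can be sketched as a normalisation step that maps abbreviations, slang and misspellings onto canonical terms before matching against a watchlist. The mappings and watchlist terms below are hypothetical examples, not a real compliance lexicon.

```python
# Sketch of watchlist search with a normalisation step for abbreviations,
# slang and misspellings. All mappings and terms are invented examples.

NORMALISE = {
    "gtd": "guaranteed",        # abbreviation
    "garanteed": "guaranteed",  # misspelling
    "defo": "definitely",       # slang
}
WATCHLIST = {"guaranteed", "rumour", "insider"}

def flag(message: str) -> set[str]:
    hits = set()
    for token in message.lower().split():
        token = NORMALISE.get(token, token)  # map variants to canonical form
        if token in WATCHLIST:
            hits.add(token)
    return hits

print(flag("this trade is gtd to win"))  # prints: {'guaranteed'}
```

A plain lexicon search would miss "gtd" entirely; the normalisation layer is what lets the search reach beyond exact keyword matches.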
Email threading is a key use case that combines ML and NLP techniques. With threading, a series of emails is organised into a single thread, or conversation, in an email inbox. ML and NLP techniques reduce the amount of data that needs to be reviewed manually by recognising when emails are all part of the same conversation and grouping them accordingly.
To recognise an email as an email and to subsequently group it into a thread, an email must always contain an “Email From” field and at least one of the following fields: “Email To”; “Email Subject”; “Email CC”; “Email BCC” or “Sent Date”.
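The field rule above translates directly into a small validation check. This is a minimal sketch of that rule only; field names follow the text, and the sample records are invented.

```python
# Check the field rule: a record counts as an email if it has an
# "Email From" field plus at least one of the listed secondary fields.

REQUIRED = "Email From"
ANY_OF = {"Email To", "Email Subject", "Email CC", "Email BCC", "Sent Date"}

def is_email(record: dict) -> bool:
    present = {key for key, value in record.items() if value}
    return REQUIRED in present and bool(present & ANY_OF)

print(is_email({"Email From": "a@x.com", "Email To": "b@y.com"}))  # True
print(is_email({"Email From": "a@x.com"}))                         # False
```

Records that fail this check are not treated as emails at all, so they never enter the threading step.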
Email threading is an effective method for communications surveillance because it allows users to track the history of a conversation, providing context for each message. This is crucial for understanding the flow of communication, who is involved, and the key topics being discussed.
Another benefit of email threading is that it makes it easier to filter out irrelevant or repetitive communications. This is especially useful in large-scale surveillance, where multiple threads can be assessed simultaneously to identify suspicious activity, detect trends, or uncover hidden connections. For example, spam filtering can be extended to remove content that is of no concern to compliance teams, and conversation threading can be applied to media other than email. Irrelevant files can be eliminated from the raw data set using a variety of preliminary filters before they are included in subsequent analytical processes.
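Such a preliminary filter pass can be sketched as a chain of simple predicates applied before any downstream analysis. The filter rules and sample messages below are illustrative assumptions.

```python
# Sketch of a preliminary filter pass that drops irrelevant items from the
# raw data set before analysis. The rules below are invented examples.
import re

FILTERS = [
    lambda m: "unsubscribe" in m.lower(),             # marketing blasts
    lambda m: re.match(r"out of office", m.lower()),  # auto-replies
]

def keep(message: str) -> bool:
    return not any(f(message) for f in FILTERS)

raw = [
    "Out of office until Monday",
    "Let's move the trade offline",
    "Click to unsubscribe",
]
print([m for m in raw if keep(m)])  # prints: ["Let's move the trade offline"]
```

Each filter is cheap to evaluate, so running them before the heavier NLP stages shrinks the data set at minimal cost.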
Email chains can be broken down into two types of data – inclusive and non-inclusive. Inclusive emails contain unique content and attachments that do not feature in any other email; non-inclusive emails contain only content that also features elsewhere in the chain. An unexpected inclusive email can therefore be a prompt to begin communications surveillance and understand what has happened.
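One common way to operationalise this distinction is substring containment across a thread: if an email's full text is quoted inside another email, reviewing the containing (inclusive) email covers it. The sketch below uses that simplification; real systems also compare attachments and tolerate formatting changes.

```python
# Toy illustration of inclusive vs non-inclusive emails in a thread.
# An email is non-inclusive here if its full text appears verbatim
# inside another email in the set (e.g. as quoted history).

def inclusive(emails: list[str]) -> list[str]:
    return [
        e for e in emails
        if not any(e in other for other in emails if other is not e)
    ]

thread = [
    "Can we book it today?",
    "Yes, go ahead.\n> Can we book it today?",  # quotes the first email
]
print(inclusive(thread))  # only the second email: it contains the first
```

Reviewing only the inclusive emails preserves the full conversational content while cutting the volume a human must read.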
Surveillance is inherently data driven. The volume, complexity and diversity of that data can be extremely high – particularly for unstructured, non-numerical communications – and traditional database and analysis technologies can struggle to manage, query and analyse it efficiently. AI processes not only have the potential to enhance and improve surveillance; they also represent a comparatively ‘safe space’ for institutions to introduce this kind of new technology.