Insider UK
Business
insider.co.uk

Synthetic data – beyond the hype

Good quality data is critical to financial innovation, but we know that data access, permissions and security are the biggest factors holding innovators back.

At Smart Data Foundry, creating synthetic data designed to overcome these challenges has been a core focus, and we recently launched our synthetic data engine, aizle, to provide innovators across the UK with artificially generated data to solve real-world problems.

We firmly believe synthetic data has the potential to be a game changer in terms of driving innovation to improve people’s lives.

Through the creation of safe, shareable customised data sets, entrepreneurs and organisations will be able to collaborate, unlock solutions, prove ideas, build and release new products with confidence, safety and speed.

And it is an area gaining increasing attention, as rapid progress in machine learning and artificial intelligence (AI), combined with ever-increasing computing power, means it’s possible to create higher quality artificial data than ever before.

The market is at an interesting inflection point as it moves from ‘hype’ towards mainstream acceptance and greater market understanding. Our recent whitepaper compares two primary methods of generation: agent-based modelling and learning-based synthesis, the latter often referred to as synthetic doubles.

Agent-based is the more suitable approach if real-world data doesn’t exist, is biased or incomplete, or you are looking to innovate through collaboration and don’t want to share confidential data.

Learning-based approaches use machine learning to make a safe-to-use version of data that an organisation already has, creating a synthetic double of that data.

Our agent-based synthetic data engine produces synthetic data sets which contain the important and meaningful features of real-world data but, critically, without requiring any real-world input data to generate them, removing privacy and other data risks altogether.

Based on this approach, aizle can provide data where no real-world data is available, or can provide alternative data when real-world data exists but is inadequate or has bias, enabling innovation in areas previously thought too difficult or expensive to explore.

This could include, for example, developing a rapid prototype, training an AI model, or running scenarios on the strategic impact of a new initiative.

The three principles underpinning aizle are privacy, fidelity and utility – ensuring the data output is safe to use and without risk, is accurate and relevant, and is easy to use, process and share.

This approach was successfully used in our work with the Financial Conduct Authority (FCA) and Payment Systems Regulator (PSR) to innovate in the area of Authorised Push Payment (APP) fraud, one of the most common financial crimes in the UK.

The data environment for this type of crime is complex, combining banking and telecommunications systems with criminal data around the outcomes of scam reporting.

The many regulations in play, from organisations ranging from the FCA to the Information Commissioner's Office, the PSR and Ofcom, would in reality limit access to the data needed to innovate.

This is a case where synthetic data shines: it involves generating a rich world of synthetic people and businesses, simulating their financial and social interactions, and then introducing synthetic criminals into this virtual world to attempt APP fraud on the population.
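As a rough illustration of the agent-based idea, the toy sketch below builds a small synthetic population, simulates payments between its members, and lets a few synthetic criminals receive a share of those payments, labelling them as APP fraud. It is not how aizle works; every name, parameter and probability here (simulate, scam_rate, the balance and amount ranges) is a hypothetical chosen for illustration, and no real-world data is used.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    agent_id: int
    balance: float
    is_criminal: bool = False


def simulate(num_people=50, num_criminals=3, num_steps=200,
             scam_rate=0.1, seed=0):
    """Return a list of synthetic payment records (toy model, assumed rules)."""
    if num_people < 2:
        raise ValueError("need at least two people to simulate payments")
    rng = random.Random(seed)  # seeded so the data set is reproducible

    # Synthetic people with made-up opening balances.
    people = [Agent(i, round(rng.uniform(500, 5000), 2))
              for i in range(num_people)]
    # Synthetic criminals get the ids that follow the ordinary population.
    criminals = [Agent(num_people + i, 0.0, is_criminal=True)
                 for i in range(num_criminals)]

    transactions = []
    for step in range(num_steps):
        payer = rng.choice(people)
        if criminals and rng.random() < scam_rate:
            # The payer is tricked into authorising a payment to a criminal.
            payee = rng.choice(criminals)
        else:
            payee = rng.choice([p for p in people
                                if p.agent_id != payer.agent_id])
        amount = round(rng.uniform(5, 200), 2)
        if amount > payer.balance:
            continue  # skip payments the payer cannot afford
        payer.balance -= amount
        payee.balance += amount
        transactions.append({
            "step": step,
            "payer": payer.agent_id,
            "payee": payee.agent_id,
            "amount": amount,
            "is_app_fraud": payee.is_criminal,
        })
    return transactions


if __name__ == "__main__":
    records = simulate(seed=1)
    fraud = [r for r in records if r["is_app_fraud"]]
    print(f"{len(records)} payments generated, {len(fraud)} labelled as APP fraud")
```

Because every record comes from simulated agents rather than real customers, the labelled output can be shared freely. A realistic engine would need far richer behaviour than this sketch, for example the telecommunications and scam-reporting data mentioned above.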

We generated a detailed customised dataset that was worked on by financial service providers, innovators, academics and regulators during a three-day TechSprint.

Gartner estimates that by 2024, 60% of data for AI applications will be synthetic, and the total of publicly known funding for synthetic data companies reached $328m in October 2022, $275m more than in 2020.

In terms of open banking and the fintech community, what excites us is helping organisations which have real issues around data availability take those first steps to creating specific synthetic data sets, particularly when we start to see synthetic data driving new propositions from innovators that will benefit customers, businesses and wider society.

David Tracy is the head of data products at Smart Data Foundry
