Data quality at risk: The challenges of generative AI in online surveys.

Generative AI

The hype around generative AI models such as ChatGPT has also reached market research and promises many positive changes for our industry. However, every new technology has advantages and disadvantages. In this article, we highlight the challenges that come with the emergence of generative AI, particularly for the quality of online surveys.

What is generative AI?

Generative AI is an emerging area of artificial intelligence that aims to create new data based on existing data. Essentially, existing data is used as training data to generate new content such as text, images and audio files. In doing so, the models learn the patterns of the source data and can create new content based on that.

Because it creates new content from existing data, generative AI offers enormous potential for a wide range of industries and applications. The market research and insights industry can benefit as well. At ReDem®, for example, we successfully use this technology for quality control of survey data.

What are the dangers of generative AI for the quality of online surveys?

Survey data is an important source of information for companies, governments and researchers seeking insights into people's opinions and behaviors. For surveys to reflect reality, data quality is of paramount importance.

However, survey data, especially from online access panels, is vulnerable to fraudulent participants such as "survey bots" and "click farms" that attempt to exploit the system to gain rewards without participating honestly and attentively. As generative AI becomes more prevalent, the challenges are becoming much greater than in the past.

Generative AI models can be used to develop intelligent bots that are difficult to detect and pose a major threat to the quality of survey data. In the past, the problem of fraudulent respondents and bots was already pervasive, but the use of AI models takes it to a whole new level.

Manual controls to ensure data quality will become increasingly difficult, if not impossible, in the future. The fight against fraudulent response behavior will have to take place exclusively at the technological level.

The worst-case scenario would be a large-scale bot attack that goes undetected and has a dramatic impact on survey results.

How quality can still be ensured

Generative AI models can not only generate fraudulent response behavior, but also help detect it. AI-based pattern recognition can be used to detect AI-generated content from bots. However, as bots continue to evolve, keeping tools up-to-date is a constant technological tug-of-war.

Another means of quality assurance is so-called “digital fingerprinting”. In this process, several characteristics of a survey participant's device are recorded and combined into a unique “fingerprint”, so that repeated participation from the same device can be recognized. However, modern bots can partially circumvent this method. In addition, digital fingerprinting poses a challenge with regard to GDPR.
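To make the idea of a fingerprint more concrete, here is a minimal Python sketch. It is an illustration only, not how any particular fingerprinting product works: the attribute names (user_agent, screen, timezone, language), the record layout and both function names are assumptions chosen for this example. The sketch hashes a few device characteristics into a single identifier and lists fingerprints that appear under more than one respondent ID.

```python
import hashlib
from collections import defaultdict

# Hypothetical attribute names; real tools typically collect more and different signals.
FINGERPRINT_FIELDS = ("user_agent", "screen", "timezone", "language")


def device_fingerprint(attrs: dict) -> str:
    """Hash the selected device attributes into one hex identifier."""
    canonical = "|".join(str(attrs.get(field, "")) for field in FINGERPRINT_FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_shared_fingerprints(responses: list[dict]) -> dict[str, list[str]]:
    """Return fingerprints that occur under more than one respondent ID."""
    seen = defaultdict(list)
    for response in responses:
        seen[device_fingerprint(response["device"])].append(response["respondent_id"])
    return {fp: ids for fp, ids in seen.items() if len(ids) > 1}


# Made-up example data: r1 and r3 report identical device attributes.
responses = [
    {"respondent_id": "r1", "device": {"user_agent": "UA-1", "screen": "1920x1080",
                                       "timezone": "Europe/Vienna", "language": "de-AT"}},
    {"respondent_id": "r2", "device": {"user_agent": "UA-2", "screen": "1366x768",
                                       "timezone": "Europe/Berlin", "language": "de-DE"}},
    {"respondent_id": "r3", "device": {"user_agent": "UA-1", "screen": "1920x1080",
                                       "timezone": "Europe/Vienna", "language": "de-AT"}},
]

shared = find_shared_fingerprints(responses)
for fingerprint, respondent_ids in shared.items():
    print(fingerprint[:12], respondent_ids)
```

A match is only a signal for closer inspection, not proof of fraud, since legitimate participants can share a device or configuration. The sketch also shows why the method can be circumvented: a bot that varies any one of the recorded attributes produces a different hash.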

In summary, there is no single measure to ensure data quality in online surveys. A combination of measures is required, which must be continuously developed to achieve excellent data quality.

How the industry can prepare for a future with generative AI

As we look to the future, ensuring data quality is becoming an ever-changing challenge, primarily at the technology level. To ensure outstanding data quality in the long term, we as an industry should pool our resources to take advantage of the most advanced technological approaches. Only in this way can we meet the high quality expectations of all stakeholders and ensure the credibility of the market research and insights industry.


Image by starline on Freepik

Florian Kögl
Florian is the founder and CEO of ReDem®. He is also a board member of the Austrian Market Research Association and has extensive experience in the development of innovative software solutions.