Enhancing Data Quality in Online Surveys: Key Insights from the ReDem Quality Day
On June 19, 2024, the first International ReDem Quality Day took place. This international online event was dedicated to a comprehensive examination of the pressing issue of declining data quality in online surveys. Experts from six different fields – technology, science, associations, institutes, online panels, and users – shared their perspectives, experiences, concerns, and proposed solutions.
Increased Inattention of Respondents
Online surveys offer numerous advantages, including cost efficiency and time savings. Nowadays, almost anyone can become a survey creator with just a few clicks, allowing online surveys to be created and distributed within minutes. However, this simplicity comes with drawbacks. Often, there is a lack of expertise in designing high-quality questionnaires. Poorly designed surveys that are too long or confusing, contain logic errors, or feature biased questions lead to a poor user experience. This frustrates respondents and results in inattentive and unreliable answers, most notably so-called "satisficing" behavior. Satisficing occurs when respondents take shortcuts to finish a survey more quickly, for example by giving arbitrary answers or deliberately selecting options that do not reflect their actual opinions.
Additionally, anonymity and the incentives offered often entice respondents to fill out surveys in large numbers. These factors exacerbate the problem of inattentive and disinterested respondents, significantly impairing data quality. Therefore, it is essential to design online questionnaires carefully and thoughtfully to increase respondents' attention and thus the reliability of the data. Malte Friedrich-Freksa from horizoom emphasizes the importance of creating good engagement and a positive experience for panel members. He warns that without these efforts, it will become increasingly difficult for people to willingly share their opinions and knowledge in surveys.
Right Questionnaire Design
The speakers at the International Quality Day agreed on the importance of well-designed questionnaires. Oliver Hülser from NIQ/GfK emphasized: "Good quality starts with the right questionnaire." A well-conceived and professionally crafted questionnaire is essential to mitigating satisficing behavior among respondents. Implementing counter-measures in the survey design can enhance respondent attentiveness, preventing them from switching to "auto-pilot" mode, and can also help identify inattentive participants (e.g., through the use of targeted trap questions).
Udo Wagner, professor at the University of Vienna, presented a framework with four categories of measures to combat satisficing in online surveys, distinguished along two dimensions: direct vs. indirect and dedicated vs. non-dedicated measures. For instance, one direct and dedicated measure (Category 1) involves asking respondents to imagine themselves as decision-makers and consider the extent to which their own answers should be taken into account. On the other hand, in Category 4 (indirect and non-dedicated measures) satisficing is assessed indirectly by analyzing response patterns. It includes checking for outliers, lack of consistency of responses, or excessive consistency. For further insights into effective questionnaire design, we recommend Udo Wagner’s presentation.
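Category 4 of the framework, detecting satisficing indirectly from response patterns, can be illustrated with a minimal sketch. The function names and the 0.9 threshold below are illustrative assumptions, not part of Wagner's framework; a real pipeline would combine several such signals.

```python
import statistics

def straightlining_score(responses):
    """Share of identical consecutive answers in a rating grid.
    Values near 1.0 suggest the respondent 'straightlined'."""
    if len(responses) < 2:
        return 0.0
    same = sum(1 for a, b in zip(responses, responses[1:]) if a == b)
    return same / (len(responses) - 1)

def flag_satisficing(grid, threshold=0.9):
    """Flag a respondent whose answers show excessive consistency:
    near-total straightlining or zero variance across the grid."""
    return (straightlining_score(grid) >= threshold
            or statistics.pvariance(grid) == 0)

# Hypothetical answers to a 6-item, 5-point Likert grid
print(flag_satisficing([3, 3, 3, 3, 3, 3]))  # True: excessive consistency
print(flag_satisficing([1, 5, 2, 4, 3, 1]))  # False
```

The same idea extends to the other Category 4 signals mentioned above, such as outlier detection on completion times or consistency checks between logically related questions.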
Measures against satisficing (Baumgartner, H., & Weijters, B., 2019)
Survey Fraud
Another concerning observation shared by the speakers at Quality Day is the rise in survey fraud and the increasing difficulty in detecting it. Tia Maurer from P&G emphasized that she and her team classify about 15-30% of the market research data they receive as fraudulent. This necessitates a significant effort in data cleaning to identify and filter out fraudulent activities. The problem is further exacerbated by the emergence of click farms and advanced AI techniques, which make it easy to complete surveys systematically and at scale for fraudulent purposes. Interviews with two fraudsters from Bangladesh and Vietnam illustrate how lucrative survey fraud can be for participants from developing countries. By participating in countless surveys and collecting the incentives, they earn multiple times the average local monthly income.
Modern survey fraud is becoming increasingly difficult to identify as many technical hurdles can be easily bypassed. Fraudsters use services like TunnelBear to mask their IP addresses and change their locations, allowing them to participate in multiple surveys with geographical restrictions and collect incentives multiple times. Trap questions are losing their effectiveness too: P&G has observed that fraudsters are quick to adapt their bots once they recognize trap questions. To reliably detect fraudulent response behavior in the future, a holistic examination of survey data is essential.
Impact on Research Insights
Fraudulent data can have severe consequences. According to the 2023 Cybersecurity Ventures Cybercrime Report, the annual costs of cyber fraud, including survey fraud, are projected to reach $10.5 trillion globally by 2025. At the Quality Day, Tia Maurer presented a striking example of a failed product launch at P&G due to unreliable survey data, costing the company millions of dollars: the pre-launch data for a clinical toothpaste indicated high demand and strong purchase intent. However, reality proved different: customer satisfaction with the product and sales figures were alarmingly low. A closer analysis of the data revealed that the pre-launch results were far from trustworthy. The cleaned post-launch data showed that the actual purchase intent was less than half of what was initially projected.
Pre-Survey & Post-Survey Checks
Pre-survey checks can help keep fraudulent respondents and bots out of surveys. Fraud detection tools integrated into surveys in real time aim to prevent fake and fraudulent participants from entering in the first place. Despite their usefulness, these measures often only catch the most obvious fraud cases and do not provide complete protection, as many technical barriers can be easily bypassed. P&G has found that many fraudsters still manage to pass through pre-survey checks, making additional post-survey checks essential to ensure high data quality: despite using pre-survey checks, P&G typically removes 10-30% of the data through careful review after surveys are completed. Experts from MRS and the Global Data Quality Initiative, Debrah Harding and Chris Stevens, emphasize that not all poor data originates from fraud. In the future, it will be increasingly important to better understand one's data and carefully consider how many participants need to be removed: as few as possible, but as many as necessary to ensure data quality.
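Post-survey review of the kind described above can be sketched with simple heuristics. The functions, cutoffs, and sample data below are assumptions for illustration, not P&G's actual procedure; production checks would weigh many more signals together.

```python
from statistics import median

def flag_speeders(durations, factor=0.33):
    """Flag completes faster than a fraction of the median duration,
    a common heuristic for inattentive or automated responses."""
    cutoff = median(durations) * factor
    return [i for i, d in enumerate(durations) if d < cutoff]

def flag_duplicates(open_answers):
    """Flag respondents whose open-ended answer repeats verbatim
    (after normalization), a simple signal for copy-paste or bots."""
    seen, flagged = {}, []
    for i, text in enumerate(open_answers):
        key = text.strip().lower()
        if key in seen:
            flagged.append(i)
        else:
            seen[key] = i
    return flagged

# Hypothetical completion times in seconds and open-ended answers
print(flag_speeders([600, 580, 90, 610, 650]))  # [2]
print(flag_duplicates(["Great product", "great product ", "too pricey"]))  # [1]
```

Heuristics like these only flag candidates for review; in line with the "as few as possible, but as many as necessary" principle, a human check of flagged cases is still needed before any data is removed.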
Costs of High-Quality Data
Despite the allegedly low cost of online questionnaires, providing high-quality data is more expensive than many market researchers and data buyers are willing to accept. The lack of clear quality signals makes it difficult to determine the appropriate price for high-quality data. Discussions often focus on price rather than data quality, leading to confusion about who is responsible for ensuring quality. It is essential to educate buyers of research data about the importance of prioritizing quality over cost. Additionally, transparency regarding sample construction and the costs incurred for quality assurance can justify higher prices for online surveys, as Debrah Harding emphasized in her presentation. This ensures that the focus remains on obtaining high-quality, reliable data.
Conclusion
Combating bad data quality in online surveys requires a multifaceted approach. By addressing inattentive respondents with well-designed questionnaires and implementing both pre-survey and post-survey fraud checks, we can take a step in the right direction to significantly improve data quality. Additionally, understanding the cost implications and focusing on data quality over price are crucial steps toward achieving reliable survey results.
You can find the videos of the individual presentations here:
Tia Maurer (P&G)
Survey Fraud: The Implications of Cheaters and Repeaters in Online Research
Sebastian Berger (ReDem)
Quality Control: You have been scammed. We need to check your data!
Malte Friedrich-Freksa (Horizoom)
Enhancing Data Quality: AI in Online Access Panels
Udo Wagner (Universität Wien)
Panel Discussion
Online surveys – is there life in the old dog yet? (moderated by Holger Geißler from marktforschung.de)