Do you still trust your data? How smart survey fraud puts your results at risk.

Incentive hunters, click farms, and bots have always jeopardized the quality of online surveys. Until now, however, they have been easy to identify: on the one hand, they betray themselves through their digital fingerprints; on the other, their survey behavior differs markedly from that of people who participate in surveys seriously. Recently, however, survey fraud has grown in both extent and degree of professionalization. This new generation of incentive fraudsters can hardly be detected by conventional methods and therefore poses a growing challenge for market researchers.

Traditional survey fraud

Click fraudsters discovered online advertising as a source of income about twenty years ago [1]. With some delay, the same happened in market research. Since then, large numbers of people have been working their way, as quickly as possible, through online surveys that promise monetary incentives for participation, in settings organized like call centers – the so-called click farms. The goal of each individual is to maximize the number of incentive points generated in a short period of time [2]. To this end, survey fraudsters also use computer programs (bots) that answer online surveys automatically. They first create numerous accounts with free e-mail services and member accounts with online panels; the incoming invitation links to surveys are then called up and run through automatically by the survey bots.

Market researchers use their knowledge of the efficient workflows of click farms, which are mainly based in low-wage countries such as China, India, and Bangladesh, to expose survey fraudsters. Various indicators are collected to identify fraudulent activity, such as short survey durations ("speeding"), uniform answer patterns across grid questions ("straightlining"), nonsense responses (e.g. arbitrary character strings), and digital fingerprints (e.g. IP addresses).
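
As a minimal illustration of how such classic indicators can be operationalized, the following sketch flags speeding, straightlining, and nonsense open-ended answers. The field names and thresholds are illustrative assumptions, not an industry standard.

```python
# Minimal sketch (not a production check): flagging classic low-quality
# indicators in a completed interview. Field names and thresholds are
# illustrative assumptions.
from statistics import median

def flag_classic_fraud(interview, all_durations, grid_answers, open_text):
    """Return a list of triggered indicators for one interview."""
    flags = []

    # Speeding: total duration far below the median of all interviews.
    if interview["duration_sec"] < 0.4 * median(all_durations):
        flags.append("speeding")

    # Straightlining: the same option chosen for every item of a grid question.
    if len(grid_answers) > 3 and len(set(grid_answers)) == 1:
        flags.append("straightlining")

    # Nonsense response: open-ended answer without a single vowel.
    text = open_text.strip().lower()
    if text and not any(v in text for v in "aeiou"):
        flags.append("nonsense_open_end")

    return flags

# Example usage with made-up data:
print(flag_classic_fraud(
    {"duration_sec": 95},
    all_durations=[480, 510, 430, 620, 395],
    grid_answers=[3, 3, 3, 3, 3, 3],
    open_text="xzqrt kklm",
))  # -> ['speeding', 'straightlining', 'nonsense_open_end']
```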

Excluding data generated by incentive hunters, click farms, and bots is absolutely necessary for any online survey; otherwise the results will be distorted. A study [3] that looked at five of the ten largest online panels in the U.S. found that 46 percent of the data had to be removed. A comparison showed that the results before cleaning were diametrically opposed to those after cleaning.

Another study from Austria [4] found that after data cleansing, survey results were up to 37 percent closer to the reference data.

Modern survey fraud

A relatively new phenomenon is that potential incentive fraudsters are recruited online as remote workers and learn on dedicated platforms how to exploit security gaps, conceal their identities technically, and trick the quality checks of online surveys. The operators of these platforms charge interested parties a kind of course fee and/or take a percentage of their monetary incentives [5]. Up-to-date tips and tricks on how to infiltrate surveys (such as links that jump straight to the crediting of incentive points, producing so-called "ghost completes") are also shared on these platforms.

The easy, fast, and global availability of this specialized knowledge leads to an increasing professionalization and decentralization of survey fraud. Everything required is available from a home office; centralization in a click farm is no longer necessary. Anyone can build a small click farm at home at low cost with a notebook and a few older smartphones [2]. Assuming an incentive of one dollar for a ten-minute survey, and that experienced incentive fraudsters complete about ten such surveys in parallel within those ten minutes, the income comes to roughly 60 dollars per hour – a quite attractive (additional) income in many countries. The use of bots can increase this amount significantly.
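
For illustration, the back-of-the-envelope calculation behind this figure, using the assumed values from the text:

```python
# Back-of-the-envelope earnings estimate (assumed values from the text).
incentive_per_survey = 1.0   # dollars per completed 10-minute survey
surveys_per_slot = 10        # surveys completed in parallel per 10 minutes
slots_per_hour = 60 / 10     # ten-minute slots per hour

hourly_income = incentive_per_survey * surveys_per_slot * slots_per_hour
print(f"~${hourly_income:.0f} per hour")  # -> ~$60 per hour
```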

This form of survey fraud endangers results to a greater extent than the previously prevalent amateur clicking-through of surveys in traditionally organized click farms. The new generation of incentive fraudsters has learned how to circumvent the security checks and quality controls of online surveys. Neither they nor the bots they use are easily betrayed any longer by speeding, straightlining, nonsense answers, or their digital fingerprints. Manual cleaning of the data is therefore difficult (and costly and time-consuming) or not possible at all.

Recommended Actions

By combining as many of the following measures as possible, intelligent survey fraud can be identified and combated successfully:

1. Identity checking: The stricter and more complex the authentication of participants in online panels, the greater the likelihood of stopping misuse at this early stage. For example, sending access credentials by post and requiring two-factor authentication for each survey participation make identity manipulation more difficult.

2. Analysis of access patterns: A system that raises an alert when an unusually large number of new panel registrations or an unusually high number of survey accesses occurs within a short period of time (a minimal sketch of such an alert follows after this list).

3. Performance measurement: An evaluation system for panel members that assesses how honestly and conscientiously each member participates, starting with a welcome survey and repeated at regular intervals thereafter.

4. Delayed credit/payout: The attractiveness of incentive fraud can be reduced by crediting or paying out incentives not immediately after a survey has been completed, but only after the data has passed cleansing (which today is usually done manually after fieldwork). If credits to an account are always redeemed immediately after they are received, this can be an indication of survey fraud.

5. Capture of the digital fingerprint: Here, various attributes and parameters (such as IP address, cookies, and the hardware and software used, including their settings) are collected in order to draw conclusions about the identity of a potential survey participant. This makes it possible, for example, to prevent (intentional or unintentional) multiple participation in a survey by one and the same person registered in several panels. It should be noted that, on the one hand, such identification can be problematic under EU data protection law. On the other hand, the attributes and parameters can be circumvented by deleting cookies or by using virtual private networks (VPNs), certain browsers, browser settings or extensions, proxy servers, and botnets. A simple fingerprint-hashing sketch follows after this list.

6. Use of special questions at the beginning of the survey: Incentive fraudsters are instructed, and their bots are programmed, to select as many answer options as possible at the start in order to maximize the chance of qualifying for the target group of a survey. This over-fulfillment of eligibility requirements, which is typical of survey fraud, can be exploited by asking questions for which selecting certain options is implausible or even contradictory. For example, respondents can be asked about their awareness of brands that do not exist. Any participant who claims to know these brands is then excluded before the actual survey begins (see the screening sketch after this list).

7. Capturing all durations: Both incentive fraudsters and their survey bots are instructed or programmed to complete surveys within realistic timeframes, so the total duration of a survey has only limited significance. In addition to the total duration, the time taken to answer each individual question should therefore be recorded and evaluated. The workstation of a professional incentive fraudster is usually equipped with several displays so that multiple surveys can be answered at the same time; the surveys are run not one after the other but in parallel. This systematic clicking-through across multiple displays can be recognized by the regularity of the response times for individual questions. Experience has also shown that bots are easier to identify from per-question durations, because these sometimes reveal unrealistically short response times (a timing-analysis sketch follows after this list).

8. Tracking the number of completion errors: For questions that only allow certain answer options, bots (and incentive fraudsters) can give themselves away by repeatedly trying different options until they are allowed to progress through the survey. The more completion errors occur within a short time, the higher the probability of survey fraud.

9. (Projective) control questions: For example, by asking respondents to estimate how common their own answers are among other respondents, the well-researched false consensus effect [6] can be used to assess how confident and convinced they are of their own answers.

10. Identification of outliers: Statistical methods can be used to analyze whether a response deviates so much from the mean that it is probably not an honest one (see the outlier sketch after this list).

11. Identification of click patterns: By identifying certain regularities in click patterns, survey fraud can be detected.

12. Evaluation of the quality of the answers: Until now, the quality of answers to open-ended questions has had to be checked manually after fieldwork, which is time-consuming and subjective. In the meantime, technological solutions based on artificial intelligence have automated and standardized this task and enable data cleansing in real time.

13. Recognizing copy & paste responses: While an online survey is being answered, it is technically possible to record whether answers are pasted into text fields rather than typed.

14. Capturing duplicates: Recognizing (partially) identical answers to open-ended questions, both within individual interviews and across all interviews, is an important measure for identifying fake data. Incentive fraudsters and their bots often use prefabricated text modules that they recombine at will so as not to attract attention (a near-duplicate detection sketch follows after this list).

15. AI analysis of behavioral patterns: AI models can be trained to distinguish the behavior of participants who answer surveys honestly and conscientiously from that of incentive fraudsters and survey bots. One prerequisite is that training is carried out on reliable, up-to-date data, from which the model learns through feedback loops what is good and what is bad. In addition, this learning process must be continuously supplied with new data, because survey fraudsters constantly adapt their behavior to changed conditions (such as new data quality checks). A simple classifier sketch follows after this list.
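
For measure 2 above, a minimal sketch of a sliding-window alert on registration or survey-access events; the window size and threshold are arbitrary assumptions and would need tuning to a panel's normal traffic.

```python
# Measure 2 (sketch): alert when too many events (registrations or survey
# accesses) arrive within a sliding time window. Window and threshold are
# illustrative assumptions.
from collections import deque

class SpikeAlert:
    def __init__(self, window_seconds=600, max_events=200):
        self.window = window_seconds
        self.max_events = max_events
        self.events = deque()  # timestamps of recent events

    def record(self, timestamp):
        """Record one event and return True if the rate looks suspicious."""
        self.events.append(timestamp)
        # Drop events that fall outside the sliding window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) > self.max_events

# Example: 250 registrations within a few minutes trigger the alert.
alert = SpikeAlert(window_seconds=600, max_events=200)
triggered = [alert.record(t) for t in range(250)]
print(any(triggered))  # -> True
```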
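
For measure 5, a minimal sketch of how collected attributes could be combined into a fingerprint hash to catch repeat participation. The attribute set is an illustrative assumption; real fingerprinting solutions are considerably more sophisticated and, as noted above, must be assessed against data protection law.

```python
# Measure 5 (sketch): derive a fingerprint hash from collected attributes
# and use it to detect repeated participation. Attribute names are assumed.
import hashlib

def fingerprint(attrs: dict) -> str:
    """Hash a stable, sorted representation of the collected attributes."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

seen = set()

def is_duplicate(attrs: dict) -> bool:
    fp = fingerprint(attrs)
    if fp in seen:
        return True
    seen.add(fp)
    return False

device = {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0 ...",
          "screen": "1920x1080", "timezone": "Asia/Dhaka", "language": "en-US"}
print(is_duplicate(device))  # -> False (first participation)
print(is_duplicate(device))  # -> True  (same device returns)
```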
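
For measure 6, a minimal screening sketch: respondents who claim awareness of non-existent decoy brands are screened out before the survey starts. The brand names are invented placeholders.

```python
# Measure 6 (sketch): screen out respondents who claim to know decoy brands
# that do not exist. Brand names here are invented placeholders.
DECOY_BRANDS = {"Veltoria", "Brandexo"}          # fictitious brands
REAL_BRANDS = {"Brand A", "Brand B", "Brand C"}  # real answer options

def passes_screener(brands_claimed_known: set) -> bool:
    """Reject anyone who 'knows' a brand that does not exist."""
    return not (brands_claimed_known & DECOY_BRANDS)

print(passes_screener({"Brand A", "Brand C"}))               # -> True
print(passes_screener({"Brand A", "Veltoria", "Brandexo"}))  # -> False
```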
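
For measure 7, a minimal sketch of two per-question timing checks: unrealistically short answer times and suspiciously uniform times across questions. The thresholds are illustrative assumptions.

```python
# Measure 7 (sketch): evaluate per-question answer times. The thresholds
# (2 seconds, coefficient of variation below 0.15) are illustrative only.
from statistics import mean, pstdev

def timing_flags(per_question_seconds):
    flags = []

    # Unrealistically short answers point towards bots.
    if min(per_question_seconds) < 2.0:
        flags.append("implausibly_short_answers")

    # Very regular answer times point towards parallel, mechanical clicking.
    avg = mean(per_question_seconds)
    cv = pstdev(per_question_seconds) / avg if avg else 0.0
    if cv < 0.15:
        flags.append("suspiciously_regular_timing")

    return flags

print(timing_flags([6.1, 5.9, 6.0, 6.2, 5.8, 6.0]))    # -> ['suspiciously_regular_timing']
print(timing_flags([0.9, 1.0, 0.9, 1.1, 1.0, 0.9]))    # -> both flags
print(timing_flags([4.0, 12.5, 7.3, 22.0, 5.1, 9.8]))  # -> []
```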
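
For measure 10, a minimal sketch of a robust variant of that idea, using the median and the median absolute deviation instead of the mean; the cut-off of 3.5 is a common rule of thumb, not a fixed standard.

```python
# Measure 10 (sketch): robust outlier detection with a modified z-score
# based on the median absolute deviation (MAD). The 3.5 cut-off is a
# common rule of thumb, not a fixed standard.
from statistics import median

def outlier_flags(values, cutoff=3.5):
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    # 0.6745 scales the MAD so the score is comparable to a z-score.
    return [abs(0.6745 * (v - med) / mad) > cutoff for v in values]

# Example: a claimed consumption of 400 units per week stands out.
weekly_units = [2, 3, 1, 4, 2, 400, 3, 2]
print(outlier_flags(weekly_units))
# -> [False, False, False, False, False, True, False, False]
```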
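
For measure 14, a minimal sketch that compares open-ended answers across interviews using a simple similarity ratio from the Python standard library; the 0.9 similarity threshold is an assumption.

```python
# Measure 14 (sketch): find (near-)identical open-ended answers across
# interviews using difflib's similarity ratio. The 0.9 threshold is assumed.
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicates(answers, threshold=0.9):
    """Return index pairs of answers that are suspiciously similar."""
    normalized = [" ".join(a.lower().split()) for a in answers]
    pairs = []
    for i, j in combinations(range(len(normalized)), 2):
        ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 2)))
    return pairs

open_ends = [
    "I liked the fresh taste and the price was fair.",
    "The packaging could be improved, otherwise fine.",
    "I liked the fresh taste and the price was fair!",  # recycled text module
]
print(near_duplicates(open_ends))  # -> [(0, 2, 0.98)]
```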
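
For measure 15, a minimal sketch of a supervised classifier trained on simple behavioral features (answer-time statistics, straightlining share, completion errors). It assumes scikit-learn is available and uses simulated labels; in practice, obtaining reliable, continuously updated labeled data is the hard part.

```python
# Measure 15 (sketch): train a classifier on simple behavioral features.
# Assumes scikit-learn and a labeled training set (honest vs. fraudulent);
# the data below is simulated purely for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

def simulate(n, fraud):
    """Toy features: mean answer time, timing spread, straightlining share,
    completion errors."""
    if fraud:
        return np.column_stack([
            rng.normal(3, 0.5, n),     # fast answers
            rng.normal(0.2, 0.05, n),  # very regular timing
            rng.uniform(0.5, 1.0, n),  # much straightlining
            rng.poisson(4, n),         # many completion errors
        ])
    return np.column_stack([
        rng.normal(9, 2.5, n),
        rng.normal(3.0, 1.0, n),
        rng.uniform(0.0, 0.4, n),
        rng.poisson(1, n),
    ])

X = np.vstack([simulate(500, fraud=False), simulate(500, fraud=True)])
y = np.array([0] * 500 + [1] * 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"Hold-out accuracy: {model.score(X_test, y_test):.2f}")
```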

Conclusion and recommendations for action

It can be assumed that survey fraud will continue to increase, because incentive fraudsters have little to fear. If someone is classified as a fraudster by a panel operator, the affected accounts (which are fake anyway) are usually blocked; there are no further (e.g. legal) consequences. Blocking accounts under the panel operator's general terms and conditions hardly has a deterrent effect, especially since survey fraudsters usually redeem their incentive points on an ongoing basis, so these fake accounts hold little or no credit by the time they are discovered.

In market research, however, the importance of quality assurance for survey data, and of the associated measures, will grow as concern about the validity of results increases. Only those methods that automatically unmask fraud in real time, before or during the survey, and thus prevent incentives from being credited or paid out have a lasting effect in the fight against fraud. This live quality control and clean-up also saves costs and time: incentives are not paid out to fraudsters, quotas are met, repeated recruitment of study participants is avoided, and time-consuming manual cleansing is eliminated.

As the number and quality of measures taken against survey fraud increase, the professionalism of incentive fraudsters and the sophistication of the technologies they use are likely to keep pace. For example, it is to be feared that bots will in future use artificial intelligence to simulate human behavior in surveys – such as answering open-ended questions – far better than they do today. To detect such AI survey bots, AI-based quality checks will in turn be needed to determine whether a text was generated by a model or written by a human.

In addition, it is to be expected that the still traditionally organized click farms will upgrade technologically in order to trick automated technical security checks before the survey (such as identity verification using digital fingerprints).

Since today's still predominantly amateurish fraud can largely be cleaned up manually by reviewing the data, it could be argued that the measures called for here will push more and more incentive hunters to professionalize their activities. The same argument could be made against fighting any kind of fraud. The goal of a sustainable fight against fraud must be to apply as many of the measures listed here as possible, making survey fraud so costly that professional incentive fraud does not pay off financially, now or in the future.

Market researchers who ignore the problem of increasingly intelligent survey fraud and take no or only inadequate measures against it are a threat to the market research industry. After all, clients must be able to rely on market research data and results. If trust is lost, for example due to a lack of transparency, this can sooner or later damage the image of market research and, as a result, lead to a significant drop in orders across the industry.

Clients, too, are challenged. On the one hand, when awarding contracts they should pay attention to the extent to which appropriate anti-fraud measures are in place, and they must accept that quality assurance in some areas, such as the quality management of online panels, comes with higher costs. It is not the online panel with the most members but the one with the fewest fraudulent accounts that offers greater security in terms of data quality. On the other hand, clients can save costs through new technologies that enable automated, standardized data cleansing during the survey itself.

Finally, clients must also be willing to have questionnaires adapted or expanded by market researchers so that they offer sufficient starting points for quality checks. For example, open-ended questions should be part of every questionnaire, because experience has shown that they are particularly well suited to fraud detection.

Despite the growing threat that intelligent survey fraud poses to the results of online surveys, market researchers can face this challenge with confidence. The good news is that a shotgun's hit rate is enough to handle it: a certain margin of error (in the low single-digit percentage range) in identifying unreliable data is usually acceptable because it does not have a significant impact on the results. It becomes problematic, however, if ten percent or more bad data remains in the data set, as this can distort the results [3]. This is a task that should remain manageable in the future.

Sebastian Berger