Every market researcher asks whether a survey participant's answers are credible, at the latest when it comes to cleaning the data.
In our opinion, this is an essential question, as it determines which data feed into the study results and, in turn, which recommendations for action are derived from them.
Currently, statistical procedures and manual plausibility checks are mostly used to infer the response quality of respondents. In particular, the survey duration, the variance of the answers and the plausibility of text responses are checked and evaluated.
The problem, however, is that often only individual aspects of response quality are examined (e.g., only the survey duration). In addition, the manual, individual nature of the process means that data cleaning differs from market researcher to market researcher, which in turn leads to differences in quality and to results that are hard to compare.
How can response quality be identified?
In our last article, we discussed formal and content-related "response biases" and showed which biases can occur in quantitative surveys. In this article, we present concrete ways to identify the response quality of a respondent in surveys.
Time
The length of the survey is a key factor in identifying superficial response behavior, especially in online surveys, where so-called "speeders" or "click-throughs" can be identified relatively easily by their completion time. There are different ways to do this. One possibility is to define a minimum time and remove all respondents who finished the survey faster. A second possibility is to calculate the median survey duration across all respondents and remove participants who were, for example, twice as fast as the median.
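As an illustration, here is a minimal sketch of the median-based rule in Python. The column names, the factor of two and the sample data are assumptions made for the example.

```python
import pandas as pd

def flag_speeders(df: pd.DataFrame, duration_col: str = "duration_seconds",
                  factor: float = 2.0) -> pd.DataFrame:
    """Flag respondents who completed the survey in less than median / factor."""
    threshold = df[duration_col].median() / factor  # e.g. twice as fast as the median
    return df.assign(is_speeder=df[duration_col] < threshold)

# Hypothetical example data
responses = pd.DataFrame({"respondent_id": [1, 2, 3, 4],
                          "duration_seconds": [620, 180, 540, 710]})
flagged = flag_speeders(responses)
print(flagged[flagged["is_speeder"]])  # respondent 2 finished in under half the median time
```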
Variance
With the help of variance measurements, several formal quality aspects can be measured. One of them is so-called straightlining: it occurs when respondents give identical or nearly identical answers to the items of a question battery that uses the same response scale. Agreement tendencies as well as tendencies towards the middle, towards leniency or towards severity can also be identified by measuring variance.
If the variance of the analysed values is low, there is much to suggest a corresponding bias.
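For illustration, the following sketch flags straightliners by computing the variance of each respondent's answers within a battery. The item columns, the sample data and the threshold are assumptions made for the example.

```python
import pandas as pd

# Hypothetical 5-point battery with items q1..q4
battery = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "q1": [4, 3, 5], "q2": [4, 1, 5], "q3": [4, 5, 5], "q4": [4, 2, 5],
})

items = ["q1", "q2", "q3", "q4"]
battery["item_variance"] = battery[items].var(axis=1)

# A variance of (close to) zero indicates straightlining on this battery
battery["straightliner"] = battery["item_variance"] < 0.1
print(battery[["respondent_id", "item_variance", "straightliner"]])
```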
Analysis of text responses
A good indicator of a respondent's quality is the plausibility check of text answers. If a respondent enters a meaningless character combination such as "gdargacsafs" or a low-value answer such as "Everything", it can be assumed that the response quality is low. The challenge here is the enormous effort required to check all answers for plausibility and to remove the less plausible ones accordingly. Intelligent machine learning algorithms (artificial intelligence) can help to evaluate and filter the answers automatically. We offer such a method at ReDem in the form of the "Open-Ended Score".
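As a simple illustration of what an automated check might look like, the sketch below flags empty, generic or mostly unrecognisable open-ended answers. It is a rough heuristic with an assumed mini-vocabulary and does not reproduce the machine learning approach behind the Open-Ended Score.

```python
# Deliberately simple heuristic sketch, not ReDem's actual model
GENERIC_ANSWERS = {"everything", "nothing", "good", "ok", "none", "n/a"}

def looks_implausible(answer: str, vocabulary: set[str]) -> bool:
    tokens = [t for t in answer.lower().split() if t.isalpha()]
    if not tokens:
        return True                                     # empty or only symbols/digits
    if len(tokens) == 1 and tokens[0] in GENERIC_ANSWERS:
        return True                                     # low-value answers like "Everything"
    known = sum(t in vocabulary for t in tokens)
    return known / len(tokens) < 0.5                    # mostly unknown tokens, likely gibberish

# Tiny illustrative vocabulary; in practice a full word list or language model would be used
vocab = {"the", "delivery", "was", "late", "but", "support", "helped", "quickly"}
for a in ["gdargacsafs", "Everything", "The delivery was late but support helped quickly"]:
    print(a, "->", looks_implausible(a, vocab))
```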
Overestimating one's own answer compared to other respondents
Another interesting, but still little-used indicator is the consistency of an answer with the so-called "false consensus effect". We have already published an article on this topic as well. In principle, the false consensus effect describes the psychological tendency to overestimate how common one's own opinions are in the population. For example, Apple fans will overestimate the actual percentage of Apple fans. The effect also occurs when people know that their opinion is not shared by the majority: people will always estimate the frequency of their own opinion higher than people with the opposite opinion or attitude do. Applied to surveys, this means that responses which do not fit this phenomenon have a high probability of not being credible. Socially desirable response behavior in particular can be identified this way. A prerequisite for this check is the existence of a projective statement, i.e. the respondents must estimate how the other respondents answered.
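One possible way to operationalise this check is sketched below: respondents who hold an opinion but estimate its prevalence lower than the opposing group does run against the false consensus effect and can be flagged. The data, column names and decision rule are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: own_answer = 1 if the respondent holds opinion A,
# estimate_pct = the respondent's estimate of how many respondents hold opinion A
df = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4, 5, 6],
    "own_answer":    [1, 1, 0, 0, 1, 0],
    "estimate_pct":  [70, 65, 40, 35, 20, 45],
})

# Mean estimate given by respondents who do NOT share opinion A
mean_estimate_opponents = df.loc[df["own_answer"] == 0, "estimate_pct"].mean()

# Respondents who hold opinion A but estimate its prevalence lower than the
# opposing group does are inconsistent with the false consensus effect
df["fce_inconsistent"] = (df["own_answer"] == 1) & (df["estimate_pct"] < mean_estimate_opponents)
print(df)
```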
Assessment of the response from other respondents (meta-predictions)
Another good way, especially to identify superficial response behavior (satisficing), is to assess respondents' ability to predict the actual response distribution of a question. To be precise, respondents have to estimate how other respondents will answer the respective question, as described in the previous section. For example, a projective question might be phrased as follows: "How do you think other customers will rate product XY?"
The analysis then evaluates how much each respondent's forecast deviates from the actual distribution.
If the deviation is large across several questions, it can be assumed that the response behavior is superficial.
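For illustration, the sketch below computes this deviation for one projective question as the mean absolute difference between a respondent's forecast and the actual answer distribution. The data and column names are assumptions made for the example.

```python
import pandas as pd

# Hypothetical: for one projective question with options A/B/C, each respondent
# predicted the percentage of respondents choosing each option
predictions = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "pred_A": [50, 20, 34],
    "pred_B": [30, 10, 33],
    "pred_C": [20, 70, 33],
})

# Actual answer distribution observed in the survey (in percent)
actual = {"pred_A": 45, "pred_B": 35, "pred_C": 20}

# Mean absolute deviation between each forecast and the actual distribution
options = list(actual)
predictions["deviation"] = predictions[options].sub(pd.Series(actual)).abs().mean(axis=1)

# Averaged over several such questions, a consistently high deviation points
# to superficial (satisficing) response behavior
print(predictions[["respondent_id", "deviation"]])
```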
General plausibility checks
In addition, general plausibility checks help to draw conclusions about the quality of respondents' answers. This involves checking whether the individual statements made by a respondent are consistent with each other. For example, in a survey on car-buying behavior, an answer in which a 16-year-old student drives a Bugatti would be implausible.
The challenge with plausibility checks is usually the enormous effort required to carry them out. In order to really subject every data set to a manual plausibility check, an elaborate analysis procedure is required that often takes hours.
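Such checks can at least partly be expressed as simple rules. The sketch below encodes the Bugatti example as one such rule; the data and the list of luxury brands are assumptions made for the example.

```python
import pandas as pd

# Hypothetical car-buying survey data
df = pd.DataFrame({
    "respondent_id": [1, 2],
    "age":           [16, 42],
    "occupation":    ["student", "engineer"],
    "car_brand":     ["Bugatti", "Skoda"],
})

LUXURY_BRANDS = {"Bugatti", "Ferrari", "Rolls-Royce"}

# One example rule: a minor reporting a luxury sports car is implausible
df["implausible"] = (df["age"] < 18) & df["car_brand"].isin(LUXURY_BRANDS)
print(df[df["implausible"]])
```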
Control questions
Control questions, like survey duration, are used primarily in online surveys. Here, respondents are asked to give a clearly defined answer, for example: "Please select the fourth answer option for this question." All participants who make a different entry are removed. This is another way of identifying "click-throughs" or "speeders" in particular.
The problem with classic control questions is that answering them can become very tiring for respondents and can also come across as patronising.
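Evaluating a control question itself is straightforward, as the short sketch below shows. The column name and the expected answer option are assumptions made for the example.

```python
import pandas as pd

# Hypothetical data: "control_q" stores the selected answer option (1-5);
# the instruction was "Please select the fourth answer option."
df = pd.DataFrame({"respondent_id": [1, 2, 3], "control_q": [4, 2, 4]})

cleaned = df[df["control_q"] == 4]   # keep only respondents who followed the instruction
print(cleaned)
```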
Viewing memories in a differentiated way
Since most interviews refer to the past, respondents have to retrieve memories. A question may therefore refer to different time horizons for different respondents. For example, question 1: "When was your last car purchase?" Question 2: "How satisfied were you with your car purchase?" This example shows very well that the satisfaction question can refer to different time horizons, depending on when the last car purchase took place.
Since our brain weights recently perceived events more strongly than events that happened longer ago, this factor should also play a role in data cleaning.
Conclusion
With these factors, the majority of poorer-quality data sets can already be identified and taken into account accordingly in the results. It is important that the quality of survey participants is not considered from only one angle, but from as many as possible, in order to obtain as complete a picture as possible of a respondent's response quality.
To ensure the existence of professional market research, we believe it is essential to establish an industry standard for the process and criteria of data cleaning and to communicate this transparently to clients. In the future, every report of a market research study should contain a section dealing with data quality and the corresponding details. After all, data quality is ultimately a key factor in the success and credibility of market research.
With ReDem, a solution for the automated cleaning and quality assurance of survey data, establishing this independent industry standard is our goal.