Every market researcher asks whether a survey participant's answers are credible, at the latest when it is time to clean the data.
In our opinion, this is an essential question, because it determines which data feed into the study results and, in turn, which recommendations for action are derived from them.
Currently, statistical procedures and manual plausibility checks are used to draw conclusions about the quality of respondents' answers. The aspects most commonly checked are survey duration, the variance of answers and the plausibility of text answers.
The problem is that often only a single aspect of response quality is examined (e.g. only survey duration).
In addition, the manual component of the cleansing process differs from one market researcher to the next, which leads to differences in quality and makes results less comparable.
What options are there for assessing response quality, and what do you have to watch out for?
In our last article we described formal and content-related response tendencies and showed the biases that can occur in quantitative surveys.
The purpose of this article is to present concrete ways of assessing a respondent's response quality in surveys.
The duration of the survey is an essential factor for identifying superficial response behaviour, especially in online surveys.
So-called "speeders" or "click-throughs" are relatively easy to identify by means of survey duration.
There are several ways to do this. One is to define a minimum time and remove everyone who completed the survey faster. Another is to calculate the median survey duration across all respondents and remove participants who were, for example, more than twice as fast.
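Both rules can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `flag_speeders`, the 120-second minimum and the sample durations are assumptions made up for the example.

```python
from statistics import median

def flag_speeders(durations, min_seconds=None, speed_factor=2.0):
    """Return the ids of respondents who completed the survey too quickly.

    durations: dict mapping respondent id -> completion time in seconds.
    min_seconds: optional absolute minimum time (rule 1).
    speed_factor: flag anyone more than this many times faster
                  than the median respondent (rule 2).
    """
    threshold = median(durations.values()) / speed_factor
    flagged = set()
    for rid, seconds in durations.items():
        too_fast_absolute = min_seconds is not None and seconds < min_seconds
        too_fast_relative = seconds < threshold
        if too_fast_absolute or too_fast_relative:
            flagged.add(rid)
    return flagged

# Hypothetical completion times in seconds.
durations = {"r1": 420, "r2": 380, "r3": 95, "r4": 510, "r5": 400}
print(flag_speeders(durations, min_seconds=120))  # {'r3'}
```

The median is used rather than the mean because a few respondents who leave the survey open for hours would otherwise inflate the threshold.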
Variance measures can be used to assess several formal aspects of quality. One of these is so-called straightlining: it occurs when respondents give identical or almost identical answers to a series of questionnaire items that share the same answer scale. Acquiescence (a tendency to agree) and tendencies towards the middle, towards leniency or towards severity can also be identified by measuring variance.
If the variance of the analysed values is low, there is much to suggest a corresponding bias.
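As a rough sketch of such a variance check, the following Python snippet computes the variance of each respondent's answers within one item battery and flags low values. The cut-off of 0.25 and the sample answers on a 1 to 5 scale are illustrative assumptions, not recommended values.

```python
from statistics import pvariance

def low_variance_respondents(grid_answers, max_variance=0.25):
    """Return {respondent id: variance} for respondents whose answers
    within one item battery vary at most max_variance.

    grid_answers: dict mapping respondent id -> list of answers to
                  items that share the same answer scale.
    """
    flagged = {}
    for rid, answers in grid_answers.items():
        variance = pvariance(answers)
        if variance <= max_variance:
            flagged[rid] = variance
    return flagged

# Hypothetical answers to six items on a 1-5 scale.
grid = {
    "r1": [4, 4, 4, 4, 4, 4],  # pure straightlining
    "r2": [1, 5, 3, 2, 4, 3],  # differentiated answers
    "r3": [3, 3, 4, 3, 3, 3],  # almost identical answers
}
print(sorted(low_variance_respondents(grid)))  # ['r1', 'r3']
```

Low variance alone does not say which tendency is present. Looking at the respondent's mean answer as well helps to separate agreement with everything from a tendency towards the middle of the scale.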
Analysis of text responses
A good indicator of response quality is the plausibility of text responses. If a respondent enters a meaningless string of characters, such as "gdargacsafs", or an answer of little value, such as "I do not care about the question", it can be assumed that the quality of the answer is low.
The challenge here is the enormous effort to check all answers for plausibility and to remove less plausible ones accordingly.
Here, intelligent machine learning algorithms (artificial intelligence) can be helpful to automatically evaluate and filter the answers.
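To make the idea of automated filtering concrete, here is a deliberately simple rule-based pre-filter in Python. It is not a machine learning model and only stands in for one: it checks what share of the words in an answer appear in a known vocabulary and compares the answer with a list of typical non-answers. The tiny `VOCABULARY` and `NON_ANSWERS` sets and the function names are assumptions for illustration; a real setup would need a full word list or a trained classifier.

```python
import re

def known_word_share(text, vocabulary):
    """Share of word tokens in text that appear in vocabulary (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)

def flag_text_answer(text, vocabulary, non_answers, min_share=0.5):
    """Classify an open-ended answer as 'non-answer', 'gibberish' or 'ok'."""
    normalised = " ".join(text.lower().split())
    if normalised in non_answers:
        return "non-answer"
    if known_word_share(text, vocabulary) < min_share:
        return "gibberish"
    return "ok"

# Toy word list and non-answer list, for illustration only.
VOCABULARY = {"the", "delivery", "was", "fast", "price", "is", "good"}
NON_ANSWERS = {"i do not care about the question", "no idea"}

for answer in ["gdargacsafs",
               "I do not care about the question",
               "The delivery was fast"]:
    print(answer, "->", flag_text_answer(answer, VOCABULARY, NON_ANSWERS))
```

Answers flagged this way are best treated as candidates for review rather than removed automatically, since misspellings and brand names also fall outside any fixed vocabulary.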
Overestimating one's own answer compared to other respondents
Another interesting but not yet widely known indicator is whether an answer is consistent with the so-called "false consensus effect". We have already published a post on this. In principle, the false consensus effect describes the psychological tendency to overestimate the frequency or spread of one's own opinions among the population. For example, Apple fans will overestimate the actual proportion of Apple fans.
Even if people think that their opinion does not agree with the majority, the false consensus effect still occurs. People will always rate the frequency of their own opinion higher than people with the opposite opinion or attitude do.
For surveys, this means that answers that are not in line with this phenomenon have a high probability of not being credible.
Socially desirable response behaviour in particular is thus easily identifiable.
A prerequisite for the application of this control option is the existence of a projective statement. In this case, the respondents must make an assessment of the response behaviour of the other respondents.
Assessment of the response from other respondents (meta-predictions)
Another good way, especially to identify superficial response behaviour (satisficing), is to assess how well the respondent predicts the actual response distribution of a question. Specifically, as described in the previous section, the respondent must estimate how other respondents will answer the question. For example, a projective question could be formulated as follows: "How do you think other customers will rate product XY?"
This assesses how much each respondent's forecast deviates from the actual outcome.
If there is a large deviation across several questions, it can be assumed that the response behaviour is superficial.
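The deviation can be computed as in the following Python sketch, under some simplifying assumptions: every respondent rates each question and also predicts the average rating of the others, the actual outcome is taken as the mean over all respondents (including the respondent themselves), and the threshold of 1.0 scale point is arbitrary. The function names and data are made up for the example.

```python
from statistics import mean

def prediction_errors(own_ratings, predicted_means):
    """Mean absolute deviation between each respondent's predictions
    and the actual average rating, across all questions.

    own_ratings:     dict respondent id -> {question id: own rating}
    predicted_means: dict respondent id -> {question id: predicted average}
    """
    questions = list(next(iter(own_ratings.values())))
    actual = {q: mean(r[q] for r in own_ratings.values()) for q in questions}
    return {
        rid: mean(abs(predictions[q] - actual[q]) for q in questions)
        for rid, predictions in predicted_means.items()
    }

def flag_poor_predictors(errors, max_error=1.0):
    """Ids of respondents whose average prediction error exceeds max_error."""
    return {rid for rid, error in errors.items() if error > max_error}

# Hypothetical ratings on a 1-5 scale for two questions.
own = {
    "r1": {"q1": 4, "q2": 2},
    "r2": {"q1": 5, "q2": 3},
    "r3": {"q1": 3, "q2": 1},
}
predicted = {
    "r1": {"q1": 4.0, "q2": 2.5},
    "r2": {"q1": 4.5, "q2": 2.0},
    "r3": {"q1": 1.0, "q2": 5.0},
}
errors = prediction_errors(own, predicted)
print(flag_poor_predictors(errors))  # {'r3'}
```

Averaging over several questions matters here: a single poor estimate can simply be an honest misjudgement, while consistently large errors point to superficial answering.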
General plausibility checks
In addition, general plausibility checks serve to draw conclusions about the quality of the respondents' answers. This involves checking whether the individual statements made by a respondent are consistent with each other. For example, in a survey on car buying behaviour it would be implausible for a 16-year-old student to report driving a Bugatti.
The challenge with plausibility checks is usually the enormous effort required to carry them out. In order to really subject every data set to a manual plausibility check, an elaborate analysis procedure is required that often takes hours.
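Part of this effort can be reduced by writing recurring checks down as explicit rules and applying them to every record. The Python sketch below shows the idea. The field names (`age`, `car_brand`, `years_since_purchase`), the brand list and the rules themselves are hypothetical and would have to be defined per questionnaire.

```python
# Hypothetical list of brands considered implausible for minors.
LUXURY_BRANDS = {"Bugatti", "Ferrari", "Lamborghini"}

# Each rule is a (name, predicate) pair; the predicate returns True
# when a record violates the rule.
RULES = [
    ("minor_drives_luxury_car",
     lambda r: r["age"] < 18 and r["car_brand"] in LUXURY_BRANDS),
    ("purchase_before_birth",
     lambda r: r["years_since_purchase"] > r["age"]),
]

def failed_rules(record, rules=RULES):
    """Return the names of all consistency rules that a record violates."""
    return [name for name, is_violated in rules if is_violated(record)]

# Hypothetical survey records.
records = {
    "r1": {"age": 16, "car_brand": "Bugatti", "years_since_purchase": 1},
    "r2": {"age": 45, "car_brand": "Skoda", "years_since_purchase": 3},
}
for rid, record in records.items():
    print(rid, failed_rules(record))
# r1 ['minor_drives_luxury_car']
# r2 []
```

Rules like these only catch inconsistencies that someone has anticipated, so they complement rather than replace a manual look at unusual records.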
Control questions, like the survey duration, are mainly used in online surveys. Here, the respondent has to follow a clearly defined instruction, for example: "Select the fourth answer option for this question." All participants who give a different answer are removed. This is another way of identifying "click-throughs" or "speeders" in particular.
The problem with classic control questions is that respondents can find them tiring and also patronising.
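The control question check itself is straightforward to automate, as this short Python sketch shows. The question id `control_1` and the data layout are assumptions for illustration.

```python
def failed_control_question(responses, question_id, expected_option):
    """Ids of respondents who did not select the instructed option.

    responses: dict mapping respondent id -> {question id: selected option}.
    A missing answer to the control question also counts as failed.
    """
    return {
        rid for rid, answers in responses.items()
        if answers.get(question_id) != expected_option
    }

# Hypothetical responses; the instruction was to select option 4.
responses = {
    "r1": {"control_1": 4},
    "r2": {"control_1": 2},
    "r3": {},  # control question skipped
}
print(sorted(failed_control_question(responses, "control_1", 4)))  # ['r2', 'r3']
```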
Viewing memories in a differentiated way
Since most surveys refer to the past, respondents have to retrieve memories. A question may therefore refer to different time horizons for different respondents. For example, question 1: "When was your last car purchase?" Question 2: "How satisfied were you with your car purchase?" This example shows that the question about satisfaction can refer to different time horizons, depending on when the last car purchase took place.
Since our brain weights recently perceived events more strongly than events that happened longer ago, this factor should also play a role in data cleaning.
A corresponding weighting factor can help to reduce this "memory error".
With these factors, the majority of the lower-quality data sets can already be identified and taken into account accordingly in the results. It is important that the quality of a respondent's answers is assessed not from a single point of view but from as many as possible, in order to obtain as complete a picture as possible.
In our view, there must be an industry standard for the process and criteria of data cleansing in the future, which is also communicated transparently to the clients. We think that in future every report of a professional market research study must contain a section on the details of data quality.
Market research must stand for data quality, because this is an essential point for the raison d'être of professional market research.
With Redem as a solution for automated cleansing and quality optimisation of survey data, our goal is to become exactly this independent industry standard.
You can find more information on our website.