The following is a transcript of a talk given by yours truly and Chris Davison from KPMG Nunwood at ASC’s One Day Conference on the Challenges of Automation in Survey Research on 11 May 2017.
Introduction
We have entered the golden era of automation – in other words: make machines do things. At first it was repetitive and simple things – find duplicates in a sample list, copy that survey, replace the word Coca-Cola with Pepsi and send the results to all the executives of the relevant company – not mixing up companies is the perilous thing that you do not want to get wrong.
Automation is for lazy people – and I have always considered laziness to be a quality! Lazy people [programmers] look to avoid doing things they don’t really have to, and when they do finally have to, they look to get it done with the least amount of effort.
Now, if we are to believe the claims, automation can post-code open-ended responses (yeah right, Tim!), write reports and win at Go (mark my words – that one is never going to happen), and soon Skynet is going to enslave us (it will be ok as long as we don’t choose the red pill… or the blue pill… damn, I have to remember which one).
Automation is not new. Software is automation. But not everybody is a programmer – well this is less true at the ASC.
For everyone to benefit from the work done by the best programmers in their sector, Application Programming Interfaces (APIs) were invented. An API is a pile of code (usually documented) inside a box, accessible by another piece of software without having to understand its inner workings (or have access to the source code). Here’s a brief timeline:
By 1999 it was cool to be a programmer (and not just at the ASC). Every web designer was now a programmer – writing awful code in JavaScript so they could animate their poorly designed website while growing and grooming their facial hair and sipping their Frappuccino soy latte.
XML was no longer the cool kid on the block: JSON (JavaScript Object Notation) was – more compact, more elegant and directly usable in JavaScript. jQuery – a JS library, quickly replaced by Angular and then React – made it super easy to query any website, and people started calling me a dinosaur because I am a C++ developer.
Automation was always possible before: to make software interact you needed a database accessible by both sides. Or start an executable from the command line… Or a file drop on an FTP server. All these are huge security risks. I am not saying that web services are not security risks – the risks are just less understood, so easier to sell to your CEO.
So what was new in the survey world? Well everything.
Panel providers created APIs. First Cint, then Lucid, which led to an explosion of DIY research. Software providers opened up their APIs and some even documented them. I will not give you the list but these days, even SurveyMonkey has an API.
And for me the revolution came with CRM systems – like SalesForce, Microsoft Dynamics or ZenDesk – opening up the Enterprise world. You could interview any customer after any touch point… understand what’s happening and adapt quickly. One of our competitors calls it the experience gap.
Behaviour is now captured outside of surveys. The “What and When” is known. Surveys can concentrate on the “Why?” and the “What if?”. Verve is managing insights for Walgreens, the owner of Boots. Thanks to the loyalty cards, they know you have bought paracetamol on the Monday, Ibuprofen on the Tuesday and nothing on the Wednesday (with an app and iBeacons they also know where you have walked) – and they can sample you accordingly and interview you to understand your buying pattern.
So the game is about asking the relevant question at the right time. In my opinion nobody does it better than KPMG Nunwood and that’s because – automation looks like magic when it’s well done – there is a wizard in the backroom somewhere in Leeds…
Case study: Nunwood (Chris Davison)
So I’d like to tell Nunwood’s automation story, how it started, the obstacles we faced and where we are now before later coming onto some of the ideas that we implement to take us further…
How did we start? Well I wish I could say we had some far reaching vision but in all honesty this is what kicked things off for us…
Sadly there wasn’t much long term vision associated with our first foray into automation; that came later. What got us moving was necessity. We had several UK projects using bits and pieces of automation techniques but they still required manual intervention at key points to move the process along.
As we started to expand globally, we were faced with the challenge of processes needing to run at any point in the day – including when our UK team were tucked up in bed. Our first attempt at a solution was a successful failure – we met the needs of the project, but the mish-mash of command line, SQL scripts and Askia was very complicated and wasn’t very accessible to the entire team.
If we were going to extend the automated approach to other projects, it was clear we would need different tools and this is what brought us to LoadIt. While a simple tool to use, it allowed for a great deal of complexity meaning new starters could get to grips with it quickly but the more seasoned DPers could still deal with our most demanding projects. Later, its extensibility would allow us to integrate with our in-house developed systems such as our Fizz online reporting platform.
Given LoadIt’s existing integration with Askia and the automation capabilities within Askia we soon developed the long term vision that we were missing – the Zero Hours Project.
Given certain conditions – mainly stability in the project’s design and outputs – could we automate all elements of data collection and delivery?
It was an ambitious goal and one that had to have some compromises – there would always be some elements that would need manual intervention, so “zero hours” really meant “very few hours” but that didn’t quite have the same ring to it.
Discussing these kinds of developments with the whole team raised justifiable concerns that this would mean people’s jobs, but automation does not necessarily mean reducing head count and it certainly wasn’t our goal.
The tasks that lend themselves well to automation are the ones that don’t change – removing these from the team’s workload would free up time for things that required the skills for which they were hired (primarily their problem solving abilities). The skills required for automation were also different from the typical work, meaning it provided an opportunity for people to learn more and broaden their skillset. We could also expand the remit of the team – in particular we took on more responsibility for the configuration of our reporting sites. From an operational perspective it would mean we could go some way towards flattening out the peaks and troughs that had developed in our working patterns – driven by the fact that most of our work was tracking studies that came out of and went into field at very similar times.
Framed like this, it was a very positive message for the team and I’m pleased to say everybody got behind the idea.
The main challenge from the wider business was around quality control. If machines were doing all the work, who was checking it?
It was a valid concern but one that could still be approached with automation at the forefront. All surveys are based on a set of rules, however complex – each question has criteria that must be met before it is asked, so it’s reasonably simple to check that the rules have been met.
We created VB scripts that could test the rules and give a pass or fail to a set of Excel tables. This meant that the same files we were using for automated checks could also be verified manually or passed to our Insight team should they want to double check things.
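To make the idea of rule-based checking concrete, here is a minimal sketch in Python. It is not the VB scripts Nunwood used: the question names, the rules and the data layout are all made up for illustration. Each rule says when a question should have been asked, and an interview fails if an answer is missing where one was due or present where the question should have been skipped.

```python
# Hypothetical routing rules: question name -> predicate telling whether the
# question should have been asked, given the interview's other answers.
RULES = {
    "q2_reason": lambda a: a.get("q1_satisfied") == "no",
    "q3_recommend": lambda a: True,  # asked of everybody
}


def check_interview(answers):
    """Return the list of rule violations for one interview (empty = pass)."""
    failures = []
    for question, should_ask in RULES.items():
        answered = answers.get(question) is not None
        if should_ask(answers) and not answered:
            failures.append(f"{question}: missing answer")
        elif not should_ask(answers) and answered:
            failures.append(f"{question}: answered but should have been skipped")
    return failures


def check_all(interviews):
    """Map each interview id to its list of violations."""
    return {iid: check_interview(answers) for iid, answers in interviews.items()}


if __name__ == "__main__":
    sample = {
        1: {"q1_satisfied": "no", "q2_reason": "too slow", "q3_recommend": 7},
        2: {"q1_satisfied": "yes", "q2_reason": "oops", "q3_recommend": 9},
    }
    for iid, failures in check_all(sample).items():
        print(iid, "PASS" if not failures else "FAIL: " + "; ".join(failures))
```

Because the output is a plain list of failures per interview, the same report can be read by a script or by a person who wants to double check things.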
So back to the original question – could we run a zero hours project?
The answer wasn’t simple – Yes, if you considered the caveats – when things didn’t change and we considered the standard elements of the project we could produce the outputs through automated scripts. No, because an unexpected consequence of the changes was a change to the way our Data Team worked with our Insight Team. With many of the repetitive, standardized tasks removed, we found we had more time to work on ad hoc requests and deeper analysis of the data – meaning we could add more value to projects.
We had seen many of the improvements we had hoped for as well as some unexpected ones: operational improvements, better working practices and we’d started to extend our capabilities.
Paradata (Jérôme Sopoçko)
One of the most exciting areas of using automation and APIs is during (or just after) the collection of the survey. Paradata is often just the date and time of the start of the interview. But more generally it’s about storing any information about the way the interview was conducted. You can find out the name of the browser, the operating system and the language used in the HTTP request header.
If the interviewee is using Internet Explorer 5 (or more generally any version of Internet Explorer), do not bother asking technical questions. Similarly, if the operating system is Linux, forget about asking technical questions because you won’t understand the answers.
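As a rough illustration of header-based paradata, the sketch below reads the User-Agent and Accept-Language headers of a request and records them next to the interview start time. The function names and the dictionary layout are my own assumptions, and the user-agent matching is deliberately crude substring matching; a real project would more likely use a maintained user-agent parsing library.

```python
from datetime import datetime, timezone


def primary_language(accept_language):
    """Return the language tag with the highest q-value, or None if absent."""
    best_tag, best_q = None, -1.0
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        q = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if q > best_q:
            best_tag, best_q = tag.strip(), q
    return best_tag


def guess_browser(user_agent):
    ua = user_agent.lower()
    # Order matters: Edge and Chrome user agents also mention Safari.
    checks = (("msie ", "Internet Explorer"), ("trident/", "Internet Explorer"),
              ("edge/", "Edge"), ("edg/", "Edge"), ("firefox/", "Firefox"),
              ("chrome/", "Chrome"), ("safari/", "Safari"))
    for needle, name in checks:
        if needle in ua:
            return name
    return "unknown"


def guess_os(user_agent):
    ua = user_agent.lower()
    # Order matters: Android mentions Linux, and iPhone mentions Mac OS X.
    checks = (("windows", "Windows"), ("android", "Android"), ("iphone", "iOS"),
              ("ipad", "iOS"), ("mac os x", "macOS"), ("linux", "Linux"))
    for needle, name in checks:
        if needle in ua:
            return name
    return "unknown"


def collect_paradata(headers, now=None):
    """Build a paradata record from a dict of HTTP request headers."""
    h = {key.lower(): value for key, value in headers.items()}
    ua = h.get("user-agent", "")
    started = now or datetime.now(timezone.utc)
    return {
        "started_at": started.isoformat(),
        "browser": guess_browser(ua),
        "os": guess_os(ua),
        "language": primary_language(h.get("accept-language", "")),
    }
```

The record can then be stored with the interview and used in routing, for example to skip the technical questions for certain browsers.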
Beyond the interview, you can find information about the world. If you interview someone who is boarding a Eurostar train, it’s interesting to check the volume of #Eurostar hashtags in the Twitter API: it’s a strong indication of problems on the line.
Now let’s talk about the IP address – this identifier assigned to you by your Internet Service Provider. Of course your ISP knows who you are and has been allowed to sell (in the US) your browsing history.
There are a number of companies out there who specialise in transforming an IP address into a geographical position: www.freegeoip.net, www.maxmind.com, www.digitalelement.com, …
But you can get a much more precise geo-location by asking the browser for permission. Google, thanks to its StreetView vans, was fishing for Wi-Fi network information so it could improve its location data, which is now at scary levels of accuracy.
As mentioned, to get accurate geo-localisation you need the permission of the user. The idea is not to gather information in a sneaky way. Tell the user what you are doing with this information, explain that you are reducing the number of questions you ask… Because there are so many resources available: openweathermap.org will give you the current weather in any location, developer.zoopla.com will find right away the average price of the houses in the vicinity. And then you have the open data government sites. Data.gov.uk has put up 185,000 datasets. Call me old fashioned but I still think we are in Europe: the EU Open Data portal has 10,700 datasets with a full API to access them. For free.
So what can we do with this data? Does it help to know that my interviewee is in Cardiff, where 60% of the people voted remain? Linking big data and survey data is one of the greatest challenges of #MRX – and if you are not one of GAFA (Google Amazon Facebook Apple) you are at a real disadvantage.
Let’s use a concept developed for advertising planning: the Average Issue Readership (AIR)
We ask a significant number of people (very significant in the case of the TGI survey) these questions: “How often do you read this newspaper?” or sometimes rephrased as “When did you last read this newspaper?”. There is still a lot of discussion about which is the best way to ask these questions – they are usually called probability questions.
So you get a very classic grid question like the above. Thanks to these lovely people who do research on research, we get the following probabilities of reading, based on the “recent reading” question.
From there we can infer, for each interviewee, the probability that they have read any given issue of a newspaper… and the great thing is that we can use that information in crosstabs, cross it with their likelihood of buying Corn Flakes and plan our advertising campaign accordingly.
In a normal survey, we simply ask people what newspaper they read and we cross that by their gender. When we do a cross-tab, for each interviewee of gender X who has given brand Y, we add 1 in the corresponding cell of the cross-tab. With a probability question, we add a value between 0 and 1 indicating the probability of the person having seen the brand.
OK, the data looks a bit weird because the counts have decimals but once you move them into percentages, nobody cares anymore. And of course you can still use weighting if your panel data was not balanced.
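Here is a small sketch of that fractional counting, assuming a “last read” answer per respondent. The probability values, the field names and the answer labels are invented for the example; the real probabilities come from the readership research mentioned above. Each respondent adds their probability (times their weight, if any) to the cell for their gender instead of adding 1.

```python
from collections import defaultdict

# Illustrative values only: probability of having read an average issue,
# given the answer to the "When did you last read it?" question.
READ_PROBABILITY = {
    "yesterday": 0.9,
    "past week": 0.5,
    "past month": 0.2,
    "never": 0.0,
}


def probability_crosstab(respondents):
    """Return {gender: (fractional count, percentage of the weighted base)}."""
    counts = defaultdict(float)
    bases = defaultdict(float)
    for r in respondents:
        weight = r.get("weight", 1.0)  # 1.0 when the sample is unweighted
        bases[r["gender"]] += weight
        counts[r["gender"]] += weight * READ_PROBABILITY[r["last_read"]]
    return {g: (counts[g], 100.0 * counts[g] / bases[g]) for g in bases}


if __name__ == "__main__":
    sample = [
        {"gender": "F", "last_read": "yesterday"},
        {"gender": "F", "last_read": "never"},
        {"gender": "M", "last_read": "past week"},
        {"gender": "M", "last_read": "past week"},
    ]
    for gender, (count, pct) in probability_crosstab(sample).items():
        print(f"{gender}: count={count:.2f} readership={pct:.1f}%")
```

The counts come out with decimals, as noted, but the percentages read like any other crosstab.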
So what does that mean for you? Although you never asked how people voted in that fateful referendum, you can cross the NPS of your brand with the vote for leaving Europe, which (from what I have seen at the MRS) is increasingly used as a segmentation tool.
Of course it’s not just the election you can cross by: the level of crime, the amount of subsidy, the likelihood of rain…
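The same trick can be sketched for NPS. In the example below each respondent carries the leave share of their area as a probability, and the promoters and detractors are counted fractionally in a “leave” and a “remain” column. The Cardiff figure follows the 60% remain mentioned earlier; the second area, its share and the field names are hypothetical. Bear in mind that an area-level share is only a rough proxy for how an individual voted.

```python
# Share of leave votes per area. Cardiff follows the 60% remain quoted in the
# talk; "Exampletown" and its share are made up for the illustration.
LEAVE_SHARE = {"Cardiff": 0.40, "Exampletown": 0.70}


def nps_by_vote(respondents):
    """Return {'leave': nps, 'remain': nps} computed with fractional counts."""
    sums = {"leave": {"base": 0.0, "net": 0.0},
            "remain": {"base": 0.0, "net": 0.0}}
    for r in respondents:
        p_leave = LEAVE_SHARE[r["area"]]
        # +1 for promoters (9-10), -1 for detractors (0-6), 0 for passives.
        if r["score"] >= 9:
            net = 1
        elif r["score"] <= 6:
            net = -1
        else:
            net = 0
        for vote, p in (("leave", p_leave), ("remain", 1.0 - p_leave)):
            sums[vote]["base"] += p
            sums[vote]["net"] += p * net
    return {vote: 100.0 * s["net"] / s["base"]
            for vote, s in sums.items() if s["base"] > 0}


if __name__ == "__main__":
    sample = [
        {"area": "Cardiff", "score": 10},
        {"area": "Exampletown", "score": 3},
    ]
    for vote, nps in nps_by_vote(sample).items():
        print(f"{vote}: NPS {nps:+.1f}")
```

Swapping the leave share for a crime rate or a likelihood of rain works the same way, as long as the external figure can be read as a probability between 0 and 1.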
Beyond paradata, you can also create additional information with the questions you actually ask. We have worked on a project where the conjoint analysis utilities were computed in real time – that meant automating R (like Ian showed earlier) to get the results a few screens further, show the best concept for a given user and validate it.
Beyond that – the revolution is also around open-ended question analysis: you do not write open-ends anymore.
You will speak to your device, your computer, but also your phone, tablet, Alexa, your fridge…any IoT device – which have ways of recognising you. MyForce, our sister company, works on Bison – a revolutionary platform of not just speech to text – but identifying people by their voice (who’s talking), classifying the tone and talking speed (how are we talking) and the content (what are we talking about).
It’s not just Bison – look at what APIs Microsoft offers (Microsoft cognitive services) and R is integrated in SQL Server…
Google and Facebook are also on the bandwagon (the gravy train).
One of our clients, through our other sister company Platform One – Nuaxia – has a panel of 1,000,000 doctors (not all in the NHS). These guys are in a hurry but they have interesting things to say. So, Nuaxia lets pharmaceutical labs survey these guys but only ask them 10 questions. And instead of asking them to type, they film them.
This is the interface to create the survey – this is kept simple – it’s for pharmaceutical people. From there a survey file is created through the API of a well-known software vendor, they debit the PayPal account of the white coat guys, invite the doctors and the data pours in.
After that, the video data is nicely sent to a speech-to-text algorithm, the text data is classified with Artificial Intelligence (à la CodeIt but not CodeIt yet) and all of it is sent to a dashboard.
Text-driven surveys (Chris Davison)
So we know what the typical survey is structured like and most have not moved on that much from the sort that would be posted to someone in the distant past. Linear structures and the logic dictated by closed questions. Technology gives us an opportunity to flip this paradigm on its head.
Imagine a survey more like this…
What we’re looking to do is use open-ended questions to determine the route the customer takes through the survey, asking things that are relevant to them and providing a much more tailored survey experience.
Removing the structure from our surveys is, for me, an exciting proposition and live text analysis can be used to do just that.
Create a pool of open-ended questions and, as each one is asked, apply live text analysis to determine which would be the most appropriate follow-up. Continue until either there are no more relevant questions or some constraint, such as a time limit or a number of questions, has been reached.
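The loop described here can be sketched in a few lines of Python. Everything in it is an assumption made for illustration: the question pool, the keywords and the function names are invented, and plain keyword matching stands in for real live text analysis. The survey stops when no unasked question is relevant, or when the question or time limit is reached.

```python
import re
import time

# Hypothetical pool: topic -> (follow-up question, keywords that make it relevant).
QUESTION_POOL = {
    "checkout": ("How was your experience at the checkouts?",
                 {"queue", "checkout", "checkouts", "till"}),
    "staff": ("Tell us more about the staff you dealt with.",
              {"staff", "assistant", "rude", "helpful"}),
    "price": ("How did you feel about the prices?",
              {"price", "prices", "expensive", "cheap"}),
}


def extract_topics(text):
    """Stand-in for live text analysis: count keyword hits per topic."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = {}
    for topic, (_question, keywords) in QUESTION_POOL.items():
        n = len(words & keywords)
        if n:
            hits[topic] = n
    return hits


def run_survey(opening_question, ask, max_questions=5, time_limit_s=600.0,
               clock=time.monotonic):
    """Ask questions until nothing relevant is left or a limit is reached.

    `ask` is any callable that takes a question and returns the answer text.
    """
    start = clock()
    relevance = {}        # topic -> accumulated keyword hits so far
    asked_topics = set()
    transcript = []
    question = opening_question
    while True:
        answer = ask(question)
        transcript.append((question, answer))
        for topic, n in extract_topics(answer).items():
            relevance[topic] = relevance.get(topic, 0) + n
        if len(transcript) >= max_questions or clock() - start >= time_limit_s:
            break  # a constraint has been reached
        candidates = sorted(t for t in relevance if t not in asked_topics)
        if not candidates:
            break  # no more relevant questions
        topic = max(candidates, key=lambda t: relevance[t])
        asked_topics.add(topic)
        question = QUESTION_POOL[topic][0]
    return transcript
```

Passing `ask` in as a callable keeps the routing logic separate from however the question is actually put to the participant, so the same loop can be driven by a web page, a chat interface or a test script.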
From a respondent’s perspective, they should have a greatly improved experience – far fewer questions that do not seem relevant, and a questionnaire steered by the issues they want to talk about.
From the analysis side, the data quality should be much greater – in theory you’re asking questions of the respondent that are relevant to them and their experience. Consequently the ability to understand the story behind the data should also improve.
We can also start to tackle some of the issues facing us such as falling response rates – when an invite says the survey will last 10 minutes we can guarantee that – once the time limit is reached, stop picking new questions. Or take a different approach: state the number of questions you’re going to ask and don’t ask any more.
You can always ask the participant’s permission to ask more when you get to the limit, but because you’re asking the most relevant questions to them you hopefully have got the most interesting feedback up front.
There are clearly some analysis considerations – only asking people about topics they’ve expressed an opinion about could introduce some bias, but nothing about this approach precludes randomly selecting questions or sections to provide balance. But you know when you’re doing that – you know the context in which the question was asked when it comes to analysis – you can even tailor the way it’s worded…
“We know you didn’t mention anything about your experience at the checkouts, but we’d like to ask you about it…”
To take this a step further you can then allow participants to upload photos / videos, do the same real-time analysis and base the survey route on that.
So while this is a specific example, the key principle for me is that we start to utilise the technological landscape we have available to us to start to challenge some of the fundamentals of project design. Connecting through the myriad of APIs helps us to create a combination of services that moves our industry forward and opens up new horizons.