A fortunate chain of events – a dry read

At Askia we love to talk about Askia things… and about a year ago, the technical team got together in a room and agreed on what our biggest need was: the ability to elegantly call a web service from a survey, decipher the result and store it appropriately.

Web-service not included

I have mentioned in previous articles how an API allows you to extend your para-data. With the IP address that you collect (and that we encrypt – GDPR is watching you), you can obtain the general location of the person. With the location, you can get the weather at the time of the interview and the likelihood that they voted for a given party in the last elections.

You could always call a web service by adding some JavaScript in your page but that was not very elegant… and also made it hard to hide any authentication method.

So we decided to create a new routing where the Web Service was called from the server and not from the browser – effectively hiding the call from the interviewee. We got inspiration from the Postman interface and quickly put together a new routing.

The interface allows you to run different scripts depending on the success of the call and to manipulate and store the different parts of the response… and we introduced a new keyword CurrentHttpResponse.

QueryWebService
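To give a flavour of it, here is a purely hypothetical sketch of such a routing script – Code and Body below are made-up member names for illustration, not the documented interface:

 ' Hypothetical sketch – Code and Body are illustrative names:
 If CurrentHttpResponse.Code = 200 Then
    Dim rawWeather = CurrentHttpResponse.Body   ' keep the raw response to parse in a later routing
 Endif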

At that point, we thought that this had been relatively easy and we contemplated a well deserved visit to the local pub for refreshments.

XML and the Argonauts

As we were putting together an example – calling openweathermap.org to get the weather anywhere in the world – we hit our first problem.

The response looked like this:

<?xml version="1.0" encoding="utf-8"?>
<current>
   <city id="6690581" name="Belsize Park">
      <coord lon="-0.18" lat="51.55"></coord>
      <country>GB</country>
      <sun rise="2018-03-06T06:33:58" set="2018-03-06T17:50:36"></sun>
   </city>
   <temperature value="282.33" min="281.15" max="283.15" unit="kelvin"></temperature>
   <humidity value="66" unit="%"></humidity>
   <pressure value="988" unit="hPa"></pressure>
   <wind>
      <speed value="2.1" name="Light breeze"></speed>
      <gusts></gusts>
      <direction value="200" code="SSW" name="South-southwest"></direction>
   </wind>
   <clouds value="40" name="scattered clouds"></clouds>
   <visibility value="10000"></visibility>
   <precipitation mode="no"></precipitation>
   <weather number="521" value="shower rain" icon="09d"></weather>
   <lastupdate value="2018-03-06T13:50:00"></lastupdate>
 </current>

To get the temperature, we would have had to look for the string “temperature value=” and extract the following digits… it was possible but a bit of a dirty hack, we felt. As stated before, at Askia we love to talk but we hate dirty hacks.

So we started talking about having an XML parser. The cool kids in the dev team took a clear stand: we did not need an XML parser and we would be a laughing stock if we implemented one. What we needed was a JSON parser. Even better, we thought: what if AskiaScript could natively support JSON? Note: I can confirm it, we built an XML parser anyway – I hope you are not laughing.

JSON native and the dictionary

So we came up with the following syntax:

Dim myAuthorVar = @{
 "name": "Jerome",
 "age": 21,
 "occupation": "laughing stock",
 "busy": true,
 "children": ["Mackenzie", "Austin"],
 "address": {
    "postcode": "SW12",
    "city": "london"
    }
 }
Return myAuthorVar["occupation"]

We were very excited but that meant we needed a new variable type – it's sometimes called an object or a map but also a Dictionary – and, failed librarians and encyclopaedists that we are, we loved that… so there it was: the Dictionary. It allows you to store a series of named values in one object. You can set its properties with the Set method, like this: myAuthorVar.Set("Busy", False). And you can access them like you would with an array, but by specifying a string instead of a number: myAuthorVar["name"].
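A quick sketch, reusing the dictionary defined above:

 myAuthorVar.Set("busy", False)   ' update a named value
 Return myAuthorVar["busy"]       ' read it back with a string key – returns False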

Variant and Arrays of Variant

I mentioned that it would be a good time to go to the pub when somebody asked what type a dictionary accessor returned. In other words, what was the type of myAuthorVar["age"]? The answer is “it depends”… and there was no way of knowing beforehand. Here it was a number, but if a web service had reported “age” as “fifty-ish”, the result would have been a string.

So we had to introduce a new type: the Variant

If you called myAuthorVar.TypeOf(), it would return “variant”… but inside the variant is a dictionary. So we created a method for Variant to find out what is inside, and we called it InnerTypeOf. myAuthorVar.InnerTypeOf() returns “dictionary”.
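A small sketch using the author dictionary from above:

 Dim theOuterType = myAuthorVar.TypeOf()       ' "variant"
 Dim theInnerType = myAuthorVar.InnerTypeOf()  ' "dictionary"
 Return myAuthorVar["age"].InnerTypeOf()       ' "number" here – it could just as well have been "string"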

It was also nice to write @[1, 2, 3] or even @[3.14159, "pear", "apple"] – both are arrays of variants that we decided to call “arrays” for simplicity.

A variant could hold any of what we decided to call the basic types: number, string, date, dictionary and any array of the types above. OK – let's go to the pub! But then we remembered that JSON supported Null and Booleans… and because we wanted full compatibility, we had to create two new AskiaScript types: Null (which does not do much) and Boolean, which can only take two values: true or false.

Booleans and back compatibility

This was a can of worms – because we used to consider True and False to be numbers. Let’s imagine some script like this:

 Dim myVariable = (Q1_Name = 7)
 ' … some clever coding…
 myVariable = 42
 ' … more clever coding…
 If myVariable = 42 Then
    ' Save the world ...
 Endif

In classic AskiaScript, this would create a variable called myVariable as a number with a value of 1 or 0 and later taking the value 42 allowing the world to be saved.

We did not want to break back compatibility. I am going to summarise what was hours of discussions. We decided that comparators (like equal or Has) had to return numbers. If they returned Booleans, setting the variable to 42 would now trigger an error because 42 is not a Boolean. And if we permitted an automatic conversion of numbers into Booleans, myVariable would take the value True (and not 42), which would change the way the scripts ran… and the world as we know it would perish.

Wordy woes

Having spoken for so long, we were quite thirsty as you might guess. But we realised that our language would become very verbose and somehow inelegant if we had to convert Variants into the type we wanted whenever we wanted to use them.

In the example above, if we wanted to find out the length of our author’s post-code, we would have had to write:

 Dim hisAddress = myAuthorVar["address"]
 Dim hisAddressAsDic = hisAddress.ToDictionary()
 Dim hisPostcode = hisAddressAsDic["postcode"]
 Dim hisPostcodeStr = hisPostcode.ToString()
 Return hisPostcodeStr.Length

This was ridiculous… it would take ages to write any serious code… and we had better things to do than write verbose code (at that stage I was thinking of all the beers I would not be able to drink if I had to type that much to get my own postcode). So we went back to the drawing board and agreed that

myAuthorVar["address"]["postcode"].Length was all we needed.

This elegant code was only possible if Variants supported ALL the properties and methods of ALL the basic types. That meant a lot of unit tests. So we focused (no blurred vision) and we wrote them.

This meant a serious rewrite and a careful management of conflicts: Format is a method of both numbers and dates, and it acts very differently for each. So we put together a set of rules.

I’ll give you a reference

At that point, we had spent a lot of time on this, we were (very) thirsty but we wanted it to be perfect. And we realised we had a problem – what if we wanted to change the postcode of our author (by code)?

myAuthorVar [“address”] returned a Variant holding a dictionary with the address – a copy of the address. So to change the postcode we would have needed to write:


 Dim hisAddress = myAuthorVar["address"]
 hisAddress.Set ( "postcode" , "EC2A" )
 myAuthorVar.Set ("address", hisAddress)

This was again way too verbose. So we decided that accessors (the square brackets [ ] used by dictionaries and arrays) would not return a copy of the address but a reference to the author's address. This meant that we could write:

 myAuthorVar["address"].Set ( "postcode" , "EC2A" )

This added a very serious complication to the code (it's called pointers, as in dangling pointers in C++)… and that's very difficult to make work. In the above example (as in life), the variable hisAddress can outlive myAuthorVar. We had to write a lot of unit tests to ensure that everything worked and that we did not have memory leaks. This is discussed here.

In short, a variable stops being a reference as soon as you assign it something else.
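A short sketch of the rule, reusing our author:

 Dim hisAddress = myAuthorVar["address"]    ' hisAddress is a reference to the author's address
 hisAddress.Set("postcode", "EC2A")         ' this changes myAuthorVar["address"] too
 hisAddress = @{ "postcode": "N1" }         ' a fresh assignment: hisAddress stops being a reference
 Return myAuthorVar["address"]["postcode"]  ' still "EC2A"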

AskiaScript Anonymous

We had an ongoing problem with the Value property of question – and we thought it’d be a good idea to address it now before we went to the pub.

Q1.Value returns a string if Q1 is an open-ended question. And it returns an array of numbers if Q1 is closed with multiple response. It can also be a number or a date…

Now let’s imagine we have a script like this

 Dim myVariable = Q1
 ' On Mondays at precisely 12 o'clock
 If Now.Day() = 1 and Now.Hour() = 12 Then
    myVariable = Q2
 Endif
 ' What is myVariable.Value here?

AskiaScript is compiled – the compiler wants to know the type of things before the script is run… but in that example, myVariable.Value could be of a different type depending on the day and time it was run.

And what if we had something like Q1.NextVisibleQuestion.Value?

So we decided that as soon as you put a question into a variable, the variable becomes an “anonymous question”. All methods of an anonymous question would work but the Value property would be a Variant… and we also decided to make sure that CurrentQuestion was an anonymous question. Problem solved! Drinks anyone?
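So in a script, the pattern looks roughly like this:

 Dim myVariable = CurrentQuestion    ' myVariable is an anonymous question
 Dim itsValue = myVariable.Value     ' a Variant – it could hold a string, a number, a date...
 Return itsValue.InnerTypeOf()       ' tells you what the question actually holds at run-time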

But then we had another huge back-compatibility problem. Let’s look at the following code:

 
 Dim myVariable = QNumeric
 Return myVariable + 1

In classic AskiaScript, the system would add an invisible “.Value” after QNumeric (we call that an implicit property). myVariable would be a number and we would return that number incremented by 1.

But with the introduction of anonymous questions, myVariable was now a question. Facing an operator (the +), we would again add the implicit property .Value. But now Value would be a Variant, and we had no rule for adding a Variant to anything else… up to now.

So we made sure that we had rules to add any Variant to another Variant – or to any basic type or array of basic types. Not just add but also subtract, multiply, divide and compare – including all the keywords like Has, HasNone etc. In total, combining 4 operators and a dozen comparators with 6 basic types and 3 types of arrays (number, string and variant), that made for a lot of decisions (and a lot of discussions) and many, many unit tests.

Before we started this development, we had 1667 unit tests ensuring that all functions of the AskiaScript behave the same from one version to another.

For this, we had to add 2231 (!) more unit tests. Once they all passed successfully, we added the whole thing to Suite 5.4.8 and we hope you’ll like it.

Enough Quant Tricks, we’ll be in the pub for a swift one – we deserved it.

Panel providers, unite – the speech at the ASC

On the 9th of November, the ASC invited some panel providers to attend a discussion on panel harmonisation. The discussion was orchestrated by Tim Macer.

Here was my speech – the written version at least as I may have ad-libbed a few unscripted things.

ASCPanel

Market Research is changing. You have heard it a million times – though not in the way that Ray Poynter announced. There will be more surveys in 10 years than ever. That's the good news. The bad news is that most of them won't be run by MR institutes. The goose with the golden eggs is dead – clients now run their own surveys, which means MR companies – just to stay in business – have to be more competitive.

Goose with the golden eggs (before / after)


They started to delocalise to India, Romania or Ukraine. But that was not enough. To save more money, they have started to use automation.

This has its advantages – of course the surveys were a little bit more formatted… but Millward Brown had done that successfully for years. But once the bugs are eradicated, it’s efficient, fast and most of all cheap. And no blockade by disgruntled employees – although that’s more a French problem.

PresentationBrands

The problem is that end-clients are following the trend – they can do automation too! They are using Zappi Store and Wizer… and SurveyMonkey and SurveyGizmo and ConfirmIt (and Askia). And ToLuna. And SSI self-serve. And Lucid. And Cint.

I have mentioned it at the ASC’s last conference: we have entered a golden age. The age of the API. A golden age for geeks like me at least: the internet is changing into a gigantic API where information is exchanged through web services. Everything is interconnected and uses the same interfaces.

IoT

I do not know if any of you have used IFTTT – If This Then That. It’s an app where you define a condition and an action. If I get near the house, put the lights on. If the temperature gets below 17 at night, put the heating on. If I enter the kitchen in the morning, put the radio on and start the coffee machine. If I have no milk in the fridge, order some. The IoT – the internet of things – is happening through one common interface through web services… and all industries are playing ball because they want their share of that big cake of a connected world.

oil-rig

I know we have panel providers on stage, so they might disagree with me. But panel data is no longer the only oil on planet Research. Customer databases are increasingly used because they can be energised by communities. And there are all sorts of big data available at large – aggregated or not. It could be loyalty card data, web footprints or mobile phone data.

wine-glass

And just like for a good Bordeaux wine, to get quality you need to master the art of the blend. The Merlot, a bit dry and earthy – that will be your panel data. There is some cheap Merlot and some very good Merlot too. And the Cabernet Sauvignon with its fruity flavours – that will be your behavioural data.

But unlike the IoT industry, Market Research providers have not decided to play ball. There are the ones who do not facilitate automation because they are afraid of losing control and burning panel. And there are the ones who do but work in isolation.

I do not believe there can be one company that will fill all the needs in panel data. ToLuna is positioning itself as a one-stop shop for all MR needs: the software, the panel and the behaviour. SSI is doing something similar and the merger with Research Now is going to be very interesting. The Leonard Murphy analysis about that on the GreenBook blog was great, by the way. And it won't be just scraps left for the others – because the need for data is growing, the need for specialised quality data will be growing too.

babel

But we need a common language. A common grammar. What is a social grade? How do I define national representativity? And how do I trigger a soft launch? How do I notify that a quota is full?

But there is another side to this discussion. If we let anyone access a survey which is tedious, long and repetitive, with grids, 2 max-diff exercises and one 20-minute trade-off, how do we reward the dedicated weirdos who filled in that nightmare of a survey? How do we warn them that they are in for the long haul? Because we might lose another goose with golden eggs. How can we stop the cull of panellists and the continual drop in response rates?

tediousness

I suggest we build metrics: number of questions, number of responses in a question. And then number of words per question, number of similar questions, number of mandatory open-ended questions… and then build a model.

$(Survey) => (Length(Survey) x TotalTediousness(Survey))^-1

And then remunerate the panellists (and their providers) accordingly.

While I was preparing this discussion with all of you, most of you mentioned how slow-moving our industry was. It's not just that: it's protective, short-sighted and technologically unaware. And that's everything the ASC is not. It's at the ASC that triple-S, a format to exchange survey data between competing survey software, was created and promoted. It's two of my competitors, Steve Jenkins and Keith Hughes, who patiently showed me my errors and taught me how to write a proper triple-S file. Let's all be a little bit more like them and a little bit less like Apple, who introduce a new plug and a new format with each new version.

chinese-propaganda

That's my manifesto – a call to arms… please discuss and let's move it forward.

Panel providers of the world, unite!

The short story

The industry is demanding more streamlining and automation… the only way that can happen is via standards – what are the Panel providers doing/proposing to do in this respect? We would like better visibility on their APIs and the differences between them… possibly talk about harmonising some key variables. We think there should be an automated standard evaluation of surveys in terms of length and complexity to better pre-evaluate the cost of sample.

We would like panel providers to explain their position – and their added values – in a (wait for it) panel discussion on Thursday the 9th of November in London – ORT House, London NW1 7NE as part of the one day ASC conference.

The very long story

I have always wanted to join an English gentlemen's club. If I moved to the UK, I was going to be Phileas Fogg: travelling the world after a drunken boast and a wager over a bridge game. Last month (after 22 years in the country), it finally happened; I was asked to join the Association for Survey Computing.

I expected a standard acceptance ceremony: arriving blindfolded in a dark room, greeted by men in togas, a solemn oath with my hand on the preserved 15th-century skull of the founder of the organisation, uttering something in Latin, maybe “Nam melius quaestiones”.

I was not disappointed. It was a Thursday morning Webex call to agree on the subject of the November one-day conference. After the usual rambling about the weather (it was a cold September morning with a forecast for rain in the afternoon), roles were assigned. “You're French”, they said, “you're good at starting revolutions”, they said, “write a manifesto!”

And in truth, a revolution is needed. In previous years, the only way to have a lucrative MR business (not that I know about that) was to delocalise. The new trend is to automate: you standardise a survey (want an ad test?), select the target (nat. rep., sir?) and you have your dashboard with your data ready just as your PayPal account is being debited. For this to happen, you need an automation platform (Zappi Store and GetWizer for instance) or a survey platform with an API… and you need a sample provider.

And that’s where it gets complicated.

A short digression into the real world

Let's imagine you have built the perfect automated survey solution… it works nicely and you get results for every wave in exactly 2 hours 47 minutes. But for a given survey, you want to use a different panel provider to reach a very niche B2B target. You contact that specialist panel provider and explain your needs. They are enthusiastic about the idea and Adam, your contact there, wants to test your survey first – their panellists are special, you don't get to burn their community like that. After 48 hours, Adam calls you back with a price; it's on the expensive side but you agree right away because you want the data now – well, you actually wanted it 45 hours and 13 minutes ago. Now he sends you a list of the internet parameters you need to accept in your survey… what was called SG with panel provider 1 is now called SocialGrade and GE becomes Gender3b… of course you already know why it's called Gender3b; they introduced an “other” (and a “prefer not to say”) to the gender question. Your survey scripter says he needs a day (or two) to implement the changes… but he can only start after the weekend because it's Friday and the web designer who did the icons for the gender question has already gone snowboarding for the weekend.

Here comes Monday, the designer has damaged his knee and you decide to scrap the icons. The client checks the survey on Monday afternoon (they are based on the East Coast) and they want the gender icons back to verify the sample… so you add (early next morning) a nice routing to exit the survey if they say “other”. No soft launch, we don't have time for that. Quickly (but not quite quickly enough) you realise you have screened out 99% of respondents – your scripter wrote the routing the wrong way. You call a very unimpressed Adam to stop sending sample. Your guys finally correct the routing but unimpressed Adam has gone for the day. You eventually get through to him late morning the next day and he agrees to send more sample.

The data fills your automated portal nicely… you start to relax. You shouldn't: your client has had a look at the data and has noticed something very weird with the student segment. How is that possible? You've changed nothing there… until you decide to call Adam, who reluctantly agrees to take your call. He explains calmly that although the internet parameter is indeed SocialGrade, the value 23 does not indicate “Students” but “Deep sea divers”… Did you not read the explanatory document he attached to his email on Thursday last week?

Now you know you are going to have an interesting conversation with both your client and your boss. But you may as well leave it until tomorrow.

And that’s how automation got scrapped in what you must now call your previous job.

The quest

So let’s get back to my personal quest – how can I make automation and surveys better? The answer is simple: by getting panel providers talking to each other.

That’s never going to be easy. Some of them are already panel aggregators and they feel they have already done the hard job. Others feel commoditising panels is not in their interest and will drive prices down. Some say it’s simply not possible because their own data is too rich. And all agree that sending sample to a broken or boring survey is the one reason that response rates – along with data quality – are dropping.

And they are right. Data is precious. We need to treat interviewees with respect and that’s not what we do when we send them a 40 minute conjoint survey (and tell them it will last 10). For panel providers to evaluate pricing properly, they need to know how good (and more likely how bad) our survey is.

We need to build metrics on the length of a survey (a lot of data is available there) but also on the boredom index of a survey: number of grids, number of responses per question, number of words per question text, number of questions with similar text, number of mandatory open-ended questions… and prices should vary accordingly.

Another option would be that the price could be fixed by the soft launch data. At the end of the survey, we measure interview interest and fix the price of the panel accordingly – with a rebate if the full survey data is actually below the early measure.

And how do we harmonise panel data? Should we break down questions in categories and sub-categories (demographics, lifestyle, political leaning) and incorporate that in the naming? Can we have the same break-down across different countries? For which questions? Should the naming convention clearly indicate the number of responses to avoid coding errors?

Be our panelist for a day

We've so many things to discuss… and we thought it'd be best if we did it in public. You, the panel providers, could tell us what you think… explain what's special about your company, detail your API or your choice not to have one. And the ASC audience – rather technical but friendly – could tell you what they want and stand witness to your promises. The result could be a standard (national or international), an API router or just an Excel spreadsheet, depending on the uptake… but independently managed – by the MRS, Esomar, ASC or SampleCon.

So please come to ORT House in London, on Thursday the 9th of November. Tell me who from your company is ready to speak and take part in the panel’s panel discussion, and in a few lines, give me an outline of how you’d respond to our challenge on harmonising panel data and panel interfaces by Monday 2 October. We’re looking for original thinking, fresh ideas and practical answers.

Panel Providers of the World, Unite!

Welcome to the machine

The following is a transcript of a talk given by yours truly and Chris Davison from KPMG Nunwood at the ASC's One Day Conference on the Challenges of Automation in Survey Research on May 11th, 2017.

Introduction

We have entered the golden era of automation – in other words: making machines do things. At first it was repetitive and simple things – find duplicates in a sample list, copy that survey, substitute the word Coca-Cola with Pepsi and send the results to all the executives of the relevant company – not mixing up the companies being the one thing you really do not want to get wrong.

Automation is for lazy people – and I have always considered laziness to be a quality! Lazy people [programmers] look to avoid doing things they don’t really have to, and when they do finally have to, they look to get it done with the least amount of effort.

Now if we are to believe the claims, automation can post-code open-ended responses (yeah right Tim!), write reports, win at Go (mark my words – that one is never going to happen) and soon Skynet is going to enslave us (it will be OK as long as we don't choose the red pill… or the blue pill… damn, I have to remember which one).
Automation is not new. Software is automation. But not everybody is a programmer – well this is less true at the ASC.

For everyone to benefit from the work done by the best programmers in their sector, Application Programming Interfaces (APIs) were invented. An API is a pile of code (usually documented) inside a box, accessible by other software without having to understand its inner workings (or have access to the source code). Here's a brief timeline:

Timeline

By 1999 it was cool to be a programmer (and not just at the ASC). Every web designer was now a programmer – writing awful code in JavaScript so they could animate their poorly designed website while growing and grooming their facial hair and sipping their Frappuccino soy latte.

XML was no longer the cool kid on the block; JSON (JavaScript Object Notation) was – more compact, more elegant and directly usable in JavaScript. jQuery – a JS library, quickly replaced by Angular and then React – made it super easy to query any website, and people started calling me a dinosaur because I am a C++ developer.

Automation was always possible before: to make software interact you needed a database accessible by both parties. Or you started an executable from the command line… or dropped a file on an FTP server. All of these are huge security risks. I am not saying that web services are not security risks – the risks are just less understood so easier to sell to your CEO.

So what was new in the survey world? Well everything.

Panel providers created APIs: first Cint, then Lucid, which led to an explosion of DIY research. Software providers opened up their APIs and some even documented them. I will not give you the list but these days, even SurveyMonkey has an API.
And for me the revolution came with CRM systems – like SalesForce, Microsoft Dynamics or ZenDesk – opening up the enterprise world. You could interview any customer after any touch point… understand what's happening and adapt quickly – addressing what one of our competitors calls the experience gap.

Behaviour is now captured outside of surveys. The “What and When” is known. Surveys can concentrate on the “Why?” and the “What if?”. Verve is managing insights for Walgreens, the owner of Boots. Thanks to the loyalty cards, they know you have bought paracetamol on the Monday, Ibuprofen on the Tuesday and nothing on the Wednesday (with an app and iBeacons they also know where you have walked) – and they can sample you accordingly and interview you to understand your buying pattern.
So the game is about asking the relevant question at the right time. In my opinion nobody does it better than KPMG Nunwood and that’s because – automation looks like magic when it’s well done – there is a wizard in the backroom somewhere in Leeds…

Case study: Nunwood (Chris Davison)

So I’d like to tell Nunwood’s automation story, how it started, the obstacles we faced and where we are now before later coming onto some of the ideas that we implement to take us further…
How did we start? Well I wish I could say we had some far reaching vision but in all honesty this is what kicked things off for us…

In case of emergency, panic.

Sadly there wasn’t much long term vision associated with our first foray into automation, that came later. What got us moving was necessity. We had several UK projects using bits and pieces of automation techniques but they still required manual intervention at key points to move the process along.

As we started to expand globally, we were faced with the challenge of processes needing to run at any point in the day – including when our UK team were tucked up in bed. Our first attempt at a solution was a successful failure – we met the needs of the project, but the mish-mash of command line, SQL scripts and Askia was very complicated and wasn’t very accessible to the entire team.

If we were going to extend the automated approach to other projects, it was clear we would need different tools and this is what brought us to LoadIt. While a simple tool to use, it allowed for a great deal of complexity meaning new starters could get to grips with it quickly but the more seasoned DPers could still deal with our most demanding projects. Later, its extensibility would allow us to integrate with our in-house developed systems such as our Fizz online reporting platform.

robot

Given LoadIt’s existing integration with Askia and the automation capabilities within Askia we soon developed the long term vision that we were missing – the Zero Hours Project.
Given certain conditions – mainly stability in the project’s design and outputs – could we automate all elements of data collection and delivery?

It was an ambitious goal and one that had to have some compromises – there would always be some elements that would need manual intervention, so “zero hours” really meant “very few hours” but that didn’t quite have the same ring to it.

Discussing these kinds of developments with the whole team raised justifiable concerns that this would put people's jobs at risk, but automation does not necessarily mean reducing head count and that certainly wasn't our goal.
The tasks that lend themselves well to automation are the ones that don't change – removing these from the team's workload would free up time for the things that required the skills for which they were hired (primarily their problem-solving abilities). The skills required for automation were also different from the typical work, meaning it provided an opportunity for people to learn more and broaden their skillset. We could also expand the remit of the team – in particular we took on more responsibility for the configuration of our reporting sites. From an operational perspective, it would mean we could go some way towards flattening out the peaks and troughs that had developed in our working patterns – driven by the fact that most of our work was tracking studies that came out of and went into field at very similar times.

Framed like this, it was a very positive message for the team and I’m pleased to say everybody got behind the idea.
The main challenge from the wider business was around quality control. If machines were doing all the work, who was checking it?
It was a valid concern but one that could still be approached with automation at the forefront. All surveys are based on a set of rules, however complex – each question has criteria that must be met before it is asked, so it’s reasonably simple to check that the rules have been met.
We created VB scripts that could test the rules and give a pass or fail to a set of Excel tables. This meant that the same files we were using for automated checks could also be passed to our Insight team to verify manually should they want to double-check things.

Quality stamp

So back to the original question – could we run a zero hours project?
The answer wasn't simple. Yes, if you accepted the caveats: when things didn't change and we only considered the standard elements of the project, we could produce the outputs through automated scripts. No, because an unexpected consequence of the changes was a change to the way our Data Team worked with our Insight Team. With many of the repetitive, standardised tasks removed, we found we had more time to work on ad hoc requests and deeper analysis of the data – meaning we could add more value to projects.

We had seen many of the improvements we had hoped for, as well as some unexpected ones: operational improvements, better working practices, and we'd started to extend our capabilities.

Paradata (Jérôme Sopoçko)

One of the most exciting areas of using automation and APIs is during (or just after) the collection of the survey. Paradata is often just the date and time of the start of the interview. But more generally it’s about storing any information about the way the interview was conducted. You can find out the name of the browser, the operating system and the language used in the HTTP request header.

If the interviewee is using Internet Explorer 5 (or more generally any version of Internet Explorer), do not bother asking technical questions. Similarly, if the operating system is Linux, forget about asking technical questions because you won't understand the answer.

If IE is brave enough to ask to be your default browser...

Beyond the interview, you can find information about the world. If you interview someone who is boarding a Eurostar train, it’s interesting to check the volume of #Eurostar hashtags in the Twitter API: it’s a strong indication of problems on the line.

Now let’s talk about the IP address – this identifier assigned to you by your Internet Service Provider. Of course your ISP knows who you are and has been allowed to sell (in the US) your browsing history.
There are a number of companies out there who specialise in transforming an IP address into a geographical position: www.freegeoip.net, www.maxmind.com, www.digitalelement.com, …

But you can get a much better definition of the geo-location by asking the browser for authorisation. Google, thanks to their Street View vans, was fishing for Wi-Fi network information so they could improve the location, which is now at scary levels of accuracy.

As mentioned, to have accurate geo-localisation, you need the permission of the user. The idea is not to gather information in a sneaky way. Tell the user what you are doing with this information, explain that you are reducing the number of questions you ask… Because there are so many resources available: openweathermap.org will give you the current weather in any location, developer.zoopla.com will find right away the average price of a house in the vicinity. And then you have the open data government sites. Data.gov.uk has put up 185,000 datasets. Call me old-fashioned but I still think we are in Europe: the EU Open Data portal has 10,700 datasets with a full API to access them. For free.

So what can we do with this data? Does it help to know that my interviewee is in Cardiff, where 60% of the people voted remain? Linking big data and survey data is one of the greatest challenges of #MRX – and if you are not one of the GAFA (Google, Amazon, Facebook, Apple) you are at a real disadvantage.

Let's use a concept developed for advertising planning: the Average Issue Readership (AIR).
We ask a significant number of people (very significant in the case of the TGI survey) these questions: “How often do you read this newspaper?” or, sometimes rephrased, “When did you last read this newspaper?”. There is still a lot of discussion about which is the best way to ask these questions – they are usually called probability questions.

Grid question

So you get a very classic grid question like the one above. Thanks to these lovely people who do research on research, we get the following probabilities of reading, based on the “recent reading” question.

Recent reading - NRS survey

From there we can infer, for each interview, the probability that the respondent has read any given issue of a newspaper… and the great thing is that we can use that information in cross-tabs, crossing it by their likelihood of buying Corn Flakes and planning our advertising campaign accordingly.
In a normal survey, we simply ask people which newspapers they read and cross that by their gender: for each interview of gender X that has given newspaper Y, we add 1 in the corresponding cell of the cross-tab. With a probability question, we instead add a value between 0 and 1 indicating the probability of the person having read it.
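As a made-up illustration: if a respondent's “recent reading” answer maps to a reading probability of 0.7, they contribute 0.7 to the cell for that title instead of 1, so ten such respondents add 7 to the count – which is why the counts in the tables below have decimals.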

AskiaAnalyse table with randomised data 01

AskiaAnalyse table with randomised data 02

OK, the data looks a bit weird because the counts have decimals but once you move them into percentages, nobody cares anymore. And of course you can still use weighting if your panel data was not balanced.
So what does that mean for you? Although you never asked about the vote in that fateful referendum, you can cross the NPS of your brand with the vote for leaving Europe, which (from what I have seen at the MRS) is increasingly used as a segmentation tool.
Of course it’s not just the election you can cross by: the level of crime, the amount of subsidy, the likelihood of rain…

Beyond paradata, you can also create additional information with the questions you actually ask. We have worked on a project where the conjoint analysis utilities were computed in real time – that meant automating R (like Ian showed earlier) to get the results a few screens later, show the best concept for a given user and validate it.
Beyond that, the revolution is also around open-ended question analysis: you do not write open-ends anymore.

You will speak to your devices: your computer, but also your phone, tablet, Alexa, your fridge… any IoT device – and they have ways of recognising you. MyForce, our sister company, works on Bison – a revolutionary platform that does not just do speech-to-text but also identifies people by their voice (who is talking), classifies the tone and talking speed (how we are talking) and the content (what we are talking about).

It’s not just Bison – look at what APIs Microsoft offers (Microsoft cognitive services) and R is integrated in SQL Server…

Microsoft APIs

Google and Facebook are also on the bandwagon (the gravy train).

One of our clients, through our other sister company Platform One – Nuaxia – has a panel of 1,000,000 doctors (not all in the NHS). These guys are in a hurry but they have interesting things to say. So, Nuaxia lets pharmaceutical labs survey these guys but only ask them 10 questions. And instead of asking them to type, they film them.

Platform One interface

This is the interface to create the survey – it's kept simple – it's for pharmaceutical people. From there a survey file is created through the API of a well-known software vendor, the PayPal account of the white coat guys is debited, the doctors are invited and the data pours in.
After that, the video data is nicely sent to a speech to text algorithm, the text data is classified with Artificial Intelligence (a la CodeIt but not CodeIt yet) and all of it sent to a dashboard.

Text-driven surveys (Chris Davison)

So we know how the typical survey is structured, and most have not moved on much from the sort that would have been posted to someone in the distant past: linear structures and logic dictated by closed questions. Technology gives us an opportunity to flip this paradigm on its head.
Imagine a survey more like this…

What we're looking to do is use open-ended questions to determine the route the customer takes through the survey, asking things that are relevant to them and providing a much more tailored survey experience.

Removing the structure from our surveys is, for me, an exciting proposition and live text analysis can be used to do just that.

Create a pool of open-ended questions and, as one is asked, apply live text analysis to determine which would be the most appropriate follow-up, continuing until either there are no more relevant questions or some constraint, such as a time limit or a number of questions, has been reached.
From a respondent's perspective, they should have a greatly improved experience – far less asking them questions that do not seem relevant; the questionnaire is steered by the issues they want to talk about.
From the analysis side, the data quality should be much greater – in theory you're asking questions of the respondent that are relevant to them and their experience. Consequently, the ability to understand the story behind the data should also improve.

We can also start to tackle some of the issues facing us, such as falling response rates – when an invite says the survey will last 10 minutes, we can guarantee it: once the time limit is reached, stop picking new questions. Or take a different approach: state the number of questions you're going to ask and don't ask any more.
You can always ask the participant's permission to ask more when you get to the limit, but because you're asking the questions most relevant to them, you have hopefully got the most interesting feedback up front.

There are clearly some analysis considerations – only asking people about topics they've expressed an opinion on could introduce some bias, but nothing about this approach precludes randomly selecting questions or sections to provide balance. And you know when you're doing that – you know the context in which the question was asked when it comes to analysis – you can even tailor the way it's worded…
“We know you didn’t mention anything about your experience at the checkouts, but we’d like to ask you about it…”

To take this a step further you can then allow participants to upload photos / videos and do the same real-time analysis and base the survey route from that.

So while this is a specific example, the key principle for me is that we start to utilise the technological landscape we have available to us to start to challenge some of the fundamentals of project design. Connecting through the myriad of APIs helps us to create a combination of services that moves our industry forward and opens up new horizons.

Of Askia Scripts and Functions

Introduction: What are Askia Scripts for? Or should I say, what is their function?

AskiaScripts were designed to evaluate conditions within a survey – at first to branch the survey and then to set values to (often dummy) questions. They needed to be easy to write (and re-read!) and the user should know at creation time if the script was going to succeed or not.

The need to improve AskiaScript came as our clients' surveys became incredibly complex – and as we started using our own language to produce our ADC.

Lately, AskiaScripts have been used to run very complex routings – like the post-codification of open-ended responses. We had a request to optimise a routing which had hundreds of lines…

AskiaScripts are also used in Tools to verify the quality of data at the end of collection. It is here that the demand for functions was loudest – where there is a need to standardise the way straight-lining is evaluated for grid questions, for instance. Here again we have seen scripts with thousands of lines.

Finally, AskiaScripts are also used in Analyse to achieve increasingly complex calculations on the fly – and aggregating data while being at interview level.

From the feedback we received, we believe 2.0 is a success, although the uptake has been slow (even internally).

I believe AskiaScripts will be used for weird custom adaptive conjoint, very complex calculations at run-time (segmentation) – I think it will also be used in defining and running super portfolios at a later stage.

Let’s summarise what the core values of AskiaScripts are – knowing they could be antonymic:

  • Simplicity
  • Adapted to survey research
  • Reliable: minimise the likelihood of runtime errors
  • Powerful: the competition often uses JavaScript (which does not have the 3 previous points)
  • And finally extensible – by Askia and by users

Functions: extending Askia Scripts

Rather than us adding functions whenever they are needed (which will still happen), we have decided to let users create their own functions. Teach a man to fish and you have saved yourself a fish.

A function is a piece of code that you can call with different parameters.

By default, the parameters of a function will be passed by value for our basic atomic types: numbers, strings, and dates. The arrays (and all complex objects) will be passed by reference.

Script Value Reference screenshot

If we want to change the way the parameters work, we can use the keyword ByVal or ByRef to force passing the parameter by value or by reference respectively.

Script By Val By Ref screenshot
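To make that concrete without the screenshots, here is a rough sketch – the Function/EndFunction declaration syntax below is only illustrative; the two keywords are the point:

 ' Illustrative declaration syntax – ByVal and ByRef are what matters here:
 Function SetPostcode(ByRef address, ByVal newCode)
    address.Set("postcode", newCode)   ' address is the caller's dictionary, so the change sticks
    newCode = ""                       ' newCode is a local copy – the caller's string is untouched
 EndFunction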

Let’s talk about scope, baby

Scopes of variables screenshot

A scope defines where a variable is available.

Variable1 is available throughout your script. Referring to Variable2 will generate an error if it’s after the Else statement.
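If you do not have the screenshot to hand, the idea is roughly this:

 Dim Variable1 = 1       ' declared in the main scope: available throughout the script
 If Variable1 = 1 Then
    Dim Variable2 = 2    ' only exists within this branch of the If
 Else
    ' referring to Variable2 here (or after the Endif) generates an error
 Endif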

AskiaScript hit the same problem that most scripting languages have had (JavaScript, old VBA, … ). Every variable created is global – unless it’s within a For or an If – or a function.

This might not be a problem when you write a routing condition. It will be if you write an Adaptive Conjoint or a full-on survey analyser. You will need to remember which variables you have already used and name them differently and it will make it very hard to re-use code (the holy grail of any programmer). It also makes IntelliSense (automatic code completion) absolutely unusable.

Every language came up with a different solution to that problem. The original 1960s languages had global variables. Then functions were invented (with parameters passed by value or by address). Then classes and namespaces were invented. JavaScript went another way – it used nested functions to make sure that variables (and sub-functions) were not visible everywhere.

To be or not to be typed, that is the question…

Any variable or method in Askia is strongly typed – this means that at compilation time, we already know the type of the variable. This allows us to know if you can use a method or not for every object.

For questions, this means that we know that Gender.Value is a number (1, 2 or DK) and that FavouriteNewspapers.Value is an array of numbers.

But if we have a function that takes a question as a parameter, we do not know the type of its value: it could be a number, an array of numbers, a string or a date…

Script Typed Question screenshot

Within the function, we say that the question is anonymous. And we have defined its Value to be a Variant. A variant is an object whose type we only know at run-time. For this, you have a few properties that you can use to convert a Variant into something more useful.

A variant has the property InnerType which indicates what it holds. You can convert any Variant into something else with the following methods: ToNumber(), ToString(), ToDate(), ToNumberArray().

Script Variant To String screenshot
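A sketch of the pattern (assuming InnerType returns the type name as a string):

 Dim theValue = CurrentQuestion.Value      ' a Variant – we only know its real type at run-time
 If theValue.InnerType = "string" Then
    Return theValue.ToString().Length      ' convert it before using string-specific members
 Endif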

Mods rule!

After a lot of internal discussion, we have decided to define Modules – or namespaces. You will be able to put a set of variables and functions together. By default – and unless you specify otherwise – these variables and functions will not be accessible from outside of the module – in Object-Oriented Programming, this is called encapsulation.

You will be able to make some of the variables and functions available from outside the module – they will need to be prefixed by the keyword Export.

To clarify everything, let’s have some sample code:

Script Module screenshot

Inside the module, you can refer to the variables MaxAnswers and Pi from everywhere. And you can call any function defined in there.

Outside the module, you will have to write SampleModule1::DoTheCalculation or SampleModule1::MaxAnswers to access the public members.

The default way to create a module is with Module XX / EndModule. You can either include the definition of your module in your condition script OR write it in a file that you add as a resource. These files must have a .asx extension (Askia Script eXtension). To use a module in a routing, you need to call Import + name of the module.

Script Import Module screenshot

Note that a call to SampleModule::PI or SampleModule::DoTheCalculation would return an error.

When Import SampleModule1 is called, all the code which is outside of the function will be run – that is everything in Initialisation a) and Initialisation b) in the example above.
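If you cannot see the screenshots, here is a rough reconstruction of the idea – the exact placement of Export and of the initialisation code is illustrative:

 ' In a resource file, say SampleModule1.asx:
 Module SampleModule1
    Dim Pi = 3.14159              ' module-level code: runs when the module is imported
    Export Dim MaxAnswers = 50    ' exported: reachable from outside as SampleModule1::MaxAnswers
 EndModule

 ' In the routing script:
 Import SampleModule1
 Return SampleModule1::MaxAnswers   ' Pi, not being exported, stays private to the module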

AskiaScripts evolve all the time… and we might create a function which conflicts with a user defined one. The user defined one should still work (and be called) once the new version is released – back compatibility is important.

One side effect of what we have decided to do with modules is that variables declared in the main scope will be global to the whole script if modules are not used. We are hoping we won't regret this in the future, but the aim of AskiaScript is not to build full-on applications… yet!

Conclusion

Functions and modules will be available in 5.4.5 – released in askiafield in February 2017. We will – at a later stage – introduce newer concepts – true OOP, lambda functions. Imagine two instantiated similar modules, it’s pretty much like two objects! We might have something like Dim myObject As Module1 somewhere down the line.

I also believe that we will want to add methods to Askia objects: for example, Array.RemoveDuplicates().

Script Remove Duplicates screenshot

Note the possible keywords Extends and This (should we call it This or Self?)

But in the meantime, what we have added should make most advanced users happier. We'd love to hear what you think and have you suggest what we should do next.

Quota: sticking to the script

Nobody likes quota. They have the off-putting echo of a well-wishing community reluctantly leaving Apartheid behind. If researchers mention quota, it’s because you did not hit the targets. If a financial director mentions them, it’s to tell you how you went over and blew the budget. You do not like quota – and us, programmers, well, it was never our favourite part of the job.

But with askiafield 5.4, we have put that behind us and made quota sexy. We have rebuilt the quota interface and the quota distribution engine.  Upgrading an interface – although time consuming – is rarely a problem. Well, we made it look cool which was quite a bit of work.

Changing the entire quota engine is not something that one should approach lightly. We did it with extra care: we put together hundreds of unit tests (where we predict and verify the output of code) and integration tests (where a full automated run of CCA is monitored and the results analysed).

This refactoring had a few goals:

  • Simplify interface(s): quota definition and the quota monitoring could be done in the same window
  • Add functionality: multiple questions, numeric, grouping responses, remove all limitations on the size of the quota tree
  • Expose through an API: the quota can be defined and monitored from a web interface – or automated from an external system (like Platform One)
  • Clarify quota scripting

This article does not focus on the actual functionalities of the quota – they are documented here – but on the impact of scripting quota through routing.

Why script quota?

Scripts are not usually used for screen-out quotas. These are usually dealt with automatically (by the dialler in CATI or by the automatic settings in quota). You want 500 males in region X – once you have them, the interview is simply terminated.

Typically you need a script when you have to take a decision about which concept(s) you want to test. You first ask which ads they have seen and then randomly pick 2 of them to ask questions about.

Ideally you want to select the ones that are the least filled – the ones furthest away in counts or lowest compared to the target percentage. And you might have weird priorities to take into account (always test your client’s brand against another one, etc…).

The rules can be complicated but we have provided simple functions for this.

5.3: the unbearable weakness of strings

In 5.3, you had the possibility of querying the state of the quota by using IsQuotaFullFor, QuotaToDo, MaxQuotaToDo, and AvailableQuota.

It did the trick for a while but there were problems:

  • It was dependent on a string (e.g. QuotaToDo(“Region:1; Product”)). It was easy to spell it wrong and only realise that you had misspelled a question near the end of fieldwork.
  • It assumed you knew your quota tree – if you had not nested the Product within the Region (or decided to relax the rules near the end), you would get the wrong result.
  • The returned result was only looking at one quota row at a time.
  • The target counts were not taken into account to prioritise your selection.

Quota in 5.4? Sorted!

Enter 5.4 – well 5.4.4 really. We have introduced new keywords: they are methods of questions instead of functions. In other words, you write something like Gender.AvailableQuota() instead.

  • AvailableQuota: returns an ordered list of responses for the quota which are still open. The ordering is done according to the count: the first element is the response where the highest number of interviews are to be found.
  • AvailableBalancedQuota: Same as AvailableQuota but the ordering is done by the difference between targets and observed.
  • QuotaList: Same as AvailableQuota but all responses are returned (even the ones over quota).
  • BalancedQuotaList: Same as AvailableBalancedQuota but all responses are returned (even the ones over quota).

If you want to specify some additional information about the tree, you can: it works like this: Product.AvailableQuota(Gender: 1, Region: 3). This means no more spelling mistakes getting in the way, as the compiler would pick up on the fact that you specified an incorrect question.

Another thing: if the gender and the region are already specified in the interview, you do not need to indicate them – but you could, for instance, get information about another region.
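As a quick illustration (a sketch, assuming response 2 of Gender is “Female”):

 Dim openForWomen = Product.AvailableQuota(Gender: 2)   ' products still open for women, most needed first
 Return openForWomen[1]                                  ' the single most needed product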

But from now on, if you need to pick 2 products to test and regardless of the nightmare of a quota tree you may have defined, you should simply write:

Dim arrProductsToTest = Product.AvailableBalancedQuota()

Return {} + arrProductsToTest[1] +  arrProductsToTest[2]

Back compatibility – what is it good for?

You know we care about it. We really wanted to make sure that scripted surveys would work as usual. But we also wanted to ensure that the old weaknesses were gone. So all previous quota functions will still work with the old string… but we also took the liberty of sorting the result for your convenience… and of checking the whole quota tree in case a priority at top level interfered with one of the nested quotas.

So we have back-compatibility, but not quite: it's simply better and more flexible – and where the old quota tree was failing, you will now get the expected results. We hope you agree.

Quota categories

The algorithm to know if a quota target applies to a given interview is actually quite complicated but we are going to explain it as simply as we can… feel free to skip this (and trust us).

Let’s imagine we have a quota tree like:

Root                      TO DO
 1  Male                    50
 2      Product A           40
 3      Product B            0
 4  Female                  40
 5      Product A           15
 6          Region 1        10
 7          Region 2         5
 8      Product B           15

 

Let's look at how we run the following Product.AvailableQuota(Gender: 2) call:

  1. We will look for the availability of the first modality (then second…) – so first we will look at Product A.
  2. We count the number of targets we need to attain: one for the question object and one for each of the questions passed as a parameter (Product.AvailableQuota(Gender: 2) would mean 2 targets, Product.AvailableQuota(Gender: 1, Region: 2) would mean 3).
  3. We create a quota category where we set the Product (according to step 1) and we also set the parameters
  4. For all questions used in the quota, we look in the interview to see if we have data and we set it in the category.
  5. We are going to iterate through the tree – starting at the root
  6. When we hit a response for a question that’s defined in the quota category, we either explore the sub-tree or skip the branch. For example, for Product.AvailableQuota (Gender:2), when we arrive at row 2, we would skip the entire tree and continue at row 4
  7. We count the number of questions we have found which are part of our targets (as defined in step 1). If we are looking for product A in Product.AvailableQuota (Gender: 2) we would hit that target on row 5
  8. Once we have hit the target we add all the sub-quota rows. So for product A in Product.AvailableQuota (Gender: 2) we would select the following rows 5,6,7. All the quota rows? Not quite! If the region 1 was set in the interview, we would not add row 7
  9. Once the whole tree is scanned, if we have selected 0 rows, we remove one of the targets (like Gender or Region in the Step 2 example) and start again at Step 2
  10. We would go through all the selected rows, and we would return the To Do with the most constraining value (the maximum of the minimum To Do and the minimum of the maximum To Do). Yes you might have to re-read that last sentence.
  11. Do the next response (product B) and re-start at Step 1)

There is added complexity for groups… if a response is in a group and has no target, we use the first parent group that has a target.

If a response does not have a target, we assume that the To Do value is 1.
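To make that concrete: if we follow the steps above on the tree shown earlier, with an interview where the respondent is female and no region has been set, Product.AvailableQuota(Gender: 2) ends up selecting rows 5, 6 and 7 for Product A and row 8 for Product B; each product then gets the most constraining To Do of its selected rows, which is what the returned ordering is based on.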

That’s it folks!

Conclusion

We think that AvailableQuota and AvailableBalancedQuota should cover 99% of the scripting needs. We’d love to have your feedback on this of course. We might later introduce a quota object where you will be able to query the actual min and max target or the priority… let us know when you need that and how you think it should work!

Big Data with just one digit

I know some of you think I only attend conferences for the free food, the drinks and the social scene. They are right – no point in me denying it. But in between parties, I tend to heal my hangovers in the semi-darkness of conferences.

Coming back from the ASC and ESOMAR, I can report a few new tendencies in the Autumn/Winter MRX fashion. Forget MROCs, gamification, mobile research, Big Data – that's so last year… it's mainstream, dude.

These days the cool kids talk about Automation, Data fusion, Artificial Intelligence… and the Tinderisation of research.

Automation – if you’re an assiduous reader of this blog, you know it’s coming and fully available at a software provider near you. I am not going to ramble anymore about this for now but watch this space.

Artificial Intelligence is the next big thing in Research. It has been successfully used to post-code open-ended questions and tweets and (less successfully) to measure their sentiment. It's also good at recognising logos and objects in pictures and films, building accurate predictive models and beating me at Go (well, the latter is not news and not strictly research)… but now AI is also used to merge data. There is an inconvenient truth about convenience panels… and MR data in general. If your survey is 40 minutes long (or 20 minutes on a mobile device), the resulting data will be awful: the participants are either too unusual to be trusted or they don't care because they are not incentivised.

Although there is no evidence of survey length diminishing (according to SSI), everyone agrees that it needs to happen. One way is to… well… make up data. You do not ask all the questions to everyone and you copy the data around for similar-looking interviews – this is called ascription (it has been around for some time). For you stats geeks out there, it's traditionally done using the Mahalanobis distance. The new thing is to use machine learning to infer the missing data. Mike Murray and James Eldridge had a great paper about automatically splitting surveys into chunks from their XML definition. Annelies Verhaeghe from Insites and John Colias from Decision Analyst also presented two great papers about enriching surveys with open big data.

And finally, after the uberisation of research – which has seen the arrival of monkeys, gizmos, nuts & limes – the new trend is the tinderisation of research. Millennials (there were boos whenever the term was used – and that was every 47 seconds) take decisions with their index finger. Left means no, right means yes… and survey research should follow. It's easy to understand, fast to answer and it's your system 1 talking… And the index finger is not just for decisions… the navigation of a survey should be done through flicks of the index finger. Almost being a millennial myself (the NSA has the names of those who are laughing), I see the attraction… and we are soon to release something code-named Jupiter that might just turn (or keep) Askia as the best software for Generations Y and Z.

Enter the automation era!

It’s not new: Market Research is doing badly.

A few years back, to improve profitability, most major MR institutes started sub-contracting Survey Programming and Data Processing to Eastern Europe or Asia. This has not been enough. The next step to increase productivity is automation. The successful launch of Zappi Store has made everyone acutely aware of this.

Zappi Store uses Millward Brown's or Brainjuicer's methodology to run highly standardised studies, entirely automated, at unbeatable costs. They have a survey with a few customisable parameters – say the name of the brand, the logo and a list of competitors. With that, they purchase the sample and produce a PowerPoint presentation with all the key (automated) findings. Who needs researchers and analysts anymore? Actually, you only need them once – to design the methodology.

At Askia, we have always known that automation was key to improving performance. At our last user conference, we presented what clients did to automate our system… and our system has always been very easy to integrate into a larger enterprise ecosystem because not all of our clients use our full range. Some just collect data, some just analyse data with Askia. So we have always conceived our software as bricks in a very heterogeneous Market Research wall.

The cement between these bricks is import and export to open standards, but also producing and documenting APIs: Application Programming Interfaces – that is, entry points for your own geeks to play with our toys. And if you do not have your own geeks, don't despair: some of our partners have decided to integrate our APIs so you can automate Askia tasks.

With the new version of AskiaField (its very sexy name is 5.4.2), we have pushed automation to a new level. You can entirely control AskiaField from a custom-made application – a standalone app or even a web application. Eventually this means that the AskiaField Supervisor will be a web page. But in the meantime, you can write a piece of software that creates a survey script in XML, uploads it to AskiaField, makes it live as a web survey, creates a list from your customer database, emails them, cleans the data by removing speeders with AskiaTools, analyses the data and produces a report that you can email every morning to your stakeholders – or provides them with a dashboard.
Automation is the future – you are going to need a few more geeks… oh and some air freshener!

AskiaAnalyse: team me up, Scotty!

The software packages in the Askia range have been designed to work alongside each other. If you know how your survey tree looks in Design, you will not be surprised by how it looks in Analyse. That's the whole point of an integrated suite.

Analyse was mainly designed for a single user creating their weightings, calculated variables and filters in their QES file.

When we realised that a lot of our users were working on continuous surveys, we introduced Surf files: the analysis definitions were stored in the Surf files so there was no problem whenever you were adding more data.

However, there were two types of user for which Askia Analyse was not performing well:

1. People importing their data from an external source (e.g. triple-S, SAV, Dimensions etc)

They would import their data, maybe create a question tree structure in Design and create multiple questions or loops in Tools. Then they would create variables, weightings and sub-populations in Analyse and save their portfolios.

Now if there was any problem with the source data – such as additional data or cleaning errors – the whole import would have to be done again. Of course, we provided ways to improve the speed of the process:

…but we wanted to make things quicker.

2. People within large teams

In large teams, a portfolio could be created by one person and run by another. It's not always the same people who create the Surf file and use it. Again, we had made sure that portfolios could be shared. If someone created a sub-population within a portfolio, it would be re-created in Analyse – but what about calculated variables, recodes and weightings?

To please all these users, we have implemented a new range of developments:

myView

Firstly, we have implemented myView: it lets users re-organise and hide variables, hide responses, and automatically associate calculated or grouped responses with questions. It's something we have given a lot of TLC to. Since the ‘my limited view’ definition is stored outside the QES or QEW (in an .mlv file), if your survey changes (after a re-import), the myView file can be re-used (and opened automatically).

Portfolio improvements

Secondly, we have stored a few more things in the portfolio: the weightings and even the calculated variables & recodes. This means that if you use any weighting, calculated variable or recode (as well as any tab-template or sub-population), its definition will be saved in the portfolio. If you re-open that portfolio on a different QES or QEW, all these definitions will be re-created automatically! If they already exist in the survey, the system will warn you if they are different.

Creating loops

Finally, an ambitious development scheduled for 5.3.5.5: we have decided to let people create loops in Analyse (well, they are called levels, aren't they?). This means that even if you have data in a Surf file, you will easily be able to bring questions into loops without writing edits or transforming several files in Tools.

The Electric Kool-Aid Askia Test

Abstract: Survey scripting and coding have lots in common and we should bring testing techniques into Survey Design. For this we have improved Random Data Generation and created a new Tools module called “Script Verification”.

SurveyMonkey, Google consumer surveys and other disruptive DIY technologies have changed the Market Research industry. Any marketing director can put together an online survey, get sample from a number of panel providers and have results to their strategic questions in hours.

But Askia software is not designed for marketing directors. It has been conceived for survey specialists, scripters, data processors who design and analyse complex surveys – sometimes long, sometimes algorithmically challenging, over long periods of time and eventually collecting millions of records. And with our target audience in mind we are continually improving our range of software. We want any design to be achievable, any layout, any number of records. It would be an exaggeration to call it Big Data but let’s say we specialise in “Medium Data”.

On the subject of interview data, I will only mention that in the last two years we have completely overhauled our way of storing data in SQL Server (5.3.3) and added a new compressed inverted data format (5.3.4). But I am digressing: the subject of this article is methodologically managing complexity.

Managing complexity with Askia's survey software

The challenge with complexity is that it invariably leads to human errors – their number growing exponentially with size, harder to spot and often spotted too late. The thing is, we, as programmers, know about complexity. Askia software is made up of millions of lines of code and, as some of you may have fleetingly experienced, it sometimes breaks. And, believe it or not, we coders have an aspiration to perfection: we constantly try new methodological or technical ways of testing our software so it works smoothly the second we release it. But any program that does anything more than sorting three numbers is bound to break, and we have to live with the fact that we will always deliver short of what we wished for – but hopefully learn from past mistakes.

Survey scripting is programming – unfortunately Market Research tools are a little bit behind (yes, we are aware of our responsibility there). Our first version of Design, in 1994, attempted to mimic the revolution Visual Basic had brought to the programming world in 1991. All basic functionalities were available in a Graphical User Interface. We made the layout WYSIWYG but still allowed programming through event-driven scripts hidden from the interface. Our AskiaScript still shows traces of that ancestry, with variables defined with Dim and For…Next loops – I'll admit that not everybody at Askia thinks it's a good thing, but that's the price to pay for backwards compatibility.

Reusability & object-oriented programming by Askia

Reusability is the key to decreasing development time and increasing reliability. For programmers, reusability is generally known as Object-Oriented Programming. In all of our software, we have tried to include reusable objects: generation settings and Internet options in askiadesign, tab-templates and clones in askiaanalyse, survey inheritance in askiasurf, libraries everywhere. Last but certainly not least, we have created Askia Design Controls: they enable (advanced) users to generate the perfect HTML / JavaScript for each PC / tablet / mobile target, whatever the browser, its version or its Operating System. ADCs encapsulate data, they are polymorphic (you can use them on different types of questions and browsers) and, because they are open source, it's up to you to give them inheritance.

There is another part of programming that we would like to bring to the Market Research industry: it’s testing – unit testing, integration testing, system testing. For the development of AskiaScript 2.0 we designed the tests before we wrote one line of code – this is called test-driven development (TDD). The number of bugs was minimal for a development of that size. Each time we found a fault, we added it to the list of tests to make sure it would never surface again in subsequent versions.

Test-driven development in survey design by Askia

Along with the spec of a survey, there should be a list of tests. These tests should be run by someone other than the survey scripter – and the tester should not peek into the routing code. Different people think differently, ensuring your tests cover more defects. We have put together a non-exhaustive list of tests:

  • Interview level: data presence for mandatory questions, skip routing testing, coherence between questions, testing links and response visibility.
  • Usability testing: testing each screen on every platform.
  • Aggregated testing: making sure quotas are respected, rotations are balanced, multiple questions have multiple responses.
  • System testing: ensuring the survey runs well on the server and that the data you produce is usable.

Long before considering a soft launch, the simplest way to see if your survey runs correctly is to generate random data. You have two ways of doing so: either by using AskiaTools' random data generator or by using a JavaScript simulator (see here). The JS simulator is a great way to achieve system testing.

System testing can also be achieved by exporting test interview data as .dat files and looking at the size of the individual .dat files: you will be able to measure the load they will incur on memory. Multiply this by the number of concurrent interviews you expect and you will have an idea of the specs you need on your server(s). Additionally, looking at the size of a .QES file – or preferably of the tables generated in SQL Server – will indicate how much hard drive space you will need.
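As a purely illustrative back-of-the-envelope example: if a single test interview produces a 200 KB .dat file and you expect 500 concurrent interviews, you should budget roughly 100 MB of memory for interview data alone – before counting the engine itself and the operating system.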

Random Data Generation in askiatools

We have recently added a lot of features to the Tools random data generator: you can define routings that are only run during random generation (to validate the screening, for instance), you can specify the behaviour when blocking error messages are displayed and, more importantly, you can import your quota settings and take them into account in your generation (all available in 5.3.3). Quota code is often complex, and going over quota could be an expensive mistake.

We have also created a brand new module in Tools 5.3.5 called “Verification scripts” (see here for more details). This allows a tester – remember, not the survey scripter – to create checks in AskiaScript that will be run on each interview. So you can verify that the question about credit cards has been asked if the interviewee has mentioned banks in another question. You simply write a check like this:

Assert.Check(Banks <> {} and CreditCards.HasNA, "Interviewee should have been asked the question about credit cards")
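For illustration only, here are two more checks written in the same condition-plus-message pattern as the example above (where the condition describes the situation to flag); the question names Age, PreferredBrand and AwareBrands are hypothetical:

' Hypothetical checks – same pattern as the credit cards example above
Assert.Check(Age.HasNA, "Age is mandatory and should always be answered")
Assert.Check(PreferredBrand <> {} and AwareBrands.HasNA, "A preferred brand was recorded but brand awareness was never asked")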

The scripts can be as long as you like; we have added If/Else conditions and Goto to help you create complex code that you can keep in one single text file. And you can write it within an environment – the Askia visual studio – where you get help and documentation on any object, method or keyword. You can run this on your randomly generated data, on your soft launch or on your full data set – each time you get a detailed report of how many checks have failed. At the time of writing, this is not released yet, but contact us if you want to try a beta.

In these scripts, we also want to have access to aggregated data… this will allow us to have one script that runs both interview-level testing and aggregated testing. We might want to test if an interview took less than 10% of the average length, or if a given response to a question falls outside a percentile. In other words, you might want to compare an interview's responses not just with other responses in the same interview but with the responses of all other interviews. The script grammar for this will be described in a forthcoming article – we are still passionately discussing it internally.

Usability testing in survey design by Askia

We have not covered usability testing here – not that we do not think it's important: we are constantly talking about it internally. We are putting together a range of tools for designing ADCs (so far codenamed ADCUtil – yes, we need something catchier) and we have added ways of visualising your HTML in other browsers in Design. But we need to understand when a display no longer works because of screen size, measure the bias triggered by no longer using JavaScript, count the number of heads of Internet Explorer 5 users – and there again we need your input and your ideas so we can automate these tasks.

In the meantime, I leave you with these great quotes:

“The act of maintaining software necessarily degrades it.” – Alain April

“It’s harder to read code than to write it.” – Joel Spolsky

“If you can’t measure it, you can’t improve it.” – Peter Drucker