Quota: sticking to the script

Nobody likes quota. They have the off-putting echo of a well-wishing community reluctantly leaving Apartheid behind. If researchers mention quota, it’s because you did not hit the targets. If a financial director mentions them, it’s to tell you how you went over and blew the budget. You do not like quota – and as for us programmers, well, it was never our favourite part of the job either.

But with askiafield 5.4, we have put that behind us and made quota sexy. We have rebuilt the quota interface and the quota distribution engine. Upgrading an interface – although time-consuming – is rarely a problem. Well, we made it look cool, which was quite a bit of work.

Changing the entire quota engine is not something that one should approach lightly. We did it with extra care: we put together hundreds of unit tests (where we predict and verify the output of code) and integration tests (where a full automated run of CCA is monitored and the results analysed).

This refactoring had a few goals:

  • Simplify the interface(s): quota definition and quota monitoring can be done in the same window
  • Add functionality: multiple questions, numeric questions, grouped responses, and the removal of all limitations on the size of the quota tree
  • Expose through an API: the quota can be defined and monitored from a web interface – or automated from an external system (like Platform One)
  • Clarify quota scripting

This article does not focus on the actual functionalities of the quota – they are documented here – but on the impact of scripting quota through routing.

Why script quota?

Scripts are not usually used for screen-out quotas. These are usually dealt with automatically (by the dialler in CATI or by the automatic settings in quota). You want 500 males in region X – once you have them, the interview is simply terminated.

Typically you need a script when you have to take a decision about which concept(s) you want to test. You first ask which ads respondents have seen and then randomly pick 2 of them to question them about.

Ideally you want to select the ones that are the least filled – the ones furthest from their target counts or lowest compared to the target percentage. And you might have weird priorities to take into account (always test your client’s brand against another one, etc.).

The rules can be complicated but we have provided simple functions for this.

5.3: the unbearable weakness of strings

In 5.3, you had the possibility of querying the state of the quota by using IsQuotaFullFor, QuotaToDo, MaxQuotaToDo, and AvailableQuota.
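A typical 5.3-style call looked roughly like this – the question names, response codes and apostrophe comments below are purely illustrative, and the whole quota cell is addressed through a hand-typed string:

' Everything hangs on the string: a typo in a question name only shows up during fieldwork
Dim bIsCellFull = IsQuotaFullFor("Gender:1; Region:3")
Dim nToDo = QuotaToDo("Region:1; Product:2")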

It did the trick for a while but there were problems:

  • It was dependent on a string (e.g. QuotaToDo("Region:1; Product")). It was easy to spell it wrong and only realise that you had misspelled a question near the end of fieldwork.
  • It assumed you knew your quota tree – if you had not nested the Product within the Region (or decided to relax the rules near the end), you would get the wrong result.
  • The returned result only looked at one quota row at a time.
  • The target counts were not taken into account to prioritise your selection.

Quota in 5.4? Sorted!

Enter 5.4 – well 5.4.4 really. We have introduced new keywords: they are methods of questions instead of functions. In other words, you write something like Gender.AvailableQuota() instead.

  • AvailableQuota: returns an ordered list of the responses whose quota is still open. The ordering is done according to the counts: the first element is the response with the highest number of interviews still to be found.
  • AvailableBalancedQuota: Same as AvailableQuota but the ordering is done by the difference between targets and observed.
  • QuotaList: Same as AvailableQuota but all responses are returned (even the ones over quota).
  • BalancedQuotaList: Same as AvailableBalancedQuota but all responses are returned (even the ones over quota).

If you want to specify some additional information about the tree, you can: it works like this: Product.AvailableQuota(Gender: 1, Region: 3). This means no more spelling mistakes get in the way, as the compiler picks up on the fact that you have specified an incorrect question.

Another thing: if the gender and the region are specified in the interview, you do not need to indicate them – but you could, for instance, query the quota for another region.
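As a quick illustration (the question names are the ones used in this article, the response codes are made up), you could ask for the products still open for females in region 1 – whatever the current interview contains:

' The compiler validates the question names at design time, not at the end of fieldwork
Dim arrOpenProducts = Product.AvailableQuota(Gender: 2, Region: 1)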

But from now on, if you need to pick 2 products to test – regardless of the nightmare of a quota tree you may have defined – you simply write:

Dim arrProductsToTest = Product.AvailableBalancedQuota()

Return {} + arrProductsToTest[1] + arrProductsToTest[2]
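And if you would rather never run short of candidates, here is a variation sketched with the methods listed above: BalancedQuotaList returns every product in the same balanced order, so the over-quota ones simply end up at the bottom of the list.

' Over-quota products are still returned (last), so there are always candidates to pick
Dim arrAllProducts = Product.BalancedQuotaList()
Return {} + arrAllProducts[1] + arrAllProducts[2]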

Back compatibility – what is it good for?

You know we care about it. We really wanted to make sure that scripted surveys would work as usual. But we also wanted to ensure that the old weaknesses were gone. So all previous quota functions will still work with the old string… but we also took the liberty of sorting the result for your convenience… and of checking the whole quota tree in case a priority at top level interfered with one of the nested quotas.

So we have back-compatibility, but not quite: it’s simply better and more flexible – and where the old quota tree was failing, you will now get the expected results. We hope you agree.

Quota categories

The algorithm that decides whether a quota target applies to a given interview is actually quite complicated, but we are going to explain it as simply as we can… feel free to skip this (and trust us).

Let’s imagine we have a quota tree like:

Row  Quota row          To Do
     Root
1      Male             50
2        Product A      40
3        Product B      0
4      Female           40
5        Product A      15
6          Region 1     10
7          Region 2     5
8        Product B      15

Let’s look at how we run the following Product.AvailableQuota(Gender: 2) call:

  1. We will look for the availability of the first modality (then second…) – so first we will look at Product A.
  2. We count the number of targets we need to attain: one for the question object and one for each of the questions passed as a parameter (Product.AvailableQuota(Gender: 2) would mean 2 targets, Product.AvailableQuota(Gender: 1, Region: 2) would mean 3).
  3. We create a quota category where we set the Product (according to step 1) and we also set the parameters
  4. For all questions used in the quota, we look in the interview to see if we have data and we set it in the category.
  5. We are going to iterate through the tree – starting at the root
  6. When we hit a response for a question that’s defined in the quota category, we either explore the sub-tree or skip the branch. For example, for Product.AvailableQuota(Gender: 2), when we arrive at row 1 (Male), we would skip that entire branch and continue at row 4.
  7. We count the number of questions we have found which are part of our targets (as defined in step 2). If we are looking for Product A in Product.AvailableQuota(Gender: 2), we would hit that target on row 5.
  8. Once we have hit the target, we add all the sub-quota rows. So for Product A in Product.AvailableQuota(Gender: 2) we would select rows 5, 6 and 7. All the quota rows? Not quite! If Region 1 was set in the interview, we would not add row 7.
  9. Once the whole tree is scanned, if we have selected 0 rows, we remove one of the targets (like Gender or Region in the step 2 example) and start again at step 2.
  10. We go through all the selected rows and return the To Do with the most constraining value (the maximum of the minimum To Do and the minimum of the maximum To Do). Yes, you might have to re-read that last sentence – a worked example follows this list.
  11. Do the next response (Product B) and restart at step 1.
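To make step 10 concrete with the tree above – reading its single To Do column as the number of interviews still to do – Product A in Product.AvailableQuota(Gender: 2), with no region set in the interview, selects rows 5, 6 and 7 with To Do values of 15, 10 and 5; the most constraining of these, 5, is the value returned for Product A.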

There is added complexity for groups… if a response is in a group and has no targets, we use the first parent group that has a target.

If a response does not have a target, we assume that the To Do value is 1.

That’s it folks!

Conclusion

We think that AvailableQuota and AvailableBalancedQuota should cover 99% of the scripting needs. We’d love to have your feedback on this of course. We might later introduce a quota object where you will be able to query the actual min and max target or the priority… let us know when you need that and how you think it should work!

Big Data with just one digit

I know some of you think I only attend conferences for the free food, the drinks and the social scene. They are right – no point in me denying it. But in-between parties, I tend to heal my hangovers in the semi-darkness of conferences.

Coming back from ASC and ESOMAR, here are a few new tendencies in Autumn/Winter MRX fashion. Forget MROCs, gamification, mobile research, Big Data – that’s so last year… it’s mainstream, dude.

These days the cool kids talk about Automation, Data fusion, Artificial Intelligence… and the Tinderisation of research.

Automation – if you’re an assiduous reader of this blog, you know it’s coming and fully available at a software provider near you. I am not going to ramble on about this any more for now, but watch this space.

Artificial Intelligence is the next big thing in Research. It has been successfully used to post-code and (less successfully) to measure sentiment in open-ended questions and tweets. It’s also good at recognising logos and objects in pictures and films, building accurate predictive models and beating me at Go (well, the latter is not news and not strictly research)… but now AI is also used to merge data. There is an inconvenient truth about convenience panels… and MR data in general. If your survey is 40 minutes long (or 20 minutes on a mobile device), the resulting data will be awful: the participants are either too unusual to be trusted or they don’t care because they are not incentivised.

Although there is no evidence of the length of surveys diminishing (according to SSI), everyone agrees that it needs to happen. One way is to… well… make up data. You do not ask all the questions of everyone and you copy the data around for similar-looking interviews – this is called ascription (it has been around for some time). For you stat geeks out there, it’s traditionally done using the Mahalanobis distance. The new thing is to use machine learning to infer missing data. Mike Murray and James Eldridge from Research Now had a great paper about automating the splitting of surveys into chunks from their XML definition. Annelies Verhaeghe from Insites and John Colias from Decision Analyst also presented two great papers about enriching surveys with open big data.

And finally, after the uberisation of research which has seen the arrival of monkeys, gizmos, nuts & limes, the new trend is the tinderisation of research. Millennials (there were boos whenever the term was used – and that was every 47 seconds) take decisions with their index finger. Left means no, right means yes… and survey research should follow. It’s easy to understand, fast to answer and it’s your system 1 talking… And the index finger is not just for decisions… the navigation of a survey should be done through flicks of the index. Almost being a millennial myself (the NSA has the names of those who are laughing), I see the attraction… and we are soon to release something code-named Jupiter that might just make (or keep) Askia the best software for Generations Y and Z.

Enter the automation era!

It’s not new: Market Research is doing badly.

A few years back, to improve profitability, most major MR institutes started sub-contracting Survey Programming and Data Processing to Eastern Europe or Asia. This has not been enough. The next step to increase productivity is automation. The successful launch of Zappi Store has made everyone acutely aware of this.

Zappi Store uses Millward Brown’s or Brainjuicer’s methodology to run very formatted studies, entirely automated at unbeatable costs. They have a survey with a few customisable parameters – say the name of the brand, the logo and a list of competitors. With that, they purchase the sample and produce a PowerPoint presentation with all the key (automated) findings. Who needs researchers and analysts anymore? Actually you only need them once – to design the methodology.

At Askia, we have always known that automation was key to improving performance. At our last user conference, we presented what clients did to automate our system… and our system has always been very easy to integrate into a larger enterprise ecosystem because not all of our clients use our full range. Some just collect data, some just analyse data with Askia. So we have always conceived our software as bricks in a very heterogeneous Market Research wall.

The cement between these bricks is import and export to open standards, but also producing and documenting APIs: Application Programming Interfaces – that is, entry points for your own geeks to play with our toys. And if you do not have your own geeks, don’t despair: some independent geeks have decided to integrate our APIs so you can automate Askia tasks.

With the new version of AskiaField (its very sexy name is 5.4.2), we have pushed automation to a new level. You can entirely control AskiaField from a custom-made application – a standalone app or even a web application. Eventually this means that the AskiaField Supervisor will be a web page. But in the meantime, you can write a piece of software that creates a survey script in XML, uploads it to AskiaField, makes it live as a web survey, creates a list from your customer database, emails them, cleans the data by removing speeders with AskiaTools, analyses the data and produces a report that you can email every morning to your stakeholders – or provides them with a dashboard.
Automation is the future – you are going to need a few more geeks… oh and some air freshener!

AskiaAnalyse: team me up, Scotty!

The products in the Askia range have been designed to work alongside each other. If you know how your survey tree looks in Design, you will not be surprised by how it looks in Analyse. That’s the whole point of an integrated suite.

Analyse has mainly been designed to work with a single user, creating their weightings, calculated variables and filters in their QES file.

When we realised that a lot of our users were working on continuous surveys, we introduced Surf files: the analysis definitions were stored in the Surf files so there was no problem whenever you were adding more data.

However, there were two types of user for which Askia Analyse was not performing well:

1. People importing their data from an external source (e.g. triple-S, SAV, Dimensions etc)

They would import their data, maybe create a question tree structure in Design and create multiple questions or loops in Tools. Then they would create variables, weightings and sub-populations in Analyse and save their portfolios.

Now if there was any problem with the source data, such as additional data or cleaning errors, the whole import would have to be done again. Of course, we provided ways to improve the speed of the process:

…but we wanted to make things quicker.

2. People within large teams

In large teams, a portfolio could be created by someone and run by someone else. It’s not always the same people who create the Surf file and who use it. Again, we had made sure that portfolios could be shared. If someone created a sub-population within a portfolio, it would be re-created in Analyse – but what about calculated variables, recodes and weightings?

To please all these users, we have implemented a new range of developments:

myView

Firstly, we have implemented myView: it lets users re-organise and hide variables as well as hide responses and automatically associate calculated or grouped responses to questions. It’s something we have given a lot of TLC to. Since the ‘my limited view’ definition is stored outside the QES or QEW (in an .mlv file), if your survey changes (after a re-import), the myView file can be re-used (and opened automatically).

Portfolio improvements

Secondly, we have stored a few more things in the portfolio: the weightings and even the calculated variables & recodes. This means that if you use any weighting, calculated variable or recode (as well as tab-templates and sub-populations), their definition will be saved in the portfolio. If you re-open that portfolio with a different QES or QEW, all these definitions will be re-created automatically! If they already exist in the survey, the system will warn you if they are different.

Creating loops

Finally, an ambitious development scheduled for 5.3.5.5. We have decided to let people create loops in Analyse (well they are called levels, aren’t they?). This means even if you have data in Surf, you will easily be able to bring questions into loops without writing edits or transforming several files in Tools.

The Electric Kool-Aid Askia Test

Abstract: Survey scripting and coding have lots in common and we should bring testing techniques into Survey Design. For this we have improved Random Data Generation and created a new Tools module called “Script Verification”.

SurveyMonkey, Google consumer surveys and other disruptive DIY technologies have changed the Market Research industry. Any marketing director can put together an online survey, get sample from a number of panel providers and have results to their strategic questions in hours.

But Askia software is not designed for marketing directors. It has been conceived for survey specialists, scripters, data processors who design and analyse complex surveys – sometimes long, sometimes algorithmically challenging, over long periods of time and eventually collecting millions of records. And with our target audience in mind we are continually improving our range of software. We want any design to be achievable, any layout, any number of records. It would be an exaggeration to call it Big Data but let’s say we specialise in “Medium Data”.

On the subject of interview data, I will only mention that in the last 2 years we have completely overhauled our way of storing data in SQL Server (5.3.3) and introduced a new compressed inverted data format (5.3.4). But I am digressing; the subject of this article is methodologically managing complexity.

Managing complexity with Askia's survey software

The challenge with complexity is that it invariably leads to human errors – their number growing exponentially with size, harder to spot and often spotted too late. The thing is, we, as programmers, know about complexity. Askia software is made up of millions of lines of code and, as some of you may have fleetingly experienced, it sometimes breaks. And, believe it or not, we coders have an aspiration to perfection: we constantly try new methodological or technical ways of testing our software so it works smoothly the second we release it. But any program that does anything more than sorting three numbers is bound to break and we have to live with the fact that we will always deliver short of what we wished for – but hopefully learn from past mistakes.

Survey scripting is programming – unfortunately Market Research tools are a little bit behind (yes, we are aware of our responsibility there). Our first version of Design in 1994 attempted to mimic the revolution Visual Basic brought to the programming world in 1991. All basic functionalities were available in a Graphical User Interface. We made the layout WYSIWYG but we still allowed programming in event-driven scripts, hidden from the interface. Our AskiaScript still has traces of its ancestry, with variables defined with Dim and For…Next loops – I’ll admit that not everybody at Askia thinks it’s a good thing, but that’s the price to pay for backwards compatibility.

Reusability & object-oriented programming by Askia

Reusability is the key to decreasing development time and increasing reliability. For programmers, reusability is generally known as Object-Oriented Programming. In all of our software, we have tried to include reusable objects: Generations Settings and Internet options in askiadesign, tab-templates and clones in askiaanalyse, survey inheritance in askiasurf, libraries everywhere. Last but certainly not least, we have created Askia Design Controls: they enable (advanced) users to generate the perfect HTML / JavaScript for each PC / tablet / mobile target, whatever the browser, its version or its Operating System. ADCs encapsulate data, they are polymorphic (you can use them on different types of questions and browsers) and, because they are open source, it’s up to you to give them inheritance.

There is another part of programming that we would like to bring to the Market Research industry: it’s testing – unit testing, integration testing, system testing. For the development of AskiaScript 2.0 we designed the tests before we wrote one line of code – this is called test-driven development (TDD). The number of bugs was minimal for a development of that size. Each time we found a fault, we added it to the list of tests to make sure it would never surface again in subsequent versions.

Test-driven development in survey design by Askia

Along with the spec of a survey, there should be a list of tests. These tests should be run by someone other than the survey scripter – and the tester should not peek into the routing code. Different people will think differently, ensuring your tests cover more defects. We have put together a non-exhaustive list of tests:

  • Interview level: data presence for mandatory questions, skip routing testing, coherence between questions, testing links and response visibility.
  • Usability testing: testing each screen on every platform.
  • Aggregated testing: making sure quotas are respected, rotations are balanced, multiple questions have multiple responses.
  • System testing: ensuring the survey runs well on the server and that the data you produce is usable.

Long before considering a soft launch, the simplest way to see if your survey runs correctly is to generate random data. You have two ways of doing so: either by using AskiaTools’ random data generator or by using a JavaScript simulator (see here). The JS simulator is a great way to achieve system testing.

System testing can also be achieved by exporting test interview data as .dat files and looking at the size of the individual .dat files: this tells you the memory load each interview will incur. Multiply this by the number of concurrent interviews you expect and you will have an idea of the specs you need on your server(s). Additionally, looking at the size of a .QES file – or preferably of the tables generated in SQL Server – will indicate how much hard drive space you will need.
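As a purely illustrative calculation: if a completed test interview produces a 200 KB .dat file and you expect 500 concurrent interviews, that is roughly 100 MB of interview data in memory at peak – before counting the survey definition and the web server itself.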

Random Data Generation in askiatools

We have recently added a lot of features to the Tools random data generator: you can define routings that are only run during random generation (for validating the screening, for instance), you can specify the behaviour when blocking error messages are displayed and, more importantly, you can import your quota settings and take them into account in your generation (all available in 5.3.3). Quota code is often complex and going over quota could be an expensive mistake.

We have also created a brand new module in Tools 5.3.5 called “Verification scripts” (see here for more details). This allows a tester – remember, not the survey scripter – to create checks in AskiaScript that will be run on each interview. So you can verify that the question about credit cards has been asked if the interviewee has mentioned banks in another question. You simply write a check like this:

Assert.Check(Banks <> {} and CreditCards.HasNA, "Interviewee should have been asked the question about credit cards")

The scripts can be as long as you like: we have added If/Else conditions and Goto to help you create complex code that you can keep in one single text file. And you can write it within an environment – the Askia visual studio – where you get help and documentation on any objects, methods or keywords. You can run this on your randomly generated data, on your soft launch or on your full data set – each time you get a detailed report of how many checks have failed. At the time of writing, this is not released yet, but contact us if you want to try a beta.
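To give an idea of how the new If/Else conditions read in such a script, here is a sketch that follows the same convention as the one-liner above (the condition describes the situation to report); the exact If/Then/Else/Endif keywords are an assumption based on AskiaScript’s VB-flavoured syntax:

' Report interviews where the routing around the credit cards question looks wrong
If Banks <> {} Then
   ' Bank users must have been asked about credit cards
   Assert.Check(CreditCards.HasNA, "Interviewee should have been asked the question about credit cards")
Else
   ' Nobody else should have been routed to that question
   Assert.Check(not CreditCards.HasNA, "Interviewee should not have been asked the question about credit cards")
Endif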

In these scripts, we also want to have access to aggregated data… this will allow us to have one script that runs interview-level testing and aggregated testing. We might want to test if an interview took less than 10% of the average length or if a given response to a question is outside a percentile. In other words, you might want to compare an interview’s responses not with another single interview’s responses but with all other interviews’ responses. The script grammar for this will be described in a forthcoming article – we are still passionately discussing it internally.

Usability testing in survey design by Askia

We have not covered usability testing here – not that we do not think it’s important: we are constantly talking about it internally. We are putting together a range of tools for designing ADCs (so far codenamed ADCUtil – yes, we need something catchier), and we have added ways of visualising your HTML in other browsers in Design. But we also need to understand when a display no longer works because of screen size, measure the bias triggered by no longer using JavaScript, count the heads of Internet Explorer 5 users – and there again we need your input and your ideas so we can automate these tasks.

In the meantime, I leave you with these great quotes:

“The act of maintaining software necessarily degrades it.” – Alain April

“It’s harder to read code than to write it.” – Joel Spolsky

“If you can’t measure it, you can’t improve it.” – Peter Drucker