Guest Blog: Tim Brandwood, codeit

As researchers, we understand the value of asking open-ended questions in our surveys. They help us get to the “why” that lies behind the answers to our closed questions. It’s no use finding out that your customers are dissatisfied if you don’t also ask them why.

The challenge with collecting this kind of free-text data is that it usually needs to be measured. For example, what are the top five most frequent reasons your customers are dissatisfied? How does that trend over time? And so on . . .

Turning qualitative data into quantitative measures usually involves coding the data, which is generally considered a time-consuming and painful process.
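To make "measuring" concrete, here is a minimal Python sketch of what happens once verbatims have been coded: the codes are tallied to give the most frequent reasons. The function name `top_reasons`, the example codes and the data layout (one list of codes per respondent) are illustrative assumptions, not part of codeit or Askia.

```python
from collections import Counter


def top_reasons(coded_responses, n=5):
    """Return the n most frequent codes as (code, count, share of respondents).

    coded_responses is assumed to hold one list of codes per respondent,
    e.g. [["price", "delivery"], ["price"]]. This layout is hypothetical.
    """
    total = len(coded_responses)
    if total == 0:
        return []
    # Count each code at most once per respondent, even if it was applied twice.
    counts = Counter(code for codes in coded_responses for code in set(codes))
    return [(code, count, count / total) for code, count in counts.most_common(n)]


# Four respondents, each coded with one or more (made-up) reasons.
example = [
    ["price", "delivery"],
    ["price"],
    ["support"],
    ["price", "support"],
]
for code, count, share in top_reasons(example, n=2):
    print(f"{code}: {count} respondents ({share:.0%})")
```

Running the same tally on each wave of a tracking study would give the trend over time.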

For several years, codeit has been helping Askia users overcome this challenge. Its in-built machine learning works alongside human coders, learning from them and automatically coding as much as it safely can, to considerably speed up the coding process.

This year, however, there’s a new kid on the block.

As you’ll no doubt be aware, “generative AI” (most notably, ChatGPT) has broken through into the mainstream and promises to disrupt many areas of the research world. Given the hype around this, surely we don’t need coders anymore? Now we can just pass our verbatim data through ChatGPT and it will automatically sift through it and spit out a perfect analysis in seconds, right?

So, can generative AI automate verbatim data analysis?

The short answer is no – it can’t do a complete and accurate analysis on its own.

The longer answer is that if all you need is a high-level read of the data, then generative AI is a reasonable tool to use. You can pass in your verbatims, ask it to summarise the main themes and you’re done.

But the problem with this approach is that it’s still more qualitative than quantitative. If you need reliable quantitative measures from your data, then you have to tread more carefully.

In our experiments, we have found that ChatGPT can generate a really good draft codeframe and autocode between 10% and 40% of your verbatims at a “human standard” level of accuracy. Whether you’re at the lower or higher end of this range depends on a number of factors. For example, the number of verbatims you have, the complexity of the text within them and whether you’re using GPT-3 or GPT-4 will all make a difference to the performance.

What about codeit’s machine learning?

If you’ve used codeit before, you will know that behind the scenes it contains a powerful machine learning capability using the latest deep learning techniques. As you code data, codeit learns by example and builds a model that is tuned to the nuances of your specific project, codeframe and coding examples. Once trained, codeit can use this model to autocode further verbatim data.
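As a rough illustration of what an autocoding rate means, the Python sketch below accepts a model's suggested code only when its confidence clears a threshold and leaves everything else for a human coder. This is not codeit's actual implementation: the `Prediction` class, the 0.9 threshold and the confidence scores are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    verbatim: str
    code: str          # code suggested by a (hypothetical) trained model
    confidence: float  # assumed to be a score between 0 and 1


def split_by_confidence(predictions, threshold=0.9):
    """Separate predictions into autocoded ones and ones left for human review."""
    autocoded = [p for p in predictions if p.confidence >= threshold]
    for_review = [p for p in predictions if p.confidence < threshold]
    return autocoded, for_review


def autocoding_rate(predictions, threshold=0.9):
    """Share of verbatims coded automatically at the given threshold."""
    if not predictions:
        return 0.0
    autocoded, _ = split_by_confidence(predictions, threshold)
    return len(autocoded) / len(predictions)


# Made-up predictions for five verbatims.
sample = [
    Prediction("Too expensive", "price", 0.97),
    Prediction("Arrived late", "delivery", 0.95),
    Prediction("Nobody answered my email", "support", 0.92),
    Prediction("It was fine I suppose", "other", 0.55),
    Prediction("Cost vs. what I got...", "price", 0.70),
]
print(f"Autocoding rate: {autocoding_rate(sample):.0%}")
```

Raising the threshold in a sketch like this trades a lower autocoding rate for fewer wrongly coded verbatims, which is one way to read the phrase "as much as it safely can".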

In our testing, codeit’s machine learning significantly outperforms ChatGPT, with autocoding rates typically around 60%. The reason for this comes down to specialisation. ChatGPT is a generalist (“Wide AI” in the jargon) – a jack of all trades, if you will. That means it can do lots of things quite well, but it isn’t exceptionally good at any one thing (it’s not going to beat you at Go anytime soon, for example). The machine learning built into codeit is a specialist (“Narrow AI” in the jargon) – it has been trained on your specific requirements and is optimised for the task of autocoding.

But here’s the rub – in order to do this, you need to feed it some example coding to learn from. This is great for large projects (1,000+ verbatims) or ongoing tracking studies, but for smaller, ad hoc projects it’s not as useful.

Clearly, then, generative AI can help plug this gap. If you’re beginning a project from a standing start, with no historical data to teach the system, then generative AI can get you a long way down the line really quickly. Most likely you will want to review the output, make refinements and sense-check the results, but this is still a big leap forward and a huge help.

OK, so now what?

Our sense is that the world is beginning to get a handle on generative AI. Some of the hysteria of early 2023 is dying down, and people are gaining a better perspective on its strengths and weaknesses and exactly how it can be useful.

At codeit we feel we have also learned a lot this year and we’ve been putting that learning into practice within our software. Next month we will be unveiling some exciting new features that will bring all of this work to fruition. Watch this space, as they say, or as ChatGPT would say: “I’m sorry, but I do not have access to current news updates as my knowledge only goes up until September 2021.”

If you’re interested in seeing an early preview of our new features, please contact your Askia Key Account Manager.

Photo by Mariia Shalabaieva on Unsplash