Learning analytics in the age of big data

By John Helmer

What we can’t measure, we can’t manage, according to the business cliché. And suddenly, it seems, we are able to measure a lot more than we could before: there has been an explosion in new data sources. So is this making businesses more manageable? It’s certainly having effects – in all parts of the enterprise, including learning. Learning analytics is becoming increasingly important for L&D. But do L&D departments know how to use the new learning analytics effectively?

Our latest Think Tank takes this as its subject, with a specific focus on how we can deploy actionable insights and analytics from data to fine-tune learning programmes.

As an introduction to the blog posts and reports that will come out of the Think Tank in the weeks and months to come, let’s take a look at this new data hoard, and the kinds of structured and unstructured data that are available to learning departments.

Highlights Report from the Think Tank is now available here 

The data deficit – leveraging your waste products

Driving this explosion of new data sources is the wholesale digitisation of business processes that has taken place in recent years, in the wider context of an ever increasing proportion of human activity being mediated by digital technology.

Every digital interaction produces data, as an inevitable by-product. Consequently, as digital penetration increases, the amount of data we have access to grows.

But if data is a by-product, in many cases it feels more like a waste product – because while huge amounts of learner data are being generated every day, the largest proportion of that data is thrown away unused.

As an aside, to see how powerful waste products can be in a business context, you only have to look at Marmite. Marmite is perhaps the UK’s favourite waste product. It was a great moment in business when somebody said, ‘that gunk we scrape out of the barrels at the end of the beer-brewing process –  why don’t we put that in jars and persuade people to spread it on their toast?’

For me, as a marketing person of many years’ standing, this waste of data seems a particular shame, since for most of my career just getting hold of data was about the most difficult, time-consuming and expensive thing I had to do, and now it is so much easier. In fact you have so much data, almost for free, that the bigger difficulty is trying to make any sense of it.

This, in essence, is the opportunity and the challenge thrown up by what has come to be called Big Data. Suddenly there is this profusion of data, much of it ‘just there’, requiring no special effort to extract, the difficulty now being how you make sense of what the data are telling you. A further problem is posed by the fact that it is often data about questions you haven’t even started to formulate – the answers to questions you haven’t yet asked.

In talking about the problems posed by Big Data, Gartner and others talk about the ‘three Vs’:

  • Volume
  • Velocity
  • Variety

To start with, Big Data is something of a misnomer in this context, since datasets in learning analytics really aren’t that large. The term was coined to cover datasets such as those generated by the Large Hadron Collider, measured in petabytes (10 to the power of 15 bytes). Even in a global organisation, total staff numbers and the relatively small amounts of data generated by corporate systems put us in a different order of magnitude.

However, the increasing shift to digital multiplies the amount of data generated by any one of those employees, well beyond what was previously envisaged. Think of a single employee accessing a learning portal. Just a glance at Google Analytics for that site will show what a baffling profusion of data one person can generate: where did they enter, where did they exit, what pages did they visit, what was their dwell time on each, what browser did they use, were they on mobile, tablet or desktop? So each of your previous data points gets multiplied many times over.
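To make the point concrete, here is a minimal sketch of the kind of clickstream a single portal visit can yield. All of the event fields and page names below are hypothetical, invented for illustration – a real Google Analytics export carries many more fields per hit.

```python
# Hypothetical clickstream events for ONE employee's single portal visit.
events = [
    {"page": "/home",         "dwell_secs": 12,  "device": "mobile", "browser": "Chrome"},
    {"page": "/courses",      "dwell_secs": 45,  "device": "mobile", "browser": "Chrome"},
    {"page": "/courses/fire", "dwell_secs": 310, "device": "mobile", "browser": "Chrome"},
]

# Even this tiny trace answers several of the questions listed above.
pages_visited = [e["page"] for e in events]
entry_page, exit_page = pages_visited[0], pages_visited[-1]
total_dwell = sum(e["dwell_secs"] for e in events)

print(entry_page, exit_page, total_dwell, len(pages_visited))
```

Three page views from one visit already produce entry and exit points, per-page dwell times, and device and browser details – multiply that by every visit, every employee, every day, and the profusion becomes clear.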


Not only is there more data, it now arrives much faster. In fact it is near-real-time or real-time, and this has completely changed the way we look at data. In the old days you used to wait days, weeks or months for reports to come through – sometimes quarters before you really knew how you had done. Nowadays you can see within hours whether your pilot launch has been a fantastic (or more partial) success.

This gives you an enhanced ability to iterate. And when data is coming in real time, you can change and adjust the way you are communicating with your people in real time too. The virtual classroom is a good example: many VC products allow you to see how many of your participants are actively engaged with what is happening in the session and how many have zoned out – so you can see straight away what is resonating with them and what is proving harder work. You can run instant polls to gauge opinion and reaction, and adjust the session content accordingly.


Most of the data we habitually use in learning is structured data (e.g. test scores, course completions). With these new data sources, what comes in is a mix, with most of what has been added being unstructured. A lot of what comes in from Twitter, Facebook and Google Analytics is structured, but in ways that are not necessarily compatible with your organisation’s normal data processing routines.

Then again, data can be in formats and media types (e.g. videos, slideshares, images, memes, free text) that you are not set up to analyse. New forms of analysis, such as sentiment analysis for social media, will require a learning curve (no pun intended) for many L&D professionals.
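To give a flavour of what sentiment analysis involves, here is a deliberately naive lexicon-based scorer. The word lists and feedback comments are purely illustrative – production sentiment tools use far richer models – but the principle of scoring free text against vocabularies of positive and negative terms is the same.

```python
# Illustrative word lists; real sentiment lexicons contain thousands of entries.
POSITIVE = {"great", "useful", "clear", "engaging", "helpful"}
NEGATIVE = {"boring", "confusing", "slow", "useless", "dull"}

def sentiment_score(comment: str) -> int:
    """Return (#positive words - #negative words) in a free-text comment."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

feedback = [
    "Great session, really clear and engaging!",
    "Found the pacing slow and the slides confusing.",
]
scores = [sentiment_score(c) for c in feedback]
print(scores)
```

A positive score suggests favourable feedback, a negative one the opposite – a crude but immediate way to turn unstructured comments into a structured signal that L&D can track.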

It is a confusing world.

Structured data in learning analytics

A possible reason why Learning has been slow to get to grips with this phenomenon is that it is so different from the way measurement has historically been done in learning. We have tended to have a lot of highly structured data around a relatively small number of data points.

In the digital world, progress on data standards has generally been slow and painful. A lot of work was done to establish SCORM as the common language for learning management systems, and our world has largely run on that since the introduction of SCORM 1.2 some 15 years ago. There was an attempt to update this standard, SCORM 2004, which never really stuck, and now we have xAPI, which has also been slow to establish itself.
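Where SCORM reports course-centric data (completions, scores), xAPI records any learning activity as a simple actor–verb–object statement sent to a Learning Record Store. Here is a minimal sketch of such a statement; the verb URI is from the standard ADL vocabulary, but the actor and activity identifiers are made up for illustration.

```python
import json

# Minimal xAPI statement: who (actor) did what (verb) to what (object).
statement = {
    "actor": {"mbox": "mailto:jane.doe@example.com", "name": "Jane Doe"},
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/completed",
        "display": {"en-US": "completed"},
    },
    "object": {"id": "http://example.com/courses/fire-safety"},
}

# In practice this JSON payload would be POSTed to an LRS endpoint.
payload = json.dumps(statement)
print(payload)
```

Because a statement can describe anything from finishing a course to watching a video or attending a meeting, xAPI can in principle capture exactly the unstructured, beyond-the-LMS activity discussed above – which is why its slow adoption matters.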

The famous Kirkpatrick model of training evaluation has been with us for even longer, but here the complaint has been that nobody ever gets much beyond level one, passing out ‘happy sheets’ at the end of the day’s training. Nobody evaluates, it is said, because it is too expensive and difficult.

My suspicion is that a lot of what we want to know, and that would come through a Kirkpatrick model, is still difficult and expensive to get at. But, in the meantime, we have this additional layer of largely unstructured data from new sources that poses both an opportunity and a challenge. Does it tell us things we really need to know, or is it distracting us from more important questions? And how could we use this new information to influence the way we do learning?

These were among the questions we posed to the delegates at our Think Tank. Watch out for the reports from that event, which we will be issuing in the coming weeks. If you want to be kept up to date with this and other outputs from our Thought Leadership programme, sign up to our newsletter list.
