Big data has become the hot topic of the moment. The BBC and the Guardian have published whole series of articles and programmes about how it is changing lives, cityscapes, mapping and even art. It is reportedly capable of anything, from improving Australia’s already stringent border controls to analysing the latest fashion trends. And of course, the EdTech sector won’t be left out. If the hype is to be believed, it is a revolutionary driver, poised to restructure lessons, lectures, admissions processes, examinations, textbooks and practically everything else we associate with teaching. But what does the nebulous term “big data” mean, how can we really use it in education, and what sort of ethical problems does it throw up?
By Alasdair MacKinnon
While size is a characteristic of big data sets, it is not their defining feature. Since the beginning of the computer age, our ability to produce and store data has increased roughly exponentially. Today, it is claimed, 2.5 quintillion bytes of high-velocity data are created daily, and 90% of the data ever produced was produced in the last two years (IBM). Advances in ICTs have allowed us to measure more variables, more quickly, than ever before. A flight from London to New York in 1970 would have produced only a handful of data points: now a single jet engine alone produces 10 terabytes of data every 30 minutes.
Advances in connectivity and computation have given us the ability to measure the world around us second by second: what big data analysis adds is the ability to compare the previously overwhelming quantities of data this sort of measurement produces. Key to understanding the impact of big data is not its magnitude, but its interconnectedness – the ability to compare multiple variables in real time.
It is clear that this new capability could have a huge impact on education. A university, for example, is constantly receiving data about its students from a huge variety of sources. It’s not just about the exam results and assignment marks: any university library with a swipe card system will be producing banks of data on reading habits, time spent in the library, working patterns and so on. The University of Huddersfield plotted this very data against dropout rates a few years ago, finding that students who did not use the library were seven times more likely to drop out. It was a dramatic statistic which would have been hard, if not impossible, to extract by traditional methods.
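As a toy illustration of the kind of comparison involved – not Huddersfield's actual methodology, and using invented records and figures – joining library-usage data with dropout records and comparing the two groups might look like this:

```python
# Hypothetical student records: (student_id, library_visits, dropped_out).
# All data invented for illustration; this is not the Huddersfield dataset.
records = [
    ("s01", 42, False), ("s02", 0, True),  ("s03", 15, False),
    ("s04", 0, True),   ("s05", 30, False), ("s06", 0, False),
    ("s07", 8, True),   ("s08", 0, True),  ("s09", 25, False),
    ("s10", 0, True),
]

def dropout_rate(students):
    """Fraction of students in the group who dropped out."""
    if not students:
        return 0.0
    return sum(1 for _, _, dropped in students if dropped) / len(students)

# Split the cohort by whether they ever used the library.
non_users = [r for r in records if r[1] == 0]
users = [r for r in records if r[1] > 0]

rate_non_users = dropout_rate(non_users)  # 4 of 5 non-users dropped out
rate_users = dropout_rate(users)          # 1 of 5 library users dropped out
```

The point is not the arithmetic, which is trivial, but the joining of sources: swipe-card logs and enrolment records are held by different systems, and only when linked at scale do patterns like this become visible.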
What’s more, as the use of learning technology increases, the amount of data available for analysis can only increase: every student’s test score could theoretically be analysed in real time to give a prediction of their eventual exam results and suggest ways to improve their learning. ONLINE EDUCA BERLIN keynote Dr Viktor Mayer-Schönberger recognised this in a recent interview for the News Service. “This reflects the power of big data,” he said, “that with huge quantities of data we can gain insights that we could not with smaller amounts, and these insights will improve decision-making in all aspects of our society.”
There are many who will have read the previous paragraph with a sense of rising discomfort. And justifiably: “data” has, over the last few months, become practically synonymous with “surveillance”. Data mining is, in the popular conception, a dirty pursuit carried out by government listening stations and corporations eager to sell personal information to the highest bidder.
But perhaps what is to be feared is not the surveillance itself, but the decisions made using such data: for the boundary between prediction and prejudice is very thin. The greater understanding data analysis offers could clearly allow schools to tailor their support to individual students’ needs, depending on what the big data predictions, based on the student’s demographics, habits and academic record, suggest. But from this point, active and harmful discrimination could be just a small step away.
There is another way in which big data can enforce prejudice, and at an altogether more fundamental level. Involving sets of a size too great for traditional analysis, big data relies on complex systems of algorithms: to extract results, machines have to be taught to replicate, over and again, some of the processes of human thought. Yet, as documented in the British Medical Journal in 1988, algorithms can turn out to be only as good as the people who make them.
In the 1970s, St George’s Hospital Medical School introduced an algorithm to automate the first stage of admissions, hoping to reduce work and eliminate any inconsistencies in the process. It was refined carefully, until in 1979 it could reproduce with 90–95% accuracy the same results as a selection committee. It was not until 1986 that it was discovered that the system was unfairly discriminating against women and people with non-European-sounding names: the algorithm was reflecting the bias inherent in the original selection process. The staff had unwittingly created a racist, sexist computer.
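The mechanism is easy to reproduce in miniature. The sketch below is not the St George’s algorithm – all names, scores and decisions are invented – but it shows how a system “trained” to replicate past decisions inherits whatever bias those decisions contained:

```python
# Toy illustration of an automated screen inheriting bias from the
# historical decisions it is built to replicate. All data is invented.
# Each historical application: (grade_score, is_woman, committee_accepted)
history = [
    (90, False, True), (85, False, True), (80, False, True),
    (90, True,  False), (85, True,  False), (95, True,  True),
]

# "Learn" a per-group threshold: the lowest score the committee ever
# accepted within each group. This mimics fitting the model to reproduce
# past outcomes rather than encoding any explicit judgement of merit.
thresholds = {}
for score, is_woman, accepted in history:
    if accepted:
        thresholds[is_woman] = min(thresholds.get(is_woman, 101), score)

def screen(score, is_woman):
    """Accept if the score meets the threshold learned for that group."""
    return score >= thresholds[is_woman]

# Two applicants with identical grades now get different outcomes:
screen(85, False)  # accepted
screen(85, True)   # rejected
```

No line of this code mentions discrimination, and the “model” faithfully matches the committee’s past verdicts – which is precisely the problem: high agreement with historical decisions is no guarantee of fairness.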
Anyone using big data analysis would therefore be wise to heed these two ethical considerations. Firstly, the results big data gives are certainly powerful – but they are general, and do not tell you anything about individuals. Secondly, their conclusions are necessarily inductive, and therefore based on current knowledge. Big data reflects the world as it is and not necessarily as it will be.
ONLINE EDUCA BERLIN keynote Dr Viktor Mayer-Schönberger’s interview can be read here. The Conference (4 – 6 December, 2013) will feature several other takes on the rapidly-developing sector of learning analytics. For those interested in how real-time data analysis can help students, Prof Samson Perry’s talk, “Data Mining Student Notes and Questions to Provide Personalised Feedback” will prove highly engaging. Or to discover how teachers themselves are adapting to data, take in Erik Woning’s talk, “A Teacher’s View on Learning Analytics: Controlling the Data or Being Controlled by Data?”
See other programme highlights here.