Striking the Right Balance: Big Data and Educational Data Mining under GDPR and the AI Act

For over a decade in higher education, I've seen first-hand how digitalisation is transforming data governance. From digital assessments and automated grading to sophisticated plagiarism detection, the world of education is increasingly driven by data. Big Data is reshaping how we teach, learn, and manage educational outcomes, especially in higher education institutions (HEIs). As more institutions integrate Learning Management Systems (LMS) and other digital platforms, an immense volume of data is generated, waiting to be mined for insights that could revolutionise student success and institutional efficiency. Yet this digital shift raises pressing questions: How is this data used? Who controls it? And, crucially, how do we explain AI system outputs amid ever-evolving regulations?


Defining Big Data in Higher Education

Today, HEIs handle vast amounts of data, commonly known as “Big Data,” encompassing everything from student demographics and academic records to behavioural data captured through digital learning platforms. What makes Big Data truly impactful are its defining characteristics: volume, velocity, variety, veracity, and value.

  • Volume: the sheer size of the datasets.
  • Velocity: the speed at which data is generated and processed.
  • Variety: the diversity of data types, from written assignments to click-stream data.
  • Veracity: the quality and reliability of the data.
  • Value: the insights that can be drawn from it, driving better decision-making and enhancing learning outcomes.


In education, Big Data exists at three critical levels: micro, meso, and macro.

  • Micro-level data (e.g., click-stream data) captures every detail of a student's interaction with digital platforms, providing real-time insight into engagement patterns and knowledge gaps so that educators can intervene with tailored support.
  • Meso-level data (e.g., textual data) delves into students' writing and communication, giving educators a deeper understanding of cognitive, social, and emotional development.
  • Macro-level data (e.g., institutional data) offers the broader picture, covering demographics, admissions, recruitment, and funding, enabling HEIs to make strategic, data-driven decisions on issues such as retention and long-term planning.


Together, these layers of data provide a rounded picture of student performance and institutional effectiveness, driving smarter, more proactive decisions in education.


The Promise of Educational Data Mining (EDM)

Educational Data Mining (EDM) is an exciting, emerging field focused on revealing patterns within educational data to enhance student success and institutional efficiency. By delving into enormous datasets, EDM uncovers trends that might otherwise remain hidden. Imagine using machine learning to identify students who may struggle, based on their past performance data. Once identified, HEIs can offer tailored support, such as additional tutoring or adaptive learning paths, aligned with each student's unique needs. However, these transformative opportunities come with serious ethical obligations around data privacy and responsible use.
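
To make this concrete, below is a minimal, purely illustrative sketch of the kind of at-risk prediction an EDM pipeline might run, written in Python with scikit-learn. The feature names (weekly LMS logins, average assignment score, attendance rate) and every data value are hypothetical; a real system would need consented, well-governed data, far more rigorous validation, and human review of every flag.

    # Hypothetical sketch of an EDM-style "at-risk" predictor.
    # All features and values are invented for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Per-student features: [weekly LMS logins, avg assignment score, attendance rate]
    X = np.array([
        [12, 78, 0.95],
        [3,  55, 0.60],
        [8,  82, 0.90],
        [1,  40, 0.45],
        [10, 65, 0.80],
        [2,  50, 0.55],
    ])
    # 1 = needed intervention in a past cohort, 0 = did not
    y = np.array([0, 1, 0, 1, 0, 1])

    model = LogisticRegression().fit(X, y)

    # Estimated risk for a new (hypothetical) student: 5 logins, 58 average, 70% attendance
    new_student = np.array([[5, 58, 0.70]])
    print(f"Estimated at-risk probability: {model.predict_proba(new_student)[0, 1]:.2f}")

In practice, a score like this should only ever prompt a human (a tutor or adviser) to reach out, never trigger an automated decision about a student, which is exactly where the GDPR's limits on solely automated decision-making and the AI Act's oversight requirements come into play.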


How Can HEIs Ensure Compliance with Regulations like the AI Act and GDPR?

HEIs have a responsibility to align their use of EDM with regulatory frameworks like the AI Act and GDPR, focusing on transparency, explainability, and data governance.

  • Transparency with AI Tools: The AI Act mandates transparency in AI systems, ensuring students and staff know when AI is in use and how their data feeds into it. HEIs must clarify how these systems function and how they influence decisions, such as predicting student success or dropout risk. Transparent communication builds trust and prevents a perception of opaque or hidden processes.
  • Explainability in AI: Explainability requires that AI-generated outcomes are understandable to humans. For HEIs, this means AI systems should clearly articulate the reasoning behind their outputs. If an EDM tool flags a student at risk of failing, it is essential to communicate the factors driving that prediction rather than present it as a mystery (see the sketch after this list).
  • Human Oversight: The AI Act emphasises human oversight, particularly in high-stakes decisions. HEIs should regularly assess the risks of their AI systems and ensure human involvement in decision-making, preventing unchecked, automated decisions that could adversely affect students' experiences.
  • Ethical Data Use: Beyond compliance, ethical data use is crucial. HEIs must establish clear data governance policies, avoid over-collection of data, and ensure AI tools serve students' best interests. Anonymisation and fair-use policies are essential to building trust and safeguarding students' rights.
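
To illustrate the explainability point above, here is a small, hypothetical sketch (again in Python with scikit-learn) of how the factors behind an at-risk flag could be surfaced in plain language. It simply reads the per-feature contributions of a linear model; the feature names and data are invented, and any explanation shown to students or staff should be reviewed by a human first.

    # Hypothetical sketch: explaining which factors drive an "at-risk" flag.
    # All feature names and values are invented for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    feature_names = ["weekly LMS logins", "avg assignment score", "attendance rate"]
    X = np.array([[12, 78, 0.95], [3, 55, 0.60], [8, 82, 0.90],
                  [1, 40, 0.45], [10, 65, 0.80], [2, 50, 0.55]])
    y = np.array([0, 1, 0, 1, 0, 1])  # 1 = needed intervention in a past cohort

    scaler = StandardScaler().fit(X)
    model = LogisticRegression().fit(scaler.transform(X), y)

    student = np.array([[5, 58, 0.70]])        # the flagged (hypothetical) student
    z = scaler.transform(student)[0]
    contributions = model.coef_[0] * z         # per-feature push towards "at risk"

    for name, c in sorted(zip(feature_names, contributions), key=lambda t: -t[1]):
        direction = "raises" if c > 0 else "lowers"
        print(f"{name}: {direction} the risk estimate (contribution {c:+.2f})")

A plain-language report built this way is far easier to communicate, contest, and act on than an unexplained score, which is very much the spirit of the AI Act's transparency provisions and the GDPR's duty to inform individuals about the logic behind automated processing.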



Join the Conversation on the Future of AI in Education

The journey of Big Data and AI in education is just beginning, and its impact on the way we teach, learn, and grow is profound. I invite you to join me at my talk on the Spotlight Stage on Friday, 29 November, 12:30–13:00, where we'll delve into the future of AI regulations and data-driven education. Let's explore how to harness the power of Big Data ethically and effectively to create smarter, transparent, and more personalised learning experiences for all.



Written for OEB Global 2024 by Lezel Roddeck.
