About the challenges of platform data access and our approach to data donation

Accessing data from social media platforms remains a major obstacle for researchers studying digital media. Despite existing EU legislation on data access, “the actual delivery of data by platforms is inconsistent and, in some occasions, limited to only a few researchers”, said Andreu Casas, who leads the cross-national and cross-platform data collection in the WHAT-IF project. This situation is unlikely to improve, especially given recent shifts in platform ownership and legislation, such as the changes at X (formerly Twitter).

To address this, our WHAT-IF team members Andreu Casas & Georgia Dagher (Royal Holloway University of London), Andreu Rodilla & Nienke Visscher (Barcelona Supercomputing Center) as well as Kasper Welbers and Wouter van Atteveldt (VU Amsterdam) have been working on adapting a data donation tool to collect user data across five different platforms and four countries in order to map how people engage with digital media in their everyday lives. This cross-platform and cross-country research has never been done at this scale, and it brings a number of technical and structural challenges. 

The technical challenge

One of the main issues is the lack of standardization and the heterogeneity of data structures. Platforms provide user data in various formats (HTML, JSON, JavaScript…), but these vary widely in structure, and it is often hard to tell which variables the individual fields represent. The data structures are poorly documented and often not research-friendly: only X offers a file that documents the overall structure and format of its export. In addition, the content and structure of the data vary over time; file names, structure and variable names can all change at any point and without notice.
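Because formats differ even within a single export, a practical first step is getting every file into a common in-memory shape. The sketch below is a minimal illustration, not the WHAT-IF team's code. It assumes that `.json` files parse directly and that `.js` files wrap a single JSON value in an assignment such as `window.YTD.tweets.part0 = [...]`, the pattern used in X archive exports. The function name `load_donation_file` is hypothetical, and HTML exports would need a separate HTML parser.

```python
import json
import re
from typing import Any

# Matches a leading JavaScript assignment such as "window.YTD.tweets.part0 = ".
# Assumption: the rest of the file is a single JSON value, optionally ending in ";".
_JS_ASSIGNMENT = re.compile(r"^\s*[\w.$]+\s*=\s*")


def load_donation_file(name: str, text: str) -> Any:
    """Hypothetical helper: parse a donated .json or .js file into Python objects."""
    if name.endswith(".js"):
        # Strip the variable assignment and any trailing semicolon, keeping the JSON value.
        text = _JS_ASSIGNMENT.sub("", text, count=1).rstrip().rstrip(";")
    return json.loads(text)


sample = 'window.YTD.tweets.part0 = [{"tweet": {"full_text": "hi"}}];'
print(load_donation_file("tweets.js", sample))
# [{'tweet': {'full_text': 'hi'}}]
```

A file that does not match the assumed pattern raises `json.JSONDecodeError`, which is arguably the right signal that an export format has changed.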

Our solution

To deal with this, we’ve created a workflow to process and extract information from data donation files from Instagram, Facebook, X, YouTube and TikTok more dynamically. Using a set of test files from our team members, we developed code that:

  1. automatically identifies data fields and their structure,
  2. builds a CSV file that maps all variable names within nested JSON (and similar types of) structures,
  3. uses this mapping to extract these fields dynamically while avoiding sensitive data.
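The three steps above can be sketched in Python, assuming each donated file has already been parsed into dicts and lists. Everything here is illustrative rather than the team's actual code. The path convention (`.` between keys, `[]` for list items), the `path,types` CSV columns, the function names `iter_leaves`, `build_variable_map` and `extract_fields`, and the idea that step 3 takes an allowlist of paths chosen after reviewing the CSV are all hypothetical.

```python
import csv
import io
from typing import Any, Iterator


def iter_leaves(node: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Walk nested JSON-like data and yield (path, value) for every leaf.

    Dict keys are joined with "."; list items share the segment "[]", so all
    items of a list map onto the same variable name.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for item in node:
            yield from iter_leaves(item, f"{prefix}[]")
    else:
        yield prefix, node


def build_variable_map(data: Any) -> str:
    """Steps 1-2: list each distinct path with its observed leaf types, as CSV text."""
    types: dict[str, set[str]] = {}
    for path, value in iter_leaves(data):
        types.setdefault(path, set()).add(type(value).__name__)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["path", "types"])
    for path in sorted(types):
        writer.writerow([path, "|".join(sorted(types[path]))])
    return out.getvalue()


def extract_fields(data: Any, allowed: set[str]) -> list[tuple[str, Any]]:
    """Step 3: keep only leaves whose path is on the (hypothetical) allowlist."""
    return [(path, value) for path, value in iter_leaves(data) if path in allowed]


# Invented sample data for illustration only.
donation = {"videos": [{"title": "a", "watched_at": "2024-01-01", "viewer_email": "x@y.z"}]}
print(build_variable_map(donation))
# path,types
# videos[].title,str
# videos[].viewer_email,str
# videos[].watched_at,str
print(extract_fields(donation, {"videos[].title", "videos[].watched_at"}))
# [('videos[].title', 'a'), ('videos[].watched_at', '2024-01-01')]
```

With an allowlist, any field not explicitly approved, including a newly added one that might hold contact details, is left out by default.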

The result is a flexible system that lets us detect structural variation between donations and across platforms, which will streamline the data collection process. It should also prevent loss of information and minimize problems with the code once we are in the field. While this is a good solution for our case, researchers need to be actively included in shaping how data access is regulated and implemented. New EU legal frameworks, in particular the Digital Services Act (DSA), promise to facilitate scientific research into platforms through new data access rights for researchers. However, as the obstacles we have mapped here show, data access alone is not enough. To truly facilitate research into platforms, data access rights must be operationalized in a way that provides reliable, consistent, well-documented and standardized data across platforms.
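One way to implement the structural-variation check mentioned at the start of the paragraph above is to compare the set of paths found in a new donation with the paths the extraction code expects, and to flag anything missing or new before data is silently dropped. This is a hedged sketch: the function names, the path convention (`.` for keys, `[]` for list items) and the `missing`/`unexpected` report format are hypothetical, and the sample data is invented.

```python
from typing import Any, Iterable


def leaf_paths(node: Any, prefix: str = "") -> set[str]:
    """Collect leaf paths, using "." between dict keys and "[]" for list items."""
    paths: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            paths |= leaf_paths(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for item in node:
            paths |= leaf_paths(item, f"{prefix}[]")
    else:
        paths.add(prefix)
    return paths


def structural_diff(expected: Iterable[str], donation: Any) -> dict[str, list[str]]:
    """Compare a donation's leaf paths with those the extraction code relies on."""
    found = leaf_paths(donation)
    expected_set = set(expected)
    return {
        "missing": sorted(expected_set - found),     # expected fields that are absent
        "unexpected": sorted(found - expected_set),  # new or renamed fields to review
    }


expected = {"videos[].title", "videos[].watched_at"}
# Invented later export in which one field has been renamed.
later = {"videos": [{"title": "a", "watchedAt": "2024-01-01"}]}
print(structural_diff(expected, later))
# {'missing': ['videos[].watched_at'], 'unexpected': ['videos[].watchedAt']}
```

A rename shows up as one missing and one unexpected path, which is exactly the kind of silent change that would otherwise only surface once data collection is under way.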

Looking ahead: Why regulation must go further

This requires researchers to actively communicate how they need data access tools to be designed, and regulators such as the European Commission to take those needs seriously in their enforcement and regulatory guidelines. The Commission has recently taken a first step in this direction by setting binding rules that operationalize researchers' right to access data under the DSA. As this right becomes fully operational, the Commission must continue to ensure that data access tools are designed to address the obstacles researchers face in practice. Only then will regulation actually enable high-quality independent research on the benefits and perils of digital media for society and democracy.

© 2025.