Making the most of our data
In a dedicated effort to develop open research data practices in electron microscopy and materials science, Paul Scherrer Institute PSI and collaborating institutions are awarded nearly 3 million Swiss Francs in funding from the ETH Domain.
We recently calculated that if you put all the data that PSI produces in one year onto old-fashioned floppy disks and stacked them up, one on top of another, you could reach nearly to the moon. As the amount of data produced in scientific experiments skyrockets, so do our means to exploit it, thanks to the growth of computational techniques such as artificial intelligence and data mining. But making the most of this data and the huge potential it holds is only possible if it is appropriately stored and accessible.
“You don’t know how another piece of research may be enabled from a piece of the puzzle that you have,” says Alun Ashton, head of Science IT at PSI and a member of the steering committee of the ETH Domain Open Research Data (ORD) program. The program, along with the many other ORD initiatives across Europe, aims to put in place measures to ensure that scientific output from publicly funded research is available and accessible to everyone. Alongside improving efficiency, this aims to improve transparency and traceability, since this data is the evidence behind scientific findings.
Achieving this isn’t a simple task. Millions of gigabytes of data are produced per year at research infrastructures such as the ones at PSI, a number that is growing as technological advances such as SwissFEL and the SLS 2.0 upgrade give increasingly detailed insight into the nature of matter and materials. As well as the practical challenges of storing such a large amount of data, the principles of ORD state that it can’t just be a jumble of numbers; it must be Findable, Accessible, Interoperable, and Reusable (FAIR).
Now, in collaboration with other institutions of the ETH Domain, researchers from the four different research areas of PSI have been awarded nearly 3 million Swiss Francs in funding from the ETH Domain ORD program to implement open data practices in a range of research areas. These include two extensive and high-impact initiatives, so-called ‘Establish’ projects, in electron microscopy (EM) and materials science.
Open electron microscopy data repository planned at PSI
The ‘Open EM Data Network’ project, which brings together researchers from PSI, EPFL, ETH Zurich and EMPA, received 1.5 million Swiss Francs of funding to establish ORD practices for electron microscopy (EM) in Switzerland. In addition, it has successfully received a further 920 000 Swiss Francs of funding from swissuniversities, bringing on board all electron microscopy centres at academic sites in Switzerland.
Here, the challenge lies in handling the large data sets and developing the computational resources to do so. In the life sciences, cryo-electron microscopy has experienced a "resolution revolution", enabling protein structures to be determined at atomic resolution. Similarly, in materials science, electron microscopy has seen a dramatic expansion of possibilities and multidisciplinary approaches. With these advances came rapid increases in data volume.
The project, led by EPFL, aims to set up a searchable repository for electron microscopy data, hosted at PSI. “We bring our experience from the large scale facilities in dealing with very large data streams, automatically archiving them and making it available, together with our established infrastructure for doing this,” explains Spencer Bliven, applicant and scientist in the Laboratory of Simulation and Modelling at PSI.
The Open EM Data Network will build on the existing and very successful SciCat Data Catalog developed at PSI for the large-scale facilities, which stores data according to FAIR principles and can make it publicly available after an optional embargo period of 3 years. In extending this to electron microscopy, a key aspect will be standardisation and streamlining to ensure easy transfer of data and Swiss-wide access regardless of the electron microscopy facility used. The project complements the ambition to advance electron microscopy technology in Switzerland as outlined in the “EM-frontiers” initiative, which was recently put forward by the ETH Board for the Swiss Roadmap for Research Infrastructures 2023.
Linking data from simulations and experiments in materials science
In the field of materials science, nearly 1.3 million Swiss francs were awarded to a collaboration led by Giovanni Pizzi from PSI, together with researchers from EMPA and ETH Zurich, which focuses on linking data between simulations and experiments. “At the moment, there is a mismatch between how you store and represent open data and machine actionable data in simulations and experiments,” explains Pizzi, who is the group leader of the Materials Software and Data group at PSI.
To overcome this mismatch, the researchers will draw on two existing open platforms: openBIS, a data management platform developed at ETH Zurich, and AiiDA, a workflow management system developed at EPFL and PSI. The two platforms focus on different parts of the data life cycle: experiments and simulations, respectively.
Making these data interoperable will streamline the development of simulation-assisted experiments. Ultimately, the hope is that this will facilitate the establishment of autonomous laboratories that combine simulations with experiments via self-driven artificial-intelligence algorithms and produce open data in a way that is FAIR-by-design.
Projects also funded in atmospheric chemistry and time-resolved serial crystallography
In addition to these two ‘Establish’ projects, funding has been awarded to projects to implement ORD practices in environmental chemistry, led by Thorsten Bartels-Rausch (scientist in the Surface Chemistry research group), and in time-resolved serial crystallography, led by Filip Leonarski (beamline data scientist in the Laboratory for Macromolecules and Bioimaging). Together, these demonstrate PSI’s commitment to the cause across its diverse disciplines.
Building on Open Data practices established in photon and neutron science
As the largest federal research institute in Switzerland, PSI recognises that the data it produces is a valuable resource for national and international researchers. The most recently funded initiatives come as part of a long-term commitment at PSI to open data and complement developments already made in photon and neutron science (see ExPaNDS and PaNOSC).
“During the recent SARS-CoV-2 and Zika virus outbreaks, many research groups immediately made their data open access, which was pivotal to the understanding of the infection and contributed to the SARS-CoV-2 vaccine being developed so quickly. Open access data is key to our ability to tackle a range of global crises and societal challenges,” comments Alun Ashton.
Text: Paul Scherrer Institute / Miriam Arrell