Brussels / 3 & 4 February 2024


CERN's Open Source Storage Systems

CERN (European Organization for Nuclear Research) is one of the world's largest and most respected centres for scientific research. It is home to the world's largest particle accelerator (Large Hadron Collider, LHC) and is the birthplace of the Web.

CERN's Storage and Data Management Group is responsible for enabling data storage and access for the CERN laboratory, in particular the long-term archival, preservation and distribution of LHC data to a worldwide scientific community (WLCG). Today we operate heterogeneous software-defined disk and tape storage systems, with several large EOS disk farms, the CERN Tape Archive (CTA) system and other storage solutions. In total the group manages more than one exabyte of storage across about 2,000 data servers (60,000 disks) and 50,000 high-capacity tapes, and orchestrates more than 4 exabytes of data transfers every year. More than 30,000 users need to access these data from their computers in a user-friendly way, which has been made possible through the CERNBox project, CERN's open source cloud sync and share platform. The group also operates a large Ceph cluster (over 60 petabytes), mostly to support the CERN OpenStack Cloud Computing infrastructure.
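To give a feel for the scale of these figures, here is a rough back-of-envelope sketch in Python. It only rearranges the numbers quoted above. The unit conventions (a decimal exabyte of 10^18 bytes, a 365-day year) and the function names are assumptions made for illustration, and the results are averages, not measured service rates.

```python
# Back-of-envelope figures derived from the numbers quoted in the abstract.
# Assumptions (not from the source): 1 EB = 10**18 bytes, a 365-day year.

EXABYTE = 10**18
SECONDS_PER_YEAR = 365 * 24 * 3600

DATA_SERVERS = 2_000                   # "about 2,000 data servers"
DISKS = 60_000                         # "60,000 disks"
YEARLY_TRANSFER_BYTES = 4 * EXABYTE    # "more than 4 exabytes", so a lower bound


def disks_per_server(n_disks: int, n_servers: int) -> float:
    """Average number of disks attached to one data server."""
    return n_disks / n_servers


def average_throughput_gb_per_s(total_bytes: int, seconds: int) -> float:
    """Average sustained rate in decimal gigabytes per second."""
    return total_bytes / seconds / 10**9


if __name__ == "__main__":
    per_server = disks_per_server(DISKS, DATA_SERVERS)
    rate = average_throughput_gb_per_s(YEARLY_TRANSFER_BYTES, SECONDS_PER_YEAR)
    print(f"average disks per server: {per_server:.0f}")
    print(f"average transfer rate: at least {rate:.0f} GB/s")
```

Under these assumptions the average works out to roughly 30 disks per data server and a sustained transfer rate on the order of a hundred gigabytes per second, averaged over the whole year.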

All of these systems are open source and available for use and on-premises hosting.

In this talk we would like to give you a high-level view of the following technologies and how they work together to satisfy the storage needs of the organisation, from the early phases of data taking to distribution and final end-user analysis.

EOS provides a service for storing large amounts of physics data and user files (accounting today for more than 1 exabyte), with a focus on interactive and batch analysis with multi-protocol support (WebDAV, HTTP, FUSE, CIFS). It supports different authentication protocols (KRB5, X509, OIDC).

CTA (CERN Tape Archive) is the archival storage system for the custodial copy of all physics data at CERN. The CTA software provides a free and open source platform for managing data on magnetic tapes at scale.

CERNBox is the CERN cloud storage platform that provides sync and share functionality on top of the EOS and Ceph storage systems. It is built on top of the ownCloud open source product and the Reva project.

All of these systems are open source, available for use, and deployed in other laboratories around the world.


Hugo Gonzalez Labrador
Richard Bachmann