LEXIS Distributed Data Infrastructure (DDI) becomes operational
In the LEXIS project, we are developing an advanced and user-friendly computing environment that brings together the Big Data Analytics, High Performance Computing and Cloud Computing capabilities of European supercomputing centres. Three layers (the infrastructure layer, the data management layer and the workflow orchestration layer) constitute the main building blocks of LEXIS. With the LEXIS platform and portal, we enable users in science, industry and society to automate and run their compute- and data-intensive workflows efficiently, and thus to accelerate their Research and Development.
The LEXIS Data System allows users to consistently manage the input, output and temporary data of their workflows. Its core is the Distributed Data Infrastructure (DDI), which federates the storage systems of the LEXIS infrastructure layer and can be conveniently addressed via REST APIs, i.e., interfaces based on web technology (Figure 1). The DDI is based on the “Integrated Rule-Oriented Data System” (iRODS) and on B2SAFE of the “EUDAT Collaborative Data Infrastructure” (EUDAT CDI). Technically, this means that LEXIS data can be accessed and managed in a uniform way, independently of where the data are physically located. From a collaboration perspective, the integration of LEXIS with EUDAT is one step towards unified European research data management following the FAIR principles (Wilkinson et al., 2016, https://doi.org/10.1038/sdata.2016.18). It also gives us a straightforward way to federate our data system with further European data centres and projects.
In practice, all project members can interact with their data through the LEXIS Portal. They can use the data within their LEXIS workflows, and iRODS automatically manages cross-site data transfer wherever necessary. If a data transfer would take too long, the novel Burst Buffer systems in LEXIS can be used to prefetch remote data. Alternatively, immediate data availability and increased data security can be obtained by activating a convenient cross-site replication functionality implemented with iRODS/B2SAFE.
Recently, the first results from “Weather and Climate Large Scale Pilot” workflows exploiting the LEXIS Computing and Distributed Data Infrastructure were published (Parodi et al., 2020, https://doi.org/10.1007/978-3-030-50454-0_25). We are proud to disseminate these at CISIS 2020, and at SC 2020 with a poster highlighting the DDI (Figures 2 and 3).
Currently, we are benchmarking the DDI system, taking into account the different network speeds between LEXIS sites. The LEXIS Orchestration System will be aware of physical data locations and typical transfer speeds, so that it can select the storage and computing sites on which a given workflow runs with the best performance. While mastering the challenges of integrating IT4I, LRZ and further infrastructure within LEXIS, we keep our focus on providing an optimised system with immediate benefits to the users. Besides convenient usability via the portal, high performance and speed gains are key to a strong uptake of the DDI and the LEXIS platform as a whole.
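To illustrate the idea of location-aware site selection, here is a minimal Python sketch. It is not the LEXIS Orchestration System's actual logic: the site names, the queue-wait figures, the bandwidth table and the cost model (queue wait plus estimated transfer time) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str            # hypothetical site label, not a real LEXIS identifier
    queue_wait_s: float  # assumed expected wait before the job starts, in seconds


def transfer_time_s(size_gb: float, bandwidth_gbit_s: float) -> float:
    """Estimate the seconds needed to move size_gb gigabytes over a link."""
    return size_gb * 8.0 / bandwidth_gbit_s


def pick_site(data_site, size_gb, sites, bandwidth_gbit_s):
    """Return the site with the lowest estimated time until the job can start.

    bandwidth_gbit_s maps (source, destination) name pairs to a typical
    transfer speed, such as one obtained from benchmarking.
    """
    def cost(site):
        if site.name == data_site:
            return site.queue_wait_s  # data is already local, no transfer
        link = bandwidth_gbit_s[(data_site, site.name)]
        return site.queue_wait_s + transfer_time_s(size_gb, link)

    return min(sites, key=cost)


# Illustrative values only: the data set lives at site_a, which is busy.
sites = [Site("site_a", queue_wait_s=600.0), Site("site_b", queue_wait_s=60.0)]
fast_link = {("site_a", "site_b"): 10.0}  # Gbit/s
slow_link = {("site_a", "site_b"): 1.0}   # Gbit/s

print(pick_site("site_a", 100.0, sites, fast_link).name)
print(pick_site("site_a", 100.0, sites, slow_link).name)
```

With the fast link, moving 100 GB costs an estimated 80 s, so running at the less busy remote site wins. With the slow link, the transfer costs 800 s and staying with the data is the better choice. A real orchestrator would weigh further factors, such as available replicas or Burst Buffer prefetching.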