The LEXIS approach to Searching and Managing FAIR Research Data in Converged HPC-Cloud-Big Data Architectures
Stephan Hachinger1, Jan Martinovic2, Olivier Terzo3, Cédric Koch-Hofer4, Martin Golasowski2, Mohamad Hayek1, Marc Levrier4, Alberto Scionti3, Donato Magarielli5, Thierry Goubier6, Antonio Parodi7, Sean Murphy8, Florin-Ionut Apopei9, Carmine D’Amico3, Simone Ciccia3, Massimo Sardo7, Danijel Schorlemmer10
1Leibniz Supercomputing Centre (LRZ), Bavarian Academy of Sciences and Humanities, Garching, Germany
2IT4Innovations, VŠB – Technical University of Ostrava, Ostrava-Poruba, Czech Republic
3Advanced Computing and Applications, LINKS Foundation, Torino, Italy
4ATOS, Paris, France
5Avio Aero, Torino, Italy
6CEA, LIST, Paris, France
7CIMA Research Foundation, Savona, Italy
8Cyclops Labs GmbH, Winterthur, Switzerland
9TESEO, Torino, Italy
10GFZ German Research Centre for Geosciences, Potsdam, Germany
The enormous amounts of data generated in modern industry, business and science pose a significant challenge to those extracting actionable intelligence from data using various filtering and analysis techniques. In this “Big Data” setting, the LEXIS project (Large-scale EXecution for Industry & Society) provides a platform for optimised execution of Cloud-HPC workflows, reducing computation time and increasing energy efficiency. The system will rely on advanced, distributed orchestration solutions (Bull Ystia Orchestrator, based on TOSCA and Alien4Cloud technologies), the High-End Application Execution Middleware HEAppE, and new hardware capabilities for maximizing efficiency in data processing, analysis and transfer (e.g. Burst Buffers with GPU- and FPGA-based data reprocessing).
LEXIS handles computation tasks and data from three Pilots, based on representative and demanding HPC/Cloud-Computing use cases in Industry (SMEs) and Science: i) Simulations of complex turbomachinery and gearbox systems in Aeronautics, ii) Earthquake and Tsunami simulations which are accelerated to enable accurate real-time analysis, and iii) Weather and Climate simulations where massive amounts of in situ data are assimilated to improve forecasts. A user-friendly LEXIS web portal, as a single entry point, will provide access to data as well as workflow-handling and remote visualization functionality.
The “LEXIS Data System” constitutes the data back-end for the project. At its core, a Distributed Data Infrastructure (DDI) ensures the availability of LEXIS data at all participating HPC sites, and provides functionality for FAIR (“Findable, Accessible, Interoperable, Reusable”) Research Data Management. The DDI leverages best-of-breed data-management solutions of EUDAT, including B2SAFE and B2HANDLE. Via DOI acquisition, open data products can be published and disseminated. By exposing metadata via standardized REST interfaces, the DDI will be a best-practice example for demonstrating the connection of Research Data Infrastructures to specialized and general-purpose search facilities (internal data search, B2FIND, GeRDI, web search engines). Making research data findable in such ways proves essential for data sharing and re-use within research communities.
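As an illustration of what exposing findable metadata can involve, the following minimal Python sketch assembles a dataset record and serializes it to JSON, as a REST endpoint might return it to a harvester. It is not the LEXIS DDI implementation: the field names are an assumption loosely modelled on the mandatory properties of the DataCite metadata schema, the helper names (build_record, missing_fields, to_json) are hypothetical, and the DOI shown is a placeholder.

```python
import json

# Assumed field set, loosely modelled on DataCite's mandatory properties.
# This is NOT the actual LEXIS DDI schema.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publicationYear",
    "resourceType",
)


def build_record(identifier, creators, title, publisher, year,
                 resource_type="Dataset"):
    """Assemble a metadata record as a plain dictionary (hypothetical helper)."""
    return {
        "identifier": identifier,
        "creators": list(creators),
        "title": title,
        "publisher": publisher,
        "publicationYear": year,
        "resourceType": resource_type,
    }


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize a complete record to JSON; refuse incomplete records."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("incomplete metadata, missing: " + ", ".join(missing))
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    record = build_record(
        identifier="10.5072/example-dataset",  # placeholder DOI, not a real dataset
        creators=["Example, Author"],
        title="Example simulation output",
        publisher="Example Data Infrastructure",
        year=2020,
    )
    print(to_json(record))
```

Rejecting incomplete records before serialization reflects the idea that a dataset is only findable if its identifier and descriptive fields are actually present when a search facility harvests them.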