VLDB2023

Towards Migration-Free Just-In-Case Data Archival for Future Cloud Data Lakes

Eugenio Marinelli, Yiqing Yan, Virginie Magnone, Charlotte Dumargne, Pascal Barbry, Thomas Heinis, Raja Appuswamy

1 citation

Abstract

Given the growing adoption of AI, cloud data lakes are facing the need to support cost-effective "just-in-case" data archival over long time periods to meet regulatory compliance requirements. Unfortunately, current media technologies suffer from fundamental issues that will soon, if not already, make cost-effective data archival infeasible. In this paper, we present a vision for redesigning the archival tier of cloud data lakes based on a novel, obsolescence-free storage medium-synthetic DNA. In doing so, we make two contributions: (i) we highlight the challenges in using DNA for data archival and list several open research problems, (ii) we outline OligoArchive-DSM (OA-DSM)-an end-to-end DNA storage pipeline that we are developing to demonstrate the feasibility of our vision.