DatE PiE - Data Engineering Pipeline Evolution

Data engineering pipelines inevitably change over time, particularly in the structure and semantics of their data and in the pipeline operators. Dealing with these changes, i.e. providing long-term maintenance, is costly, which shows the need for evolution capabilities.

Keywords: Data Engineering, Data Pipeline, Workflow, Self-Awareness, Self-Adaption

Evolution Capabilities in Data Pipelines

Dealing with evolutionary change within data pipelines is a major goal with diverse challenges. At the core of our solution lies a two-step process consisting of self-awareness and self-adaption. To make these abstract concepts concrete, we created a conceptual requirements model, which encompasses criteria for self-awareness and self-adaption and covers the dimensions data, operator, pipeline and environment. Existing frameworks lack these capabilities, which exposes a major gap that we aim to fill in our future work. We created a roadmap with the most important steps towards this goal, which would benefit scientists and practitioners alike.
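To give a rough intuition for the two-step process, the following minimal Python sketch illustrates one possible reading of it for the data dimension only. It is not taken from the publication: the names `Change`, `detect_schema_changes` and `adapt_record`, the change labels, and the rename heuristic are all hypothetical and chosen purely for illustration. The first function stands in for self-awareness (noticing that the observed schema differs from the expected one), the second for self-adaption (mapping records back to the expected schema).

```python
from dataclasses import dataclass

# The four dimensions named in the conceptual requirements model.
DIMENSIONS = ("data", "operator", "pipeline", "environment")


@dataclass(frozen=True)
class Change:
    """Hypothetical description of one detected change."""
    dimension: str  # one of DIMENSIONS
    kind: str       # illustrative label, e.g. "column_renamed"
    detail: dict


def detect_schema_changes(expected, observed):
    """Self-awareness (illustrative): compare expected and observed columns."""
    missing = sorted(set(expected) - set(observed))
    added = sorted(set(observed) - set(expected))
    # Assumption for this sketch: exactly one missing plus one new column
    # is interpreted as a rename. A real system would need stronger evidence.
    if len(missing) == 1 and len(added) == 1:
        return [Change("data", "column_renamed",
                       {"old": missing[0], "new": added[0]})]
    changes = [Change("data", "column_removed", {"name": c}) for c in missing]
    changes += [Change("data", "column_added", {"name": c}) for c in added]
    return changes


def adapt_record(record, changes):
    """Self-adaption (illustrative): map a record to the expected schema."""
    adapted = dict(record)
    for change in changes:
        if change.kind == "column_renamed":
            # Move the value back under the name downstream operators expect.
            adapted[change.detail["old"]] = adapted.pop(change.detail["new"])
        elif change.kind == "column_added":
            # Drop columns that downstream operators do not know about.
            adapted.pop(change.detail["name"], None)
        elif change.kind == "column_removed":
            # Fill columns that disappeared with an explicit null.
            adapted.setdefault(change.detail["name"], None)
    return adapted


expected_columns = ["id", "name"]
incoming = {"id": 1, "full_name": "Ada"}
detected = detect_schema_changes(expected_columns, incoming.keys())
repaired = adapt_record(incoming, detected)
```

In this sketch, detection and adaption are deliberately kept as separate functions so that a change can be reported even when no automatic repair is possible. Changes in the operator, pipeline and environment dimensions would need their own detection and adaption logic, which is not shown here.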

Publication: Kevin Kramer. Towards Evolution Capabilities in Data Pipelines. In Proc. GvDB 2023, 2023. Also available as technical report: https://arxiv.org/abs/2308.14591

Kevin Kramer | 08.09.2023