Towards Automated Schema and Data Migration between different Data Stores

Keywords:
Model Transformation
Data Migration
Schema Optimization
Automation
Microservices

As new database technologies continue to appear and applications evolve, the requirements placed on the data store in use change, and it can become necessary to migrate data between different data stores. However, the heterogeneity of NoSQL data stores, with their different data models and schema flexibility, makes migration a complex undertaking.

The goal is to develop a system that enables automatic data and schema migration between all types of data stores, including schema optimization with respect to the target system.

Our approach has the following major objectives:

  • An automatic process of schema and data migration between all types of data stores.
  • Automatic schema optimizations based on query workload and/or different metrics of the data.
  • Extensibility in terms of supported data stores from and to which migration is possible.

Such a migration process consists of several logically separate components:

  • Schema Extraction: Before the schema can be optimized, it must first be extracted from the source data store.
  • Schema Model (PIM): Our concept of schema optimization and migration is based on the use of a meta model. The use of such a platform-independent model (PIM) represents a central component that enables the flexible extension of the supported data stores from and to which a migration is possible.
  • Data and Query Analysis: An important prerequisite for optimizing the schema for a given target store is an in-depth analysis of the workload queries and of various metrics of the data.
  • Schema Optimization: An important aspect, and at the same time a great challenge, is the automated optimization of the schema with respect to the target store. For example, migration from relational to non-relational stores can result in significant performance losses if no changes to the schema are made. Unlike relational data stores, most NoSQL stores do not support join operations, so joins have to be performed outside the database engine at application level. Queries that span multiple entity types can therefore be optimized by embedding the entities involved in the query into a single entity. This may in turn lead to redundancies and may thus have a negative impact on write operations. Even though schema design for aggregate-oriented data stores essentially comes down to choosing between embedding and referencing, it remains relevant to find an optimal model for complex workloads with different mixes of read and write operations.
  • Data & Schema Migration: Schema Migration describes the mapping (and optimization) of the schema into the target store. The data migration describes the “transfer” of data into the target store. This primarily concerns the mapping of the data structures of the source store to the data structures of the target store.
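
To make the components above more concrete, the following is a minimal sketch in Python, not the project's actual implementation. It assumes a heavily simplified platform-independent model (the hypothetical classes Attribute and EntityType), a naive schema extraction that infers attribute types from sample records (extract_schema), and a single optimization step that embeds child records into their parent so that a join is no longer needed at read time (embed). All names, parameters, and sample data are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical, heavily simplified platform-independent model (PIM).
@dataclass
class Attribute:
    name: str
    type: str


@dataclass
class EntityType:
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def extract_schema(name: str, records: List[Dict[str, Any]]) -> EntityType:
    """Schema extraction: infer attribute names and types from sample records."""
    seen: Dict[str, str] = {}
    for record in records:
        for key, value in record.items():
            # Assumption: the first observed type of an attribute wins.
            seen.setdefault(key, type(value).__name__)
    return EntityType(name, [Attribute(k, t) for k, t in seen.items()])


def embed(
    parents: List[Dict[str, Any]],
    children: List[Dict[str, Any]],
    parent_key: str,
    foreign_key: str,
    target_field: str,
) -> List[Dict[str, Any]]:
    """Schema optimization plus data migration for one relationship:
    nest each child record inside its parent instead of referencing it."""
    by_parent: Dict[Any, List[Dict[str, Any]]] = {}
    for child in children:
        # The foreign key is implied by the nesting, so it is dropped.
        body = {k: v for k, v in child.items() if k != foreign_key}
        by_parent.setdefault(child[foreign_key], []).append(body)
    # Build new documents; the source records are left unmodified.
    return [{**p, target_field: by_parent.get(p[parent_key], [])} for p in parents]


# Illustrative source data, as it might come from two relational tables.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 7.5},
]

customer_schema = extract_schema("customer", customers)
documents = embed(customers, orders, "id", "customer_id", "orders")
```

Here a query for a customer together with its orders reads a single document instead of joining two tables. Embedding in the opposite direction (a copy of the customer inside every order) would duplicate customer data, which is the kind of redundancy and write overhead described under Schema Optimization; which variant is preferable depends on the workload.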

Publications

Poster

Student theses

We regularly publish new topics for theses. An overview of open topics can be found here: dbis theses

Assigned

currently none

Ongoing

  • Design and Implementation of an Application for Capturing Workload Information from Different Database Systems - A. Zanker (Bachelor)
  • Evaluation of Different NoSQL Schema Optimization Approaches - C. Bost (Master)
  • Methods for Discovering Explicit and Implicit References in Relational and Non-Relational Database Systems - M. Sperling (Master)
  • End-to-End Schema Migration and Optimization from Relational Databases to MongoDB - H. Pohlhausen (Master)

Completed

  • Schema Transformation between Different Relational and Non-Relational Database Systems - C. Herber (Master), March 2024
  • Generic Test Data Generation for Different Relational and Non-Relational Database Systems - M. Heder (Bachelor), August 2023
  • Schema Extraction for Different NoSQL Database Systems - N. Bölter (Master), August 2023
  • Data Migration between Different Relational and Non-Relational Systems - B. Meier (Master), August 2023
  • Development of a Web UI for Displaying and Editing Platform-Independent Database Schemas - A. Jakob (Bachelor), July 2023
  • Schema Extraction from Graph Databases - J. Hübner (Bachelor), February 2023
  • Performance Evaluation of Different Virtualized Database Management Systems - D. Coric (Bachelor), August 2022
  • Schema Extraction in NoSQL Database Systems - I. Russkaya (Master), July 2022
  • Metamodels for NoSQL Database Systems - Ph. Utzmann (Master), June 2022
  • Optimization of NoSQL Schemas - R. Helbig (Master), May 2022
André Conrad | 18.03.2024