Introduction #
Creating and operating materialized tables involves the collaboration of multiple components. This document systematically explains the complete deployment solution for Materialized Tables, covering the architecture overview, environment preparation, deployment procedures, and operational practices.
Architecture Introduction #
- Client: Any client that can interact with the Flink SQL Gateway, such as SQL Client, the Flink JDBC Driver, and so on.
- Flink SQL Gateway: Supports creating, altering, and dropping Materialized Tables. It also serves as an embedded workflow scheduler that periodically refreshes full mode Materialized Tables.
- Flink Cluster: The pipelines that refresh Materialized Tables run on the Flink cluster.
- Catalog: Manages the creation, retrieval, modification, and deletion of Materialized Table metadata.
- Catalog Store: Persists catalog properties so that catalogs can be initialized automatically to retrieve metadata during Materialized Table related operations.
Deployment Preparation #
Flink Cluster Setup #
Materialized Table refresh jobs currently support execution in the following cluster environments:
- Standalone clusters
- Session mode (YARN and Kubernetes)
- Application mode (YARN and Kubernetes)
Flink SQL Gateway Deployment #
Materialized Tables must be created through SQL Gateway, which requires specific configurations for metadata persistence and job scheduling.
Configure Catalog Store #
Add catalog store configurations in config.yaml to persist catalog properties.
table:
  catalog-store:
    kind: file
    file:
      path: {path_to_catalog_store} # Replace with the actual path
Refer to Catalog Store for details.
Configure Workflow Scheduler Plugin #
Add workflow scheduler configurations in config.yaml for periodic refresh job scheduling. Currently, only the embedded scheduler is supported:
workflow-scheduler:
  type: embedded
Start SQL Gateway #
Start the SQL Gateway using:
./sql-gateway.sh start
Note The catalog must support creating materialized tables; currently, only the Paimon Catalog provides this support.
Operation Guide #
Connecting to SQL Gateway #
Example using SQL Client:
./sql-client.sh gateway --endpoint {gateway_endpoint}:{gateway_port}
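Once connected, register a catalog that supports materialized tables (currently only the Paimon Catalog, as noted above). With the catalog store configured, the catalog properties are persisted and the catalog is re-initialized automatically in later sessions. The following is a minimal sketch that assumes the Paimon connector is on the classpath; the catalog name mt_catalog and the warehouse path are placeholders:
-- Register a Paimon catalog (placeholder name and warehouse path)
Flink SQL> CREATE CATALOG mt_catalog WITH (
>   'type' = 'paimon',
>   'warehouse' = 'file:///tmp/paimon-warehouse'
> );
[INFO] Execute statement succeeded.

-- Use it as the current catalog for the statements below
Flink SQL> USE CATALOG mt_catalog;
[INFO] Execute statement succeeded.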
Creating Materialized Tables #
Refresh Jobs Running on Standalone Cluster #
Flink SQL> SET 'execution.mode' = 'remote';
[INFO] Execute statement succeeded.
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> ... ;
[INFO] Execute statement succeeded.
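The statement body is elided above. A complete statement might look like the following sketch, where the source table orders, the partition column ds, and the one-hour freshness are hypothetical choices for illustration:
-- Create a materialized table whose data is kept at most one hour stale
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> PARTITIONED BY (ds)
> FRESHNESS = INTERVAL '1' HOUR
> AS SELECT
>   ds,
>   count(*) AS order_cnt
> FROM orders
> GROUP BY ds;
[INFO] Execute statement succeeded.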
Refresh Jobs Running in Session Mode #
For session modes, pre-create a session cluster as documented in yarn-session or kubernetes-session.
Kubernetes session mode:
Flink SQL> SET 'execution.mode' = 'kubernetes-session';
[INFO] Execute statement succeeded.
Flink SQL> SET 'kubernetes.cluster-id' = 'flink-cluster-mt-session-1';
[INFO] Execute statement succeeded.
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> ... ;
[INFO] Execute statement succeeded.
Set execution.mode to kubernetes-session and specify a valid kubernetes.cluster-id corresponding to an existing Kubernetes session cluster.
YARN session mode:
Flink SQL> SET 'execution.mode' = 'yarn-session';
[INFO] Execute statement succeeded.
Flink SQL> SET 'yarn.application.id' = 'application-xxxx';
[INFO] Execute statement succeeded.
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> ... ;
[INFO] Execute statement succeeded.
Set execution.mode to yarn-session and specify a valid yarn.application.id corresponding to an existing YARN session cluster.
Refresh Jobs Running in Application Mode #
Kubernetes application mode:
Flink SQL> SET 'execution.mode' = 'kubernetes-application';
[INFO] Execute statement succeeded.
Flink SQL> SET 'kubernetes.cluster-id' = 'flink-cluster-mt-application-1';
[INFO] Execute statement succeeded.
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> ... ;
[INFO] Execute statement succeeded.
Set execution.mode to kubernetes-application. The kubernetes.cluster-id is optional; if not set, it will be automatically generated.
YARN application mode:
Flink SQL> SET 'execution.mode' = 'yarn-application';
[INFO] Execute statement succeeded.
Flink SQL> CREATE MATERIALIZED TABLE my_materialized_table
> ... ;
[INFO] Execute statement succeeded.
Only set execution.mode to yarn-application. The yarn.application.id doesn’t need to be set; it will be automatically generated during submission.
Maintenance Operations #
Cluster information (e.g., execution.mode or kubernetes.cluster-id) is already persisted in the catalog and does not need to be set again when suspending or resuming the refresh jobs of a Materialized Table.
Suspend Refresh Job #
-- Suspend the MATERIALIZED TABLE refresh job
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table SUSPEND;
[INFO] Execute statement succeeded.
Resume Refresh Job #
-- Resume the MATERIALIZED TABLE refresh job
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table RESUME;
[INFO] Execute statement succeeded.
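Dynamic options can also be passed to the refresh job when resuming by using an optional WITH clause; the option key and value below are only an example:
-- Resume the refresh job and override an option for it
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table RESUME WITH ('sink.parallelism' = '4');
[INFO] Execute statement succeeded.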
Modify Query Definition #
-- Modify the MATERIALIZED TABLE query definition
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table
> AS SELECT
> ... ;
[INFO] Execute statement succeeded.
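As with creation, the new query definition is elided above. A complete statement might look like the following sketch, which keeps the original columns and appends a hypothetical order_amount column at the end:
-- Replace the query definition, appending a new column at the end
Flink SQL> ALTER MATERIALIZED TABLE my_materialized_table
> AS SELECT
>   ds,
>   count(*) AS order_cnt,
>   sum(amount) AS order_amount
> FROM orders
> GROUP BY ds;
[INFO] Execute statement succeeded.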