Apache Doris Docs (English) - Compressed
Table of contents:
Getting-Started
Quick Start
Introduction to Apache Doris
Introduction to Apache Doris
Usage Scenarios
Technical Overview
Installation and Deployment
Installation and Deployment
Compilation
Compilation
Compiling with LDB toolchain
Compile With ldb-toolchain
Compilation With Arm
Compile With ARM
Compilation on Windows
Compile With Windows
Compilation on macOS
Compile With macOS
Build Docker Image
Build Docker Image
Deploy Docker cluster
Deploy the Docker cluster
Data Model
Data Model
Data Partition
Data Partitioning
Guidelines for Basic Use
User Guide
Rollup and query
Rollup and Query
Best Practices
Best Practices
dynamic schema table
Dynamic Table
Index Overview
Index Overview
inverted index
Inverted Index
BloomFilter index
BloomFilter index
NGram BloomFilter Index
Doris NGram BloomFilter Index
Bitmap Index
Bitmap Index
Import Overview
Import Overview
Import local data
Import local data
External storage data import
External storage data import
Kafka Data Subscription
Subscribe to Kafka logs
Synchronize data through external table
Synchronize data through external table
Synchronize data using Insert method
Synchronize data using Insert method
Data import transactions and atomicity
Data import transactions and atomicity
Data transformation, column mapping and filtering
Imported data transformation, column mapping and filtering
import strict mode
import strict mode
Binlog Load
Binlog Load
Broker Load
Broker Load
Routine Load
Routine Load
Spark Load
Spark Load
Stream load
Stream load
MySql load
Mysql load
S3 Load
S3 Load
Insert Into
Insert Into
Load Json Format Data
JSON format data import
Data export
Data export
Export Query Result
Export Query Result
Use mysqldump data to export table structure or data
Use mysqldump data to export table structure or data
Batch Delete
Batch Delete
update
Update
Delete
Delete
Sequence Column
Sequence Column
Schema Change
Schema Change
Replace Table
Replace Table
Dynamic Partition
Dynamic Partition
Temporary partition
Temporary partition
Partition Cache
Partition Cache
Bucket Shuffle Join
Bucket Shuffle Join
Colocation Join
Colocation Join
Runtime Filter
Runtime Filter
Doris Join optimization principle
Doris Join optimization principle
[Experimental] Pipeline execution engine
Pipeline Execution Engine
Materialized view
Materialized view
Broker
Broker
Query Analysis
Query Analysis
Import Analysis
Import Analysis
Debug Log
Debug Log
Compaction
Compaction
Resource management
Resource Management
Orthogonal BITMAP calculation
Orthogonal BITMAP calculation
Approximate deduplication using HLL
Variable
Variable
Time zone
Time zone
File Manager
File Manager
cold hot separation
cold hot separation
Compute Node
Compute node
High-concurrency point query
High-concurrency point query based on primary key
Multi Catalog
Multi Catalog
Hive
Hive
Iceberg
Iceberg
Hudi
Hudi
Elasticsearch
Elasticsearch
JDBC
JDBC
Alibaba Cloud DLF
Alibaba Cloud DLF
FAQ
FAQ
Elasticsearch External Table
Elasticsearch External Table
JDBC External Table
JDBC External Table
ODBC External Table
ODBC External Table
Hive External Table
Hive External Table
File Analysis
File Analysis
File Cache
File Cache
Spark Doris Connector
Spark Doris Connector
Flink Doris Connector
Flink Doris Connector
DataX doriswriter
DataX doriswriter
Mysql to Doris
Mysql to Doris
Logstash Doris Output Plugin
Doris output plugin
Plugin Development Manual
Doris Plugin Framework
Audit log plugin
Audit log plugin
CloudCanal Data Import
CloudCanal Data Import
Hive Bitmap UDF
Hive UDF
Seatunnel Connector Flink Doris
Seatunnel
Seatunnel Connector Spark Doris
Seatunnel
Install Seatunnel
Contribute UDF
Contribute UDF
Remote User Defined Function Service
User Defined Function Rpc
Native User Defined Function
Native User Defined Function
Java UDF
Java UDF
Remote User Defined Aggregation Function Service
User Defined Function Rpc
array
array_max
array_min
array_avg
array_sum
array_size
array_remove
array_slice
array_sort
array_position
array_contains
array_except
array_product
array_intersect
array_range
array_distinct
array_difference
array_union
array_join
array_with_constant
array_enumerate
array_popback
array_compact
arrays_overlap
countequal
element_at
convert_tz
curdate,current_date
curtime,current_time
current_timestamp
localtime,localtimestamp
now
year
quarter
month
day
dayofyear
dayofmonth
dayofweek
week
weekday
weekofyear
yearweek
dayname
monthname
hour
minute
second
from_days
last_day
to_monday
from_unixtime
unix_timestamp
utc_timestamp
to_date
to_days
extract
makedate
str_to_date
time_round
timediff
timestampadd
timestampdiff
date_add
date_sub
date_trunc
date_format
datediff
microseconds_add
minutes_add
minutes_diff
minutes_sub
seconds_add
seconds_diff
seconds_sub
hours_add
hours_diff
hours_sub
days_add
days_diff
days_sub
weeks_add
weeks_diff
weeks_sub
months_add
months_diff
months_sub
years_add
years_diff
years_sub
ST_X
ST_Y
ST_Circle
ST_Distance_Sphere
St_Point
ST_Polygon,ST_PolyFromText,ST_PolygonFromText
ST_AsText,ST_AsWKT
ST_Contains
ST_GeometryFromText,ST_GeomFromText
ST_LineFromText,ST_LineStringFromText
to_base64
from_base64
ascii
length
bit_length
char_length
lpad
rpad
lower
lcase
upper
ucase
initcap
repeat
reverse
concat
concat_ws
substr
substring
sub_replace
append_trailing_char_if_absent
ends_with
starts_with
trim
ltrim
rtrim
null_or_empty
not_null_or_empty
hex
unhex
elt
instr
locate
field
find_in_set
replace
left
right
strleft
strright
split_part
split_by_string
substring_index
money_format
parse_url
convert_to
extract_url_parameter
uuid
space
sleep
esquery
mask
mask_first_n
mask_last_n
multi_search_all_positions
multi_match_any
like
not like
regexp
regexp_extract
regexp_extract_all
regexp_replace
regexp_replace_one
not regexp
COLLECT_SET
MIN
STDDEV_SAMP
AVG
AVG_WEIGHTED
PERCENTILE
PERCENTILE_ARRAY
HLL_UNION_AGG
TOPN
TOPN_ARRAY
TOPN_WEIGHTED
COUNT
SUM
MAX_BY
BITMAP_UNION
group_bitmap_xor
group_bit_and
group_bit_or
group_bit_xor
PERCENTILE_APPROX
STDDEV,STDDEV_POP
GROUP_CONCAT
COLLECT_LIST
MIN_BY
MAX
ANY_VALUE
VARIANCE_SAMP,VAR_SAMP
APPROX_COUNT_DISTINCT
VARIANCE,VAR_POP,VARIANCE_POP
RETENTION
SEQUENCE-MATCH
SEQUENCE-COUNT
GROUPING
GROUPING_ID
to_bitmap
bitmap_hash
bitmap_from_string
bitmap_to_string
bitmap_to_array
bitmap_from_array
bitmap_empty
bitmap_or
bitmap_and
bitmap_union
bitmap_xor
bitmap_not
bitmap_and_not
bitmap_subset_limit
bitmap_subset_in_range
sub_bitmap
bitmap_count
bitmap_and_count
bitmap_and_not_count
orthogonal_bitmap_union_count
bitmap_xor_count
bitmap_or_count
bitmap_contains
bitmap_has_all
bitmap_has_any
bitmap_max
bitmap_min
intersect_count
bitmap_intersect
orthogonal_bitmap_intersect
orthogonal_bitmap_intersect_count
bitmap_hash64
bitand
bitor
bitxor
bitnot
case
coalesce
if
ifnull
nvl
nullif
jsonb_parse
jsonb_extract
get_json_double
get_json_int
get_json_string
json_array
json_object
json_quote
json_valid
murmur_hash3_32
murmur_hash3_64
HLL_CARDINALITY
HLL_EMPTY
HLL_HASH
conv
bin
sin
cos
tan
asin
acos
atan
e
Pi
exp
log
log2
ln
log10
ceil
floor
pmod
round
round_bankers
truncate
abs
sqrt
cbrt
pow
degrees
radians
sign
positive
negative
greatest
least
random
mod
AES
MD5
MD5SUM
SM4
SM3
SM3SUM
explode_json_array
explode
explode_split
explode_bitmap
outer combinator
numbers
explode_numbers
s3
hdfs
iceberg_meta
WINDOW-FUNCTION-LAG
WINDOW-FUNCTION-SUM
WINDOW-FUNCTION-LAST_VALUE
WINDOW-FUNCTION-AVG
WINDOW-FUNCTION-MIN
WINDOW-FUNCTION-COUNT
WINDOW-FUNCTION
WINDOW-FUNCTION-RANK
WINDOW-FUNCTION-DENSE_RANK
WINDOW-FUNCTION-MAX
WINDOW-FUNCTION-FIRST_VALUE
WINDOW-FUNCTION-LEAD
WINDOW-FUNCTION-ROW_NUMBER
WINDOW-FUNCTION-NTILE
WINDOW-FUNCTION-WINDOW-FUNNEL
CAST
DIGITAL-MASKING
width_bucket
SET-PROPERTY
REVOKE
GRANT
LDAP
CREATE-ROLE
DROP-ROLE
CREATE-USER
DROP-USER
SET-PASSWORD
ALTER-USER
ALTER-SYSTEM-DROP-FOLLOWER
ALTER-SYSTEM-DECOMMISSION-BACKEND
ALTER-SYSTEM-DROP-OBSERVER
CANCEL-ALTER-SYSTEM
ALTER-SYSTEM-DROP-BROKER
ALTER-SYSTEM-ADD-OBSERVER
ALTER-SYSTEM-MODIFY-BACKEND
ALTER-SYSTEM-ADD-FOLLOWER
ALTER-SYSTEM-MODIFY-BROKER
ALTER-SYSTEM-ADD-BROKER
ALTER-SYSTEM-ADD-BACKEND
ALTER-SYSTEM-DROP-BACKEND
ALTER-CATALOG
ALTER-DATABASE
ALTER-TABLE-BITMAP
ALTER-TABLE-PARTITION
ALTER-TABLE-COLUMN
ALTER-RESOURCE
CANCEL-ALTER-TABLE
ALTER-TABLE-COMMENT
ALTER-VIEW
ALTER-SQL-BLOCK-RULE
ALTER-TABLE-REPLACE
ALTER-TABLE-PROPERTY
ALTER-TABLE-ROLLUP
ALTER-TABLE-RENAME
ALTER-POLICY
RESTORE
DROP-REPOSITORY
CANCEL-RESTORE
BACKUP
CANCEL-BACKUP
CREATE-REPOSITORY
CREATE-ENCRYPT-KEY
CREATE-TABLE-AS-SELECT
CREATE-POLICY
CREATE-VIEW
CREATE-DATABASE
CREATE-FILE
CREATE-INDEX
CREATE-RESOURCE
CREATE-TABLE-LIKE
CREATE-MATERIALIZED-VIEW
CREATE-EXTERNAL-TABLE
CREATE-TABLE
CREATE-SQL-BLOCK-RULE
CREATE-FUNCTION
CREATE-CATALOG
DROP-INDEX
DROP-RESOURCE
DROP-FILE
DROP-ENCRYPT-KEY
DROP-DATABASE
DROP-MATERIALIZED-VIEW
DROP-MATERIALIZED-VIEW
TRUNCATE-TABLE
DROP-TABLE
DROP-FUNCTION
DROP-DATABASE
DROP-CATALOG
PAUSE-ROUTINE-LOAD
MULTI-LOAD
RESUME-SYNC-JOB
CREATE-ROUTINE-LOAD
STOP-ROUTINE-LOAD
CLEAN-LABEL
ALTER-ROUTINE-LOAD
CANCEL-LOAD
RESUME-ROUTINE-LOAD
STOP-SYNC-JOB
PAUSE-SYNC-JOB
BROKER-LOAD
CREATE-SYNC-JOB
STREAM-LOAD
MYSQL-LOAD
INSERT
SELECT
DELETE
UPDATE
EXPORT
CANCEL-EXPORT
OUTFILE
ADMIN DIAGNOSE TABLET
ADMIN-SHOW-CONFIG
KILL
ADMIN-CHECK-TABLET
ADMIN-CLEAN-TRASH
ENABLE-FEATURE
RECOVER
UNINSTALL-PLUGIN
ADMIN-SET-REPLICA-STATUS
ADMIN-SHOW-REPLICA-DISTRIBUTION
INSTALL-PLUGIN
ADMIN-REPAIR-TABLE
ADMIN-CANCEL-REPAIR
SET-VARIABLE
ADMIN-SET-CONFIG
ADMIN SHOW TABLET STORAGE FORMAT
ADMIN-SHOW-REPLICA-STATUS
ADMIN-COPY-TABLET
ADMIN-REBALANCE-DISK
ADMIN-CANCEL-REBALANCE-DISK
SHOW ALTER TABLE MATERIALIZED VIEW
SHOW-ALTER
SHOW-BACKUP
SHOW-BACKENDS
SHOW-BROKER
SHOW-CATALOGS
SHOW-CREATE-TABLE
SHOW-CHARSET
SHOW-CREATE-CATALOG
SHOW-CREATE-DATABASE
SHOW-CREATE-MATERIALIZED-VIEW
SHOW-CREATE-LOAD
SHOW-CREATE-ROUTINE-LOAD
SHOW-CREATE-FUNCTION
SHOW-COLUMNS
SHOW-COLLATION
SHOW-DATABASES
SHOW DATA SKEW
SHOW-DATABASE-ID
SHOW-DYNAMIC-PARTITION
SHOW-DELETE
SHOW-DATA
SHOW-ENGINES
SHOW-EVENTS
SHOW-EXPORT
SHOW-ENCRYPT-KEY
SHOW-FUNCTIONS
SHOW-TYPECAST
SHOW-FILE
SHOW-GRANTS
SHOW-LAST-INSERT
SHOW-LOAD-PROFILE
SHOW-LOAD-WARNINGS
SHOW-INDEX
SHOW-MIGRATIONS
SHOW-PARTITION-ID
SHOW-SNAPSHOT
SHOW-ROLLUP
SHOW-SQL-BLOCK-RULE
SHOW-ROUTINE-LOAD
SHOW-SYNC-JOB
SHOW-WHITE-LIST
SHOW-WARNING
SHOW-TABLET
SHOW-VARIABLES
SHOW-PLUGINS
SHOW-ROLES
SHOW-PROCEDURE
SHOW-ROUTINE-LOAD-TASK
SHOW-PROC
SHOW-TABLE-STATUS
SHOW-REPOSITORIES
SHOW-QUERY-PROFILE
SHOW-OPEN-TABLES
SHOW-TABLETS
SHOW-LOAD
SHOW-TABLES
SHOW-RESOURCES
SHOW-PARTITIONS
SHOW-FRONTENDS
SHOW-RESTORE
SHOW-PROPERTY
SHOW-TRIGGERS
SHOW-PROCESSLIST
SHOW-TRASH
SHOW-VIEW
SHOW-TRANSACTION
SHOW-STREAM-LOAD
SHOW-STATUS
SHOW-TABLE-ID
SHOW-SMALL-FILES
SHOW-POLICY
SHOW-CATALOG-RECYCLE-BIN
BOOLEAN
TINYINT
SMALLINT
INT
BIGINT
LARGEINT
FLOAT
DOUBLE
DECIMAL
DECIMALV3
DATE
DATETIME
DATEV2
DATETIMEV2
CHAR
VARCHAR
STRING
HLL (HyperLogLog)
BITMAP
QUANTILE_STATE
ARRAY
JSONB
IN
HELP
USE
DESCRIBE
SWITCH
REFRESH
Cluster upgrade
Cluster upgrade
Elastic scaling
Elastic scaling
load balancing
load balancing
Data Backup
Data Backup
Data Restore
Data Recovery
Data Recover
Data Deletion Recovery
Sql Interception
SQL Block Rule
Statistics of query execution
Statistics of query execution
tracing
tracing
performance optimization
Performance optimization
Monitor Metrics
Monitor Metrics
Disk Capacity Management
Disk Capacity Management
Data replica management
Data replica management
Description of the return value of the OLAP function on the BE side
Description of the return value of the OLAP function on the BE side
Doris Error Table
Doris Error Table
Tablet metadata management tool
Tablet metadata management tool
Monitoring and alarming
Monitoring and alarming
Tablet Local Debug
Tablet Local Debug
Tablet Restore Tool
Tablet Restore Tool
Metadata Operations and Maintenance
Metadata Operations and Maintenance
Memory Tracker
Memory Tracker
Memory Limit Exceeded Analysis
Memory Limit Exceeded Analysis
BE OOM Analysis
BE OOM Analysis
Config Dir
Config Dir
FE Configuration
FE Configuration
BE Configuration
BE Configuration
User Property
User configuration item
Authority Management
Authority Management
LDAP
LDAP
backends
rowsets
Multi-tenancy
Multi-tenancy
Config Action
Config Action
HA Action
HA Action
Hardware Info Action
Hardware Info Action
Help Action
Help Action
Log Action
Log Action
Login Action
Login Action
Logout Action
Logout Action
Query Profile Action
Query Profile Action
Session Action
Session Action
System Action
System Action
Colocate Meta Action
Colocate Meta Action
Meta Action
Meta Action
Cluster Action
Cluster Action
Node Action
Node Action
Query Profile Action
Query Profile Action
Backends Action
Backends Action
Bootstrap Action
Bootstrap Action
Cancel Load Action
Cancel Load Action
Check Decommission Action
Check Decommission Action
Check Storage Type Action
Check Storage Type Action
Connection Action
Connection Action
Extra Basepath Action
Extra Basepath Action
Fe Version Info Action
Fe Version Info Action
Get DDL Statement Action
Get DDL Statement Action
Get Load Info Action
Get Load Info Action
Get Load State
Get Load State
Get FE log file
Get FE log file
Get Small File Action
Get Small File
Health Action
Health Action
Meta Info Action
Meta Action
Meta Replay State Action
Meta Replay State Action
Metrics Action
Metrics Action
Profile Action
Profile Action
Query Detail Action
Query Detail Action
Query Schema Action
Query Schema Action
Row Count Action
Row Count Action
Set Config Action
Set Config Action
Show Data Action
Show Data Action
Show Meta Info Action
Show Meta Info Action
Show Proc Action
Show Proc Action
Show Runtime Info Action
Show Runtime Info Action
Statement Execution Action
Statement Execution Action
Table Query Plan Action
Table Query Plan Action
Table Row Count Action
Table Row Count Action
Table Schema Action
Table Schema Action
Upload Action
Upload Action
Import Action
Import Action
Meta Info Action
Meta Info Action
Statistic Action
Statistic Action
Be Version Info Action
Be Version Info Action
RESTORE TABLET
RESTORE TABLET
PAD ROWSET
PAD ROWSET
MIGRATE SINGLE TABLET TO A PARTICULAR DISK
MIGRATE SINGLE TABLET TO A PARTICULAR DISK
GET TABLETS DISTRIBUTION BETWEEN DIFFERENT DISKS
GET TABLETS DISTRIBUTION BETWEEN DIFFERENT DISKS
Compaction Action
Compaction Action
GET TABLETS ON A PARTICULAR BE
GET TABLETS ON A PARTICULAR BE
CHECK/RESET Stub Cache
CHECK/RESET Stub Cache
CHECK ALL TABLET SEGMENT LOST
CHECK ALL TABLET SEGMENT LOST
Install Error
Operation and Maintenance Error
Data Operation Error
Data Operation Error
SQL Error
SQL Error
Star-Schema-Benchmark
Star Schema Benchmark
TPC-H Benchmark
TPC-H Benchmark
Release 1.2.2
New Features
Behavior Changes
Improvement
Bug Fix
Other
Big Thanks
Release 1.2.1
Improvement
Bug Fix
Upgrade Notice
Big Thanks
Release 1.2.0
Feature
Upgrade Notice
Big Thanks
Release 1.1.5
Behavior Changes
Features
Improvements
Bug Fix
Release 1.1.4
Features
Improvements
BugFix
Release 1.1.3
Features
Improvements
BugFix
Upgrade Notes
Release 1.1.2
Features
Improvements
Bug Fix
Release 1.1.1
Thanks
Release 1.1.0
Getting Started
Apache Doris is a high-performance, real-time analytic database based on the MPP architecture and is known for its extreme
speed and ease of use. It takes only sub-second response times to return query results under massive amounts of data, and
can support not only highly concurrent point query scenarios but also high-throughput complex analytic scenarios. This quick
start shows how to download the latest version, install and run it on a single node, and walks through creating databases and
tables, importing data, and running queries.
Download Doris
Doris runs in a Linux environment; CentOS 7.x or Ubuntu 16.04 or higher is recommended. You also need a Java runtime
environment installed (the minimum required JDK version is 8). To check the version of Java you have installed, run
the following command.
java -version
Next, download the latest binary version of Doris and unzip it.
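The download command itself is not included in this copy. A minimal sketch, assuming a placeholder URL and file name (substitute the actual release link from the Doris download page):
wget https://<doris-download-mirror>/apache-doris-x.x.x-bin.tar.gz   # placeholder URL, replace with the real link
tar -zxvf apache-doris-x.x.x-bin.tar.gz
cd apache-doris-x.x.x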
Configure Doris
Configure FE
Go to the apache-doris-x.x.x/fe directory
cd apache-doris-x.x.x/fe
Modify the FE configuration file conf/fe.conf. Here we mainly modify two parameters: priority_networks and meta_dir. If
you need a more optimized configuration, please refer to the FE parameter configuration instructions on how to adjust them.
priority_networks=172.23.16.0/24
Note:
This parameter must be configured during installation, especially when a machine has multiple IP addresses; we have
to specify a unique IP address for FE.
meta_dir=/path/your/doris-meta
Note:
You can leave this unconfigured; the default is the doris-meta directory in your Doris FE installation directory.
To configure the metadata directory separately, you need to create the specified directory in advance.
Start FE
Execute the following command in the FE installation directory to complete the FE startup.
./bin/start_fe.sh --daemon
curl http://127.0.0.1:8030/api/bootstrap
Here the IP and port are the IP and http_port of FE (default 8030); if you are executing on the FE node, you can run the above
command directly.
If the returned result contains "msg": "success", the startup was successful.
You can also check this through the web UI provided by Doris FE by entering the address in your browser
http://fe_ip:8030
You can see the following screen, which indicates that the FE has started successfully
Note:
1. Here we use the Doris built-in default user, root, to log in with an empty password.
2. This is an administrative interface for Doris, and only users with administrative privileges can log in.
Connect FE
We will connect to Doris FE via a MySQL client. Download the installation-free MySQL client, unzip it, and you will find the
mysql command line tool in the bin/ directory. Then execute the following command to connect to Doris.
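The exact command is not preserved in this copy; a minimal sketch using the defaults described in the note below (root user, empty password, query port 9030):
mysql -uroot -P9030 -h127.0.0.1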
Note:
1. The root user used here is the default user built into Doris and is also the super administrator user; see Rights
Management.
2. -P: the query port used to connect to Doris; the default port is 9030, which corresponds to query_port in fe.conf.
3. -h: the IP address of the FE we are connecting to. If your client and FE are installed on the same node, you can use
127.0.0.1. Doris also allows you, if you forget the root password, to connect directly without a password in this way
and then reset the root password.
show frontends\G;
Name: 172.21.32.5_9010_1660549353220
IP: 172.21.32.5
EditLogPort: 9010
HttpPort: 8030
QueryPort: 9030
RpcPort: 9020
Role: FOLLOWER
IsMaster: true
ClusterId: 1685821635
Join: true
Alive: true
ReplayedJournalId: 49292
IsHelper: true
ErrMsg:
Version: 1.1.2-rc03-ca55ac2
CurrentConnected: Yes
1. If the IsMaster, Join and Alive columns are true, the node is normal.
Stop FE
The stopping of Doris FE can be done with the following command
./bin/stop_fe.sh
Configure BE
Go to the apache-doris-x.x.x/be directory
cd apache-doris-x.x.x/be
Modify the BE configuration file conf/be.conf. Here we mainly modify two parameters: priority_networks and
storage_root_path. If you need a more optimized configuration, please refer to the BE parameter configuration instructions to make
adjustments.
priority_networks=172.23.16.0/24
Note:
This parameter must be configured during installation, especially when a machine has multiple IP addresses; we have
to assign a unique IP address to the BE.
storage_root_path=/path/your/data_dir
Note:
Since Version 1.2.0: Java UDFs are supported since version 1.2, so BE depends on the Java environment. It is
necessary to set the `JAVA_HOME` environment variable before starting. You can also add `export
JAVA_HOME=your_java_home_path` to the first line of the `start_be.sh` startup script to set the variable.
Since Version 1.2.0: Because Java UDF functions are supported from version 1.2, you need to
download the JAR package of the Java UDF functions from the official website and put it in the lib directory of BE,
otherwise it may fail to start.
Start BE
Execute the following command in the BE installation directory to complete the BE startup.
./bin/start_be.sh --daemon
1. be_host_ip: Here is the IP address of your BE, match with priority_networks in be.conf .
2. heartbeat_service_port: This is the heartbeat upload port of your BE, match with heartbeat_service_port in be.conf ,
default is 9050 .
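The registration statement itself is not preserved here. A minimal sketch using mysql from the command line with the two placeholders explained above; the Example below is roughly what SHOW BACKENDS\G returns:
mysql -uroot -P9030 -h127.0.0.1 -e 'ALTER SYSTEM ADD BACKEND "be_host_ip:heartbeat_service_port";'
mysql -uroot -P9030 -h127.0.0.1 -e 'SHOW BACKENDS\G'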
Example:
BackendId: 10003
Cluster: default_cluster
IP: 172.21.32.5
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 170
DataUsedCapacity: 985.787 KB
AvailCapacity: 782.729 GB
TotalCapacity: 984.180 GB
UsedPct: 20.47 %
MaxDiskUsedPct: 20.47 %
Version: 1.1.2-rc03-ca55ac2
Status: {"lastSuccessReportTabletsTime":"2022-08-17
13:33:05","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
Stop BE
The stopping of Doris BE can be done with the following command
./bin/stop_be.sh
Create table
1. Create database
2. Create table
use demo;
PROPERTIES (
);
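Only fragments of the CREATE statements survive above. Below is a minimal sketch consistent with the 9-column sample data and the Stream Load URL later in this guide (database demo, table example_tbl); the column names, types, and aggregate-model layout are assumptions, not the original statement:
mysql -uroot -P9030 -h127.0.0.1 <<'SQL'
CREATE DATABASE IF NOT EXISTS demo;
USE demo;
-- a sketch only: columns are assumed from the sample CSV below
CREATE TABLE IF NOT EXISTS example_tbl
(
    `user_id`         LARGEINT    NOT NULL,
    `date`            DATE        NOT NULL,
    `city`            VARCHAR(20),
    `age`             SMALLINT,
    `sex`             TINYINT,
    `last_visit_date` DATETIME    REPLACE DEFAULT "1970-01-01 00:00:00",
    `cost`            BIGINT      SUM DEFAULT "0",
    `max_dwell_time`  INT         MAX DEFAULT "0",
    `min_dwell_time`  INT         MIN DEFAULT "99999"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");
SQL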
3. Example data
10000,2017-10-01,beijing,20,0,2017-10-01 06:00:00,20,10,10
10006,2017-10-01,beijing,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,beijing,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,shanghai,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,guangzhou,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,shenzhen,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,shenzhen,35,0,2017-10-03 10:20:22,11,6,6
4. Import data
Here we import the data saved to the file above into the table we just created via Stream Load.
curl --location-trusted -u root: -T test.csv -H "column_separator:,"
http://127.0.0.1:8030/api/demo/example_tbl/_stream_load
-T test.csv : the data file we just saved; if the path is different, please specify the full path.
-u root: the user name and password; we use the default user root with an empty password.
127.0.0.1:8030 : the IP and http_port of FE, respectively.
"TxnId": 30303,
"Label": "8690a5c7-a493-48fc-b274-1bb7cd656f25",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 7,
"NumberLoadedRows": 7,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 399,
"LoadTimeMs": 381,
"BeginTxnTimeMs": 3,
"StreamLoadPutTimeMs": 5,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 191,
"CommitAndPublishTimeMs": 175
1. NumberLoadedRows indicates the number of data records that have been imported
Query data
We have finished building tables and importing data above, so we can experience Doris' ability to quickly query and analyze
data.
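The query statements themselves are not preserved in this copy. A minimal sketch of the aggregation that produces the result table shown below (column names follow the table sketch above):
mysql -uroot -P9030 -h127.0.0.1 demo -e 'SELECT city, SUM(cost) AS total_cost FROM example_tbl GROUP BY city;'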
+-----------+------------+
| city | total_cost |
+-----------+------------+
| beijing | 37 |
| shenzhen | 111 |
| guangzhou | 30 |
| shanghai | 200 |
+-----------+------------+
This is the end of our quick start. We have experienced the complete Doris operation process, from installation and
deployment, start/stop, and creating databases and tables, to data import and query. Now let's start our Doris usage journey.
Apache Doris, formerly known as Palo, was initially created to support Baidu's ad reporting business. It was officially open-
sourced in 2017 and donated by Baidu to the Apache Foundation for incubation in July 2018, where it was operated by
members of the incubator project management committee under the guidance of Apache mentors. Currently, the Apache
Doris community has gathered more than 400 contributors from nearly 100 companies in different industries, and the
number of active contributors is close to 100 per month. In June 2022, Apache Doris graduated from Apache incubator as a
Top-Level Project.
Apache Doris now has a wide user base in China and around the world, and as of today, Apache Doris is used in production
environments in over 1000 companies worldwide. Of the top 50 Chinese Internet companies by market capitalization (or
valuation), more than 80% are long-term users of Apache Doris, including Baidu, Meituan, Xiaomi, Jingdong, Bytedance,
Tencent, NetEase, Kwai, Weibo, and Ke Holdings. It is also widely used in some traditional industries such as finance, energy,
manufacturing, and telecommunications.
Usage Scenarios
As shown in the figure below, after various data integration and processing, the data sources are usually stored in the real-
time data warehouse Doris and the offline data lake or data warehouse (in Apache Hive, Apache Iceberg or Apache Hudi).
Reporting Analysis
Real-time dashboards
Reports for in-house analysts and managers
Highly concurrent user-oriented or customer-oriented report analysis: such as website analysis and ad reporting that
usually require thousands of QPS and quick response times measured in milliseconds. A successful user case is that
Doris has been used by the Chinese e-commerce giant JD.com in ad reporting, where it receives 10 billion rows of
data per day, handles over 10,000 QPS, and delivers a 99th percentile query latency of 150 ms.
Ad-Hoc Query. Analyst-oriented self-service analytics with irregular query patterns and high throughput requirements.
XiaoMi has built a growth analytics platform (Growth Analytics, GA) based on Doris, using user behavior data for business
growth analysis, with an average query latency of 10 seconds and a 95th percentile query latency of 30 seconds or less,
and tens of thousands of SQL queries per day.
Unified Data Warehouse Construction. Apache Doris allows users to build a unified data warehouse via one single
platform and save the trouble of handling complicated software stacks. Chinese hot pot chain Haidilao has built a unified
data warehouse with Doris to replace its old complex architecture consisting of Apache Spark, Apache Hive, Apache
Kudu, Apache HBase, and Apache Phoenix.
Data Lake Query. Apache Doris avoids data copying by federating the data in Apache Hive, Apache Iceberg, and Apache
Hudi using external tables, and thus achieves outstanding query performance.
Technical Overview
As shown in the figure below, the Apache Doris architecture is simple and neat, with only two types of processes.
Frontend (FE): user request access, query parsing and planning, metadata management, node management, etc.
Backend (BE): data storage and query plan execution
Both types of processes are horizontally scalable, and a single cluster can support up to hundreds of machines and tens of
petabytes of storage capacity. And these two types of processes guarantee high availability of services and high reliability of
data through consistency protocols. This highly integrated architecture design greatly reduces the operation and
maintenance cost of a distributed system.
In terms of interfaces, Apache Doris adopts MySQL protocol, supports standard SQL, and is highly compatible with MySQL
dialect. Users can access Doris through various client tools and it supports seamless connection with BI tools.
Doris uses a columnar storage engine, which encodes, compresses, and reads data by column. This enables a very high
compression ratio and largely reduces irrelevant data scans, thus making more efficient use of IO and CPU resources.
Sorted Compound Key Index: Users can specify three columns at most to form a compound sort key. This can effectively
prune data to better support highly concurrent reporting scenarios.
Z-order Index: This allows users to efficiently run range queries on any combination of fields in their schema.
MIN/MAX Indexing: This enables effective filtering of equivalence and range queries for numeric types.
Bloom Filter: very effective in equivalence filtering and pruning of high cardinality columns
Inverted Index: This enables fast search for any field.
Doris supports a variety of storage models and has optimized them for different scenarios:
Aggregate Key Model: able to merge the value columns with the same keys and significantly improve performance
Unique Key Model: Keys are unique in this model and data with the same key will be overwritten to achieve row-level
data updates.
Duplicate Key Model: This is a detailed data model capable of detailed storage of fact tables.
Doris also supports strongly consistent materialized views. Materialized views are automatically selected and updated, which
greatly reduces maintenance costs for users.
Doris adopts the MPP model in its query engine to realize parallel execution between and within nodes. It also supports
distributed shuffle join for multiple large tables so as to handle complex queries.
The Doris query engine is vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual
function calls, improve cache hit rates, and make efficient use of SIMD instructions. Doris delivers a 5–10 times higher
performance in wide table aggregation scenarios than non-vectorized engines.
Apache Doris uses Adaptive Query Execution technology to dynamically adjust the execution plan based on runtime
statistics. For example, it can generate runtime filter, push it to the probe side, and automatically penetrate it to the Scan
node at the bottom, which drastically reduces the amount of data in the probe and increases join performance. The runtime
filter in Doris supports In/Min/Max/Bloom filter.
In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports constant folding, subquery rewriting,
predicate pushdown and CBO supports Join Reorder. The Doris CBO is under continuous optimization for more accurate
statistical information collection and derivation, and more accurate cost model prediction.
Before continuing, you might want to compile Doris following the instructions in the Compile topic.
Overview
Doris, as an open source OLAP database with an MPP architecture, can run on most mainstream commercial servers. For
you to take full advantage of the high concurrency and high availability of Doris, we recommend that your computer meet
the following requirements:
Software Requirements
OS Installation Requirements
Set the maximum number of open file descriptors in the system
vi /etc/security/limits.conf
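The entries to add are not preserved here; a typical setting (the 65536 value matches the BE minimum mentioned in the FAQ at the end of this topic):
* soft nofile 65536
* hard nofile 65536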
Clock synchronization
The metadata in Doris requires a time precision of less than 5000 ms, so all machines in the cluster need to synchronize their
clocks to avoid service exceptions caused by metadata inconsistencies due to clock skew.
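One common way to keep clocks in sync (an assumption, not prescribed by the original text) is NTP, for example via chrony:
sudo systemctl enable --now chronyd    # start the NTP client
chronyc tracking                       # verify the clock is being synchronized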
Backend: 8-core CPU, 16 GB RAM, SSD or SATA disk of 50 GB+, Gigabit network card, 1-3 instances
Production Environment
Note 1:
1. The disk space of FE is mainly used to store metadata, including logs and images. It usually ranges from several
hundred MB to several GB.
2. The disk space of BE is mainly used to store user data. The total disk space taken up is 3 times the total user data (3
copies). Then an additional 40% of the space is reserved for background compaction and intermediate data storage.
3. On one single machine, you can deploy multiple BE instances but only one FE instance. If you need 3 copies of the
data, you need to deploy 3 BE instances on 3 machines (1 instance per machine), not 3 BE instances on one
machine. Clocks of the FE servers must be consistent (allowing a maximum clock skew of 5 seconds).
4. The test environment can also be tested with only 1 BE instance. In the actual production environment, the number
of BE instances directly determines the overall query latency.
5. Disable swap for all deployment nodes.
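A minimal sketch of disabling swap on a Linux node (also comment out any swap entries in /etc/fstab so the change survives a reboot):
sudo swapoff -a    # turn off all swap devices immediately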
1. FE nodes are divided into Followers and Observers based on their roles. (Leader is an elected role in the Follower
group, hereinafter referred to as Follower, too.)
2. The number of FE nodes should be at least 1 (1 Follower). If you deploy 1 Follower and 1 Observer, you can achieve
high read availability; if you deploy 3 Followers, you can achieve high read-write availability (HA).
3. The number of Followers must be odd, and there is no limit on the number of Observers.
4. According to past experience, for business that requires high cluster availability (e.g. online service providers), we
recommend that you deploy 3 Followers and 1-3 Observers; for offline business, we recommend that you deploy 1
Follower and 1-3 Observers.
Usually we recommend 10 to 100 machines to give full play to Doris' performance (deploy FE on 3 of them (HA) and BE
on the rest).
The performance of Doris is positively correlated with the number of nodes and their configuration. With a minimum of
four machines (one FE, three BEs; hybrid deployment of one BE and one Observer FE to provide metadata backup) and
relatively low configuration, Doris can still run smoothly.
In hybrid deployment of FE and BE, you might need to be watchful for resource competition and ensure that the
metadata catalogue and data catalogue belong to different disks.
Broker Deployment
Broker is a process for accessing external data sources, such as hdfs. Usually, deploying one broker instance on each machine
should be enough.
Network Requirements
Doris instances communicate directly over the network. The following table shows all required ports.
Instance Name | Port Name | Default Port | Communication Direction | Description
BE | be_port | 9060 | FE --> BE | Thrift server port on BE for receiving requests from FE
Note:
1. When deploying multiple FE instances, make sure that the http_port configuration of each FE is consistent.
2. Make sure that each port has access in its proper direction before deployment.
IP Binding
Because of the existence of multiple network cards, or the existence of virtual network cards caused by the installation of
docker and other environments, the same host may have multiple different IPs. Currently Doris does not automatically
identify available IPs. So when you encounter multiple IPs on the deployment host, you must specify the correct IP via the
priority_networks configuration item.
priority_networks is a configuration item that both FE and BE have. It needs to be written in fe.conf and be.conf. It is used
to tell the process which IP should be bound when FE or BE starts. Examples are as follows:
priority_networks=10.1.3.0/24
This is a representation of CIDR. FE or BE will find the matching IP based on this configuration item as their own local IP.
Note: Configuring priority_networks and starting FE or BE only ensure the correct IP binding of FE or BE. You also need to
specify the same IP in ADD BACKEND or ADD FRONTEND statements, otherwise the cluster cannot be created. For
example:
If you use the following IP in the ADD BACKEND statement: ALTER SYSTEM ADD BACKEND "192.168.0.1:9050";
At this point, you must DROP the wrong BE configuration and use the correct IP to perform ADD BACKEND.
Broker currently does not have the priority_networks configuration item, nor does it need one. Broker's services are bound to
0.0.0.0 by default. You can simply specify a correct, accessible BROKER IP when using ADD BROKER.
Table Name Case Sensitivity
By default, table names in Doris are case-sensitive. If you need to change that, you may do it before cluster initialization. The
table name case sensitivity cannot be changed after cluster initialization is completed.
Cluster Deployment
Manual Deployment
Deploy FE
Find the fe folder under the output generated by source code compilation, copy it into the specified deployment path of the
FE nodes, and put it in the corresponding directory.
Configure FE
i. The configuration file is conf/fe.conf. Note: meta_dir indicates the metadata storage location. The default value is
${DORIS_HOME}/doris-meta . The directory needs to be created manually.
Note: For production environments, it is better not to put the directory under the Doris installation directory but in a
separate disk (SSD would be the best); for test and development environments, you may use the default
configuration.
ii. The default maximum Java heap memory of JAVA_OPTS in fe.conf is 4 GB. For production environments, we
recommend that it be adjusted to more than 8 GB.
Start FE
bin/start_fe.sh --daemon
The FE process starts and enters the background for execution. Logs are stored in the log/ directory by default. If startup
fails, you can view error messages by checking out log/fe.log or log/fe.out.
For details about deployment of multiple FEs, see the FE scaling section.
Deploy BE
Copy the BE deployment files onto all nodes that are to deploy BE.
Find the be folder under the output generated by source code compilation and copy it into the specified deployment
paths of the BE nodes.
Modify be/conf/be.conf, which mainly involves configuring storage_root_path, the data storage directory. The default is
be/storage; the directory needs to be created manually. Use ; to separate multiple paths (do not add ; after the last
directory).
You may specify the directory storage medium in the path: HDD or SSD. You may also add a capacity limit to the end of every
path and use , for separation. Unless you use a mix of SSD and HDD disks, you do not need to follow the configuration
methods in Example 1 and Example 2 below, but only need to specify the storage directory; you do not need to modify the
default storage medium configuration of FE, either.
Example 1:
Note: For SSD disks, add .SSD to the end of the directory; for HDD disks, add .HDD .
`storage_root_path=/home/disk1/doris.HDD;/home/disk2/doris.SSD;/home/disk2/doris`
Description
Example 2:
Note: You do not need to add the .SSD or .HDD suffix, but to specify the medium in the storage_root_path parameter
`storage_root_path=/home/disk1/doris,medium:HDD;/home/disk2/doris,medium:SSD`
Description
BE webserver_port configuration
If the BE component is installed in a Hadoop cluster, you need to change the webserver_port=8040 configuration to avoid a
port conflict.
Since Version 1.2.0: Java UDF is supported since version 1.2, so BEs depend on the Java environment. It is
necessary to set the `JAVA_HOME` environment variable before starting. You can also do this by adding `export
JAVA_HOME=your_java_home_path` to the first line of the `start_be.sh` startup script.
Since Version 1.2.0: Because Java UDF is supported since version 1.2, you need to download the JAR package of Java
UDF from the official website and put it under the lib directory of BE, otherwise it may fail to start.
BE nodes need to be added in FE before they can join the cluster. You can use mysql-client (Download MySQL 5.7) to
connect to FE:
fe_host is the node IP where FE is located; query_port is in fe/conf/fe.conf; the root account is used by default and no
password is required in login.
After login, execute the following command to add all the BE host and heartbeat service port:
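The commands themselves are not preserved here; a minimal sketch (fe_host, query_port, be_host, and heartbeat_service_port are placeholders to substitute):
mysql -h fe_host -P query_port -uroot
ALTER SYSTEM ADD BACKEND "be_host:heartbeat_service_port";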
Start BE
bin/start_be.sh --daemon
The BE process will start and go into the background for execution. Logs are stored in be/log/directory by default. If
startup fails, you can view error messages by checking out be/log/be.log or be/log/be.out.
View BE status
Connect to FE using mysql-client and execute SHOW PROC '/backends'; to view BE operation status. If everything goes
well, the isAlive column should be true .
Deploy Broker
Copy the fs_broker directory from the source compilation output directory to all the nodes where Broker needs to be
deployed. It is recommended to keep the Broker directory on the same level as the BE or FE directories.
You can modify the configuration in the corresponding broker/conf/ configuration file.
Start Broker
bin/start_broker.sh --daemon
Add Broker
To let Doris FE and BE know which nodes Broker is on, add a list of Broker nodes by SQL command.
Use mysql-client to connect the FE started, and execute the following commands:
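The statement itself is not preserved here; a minimal sketch (broker_name is a name of your choice, the other placeholders are explained below):
ALTER SYSTEM ADD BROKER broker_name "broker_host:broker_ipc_port";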
broker_host is the Broker node IP; broker_ipc_port is set in the Broker configuration file conf/apache_hdfs_broker.conf.
Connect any started FE using mysql-client and execute the following command to view Broker status: SHOW PROC
'/brokers';
Note: In production environments, daemons should be used to start all instances to ensure that processes are automatically
pulled up after they exit, such as Supervisor. For daemon startup, in Doris 0.9.0 and previous versions, you need to remove
the last & symbol in the start_xx.sh scripts. In Doris 0.10.0 and the subsequent versions, you may just call sh start_xx.sh
directly to start.
FAQ
Process-Related Questions
After the FE process starts, metadata is loaded first. Based on the role of FE, you can see transfer from UNKNOWN to
MASTER/FOLLOWER/OBSERVER in the log. Eventually, you will see the thrift server started log and can connect to FE
through MySQL client, which indicates that FE started successfully.
You can also check whether the startup was successful by connecting as follows:
http://fe_host:fe_http_port/api/bootstrap
If it returns:
{"status":"OK","msg":"Success"}
Note: If you can't see the information about the boot failure in fe.log, you may check fe.out.
After the BE process starts, if there have been data there before, it might need several minutes for data index loading.
If BE is started for the first time or the BE has not joined any cluster, the BE log will periodically scroll the words waiting
to receive first heartbeat from frontend , meaning that BE has not received the Master's address through FE's
heartbeat and is waiting passively. Such error log will disappear after sending the heartbeat by ADD BACKEND in FE. If
the words master client, get client from cache failed. host: , port: 0, code: 7 appear after receiving the
heartbeat, it indicates that FE has successfully connected to BE, but BE cannot actively connect to FE. You may need to check
the connectivity of rpc_port from BE to FE.
If BE has been added to the cluster, the heartbeat log from FE will be scrolled every five seconds: get heartbeat,
host:xx. xx.xx.xx, port:9020, cluster id:xxxxxxx , indicating that the heartbeat is normal.
Secondly, if the word finish report task success. return code: 0 is scrolled every 10 seconds in the log, that indicates
that the BE-to-FE communication is normal.
Meanwhile, if there is a data query, you will see the rolling logs and the execute time is xxx logs, indicating that BE is
started successfully, and the query is normal.
You can also check whether the startup was successful by connecting as follows:
http://be_host:be_http_port/api/health
If it returns:
Note: If you can't see the information about the boot failure in be.INFO, you may see it in be.out.
3. How can we confirm that the connectivity of FE and BE is normal after building the system?
Firstly, you need to confirm that FE and BE processes have been started separately and worked normally. Then, you need
to confirm that all nodes have been added through ADD BACKEND or ADD FOLLOWER/OBSERVER statements.
If the heartbeat is normal, BE logs will show get heartbeat, host:xx.xx.xx.xx, port:9020, cluster id:xxxxx ; if the
heartbeat fails, you will see backend [10001] got Exception: org.apache.thrift.transport.TTransportException in FE's
log, or other thrift communication abnormal log, indicating that the heartbeat from FE to 10001 BE fails. Here you need
to check the connectivity of the FE to BE host heartbeat port.
If the BE-to-FE communication is normal, the BE log will display the words finish report task success. return code:
0 . Otherwise, the words master client, get client from cache failed will appear. In this case, you need to check the
connectivity of BE to the rpc_port of FE.
In addition to Master FE, the other role nodes (Follower FE, Observer FE, Backend) need to register to the cluster through
the ALTER SYSTEM ADD statement before joining the cluster.
When Master FE is started for the first time, a cluster_id is generated in the doris-meta/image/VERSION file.
When FE joins the cluster for the first time, it first retrieves the file from Master FE. Each subsequent reconnection
between FEs (FE reboot) checks whether its cluster ID is the same as that of other existing FEs. If it is not the same, the
FE will exit automatically.
When BE first receives the heartbeat of Master FE, it gets the cluster ID from the heartbeat and records it in the
cluster_id file of the data directory. Each heartbeat after that compares that to the cluster ID sent by FE. If the cluster
IDs are not matched, BE will refuse to respond to FE's heartbeat.
The heartbeat also contains Master FE's IP. If the Master FE changes, the new Master FE will send the heartbeat to BE
together with its own IP, and BE will update the Master FE IP it saved.
priority_networks
priority_networks is a configuration item that both FE and BE have. It is used to help FE or BE identify their own IP
addresses in the case of multiple network cards. priority_networks uses CIDR notation: RFC 4632
If the connectivity of FE and BE is confirmed to be normal, but timeout still occurs in creating tables, and the FE log
shows the error message backend does not find. host:xxxx.xxx.XXXX , this means that there is a problem with the
IP address automatically identified by Doris and that the priority_network parameter needs to be set manually.
The explanation to this error is as follows. When the user adds BE through the ADD BACKEND statement, FE
recognizes whether the statement specifies hostname or IP. If it is a hostname, FE automatically converts it to an IP
address and stores it in the metadata. When BE reports on the completion of the task, it carries its own IP address. If
FE finds that the IP address reported by BE is different from that in the metadata, the above error message will show
up.
Solutions to this error: 1) Set priority_network in FE and BE separately. Usually FE and BE are in one network
segment, so their priority_network can be set to be the same. 2) Put the correct IP address instead of the hostname
in the ADD BACKEND statement to prevent FE from getting the wrong IP address.
If the number of file descriptors is not in the range [min_file_descriptor_number, max_file_descriptor_number], an error will
occur when starting a BE process. You may use the ulimit command to reset the parameter.
The default value of min_file_descriptor_number is 65536.
For example, the command ulimit -n 65536; means to set the number of file descriptors to 65536.
After starting a BE process, you can use cat /proc/$pid/limits to check the actual number of file descriptors of the
process.
If you have used supervisord and encountered a file descriptor error, you can fix it by modifying minfds in
supervisord.conf.
vim /etc/supervisord.conf
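For example, in the [supervisord] section (a sketch; 65536 matches the minimum noted above):
[supervisord]
minfds=65536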
Compilation
This topic is about how to compile Doris from source.
$ docker images
Note 1: For different versions of Doris, you need to download the corresponding image version. For Apache Doris 0.15
and above, the corresponding Docker image will have the same version number as Doris. For example, you can use
apache/doris:build-env-for-0.15.0 to compile Apache Doris 0.15.0.
Note 2: apache/doris:build-env-ldb-toolchain-latest is used to compile the latest trunk code. It will keep up with the
update of the trunk code. You may view the update time in docker/README.md .
apache/doris:build-env-for-1.0.0 1.0.0
apache/doris:build-env-for-1.1.0 1.1.0
apache/doris:build-env-ldb-toolchain-latest trunk
apache/doris:build-env-ldb-toolchain-no-avx2-latest trunk
Note:
1. Third-party libraries in images with "no-avx2" in their names can run on CPUs that do not support avx2 instructions.
Doris can be compiled with the USE_AVX2=0 option.
4. The docker images of build-env-1.3.1 and above include both OpenJDK 8 and OpenJDK 11, please confirm the default
JDK version with java -version . You can also switch versions as follows. (It is recommended to use JDK8.)
Switch to JDK 8:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0
Switch to JDK 11:
export JAVA_HOME=/usr/lib/jvm/java-11
It is recommended to run the image by mounting the local Doris source directory, so that the compiled binary file will be
stored in the host machine and will not disappear because of the exit of the image.
Meanwhile, it is recommended to mount the maven .m2 directory in the image to the host directory to prevent
repeated downloading of maven's dependent libraries each time the compilation is started.
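A typical invocation might look like the following (a sketch; the local paths are placeholders, and the image tag is one of those listed above):
docker run -it \
    -v /your/local/.m2:/root/.m2 \
    -v /your/local/doris:/root/doris \
    apache/doris:build-env-ldb-toolchain-latest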
After starting the image, you should be in the container. You can download the Doris source code using the following
command (if you have mounted the local Doris source directory, you don't need to do this):
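A minimal sketch, assuming you clone from the official GitHub repository:
git clone https://github.com/apache/doris.git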
4. Compile Doris
Firstly, run the following command to check whether the compilation machine supports the avx2 instruction set.
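The check command is not preserved in this copy; one common form on Linux (no output means AVX2 is not supported):
$ cat /proc/cpuinfo | grep avx2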
If AVX2 is not supported, compile with:
$ USE_AVX2=0 sh build.sh
If it is supported, compile directly:
$ sh build.sh
If you are using build-env-for-0.15.0 or the subsequent versions for the first time, use the following command
when compiling:
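The exact command is not preserved here; a minimal sketch based on the --clean option described below:
$ sh build.sh --clean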
This is because we have upgraded thrift (0.9 -> 0.13) for build-env-for-0.15.0 and the subsequent versions, which means
you need to use the --clean option to force the new version of thrift to regenerate the code files; otherwise incompatible
code will result.
1. System Dependencies
GCC 7.3+, Oracle JDK 1.8+, Python 2.7+, Apache Maven 3.5+, CMake 3.11+ Bison 3.0+
If you are using Ubuntu 16.04 or newer, you can use the following command to install the dependencies:
sudo apt-get install build-essential openjdk-8-jdk maven byacc flex automake libtool-bin bison binutils-
dev libiberty-dev zip unzip libncurses5-dev curl git ninja-build python autopoint pkg-config
If you are using CentOS, you can use the following command to install the dependencies:
sudo yum groupinstall 'Development Tools' && sudo yum install maven cmake byacc flex automake libtool bison
binutils-devel zip unzip ncurses-devel curl git wget python2 glibc-static libstdc++-static java-1.8.0-
openjdk
GCC 10+, Oracle JDK 1.8+, Python 2.7+, Apache Maven 3.5+, CMake 3.19.2+ Bison 3.0+
If you are using Ubuntu 16.04 or newer, you can use the following command to install the dependencies:
sudo apt install build-essential openjdk-8-jdk maven cmake byacc flex automake libtool-bin bison binutils-
dev libiberty-dev zip unzip libncurses5-dev curl git ninja-build python
ln -s /usr/bin/g++-11 /usr/bin/g++
ln -s /usr/bin/gcc-11 /usr/bin/gcc
2. Compile Doris
This is the same as compiling with the Docker development image. Before compiling, you need to check whether the
avx2 instruction is supported.
$ sh build.sh
$ USE_AVX2=0 sh build.sh
FAQ
1. Could not transfer artifact net.sourceforge.czt.dev:cup-maven-plugin:pom:1.6-cdh from/to xxx
If you encounter the above error, please refer to PR #4769 and modify the cloudera-related repo configuration in
fe/pom.xml .
The download links of the third-party libraries that Doris relies on are all in the thirdparty/vars.sh file. Over time, some
download links may fail. If you encounter this situation, it can be solved in the following two ways:
1. Manually modify the problematic download links and the corresponding MD5 values in thirdparty/vars.sh.
2. Use the third-party download mirror provided below:
export REPOSITORY_URL=https://doris-thirdparty-repo.bj.bcebos.com/thirdparty
sh build-thirdparty.sh
REPOSITORY_URL contains all third-party library source code packages and their historical versions.
If you encounter this error, the possible reason is not enough memory allocated to the image. (The default memory
allocation for Docker is 2 GB, and the peak memory usage during the compilation might exceed that.)
You can fix this by increasing the memory allocation for the image, 4 GB ~ 8 GB, for example.
Special Statement
Starting from version 0.13, the dependency on the two third-party libraries [1] and [2] will be removed in the default compiled
output. These two third-party libraries are under GNU General Public License V3. This license is incompatible with Apache
License 2.0, so it will not appear in the Apache release by default.
Removing library [1] will result in the inability to access MySQL external tables. The feature of accessing MySQL external tables
will be implemented through UnixODBC in a future release.
Removing library [2] will cause some data written in earlier versions (before version 0.8) to be unreadable, because the data
in the earlier versions was compressed using the LZO algorithm, which has since been changed to the LZ4 compression
algorithm. We will provide tools to detect and convert this part of the data in the future.
If required, you can continue to use these two libraries by adding the following option when compiling:
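The exact option is not preserved in this copy; a sketch, assuming the build variables are named after the two libraries (verify against your version's build.sh):
WITH_MYSQL=1 WITH_LZO=1 sh build.sh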
Note that if you use these two third-party libraries, that means you choose not to use Doris under the Apache License 2.0,
and you might need to pay attention to the GPL-related agreements.
[1] mysql-5.7.18
[2] lzo-2.10
You can still compile the latest code using the Docker development image: apache/doris:build-env-ldb-toolchain-
latest
1. Download ldb_toolchain_gen.sh
The latest ldb_toolchain_gen.sh can be downloaded from here. This script is used to generate the ldb toolchain.
sh ldb_toolchain_gen.sh /path/to/ldb_toolchain/
After execution, the following directory structure will be created under /path/to/ldb_toolchain/ .
├── bin
├── include
├── lib
├── share
├── test
└── usr
i. Java8
ii. Apache Maven 3.6.3
iii. Node v12.13.0
Different Linux distributions might contain different packages, so you may need to install additional packages. The
following instructions describe how to set up a minimal CentOS 6 box to compile Doris. It should work similarly for other
Linux distros.
sudo yum install -y byacc patch automake libtool make which file ncurses-devel gettext-devel unzip bzip2 zip
util-linux wget git python2
# install autoconf-2.69
cd autoconf-2.69 && \
./configure && \
make && \
make install
# install bison-3.0.4
cd bison-3.0.4 && \
./configure && \
make && \
make install
After downloading, create the custom_env.sh file under the Doris source directory, and set the PATH environment
variable:
export JAVA_HOME=/path/to/java/
export PATH=$JAVA_HOME/bin:$PATH
export PATH=/path/to/maven/bin:$PATH
export PATH=/path/to/node/bin:$PATH
export PATH=/path/to/ldb_toolchain/bin:$PATH
Compile Doris
Enter the Doris source code directory and check whether the compilation machine supports the avx2 instruction set.
If it is not supported, compile with:
$ USE_AVX2=0 sh build.sh
Otherwise, compile directly:
$ sh build.sh
This script will compile the third-party libraries first and then the Doris components (FE, BE) later. The compiled output will
be in the output/ directory.
https://github.com/apache/doris-thirdparty/releases
Here we provide precompiled third-party binaries for Linux X86(with AVX2) and MacOS(X86 Chip). If it is consistent with your
compiling and running environment, you can download and use it directly.
After downloading, you will get an installed/ directory after decompression, copy this directory to the thirdparty/
directory, and then run build.sh .
Compilation With ARM
Note that this is for reference only. Other errors may occur when compiling in different environments.
KylinOS
1. KylinOS Version :
$> cat /etc/.kyinfo
name=Kylin-Server
milestone=10-SP1-Release-Build04-20200711
arch=arm64
beta=False
time=2020-07-11 17:16:54
dist_id=Kylin-Server-10-SP1-Release-Build04-20200711-arm64-2020-07-11 17:16:54
2. CPU Model:
Download ldb_toolchain_gen.aarch64.sh
Note that you need to download the corresponding aarch64 versions of jdk and nodejs:
1. Java8-aarch64
2. Node v12.13.0-aarch64
Hardware Environment
1. System Version: CentOS 8.4, Ubuntu 20.04
2. System Architecture: ARM64 (AArch64)
3. CPU: 4C
4. Memory: 16 GB
5. Hard Disk: 40GB (SSD), 100GB (SSD)
Software Environment
Software Environment List
Component Name Component Version
Git 2.0+
JDK 1.8.0
Maven 3.6.3
NodeJS 16.3.0
LDB-Toolchain 0.9.1
byacc, patch, automake, libtool, make, which, file, ncurses-devel, gettext-devel, unzip, bzip2, zip, util-linux, wget, git, python2 (installed via yum install or apt-get install)
autoconf 2.69
bison 3.0.4
CentOS 8.4
mkdir /opt/tools
mkdir /opt/software
Git
JDK8 (2 methods)
# 1. yum install, which can avoid additional download and configuration. Installing the devel package is to
get tools such as the jps command.
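# A sketch of method 1 (package names assumed for CentOS):
sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel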
# 2. Download the installation package of the arm64 architecture, decompress it, and configure the environment
variables.
cd /opt/tools
mv jdk1.8.0_291 /opt/software/jdk8
Maven
cd /opt/tools
# Download with the wget tool, decompress it, and configure the environment variables.
mv apache-maven-3.6.3 /opt/software/maven
NodeJS
cd /opt/tools
mv node-v16.3.0-linux-arm64 /opt/software/nodejs
ldb-toolchain
cd /opt/tools
sh ldb_toolchain_gen.aarch64.sh /opt/software/ldb_toolchain/
vim /etc/profile.d/doris.sh
export JAVA_HOME=/opt/software/jdk8
export MAVEN_HOME=/opt/software/maven
export NODE_JS_HOME=/opt/software/nodejs
export LDB_HOME=/opt/software/ldb_toolchain
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$NODE_JS_HOME/bin:$LDB_HOME/bin:$PATH
source /etc/profile.d/doris.sh
# Test
java -version
mvn -version
node --version
> v16.3.0
gcc --version
> gcc-11
sudo yum install -y byacc patch automake libtool make which file ncurses-devel gettext-devel unzip bzip2 bison
zip util-linux wget git python2
# Install autoconf-2.69
cd /opt/tools
cd /opt/software/autoconf && \
./configure && \
make && \
make install
Ubuntu 20.04
apt-get update
The Ubuntu shell uses dash instead of bash by default. It needs to be switched to bash for the build scripts to run properly. Run the
following command to view the details of sh and confirm which program the shell points to:
ls -al /bin/sh
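If sh points to dash, one common way to switch the default to bash on Ubuntu is:
sudo dpkg-reconfigure dash    # choose "No" when asked whether to use dash as the default system shell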
After these steps, dash will no longer be the default shell tool.
mkdir /opt/tools
mkdir /opt/software
Git
JDK8
# Download the installation package of the ARM64 architecture, decompress it, and configure environment
variables.
cd /opt/tools
mv jdk1.8.0_291 /opt/software/jdk8
Maven
cd /opt/tools
# Download with the wget tool, decompress, and configure the environment variables.
mv apache-maven-3.6.3 /opt/software/maven
NodeJS
cd /opt/tools
mv node-v16.3.0-linux-arm64 /opt/software/nodejs
ldb-toolchain
cd /opt/tools
sh ldb_toolchain_gen.aarch64.sh /opt/software/ldb_toolchain/
vim /etc/profile.d/doris.sh
export JAVA_HOME=/opt/software/jdk8
export MAVEN_HOME=/opt/software/maven
export NODE_JS_HOME=/opt/software/nodejs
export LDB_HOME=/opt/software/ldb_toolchain
export PATH=$JAVA_HOME/bin:$MAVEN_HOME/bin:$NODE_JS_HOME/bin:$LDB_HOME/bin:$PATH
source /etc/profile.d/doris.sh
# Test
java -version
mvn -version
node --version
> v16.3.0
gcc --version
> gcc-11
sudo apt install -y build-essential cmake flex automake bison binutils-dev libiberty-dev zip libncurses5-dev
curl ninja-build
# Install autoconf-2.69
cd /opt/tools
cd /opt/software/autoconf && \
./configure && \
make && \
make install
cd /opt
Execute compilation
# For machines that support the AVX2 instruction set, compile directly
sh build.sh
# For machines that do not support the AVX2 instruction set, use the following command to compile
USE_AVX2=OFF sh build.sh
FAQ
Problem Description
During the compilation and installation process, the following error occurs:
Cause
Solution
export REPOSITORY_URL=https://doris-thirdparty-repo.bj.bcebos.com/thirdparty
sh /opt/doris/thirdparty/build-thirdparty.sh
REPOSITORY_URL contains all third-party library source packages and their historical versions.
Problem Description
Python 2.7.18
Cause
By default, the system provides commands such as python2.7 , python3.6 , python2 , and python3 rather than a plain python command. Doris
only requires Python 2.7+ to install dependencies, so you just need to provide a python command that points to one of them.
Solution
whereis python
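For example, a minimal fix is to create a symbolic link (the python path below is illustrative; use the path reported by whereis):
ln -s /usr/bin/python2.7 /usr/bin/python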
Problem Description
Cannot find the output folder in the directory after the execution of build.sh.
Cause
Solution
sh build.sh --clean
Problem Description
Cause
This error message is caused by download failure (connection to the repo.maven.apache.org central repository fails).
Solution
Rebuild
Problem Description
Failed to build CXX object during compilation, error message showing no space left
Cause
Solution
Expand the free space on the device by deleting files you don't need, etc.
Problem Description
Cause
Solution
Problem Description
An exception is reported when starting FE after migrating the drive letter where FE is located
Cause
FE has been migrated to another location, which doesn't match the hard disk information stored in the metadata; or
the hard disk is damaged or not mounted
Solution
Couldn't find pkg.m4 from pkg-config. Install the appropriate package for your distribution or set
ACLOCAL_PATH to the directory containing pkg.m4.
Cause
There is something wrong with the compilation of the third-party library libxml2 .
Possible Reasons:
a. An exception occurs when the Ubuntu system loads the environment variables so the index under the ldb
directory is not successfully loaded.
b. The retrieval of environment variables during libxml2 compilation fails, so the ldb/aclocal directory is not
retrieved.
Solution
Copy the pkg.m4 file in the ldb/aclocal directory into the libxml2/m4 directory, and recompile the third-party library.
cp /opt/software/ldb_toolchain/share/aclocal/pkg.m4 /opt/incubator-doris/thirdparty/src/libxml2-v2.9.10/m4
sh /opt/incubator-doris/thirdparty/build-thirdparty.sh
Problem Description
Cause
The compilation environment is wrong: the build is using the system's built-in gcc 9.3.0 instead of the ldb-toolchain compiler, so
you need to configure the ldb environment variables.
Solution
vim /etc/profile.d/ldb.sh
export LDB_HOME=/opt/software/ldb_toolchain
export PATH=$LDB_HOME/bin:$PATH
source /etc/profile.d/ldb.sh
# Test
gcc --version
> gcc-11
Problem Description
The following error prompts are all due to one root cause.
bison related
a. fseterr.c error when installing bison-3.0.4
flex related
a. flex command not found
cmake related
a. cmake command not found
b. cmake cannot find the dependent library
c. cmake cannot find CMAKE_ROOT
d. Compiler set not found in cmake environment variable CXX
boost related
a. Boost.Build build engine failed
mysql related
a. Could not find the mysql client dependency .a file
gcc related
a. GCC version requires 11+
Cause
Solution
Compilation on Windows
This topic is about how to compile Doris from source with Windows.
Environment Requirements
1. Windows 11 or Windows 10, Version 1903, Build 18362 or newer
2. Normal-functioning WSL2
Steps
1. Install the Oracle Linux 7.9 distribution from the Microsoft Store
You can also install other distros you want via Docker images or Github installs
3. Install dependencies
sudo yum install -y byacc patch automake libtool make which file ncurses-devel gettext-devel unzip bzip2 zip
util-linux wget git python2
# Install autoconf-2.69
cd autoconf-2.69 && \
./configure && \
make && \
make install
# install bison-3.0.4
cd bison-3.0.4 && \
./configure && \
make && \
make install
Java8
Apache Maven 3.6.3
Node v12.13.0
LDB_TOOLCHAIN
7. Compile
cd doris
sh build.sh
Note
The default data storage drive letter of a WSL2 distribution is the C drive. If necessary, you can move it to another drive to prevent the
system drive from filling up.
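A sketch of moving a WSL2 distribution to another drive (run in PowerShell; the distribution name and target path are illustrative):
wsl --export OracleLinux_7_9 D:\wsl\oracle79.tar
wsl --unregister OracleLinux_7_9
wsl --import OracleLinux_7_9 D:\wsl\OracleLinux_7_9 D:\wsl\oracle79.tar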
Compilation on macOS
This topic is about how to compile Doris from source with macOS (both x86_64 and arm64).
Environment Requirements
1. macOS 12 (Monterey) or newer (both Intel chip and Apple Silicon chips are supported)
2. Homebrew
Steps
1. Use Homebrew to install dependencies:
brew install python cmake ninja ccache bison byacc gettext wget pcre maven llvm openjdk@11 npm
bash build.sh
Third-Party Libraries
1. The Apache Doris Third Party Prebuilt page contains the source code of all third-party libraries. You can download doris-
thirdparty-source.tgz to obtain them.
2. You can download the precompiled third party library from the Apache Doris Third Party Prebuilt page. You may refer to
the following commands:
cd thirdparty
rm -rf installed
# Intel chips
curl -L https://github.com/apache/doris-thirdparty/releases/download/automation/doris-thirdparty-prebuilt-
darwin-x86_64.tar.xz \
-o - | tar -Jxf -
curl -L https://github.com/apache/doris-thirdparty/releases/download/automation/doris-thirdparty-prebuilt-
darwin-arm64.tar.xz \
-o - | tar -Jxf -
cd installed/bin
./protoc --version
./thrift --version
3. When running protoc or thrift , you may meet an error which says the app cannot be opened because the developer
cannot be verified. Go to Security & Privacy and click the Open Anyway button in the General pane to confirm your intent
to open the app. See https://support.apple.com/en-us/HT202491 .
Start-up
1. Set file descriptors (NOTICE: If you have closed the current session, you need to set this variable again).
ulimit -n 65536
You can also write this configuration into the initialization files so you don't need to set the variables again when opening
a new terminal session.
# bash
echo 'ulimit -n 65536' >>~/.bashrc
# zsh
echo 'ulimit -n 65536' >>~/.zshrc
# check in a new terminal session
$ ulimit -n
65536
2. Start BE up
cd output/be/bin
./start_be.sh --daemon
3. Start FE up
cd output/fe/bin
./start_fe.sh --daemon
FAQ
Fail to start BE up. The log shows: fail to open StorageEngine, res=file descriptors limit is too small
To fix this, please refer to the "Start-up" section above and reset file descriptors .
Java Version
Java 11 is recommended.
Overview
Prepare the production machine before building a Docker image. The platform architecture of the Docker image will be the
same as that of the machine. For example, if you use an X86_64 machine to build a Docker image, you need to download the
Doris binary program of X86_64, and the Docker image built can only run on the X86_64 platform. The same applies to the ARM
platform (including Apple M1).
Hardware Requirements
Minimum configuration: 2C 4G
Software Requirements
Docker Version: 20.10 or newer
1. Use the official OpenJDK image certified by Docker-Hub as the base parent image (Version: JDK 1.8).
2. Use the official binary package for download; do not use binary packages from unknown sources.
3. Use embedded scripts for tasks such as FE startup, multi-FE registration, FE status check, BE startup, registration of
BE to FE, and BE status check.
4. Do not use --daemon to start applications in Docker. Otherwise there will be exceptions during the deployment of
orchestration tools such as K8S.
Apache Doris 1.2 and the subsequent versions support JavaUDF, so you also need a JDK environment for BE. The
recommended images are as follows:
Frontend openjdk:8u342-jdk
Backend openjdk:8u342-jdk
Broker openjdk:8u342-jdk
Script Preparation
In the Dockerfile script for compiling the Docker Image, there are two methods to load the binary package of the Apache
Doris program:
1. Execute the download command via wget / curl when compiling, and then start the docker build process.
2. Download the binary package to the compilation directory in advance, and then load it into the docker build process
through the ADD or COPY command.
Method 1 can produce a smaller Docker image, but if the docker build process fails, the download operation might be
repeated and result in longer build time; Method 2 is more suitable for less-than-ideal network environments.
The examples below are based on Method 2. If you prefer to go for Method 1, you may modify the steps accordingly.
If you have no such needs, you can just download the binary package from the official website.
Steps
Build FE
The build environment directory is as follows:
mkdir -p ./docker-build/fe/resource
FROM openjdk:8u342-jdk
ENV JAVA_HOME="/usr/local/openjdk-8/" \
    PATH="/opt/apache-doris/fe/bin:$PATH"
# Load the software into the image (you can modify based on your own needs)
ADD ./resource/apache-doris-fe-${x.x.x}-bin.tar.gz /opt/
RUN mkdir -p /opt/apache-doris && \
    cd /opt && \
    mv apache-doris-fe-${x.x.x}-bin /opt/apache-doris/fe
ENTRYPOINT ["/opt/apache-doris/fe/bin/init_fe.sh"]
Please note that ${tagName} needs to be replaced with the tag name you want to package and name, such as: apache-doris:1.1.3-fe
Build FE:
cd ./docker-build/fe
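A typical build command at this point (the tag corresponds to ${tagName} above and is yours to choose; the same pattern applies to the BE and Broker images below):
docker build . -t apache-doris:1.1.3-fe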
Build BE
1. Create a build environment directory
mkdir -p ./docker-build/be/resource
FROM openjdk:8u342-jdk
ENV JAVA_HOME="/usr/local/openjdk-8/" \
    PATH="/opt/apache-doris/be/bin:$PATH"
# Load the software into the image (you can modify based on your own needs)
ADD ./resource/apache-doris-be-${x.x.x}-bin-x86_64.tar.gz /opt/
RUN mkdir -p /opt/apache-doris && \
    cd /opt && \
    mv apache-doris-be-${x.x.x}-bin-x86_64 /opt/apache-doris/be
ENTRYPOINT ["/opt/apache-doris/be/bin/init_be.sh"]
Please note that ${tagName} needs to be replaced with the tag name you want to package and name, such as: apache-doris:1.1.3-be
Build BE:
cd ./docker-build/be
After the build process is completed, you will see the prompt Success . Then, you can check the built image using the
following command.
docker images
Build Broker
1. Create a build environment directory
mkdir -p ./docker-build/broker/resource
FROM openjdk:8u342-jdk
ENV JAVA_HOME="/usr/local/openjdk-8/" \
    PATH="/opt/apache-doris/broker/bin:$PATH"
# Load the software into the image. The broker directory ships inside the FE binary package, so it needs to be
# extracted and repackaged first (you can modify based on your own needs)
ADD ./resource/apache_hdfs_broker.tar.gz /opt/
RUN mkdir -p /opt/apache-doris && \
    cd /opt && \
    mv apache_hdfs_broker /opt/apache-doris/broker
ENTRYPOINT ["/opt/apache-doris/broker/bin/init_broker.sh"]
Please note that ${tagName} needs to be replaced with the tag name you want to package and name, such as: apache-doris:1.1.3-broker
Build Broker:
cd ./docker-build/broker
After the build process is completed, you will see the prompt Success . Then, you can check the built image using the
following command.
docker images
docker login
If the login succeeds, you will see the prompt Success , and then you can push the Docker image to the remote repository.
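For example (the registry namespace and tag are illustrative):
docker tag apache-doris:1.1.3-fe your-namespace/apache-doris:1.1.3-fe
docker push your-namespace/apache-doris:1.1.3-fe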
Background description
This article will briefly describe how to quickly build a complete Doris test cluster through docker run or docker-compose up
commands.
Applicable scenarios
It is recommended to use Doris Docker in SIT or DEV environments to simplify the deployment process.
If you want to test a certain function point in the new version, you can use Doris Docker to deploy a Playground
environment. Or when you want to reproduce a certain problem during debugging, you can also use the docker
environment to simulate.
For now, avoid using containerized solutions for Doris deployment in production environments.
Software Environment
Software Version
Hardware environment
Configuration Type Hardware Information Maximum Running Cluster Size
Pre-environment preparation
The following command needs to be executed on the host machine
sysctl -w vm.max_map_count=2000000
Docker Compose
Different platforms require different Docker images. This article takes the X86_64 platform as an example.
1. The HOST network mode is suitable for deployment across multiple nodes; it deploys 1 FE and 1 BE on each node.
2. The subnet bridge mode is suitable for deploying multiple Doris processes on a single node (recommended for single-node
deployment). To deploy multiple nodes, you would need to deploy more components, which is not recommended.
For the sake of presentation, this chapter only demonstrates scripts written in subnet bridge mode.
Interface Description
Starting from the Apache Doris 1.2.1 Docker image, the parameter list of each process image is as follows:
FE    FE_ID    FE node ID    1
Note that the above parameters must be filled in; otherwise, the process cannot start.
The FE_ID rule: an integer from 1 to 9 , where the FE numbered 1 is the Master node.
The NODE_ROLE rule: computation or empty, where empty or any other value indicates that the node type is mix.
Script Template
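The docker run templates below assume that a user-defined bridge network already exists; a minimal sketch for creating it (the subnet is chosen to match the fixed IPs used below):
docker network create --driver bridge --subnet=172.20.80.0/24 doris-network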
docker run -itd \
--name=fe \
--env FE_SERVERS="fe1:172.20.80.2:9010" \
--env FE_ID=1 \
-p 8030:8030 \
-p 9030:9030 \
-v /data/fe/doris-meta:/opt/apache-doris/fe/doris-meta \
-v /data/fe/conf:/opt/apache-doris/fe/conf \
-v /data/fe/log:/opt/apache-doris/fe/log \
--network=doris-network \
--ip=172.20.80.2 \
apache/doris:1.2.1-fe-x86_64
docker run -itd \
--name=be \
--env FE_SERVERS="fe1:172.20.80.2:9010" \
--env BE_ADDR="172.20.80.3:9050" \
-p 8040:8040 \
-v /data/be/storage:/opt/apache-doris/be/storage \
-v /data/be/conf:/opt/apache-doris/be/conf \
-v /data/be/log:/opt/apache-doris/be/log \
--network=doris-network \
--ip=172.20.80.3 \
apache/doris:1.2.1-be-x86_64
If you need a 3 FE & 3 BE docker run command template, click here to access the download.
version: '3'
services:
  docker-fe:
    image: "apache/doris:1.2.1-fe-x86_64"
    container_name: "doris-fe"
    hostname: "fe"
    environment:
      - FE_SERVERS=fe1:172.20.80.2:9010
      - FE_ID=1
    ports:
      - 8030:8030
      - 9030:9030
    volumes:
      - /data/fe/doris-meta:/opt/apache-doris/fe/doris-meta
      - /data/fe/conf:/opt/apache-doris/fe/conf
      - /data/fe/log:/opt/apache-doris/fe/log
    networks:
      doris_net:
        ipv4_address: 172.20.80.2
  docker-be:
    image: "apache/doris:1.2.1-be-x86_64"
    container_name: "doris-be"
    hostname: "be"
    depends_on:
      - docker-fe
    environment:
      - FE_SERVERS=fe1:172.20.80.2:9010
      - BE_ADDR=172.20.80.3:9050
    ports:
      - 8040:8040
    volumes:
      - /data/be/storage:/opt/apache-doris/be/storage
      - /data/be/conf:/opt/apache-doris/be/conf
      - /data/be/script:/docker-entrypoint-initdb.d
      - /data/be/log:/opt/apache-doris/be/log
    networks:
      doris_net:
        ipv4_address: 172.20.80.3
networks:
  doris_net:
    ipam:
      config:
        - subnet: 172.20.80.0/16
If you need a 3 FE & 3 BE Docker Compose script template, you can download it from
https://github.com/apache/doris/tree/master/docker/runtime/docker-compose-demo/build-cluster/docker-compose/3fe_3be/docker-compose.yaml .
sysctl -w vm.max_map_count=2000000
To-do items
1. Compose Demo List
Data Model
This topic introduces the data models in Doris from a logical perspective so you can make better use of Doris in different
business scenarios.
Basic concepts
In Doris, data is logically described in the form of tables. A table consists of rows and columns. Row is a row of user data.
Column is used to describe different fields in a row of data.
Columns can be divided into two categories: Key and Value. From a business perspective, Key and Value correspond to
dimension columns and indicator columns, respectively.
Doris supports three data models:
Aggregate
Unique
Duplicate
Aggregate Model
We will use practical examples to illustrate what the Aggregate Model is and how to use it correctly.
The corresponding CREATE TABLE statement would be as follows (omitting the Partition and Distribution information):
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit date time",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "user max dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user min dwell time"
PROPERTIES (
);
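For reference, a minimal self-contained sketch of such an aggregate table, using the columns from this example (types, defaults, and the key list are partly illustrative):
CREATE TABLE IF NOT EXISTS example_db.example_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "data import date",
    `city` VARCHAR(20) COMMENT "user city",
    `age` SMALLINT COMMENT "user age",
    `sex` TINYINT COMMENT "user sex",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "last visit date time",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "user total cost",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user max dwell time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user min dwell time"
)
AGGREGATE KEY(`user_id`, `date`, `city`, `age`, `sex`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);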
As you can see, this is a typical fact table of user information and visit behaviors. In star models, user information and visit
behaviors are usually stored in dimension tables and fact tables, respectively. Here, for the convenience of explanation, we
store the two types of information in one single table.
The columns in the table are divided into Key (dimension) columns and Value (indicator) columns based on whether they are
set with an AggregationType . Key columns, such as user_id , date , and age , are not set with an AggregationType , while
Value columns are.
When data are imported, rows with the same contents in the Key columns will be aggregated into one row, and their values
in the Value columns will be aggregated as their AggregationType specifies. Currently, there are four aggregation types: SUM, REPLACE, MAX, and MIN.
Suppose that you have the following import data (raw data):
Assume that this is a table recording the user behaviors when visiting a certain commodity page. The first row of data, for
example, is explained as follows:
Data Description
20 User Age
After this batch of data is imported into Doris correctly, it will be stored in Doris as follows:
As you can see, the data of User 10000 have been aggregated to one row, while those of other users remain the same. The
explanation for the aggregated data of User 10000 is as follows (the first 5 columns remain unchanged, so it starts with
Column 6 last_visit_date ):
* 2017-10-01 07:00 : The last_visit_date column is aggregated by REPLACE, so 2017-10-01 07:00 has replaced 2017-10-01
06:00 .
Note: When using REPLACE to aggregate data from the same import batch, the order of replacement is uncertain. That
means, in this case, the data eventually saved in Doris could be 2017-10-01 06:00 . However, for different import
batches, it is certain that data from the new batch will replace those from the old batch.
* 35 : The cost column is aggregated by SUM, so the update value 35 is the result of 20 + 15 .
* 10 : The max_dwell_time column is aggregated by MAX, so 10 is saved as it is the maximum between 10 and 2 .
* 2 : The min_dwell_time column is aggregated by MIN, so 2 is saved as it is the minimum between 10 and 2 .
After aggregation, Doris only stores the aggregated data. In other words, the detailed raw data will no longer be available.
timestamp          DATETIME              Date and time when the data are imported (with second-level accuracy)
last_visit_date    DATETIME    REPLACE   Last visit time of the user
A new column timestamp has been added to record the date and time when the data are imported (with second-level
accuracy).
The import data is as follows:

user_id    date        timestamp            city     age  sex  last_visit_date    cost  max_dwell_time  min_dwell_time
10000      2017-10-01  2017-10-01 08:00:05  Beijing  20   0    2017-10-01 06:00   20    10              10
10000      2017-10-01  2017-10-01 09:00:05  Beijing  20   0    2017-10-01 07:00   15    2               2

After being imported, the data is stored in Doris as follows:

user_id    date        timestamp            city     age  sex  last_visit_date    cost  max_dwell_time  min_dwell_time
10000      2017-10-01  2017-10-01 08:00:05  Beijing  20   0    2017-10-01 06:00   20    10              10
10000      2017-10-01  2017-10-01 09:00:05  Beijing  20   0    2017-10-01 07:00   15    2               2
As you can see, the stored data are exactly the same as the import data. No aggregation has happened. This is because the
newly added timestamp column gives the rows different Keys. That is to say, as long as the Keys of the rows are not identical
in the import data, Doris can save the complete detailed data even in the Aggregate Model.
Example 3: aggregate import data and existing data
Based on Example 1, suppose that you have the following data stored in Doris:
As you can see, the existing data and the newly imported data of User 10004 have been aggregated. Meanwhile, the new data
of User 10005 have been added.
1. The ETL stage of each batch of import data. At this stage, the batch of import data will be aggregated internally.
2. The data compaction stage of the underlying BE. At this stage, BE will aggregate data from different batches that have
been imported.
3. The data query stage. The data involved in the query will be aggregated accordingly.
At different stages, data will be aggregated to varying degrees. For example, when a batch of data is just imported, it may not
be aggregated with the existing data. But users can only query aggregated data. That is, what users see is the aggregated data,
and they should not assume that what they see is un-aggregated or only partly aggregated. (See the Limitations of
Aggregate Model section for more details.)
Unique Model
In some multi-dimensional analysis scenarios, users are highly concerned about how to ensure the uniqueness of the Key,
that is, how to create uniqueness constraints for the Primary Key. Therefore, we introduce the Unique Model. Prior to Doris
1.2, the Unique Model was essentially a special case of the Aggregate Model and a simplified representation of table schema.
The Aggregate Model is implemented by Merge on Read, so it might not deliver high performance in some aggregation
queries (see the Limitations of Aggregate Model section). In Doris 1.2, we have introduced
a new implementation for the Unique Model--Merge on Write, which can help achieve optimal query performance. For now,
Merge on Read and Merge on Write will coexist in the Unique Model for a while, but in the future, we plan to make Merge on
Write the default implementation of the Unique Model. The following will illustrate the two implementations with examples.
This is a typical user basic information table. There is no aggregation requirement for such data. The only concern is to ensure
the uniqueness of the primary key. (The primary key here is user_id + username). The CREATE TABLE statement for the above
table is as follows:
PROPERTIES (
);
This is the same table schema and the CREATE TABLE statement as those of the Aggregate Model:
PROPERTIES (
);
That is to say, the Merge on Read implementation of the Unique Model is equivalent to the REPLACE aggregation type in the
Aggregate Model. The internal implementation and data storage are exactly the same.
In Doris 1.2.0, as a new feature, Merge on Write is disabled by default, and users can enable it by adding the following
property:
"enable_unique_key_merge_on_write" = "true"
Take the previous table as an example, the corresponding CREATE TABLE statement should be:
PROPERTIES (
"enable_unique_key_merge_on_write" = "true"
);
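For illustration, a minimal sketch of a Merge-on-Write Unique table with user_id + username as the primary key (the non-key columns are illustrative):
CREATE TABLE IF NOT EXISTS example_db.user_info
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `username` VARCHAR(50) NOT NULL COMMENT "user name",
    `city` VARCHAR(20) COMMENT "user city",
    `phone` LARGEINT COMMENT "user phone"
)
UNIQUE KEY(`user_id`, `username`)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "enable_unique_key_merge_on_write" = "true"
);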
The table schema produced by the above statement will be different from that of the Aggregate Model.
On a Unique table with the Merge on Write option enabled, during the import stage, the data that are to be overwritten or
updated will be marked for deletion, and the new data will be written in. When querying, all data marked for deletion will be
filtered out at the file level, and only the latest data will be read. This eliminates the data aggregation cost at read time and
supports the pushdown of many types of predicates. Therefore, it can largely improve performance in many scenarios,
especially in aggregation queries.
[NOTE]
1. The new Merge on Write implementation is disabled by default, and can only be enabled by specifying a property when
creating a new table.
2. The old Merge on Read implementation cannot be seamlessly upgraded to the new one (since they have completely different
data organization). If you want to switch to the Merge on Write implementation, you need to manually execute insert
into unique-mow-table select * from source table to load the data into a new table.
3. The two unique features delete sign and sequence col of the Unique Model can be used as normal in the new
implementation, and their usage remains unchanged.
Duplicate Model
In some multi-dimensional analysis scenarios, there is no need for primary keys or data aggregation. For these cases, we
introduce the Duplicate Model to. Here is an example:
PROPERTIES (
);
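For illustration, a minimal sketch of a log-style Duplicate table (column names and types are illustrative):
CREATE TABLE IF NOT EXISTS example_db.example_log
(
    `timestamp` DATETIME NOT NULL COMMENT "log time",
    `type` INT NOT NULL COMMENT "log type",
    `error_code` INT COMMENT "error code",
    `error_msg` VARCHAR(1024) COMMENT "error message"
)
DUPLICATE KEY(`timestamp`, `type`, `error_code`)
DISTRIBUTED BY HASH(`type`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);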
Different from the Aggregate and Unique Models, the Duplicate Model stores the data as they are and executes no
aggregation. Even if there are two identical rows of data, they will both be retained. The DUPLICATE KEY in the CREATE
TABLE statement is only used to specify based on which columns the data are sorted. (A more appropriate name than
DUPLICATE KEY would be SORTING COLUMN, but it is named as such to specify the data model used. For more information,
see Prefix Index.) For the choice of DUPLICATE KEY, we recommend the first 2-4 columns.
The Duplicate Model is suitable for storing raw data without aggregation requirements or primary key uniqueness
constraints. For more usage scenarios, see the Limitations of Aggregate Model section.
Limitations of Aggregate Model
The Aggregate Model only presents the aggregated data. That means we have to ensure the presentation consistency of data
that has not yet been aggregated (for example, two different import batches). The following provides further explanation
with examples.
Assume that there are two batches of data that have been imported into the storage engine as follows:
batch 1
10001 2017-11-20 50
10002 2017-11-21 39
batch 2
10001 2017-11-20 1
10001 2017-11-21 5
10003 2017-11-22 22
As you can see, data about User 10001 in these two import batches have not yet been aggregated. However, in order to
ensure that users can only query the aggregated data as follows:
10001 2017-11-20 51
10001 2017-11-21 5
10002 2017-11-21 39
10003 2017-11-22 22
We have added an aggregation operator to the query engine to ensure the presentation consistency of data.
In addition, on the aggregate column (Value), when executing aggregate class queries that are inconsistent with the
aggregate type, please pay attention to the semantics. For example, in the example above, if you execute the following
query:
Meanwhile, this consistency guarantee could considerably reduce efficiency in some queries.
In other databases, such queries return results quickly. Because in actual implementation, the models can get the query
result by counting rows and saving the statistics upon import, or by scanning only one certain column of data to get count
value upon query, with very little overhead. But in Doris's Aggregation Model, the overhead of such queries is large.
batch 1
10001 2017-11-20 50
10002 2017-11-21 39
batch 2
10001 2017-11-20 1
10001 2017-11-21 5
10003 2017-11-22 22
10001 2017-11-20 51
10001 2017-11-21 5
10002 2017-11-21 39
10003 2017-11-22 22
The correct result of select count (*) from table; should be 4. But if the model only scans the user_id column and
operates aggregation upon query, the final result will be 3 (10001, 10002, 10003). And if it does not operate aggregation, the
final result will be 5 (a total of five rows in two batches). Apparently, both results are wrong.
In order to get the correct result, we must read both the user_id and date columns and perform aggregation when
querying. That is to say, in the count (*) query, Doris must scan all AGGREGATE KEY columns (in this case, user_id and
date ) and aggregate them to get the semantically correct results. That means if there are many aggregated columns, count
(*) queries could involve scanning large amounts of data.
Therefore, if you need to perform frequent count (*) queries, we recommend that you simulate count (*) by adding a
column of value 1 and aggregation type SUM. In this way, the table schema in the previous example will be modified as
follows:
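A sketch of the extra column (the name and default value are illustrative):
`count` BIGINT SUM DEFAULT "1" COMMENT "used to simulate count(*)"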
The above adds a count column, the value of which will always be 1, so the result of select count (*) from table; is
equivalent to that of select sum (count) from table; The latter is much more efficient than the former. However, this
method has its shortcomings, too. That is, it requires that users will not import rows with the same values in the AGGREGATE
KEY columns. Otherwise, select sum (count) from table; can only express the number of rows of the originally imported
data, instead of the semantics of select count (*) from table;
Another method is to add a count column with a value of 1 but an aggregation type of REPLACE. Then select sum (count) from
table; and select count (*) from table; could produce the same results. Moreover, this method does not require that the
import data contain no rows with the same AGGREGATE KEY values.
batch 1
After Batch 2 is imported, the duplicate rows in the first batch will be marked as deleted, and the status of the two batches of
data is as follows
batch 1
batch 2
user_id date cost delete bit
In queries, all data marked true in the delete bitmap will not be read, so there is no need for data aggregation. Since there
are 4 valid rows in the above data, the query result should also be 4. This also enables minimum overhead since it only scans
one column of data.
In the test environment, count(*) queries in Merge on Write of the Unique Model deliver 10 times higher performance than
that of the Aggregate Model.
Duplicate Model
The Duplicate Model does not impose the same limitation as the Aggregate Model because it does not involve aggregation
semantics. For any columns, it can return the semantically correct results in count (*) queries.
Key Columns
For the Duplicate, Aggregate, and Unique Models, the Key columns will be specified when the table is created, but there exist
some differences: In the Duplicate Model, the Key columns of the table can be regarded as just "sorting columns", but not
unique identifiers. In Aggregate and Unique Models, the Key columns are both "sorting columns" and "unique identifier
columns".
1. The Aggregate Model can greatly reduce the amount of data scanned and query computation by pre-aggregation. Thus,
it is very suitable for report query scenarios with fixed patterns. But this model is unfriendly to count (*) queries.
Meanwhile, since the aggregation method on the Value column is fixed, semantic correctness should be considered in
other types of aggregation queries.
2. The Unique Model guarantees the uniqueness of primary key for scenarios requiring unique primary key. The downside is
that it cannot exploit the advantage brought by pre-aggregation such as ROLLUP in queries.
i. Users who have high performance requirements for aggregate queries are recommended to use the newly added
Merge on Write implementation since version 1.2.
ii. The Unique Model only supports entire-row updates. If you require primary key uniqueness as well as partial updates
of certain columns (such as loading multiple source tables into one Doris table), you can consider using the
Aggregate Model, while setting the aggregate type of the non-primary key columns to REPLACE_IF_NOT_NULL. See
CREATE TABLE Manual for more details.
3. The Duplicate Model is suitable for ad-hoc queries of any dimensions. Although it may not be able to take advantage of
the pre-aggregation feature, it is not limited by what constrains the Aggregate Model and can give full play to the
advantage of columnar storage (reading only the relevant columns, but not all Key columns).
Data Partition
This topic is about table creation and data partitioning in Doris, including the common problems in table creation and their
solutions.
Basic Concepts
In Doris, data is logically described in the form of tables.
Row refers to a row of data about the user. Column is used to describe different fields in a row of data.
Columns can be divided into two categories: Key and Value. From a business perspective, Key and Value correspond to
dimension columns and metric columns, respectively. In the Aggregate Model, rows with the same values in Key columns
will be aggregated into one row. The way Value columns are aggregated is specified by the user when the table is built.
For more information about the Aggregate Model, please see the Data Model.
Tablets are logically attributed to different Partitions. One Tablet belongs to only one Partition, and one Partition contains
several Tablets. Since the tablets are physically stored independently, the partitions can be seen as physically independent,
too. Tablet is the smallest physical storage unit for data operations such as movement and replication.
A Table is formed of multiple Partitions. Partition can be thought of as the smallest logical unit of management. Data import
and deletion can be performed on only one Partition.
Data Partitioning
The following illustrates on data partitioning in Doris using the example of a CREATE TABLE operation.
CREATE TABLE in Doris is a synchronous command. It returns results after the SQL execution is completed. Successful
returns indicate successful table creation. For more information on the syntax, please refer to CREATE TABLE, or input the
HELP CREATE TABLE; command.
-- Range Partition
`date` DATE NOT NULL COMMENT "Date when the data are imported",
`timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
ENGINE=olap
PARTITION BY RANGE(`date`)
PROPERTIES
"replication_num" = "3",
"storage_medium" = "SSD",
);
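For reference, a fuller sketch of this example_range_tbl table (the column list, key list, partition names, and bounds are partly illustrative and consistent with the partition examples discussed below):
CREATE TABLE IF NOT EXISTS example_db.example_range_tbl
(
    `user_id` LARGEINT NOT NULL COMMENT "user id",
    `date` DATE NOT NULL COMMENT "Date when the data are imported",
    `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
    `city` VARCHAR(20) COMMENT "user city",
    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
    `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
ENGINE=olap
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`)
PARTITION BY RANGE(`date`)
(
    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
    PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
    "replication_num" = "3",
    "storage_medium" = "SSD"
);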
-- List Partition
`date` DATE NOT NULL COMMENT "Date when the data are imported",
`timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are imported",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "User last visit time",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
ENGINE=olap
PARTITION BY LIST(`city`)
PARTITION `default`
PROPERTIES
"replication_num" = "3",
"storage_medium" = "SSD",
);
Definition of Column
Here we only use the AGGREGATE KEY data model as an example. See Doris Data Model for more information.
You can view the basic types of columns by executing HELP CREATE TABLE; in MySQL Client.
In the AGGREGATE KEY data model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN)
are Value columns. The rest are the Key columns.
A few suggested rules for defining columns include:
It is also possible to use one layer of data partitioning. In this case, it only supports data bucketing.
1. Partition
You can specify one or more columns as the partitioning columns, but they have to be KEY columns. The usage of
multi-column partitions is described further below.
Regardless of the type of the partitioning columns, double quotes are required for partition values.
There is no theoretical limit on the number of partitions.
If users create a table without specifying the partitions, the system will automatically generate a Partition with the
same name as the table. This Partition contains all data in the table and is neither visible to users nor modifiable.
Partitions should not have overlapping ranges.
Range Partitioning
Partitioning columns are usually time columns for easy management of old and new data.
Range partitioning supports specifying only the upper bound by VALUES LESS THAN (...) . The system will use the
upper bound of the previous partition as the lower bound of the next partition and generate a left-closed, right-open
interval. It also supports specifying both the upper and lower bounds by VALUES [...) , which also generates a left-closed,
right-open interval.
The following takes the VALUES [...) method as an example since it is more comprehensible. It shows how the
partition ranges change as we use the VALUES LESS THAN (...) statement to add or delete partitions:
As in the example_range_tbl example above, when the table is created, the following 3 partitions are
automatically generated:
If we add Partition p201705 VALUES LESS THAN ("2017-06-01"), the results will be as follows:
Note that the partition range of p201702 and p201705 has not changed, and there is a gap between the two
partitions: [2017-03-01, 2017-04-01). That means, if the imported data is within this gap range, the import
would fail.
Then we add Partition p201702new VALUES LESS THAN ("2017-03-01"), the results will be as follows:
Now we delete Partition p201701 and add Partition p201612 VALUES LESS THAN ("2017-01-01"), the partition result
is as follows:
In summary, the deletion of a partition does not change the range of the existing partitions, but might result in gaps.
When a partition is added via the VALUES LESS THAN statement, the lower bound of one partition is the upper bound of
its previous partition.
In addition to the single-column partitioning mentioned above, Range Partitioning also supports multi-column
partitioning. Examples are as follows:
In the above example, we specify date (DATE type) and id (INT type) as the partitioning columns, so the resulting
partitions will be as follows:
Note that in the last partition, the user only specifies the partition value of the date column, so the system fills in
MIN_VALUE as the partition value of the id column by default. When data are imported, the system will compare them
with the partition values in order, and put the data in their corresponding partitions. Examples are as follows:
* Data --> Partition
Range partitioning also supports batch partitioning. For example, you can create multiple partitions divided by day at a
time using FROM ("2022-01-03") TO ("2022-01-06") INTERVAL 1 DAY , which covers 2022-01-03 to 2022-01-06 (not including
2022-01-06). The results will be as follows:
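A sketch of such a batch partition definition (the partitioning column name is illustrative); it produces three day-level partitions covering [2022-01-03, 2022-01-04), [2022-01-04, 2022-01-05), and [2022-01-05, 2022-01-06):
PARTITION BY RANGE(`date`)
(
    FROM ("2022-01-03") TO ("2022-01-06") INTERVAL 1 DAY
)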
List Partitioning
The partitioning columns support the BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR,
VARCHAR data types, and the partition values are enumeration values. Partitions can be only hit if the data is one of the
enumeration values in the target partition.
List partitioning supports using VALUES IN (...) to specify the enumeration values contained in each partition.
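For example, a LIST partition clause might look like this (the p_cn partition is illustrative; p_jp and p_uk match the example below):
PARTITION BY LIST(`city`)
(
    PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai"),
    PARTITION `p_jp` VALUES IN ("Tokyo"),
    PARTITION `p_uk` VALUES IN ("London")
)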
The following example illustrates how partitions change when adding or deleting a partition.
As in the example_list_tbl example above, when the table is created, the following three partitions are
automatically created.
p_jp: ("Tokyo")
p_jp: ("Tokyo")
p_uk: ("London")
p_jp: ("Tokyo")
p_uk: ("London")
When data are imported, the system will compare them with the partition values in order, and put the data in
their corresponding partitions. Examples are as follows:
2. Bucketing
If you use the Partition method, the DISTRIBUTED ... statement will describe how data are divided among
partitions. If you do not use the Partition method, that statement will describe how data of the whole table are
divided.
You can specify multiple columns as the bucketing columns. In Aggregate and Unique Models, bucketing columns
must be Key columns; in the Duplicate Model, bucketing columns can be Key columns and Value columns.
Bucketing columns can either be partitioning columns or not.
The choice of bucketing columns is a trade-off between query throughput and query concurrency:
a. If you choose to specify multiple bucketing columns, the data will be more evenly distributed. However, if the
query condition does not include the equivalent conditions for all bucketing columns, the system will scan all
buckets, largely increasing the query throughput and decreasing the latency of a single query. This method is
suitable for high-throughput, low-concurrency query scenarios.
b. If you choose to specify only one or a few bucketing columns, a point query might scan only one bucket. Thus,
when multiple point queries are performed concurrently, they are likely to scan different buckets, with no interaction
between the IO operations (especially when the buckets are stored on different disks). This approach is suitable for
high-concurrency point query scenarios.
3. Recommendations on the number and data volume for Partitions and Buckets.
The total number of tablets in a table is equal to (Partition num * Bucket num).
The recommended number of tablets in a table, regardless of capacity expansion, is slightly more than the number of
disks in the entire cluster.
The data volume of a single tablet does not have an upper or lower limit theoretically, but is recommended to be in
the range of 1G - 10G. Overly small data volume of a single tablet can impose a stress on data aggregation and
metadata management; while overly large data volume can cause trouble in data migration and completion, and
increase the cost of Schema Change or Rollup operation failures (These operations are performed on the Tablet level).
For the tablets, if you cannot have the ideal data volume and the ideal quantity at the same time, it is recommended
to prioritize the ideal data volume.
Upon table creation, you specify the same number of Buckets for each Partition. However, when dynamically
increasing partitions ( ADD PARTITION ), you can specify the number of Buckets for the new partitions separately. This
feature can help you cope with data reduction or expansion.
Once you have specified the number of Buckets for a Partition, you may not change it afterwards. Therefore, when
determining the number of Buckets, you need to consider the need for cluster expansion in advance. For example, if
there are only 3 hosts, each with only 1 disk, and the number of Buckets is set to 3 or less,
then no amount of newly added machines can increase concurrency.
For example, suppose that there are 10 BEs and each BE has one disk, if the total size of a table is 500MB, you can
consider dividing it into 4-8 tablets; 5GB: 8-16 tablets; 50GB: 32 tablets; 500GB: you may consider dividing it into
partitions, with each partition about 50GB in size, and 16-32 tablets per partition; 5TB: divided into partitions of
around 50GB and 16-32 tablets per partition.
Note: You can check the data volume of the table using the show data command. Divide the returned result by the
number of replicas to get the actual data volume of the table.
If the OLAP table does not have columns of REPLACE type, set the data bucketing mode of the table to RANDOM.
This can avoid severe data skew. (When loading data into the partition corresponding to the table, each batch of data
in a single load task will be written into a randomly selected tablet).
When the bucketing mode of the table is set to RANDOM, since there are no specified bucketing columns, it is
impossible to query only a few buckets, so all buckets in the hit partition will be scanned when querying the table.
Thus, this setting is only suitable for aggregate query analysis of the table data as a whole, but not for highly
concurrent point queries.
If the data distribution of the OLAP table is Random Distribution, you can set load_to_single_tablet to true when
importing data. In this way, when importing large amounts of data, each task will write data to only one tablet
of the corresponding partition. This can improve both the concurrency and throughput of data import, reduce the
write amplification caused by data import and compaction, and thus ensure cluster stability.
The first layer of data partitioning is called Partition. Users can specify a dimension column as the partitioning column
(currently only supports columns of INT and TIME types), and specify the value range of each partition.
The second layer is called Distribution, which means bucketing. Users can perform HASH distribution on data by
specifying the number of buckets and one or more dimension columns as the bucketing columns, or perform random
distribution on data by setting the mode to Random Distribution.
Scenarios with time dimensions or similar dimensions with ordered values, which can be used as partitioning columns.
The partitioning granularity can be evaluated based on data import frequency, data volume, etc.
Scenarios with a need to delete historical data: if, for example, you only need to keep the data of the last N days, you can
use compound partitioning so you can delete historical partitions. You can also send a DELETE statement within the
specified partition to remove historical data.
Scenarios with a need to avoid data skew: you can specify the number of buckets individually for each partition. For
example, if you choose to partition the data by day, and the data volume per day varies greatly, you can customize the
number of buckets for each partition. For the choice of bucketing column, it is recommended to select the column(s)
with variety in values.
Users can also choose single partitioning, which only uses HASH distribution (bucketing).
PROPERTIES
In the PROPERTIES section at the last of the CREATE TABLE statement, you can set the relevant parameters. Please see
CREATE TABLE for a detailed introduction.
ENGINE
In this example, the ENGINE is of OLAP type, which is the default ENGINE type. In Doris, only the OLAP ENGINE type is
managed and stored by Doris itself. Other ENGINE types, such as MySQL, Broker, and ES, are essentially mappings to tables in other
external databases or systems so that Doris can read the data. Doris itself does not create, manage, or store any
tables or data of non-OLAP ENGINE types.
Other
IF NOT EXISTS means to create the table only if it does not exist. Note that the system only checks the existence of the table based
on the table name; it does not compare the schema of the newly created table with that of the existing table. So if there exists a table
of the same name but a different schema, the command will also return successfully, but it does not mean that a new table with the new
schema has been created.
FAQ
Table Creation
1. If a syntax error occurs in a long CREATE TABLE statement, the error message may be incomplete. Here is a list of
possible syntax errors for your reference in manual troubleshooting:
Incorrect syntax. Please use HELP CREATE TABLE; to check the relevant syntax.
Reserved words. Reserved words in user-defined names should be enclosed in backquotes ``. It is recommended
that all user-defined names be enclosed in backquotes.
Chinese characters or full-width characters. Non-UTF8 encoded Chinese characters, or hidden full-width characters
(spaces, punctuation, etc.) can cause syntax errors. It is recommended that you check for these characters using a
text editor that can display non-printable characters.
In Doris, tables are created in the order of the partitioning granularity. This error prompt may appear when a partition
creation task fails, but it could also appear in table creation tasks with no partitioning operations, because, as mentioned
earlier, Doris will create an unmodifiable default partition for tables with no partitions specified.
This error usually pops up because the tablet creation goes wrong in BE. You can follow the steps below for
troubleshooting:
i. In fe.log, find the Failed to create partition log of the corresponding time point. In that log, find a number pair
that looks like {10001-10010} . The first number of the pair is the Backend ID and the second number is the Tablet ID.
As for {10001-10010} , it means that on Backend ID 10001, the creation of Tablet ID 10010 failed.
ii. After finding the target Backend, go to the corresponding be.INFO log and find the log of the target tablet, and then
check the error message.
iii. A few common tablet creation failures include but not limited to:
The task is not received by BE. In this case, the tablet ID related information will be found in be.INFO, or the
creation is successful in BE but it still reports a failure. To solve the above problems, see Installation and
Deployment about how to check the connectivity of FE and BE.
Pre-allocated memory failure. It may be that the length of a row in the table exceeds 100KB.
Too many open files . The number of open file descriptors exceeds the Linux system limit. In this case, you need
to change the open file descriptor limit of the Linux system.
3. The table creation command does not return results for a long time.
Doris's table creation command is a synchronous command. Its timeout is currently calculated simply as (tablet num *
replication num) seconds. If many data tablets are created and some of them fail to be created, the command may wait for a
long timeout before returning an error.
Under normal circumstances, the statement should return within a few seconds or tens of seconds. If it takes more than one minute,
it is recommended to cancel the operation directly and check the related errors in the FE or BE logs.
More Help
For more detailed instructions on data partitioning, please refer to the CREATE TABLE command manual, or enter HELP
CREATE TABLE; in MySQL Client.
Create Users
Note:
The root user has all the privileges about the clusters by default. Users who have both Grant_priv and Node_priv can
grant these privileges to other users. Node changing privileges include adding, deleting, and offlining FE, BE, and
BROKER nodes.
After starting the Doris program, root or admin users can connect to Doris clusters. You can use the following command to
log in to Doris. After login, you will enter the corresponding MySQL command line interface.
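A typical login command looks like this (see the note below for FE_HOST and the port):
mysql -h FE_HOST -P 9030 -uroot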
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
1. FE_HOST is the IP address of any FE node. 9030 is the query_port configuration in fe.conf.
After login, you can change the root password by the following command:
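For example:
SET PASSWORD FOR 'root' = PASSWORD('your_password');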
your_password is a new password for the root user, which can be set at will. A strong password is recommended for
security. The new password is required in the next login.
By default, the newly created regular users do not have any privileges. Privileges can be granted to these users.
Create a database
Initially, root or admin users can create a database by the following command:
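For example:
CREATE DATABASE example_db;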
You can use the HELP command to check the syntax of all commands. For example, HELP CREATE DATABASE; . Or you
can refer to the SHOW CREATE DATABASE command manual.
If you don't know the full name of the command, you can use "HELP + a field of the command" for fuzzy query. For
example, if you type in HELP CREATE , you can find commands including CREATE DATABASE , CREATE TABLE , and CREATE
USER .
The matched topics include:
CREATE CLUSTER
CREATE DATABASE
CREATE ENCRYPTKEY
CREATE FILE
CREATE FUNCTION
CREATE INDEX
CREATE REPOSITORY
CREATE RESOURCE
CREATE ROLE
CREATE TABLE
CREATE USER
CREATE VIEW
ROUTINE LOAD
After the database is created, you can view the information about the database via the SHOW DATABASES command.
+--------------------+
| Database |
+--------------------+
| example_db |
| information_schema |
+--------------------+
information_schema exists for compatibility with MySQL protocol, so the information might not be 100% accurate in
practice. Therefore, for information about the specific databases, please query the corresponding databases directly.
Authorize an Account
After example_db is created, root/admin users can grant read/write privileges of example_db to regular users, such as test ,
using the GRANT command. After authorization, user test can perform operations on example_db .
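For example (the username and password are illustrative):
CREATE USER 'test' IDENTIFIED BY 'test_passwd';
GRANT ALL ON example_db TO test;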
Create a Table
You can create a table using the CREATE TABLE command. For detailed parameters, you can send a HELP CREATE TABLE;
command.
Firstly, you need to switch to the target database using the USE command:
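For example:
USE example_db;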
Doris supports two table creation methods: compound partitioning and single partitioning. The following takes the
Aggregate Model as an example to demonstrate how to create tables with these two methods, respectively.
Single Partitioning
Create a logical table named table1 . The bucketing column is the siteid column, and the number of buckets is 10.
username : VARCHAR, with a maximum length of 32; default value: empty string
pv : BIGINT (8 bytes); default value: 0; This is a metric column, and Doris will aggregate the metric columns internally. The
pv column is aggregated by SUM.
citycode SMALLINT,
PROPERTIES("replication_num" = "1");
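Putting the above together, a minimal sketch of the table1 statement (the siteid type and defaults are illustrative):
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");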
Compound Partitioning
Create a logical table named table2 .
username : VARCHAR, with a maximum length of 32; default value: empty string
pv : BIGINT (8 bytes); default value: 0; This is a metric column, and Doris will aggregate the metric columns internally. The
pv column is aggregated by SUM.
Use the event_day column as the partitioning column and create 3 partitions: p201706, p201707, and p201708.
HASH bucket each partition based on siteid . The number of buckets per partition is 10.
event_day DATE,
citycode SMALLINT,
PARTITION BY RANGE(event_day)
PROPERTIES("replication_num" = "1");
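Similarly, a minimal sketch of the table2 statement (column types, defaults, and partition bounds are partly illustrative):
CREATE TABLE table2
(
    event_day DATE,
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(event_day, siteid, citycode, username)
PARTITION BY RANGE(event_day)
(
    PARTITION p201706 VALUES LESS THAN ('2017-07-01'),
    PARTITION p201707 VALUES LESS THAN ('2017-08-01'),
    PARTITION p201708 VALUES LESS THAN ('2017-09-01')
)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");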
After the table is created, you can view the information of the table in example_db :
+----------------------+
| Tables_in_example_db |
+----------------------+
| table1 |
| table2 |
+----------------------+
Note:
1. As replication_num is set to 1 , the above tables are created with only one copy. We recommend that you adopt
the default three-copy settings to ensure high availability.
2. You can dynamically add or delete partitions of compoundly partitioned tables. See HELP ALTER TABLE .
3. You can import data into the specified Partition. See HELP LOAD; .
4. You can dynamically change the table schema. See HELP ALTER TABLE; .
5. You can add Rollups to Tables to improve query performance. See the Rollup-related section in "Advanced Usage".
6. Columns are nullable by default (the Null column in DESC shows true ), which may affect query performance.
Load data
Doris supports a variety of data loading methods. You can refer to Data Loading for more details. The following uses Stream
Load and Broker Load as examples.
Stream Load
The Stream Load method transfers data to Doris via HTTP protocol. It can import local data directly without relying on any
other systems or components. For the detailed syntax, please see HELP STREAM LOAD; .
Example 1: Use "table1_20170707" as the Label, import the local file table1_data into table1 .
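A sketch of the corresponding Stream Load command (the user and password are illustrative; see the notes below for FE_HOST and the port):
curl --location-trusted -u test:test_passwd -H "label:table1_20170707" \
    -H "column_separator:," \
    -T table1_data http://FE_HOST:8030/api/example_db/table1/_stream_load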
1. FE_HOST is the IP address of any FE node and 8030 is the http_port in fe.conf.
2. You can use the IP address of any BE and the webserver_port in be.conf for import. For example: BE_HOST:8040 .
The local file table1_data uses , as the separator between data. The details are as follows:
1,1,Jim,2
2,1,grace,2
3,2,tom,2
4,3,bush,3
5,3,helen,3
Example 2: Use "table2_20170707" as the Label, import the local file table2_data into table2 .
The local file table2_data uses | as the separator between data. The details are as follows:
2017-07-03|1|1|jim|2
2017-07-05|2|1|grace|2
2017-07-12|3|2|tom|2
2017-07-15|4|3|bush|3
2017-07-12|5|3|helen|3
Note:
1. The recommended file size for Stream Load is less than 10GB. Excessive file size will result in higher retry cost.
2. Each batch of import data should have a Label. Label serves as the unique identifier of the load task, and guarantees
that the same batch of data will only be successfully loaded into a database once. For more details, please see Data
Loading and Atomicity.
3. Stream Load is a synchronous command. A successful return of the command indicates that the data has been loaded
successfully; otherwise, the data has not been loaded.
Broker Load
The Broker Load method imports externally stored data via deployed Broker processes. For more details, please see HELP
BROKER LOAD;
Example: Use "table1_20170708" as the Label, import files on HDFS into table1 .
LOAD LABEL table1_20170708
(
    DATA INFILE("hdfs://your.namenode.host:port/dir/table1_data")
    INTO TABLE table1
)
WITH BROKER 'broker_name'
(
    "username"="hdfs_user",
    "password"="hdfs_password"
)
PROPERTIES
(
    "timeout"="3600",
    "max_filter_ratio"="0.1"
);
The Broker Load is an asynchronous command. Successful execution of it only indicates successful submission of the task.
You can check if the import task has been completed by SHOW LOAD; . For example:
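You can filter by the Label used above (the Label value comes from the Broker Load example; adjust it to your own job):
SHOW LOAD WHERE LABEL = "table1_20170708";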
In the return result, if you find FINISHED in the State field, that means the import is successful.
Simple Query
Query examples:
MySQL> SELECT * FROM table1 LIMIT 3;
+--------+----------+----------+------+
| siteid | citycode | username | pv   |
+--------+----------+----------+------+
|      2 |        1 | 'grace'  |    2 |
|      5 |        3 | 'helen'  |    3 |
|      3 |        2 | 'tom'    |    2 |
+--------+----------+----------+------+
MySQL> SELECT * FROM table1 ORDER BY citycode;
+--------+----------+----------+------+
| siteid | citycode | username | pv   |
+--------+----------+----------+------+
|      2 |        1 | 'grace'  |    2 |
|      1 |        1 | 'jim'    |    2 |
|      3 |        2 | 'tom'    |    2 |
|      4 |        3 | 'bush'   |    3 |
|      5 |        3 | 'helen'  |    3 |
+--------+----------+----------+------+
SELECT * EXCEPT
The SELECT * EXCEPT statement is used to exclude one or more columns from the result. The output will not include any of
the specified columns.
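For example, a query of the following form over table1 (the excluded column list here is illustrative) produces a result like the one below:
SELECT * EXCEPT (citycode, username) FROM table1 LIMIT 3;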
+--------+------+
| siteid | pv |
+--------+------+
| 2 | 2 |
| 5 | 3 |
| 3 | 2 |
+--------+------+
Note: SELECT * EXCEPT does not exclude columns that do not have a name.
Join Query
Query example:
MySQL> SELECT SUM(table1.pv) FROM table1 JOIN table2 WHERE table1.siteid = table2.siteid;
+--------------------+
| sum(`table1`.`pv`) |
+--------------------+
| 12 |
+--------------------+
Subquery
Query example:
MySQL> SELECT SUM(pv) FROM table2 WHERE siteid IN (SELECT siteid FROM table1 WHERE siteid > 2);
+-----------+
| sum(`pv`) |
+-----------+
| 8 |
+-----------+
You can modify the schema of an existing table. The supported changes include:
Adding columns
Deleting columns
Modifying column types
Changing the order of columns
The following schema changes are illustrated using the example below.
We add a new column uv, type BIGINT, aggregation type SUM, default value 0:
ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv;
After successful submission, you can check the progress of the job with the following command:
SHOW ALTER TABLE COLUMN;
When the job status is FINISHED , the job is complete. The new Schema has taken effect.
After ALTER TABLE is completed, you can view the latest Schema via DESC TABLE .
You can cancel the currently executing job with the following command:
CANCEL ALTER TABLE COLUMN FROM table1;
Rollup
Rollup can be seen as a materialized index structure for a Table, materialized in the sense that its data is physically
independent in storage, and indexed in the sense that Rollup can reorder columns to increase the hit rate of prefix indexes as
well as reduce Key columns to increase the aggregation level of data.
You can perform various changes to Rollup using ALTER TABLE ROLLUP.
For table1 , siteid , citycode , and username constitute a set of Key columns, based on which the pv field is aggregated. If you frequently need to view the total pv per city, you can create a Rollup consisting of only citycode and pv :
ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);
After successful submission, you can check the progress of the task with the following command:
SHOW ALTER TABLE ROLLUP;
After the Rollup is created, you can use DESC table1 ALL to check the information of the Rollup.
+-----------+-------+------+------+-----+---------+-------+
| IndexName | Field | Type | Null | Key | Default | Extra |
+-----------+-------+------+------+-----+---------+-------+
You can cancel the currently ongoing task using the following command:
CANCEL ALTER TABLE ROLLUP FROM table1;
With created Rollups, you do not need to specify the Rollup in queries, but only specify the original table for the query. The
program will automatically determine if Rollup should be used. You can check whether Rollup is hit or not using the EXPLAIN
your_sql; command.
Materialized Views
Materialized views are a space-for-time data analysis acceleration technique. Doris supports building materialized views on
top of base tables. For example, a partial column-based aggregated view can be built on top of a table with a granular data
model, allowing for fast querying of both granular and aggregated data.
Doris can automatically ensure data consistency between materialized views and base tables, and automatically match the
appropriate materialized view at query time, greatly reducing the cost of data maintenance for users and providing a
consistent and transparent query acceleration experience.
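For illustration, a single-table materialized view that pre-aggregates pv by siteid on table1 could be created as follows (the view name and the chosen aggregation are made up for this sketch):
CREATE MATERIALIZED VIEW mv_site_pv AS
SELECT siteid, SUM(pv)
FROM table1
GROUP BY siteid;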
Memory Limit
To prevent excessive memory usage of one single query, Doris imposes memory limit on queries. By default, one query task
should consume no more than 2GB of memory on one single BE node.
If you find a Memory limit exceeded error, that means the program is trying to allocate more memory than the memory limit allows. You can solve this by optimizing your SQL statements.
You can view the 2GB default and change it by modifying the memory parameter settings:
SHOW VARIABLES LIKE "%mem_limit%";
+---------------+------------+
| Variable_name | Value |
+---------------+------------+
| exec_mem_limit| 2147483648 |
+---------------+------------+
exec_mem_limit is measured in bytes. You can change the value of exec_mem_limit using the SET command. For example, change it to 8GB and check it as follows:
SET exec_mem_limit = 8589934592;
SHOW VARIABLES LIKE "exec_mem_limit";
+---------------+------------+
| Variable_name | Value |
+---------------+------------+
| exec_mem_limit| 8589934592 |
+---------------+------------+
The above change is executed at the session level and is only valid for the currently connected session; the default memory limit is restored after reconnection.
If you need to change the global variable, you can run SET GLOBAL exec_mem_limit = 8589934592; . After that, disconnect and log back in, and the new parameter will take effect permanently.
Query Timeout
The default query timeout is 300 seconds. If a query is not completed within 300 seconds, it will be cancelled by the Doris system. You can change this parameter and customize the timeout for your application to achieve a blocking behavior similar to wait(timeout):
SHOW VARIABLES LIKE "%query_timeout%";
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| QUERY_TIMEOUT | 300 |
+---------------+-------+
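To change the timeout, for example to one minute, set the session variable shown above (query_timeout is in seconds):
SET query_timeout = 60;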
The current timeout check interval is 5 seconds, so if you set the query timeout to less than 5 seconds, it might not
be executed accurately.
The above changes are also performed at the session level. You can change the global variable by SET GLOBAL .
Broadcast/Shuffle Join
The default way to implement a Join is to conditionally filter the small table, broadcast it to each node that holds data of the large table to form an in-memory hash table, and then stream the data of the large table for the hash Join. However, if the filtered data of the small table cannot fit in memory, the Join cannot be completed, and a memory-limit-exceeded error usually occurs.
In this case, it is recommended to explicitly specify a Shuffle Join, also known as a Partitioned Join: both the small table and the large table are hashed according to the Join key and then joined in a distributed manner, so the memory consumption is spread across all compute nodes in the cluster.
Doris automatically attempts a Broadcast Join and switches to a Shuffle Join if the small table is estimated to be too large; note that if a Broadcast Join is explicitly specified, Doris will enforce the Broadcast Join.
mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
+--------------------+
|                 10 |
+--------------------+
mysql> select sum(table1.pv) from table1 join [broadcast] table2 where table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
+--------------------+
| 10 |
+--------------------+
mysql> select sum(table1.pv) from table1 join [shuffle] table2 where table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
+--------------------+
| 10 |
+--------------------+
Please refer to Load Balancing for details on installation, deployment, and usage.
The other method is only used in the Unique data model, which has a unique primary key. It imports the primary keys of the rows that are to be deleted, and the final physical deletion of the data is performed internally by Doris using a deletion mark. This method is suitable for real-time deletion of data.
For specific instructions on deleting and updating data, see Data Update.
Basic Concepts
In Doris, tables that are created by users via the CREATE TABLE statement are called "Base Tables". Base Tables contain the basic data stored in the way specified by the user in the CREATE TABLE statement.
On top of Base Tables, you can create any number of Rollups. Data in Rollups are generated based on the Base Tables and are
physically independent in storage.
Let's illustrate the ROLLUP tables and their roles in different data models with examples.
Following Data Model Aggregate Model in the Aggregate Model section, the Base table structure is as follows:
user_id  date        timestamp            city     age  sex  last_visit_date   cost  max_dwell_time  min_dwell_time
10000    2017-10-01  2017-10-01 08:00:05  Beijing  20   0    2017-10-01 06:00  20    10              10
10000    2017-10-01  2017-10-01 09:00:05  Beijing  20   0    2017-10-01 07:00  15    2               2
ColumnName
user_id
cost
The ROLLUP contains only two columns: user_id and cost. After the creation, the data stored in the ROLLUP is as follows:
user_id cost
10000 35
10001 2
10002 200
10003 30
10004 111
As you can see, the ROLLUP retains only the result of SUM on the cost column for each user_id. So when we run the following query:
SELECT user_id, SUM(cost) FROM table GROUP BY user_id;
Doris automatically hits the ROLLUP table, thus completing the aggregated query by scanning only a very small amount of
data.
2. Example 2: Get the total consumption and the longest and shortest page residence time of users of different ages in different cities. Create a ROLLUP containing city, age, cost, max_dwell_time and min_dwell_time; after creation, the data stored in the ROLLUP is as follows:
city       age  cost  max_dwell_time  min_dwell_time
Beijing    20   35    10              2
Beijing    30   2     22              22
Shanghai   20   200   5               5
Guangzhou  32   30    11              11
Shenzhen   35   111   6               3
mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city, age;
mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM table GROUP BY city;
mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY city, age;
ColumnName Type
user_id BIGINT
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME
ColumnName Type
age INT
user_id BIGINT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME
As you can see, the columns of ROLLUP and Base tables are exactly the same, just changing the order of user_id and age. So
when we do the following query:
mysql> SELECT * FROM table where age=20 and message LIKE "%error%";
The ROLLUP table is preferred because the prefix index of ROLLUP matches better.
Query
As an aggregated view in Doris, Rollup can play two roles in queries:
Index
Aggregate data (only for aggregate models, aggregate key)
However, certain conditions need to be met in order to hit a Rollup. The PreAggregation value of the ScanNode in the execution plan can be used to determine whether a Rollup can be hit, and the rollup field shows which Rollup table is hit.
Index
Doris's prefix index was introduced in the earlier query practice section: in the underlying storage engine, Doris builds a separate prefix index for each Base/Rollup table from the first 36 bytes of its sort columns (for a VARCHAR column the prefix index may be shorter than 36 bytes, because VARCHAR truncates the prefix index and uses at most 20 bytes of the value). The prefix index is a sorted sparse index: the data itself is also sorted, a position is located through the index, and the data is then found by binary search. The conditions in a query are matched against the prefix index of each Base/Rollup, and the Base/Rollup that matches the longest prefix is selected.
+----+----+----+----+----+-----+
| c1 | c2 | c3 | c4 | c5 | ... |
+----+----+----+----+----+-----+
As shown above, the WHERE and ON conditions in the query are pushed down to the ScanNode and matched starting from the first column of the prefix index: for each column it is checked whether the conditions contain that column, and the matching length is accumulated until a column cannot be matched or the 36-byte limit is reached (VARCHAR columns can only match 20 bytes and truncate the prefix index before 36 bytes). The Base/Rollup with the longest matching length is then chosen. The following example shows how to create a Base table and four rollups:
Base table test(k1, k2, ..., k9, ...) and its four rollups:
rollup_index1(k9, k1, ...)
rollup_index2(k9, ...)
rollup_index3(k4, k5, k6, ...)
rollup_index4(k4, k6, k5, ...)
Conditions on columns that can use the prefix index must be of the form =, <, >, <=, >=, IN or BETWEEN, and these conditions must be combined with AND; conditions connected with OR, or using != and the like, cannot hit the prefix index. Then look at the following query:
SELECT * FROM test WHERE k1 = 1 AND k2 > 3;
With conditions on k1 and k2, only the leading columns of the Base table contain k1, so the Base table matches the longest prefix index. Explain:
| 0:OlapScanNode
| TABLE: test
| PREAGGREGATION: OFF. Reason: No AggregateInfo
| PREDICATES: `k1` = 1, `k2` > 3
| partitions=1/1
| rollup: test
| buckets=1/10
| cardinality=-1
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
SELECT * FROM test WHERE k4 = 1 AND k5 > 3;
With conditions on k4 and k5, the first column of both rollup_index3 and rollup_index4 contains k4, but the second column of rollup_index3 also contains k5, so rollup_index3 matches the longest prefix index:
| 0:OlapScanNode
| TABLE: test
| PREAGGREGATION: OFF. Reason: No AggregateInfo
| PREDICATES: `k4` = 1, `k5` > 3
| partitions=1/1
| rollup: rollup_index3
| buckets=10/10
| cardinality=-1
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
Now we try to match conditions on the column containing a VARCHAR, as follows:
SELECT * FROM test WHERE k9 IN ('xxx', 'yyyy') AND k1 = 10;
There are conditions on k9 and k1. The first column of both rollup_index1 and rollup_index2 contains k9, so in principle either rollup could be chosen to hit the prefix index (because the VARCHAR column uses only 20 bytes, the prefix index is truncated at less than 36 bytes). The current strategy continues to match k1: since the second column of rollup_index1 is k1, rollup_index1 is chosen. In fact, the k1 condition will not provide additional acceleration. (If a condition outside the prefix index needs to accelerate a query, it can be accelerated by creating a BloomFilter index; this is typically done for string columns, because Doris already maintains block-level Min/Max indexes for numeric and date columns.) The following is the result of explain.
| 0:OlapScanNode
| TABLE: test
| PREAGGREGATION: OFF. Reason: No AggregateInfo
| PREDICATES: `k9` IN ('xxx', 'yyyy'), `k1` = 10
| partitions=1/1
| rollup: rollup_index1
| buckets=1/10
| cardinality=-1
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
Finally, look at a query that can be hit by more than one Rollup:
SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 AND k6 >= 10000.0;
There are three conditions on k4, k5 and k6. The first three columns of rollup_index3 and rollup_index4 both contain these three columns, so the prefix index length they match is the same and either could be selected. The current default strategy is to select the rollup that was created earlier, which is rollup_index3.
| 0:OlapScanNode
| TABLE: test
| PREAGGREGATION: OFF. Reason: No AggregateInfo
| PREDICATES: `k4` < 1000, `k5` = 80, `k6` >= 10000.0
| partitions=1/1
| rollup: rollup_index3
| buckets=10/10
| cardinality=-1
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
A query whose conditions do not involve the leading prefix columns cannot hit the prefix index (and in that case neither the Min/Max index nor the BloomFilter index in the Doris storage engine helps).
Aggregate data
Of course, data aggregation is an essential capability of aggregate views. Such materialized views are very helpful for aggregated queries and report queries. To hit an aggregate view, the following prerequisites must be met:
1. There is a Rollup that contains all the columns involved in the query or subquery.
2. If the query or subquery contains a Join, the Join type must be Inner Join.
The following are some types of aggregated queries that can hit Rollup.
(The table here lists, for each aggregate column type, whether query types Sum, Distinct/Count Distinct, Min, Max and APPROX_COUNT_DISTINCT can hit the Rollup.)
If the above conditions are met, there are two stages in judging which Rollup is hit for the aggregation model:
1. First, match the Rollup table with the longest prefix index hit by the conditions, as described in the index strategy above.
2. Then compare the row counts of the candidate Rollups and select the Rollup with the fewest rows.
SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);
First, it is judged whether the query can hit an aggregated Rollup table; as described above, it can. The query then contains three conditions on k1, k2 and k3, and the first three columns of test_rollup, rollup1 and rollup2 all contain these three columns, so the matched prefix index length is the same. When comparing row counts, rollup2 obviously has the highest aggregation degree, so rollup2 is selected because it has the fewest rows.
| 0:OlapScanNode |
| TABLE: test_rollup |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: rollup2 |
| buckets=1/10 |
| cardinality=-1 |
| avgRowSize=0.0 |
| numNodes=0 |
| tuple ids: 0 |
Best Practices
1 Table creation
When AGGREGATE KEY is the same, old and new records are aggregated. The aggregation functions currently supported are
SUM, MIN, MAX, REPLACE.
AGGREGATE KEY model can aggregate data in advance and is suitable for reporting and multi-dimensional analysis business.
siteid INT,
city SMALLINT,
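A minimal sketch of such an AGGREGATE KEY table, consistent with the column fragments above (the table name, remaining columns and bucketing are illustrative):
CREATE TABLE site_visit
(
    siteid   INT,
    city     SMALLINT,
    username VARCHAR(32),
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(siteid, city, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10;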
When the UNIQUE KEY is the same, a new record overwrites the old record. Before version 1.2, UNIQUE KEY was implemented with the same REPLACE aggregation as AGGREGATE KEY, and the two were essentially the same. Since version 1.2, we have introduced a merge-on-write implementation for UNIQUE KEY, which has better performance in many scenarios. It is suitable for analytical business with update requirements.
orderid BIGINT,
status TINYINT,
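A minimal sketch of such a UNIQUE KEY table, consistent with the column fragments above (the table name, remaining columns and bucketing are illustrative):
CREATE TABLE sales_order
(
    orderid  BIGINT,
    status   TINYINT,
    username VARCHAR(32),
    amount   BIGINT DEFAULT '0'
)
UNIQUE KEY(orderid)
DISTRIBUTED BY HASH(orderid) BUCKETS 10;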
Only sort columns are specified, and the same rows are not merged. It is suitable for the analysis business where data need
not be aggregated in advance.
visitorid SMALLINT,
sessionid BIGINT,
province CHAR(20),
browser CHAR(20),
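A minimal sketch of such a DUPLICATE KEY table, consistent with the column fragments above (the table name, remaining columns and bucketing are illustrative):
CREATE TABLE session_data
(
    visitorid SMALLINT,
    sessionid BIGINT,
    visittime DATETIME,
    city      CHAR(20),
    province  CHAR(20),
    ip        VARCHAR(32),
    browser   CHAR(20),
    url       VARCHAR(1024)
)
DUPLICATE KEY(visitorid, sessionid)
DISTRIBUTED BY HASH(sessionid, visitorid) BUCKETS 10;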
A flat wide table has many fields in its Schema, and in the aggregate model there may be more key columns, so the number of columns that need to be sorted during import increases.
Updates to dimension information are reflected in the whole table, and the frequency of such updates directly affects query efficiency.
Users are therefore advised to use a Star Schema and separate dimension tables from fact (indicator) tables as much as possible. Frequently updated dimension tables can be placed in MySQL external tables; if there are only a few updates, they can be placed directly in Doris. When storing dimension tables in Doris, you can set more replicas for them to improve Join performance.
1.3.1. Partitioning
Partition is used to divide data into different intervals, which can be logically understood as dividing the original table into
multiple sub-tables. Data can be easily managed by partition, for example, to delete data more quickly.
In business, most users choose to partition by time, which makes it easier to manage hot and cold data and to delete expired data. Alternatively, users can select cities or other enumeration values as the partitioning column.
The data is divided into different buckets according to the hash value.
It is suggested to use columns with high cardinality as bucketing columns to avoid data skew.
To facilitate data recovery, it is suggested that a single bucket should not be too large and should be kept within 10GB. Therefore, consider the number of buckets reasonably when creating tables or adding partitions; different partitions can specify different numbers of buckets.
1.4 Sparse Index and Bloom Filter
Doris stores data in an ordered manner and builds a sparse index on top of the ordered data. The index granularity is a block (1024 rows).
The sparse index uses a fixed-length prefix of the schema as the index content; Doris currently uses a 36-byte prefix as the index.
When creating a table, it is suggested to place the fields commonly used as query filters at the front of the Schema: the more distinguishable and the more frequently queried a field is, the earlier it should be placed.
One special case is the VARCHAR type. A VARCHAR column can only be the last field of the sparse index, because the index is truncated at the VARCHAR column; if a VARCHAR column appears earlier, the index may be shorter than 36 bytes.
Specifically, you can refer to data model, ROLLUP and query.
In addition to sparse index, Doris also provides bloomfilter index. Bloomfilter index has obvious filtering effect on
columns with high discrimination. If you consider that varchar cannot be placed in a sparse index, you can create a
bloomfilter index.
1.5 Rollup
Rollup can essentially be understood as a physical index of the original table. When creating Rollup, only some columns in
Base Table can be selected as Schema. The order of fields in Schema can also be different from that in Base Table.
This is usually needed because the Base Table has many fields with high distinguishability; in that case you can consider selecting some of the columns to build a Rollup.
For example, siteid may lead to a low degree of data aggregation. If the business side frequently needs PV statistics by city, it can build a Rollup containing only city and pv :
ALTER TABLE site_visit ADD ROLLUP rollup_city(city, pv);
Generally, the way Base Table is constructed cannot cover all query modes. At this point, you can consider adjusting the
column order and establishing Rollup.
Database: session_data(visitorid, sessionid, visittime, city, province, ip, browser, url)
In addition to analysis by visitorid, there are also cases of analysis by browser and province, so a separate Rollup can be created, as sketched below.
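A hedged sketch of such a Rollup (the Rollup name and column list are illustrative):
ALTER TABLE session_data ADD ROLLUP rollup_browser(browser, province, ip, url);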
Schema Change
Users can modify the schema of an existing table through the Schema Change operation. Currently Doris supports the following modifications: adding columns, deleting columns, modifying column types, and changing the order of columns.
Dynamic Table
A dynamic schema table is a special kind of table whose schema expands automatically during import.
Currently, this feature is mainly used for importing semi-structured data such as JSON. Because JSON is self-describing, the schema information can be extracted from the original documents and the final type information inferred. This special kind of table reduces manual schema change operations, makes it easy to import semi-structured data, and automatically expands the schema.
Terminology
Schema change, changing the structure of the table, such as adding columns, reducing columns, changing column
types
Static column, column specified during table creation, such as partition columns, primary key columns
Dynamic column, columns automatically recognized and added during import
-- Create a table and specify the static column types; import will automatically convert data to the types of the static columns
CREATE TABLE IF NOT EXISTS test_dynamic_table (
    qid BIGINT,
    `answers.date` ARRAY<DATETIME>,
    `title` STRING,
    ...  -- "..." identifies the table as a dynamic table; this is the syntax for dynamic tables
)
DUPLICATE KEY(`qid`)
PROPERTIES("replication_num" = "1");
-- The table is created with the three static columns above, using the specified types
Importing data
-- example1.json
'{
"qid": "1000000",
"answers": [
],
"user": "Jash",
"creationdate": "2009-06-16T07:28:42.770"
}'
-- The types of the three columns `qid`, `answers.date`, `user` remain the same as when the table was created
-- Specify -H "strip_outer_array:true" to parse the entire file as a JSON array; each element in the array has the same structure, which is a more efficient way of parsing
For a dynamic table, you can also use S3 Load or Routine Load, with similar usage.
{"id" : 123}
{"id" : "123"}
-- The type will finally be inferred as Text type, and if {"id" : 123} is imported later, the type will
automatically be converted to String type
{"id" : [123]}
{"id" : 123}
Index Overview
Indexes are used to help quickly filter or find data.
The ZoneMap index is the index information automatically maintained for each column in the column storage format,
including Min/Max, the number of Null values, and so on. This index is transparent to the user.
Prefix Index
Unlike traditional database designs, Doris does not support creating indexes on arbitrary columns. An OLAP database with an MPP architecture like Doris is typically designed to handle large amounts of data by increasing concurrency.
Essentially, Doris data is stored in a data structure similar to an SSTable (Sorted String Table). This structure is an ordered data
structure that can be stored sorted by specified columns. On this data structure, it will be very efficient to perform lookups
with sorted columns as a condition.
In the Aggregate, Unique and Duplicate data models, the underlying data is sorted and stored according to the columns specified in the respective table creation statements: the AGGREGATE KEY, UNIQUE KEY and DUPLICATE KEY columns.
The prefix index, which is based on sorting, is an indexing method to query data quickly based on a given prefix column.
Example
We use the first 36 bytes of a row of data as the prefix index of this row of data. Prefix indexes are simply truncated when a
VARCHAR type is encountered. We give an example:
1. The prefix index of the following table structure is user_id(8 Bytes) + age(4 Bytes) + message(prefix 20 Bytes).
ColumnName Type
user_id BIGINT
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME
2. The prefix index of the following table structure is user_name(20 Bytes). Even if it does not reach 36 bytes, because
VARCHAR is encountered, it is directly truncated and will not continue further.
ColumnName Type
user_name VARCHAR(20)
age INT
message VARCHAR(100)
max_dwell_time DATETIME
min_dwell_time DATETIME
When our query condition is a prefix of the prefix index, the query can be greatly accelerated. For example, in the first example, we execute the following query:
SELECT * FROM table WHERE user_id = 1829239 AND age = 20;
This query will be much more efficient than the following one:
SELECT * FROM table WHERE age = 20;
Therefore, when building a table, choosing the correct column order can greatly improve query efficiency.
inverted index
Since Version 2.0.0
From version 2.0.0, Doris implements an inverted index to support fulltext search on text fields, as well as normal equality and range filters on text, numeric and datetime fields. This document introduces inverted index usage, including creating, dropping and querying.
Glossary
An inverted index is an indexing technique commonly used in information retrieval. It splits text into terms and constructs a term-to-document index. This index is called an inverted index and can be used to find the documents in which a specific term appears.
Basic Principles
Doris uses CLucene as its underlying inverted index library. CLucene is a high-performance and robust C++ implementation of the famous Lucene inverted index library. Doris optimizes CLucene to be simpler, faster and more suitable for a database.
In the inverted index of Doris, a row in a table corresponds to a document in CLucene, and a column corresponds to a field in the document. Using the inverted index, Doris can locate the rows that satisfy the filters in the SQL WHERE clause and retrieve them quickly, without reading unrelated rows.
Doris uses a separate file to store the inverted index. It is logically related to the segment file but physically isolated from it. The advantage is that creating and dropping an inverted index does not require rewriting the tablet and segment files, which would be very heavy work.
Features
The features of the inverted index are as follows:
Syntax
The inverted index definition syntax at table creation is as follows:
USING INVERTED is mandatory; it specifies the index type to be an inverted index.
PROPERTIES is optional; it allows the user to specify additional properties for the index. "parser" specifies the type of word tokenizer/parser:
if "parser" is not specified, no parser is used and the whole field is treated as a single term
"english" stands for the English parser
"chinese" stands for the Chinese parser
COMMENT is optional
columns_definition,
table_properties;
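Putting the pieces above together, a sketch of the full table-creation syntax (all names are placeholders):
CREATE TABLE table_name
(
    columns_definition,
    INDEX idx_name1(column_name1) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment'],
    INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment']
)
table_properties;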
-- add an inverted index to an existing table
-- syntax 1
CREATE INDEX idx_name ON table_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese")] [COMMENT 'your comment'];
-- syntax 2
ALTER TABLE table_name ADD INDEX idx_name(column_name) USING INVERTED [PROPERTIES("parser" = "english|chinese")]
[COMMENT 'your comment'];
-- drop an inverted index
-- syntax 1
DROP INDEX idx_name ON table_name;
-- syntax 2
ALTER TABLE table_name DROP INDEX idx_name;
-- 1.2 find rows whose logmsg contains keyword1 or keyword2 or more keywords
SELECT * FROM table_name WHERE logmsg MATCH_ANY 'keyword1 keyword2';
-- 1.3 find rows whose logmsg contains both keyword1 and keyword2 and more keywords
SELECT * FROM table_name WHERE logmsg MATCH_ALL 'keyword1 keyword2';
Example
This example demonstrates inverted index creation, fulltext queries and normal queries using a HackerNews dataset with 1 million rows. The performance comparison between queries with and without the inverted index is also shown.
Create table
USE test_inverted_index;

CREATE TABLE hackernews_1m
(
    `id` BIGINT,
    `deleted` TINYINT,
    `type` String,
    `author` String,
    `timestamp` DateTimeV2,
    `comment` String,
    `dead` TINYINT,
    `parent` BIGINT,
    `poll` BIGINT,
    `children` Array<BIGINT>,
    `url` String,
    `score` INT,
    `title` String,
    `parts` Array<INT>,
    `descendants` INT,
    INDEX idx_comment (`comment`) USING INVERTED PROPERTIES("parser" = "english") COMMENT 'inverted index for comment'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES ("replication_num" = "1");
Load data
load data by stream load
wget https://doris-build-1308700295.cos.ap-beijing.myqcloud.com/regression/index/hacknernews_1m.csv.gz
"TxnId": 2,
"Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1000000,
"NumberLoadedRows": 1000000,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 130618406,
"LoadTimeMs": 8988,
"BeginTxnTimeMs": 23,
"StreamLoadPutTimeMs": 113,
"ReadDataTimeMs": 4788,
"WriteDataTimeMs": 8811,
"CommitAndPublishTimeMs": 38
mysql> SELECT count() FROM hackernews_1m;
+---------+
| count() |
+---------+
| 1000000 |
+---------+
Query
Count the rows where comment contains 'OLAP', first using LIKE (without the inverted index):
mysql> SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%';
+---------+
| count() |
+---------+
|      34 |
+---------+
Count the rows where comment contains 'OLAP' using MATCH_ANY fulltext search based on the inverted index: it costs 0.02s, a 9x speedup, and the speedup will be even larger on larger datasets.
mysql> SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP';
+---------+
| count() |
+---------+
|      35 |
+---------+
The difference in count is due to the nature of fulltext search: the word parser not only splits text into words but also performs normalization such as converting to lower case, so the MATCH_ANY result is slightly larger.
Similarly, counting on 'OLTP' shows 0.07s vs 0.01s. Due to caching in Doris, both LIKE and MATCH_ANY are faster than before, but there is still a 7x speedup.
mysql> SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLTP%';
+---------+
| count() |
+---------+
|      48 |
+---------+
mysql> SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLTP';
+---------+
| count() |
+---------+
|      51 |
+---------+
Search for comments containing both 'OLAP' and 'OLTP': 0.13s vs 0.01s, a 13x speedup.
Use MATCH_ALL when you need all of the keywords to appear.
mysql> SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' AND comment LIKE '%OLTP%';
+---------+
| count() |
+---------+
| 14 |
+---------+
mysql> SELECT count() FROM hackernews_1m WHERE comment MATCH_ALL 'OLAP OLTP';
+---------+
| count() |
+---------+
| 15 |
+---------+
Search for comments containing at least one of 'OLAP' or 'OLTP': 0.12s vs 0.01s, a 12x speedup.
Use MATCH_ANY when you only need at least one of the keywords to appear.
mysql> SELECT count() FROM hackernews_1m WHERE comment LIKE '%OLAP%' OR comment LIKE '%OLTP%';
+---------+
| count() |
+---------+
| 68 |
+---------+
mysql> SELECT count() FROM hackernews_1m WHERE comment MATCH_ANY 'OLAP OLTP';
+---------+
| count() |
+---------+
| 71 |
+---------+
mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+---------+
| count() |
+---------+
| 999081 |
+---------+
-- for the timestamp column there is no need for a word parser, so just USING INVERTED without PROPERTIES
mysql> CREATE INDEX idx_timestamp ON hackernews_1m(timestamp) USING INVERTED;
The progress of building the index can be viewed via SQL. It takes just 1s (compare FinishTime and CreateTime) to build the index for the timestamp column with 1 million rows.
After the index is built, Doris automatically uses it for range queries, but the performance here is almost the same since the query is already fast on this small dataset.
mysql> SELECT count() FROM hackernews_1m WHERE timestamp > '2007-08-23 04:17:00';
+---------+
| count() |
+---------+
| 999081 |
+---------+
A similar test for the parent column (numeric type), using an equality query:
+---------+
| count() |
+---------+
| 2 |
+---------+
+---------+
| count() |
+---------+
| 2 |
+---------+
For the text column author, the inverted index can also be used to speed up equality queries:
+---------+
| count() |
+---------+
| 20 |
+---------+
-- create an inverted index for the author column; it costs 1.5s to build the index for 1 million rows
mysql> CREATE INDEX idx_author ON hackernews_1m(author) USING INVERTED;
+---------+
| count() |
+---------+
| 20 |
+---------+
BloomFilter index
BloomFilter is a fast search algorithm based on multiple hash functions, proposed by Bloom in 1970. It is usually used where it is necessary to quickly determine whether an element belongs to a set and 100% accuracy is not strictly required. BloomFilter has the following characteristics:
A highly space-efficient probabilistic data structure used to check whether an element is in a set.
For a call to detect whether an element exists, BloomFilter will tell the caller one of two results: it may exist or it must not
exist.
The disadvantage is the possibility of false positives: when it tells you an element may exist, that is not necessarily true.
A Bloom filter is actually composed of an extremely long binary bit array and a series of hash functions. The bit array is initially all 0. When an element is inserted, it is passed through the series of hash functions to produce a set of offsets, and the bits at all of those offsets in the bit array are set to 1.
Figure below shows an example of Bloom Filter with m=18, k=3 (m is the size of the Bit array, and k is the number of Hash
functions). The three elements of x, y, and z in the set are hashed into the bit array through three different hash functions.
When querying the element w, after calculating by the Hash function, because one bit is 0, w is not in the set.
So how do we judge whether an element is in the set? Similarly, all the offset positions of the element are computed via the hash functions. If the bits at all these positions are 1, the element is judged to be in the set; if any one of them is not 1, the element is judged not to be in the set. It's that simple!
If you are looking for a short row, only building an index on the starting row key of the entire data block cannot give you fine-
grained index information. For example, if your row occupies 100 bytes of storage space, a 64KB data block contains (64 *
1024)/100 = 655.53 = ~700 rows, and you can only put the starting row on the index bit. The row you are looking for may fall in
the row interval on a particular data block, but it is not necessarily stored in that data block. There are many possibilities for
this, or the row does not exist in the table, or it is stored in another HFile, or even in MemStore. In these cases, reading data
blocks from the hard disk will bring IO overhead and will abuse the data block cache. This can affect performance, especially
when you are facing a huge data set and there are many concurrent users.
So HBase provides Bloom filters that allow you to do a reverse test on the data stored in each data block. When a row is requested, the Bloom filter is checked first to see whether the row is not in this data block. The Bloom filter either confirms that the row is not in the block, or answers that it does not know; that is why we call it a reverse test. Bloom filters can also be applied to the cells within a row: the same reverse test is applied first when accessing a specific column qualifier.
Bloom filters are not without cost. Storing this additional index level takes up additional space. Bloom filters grow as their
index object data grows, so row-level bloom filters take up less space than column identifier-level bloom filters. When space
is not an issue, they can help you squeeze the performance potential of the system.
The BloomFilter index of Doris is specified when the table is created, or added later through an ALTER operation on the table. A Bloom filter is essentially a bitmap structure used to quickly determine whether a given value is in a set. This judgment has a small probability of false positives: if it returns false, the value is definitely not in the set; if it returns true, the value may be in the set.
The BloomFilter index is also created with Block as the granularity. In each Block, the value of the specified column is used as
a set to generate a BloomFilter index entry, which is used to quickly filter the data that does not meet the conditions in the
query.
Let's take a look at how Doris creates BloomFilter indexes through examples.
PARTITION BY RANGE(sale_date)
PROPERTIES (
"replication_num" = "3",
"bloom_filter_columns"="saler_id,category_id",
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "MONTH",
"dynamic_partition.time_zone" = "Asia/Shanghai",
"dynamic_partition.start" = "-2147483648",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "P_",
"dynamic_partition.replication_num" = "3",
"dynamic_partition.buckets" = "3"
);
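A BloomFilter index can also be added to an existing table via the ALTER operation mentioned above; a hedged example (assuming the table above is named sale_detail_bloom, which is illustrative here):
ALTER TABLE sale_detail_bloom SET ("bloom_filter_columns" = "saler_id,category_id");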
In order to improve LIKE query performance, the NGram BloomFilter index was implemented, referencing ClickHouse's ngrambf skip index.
) ENGINE=OLAP
PROPERTIES (
"replication_num" = "1"
);
-- PROPERTIES("gram_size"="3", "bf_size"="1024") indicate the number of gram and bytes of bloom filter
respectively.
-- the gram size set to same as the like query pattern string length. and the suitable bytes of bloom filter can
be get by test, more larger more better, 256 maybe is a good start.
-- Usually, if the data's cardinality is small, you can increase the bytes of bloom filter to improve the
efficiency.
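A hedged sketch of defining such an index at table creation, using the USING NGRAM_BF index type (the table and column names are illustrative):
CREATE TABLE ngram_bf_demo
(
    `id`  BIGINT,
    `url` VARCHAR(1024),
    INDEX idx_url (`url`) USING NGRAM_BF PROPERTIES("gram_size"="3", "bf_size"="1024") COMMENT 'ngram bf index for url'
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 3
PROPERTIES ("replication_num" = "1");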
Bitmap Index
Users can speed up queries by creating a bitmap index
This document focuses on how to create an index job, as well as some
considerations and frequently asked questions when creating an index.
Glossary
bitmap index: a fast data structure that speeds up queries
Basic Principles
Creating and dropping index is essentially a schema change job. For details, please refer to
Schema Change.
Syntax
Create index
Create a bitmap index for siteid on table1
CREATE INDEX [IF NOT EXISTS] index_name ON table1 (siteid) USING BITMAP COMMENT 'balabala';
View index
Display the indexes on the specified table_name :
SHOW INDEX FROM example_db.table_name;
Delete index
Delete the index on the specified table_name :
DROP INDEX [IF EXISTS] index_name ON [db_name.]table_name;
Notice
Currently only index of bitmap type is supported.
The bitmap index is only created on a single column.
Bitmap indexes can be applied to all columns of the Duplicate , Uniq data model and key columns of the Aggregate
models.
The data types supported by bitmap indexes are as follows:
TINYINT
SMALLINT
INT
BIGINT
CHAR
VARCHAR
DATE
DATETIME
LARGEINT
DECIMAL
BOOL
The bitmap index takes effect only in segmentV2. The table's storage format will be converted to V2 automatically when
creating index.
More Help
For more detailed syntax and best practices for using bitmap indexes, please refer to the CREATE INDEX / SHOW INDEX / DROP INDEX command manuals. You can also enter HELP CREATE INDEX / HELP SHOW INDEX / HELP DROP INDEX on the MySQL client command line.
Import Overview
By scene
Data Source | Import Method
Local files and memory data | Stream Load (streaming import)
Import instructions
The data import implementations of Apache Doris have the following common features, which are introduced here to help you use the data import functions better.
Each import job has a Label. The Label is unique within a database and is used to uniquely identify an import job. It can be specified by the user, or generated automatically by the system for some import methods.
Label is used to ensure that the corresponding import job can only be successfully imported once. A successfully imported
Label, when used again, will be rejected with the error Label already used . Through this mechanism, At-Most-Once
semantics can be implemented in Doris. If combined with the At-Least-Once semantics of the upstream system, the
Exactly-Once semantics of imported data can be achieved.
For best practices on atomicity guarantees, see Importing Transactions and Atomicity.
For example, in the following import, you need to cast columns b14 and a13 into array<string> type, and then use the
array_union function.
DATA INFILE("hdfs://test.hdfs.com:9000/user/test/data/sys/load/array_test.data")
COLUMNS TERMINATED BY "|" (`k1`, `a1`, `a2`, `a3`, `a4`, `a5`, `a6`, `a7`, `a8`, `a9`, `a10`, `a11`, `a12`,
`a13`, `b14`)
PROPERTIES( "max_filter_ratio"="0.8" );
Doris now supports two ways to load data from a client's local file:
1. Stream Load
2. MySql Load
Stream Load
Stream Load is used to import local files into Doris.
Unlike the submission methods of other commands, Stream Load communicates with Doris through the HTTP protocol.
The HOST:PORT involved in this method should be the HTTP protocol port.
In this document, we use the curl command as an example to demonstrate how to import data.
At the end of the document, we give a code example of importing data using Java
Import Data
The request body of Stream Load is as follows:
PUT /api/{db}/{table}/_stream_load
1. Create a table
Use the CREATE TABLE command to create a table in the demo to store the data to be imported. For the specific import
method, please refer to the CREATE TABLE command manual. An example is as follows:
CREATE TABLE IF NOT EXISTS load_local_file_test
(
    id INT,
    age TINYINT,
    name VARCHAR(50)
)
unique key(id)
DISTRIBUTED BY HASH(id) BUCKETS 3;
2. Import data
user:passwd is the user created in Doris. The initial user is admin/root, and the password is blank in the initial state.
host:port is the HTTP protocol port of BE, the default is 8040, which can be viewed on the Doris cluster WEB UI page.
label: Label can be specified in the Header to uniquely identify this import task.
For more advanced operations of the Stream Load command, see Stream Load Command documentation.
The Stream Load command is a synchronous command, and a successful return indicates that the import is successful.
If the imported data is large, a longer waiting time may be required. Examples are as follows:
"TxnId": 1003,
"Label": "load_local_file_test",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 1000000,
"NumberLoadedRows": 1000000,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 40888898,
"LoadTimeMs": 2144,
"BeginTxnTimeMs": 1,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 325,
"WriteDataTimeMs": 1933,
"CommitAndPublishTimeMs": 106,
"ErrorURL": "http://192.168.1.1:8042/api/_load_error_log?
file=__shard_0/error_log_insert_stmt_db18266d4d9b4ee5-abb00ddd64bdf005_db18266d4d9b4ee5_abb00ddd64bdf005"
The status of the Status field is Success , which means the import is successful.
For details of other fields, please refer to the Stream Load command documentation.
Import suggestion
Stream Load can only import local files.
It is recommended to limit the amount of data for an import request to 1 - 2 GB. If you have a large number of local files,
you can submit them concurrently in batches.
package demo.doris;
import org.apache.commons.codec.binary.Base64;
import org.apache.http.HttpHeaders;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultRedirectStrategy;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
/*
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
*/
public class DorisStreamLoader {
    // You can choose to fill in the FE address and the http_port of the FE, but the connectivity between the
    // client and the BE node must be guaranteed.
    private final static String HOST = "your_host";
    private final static int PORT = 8040;
    private final static String DATABASE = "demo";                // database of the target table
    private final static String TABLE = "load_local_file_test";   // target table
    private final static String USER = "root";
    private final static String PASSWD = "";
    private final static String LOAD_FILE_NAME = "/path/to/1.txt"; // local file path to import

    private final static String loadUrl = String.format("http://%s:%s/api/%s/%s/_stream_load",
            HOST, PORT, DATABASE, TABLE);

    private final static HttpClientBuilder httpClientBuilder = HttpClients
            .custom()
            .setRedirectStrategy(new DefaultRedirectStrategy() {
                @Override
                protected boolean isRedirectable(String method) {
                    // the FE redirects the stream load request to a BE, so allow redirects for all methods
                    return true;
                }
            });

    public void load(File file) throws IOException {
        try (CloseableHttpClient client = httpClientBuilder.build()) {
            HttpPut put = new HttpPut(loadUrl);
            put.setHeader(HttpHeaders.EXPECT, "100-continue");
            put.setHeader(HttpHeaders.AUTHORIZATION, basicAuthHeader(USER, PASSWD));
            // You can set stream load related properties in Header, here we set label and column_separator.
            put.setHeader("label", "label1");
            put.setHeader("column_separator", ",");
            // the file content is sent as the request body
            FileEntity entity = new FileEntity(file);
            put.setEntity(entity);

            try (CloseableHttpResponse response = client.execute(put)) {
                String loadResult = "";
                if (response.getEntity() != null) {
                    loadResult = EntityUtils.toString(response.getEntity());
                }
                int statusCode = response.getStatusLine().getStatusCode();
                if (statusCode != 200) {
                    throw new IOException(String.format("Stream load failed. status: %s, load result: %s",
                            statusCode, loadResult));
                }
                System.out.println("Load result: " + loadResult);
            }
        }
    }

    private String basicAuthHeader(String username, String password) {
        String toEncode = username + ":" + password;
        return "Basic " + new String(Base64.encodeBase64(toEncode.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        DorisStreamLoader loader = new DorisStreamLoader();
        File file = new File(LOAD_FILE_NAME);
        loader.load(file);
    }
}
MySql Load
The following is an example of MySQL Load (available since version dev).
Import Data
1. Create a table
Use the CREATE TABLE command to create a table in the demo database to store the data to be imported.
id INT,
age TINYINT,
name VARCHAR(50)
unique key(id)
2. Import data
Execute the following SQL statement in the MySQL client to load a client-side local file:
LOAD DATA
LOCAL
INFILE '/path/to/local/demo.txt'
INTO TABLE demo.load_local_file_test;
For more advanced operations of the MySQL Load command, see MySQL Load Command documentation.
The MySql Load command is a synchronous command, and a successful return indicates that the import is successful. If
the imported data is large, a longer waiting time may be required. Examples are as follows:
Query OK, 1 row affected (0.17 sec)
The load succeeds if the client shows the affected rows; otherwise the SQL statement throws an exception and the error message is shown in the client.
For details of other fields, please refer to the MySQL Load command documentation.
Import suggestion
MySQL Load can only import local files (either client-side local files or FE-side local files) and only supports the CSV format.
It is recommended to limit the amount of data for an import request to 1 - 2 GB. If you have a large number of local files,
you can submit them concurrently in batches.
HDFS LOAD
Preparation
Upload the files to be imported to HDFS. For specific commands, please refer to HDFS upload command
start import
HDFS Load uses an import statement that is basically the same as Broker Load; you only need to replace the WITH BROKER broker_name () clause with the following:
(data_desc, ...)
WITH HDFS
1. Create a table
id INT,
age TINYINT,
name VARCHAR(50)
unique key(id)
DATA INFILE("hdfs://host:port/tmp/test_hdfs.txt")
(id,age,name)
with HDFS (
"fs.defaultFS"="hdfs://testFs",
"hdfs_user"="user"
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
For parameter descriptions, please refer to Broker Load; for the HA cluster creation syntax, view it through HELP BROKER LOAD . You can then check the job status, for example:
mysql> show load order by createtime desc limit 1\G
JobId: 41326624
Label: broker_load_2022_04_15
State: FINISHED
Progress: ETL:100%; LOAD:100%
Type: BROKER
ErrorMsg: NULL
URL: NULL
S3 LOAD
Starting from version 0.14, Doris supports importing data directly from object storage systems that support the S3 protocol.
This document mainly introduces how to import data stored in AWS S3. It also supports the import of other object storage
systems that support the S3 protocol.
Applicable scenarios
Source data in S3 protocol accessible storage systems, such as S3.
Data volumes range from tens to hundreds of GB.
Preparing
1. Prepare AK and SK
First, you need to find or regenerate your AWS Access keys. You can generate them under My Security Credentials in the AWS console: select Create New Access Key, and make sure to save the generated AK and SK.
2. Prepare REGION and ENDPOINT
REGION can be selected when creating the bucket or can be viewed in the bucket list.
ENDPOINT can be found through REGION on the following page AWS Documentation
Other cloud storage systems can find relevant information compatible with S3 in corresponding documents
Start Loading
As with Broker Load, just replace WITH BROKER broker_name () with the following:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION"
example:
DATA INFILE("s3://your_bucket_name/your_file.txt")
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION"
PROPERTIES
"timeout" = "3600"
);
FAQ
1. S3 SDK uses virtual-hosted style by default. However, some object storage systems may not be enabled or support
virtual-hosted style access. At this time, we can add the use_path_style parameter to force the use of path style:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION",
"use_path_style" = "true"
2. Support using temporary security credentials to access object stores that support the S3 protocol:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_TEMP_ACCESS_KEY",
"AWS_SECRET_KEY" = "AWS_TEMP_SECRET_KEY",
"AWS_TOKEN" = "AWS_TEMP_TOKEN",
"AWS_REGION" = "AWS_REGION"
Doris itself can ensure that messages in Kafka are consumed without loss or duplication, that is, with Exactly-Once consumption semantics.
The user first needs to create a routine import job. The job will send a series of tasks continuously through routine
scheduling, and each task will consume a certain number of messages in Kafka.
Accessing an SSL-authenticated Kafka cluster requires the user to provide the certificate file (ca.pem) used to authenticate the Kafka broker's public key. If client authentication is also enabled in the Kafka cluster, the client's public key (client.pem), key file (client.key) and key password must also be provided. The required files need to be uploaded to Doris through the CREATE FILE command, with the catalog name set to kafka . See the CREATE FILE command manual for specific help. Here is an example:
Upload the files:
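A hedged sketch of uploading the certificate files with CREATE FILE (the URLs are placeholders for wherever the files are actually hosted):
CREATE FILE "ca.pem" PROPERTIES("url" = "https://example_url/kafka-key/ca.pem", "catalog" = "kafka");
CREATE FILE "client.key" PROPERTIES("url" = "https://example_url/kafka-key/client.key", "catalog" = "kafka");
CREATE FILE "client.pem" PROPERTIES("url" = "https://example_url/kafka-key/client.pem", "catalog" = "kafka");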
After the upload is complete, you can view the uploaded files through the SHOW FILES command.
PROPERTIES
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"property.group.id" = "xxx",
"property.client.id" = "xxx",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
PROPERTIES
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
FROM KAFKA
"kafka_broker_list"= "broker1:9091,broker2:9091",
"kafka_topic" = "my_topic",
"property.security.protocol" = "ssl",
"property.ssl.ca.location" = "FILE:ca.pem",
"property.ssl.certificate.location" = "FILE:client.pem",
"property.ssl.key.location" = "FILE:client.key",
"property.ssl.key.password" = "abcdefg"
);
Only the currently running tasks can be viewed, and the completed and unstarted tasks cannot be viewed.
Job Control
The user can control the stop, pause and restart of the job through the STOP/PAUSE/RESUME three commands.
For specific commands, please refer to STOP ROUTINE LOAD , PAUSE ROUTINE LOAD, RESUME ROUTINE LOAD command
documentation.
more help
For more detailed syntax and best practices for ROUTINE LOAD, see ROUTINE LOAD command manual.
MySQL
Oracle
PostgreSQL
SQLServer
Hive
Iceberg
ElasticSearch
This document describes how to create external tables accessible through the ODBC protocol and how to import data from
these external tables.
The purpose of ODBC Resource is to manage the connection information of external tables uniformly.
CREATE EXTERNAL RESOURCE `oracle_test_odbc`
PROPERTIES (
"type" = "odbc_catalog",
"host" = "192.168.0.1",
"port" = "8086",
"user" = "oracle",
"password" = "oracle",
"database" = "oracle",
"odbc_type" = "oracle",
"driver" = "Oracle"
);
Here we have created a Resource named oracle_test_odbc whose type is odbc_catalog , indicating that this Resource is used to store ODBC connection information. odbc_type is oracle , indicating that this ODBC Resource is used to connect to an Oracle database. For other types of resources, see the resource management documentation for details.
) ENGINE=ODBC
COMMENT "ODBC"
PROPERTIES (
"odbc_catalog_resource" = "oracle_test_odbc",
"database" = "oracle",
"table" = "baseall"
);
Here we create an ext_oracle_demo external table and reference the oracle_test_odbc Resource created earlier.
Import Data
1. Create the Doris table
Here we create a Doris table with the same column information as the external table ext_oracle_demo created in the
previous step:
PROPERTIES (
"replication_num" = "1"
);
For detailed instructions on creating Doris tables, see CREATE-TABLE syntax help.
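The data can then be imported from the external table with an INSERT INTO ... SELECT statement; a hedged example (the internal table name doris_oracle_tbl is illustrative):
INSERT INTO doris_oracle_tbl SELECT * FROM ext_oracle_demo;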
The INSERT command is a synchronous command, and a successful return indicates that the import was successful.
Precautions
It must be ensured that the external data source and the Doris cluster can communicate with each other, including the
network between the BE node and the external data source.
ODBC external tables essentially access the data source through a single ODBC client, so it is not suitable to import a
large amount of data at one time. It is recommended to import multiple times in batches.
The INSERT statement is used in a similar way to the INSERT statement used in databases such as MySQL. The INSERT
statement supports the following two syntaxes:
Here we only introduce the second way. For a detailed description of the INSERT command, see the INSERT command
documentation.
Single write
Single write means that the user directly executes an INSERT command. An example is as follows:
INSERT INTO example_tbl (col1, col2, col3) VALUES (1000, "test", 3.25);
Therefore, whether you are importing one row or multiple rows, we do not recommend using this method for data import in the production environment. High-frequency INSERT operations will result in a large number of small files in the storage layer, which will seriously affect system performance.
This method is only suitable for simple offline tests or low-frequency operations.
Or you can use the following method for batch insert operations:
INSERT INTO example_tbl (col1, col2, col3) VALUES (1000, "test1", 3.25), (2000, "test2", 4.25), (3000, "test3", 5.25);
We recommend that the number of rows in a batch insert be as large as possible, such as several thousand or even 10,000 at a time. Alternatively, you can use PreparedStatement to perform batch inserts, as in the following procedure.
JDBC example
Here we give a simple JDBC batch INSERT code example:
package demo.doris;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DorisJDBCDemo {
    private static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
    private static final String DB_URL_PATTERN = "jdbc:mysql://%s:%d/%s?rewriteBatchedStatements=true";
    private static final String HOST = "127.0.0.1"; // host of the Leader Node
    private static final int PORT = 9030;           // query port of Leader Node
    private static final String DB = "demo";
    private static final String TBL = "test_1";
    private static final String USER = "admin";
    private static final String PASSWD = "my_pass";
    private static final int INSERT_BATCH_SIZE = 10000;

    public static void main(String[] args) {
        insert();
    }

    private static void insert() {
        // A label can be attached to make the insert idempotent, e.g.:
        // String query = "insert into " + TBL + " WITH LABEL my_label values(?, ?)";
        String query = "insert into " + TBL + " values(?, ?)";

        try {
            Class.forName(JDBC_DRIVER);
            try (Connection conn = DriverManager.getConnection(
                         String.format(DB_URL_PATTERN, HOST, PORT, DB), USER, PASSWD);
                 PreparedStatement stmt = conn.prepareStatement(query)) {

                // add all rows of the batch to the prepared statement
                for (int i = 0; i < INSERT_BATCH_SIZE; i++) {
                    stmt.setInt(1, i);
                    stmt.setInt(2, i * 100);
                    stmt.addBatch();
                }

                // execute the whole batch at once
                int[] res = stmt.executeBatch();
                System.out.println(res.length);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
1. The JDBC connection string needs to add the rewriteBatchedStatements=true parameter, and the PreparedStatement method should be used.
Currently, Doris does not support PreparedStatement on the server side, so the JDBC Driver performs the batch Prepare on the client side.
rewriteBatchedStatements=true ensures that the Driver actually executes the batch and finally forms a single INSERT statement of the following form to send to Doris:
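For example, with the values used in the JDBC example above, the statement sent to Doris would look roughly like:
INSERT INTO test_1 VALUES (0, 0), (1, 100), (2, 200), ...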
2. Batch size
Because batch processing is performed on the client side, an overly large batch will occupy the memory resources of the client, so you need to pay attention to this.
Doris will support PreparedStatement on the server side in the future, so stay tuned.
3. Import atomicity
Like other import methods, the INSERT operation itself supports atomicity. Each INSERT operation is an import
transaction, which guarantees atomic writing of all data in an INSERT.
As mentioned earlier, we recommend importing data with INSERT in batches rather than with single inserts.
At the same time, we can set a Label for each INSERT operation. Through the Label mechanism, the idempotency and atomicity of operations can be guaranteed, so that in the end the data is neither lost nor duplicated. For the specific usage of Label in INSERT, refer to the INSERT documentation.
For the materialized view attached to the table, atomicity and consistency with the base table are also guaranteed.
Label mechanism
Doris's import job can set a Label. This Label is usually a user-defined string with certain business logic attributes.
The main function of a Label is to uniquely identify an import task and to ensure that the same Label is successfully imported only once.
The Label mechanism ensures that imported data is neither lost nor duplicated. If the upstream data source guarantees At-Least-Once semantics, then combined with the Doris Label mechanism, Exactly-Once semantics can be achieved.
A Label is unique within a database. The retention period for labels is 3 days by default. That is, after 3 days, completed Labels are automatically cleaned up and can then be reused.
Best Practices
Labels are usually formatted as business logic + time, such as my_business1_20220330_125000.
This Label usually represents a batch of data generated by the business my_business1 at 2022-03-30 12:50:00.
With this Label setting, the business can query the import task status through the Label and clearly know whether this batch of data was imported successfully at this point in time. If it was unsuccessful, the import can be retried with the same Label.
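For example, the status of such a batch could then be checked by its Label (a sketch using the Label above):
SHOW LOAD WHERE LABEL = "my_business1_20220330_125000";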
DATA INFILE("bos://bucket/input/file")
PRECEDING FILTER k1 = 1
SET (
k3 = tmpk3 + 1
WHERE k1 > k2
...
);
STREAM LOAD
curl
--location-trusted
-u user:passwd
-T file.txt
http://host:port/api/testDb/testTbl/_stream_load
ROUTINE LOAD
PRECEDING FILTER k1 = 1,
WHERE k1 > k2
...
The above import methods all support column mapping, transformation and filtering operations on the source data:
Pre-filtering: Filter the raw source data once before column mapping. As in the example above:
PRECEDING FILTER k1 = 1
Mapping: Define the columns in the source data. If a defined column name is the same as a column in the table, it is directly mapped to that table column. If it is different, the defined column can be used in subsequent transformation operations.
Conversion: Convert the columns mapped in the first step. Built-in expressions, functions and user-defined functions can be used for conversion, and the result is remapped to the corresponding column in the table. As in the example above:
k3 = tmpk3 + 1
Post filtering: Filter the mapped and transformed columns by expressions. Filtered data rows are not imported into the system. As in the example above:
WHERE k1 > k2
column mapping
The purpose of column mapping is mainly to describe the information of each column in the import file, which is equivalent
to defining the name of the column in the source data. By describing the column mapping relationship, we can import
source files with different column order and different number of columns into Doris. Below we illustrate with an example:
Assuming that the source file has 4 columns, the contents are as follows (the header column names are only for
convenience, and there is no header actually):
4 \N chongqing 1.4
1. Adjust the mapping order of the columns
Suppose there are 4 columns k1,k2,k3,k4 in the table. The import mapping relationship we want is as follows:
column 1 -> k1
column 2 -> k3
column 3 -> k2
column 4 -> k4
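In the column list of the load statement, this mapping would be written roughly as:
(k1, k3, k2, k4)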
2. There are more columns in the source file than in the table
Suppose there are 3 columns k1,k2,k3 in the table. The import mapping relationship we want is as follows:
column 1 -> k1
column 2 -> k3
column 3 -> k2
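Written as a column list in the load statement, this would be roughly:
(k1, k3, k2, tmpk4)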
where tmpk4 is a custom column name that does not exist in the table. Doris ignores this non-existing column name.
3. The number of columns in the source file is less than the number of columns in the table, fill with default values
Suppose there are 5 columns k1,k2,k3,k4,k5 in the table. The import mapping relationship we want is as follows:
column 1 -> k1
column 2 -> k3
column 3 -> k2
Here we only use the first 3 columns of the source file, and we want the two columns k4 and k5 to be filled with default values.
If the k4 and k5 columns have default values, the default values will be used. Otherwise, if a column is nullable, it will be populated with a null value. Otherwise, the import job will report an error.
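The mapping in this example, written as a column list, would be roughly:
(k1, k3, k2)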
Column pre-filtering
Pre-filtering filters the raw data once after it is read. Currently it is only supported by BROKER LOAD and ROUTINE LOAD.
Pre-filtering has the following application scenarios:
1. Filtering before column mapping and transformation, so that unwanted data can be filtered out first.
2. The filter column does not exist in the table, it is only used as a filter identifier
For example, the source data stores the data of multiple tables (or the data of multiple tables is written to the same Kafka
message queue). Each row in the data has a column name to identify which table the row of data belongs to. Users can
filter the corresponding table data for import by pre-filtering conditions.
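A minimal sketch, assuming each row carries a hypothetical identifier column named table_name marking which table the row belongs to:
PRECEDING FILTER table_name = "my_table"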
Column conversion
The column transformation function allows users to transform column values in the source file. Currently, Doris supports
most of the built-in functions and user-defined functions for conversion.
Note: A user-defined function belongs to a specific database. To use it for conversion, the user needs read permission on that database.
Transformation operations are usually defined along with column mappings. That is, the columns are first mapped and then
converted. Below we illustrate with an example:
Assuming that the source file has 4 columns, the contents are as follows (the header column names are only for convenience, and there is no header actually):
column1 column2 column3 column4
1 100 beijing 1.1
2 200 shanghai 1.2
3 300 guangzhou 1.3
\N 400 chongqing 1.4
1. Convert the column values in the source file and import them into the table
Suppose there are 4 columns k1,k2,k3,k4 in the table. Our desired import mapping and transformation relationship is as follows:
column 1 -> k1
column 2 * 100 -> k3
column 3 -> k2
column 4 -> k4
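In the load statement, this can be expressed roughly as the following column list (the second column is first read as the temporary column tmpk3):
(k1, tmpk3, k2, k4, k3 = tmpk3 * 100)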
This is equivalent to us naming the second column in the source file tmpk3 , and specifying that the value of the k3
column in the table is tmpk3 * 100 . The data in the final table is as follows:
k1 k2 k3 k4
1 beijing 10000 1.1
2 shanghai 20000 1.2
3 guangzhou 30000 1.3
\N chongqing 40000 1.4
Suppose there are 4 columns k1,k2,k3,k4 in the table. We hope that beijing, shanghai, guangzhou, chongqing in the source data are converted to the corresponding region ids and imported:
column 1 -> k1
column 2 -> k2
column 3 -> k3 (region name converted to region id)
column 4 -> k4
(k1, k2, tmpk3, k4, k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2 when "guangzhou" then 3 when
"chongqing" then 4 else null end)
k1 k2 k3 k4
1 100 1 1.1
2 200 2 1.2
3 300 3 1.3
Suppose there are 4 columns k1,k2,k3,k4 in the table. While converting the region id, we also want to convert the null value of the k1 column in the source data to 0 and import it:
column 1 -> k1 (null converted to 0)
column 2 -> k2
column 3 -> k3 (region name converted to region id)
column 4 -> k4
(tmpk1, k2, tmpk3, k4, k1 = ifnull(tmpk1, 0), k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2
when "guangzhou" then 3 when "chongqing" then 4 else null end)
k1 k2 k3 k4
1 100 1 1.1
2 200 2 1.2
3 300 3 1.3
0 400 4 1.4
Column filtering
After column mapping and transformation, we can use filter conditions to filter out the data that we do not want to import into Doris. Below we illustrate with an example:
Assuming that the source file has 4 columns, the contents are as follows (the header column names are only for convenience, and there is no header actually):
column1 column2 column3 column4
1 100 beijing 1.1
2 200 shanghai 1.2
3 300 guangzhou 1.3
\N 400 chongqing 1.4
1. Filtering with the default column mapping and conversion
Suppose there are 4 columns k1,k2,k3,k4 in the table. We can define filter conditions directly with the default column mapping and conversion. If we want to import only the data rows whose fourth column in the source file is greater than 1.2, the filter condition is as follows:
where k4 > 1.2
By default, Doris maps columns sequentially, so column 4 in the source file is automatically mapped to column k4 in the table.
2. Filtering on converted column values
Suppose there are 4 columns k1,k2,k3,k4 in the table. In the column conversion example, we converted the province names to ids. Here we want to filter out the data with id 3. The conversion and filter conditions are as follows:
(k1, k2, tmpk3, k4, k3 = case tmpk3 when "beijing" then 1 when "shanghai" then 2 when "guangzhou" then 3 when
"chongqing" then 4 else null end)
where k3 != 3
k1 k2 k3 k4
1 100 1 1.1
2 200 2 1.2
Here we see that the column value when performing the filter is the final column value after mapping and
transformation, not the original data.
3. Multi-condition filtering
Suppose there are 4 columns k1,k2,k3,k4 in the table. We want to filter out the data whose k1 column is null, and at the same time filter out the data whose k4 column is less than 1.2. The filter conditions are as follows:
where k1 is not null and k4 >= 1.2
k1 k2 k3 k4
2 200 2 1.2
3 300 3 1.3
The rows processed by an import job fall into the following three categories:
1. Filtered Rows
Data that was filtered out due to poor data quality. Data quality problems include format issues such as type errors, precision errors, over-long strings, a mismatched number of file columns, and data rows that are filtered out because there is no corresponding partition.
2. Unselected Rows
This part is the row of data that was filtered out due to preceding filter or where column filter conditions.
3. Loaded Rows
Data rows that were imported correctly.
Doris's import tasks allow the user to set a maximum error rate (max_filter_ratio). If the error rate of the imported data is below this threshold, the erroneous rows are ignored and the other, correct rows are imported.
Note that Unselected Rows do not participate in the calculation of the error rate.
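As a rough sketch (consistent with the definitions above, with Unselected Rows excluded), the error rate checked against max_filter_ratio can be thought of as:
error ratio = Filtered Rows / (Filtered Rows + Loaded Rows)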
Import strict mode
This document mainly explains how to set strict mode and the impact of strict mode.
How to set
Strict mode (strict_mode) is off (False) by default.
1. BROKER LOAD
DATA INFILE("bos://my_bucket/input/file.txt")
"bos_endpoint" = "http://bj.bcebos.com",
"bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"bos_secret_accesskey"="yyyyyyyyyyyyyyyyyyyyyyyy"
PROPERTIES
"strict_mode" = "true"
2. STREAM LOAD
-H "strict_mode: true" \
-T 1.txt \
http://host:port/api/example_db/my_table/_stream_load
3. ROUTINE LOAD
PROPERTIES
"strict_mode" = "true"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic"
);
4. INSERT
Set via session variables:
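For example, strict mode for INSERT is controlled by the enable_insert_strict session variable:
SET enable_insert_strict = true;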
For column type conversion, if strict mode is turned on, the wrong data will be filtered. The wrong data here refers to: the
original data is not null , but the result is null after column type conversion.
The column type conversion mentioned here does not include the null value calculated by the function.
For an imported column type that has range restrictions, if the original data can pass the type conversion normally but falls outside the declared range, strict mode does not affect it. For example, if the type is decimal(1,0) and the original data is 10, the value can be converted by type but is not within the range declared for the column. Strict mode has no effect on this kind of data.
Primitive data type Primitive data example Converted value to TinyInt Strict mode Result
Description:
Primitive Data Types Examples of Primitive Data Converted to Decimal Strict Mode Result
Description:
Binlog Load
The Binlog Load feature enables Doris to incrementally synchronize update operations from MySQL, providing CDC (Change Data Capture) for data in MySQL.
Scenarios
Supports insert / update / delete operations
Query statements are filtered out
Temporarily incompatible with DDL statements
Principle
In the phase-one design, Binlog Load relies on canal as an intermediary: canal pretends to be a slave node to fetch and parse the binlog from the MySQL master node, and Doris then retrieves the parsed data from canal. This process mainly involves MySQL, canal and Doris. The overall data flow is as follows:
+---------------------------------------------+
| Mysql |
+----------------------+----------------------+
| Binlog
+----------------------v----------------------+
| Canal Server |
+-------------------+-----^-------------------+
Get | | Ack
+-------------------|-----|-------------------+
| FE | | |
| +-----------------|-----|----------------+ |
| | Sync Job | | | |
| | +------------v-----+-----------+ | |
| | | Canal Client | | |
| | | +-----------------------+ | | |
| | | | Receiver | | | |
| | | +-----------------------+ | | |
| | | +-----------------------+ | | |
| | | | Consumer | | | |
| | | +-----------------------+ | | |
| | +------------------------------+ | |
| +----+---------------+--------------+----+ |
| | | | |
| | | | |
+----------------------+----------------------+
| | |
+----v-----------------v------------------v---+
| Coordinator |
| BE |
+----+-----------------+------------------+---+
| | |
| BE | | BE | | BE |
As shown in the figure above, the user first submits a SyncJob to the Fe.
Then, Fe will start a Canal Client for each SyncJob to subscribe to and get data from the Canal Server.
The Receiver in the Canal Client receives data via GET requests. Every time a Batch is received, it is distributed by the Consumer to different Channels according to the corresponding target table. Once a Channel receives data distributed by the Consumer, it submits a send task to send the data.
A send task is a request from a Channel to a BE, containing the data of the same Batch distributed to the current Channel.
A Channel controls the begin, commit and abort of the transaction for a single table. Within one transaction, the Consumer may distribute multiple Batches of data to a Channel, so multiple send tasks may be generated. These tasks do not actually take effect until the transaction is committed successfully.
When certain conditions are met (for example, a certain period of time has passed, or the maximum commit data size is reached), the Consumer will block and notify each Channel to try to commit the transaction.
If and only if all Channels commit successfully, the Canal Server is notified via an ACK request and the Canal Client continues to fetch and consume data.
If any Channel fails to commit, data is retrieved again from the location of the last successful consumption and committed again (Channels that have already committed successfully will not commit again, to ensure the idempotency of the commit).
Throughout the life cycle of a SyncJob, the Canal Client continuously receives data from the Canal Server and sends it to the BE through the above process to complete data synchronization.
The architecture of master-slave synchronization is usually composed of a master node (responsible for writing) and one or
more slave nodes (responsible for reading). All data changes on the master node will be copied to the slave node.
Note that: Currently, you must use MySQL version 5.7 or above to support Binlog Load
To enable the binlog of MySQL, you need to edit the my.cnf file and set it like:
[mysqld]
log-bin = mysql-bin     # enable binlog
binlog-format = ROW     # choose ROW mode
Principle Description
On MySQL, the binlog files usually name as mysql-bin.000001, mysql-bin.000002... And MySQL will automatically segment
the binlog file when certain conditions are met:
1. MySQL is restarted
2. The user enters the flush logs command
3. The binlog file size exceeds 1G
To locate the latest consumption position in the binlog, both the binlog file name and the position (offset) are needed.
For instance, the binlog position consumed so far is saved on each slave node, so that consumption can be resumed at any time after a disconnection and reconnection.
For the master node, it is only responsible for writing to the binlog. Multiple slave nodes can be connected to a master node
at the same time to consume different parts of the binlog log without affecting each other.
Binlog log supports two main formats (in addition to mixed based mode):
Statement-based format:
Binlog only records the SQL statements executed on the master node, and the slave node copies them to the local node
for re-execution.
Row-based format:
Binlog will record the data change information of each row and all columns of the master node, and the slave node will
copy and execute the change of each row to the local node.
The first format only writes the executed SQL statements. Although the log volume is small, it has disadvantages: the actual row data is not recorded, and statements that use non-deterministic functions may produce different results on the slave node.
Therefore, we need to choose the second format, which parses each row of data from the binlog.
In the row-based format, binlog will record the timestamp, server ID, offset and other information of each binlog event. For
instance, the following transaction with two insert statements:
begin;
commit;
There will be four binlog events, including one begin event, two insert events and one commit event:
SET TIMESTAMP=1538238301/*!*/;
BEGIN
/*!*/.
# at 211935643
# at 211935698
...
'/*!*/;
### SET
### @1=1
### @2=100
# at 211935744
...
'/*!*/;
### SET
### @1=2
### @2=200
# at 211935771
COMMIT/*!*/;
As shown above, each insert event contains modified data. During delete/update, an event can also contain multiple rows of
data, making the binlog more compact.
To enable the gtid mode of MySQL, you need to edit the my.cnf configuration file and set it like:
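A minimal sketch of the standard MySQL settings for this (comments are illustrative):
gtid-mode = on                # enable gtid mode
enforce-gtid-consistency = 1  # enforce consistency between gtid and transactions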
In gtid mode, the master server can easily track transactions, recover data and replicas without binlog file name and offset.
In gtid mode, due to the global validity of gtid, the slave node will no longer need to locate the binlog location on the master
node by saving the file name and offset, but can be located by the data itself. During SyncJob, the slave node will skip the
execution of any gtid transaction already executed before.
A gtid is expressed as a pair of coordinates: source_id identifies the master node, and transaction_id indicates the order in which the transaction was executed on the master node (at most 2^63 - 1).
GTID = source_id:transaction_id
For example, the gtid of the 23rd transaction executed on the same master node is:
3E11FA47-71CA-11E1-9E33-C80AA9429562:23
After downloading, please follow the steps below to complete the deployment.
1. Unzip the canal deployer
2. Create a new directory under the conf folder and rename it as the root directory of instance. The directory name is the
destination mentioned later.
3. Modify the instance configuration file (you can copy from conf/example/instance.properties )
canal.instance.mysql.slaveId = 1234
## mysql address
canal.instance.master.address = 127.0.0.1:3306
## mysql username/password
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
sh bin/startup.sh
Multiple instances can be started on the canal server. An instance can be regarded as a slave node. Each instance consists of
the following parts:
-------------------------------------------------
| Server |
| -------------------------------------------- |
| | Instance 1 | |
| | ----------------------------------- | |
| | | MetaManager | | |
| | ----------------------------------- | |
| -------------------------------------------- |
-------------------------------------------------
Parser: Access the data source, simulate the dump protocol, interact with the master, and analyze the protocol
Sink: Linker between parser and store, for data filtering, processing and distribution
Store: Data store
Meta Manager: Metadata management module
Each instance has its own unique ID in the cluster, that is, server ID.
In the canal server, the instance is identified by a unique string named destination. The canal client needs destination to
connect to the corresponding instance.
Note that the canal client and the canal instance should correspond to each other one to one: Binlog Load forbids multiple SyncJobs from connecting to the same destination.
The data flow direction within an instance is binlog -> parser -> sink -> store.
The instance parses the binlog through the parser module, and the parsed data is cached in the store. When the user submits a SyncJob to FE, it starts a Canal Client to subscribe to and fetch the data in the store of the corresponding instance.
The store is actually a ring queue. Users can configure its length and storage space by themselves.
1. Get pointer: the GET pointer points to the last location fetched by the Canal Client.
2. Ack pointer: the ACK pointer points to the location of the last successful consumption.
3. Put pointer: the PUT pointer points to the location most recently written to the store by the sink module.
When the Canal Client calls the Get command, the Canal Server will generate data batches and send them to the Canal
Client, and move the Get pointer to the right. The Canal Client can get multiple batches until the Get pointer catches up with
the Put pointer.
When the consumption of data is successful, the Canal Client will return Ack + Batch ID, notify that the consumption has
been successful, and move the Ack pointer to the right. The store will delete the data of this batch from the ring queue, make
room to get data from the upstream sink module, and then move the Put pointer to the right.
When data consumption fails, the client returns a rollback notification for the failed consumption, and the store resets the Get pointer back to the Ack pointer's position, so that the next fetch by the Canal Client starts from the Ack pointer again.
Like a slave node in MySQL, the Canal Server also needs to save the client's latest consumption position. All metadata in the Canal Server (such as the gtid and the binlog position) is managed by the MetaManager. By default, this metadata is currently persisted in JSON format in the meta.dat file in the instance's root directory.
Basic Operation
Currently, Binlog Load only supports target tables of the Unique data model, and the batch delete feature of the target table must be enabled.
For how to enable batch delete, please refer to the batch delete function in ALTER TABLE PROPERTY.
Example:
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT "OLAP"
!! The field order of the Doris table structure and the MySQL table structure must be consistent !!
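A minimal sketch of a qualifying target table (table name, column types and bucket count are illustrative; the columns match the (id, name) mapping used in the SyncJob example below):
CREATE TABLE `demo`.`test_sync_tbl` (
    `id`   INT NOT NULL,
    `name` VARCHAR(100) NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 8
PROPERTIES (
    "replication_num" = "1"
);
-- enable the batch delete feature required by Binlog Load
ALTER TABLE `demo`.`test_sync_tbl` ENABLE FEATURE "BATCH_DELETE";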
Create SyncJob
CREATE SYNC `demo`.`job`
(
(id,name)
FROM BINLOG
"type" = "canal",
"canal.server.ip" = "127.0.0.1",
"canal.server.port" = "11111",
"canal.destination" = "xxx",
"canal.username" = "canal",
"canal.password" = "canal"
);
For the detailed syntax for creating a data synchronization job, connect to Doris and refer to CREATE SYNC JOB for syntax help. Here is a detailed introduction to the precautions when creating a job.
Syntax:
CREATE SYNC [db.]job_name
channel_desc,
channel_desc
...
binlog_desc
job_name
job_name is the unique identifier of the SyncJob in the current database. For a given job name, only one SyncJob can be running at the same time.
channel_desc
column_mapping mainly refers to the mapping relationship between the columns of the MySQL source table and the Doris target table.
If it is not specified, the columns of the source table and the target table are considered to correspond one to one in order. However, we still recommend explicitly specifying the column mapping relationship, so that when a schema change occurs on the target table (such as adding a nullable column), data synchronization can still proceed.
Otherwise, when such a schema change occurs, the column mapping is no longer one to one and the SyncJob will report an error.
binlog_desc
binlog_desc defines some necessary information for docking the remote binlog address.
At present, the only supported docking type is the canal type. In canal type, all configuration items need to be prefixed
with the canal prefix.
State
The current stage of the job. The transition between job states is shown in the following figure:
(State transition diagram: a newly submitted job is in the PENDING state, becomes RUNNING once the canal client is started, can switch between RUNNING and PAUSED, and ends in CANCELLED when it is stopped or hits an unrecoverable system error.)
After the Fe scheduler starts the canal client, the status becomes running.
User can control the status of the job by three commands: stop/pause/resume . After the operation, the job status is
cancelled/paused/running respectively.
Cancelled is the only final state of a job. Once the job status changes to Cancelled, it cannot be resumed.
When an error occurs while the SyncJob is running, the status changes to Cancelled if the error is unrecoverable, and to Paused otherwise.
Channel
The mapping relationship between all source tables and target tables of the job.
Status
The latest consumption location of the current binlog (if the gtid mode is on, the gtid will be displayed), and the delay
time of the Doris side compared with the MySQL side.
JobConfig
The remote server information of the docking, such as the address of the Canal Server and the destination of the
connected instance.
Control Operations
Users can control the status of jobs through the stop/pause/resume commands.
You can refer to the STOP SYNC JOB, PAUSE SYNC JOB and RESUME SYNC JOB commands for help and examples.
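For instance, for the `demo`.`job` SyncJob created in the example above:
PAUSE SYNC JOB `demo`.`job`;
RESUME SYNC JOB `demo`.`job`;
STOP SYNC JOB `demo`.`job`;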
Related Parameters
Canal configuration
canal.ip
canal.port
canal.instance.memory.buffer.size
The queue length of the store ring queue, must be set to the power of 2, the default length is 16384. This value is equal to
the maximum number of events that can be cached on the canal side and directly determines the maximum number of
events that can be accommodated in a transaction on the Doris side. It is recommended to make it large enough to
prevent the upper limit of the amount of data that can be accommodated in a transaction on the Doris side from being
too small, resulting in too frequent transaction submission and data version accumulation.
canal.instance.memory.buffer.memunit
The default space occupied by an event at the canal end, default value is 1024 bytes. This value multiplied by
canal.instance.memory.buffer.size is equal to the maximum space of the store. For example, if the queue length of the
store is 16384, the space of the store is 16MB. However, the actual size of an event is not actually equal to this value, but is
determined by the number of rows of data in the event and the length of each row of data. For example, the insert event
of a table with only two columns is only 30 bytes, but the delete event may reach thousands of bytes. This is because the
number of rows of delete event is usually more than that of insert event.
Fe configuration
The following configuration belongs to the system level configuration of SyncJob. The configuration value can be modified in
configuration file fe.conf.
sync_commit_interval_second
Maximum interval time between commit transactions. If there is still data in the channel that has not been committed
after this time, the consumer will notify the channel to commit the transaction.
min_sync_commit_size
The minimum number of events required to commit a transaction. If the number of events received by Fe is less than it,
Fe will continue to wait for the next batch of data until the time exceeds sync_commit_interval_second . The default value
is 10000 events. If you want to modify this configuration, please ensure that this value is less than the
canal.instance.memory.buffer.size configuration on the canal side (16384 by default). Otherwise, Fe will try to get more
events than the length of the store queue without ack, causing the store queue to block until timeout.
min_bytes_sync_commit
The minimum data size required to commit a transaction. If the data size received by Fe is smaller than it, it will continue
to wait for the next batch of data until the time exceeds sync_commit_interval_second . The default value is 15MB. If you
want to modify this configuration, please ensure that this value is less than the product
canal.instance.memory.buffer.size and canal.instance.memory.buffer.memunit on the canal side (16MB by default).
Otherwise, Fe will try to get data from canal larger than the store space without ack, causing the store queue to block
until timeout.
max_bytes_sync_commit
The maximum data size for committing a transaction. If the data size received by Fe is larger than this, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of canal.instance.memory.buffer.size and canal.instance.memory.buffer.memunit on the canal side (16MB by default) and greater than min_bytes_sync_commit.
max_sync_task_threads_num
The maximum number of threads in the SyncJobs' thread pool. There is only one thread pool in the whole Fe for
synchronization, which is used to process the tasks created by all SyncJobs in the Fe.
FAQ
1. Will modifying the table structure affect data synchronization?
It may. If the column mapping relationship is not explicitly specified, a schema change on the target table can leave the mapping no longer one to one, and the SyncJob will report an error (see the description of column_mapping above).
2. Will the job continue to run after the database or table it synchronizes is deleted?
No. In this case, the SyncJob will be checked by the Fe's scheduler thread and be stopped.
3. Can multiple SyncJobs be configured to connect to the same canal instance?
No. When creating a SyncJob, FE checks whether the IP:Port + destination duplicates an existing job, to prevent multiple jobs from connecting to the same instance.
4. Why is the precision of floating-point types different between MySQL and Doris during data synchronization?
The precision of Doris floating-point types is different from that of MySQL. You can choose to use the decimal type instead.
More Help
For more detailed syntax and best practices for Binlog Load, see the Binlog Load command manual. You can also enter HELP BINLOG in the MySQL client command line for more help information.
Broker Load
Broker load is an asynchronous import method, and the supported data sources depend on the data sources supported by
the Broker process.
Because the data in a Doris table is ordered, Broker Load uses Doris cluster resources to sort the data during import. Compared with Spark Load, migrating massive historical data with Broker Load uses relatively more Doris cluster resources, so this method is intended for cases where the user has no Spark computing resources. If Spark computing resources are available, Spark Load is recommended.
Users need to create a Broker Load job through the MySQL protocol and check the import result with the view import command.
Applicable scenarios
The source data is in a storage system that the broker can access, such as HDFS.
The amount of data is at the level of tens to hundreds of GB.
Fundamentals
After the user submits the import task, FE generates the corresponding plan and, based on the current number of BEs and the file size, distributes the plan to multiple BEs for execution; each BE imports a part of the data.
During execution, a BE pulls data from the Broker, transforms it and imports it into the system. After all BEs have finished importing, FE finally decides whether the import was successful.
+----+----+
| |
| FE |
| |
+----+----+
+--------------------------+
| | |
| | | | | |
| BE | | BE | | BE |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
+---v-+-----------v-+----------v-+-+
| HDFS/BOS/AFS cluster |
| |
+----------------------------------+
Start Import
Let's look at Broker Load through several actual scenario examples.
1. Create a Hive table:
`id` string,
`store_id` string,
`company_id` string,
`tower_id` string,
`commodity_id` string,
`commodity_name` string,
`commodity_price` double,
`member_price` double,
`cost_price` double,
`unit` string,
`quantity` double,
`actual_price` double
Then use Hive's Load command to import your data into the Hive table
2. Create a Doris table, refer to the specific table syntax: CREATE TABLE
) ENGINE=OLAP
PARTITION BY RANGE(`rq`)
(
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "MONTH",
"dynamic_partition.start" = "-2147483648",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "P_",
"dynamic_partition.buckets" = "1",
"in_memory" = "false",
"storage_format" = "V2"
);
DATA INFILE("hdfs://192.168.20.123:8020/user/hive/warehouse/ods.db/ods_demo_detail/*/*")
(id,store_id,company_id,tower_id,commodity_id,commodity_name,commodity_price,member_price,cost_price,unit,quantity,actual_price)
COLUMNS FROM PATH AS (`day`)
SET
(
    rq = str_to_date(`day`,'%Y-%m-%d'),
    id=id,
    store_id=store_id,
    company_id=company_id,
    tower_id=tower_id,
    commodity_id=commodity_id,
    commodity_name=commodity_name,
    commodity_price=commodity_price,
    member_price=member_price,
    cost_price=cost_price,
    unit=unit,
    quantity=quantity,
    actual_price=actual_price
)
"username" = "hdfs",
"password" = ""
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
`id` string,
`store_id` string,
`company_id` string,
`tower_id` string,
`commodity_id` string,
`commodity_name` string,
`commodity_price` double,
`member_price` double,
`cost_price` double,
`unit` string,
`quantity` double,
`actual_price` double
STORED AS ORC
2. Create a Doris table. The table creation statement here is the same as the Doris table creation statement above. Please
refer to the above .
3. Import data using Broker Load
DATA INFILE("hdfs://10.220.147.151:8020/user/hive/warehouse/ods.db/ods_demo_orc_detail/*/*")
FORMAT AS "orc"
(id,store_id,company_id,tower_id,commodity_id,commodity_name,commodity_price,member_price,cost_price,unit,quantity,actual_price)
COLUMNS FROM PATH AS (`day`)
SET
(
    rq = str_to_date(`day`,'%Y-%m-%d'),
    id=id,
    store_id=store_id,
    company_id=company_id,
    tower_id=tower_id,
    commodity_id=commodity_id,
    commodity_name=commodity_name,
    commodity_price=commodity_price,
    member_price=member_price,
    cost_price=cost_price,
    unit=unit,
    quantity=quantity,
    actual_price=actual_price
)
"username" = "hdfs",
"password" = ""
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
Notice:
SET : Here we define the field mapping relationship between the Hive table and the Doris table and some operations
for field conversion
DATA INFILE("hdfs://10.220.147.151:8020/tmp/test_hdfs.txt")
with HDFS (
"fs.defaultFS"="hdfs://10.220.147.151:8020",
"hadoop.username"="root"
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
For the specific parameters here, refer to the Broker and Broker Load documentation.
View import status
We can view the status information of the above import task through the following command,
The specific syntax reference for viewing the import status SHOW LOAD
JobId: 41326624
Label: broker_load_2022_03_23
State: FINISHED
Progress: ETL: 100%; LOAD: 100%
Type: BROKER
ErrorMsg: NULL
URL: NULL
Cancel import
When a Broker Load job's status is not CANCELLED or FINISHED, it can be manually canceled by the user. When canceling, you need to specify the Label of the import task to be canceled. For the syntax of the cancel command, see CANCEL LOAD.
For example: cancel the import job with the label broker_load_2022_03_23 on the database demo
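A sketch of the corresponding statement:
CANCEL LOAD FROM demo WHERE LABEL = "broker_load_2022_03_23";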
Broker parameters
Broker Load needs to use the Broker process to access remote storage. Different brokers need to provide different
parameters. For details, please refer to Broker documentation.
FE configuration
The following configurations belong to the system-level configuration of Broker load, that is, the configurations that apply to
all Broker load import tasks. The configuration values are adjusted mainly by modifying fe.conf .
min_bytes_per_broker_scanner/max_bytes_per_broker_scanner/max_broker_concurrency
The first two configurations limit the minimum and maximum amount of data processed by a single BE. The third
configuration limits the maximum number of concurrent imports for a job. The minimum amount of data processed,
the maximum number of concurrency, the size of the source file and the number of BEs in the current cluster together
determine the number of concurrent imports.
Concurrency of this import = Math.min(source file size / min_bytes_per_broker_scanner, max_broker_concurrency, current number of BE nodes)
Amount of data processed by a single BE in this import = source file size / concurrency of this import
Usually the maximum amount of data supported by an import job is max_bytes_per_broker_scanner * number of BE
nodes . If you need to import a larger amount of data, you need to adjust the size of the max_bytes_per_broker_scanner
parameter appropriately.
default allocation:
Best Practices
Application scenarios
The most suitable scenario for using Broker load is the scenario where the original data is in the file system (HDFS, BOS, AFS).
Secondly, since Broker load is the only way of asynchronous import in a single import, if users need to use asynchronous
access when importing large files, they can also consider using Broker load.
Below 3G (inclusive)
Users can directly submit a Broker Load import request.
Above 3G
Since the maximum amount of data a single BE can process for one import is 3G, importing files larger than 3G requires adjusting the Broker Load import parameters to realize the import of large files.
i. Modify the maximum scan amount and maximum concurrency of a single BE according to the current number of BEs and the size of the original file.
The amount of data processed by a single BE in the current import task = original file size / max_broker_concurrency
max_bytes_per_broker_scanner >= the amount of data processed by a single BE in the current import task
For example, for a 100G file with 10 BEs in the cluster:
max_broker_concurrency = 10
max_bytes_per_broker_scanner >= 10G = 100G / 10
After modification, all BEs will process the import task concurrently, each BE processing part of the original file.
Note: The configurations in the above two FEs are all system configurations, that is to say, their modifications are
applied to all Broker load tasks.
ii. Customize the timeout of the current import task when creating the import.
The amount of data processed by a single BE in the current import task / the slowest import speed of the user's Doris cluster (MB/s) >= the timeout of the current import task >= the amount of data processed by a single BE in the current import task / 10M/s
For example, for a 100G file with 10 BEs in the cluster:
timeout >= 1000s = 10G / 10M/s
iii. When the user finds that the timeout time calculated in the second step exceeds the default import timeout time of
4 hours
At this time, it is not recommended for users to directly increase the maximum import timeout to solve the problem.
If the single import time exceeds the default import maximum timeout time of 4 hours, it is best to divide the files to
be imported and import them in multiple times to solve the problem. The main reason is: if a single import exceeds 4
hours, the time cost of retrying after the import fails is very high.
The expected maximum import file data volume of the Doris cluster can be calculated by the following formula:
Expected maximum import file size = 14400s * 10M/s * number of BEs
For example, with 10 BEs: expected maximum import file size = 14400s * 10M/s * 10 = 1440000M ≈ 1440G
Note: The average user's environment may not reach a speed of 10M/s, so it is recommended that files over 500G be split up and imported in parts.
Job scheduling
The system limits the number of running Broker Load jobs in a cluster to prevent too many Load jobs from running at the
same time.
First, the configuration parameter of FE: desired_max_waiting_jobs will limit the number of Broker Load jobs that have not
started or are running (job status is PENDING or LOADING) in a cluster. Default is 100. If this threshold is exceeded, newly
submitted jobs will be rejected outright.
A Broker Load job is divided into pending task and loading task phases. Among them, the pending task is responsible for
obtaining the information of the imported file, and the loading task will be sent to the BE to execute the specific import task.
The FE configuration parameter async_pending_load_task_pool_size is used to limit the number of pending tasks running at
the same time. It is also equivalent to controlling the number of import tasks that are actually running. This parameter
defaults to 10. That is to say, assuming that the user submits 100 Load jobs, at the same time only 10 jobs will enter the
LOADING state and start execution, while other jobs are in the PENDING waiting state.
The configuration parameter async_loading_load_task_pool_size of FE is used to limit the number of tasks of loading tasks
running at the same time. A Broker Load job will have one pending task and multiple loading tasks (equal to the number of
DATA INFILE clauses in the LOAD statement). So async_loading_load_task_pool_size should be greater than or equal to
async_pending_load_task_pool_size .
Performance Analysis
You can enable the session variable by executing set enable_profile=true before submitting the LOAD job, then submit the import job. After the import job is completed, you can view the profile of the import job on the Queries tab of the FE web page.
You can check the SHOW LOAD PROFILE help document for more usage help information.
This Profile can help analyze the running status of import jobs.
Currently the Profile can only be viewed after the job has been successfully executed
Common problems
Import error: Scan bytes per broker scanner exceed limit:xxx
Please refer to the Best Practices section in the document to modify the FE configuration items
max_bytes_per_broker_scanner and max_broker_concurrency
Import error: failed to send batch or TabletWriter add batch with unknown id
streaming_load_rpc_max_alive_time_sec:
During the import process, Doris will open a Writer for each Tablet to receive data and write. This parameter specifies the
Writer's wait timeout. If the Writer does not receive any data within this time, the Writer will be automatically destroyed.
When the system processing speed is slow, the Writer may not receive the next batch of data for a long time, resulting in
an import error: TabletWriter add batch with unknown id . At this time, this configuration can be appropriately
increased. Default is 600 seconds
If it is data in PARQUET or ORC format, the column name of the file header needs to be consistent with the column name
in the doris table, such as:
(tmp_c1,tmp_c2)
SET
id=tmp_c2,
name=tmp_c1
This means reading the columns named (tmp_c1, tmp_c2) in the parquet or orc file and mapping them to the (id, name) columns in the Doris table. If SET is not specified, the names in the column list are used for the mapping directly.
Note: If you use an orc file generated directly by some Hive versions, the header in the orc file is not the Hive metadata but (_col0, _col1, _col2, ...), which may cause an Invalid Column Name error; in that case you need to use SET to do the mapping.
More Help
For more detailed syntax and best practices for Broker Load, see the Broker Load command manual. You can also enter HELP BROKER LOAD in the MySQL client command line for more help information.
Routine Load
The Routine Load feature provides users with a way to automatically load data from a specified data source.
This document describes the implementation principles, usage, and best practices of this feature.
Glossary
RoutineLoadJob: A routine load job submitted by the user.
JobScheduler: A routine load job scheduler for scheduling and dividing a RoutineLoadJob into multiple Tasks.
Task: RoutineLoadJob is divided by JobScheduler according to the rules.
TaskScheduler: Task Scheduler. Used to schedule the execution of a Task.
Principle
+---------+
| Client |
+----+----+
+-----------------------------+
| FE | |
| +-----------v------------+ |
| | | |
| | | |
| +---+--------+--------+--+ |
| | | | |
| | | | |
+-----------------------------+
| | |
v v v
| BE | | BE | | BE |
FE splits a load job into several Tasks via the JobScheduler. Each Task is responsible for loading a specified portion of the data.
The Task is assigned by the TaskScheduler to the specified BE.
On the BE, a Task is treated as a normal load task and loaded via the Stream Load load mechanism. After the load is
complete, report to FE.
The JobScheduler in the FE continues to generate subsequent new Tasks based on the reported results, or retry the failed
Task.
The entire routine load job completes the uninterrupted load of data by continuously generating new Tasks.
Kafka Routine load
Currently we only support routine load from the Kafka system. This section details Kafka's routine use and best practices.
Usage restrictions
1. Support unauthenticated Kafka access and Kafka clusters certified by SSL.
2. The supported message formats are CSV text and JSON. In CSV format each message is one line, and the end of the line does not contain a line break.
3. Kafka 0.10.0.0 (inclusive) or above is supported by default. If you want to use Kafka versions below 0.10.0.0 (0.9.0, 0.8.2,
0.8.1, 0.8.0), you need to modify the configuration of be, set the value of kafka_broker_version_fallback to be the older
version, or directly set the value of property.broker.version.fallback to the old version when creating routine load. The
cost of the old version is that some of the new features of routine load may not be available, such as setting the offset of
the kafka partition by time.
Here we illustrate how to create Routine Load tasks with a few examples.
1. Create a Kafka example import task named test1 for example_tbl of example_db. Specify the column separator and
group.id and client.id, and automatically consume all partitions by default and subscribe from the location where data is
available (OFFSET_BEGINNING).
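The head of this statement would look roughly as follows (the "," column separator is an assumption); the PROPERTIES and FROM KAFKA clauses shown next complete the statement:
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
COLUMNS TERMINATED BY ","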
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"property.group.id" = "xxx",
"property.client.id" = "xxx",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
2. Create a Kafka example import task named test1 for example_tbl of example_db in strict mode.
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "true"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
Notes :
"strict_mode" = "true"
Routine Load only supports the following two types of json formats
The first one has only one record and is a json object.
{"category":"a9jadhx","author":"test","price":895}
The second one is a json array, which can contain multiple records
"category":"11",
"author":"4avc",
"price":895,
"timestamp":1589191587
},
"category":"22",
"author":"2avc",
"price":895,
"timestamp":1589191487
},
"category":"33",
"author":"3avc",
"price":342,
"timestamp":1589191387
) ENGINE=OLAP
AGGREGATE KEY(`category`,`author`,`timestamp`,`dt`)
COMMENT "OLAP"
PARTITION BY RANGE(`dt`)
PROPERTIES (
"replication_num" = "1"
);
COLUMNS(category,price,author)
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"format" = "json"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "0,0,0"
);
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"format" = "json",
"jsonpaths" = "[\"$.category\",\"$.author\",\"$.price\",\"$.timestamp\"]",
"strip_outer_array" = "true"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "0,0,0"
);
Notes:
The partition field dt in the table is not in our data, but is converted in our Routine load statement by
dt=from_unixtime(timestamp, '%Y%m%d')
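In the COLUMNS clause this would look roughly like the following sketch (field order assumed to match the jsonpaths above):
COLUMNS(category, author, price, timestamp, dt=from_unixtime(timestamp, '%Y%m%d'))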
Notes:
Although 10 is an out-of-range value, it is not affected by strict mode because its type meets the decimal requirement. 10
will eventually be filtered in other ETL processing processes. But it will not be filtered by strict mode.
Accessing an SSL-certified Kafka cluster requires the user to provide the certificate file (ca.pem) used to authenticate the Kafka Broker's public key. If the Kafka cluster also has client authentication enabled, the client's public key (client.pem), the key file (client.key) and the key password are also required. The required files need to be uploaded to Doris first via the CREATE FILE command, and the catalog name is kafka. See HELP CREATE FILE; for help with the CREATE FILE command. Here are some examples.
1. uploading a file
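A sketch of the upload commands (the URLs are illustrative):
CREATE FILE "ca.pem" PROPERTIES("url" = "https://example_url/kafka-key/ca.pem", "catalog" = "kafka");
CREATE FILE "client.key" PROPERTIES("url" = "https://example_url/kafka-key/client.key", "catalog" = "kafka");
CREATE FILE "client.pem" PROPERTIES("url" = "https://example_url/kafka-key/client.pem", "catalog" = "kafka");
2. Creating a routine load job that references the uploaded files: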
PROPERTIES
"desired_concurrent_number"="1"
FROM KAFKA
"kafka_broker_list"= "broker1:9091,broker2:9091",
"kafka_topic" = "my_topic",
"property.security.protocol" = "ssl",
"property.ssl.ca.location" = "FILE:ca.pem",
"property.ssl.certificate.location" = "FILE:client.pem",
"property.ssl.key.location" = "FILE:client.key",
"property.ssl.key.password" = "abcdefg"
);
Doris accesses Kafka clusters through Kafka's C++ API librdkafka. The parameters supported by librdkafka can be found in https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
PROPERTIES (
"desired_concurrent_number"="1",
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
"property.security.protocol" = "SASL_PLAINTEXT",
"property.sasl.kerberos.service.name" = "kafka",
"property.sasl.kerberos.keytab" = "/etc/krb5.keytab",
"property.sasl.kerberos.principal" = "[email protected]"
);
Note:
To enable Doris to access the Kafka cluster with Kerberos authentication enabled, you need to deploy the Kerberos client
kinit on all running nodes of the Doris cluster, configure krb5.conf, and fill in KDC service information.
The value of property.sasl.kerberos.keytab needs to specify the absolute path of the local keytab file, and the Doris process must be allowed to access that file.
Specific commands and examples to view the status of tasks running can be viewed with the HELP SHOW ROUTINE LOAD TASK;
command.
Only currently running tasks can be viewed; closed and unstarted tasks cannot be viewed.
Job Control
The user can control the stop, pause and restart of jobs with the STOP/PAUSE/RESUME commands. Help and examples can be
viewed with the HELP STOP ROUTINE LOAD; HELP PAUSE ROUTINE LOAD; and HELP RESUME ROUTINE LOAD; commands.
Other notes
1. The relationship between a routine import job and an ALTER TABLE operation
Routine load jobs do not block SCHEMA CHANGE and ROLLUP operations. However, note that if the column mapping relationships no longer match after the SCHEMA CHANGE completes, it can cause a spike in error data for the job and eventually cause the job to pause. It is recommended to reduce this problem by explicitly specifying column mapping relationships in routine load jobs and by adding Nullable columns or columns with default values.
Deleting a partition of the table may cause the imported data to fail to find the corresponding partition, and the job to enter the paused state.
2. Relationship between routine import jobs and other import jobs (LOAD, DELETE, INSERT)
There is no conflict between the routine import and other LOAD operations and INSERT operations.
When a DELETE operation is executed, the corresponding table partition cannot have any ongoing import jobs. Therefore, before executing a DELETE operation, you may need to pause the routine load job and wait until all the issued tasks are completed before executing the DELETE.
3. When the database or table corresponding to the routine load job is deleted, the job will automatically CANCEL.
4. The relationship between kafka type routine import jobs and kafka topic
When the kafka_topic declared by the user in the create routine load statement does not exist in the kafka cluster:
If the broker of the user's kafka cluster has auto.create.topics.enable = true set, the kafka_topic will be created automatically first; the number of partitions created is determined by the broker configuration num.partitions in the user's kafka cluster. The routine load job will keep reading data from the topic as normal.
If the broker of the user's kafka cluster has auto.create.topics.enable = false set, the topic will not be created automatically, and the routine load job will be paused with a status of PAUSED before any data is read.
So, if you want the kafka topic to be created automatically when it does not exist, set auto.create.topics.enable = true for the broker in the user's kafka cluster.
5. Problems that may arise in network-isolated environments. In some environments there are isolation measures for network segments and domain name resolution, so care needs to be taken that:
i. the Broker list specified in the create routine load task can be accessed by the Doris service;
ii. if advertised.listeners is configured in Kafka, the addresses in advertised.listeners can be accessed by the Doris service.
kafka_offsets: specifies the starting offset of each partition, which must correspond to the number of entries in the kafka_partitions list. For example: "1000, 1000, 2000, 2000"
When creating an import job, these three parameters can have the following combinations.
FE will periodically clean up routine load jobs in the STOPPED state, while jobs in the PAUSED state can be resumed and enabled again.
Related Parameters
Some system configuration parameters can affect the use of routine import.
1. max_routine_load_task_concurrent_num
FE configuration item, defaults to 5 and can be modified at runtime. This parameter limits the maximum number of
concurrent subtasks for a routine import job. It is recommended to keep the default value. Setting it too large may result
in too many concurrent tasks and consume cluster resources.
2. max_routine_load_task_num_per_be
FE configuration item, default is 5, can be modified at runtime. This parameter limits the maximum number of
concurrently executed subtasks per BE node. It is recommended to keep the default value. If set too large, it may lead to
too many concurrent tasks and consume cluster resources.
3. max_routine_load_job_num
FE configuration item, default is 100, can be modified at runtime. This parameter limits the total number of routine load jobs, including jobs in the NEED_SCHEDULE, RUNNING and PAUSED states. After the limit is reached, no new jobs can be submitted.
4. max_consumer_num_per_group
BE configuration item, default is 3. This parameter indicates the maximum number of consumers that can be generated for data consumption in a subtask. For a Kafka data source, a consumer may consume one or more kafka partitions. If there are only 2 partitions, only 2 consumers are generated, each consuming 1 partition.
5. max_tolerable_backend_down_num
FE configuration item, the default value is 0. If certain conditions are met, Doris can reschedule PAUSED jobs back to RUNNING. A value of 0 means rescheduling is allowed only if all BE nodes are ALIVE.
6. period_of_auto_resume_min
FE configuration item, the default is 5 minutes. Doris rescheduling is attempted at most 3 times within this 5-minute period. If all 3 attempts fail, the current task is locked and no further automatic scheduling is performed. However, it can still be recovered manually through human intervention.
More Help
For more detailed syntax on the use of Routine Load, you can type HELP ROUTINE LOAD at the MySQL client command line for more help.
Spark Load
Spark load uses Spark to preprocess the data to be imported, which improves the performance of loading large amounts of data
into Doris and saves the computing resources of the Doris cluster. It is mainly used for initial migration and for importing large
amounts of data into Doris.
Spark load uses the resources of the Spark cluster to sort the data to be imported, and Doris BE writes the files directly, which can
greatly reduce the resource usage of the Doris cluster. It works very well for migrating massive amounts of historical data while
reducing the resource usage and load of the Doris cluster.
If users do not have Spark cluster resources and want to complete the migration of external historical data conveniently and
quickly, they can use Broker load instead. Compared with Spark load, Broker load will consume more resources on the Doris
cluster.
Spark load is an asynchronous load method. Users need to create a Spark-type load job through the MySQL protocol and view the
load result with SHOW LOAD.
Applicable scenarios
The source data is in a file storage system that spark can access, such as HDFS.
Explanation of terms
1. Spark ETL: in the load process, it is mainly responsible for ETL of data, including global dictionary construction (bitmap
type), partition, sorting, aggregation, etc.
2. Broker: broker is an independent stateless process. It encapsulates the file system interface and provides the ability of
Doris to read the files in the remote storage system.
3. Global dictionary: it stores the mapping from original values to encoded values. The original value can be any
data type, while the encoded value is an integer. The global dictionary is mainly used in scenarios of precise
deduplication precomputation.
Basic principles
Basic process
The user submits a spark type load job through the MySQL client; FE records the metadata and returns that the submission was successful.
The execution of a spark load task is mainly divided into the following five stages.
1. FE schedules and submits the ETL task to the Spark cluster for execution.
2. The Spark cluster executes the ETL to complete the preprocessing of the load data, including global dictionary building (bitmap
type), partitioning, sorting, aggregation, etc.
3. After the ETL task is completed, FE obtains the data path of each partition that has been preprocessed, and schedules the
related BE to execute the push task.
4. BE reads the data through Broker and converts it into the Doris underlying storage format.
5. FE publishes the version after the push is completed, and the load job finishes.
(Flow diagram: FE coordinates the five stages above; in stage 5, FE publishes the version after the BE push tasks complete.)
Global dictionary
Applicable scenarios
At present, the bitmap column in Doris is implemented using the RoaringBitmap library, and the input data type of
RoaringBitmap can only be integer. Therefore, if you want to precompute the bitmap column during the import process, you
need to convert the type of the input data to integer.
In the existing Doris import process, the data structure of global dictionary is implemented based on hive table, which stores
the mapping from original value to encoded value.
Build process
1. Read the data from the upstream data source and generate a hive temporary table, which is recorded as hive_table .
2. Extract the deduplicated values of the fields to be deduplicated from hive_table , and generate a new hive table,
which is marked as distinct_value_table .
3. Create a new global dictionary table named dict_table ; one column is the original value, and the other is the encoded
value.
4. Left join distinct_value_table and dict_table , calculate the new set of deduplicated values, and then encode this set
with a window function. At this time, the deduplicated column will have one more column of encoded values.
Finally, the data of these two columns will be written back to dict_table .
5. Join the dict_table with the hive_table to replace the original value in the hive_table with the integer encoded value.
6. hive_table will be read by the next data preprocessing process and imported into Doris after calculation.
Data preprocessing (DPP)
Basic process
1. Read data from the data source. The upstream data source can be HDFS file or hive table.
2. Map the read data, calculate the expression, and generate the bucket field bucket_id according to the partition
information.
4. Traverse the RollupTree to perform hierarchical aggregation. The rollup of the next level can be calculated from the rollup of
the previous level.
5. After each aggregation calculation, the data will be divided into buckets according to bucket_id and
then written into HDFS.
6. Subsequent brokers will pull the files in HDFS and import them into Doris be.
Basic operation
Before submitting the spark import task, you need to configure the spark cluster that performs the ETL task.
Grammar:
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES
(
type = spark,
spark_conf_key = spark_conf_value,
working_dir = path,
broker = broker_name,
broker.property_key = property_value,
broker.hadoop.security.authentication = kerberos,
broker.kerberos_principal = [email protected],
broker.kerberos_keytab = /home/doris/my.keytab,
broker.kerberos_keytab_content = ASDOWHDLAWIDJHWLDKSALDJSDIWALD
)
-- show resources
SHOW RESOURCES
-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
Create resource
spark.submit.deployMode : the deployment mode of Spark Program. It is required and supports cluster and client.
working_dir : the directory used by the ETL. Required when Spark is used as an ETL resource. For example: hdfs://host
:port/tmp/doris .
broker.kerberos_keytab : Specify the path to the keytab file for kerberos. The file must be an absolute path to a file on
the server where the broker process is located. And can be accessed by the Broker process.
broker.kerberos_keytab_content : Specify the content of the keytab file in kerberos after base64 encoding. You can
choose one of these with broker.kerberos_keytab configuration.
broker : the name of the broker. Required when Spark is used as an ETL resource. You need to use the ALTER SYSTEM ADD
BROKER command to complete the configuration in advance.
broker.property_key : the authentication information that the broker needs to specify when reading the intermediate
file generated by ETL.
env : Specify the spark environment variable and support dynamic setting. For example, when the authentication mode
of Hadoop is simple, set the Hadoop user name and password
Example:
-- yarn cluster mode
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "queue0",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
);
-- spark standalone client mode
CREATE EXTERNAL RESOURCE "spark1"
PROPERTIES
(
"type" = "spark",
"spark.master" = "spark://127.0.0.1:7777",
"spark.submit.deployMode" = "client",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker1"
);
If Spark load accesses Hadoop cluster resources with Kerberos authentication, we only need to specify the following
parameters when creating Spark resources:
broker.kerberos_keytab : Specify the path to the keytab file for kerberos. The file must be an absolute path to a file on
the server where the broker process is located. And can be accessed by the Broker process.
broker.kerberos_keytab_content : Specify the content of the keytab file in kerberos after base64 encoding. You can
choose one of these with kerberos_keytab configuration.
Example :
CREATE EXTERNAL RESOURCE "spark_on_kerberos"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "queue0",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "[email protected]",
"broker.kerberos_keytab" = "/home/doris/my.keytab"
);
Show resources
Ordinary accounts can only see the resources that they have USAGE_PRIV to use.
The root and admin accounts can see all the resources.
Resource privilege
Resource permissions are managed through GRANT and REVOKE. Currently, only the USAGE_PRIV permission is supported.
The USAGE_PRIV permission can be granted to a user or to a role, and roles are used in the same way as before.
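For example (a sketch; the resource name spark0, user user0, and role role0 are hypothetical):
-- grant the usage privilege of resource spark0 to user0
GRANT USAGE_PRIV ON RESOURCE "spark0" TO "user0"@"%";
-- grant the usage privilege of resource spark0 to the role role0
GRANT USAGE_PRIV ON RESOURCE "spark0" TO ROLE "role0";
-- revoke the privilege from user0
REVOKE USAGE_PRIV ON RESOURCE "spark0" FROM "user0"@"%";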
Place the Spark client on the same machine as FE and configure spark_home_default_dir in fe.conf . This configuration
item defaults to the fe/lib/spark2x path and cannot be empty.
Package all jar files in the jars folder under the Spark client root path into a zip file, and configure spark_resource_path in
fe.conf as this zip file's path.
When the spark load task is submitted, this zip file will be uploaded to the remote repository. The default repository path
is {working_dir}/{cluster_id}/__spark_repository__{resource_name} , which means that one resource in the cluster corresponds
to one remote repository. The directory structure of the remote repository is as follows:
__spark_repository__spark0/
|-__archive_1.0.0/
| |-__lib_990325d2c0d1d5e45bf675e54e44fb16_spark-dpp-1.0.0-jar-with-dependencies.jar
| |-__lib_7670c29daf535efe3c9b923f778f61fc_spark-2x.zip
|-__archive_1.1.0/
| |-__lib_64d5696f99c379af2bee28c1c84271d5_spark-dpp-1.1.0-jar-with-dependencies.jar
| |-__lib_1bbb74bb6b264a270bc7fca3e964160f_spark-2x.zip
|-__archive_1.2.0/
| |-...
In addition to spark dependency (named by spark-2x.zip by default), Fe will also upload DPP's dependency package to the
remote repository. If all the dependency files submitted by spark load already exist in the remote repository, then there is no
need to upload dependency, saving the time of repeatedly uploading a large number of files each time.
Place the downloaded yarn client in the same machine as Fe, and configure yarn_client_path in the fe.conf as the
executable file of yarn, which is set as the fe/lib/yarn-client/hadoop/bin/yarn by default.
(optional) When FE obtains the application status or kills the application through the yarn client, the configuration files
required for executing the yarn command will be generated by default in the lib/yarn-config path in the FE root directory.
This path can be changed by configuring yarn_config_dir in fe.conf . The currently generated configuration
files include core-site.xml and yarn-site.xml .
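Putting the client-side setup together, the relevant fe.conf entries might look like the following sketch; all paths are illustrative and depend on where the Spark and yarn clients were placed:
enable_spark_load = true
spark_home_default_dir = /path/to/fe/lib/spark2x
spark_resource_path = /path/to/fe/lib/spark2x/jars/spark-2x.zip
yarn_client_path = /path/to/fe/lib/yarn-client/hadoop/bin/yarn
yarn_config_dir = /path/to/fe/lib/yarn-config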
Create Load
Grammar:
LOAD LABEL load_label
(data_desc, ...)
WITH RESOURCE resource_name resource_properties
[PROPERTIES (key3=value3, ...)]
* load_label:
db_name.label_name
* data_desc:
[NEGATIVE]
[(col1, ...)]
[WHERE predicate]
* resource_properties:
(key2=value2, ...)
DATA INFILE("hdfs://abc.com:8888/user/palo/test/ml/file1")
(tmp_c1,tmp_c2)
SET
id=tmp_c2,
name=tmp_c1
),
DATA INFILE("hdfs://abc.com:8888/user/palo/test/ml/file2")
(col1, col2)
"spark.executor.memory" = "2g",
"spark.shuffle.compress" = "true"
PROPERTIES
"timeout" = "3600"
);
k1 INT,
K2 SMALLINT,
k3 varchar(50),
uuid varchar(100)
ENGINE=hive
properties
"database" = "tmp",
"table" = "t1",
"hive.metastore.uris" = "thrift://0.0.0.0:8080"
);
step 2: submit the load command
(k1,k2,k3)
SET
uuid=bitmap_dict(uuid)
"spark.executor.memory" = "2g",
"spark.shuffle.compress" = "true"
PROPERTIES
"timeout" = "3600"
);
Example 3: when the upstream data source is hive binary type table
k1 INT,
K2 SMALLINT,
k3 varchar(50),
uuid varchar(100)
ENGINE=hive
properties
"database" = "tmp",
"table" = "t1",
"hive.metastore.uris" = "thrift://0.0.0.0:8080"
);
(k1,k2,k3)
SET
uuid=binary_bitmap(uuid)
"spark.executor.memory" = "2g",
"spark.shuffle.compress" = "true"
PROPERTIES
"timeout" = "3600"
);
id int,
name string,
age int
stored as textfile;
dt date,
id int,
name string,
age int
PROPERTIES (
);
-- spark load
PROPERTIES
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "default",
"spark.hadoop.yarn.resourcemanager.address" = "localhost:50056",
"spark.hadoop.fs.defaultFS" = "hdfs://localhost:9000",
"working_dir" = "hdfs://localhost:9000/tmp/doris",
"broker" = "broker_01"
);
DATA INFILE("hdfs://localhost:9000/user/hive/warehouse/demo.db/test/dt=2022-08-01/*")
FORMAT AS "csv"
(id,name,age)
SET
dt=dt,
id=id,
name=name,
age=age
"spark.executor.memory" = "1g",
"spark.shuffle.compress" = "true"
PROPERTIES
"timeout" = "3600"
);
You can view the detailed syntax for creating a load by typing HELP SPARK LOAD. This section mainly introduces the parameter
meanings and precautions in the creation syntax of spark load.
Label
Identification of the import task. Each import task has a unique label within a single database. The specific rules are
consistent with broker load .
Currently, the supported data sources are CSV and hive table. Other rules are consistent with broker load .
Load job parameters mainly refer to the opt_properties in the spark load. Load job parameters are applied to the entire load
job. The rules are consistent with broker load .
When users have temporary requirements, such as adding resources for tasks and modifying spark configs, you can set
them here. The settings only take effect for this task and do not affect the existing configuration in the Doris cluster.
"spark.driver.memory" = "1g",
"spark.executor.memory" = "3g"
At present, if you want to use a hive table as a data source in the import process, you need to create an external table of the
hive type first, and then you can specify the name of this external table when submitting the load command.
This applies when the data type of the aggregate column of the Doris table is bitmap.
In the load command, you can specify the field on which to build a global dictionary. The format is: doris field
name=bitmap_dict(hive_table field name) .
It should be noted that the construction of global dictionary is supported only when the upstream data source is hive table.
This applies when the data type of the aggregate column of the Doris table is bitmap and the data type of the corresponding
column in the source hive table is binary (serialized through the org.apache.doris.load.loadv2.dpp.BitmapValue class in FE
spark-dpp).
In this case there is no need to build a global dictionary; just specify the corresponding field in the load command. The format is: doris
field name=binary_bitmap(hive table field name) .
Similarly, importing binary (bitmap) data is currently only supported when the upstream data source is a hive
table. You can refer to the usage of hive bitmaps in hive-bitmap-udf .
Show Load
Spark load is asynchronous, just like broker load, so the user must record the label used when creating the load job and use it
in the SHOW LOAD command to view the load result. The SHOW LOAD command is common to all load types; the specific syntax can
be viewed by executing HELP SHOW LOAD.
Example:
JobId: 76391
Label: label1
State: FINISHED
Progress: ETL:100%; LOAD:100%
Type: SPARK
ErrorMsg: N/A
URL: http://1.1.1.1:8089/proxy/application_1586619723848_0035/
JobDetails: {"ScannedRows":28133395,"TaskNumber":1,"FileNumber":1,"FileSize":200000}
Refer to broker load for the meaning of parameters in the returned result set. The differences are as follows:
State
The current phase of the load job. After the job is submitted, the status is pending. After the spark ETL is submitted, the status
changes to ETL. After ETL is completed, Fe schedules be to execute push operation, and the status changes to finished after
the push is completed and the version takes effect.
There are two final stages of the load job: cancelled and finished. When the load job is in these two stages, the load is
completed. Among them, cancelled is load failure, finished is load success.
Progress
Progress description of the load job. There are two kinds of progress: ETL and load, corresponding to the two stages of the
load process, ETL and loading.
Load progress = the number of tables that have completed all replica imports / the total number of tables in this
import task * 100%
When all the tables to be imported have been loaded, the load progress is 99%; the load then enters the final effective stage. After
the entire load is completed, the load progress changes to 100%.
The load progress is not linear. Therefore, if the progress does not change over a period of time, it does not mean that the
load is not in execution.
Type
Type of the load job. For spark load it is SPARK.
CreateTime/EtlStartTime/EtlFinishTime/LoadStartTime/LoadFinishTime
These values represent the creation time of the load, the start time of the ETL phase, the completion time of the ETL phase,
the start time of the loading phase, and the completion time of the entire load job.
JobDetails
Display the detailed running status of some jobs, which will be updated when ETL ends. It includes the number of loaded
files, the total size (bytes), the number of subtasks, the number of processed original lines, etc.
{"ScannedRows":139264,"TaskNumber":1,"FileNumber":1,"FileSize":940754064}
URL
Copy this url to the browser and jump to the web interface of the corresponding application.
View spark launcher commit log
Sometimes users need to view the detailed logs generated during the spark submission process. The logs are saved in the
log/spark_launcher_log directory under the FE root directory, named as spark_launcher_{load_job_id}_{label}.log . The log will be
kept in this directory for a period of time. When the load information in the FE metadata is cleaned up, the corresponding log
will also be cleaned up. The default retention time is 3 days.
cancel load
When the spark load job status is not cancelled or finished, it can be manually cancelled by the user. When canceling, you
need to specify the label to cancel the load job. The syntax of the cancel load command can be viewed by executing help
cancel load .
FE configuration
The following configuration belongs to the system-level configuration of spark load, that is, the configuration that applies to all spark
load import tasks. The configuration values are mainly modified through fe.conf.
enable_spark_load
Enables spark load and resource creation. The default value is false, which means the feature is turned off.
spark_load_default_timeout_second
spark_home_default_dir
spark_resource_path
spark_launcher_log_dir
The directory where the spark client's commit logs are stored ( fe/log/spark_launcher_log ).
yarn_client_path
yarn_config_dir
Best practices
Application scenarios
The most suitable scenario to use spark load is that the raw data is in the file system (HDFS), and the amount of data is tens of
GB to TB. Stream load or broker load is recommended for small amount of data.
FAQ
Spark load does not yet support the import of Doris table fields that are of type String. If your table fields are of type
String, please change them to type varchar, otherwise the import will fail, prompting type:ETL_QUALITY_UNSATISFIED;
msg:quality not good enough to cancel
When using spark load, the HADOOP_CONF_DIR environment variable is not set in spark-env.sh .
If the HADOOP_CONF_DIR environment variable is not set, the error When running with master 'yarn' either HADOOP_CONF_DIR
or YARN_CONF_DIR must be set in the environment will be reported.
When using spark load, spark_home_default_dir is not specified correctly.
The spark-submit command is used when submitting a spark job. If spark_home_default_dir is set incorrectly, the error
Cannot run program 'xxx/bin/spark_submit', error = 2, no such file or directory will be reported.
When using spark load, spark_resource_path does not point to the packaged zip file.
If spark_resource_path is not set correctly, the error file XXX/jars/spark-2x.zip does not exist will be reported.
When using spark load, yarn_client_path does not point to an executable yarn file.
If yarn_client_path is not set correctly, the error yarn client does not exist in path: XXX/yarn-client/hadoop/bin/yarn
will be reported.
When using spark load, the JAVA_HOME environment variable is not set in hadoop-config.sh on the yarn client.
If the JAVA_HOME environment variable is not set, the error yarn application kill failed. app id: xxx, load job id: xxx,
msg: which: no xxx/lib/yarn-client/hadoop/bin/yarn in ((null)) Error: JAVA_HOME is not set and could not be found
will be reported.
More Help
For more detailed syntax used by Spark Load, you can enter HELP SPARK LOAD on the MySQL client command line for more
help.
Stream load
Stream load is a synchronous way of importing. Users import local files or data streams into Doris by sending HTTP protocol
requests. Stream load synchronously executes the import and returns the import result. Users can directly determine
whether the import is successful by the return body of the request.
Stream load is mainly suitable for importing local files, or importing data from data streams through programs.
Basic Principles
The following figure shows the main flow of Stream load, omitting some import details.
(Flow diagram: 1. the user submits the load request over HTTP; 2. FE redirects the request to a BE; 3. the Coordinator BE distributes the data to other BE nodes.)
In Stream load, Doris selects a node as the Coordinator node. This node is responsible for receiving data and distributing data
to other data nodes.
Users submit import commands through HTTP protocol. If submitted to FE, FE forwards the request to a BE via the HTTP
redirect instruction. Users can also submit import commands directly to a specified BE.
The final result of the import is returned to the user by Coordinator BE.
Since Version 1.2, Stream load supports the PARQUET and ORC formats.
Basic operations
Create Load
Stream load submits and transfers data through HTTP protocol. Here, the curl command shows how to submit an import.
Users can also operate through other HTTP clients.
The properties supported in the header are described in "Load Parameters" below
Examples:
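As a minimal sketch, assuming FE listens on 127.0.0.1:8030 and the target is table tbl1 in database db1 (the label, separator, and data file testData are illustrative):
curl --location-trusted -u user:passwd -H "label:123" -H "column_separator:," -T testData http://127.0.0.1:8030/api/db1/tbl1/_stream_load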
For the detailed syntax for creating an import, execute HELP STREAM LOAD. The following section focuses on the
meaning of some of the parameters of Stream load.
Signature parameters
user/passwd
Stream load creates the import over the HTTP protocol and signs requests through Basic Access
authentication. The Doris system verifies user identity and import permissions based on the signature.
Load Parameters
Stream load uses HTTP protocol, so all parameters related to import tasks are set in the header. The significance of some
parameters of the import task parameters of Stream load is mainly introduced below.
label
Identity of import task. Each import task has a unique label inside a single database. Label is a user-defined name in the
import command. With this label, users can view the execution of the corresponding import task.
Another function of label is to prevent users from importing the same data repeatedly. It is strongly recommended that
users use the same label for the same batch of data. This way, repeated requests for the same batch of data will only be
accepted once, guaranteeing At-Most-Once semantics.
When the corresponding import operation state of label is CANCELLED, the label can be used again.
column_separator
Used to specify the column separator in the load file. The default is \t . If it is an invisible character, you need to add \x
as a prefix and hexadecimal to indicate the separator.
For example, the separator \x01 of the hive file needs to be specified as -H "column_separator:\x01" .
line_delimiter
Used to specify the line delimiter in the load file. The default is \n .
max_filter_ratio
The maximum tolerance rate of the import task is 0 by default, and the range of values is 0-1. When the import error rate
exceeds this value, the import fails.
If the user wishes to ignore the wrong row, the import can be successful by setting this parameter greater than 0.
dpp.abnorm.ALL denotes the number of rows whose data quality is not up to standard. Such as type mismatch, column
mismatch, length mismatch and so on.
dpp.norm.ALL refers to the number of correct rows in the import process. The correct amount of data for the import task
can be queried by the SHOW LOAD command.
where
Import the filter conditions specified by the task. Stream load supports filtering of where statements specified for raw
data. The filtered data will not be imported or participated in the calculation of filter ratio, but will be counted as
num_rows_unselected .
partitions
Partitions information for tables to be imported will not be imported if the data to be imported does not belong to the
specified Partition. These data will be included in dpp.abnorm.ALL .
columns
The function transformation configuration of data to be imported includes the sequence change of columns and the
expression transformation, in which the expression transformation method is consistent with the query statement.
Examples of column order transformation: There are three columns of original data (src_c1,src_c2,src_c3), and
there are also three columns (
dst_c1,dst_c2,dst_c3) in the doris table at present.
when the first column src_c1 of the original file corresponds to the dst_c1 column of the target table, while
the second column src_c2 of the original file corresponds to the dst_c2 column of the target table and the
third column src_c3 of the original file corresponds to the dst_c3 column of the target table,which is written
as follows:
when the first column src_c1 of the original file corresponds to the dst_c2 column of the target table, while
the second column src_c2 of the original file corresponds to the dst_c3 column of the target table and the
third column src_c3 of the original file corresponds to the dst_c1 column of the target table,which is written
as follows:
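As a hedged sketch, the two cases above might be expressed as Stream load column headers like this (the first entry maps the first source column, and so on):
# src_c1 -> dst_c1, src_c2 -> dst_c2, src_c3 -> dst_c3
-H "columns: dst_c1, dst_c2, dst_c3"
# src_c1 -> dst_c2, src_c2 -> dst_c3, src_c3 -> dst_c1
-H "columns: dst_c2, dst_c3, dst_c1"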
Example of expression transformation: There are two columns in the original file and two columns in the target
table (c1, c2). However, both columns in the original file need to be transformed by functions to correspond
to the two columns in the target table.
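A sketch of the expression form, assuming the two source columns are first read into temporary names tmp_c1 and tmp_c2 and then transformed; the year and month functions are only illustrative:
# read the source columns as tmp_c1/tmp_c2, then derive c1 and c2
-H "columns: tmp_c1, tmp_c2, c1 = year(tmp_c1), c2 = month(tmp_c2)"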
format
Specify the import data format, support csv, json, the default is csv
Since Version 1.2, format also supports csv_with_names (skips the header line of the csv file) and csv_with_names_and_types (skips the first two lines of the csv file).
exec_mem_limit
merge_type
The type of data merging. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default value,
which means that all this batch of data needs to be appended to the existing data. DELETE means to delete all rows with
the same key as this batch of data. MERGE semantics need to be used in conjunction with the delete condition: data
that meets the delete condition is processed according to DELETE semantics and the rest is processed according to
APPEND semantics.
two_phase_commit
Stream load import can enable two-stage transaction commit mode: in the stream load process, the data is written and
the information is returned to the user. At this time, the data is invisible and the transaction status is PRECOMMITTED . After
the user manually triggers the commit operation, the data is visible.
Example :
i. Initiate a stream load pre-commit operation
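A sketch of the pre-commit request, reusing the 127.0.0.1:8030 FE address and db1.tbl1 target used elsewhere in this section (user, password, and the data file are placeholders); the transaction can later be committed or aborted through the _stream_load_2pc interface with the txn_id and txn_operation headers:
curl --location-trusted -u user:passwd -H "two_phase_commit:true" -T testData http://127.0.0.1:8030/api/db1/tbl1/_stream_load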
"TxnId": 18036,
"Label": "55c8ffc9-1c40-4d51-b75e-f2265b3602ef",
"TwoPhaseCommit": "true",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 100,
"NumberLoadedRows": 100,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 1031,
"LoadTimeMs": 77,
"BeginTxnTimeMs": 1,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 58,
"CommitAndPublishTimeMs": 0
"status": "Success",
"status": "Success",
Return results
Since Stream load is a synchronous import method, the result of the import is directly returned to the user by creating the
return value of the import.
Examples:
"TxnId": 1003,
"Label": "b6f3bc78-0d2c-45d9-9e4c-faa0a0149bee",
"Status": "Success",
"ExistingJobStatus": "FINISHED", // optional
"Message": "OK",
"NumberTotalRows": 1000000,
"NumberLoadedRows": 1000000,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 40888898,
"LoadTimeMs": 2144,
"BeginTxnTimeMs": 1,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 325,
"WriteDataTimeMs": 1933,
"CommitAndPublishTimeMs": 106,
"ErrorURL": "http://192.168.1.1:8042/api/_load_error_log?
file=__shard_0/error_log_insert_stmt_db18266d4d9b4ee5-abb00ddd64bdf005_db18266d4d9b4ee5_abb00ddd64bdf005"
The following main explanations are given for the Stream load import result parameters:
"Publish Timeout": This state also indicates that the import has been completed, except that the data may be delayed
and visible without retrying.
ExistingJobStatus: The state of the load job corresponding to the existing Label.
This field is displayed only when the status is "Label Already Exists". The user can know the status of the load job
corresponding to Label through this state. "RUNNING" means that the job is still executing, and "FINISHED" means that
the job is successful.
BeginTxnTimeMs: The time cost of the RPC to FE to begin a transaction, in milliseconds.
StreamLoadPutTimeMs: The time cost of the RPC to FE to get a stream load plan, in milliseconds.
CommitAndPublishTimeMs: The time cost of the RPC to FE to commit and publish a transaction, in milliseconds.
ErrorURL: If you have data quality problems, visit this URL to see specific error lines.
Note: Since Stream load is a synchronous import mode, import information will not be recorded in Doris system. Users
cannot see Stream load asynchronously by looking at import commands. You need to listen for the return value of the
create import request to get the import result.
Cancel Load
Users can't cancel Stream load manually. Stream load will be cancelled automatically by the system after a timeout or import
error.
By default, BE does not record Stream Load records. If you want to view the records, you need to enable the BE
configuration parameter enable_stream_load_record=true . For details, please refer to BE Configuration Items
(https://doris.apache.org/zh-CN/docs/admin-manual/config/be-config).
FE configuration
stream_load_default_timeout_second
The timeout of the import task, in seconds. If the import task is not completed within the set timeout, it will be cancelled
by the system and its status will become CANCELLED.
At present, Stream load does not support custom import timeout time. All Stream load import timeout time is uniform.
The default timeout time is 600 seconds. If the imported source file can no longer complete the import within the
specified time, the FE parameter stream_load_default_timeout_second needs to be adjusted.
BE configuration
streaming_load_max_mb
The maximum import size of Stream load is 10G by default, in MB. If the user's original file exceeds this value, the BE
parameter streaming_load_max_mb needs to be adjusted.
Best Practices
Application scenarios
The most appropriate scenario for using Stream load is that the original file is in memory or on disk. Secondly, since Stream
load is a synchronous import method, users can also use this import if they want to obtain the import results in a
synchronous manner.
Data volume
Since Stream load is based on the BE initiative to import and distribute data, the recommended amount of imported data is
between 1G and 10G. Since the default maximum Stream load import data volume is 10G, the configuration of BE
streaming_load_max_mb needs to be modified if files exceeding 10G are to be imported.
The Stream load default timeout is 600 seconds. Based on the current maximum import speed limit of Doris, importing files larger
than about 3G requires modifying the default timeout of the import task.
Import task timeout = import data volume / 10M/s (the specific average import speed needs to be calculated by users
based on their cluster conditions)
Complete examples
Data situation: In the local disk path /home/store_sales of the sending and importing requester, the imported data is about
15G, and it is hoped to be imported into the table store_sales of the database bj_sales.
Cluster situation: The concurrency of Stream load is not affected by cluster size.
Step 1: Does the import file size exceed the default maximum import size of 10G
BE conf
streaming_load_max_mb = 16000
Step 2: Calculate whether the approximate import time exceeds the default timeout value
Over the default timeout time, you need to modify the FE configuration
stream_load_default_timeout_second = 1500
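Step 3: Submit the import command. A sketch, assuming FE is reachable at fe_host:8030 and the user credentials are placeholders:
curl --location-trusted -u user:passwd -T /home/store_sales http://fe_host:8030/api/bj_sales/store_sales/_stream_load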
Common Questions
Label Already Exists
i. Is there an import Label conflict that already exists with other import methods?
Because imported Label in Doris system does not distinguish between import methods, there is a problem that
other import methods use the same Label.
Through SHOW LOAD WHERE LABEL = "xxx" , where xxx is the duplicate label string, check whether there is already a
FINISHED import with the same label as the one created by the user.
ii. Are Stream loads submitted repeatedly for the same job?
Since Stream load creates the import task through an HTTP request, HTTP clients in various languages usually
have their own request retry logic. After receiving the first request, the Doris system has already started to process the Stream
load, but because the result is not returned to the client in time, the client retries and creates the request again. At
this point, the Doris system is already processing the first request, so the second request gets the Label
Already Exists error.
To sort out the possible causes mentioned above: search the FE Master's log with the label to see whether there are two
redirect load action to destination = entries for the same label. If so, the
request was submitted repeatedly by the client side.
It is recommended that users calculate the approximate import time based on the amount of data currently being
requested, and change the request timeout on the client side to a value greater than the import timeout,
to avoid the request being submitted multiple times by the client.
In community version 0.14.0 and earlier, a connection reset exception occurred after HTTP V2 was
enabled, because the built-in web container was Tomcat, and Tomcat's implementation of 307 (Temporary Redirect) is
problematic. So in the case of using Stream load to import a large amount of
data, a connection reset exception will occur. This is because Tomcat started the data transmission before the 307 redirect,
which resulted in the BE receiving the data request without authentication information. Later, changing the
built-in container to Jetty solved this problem. If you encounter this problem, please upgrade your Doris or disable
HTTP V2 ( enable_http_server_v2=false ).
After the upgrade, also upgrade the HTTP client version of your program to 4.5.13 and introduce the following
dependency in your pom.xml file:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
After enabling the Stream Load record on the BE, the record cannot be queried
This is caused by the slowness of fetching records, you can try to adjust the following parameters:
i. Increase the BE configuration stream_load_record_batch_size . This configuration indicates how many Stream load
records can be pulled from BE each time. The default value is 50, which can be increased to 500.
ii. Reduce the FE configuration fetch_stream_load_record_interval_second , this configuration indicates the interval for
obtaining Stream load records, the default is to fetch once every 120 seconds, and it can be adjusted to 60 seconds.
iii. If you want to save more Stream load records (not recommended, it will take up more resources of FE), you can
increase the configuration max_stream_load_record_size of FE, the default is 5000.
More Help
For more detailed syntax used by Stream Load, you can enter HELP STREAM LOAD on the MySQL client command line for more
help.
MySql load
Since Version dev
This is a syntax compatible with the standard MySQL LOAD DATA statement, used to load local files.
MySQL load synchronously executes the import and returns the import result. The returned information shows the user
whether the import succeeded.
MySql load is mainly suitable for importing local files on the client side, or importing data from a data stream through a
program.
Basic Principles
MySQL Load is similar to Stream Load. Both import local files into the Doris cluster, so MySQL Load reuses
StreamLoad:
1. FE receives the MySQL Load request executed by the client and then analyzes the SQL.
4. When sending the request, FE reads the local file data from the MySQL client side in a streaming manner and sends it
asynchronously as a StreamLoad HTTP request.
5. After the data transfer on the MySQL client side is completed, FE waits for the StreamLoad to complete and displays the
import success or failure information to the client side.
Basic operations
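A client-side statement might look like the following sketch (the database testdb, table t1, file name, and property are hypothetical); its clauses are explained in the list below:
LOAD DATA LOCAL
INFILE 'client_local.csv'
INTO TABLE testdb.t1
PARTITION (p1, p2)
COLUMNS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(k1, k2, v2)
PROPERTIES ("strict_mode" = "true")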
1. MySQL Load starts with the syntax LOAD DATA , and specifying LOCAL means reading client side files.
2. The local file path follows INFILE ; it can be a relative path or an absolute path. Currently only a single file is
supported; multiple files are not supported.
3. The table name after INTO TABLE can specify the database name, as shown in the case. It can also be omitted, and the
current database for the user will be used.
4. PARTITION syntax supports specified partition to import
5. COLUMNS TERMINATED BY specifies the column separator
6. LINES TERMINATED BY specifies the line separator
7. IGNORE num LINES skips the num header of the CSV.
8. Column mapping syntax, see the column mapping chapter of Imported Data Transformation for specific parameters
9. PROPERTIES is the configuration of import, please refer to the MySQL Load command manual for specific properties.
LOAD DATA
INFILE '/root/server_local.csv'
IGNORE 1 LINES
1. The only difference between the syntax for importing server-level local files and the client-side syntax is whether
the LOCAL keyword follows the LOAD DATA keyword.
2. FE may have multiple nodes; importing server-level files can only import files local to the FE node the client is connected to,
and cannot import files local to other FE nodes.
3. Server-side load is disabled by default. Enable it by setting mysql_load_server_secure_path to a secure path; all
load files must be under this path. It is recommended to create a local_import_data directory under DORIS_HOME to hold the data to be loaded.
Return result
Since MySQL load is a synchronous import method, the import result is returned to the user through the SQL protocol.
If the import fails, a specific error message will be displayed. If the import succeeds, the number of imported rows will be
displayed.
Notice
1. If you see the message LOAD DATA LOCAL INFILE file request rejected due to restrictions on access, you should
connect with mysql --local-infile=1 to allow the client to load local files.
2. The configuration for StreamLoad also affects MySQL Load. For example, the BE configuration
streaming_load_max_mb is 10GB by default and controls the maximum size of a single load.
More Help
1. For more detailed syntax and best practices for using MySQL Load, see the MySQL Load command manual.
S3 Load
Starting from version 0.14, Doris supports importing data directly from online storage systems that support the S3 protocol.
This document mainly introduces how to import data stored in AWS S3. It also supports the import of other object storage
systems that support the S3 protocol, such as Baidu Cloud’s BOS, Alibaba Cloud’s OSS and Tencent Cloud’s COS, etc.
Applicable scenarios
Source data in S3 protocol accessible storage systems, such as S3, BOS.
Data volumes range from tens to hundreds of GB.
Preparing
1. Standard AK and SK. First, you need to find or regenerate your AWS Access keys. You can find the generation method under My
Security Credentials in the AWS console: select Create New Access Key, generate the AK and SK, and be sure to save them.
2. Prepare REGION and ENDPOINT REGION can be selected when creating the bucket or can be viewed in the bucket list.
ENDPOINT can be found through REGION on the following page AWS Documentation
Other cloud storage systems can find relevant information compatible with S3 in corresponding documents
Start Loading
Like Broker Load, just replace WITH BROKER broker_name () with the following:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION"
example:
DATA INFILE("s3://your_bucket_name/your_file.txt")
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION"
PROPERTIES
"timeout" = "3600"
);
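For orientation, a complete load statement might look like the following sketch; the label example_db.example_label_1, table load_test, bucket name, column separator, and the credential placeholders are all illustrative:
LOAD LABEL example_db.example_label_1
(
    DATA INFILE("s3://your_bucket_name/your_file.txt")
    INTO TABLE load_test
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_ENDPOINT" = "AWS_ENDPOINT",
    "AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY" = "AWS_SECRET_KEY",
    "AWS_REGION" = "AWS_REGION"
)
PROPERTIES
(
    "timeout" = "3600"
);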
FAQ
1. S3 SDK uses virtual-hosted style by default. However, some object storage systems may not be enabled or support
virtual-hosted style access. At this time, we can add the use_path_style parameter to force the use of path style:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "AWS_REGION",
"use_path_style" = "true"
2. Support using temporary security credentials to access object stores that support the S3 protocol:
WITH S3
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_TEMP_ACCESS_KEY",
"AWS_SECRET_KEY" = "AWS_TEMP_SECRET_KEY",
"AWS_TOKEN" = "AWS_TEMP_TOKEN",
"AWS_REGION" = "AWS_REGION"
Insert Into
The use of Insert Into statements is similar to that of Insert Into statements in databases such as MySQL. But in Doris, all data
writing is a separate import job. So Insert Into is also introduced here as an import method.
The main Insert Into command contains the following two kinds: INSERT INTO tbl SELECT ... and INSERT INTO tbl (col1, col2, ...) VALUES (1, 2, ...).
The second command is for demonstration only and should not be used in a test or production environment.
Note: When you need to use CTE (Common Table Expressions) as the query part in an insert operation, you must specify
the WITH LABEL and column list parts, or wrap the CTE in a subquery. A sketch is shown below.
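A hedged sketch of the first form, with hypothetical tables tbl1 and tbl2, label my_label, and columns k1, k2:
INSERT INTO tbl1 WITH LABEL my_label (k1, k2)
WITH cte AS (SELECT k1, k2 FROM tbl2)
SELECT k1, k2 FROM cte;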
For specific parameter description, you can refer to INSERT INTO command or execute HELP INSERT to see its help
documentation for better use of this import method.
Insert Into itself is a SQL command, and the return result is divided into the following types according to the different
execution results:
If the result set of the insert corresponding SELECT statement is empty, it is returned as follows:
Query OK indicates successful execution. 0 rows affected means that no data was loaded.
In the case where the result set is not empty. The returned results are divided into the following situations:
mysql> insert into tbl1 with label my_label1 select * from tbl2;
Query OK indicates successful execution. 4 rows affected means that a total of 4 rows of data were imported. 2
warnings indicates the number of rows that were filtered.
{'label': 'my_label1', 'status': 'visible', 'txnId': '4005', 'err': 'some other error'}
label is a user-specified label or an automatically generated label. Label is the ID of this Insert Into load job. Each
load job has a label that is unique within a single database.
status indicates whether the loaded data is visible. If visible, show visible , if not, show committed .
When users need to view the filtered rows, they can use the following statement:
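A sketch, using the label my_label1 from the example above:
SHOW LOAD WHERE LABEL = "my_label1";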
The URL in the returned result can be used to query the wrong data. For details, see the following View Error Lines
Summary. "Data is not visible" is a temporary status, this batch of data must be visible eventually
You can view the visible status of this batch of data with the following statement:
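A sketch, using the txnId 4005 shown above:
SHOW TRANSACTION WHERE ID = 4005;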
If the TransactionStatus column in the returned result is visible , the data is visible.
ERROR 1064 (HY000): all partitions have no load data. Url: http://10.74.167.16:8042/api/_load_error_log?
file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de850e8de850
Where ERROR 1064 (HY000): all partitions have no load data shows the reason for the failure. The latter url can
be used to query the wrong data. For details, see the following View Error Lines Summary.
In summary, the correct processing logic for the results returned by the insert operation should be:
1. If the returned result is ERROR 1064 (HY000) , it means that the import failed.
2. If the returned result is Query OK , it means the execution was successful.
i. If rows affected is 0, the result set is empty and no data is loaded.
ii. If rows affected is greater than 0:
a. If status is committed , the data is not yet visible. You need to check the status through the show transaction
statement until visible .
b. If status is visible , the data is loaded successfully.
iii. If warnings is greater than 0, it means that some data is filtered. You can get the url through the show load
statement to see the filtered rows.
In the previous section, we described how to follow up on the results of insert operations. However, it is difficult to get the
json string of the returned result in some mysql libraries. Therefore, Doris also provides the SHOW LAST INSERT command to
explicitly retrieve the results of the last insert operation.
After executing an insert operation, you can execute SHOW LAST INSERT on the same session connection. This command
returns the result of the most recent insert operation, e.g.
TransactionId: 64067
Label: insert_ba8f33aea9544866-8ed77e2844d0cc9b
Database: default_cluster:db1
Table: t1
TransactionStatus: VISIBLE
LoadedRows: 2
FilteredRows: 0
This command returns the insert results and the details of the corresponding transaction. Therefore, you can continue to
execute the show last insert command after each insert operation to get the insert results.
Note: This command will only return the results of the last insert operation within the same session connection. If the
connection is broken or replaced with a new one, the empty set will be returned.
FE configuration
timeout
The timeout of the import task, in seconds. If the import task is not completed within the set timeout, it will be cancelled
by the system and its status will become CANCELLED.
At present, Insert Into does not support a custom import timeout. All Insert Into imports have a uniform timeout,
and the default timeout is 1 hour. If the import source cannot complete the import within the specified time,
the FE parameter insert_load_default_timeout_second needs to be adjusted.
Since Version dev: the Insert Into statement is also restricted by the session variable
`insert_timeout`. You can increase the timeout by `SET insert_timeout = xxx;`, in seconds.
Session Variables
enable_insert_strict
The Insert Into import itself cannot control the tolerable error rate of the import. Users can only use the Session
parameter enable_insert_strict . When this parameter is set to false, it indicates that at least one data has been
imported correctly, and then it returns successfully. When this parameter is set to true, the import fails if there is a data
error. The default is false. It can be set by SET enable_insert_strict = true; .
insert_timeout
Insert Into itself is also an SQL command, and the Insert Into statement is restricted by the session variable
insert_timeout . You can increase the timeout by SET insert_timeout = xxx; , in seconds.
Best Practices
Application scenarios
1. Users want to import only a few false data to verify the functionality of Doris system. The grammar of INSERT INTO
VALUES is suitable at this time.
2. Users want to convert the data already in the Doris table into ETL and import it into a new Doris table, which is suitable
for using INSERT INTO SELECT grammar.
3. Users can create an external table, such as MySQL external table mapping a table in MySQL system. Or create Broker
external tables to map data files on HDFS. Then the data from the external table is imported into the Doris table for
storage through the INSERT INTO SELECT grammar.
Data volume
Insert Into has no limitation on the amount of data, and large data imports can also be supported. However, Insert Into has a
default timeout, and if the estimated amount of imported data cannot be loaded within it, the system's Insert Into import
timeout needs to be modified, estimated as: import timeout (seconds) >= import data volume / average import speed.
Among them, 10M/s is the maximum import speed limit. Users need to calculate the average import speed according
to the current cluster situation to replace 10M/s in the formula.
Complete examples
Users have a table store_sales in the database sales. Users create a table called bj_store_sales in the database sales. Users want
to import the data recorded in store_sales into the new table bj_store_sales. The amount of data imported is about 10G.
Cluster situation: The average import speed of current user cluster is about 5M/s
Step1: Determine whether you want to modify the default timeout of Insert Into
10G / 5M /s = 2000s
Modify FE configuration
insert_load_default_timeout_second = 2000
Since users want to ETL data from a table and import it into the target table, they should use the INSERT INTO SELECT
form to import it.
INSERT INTO bj_store_sales SELECT id, total, user_id, sale_timestamp FROM store_sales where region = "bj";
Common Questions
View the wrong line
Because Insert Into can't control the error rate, it can only tolerate or ignore the error data completely by
enable_insert_strict . So if enable_insert_strict is set to true, Insert Into may fail. If enable_insert_strict is set to
false, then only some qualified data may be imported. However, in either case, Doris is currently unable to provide the
ability to view substandard data rows. Therefore, the user cannot view the specific import error through the Insert Into
statement.
The causes of errors are usually: source data column length exceeding the destination column length, column type
mismatch, partition mismatch, column order mismatch, etc. If the problem still cannot be located, it is currently
recommended to run the SELECT statement from the Insert Into command to export the data to a file, and
then import the file through Stream load to see the specific errors.
More Help
For more detailed syntax and best practices for Insert Into, see the INSERT command manual. You can also enter HELP
INSERT in the MySQL client command line for more help information.
Other ways of importing data in JSON format are not currently supported.
Supported Json formats:
1. Multiple rows of data represented by an Array. The Json has an Array as the root node; each element in the Array represents a
row of data to be imported, usually an Object. An example is as follows:
...
...
This method is typically used for the Stream Load import method to represent multiple rows of data in a batch of imported
data.
This method must be used with the setting strip_outer_array=true . Doris will expand the array when parsing, and then
parse each Object in turn as a row of data.
2. A single row of data represented by an Object.
This method is usually used for the Routine Load import method, for example representing one message in Kafka, that is,
one row of data.
3. Multiple lines of Object data separated by a fixed delimiter
A row of data represented by Object represents a row of data to be imported. The example is as follows:
...
This method is typically used for Stream Load import methods to represent multiple rows of data in a batch of imported
data.
This method must be used with the setting read_json_by_line=true , the special delimiter also needs to specify the
line_delimiter parameter, the default is \n . When Doris parses, it will be separated according to the delimiter, and then
parse each line of Object as a line of data.
The default value is 100. The unit is MB. Modify this parameter by referring to the BE configuration.
fuzzy_parse parameters
In STREAM LOAD fuzzy_parse parameter can be added to speed up JSON Data import efficiency.
This parameter is usually used to import the format of multi-line data represented by Array, so it is generally used with
strip_outer_array=true .
This feature requires that each row of data in the Array has exactly the same order of fields. Doris will only parse according to
the field order of the first row, and then access the subsequent data in the form of subscripts. This method can improve the
import efficiency by 3-5X.
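A hedged sketch of a Stream load request combining the two headers (the FE address 127.0.0.1:8030, database db1, table tbl1, and the file data.json are illustrative):
curl --location-trusted -u user:passwd -H "format: json" -H "strip_outer_array: true" -H "fuzzy_parse: true" -T data.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load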
Json Path
Doris supports extracting data specified in Json through Json Path.
Note: Because for Array type data, Doris will expand the array first, and finally process it in a single line according to the
Object format. So the examples later in this document are all explained with Json data in a single Object format.
If Json Path is not specified, Doris will use the column name in the table to find the element in Object by default. An
example is as follows:
Then Doris will use id , city for matching, and get the final data 123 and beijing .
Then use id , city for matching, and get the final data 123 and null .
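For illustration, suppose the table columns are id and city and the two hypothetical source rows below are imported: the first matches both keys and yields 123 and beijing; in the second the key city is absent, so the result is 123 and null.
{"id": 123, "city": "beijing"}
{"id": 123, "name": "beijing"}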
Specify a set of Json Path in the form of a Json data. Each element in the array represents a column to extract. An
example is as follows:
["$.id", "$.name"]
Doris will use the specified Json Path for data matching and extraction.
The values that are finally matched in the preceding examples are all primitive types, such as integers, strings, and so on.
Doris currently does not support composite types, such as Array, Map, etc. So when a non-basic type is matched, Doris
will convert the type to a string in Json format and import it as a string type. An example is as follows:
"{'name':'beijing','region':'haidian'}"
match failed
Json Path is ["$.id", "$.info"] . The matched elements are 123 and null .
Doris currently does not distinguish between null values represented in Json data and null values produced when a
match fails. Suppose the Json data is:
The same result would be obtained with the following two Json Paths: 123 and null .
["$.id", "$.name"]
["$.id", "$.info"]
In order to prevent misoperation caused by some parameter setting errors. When Doris tries to match a row of data, if all
columns fail to match, it considers this to be an error row. Suppose the Json data is:
If the Json Path is written incorrectly as (or if the Json Path is not specified, the columns in the table do not contain id
and city ):
["$.ad", "$.infa"]
would cause the exact match to fail, and the line would be marked as an error line instead of yielding null, null .
In other words, it is equivalent to rearranging the columns of a Json format data according to the column order specified in
Json Path through Json Path. After that, you can map the rearranged source data to the columns of the table through
Columns. An example is as follows:
Data content:
{"k1": 1, "k2": 2}
Table Structure:
k2 int, k1 int
Import statement 1:
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
In import statement 1, only Json Path is specified, and Columns is not specified. The role of Json Path is to extract the Json
data in the order of the fields in the Json Path, and then write it in the order of the table structure. The final imported data
results are as follows:
+------+------+
| k1 | k2 |
+------+------+
| 2 | 1 |
+------+------+
You can see that the actual k1 column imports the value of the "k2" column in the Json data. This is because the field name in
Json is not equivalent to the field name in the table structure. We need to explicitly specify the mapping between the two.
Import statement 2:
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -H "columns: k2, k1 "
-T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
Compared with the import statement 1, the Columns field is added here to describe the mapping relationship of the
columns, in the order of k2, k1 . That is, after extracting in the order of fields in Json Path, specify the value of column k2 in
the table for the first column, and the value of column k1 in the table for the second column. The final imported data results
are as follows:
+------+------+
| k1 | k2 |
+------+------+
| 1 | 2 |
+------+------+
Of course, as with other imports, column transformations can be performed in Columns. An example is as follows:
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -H "columns: k2,
tmp_k1 , k1 = tmp_k1 * 100" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
The above example will import the value of k1 multiplied by 100. The final imported data results are as follows:
+------+------+
| k1 | k2 |
+------+------+
| 100 | 2 |
+------+------+
Json root
Doris supports extracting data specified in Json through Json root.
Note: Because for Array type data, Doris will expand the array first, and finally process it in a single line according to the
Object format. So the examples later in this document are all explained with Json data in a single Object format.
If Json root is not specified, Doris will use the column names in the table to find the elements in the Object by default. An example is as follows:
{
    "id": 123,
    "name": {"id": 321, "city": "shanghai"}
}
Matching with id and city against the top-level object yields the final data 123 and null .
When the import data format is json, you can specify the root node of the Json data through json_root. Doris will extract the elements of the root node through json_root for parsing. The default is empty. If json_root is set to $.name , the extracted element is treated as new json for the subsequent import operation, and the final data obtained is 321 and shanghai .
{"k1": 2},
The table structure is: k1 int null, k2 varchar(32) null default "x"
The import results that users may expect are as follows, that is, for missing columns, fill in the default values.
+------+------+
| k1 | k2 |
+------+------+
| 1 | a |
+------+------+
| 2 | x |
+------+------+
| 3    | c    |
+------+------+
But the actual import result is as follows, that is, for the missing column, NULL is added.
+------+------+
| k1 | k2 |
+------+------+
| 1 | a |
+------+------+
| 2 | NULL |
+------+------+
| 3    | c    |
+------+------+
This is because Doris cannot tell from the information in the import statement that the missing field corresponds to column k2 in the table. If you want to import the above data according to the expected result, you need to specify both jsonpaths and columns in the import statement, so that the missing field is explicitly mapped to column k2 and filled with its default value.
Application example
Stream Load
Because the Json format cannot be split, when using Stream Load to import a Json format file, the entire file content is loaded into memory before processing begins. Therefore, if the file is too large, it may consume a large amount of memory.
Import result:
100 beijing 1
Import result:
100 beijing 1
Import result:
100 beijing 1
"id": 105,
"city": {
"order1": ["guangzhou"]
},
"code" : 6
Import result:
100 beijing 1
102 tianjin 3
103 chongqing 4
104 ["zhejiang","guangzhou"] 5
105 {"order1":["guangzhou"]} 6
StreamLoad import :
curl --location-trusted -u user:passwd -H "format: json" -H "read_json_by_line: true" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
Import result:
100 beijing 1
102 tianjin 3
103 chongqing 4
The data is still the multi-line data in Example 3, and now it is necessary to add 1 to the code column in the imported data
before importing.
Import result:
100 beijing 2
102 tianjin 4
103 chongqing 5
104 ["zhejiang","guangzhou"] 6
105 {"order1":["guangzhou"]} 7
Import result:
+------+----------------------------------+
| k1 | k2 |
+------+----------------------------------+
| 39 | [-818.2173181] |
| 40 | [100000000000000000.001111111] |
+------+----------------------------------+
Import result:
+------+------------------------------------------------------------------------------------+
| k1 | k2 |
+------+------------------------------------------------------------------------------------+
+------+------------------------------------------------------------------------------------+
Routine Load
The processing principle of Routine Load for Json data is the same as that of Stream Load. It is not repeated here.
For Kafka data sources, the content of each Message is treated as a complete piece of Json data. If a Message contains multiple rows of data represented in Array format, multiple rows will be imported and the Kafka offset will only increase by 1. If an Array-format Json represents multiple rows of data but fails to parse because the Json format is wrong, the error row count will only increase by 1, because when parsing fails Doris cannot determine how many rows the data contains and can only record it as one error row.
Data export
Export is a function provided by Doris to export data. This function can export user-specified table or partition data in text
format to remote storage through Broker process, such as HDFS / Object storage (supports S3 protocol) etc.
This document mainly introduces the basic principles, usage, best practices and precautions of Export.
Noun Interpretation
FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
Broker: Doris can manipulate files for remote storage through the Broker process.
Tablet: Data fragmentation. A table is divided into multiple data fragments.
Principle
After the user submits an Export job, Doris counts all Tablets involved in the job. These tablets are then grouped, and a special query plan is generated for each group. The query plan reads the data on the included tablets and then writes the data to the specified path on the remote storage through the Broker. It can also be exported directly, via the S3 protocol, to remote storage that supports the S3 protocol.
+--------+
| Client |
+---+----+
| 1. Submit Job
+---v--------------------+
| FE |
| |
| +-------------------+ |
| | ExportPendingTask | |
| +-------------------+ |
| | 2. Generate Tasks
| +--------------------+ |
| | ExportExporingTask | |
| +--------------------+ |
| |
The query plan will automatically retry three times if it encounters errors. If a query plan fails after three retries, the entire job fails.
Doris will first create a temporary directory named __doris_export_tmp_12345 (where 12345 is the job id) in the specified remote storage path. The exported data is first written to this temporary directory. Each query plan generates a file; an example file name is:
export-data-c69fcf2b6db5420f-a96b94c1ff8bccef-1561453713822
Among them, c69fcf2b6db5420f-a96b94c1ff8bccef is the query ID of the query plan, and 1561453713822 is the timestamp when the file was generated.
When all data is exported, Doris will rename these files to the user-specified path.
Broker parameter
Export needs to use the Broker process to access remote storage. Different brokers need to provide different parameters. For
details, please refer to Broker documentation
Start Export
For detailed usage of Export, please refer to SHOW EXPORT.
Detailed help for the Export command can be obtained through HELP EXPORT; examples are as follows:
Export to hdfs
EXPORT TABLE db1.tbl1
PARTITION (p1,p2)
[WHERE [expr]]
TO "hdfs://host/path/to/export/"
PROPERTIES
(
    "label" = "mylabel",
    "column_separator"=",",
    "columns" = "col1,col2",
    "exec_mem_limit"="2147483648",
    "timeout" = "3600"
)
WITH BROKER "hdfs"
(
    "username" = "user",
    "password" = "passwd"
);
label : The identifier of this export job. You can use this identifier to view the job status later.
column_separator : Column separator. The default is \t . Supports invisible characters, such as '\x07'.
columns : Columns to be exported, separated by commas. If this parameter is not specified, all columns of the table are exported by default.
line_delimiter : Line separator. The default is \n . Supports invisible characters, such as '\x07'.
exec_mem_limit : Represents the memory usage limit of a single query plan on a single BE in an Export job. Default 2GB. Unit: bytes.
timeout : Job timeout. Default 2 hours. Unit: seconds.
tablet_num_per_task : The maximum number of fragments allocated per query plan. The default is 5.
"AWS_ENDPOINT" = "http://host",
"AWS_ACCESS_KEY" = "AK",
"AWS_SECRET_KEY"="SK",
"AWS_REGION" = "region"
);
AWS_ACCESS_KEY / AWS_SECRET_KEY : Your keys for accessing the object storage API.
AWS_ENDPOINT : The endpoint, i.e. the access domain name of the object storage external service.
AWS_REGION : The region where the object storage data center is located.
View export status
After submitting a job, the job status can be viewed by querying the SHOW EXPORT command. The results are as follows:
JobId: 14008
State: FINISHED
Progress: 100%
Path: hdfs://host/path/to/export/
Timeout: 3600
ErrorMsg: NULL
After submitting a job, the job can be canceled by using the CANCEL EXPORT command. For example:
CANCEL EXPORT
FROM example_db
WHERE LABEL like "%example%";
Best Practices
exec_mem_limit
Usually, the query plan of an Export job has only two parts, scan and export, and does not involve computing logic that requires much memory. So the default memory limit of 2GB usually satisfies the requirement. But in some scenarios, for example when a query plan needs to scan too many Tablets on the same BE, or when the Tablets have too many data versions, memory may be insufficient. In that case, a larger memory limit needs to be set through this parameter, such as 4GB or 8GB.
Notes
It is not recommended to export large amounts of data at one time. The maximum amount of exported data
recommended by an Export job is tens of GB. Excessive export results in more junk files and higher retry costs.
If the amount of table data is too large, it is recommended to export it by partition.
During the operation of the Export job, if the FE restarts or a master switchover occurs, the Export job will fail and the user needs to resubmit it.
If the Export job fails, the __doris_export_tmp_xxx temporary directory generated in the remote storage and the
generated files will not be deleted, requiring the user to delete them manually.
If the Export job runs successfully, the __doris_export_tmp_xxx directory generated in the remote storage may be
retained or cleared according to the file system semantics of the remote storage. For example, in object storage
(supporting the S3 protocol), after removing the last file in a directory through rename operation, the directory will also
be deleted. If the directory is not cleared, the user can clear it manually.
If the FE restarts or a master switchover occurs after the Export job runs successfully or fails, some of the job information displayed by SHOW EXPORT will be lost and cannot be viewed.
Export jobs only export data from Base tables, not Rollup Index.
Export jobs scan data and occupy IO resources, which may affect the query latency of the system.
Relevant configuration
FE
export_checker_interval_second : Scheduling interval of the Export job scheduler, default is 5 seconds. Setting this parameter requires restarting the FE.
export_running_job_num_limit : Limit on the number of Export jobs running. If exceeded, the job will wait and be in
PENDING state. The default is 5, which can be adjusted at run time.
export_task_default_timeout_second : Default timeout of an Export job. The default is 2 hours. It can be adjusted at run time.
export_tablet_num_per_task : The maximum number of fragments that a query plan is responsible for. The default is 5.
label : The label of this Export job. Doris will generate a label for an Export job if this param is not set.
More Help
For more detailed syntax and best practices of Export, please refer to the Export command manual. You can also enter HELP EXPORT in the MySQL client command line for more help information.
Example
Export to HDFS
Export simple query results to the file hdfs://path/to/result.txt , specifying the export format as CSV.
SELECT * FROM tbl
INTO OUTFILE "hdfs://path/to/result.txt"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
Concurrent export
By default, the export of the query result set is non-concurrent, that is, a single point of export. If the user wants the query result set to be exported concurrently, certain conditions need to be met: the session variable enable_parallel_outfile must be enabled, and the query itself must be able to run concurrently (for example, the top of the query plan must not contain a single-point node such as a final sort). If these conditions are met, concurrent export of the query result set is triggered, and Concurrency = be_instance_num * parallel_fragment_exec_instance_num
explain select xxx from xxx where xxx into outfile "s3://xxx" format as csv properties ("AWS_ENDPOINT" = "xxx",
...);
After explaining the query, Doris will return the plan of the query. If RESULT FILE SINK appears in PLAN FRAGMENT 1 , it means that concurrent export has been enabled successfully.
If RESULT FILE SINK appears in PLAN FRAGMENT 0 , it means that the current query cannot be exported concurrently (the query does not satisfy all the conditions for concurrent export at the same time).
+-----------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 1:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| |
| STORAGE TYPE: S3 |
| |
| 0:OlapScanNode |
| TABLE: multi_tablet |
+-----------------------------------------------------------------------------+
Usage example
For details, please refer to OUTFILE Document.
Return result
The command is a synchronous command. When the command returns, the operation has finished, and a row of results is returned to show the execution result of the export.
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
| 1          | 2         | 8        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
If concurrent export is performed, multiple rows may be returned, for example:
+------------+-----------+----------+--------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                |
+------------+-----------+----------+--------------------------------------------------------------------+
| 1          | 3         | 7        | file:///192.168.1.10/home/work/path/result_{fragment_instance_id}_ |
| 1          | 2         | 4        | file:///192.168.1.11/home/work/path/result_{fragment_instance_id}_ |
+------------+-----------+----------+--------------------------------------------------------------------+
If the execution fails, an error message is returned, for example:
ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
Notice
The CSV format does not support exporting binary types, such as BITMAP and HLL types. These types will be output as
\N , which is null.
If you do not enable concurrent export, the query result is exported by a single BE node in a single thread. Therefore, the
export time and the export result set size are positively correlated. Turning on concurrent export can reduce the export
time.
The export command does not check whether the file and file path exist. Whether the path will be automatically created
or whether the existing file will be overwritten is entirely determined by the semantics of the remote storage system.
If an error occurs during the export process, the exported file may remain on the remote storage system. Doris will not
clean these files. The user needs to manually clean up.
The timeout of the export command is the same as the timeout of the query. It can be set by SET query_timeout = xxx .
For empty result query, there will be an empty file.
File splitting ensures that a single row of data is stored in a single file. Therefore, the size of a file is not strictly equal to max_file_size .
For functions whose output is invisible characters, such as BITMAP and HLL types, the output is \N , which is NULL.
At present, the output type of some geo functions, such as ST_Point is VARCHAR, but the actual output value is an
encoded binary character. Currently these functions will output garbled characters. For geo functions, use ST_AsText for
output.
More Help
For more detailed syntax and best practices for using OUTFILE, please refer to the OUTFILE command manual. You can also get more help information by typing HELP OUTFILE in the MySQL client command line.
Example
Export
1. Export the table1 table in the test database: mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases test -
-tables table1
2. Export the table1 table structure in the test database: mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --
databases test --tables table1 --no-data
3. Export all tables in the test1, test2 database: mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases test1
test2
4. Export all databases and tables mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --all-databases
For more
usage parameters, please refer to the manual of mysqldump
Import
The results exported by mysqldump can be redirected to a file, which can then be imported into Doris through the source
command source filename.sql
Notice
1. Since Doris does not have the MySQL concept of a tablespace, add the --no-tablespaces parameter when using mysqldump.
2. Using mysqldump to export data and table structure is only used for development and testing or when the amount of
data is small. Do not use it in a production environment with a large amount of data.
Batch Delete
Currently, Doris supports multiple import methods such as Broker Load, Routine Load, and Stream Load, while data can only be deleted through the DELETE statement. Each execution of a DELETE statement generates a new data version, so frequent deletion seriously affects query performance. In addition, DELETE is implemented by generating an empty rowset that records the deletion conditions; every read must then filter against these conditions, and when there are many conditions, performance suffers. Compared with other systems, Greenplum's implementation is more like a traditional database product, while Snowflake implements this through the MERGE syntax.
For scenarios such as importing CDC data, inserts and deletes generally appear interspersed in the data. Our current import methods are not sufficient for this scenario: even if inserts and deletes can be separated, the import problem could be solved, but the deletion problem would remain. The batch delete function is designed to solve the needs of these scenarios.
There are three merge types for data import:
APPEND: all imported data is appended to the existing data.
DELETE: all rows with the same key columns as the imported data are deleted.
MERGE: APPEND or DELETE is decided per row according to the DELETE ON condition.
Fundamental
This is achieved by adding a hidden column __DORIS_DELETE_SIGN__ . Because batch deletion is only done on the Unique model, only a hidden column whose type is bool and whose aggregation function is REPLACE needs to be added. In BE, the various aggregation write processes are the same as for normal columns, and there are two read schemes:
When FE encounters extensions such as * , it removes __DORIS_DELETE_SIGN__ and adds the condition __DORIS_DELETE_SIGN__ != true by default.
When BE reads, a column is added for judgment, and the condition is used to determine whether the row is deleted.
Import
When importing, set the value of the hidden column to the value of the DELETE ON expression during fe parsing. The other
aggregation behaviors are the same as the replace aggregation column.
Read
When reading, add the condition of __DORIS_DELETE_SIGN__ != true to all olapScanNodes with hidden columns, be does not
perceive this process and executes normally.
Cumulative Compaction
In Cumulative Compaction, hidden columns are treated as normal columns, and the compaction logic remains unchanged.
Base Compaction
In Base Compaction, delete the rows marked for deletion to reduce the space occupied by data.
1. Add enable_batch_delete_by_default=true to the FE configuration; Unique tables created after this configuration takes effect support batch delete automatically.
2. For tables created before changing the above FE configuration, or for existing tables that do not support the batch delete function, you can use the following statement to enable it:
ALTER TABLE tablename ENABLE FEATURE "BATCH_DELETE"
If you want to determine whether a table supports batch delete, you can set a session variable to display hidden columns, SET show_hidden_columns=true , and then use desc tablename . If there is a __DORIS_DELETE_SIGN__ column in the output, batch delete is supported; if not, it is not supported.
Syntax Description
The syntax design of the import is mainly to add a column mapping that specifies the field of the delete marker column, and
it is necessary to add a column to the imported data. The syntax of various import methods is as follows:
Stream Load
Stream Load sets the delete marker column by adding a field to the columns field in the header. Example:
-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"
Broker Load
The writing method of Broker Load sets the field of the delete marker column at PROPERTIES , the syntax is as follows:
(tmp_c1, tmp_c2, label_c3)
SET
(
    id=tmp_c2,
    name=tmp_c1
)
[DELETE ON label_c3=true]
WITH BROKER 'broker'
(
    "username"="user",
    "password"="pass"
)
PROPERTIES
(
    "timeout" = "3600"
);
Routine Load
The writing method of Routine Load adds a mapping to the columns field. The mapping method is the same as above. The
syntax is as follows:
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
[WITH MERGE|APPEND|DELETE]
COLUMNS(k1, k2, k3, v1, v2, label),
[DELETE ON label=true]
PROPERTIES
(
    "desired_concurrent_number"="3",
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "strict_mode" = "false"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2,3",
    "kafka_offsets" = "101,0,0,200"
);
Note
1. Since import operations other than Stream Load may be executed out of order inside Doris, if the MERGE method is used with an import method other than Stream Load, it needs to be used together with load sequence. For the specific syntax, please refer to the sequence column related documents.
2. DELETE ON condition can only be used with MERGE.
3. If the session variable SET show_hidden_columns = true was executed before running the import task to check whether the table supports the batch delete feature, and a select count(*) from xxx statement is then executed in the same session after the DELETE/MERGE import finishes, the result set will unexpectedly include the deleted rows. To avoid this problem, execute SET show_hidden_columns = false before the select statement, or open a new session to run the select statement.
Usage example
+-----------------------+--------------+------+-------+---------+---------+
| Field                 | Type         | Null | Key   | Default | Extra   |
+-----------------------+--------------+------+-------+---------+---------+
| ...                   | ...          | ...  | ...   | ...     | ...     |
| __DORIS_DELETE_SIGN__ | TINYINT      | No   | false | 0       | REPLACE |
+-----------------------+--------------+------+-------+---------+---------+
The APPEND condition can be omitted; the effect is the same as specifying merge_type: APPEND explicitly.
2. Delete all data with the same key as the imported data
Before load:
+--------+----------+----------+------+
+--------+----------+----------+------+
| 3 | 2 | tom | 2 |
| 4 | 3 | bush | 3 |
| 5 | 3 | helen | 3 |
+--------+----------+----------+------+
Load data:
3,2,tom,0
After load:
+--------+----------+----------+------+
+--------+----------+----------+------+
| 4 | 3 | bush | 3 |
| 5 | 3 | helen | 3 |
+--------+----------+----------+------+
3. Using MERGE, delete the rows whose key columns match the rows with site_id=1 in the imported data, and append the remaining rows
Before load:
+--------+----------+----------+------+
+--------+----------+----------+------+
| 4 | 3 | bush | 3 |
| 5 | 3 | helen | 3 |
| 1 | 1 | jim | 2 |
+--------+----------+----------+------+
Load data:
2,1,grace,2
3,2,tom,2
1,1,jim,2
After load:
+--------+----------+----------+------+
+--------+----------+----------+------+
| 4 | 3 | bush | 3 |
| 2 | 1 | grace | 2 |
| 3 | 2 | tom | 2 |
| 5 | 3 | helen | 3 |
+--------+----------+----------+------+
update
This article mainly describes how to use the UPDATE command to modify or update data in Doris. The UPDATE command is only available in Doris version 0.15.x and later.
Applicable scenarios
Modify its value for rows that meet certain conditions;
Point update, small range update, the row to be updated is preferably a very small part of the entire table;
The update command can only be executed on a table with a Unique data model.
Fundamentals
Use the query engine's own where filtering logic to filter the rows that need to be updated from the table to be updated.
Then use the Unique model's own Value column replacement logic to change the rows to be updated and reinsert them into
the table. This enables row-level updates.
Synchronization
The Update syntax is a synchronization syntax in Doris. If the Update statement succeeds, the update succeeds and the data
is visible.
Performance
The performance of the Update statement is closely related to the number of rows to be updated and the retrieval efficiency
of the condition.
Number of rows to be updated: The more rows to be updated, the slower the Update statement will be. This is consistent
with the principle of importing.
Doris updates are more suitable for occasional update scenarios, such as changing the
values of individual rows.
Doris is not suitable for large batches of data changes. Large modifications can make Update
statements take a long time to run.
Condition retrieval efficiency: Doris Update works by first reading the rows that satisfy the condition, so the higher the condition retrieval efficiency, the faster the Update.
Ideally, the condition column hits an index or allows partition and bucket pruning, so that Doris does not need to scan the entire table and can quickly locate the rows that need to be updated, which improves update efficiency.
It is strongly discouraged to include a value column of the UNIQUE model in the condition columns.
Concurrency Control
By default, multiple concurrent Update operations on the same table are not allowed at the same time.
The main reason for this is that Doris currently supports row updates, which means that even if the user declares SET v2 =
1 , virtually all other Value columns will be overwritten (even though the values are not changed).
This presents a problem in that if two Update operations update the same row at the same time, the behavior may be
indeterminate. That is, there may be dirty data.
However, in practice, if users can guarantee that concurrent updates will not operate on the same rows at the same time, concurrent updates can be enabled manually by modifying the FE configuration enable_concurrent_update . When the configuration value is true, there is no limit on concurrent updates.
Risks of use
Since Doris currently supports row updates and uses a two-step read-and-write operation, there is uncertainty about the
outcome of an Update statement if it modifies the same row as another Import or Delete statement.
Therefore, when using Doris, you must be careful to control the concurrency of Update statements and other DML
statements on the user side itself.
Usage example
Suppose there is an order table in Doris, where the order id is the Key column, the order status and the order amount are the
Value column. The data status is as follows:
+----------+--------------+-----------------+
| order_id | order_amount | order_status    |
+----------+--------------+-----------------+
| 1        | 100          | Pending Payment |
+----------+--------------+-----------------+
At this time, after the user clicks to pay, the Doris system needs to change the status of the order with the order id ' 1' to
'Pending Shipping', and the Update function needs to be used.
+----------+--------------+------------------+
| order_id | order_amount | order_status     |
+----------+--------------+------------------+
| 1        | 100          | Pending Shipping |
+----------+--------------+------------------+
After the user executes the UPDATE command, the system performs the following three steps.
Step 1: Read the row that satisfies WHERE order_id=1 : (1, 100, 'Pending Payment')
Step 2: Change the order status of the row from 'Pending Payment' to 'Pending Shipping' : (1, 100, 'Pending Shipping')
Step 3: Insert the updated row back into the table to achieve the updated effect.
Since the order table uses the UNIQUE model, among rows with the same Key the row written later takes effect, so the final effect is as shown above.
More Help
For more detailed syntax used by data update, please refer to the update command manual , you can also enter HELP
UPDATE in the Mysql client command line to get more help information.
Delete
Delete is different from other import methods: it is a synchronous process, similar to Insert Into. Every Delete operation is an independent import job in Doris. Generally, the Delete statement needs to specify the table, the partition, and the delete conditions to filter the data to be deleted, and it deletes the data of both the base table and the rollup tables.
Syntax
Please refer to the official website for the DELETE syntax of the delete operation.
Delete Result
The delete command is an SQL command, and the returned results are synchronous. It can be divided into the following
types:
1. Successful visible
If delete completes successfully and is visible, the following results will be returned, query OK indicates success.
The transaction submission of Doris is divided into two steps: submission and publish version. Only after the publish version step is completed will the result be visible to the user. If the transaction has been submitted successfully, it can be assumed that the publish version step will eventually succeed. Doris will wait for publishing for a period of time after submitting; if this times out, even if the publish version has not been completed, it will return to the user first and prompt that the submission has been completed but is not yet visible. If the delete has been committed and executed but has not yet been published and become visible, the following results will be returned.
affected rows : Indicates the row affected by this deletion. Since the deletion of Doris is currently a logical deletion, the
value is always 0.
label : The label generated automatically to be the signature of the delete jobs. Each job has a unique label within a
single database.
status : Indicates whether the data deletion is visible. If it is visible, visible will be displayed. If it is not visible,
committed will be displayed.
If the delete statement is not submitted successfully, it will be automatically aborted by Doris and the following results
will be returned
example:
A timeout deletion will return the timeout and unfinished replicas displayed as (tablet = replica)
ERROR 1064 (HY000): errCode = 2, detailMessage = failed to delete replicas from job: 4005, Unfinished
replicas:10000=60000, 10001=60000, 10002=60000
The correct processing logic for the returned results of the delete operation is as follows:
i. If status is committed , the data deletion has been committed and the deleted data will eventually become invisible. Users can wait for a while and then use the show delete command to view the result.
ii. If status is visible , the data have been deleted successfully.
In general, Doris's deletion timeout is limited from 30 seconds to 5 minutes. The specific time can be adjusted through the
following configuration items
tablet_delete_timeout_second
The timeout of delete itself can be elastically changed by the number of tablets in the specified partition. This
configuration represents the average timeout contributed by a tablet. The default value is 2.
Assuming that there are 5 tablets under the specified partition for this deletion, the timeout available for the deletion is 5 x 2 = 10 seconds. Because the minimum timeout of 30 seconds is higher than 10 seconds, the final timeout is 30 seconds.
load_straggler_wait_second
If the user estimates a large amount of data, so that the upper limit of 5 minutes is insufficient, the user can adjust the
upper limit of timeout through this item, and the default value is 300.
query_timeout
Because delete itself is an SQL command, the deletion statement is also limited by session variables, and the timeout is affected by the session value query_timeout . You can increase the value by set query_timeout = xxx .
InPredicate configuration
max_allowed_in_element_num_of_delete
If the user needs to specify many elements when using the IN predicate, this parameter adjusts the upper limit of the number of allowed IN elements. The default value is 1024.
The user can view the history of executed delete jobs through the show delete statement.
Syntax
SHOW DELETE [FROM db_name]
example
+-----------+---------------+---------------------+-----------------+----------+
| TableName | PartitionName | CreateTime          | DeleteCondition | State    |
+-----------+---------------+---------------------+-----------------+----------+
Note
Unlike the Insert into command, delete cannot specify label manually. For the concept of label, see the Insert Into
documentation.
More Help
For more detailed syntax of delete, see the DELETE command manual. You can also enter HELP DELETE in the MySQL client command line to get more help information.
Sequence Column
The sequence column currently only supports the Uniq model. The Uniq model is mainly aimed at scenarios that require a
unique primary key, which can guarantee the uniqueness constraint of the primary key. However, due to the REPLACE
aggregation method, the replacement order of data imported in the same batch is not guaranteed. See Data Model. If the
replacement order cannot be guaranteed, the specific data finally imported into the table cannot be determined, and there is
uncertainty.
In order to solve this problem, Doris supports the sequence column. The user specifies the sequence column when importing, and under the same key column, the REPLACE aggregation type columns are replaced according to the value of the sequence column: a larger value can replace a smaller value, but not vice versa. This method leaves the determination of the order to the user, who controls the replacement order.
Applicable scene
Sequence columns can only be used under the Uniq data model.
Fundamental
By adding a hidden column __DORIS_SEQUENCE_COL__ , the type of the column is specified by the user when creating the table,
the specific value of the column is determined during import, and the REPLACE column is replaced according to this value.
Create Table
When creating a Uniq table, a hidden column __DORIS_SEQUENCE_COL__ will be automatically added according to the user-
specified type.
Import
When importing, the FE sets the value of the hidden column during parsing to the value of the ORDER BY expression (Broker Load and Routine Load) or to the value of the function_column.sequence_col expression (Stream Load), and the value columns are replaced according to this value. The value of the hidden column __DORIS_SEQUENCE_COL__ can be set to either a column in the data source or a column in the table structure.
Read
When the request contains the value column, the __DORIS_SEQUENCE_COL__ column needs to be additionally read. This
column is used as the basis for the replacement order of the REPLACE aggregate function under the same key column. The
larger value can replace the smaller value, otherwise it cannot be replaced.
Cumulative Compaction
During Cumulative Compaction, the principle is the same as that of the reading process.
Base Compaction
During Base Compaction, the principle is the same as that of the reading process.
Syntax
There are two ways to create a table with sequence column. One is to set the sequence_col attribute when creating a table,
and the other is to set the sequence_type attribute when creating a table.
Set sequence_col (recommend)
When you create the Uniq table, you can specify the mapping of sequence column to other columns
PROPERTIES (
"function_column.sequence_col" = 'column_name',
);
The sequence_col is used to specify the mapping of the sequence column to a column in the table, which can be integral and
time (DATE, DATETIME). The type of this column cannot be changed after creation.
The import method is the same as that without the sequence column. It is relatively simple and recommended.
Set sequence_type
When you create the Uniq table, you can specify the sequence column type
PROPERTIES (
"function_column.sequence_type" = 'Date',
);
The sequence_type is used to specify the type of the sequence column, which can be integral and time (DATE / DATETIME).
Stream Load
The syntax of Stream Load is to add the mapping of the hidden column to source_sequence via the function_column.sequence_col field in the header, for example by adding the header function_column.sequence_col: source_sequence to the request.
Broker Load
Set the source_sequence field for the hidden column mapping with ORDER BY :
DATA INFILE("hdfs://host:port/user/data/*/test.txt")
(k1,k2,source_sequence,v1,v2)
ORDER BY source_sequence
WITH BROKER 'broker'
(
    "username"="user",
    "password"="pass"
)
PROPERTIES
(
    "timeout" = "3600"
);
Routine Load
The mapping method is the same as above, as shown below
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
[WITH MERGE|APPEND|DELETE]
[ORDER BY source_sequence]
PROPERTIES
(
    "desired_concurrent_number"="3",
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "strict_mode" = "false"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2,3",
    "kafka_offsets" = "101,0,0,200"
);
If you are not sure whether a table supports the sequence column, you can display hidden columns by setting the session variable SET show_hidden_columns=true and then using desc tablename . If there is a __DORIS_SEQUENCE_COL__ column in the output, the table supports it; if not, it does not.
Usage example
Let's take the stream Load as an example to show how to use it
Create the test_table data table of the unique model and specify that the sequence column maps to the modify_date column in the table.
CREATE TABLE test_table
(
    user_id bigint,
    date date,
    group_id bigint,
    modify_date date,
    keyword VARCHAR(128)
)
UNIQUE KEY(user_id, date, group_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 32
PROPERTIES(
    "function_column.sequence_col" = 'modify_date',
    "replication_num" = "1",
    "in_memory" = "false"
);
1 2020-02-22 1 2020-02-21 a
1 2020-02-22 1 2020-02-22 b
1 2020-02-22 1 2020-03-05 c
1 2020-02-22 1 2020-02-26 d
1 2020-02-22 1 2020-02-23 e
1 2020-02-22 1 2020-02-24 b
Take the Stream Load as an example here and map the sequence column to the modify_date column
The results is
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1       | 2020-02-22 | 1        | 2020-03-05  | c       |
+---------+------------+----------+-------------+---------+
In this import, the c is eventually retained in the keyword column because the value of the sequence column (the value in
modify_date) is the maximum value: '2020-03-05'.
After the above steps are completed, import the following data
1 2020-02-22 1 2020-02-22 a
1 2020-02-22 1 2020-02-23 b
Query data
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1       | 2020-02-22 | 1        | 2020-03-05  | c       |
+---------+------------+----------+-------------+---------+
In this import, the c is eventually retained in the keyword column because the value of the sequence column (the value in
modify_date) in all imports is the maximum value: '2020-03-05'.
Try importing the following data again
1 2020-02-22 1 2020-02-22 a
1 2020-02-22 1 2020-03-23 w
Query data
+---------+------------+----------+-------------+---------+
| user_id | date       | group_id | modify_date | keyword |
+---------+------------+----------+-------------+---------+
| 1       | 2020-02-22 | 1        | 2020-03-23  | w       |
+---------+------------+----------+-------------+---------+
At this point, the new row replaces the original data in the table. To sum up, the sequence column is compared across all batches, and for the same key the row with the largest sequence value is the one kept in the Doris table.
Schema Change
Users can modify the schema of existing tables through the Schema Change operation. Doris currently supports the following modifications:
Add and delete columns
Modify column types
Adjust column order
Add and delete indexes
This document mainly describes how to create a Schema Change job, as well as some considerations and frequently asked
questions about Schema Change.
Glossary
Base Table: When each table is created, it corresponds to a base table. The base table stores the complete data of this
table. Rollups are usually created based on the data in the base table (and can also be created from other rollups).
Index: Materialized index. Rollup or Base Table are both called materialized indexes.
Transaction: Each import task is a transaction, and each transaction has a unique incrementing Transaction ID.
Rollup: Roll-up tables based on base tables or other rollups.
Basic Principles
The basic process of executing a Schema Change is to generate a copy of the index data of the new schema from the data of
the original index. Among them, two parts of data conversion are required. One is the conversion of existing historical data,
and the other is the conversion of newly arrived imported data during the execution of Schema Change.
[Diagram: a Load Job writes to both the original Index and the new Index while Schema Change converts the historical data.]
Before starting the conversion of historical data, Doris will obtain a latest transaction ID. And wait for all import transactions
before this Transaction ID to complete. This Transaction ID becomes a watershed. This means that Doris guarantees that all
import tasks after the watershed will generate data for both the original Index and the new Index. In this way, when the
historical data conversion is completed, the data in the new Index can be guaranteed to be complete.
Create Job
The specific syntax for creating a Schema Change can be found in the help ALTER TABLE COLUMN for the description of the
Schema Change section .
The creation of Schema Change is an asynchronous process. After the job is submitted successfully, the user needs to view
the job progress through the SHOW ALTER TABLE COLUMN command.
View Job
SHOW ALTER TABLE COLUMN You can view the Schema Change jobs that are currently executing or completed. When multiple
indexes are involved in a Schema Change job, the command displays multiple lines, each corresponding to an index. For
example:
JobId: 20021
TableName: tbl1
IndexName: tbl1
IndexId: 20022
OriginIndexId: 20017
SchemaVersion: 2:792557838
TransactionId: 10023
State: FINISHED
Msg:
Progress: NULL
Timeout: 86400
Best Practice
Schema Change can make multiple changes to multiple indexes in one job. For example:
Source Schema:
+-----------+-------+------+------+------+---------+-------+
+-----------+-------+------+------+------+---------+-------+
| | | | | | | |
| | | | | | | |
+-----------+-------+------+------+------+---------+-------+
You can add a column k4 to both rollup1 and rollup2, and additionally add a column k5 to rollup2, using a statement like the sketch below.
+-----------+-------+------+------+------+---------+-------+
+-----------+-------+------+------+------+---------+-------+
| | k4 | INT | No | true | 1 | |
| | k5 | INT | No | true | 1 | |
| | | | | | | |
| | k4 | INT | No | true | 1 | |
| | k5 | INT | No | true | 1 | |
| | | | | | | |
| | k4 | INT | No | true | 1 | |
+-----------+-------+------+------+------+---------+-------+
As you can see, the base table tbl1 also automatically added k4, k5 columns. That is, columns added to any rollup are
automatically added to the Base table.
At the same time, columns that already exist in the Base table are not allowed to be added to Rollup. If you need to do this,
you can re-create a Rollup with the new columns and then delete the original Rollup.
Source Schema :
+-----------+-------+-------------+------+------+---------+-------+
+-----------+-------+-------------+------+------+---------+-------+
+-----------+-------+-------------+------+------+---------+-------+
The modification statement is as follows; we change the length of the k3 column to 50:
alter table example_tbl modify column k3 varchar(50) key null comment 'to 50'
+-----------+-------+-------------+------+------+---------+-------+
+-----------+-------+-------------+------+------+---------+-------+
+-----------+-------+-------------+------+------+---------+-------+
Because the Schema Change job is an asynchronous operation, only one Schema Change job can be performed on the same table at the same time. To check the running status of the job, you can use the SHOW ALTER TABLE COLUMN command.
Notice
Only one Schema Change job can be running on a table at a time.
Schema Change operation does not block import and query operations.
If there is a value column aggregated by REPLACE in the schema, the Key column is not allowed to be deleted.
If the Key column is deleted, Doris cannot determine the value of the REPLACE column.
All non-Key columns of the Unique data model table are REPLACE aggregated.
When adding a value column whose aggregation type is SUM or REPLACE, the default value of this column has no
meaning to historical data.
Because the historical data has lost the detailed information, the default value cannot actually reflect the aggregated
value.
When modifying a column type, all fields other than the type need to be filled in according to the information of the original column.
For example, to modify the column k1 INT SUM NULL DEFAULT "1" to type BIGINT, you need to execute the following command:
ALTER TABLE tbl1 MODIFY COLUMN `k1` BIGINT SUM NULL DEFAULT "1";
Note that, apart from the new column type, other attributes such as the aggregation type, nullable attribute, and default value must be filled in according to the original information.
Modifying column names, aggregation types, nullable attributes, default values, and column comments is not supported.
FAQ
the execution speed of Schema Change
At present, the execution speed of Schema Change is estimated to be about 10MB / s according to the worst efficiency.
To be conservative, users can set the timeout for jobs based on this rate.
Schema Change can only be started when the table's data is complete and not being balanced. If some data shard replicas of the table are incomplete, or if some replicas are undergoing a balancing operation, the submission is rejected.
Whether the data shard copy is complete can be checked with the following command:
ADMIN SHOW REPLICA STATUS
FROM tbl WHERE STATUS != "OK";
If a result is returned, there is a problem with a replica. These problems are usually fixed automatically by the system.
You can also use the following command to repair the table first:
ADMIN REPAIR TABLE tbl1;
You can check whether there are running balancing tasks with the following command:
SHOW PROC "/cluster_balance/pending_tablets";
You can wait for the balancing tasks to complete, or temporarily disable the balancing operation with the following command:
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");
Configurations
FE Configurations
alter_table_timeout_second : The default timeout for the job is 86400 seconds.
BE Configurations
alter_tablet_worker_count : Number of threads used to perform historical data conversion on the BE side. The default is
3. If you want to speed up the Schema Change job, you can increase this parameter appropriately and restart the BE. But
too many conversion threads can cause increased IO pressure and affect other operations. This thread is shared with the
Rollup job.
More Help
For more detailed syntax and best practices used by Schema Change, see ALTER TABLE COLUMN command manual, you
can also enter HELP ALTER TABLE COLUMN in the MySql client command line for more help information.
Replace Table
In version 0.14, Doris supports atomic replacement of two tables.
This operation only applies to OLAP tables.
For partition level replacement operations, please refer to Temporary Partition Document
Syntax
ALTER TABLE [db.]tbl1 REPLACE WITH TABLE tbl2
[PROPERTIES('swap' = 'true')];
If the swap parameter is true , after replacement, the data in the table named tbl1 is the data in the original tbl2 table. The
data in the table named tbl2 is the data in the original table tbl1 . That is, the data of the two tables are interchanged.
If the swap parameter is false , after replacement, the data in the table named tbl1 is the data in the original tbl2 table.
The table named tbl2 is dropped.
Principle
The replacement table function actually turns the following set of operations into an atomic operation.
Suppose you want to replace table A with table B. If swap is true , the operation is as follows:
1. Rename table B to table A.
2. Rename table A to table B.
If swap is false , the operation is as follows:
1. Drop table A.
2. Rename table B to table A.
Notice
1. The swap parameter defaults to true . That is, the replacement table operation is equivalent to the exchange of two table
data.
2. If the swap parameter is set to false , the replaced table (table A) will be dropped and cannot be recovered.
3. The replacement operation can only occur between two OLAP tables, and the table structure of the two tables is not
checked for consistency.
4. The replacement operation will not change the original permission settings. Because the permission check is based on
the table name.
Best Practices
1. Atomic Overwrite Operation
In some cases, the user wants to be able to rewrite the data of a certain table, but if it is dropped and then imported,
there will be a period of time in which the data cannot be viewed. At this time, the user can first use the CREATE TABLE
LIKE statement to create a new table with the same structure, import the new data into the new table, and replace the
old table atomically through the replacement operation to achieve the goal. For partition level atomic overwrite
operation, please refer to Temporary partition document
Dynamic Partition
Dynamic partition is a new feature introduced in Doris version 0.12. It's designed to manage partition's Time-to-Life (TTL),
reducing the burden on users.
At present, the functions of dynamically adding partitions and dynamically deleting partitions are realized.
Noun Interpretation
FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
Principle
In some usage scenarios, the user will partition the table according to the day and perform routine tasks regularly every day.
At this time, the user needs to manually manage the partition. Otherwise, the data load may fail because the user does not
create a partition. This brings additional maintenance costs to the user.
Through the dynamic partitioning feature, users can set the rules of dynamic partitioning when building tables. FE will start a
background thread to create or delete partitions according to the rules specified by the user. Users can also change existing
rules at runtime.
Usage
Establishment of tables
The rules for dynamic partitioning can be specified when the table is created or modified at runtime. Currently, dynamic partition rules can only be set for partitioned tables with a single partition column.
CREATE TABLE tbl1
(...)
PROPERTIES
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)
Modify at runtime
ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)
dynamic_partition.enable
Whether to enable the dynamic partition feature. Can be specified as TRUE or FALSE . If not filled, the default is TRUE . If it
is FALSE , Doris will ignore the dynamic partitioning rules of the table.
dynamic_partition.time_unit
The unit for dynamic partition scheduling. Can be specified as HOUR , DAY , WEEK , and MONTH , means to create or delete
partitions by hour, day, week, and month, respectively.
When specified as HOUR , the suffix format of the dynamically created partition name is yyyyMMddHH , for example,
2020032501 . When the time unit is HOUR, the data type of partition column cannot be DATE.
When specified as DAY , the suffix format of the dynamically created partition name is yyyyMMdd , for example, 20200325 .
When specified as WEEK , the suffix format of the dynamically created partition name is yyyy_ww . That is, the week of the
year of current date. For example, the suffix of the partition created for 2020-03-25 is 2020_13 , indicating that it is
currently the 13th week of 2020.
When specified as MONTH , the suffix format of the dynamically created partition name is yyyyMM , for example, 202003 .
dynamic_partition.time_zone
The time zone of the dynamic partition, if not filled in, defaults to the time zone of the current machine's system, such as
Asia/Shanghai , if you want to know the supported TimeZone, you can found in
https://en.wikipedia.org/wiki/List_of_tz_database_time_zones .
dynamic_partition.start
The starting offset of the dynamic partition, usually a negative number. Depending on the time_unit attribute, based on
the current day (week / month), the partitions with a partition range before this offset will be deleted. If not filled, the
default is -2147483648 , that is, the history partition will not be deleted.
dynamic_partition.end
The end offset of the dynamic partition, usually a positive number. According to the difference of the time_unit attribute,
the partition of the corresponding range is created in advance based on the current day (week / month).
dynamic_partition.prefix
The prefix of the dynamically created partition names.
dynamic_partition.buckets
The number of buckets of the dynamically created partitions.
dynamic_partition.replication_num
The replication number of dynamic partition.If not filled in, defaults to the number of table's replication number.
dynamic_partition.start_day_of_week
When time_unit is WEEK , this parameter is used to specify the starting point of the week. The value ranges from 1 to 7.
Where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday.
dynamic_partition.start_day_of_month
When time_unit is MONTH , this parameter is used to specify the start date of each month. The value ranges from 1 to 28. 1
means the 1st of every month, and 28 means the 28th of every month. The default is 1, which means that every month starts
at 1st. The 29, 30 and 31 are not supported at the moment to avoid ambiguity caused by lunar years or months.
dynamic_partition.create_history_partition
The default is false. When set to true, Doris will automatically create all partitions, as described in the creation rules below.
At the same time, the parameter max_dynamic_partition_num of FE will limit the total number of partitions to avoid
creating too many partitions at once. When the number of partitions expected to be created is greater than
max_dynamic_partition_num , the operation will fail.
When the start attribute is not specified, this parameter has no effect.
dynamic_partition.history_partition_num
When create_history_partition is true , this parameter is used to specify the number of history partitions. The default
value is -1, which means it is not set.
dynamic_partition.hot_partition_num
Specify how many of the latest partitions are hot partitions. For hot partition, the system will automatically set its
storage_medium parameter to SSD, and set storage_cooldown_time .
Note: If there is no SSD disk path under the storage path, configuring this parameter will cause dynamic partition
creation to fail.
Let us give an example. Suppose today is 2021-05-20, partition by day, and the properties of dynamic partition are set to:
hot_partition_num=2, end=3, start=-3. Then the system will automatically create the following partitions, and set the
storage_medium and storage_cooldown_time properties:
dynamic_partition.reserved_history_periods
The range of reserved history periods. It should be in the form of [yyyy-MM-dd,yyyy-MM-dd],[...,...] when dynamic_partition.time_unit is DAY, WEEK, or MONTH, and in the form of [yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...] when dynamic_partition.time_unit is HOUR. No extra spaces are expected. The default value is "NULL" , which means it is not set.
Let us give an example. Suppose today is 2021-09-06, partitioned by day, and the properties of the dynamic partition are set to:
time_unit="DAY/WEEK/MONTH", end=3, start=-3, reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]" .
The system will reserve the history partitions in the following periods:
["2020-06-01","2020-06-20"],
["2020-10-31","2020-11-15"]
or, when time_unit is HOUR :
["2020-06-01 00:00:00","2020-06-20 00:00:00"],
["2020-10-31 00:00:00","2020-11-15 00:00:00"]
Every [...,...] in reserved_history_periods is a pair of dates, and both must be set at the same time. The first date cannot be later than the second one.
dynamic_partition.storage_medium
Specifies the default storage medium for the created dynamic partition. HDD is the default, SSD can be selected.
Note that when set to SSD, the hot_partition_num property will no longer take effect, all partitions will default to SSD
storage media and the cooldown time will be 9999-12-31 23:59:59.
Assuming the number of history partitions to be created is expect_create_partition_num , the number is determined as follows according to different settings:
1. create_history_partition = true
If dynamic_partition.history_partition_num is not set (i.e. -1): expect_create_partition_num = end - start
If dynamic_partition.history_partition_num is set: expect_create_partition_num = end - max(start, -history_partition_num)
2. create_history_partition = false
No history partition will be created: expect_create_partition_num = end - 0
When expect_create_partition_num is greater than max_dynamic_partition_num (default 500), creating too many partitions is prohibited.
Examples:
1. Suppose today is 2021-05-20, partition by day, and the attributes of dynamic partition are set to
create_history_partition=true, end=3, start=-3, history_partition_num=1 , then the system will automatically create
the following partitions.
p20210519
p20210520
p20210521
p20210522
p20210523
2. history_partition_num=5 and keep the rest attributes as in 1, then the system will automatically create the following
partitions.
p20210517
p20210518
p20210519
p20210520
p20210521
p20210522
p20210523
3. history_partition_num=-1 i.e., if you do not set the number of history partitions and keep the rest of the attributes as in 1,
the system will automatically create the following partitions.
p20210517
p20210518
p20210519
p20210520
p20210521
p20210522
p20210523
Notice
If some partitions between dynamic_partition.start and dynamic_partition.end are lost due to some unexpected
circumstances when using dynamic partition, the lost partitions between the current time and dynamic_partition.end will
be recreated, but the lost partitions between dynamic_partition.start and the current time will not be recreated.
Example
1. Table tbl1 partition column k1, type is DATE, create a dynamic partition rule. By day partition, only the partitions of the
last 7 days are kept, and the partitions of the next 3 days are created in advance.
CREATE TABLE tbl1
(
k1 DATE,
...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32"
);
Suppose the current date is 2020-05-29. According to the above rules, tbl1 will produce the following partitions:
p20200529: ["2020-05-29", "2020-05-30")
p20200530: ["2020-05-30", "2020-05-31")
p20200531: ["2020-05-31", "2020-06-01")
p20200601: ["2020-06-01", "2020-06-02")
On the next day, 2020-05-30, a new partition will be created: p20200602: ["2020-06-02", "2020-06-03").
On 2020-06-06, because dynamic_partition.start is set to -7, the partition from 7 days before will be deleted, that is, the partition p20200529 will be deleted.
2. Table tbl1 partition column k1, type is DATETIME, create a dynamic partition rule. Partition by week, only keep the
partition of the last 2 weeks, and create the partition of the next 2 weeks in advance.
CREATE TABLE tbl1
(
k1 DATETIME,
...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "WEEK",
"dynamic_partition.start" = "-2",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "8"
);
Suppose the current date is 2020-05-29, which is the 22nd week of 2020. The default week starts on Monday. Based on
the above rules, tbl1 will produce the following partitions:
The start date of each partition is Monday of the week. At the same time, because the type of the partition column k1 is
DATETIME, the partition value will fill the hour, minute and second fields, and all are 0.
On 2020-06-15, the 25th week, the partition 2 weeks ago will be deleted, ie p2020_22 will be deleted.
In the above example, suppose the user specified the start day of the week as "dynamic_partition.start_day_of_week" =
"3" , that is, set Wednesday as the start of week. The partition is as follows:
That is, the partition ranges from Wednesday of the current week to Tuesday of the next week.
Note: 2019-12-31 and 2020-01-01 are in the same week. If the starting date of the partition is 2019-12-31, the partition name is p2019_53; if the starting date of the partition is 2020-01-01, the partition name is p2020_01.
3. Table tbl1 partition column k1, type is DATE, create a dynamic partition rule. Partition by month without deleting
historical partitions, and create partitions for the next 2 months in advance. At the same time, set the starting date on the
3rd of each month.
CREATE TABLE tbl1
(
k1 DATE,
...
)
PARTITION BY RANGE(k1) ()
DISTRIBUTED BY HASH(k1)
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "MONTH",
"dynamic_partition.end" = "2",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "8",
"dynamic_partition.start_day_of_month" = "3"
);
Suppose the current date is 2020-05-29. Based on the above rules, tbl1 will produce the following partitions:
Because dynamic_partition.start is not set, the historical partition will not be deleted.
Assuming that today is 2020-05-20, and set 28th as the start of each month, the partition range is:
"dynamic_partition.prop1" = "value1",
...
);
The modification of certain attributes may cause conflicts. Assume that the partition granularity was DAY and the following
partitions have been created:
If the partition granularity is changed to MONTH at this time, the system will try to create a partition with the range ["2020-
05-01", "2020-06-01") , and this range conflicts with the existing partition. So it cannot be created. And the partition with the
range ["2020-06-01", "2020-07-01") can be created normally. Therefore, the partition between 2020-05-22 and 2020-05-30
needs to be filled manually.
Check Dynamic Partition Table Scheduling Status
You can further view the scheduling of dynamic partitioned tables by using the following command:
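The command is the SHOW DYNAMIC PARTITION statement referenced in the More Help section below; a sketch, with db1 as an assumed database name:
SHOW DYNAMIC PARTITION TABLES FROM db1;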
Advanced Operation
FE Configuration Item
dynamic_partition_enable
Whether to enable Doris's dynamic partition feature. The default value is false, i.e. off. This parameter only affects the partitioning of dynamic partition tables, not normal tables. You can modify it in fe.conf and restart FE to take effect, or change it at runtime via the MySQL protocol or the HTTP protocol, as sketched below.
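A minimal sketch of both forms, assuming an FE reachable at fe_host:fe_http_port and a user with ADMIN privileges:
MySQL protocol:
ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true");
HTTP protocol:
curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true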
dynamic_partition_check_interval_seconds
The execution frequency of the dynamic partition scheduling thread. The default is 3600 seconds (1 hour), that is, scheduling is performed every hour. You can modify it in fe.conf and restart FE to take effect, or change it at runtime via the MySQL protocol or the HTTP protocol, as sketched below.
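A minimal sketch of both forms, assuming an FE reachable at fe_host:fe_http_port and a new interval of 7200 seconds chosen only for illustration:
MySQL protocol:
ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200");
HTTP protocol:
curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=7200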
When dynamic partitioning feature is enabled, Doris will no longer allow users to manage partitions manually, but will
automatically manage partitions based on dynamic partition properties.
NOTICE: If dynamic_partition.start is set, historical partitions with a partition range before the start offset of the dynamic
partition will be deleted.
When dynamic partitioning feature is disabled, Doris will no longer manage partitions automatically, and users will have to
create or delete partitions manually by using ALTER TABLE .
Common problem
1. When creating a dynamic partition table, the error Could not create table with dynamic partition when fe config dynamic_partition_enable is false is reported
This is because the master switch of dynamic partitioning, the FE configuration dynamic_partition_enable, is false, so the dynamic partition table cannot be created.
At this time, please modify the FE configuration file, add a line dynamic_partition_enable=true , and restart FE. Or
execute the command ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true") to turn on the dynamic
partition switch.
Dynamic partitions are automatically created by scheduling logic inside the system. When a partition is created automatically, its partition properties (including the number of replicas, etc.) are taken from the properties prefixed with dynamic_partition, rather than from the default properties of the table. For example:
CREATE TABLE tbl1 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)()
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32",
"dynamic_partition.replication_num" = "1",
"dynamic_partition.start" = "-3",
"replication_num" = "3"
);
In this example, no initial partition is created (the partition definition in the PARTITION BY clause is empty), and the statement sets DISTRIBUTED BY HASH(k1) BUCKETS 3, "replication_num" = "3", "dynamic_partition.replication_num" = "1", and "dynamic_partition.buckets" = "32".
The first two are the default parameters of the table, and the last two are parameters specific to dynamic partitions.
When the system automatically creates a partition, it will use the dynamic-partition-specific configuration of 32 buckets and 1 replica, instead of the table defaults of 3 buckets and 3 replicas.
When a user manually adds a partition through the ALTER TABLE tbl1 ADD PARTITION statement, the two configurations
of bucket number 3 and replica number 3 (that is, the default parameters of the table) will be used.
That is, dynamic partitioning uses a separate set of parameter settings. The table's default parameters are used only if no
dynamic partition-specific parameters are set. as follows:
CREATE TABLE tbl1 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)()
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.start" = "-3",
"dynamic_partition.buckets" = "32",
"replication_num" = "3"
);
In this example, if dynamic_partition.replication_num is not specified separately, the default parameter of the table is
used, which is "replication_num" = "3" .
CREATE TABLE tbl1 (
`k1` int,
`k2` date
)
PARTITION BY RANGE(k2)(
PARTITION p1 VALUES LESS THAN ("2021-03-01")  -- a manually created partition; the range here is only illustrative
)
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES
(
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.start" = "-3",
"dynamic_partition.buckets" = "32",
"dynamic_partition.replication_num" = "1",
"replication_num" = "3"
);
In this example, there is a manually created partition p1. This partition will use the default settings for the table, which are
3 buckets and 3 replicas. The dynamic partitions automatically created by the subsequent system will still use the special
parameters for dynamic partitions, that is, the number of buckets is 32 and the number of replicas is 1.
More Help
For more detailed syntax and best practices for using dynamic partitions, see the SHOW DYNAMIC PARTITION command manual. You can also enter HELP ALTER TABLE in the MySQL client command line for more help.
Temporary partition
Since version 0.12, Doris supports temporary partitioning.
A temporary partition belongs to a partitioned table. Only partitioned tables can create temporary partitions.
Rules
The partition columns of a temporary partition are the same as those of the formal partitions and cannot be modified.
The partition ranges of all temporary partitions of a table cannot overlap with each other, but the ranges of temporary partitions and formal partitions can overlap.
The name of a temporary partition cannot be the same as that of any formal partition or other temporary partition.
Supported operations
The temporary partition supports add, delete, and replace operations.
ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01");
ALTER TABLE tbl2 ADD TEMPORARY PARTITION tp1 VALUES [("2020-01-01"), ("2020-02-01"));
ALTER TABLE tbl1 ADD TEMPORARY PARTITION tp1 VALUES LESS THAN ("2020-02-01")
("in_memory" = "true", "replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;
ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai");
ALTER TABLE tbl4 ADD TEMPORARY PARTITION tp1 VALUES IN ((1, "Beijing"), (1, "Shanghai"));
ALTER TABLE tbl3 ADD TEMPORARY PARTITION tp1 VALUES IN ("Beijing", "Shanghai")
("in_memory" = "true", "replication_num" = "1")
DISTRIBUTED BY HASH(k1) BUCKETS 5;
Adding a temporary partition is similar to adding a formal partition. The partition range of the temporary partition is
independent of the formal partition.
A temporary partition can independently specify some attributes, such as the number of buckets, the number of replicas, whether it is an in-memory table, and the storage medium.
Deleting the temporary partition will not affect the data of the formal partition.
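Deleting a temporary partition is done with the ALTER TABLE DROP TEMPORARY PARTITION statement; a minimal sketch:
ALTER TABLE tbl1 DROP TEMPORARY PARTITION tp1;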
Replace partition
You can replace formal partitions of a table with temporary partitions with the ALTER TABLE REPLACE PARTITION statement.
ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2, tp3);
ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1, tp2)
PROPERTIES (
"strict_range" = "false",
"use_temp_partition_name" = "true"
);
1. strict_range
For Range partitions, when this parameter is true, the union of the ranges of all formal partitions to be replaced must be identical to the union of the ranges of the temporary partitions used for replacement. When it is set to false, you only need to ensure that the ranges of the new formal partitions do not overlap with other formal partitions after replacement.
For List partitions, this parameter is always true, and the enumeration values of all formal partitions to be replaced must be identical to the enumeration values of the temporary partitions used for replacement.
Example 1
Ranges of the formal partitions to be replaced (=> union): [10, 20), [20, 30), [40, 50) => [10, 30), [40, 50)
Ranges of the temporary partitions (=> union): [10, 30), [40, 45), [45, 50) => [10, 30), [40, 50)
The union of ranges is the same, so you can use tp1 and tp2 to replace p1, p2, p3.
Example 2
The union of ranges is not the same. If strict_range is true, you cannot use tp1 and tp2 to replace p1. If false, and the
two partition ranges [10, 30), [40, 50) and the other formal partitions do not overlap, they can be replaced.
Example 3
Replace the enumerated values of partitions tp1, tp2, tp3 (=> union).
The enumeration values are the same, you can use tp1, tp2, tp3 to replace p1, p2
Example 4
(("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2", "shanghai")), (("3", "beijing"), ("3",
"shanghai")) => (("1", "beijing"), ("3", "shanghai")) "), ("1", "shanghai"), ("2", "beijing"), ("2",
"shanghai"), ("3", "beijing"), ("3", "shanghai"))
(("1", "beijing"), ("1", "shanghai")), (("2", "beijing"), ("2", "shanghai"), ("3", "beijing"), ("3",
"shanghai")) => (("1", "beijing") , ("1", "shanghai"), ("2", "beijing"), ("2", "shanghai"), ("3",
"beijing"), ("3", "shanghai"))
The enumeration values are the same, you can use tp1, tp2 to replace p1, p2, p3
2. use_temp_partition_name
The default is false. When this parameter is false, and the number of partitions to be replaced is the same as the number
of replacement partitions, the name of the formal partition after the replacement remains unchanged. If true, after
replacement, the name of the formal partition is the name of the replacement partition. Here are some examples:
Example 1
ALTER TABLE tbl1 REPLACE PARTITION (p1) WITH TEMPORARY PARTITION (tp1);
use_temp_partition_name is false by default. After replacement, the partition name is still p1, but the related data and
attributes are replaced with tp1.
If use_temp_partition_name is set to true, the name of the partition becomes tp1 after replacement, and the p1 partition no longer exists.
Example 2
ALTER TABLE tbl1 REPLACE PARTITION (p1, p2) WITH TEMPORARY PARTITION (tp1);
use_temp_partition_name is false by default, but the parameter is ignored here because the number of partitions to be replaced differs from the number of replacement partitions. After the replacement, the partition name is tp1, and p1 and p2 no longer exist.
After the partition is replaced successfully, the replaced partition will be deleted and cannot be recovered.
The syntax for specifying a temporary partition is slightly different depending on the load method. Here is a simple illustration through examples:
INSERT INTO:
INSERT INTO tbl TEMPORARY PARTITION (tp1, tp2, ...) SELECT ....
STREAM LOAD (the temporary partitions are specified in the header):
curl --location-trusted -u root: -H "label: 123" -H "temporary_partitions: tp1, tp2, ..." -T testData http://host:port/api/testDb/testTbl/_stream_load
BROKER LOAD (the temporary partitions are specified in the INTO TABLE clause):
DATA INFILE ("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
INTO TABLE tbl
TEMPORARY PARTITION (tp1, tp2, ...)
...
PROPERTIES
(...)
ROUTINE LOAD (the temporary partitions are specified in the job definition):
...
TEMPORARY PARTITIONS(tp1, tp2, ...),
...
FROM KAFKA
(...);
Queries can also read temporary partitions directly by specifying them after the table name:
SELECT ... FROM tbl1 TEMPORARY PARTITION (tp1, tp2, ...)
JOIN tbl2 TEMPORARY PARTITION (tp1, tp2, ...)
ON ...
WHERE ...;
TRUNCATE
When the TRUNCATE command is used to empty a table, the temporary partitions of the table will also be deleted and cannot be recovered.
When the TRUNCATE command is used to empty a formal partition, the temporary partitions are not affected.
You cannot use the TRUNCATE command to empty the temporary partition.
ALTER
When a table has temporary partitions, you cannot use the ALTER command to perform Schema Change, Rollup, or similar operations on the table.
You cannot add temporary partitions to a table while the table is undergoing an ALTER operation.
Best Practices
1. Atomic overwrite
In some cases, the user wants to be able to rewrite the data of a certain partition, but if it is dropped first and then loaded,
there will be a period of time when the data cannot be seen. At this moment, the user can first create a corresponding
temporary partition, load new data into the temporary partition, and then replace the original partition atomically
through the REPLACE operation to achieve the purpose. For atomic overwrite operations of non-partitioned tables,
please refer to Replace Table Document
2. Modify the number of buckets
In some cases, the user used an inappropriate number of buckets when creating a partition. The user can first create a temporary partition corresponding to the partition range and specify a new number of buckets, then use the INSERT INTO command to load the data of the formal partition into the temporary partition, and finally replace the original partition atomically through the replace operation.
3. Merge or split partitions
In some cases, users want to modify the range of partitions, such as merging two partitions or splitting a large partition into several smaller ones. The user can first create temporary partitions corresponding to the merged or split ranges, then load the data of the formal partitions into the temporary partitions through the INSERT INTO command, and finally replace the original partitions atomically through the replace operation.
Partition Cache
Demand scenario
Most data analysis scenarios are write-once, read-many: data is written once and read frequently. For example, the dimensions and indicators involved in a report are calculated once in the early morning, but the page is accessed hundreds or even thousands of times a day, so it is very suitable to cache the result set. In data analysis or BI applications, the following business scenarios exist:
High-concurrency scenarios: Doris supports high concurrency well, but a single server cannot carry too high a QPS.
Kanban for complex charts: in a complex Dashboard or large-screen application, the data comes from multiple tables and each page issues dozens of queries; although each query takes only tens of milliseconds, the overall query time adds up to several seconds.
Trend analysis: queries over a given date range whose indicators are displayed by day, such as the trend of the number of users over the last 7 days; this type of query scans a large amount of data over a wide range and often takes tens of seconds.
Repeated user queries: if the product has no anti-reload mechanism, users who repeatedly refresh the page by mistake cause a large number of duplicate SQL submissions.
In the above four scenarios, a common application-layer solution is to put the query results in Redis and refresh the cache periodically or manually, but this solution has the following problems:
Data inconsistency: the cache cannot perceive data updates, so users often see stale data.
Low hit rate: the entire query result is cached, so if data is written in real time the cache is frequently invalidated, the hit rate is low, and the system load remains heavy.
Additional cost: introducing external cache components adds system complexity and extra cost.
Solution
This partitioned caching strategy can solve the above problems, giving priority to ensuring data consistency. On this basis,
the cache granularity is refined and the hit rate is improved. Therefore, it has the following characteristics:
Users do not need to worry about data consistency, cache invalidation is controlled by version, and the cached data is
consistent with the data queried from BE
No additional components and costs, cached results are stored in BE's memory, users can adjust the cache memory size
as needed
Implemented two caching strategies, SQLCache and PartitionCache, the latter has a finer cache granularity
Use consistent hashing to solve the problem of BE nodes going online and offline. The caching algorithm in BE is an
improved LRU
SQLCache
SQLCache stores and retrieves the cache according to the SQL signature, the partition ID of the queried table, and the latest
version of the partition. The combination of the three determines a cached data set. If any one changes, such as SQL
changes, such as query fields or conditions are different, or the version changes after the data is updated, the cache will not
be hit.
If multiple tables are joined, use the latest updated partition ID and the latest version number. If one of the tables is updated,
the partition ID or version number will be different, and the cache will also not be hit.
SQLCache is more suitable for T+1 update scenarios. Data is updated in the early morning. The results obtained from the BE
for the first query are put into the cache, and subsequent identical queries are obtained from the cache. Real-time update
data can also be used, but there may be a problem of low hit rate. You can refer to the following PartitionCache.
PartitionCache
Design Principles
1. SQL can be split in parallel, Q = Q1 ∪ Q2 ... ∪ Qn, R= R1 ∪ R2 ... ∪ Rn, Q is the query statement, R is the result set
2. Split into read-only partitions and updatable partitions, read-only partitions are cached, and update partitions are not
cached
Take the query of the number of users per day in the last 7 days as an example, with the table partitioned by date. Data is only written to the partition of the current day; the data of all other partitions is fixed. For the same query SQL, the indicators of the partitions that are not being updated are fixed. When the number of users over the last 7 days is queried on 2020-03-09, the data from 2020-03-03 to 2020-03-07 comes from the cache; the first query for 2020-03-08 reads the partition and subsequent queries come from the cache; 2020-03-09 always comes from the partition because that day's data is constantly being written.
Therefore, when querying N days of data of which only the most recent D days are still being updated, each day's query is a similar query with a different date range: only D partitions need to be queried and the rest come from the cache, which effectively reduces the cluster load and the query time.
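As a concrete sketch of such a query (the table and column names testdb.appevent, eventdate, and userid are only illustrative, chosen to match the result set shown below):
SELECT eventdate, count(userid)
FROM testdb.appevent
WHERE eventdate >= "2020-03-03" AND eventdate <= "2020-03-09"
GROUP BY eventdate
ORDER BY eventdate;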
+------------+-----------------+
| eventdate | count(`userid`) |
+------------+-----------------+
| 2020-03-03 | 15 |
| 2020-03-04 | 20 |
| 2020-03-05 | 25 |
| 2020-03-06 | 30 |
| 2020-03-07 | 35 |
+------------+-----------------+
In PartitionCache, the first-level key of the cache is the 128-bit MD5 signature of the SQL after the partition condition is
removed. The following is the rewritten SQL to be signed:
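For the illustrative query above, the SQL to be signed would be the statement with the partition (date range) condition stripped, roughly:
SELECT eventdate, count(userid)
FROM testdb.appevent
GROUP BY eventdate
ORDER BY eventdate;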
The cached second-level key is the content of the partition field of the query result set, such as the content of the eventdate
column of the query result above, and the auxiliary information of the second-level key is the version number and version
update time of the partition.
The following demonstrates the process of executing the above SQL for the first time on 2020-03-09:
1. Get data from the cache: the partitions from 2020-03-03 to 2020-03-07 have not changed, so their results are read from the cache.
+------------+-----------------+
| eventdate  | count(`userid`) |
+------------+-----------------+
| 2020-03-03 | 15 |
| 2020-03-04 | 20 |
| 2020-03-05 | 25 |
| 2020-03-06 | 30 |
| 2020-03-07 | 35 |
+------------+-----------------+
2. Get data from BE: only the partitions of 2020-03-08 and 2020-03-09 need to be scanned.
+------------+-----------------+
| 2020-03-08 | 40 |
+------------+-----------------+
| 2020-03-09 | 25 |
+------------+-----------------+
3. The two parts are merged and the final result is sent to the client:
+------------+-----------------+
| eventdate  | count(`userid`) |
+------------+-----------------+
| 2020-03-03 | 15 |
| 2020-03-04 | 20 |
| 2020-03-05 | 25 |
| 2020-03-06 | 30 |
| 2020-03-07 | 35 |
| 2020-03-08 | 40 |
| 2020-03-09 | 25 |
+------------+-----------------+
4. The newly computed result of the read-only partition 2020-03-08 is written back to the cache; 2020-03-09 is not cached because that partition is still being written.
+------------+-----------------+
| 2020-03-08 | 40 |
+------------+-----------------+
Partition Cache is suitable for tables partitioned by date where only some partitions are updated in real time and the query SQL is relatively fixed.
The partition field can also be another field, but it must be ensured that only a small number of partitions are being updated.
Some restrictions
Only OlapTable is supported; other tables such as MySQL external tables have no version information and cannot detect whether the data has been updated.
Only grouping by the partition field is supported; grouping by other fields is not, because data grouped by other fields may be updated, which would invalidate the cache.
Only a cache hit on the leading part of the result set, the trailing part of the result set, or the entire result set is supported; the result set cannot be split into several segments by the cached data.
How to use
Enable SQLCache
Make sure cache_enable_sql_mode=true in fe.conf (default is true)
vim fe/conf/fe.conf
cache_enable_sql_mode=true
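SQLCache is then switched on per session or globally through a session variable; assuming the variable name enable_sql_cache used by current Doris versions:
set [global] enable_sql_cache=true;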
Note: global makes the setting a global variable rather than one for the current session only.
Enable PartitionCache
Make sure cache_enable_partition_mode=true in fe.conf (default is true)
vim fe/conf/fe.conf
cache_enable_partition_mode=true
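PartitionCache is switched on through a session variable as well; assuming the variable name enable_partition_cache used by current Doris versions:
set [global] enable_partition_cache=true;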
If the two caching strategies are enabled at the same time, pay attention to the following parameter:
cache_last_version_interval_second=900
If the interval between now and the latest version of the queried partition is greater than cache_last_version_interval_second, the entire query result is cached with SQLCache. If it is smaller than this interval and the query meets the PartitionCache conditions, the data is cached according to PartitionCache.
Monitoring
FE monitoring items:
Optimization parameters
The FE configuration item cache_result_max_row_count is the maximum number of rows of a query result set that can be cached. It can be adjusted according to the actual situation, but it is recommended not to set it too large to avoid taking up too much memory; result sets exceeding this size will not be cached.
vim fe/conf/fe.conf
cache_result_max_row_count=3000
The BE configuration item cache_max_partition_count specifies the maximum number of cached partitions per SQL. If the table is partitioned by date, the default can cache more than 2 years of data. If you want to keep the cache longer, set this parameter to a larger value and adjust cache_result_max_row_count accordingly.
vim be/conf/be.conf
cache_max_partition_count=1024
The cache memory in BE is controlled by two parameters, query_cache_max_size and query_cache_elasticity_size (in MB). If the memory used exceeds query_cache_max_size + query_cache_elasticity_size, cleanup starts and the memory is brought back below query_cache_max_size. These two parameters can be set according to the number of BE nodes, the node memory size, and the cache hit rate.
query_cache_max_size_mb=256
query_cache_elasticity_size_mb=128
Calculation method:
If 10000 queries are cached, each query caches 1000 rows, and each row is 128 bytes, distributed over 10 BEs, then each BE requires about 128 MB of memory (10000 * 1000 * 128 / 10).
Unfinished Matters
Can the data of T+1 also be cached by Partition? Currently not supported
Similar SQL, 2 indicators were queried before, but now 3 indicators are queried. Can the cache of 2 indicators be used?
Not currently supported
Partition by date, but need to aggregate data by week dimension, is PartitionCache available? Not currently supported
Bucket Shuffle Join
Noun Interpretation
FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
Left table: the left table in a join query, on which the probe expression is executed. The order can be adjusted by join reorder.
Right table: the right table in a join query, on which the build expression is executed. The order can be adjusted by join reorder.
Principle
The conventional distributed join methods supported by Doris are Shuffle Join and Broadcast Join. Both of these joins incur some network overhead.
For example, consider a join query between table A and table B where the join method is hash join. The cost of the different join types is as follows:
Broadcast Join: if table A is distributed across three HashJoinNodes, table B must be sent to all three of them. Its network overhead is 3B, and its memory overhead is 3B.
Shuffle Join: shuffle join distributes the data of tables A and B to the nodes of the cluster according to a hash calculation, so its network overhead is A + B and its memory overhead is B.
The data distribution information of each Doris table is saved in FE. If the join statement hits the data distribution column of
the left table, we should use the data distribution information to reduce the network and memory overhead of the join query.
This is the source of the idea of bucket shuffle join.
Bucket Shuffle Join works as follows. Suppose the SQL query joins table A with table B and the equality condition of the join hits the data distribution column of A. According to the data distribution information of table A, Bucket Shuffle Join sends the data of table B to the corresponding storage and computation nodes of table A. The cost of Bucket Shuffle Join is as follows:
network cost: B < min(3B, A + B)
memory cost: B
Therefore, compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join has an obvious performance advantage. It reduces the time spent transmitting data between nodes and the memory cost of the join. Compared with Doris's original join methods, it has the following advantages:
First, Bucket Shuffle Join reduces the network and memory cost, which gives some join queries better performance, especially when FE can perform partition pruning and bucket pruning on the left table.
Second, unlike Colocation Join, it is not intrusive to the data distribution of tables, which is transparent to users. There is no mandatory requirement on the data distribution of the tables, so it is less likely to lead to data skew.
Finally, it can provide more optimization space for join reorder.
Usage
In FE's distributed query planning, the priority order is Colocation Join -> Bucket Shuffle Join -> Broadcast Join -> Shuffle Join. However, if the user explicitly hints the join type, for example:
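An illustrative hint, with assumed table names test and baseall:
select * from test join [shuffle] baseall on test.k1 = baseall.k1;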
When a join method is hinted explicitly like this, the priority order above will not take effect. The session variable that controls Bucket Shuffle Join (enable_bucket_shuffle_join) is set to true by default in version 0.14, while it needs to be set to true manually in version 0.13.
You can verify the plan with EXPLAIN: in the 2:HASH JOIN node of the plan, next to the hash predicates, the join type indicates that the join method to be used is BUCKET_SHUFFLE.
Bucket Shuffle Join only works when the join condition is an equality condition. The reason is similar to Colocation Join: they both rely on hashing to determine the data distribution.
The bucket column of the left table must be included in the equality join condition. When the bucket column of the left table appears as an equality join condition, the query has a high probability of being planned as a Bucket Shuffle Join.
Because different data types produce different hash values, Bucket Shuffle Join requires the type of the bucket column of the left table and the type of the corresponding equality join column of the right table to be consistent; otherwise, the plan cannot be generated.
Bucket Shuffle Join only works on Doris native OLAP tables. For ODBC, MySQL, and ES external tables used as the left table, it cannot be planned as a Bucket Shuffle Join.
For partitioned tables, because the data distribution rules of each partition may be different, Bucket Shuffle Join can only be guaranteed when the left table is a single partition. Therefore, in SQL execution, use WHERE conditions as much as possible so that the partition pruning policy can take effect.
If the left table is a colocation table, the data distribution rules of each partition are the same, so Bucket Shuffle Join performs better on colocation tables.
Colocation Join
Colocation Join is a new feature introduced in Doris 0.9. Its purpose is to provide local optimization for certain join queries, reducing the data transmission time between nodes and speeding up queries.
The original design, implementation and effect can be referred to ISSUE 245.
The Colocation Join function has undergone a revision, and its design and use are slightly different from the original design.
This document mainly introduces Colocation Join's principle, implementation, usage and precautions.
Noun Interpretation
FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
Colocation Group (CG): A CG contains one or more tables. Tables within the same group have the same Colocation
Group Schema and the same data fragmentation distribution.
Colocation Group Schema (CGS): describes the general schema information of the tables in a CG that is relevant to Colocation, including the bucket column type, the number of buckets, and the number of replicas.
Principle
The Colocation Join function is to make a CG of a set of tables with the same CGS. Ensure that the corresponding data
fragments of these tables will fall on the same BE node. When tables in CG perform Join operations on bucket columns, local
data Join can be directly performed to reduce data transmission time between nodes.
The data of a table is eventually divided into buckets by hashing the bucket column values and taking the result modulo the number of buckets. Assuming the number of buckets of a table is 8, there are eight buckets: [0, 1, 2, 3, 4, 5, 6, 7].
We call such a sequence a Buckets Sequence . Each Bucket has one or more Tablets. When a table is a single partitioned
table, there is only one Tablet in a Bucket. If it is a multi-partition table, there will be more than one.
In order for a table to have the same data distribution, the table in the same CG must ensure the following attributes are the
same:
1. Bucket columns, that is, the columns specified in DISTRIBUTED BY HASH(col1, col2, ...) in the table creation statement. The bucket columns determine which column values are used to hash the data of a table into different Tablets. Tables in the same CG must have bucket columns of exactly the same type and number, and the same number of buckets, so that the data shards of multiple tables can be distributed and controlled consistently.
2. Number of replicas
The number of replicas of all partitions of all tables in the same CG must be the same. If it is inconsistent, a replica of one Tablet may have no corresponding replica of the other tables' shards on the same BE.
Tables in the same CG do not require consistency in the number, scope, and type of partition columns.
Once the bucket columns and the number of buckets are fixed, the tables in the same CG have the same Buckets Sequence. The number of replicas determines how many replicas of the Tablets in each bucket exist and which BEs they are stored on. Suppose the Buckets Sequence is [0, 1, 2, 3, 4, 5, 6, 7] and there are 4 BE nodes [A, B, C, D]. A possible data distribution is as follows:
| 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 |
| A | | B | | C | | D | | A | | B | | C | | D |
| | | | | | | | | | | | | | | |
| B | | C | | D | | A | | B | | C | | D | | A |
| | | | | | | | | | | | | | | |
| C | | D | | A | | B | | C | | D | | A | | B |
The data of all tables in the CG is distributed according to the rules above, which ensures that data with the same bucket column values is on the same BE node, so the join can be performed on local data.
Usage
Establishment of tables
When creating a table, you can specify the attribute "colocate_with"="group_name" in PROPERTIES , which means that the
table is a Colocation Join table and belongs to a specified Colocation Group.
Examples:
CREATE TABLE tbl (k1 int, v1 int sum)
DISTRIBUTED BY HASH(k1)
BUCKETS 8
PROPERTIES(
"colocate_with" = "group1"
);
If the specified Group does not exist, Doris automatically creates a Group that contains only the current table. If the Group already exists, Doris checks whether the current table satisfies the Colocation Group Schema. If it does, the table is created and added to the Group, and its tablets and replicas are created according to the existing data distribution rules of the Group.
A Group belongs to a database, and its name is unique within the database. Internally, the full name of a Group is stored as dbId_groupName, but users only perceive the groupName.
Delete table
When the last table in a Group is completely deleted (completely deleted means deleted from the recycle bin), the Group is also deleted automatically. Usually, after a table is deleted with the DROP TABLE command, it stays in the recycle bin for one day by default before being deleted completely.
View Group
The following command allows you to view the existing Group information in the cluster.
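A sketch of the command:
SHOW PROC '/colocation_group';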
+-------------+--------------+--------------+------------+----------------+----------+----------+
| GroupId     | GroupName    | TableIds     | BucketsNum | ReplicationNum | DistCols | IsStable |
+-------------+--------------+--------------+------------+----------------+----------+----------+
GroupId: the cluster-wide unique identifier of a Group; the first half is the database ID and the second half is the group ID.
GroupName: the full name of the Group.
TableIds: the list of IDs of the tables contained in the Group.
BucketsNum: the number of buckets.
ReplicationNum: the number of replicas.
DistCols: the distribution columns, i.e. the bucket column types.
IsStable: whether the Group is stable (for the definition of stability, see the section on Colocation replica balancing and repair).
You can further view the data distribution of a group by following commands:
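A sketch, where 10005.10008 stands for a dbId.groupId value taken from the GroupId column above:
SHOW PROC '/colocation_group/10005.10008';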
+-------------+---------------------+
| BucketIndex | BackendIds |
+-------------+---------------------+
+-------------+---------------------+
The above commands require ADMIN privileges. Normal user view is not supported at this time.
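The Colocation Group of an existing table can be changed with ALTER TABLE SET; a sketch with illustrative table and group names:
ALTER TABLE tbl SET ("colocate_with" = "group2");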
If the table has not previously specified a Group, this command checks the schema and adds the table to the Group (creating the Group first if it does not exist).
If the table previously belonged to another Group, this command first removes the table from the original Group and then adds it to the new Group (creating the Group first if it does not exist).
You can also delete the Colocation attribute of a table by following commands:
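A sketch:
ALTER TABLE tbl SET ("colocate_with" = "");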
Replica Repair
Replicas of Colocation tables can only be stored on the specified BE nodes. So when a BE becomes unavailable (downtime, decommission, etc.), a new BE needs to be found to replace it. Doris will first look for the BE with the lowest load as the replacement. After the replacement, all data shards of the bucket on the old BE are repaired. During the migration, the Group is marked Unstable.
Replica Balancing
Doris tries to distribute the buckets of Colocation tables evenly across all BE nodes. For the replica balancing of normal tables, the granularity is a single replica, that is, it is enough to find a lightly loaded BE node for each replica individually. The balancing of Colocation tables is at the bucket level: all replicas within a bucket are migrated together. We adopt a simple balancing algorithm that distributes the Buckets Sequence evenly over all BEs, considering only the number of replicas and not their actual size. The specific algorithm can be found in the code comments of ColocateTableBalancer.java.
Note 1: The current Colocation replica balancing and repair algorithms may not work well for heterogeneously deployed Doris clusters. A heterogeneous deployment means that the disk capacity, disk count, or disk type (SSD and HDD) of the BE nodes is inconsistent. In that case, small BE nodes and large BE nodes may end up storing the same number of replicas.
Note 2: When a Group is in the Unstable state, the joins of the tables in it degenerate into normal joins, and the query performance of the cluster may drop significantly. If you do not want the system to balance automatically, you can set the FE configuration item disable_colocate_balance to disable automatic balancing, and re-enable it at an appropriate time (see the Advanced Operations section for details).
Query
The Colocation table is queried in the same way as ordinary tables, and users do not need to perceive Colocation attributes. If
the Group in which the Colocation table is located is in an Unstable state, it will automatically degenerate to a normal Join.
Table 1:
CREATE TABLE `tbl1` (
`k1` date NOT NULL,
`k2` int(11) NOT NULL,
...
) ENGINE=OLAP
PARTITION BY RANGE(`k1`)
(...)
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
"colocate_with" = "group1"
);
Table 2:
CREATE TABLE `tbl2` (
`k1` datetime NOT NULL,
`k2` int(11) NOT NULL,
...
) ENGINE=OLAP
DISTRIBUTED BY HASH(`k2`) BUCKETS 8
PROPERTIES (
"colocate_with" = "group1"
);
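The plan below can be obtained by explaining a join on the bucket column k2, presumably something like:
EXPLAIN SELECT * FROM tbl1 INNER JOIN tbl2 ON (tbl1.k2 = tbl2.k2);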
+----------------------------------------------------+
| Explain String |
+----------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:`tbl1`.`k1` | |
| PARTITION: RANDOM |
| |
| RESULT SINK |
| |
| 2:HASH JOIN |
| | hash predicates: |
| | colocate: true |
| | `tbl1`.`k2` = `tbl2`.`k2` |
| | tuple ids: 0 1 |
| | |
| |----1:OlapScanNode |
| | TABLE: tbl2 |
| | partitions=0/1 |
| | rollup: null |
| | buckets=0/0 |
| | cardinality=-1 |
| | avgRowSize=0.0 |
| | numNodes=0 |
| | tuple ids: 1 |
| | |
| 0:OlapScanNode |
| TABLE: tbl1 |
| partitions=0/2 |
| rollup: null |
| buckets=0/0 |
| cardinality=-1 |
| avgRowSize=0.0 |
| numNodes=0 |
| tuple ids: 0 |
+----------------------------------------------------+
If Colocation Join works, the Hash Join Node will show colocate: true .
When the Group is in an Unstable state (or Colocation Join otherwise does not take effect), the plan degenerates as follows:
+----------------------------------------------------+
| Explain String |
+----------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:`tbl1`.`k1` | |
| PARTITION: RANDOM |
| |
| RESULT SINK |
| |
| 2:HASH JOIN |
| | hash predicates: |
| | colocate: false, reason: group is not stable |
| | `tbl1`.`k2` = `tbl2`.`k2` |
| | tuple ids: 0 1 |
| | |
| |----3:EXCHANGE |
| | tuple ids: 1 |
| | |
| 0:OlapScanNode |
| TABLE: tbl1 |
| partitions=0/2 |
| rollup: null |
| buckets=0/0 |
| cardinality=-1 |
| avgRowSize=0.0 |
| numNodes=0 |
| tuple ids: 0 |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| EXCHANGE ID: 03 |
| UNPARTITIONED |
| |
| 1:OlapScanNode |
| TABLE: tbl2 |
| partitions=0/1 |
| rollup: null |
| buckets=0/0 |
| cardinality=-1 |
| avgRowSize=0.0 |
| numNodes=0 |
| tuple ids: 1 |
+----------------------------------------------------+
The HASH JOIN node displays the corresponding reason: colocate: false, reason: group is not stable . At the same time,
an EXCHANGE node will be generated.
Advanced Operations
FE Configuration Item
disable_colocate_relocate
Whether to turn off automatic Colocation replica repair in Doris. The default is false, i.e. not turned off. This parameter only affects the replica repair of Colocation tables, not normal tables.
disable_colocate_balance
Whether to turn off automatic Colocation replica balancing in Doris. The default is false, i.e. not turned off. This parameter only affects the replica balancing of Colocation tables, not normal tables.
User can set these configurations at runtime. See HELP ADMIN SHOW CONFIG; and HELP ADMIN SET CONFIG; .
disable_colocate_join
Whether to turn off the Colocation Join function or not. In 0.10 and previous versions, the default is true, that is, closed. In a
later version, it will default to false, that is, open.
use_new_tablet_scheduler
In 0.10 and previous versions, the new replica scheduling logic is incompatible with the Colocation Join function, so in 0.10
and previous versions, if disable_colocate_join = false , you need to set use_new_tablet_scheduler = false , that is, close
the new replica scheduler. In later versions, use_new_tablet_scheduler will be equal to true.
Doris provides several HTTP Restful APIs related to Colocation Join for viewing and modifying Colocation Group.
The API is implemented on the FE side and accessed using fe_host: fe_http_port . ADMIN privileges are required.
GET /api/colocate
"msg": "success",
"code": 0,
"data": {
"infos": [
],
"unstableGroupIds": [],
"allGroupIds": [{
"dbId": 10003,
"grpId": 12002
}]
},
"count": 0
Mark as Stable
```
POST /api/colocate/group_stable?db_id=10005&group_id=10008
Returns: 200
```
Mark as Unstable
```
DELETE /api/colocate/group_stable?db_id=10005&group_id=10008
Returns: 200
```
POST /api/colocate/bucketseq?db_id=10005&group_id=10008
Body:
[[10004,10002],[10003,10002],[10002,10004],[10003,10002],[10002,10004],[10003,10002],[10003,10004],
[10003,10004],[10003,10004],[10002,10004]]
Returns: 200
The body is a Buckets Sequence represented as a nested array: for each bucket, it lists the IDs of the BEs on which the replicas of that bucket are placed.
Note that using this command, you may need to set the FE configuration disable_colocate_relocate and
disable_colocate_balance to true. That is to shut down the system for automatic Colocation replica repair and
balancing. Otherwise, it may be automatically reset by the system after modification.
Runtime Filter
Runtime Filter is a new feature officially added in Doris 0.15. It is designed to dynamically generate filter conditions for certain
Join queries at runtime to reduce the amount of scanned data, avoid unnecessary I/O and network transmission, and speed
up the query.
Noun Interpretation
Left table: the table on the left during Join query. Perform Probe operation. The order can be adjusted by Join Reorder.
Right table: the table on the right during Join query. Perform the Build operation. The order can be adjusted by Join
Reorder.
Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and send them to BE
for execution. The corresponding Fragment is executed on the BE, and the results are aggregated and returned to the
FE.
Join on clause: the A.a=B.b in A join B on A.a=B.b. During query planning, join conjuncts are generated based on this clause, including the expressions used by the join Build and Probe phases; the Build expression is called the src expr of the Runtime Filter, and the Probe expression is called the target expr of the Runtime Filter.
Principle
Runtime Filter is generated during query planning, constructed in HashJoinNode, and applied in ScanNode.
For example, suppose there is a join query between table T1 and table T2, and the join mode is HashJoin. T1 is a fact table with 100,000 rows of data, and T2 is a dimension table with 2,000 rows of data. The actual Doris join process is:
|          >      HashJoinNode     <
|         |                        |
|         | 100000                 | 2000
|         |                        |
|   OlapScanNode            OlapScanNode
|         ^                        ^
|         | 100000                 | 2000
|        T1                       T2
Obviously, scanning T2 finishes much faster than scanning T1. If we deliberately wait before scanning T1, then after T2 has sent its scanned records to the HashJoinNode, the HashJoinNode can compute a filter condition based on the data of T2, such as the maximum and minimum values of T2's join column, or a Bloom Filter. It then sends this filter condition to the ScanNode that is waiting to scan T1; the ScanNode applies the filter and delivers only the filtered data to the HashJoinNode, thereby reducing the number of hash table probes and the network overhead. This filter condition is the Runtime Filter, and the effect is as follows:
|          >      HashJoinNode     <
|         |                        |
|         | 6000                   | 2000
|         |                        |
|   OlapScanNode            OlapScanNode
|         ^                        ^
|         | 100000                 | 2000
|        T1                       T2
If the filter condition (Runtime Filter) can be pushed down to the storage engine, in some cases, the index can be used to
directly reduce the amount of scanned data, thereby greatly reducing the scanning time. The effect is as follows:
|          >      HashJoinNode     <
|         |                        |
|         | 6000                   | 2000
|         |                        |
|   OlapScanNode            OlapScanNode
|         ^                        ^
|         | 6000                   | 2000
|        T1                       T2
It can be seen that, unlike predicate push-down and partition pruning, the Runtime Filter is a filter condition generated dynamically at runtime: when the query runs, the join on clause is parsed to determine the filter expression, and the expression is broadcast to the ScanNodes that are reading the left table, thereby reducing the amount of scanned data, the number of hash table probes, and unnecessary I/O and network transmission.
Runtime Filter is mainly used to optimize joins for large tables. If the amount of data in the left table is too small, or the
amount of data in the right table is too large, the Runtime Filter may not achieve the expected effect.
Usage
The first query option is to adjust the type of Runtime Filter used. In most cases, you only need to adjust this option, and
keep the other options as default.
runtime_filter_type: includes Bloom Filter, MinMax Filter, IN predicate, IN_OR_BLOOM Filter, and Bitmap Filter. By default, only the IN_OR_BLOOM Filter is used. In some cases, performance is higher when the Bloom Filter, MinMax Filter, and IN predicate are all used at the same time.
Other query options usually only need to be further adjusted in certain specific scenarios to achieve the best results.
Usually only after performance testing, optimize for resource-intensive, long enough running time and high enough
frequency queries.
runtime_filter_mode : Used to adjust the push-down strategy of Runtime Filter, including three strategies of OFF,
LOCAL, and GLOBAL. The default setting is the GLOBAL strategy
runtime_filter_wait_time_ms : the time that ScanNode in the left table waits for each Runtime Filter, the default is
1000ms
runtime_filters_max_num : The maximum number of Bloom Filters in the Runtime Filter that can be applied to each
query, the default is 10
runtime_bloom_filter_min_size : the minimum length of Bloom Filter in Runtime Filter, default 1048576 (1M)
runtime_bloom_filter_max_size : the maximum length of Bloom Filter in Runtime Filter, the default is 16777216 (16M)
runtime_bloom_filter_size : The default length of Bloom Filter in Runtime Filter, the default is 2097152 (2M)
runtime_filter_max_in_num : If the number of rows in the right table of the join is greater than this value, we will not
generate an IN predicate, the default is 102400
1.runtime_filter_type
Type of Runtime Filter used.
Type: number (1, 2, 4, 8, 16) or the corresponding mnemonic string (IN, BLOOM_FILTER, MIN_MAX, IN_OR_BLOOM_FILTER, BITMAP_FILTER). The default is 8 (IN_OR_BLOOM_FILTER). Multiple types are separated by commas; note that quotation marks are required. Alternatively, you can add up any combination of the numbers. For example:
set runtime_filter_type="BLOOM_FILTER,IN,MIN_MAX";
Equivalent to:
set runtime_filter_type=7;
IN or Bloom Filter: According to the actual number of rows in the right table during execution, the system automatically
determines whether to use IN predicate or Bloom Filter.
By default, the IN predicate is used when the number of data rows in the right table is less than 102400 (adjustable via the session variable runtime_filter_max_in_num); otherwise, the Bloom Filter is used.
Bloom Filter: there is a certain false positive rate, which results in slightly less data being filtered than expected, but it will not make the final result inaccurate. In most cases, the Bloom Filter improves performance or has no significant impact, but in some cases it causes performance degradation:
Bloom Filter construction and application overhead is high, so when the filtering rate is low, or the amount of data in the left table is small, the Bloom Filter may cause performance degradation.
At present, the Bloom Filter can be pushed down to the storage engine only when it is applied to a Key column of the left table, and test results show that performance often degrades when the Bloom Filter is not pushed down to the storage engine.
Currently the Bloom Filter only has short-circuit logic when it is used as an expression filter on the ScanNode, i.e. when the false positive rate is too high the Bloom Filter stops being used; there is no such short-circuit logic when the Bloom Filter is pushed down to the storage engine, so when the filtering rate is low it may cause performance degradation.
MinMax Filter: contains the maximum and minimum values, filtering out data smaller than the minimum or larger than the maximum. The filtering effect of the MinMax Filter depends on the type of the Key column in the join on clause and the data distribution of the left and right tables.
When the type of the Key column in the join on clause is int/bigint/double, etc.: in the extreme case where the maximum and minimum values of the left and right tables are the same, there is no effect; conversely, when the maximum value of the right table is smaller than the minimum value of the left table, or the minimum value of the right table is greater than the maximum value of the left table, the effect is best.
When the type of the Key column in the join on clause is varchar, etc., applying the MinMax Filter often causes performance degradation.
IN predicate: an IN predicate is built from all the values of the Key column listed in the join on clause on the right table, and the constructed IN predicate is used to filter the left table. Compared with the Bloom Filter, its construction and application cost is lower, and it tends to perform better when the amount of data in the right table is small.
Bitmap Filter:
Currently, the bitmap filter is used only when the subquery in the in subquery operation returns a bitmap column.
Currently, bitmap filter is only supported in vectorization engine.
2.runtime_filter_mode
Used to control the transmission range of Runtime Filter between instances.
Type: Number (0, 1, 2) or corresponding mnemonic string (OFF, LOCAL, GLOBAL), default 2 (GLOBAL).
LOCAL: relatively conservative. The constructed Runtime Filter can only be used within the same Fragment on the same instance (the smallest unit of query execution), i.e. the Runtime Filter producer (the HashJoinNode that builds the filter) and the consumer (the ScanNode that uses the filter) are in the same Fragment, such as the common case of broadcast join.
GLOBAL: relatively aggressive. In addition to the scenarios covered by the LOCAL strategy, the Runtime Filter can also be merged and transmitted over the network to different Fragments on different instances, e.g. when the Runtime Filter producer and consumer are in different Fragments, such as shuffle join.
In most cases, the GLOBAL strategy can optimize queries in a wider range of scenarios, but in some shuffle joins, the cost of
generating and merging Runtime Filters exceeds the performance advantage brought to the query, and you can consider
changing to the LOCAL strategy.
If the join query involved in the cluster does not improve performance due to Runtime Filter, you can change the setting to
OFF to completely turn off the function.
When building and applying Runtime Filters on different Fragments, the reasons and strategies for merging Runtime Filters
can be found in ISSUE 6116
3.runtime_filter_wait_time_ms
The time to wait for the Runtime Filter.
After the Runtime Filter is turned on, the ScanNode in the table on the left will wait for a period of time for each Runtime
Filter assigned to itself before scanning the data, that is, if the ScanNode is assigned 3 Runtime Filters, it will wait at most
3000ms.
Because it takes time to build and merge the Runtime Filter, ScanNode will try to push down the Runtime Filter that arrives
within the waiting time to the storage engine. If the waiting time is exceeded, ScanNode will directly start scanning data
using the Runtime Filter that has arrived.
If the Runtime Filter arrives after the ScanNode has started scanning, the ScanNode will not push it down to the storage engine. Instead, the Runtime Filter is applied as an expression filter on the ScanNode for the data scanned from the storage engine afterwards; the data that has already been scanned will not have the Runtime Filter applied, so the intermediate data size will be larger than the optimal solution, but severe degradation can be avoided.
If the cluster is busy and there are many resource-intensive or long-time-consuming queries on the cluster, consider
increasing the waiting time to avoid missing optimization opportunities for complex queries. If the cluster load is light, and
there are many small queries on the cluster that only take a few seconds, you can consider reducing the waiting time to
avoid an increase of 1s for each query.
4.runtime_filters_max_num
The upper limit of the number of Bloom Filters in the Runtime Filter generated by each query.
If the number of Bloom Filters generated exceeds the maximum allowed number, the Bloom Filters with large selectivity are retained; large selectivity means that more rows are expected to be filtered. This setting prevents Bloom Filters from consuming too much memory and causing potential problems.
Because the cardinality statistics in FE are currently inaccurate, the selectivity computed for the Bloom Filters here is inaccurate, so in the end it may just randomly keep some of the Bloom Filters.
This query option needs to be adjusted only when tuning some long-consuming queries involving joins between large tables.
5. runtime_bloom_filter_size, runtime_bloom_filter_min_size, runtime_bloom_filter_max_size
Type: integer
If the number of data rows (cardinality) of the join's right table is available in the statistics, the optimal size of the Bloom Filter is estimated based on the cardinality and rounded to the nearest power of 2 (log base 2). If the cardinality of the right table cannot be obtained, the default Bloom Filter length runtime_bloom_filter_size is used. runtime_bloom_filter_min_size and runtime_bloom_filter_max_size limit the minimum and maximum length of the final Bloom Filter.
Larger Bloom Filters are more effective when processing high-cardinality input sets, but require more memory. If the query
needs to filter high cardinality columns (for example, containing millions of different values), you can consider increasing the
value of runtime_bloom_filter_size for some benchmark tests, which will help make the Bloom Filter filter more accurate,
so as to obtain the expected Performance improvement.
The effectiveness of a Bloom Filter depends on the data distribution of the query, so the Bloom Filter length is usually adjusted only for specific queries rather than changed globally. Generally, this query option only needs to be adjusted when tuning long-running queries that join large tables.
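As a sketch, the Bloom Filter length can be adjusted per session before such a benchmark (the value below is illustrative and must stay within the min/max limits described above):

-- Illustrative value, in bytes
SET runtime_bloom_filter_size = 4194304;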
The Fragment that generates a Runtime Filter contains a comment such as runtime filters: filter_id[type] <- table.column .
The Fragment that uses a Runtime Filter contains a comment such as runtime filters: filter_id[type] -> table.column .
The query in the following example uses a Runtime Filter with ID RF000.
CREATE TABLE test (t1 INT) DISTRIBUTED BY HASH (t1) BUCKETS 2 PROPERTIES("replication_num" = "1");
CREATE TABLE test2 (t2 INT) DISTRIBUTED BY HASH (t2) BUCKETS 2 PROPERTIES("replication_num" = "1");
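A join of the two tables, such as the following (this EXPLAIN statement is an assumed example, consistent with the plan shown below), produces a plan that carries the Runtime Filter comments:

EXPLAIN SELECT t1 FROM test JOIN test2 WHERE test.t1 = test2.t2;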
+-------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:`t1` |
| |
| 4:EXCHANGE |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| |
| 2:HASH JOIN |
| | |
| |----3:EXCHANGE |
| | |
| 0:OlapScanNode |
| TABLE: test |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| |
| 1:OlapScanNode |
| TABLE: test2 |
+-------------------------------------------------------------------+
-- The line of `runtime filters` above shows that `2:HASH JOIN` of `PLAN FRAGMENT 1` generates IN predicate with
ID RF000,
-- Among them, the key values of `test2`.`t2` are only known at runtime,
-- This IN predicate is used in `0:OlapScanNode` to filter unnecessary data when reading `test`.`t1`.
-- Through the query profile (set enable_profile=true;) you can view the detailed information of the internal
work of the query,
-- and the total time from prepare to receiving Runtime Filter for OLAP_SCAN_NODE.
RuntimeFilter:in:
- HasPushDownToEngine: true
- AWaitTimeCost: 0ns
- EffectTimeCost: 2.76ms
-- In addition, in the OLAP_SCAN_NODE of the profile, you can also view the filtering effect
- VectorPredEvalTime: 364.39ms
Hash Join: builds a hash table on the right table based on the equi-join columns, and the left table probes the hash table in a streaming manner to perform the join. Its limitation is that it can only be applied to equi-joins.
Nest Loop Join: implemented with two nested loops, which is very intuitive. It is applicable to non-equi joins, such as greater-than or less-than conditions, or when a Cartesian product is needed. It is a general-purpose join operator, but its performance is poor.
As a distributed MPP database, data shuffle needs to be performed during the Join process. Data needs to be split and
scheduled to ensure that the final Join result is correct. As a simple example, assume that the relationship S and R are joined,
and N represents the number of nodes participating in the join calculation; T represents the number of tuples in the
relationship.
1. BroadCast Join
It requires the full data of the right table to be sent to the left table; that is, every node participating in the join holds the full data of the right table, i.e. T(R).
Its applicable scenarios are the most general: it supports both Hash Join and Nest Loop Join, and its network overhead is N * T(R).
The data of the left table is not moved, and the data of the right table is sent to the scan nodes of the left table.
2. Shuffle Join
When a Hash Join is performed, the hash value of the join columns can be computed and the data can be hash-bucketed accordingly.
Its network overhead is T(S) + T(R), but it can only support Hash Join, because the data is bucketed according to the join condition.
The data of both the left and right tables is hashed on the join columns and sent to the corresponding partition nodes.
3. Bucket Shuffle Join
Doris's table data is itself bucketed by hash, so the bucket columns of the table can be used to shuffle the data for the join. If two tables need to be joined and the join column is the bucket column of the left table, the data of the left table does not need to move: the data of the right table is sent into the left table's buckets according to the left table's bucketing.
Its network overhead is T(R), which is equivalent to shuffling only the data of the right table.
The data of the left table does not move, and the data of the right table is sent to the nodes that scan the left table according to the result of the bucket calculation.
4. Colocation
It is similar to Bucket Shuffle Join, which means that the data has been shuffled according to the preset Join column
scenario when data is imported. Then the join calculation can be performed directly without considering the Shuffle
problem of the data during the actual query.
The data has been pre-partitioned, and the Join calculation is performed directly locally
Bucket Shuffle: network overhead T(R), supports Hash Join; applicable when the join condition contains the distributed (bucket) columns of the left table and the left table is executed as a single partition.
Colocate: network overhead 0, supports Hash Join; applicable when the join condition contains the distributed columns of the left table and the left and right tables belong to the same Colocate Group.
The flexibility of the above four methods decreases from top to bottom: their requirements on data distribution become stricter and stricter, but the performance of the join calculation gets better and better.
If the left table is large and the right table is small, then by using the filter condition generated from the right table, most of the data that would otherwise be filtered at the Join layer can be filtered out in advance when the data is read, which greatly improves the performance of join queries.
The first type is IN-IN, which is easy to understand: a hash set is pushed down to the data scan node.
The second is BloomFilter: the data of the hash table is used to construct a BloomFilter, which is then pushed down to the scan node.
The last one is MinMax, which is a range: after the range is determined from the data of the right table, it is pushed down to the data scan node.
There are two requirements for the applicable scenarios of Runtime Filter:
The first requirement is that the left table is large and the right table is small, because building a Runtime Filter incurs computational cost, including some memory overhead.
The second requirement is that the join of the left and right tables produces few results, indicating that the join can filter out most of the data in the left table.
When both conditions are met, turning on the Runtime Filter can achieve better results.
When the join column is a Key column of the left table, the Runtime Filter will be pushed down to the storage engine. Doris itself supports late materialization.
Late materialization works roughly like this: if you need to scan three columns A, B, and C, and there is a filter condition on column A, say A = 2, then to scan 100 rows you can scan the 100 rows of column A first, filter them with the condition A = 2, and only then read columns B and C for the rows that survive, which can greatly reduce read IO. Therefore, if the Runtime Filter is generated on a Key column, Doris's late materialization can be used to further improve query performance.
Join Reorder
Once a query involves a multi-table join, the join order has a great impact on the performance of the entire join query.
Assume three tables are joined. Referring to the picture below, on the left the a table and the b table are joined first, producing an intermediate result of 2000 rows, which is then joined with the c table.
Now look at the picture on the right and adjust the join order: join the a table with the c table first, which produces an intermediate result of only 100 rows, and then join with the b table. The final join result is the same, but the intermediate results differ by 20x, which leads to a big performance difference.
Doris currently supports a rule-based Join Reorder algorithm. Its logic is:
Join large tables with small tables as much as possible, so that the intermediate results generated are as small as possible.
Put tables with filter conditions forward, that is, filter them as early as possible.
Hash Join has higher priority than Nest Loop Join, because Hash Join itself is much faster than Nest Loop Join.
Use the Profile provided by Doris to locate the bottleneck of the query. The Profile records all kinds of information about the whole query in Doris and is first-hand information for performance tuning.
Understand the Join mechanism of Doris, which is also the content shared in the second part. Only by knowing why and understanding the mechanism can we analyze why a join is slow.
Use session variables to change some Join behaviors, so as to tune the Join.
Check the Query Plan to analyze whether the tuning is effective.
The above 4 steps basically complete a standard Join tuning process; what remains is to actually run the query and verify the effect.
If the first 4 steps still do not solve the problem, it may be necessary to rewrite the Join statement or adjust the data distribution. You need to recheck whether the entire data distribution is reasonable, including the Join statement itself, and some manual adjustments may be required. Of course, this method has a relatively high mental cost: further analysis is only needed when the previous methods do not work.
Case 1
A four-table join query, through Profile, found that the second join took a long time, taking 14 seconds.
After further analysis of Profile, it is found that BuildRows, that is, the data volume of the right table is about 25 million. And
ProbeRows (ProbeRows is the amount of data in the left table) is only more than 10,000. In this scenario, the right table is
much larger than the left table, which is obviously an unreasonable situation. This obviously shows that there is some
problem with the order of Join. At this time, try to change the Session variable and enable Join Reorder.
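A sketch of the session-variable change (the variable name disable_join_reorder is taken from Doris's session variable list and is an assumption for this example):

-- Join Reorder is controlled by a "disable" switch; setting it to false enables reordering
SET disable_join_reorder = false;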
This time, the time has been reduced from 14 seconds to 4 seconds, and the performance has been improved by more than 3
times.
At this time, when checking the profile again, the order of the left and right tables has been adjusted correctly, that is, the
right table is a large table, and the left table is a small table. The overhead of building a hash table based on a small table is
very small. This is a typical scenario of using Join Reorder to improve Join performance.
Case 2
There is a slow query. After viewing the Profile, the entire Join node takes about 44 seconds. Its right table has 10 million, the
left table has 60 million, and the final returned result is only 60 million.
It can be roughly estimated that the filtering rate is very high, so why does the Runtime Filter not take effect? Check it out
through Query Plan and find that it only enables the Runtime Filter of IN.
When the right table exceeds 1024 rows, IN will not take effect, so there is no filtering effect at all, so try to adjust the type of
RuntimeFilter.
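A sketch of switching the Runtime Filter type for such a query (runtime_filter_type is a Doris session variable; the exact value set in this case is assumed):

-- Use a Bloom Filter instead of the IN predicate
SET runtime_filter_type = "BLOOM_FILTER";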
This is changed to BloomFilter, and the 60 million pieces of data in the left table have filtered 59 million pieces. Basically, 99%
of the data is filtered out, and this effect is very significant. The query has also dropped from the original 44 seconds to 13
seconds, and the performance has been improved by about three times.
Case 3
The following is a relatively extreme case, which cannot be solved by tuning some environment variables, because it involves
SQL Rewrite, so the original SQL is listed here.
else 0
where
l_partkey = p_partkey
This Join query is very simple, a simple join of left and right tables. Of course, there are some filter conditions on it. When I
opened the Profile, I found that the entire query Hash Join was executed for more than three minutes. It is a BroadCast Join,
and its right table has 200 million entries, while the left table has only 700,000. In this case, it is unreasonable to choose
Broadcast Join, which is equivalent to making a Hash Table of 200 million records, and then traversing the Hash Table of 200
million records with 700,000 records, which is obviously unreasonable.
Why is there an unreasonable Join order? In fact, the left table is a large table with a level of 1 billion records. Two filter
conditions are added to it. After adding these two filter conditions, there are 700,000 records of 1 billion records. But Doris
currently does not have a good framework for collecting statistics, so it does not know what the filtering rate of this filter
condition is. Therefore, when the join order is arranged, the wrong left and right table order of the join is selected, resulting in
extremely low performance.
The following figure shows the SQL statement after the rewrite. A Join Hint is added after the Join keyword: a pair of square brackets containing the required join method. Here, Shuffle Join is selected. In the actual query plan on the right you can see that the data is indeed partitioned. The query that originally took 3 minutes takes only 7 seconds after the rewrite, a significant performance improvement.
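A sketch of the Join Hint syntax used in such a rewrite (the table names lineitem and part are assumptions, chosen only to match the l_partkey/p_partkey columns in the snippet above):

-- Force a Shuffle (partitioned) join instead of Broadcast
SELECT count(*) FROM lineitem JOIN [shuffle] part ON l_partkey = p_partkey;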
The first point: when doing a Join, try to choose columns of the same type or of simple types. Using the same type avoids data casts, and simple types themselves make the join calculation fast.
The second point: try to choose the Key column for Join. The reason is also introduced in the Runtime Filter. The Key
column can play a better effect on delayed materialization.
The third point: Join between large tables, try to make it Co-location, because the network overhead between large
tables is very large, if you need to do Shuffle, the cost is very high.
Fourth point: use Runtime Filter reasonably. It is very effective in scenarios with a high join filtering rate, but it is not a panacea and has certain side effects, so it needs to be switched on or off at the granularity of specific SQL statements.
Finally: When it comes to multi-table Join, it is necessary to judge the rationality of Join. Try to ensure that the left table is
a large table and the right table is a small table, and then Hash Join will be better than Nest Loop Join. If necessary, you
can use SQL Rewrite to adjust the order of Join using Hint.
Pipeline execution engine is an experimental feature added in Doris 2.0. The goal is to replace Doris's current volcano-model execution engine, fully release the computing power of multi-core CPUs, and limit the number of query threads to solve the problem of execution thread bloat.
Its specific design, implementation and effects can be found in DSIP-027: Support Pipeline Exec Engine.
Principle
The current Doris SQL execution engine is designed based on the traditional volcano model, which has the following problems on a single node with multiple cores:
Inability to take full advantage of multi-core computing power to improve query performance; most scenarios require manually setting the parallelism for performance tuning, which is almost impossible to get right in production environments.
Each instance of a query on a single node corresponds to one thread of the thread pool, which introduces two additional problems:
a. Once the thread pool is full, Doris's query engine enters a pseudo-deadlock and stops responding to subsequent queries. At the same time, there is a certain probability of entering a logical deadlock, for example when all threads are executing the probe task of some instance.
b. Blocking operators occupy thread resources, and blocked threads cannot yield their resources to instances that could be scheduled, so overall resource utilization does not go up.
Blocking operators rely on the OS thread scheduling mechanism, and the thread switching overhead is high (especially in mixed workload scenarios).
This set of problems drove Doris to implement an execution engine adapted to the architecture of modern multi-core CPUs. As shown in the figure below (quoted from Push versus pull-based loop fusion in query engines), the pipeline execution engine:
1. Transforms the traditional pull-based, logic-driven execution process into a data-driven, push-model execution engine.
2. Makes blocking operations asynchronous, reducing the execution overhead caused by thread switching and thread blocking and making more efficient use of the CPU.
3. Controls the number of execution threads and, by controlling time slice switching, reduces the resource squeeze that large queries put on small queries in mixed-load scenarios.
This improves the efficiency of CPU execution on mixed-load SQL and enhances the performance of SQL queries.
Usage
enable_pipeline_engine
Setting the session variable enable_pipeline_engine to true switches query execution from the non-Pipeline execution engine to the Pipeline execution engine.
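For example, for the current session:

SET enable_pipeline_engine = true;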
parallel_fragment_exec_instance_num
parallel_fragment_exec_instance_num represents the number of instances a SQL query will run concurrently; Doris defaults to 1. In the non-Pipeline execution engine this directly increases the number of query threads, whereas the Pipeline execution engine does not suffer from thread inflation. The recommended value is 16, but users can adjust it to suit their own queries.
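For example (16 is the recommendation mentioned above, not a universal setting):

SET parallel_fragment_exec_instance_num = 16;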
Materialized view
A materialized view is a data set that is pre-calculated (according to a defined SELECT statement) and stored in a special table
in Doris.
Materialized views exist mainly to let users analyze the original detail data on arbitrary dimensions while also being able to quickly analyze and query on fixed dimensions.
Advantage
For those queries that frequently use the same sub-query results repeatedly, the performance is greatly improved
Doris automatically maintains the data of the materialized view, whether it is a new import or delete operation, it can
ensure the data consistency of the base table and the materialized view table. No need for any additional labor
maintenance costs.
When querying, it will automatically match the optimal materialized view and read data directly from the materialized
view.
Automatic maintenance of materialized view data will cause some maintenance overhead, which will be explained in the
limitations of materialized views later.
Materialized views cover the functions of Rollup while also supporting richer aggregate functions. So the materialized view is
actually a superset of Rollup.
In other words, the functions previously supported by the ALTER TABLE ADD ROLLUP syntax can now be implemented by
CREATE MATERIALIZED VIEW .
First, if one materialized view can be matched by multiple queries, that materialized view works best, because maintaining a materialized view itself also consumes resources.
If a materialized view only fits a particular query and no other query uses it, the materialized view is not cost-effective: it occupies storage resources of the cluster but cannot serve more queries.
Therefore, users need to combine their own query statements and data dimension information to abstract the definition of
some materialized views.
The second point is that in the actual analysis query, not all dimensional analysis will be covered. Therefore, it is enough to
create a materialized view for the commonly used combination of dimensions, so as to achieve a space and time balance.
Creating a materialized view is an asynchronous operation, which means that after the user successfully submits the creation
task, Doris will calculate the existing data in the background until the creation is successful.
In Doris 2.0 we made some enhancements to materialized views (described in Best Practice 4 of this article). We
recommend that users check whether the expected query can hit the desired materialized view in the test environment
before using the materialized view in the official production environment.
If you don't know how to verify that a query hits a materialized view, you can read Best Practice 1 of this article.
At the same time, we do not recommend that users create multiple materialized views with similar shapes on the same
table, as this may cause conflicts between multiple materialized views and cause query hit failures. (Of course, these possible
problems can be verified in the test environment)
Update strategy
To ensure data consistency between the materialized view table and the Base table, Doris synchronizes operations such as imports and deletes on the Base table to the materialized view table, uses incremental updates to improve update efficiency, and guarantees atomicity through transactions.
For example, if the user inserts data into the base table through the INSERT command, this data will be inserted into the
materialized view synchronously. When both the base table and the materialized view table are written successfully, the
INSERT command will return successfully.
Users can use the EXPLAIN command to check whether the current query uses a materialized view.
The matching relationship between the aggregation in the materialized view and the aggregation in the query:
sum -> sum
min -> min
max -> max
count -> count
After the aggregation functions of bitmap and hll match the materialized view in the query, the aggregation operator of the
query will be rewritten according to the table structure of the materialized view. See example 2 for details.
You can see that the current mv_test table has three materialized views: mv_1, mv_2 and mv_3, and their table structure.
Best Practice 1
The use of materialized views is generally divided into the following steps:
Assume that the user has a sales record list, which stores the transaction id, salesperson, sales store, sales time, and amount
of each transaction. The table building statement and insert data statement is:
create table sales_records(record_id int, seller_id int, store_id int, sale_date date, sale_amt bigint)
distributed by hash(record_id) properties("replication_num" = "1");
At this time, if the user often performs an analysis query on the sales volume of different stores, you can create a materialized
view for the sales_records table to group the sales stores and sum the sales of the same sales stores. The creation
statement is as follows:
MySQL [test]> create materialized view store_amt as select store_id, sum(sale_amt) from sales_records group by
store_id;
The backend returns to the following figure, indicating that the task of creating a materialized view is submitted successfully.
Since the creation of a materialized view is an asynchronous operation, after submitting the creation task the user needs to check, via a command, whether the materialized view has finished building. The command is as follows:
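A sketch of the command (Doris provides the SHOW ALTER TABLE MATERIALIZED VIEW statement for this; older versions use SHOW ALTER TABLE ROLLUP for the same purpose):

SHOW ALTER TABLE MATERIALIZED VIEW FROM db_name;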
In this command, db_name is a parameter, you need to replace it with your real db name. The result of the command is to
display all the tasks of creating a materialized view of this db. The results are as follows:
+-------+---------------+---------------------+---------------------+---------------+-----------+-------+------+----------+-+------+-------+
| 22036 | sales_records | 2020-07-30 20:04:28 | 2020-07-30 20:04:57 | sales_records | store_amt | 22037 | 5008 | FINISHED | | NULL | 86400 |
+-------+---------------+---------------------+---------------------+---------------+-----------+-------+------+----------+-+------+-------+
Among them, TableName refers to which table the data of the materialized view comes from, and RollupIndexName refers
to the name of the materialized view. One of the more important indicators is State.
When the State of the task of creating a materialized view has become FINISHED, it means that the materialized view has
been created successfully. This means that it is possible to automatically match this materialized view when querying.
Step 3: Query
After the materialized view is created, when users query the sales volume of different stores, they will directly read the
aggregated data from the materialized view store_amt just created. To achieve the effect of improving query efficiency.
The user's query still specifies the query sales_records table, for example:
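A sketch of such a query, consistent with the materialized view defined above (the exact example is assumed):

SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;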
The above query will automatically match store_amt . The user can use the following command to check whether the
current query matches the appropriate materialized view.
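A sketch of that check using EXPLAIN on the same query (the statement is assumed; the plan below is the kind of output it returns):

EXPLAIN SELECT store_id, sum(sale_amt) FROM sales_records GROUP BY store_id;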
+----------------------------------------------------------------------------------------------+
| Explain String |
+----------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| PARTITION: UNPARTITIONED |
| |
| VRESULT SINK |
| |
| 4:VEXCHANGE |
| offset: 0 |
| |
| PLAN FRAGMENT 1 |
| |
| |
| EXCHANGE ID: 04 |
| UNPARTITIONED |
| |
| | cardinality=-1 |
| | |
| 2:VEXCHANGE |
| offset: 0 |
| |
| PLAN FRAGMENT 2 |
| |
| |
| EXCHANGE ID: 02 |
| |
| | STREAMING |
| | output: sum(`default_cluster:test`.`sales_records`.`mva_SUM__`sale_amt``) |
| | cardinality=-1 |
| | |
| 0:VOlapScanNode |
+----------------------------------------------------------------------------------------------+
From the bottom test.sales_records(store_amt) , it can be shown that this query hits the store_amt materialized view. It is
worth noting that if there is no data in the table, then the materialized view may not be hit.
Assuming that the user's original ad click data is stored in Doris, then for ad PV and UV queries, the query speed can be
improved by creating a materialized view of bitmap_union .
Use the following statement to first create a table that stores the advertisement click detail data, including the time of each click, which advertisement was clicked, through which channel, and which user clicked it.
MySQL [test]> create table advertiser_view_record(time date, advertiser varchar(10), channel varchar(10), user_id
int) distributed by hash(time) properties("replication_num" = "1");
Since the user wants to query the UV value of the advertisement, that is, a precise de-duplication of users of the same
advertisement is required, the user's query is generally:
SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
For this kind of UV-seeking scene, we can create a materialized view with bitmap_union to achieve a precise
deduplication effect in advance.
In Doris, the result of count(distinct) aggregation is exactly the same as the result of bitmap_union_count aggregation, and bitmap_union_count is equal to counting the result of bitmap_union. Therefore, if a query involves count(distinct), you can speed it up by creating a materialized view with bitmap_union aggregation.
For this case, you can create a materialized view that accurately deduplicates user_id, grouped by advertiser and channel.
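A sketch of the materialized-view definition described here (the view name advertiser_uv matches the name referenced below; the statement itself is assumed):

create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel;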
Note: Because user_id itself is an INT type, in Doris the field needs to be converted to the bitmap type through the function to_bitmap first, and only then can bitmap_union aggregation be applied.
After the creation is complete, the table structures of the advertisement click detail table and the materialized view table are as follows:
(table structure output omitted)
When the materialized view table is created, when querying the advertisement UV, Doris will automatically query the
data from the materialized view advertiser_uv just created. For example, the original query statement is as follows:
SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY advertiser, channel;
After the materialized view is selected, the actual query will be transformed into:
SELECT advertiser, channel, bitmap_union_count(to_bitmap(user_id)) FROM advertiser_uv GROUP BY advertiser,
channel;
Through the EXPLAIN command, you can check whether Doris matches the materialized view:
mysql [test]>explain SELECT advertiser, channel, count(distinct user_id) FROM advertiser_view_record GROUP BY
advertiser, channel;
+--------------------------------------------------------------------------------------------------------------
---------------------------------------------------------+
| Explain String
|
+--------------------------------------------------------------------------------------------------------------
---------------------------------------------------------+
| PLAN FRAGMENT 0
|
| OUTPUT EXPRS:
|
| <slot 11>
bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mva_BITMAP_UNION__to_bitmap_with_check(`user
|
| PARTITION: UNPARTITIONED
|
|
|
| VRESULT SINK
|
|
|
| 4:VEXCHANGE
|
| offset: 0
|
|
|
| PLAN FRAGMENT 1
|
|
|
|
|
| EXCHANGE ID: 04
|
| UNPARTITIONED
|
|
|
| | cardinality=-1
|
| |
|
| 2:VEXCHANGE
|
| offset: 0
|
|
|
| PLAN FRAGMENT 2
|
|
|
|
|
| EXCHANGE ID: 02
|
|
|
| | STREAMING
|
| | output:
bitmap_union_count(`default_cluster:test`.`advertiser_view_record`.`mva_BITMAP_UNION__to_bitmap_with_check(`user
|
| | cardinality=-1
|
| |
|
| 0:VOlapScanNode
|
+--------------------------------------------------------------------------------------------------------------
---------------------------------------------------------+
In the result of EXPLAIN, you can first see that VOlapScanNode hits advertiser_uv . That is, the query scans the
materialized view's data directly. Indicates that the match is successful.
Secondly, the calculation of count(distinct) for the user_id field is rewritten as bitmap_union_count . That is to achieve
the effect of precise deduplication through bitmap.
Best Practice 3
Business scenario: matching a richer prefix index
The user's original table has three columns (k1, k2, k3). Among them, k1, k2 are prefix index columns. At this time, if the user
query condition contains where k1=a and k2=b , the query can be accelerated through the index.
But in some cases, the user's filter conditions cannot match the prefix index, such as where k3=c . Then the query speed
cannot be improved through the index.
This problem can be solved by creating a materialized view with k3 as the first column.
CREATE MATERIALIZED VIEW mv_1 as SELECT k3, k2, k1 FROM tableA ORDER BY k3;
After the above statement completes, the materialized view retains the complete detail data, and the prefix index of the materialized view is the k3 column.
(table structure output omitted)
2. Query matching
Now suppose the user's query carries a filter condition on the k3 column.
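A sketch of such a query (the filter value is assumed for illustration):

SELECT k1, k2, k3 FROM tableA WHERE k3 = 1;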
At this time, the query will read data directly from the mv_1 materialized view just created. The materialized view has a
prefix index on k3, and query efficiency will also be improved.
Best Practice 4
Since Version 2.0.0
In Doris 2.0 , we have made some enhancements to the expressions supported by the materialized view. This example will
mainly reflect the support for various expressions of the new version of the materialized view.
k1 int null,
k3 bigint null,
k4 varchar(100) null
properties("replication_num" = "1");
3. Use some queries to test if the materialized view was successfully hit.
Limitations
1. The parameter of an aggregate function in a materialized view only supports a single column and does not support expressions; for example, sum(a+b) is not supported. (Supported since 2.0.)
2. If the conditional column of the delete statement does not exist in the materialized view, the delete operation cannot be
performed. If you must delete data, you need to delete the materialized view before deleting the data.
3. Too many materialized views on a single table will affect import efficiency: when importing data, the materialized views and the base table are updated synchronously. If a table has more than 10 materialized views, the import speed may become very slow; it is equivalent to a single import having to write 10 tables at the same time.
4. The same column with different aggregate functions cannot appear in a materialized view at the same time. For example, select sum(a), min(a) from table is not supported. (Supported since 2.0.)
5. For the Unique Key data model, the materialized view can only change the column order and cannot play the role of
aggregation. Therefore, in the Unique Key model, it is not possible to perform coarse-grained aggregation operations on
the data by creating a materialized view.
Error
1. DATA_QUALITY_ERROR: "The data quality does not satisfy, please check your data"
Materialized view creation failed due to data quality issues or because Schema Change memory usage exceeded the limit. If it is a memory problem, increase the memory_limitation_per_thread_for_schema_change_bytes parameter.
Note: The bitmap type only supports positive integers. If there are negative numbers in the original data, the materialized view will fail to be created.
More Help
For more detailed syntax and best practices for using materialized views, see the CREATE MATERIALIZED VIEW and DROP MATERIALIZED VIEW command manuals. You can also enter HELP CREATE MATERIALIZED VIEW and HELP DROP MATERIALIZED VIEW at the command line of the MySQL client for more help.
Broker
Broker is an optional process in the Doris cluster. It is mainly used to support Doris to read and write files or directories on
remote storage. Now support:
Apache HDFS
Aliyun OSS
Tencent Cloud CHDFS
Huawei Cloud OBS (since 1.2.0)
Amazon S3
JuiceFS (since 2.0.0)
Broker provides services through an RPC service port. It is a stateless JVM process responsible for encapsulating POSIX-like file operations, such as open, pread, and pwrite, for reading and writing files on remote storage.
In addition, the Broker does not record any other information, so the connection information, file information, permission information, and so on of the remote storage need to be passed to the Broker process through parameters in the RPC call in order for the Broker to read and write files correctly.
Broker only acts as a data channel and does not participate in any calculation, so it takes up little memory. Usually one or more Broker processes are deployed in a Doris system, and Brokers of the same type form a group identified by a Broker name.
+----+ +----+
| FE | | BE |
+-^--+ +--^-+
| |
| |
+-v---------v-+
| Broker |
+------^------+
+------v------+
|HDFS/BOS/AFS |
+-------------+
This document mainly introduces the parameters that Broker needs when accessing different remote storages, such as
connection information,
authorization information, and so on.
1. Community HDFS
Broker Information
Broker information includes two parts: the Broker name and the certification information. The general syntax is as follows:
WITH BROKER "broker_name"
(
    "username" = "xxx",
    "password" = "yyy",
    "other_prop" = "prop_value",
    ...
);
Broker Name
Usually the user needs to specify an existing Broker name through the WITH BROKER "broker_name" clause in the operation command.
The Broker name is the name the user specifies when adding a Broker process through the ALTER SYSTEM ADD BROKER command.
A name usually corresponds to one or more Broker processes, and Doris selects an available Broker process based on the name.
You can use the SHOW BROKER command to view the Brokers that currently exist in the cluster.
Note: Broker Name is just a user-defined name and does not represent the type of Broker.
Certification Information
Different broker types and different access methods require different authentication information. The authentication information is usually provided as key-value pairs in the property map after WITH BROKER "broker_name".
Community HDFS
1. Simple Authentication
Simple authentication means accessing HDFS as a system user, or setting HADOOP_USER_NAME in the environment variables used to start the Broker.
(
    "username" = "user",
    "password" = ""
);
2. Kerberos Authentication
kerberos_keytab : Specify the path to the keytab file for kerberos. The file must be an absolute path to a file on the
server where the broker process is located. And can be accessed by the Broker process.
kerberos_keytab_content : Specify the content of the keytab file in kerberos after base64 encoding. You can choose
one of these with kerberos_keytab configuration.
"hadoop.security.authentication" = "kerberos",
"kerberos_principal" = "[email protected]",
"kerberos_keytab" = "/home/doris/my.keytab"
"hadoop.security.authentication" = "kerberos",
"kerberos_principal" = "[email protected]",
"kerberos_keytab_content" = "ASDOWHDLAWIDJHWLDKSALDJSDIWALD"
If Kerberos authentication is used, the krb5.conf file is required when deploying the Broker process. The krb5.conf file contains Kerberos configuration information. Normally, you should install your krb5.conf file in the directory /etc; you can override the default location by setting the environment variable KRB5_CONFIG. An example of the contents of the krb5.conf file is as follows:
[libdefaults]
default_realm = DORIS.HADOOP
dns_lookup_kdc = true
dns_lookup_realm = false
[realms]
DORIS.HADOOP = {
  kdc = kerberos-doris.hadoop.service:7005
}
3. HDFS HA Mode
dfs.nameservices: specifies the name of the HDFS service, user-defined, for example "dfs.nameservices" = "my_ha".
dfs.ha.namenodes.xxx: custom namenode names, separated by commas, where xxx is the custom name from dfs.nameservices, for example "dfs.ha.namenodes.my_ha" = "my_nn".
dfs.namenode.rpc-address.xxx.nn: specifies the RPC address of a namenode, where nn is one of the namenode names configured in dfs.ha.namenodes.xxx, for example "dfs.namenode.rpc-address.my_ha.my_nn" = "host:port".
dfs.client.failover.proxy.provider: specifies the provider used by the client to connect to the namenode. The default is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
"dfs.nameservices" = "my_ha",
"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
"dfs.client.failover.proxy.provider" =
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
The HA mode can be combined with the previous two authentication methods for cluster access. If you access HA
HDFS with simple authentication:
"username"="user",
"password"="passwd",
"dfs.nameservices" = "my_ha",
"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
"dfs.client.failover.proxy.provider" =
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
The configuration for accessing the HDFS cluster can be written to the hdfs-site.xml file. When users use the Broker
process to read data from the HDFS cluster, they only need to fill in the cluster file path and authentication
information.
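A sketch of how these properties are passed in practice, using a Broker Load statement against HA HDFS with simple authentication (the label, path, table and broker names are all hypothetical):

LOAD LABEL example_db.label_broker_ha
(
    DATA INFILE("hdfs://my_ha/user/doris/data/file.txt")
    INTO TABLE tbl1
)
WITH BROKER "broker_name"
(
    "username" = "user",
    "password" = "passwd",
    "dfs.nameservices" = "my_ha",
    "dfs.ha.namenodes.my_ha" = "my_namenode1,my_namenode2",
    "dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
    "dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
    "dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);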
Aliyun OSS
"fs.oss.accessKeyId" = "",
"fs.oss.accessKeySecret" = "",
"fs.oss.endpoint" = ""
"fs.obs.access.key" = "xx",
"fs.obs.secret.key" = "xx",
"fs.obs.endpoint" = "xx"
Amazon S3
"fs.s3a.access.key" = "xx",
"fs.s3a.secret.key" = "xx",
"fs.s3a.endpoint" = "xx"
JuiceFS
"fs.defaultFS" = "jfs://xxx/",
"fs.jfs.impl" = "io.juicefs.JuiceFileSystem",
"fs.AbstractFileSystem.jfs.impl" = "io.juicefs.JuiceFS",
"juicefs.meta" = "xxx",
"juicefs.access-log" = "xxx"
Query Analysis
Doris provides a graphical command to help users analyze a specific query or import more easily. This article describes how
to use this feature.
For example, if the user specifies a Join operator, the query planner needs to decide the specific Join algorithm, such as Hash
Join or Merge Sort Join; whether to use Shuffle or Broadcast; whether the Join order needs to be adjusted to avoid Cartesian
product; on which nodes to execute and so on.
Doris' query planning process is to first convert an SQL statement into a single-machine execution plan tree.
┌────┐
│Sort│
└────┘
┌──────────────┐
│Aggregation│
└──────────────┘
┌────┐
│Join│
└────┘
┌────┴────┐
┌──────┐ ┌──────┐
│Scan-1│ │Scan-2│
└──────┘ └──────┘
After that, the query planner will convert the single-machine query plan into a distributed query plan according to the
specific operator execution mode and the specific distribution of data. The distributed query plan is composed of multiple
fragments, each fragment is responsible for a part of the query plan, and the data is transmitted between the fragments
through the ExchangeNode operator.
┌────┐
│Sort│
│F1 │
└────┘
┌──────────────┐
│Aggregation│
│F1 │
└──────────────┘
┌────┐
│Join│
│F1 │
└────┘
┌──────┴────┐
┌──────┐ ┌────────────┐
│Scan-1│ │ExchangeNode│
│F1 │ │F1 │
└──────┘ └────────────┘
┌────────────────┐
│DataStreamSink│
│F2 │
└────────────────┘
┌──────┐
│Scan-2│
│F2 │
└──────┘
As shown above, we divided the stand-alone plan into two Fragments: F1 and F2. Data is transmitted between two
Fragments through an ExchangeNode.
And a Fragment will be further divided into multiple Instances. Instance is the final concrete execution instance. Dividing into
multiple Instances helps to make full use of machine resources and improve the execution concurrency of a Fragment.
The first command displays a query plan graphically. This command can more intuitively display the tree structure of the
query plan and the division of Fragments:
mysql> explain graph select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1 order
by tbl1.k1;
+----------------------------------------------------------------------------------------------------------------
-----------------+
| Explain String
|
+----------------------------------------------------------------------------------------------------------------
-----------------+
|
|
| ┌───────────────┐
|
| │[9: ResultSink]│
|
| │[Fragment: 4] │
|
| │RESULT SINK │
|
| └───────────────┘
|
| │
|
| ┌─────────────────────┐
|
| │[9: MERGING-EXCHANGE]│
|
| │[Fragment: 4] │
|
| └─────────────────────┘
|
| │
|
| ┌───────────────────┐
|
| │[9: DataStreamSink]│
|
| │[Fragment: 3] │
|
| │ EXCHANGE ID: 09 │
|
| │ UNPARTITIONED │
|
| └───────────────────┘
|
| │
|
| ┌─────────────┐
|
| │[4: TOP-N] │
|
| │[Fragment: 3]│
|
| └─────────────┘
|
| │
|
| ┌───────────────────────────────┐
|
| │[Fragment: 3] │
|
| └───────────────────────────────┘
|
| │
|
| ┌─────────────┐
|
| │[7: EXCHANGE]│
|
| │[Fragment: 3]│
|
| └─────────────┘
|
| │
|
| ┌───────────────────┐
|
| │[7: DataStreamSink]│
|
| │[Fragment: 2] │
|
| │ EXCHANGE ID: 07 │
|
| │ HASH_PARTITIONED │
|
| └───────────────────┘
|
| │
|
| ┌─────────────────────────────────┐
|
| │[Fragment: 2] │
|
| │STREAMING │
|
| └─────────────────────────────────┘
|
| │
|
| ┌─────────────────────────────────┐
|
| │[Fragment: 2] │
|
| └─────────────────────────────────┘
|
| ┌──────────┴──────────┐
|
| ┌─────────────┐ ┌─────────────┐
|
| └─────────────┘ └─────────────┘
|
| │ │
|
| ┌───────────────────┐ ┌───────────────────┐
|
| │[Fragment: 0] │ │[Fragment: 1] │
|
| │ HASH_PARTITIONED │ │ HASH_PARTITIONED │
|
| └───────────────────┘ └───────────────────┘
|
| │ │
|
| ┌─────────────────┐ ┌─────────────────┐
|
| │[Fragment: 0] │ │[Fragment: 1] │
|
| └─────────────────┘ └─────────────────┘
|
+----------------------------------------------------------------------------------------------------------------
-----------------+
As can be seen from the figure, the query plan tree is divided into 5 fragments: 0, 1, 2, 3, and 4. For example, [Fragment: 0]
on the OlapScanNode node indicates that this node belongs to Fragment 0. Data is transferred between each Fragment
through DataStreamSink and ExchangeNode.
The graphics command only displays the simplified node information. If you need to view more specific node information,
such as the filter conditions pushed to the node as follows, you need to view the more detailed text version information
through the second command:
mysql> explain select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1 order by
tbl1.k1;
+----------------------------------------------------------------------------------+
| Explain String |
+----------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 5> <slot 3> `tbl1`.`k1` | <slot 6> <slot 4> sum(`tbl1`.`k2`) |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 9:MERGING-EXCHANGE |
| limit: 65535 |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| |
| EXCHANGE ID: 09 |
| UNPARTITIONED |
| |
| 4:TOP-N |
| | offset: 0 |
| | limit: 65535 |
| | |
| | cardinality=-1 |
| | |
| 7:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| |
| EXCHANGE ID: 07 |
| |
| | STREAMING |
| | output: sum(`tbl1`.`k2`) |
| | cardinality=-1 |
| | |
| 2:HASH JOIN |
| | hash predicates: |
| | cardinality=2 |
| | |
| |----6:EXCHANGE |
| | |
| 5:EXCHANGE |
| |
| PLAN FRAGMENT 3 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| EXCHANGE ID: 06 |
| HASH_PARTITIONED: `tbl2`.`k1` |
| |
| 1:OlapScanNode |
| TABLE: tbl2 |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: tbl2 |
| tabletRatio=3/3 |
| tabletList=105104776,105104780,105104784 |
| cardinality=1 |
| avgRowSize=4.0 |
| numNodes=6 |
| |
| PLAN FRAGMENT 4 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| EXCHANGE ID: 05 |
| HASH_PARTITIONED: `tbl1`.`k1` |
| |
| 0:OlapScanNode |
| TABLE: tbl1 |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: tbl1 |
| tabletRatio=3/3 |
| tabletList=105104752,105104763,105104767 |
| cardinality=2 |
| avgRowSize=8.0 |
| numNodes=6 |
+----------------------------------------------------------------------------------+
The third command explain verbose select ...; gives you more details than the second command.
mysql> explain verbose select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1
order by tbl1.k1;
+----------------------------------------------------------------------------------------------------------------
-----------------------------------------+
| Explain String
|
+----------------------------------------------------------------------------------------------------------------
-----------------------------------------+
| PLAN FRAGMENT 0
|
| OUTPUT EXPRS:<slot 5> <slot 3> `tbl1`.`k1` | <slot 6> <slot 4> sum(`tbl1`.`k2`)
|
| PARTITION: UNPARTITIONED
|
|
|
| VRESULT SINK
|
|
|
| 6:VMERGING-EXCHANGE
|
| limit: 65535
|
| tuple ids: 3
|
|
|
| PLAN FRAGMENT 1
|
|
|
|
|
| EXCHANGE ID: 06
|
| UNPARTITIONED
|
|
|
| 4:VTOP-N
|
| | offset: 0
|
| | limit: 65535
|
| | tuple ids: 3
|
| |
|
| | cardinality=-1
|
| | tuple ids: 2
|
| |
|
| 2:VHASH JOIN
|
| | cardinality=0
|
| |
|
| |----5:VEXCHANGE
|
| | tuple ids: 1
|
| |
|
| 0:VOlapScanNode
|
| TABLE: tbl1(null), PREAGGREGATION: OFF. Reason: the type of agg on StorageEngine's Key column should only
be MAX or MIN.agg expr: sum(`tbl1`.`k2`) |
| tuple ids: 0
|
|
|
| PLAN FRAGMENT 2
|
|
|
|
|
| EXCHANGE ID: 05
|
| UNPARTITIONED
|
|
|
| 1:VOlapScanNode
|
| tuple ids: 1
|
|
|
| Tuples:
|
| parent=0
|
| materialized=true
|
| byteSize=16
|
| byteOffset=16
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=1
|
|
|
| parent=0
|
| materialized=true
|
| byteSize=4
|
| byteOffset=0
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=0
|
|
|
|
|
| parent=1
|
| materialized=true
|
| byteSize=16
|
| byteOffset=0
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=0
|
|
|
|
|
| parent=2
|
| materialized=true
|
| byteSize=16
|
| byteOffset=16
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=1
|
|
|
| parent=2
|
| materialized=true
|
| byteSize=8
|
| byteOffset=0
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=0
|
|
|
|
|
| parent=3
|
| materialized=true
|
| byteSize=16
|
| byteOffset=16
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=1
|
|
|
| parent=3
|
| materialized=true
|
| byteSize=8
|
| byteOffset=0
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=0
|
|
|
|
|
| parent=4
|
| materialized=true
|
| byteSize=16
|
| byteOffset=16
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=1
|
|
|
| parent=4
|
| materialized=true
|
| byteSize=4
|
| byteOffset=0
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=0
|
|
|
| parent=4
|
| materialized=true
|
| byteSize=16
|
| byteOffset=32
|
| nullIndicatorByte=0
|
| nullIndicatorBit=-1
|
| slotIdx=2
|
+----------------------------------------------------------------------------------------------------------------
-----------------------------------------+
The information displayed in the query plan is still being standardized and improved, and we will introduce it in detail in
subsequent articles.
View query Profile
The user can open the session variable is_report_success with the following command:
SET is_report_success=true;
Then execute the query, and Doris will generate a Profile of the query. Profile contains the specific execution of a query for
each node, which helps us analyze query bottlenecks.
After executing the query, we can first get the Profile list with the following command:
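A sketch of that command (assuming the path-style show query profile statement provided by Doris):

show query profile "/";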
QueryId: c257c52f93e149ee-ace8ac14e8c9fef9
User: root
DefaultDb: default_cluster:db1
SQL: select tbl1.k1, sum(tbl1.k2) from tbl1 join tbl2 on tbl1.k1 = tbl2.k1 group by tbl1.k1 order by
tbl1.k1
QueryType: Query
TotalTime: 9ms
QueryState: EOF
This command will list all currently saved profiles. Each row corresponds to a query. We can select the QueryId
corresponding to the Profile we want to see to see the specific situation.
This step is mainly used to analyze the execution plan as a whole and view the execution time of each Fragment.
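A sketch of the command for this step, using the QueryId listed above (path-style argument assumed):

show query profile "/c257c52f93e149ee-ace8ac14e8c9fef9";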
Fragments:
┌──────────────────────┐
│[-1: DataBufferSender]│
│Fragment: 0 │
│MaxActiveTime: 6.626ms│
└──────────────────────┘
┌──────────────────┐
│[9: EXCHANGE_NODE]│
│Fragment: 0 │
└──────────────────┘
┌──────────────────────┐
│[9: DataStreamSender] │
│Fragment: 1 │
│MaxActiveTime: 5.449ms│
└──────────────────────┘
┌──────────────┐
│[4: SORT_NODE]│
│Fragment: 1 │
└──────────────┘
┌┘
┌─────────────────────┐
│[8: AGGREGATION_NODE]│
│Fragment: 1 │
└─────────────────────┘
└┐
┌──────────────────┐
│[7: EXCHANGE_NODE]│
│Fragment: 1 │
└──────────────────┘
┌──────────────────────┐
│[7: DataStreamSender] │
│Fragment: 2 │
│MaxActiveTime: 3.505ms│
└──────────────────────┘
┌┘
┌─────────────────────┐
│[3: AGGREGATION_NODE]│
│Fragment: 2 │
└─────────────────────┘
┌───────────────────┐
│[2: HASH_JOIN_NODE]│
│Fragment: 2 │
└───────────────────┘
┌────────────┴────────────┐
┌──────────────────┐ ┌──────────────────┐
│Fragment: 2 │ │Fragment: 2 │
└──────────────────┘ └──────────────────┘
│ │
┌─────────────────────┐ ┌────────────────────────┐
│Fragment: 4 │ │Fragment: 3 │
└─────────────────────┘ └────────────────────────┘
│ ┌┘
┌───────────────────┐ ┌───────────────────┐
│Fragment: 4 │ │Fragment: 3 │
└───────────────────┘ └───────────────────┘
│ │
┌─────────────┐ ┌─────────────┐
│[OlapScanner]│ │[OlapScanner]│
│Fragment: 4 │ │Fragment: 3 │
└─────────────┘ └─────────────┘
│ │
┌─────────────────┐ ┌─────────────────┐
│[SegmentIterator]│ │[SegmentIterator]│
│Fragment: 4 │ │Fragment: 3 │
└─────────────────┘ └─────────────────┘
As shown in the figure above, each node is marked with the Fragment to which it belongs, and at the Sender node of
each Fragment, the execution time of the Fragment is marked. This time-consuming is the longest of all Instance
execution times under Fragment. This helps us find the most time-consuming Fragment from an overall perspective.
For example, if we find that Fragment 1 takes the longest time, we can continue to view the Instance list of Fragment 1:
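A sketch of that command (again assuming the path-style show query profile syntax, with the fragment id appended to the path):

show query profile "/c257c52f93e149ee-ace8ac14e8c9fef9/1";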
This shows the execution nodes and time consumption of all three Instances on Fragment 1.
We can continue to view the detailed profile of each operator on a specific Instance:
Instance:
┌────────────────────────────────────────────┐
│[9: DataStreamSender] │
│ - Counters: │
│ - BytesSent: 0.00 │
│ - IgnoreRows: 0 │
│ - PeakMemoryUsage: 8.00 KB │
│ - SerializeBatchTime: 0ns │
│ - UncompressedRowBatchSize: 0.00 │
└──────────────────────────────────────────┘
└┐
┌──────────────────────────────────────┐
│[4: SORT_NODE] │
│ - Counters: │
│ - PeakMemoryUsage: 12.00 KB │
│ - RowsReturned: 0 │
│ - RowsReturnedRate: 0 │
└──────────────────────────────────────┘
┌┘
┌──────────────────────────────────────┐
│[8: AGGREGATION_NODE] │
│ - Counters: │
│ - BuildTime: 3.701us │
│ - GetResultsTime: 0ns │
│ - HTResize: 0 │
│ - HTResizeTime: 1.211us │
│ - HashBuckets: 0 │
│ - HashCollisions: 0 │
│ - HashFailedProbe: 0 │
│ - HashFilledBuckets: 0 │
│ - HashProbe: 0 │
│ - HashTravelLength: 0 │
│ - LargestPartitionPercent: 0 │
│ - MaxPartitionLevel: 0 │
│ - NumRepartitions: 0 │
│ - PartitionsCreated: 16 │
│ - PeakMemoryUsage: 34.02 MB │
│ - RowsProcessed: 0 │
│ - RowsRepartitioned: 0 │
│ - RowsReturned: 0 │
│ - RowsReturnedRate: 0 │
│ - SpilledPartitions: 0 │
└──────────────────────────────────────┘
└┐
┌────────────────────────────────────────────────────┐
│[7: EXCHANGE_NODE] │
│ - Counters: │
│ - BytesReceived: 0.00 │
│ - ConvertRowBatchTime: 387ns │
│ - DataArrivalWaitTime: 4.357ms │
│ - DeserializeRowBatchTimer: 0ns │
│ - FirstBatchArrivalWaitTime: 4.356ms│
│ - PeakMemoryUsage: 0.00 │
│ - RowsReturned: 0 │
│ - RowsReturnedRate: 0 │
│ - SendersBlockedTotalTimer(*): 0ns │
└────────────────────────────────────────────────────┘
The above figure shows the specific profiles of each operator of Instance c257c52f93e149ee-ace8ac14e8c9ff03 in
Fragment 1.
Through the above three steps, we can gradually check the performance bottleneck of a SQL.
Import Analysis
Doris provides a graphical command to help users analyze a specific import more easily. This article describes how to use this
feature.
The execution process of a Broker Load request is also based on Doris' query framework. A Broker Load job will be split into
multiple subtasks based on the number of DATA INFILE clauses in the import request. Each subtask can be regarded as an
independent import execution plan. An import plan consists of only one Fragment, which is composed as follows:
┌────────────────┐
│OlapTableSink│
└────────────────┘
┌────────────────┐
│BrokerScanNode│
└────────────────┘
BrokerScanNode is mainly responsible for reading the source data and sending it to OlapTableSink, and OlapTableSink is
responsible for sending data to the corresponding node according to the partition and bucketing rules, and the
corresponding node is responsible for the actual data writing.
The Fragment of the import execution plan will be divided into one or more Instances according to the number of imported
source files, the number of BE nodes and other parameters. Each Instance is responsible for part of the data import.
The execution plans of multiple subtasks are executed concurrently, and multiple instances of an execution plan are also
executed in parallel.
SET is_report_success=true;
Then submit a Broker Load import request and wait until the import execution completes. Doris will generate a Profile for
this import. Profile contains the execution details of importing each subtask and Instance, which helps us analyze import
bottlenecks.
We can get the Profile list first with the following command:
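A sketch of the command (assuming the path-style show load profile statement):

show load profile "/";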
+-------+-----+-----+-----+------+---------------------+---------------------+-------+-----+
| 10441 | N/A | N/A | N/A | Load | 2021-04-10 22:15:37 | 2021-04-10 22:18:54 | 3m17s | N/A |
+-------+-----+-----+-----+------+---------------------+---------------------+-------+-----+
This command lists all currently saved import profiles. Each row corresponds to one import, and the QueryId column is
the ID of the import job. This ID can also be viewed through the SHOW LOAD statement. We can select the QueryId
corresponding to the Profile we want to inspect to see its details.
View an overview of subtasks with imported jobs by running the following command:
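A likely form of the command, using the QueryId from the listing above:
SHOW LOAD PROFILE "/10441";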
+-----------------------------------+------------+
| TaskId | ActiveTime |
+-----------------------------------+------------+
| 980014623046410a-88e260f0c43031f1 | 3m14s |
+-----------------------------------+------------+
As shown in the output above, the import job 10441 has one subtask in total, and ActiveTime indicates the
execution time of the longest instance in that subtask.
When we find that a subtask takes a long time, we can further check the execution time of each instance of the subtask:
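A likely form of the command, appending the TaskId shown above to the profile path:
SHOW LOAD PROFILE "/10441/980014623046410a-88e260f0c43031f1";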
(output omitted: one row per Instance of the subtask, showing the Instance ID, the Host it runs on, and its ActiveTime)
This shows the time consumed by the four instances of subtask 980014623046410a-88e260f0c43031f1, as well as the
execution node where each instance is located.
We can continue to view the detailed profile of each operator on a specific Instance:
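A likely form of the command, appending the Instance ID to the profile path:
SHOW LOAD PROFILE "/10441/980014623046410a-88e260f0c43031f1/980014623046410a-88e260f0c43031f5";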
Instance:
┌-----------------------------------------┐
│[-1: OlapTableSink] │
│ - Counters: │
│ - CloseWaitTime: 1m53s │
│ - ConvertBatchTime: 0ns │
│ - MaxAddBatchExecTime: 1m46s │
│ - NonBlockingSendTime: 3m11s │
│ - NumberBatchAdded: 782 │
│ - NumberNodeChannels: 1 │
│ - OpenTime: 743.822us │
│ - RowsFiltered: 0 │
│ - SendDataTime: 11s761ms │
│ - TotalAddBatchExecTime: 1m46s │
│ - ValidateDataTime: 9s802ms │
└-----------------------------------------┘
┌------------------------------------------------- ----┐
│[0: BROKER_SCAN_NODE] │
│ - Counters: │
│ - BytesDecompressed: 0.00 │
│ - BytesRead: 5.77 GB │
│ - DecompressTime: 0ns │
│ - FileReadTime: 34s263ms │
│ - MaterializeTupleTime(*): 45s54ms │
│ - NumDiskAccess: 0 │
│ - PeakMemoryUsage: 33.03 MB │
│ - TotalRawReadTime(*): 1m20s │
│ - WaitScannerTime: 56s528ms │
└------------------------------------------------- ----┘
The figure above shows the specific profiles of each operator of Instance 980014623046410a-88e260f0c43031f5 in subtask
980014623046410a-88e260f0c43031f1.
Through the above three steps, we can gradually check the execution bottleneck of an import task.
Debug Log
The system operation logs of Doris's FE and BE nodes are at INFO level by default, which is usually sufficient for
analyzing system behavior and locating common problems. However, in some cases, it may be necessary to enable DEBUG
level logs to further troubleshoot a problem. This document mainly introduces how to enable the DEBUG log level of FE and BE nodes.
It is not recommended to adjust the log level to WARN or higher, as that is not conducive to analyzing system
behavior and locating problems.
Enabling DEBUG logs may generate a large number of log entries. Be careful when enabling them in a production
environment.
1. Via the FE configuration file fe.conf (takes effect after restarting the FE node)
# Enable the Debug log of the class org.apache.doris.catalog.Catalog
sys_log_verbose_modules=org.apache.doris.catalog.Catalog
# Enable the Debug log of all classes under the package org.apache.doris.catalog
sys_log_verbose_modules=org.apache.doris.catalog
# Enable the Debug log of all classes under the package org
sys_log_verbose_modules=org
2. Via FE UI interface
The log level can be modified at runtime through the UI interface. There is no need to restart the FE node. Open the http
port of the FE node (8030 by default) in the browser, and log in to the UI interface. Then click on the Log tab in the upper
navigation bar.
We can enter the package name or specific class name in the Add input box to open the corresponding Debug log. For
example, enter org.apache.doris.catalog.Catalog to open the Debug log of the Catalog class:
You can also enter the package name or specific class name in the Delete input box to close the corresponding Debug
log.
The modification here will only affect the log level of the corresponding FE node. Does not affect the log level of
other FE nodes.
The log level can also be modified at runtime via the following API. There is no need to restart the FE node.
The username and password are those of the root or admin user used to log in to Doris. The add_verbose parameter specifies
the package or class name for which to enable Debug logging. If successful, it returns:
"msg": "success",
"code": 0,
"data": {
"LogConfiguration": {
"VerboseNames": "org,org.apache.doris.catalog.Catalog",
"AuditNames": "slow_query,query,load",
"Level": "INFO"
},
"count": 0
Debug logging can also be turned off via the following API:
The del_verbose parameter specifies the package or class name for which to turn off Debug logging.
The DEBUG logs of BE nodes are enabled through the BE configuration file be.conf, for example:
sys_log_verbose_modules=plan_fragment_executor,olap_scan_node
sys_log_verbose_level=3
sys_log_verbose_modules specifies the file names to be enabled, and the wildcard * can be used. For example:
sys_log_verbose_modules=*
sys_log_verbose_level indicates the level of DEBUG. The higher the number, the more detailed the DEBUG log. The value
range is 1-10.
Compaction
Doris writes data through a structure similar to LSM-Tree, and continuously merges small files into large ordered files
through compaction in the background. Compaction handles operations such as deletion and updating.
Appropriately adjusting the compaction strategy can greatly improve load and query efficiency. Doris provides the following
two compaction strategies for tuning:
Vertical compaction
Since Version 1.2.2
Vertical compaction is a new compaction algorithm implemented in Doris 1.2.2, which is used to optimize compaction
execution efficiency and resource overhead in large-scale and wide table scenarios. It can effectively reduce the memory
overhead of compaction and improve the execution speed of compaction. The test results show that the memory
consumption by vertical compaction is only 1/10 of the original compaction algorithm, and the compaction rate is increased
by 15%.
In vertical compaction, merging by row is changed to merging by column group. The granularity of each merge is changed
to column group, which reduces the amount of data involved in single compaction and reduces the memory usage during
compaction.
BE configuration:
enable_vertical_compaction = true turns on vertical compaction.
Segment compaction
Segment compaction mainly deals with large-scale data loads. Segment compaction operates during the load process
and compacts segments inside the load job, which is different from normal compaction and vertical compaction. This mechanism
can effectively reduce the number of generated segments and avoid -238 (OLAP_ERR_TOO_MANY_SEGMENTS) errors.
BE configuration:
enable_segcompaction = true turns on segment compaction.
Segment compaction is recommended in the following scenarios:
Loading a large amount of data fails with the OLAP_ERR_TOO_MANY_SEGMENTS (errcode -238) error. In this case it is
recommended to turn on segment compaction to reduce the quantity of segments generated during the load process.
Too many small files are generated during the load process: although the amount of loading data is reasonable, the
generation of a large number of small segment files may also fail the load job because of low cardinality or memory
constraints that trigger memtable to be flushed in advance. Then it is recommended to turn on this function.
Query immediately after loading. When the load is just finished and the standard compaction has not finished, large
number of segment files will affect the efficiency of subsequent queries. If the user needs to query immediately after
loading, it is recommended to turn on this function.
The pressure of normal compaction is high after loading: segment compaction evenly puts part of the pressure of
normal compaction on the loading process. At this time, it is recommended to enable this function.
When the load operation itself has exhausted memory resources, it is not recommended to use the segment
compaction to avoid further increasing memory pressure and causing the load job to fail.
Refer to this link for more information about implementation and test results.
Resource management
In order to save the compute and storage resources in the Doris cluster, Doris needs to reference to some other external
resources to do the related work. such as spark/GPU for query, HDFS/S3 for external storage, spark/MapReduce for ETL,
connect to external storage by ODBC driver. Therefore, Doris need a resource management mechanism to manage these
external resources.
Fundamental Concept
A resource contains basic information such as name and type. The name is globally unique. Different types of resources
contain different attributes. Please refer to the introduction of each resource for details.
The creation and deletion of resources can only be performed by users with admin permission. One resource belongs to the
entire Doris cluster. Users with admin permission can assign the permission of a resource to other users. Please refer to HELP
GRANT or the Doris documentation.
Operation Of Resource
There are three main commands for resource management: create resource , drop resource and show resources . They
are used to create, delete and view resources. The specific syntax of these three commands can be viewed by executing
HELP cmd after connecting to Doris with a MySQL client.
1. CREATE RESOURCE
This statement is used to create a resource. For details, please refer to CREATE RESOURCE.
2. DROP RESOURCE
This command can delete an existing resource. For details, see DROP RESOURCE.
3. SHOW RESOURCES
This command can view the resources that the user has permission to use. For details, see SHOW RESOURCES.
Resources Supported
Currently, Doris supports the following resources:
Spark
Parameter
Spark Parameters:
spark.master : required, currently supports yarn and spark://host:port.
spark.submit.deployMode : required. The deployment mode of Spark; supports cluster and client.
spark.hadoop.yarn.resourcemanager.address : required when master is yarn.
If Spark is used for ETL, the following parameters also need to be specified:
working_dir : The directory used by ETL. Required when Spark is used as an ETL resource. For example:
hdfs://host:port/tmp/doris.
broker : The name of the broker. Required when Spark is used as an ETL resource. You need to use the ALTER SYSTEM ADD BROKER
command to complete the configuration in advance.
broker.property_key : The authentication information that the broker needs when reading the intermediate files generated
by ETL.
Example
Create a spark resource named spark0 in the yarn cluster mode.
CREATE RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "queue0",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
);
ODBC
Parameter
ODBC Parameters:
type : Required, must be odbc_catalog . As the type identifier of resource.
odbc_type : Indicates the type of external table. Currently, Doris supports MySQL and Oracle , and may support
more databases in the future. Required for ODBC external tables that reference this resource; optional for the old MySQL
external tables that reference this resource.
driver : Indicates the driver dynamic library used by the ODBC external table. Required for ODBC external tables that
reference this resource; optional for the old MySQL external tables that reference this resource.
Example
Create the ODBC resource of Oracle, named oracle_odbc .
CREATE RESOURCE "oracle_odbc"
PROPERTIES (
"type" = "odbc_catalog",
"host" = "192.168.0.1",
"port" = "8086",
"user" = "test",
"password" = "test",
"database" = "test",
"odbc_type" = "oracle",
);
Orthogonal BITMAP calculation
Background
The original bitmap aggregate functions designed by Doris are general-purpose, but they perform poorly for intersections
and unions of bitmaps with large cardinality (above the 100 million level). Examining the bitmap aggregation logic in the
backend BE reveals two main reasons. First, when the bitmap cardinality is large, if the bitmap data size exceeds 1 GB, the
network / disk IO processing time is relatively long. Second, after the scan, all backend BE instances transmit their data to
the top-level node for the intersection and union operations, which puts pressure on that single top-level node and makes it
the processing bottleneck.
The solution is to partition the bitmap column values by range and store values of different ranges in different buckets, so
that the bitmap values of different buckets are orthogonal and the data distribution is more uniform. At query time, the
orthogonal bitmaps in the different buckets are first aggregated and computed, and then the top-level node simply merges
and summarizes the aggregated values and outputs them. This greatly improves computing efficiency and removes the
bottleneck of the single top-level node.
User guide
1. Create a table and add hid column to represent bitmap column value ID range as hash bucket column
2. Usage scenarios
Create table
We need to use the aggregation model when building the table. The data type of the user column is bitmap and its
aggregation function is bitmap_union. The hid column is added to the table schema to indicate the ID range and serves as
the hash bucket column; a sketch of such a schema is shown below.
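A minimal sketch of such a table; the table and column names, types, and bucket count are illustrative, while ENGINE=OLAP and COMMENT "OLAP" follow the original example:
CREATE TABLE user_tag_bitmap (
    tag BIGINT COMMENT "user tag",
    hid SMALLINT COMMENT "bucket id derived from the user id range",
    user_id BITMAP BITMAP_UNION COMMENT "user id bitmap"
) ENGINE=OLAP
AGGREGATE KEY(tag, hid)
COMMENT "OLAP"
DISTRIBUTED BY HASH(hid) BUCKETS 3;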
Note: the number of hid values and the number of buckets should be set reasonably, and the number of hid values should be
at least 5 times the number of buckets, so that the hash bucketing of the data is as balanced as possible.
Data Load
LOAD LABEL user_tag_bitmap_test
DATA INFILE('hdfs://abc')
(tmp_tag, tmp_user_id)
SET (
tag = tmp_tag,
hid = ceil(tmp_user_id/5000000),
user_id = to_bitmap(tmp_user_id)
...
Data format:
11111111,1
11111112,2
11111113,3
11111114,4
...
Note: the first column represents the user tags, which have been converted from Chinese into numbers
When loading data, vertically cut the bitmap value range of the user. For example, the hid value of the user ID in the range of
1-5000000 is the same, and the row with the same HID value will be allocated into a sub-bucket, so that the bitmap value in
each sub-bucket is orthogonal. On the UDAF implementation of bitmap, the orthogonal feature of bitmap value in the
bucket can be used to perform intersection union calculation, and the calculation results will be shuffled to the top node for
aggregation.
Note: The orthogonal bitmap function cannot be used in the partitioned table. Because the partitions of the partitioned table
are orthogonal, the data between partitions cannot be guaranteed to be orthogonal, so the calculation result cannot be
estimated.
orthogonal_bitmap_intersect
The bitmap intersection function
Syntax:
orthogonal_bitmap_intersect(bitmap_column, column_to_filter, filter_values)
Parameters:
the first parameter is the bitmap column, the second parameter is the dimension column for filtering, and the third
parameter is the variable length parameter, which means different values of the filter dimension column
Explain:
On the basis of this table schema, this function performs two levels of aggregation in query planning. In the first level, the
BE nodes (update and serialize) first hash-aggregate the keys by the filter_values column, and then intersect the bitmaps of
all keys; the results are serialized and sent to the second-level BE nodes (merge and finalize). In the second level, the BE
nodes combine all the bitmap values received from the first-level nodes in a loop.
Example:
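An illustrative query, assuming the user_tag_bitmap schema sketched above and two example tag values:
SELECT BITMAP_COUNT(orthogonal_bitmap_intersect(user_id, tag, 13080800, 11110200))
FROM user_tag_bitmap
WHERE tag IN (13080800, 11110200);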
orthogonal_bitmap_intersect_count
Calculates the bitmap intersection count. The syntax is the same as the original intersect_count, but the
implementation is different.
Syntax:
orthogonal_bitmap_intersect_count(bitmap_column, column_to_filter, filter_values)
Parameters:
The first parameter is the bitmap column, the second parameter is the dimension column for filtering, and the third
parameter is the variable length parameter, which means different values of the filter dimension column
Explain:
On the basis of this table schema, the query-planning aggregation is divided into two levels. In the first level, the BE nodes
(update and serialize) first hash-aggregate the keys by the filter_values column, then intersect the bitmaps of all keys and
count the intersection result; the count values are serialized and sent to the second-level BE nodes (merge and finalize). In
the second level, the BE nodes sum all the count values received from the first-level nodes in a loop.
orthogonal_bitmap_union_count
Calculates the bitmap union count. The syntax is the same as the original bitmap_union_count, but the implementation is
different.
Syntax:
orthogonal_bitmap_union_count(bitmap_column)
Explain:
On the basis of this table schema, this function is divided into two levels. In the first level, the BE nodes (update and
serialize) merge all the bitmaps and then count the resulting bitmap; the count values are serialized and sent to the
second-level BE nodes (merge and finalize). In the second level, the BE nodes sum all the count values received from the
first-level nodes.
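An illustrative query against the same sketched table:
SELECT orthogonal_bitmap_union_count(user_id)
FROM user_tag_bitmap
WHERE tag IN (13080800, 11110200);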
Crowd selection:
Approximate deduplication using HLL
The characteristic of HLL is that it has excellent space complexity O(mloglogn) and time complexity O(n), and the error of
the calculation result can be controlled at about 1%-2%. The error is related to the size of the data set and the hash
function used.
What is HyperLogLog
It is an upgraded version of the LogLog algorithm, and its role is to provide imprecise deduplication counts. Its mathematical
basis is the Bernoulli test.
Assume a coin has heads and tails, each coming up with 50% probability. Keep flipping the coin until it comes up heads;
we record this as one complete trial.
Then, for multiple Bernoulli trials, assume the number of trials is n, which means heads appeared n times. Suppose the
number of tosses in each Bernoulli trial is k: k1 for the first trial, and so on, up to kn for the nth trial.
Among them, for these n Bernoulli trials, there must be a maximum number of tosses k. For example, after 12 tosses, a head
will appear, then this is called k_max, which represents the maximum number of tosses.
Finally, combined with the method of maximum likelihood estimation, it is found that there is an estimated correlation
between n and k_max: n = 2^k_max. When we only record k_max, we can estimate how many pieces of data there are in
total, that is, the cardinality.
1st trial: 3 tosses before it turns heads, at this time k=3, n=1
2nd trial: Heads appear after 2 tosses, at this time k=2, n=2
The 3rd trial: 6 tosses before the heads appear, at this time k=6, n=3
The nth trial: it took 12 tosses to get heads, at this point we estimate, n = 2^12
Take the first three groups of experiments in the above example, then k_max = 6, and finally n=3, we put it into the
estimation formula, obviously: 3 ≠ 2^6 . That is to say, when the number of trials is small, the error of this estimation method
is very large.
These three sets of trials, we call one round of estimation. If only one round is performed, when n is large enough, the
estimated error rate will be relatively reduced, but still not small enough.
HLL speeds up queries at the cost of being an estimate, with an error of about 1%. The hll column is generated from other
columns or from the data being imported; the hll_hash function is used during import to specify which column in the data
is used to generate the hll column. It is often used to replace count distinct and, combined with rollup, to quickly calculate
UV in business scenarios.
HLL_UNION_AGG(hll)
This function is an aggregate function that computes a cardinality estimate for all data that satisfies the condition.
HLL_CARDINALITY(hll)
This function is used to calculate the cardinality estimate for a single hll column
HLL_HASH(column_name)
Generate HLL column type for insert or import, see related instructions for import usage
create table test_hll(
    dt date,
    id int,
    name char(10),
    province char(10),
    os char(10),
    pv hll hll_union
)
AGGREGATE KEY (dt, id, name, province, os)
-- the distribution clause below is illustrative
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES(
    "replication_num" = "1",
    "in_memory"="false"
);
Import Data
1. Stream Load
curl --location-trusted -u user:password -H "label:label_test_hll_load" \
    -H "column_separator:," \
    -H "columns:dt,id,name,province,os, pv=hll_hash(id)" -T test_hll.csv http://fe_host:8030/api/{db}/test_hll/_stream_load
The sample data ( test_hll.csv ) is as follows:
2022-05-05,10001, 测试01,北京,windows
2022-05-05,10002, 测试01,北京,linux
2022-05-05,10003, 测试01,北京,macos
2022-05-05,10004, 测试01,河北,windows
2022-05-06,10001, 测试01,上海,windows
2022-05-06,10002, 测试01,上海,linux
2022-05-06,10003, 测试01,江苏,macos
2022-05-06,10004, 测试01,陕西,windows
"TxnId": 693,
"Label": "label_test_hll_load",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 8,
"NumberLoadedRows": 8,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 320,
"LoadTimeMs": 23,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 9,
"CommitAndPublishTimeMs": 11
2. Broker Load
DATA INFILE("hdfs://hdfs_host:hdfs_port/user/doris_test_hll/data/input/file")
(dt,id,name,province,os)
SET (
pv = HLL_HASH(id)
);
Query data
HLL columns do not allow direct query of the original value, but can only be queried through the HLL aggregate function.
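For example, the total UV can be computed with the HLL aggregate function (a sketch using the test_hll table and pv column from the example above), which produces the result shown below:
SELECT HLL_UNION_AGG(pv) FROM test_hll;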
+---------------------+
| hll_union_agg(`pv`) |
+---------------------+
| 4 |
+---------------------+
Equivalent to:
+----------------------+
| count(DISTINCT `pv`) |
+----------------------+
| 4 |
+----------------------+
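The last result set, with one row per day, would come from grouping by the date column, for example (a sketch):
SELECT HLL_UNION_AGG(pv) FROM test_hll GROUP BY dt;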
+---------------------+
| hll_union_agg(`pv`) |
+---------------------+
| 4 |
| 4 |
+---------------------+
Variable
This document focuses on currently supported variables.
Variables in Doris refer to variable settings in MySQL. However, some of the variables exist only for compatibility with
certain MySQL client protocols and do not have the meaning they carry in a MySQL database.
View
All or specified variables can be viewed via SHOW VARIABLES [LIKE 'xxx']; . Such as:
SHOW VARIABLES;
Settings
Note that before version 1.1, after a setting takes effect globally, the set value is inherited by subsequent new session
connections, but the value in the current session remains unchanged.
Since version 1.1 (inclusive), after a setting takes effect globally, the set value is used by subsequent new session
connections, and the value in the current session also changes.
time_zone
wait_timeout
sql_mode
enable_profile
query_timeout
exec_mem_limit
batch_size
parallel_fragment_exec_instance_num
parallel_exchange_instance_num
allow_partition_column_nullable
insert_visible_timeout_ms
enable_fold_constant_by_be
default_rowset_type
default_password_lifetime
password_history
validate_password_policy
At the same time, variable settings also support constant expressions. Such as:
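A sketch with illustrative values (any session variable that accepts a numeric value works the same way):
SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET query_timeout = 10 * 60;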
Variables can also be set for a single statement through the SET_VAR hint in a SQL comment. Note that the comment must
start with /*+ and can only follow the SELECT.
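An illustrative example of the hint form (the variable and value are arbitrary):
SELECT /*+ SET_VAR(query_timeout = 1) */ sleep(3);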
Supported variables
SQL_AUTO_IS_NULL
auto_increment_increment
autocommit
auto_broadcast_join_threshold
The maximum size in bytes of the table that will be broadcast to all nodes when a join is performed, broadcast can be
disabled by setting this value to -1.
The system provides two join implementation methods, broadcast join and shuffle join .
broadcast join means that after conditional filtering the small table, broadcast it to each node where the large table is
located to form an in-memory Hash table, and then stream the data of the large table for Hash Join.
shuffle join refers to hashing both small and large tables according to the join key, and then performing distributed
join.
broadcast join has better performance when the data volume of the small table is small. On the contrary, shuffle join
has better performance.
The system will automatically try to perform a Broadcast Join, or you can explicitly specify the implementation of each
join operator. The system provides a configurable parameter auto_broadcast_join_threshold , which specifies the upper
limit of the memory used by the hash table to the overall execution memory when broadcast join is used. The value
ranges from 0 to 1, and the default value is 0.8. When the memory used by the system to calculate the hash table exceeds
this limit, it will automatically switch to using shuffle join
The overall execution memory here is: a fraction of what the query optimizer estimates
Note:
It is not recommended to tune this parameter. If a particular join implementation is required, it is recommended to use a
hint instead, such as join[shuffle] , as shown below.
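An illustrative hint usage (table and column names are placeholders):
SELECT * FROM t1 JOIN [shuffle] t2 ON t1.k1 = t2.k1;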
batch_size
Used to specify the number of rows of a single packet transmitted by each node during query execution. By default, the
number of rows of a packet is 1024 rows. That is, after the source node generates 1024 rows of data, it is packaged and
sent to the destination node.
A larger number of rows will increase the throughput of the query in the case of scanning large data volumes, but may
increase the query delay in small query scenario. At the same time, it also increases the memory overhead of the query.
The recommended setting range is 1024 to 4096.
character_set_client
character_set_connection
character_set_results
character_set_server
codegen_level
collation_connection
collation_database
collation_server
Used for compatibility with MySQL clients. No practical effect.
default_order_by_limit
Used to control the default number of rows returned after ORDER BY. The default value is -1, which means the query returns
at most the maximum number of records, with the upper limit being the MAX_VALUE of the long data type.
delete_without_partition
When set to true. When using the delete command to delete partition table data, no partition is required. The delete
operation will be automatically applied to all partitions.
Note, however, that automatically applying the delete to all partitions may cause the delete command to trigger a large
number of subtasks and therefore take a long time. If it is not necessary, it is not recommended to turn it on.
disable_colocate_join
Controls whether the Colocation Join function is enabled. The default is false, which means that the feature is enabled.
True means that the feature is disabled. When this feature is disabled, the query plan will not attempt to perform a
Colocation Join.
enable_bucket_shuffle_join
Controls whether the Bucket Shuffle Join function is enabled. The default is true, which means that the feature is enabled.
False means that the feature is disabled. When this feature is disabled, the query plan will not attempt to perform a
Bucket Shuffle Join.
disable_streaming_preaggregations
Controls whether streaming pre-aggregation is turned on. The default is false, which is enabled. Currently not
configurable and enabled by default.
enable_insert_strict
Used to set the strict mode when loading data via INSERT statement. The default is false, which means that the
strict mode is not turned on. For an introduction to this mode, see here.
enable_spilling
Used to set whether to enable external sorting. The default is false, which turns off the feature. This feature is enabled
when the user does not specify a LIMIT condition for the ORDER BY clause and also sets enable_spilling to true. When
this feature is enabled, the temporary data is stored in the doris-scratch/ directory of the BE data directory and the
temporary data is cleared after the query is completed.
This feature is mainly used for sorting operations with large amounts of data using limited memory.
Note that this feature is experimental and does not guarantee stability. Please turn it on carefully.
exec_mem_limit
Used to set the memory limit for a single query. The default is 2GB, you can set it in B/K/KB/M/MB/G/GB/T/TB/P/PB, the
default is B.
This parameter is used to limit the memory that can be used by an instance of a single query fragment in a query plan. A
query plan may have multiple instances, and a BE node may execute one or more instances. Therefore, this parameter
does not accurately limit the memory usage of a query across the cluster, nor does it accurately limit the memory usage
of a query on a single BE node. The specific needs need to be judged according to the generated query plan.
Usually, only some blocking nodes (such as sorting node, aggregation node, and join node) consume more memory,
while in other nodes (such as scan node), data is streamed and does not occupy much memory.
When a Memory Exceed Limit error occurs, you can try to increase the parameter exponentially, such as 4G, 8G, 16G, and
so on.
forward_to_master
The user sets whether to forward some commands to the Master FE node for execution. The default is true , which
means forwarding. There are multiple FE nodes in Doris, one of which is the Master node. Usually users can connect to
any FE node for full-featured operation. However, some of detail information can only be obtained from the Master FE
node.
For example, the SHOW BACKENDS; command, if not forwarded to the Master FE node, can only see some basic
information such as whether the node is alive, and forwarded to the Master FE to obtain more detailed information
including the node startup time and the last heartbeat time.
i. SHOW FRONTENDS;
Forward to Master to view the start time and last heartbeat information.
ii. SHOW BACKENDS;
Forward to Master to view the start time, last heartbeat information, and disk capacity information.
iii. SHOW BROKER;
Forward to Master to view the start time and last heartbeat information.
iv. SHOW TABLET; / ADMIN SHOW REPLICA DISTRIBUTION; / ADMIN SHOW REPLICA STATUS;
Forward to Master to view the tablet information stored in the Master FE metadata. Under normal circumstances,
the tablet information in different FE metadata should be consistent. When a problem occurs, this method can be
used to compare the difference between the current FE and Master FE metadata.
v. SHOW PROC;
Forward to Master to view information about the relevant PROC stored in the Master FE metadata. Mainly used for
metadata comparison.
init_connect
interactive_timeout
enable_profile
Used to set whether you need to view the profile of the query. The default is false, which means no profile is required.
By default, the BE sends a profile to the FE for viewing errors only if an error occurs in the query. A successful query will
not send a profile. Sending a profile will incur a certain amount of network overhead, which is detrimental to a high
concurrent query scenario.
When the user wants to analyze the profile of a query, the query can be sent after this variable is set to true. After the
query is finished, you can view the profile on the web page of the currently connected FE:
fe_host:fe_http_port/query
It will display the most recent 100 queries for which enable_profile was set to true.
language
license
lower_case_table_names
When the value is 1, the table name is case insensitive. Doris will convert the table name to lowercase when storing and
querying.
The advantage is that any case of table name can be used in one statement. The following SQL is correct:
+------------------+
| Tables_in_testdb |
+------------------+
| cost             |
+------------------+
mysql> select * from COST where COst.id < 100 order by cost.id;
The disadvantage is that the table name specified in the table creation statement cannot be obtained after table creation.
The table name viewed by 'show tables' is lower case of the specified table name.
When the value is 2, the table name is case insensitive. Doris stores the table name specified in the table creation
statement and converts it to lowercase for comparison during query.
The advantage is that the table name viewed by 'show tables' is the table name specified in the table creation statement;
The disadvantage is that only one case of table name can be used in the same statement. For example, the table name
'cost' can be used to query the 'cost' table:
mysql> select * from COST where COST.id < 100 order by COST.id;
This variable is compatible with MySQL and must be configured at cluster initialization by specifying
lower_case_table_names= in fe.conf. It cannot be modified by the set statement after cluster initialization is complete,
nor can it be modified by restarting or upgrading the cluster.
The system view table names in information_schema are case-insensitive and behave as 2 when the value of
lower_case_table_names is 0.
max_allowed_packet
max_pushdown_conditions_per_column
For the specific meaning of this variable, please refer to the description of max_pushdown_conditions_per_column in BE
Configuration. This variable is set to -1 by default, which means that the configuration value in be.conf is used. If the
setting is greater than 0, the query in the current session will use the variable value, and ignore the configuration value in
be.conf .
max_scan_key_num
For the specific meaning of this variable, please refer to the description of doris_max_scan_key_num in BE Configuration.
This variable is set to -1 by default, which means that the configuration value in be.conf is used. If the setting is greater
than 0, the query in the current session will use the variable value, and ignore the configuration value in be.conf .
net_buffer_length
net_read_timeout
net_write_timeout
parallel_exchange_instance_num
Used to set the number of exchange nodes used by an upper node to receive data from the lower node in the execution
plan. The default is -1, which means that the number of exchange nodes is equal to the number of execution instances of
the lower nodes (default behavior). When the setting is greater than 0 and less than the number of execution instances of
the lower node, the number of exchange nodes is equal to the set value.
In a distributed query execution plan, the upper node usually has one or more exchange nodes for receiving data from
the execution instances of the lower nodes on different BEs. Usually the number of exchange nodes is equal to the
number of execution instances of the lower nodes.
In some aggregate query scenarios, if the amount of data to be scanned at the bottom is large, but the amount of data
after aggregation is small, you can try to modify this variable to a smaller value, which can reduce the resource overhead
of such queries. Such as the scenario of aggregation query on the DUPLICATE KEY data model.
parallel_fragment_exec_instance_num
For the scan node, set its number of instances to execute on each BE node. The default is 1.
A query plan typically produces a set of scan ranges, the range of data that needs to be scanned. These data are
distributed across multiple BE nodes. A BE node will have one or more scan ranges. By default, a set of scan ranges for
each BE node is processed by only one execution instance. When the machine resources are abundant, you can increase
the variable and let more execution instances process a set of scan ranges at the same time, thus improving query
efficiency.
The number of scan instances determines the number of other execution nodes in the upper layer, such as aggregate
nodes and join nodes. Therefore, it is equivalent to increasing the concurrency of the entire query plan execution.
Modifying this parameter will help improve the efficiency of large queries, but larger values will consume more machine
resources, such as CPU, memory, and disk IO.
query_cache_size
query_cache_type
query_timeout
Used to set the query timeout. This variable applies to all query statements in the current connection. Particularly,
timeout of INSERT statements is recommended to be managed by the insert_timeout below. The default is 5 minutes, in
seconds.
insert_timeout
Since Version dev Used to set the insert timeout. This variable applies particularly to INSERT statements in the current
connection and is recommended for managing long-running INSERT jobs. The default is 4 hours, in seconds. It loses effect
when query_timeout is greater than insert_timeout, in order to stay compatible with the habit of users of older versions
who use query_timeout to control the timeout of INSERT statements.
resource_group
Not used.
send_batch_parallelism
Used to set the default parallelism for sending batches when executing an InsertStmt operation. If the parallelism value
exceeds max_send_batch_parallelism_per_job in the BE config, the coordinating BE will use the value of
max_send_batch_parallelism_per_job .
sql_mode
Used to specify SQL mode to accommodate certain SQL dialects. For the SQL mode, see here.
sql_safe_updates
sql_select_limit
system_time_zone
time_zone
Used to set the time zone of the current session. The time zone has an effect on the results of certain time functions. For
the time zone, see here.
tx_isolation
tx_read_only
transaction_read_only
transaction_isolation
version
performance_schema
Used for compatibility with MySQL JDBC 8.0.16 or later version. No practical effect.
version_comment
wait_timeout
Used to set the idle timeout of a connection. When an idle connection does not interact with Doris for that length of time,
Doris will actively disconnect it. The default is 8 hours, in seconds.
default_rowset_type
Used for setting the default storage format of Backends storage engine. Valid options: alpha/beta
use_v2_rollup
Used to control the sql query to use segment v2 rollup index to get data. This variable is only used for validation when
upgrading to segment v2 feature. Otherwise, not recommended to use.
rewrite_count_distinct_to_bitmap_hll
Whether to rewrite count distinct queries of bitmap and HLL types as bitmap_union_count and hll_union_agg.
prefer_join_method
When choosing the join method (broadcast join or shuffle join), if the broadcast join cost and shuffle join cost are equal,
this variable determines which join method is preferred.
Currently, the optional values for this variable are "broadcast" or "shuffle".
allow_partition_column_nullable
Whether to allow the partition column to be NULL when creating the table. The default is true, which means NULL is
allowed. false means the partition column must be defined as NOT NULL.
insert_visible_timeout_ms
When execute insert statement, doris will wait for the transaction to commit and visible after the import is completed.
This parameter controls the timeout of waiting for transaction to be visible. The default value is 10000, and the minimum
value is 1000.
enable_exchange_node_parallel_merge
In a sort query, when an upper level node receives the ordered data of the lower level node, it will sort the corresponding
data on the exchange node to ensure that the final data is ordered. However, when a single thread merges multiple
channels of data, if the amount of data is too large, it will lead to a single point of exchange node merge bottleneck.
Doris optimizes this part if there are too many data nodes in the lower layer. Exchange node will start multithreading for
parallel merging to speed up the sorting process. This parameter is false by default, which means that exchange node
does not adopt parallel merge sort to reduce the extra CPU and memory consumption.
extract_wide_range_expr
Used to control whether to turn on the 'Wide Common Factors' rule. It has two values: true and false. It is on by default.
enable_fold_constant_by_be
Used to control the calculation method of constant folding. The default is false , that is, calculation is performed in FE ; if
it is set to true , it will be calculated by BE through RPC request.
cpu_resource_limit
Used to limit the resource overhead of a query. This is an experimental feature. The current implementation is to limit the
number of scan threads for a query on a single node. The number of scan threads is limited, and the data returned from
the bottom layer slows down, thereby limiting the overall computational resource overhead of the query. Assuming it is
set to 2, a query can use up to 2 scan threads on a single node.
This parameter will override the effect of parallel_fragment_exec_instance_num . That is, assuming that
parallel_fragment_exec_instance_num is set to 4, and this parameter is set to 2. Then 4 execution instances on a single
node will share up to 2 scanning threads.
This parameter will be overridden by the cpu_resource_limit configuration in the user property.
disable_join_reorder
Used to turn off all automatic join reorder algorithms in the system. There are two values: true and false. It is off by
default, which means the system's automatic join reorder algorithms are used. After it is set to true, the system will turn off
all automatic reordering algorithms and execute the join in the original table order of the SQL statement.
enable_infer_predicate
Used to control whether to perform predicate derivation. There are two values: true and false. It is turned off by default,
that is, the system does not perform predicate derivation, and uses the original predicate to perform related operations.
After it is set to true, predicate expansion is performed.
return_object_data_as_binary
Used to identify whether to return the bitmap/hll result in the select result. In the select into outfile statement, if the
export file format is csv, the bitmap/hll data will be base64-encoded; if it is the parquet file format, the data will be stored
as a byte array.
block_encryption_mode
The block_encryption_mode variable controls the block encryption mode. The default setting is empty; when empty, AES
functions use AES_128_ECB and SM4 functions use SM4_128_ECB.
available values:
AES_128_ECB,
AES_192_ECB,
AES_256_ECB,
AES_128_CBC,
AES_192_CBC,
AES_256_CBC,
AES_128_CFB,
AES_192_CFB,
AES_256_CFB,
AES_128_CFB1,
AES_192_CFB1,
AES_256_CFB1,
AES_128_CFB8,
AES_192_CFB8,
AES_256_CFB8,
AES_128_CFB128,
AES_192_CFB128,
AES_256_CFB128,
AES_128_CTR,
AES_192_CTR,
AES_256_CTR,
AES_128_OFB,
AES_192_OFB,
AES_256_OFB,
SM4_128_ECB,
SM4_128_CBC,
SM4_128_CFB128,
SM4_128_OFB,
SM4_128_CTR,
trim_tailing_spaces_for_external_table_query
Used to control whether to trim the tailing spaces while querying Hive external tables. The default is false.
skip_storage_engine_merge
For debugging purpose. In vectorized execution engine, in case of problems of reading data of Aggregate Key model and
Unique Key model, setting value to true will read data as Duplicate Key model.
skip_delete_predicate
For debugging purpose. In vectorized execution engine, in case of problems of reading data, setting value to true will
also read deleted data.
skip_delete_bitmap
For debugging purpose. In Unique Key MoW table, in case of problems of reading data, setting value to true will also
read deleted data.
default_password_lifetime
Default password expiration time. The default value is 0, which means no expiration. The unit is days. This parameter is
only enabled if the user's password expiration property has a value of DEFAULT. like:
password_history
The default number of historical passwords. The default value is 0, which means no limit. This parameter is enabled only
when the user's password history attribute is the DEFAULT value. like:
validate_password_policy
Password strength verification policy. Defaults to NONE or 0 , i.e. no verification. Can be set to STRONG or 2 . When set to
STRONG or 2 , when setting a password via the ALTER USER or SET PASSWORD commands, the password must contain at
least 3 of the following: uppercase letters, lowercase letters, numbers, and special characters, and its length must be
greater than or equal to 8. Special characters include: ~!@#$%^&*()_+|<>,.?/:;'[]{}" .
group_concat_max_len
For compatibility purposes. This variable has no effect; it only allows some BI tools to query or set this session variable
successfully.
rewrite_or_to_in_predicate_threshold
The default threshold of rewriting OR to IN. The default value is 2, which means that when there are 2 ORs, if they can be
compact, they will be rewritten as IN predicate.
group_by_and_having_use_alias_first
Specifies whether group by and having clauses use column aliases rather than searching for column name in From
clause. The default value is false.
enable_file_cache
Set whether to use the block file cache. This variable takes effect only if the BE config enable_file_cache=true. The cache is
not used when the BE config enable_file_cache=false.
topn_opt_limit_threshold
Set threshold for limit of topn query (eg. SELECT * FROM t ORDER BY k LIMIT n). If n <= threshold, topn
optimizations(runtime predicate pushdown, two phase result fetch and read order by key) will enable automatically,
otherwise disable. Default value is 1024.
drop_table_if_ctas_failed
Controls whether create table as select deletes the created table when an insert error occurs. The default value is true.
show_user_default_role
Controls whether to show each user's implicit roles in the results of show roles . Default is false.
use_fix_replica
Use a fixed replica to query. If use_fix_replica is 1, the smallest one is used, if use_fix_replica is 2, the second smallest one is
used, and so on. The default value is -1, which means it is not enabled.
dry_run_query
If set to true, for query requests, the actual result set will no longer be returned, but only the number of rows. The default
is false.
This parameter can be used to avoid the time-consuming result set transmission when testing a large number of data
sets, and focus on the time-consuming underlying query execution.
+--------------+
| ReturnedRows |
+--------------+
| 10000000 |
+--------------+
Time zone
Doris supports multiple time zone settings
Noun Interpretation
FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access.
BE: Backend, Doris's back-end node. Responsible for query execution and data storage.
Basic concepts
There are multiple time zone related parameters in Doris
system_time_zone :
When the server starts, it will be set automatically according to the time zone set by the machine, which cannot be modified
after setting.
time_zone :
Specific operations
1. SHOW VARIABLES LIKE '%time_zone%'
View the current time zone related configuration.
2. SET time_zone = 'Asia/Shanghai'
This command sets the session level time zone, which becomes invalid after the connection is disconnected.
3. SET GLOBAL time_zone = 'Asia/Shanghai'
This command sets the time zone parameter at the global level. The FE will persist the parameter, and it will not become
invalid when the connection is disconnected.
The time zone setting affects the values displayed by time functions such as NOW() or CURTIME() , as well as the time
values in SHOW LOAD and SHOW BACKENDS statements.
However, it does not affect the LESS THAN VALUE of time-type partition columns in CREATE TABLE statements, nor does
it affect the display of values stored as DATE/DATETIME types.
FROM_UNIXTIME : Given a UTC timestamp, returns the date and time in the specified time zone. For example, FROM_UNIXTIME(0)
returns 1970-01-01 08:00:00 in the CST time zone.
UNIX_TIMESTAMP : Given a date and time in the specified time zone, returns the UTC timestamp. For example, in the CST time
zone UNIX_TIMESTAMP('1970-01-01 08:00:00') returns 0 .
NOW : Returns the current date and time in the specified time zone.
CONVERT_TZ : Converts a date and time from one specified time zone to another.
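For example, CONVERT_TZ can be used as follows (illustrative values):
SELECT CONVERT_TZ('2019-08-01 13:21:03', 'Asia/Shanghai', 'America/Los_Angeles');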
Restrictions
Time zone values can be given in several formats, case-insensitive:
Abbreviated time zone formats such as MET and CTT are not supported, because abbreviated time zones are ambiguous
in different scenarios and their use is not recommended.
For compatibility, Doris still supports the abbreviated CST time zone; CST is internally converted to
"Asia/Shanghai", which is the Chinese standard time zone.
File Manager
Some functions in Doris require some user-defined files. For example, public keys, key files, certificate files and so on are
used to access external data sources. The File Manager provides a function that allows users to upload these files in advance
and save them in Doris system, which can then be referenced or accessed in other commands.
Noun Interpretation
BDBJE: Oracle Berkeley DB Java Edition. Distributed embedded database for persistent metadata in FE.
SmallFileMgr: File Manager. Responsible for creating and maintaining user files.
Basic concepts
Files are files created and saved by users in Doris.
A file is located by database , catalog , file_name . At the same time, each file also has a globally unique ID (file_id), which
serves as the identification in the system.
File creation and deletion can only be performed by users with admin privileges. A file belongs to a database. Users who have
access to a database (queries, imports, modifications, etc.) can use the files created under the database.
Specific operation
File management has three main commands: CREATE FILE , SHOW FILE and DROP FILE , creating, viewing and deleting files
respectively. The specific syntax of these three commands can be viewed by connecting to Doris and executing HELP cmd; .
CREATE FILE
This statement is used to create and upload a file to the Doris cluster. For details, see CREATE FILE.
Examples:
CREATE FILE "ca.pem"
PROPERTIES
(
"url" = "https://test.bj.bcebos.com/kafka-key/ca.pem",
"catalog" = "kafka"
);
CREATE FILE "client.key"
IN my_database
PROPERTIES
(
"url" = "https://test.bj.bcebos.com/kafka-key/client.key",
"catalog" = "my_catalog",
"md5" = "b5bb901bf10f99205b39a46ac3557dd9"
);
SHOW FILE
This statement can view the files that have been created successfully. For details, see SHOW FILE.
Examples:
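An illustrative query, assuming the my_database used in the CREATE FILE example above:
SHOW FILE FROM my_database;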
DROP FILE
This statement can delete an already created file. For specific operations, see DROP FILE.
Examples:
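An illustrative statement, following the client.key example above; the catalog property matches the classification used when the file was created:
DROP FILE "client.key" FROM my_database PROPERTIES("catalog" = "my_catalog");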
Implementation details
Use of documents
If the FE side needs to use the created file, SmallFileMgr will directly save the data in FE memory as a local file, store it in the
specified directory, and return the local file path for use.
If the BE side needs to use the created file, BE will download the file content to the specified directory on BE through FE's
HTTP interface api/get_small_file for use. At the same time, BE also records the information of the files that have been
downloaded in memory. When BE requests a file, it first checks whether the local file exists and verifies it. If the validation
passes, the local file path is returned directly. If the validation fails, the local file is deleted and downloaded from FE again.
When BE restarts, local files are preloaded into memory.
Use restrictions
Because file meta-information and content are stored in FE memory, by default only files smaller than 1MB can be
uploaded, and the total number of files is limited to 100. These limits can be modified through the configuration items
described in the next section.
Relevant configuration
1. FE configuration
small_file_dir : The path used to store uploaded files, defaulting to the small_files/ directory of the FE runtime
directory.
max_small_file_size_bytes : A single file size limit in bytes. The default is 1MB. File creation larger than this configuration
will be rejected.
max_small_file_number : The total number of files supported by a Doris cluster. The default is 100. When the number of
files created exceeds this value, subsequent creation will be rejected.
If you need to upload more files or increase the size limit of a single file, you can modify the
max_small_file_size_bytes and max_small_file_number parameters by using the ADMIN SET CONFIG command.
However, the increase in the number and size of files will lead to an increase in FE memory usage.
2. BE configuration
small_file_dir : The path used to store files downloaded from the FE, defaulting to the lib/small_files/ directory of the
BE runtime directory.
More Help
For more detailed syntax and best practices used by the file manager, see CREATE FILE, DROP FILE and SHOW FILE
command manual, you can also enter HELP CREATE FILE , HELP DROP FILE and HELP SHOW FILE in the MySql client
command line to get more help information.
cold hot separation
Demand scenario
A major future usage scenario is log storage similar to Elasticsearch. In the log scenario, data is cut by date, and much of
it is cold data with few queries, so the storage cost of such data needs to be reduced. From the perspective of saving
storage costs:
1. The price of ordinary cloud disks of cloud manufacturers is higher than that of object storage
2. In the actual online use of the doris cluster, the utilization rate of ordinary cloud disks cannot reach 100%
3. Cloud disk is not paid on demand, but object storage can be paid on demand
4. High availability based on ordinary cloud disks requires multiple replicas, and a replica migration is required when a
replica fails. This problem does not exist when data is placed on object storage, because object storage is shared.
Solution
Set the freeze time on the partition level to indicate how long the partition will be frozen, and define the location of remote
storage stored after the freeze. On the be, the daemon thread will periodically determine whether the table needs to be
frozen. If it does, it will upload the data to s3.
The cold and hot separation supports all doris functions, but only places some data on object storage to save costs without
sacrificing functions. Therefore, it has the following characteristics:
When cold data is stored on object storage, users need not worry about data consistency and data security
Flexible freeze policy, cooling remote storage property can be applied to table and partition levels
Users query data without paying attention to where the data is located. If the data is not local, it is pulled from object
storage and cached locally on the BE
Optimization of replica clone. If the data is stored on object storage, the replica clone does not need to pull that data to
the local disk
Remote object space recycling. If a table or partition is deleted, or space is wasted due to abnormal conditions in the cold
and hot separation process, a recycler thread will periodically reclaim it, saving storage resources
Cache optimization, which caches accessed cold data locally on the BE, achieving query performance close to that without
cold and hot separation
BE thread pool optimization, which distinguishes whether the data source is local or object storage, and prevents the
latency of reading objects from affecting query performance
Storage policy
The storage policy is the entry to use the cold and hot separation function. Users only need to associate a storage policy with
a table or partition during table creation or doris use. that is, they can use the cold and hot separation function.
Since Version dev When creating an S3 RESOURCE, the S3 remote link verification will be performed to ensure that the
For example:
CREATE RESOURCE "remote_s3"
PROPERTIES
(
"type" = "s3",
"AWS_ENDPOINT" = "bj.s3.com",
"AWS_REGION" = "bj",
"AWS_BUCKET" = "test-bucket",
"AWS_ROOT_PATH" = "path/to/root",
"AWS_ACCESS_KEY" = "bbb",
"AWS_SECRET_KEY" = "aaaa",
"AWS_MAX_CONNECTIONS" = "50",
"AWS_REQUEST_TIMEOUT_MS" = "3000",
"AWS_CONNECTION_TIMEOUT_MS" = "1000"
);
CREATE STORAGE POLICY test_policy
PROPERTIES(
"storage_resource" = "remote_s3",
"cooldown_ttl" = "1d"
);
CREATE TABLE IF NOT EXISTS create_table_use_created_policy
(
    k1 BIGINT,
    k2 LARGEINT,
    v1 VARCHAR(2048)
)
UNIQUE KEY(k1)
-- the table name and distribution clause are illustrative
DISTRIBUTED BY HASH(k1) BUCKETS 3
PROPERTIES(
    "storage_policy" = "test_policy"
);
For details, please refer to the resource, policy, create table, alter and other documents in the docs directory
Some restrictions
A single table or a single partition can only be associated with one storage policy. After association, the storage policy
cannot be dropped
The object information associated with the storage policy does not support modifying the data storage path information,
such as bucket, endpoint, root_path, and other information
Currently, the storage policy only supports creation and modification, not deletion
The cache is actually stored on the be local disk and does not occupy memory.
the cache can limit expansion and clean up data through LRU
The be parameter file_cache_alive_time_sec can set the maximum storage time of the cache data after it has not been
accessed. The default is 604800, which is one week.
The be parameter file_cache_max_size_per_disk can set the disk size occupied by the cache. Once this setting is
exceeded, the cache that has not been accessed for the longest time will be deleted. The default is 0, means no limit to
the size, unit: byte.
The be parameter file_cache_type can be set to sub_file_cache (cache segments of the remote file locally) or
whole_file_cache (cache the entire remote file locally). The default is " ", which means no file is cached; set this
parameter when caching is required.
Unfinished Matters
After the data is frozen, new data updates or imports on it are not yet handled by compaction.
The schema change operation after the data is frozen is not supported at present.
Compute Node
Compute node
Since Version 1.2.1
Scenario
At present, Doris is a typical Share-Nothing architecture, which achieves very high performance by binding data and
computing resources on the same node.
With the continuous improvement of the Doris computing engine's performance, more and more users have begun to use
Doris to directly query data on data lakes.
This is a Share-Disk scenario: data is often stored on remote HDFS/S3 and computed in Doris. Doris fetches the data over the
network and then completes the computation in memory.
When these two workloads are mixed in one cluster, the current Doris architecture shows some disadvantages:
1. Poor resource isolation: the response requirements of these two loads are different, and hybrid deployment causes them
to affect each other.
2. Poor disk utilization: data lake queries only need computing resources, while Doris binds storage and computing, so both
have to be expanded together, resulting in a low disk utilization rate.
3. Poor expansion efficiency: when the cluster is expanded, Doris starts migrating Tablet data, which takes a lot of time,
while the data lake query load has obvious peaks and valleys and needs hourly elasticity.
Solution
Implement a BE node role specially used for federated computing, named Compute node. Compute nodes are used to handle
remote federated queries, such as data lake queries.
The original BE node type is called hybrid node; this type of node can not only execute SQL queries, but also store tablet
data. The Compute node can only execute SQL queries; it stores no data.
With compute nodes, the cluster deployment topology also changes:
hybrid nodes are used for data computation on OLAP-type tables and are scaled according to storage demand;
compute nodes are used for external computing and are scaled according to query load.
Since compute nodes have no storage, they can be deployed on machines with HDD disks shared with other workloads, or
in containers.
Usage of ComputeNode
Configure
Add configuration items to BE's configuration file be.conf :
be_node_role = computation
The default value of this configuration is mix , which is the original BE node type. After setting it to computation , the node is
a compute node.
You can see the value of the NodeRole field through the show backends\G command. If it is mix , it is a mixed node; if it is
computation , it is a compute node.
BackendId: 10010
Cluster: default_cluster
IP: 10.248.181.219
HeartbeatPort: 9050
BePort: 9060
HttpPort: 8040
BrpcPort: 8060
Alive: true
SystemDecommissioned: false
ClusterDecommissioned: false
TabletNum: 753
DataUsedCapacity: 1.955 GB
AvailCapacity: 202.987 GB
TotalCapacity: 491.153 GB
UsedPct: 58.67 %
MaxDiskUsedPct: 58.67 %
RemoteUsedCapacity: 0.000
ErrMsg:
Version: doris-0.0.0-trunk-80baca264
Status: {"lastSuccessReportTabletsTime":"2022-12-05
15:00:38","lastStreamLoadTime":-1,"isQueryDisabled":false,"isLoadDisabled":false}
HeartbeatFailureCounter: 0
NodeRole: computation
Usage
Add configuration items in fe.conf
prefer_compute_node_for_external_table=true
min_backend_num_for_external_table=3
When the Multi-Catalog feature is used in a query, the query will be dispatched to compute nodes first.
Some restrictions
Compute nodes are controlled by a configuration item, so do not modify the configuration of existing mixed-type nodes
(which already store data) to turn them into compute nodes.
TODO
Computation spillover: for Doris internal table queries, when the cluster load is high, the upper-layer operators (above the
TableScan) can be scheduled to compute nodes.
Graceful offline:
when a compute node goes offline, new tasks are automatically scheduled to the nodes that are still online;
the node goes offline only after all old tasks on it are completed;
if old tasks cannot be completed in time, they can kill themselves.
Background
Doris is built on a columnar storage format engine. In high-concurrency service scenarios, users often want to retrieve
entire rows of data from the system; when tables are wide, the columnar format greatly amplifies random read IO. In
addition, the Doris query engine and planner are too heavy for some simple queries such as point queries, so a short path
needs to be planned in the FE's query plan to handle them. The FE is the access-layer service for SQL queries and is written
in Java; parsing and analyzing SQL also leads to high CPU overhead under high-concurrency queries. To solve these
problems, we have introduced row storage, a short query path, and PreparedStatement in Doris. Below is a guide to enable
these optimizations.
"store_row_column" = "true"
) ENGINE=OLAP
UNIQUE KEY(`key`)
COMMENT 'OLAP'
PROPERTIES (
"enable_unique_key_merge_on_write" = "true",
"light_schema_change" = "true",
"store_row_column" = "true"
);
[NOTE]
1. enable_unique_key_merge_on_write should be enabled, since we need the primary key for quick point lookups in the
storage engine.
2. When the condition only contains the primary key, such as select * from tbl_point_query where key = 123 , the query will
go through the short fast path.
3. light_schema_change should also be enabled, since we rely on the column unique id of each column when doing point
queries.
Using PreparedStatement
In order to reduce the CPU cost of parsing query SQL and SQL expressions, we provide the PreparedStatement feature in the
FE, fully compatible with the MySQL protocol (currently it only supports point queries like those mentioned above). Enabling
it will pre-calculate the prepared SQL and expressions and cache them in a session-level memory buffer so they can be
reused later. Using PreparedStatement can improve performance by 4x or more when the CPU becomes the hotspot for such
queries. Below is a JDBC example of using PreparedStatement.
1. Set up the JDBC URL and enable server-side prepared statements:
url = jdbc:mysql://127.0.0.1:9030/ycsb?useServerPrepStmts=true
2. Using PreparedStatement (use ? as the placeholder and reuse the statement object):
PreparedStatement readStatement = connection.prepareStatement("SELECT * FROM tbl_point_query WHERE `key` = ?");
...
readStatement.setInt(1, 1234);
ResultSet resultSet = readStatement.executeQuery();
...
readStatement.setInt(1, 1235);
resultSet = readStatement.executeQuery();
...
row_cache_mem_limit : Specifies the percentage of memory occupied by the row cache. The default is 20% of memory.
Multi Catalog
Since Version 1.2.0
Multi-Catalog is a newly added feature in Doris 1.2.0. It allows Doris to interface with external catalogs more conveniently and
thus increases the data lake analysis and federated query capabilities of Doris.
In older versions of Doris, user data is in a two-tiered structure: database and table. Thus, connections to external catalogs
could only be done at the database or table level. For example, users could create a mapping to a table in an external catalog
via create external table , or to a database via create external database . If there were large amounts of databases or
tables in the external catalog, users would need to create mappings to them one by one, which could be a heavy workload.
With the advent of Multi-Catalog, Doris now has a new three-tiered metadata hierarchy (catalog -> database -> table), which
means users can connect to external data at the catalog level. The currently supported external catalogs include:
1. Hive
2. Iceberg
3. Hudi
4. Elasticsearch
5. JDBC
Multi-Catalog works as an additional and enhanced external table connection method. It helps users conduct multi-catalog
federated queries quickly.
Basic Concepts
1. Internal Catalog
Existing databases and tables in Doris are all under the Internal Catalog, which is the default catalog in Doris and cannot
be modified or deleted.
2. External Catalog
Users can create an External Catalog using the CREATE CATALOG command, and view the existing Catalogs via the
SHOW CATALOGS command.
3. Switch Catalog
After login, you will enter the Internal Catalog by default. You can then switch to another catalog using the SWITCH
command:
SWITCH internal;
SWITCH hive_catalog;
After switching catalog, you can view or switch to your target database in that catalog via SHOW DATABASES and USE DB .
You can view and access data in External Catalogs the same way as doing that in the Internal Catalog.
Doris only supports read-only access to data in External Catalogs currently.
4. Delete Catalog
Databases and tables in External Catalogs are for read only, but External Catalogs are deletable via the DROP CATALOG
command. (The Internal Catalog cannot be deleted.)
The deletion only means to remove the mapping in Doris to the corresponding catalog. It doesn't change the external
catalog itself by all means.
5. Resource
Resource is a set of configurations. Users can create a Resource using the CREATE RESOURCE command, and then apply
this Resource for a newly created Catalog. One Resource can be reused for multiple Catalogs.
Examples
Connect to Hive
The followings are the instruction on how to connect to a Hive catalog using the Catalog feature.
1. Create Catalog
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004'
);
2. View Catalog
mysql> SHOW CATALOGS;
+-----------+-------------+----------+
| CatalogId | CatalogName | Type     |
+-----------+-------------+----------+
|         0 | internal    | internal |
+-----------+-------------+----------+
You can view the CREATE CATALOG statement via SHOW CREATE CATALOG.
3. Switch Catalog
Switch to the Hive Catalog using the SWITCH command, and view the databases in it:
mysql> SWITCH hive;
mysql> SHOW DATABASES;
+-----------+
| Database |
+-----------+
| default |
| random |
| ssb100 |
| tpch1 |
| tpch100 |
| tpch1_orc |
+-----------+
After switching to the Hive Catalog, you can use the relevant features.
For example, you can switch to the database tpch100 and view the tables in it:
mysql> USE tpch100;
Database changed
+-------------------+
| Tables_in_tpch100 |
+-------------------+
| customer |
| lineitem |
| nation |
| orders |
| part |
| partsupp |
| region |
| supplier |
+-------------------+
mysql> SELECT l.l_shipdate FROM hive.tpch100.lineitem l WHERE l.l_partkey IN (SELECT p_partkey FROM
internal.db1.part) LIMIT 10;
+------------+
| l_shipdate |
+------------+
| 1993-02-16 |
| 1995-06-26 |
| 1995-08-19 |
| 1992-07-23 |
| 1998-05-23 |
| 1997-07-12 |
| 1994-03-06 |
| 1996-02-07 |
| 1997-06-01 |
| 1996-08-23 |
+------------+
The table is identified in the format of catalog.database.table . For example, internal.db1.part in the above snippet.
If the target table is in the current Database of the current Catalog, catalog and database in the format can be omitted.
You can use the INSERT INTO command to insert table data from the Hive Catalog into a table in the Internal Catalog.
This is how you can import data from External Catalogs to the Internal Catalog:
mysql> SWITCH internal;
mysql> USE db1;
Database changed
mysql> INSERT INTO part SELECT * FROM hive.tpch100.part LIMIT 1000;
Connect to Iceberg
See Iceberg
Connect to Hudi
See Hudi
Connect to Elasticsearch
See Elasticsearch
Connect to JDBC
See JDBC
As for types that cannot be mapped to a Doris column type, such as map and struct , Doris will map them to an
UNSUPPORTED type. For example, suppose a table contains the following columns, of which k3 is of an UNSUPPORTED type:
k1 INT,
k2 INT,
k3 UNSUPPORTED,
k4 INT
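A hedged sketch of such queries (the table name tbl is illustrative and the error text is abbreviated): queries that reference the UNSUPPORTED column fail, while queries that avoid it succeed:
select k1, k3 from tbl;   -- error: the query references the UNSUPPORTED column k3
select k1, k4 from tbl;   -- Query OK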
You can find more details of the mapping of various data sources (Hive, Iceberg, Hudi, Elasticsearch, and JDBC) in the
corresponding pages.
Privilege Management
Access from Doris to databases and tables in an External Catalog is not under the privilege control of the external catalog
itself, but is authorized by Doris.
Along with the new Multi-Catalog feature, we also added privilege management at the Catalog level (See Privilege
Management for details).
Metadata Update
Manual Update
By default, changes in metadata of external data sources, including addition or deletion of tables and columns, will not be
synchronized into Doris.
Users need to manually update the metadata using the REFRESH CATALOG command.
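For example, assuming the hive catalog created in the earlier examples:
REFRESH CATALOG hive;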
Automatic Update
Since Version 1.2.2
Currently, Doris only supports automatic update of metadata in Hive Metastore (HMS). It perceives changes in metadata by
the FE node which regularly reads the notification events from HMS. The supported events are as follows:
Event              Corresponding Operation in Doris
CREATE DATABASE    Create a database in the corresponding catalog.
DROP DATABASE      Delete a database in the corresponding catalog.
ALTER DATABASE     Such alterations mainly include changes in properties, comments, or storage location of databases. They do not affect Doris' queries in External Catalogs, so they will not be synchronized.
CREATE TABLE       Create a table in the corresponding database.
DROP TABLE         Delete a table in the corresponding database, and invalidate the cache of that table.
ALTER TABLE        If it is a renaming, delete the table of the old name and then create a new table with the new name; otherwise, invalidate the cache of that table.
ADD PARTITION      Add a partition to the cached partition list of the corresponding table.
DROP PARTITION     Delete a partition from the cached partition list of the corresponding table, and invalidate the cache of that partition.
ALTER PARTITION    If it is a renaming, delete the partition of the old name and then create a new partition with the new name; otherwise, invalidate the cache of that partition.
After data ingestion, changes in partition tables will follow the ALTER PARTITION logic, while those in non-partition tables
will follow the ALTER TABLE logic.
If changes are conducted on the file system directly instead of through the HMS, the HMS will not generate an event. As a
result, such changes will not be perceived by Doris.
To enable automatic update, you need to modify the hive-site.xml of HMS and then restart HMS:
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.dml.events</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
Note: To enable automatic update, whether for existing Catalogs or newly created Catalogs, all you need is to set
enable_hms_events_incremental_sync to true , and then restart the FE node. You don't need to manually update the
metadata before or after the restart.
Hive
Once Doris is connected to Hive Metastore or made compatible with Hive Metastore metadata service, it can access
databases and tables in Hive and conduct queries.
Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also
access these systems via Hive Catalog.
Usage
When connecting to Hive, Doris reads databases and tables through the Hive Metastore.
Create Catalog
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
In addition to type and hive.metastore.uris , which are required, you can specify other parameters regarding the
connection. For example, to specify HDFS high-availability (HA) parameters:
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
To connect to a Hive Metastore with Kerberos authentication:
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hive.metastore.sasl.enabled' = 'true',
'dfs.nameservices'='your-nameservice',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'hadoop.security.authentication' = 'kerberos',
'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
'hadoop.kerberos.principal' = '[email protected]',
'hive.metastore.kerberos.principal' = 'your-hms-principal'
);
Remember that the krb5.conf and keytab files should be placed on all BE and FE nodes, and the location of the keytab file
should be the same as the value of hadoop.kerberos.keytab .
By default, krb5.conf should be placed at /etc/krb5.conf .
The value of hive.metastore.kerberos.principal should be the same as the property of the same name used by the HMS you
are connecting to, which can be found in hive-site.xml .
To access HDFS that is encrypted with KMS, add the key provider URI:
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
);
To connect to Hive data stored on JuiceFS:
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'root',
'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
'juicefs.meta' = 'xxx'
);
In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating
new Catalogs. Here is an example:
# 1. Create Resource
CREATE RESOURCE hms_resource PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
# 2. Create a Catalog and use the existing Resource. The key/value information given here will overwrite the
# corresponding information in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES (
    'key' = 'value'
);
You can also put the hive-site.xml file in the conf directories of FE and BE. This will enable Doris to automatically read
information from hive-site.xml . The relevant information will be overwritten based on the following rules :
Hive Versions
Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access
Hive Metastore. You can specify a certain Hive version when creating Catalogs, for example:
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hive.version' = '1.1.0'
);
Hive Type     Doris Type
boolean       boolean
tinyint       tinyint
smallint      smallint
int           int
bigint        bigint
date          date
timestamp     datetime
float         float
double        double
char          char
varchar       varchar
decimal       decimal
other         unsupported
Iceberg
Usage
When connecting to Iceberg, Doris can access Iceberg tables either through a Hive Metastore Catalog or through the native Iceberg catalog API.
Create Catalog
Using Hive Metastore to manage the Iceberg metadata (the same as creating a Hive Catalog):
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Access metadata with the iceberg API. The Hive, REST, Glue and other services can serve as the iceberg catalog.
Using Hive Metastore as the Iceberg catalog:
CREATE CATALOG iceberg PROPERTIES (
    'type'='iceberg',
'iceberg.catalog.type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
"type"="iceberg",
"iceberg.catalog.type" = "glue",
"glue.endpoint" = "https://glue.us-east-1.amazonaws.com",
"warehouse" = "s3://bucket/warehouse",
"AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
"AWS_REGION" = "us-east-1",
"AWS_ACCESS_KEY" = "ak",
"AWS_SECRET_KEY" = "sk",
"use_path_style" = "true"
);
warehouse : Glue warehouse location, which determines the root path of the data warehouse in storage.
Using a RESTful service as the server side (the service implements the RESTCatalog interface of Iceberg to provide metadata):
CREATE CATALOG iceberg PROPERTIES (
    'type'='iceberg',
    'iceberg.catalog.type'='rest',
    'uri' = 'http://172.21.0.1:8181'
);
"AWS_ACCESS_KEY" = "ak"
"AWS_SECRET_KEY" = "sk"
"AWS_REGION" = "region-name"
"AWS_ENDPOINT" = "http://endpoint-uri"
Time Travel
Since Version dev
You can read data of historical table versions using the FOR TIME AS OF or FOR VERSION AS OF statements based on the
Snapshot ID or the timepoint the Snapshot is generated. For example:
You can use the iceberg_meta table function to view the Snapshot details of the specified table.
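A hedged sketch of both, assuming an Iceberg table named iceberg_tbl; the timestamp, snapshot id, and the exact iceberg_meta parameters are illustrative:
SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";
SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;
-- view the snapshot details of the table
SELECT * FROM iceberg_meta("table" = "iceberg.db1.iceberg_tbl", "query_type" = "snapshots");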
Hudi
Usage
1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read
tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query.
2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More
types of Catalogs will be supported in future versions.
Create Catalog
Same as creating Hive Catalogs. A simple example is provided here. See Hive for more information.
CREATE CATALOG hudi PROPERTIES (
    'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-
nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Elasticsearch
Elasticsearch (ES) Catalogs in Doris support auto-mapping of ES metadata. Users can combine the full-text search capability
of ES with the distributed query planning capability of Doris to provide a full-fledged OLAP solution that is able to perform
distributed queries across ES indexes, as well as joins, full-text search, and filtering between Doris and ES data.
Usage
1. Doris supports Elasticsearch 5.x and newer versions.
Create Catalog
CREATE CATALOG es PROPERTIES (
"type"="es",
"hosts"="http://127.0.0.1:9200"
);
Since there is no concept of "database" in ES, after connecting to ES, Doris will automatically generate a unique database:
default_db .
After switching to the ES Catalog, you will be in the default_db database, so you don't need to execute the USE default_db
command.
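For example, a hedged sketch of querying an ES index through the catalog created above (my_index is an illustrative index name):
SWITCH es;
SELECT * FROM default_db.my_index LIMIT 10;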
Parameter Description
Parameter         Required or Not   Default Value   Description
hosts             Yes                               ES address; can be one or multiple addresses, or the load balancer address of ES
doc_value_scan    No                true            Whether to obtain the value of the target field from ES/Lucene columnar storage
keyword_sniff     No                true            Whether to sniff the text.fields in ES based on keyword; if set to false, the system will perform matching after tokenization
nodes_discovery   No                true            Whether to enable ES node discovery (true by default); set to false in network-isolated environments to connect only to the specified nodes
ssl               No                false           Whether to enable HTTPS access mode for ES; currently FE/BE follow a "Trust All" method
like_push_down    No                true            Whether to transform like into wildcard and push it down to ES; this increases the CPU consumption of ES
1. In terms of authentication, only HTTP Basic authentication is supported and it requires the user to have read
privilege for the index and paths including /_cluster/state/ and _nodes/http ; if you have not enabled security
authentication for the cluster, you don't need to set the user and password .
2. If there are multiple types in the index in 5.x and 6.x, the first type is taken by default.
ES Type          Doris Type
null             null
boolean          boolean
byte             tinyint
short            smallint
integer          int
long             bigint
unsigned_long    largeint
float            float
half_float       float
double           double
scaled_float     double
keyword          string
text             string
ip               string
nested           string
object           string
other            unsupported
Since Version dev
Array Type
Elasticsearch does not have an explicit array type, but a field can contain zero or more values. To indicate that a field is an
array type, a specific Doris structural annotation can be added to the _meta section of the index mapping. For Elasticsearch
6.x and earlier releases, please refer to _meta.
For example, suppose there is an index doc containing the following data structure.
"id_field": "id-xxx-xxx",
"timestamp_field": "2022-11-12T12:08:56Z",
"array_object_field": [
"name": "xxx",
"age": 18
The array fields of this structure can be defined by adding the field property definition to the _meta.doris property of the
target index mapping. For ES 7.x and above, where the mapping has no type level, the mapping update is:
{
    "_meta": {
        "doris":{
            "array_fields":[
                "array_int_field",
                "array_string_field",
                "array_object_field"
            ]
        }
    }
}
"_doc": {
"_meta": {
"doris":{
"array_fields":[
"array_int_field",
"array_string_field",
"array_object_field"
'
For the sake of optimization, operators will be converted into the following ES queries:
SQL operator    ES query
=               term query
in              terms query
and             bool.filter
or              bool.should
not             bool.must_not
After this, when obtaining data from ES, Doris will follow these rules:
Try and see: Doris will automatically check if columnar storage is enabled for the target fields (doc_value: true), if it is,
Doris will obtain all values in the fields from the columnar storage.
Auto-downgrading: If any one of the target fields is not available in columnar storage, Doris will parse and obtain all
target data from row storage ( _source ).
Benefits
By default, Doris On ES obtains all target columns from _source , which is in row storage and JSON format. Compared to
columnar storage, _source is slow in batch reads. In particular, when the system only needs to read a small number of
columns, the performance of docvalue can be about a dozen times faster than that of _source .
Note
1. Columnar storage is not available for text fields in ES. Thus, if you need to obtain fields containing text values, you will
need to obtain them from _source .
2. When obtaining large numbers of fields ( >= 25 ), the performances of docvalue and _source are basically equivalent.
ES allows direct data ingestion without an index since it will automatically create an index after ingestion. For string fields, ES
will create a field with both text and keyword types. This is how the Multi-Field feature of ES works. The mapping is as
follows:
"k4": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
For example, to conduct "=" filtering on k4 , Doris on ES will convert the filtering operation into an ES TermQuery.
The SQL filter:
k4 = "Doris On ES"
The converted ES query DSL:
"term" : {
    "k4": "Doris On ES"
}
Since the first field of k4 is text , it will be tokenized by the analyzer set for k4 (or by the standard analyzer if no analyzer has
been set for k4 ) after data ingestion. As a result, it will be tokenized into three terms: "Doris", "on", and "ES".
POST /_analyze
{
    "analyzer": "standard",
    "text": "Doris On ES"
}
The tokenization result:
{
    "tokens": [
        {
            "token": "doris",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "on",
            "start_offset": 6,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "es",
            "start_offset": 9,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}
"term" : {
Since there is no term in the dictionary that matches the term Doris On ES , no result will be returned.
However, if you have set enable_keyword_sniff: true , the system will convert k4 = "Doris On ES" to k4.keyword = "Doris
On ES" to match the SQL semantics. The converted ES query DSL will be:
"term" : {
k4.keyword is of keyword type, so the data is written in ES as a complete term, allowing for successful matching.
Then, Doris will discover all available data nodes (the allocated shards) in ES. If the ES data node addresses are not accessible
to Doris BE, set "nodes_discovery" = "false" . This is typically the case when the ES cluster is deployed in a private network
isolated from the public Internet, so users need proxy access.
A temporary solution is to implement a "Trust All" method in FE/BE. In the future, the real user configuration certificates will
be used.
Query Usage
You can use the ES external tables in Doris the same way as using Doris internal tables, except that the Doris data models
(Rollup, Pre-Aggregation, and Materialized Views) are unavailable.
Basic Query
select * from es_table where k1 > 1000 and k3 ='term' or k4 like 'fu*z_'
In esquery , the first parameter (the column name) is used to associate with index , while the second parameter is the JSON
expression of basic Query DSL in ES, which is surrounded by {} . The root key in JSON is unique, which can be
match_phrase , geo_shape or bool , etc.
A match_phrase query:
select * from es_table where esquery(k4, '{
    "match_phrase": {
        "k4": "doris on es"
    }
}');
A geo query:
select * from es_table where esquery(k4, '{
    "geo_shape": {
        "location": {
            "shape": {
                "type": "envelope",
                "coordinates": [
                    [
                        13,
                        53
                    ],
                    [
                        14,
                        52
                    ]
                ]
            },
            "relation": "within"
        }
    }
}');
A bool query:
select * from es_table where esquery(k4, '{
    "bool": {
        "must": [
            {
                "terms": {
                    "k1": [
                        11,
                        12
                    ]
                }
            },
            {
                "terms": {
                    "k2": [
                        100
                    ]
                }
            }
        ]
    }
}');
ES boasts flexible usage of time fields, but in ES external tables, improper type setting of time fields will result in predicate
pushdown failures.
It is recommended to allow the highest level of format compatibility for time fields when creating an index:
"dt": {
"type": "date",
When creating this field in Doris, it is recommended to set its type to date or datetime (or varchar ). You can use SQL
statements like the following to push the filters down to ES.
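A hedged sketch of such filters, assuming an ES external table doe whose dt column maps to the date field above:
select * from doe where dt > '2020-06-21';
select * from doe where dt < '2020-06-21 12:00:00';
select * from doe where dt < 1593497011;
select * from doe where dt < now();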
Note:
If no format is set for a date field in ES, the default format is strict_date_optional_time||epoch_millis .
Timestamps ingested into ES need to be converted into ms , which is the internal processing format in ES; otherwise
errors will occur in ES external tables.
To obtain such field values from ES external tables, you can add an _id field of varchar type when creating the table, for
example:
CREATE EXTERNAL TABLE `doe` (
    `_id` varchar COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
    "hosts" = "http://127.0.0.1:8200",
    "user" = "root",
    "password" = "root",
    "index" = "doe"
);
To obtain such field values from ES Catalogs, please set "mapping_es_id" = "true" .
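For example, a hedged sketch reusing the connection values from the earlier ES Catalog example:
CREATE CATALOG es PROPERTIES (
    "type"="es",
    "hosts"="http://127.0.0.1:9200",
    "mapping_es_id"="true"
);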
Note:
FAQ
1. Are X-Pack authenticated ES clusters supported?
2. Why do some queries require a longer response time than they do in ES?
For _count queries, ES can directly read the metadata regarding the number of the specified files instead of filtering the
original data. This is a huge time saver.
Currently, Doris On ES does not support pushdown for aggregations such as sum, avg, and min/max. In such operations,
Doris obtains all files that met the specified conditions from ES and then conducts computing internally.
Appendix
(Architecture diagram: Doris FE and BE nodes on top, and the ES cluster with its MasterNode and data nodes below; the query flow is described in the steps that follow.)
1. Doris FE sends a request to the specified host for table creation in order to obtain information about the HTTP port and
the index shard allocation.
2. Based on the information about node and index metadata from FE, Doris generates a query plan and sends it to the
corresponding BE nodes.
3. Following the principle of proximity, each BE node sends requests to the locally deployed ES nodes, and obtains data from
_source or docvalue from each shard of the ES index concurrently by way of HTTP Scroll .
JDBC
JDBC Catalogs in Doris are connected to external data sources using the standard JDBC protocol.
Once connected, Doris will ingest metadata of databases and tables from the external data sources in order to enable quick
access to external data.
Usage
1. Supported data sources include MySQL, PostgreSQL, Oracle, SQLServer, ClickHouse and Doris.
Create Catalog
1. MySQL
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
"driver_url" = "mysql-connector-java-5.1.47.jar",
"driver_class" = "com.mysql.jdbc.Driver"
2. PostgreSQL
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:postgresql://127.0.0.1:5449/demo",
"driver_url" = "postgresql-42.5.1.jar",
"driver_class" = "org.postgresql.Driver"
);
As for data mapping from PostgreSQL to Doris, one Database in Doris corresponds to one schema in the specified database
in PostgreSQL (for example, "demo" in jdbc_url above), and one Table in that Database corresponds to one table in that
schema. To make it more intuitive, the mapping relations are as follows:
Doris PostgreSQL
Catalog Database
Database Schema
Table Table
3. Oracle
Since Version 1.2.2
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:oracle:thin:@127.0.0.1:1521:helowin",
"driver_url" = "ojdbc6.jar",
"driver_class" = "oracle.jdbc.driver.OracleDriver"
);
As for data mapping from Oracle to Doris, one Database in Doris corresponds to one User, and one Table in that Database
corresponds to one table that the User has access to. In conclusion, the mapping relations are as follows:
Doris Oracle
Catalog Database
Database User
Table Table
4. Clickhouse
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:clickhouse://127.0.0.1:8123/demo",
"driver_url" = "clickhouse-jdbc-0.3.2-patch11-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
);
5. SQLServer
"type"="jdbc",
"user"="SA",
"password"="Doris123456",
"jdbc_url" = "jdbc:sqlserver://localhost:1433;DataBaseName=doris_test",
"driver_url" = "mssql-jdbc-11.2.3.jre8.jar",
"driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
As for data mapping from SQLServer to Doris, one Database in Doris corresponds to one schema in the specified database in
SQLServer (for example, "doris_test" in jdbc_url above), and one Table in that Database corresponds to one table in that
schema. The mapping relations are as follows:
Doris SQLServer
Catalog Database
Database Schema
Table Table
6. Doris
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:9030?useSSL=false",
"driver_url" = "mysql-connector-java-5.1.47.jar",
"driver_class" = "com.mysql.jdbc.Driver"
);
Currently, JDBC Catalog only supports using a 5.x version of the JDBC jar package to connect to another Doris database. If you
use an 8.x version of the JDBC jar package, the column data types may not match.
Parameter Description
driver_url can be specified in one of the following ways:
1. File name. For example, mysql-connector-java-5.1.47.jar . Please place the Jar file package in jdbc_drivers/
under the FE/BE deployment directory in advance so the system can locate the file. You can change the location of
the file by modifying jdbc_drivers_dir in fe.conf and be.conf.
2. Local absolute path. For example, file:///path/to/mysql-connector-java-5.1.47.jar . Please place the Jar file
package in the specified paths of FE/BE node.
only_specified_database :
When connecting via JDBC, you can specify which database/schema to connect to; for example, you can specify the
DataBase in the MySQL jdbc_url , or the CurrentSchema in the PostgreSQL jdbc_url . only_specified_database specifies
whether to synchronize only that specified database/schema.
If you use this property when connecting to an Oracle database, please use an ojdbc driver of version 8 or above
(such as ojdbc8.jar).
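A hedged sketch (the catalog name and connection values are illustrative) that synchronizes only the demo database named in the jdbc_url:
CREATE CATALOG jdbc_mysql_demo PROPERTIES (
    "type"="jdbc",
    "user"="root",
    "password"="123456",
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/demo",
    "driver_url" = "mysql-connector-java-5.1.47.jar",
    "driver_class" = "com.mysql.jdbc.Driver",
    "only_specified_database" = "true"
);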
Query
select * from mysql_table where k1 > 1000 and k3 ='term';
In some cases, the keywords in the database might be used as the field names. For queries to function normally in these
cases, Doris will add escape characters to the field names and tables names in SQL statements based on the rules of different
databases, such as (``) for MySQL, ([]) for SQLServer, and (" ") for PostgreSQL and Oracle. This might require extra attention on
case sensitivity. You can view the query statements sent to these various databases via explain sql .
Write Data
Since Version 1.2.2 After creating a JDBC Catalog in Doris, you can write data or query results to it using the `insert into`
statement. You can also ingest data from one JDBC Catalog Table to another JDBC Catalog Table.
Example:
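A hedged sketch (the catalog, database, and table names are illustrative):
-- write literal rows into a table in a JDBC Catalog
INSERT INTO jdbc_mysql.db1.tbl1 VALUES (1, "doris");
-- write query results from an internal table into the same JDBC Catalog table
INSERT INTO jdbc_mysql.db1.tbl1 SELECT * FROM internal.db1.tbl1;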
Transaction
In Doris, data is written to External Tables in batches. If the ingestion process is interrupted, rollbacks might be required.
That's why JDBC Catalog Tables support data writing transactions. You can utilize this feature by setting the session variable:
enable_odbc_transcation .
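For example, it can be enabled for the current session as follows (note that the variable name is spelled enable_odbc_transcation, as given above):
set enable_odbc_transcation = true;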
The transaction mechanism ensures the atomicity of data writing to JDBC External Tables, but it reduces performance to a
certain extent. You may decide whether to enable transactions based on your own tradeoff.
MySQL
MYSQL Type Doris Type Comment
BOOLEAN BOOLEAN
TINYINT TINYINT
SMALLINT SMALLINT
MEDIUMINT INT
INT INT
BIGINT BIGINT
FLOAT FLOAT
DOUBLE DOUBLE
DECIMAL DECIMAL
DATE DATE
TIMESTAMP DATETIME
DATETIME DATETIME
YEAR SMALLINT
TIME STRING
CHAR CHAR
VARCHAR VARCHAR
TINYTEXT、TEXT、MEDIUMTEXT、LONGTEXT、TINYBLOB、BLOB、MEDIUMBLOB、LONGBLOB、TINYSTRING、STRING、MEDIUMSTRING、LONGSTRING、BINARY、VARBINARY、JSON、SET、BIT    STRING
Other UNSUPPORTED
PostgreSQL
POSTGRESQL Type Doris Type Comment
boolean BOOLEAN
smallint/int2 SMALLINT
integer/int4 INT
bigint/int8 BIGINT
decimal/numeric DECIMAL
real/float4 FLOAT
smallserial SMALLINT
serial INT
bigserial BIGINT
char CHAR
varchar/text STRING
timestamp DATETIME
date DATE
time STRING
interval STRING
point/line/lseg/box/path/polygon/circle STRING
cidr/inet/macaddr STRING
uuid/josnb STRING
Other UNSUPPORTED
Oracle
ORACLE Type Doris Type Comment
number UNSUPPORTED Doris does not support the Oracle NUMBER type when p and s are not specified
float/real DOUBLE
DATE DATETIME
TIMESTAMP DATETIME
CHAR/NCHAR STRING
VARCHAR2/NVARCHAR2 STRING
Other UNSUPPORTED
SQLServer
SQLServer Type Doris Type Comment
bit BOOLEAN
smallint SMALLINT
int INT
bigint BIGINT
real FLOAT
float/money/smallmoney DOUBLE
decimal/numeric DECIMAL
date DATE
datetime/datetime2/smalldatetime DATETIMEV2
char/varchar/text/nchar/nvarchar/ntext STRING
binary/varbinary STRING
time/datetimeoffset STRING
Other UNSUPPORTED
Clickhouse
ClickHouse Type Doris Type Comment
Bool BOOLEAN
String STRING
Date/Date32 DATE
DateTime/DateTime64 DATETIME Data beyond the maximum precision of DateTime in Doris will be truncated.
Float32 FLOAT
Float64 DOUBLE
Int8 TINYINT
Int16/UInt8 SMALLINT Doris does not support UNSIGNED data types so UInt8 will be mapped to SMALLINT.
Int32/UInt16 INT Doris does not support UNSIGNED data types so UInt16 will be mapped to INT.
Int64/Uint32 BIGINT Doris does not support UNSIGNED data types so UInt32 will be mapped to BIGINT.
Int128/UInt64 LARGEINT Doris does not support UNSIGNED data types so UInt64 will be mapped to LARGEINT.
Int256/UInt128/UInt256 STRING Doris does not support data types of such orders of magnitude, so these will be mapped to STRING.
DECIMAL DECIMAL Data beyond the maximum decimal precision in Doris will be truncated.
Enum/IPv4/IPv6/UUID STRING Data of IPv4 and IPv6 type will be displayed with an extra / as a prefix. To remove the /, you can use the split_part function.
Other UNSUPPORTED
Doris
Doris Type    JDBC Catalog Doris Type    Comment
BOOLEAN BOOLEAN
TINYINT TINYINT
SMALLINT SMALLINT
INT INT
BIGINT BIGINT
LARGEINT LARGEINT
FLOAT FLOAT
DOUBLE DOUBLE
DECIMAL/DECIMALV3    DECIMAL/DECIMALV3/STRING    The data type used is based on the DECIMAL field's (precision, scale) and the enable_decimal_conversion configuration
DATE    DATEV2    JDBC Catalog uses the DATEV2 type by default when connecting to Doris
DATEV2 DATEV2
DATETIME    DATETIMEV2    JDBC Catalog uses the DATETIMEV2 type by default when connecting to Doris
DATETIMEV2 DATETIMEV2
CHAR CHAR
VARCHAR VARCHAR
STRING STRING
TEXT STRING
Other UNSUPPORTED
FAQ
1. Are there any other databases supported besides MySQL, Oracle, PostgreSQL, SQLServer, and ClickHouse?
Currently, Doris supports MySQL, Oracle, PostgreSQL, SQLServer, and ClickHouse. We are planning to expand this list.
Technically, any databases that support JDBC access can be connected to Doris in the form of JDBC external tables. You
are more than welcome to be a Doris contributor to expedite this effort.
2. Why does Mojibake occur when Doris tries to read emojis from MySQL external tables?
In MySQL, utf8mb3 is the default utf8 format. It cannot represent emojis, which require 4-byte encoding. To solve this,
when creating MySQL external tables, you need to set utf8mb4 encoding for the corresponding columns, set the server
encoding to utf8mb4, and leave characterEncoding in the JDBC URL empty (because utf8mb4 is not supported for this
property, and anything other than utf8mb4 will cause a failure to write the emojis).
Modify the my.ini file in the MySQL directory (for Linux systems, modify the my.cnf file in the etc directory):
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
ALTER TABLE table_name MODIFY column_name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
3. Why does the error message "CAUSED BY: DataReadException: Zero date value prohibited" pop up when
DateTime="0000:00:00 00:00:00" while reading MySQL external tables?
This error occurs because of an illegal DateTime. It can be fixed by modifying the zeroDateTimeBehavior parameter.
The options for this parameter include: EXCEPTION , CONVERT_TO_NULL , ROUND . Respectively, they mean to report error,
convert to null, and round the DateTime to "0001-01-01 00:00:00" when encountering an illegal DateTime.
You can add "jdbc_url"="jdbc:mysql://IP:PORT/doris_test?zeroDateTimeBehavior=convertToNull" to the URL.
4. Why do loading failures happen when reading MySQL or other external tables?
For example:
Such errors occur because the driver_class has been wrongly put when creating the Resource. The problem with the
above example is the letter case. It should be corrected as "driver_class" = "com.mysql.jdbc.Driver" .
The last packet successfully received from the server was 7 milliseconds ago. The last packet sent
successfully to the server was 4 milliseconds ago.
WARN: Establishing SSL connection without server's identity verification is not recommended.
According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if
explicit option isn't set.
For compliance with existing applications not using SSL the verifyServerCertificate property is set to
'false'.
You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore
for server certificate verification.
You can add ?useSSL=false to the end of the JDBC connection string when creating Catalog. For example, "jdbc_url"
= "jdbc:mysql://127.0.0.1:3306/test?useSSL=false" .
To reduce memory usage, Doris obtains one batch of query results at a time, and has a size limit for each batch. However,
MySQL conducts one-off loading of all query results by default, which means the "loading in batches" method won't
work. To solve this, you need to specify "jdbc_url"="jdbc:mysql: //IP:PORT/doris_test?useCursorFetch=true" in the URL.
7. What to do with errors such as "CAUSED BY: SQLException OutOfMemoryError" when performing JDBC queries?
If you have set useCursorFetch for MySQL, you can increase the JVM memory limit by modifying the value of
jvm_max_heap_size in be.conf. The current default value is 1024M.
What is DLF
Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud, and it is compatible with the Hive
Metastore protocol. Doris can therefore access DLF the same way as it accesses Hive Metastore.
Connect to DLF
"type"="hms",
"dlf.catalog.proxyMode" = "DLF_ONLY",
"hive.metastore.type" = "dlf",
"dlf.catalog.endpoint" = "dlf.cn-beijing.aliyuncs.com",
"dlf.catalog.region" = "cn-beijing",
"dlf.catalog.uid" = "uid",
"dlf.catalog.accessKeyId" = "ak",
"dlf.catalog.accessKeySecret" = "sk"
);
type should always be hms . If you need to access Alibaba Cloud OSS on the public network, you can add
"dlf.catalog.accessPublic"="true" .
dlf.catalog.uid : Alibaba Cloud account. You can find the "Account ID" in the upper right corner on the Alibaba Cloud
console.
dlf.catalog.accessKeyId : AccessKey, which you can create and manage on the Alibaba Cloud console.
dlf.catalog.accessKeySecret : SecretKey, which you can create and manage on the Alibaba Cloud console.
After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.
Alternatively, you can access DLF through the hive-site.xml file:
1. Create a hive-site.xml file with the following content and put it in the fe/conf directory.
<?xml version="1.0"?>
<configuration>
<property>
<name>hive.metastore.type</name>
<value>dlf</value>
</property>
<property>
<name>dlf.catalog.endpoint</name>
<value>dlf-vpc.cn-beijing.aliyuncs.com</value>
</property>
<property>
<name>dlf.catalog.region</name>
<value>cn-beijing</value>
</property>
<property>
<name>dlf.catalog.proxyMode</name>
<value>DLF_ONLY</value>
</property>
<property>
<name>dlf.catalog.uid</name>
<value>20000000000000000</value>
</property>
<property>
<name>dlf.catalog.accessKeyId</name>
<value>XXXXXXXXXXXXXXX</value>
</property>
<property>
<name>dlf.catalog.accessKeySecret</name>
<value>XXXXXXXXXXXXXXXXX</value>
</property>
</configuration>
2. Restart FE; Doris will read and parse fe/conf/hive-site.xml , and then create the Catalog via the CREATE CATALOG
statement:
CREATE CATALOG dlf PROPERTIES (
    "type"="hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
type should always be hms , while hive.metastore.uris can be arbitrary since it is not used in real practice; it should,
however, follow the format of a Hive Metastore Thrift URI.
FAQ
1. What to do with errors such as failed to get schema and Storage schema reading not supported when accessing
Iceberg tables via Hive Metastore?
To fix this, please place the Jar file package of the iceberg runtime in the lib/ directory of Hive, and configure the following
in hive-site.xml:
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
2. What to do with the GSS initiate failed error when connecting to Hive Metastore with Kerberos authentication?
Usually it is caused by incorrect Kerberos authentication information, you can troubleshoot by the following steps:
i. In versions before 1.2.1, the libhdfs3 library that Doris depends on does not enable gsasl. Please update to version
1.2.2 or later.
ii. Confirm that the correct keytab and principal are set for each component, and confirm that the keytab file exists on
all FE and BE nodes.
iii. Try to replace the IP in the principal with a domain name (do not use the default _HOST placeholder)
iv. Confirm that the /etc/krb5.conf file exists on all FE and BE nodes.
3. What to do with the java.lang.VerifyError: xxx error when accessing HDFS 3.x?
Doris 1.2.1 and older versions rely on Hadoop 2.8. Please update Hadoop to 2.10.2, or update Doris to 1.2.2 or newer.
4. An error is reported when using KMS to access HDFS: java.security.InvalidKeyException: Illegal key size
Upgrade the JDK version to a version >= Java 8 u162. Or download and install the JCE Unlimited Strength Jurisdiction
Policy Files corresponding to the JDK.
5. When querying a table in ORC format, FE reports an error Could not obtain block
For ORC files, by default, FE will access HDFS to obtain file information and split files. In some cases, FE may not be able
to access HDFS. It can be solved by adding the following parameters:
"hive.exec.orc.split.strategy" = "BI"
6. An error is reported when connecting to SQLServer through JDBC Catalog: unable to find valid certification path
to requested target
8. An error is reported when connecting to a MySQL database through the JDBC Catalog: Establishing SSL connection
without server's identity verification is not recommended
Note: After version 1.2.3, these parameters will be automatically added when using the JDBC Catalog to connect to a
MySQL database.
JDBC External Table
By creating JDBC External Tables, Doris can access external tables via JDBC, the standard database access interface. This
allows Doris to access various databases without tedious data ingestion, and to give full play to its own OLAP capabilities to
perform data analysis on external tables:
CREATE EXTERNAL RESOURCE jdbc_resource
properties (
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url"="jdbc:mysql://192.168.0.1:3306/test?useCursorFetch=true",
"driver_url"="http://IP:port/mysql-connector-java-5.1.47.jar",
"driver_class"="com.mysql.jdbc.Driver"
);
CREATE EXTERNAL TABLE `mysql_table` (
    `k1` int,
    `k3` varchar(32)
) ENGINE=JDBC
PROPERTIES (
"resource" = "jdbc_resource",
"table" = "baseall",
"table_type"="mysql"
);
Parameter Description:
Parameter      Description
jdbc_url       JDBC URL protocol, including the database type, IP address, port number, and database name; please be aware of the different formats of different database protocols. For example, MySQL: "jdbc:mysql://127.0.0.1:3306/test?useCursorFetch=true".
driver_class   Class of the driver used to access the external database. For example, to access MySQL data: com.mysql.jdbc.Driver.
driver_url     Driver URL for downloading the Jar file package that is used to access the external database, for example, http://IP:port/mysql-connector-java-5.1.47.jar. For local stand-alone testing, you can put the Jar file package in a local path: "driver_url"="file:///home/disk1/pathTo/mysql-connector-java-5.1.47.jar"; for local multi-machine testing, please ensure the consistency of the paths.
resource       Name of the Resource that the Doris External Table depends on; should be the same as the name set in Resource creation.
table_type     The database from which the external table comes, such as mysql, postgresql, sqlserver, and oracle.
Note:
For local testing, please make sure you put the Jar file package in the FE and BE nodes, too.
In Doris 1.2.1 and newer versions, if you have put the driver in the jdbc_drivers directory of FE/BE, you can simply
specify the file name in the driver URL: "driver_url" = "mysql-connector-java-5.1.47.jar" , and the system will
automatically find the file in the jdbc_drivers directory.
Query
select * from mysql_table where k1 > 1000 and k3 ='term';
In some cases, the keywords in the database might be used as the field names. For queries to function normally in these
cases, Doris will add escape characters to the field names and tables names in SQL statements based on the rules of different
databases, such as (``) for MySQL, ([]) for SQLServer, and (" ") for PostgreSQL and Oracle. This might require extra attention on
case sensitivity. You can view the query statements sent to these various databases via explain sql .
Write Data
After creating a JDBC External Table in Doris, you can write data or query results to it using the insert into statement. You
can also ingest data from one JDBC External Table to another JDBC External Table.
Transaction
In Doris, data is written to External Tables in batches. If the ingestion process is interrupted, rollbacks might be required.
That's why JDBC External Tables support data writing transactions. You can utilize this feature by setting the session variable:
enable_odbc_transcation (ODBC transactions are also controlled by this variable).
The transaction mechanism ensures the atomicity of data writing to JDBC External Tables, but it reduces performance to a
certain extent. You may decide whether to enable transactions based on your own tradeoff.
1.MySQL Test
MySQL Version MySQL JDBC Driver Version
8.0.30 mysql-connector-java-5.1.47.jar
2. PostgreSQL Test
PostgreSQL Version    PostgreSQL JDBC Driver Version
14.5    postgresql-42.5.0.jar
CREATE EXTERNAL RESOURCE jdbc_pg
properties (
"type"="jdbc",
"user"="postgres",
"password"="123456",
"jdbc_url"="jdbc:postgresql://127.0.0.1:5442/postgres?currentSchema=doris_test",
"driver_url"="http://127.0.0.1:8881/postgresql-42.5.0.jar",
"driver_class"="org.postgresql.Driver"
);
CREATE EXTERNAL TABLE `ext_pg_tbl` (
    `k1` int
) ENGINE=JDBC
PROPERTIES (
"resource" = "jdbc_pg",
"table" = "pg_tbl",
"table_type"="postgresql"
);
3. SQLServer Test
SQLServer Version    SQLServer JDBC Driver Version
2022    mssql-jdbc-11.2.0.jre8.jar
4. Oracle Test
Oracle Version    Oracle JDBC Driver Version
11    ojdbc6.jar
5.ClickHouse Test
ClickHouse Version ClickHouse JDBC Driver Version
22 clickhouse-jdbc-0.3.2-patch11-all.jar
Type Mapping
The followings list how data types in different databases are mapped in Doris.
MySQL
MySQL Doris
BOOLEAN BOOLEAN
BIT(1) BOOLEAN
TINYINT TINYINT
SMALLINT SMALLINT
INT INT
BIGINT BIGINT
VARCHAR VARCHAR
DATE DATE
FLOAT FLOAT
DATETIME DATETIME
DOUBLE DOUBLE
DECIMAL DECIMAL
PostgreSQL
PostgreSQL Doris
BOOLEAN BOOLEAN
SMALLINT SMALLINT
INT INT
BIGINT BIGINT
VARCHAR VARCHAR
DATE DATE
TIMESTAMP DATETIME
REAL FLOAT
FLOAT DOUBLE
DECIMAL DECIMAL
Oracle
Oracle Doris
VARCHAR VARCHAR
DATE DATETIME
SMALLINT SMALLINT
INT INT
REAL DOUBLE
FLOAT DOUBLE
NUMBER DECIMAL
SQL server
SQLServer Doris
BIT BOOLEAN
TINYINT TINYINT
SMALLINT SMALLINT
INT INT
BIGINT BIGINT
VARCHAR VARCHAR
DATE DATE
DATETIME DATETIME
REAL FLOAT
FLOAT DOUBLE
DECIMAL DECIMAL
ClickHouse
ClickHouse Doris
BOOLEAN BOOLEAN
CHAR CHAR
VARCHAR VARCHAR
STRING STRING
DATE DATE
Float32 FLOAT
Float64 DOUBLE
Int8 TINYINT
Int16 SMALLINT
Int32 INT
Int64 BIGINT
Int128 LARGEINT
DATETIME DATETIME
DECIMAL DECIMAL
Note:
Some data types in ClickHouse, such as UUID, IPv4, IPv6, and Enum8, will be mapped to Varchar/String in Doris. IPv4
and IPv6 will be displayed with an / as a prefix. You can use the split_part function to remove the / .
The Point Geo type in ClickHouse cannot currently be mapped in Doris.
Q&A
See the FAQ section in JDBC.
File Analysis
Since Version 1.2.0
With the Table Value Function feature, Doris is able to query files in object storage or HDFS as simply as querying Tables. In
addition, it supports automatic column type inference.
Usage
For more usage details, please see the S3 and HDFS Table Value Function documentation.
The following illustrates how file analysis is conducted, using the S3 Table Value Function as an example.
DESC FUNCTION s3(
    "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "parquet",
"use_path_style"="true");
(The DESC FUNCTION output lists the column names and types inferred from the Parquet file.)
The S3 Table Value Function used in this example is defined as follows:
s3(
"URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "parquet",
"use_path_style"="true")
After defining, you can view the schema of this file using the DESC FUNCTION statement.
As can be seen, Doris is able to automatically infer column types based on the metadata of the Parquet file.
Besides Parquet, Doris supports analysis and auto column type inference of ORC, CSV, and Json files.
CSV Schema
By default, for CSV format files, all columns are of type String. Column names and column types can be specified individually
via the csv_schema attribute. Doris will use the specified column type for file reading. The format is as follows:
name1:type1;name2:type2;...
For columns with mismatched formats (such as string in the file and int defined by the user), or missing columns (such as 4
columns in the file and 5 columns defined by the user), these columns will return null.
csv_schema Type    Doris Type
tinyint            tinyint
smallint           smallint
int                int
bigint             bigint
largeint           largeint
float              float
double             double
decimal(p,s)       decimalv3(p,s)
date               datev2
datetime           datetimev2
char               string
varchar            string
string             string
boolean            boolean
Example:
s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
    'use_path_style'='true'
)
You can query the file directly as if it were a table, for example:
SELECT * FROM s3(
    "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "parquet",
"use_path_style"="true")
LIMIT 5;
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
| p_partkey | p_name                                   | p_mfgr         | p_brand  | p_type                  | p_size | p_container | p_retailprice | p_comment           |
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
| 1         | goldenrod lavender spring chocolate lace | Manufacturer#1 | Brand#13 | PROMO BURNISHED COPPER  | 7      | JUMBO PKG   | 901           | ly. slyly ironi     |
| 2         | blush thistle blue yellow saddle         | Manufacturer#1 | Brand#13 | LARGE BRUSHED BRASS     | 1      | LG CASE     | 902           | lar accounts amo    |
| 3         | spring green yellow purple cornsilk      | Manufacturer#4 | Brand#42 | STANDARD POLISHED BRASS | 21     | WRAP CASE   | 903           | egular deposits hag |
| 4         | cornflower chocolate smoke green pink    | Manufacturer#3 | Brand#34 | SMALL PLATED BRASS      | 14     | MED DRUM    | 904           | p furiously r       |
| 5         | forest brown coral puff cream            | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN   | 15     | SM PKG      | 905           | wake carefully      |
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
You can use a Table Value Function anywhere a table can be used in SQL, such as in the FROM clause or in a WITH clause (CTE). In this way, you can treat the file as a normal table and analyze it conveniently.
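For example, a minimal sketch of using the S3 Table Value Function inside a CTE (reusing the connection parameters from the examples above; the alias parquet_rows is illustrative):
WITH parquet_rows AS (
    SELECT * FROM s3(
        "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
        "ACCESS_KEY" = "minioadmin",
        "SECRET_KEY" = "minioadmin",
        "Format" = "parquet",
        "use_path_style" = "true")
)
-- Treat the file like a normal table and aggregate over it.
SELECT count(*) FROM parquet_rows;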
Data Ingestion
Users can ingest files into Doris tables via INSERT INTO SELECT for faster file analysis:
id int,
name varchar(50),
age int
PROPERTIES("replication_num" = "1");
FROM s3(
"uri" = "${uri}",
"ACCESS_KEY"= "${ak}",
"SECRET_KEY" = "${sk}",
"format" = "${format}",
"strip_outer_array" = "true",
"read_json_by_line" = "true",
"use_path_style" = "true");
File Cache
Since Version dev
File Cache accelerates queries that read the same data by caching recently accessed data files from remote storage systems (HDFS or object storage). In ad hoc scenarios where the same data is frequently accessed, File Cache avoids the cost of repeated remote reads and improves query performance and stability on hot data.
How it works
File Cache caches the accessed remote data on the local BE node. The original data file is divided into blocks according to the read IO size; each block is stored at cache_path/hash(filepath).substr(0, 3)/hash(filepath)/offset, and the block's meta information is saved on the BE node. When the same remote file is accessed again, Doris checks whether cached data for the file exists locally and, based on the offset and size of each block, determines which data to read from local blocks and which data to pull from the remote, caching the newly pulled data. When the BE node restarts, it scans the cache_path directory to recover the block meta information. When the cache size reaches the upper threshold, blocks that have not been accessed for a long time are cleaned according to the LRU policy.
Usage
File Cache is disabled by default. You need to set the relevant configuration in FE and BE to enable it.
Configurations for FE
Enable File Cache for a given session:
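The statements themselves appear to have been dropped here; assuming the session variable carries the feature's name (enable_file_cache, matching the BE parameter listed below), a minimal sketch is:
-- Enable File Cache only for the current session.
SET enable_file_cache = true;

-- Or enable it globally for new sessions.
SET GLOBAL enable_file_cache = true;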
Configurations for BE
Add settings to the BE node's configuration file conf/be.conf , and restart the BE node for the configuration to take effect.
| Parameter | Description |
| ---- | ---- |
| enable_file_cache | Whether to enable File Cache, default false |
| file_cache_max_file_segment_size | Max size of a single cached block, default 4MB, should be greater than 4096 |
| file_cache_path | Parameters about the cache path, in JSON format, for example: [{"path": "storage1", "normal":53687091200,"persistent":21474836480,"query_limit": "10737418240"},{"path": "storage2", "normal":53687091200,"persistent":21474836480},{"path": "storage3","normal":53687091200,"persistent":21474836480}]. path is the path to save cached data; normal is the max size of cached data; query_limit is the max size of cached data for a single query; persistent / file_cache_max_file_segment_size is the max number of cache blocks. |
| enable_file_cache_query_limit | Whether to limit the cache size used by a single query, default false |
| clear_file_cache | Whether to delete the previous cache data when the BE restarts, default false |
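For example, a minimal conf/be.conf sketch based on the parameters above (the path and sizes are illustrative):
# Enable File Cache on this BE and define one cache path with its size limits.
enable_file_cache = true
file_cache_path = [{"path": "/path/to/file_cache", "normal": 53687091200, "persistent": 21474836480}]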
You can check whether a query hits the cache through the FileCache section of its profile, for example:
- FileCache:
- IOHitCacheNum: 552
- IOTotalNum: 835
- ReadFromFileCacheBytes: 19.98 MB
- ReadFromWriteCacheBytes: 0.00
- ReadTotalBytes: 29.52 MB
- WriteInFileCacheBytes: 915.77 MB
- WriteInFileCacheNum: 283
SkipCacheBytes : the amount of data that must be read from the remote again because the cache file failed to be created or was deleted
WriteInFileCacheBytes : the amount of data written to the cache files
IOHitCacheNum / IOTotalNum equal to 1 indicates that data was read only from the file cache
ReadFromFileCacheBytes / ReadTotalBytes equal to 1 indicates that data was read only from the file cache
The smaller the amount of data read from the remote (ReadTotalBytes minus ReadFromFileCacheBytes), the better
Spark Doris Connector
The Spark Doris Connector supports reading data stored in Doris and writing data to Doris through Spark.
Version Compatibility
Connector Spark Doris Java Scala
#export THRIFT_BIN=
#export MVN_BIN=
#export JAVA_HOME=
export THRIFT_BIN=/opt/homebrew/Cellar/[email protected]/0.13.0/bin/thrift
#export MVN_BIN=
#export JAVA_HOME=
Install `thrift` 0.13.0 (Note: `Doris` 0.15 and the latest builds are based on `thrift` 0.13.0, previous versions
are still built with `thrift` 0.9.3)
Windows:
1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
2. Rename `thrift-0.13.0.exe` to `thrift`
MacOS:
Note: Executing `brew install [email protected]` on MacOS may report an error that the version cannot be found. The
solution is as follows, execute it in the terminal:
Linux:
1. Download the source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3. `tar zxvf thrift-0.13.0.tar.gz`
4. `cd thrift-0.13.0`
5. `./configure --without-tests`
6. `make`
7. `make install`
Check the version after the installation is complete: `thrift --version`
Note: If you have compiled Doris, you do not need to install thrift, you can directly use
$DORIS_HOME/thirdparty/installed/bin/thrift
Note: If you check out the source code from tag, you can just run sh build.sh --tag without specifying the spark and scala
versions. This is because the version in the tag source code is fixed.
After successful compilation, the file doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar will be generated in the output/ directory. Copy this file to the classpath of Spark to use Spark-Doris-Connector. For example, for Spark running in Local mode, put this file in the jars/ folder. For Spark running in Yarn cluster mode, put this file in the pre-deployment package, for example by uploading doris-spark-2.3.4-2.11-1.0.0-SNAPSHOT.jar to HDFS and adding the HDFS file path to spark.yarn.jars:
spark.yarn.jars=hdfs:///spark-jars/doris-spark-connector-3.1.2-2.12-1.0.0.jar
Using Maven
<dependency>
<groupId>org.apache.doris</groupId>
<artifactId>spark-doris-connector-3.1_2.12</artifactId>
<!--artifactId>spark-doris-connector-2.3_2.11</artifactId-->
<version>1.0.1</version>
</dependency>
Notes
Please replace the Connector version according to the different Spark and Scala versions.
Example
Read
SQL
CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
"table.identifier"="$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
"fenodes"="$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
"user"="$YOUR_DORIS_USERNAME",
"password"="$YOUR_DORIS_PASSWORD"
);
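The view can then be queried like an ordinary table (a short sketch; spark_doris is the temporary view name assumed above):
SELECT * FROM spark_doris;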
DataFrame
val dorisSparkDF = spark.read.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
dorisSparkDF.show(5)
RDD
import org.apache.doris.spark._
val dorisSparkRDD = sc.dorisRDD(
  tableIdentifier = Some("$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME"),
  cfg = Some(Map(
    "doris.fenodes" -> "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
    "doris.request.auth.user" -> "$YOUR_DORIS_USERNAME",
    "doris.request.auth.password" -> "$YOUR_DORIS_PASSWORD"))
)
dorisSparkRDD.collect()
pySpark
dorisSparkDF = spark.read.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
.load()
dorisSparkDF.show(5)
Write
SQL
CREATE TEMPORARY VIEW spark_doris
USING doris
OPTIONS(
"table.identifier"="$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
"fenodes"="$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
"user"="$YOUR_DORIS_USERNAME",
"password"="$YOUR_DORIS_PASSWORD"
);
INSERT INTO spark_doris VALUES ("VALUE1", "VALUE2", ...);
# or
INSERT INTO spark_doris SELECT * FROM YOUR_TABLE
DataFrame(batch/stream)
## batch sink
mockDataDF.show(5)
mockDataDF.write.format("doris")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
.option("doris.write.fields","$YOUR_FIELDS_TO_WRITE")
.save()
## stream sink(StructuredStreaming)
val kafkaSource = spark.readStream
.option("kafka.bootstrap.servers", "$YOUR_KAFKA_SERVERS")
.option("startingOffsets", "latest")
.option("subscribe", "$YOUR_KAFKA_TOPICS")
.format("kafka")
.load()
kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value as STRING)")
.writeStream
.format("doris")
.option("checkpointLocation", "$YOUR_CHECKPOINT_LOCATION")
.option("doris.table.identifier", "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME")
.option("doris.fenodes", "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT")
.option("user", "$YOUR_DORIS_USERNAME")
.option("password", "$YOUR_DORIS_PASSWORD")
//other options
.option("doris.write.fields","$YOUR_FIELDS_TO_WRITE")
.start()
.awaitTermination()
Configuration
General
| Key | Default Value | Comment |
| ---- | ---- | ---- |
| doris.request.query.timeout.s | 3600 | Query timeout of Doris in seconds; the default is 1 hour, -1 means no timeout limit |
| doris.batch.size | 1024 | The maximum number of rows to read from BE at one time. Increasing this value reduces the number of connections between Spark and Doris, thereby reducing the extra overhead caused by network latency |
| doris.exec.mem.limit | 2147483648 | Memory limit for a single query. The default is 2GB, in bytes |
| doris.write.fields | -- | Specifies the fields (or the order of the fields) to write to the Doris table, separated by commas. By default, all fields are written in the order of the Doris table's fields |
| sink.properties.* | -- | Stream Load import parameters, e.g. 'sink.properties.column_separator' = ',' |
| doris.filter.query.in.max.count | 100 | In predicate pushdown, the maximum number of elements in the value list of an IN expression. If this number is exceeded, the IN-expression filtering is processed on the Spark side |
RDD Configuration
| Key | Default Value | Comment |
| ---- | ---- | ---- |
| doris.filter.query | -- | Filter expression for the query, which is passed through to Doris. Doris uses this expression to complete source-side data filtering |
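As an illustration of this pushdown, a hedged sketch in the SQL temporary-view style of the Read example above (the view name and the filter are illustrative, and it assumes the connector accepts doris.* keys as OPTIONS entries just as it does as DataFrame options):
-- "doris.filter.query" is pushed down to Doris so filtering happens on the source side.
CREATE TEMPORARY VIEW spark_doris_filtered
USING doris
OPTIONS(
  "table.identifier" = "$YOUR_DORIS_DATABASE_NAME.$YOUR_DORIS_TABLE_NAME",
  "fenodes" = "$YOUR_DORIS_FE_HOSTNAME:$YOUR_DORIS_FE_RESFUL_PORT",
  "user" = "$YOUR_DORIS_USERNAME",
  "password" = "$YOUR_DORIS_PASSWORD",
  "doris.filter.query" = "age > 18"
);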
Doris Type Spark Type
NULL_TYPE DataTypes.NullType
BOOLEAN DataTypes.BooleanType
TINYINT DataTypes.ByteType
SMALLINT DataTypes.ShortType
INT DataTypes.IntegerType
BIGINT DataTypes.LongType
FLOAT DataTypes.FloatType
DOUBLE DataTypes.DoubleType
DATE DataTypes.StringType (see note below)
DATETIME DataTypes.StringType (see note below)
BINARY DataTypes.BinaryType
DECIMAL DecimalType
CHAR DataTypes.StringType
LARGEINT DataTypes.StringType
VARCHAR DataTypes.StringType
DECIMALV2 DecimalType
TIME DataTypes.DoubleType
Note: In the Connector, DATE and DATETIME are mapped to String. Due to the processing logic of the Doris underlying storage engine, the covered time range cannot meet the demand when the time types are used directly, so the String type is used to return the corresponding human-readable time text.
Flink Doris Connector
The Flink Doris Connector supports reading, inserting, modifying, and deleting data stored in Doris through Flink.
Note:
1. Modification and deletion are only supported on the Unique Key model
2. Deletion is currently supported for data synchronized through Flink CDC, which performs the deletes automatically. For data ingested through other means, you need to implement deletion yourself. For how Flink CDC deletes data, please refer to the last section of this document.
Version Compatibility
Connector Version Flink Version Doris Version Java Version Scala Version
#export THRIFT_BIN=
#export MVN_BIN=
#export JAVA_HOME=
export THRIFT_BIN=/opt/homebrew/Cellar/[email protected]/0.13.0/bin/thrift
#export MVN_BIN=
#export JAVA_HOME=
Install thrift 0.13.0 (Note: Doris 0.15 and the latest builds are based on thrift 0.13.0, previous versions are still built with
thrift 0.9.3)
Windows:
1. Download: http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe
2. Modify thrift-0.13.0.exe to thrift
MacOS:
1. Download: brew install [email protected]
2. default address: /opt/homebrew/Cellar/[email protected]/0.13.0/bin/thrift
Note: Executing brew install [email protected] on MacOS may report an error that the version cannot be found. The solution
is as follows, execute it in the terminal:
Linux:
1. Download the source package: wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz
2. Install dependencies: yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++
3. tar zxvf thrift-0.13.0.tar.gz
4. cd thrift-0.13.0
5. ./configure --without-tests
6. make
7. make install
Note: If you have compiled Doris, you do not need to install thrift, you can directly use
$DORIS_HOME/thirdparty/installed/bin/thrift
sh build.sh
Usage:
build.sh --flink version # specify the Flink version (after flink-doris-connector v1.2 and flink-1.15, there is no need to provide the Scala version)
build.sh --tag # compile with the version in the tag source code
e.g.:
build.sh --tag
Then, for example, execute the command to compile according to the version you need:
sh build.sh --flink 1.16.0
After successful compilation, the file flink-doris-connector-1.16-1.3.0-SNAPSHOT.jar will be generated in the target/
directory. Copy this file to classpath in Flink to use Flink-Doris-Connector . For example, Flink running in Local mode,
put this file in the lib/ folder. Flink running in Yarn cluster mode, put this file in the pre-deployment package.
Remarks: the Doris FE should have http v2 enabled in conf/fe.conf:
enable_http_server_v2 = true
Using Maven
Add flink-doris-connector Maven dependencies
<dependency>
<groupId>org.apache.doris</groupId>
<artifactId>flink-doris-connector-1.16</artifactId>
<version>1.3.0</version>
</dependency>
Notes
1. Please replace the corresponding Connector and Flink dependency versions according to your Flink version. Version 1.3.0 only supports Flink 1.16.
2. You can also download the relevant version jar package from here.
How to use
The Flink Doris Connector can be used in the following ways:
SQL
DataStream
Parameters Configuration
The Flink Doris Connector Sink writes data to Doris via Stream Load and also supports Stream Load configuration items; for the specific parameters, please refer to here.
SQL
Source
CREATE TABLE flink_doris_source (
name STRING,
age INT,
price DECIMAL(5,2),
sale DOUBLE
)
WITH (
'connector' = 'doris',
'fenodes' = 'FE_IP:8030',
'table.identifier' = 'database.table',
'username' = 'root',
'password' = 'password'
);
Sink
-- enable checkpoint
SET 'execution.checkpointing.interval' = '10s';
CREATE TABLE flink_doris_sink (
name STRING,
age INT,
price DECIMAL(5,2),
sale DOUBLE
)
WITH (
'connector' = 'doris',
'fenodes' = 'FE_IP:8030',
'table.identifier' = 'db.table',
'username' = 'root',
'password' = 'password',
'sink.label-prefix' = 'doris_label'
);
Insert
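A short sketch of the insert step, assuming the source and sink table names used in the snippets above:
INSERT INTO flink_doris_sink
SELECT name, age, price, sale FROM flink_doris_source;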
DataStream
Source
DorisOptions.Builder builder = DorisOptions.builder()
.setFenodes("FE_IP:8030")
.setTableIdentifier("db.table")
.setUsername("root")
.setPassword("password");
DorisSource<List<?>> dorisSource = DorisSourceBuilder.<List<?>>builder()
.setDorisOptions(builder.build())
.setDorisReadOptions(DorisReadOptions.builder().build())
.setDeserializer(new SimpleListDeserializationSchema())
.build();
Sink
String Stream
// enable checkpoint
env.enableCheckpointing(10000);
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
dorisBuilder.setFenodes("FE_IP:8030")
.setTableIdentifier("db.table")
.setUsername("root")
.setPassword("password");
/**
properties.setProperty("format", "json");
properties.setProperty("read_json_by_line", "true");
**/
.setStreamLoadProp(properties);
builder.setDorisReadOptions(DorisReadOptions.builder().build())
.setDorisExecutionOptions(executionBuilder.build())
.setDorisOptions(dorisBuilder.build());
data.add(new Tuple2<>("doris",1));
.sinkTo(builder.build());
RowData Stream
// enable checkpoint
env.enableCheckpointing(10000);
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
dorisBuilder.setFenodes("FE_IP:8030")
.setTableIdentifier("db.table")
.setUsername("root")
.setPassword("password");
properties.setProperty("format", "json");
properties.setProperty("read_json_by_line", "true");
builder.setDorisReadOptions(DorisReadOptions.builder().build())
.setDorisExecutionOptions(executionBuilder.build())
.setFieldNames(fields)
.setFieldType(types).build())
.setDorisOptions(dorisBuilder.build());
@Override
genericRowData.setField(0, StringData.fromString("beijing"));
genericRowData.setField(1, 116.405419);
genericRowData.setField(2, 39.916927);
genericRowData.setField(3, LocalDate.now().toEpochDay());
return genericRowData;
});
source.sinkTo(builder.build());
SchemaChange Stream
// enable checkpoint
env.enableCheckpointing(10000);
props.setProperty("format", "json");
props.setProperty("read_json_by_line", "true");
.setFenodes("127.0.0.1:8030")
.setTableIdentifier("test.t1")
.setUsername("root")
.setPassword("").build();
executionBuilder.setLabelPrefix("label-doris" + UUID.randomUUID())
.setStreamLoadProp(props).setDeletable(true);
builder.setDorisReadOptions(DorisReadOptions.builder().build())
.setDorisExecutionOptions(executionBuilder.build())
.setDorisOptions(dorisOptions)
.setSerializer(JsonDebeziumSchemaSerializer.builder().setDorisOptions(dorisOptions).build());
.sinkTo(builder.build());
refer: CDCSchemaChangeExample
General
| Key | Default Value | Required | Comment |
| ---- | ---- | ---- | ---- |
| doris.exec.mem.limit | 2147483648 | N | Memory limit for a single query. The default is 2GB, in bytes |
| sink.label-prefix | -- | Y | The label prefix used by Stream Load imports. In the 2PC scenario, global uniqueness is required to ensure the EOS semantics of Flink |
| sink.properties.* | -- | N | Stream Load import parameters, e.g. 'sink.properties.column_separator' = ',' |
Doris Type Flink Type
NULL_TYPE NULL
BOOLEAN BOOLEAN
TINYINT TINYINT
SMALLINT SMALLINT
INT INT
BIGINT BIGINT
FLOAT FLOAT
DOUBLE DOUBLE
DATE DATE
DATETIME TIMESTAMP
DECIMAL DECIMAL
CHAR STRING
LARGEINT STRING
VARCHAR STRING
DECIMALV2 DECIMAL
TIME DOUBLE
An example of using Flink CDC to access Doris:
CREATE TABLE cdc_mysql_source (
id int
,name VARCHAR
) WITH (
'connector' = 'mysql-cdc',
'hostname' = '127.0.0.1',
'port' = '3306',
'username' = 'root',
'password' = 'password',
'database-name' = 'database',
'table-name' = 'table'
);
-- Support delete event synchronization (sink.enable-delete='true'); requires the Doris table to enable the batch delete function
CREATE TABLE doris_sink (
id INT,
name STRING
)
WITH (
'connector' = 'doris',
'fenodes' = '127.0.0.1:8030',
'table.identifier' = 'database.table',
'username' = 'root',
'password' = '',
'sink.properties.format' = 'json',
'sink.properties.read_json_by_line' = 'true',
'sink.enable-delete' = 'true',
'sink.label-prefix' = 'doris_label'
);
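To complete the pipeline, the source is typically written into the sink with an INSERT statement (a sketch, assuming the table names used above):
INSERT INTO doris_sink SELECT id, name FROM cdc_mysql_source;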
Java example
An example of the Java version is provided in samples/doris-demo/ for reference; see here.
Best Practices
Application scenarios
The most suitable scenario for using Flink Doris Connector is to synchronize source data to Doris (Mysql, Oracle, PostgreSQL)
in real time/batch, etc., and use Flink to perform joint analysis on data in Doris and other data sources. You can also use Flink
Doris Connector
Other
1. The Flink Doris Connector mainly relies on checkpoints for streaming writes, so the interval between checkpoints is the visible latency of the data.
2. To ensure Flink's Exactly-Once semantics, the Flink Doris Connector enables two-phase commit by default, and Doris enables two-phase commit by default since version 1.1. For Doris 1.0, it can be enabled by modifying the BE parameters; please refer to two_phase_commit.
FAQ
1. After Doris Source finishes reading data, why does the stream end?
Currently Doris Source is a bounded stream and does not support CDC reading.
2. How can data be filtered when Flink reads from Doris?
By configuring the doris.filter.query parameter; refer to the configuration section for details.
3. How to write Bitmap type data?
CREATE TABLE bitmap_sink (
dt int,
page string,
user_id int
)
WITH (
'connector' = 'doris',
'fenodes' = '127.0.0.1:8030',
'table.identifier' = 'test.bitmap_test',
'username' = 'root',
'password' = '',
'sink.label-prefix' = 'doris_label',
'sink.properties.columns' = 'dt,page,user_id,user_id=to_bitmap(user_id)'
)
4. errCode = 2, detailMessage = Label [label_0_1] has already been used, relate to txn [19650]
In the Exactly-Once scenario, the Flink job must be restarted from the latest checkpoint/savepoint, otherwise the above error will be reported. When Exactly-Once is not required, it can also be solved by turning off two-phase commit (sink.enable-2pc=false) or by changing sink.label-prefix to a different value.
5. errCode = 2, detailMessage = transaction [19650] not found
This occurs in the commit phase: the transaction ID recorded in the checkpoint has expired on the FE side, and committing it again at this point causes the error. In this case the job cannot be started from that checkpoint; the expiration time can be extended by modifying the streaming_label_keep_max_second configuration in fe.conf, which defaults to 12 hours.
6. errCode = 2, detailMessage = current running txns on db 10006 is 100, larger than limit 100
This is because concurrent imports to the same database exceed 100, which can be solved by adjusting the fe.conf parameter max_running_txn_num_per_db. For details, please refer to max_running_txn_num_per_db.
7. How to ensure the order of a batch of data when Flink writes to the Unique model?
You can add a sequence column configuration to guarantee order; for details, please refer to sequence.
8. The Flink task does not report an error, but the data cannot be synchronized?
Before Connector1.1.0, it was written in batches, and the writing was driven by data. It was necessary to determine whether
there was data written upstream. After 1.1.0, it depends on Checkpoint, and Checkpoint must be enabled to write.
9. What causes the "too many versions" (-235) error when writing?
This usually occurs before Connector 1.1.0, because the writing frequency is too fast, resulting in too many versions. The frequency of Stream Load can be reduced by setting the sink.batch.size and sink.batch.interval parameters.
10. How to skip dirty data when Flink imports data?
When Flink imports data, if there is dirty data, such as data with a mismatched field format or length, Stream Load will report an error, and Flink will keep retrying. If you need to skip such data, you can disable the strict mode of Stream Load (strict_mode=false, max_filter_ratio=1) or filter the data before the sink operator.
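For instance, a hedged sketch of relaxing these checks through the sink's Stream Load properties in Flink SQL (this assumes strict_mode and max_filter_ratio are passed through the sink.properties.* prefix described above; the table definition itself is illustrative):
-- Pass Stream Load properties through the sink.properties.* prefix.
CREATE TABLE doris_tolerant_sink (
    id INT,
    name STRING
)
WITH (
    'connector' = 'doris',
    'fenodes' = '127.0.0.1:8030',
    'table.identifier' = 'test.example_tbl',
    'username' = 'root',
    'password' = '',
    'sink.label-prefix' = 'doris_label_tolerant',
    'sink.properties.strict_mode' = 'false',
    'sink.properties.max_filter_ratio' = '1'
);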
DataX doriswriter
DataX doriswriter plug-in, used to synchronize data from other data sources to Doris through DataX.
The plug-in uses Doris' Stream Load function to synchronize and import data. It needs to be used with DataX service.
About DataX
DataX is an open source version of Alibaba Cloud DataWorks data integration, an offline data synchronization tool/platform
widely used in Alibaba Group. DataX implements efficient data synchronization functions between various heterogeneous
data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS),
Hologres, DRDS, etc.
Usage
The code of DataX doriswriter plug-in can be found here.
Because the doriswriter plug-in depends on some modules in the DataX code base, and these module dependencies are not
submitted to the official Maven repository, when we develop the doriswriter plug-in, we need to download the complete
DataX code base to facilitate our development and compilation of the doriswriter plug-in.
Directory structure
1. doriswriter/
This directory is the code directory of doriswriter, and this part of the code should be in the Doris code base.
2. init-env.sh
After that, developers can enter DataX/ for development. And the changes in the DataX/doriswriter directory will be
reflected in the doriswriter/ directory, which is convenient for developers to submit code.
How to build
Doris code base compilation
1. Run init-env.sh
3. Build doriswriter
hdfsreader, hdfswriter and oscarwriter needs some extra jar packages. If you don't need to use these
components, you can comment out the corresponding module in DataX/pom.xml.
a. Download alibaba-datax-maven-m2-20210928.tar.gz
b. After decompression, copy the resulting alibaba/datax/ directory to .m2/repository/com/alibaba/
corresponding to the maven used.
c. Try to compile again.
cd datax
After compiling, you can see the datax.tar.gz package under datax/target/Datax
jdbcUrl
Description: Doris's JDBC connection string, used by doriswriter to execute preSql and postSql.
Mandatory: Yes
Default: None
loadUrl
Description: The connection target for Stream Load, in the format "ip:port", where ip is the FE node IP and port is the http_port of the FE node. Multiple targets can be specified, separated by semicolons (;), and doriswriter will access them in a round-robin manner.
Mandatory: Yes
Default: None
username
Description: The username for accessing the Doris database.
password
Description: The password for accessing the Doris database.
connection.selectedDatabase
Description: The name of the Doris database to write to.
connection.table
Description: The name of the Doris table to write to.
flushInterval
Description: The time interval at which data is written in batches. If this interval is set too small, it may cause a Doris write-blocking problem (error code -235); also, if the interval is small while maxBatchRows and batchSize are set too large, a batch may be imported before the configured data size is reached.
Mandatory: No
Default: 30000 (ms)
column
Description: The fields of the destination table to write data into; these fields will be used as the field names of the generated JSON data. Fields are separated by commas, for example: "column": ["id","name","age"].
Mandatory: Yes
Default: No
preSql
Description: Before writing data to the destination table, the standard statement here will be executed first.
Mandatory: No
Default: None
postSql
Description: After writing data to the destination table, the standard statement here will be executed.
Mandatory: No
Default: None
maxBatchRows
Description: The maximum number of rows for each batch of imported data. Together with batchSize, it controls the
number of imported record rows per batch. When each batch of data reaches one of the two thresholds, the data of
this batch will start to be imported.
Mandatory: No
Default: 500000
batchSize
Description: The maximum amount of data imported in each batch. Works with maxBatchRows to control the
number of imports per batch. When each batch of data reaches one of the two thresholds, the data of this batch will
start to be imported.
Mandatory: No
Default: 104857600
maxRetries
Description: The number of retries after each batch of failed data imports.
Mandatory: No
Default: 3
labelPrefix
Description: The label prefix of each batch import task. The final label is labelPrefix plus a UUID, forming a globally unique label that ensures data will not be imported repeatedly.
Mandatory: No
Default: datax_doris_writer_
loadProps
Description: The request parameters of Stream Load. For details, refer to the Stream Load introduction page (Stream load - Apache Doris). This includes the import data format (format, etc.). The import data format defaults to CSV; JSON is also supported. For details, please refer to the type conversion section below, or refer to the official Stream Load documentation above.
Mandatory: No
Default: None
Example
) ENGINE=OLAP
PROPERTIES (
"in_memory" = "false",
"storage_format" = "V2"
);
my_import.json
"job": {
"content": [
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"connection": [
"jdbcUrl": ["jdbc:mysql://localhost:3306/demo"],
"table": ["employees_1"]
],
"username": "root",
"password": "xxxxx",
"where": ""
},
"writer": {
"name": "doriswriter",
"parameter": {
"loadUrl": ["127.0.0.1:8030"],
"loadProps": {
},
"column": ["id","order_code","line_code","remark","unit_no","unit_name","price"],
"username": "root",
"password": "xxxxxx",
"preSql": [],
"flushInterval":30000,
"connection": [
"jdbcUrl": "jdbc:mysql://127.0.0.1:9030/demo",
"selectedDatabase": "demo",
"table": ["all_employees_info"]
],
"loadProps": {
"format": "json",
"strip_outer_array":"true",
"line_delimiter": "\\x02"
],
"setting": {
"speed": {
"channel": "1"
Remark:
For JSON format import, use:
"loadProps": {
    "format": "json",
    "strip_outer_array": "true",
    "line_delimiter": "\\x02"
}
For CSV format import, use:
"loadProps": {
    "format": "csv",
    "column_separator": "\\x01",
    "line_delimiter": "\\x02"
}
CSV format should pay special attention to row and column separators to avoid conflicts with special characters in the
data. Hidden characters are recommended here. The default column separator is: \t, row separator: \n
4. Execute the DataX task; refer to the DataX official website for details.
2022-11-16 14:28:54.012 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2022-11-16 14:28:54.013 [job-0] INFO JobContainer - DataX Writer.Job [doriswriter] do prepare work .
2022-11-16 14:28:54.023 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2022-11-16 14:28:54.023 [job-0] INFO JobContainer - DataX Writer.Job [doriswriter] splits to [1] tasks.
2022-11-16 14:28:54.041 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2022-11-16 14:28:54.043 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2022-11-16 14:28:54.043 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2022-11-16 14:28:54.052 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select taskid,p
] jdbcUrl:[jdbc:mysql://localhost:3306/demo?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=f
Wed Nov 16 14:28:54 GMT+08:00 2022 WARN: Establishing SSL connection without server's identity verification is not
established by default if explicit option isn't set. For compliance with existing applications not using SSL the ve
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2022-11-16 14:28:54.071 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select taskid,p
] jdbcUrl:[jdbc:mysql://localhost:3306/demo?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=f
2022-11-16 14:28:54.104 [Thread-1] INFO DorisStreamLoadObserver - Start to join batch data: rows[2] bytes[438] lab
2022-11-16 14:28:54.104 [Thread-1] INFO DorisStreamLoadObserver - Executing stream load to: 'http://127.0.0.1:8030
2022-11-16 14:28:54.224 [Thread-1] INFO DorisStreamLoadObserver - StreamLoad response :
{"Status":"Success","BeginTxnTimeMs":0,"Message":"OK","NumberUnselectedRows":0,"CommitAndPublishTimeMs":17,"Label":
db34acc45b6f","LoadBytes":441,"StreamLoadPutTimeMs":1,"NumberTotalRows":2,"WriteDataTimeMs":11,"TxnId":217056,"Load
2022-11-16 14:28:54.225 [Thread-1] INFO DorisWriterManager - Async stream load finished: label[datax_doris_writer_
2022-11-16 14:28:54.249 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[201]ms
2022-11-16 14:29:04.048 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 214 bytes | Speed 21B/s
WaitReaderTime 0.000s | Percentage 100.00%
2022-11-16 14:29:04.049 [job-0] INFO JobContainer - DataX Writer.Job [doriswriter] do post work.
Wed Nov 16 14:29:04 GMT+08:00 2022 WARN: Establishing SSL connection without server's identity verification is not
established by default if explicit option isn't set. For compliance with existing applications not using SSL the ve
setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2022-11-16 14:29:04.187 [job-0] INFO DorisWriter$Job - Start to execute preSqls:[select count(1) from dwd_universa
2022-11-16 14:29:04.204 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2022-11-16 14:29:04.204 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2022-11-16 14:29:04.204 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: /dat
2022-11-16 14:29:04.205 [job-0] INFO JobContainer -
2022-11-16 14:29:04.206 [job-0] INFO StandAloneJobContainerCommunicator - Total 2 records, 214 bytes | Speed 21B/s
WaitReaderTime 0.000s | Percentage 100.00%
Total task elapsed time: 10s
Average task traffic: 21B/s
Record write speed: 0rec/s
Total records read: 2
Total read/write failures: 0
Mysql to Doris
mysql to doris is mainly used to automate the creation of Doris ODBC external tables and is implemented mainly with shell scripts.
Manual
mysql to doris code here
Directory Structure
├── mysql_to_doris
│ ├── conf
│ │ ├── doris.conf
│ │ ├── mysql.conf
│ │ └── tables
│ ├── all_tables.sh
│ └── user_define_tables.sh
1. all_tables.sh
This script reads all the tables under the specified MySQL database and automatically creates the corresponding Doris ODBC external tables.
2. user_define_tables.sh
This script is used to automatically create Doris ODBC external tables for specific, user-defined tables under the specified MySQL database.
3. conf
Configuration files: doris.conf is used to configure Doris-related settings, mysql.conf is used to configure MySQL-related settings, and tables is used to configure the user-defined MySQL tables.
full
#doris.conf
master_host=
master_port=
doris_password=
doris_odbc_name=''
#mysql.conf
mysql_host=
mysql_password=
doris_odbc_name: the name of the MySQL ODBC driver in the odbcinst.ini configuration file under be/conf
mysql_host: the MySQL server IP
After successful execution, the files directory will be generated, and the directory will contain tables (table name) and
tables.sql (doris odbc table creation statement)
custom
1. Modify the conf/tables file to add the name of the odbc table that needs to be created
2. To configure mysql and doris related information, refer to step 2 of full creation
3. Execute the user_define_tables.sh script
After successful execution, the user_files directory will be generated, and the directory will contain tables.sql (doris odbc
table creation statement)
Logstash Doris Output Plugin
2. Compile
Execute under the extension/logstash/ directory:
3.Plug-in installation
Copy logstash-output-doris-{version}.gem to the Logstash installation directory
Execute the command
Configuration
Example:
Create a new configuration file in the config directory and name it logstash-doris.conf
output {
doris {
db => "db_name"
Configuration instructions:
Connection configuration:
Configuration Explanation
user User name, the user needs to have import permission for the doris table
password Password
db Database name
Configuration Explanation
columns Used to specify the correspondence between the columns in the import file and the columns in the table
max_filter_ratio The maximum tolerance rate of the import task, the default is zero tolerance
timezone Specifies the time zone used for this import; the default is UTC+8
Other configuration:
Configuration Explanation
batch_size The maximum number of events processed per batch, the default is 100000
Start Up
Run the command to start the doris output plugin:
1. Compile doris-output-plugin
1> Download the Ruby source package from the official Ruby website; version 2.7.1 is used here.
2> Enter the filebeat directory and modify the configuration file filebeat.yml as follows:
filebeat.inputs:
- type: log
paths:
- /tmp/doris.data
output.logstash:
hosts: ["localhost:5044"]
2> Copy the logstash-output-doris-0.1.0.gem obtained in step 1 to the logstash installation directory
3> execute
4> Create a new configuration file logstash-doris.conf in the config directory as follows:
input {
beats {
output {
doris {
db => "logstash_output_test"
4.Test Load
Add write data to /tmp/doris.data
Observe the Logstash log. If the status in the returned response is Success, the import was successful, and you can then view the imported data in the logstash_output_test.output table.
Introduction
Doris's plugin framework supports installing and uninstalling custom plugins at runtime without restarting the Doris service. Users can extend Doris's functionality by developing their own plugins.
For example, an audit plugin runs after a request is executed; it can obtain information related to the request (access user, request IP, SQL, etc...) and write that information into a specified table.
Differences from UDF:
A UDF is a function used for data calculation when SQL is executed, while a plugin is an additional capability used to extend Doris with customized functionality, such as supporting different storage engines or different import methods; a plugin does not participate in data calculation when executing SQL.
The execution cycle of a UDF is limited to a single SQL execution, while the execution cycle of a plugin may be the same as that of the Doris process.
The usage scenarios are different: if you need to support special data algorithms when executing SQL, a UDF is recommended; if you need to run custom functions on Doris, or start a background thread to run tasks, a plugin is recommended.
Note:
Doris's plugin framework is an experimental feature. Currently only FE plugins are supported, and the framework is disabled by default; it can be enabled with the FE configuration plugin_enable = true.
Plugin
An FE plugin can be a .zip package or a directory, which contains at least two parts: the plugin.properties file and .jar files. The plugin.properties file is used to describe the plugin information.
# plugin .zip
auditodemo.zip:
-plugin.properties
-auditdemo.jar
-xxx.config
-data/
-test_data/
# plugin local directory
auditodemo/:
-plugin.properties
-auditdemo.jar
-xxx.config
-data/
-test_data/
plugin.properties example:
### required:
name = audit_plugin_demo
#
type = AUDIT
version = 0.11.0
# use the command "java -version" value, like 1.8.0, 9.0.1, 13.0.4
java.version = 1.8.31
classname = AuditPluginDemo
soName = example.so
Write A Plugin
The development environment of the FE plugin depends on the development environment of Doris. So please make sure
Doris's compilation and development environment works normally.
fe_plugins is the parent module of the fe plugins. It can uniformly manage the third-party library information that the
plugin depends on. Adding a plugin can add a submodule implementation under fe_plugins .
Create module
We can add a submodule in the fe_plugins directory to implement Plugin and create a project:
The command produces a new mvn project, and a new submodule is automatically added to fe_plugins/pom.xml :
.....
<groupId>org.apache</groupId>
<artifactId>doris-fe-plugins</artifactId>
<packaging>pom</packaging>
<version>1.0-SNAPSHOT</version>
<modules>
<module>auditdemo</module>
<module>doris-fe-test</module>
</modules>
.....
-pom.xml
-src/
---- main/java/org/apache/
---- test/java/org/apache
We will add an assembly folder under main to store plugin.properties and zip.xml . After completion, the file structure is
as follows:
-doris-fe-test/
-pom.xml
-src/
---- main/
------ assembly/
-------- plugin.properties
-------- zip.xml
------ java/org/apache/
---- test/java/org/apache
Add zip.xml
zip.xml , used to describe the content of the final package of the plugin (.jar file, plugin.properties):
<assembly>
<id>plugin</id>
<formats>
<format>zip</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>target</directory>
<includes>
<include>*.jar</include>
</includes>
<outputDirectory>/</outputDirectory>
</fileSet>
<fileSet>
<directory>src/main/assembly</directory>
<includes>
<include>plugin.properties</include>
</includes>
<outputDirectory>/</outputDirectory>
</fileSet>
</fileSets>
</assembly>
Update pom.xml
Then we need to update pom.xml , add doris-fe dependency, and modify maven packaging way:
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>org.apache</groupId>
<artifactId>doris-fe-plugins</artifactId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>auditloader</artifactId>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache</groupId>
<artifactId>doris-fe</artifactId>
</dependency>
<dependency>
...
</dependency>
</dependencies>
<build>
<finalName>auditloader</finalName>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4.1</version>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<descriptors>
<descriptor>src/main/assembly/zip.xml</descriptor>
</descriptors>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Implement plugin
Then we can implement Plugin according to the needs. Plugins need to implement the Plugin interface. For details, please
refer to the auditdemo plugin sample code that comes with Doris.
Compile
Before compiling the plugin, you must first execute sh build.sh --fe of Doris to complete the compilation of Doris FE.
Finally, execute sh build_plugin.sh in the ${DORIS_HOME} path and you will find the your_plugin_name.zip file in
fe_plugins/output
Or you can execute sh build_plugin.sh --plugin your_plugin_name to only build your plugin.
Other way
The easiest way is to implement your plugin by modifying the auditdemo example.
Deploy
Doris's plugin can be deployed in three ways:
An HTTP or HTTPS .zip, like http://xxx.xxxxxx.com/data/plugin.zip ; Doris will download this .zip file. The md5sum value needs to be set in the properties, or an .md5 file with the same name as the .zip file needs to be placed alongside it, such as http://xxx.xxxxxx.com/data/my_plugin.zip.md5 , whose content is the MD5 value of the .zip file.
A local .zip, like /home/work/data/plugin.zip . If the plugin is only used by the FE, it needs to be deployed in the same directory on all FE nodes; otherwise, it needs to be deployed on all FE and BE nodes.
A local directory, like /home/work/data/plugin , i.e. the folder decompressed from the .zip. If the plugin is only used by the FE, it needs to be deployed in the same directory on all FE nodes; otherwise, it needs to be deployed on all FE and BE nodes.
Note: the plugin .zip file must remain available throughout the life cycle of Doris!
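For example, a minimal sketch of installing a local package and listing plugins (the path matches the Sources value shown in the output below; INSTALL PLUGIN and SHOW PLUGINS are the commands referenced later in this document):
-- Install the plugin from a local .zip package.
INSTALL PLUGIN FROM "/home/users/doris/auditloader.zip";

-- List installed plugins and check that the Status column shows INSTALLED.
SHOW PLUGINS;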
Name: auditloader
Type: AUDIT
Description: load audit log to olap load, and user can view the statistic of queries
Version: 0.12.0
JavaVersion: 1.8.31
ClassName: AuditLoaderPlugin
SoName: NULL
Sources: /home/users/doris/auditloader.zip
Status: INSTALLED
Properties: {}
Name: AuditLogBuilder
Type: AUDIT
Version: 0.12.0
JavaVersion: 1.8.31
ClassName: org.apache.doris.qe.AuditLogBuilder
SoName: NULL
Sources: Builtin
Status: INSTALLED
Properties: {}
This plugin can periodically import the FE audit log into the specified Doris cluster, so that users can easily view and analyze
the audit log through SQL.
FE Configuration
The FE's plugin framework is an experimental feature and is disabled by default. In the FE configuration file, add plugin_enable = true to enable the plugin framework.
AuditLoader Configuration
The configuration of the auditloader plugin is located in $ {DORIS}/fe_plugins/auditloader/src/main/assembly/ .
Open plugin.conf for configuration. See the comments of the configuration items.
Since Version 1.2.0 the audit log plugin supports importing slow query logs into a separate slow table, `doris_slow_log_tbl__`, which is disabled by default. In the plugin configuration file, add `enable_slow_log = true` to enable this function. You can also modify the `qe_slow_log_ms` item in the FE configuration file to change the slow query threshold.
Compile
After executing sh build_plugin.sh in the Doris code directory, you will get the auditloader.zip file in the
fe_plugins/output directory.
Deployment
You can place this file on an http download server or copy(or unzip) it to the specified directory of all FEs. Here we use the
latter.
Installation
After deployment is complete, and before installing the plugin, you need to create the audit database and tables previously
specified in plugin.conf . If enable_slow_log is set true, the slow table doris_slow_log_tbl__ needs to be created, with the
same schema as doris_audit_log_tbl__ . The database and table creation statement is as follows:
cpu_time_ms bigint comment "Total scan cpu time in millisecond of this query",
peak_memory_bytes bigint comment "Peak memory bytes used on all backends of this query",
stmt string comment "The original statement, trimed if longer than 2G "
) engine=OLAP
partition by range(`time`) ()
properties(
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "1",
"dynamic_partition.enable" = "true",
"replication_num" = "3"
);
cpu_time_ms bigint comment "Total scan cpu time in millisecond of this query",
peak_memory_bytes bigint comment "Peak memory bytes used on all backends of this query",
stmt string comment "The original statement, trimed if longer than 2G"
) engine=OLAP
partition by range(`time`) ()
properties(
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-30",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "1",
"dynamic_partition.enable" = "true",
"replication_num" = "3"
);
Notice
In the above table structure, stmt string can only be used in version 0.15 and later; in previous versions, the field type used was varchar.
The dynamic_partition attribute selects the number of days to keep the audit log based on your needs.
After that, connect to Doris and use the INSTALL PLUGIN command to complete the installation. After successful installation,
you can see the installed plug-ins through SHOW PLUGINS , and the status is INSTALLED .
Upon completion, the plugin will continuously import audit data into this table at the specified interval.
Introduction
CloudCanal Community Edition is a free data migration and synchronization platform published by ClouGence that integrates structure migration, full data migration/check/correction, and incremental real-time synchronization. Its productization capabilities help enterprises break data silos, complete data integration and interoperability, and make better use of their data.
There is no English version of this document, please switch to the Chinese version.
Hive Bitmap UDF
1. Reduce the time of importing data into Doris by removing processes such as dictionary building and bitmap pre-aggregation;
2. Save Hive storage by using bitmaps to compress data, reducing storage cost;
3. Provide flexible bitmap operations in Hive, such as intersection, union, and difference, and the calculated bitmaps can also be imported directly into Doris;
How To Use
) comment 'comment'
:
-- Example Create Hive Table
) comment 'comment'
-- Install thrift
-- Execute the maven packaging command (all submodules of fe will be packaged)
After packaging and compiling, enter the hive-udf directory; there will be a target directory containing the hive-udf-jar-with-dependencies.jar package.
-- Load the Hive Bitmap Udf jar package (Upload the compiled hive-udf jar package to HDFS)
-- Example: Generate bitmaps with the to_bitmap function and write them to a Hive bitmap table
insert into hive_bitmap_table
select
k1,
k2,
k3,
to_bitmap(uuid) as uuid
from
hive_table
group by
k1,
k2,
k3
-- Example: The bitmap_count function calculates the number of elements in a bitmap
select k1, k2, k3, bitmap_count(uuid) from hive_bitmap_table
Options
name type description
fenodes [string] Doris FE http address
database [string] Doris database
table [string] Doris table
user [string] Doris user
password [string] Doris password
batch_size [int] The maximum number of lines to write to Doris at a time; the default value is 100
interval [int] The flush interval (in milliseconds), after which the asynchronous thread writes the data in the cache to Doris. Set to 0 to turn off periodic writes.
max_retries [int] The number of retries after a write failure
doris.* [string] Import parameters for Stream Load. For example: 'doris.column_separator' = ','
Examples
Socket To Doris
env {
  execution.parallelism = 1
}
source {
  SocketStream {
    host = 127.0.0.1
    port = 9999
    result_table_name = "socket"
    field_name = "info"
  }
}
transform {
}
sink {
  DorisSink {
    fenodes = "127.0.0.1:8030"
    user = root
    password = 123456
    database = test
    table = test_tbl
    batch_size = 5
    max_retries = 1
    interval = 5000
  }
}
Start command
sh bin/start-seatunnel-flink.sh --config config/flink.streaming.conf
Install Seatunnel
Seatunnel install
Options
name type description
fenodes [string] Doris FE address:8030
database [string] Doris database
table [string] Doris table
user [string] Doris user
password [string] Doris password
batch_size [string] The maximum number of rows to write to Doris at a time
doris. [string] Doris stream_load properties; you can use the 'doris.' prefix + stream_load properties
More Doris stream_load Configurations
Examples
Hive to Doris
Config properties
env {
  spark.app.name = "hive2doris-template"
}
spark {
  spark.sql.catalogImplementation = "hive"
}
source {
  hive {
    result_table_name = "test"
  }
}
transform {
}
sink {
  Console {
  }
  Doris {
    fenodes = "xxxx:8030"
    database = "gl_mint_dim"
    table = "dim_date"
    user = "root"
    password = "root"
    batch_size = 1000
    doris.column_separator = "\t"
    doris.columns = "date_key,date_value,day_in_year,day_in_month"
  }
}
Start command
Contribute UDF
This manual mainly introduces how external users can contribute their own UDF functions to the Doris community.
Prerequisites
1. UDF function is universal
The versatility here mainly refers to: UDF functions are widely used in certain business scenarios. Such UDF functions are
valuable and can be used directly by other users in the community.
If you are not sure whether the UDF function you wrote is universal, you can send an email to [email protected] or
directly create an ISSUE to initiate the discussion.
2. UDF has completed testing and is running normally in the user's production environment
Ready to work
1. UDF source code
2. User Manual of UDF
Source code
Create a folder for UDF functions under contrib/udf/src/ , and store the source code and CMAKE files here. The source code
to be contributed should include: .h , .cpp , CMakeFile.txt . Taking udf_samples as an example here, first create a new
folder under the contrib/udf/src/ path and store the source code.
├──contrib
│ └── udf
│ ├── CMakeLists.txt
│ └── src
│ └── udf_samples
│ ├── CMakeLists.txt
│ ├── uda_sample.cpp
│ ├── uda_sample.h
│ ├── udf_sample.cpp
│ └── udf_sample.h
1. CMakeLists.txt
After the user's CMakeLists.txt is placed here, a small number of changes are required: just remove include udf and udf lib, because they have already been declared in the CMake file at the contrib/udf level.
manual
The user manual needs to include: UDF function definition description, applicable scenarios, function syntax, how to compile
UDF, how to use UDF in Doris, and use examples.
1. The user manual must contain both Chinese and English versions and be stored under docs/zh-CN/extending-
doris/contrib/udf and docs/en/extending-doris/contrib/udf respectively.
├── docs
│ └──extending-doris
│ └──udf
│ └──contrib
│ ├── udf-simple-manual.md
├── docs
│ └── en
│ └──extending-doris
│ └──udf
│ └──contrib
│ ├── udf-simple-manual.md
2. Add the two manual files to the sidebar in Chinese and English.
vi docs/.vuepress/sidebar/zh-CN.js
directoryPath: "contrib/",
children:
"udf-simple-manual",
],
},
vi docs/.vuepress/sidebar/en.js
directoryPath: "contrib/",
children:
"udf-simple-manual",
],
},
Finally, when the PR is reviewed, approved, and merged, congratulations: your UDF becomes a third-party UDF supported by Doris. You can check it out in the ecological extension section of the Doris official website.
1. The advantage
2. Restrictions on use
Performance: Compared to Native UDFs, UDF services incur extra network overhead and thus have much lower
performance than Native UDFs. At the same time, the implementation of the UDF Service also affects the execution
efficiency of the function. Users need to deal with problems such as high concurrency and thread safety by
themselves.
Single-row mode and batch mode: Doris's original row-based query execution framework executed one UDF RPC call for each row of data, so the execution efficiency was very poor. Under the new vectorized execution framework, one UDF RPC call is executed for each batch of data (2048 rows by default), so the performance is significantly improved. In actual tests, the performance of a Remote UDF based on vectorization and batch processing is similar to that of a Native UDF based on row storage, which can be used as a reference.
function_service.proto
PFunctionCallRequest
function_name: the function name, corresponding to the symbol specified when the function was created
args: the arguments passed to the method
context: query context information
PFunctionCallResponse
result: the returned result
status: the returned status, 0 indicates normal
PCheckFunctionRequest
function: function-related information
match_type: the matching type
PCheckFunctionResponse
status: the returned status, 0 indicates normal
Generated interface
Use protoc to generate the code; see protoc -h for the specific parameters.
Implementing the interface
The following three methods need to be implemented:
fnCall: used to write the computational logic
checkFn: used to verify the function name, parameters, and return value when creating a UDF
handShake: used for interface probing
Create UDF
Currently, UDTFs are not supported.
CREATE FUNCTION
name ([argtype][,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
Instructions:
1. symbol in PROPERTIES represents the name of the method passed in the RPC call; it must be set.
2. object_file in PROPERTIES represents the RPC service address. Currently a single address and a cluster address in BRPC-compatible format are supported; refer to the cluster connection mode format specification.
3. type in PROPERTIES indicates the UDF call type, which is Native by default; pass RPC when using an RPC UDF.
4. name: a function belongs to a DB, and the name is of the form dbName.funcName. When dbName is not explicitly specified, the db of the current session is used as dbName.
Sample:
"SYMBOL"="add_int",
"OBJECT_FILE"="127.0.0.1:9090",
"TYPE"="RPC"
);
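The head of the sample statement appears to be truncated above; a minimal sketch of the full statement (the function name rpc_add and its INT signature are illustrative, while the property values reuse those shown above) could look like:
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
    "SYMBOL" = "add_int",
    "OBJECT_FILE" = "127.0.0.1:9090",
    "TYPE" = "RPC"
);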
Use UDF
Users must have the SELECT permission of the corresponding database to use UDF/UDAF.
The use of UDFs is consistent with ordinary functions. The only difference is that the scope of built-in functions is global, while the scope of a UDF is internal to a DB. When the session is connected to that database, using the UDF name directly finds the corresponding UDF inside the current DB; otherwise, the user needs to explicitly specify the UDF's database name, such as dbName.funcName.
Delete UDF
When you no longer need UDF functions, you can delete a UDF function by the following command, you can refer to DROP
FUNCTION .
Example
Examples of rpc server implementations and cpp/java/python languages are provided in the samples/doris-demo/ directory.
See the README.md in each directory for details on how to use it.
There are two types of analysis requirements that UDF can meet: UDF and UDAF. UDF in this article refers to both.
1. UDF: User-defined function, this function will operate on a single line and output a single line result. When users use UDFs
for queries, each row of data will eventually appear in the result set. Typical UDFs are string operations such as concat().
2. UDAF: User-defined aggregation function. This function operates on multiple rows and outputs a single row of results. When the user uses a UDAF in a query, each group of data after grouping computes one value that is added to the result set. A typical UDAF is the aggregate operation sum(). Generally speaking, a UDAF is used together with GROUP BY.
This document mainly describes how to write a custom UDF function and how to use it in Doris.
If users use the UDF function and extend Doris' function analysis, and want to contribute their own UDF functions back to
the Doris community for other users, please see the document Contribute UDF.
Writing functions
Create the corresponding header file and CPP file, and implement the logic you need in the CPP file. Correspondence
between the implementation function format and UDF in the CPP file.
Users can put their own source code in a folder. Taking udf_sample as an example, the directory structure is as follows:
└── udf_samples
├── uda_sample.cpp
├── uda_sample.h
├── udf_sample.cpp
└── udf_sample.h
Non-variable parameters
For UDFs with non-variable parameters, the correspondence between the two is straightforward.
For example, the UDF of
INT MyADD(INT, INT) will correspond to IntVal AddUdf(FunctionContext* context, const IntVal& arg1, const IntVal&
arg2) .
variable parameter
For variable parameters, you can refer to the following example, corresponding to UDF String md5sum(String, ...)
The
implementation function is StringVal md5sumUdf(FunctionContext* ctx, int num_args, const StringVal* args)
Type correspondence
Doris Type UDF Argument Type
TinyInt TinyIntVal
SmallInt SmallIntVal
Int IntVal
BigInt BigIntVal
LargeInt LargeIntVal
Float FloatVal
Double DoubleVal
Date DateTimeVal
Datetime DateTimeVal
Char StringVal
Varchar StringVal
Decimal DecimalVal
After the compilation is completed, the static library file of the UDF framework will be generated. Then introduce the UDF
framework dependency and compile the UDF.
Compile Doris
Running sh build.sh in the root directory of Doris will generate a static library file of the UDF framework headers|libs in
output/udf/
├── output
│ └── udf
│ ├── include
│ │ ├── uda_test_harness.h
│ │ └── udf.h
│ └── lib
│ └── libDorisUdf.a
The thirdparty folder is mainly used to store thirdparty libraries that users' UDF functions depend on, including header
files and static libraries. It must contain the two files udf.h and libDorisUdf.a in the dependent Doris UDF framework.
Taking udf_sample as an example here, the source code is stored in the user's own udf_samples directory. Create a
thirdparty folder in the same directory to store the static library. The directory structure is as follows:
├── thirdparty
│ │── include
│ │ └── udf.h
│ └── lib
│ └── libDorisUdf.a
└── udf_samples
udf.h is the UDF frame header file. The storage path is doris/output/udf/include/udf.h . Users need to copy the header
file in the Doris compilation output to their include folder of thirdparty .
libDorisUdf.a is a static library of UDF framework. After Doris is compiled, the file is stored in
doris/output/udf/lib/libDorisUdf.a . The user needs to copy the file to the lib folder of his thirdparty .
Note: The static library of the UDF framework will not be generated until Doris is compiled.
CMakeLists.txt is used to declare how the UDF functions are compiled. It is stored in the source code folder, at the same level as the user code.
Here, taking udf_samples as an example, the directory structure is as follows:
├── thirdparty
└── udf_samples
├── CMakeLists.txt
├── uda_sample.cpp
├── uda_sample.h
├── udf_sample.cpp
└── udf_sample.h
```
# Include udf
include_directories(../thirdparty/include)

# Import the Doris UDF framework static library
add_library(udf STATIC IMPORTED)
set_target_properties(udf PROPERTIES IMPORTED_LOCATION ../thirdparty/lib/libDorisUdf.a)

set(LIBRARY_OUTPUT_PATH "src/udf_samples")
set(EXECUTABLE_OUTPUT_PATH "src/udf_samples")

add_library(udfsample SHARED udf_sample.cpp)
target_link_libraries(udfsample
    udf
    -static-libstdc++
    -static-libgcc
)

add_library(udasample SHARED uda_sample.cpp)
target_link_libraries(udasample
    udf
    -static-libstdc++
    -static-libgcc
)
```
If the user's UDF function also depends on other thirdparty libraries, you need to declare include, lib, and add
dependencies in `add_library`.
The complete directory structure after all files are prepared is as follows:
├── thirdparty
│ │── include
│ │ └── udf.h
│ └── lib
│ └── libDorisUdf.a
└── udf_samples
├── CMakeLists.txt
├── uda_sample.cpp
├── uda_sample.h
├── udf_sample.cpp
└── udf_sample.h
Prepare the above files and you can compile UDF directly
Execute compilation
Create a build folder under the udf_samples folder to store the compilation output.
Run the command cmake ../ in the build folder to generate a Makefile, and execute make to generate the corresponding
dynamic library.
├── thirdparty
├── udf_samples
└── build
Compilation result
After the compilation is completed, the UDF dynamic link library is successfully generated. Under build/src/ , taking
udf_samples as an example, the directory structure is as follows:
├── thirdparty
├── udf_samples
└── build
└── src
└── udf_samples
├── libudasample.so
└── libudfsample.so
Then log in to the Doris system and create a UDF function in the mysql-client through the CREATE FUNCTION syntax. You need
to have ADMIN authority to complete this operation. At this time, there will be a UDF created in the Doris system.
CREATE [AGGREGATE] FUNCTION
name ([argtype][,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
Description:
1. "Symbol" in PROPERTIES means that the symbol corresponding to the entry function is executed. This parameter must
be set. You can get the corresponding symbol through the nm command, for example,
_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_ obtained by nm libudfsample.so | grep AddUdf is the
corresponding symbol.
2. The object_file in PROPERTIES indicates where it can be downloaded to the corresponding dynamic library. This
parameter must be set.
3. name: A function belongs to a certain DB, and the name is in the form of dbName . funcName . When dbName is not
explicitly specified, the db where the current session is located is used as dbName .
For specific use, please refer to CREATE FUNCTION for more detailed information.
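A minimal sketch of such a statement for the AddUdf example above (the function name and library URL are illustrative):
CREATE FUNCTION my_add(INT, INT) RETURNS INT PROPERTIES (
    "SYMBOL"="_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_",
    "OBJECT_FILE"="http://host:port/libudfsample.so"
);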
Use UDF
Users must have the SELECT permission of the corresponding database to use UDF/UDAF.
UDFs are used in the same way as ordinary functions. The only difference is that the scope of built-in functions is global, while a UDF is scoped to the DB it belongs to. When the session is connected to that database, using the UDF name directly resolves to the UDF in the current DB. Otherwise, the user needs to explicitly specify the UDF's database name, such as dbName.funcName.
Delete UDF
When you no longer need a UDF, you can delete it with the DROP FUNCTION command; refer to DROP FUNCTION for details.
Java UDF
Since Version 1.2.0
Java UDF
Java UDF provides users with a Java interface written in UDF to facilitate the execution of user-defined functions in Java
language. Compared with native UDF implementation, Java UDF has the following advantages and limitations:
1. The advantages
Compatibility: Java UDFs are compatible across different Doris versions, so no additional migration is needed when upgrading Doris. Java UDFs also follow the same programming conventions as Hive, Spark, and other engines, so users can directly move Hive/Spark UDF jars to Doris.
Security: The failure or crash of Java UDF execution will only cause the JVM to report an error, not the Doris process to
crash.
Flexibility: In Java UDF, users can package the third-party dependencies together in the user jar.
2. Restrictions on use
Performance: Compared with native UDF, Java UDF will bring additional JNI overhead, but through batch execution, we
have minimized the JNI overhead as much as possible.
Vectorized engine: Java UDF is only supported on vectorized engine now.
Type correspondence
Type UDF Argument Type
Bool Boolean
TinyInt Byte
SmallInt Short
Int Integer
BigInt Long
LargeInt BigInteger
Float Float
Double Double
Date LocalDate
Datetime LocalDateTime
String String
Decimal BigDecimal
array<Type> ArrayList<Type>
Array types can nest basic types. For example, array<int> in Doris corresponds to the Java UDF argument type ArrayList<Integer>, and similarly for the other basic types.
To use a Java UDF, the main entry of the UDF must be the evaluate function. This is consistent with other engines such as Hive. In the AddOne example, we implement a UDF that adds one to an integer.
It is worth mentioning that this example is not only a Java UDF supported by Doris, but also a UDF supported by Hive; that is to say, Hive UDFs can be migrated directly to Doris.
Create UDF
CREATE FUNCTION
name ([,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
Instructions:
1. symbol in PROPERTIES represents the name of the class containing the UDF. This parameter must be set.
2. file in PROPERTIES represents the jar package containing the UDF. This parameter must be set.
3. type in PROPERTIES represents the UDF call type, which is NATIVE by default. It is set to JAVA_UDF when using a Java UDF.
4. always_nullable in PROPERTIES indicates whether the UDF result may contain a NULL value. It is an optional parameter and defaults to true.
5. name: a function belongs to a DB, and its name is of the form dbName.funcName. When dbName is not explicitly specified, the db of the current session is used as dbName.
Sample :
CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
"file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",
"symbol"="org.apache.doris.udf.AddOne",
"always_nullable"="true",
"type"="JAVA_UDF"
);
"file"=" http: //IP:port/udf -code. Jar ", you can also use http to download jar packages in a multi machine environment.
The "always_nullable" is optional attribute, if there is special treatment for the NULL value in the calculation, it is
determined that the result will not return NULL, and it can be set to false, so that the performance may be better in the
whole calculation process.
If you use the local path method, the jar package that the database driver depends on, the FE and BE nodes must be
placed here
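Once created, the UDF is invoked like a built-in function; for example (table and column names below are illustrative):
mysql> select java_udf_add_one(col_1) from test_table;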
Create UDAF
When writing a UDAF in Java, some functions must be implemented (marked required below) together with an inner class State; the required methods are create, destroy, add, serialize, deserialize, merge, and getValue. The following SimpleDemo sketches a simple sum-like function whose input parameter and output are both INT.
package org.apache.doris.udf.demo;

import org.apache.hadoop.hive.ql.exec.UDAF;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SimpleDemo {
    /*required*/ // aggregation state kept between calls
    public static class State {
        public int sum = 0;
    }
    /*required*/
    public State create() {
        return new State();
    }
    /*required*/
    public void destroy(State state) {
    }
    /*required*/ // accumulate one input row into the state
    public void add(State state, Integer val) {
        if (val != null) {
            state.sum += val;
        }
    }
    /*required*/ // write the state to a buffer for transfer
    public void serialize(State state, DataOutputStream out) {
        try {
            out.writeInt(state.sum);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
    /*required*/ // restore the state from a buffer
    public void deserialize(State state, DataInputStream in) {
        int val = 0;
        try {
            val = in.readInt();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        state.sum = val;
    }
    /*required*/ // merge two partial states
    public void merge(State state, State rhs) {
        state.sum += rhs.sum;
    }
    /*required*/ // produce the final result
    public Integer getValue(State state) {
        return state.sum;
    }
}
"file"="file:///pathTo/java-udaf.jar",
"symbol"="org.apache.doris.udf.demo.SimpleDemo",
"always_nullable"="true",
"type"="JAVA_UDF"
);
The implemented jar package can be stored locally or on a remote server and downloaded via HTTP; either way, every BE node must be able to obtain the jar package.
Otherwise, the error status message "Couldn't open file..." will be returned.
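After registration, the UDAF can be used in aggregation queries; a minimal sketch (table and column names are illustrative):
mysql> select simple_sum(col_1) from test_table;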
Use UDF
Users must have the SELECT permission of the corresponding database to use UDF/UDAF.
UDFs are used in the same way as ordinary functions. The only difference is that the scope of built-in functions is global, while a UDF is scoped to the DB it belongs to. When the session is connected to that database, using the UDF name directly resolves to the UDF in the current DB. Otherwise, the user needs to explicitly specify the UDF's database name, such as dbName.funcName.
Delete UDF
When you no longer need a UDF, you can delete it with the DROP FUNCTION command; refer to DROP FUNCTION for details.
Example
Examples of Java UDFs are provided in the samples/doris-demo/java-udf-demo/ directory. See the README.md there for details on how to use them.
Instructions
1. Complex data types (HLL, BITMAP) are not supported.
2. Users are currently allowed to specify the maximum JVM heap size themselves; the configuration item is jvm_max_heap_size.
3. A UDF of char type needs to use the String type when creating the function.
4. Because the JVM loads classes by name, do not use multiple classes with the same name as UDF implementations at the same time. If you want to update a UDF whose class has the same name, you need to restart the BE to reload the classpath.
1. The advantage
2. Restrictions on use
Performance: Compared with native UDAFs, a UDAF service incurs extra network overhead and therefore has much lower performance than a native UDAF. The implementation of the UDAF service also affects the execution efficiency of the function; users need to handle issues such as high concurrency and thread safety themselves.
Single-row mode and batch mode: Doris's original row-based query execution framework issued one UDAF RPC call per row of data, so execution efficiency was very poor. Under the new vectorized execution framework, one UDAF RPC call is issued per batch of data, so performance is significantly improved. In actual tests, the performance of a remote UDAF based on vectorization and batch processing is close to that of a native row-based UDAF, which can be used as a reference.
function_service.proto
PFunctionCallRequest
  function_name: the function name, corresponding to the symbol specified when the function was created
  args: the arguments passed to the method
  context: query context information
PFunctionCallResponse
  result: the returned result
  status: the returned status, 0 indicates normal
PCheckFunctionRequest
  function: function-related information
  match_type: matching type
PCheckFunctionResponse
  status: the returned status, 0 indicates normal
Generated interface
Use protoc to generate the code; see protoc -h for the specific parameters.
Implementing an interface
The following three methods need to be implemented:
  fnCall: used to implement the computation logic
  checkFn: used to verify the function name, parameters, and return value when creating a UDAF
  handShake: used for interface probing
Create UDAF
Currently, UDAF and UDTF are not supported
CREATE FUNCTION
name ([,...])
[RETURNS] rettype
PROPERTIES (["key"="value"][,...])
Instructions:
1. symbol in PROPERTIES represents the name of the method passed in the RPC call; it must be set.
2. object_file in PROPERTIES represents the RPC service address. Currently a single address and a cluster address in BRPC-compatible format are supported; for the cluster connection mode, refer to the format specification.
3. type in PROPERTIES indicates the UDAF call type, which is Native by default. RPC is passed when an RPC UDAF is used.
Sample (the function name below is illustrative):
CREATE AGGREGATE FUNCTION rpc_sum(INT) RETURNS INT PROPERTIES (
    "TYPE"="RPC",
    "OBJECT_FILE"="127.0.0.1:9000",
    "update_fn"="rpc_sum_update",
    "merge_fn"="rpc_sum_merge",
    "finalize_fn"="rpc_sum_finalize"
);
Use UDAF
Users must have the SELECT permission of the corresponding database to use UDF/UDAF.
UDAFs are used in the same way as ordinary functions. The only difference is that the scope of built-in functions is global, while a UDAF is scoped to the DB it belongs to. When the session is connected to that database, using the UDAF name directly resolves to the UDAF in the current DB. Otherwise, the user needs to explicitly specify the UDAF's database name, such as dbName.funcName.
Delete UDAF
When you no longer need a UDAF, you can delete it with the DROP FUNCTION command; refer to DROP FUNCTION for details.
Example
Examples of rpc server implementations and cpp/java/python languages are provided in the samples/doris-demo/ directory.
See the README.md in each directory for details on how to use it.
array
array()
Since Version 1.2.0
array()
description
Syntax
ARRAY<T> array(T, ...)
Construct an array from the variadic elements and return it; T can be a column or a literal.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+----------------------+
| array('1', 2, '1.1') |
+----------------------+
| ['1', '2', '1.1'] |
+----------------------+
1 row in set (0.00 sec)
+----------------+
| array(NULL, 1) |
+----------------+
| [NULL, 1] |
+----------------+
+----------------+
| array(1, 2, 3) |
+----------------+
| [1, 2, 3] |
+----------------+
keywords
ARRAY,ARRAY,CONSTRUCTOR
array_max
array_max
Since Version 1.2.0
array_max
description
Get the maximum element in an array ( NULL values are skipped).
When the array is empty or all elements in the array are
NULL values, the function returns NULL .
example
mysql> create table array_type_table(k1 INT, k2 Array<int>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), (1, [NULL]), (2, [1, 2, 3]), (3, [1, NULL, 3]);
+--------------+-----------------+
| k2 | array_max(`k2`) |
+--------------+-----------------+
| [] | NULL |
| [NULL] | NULL |
| [1, 2, 3] | 3 |
| [1, NULL, 3] | 3 |
+--------------+-----------------+
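The result table above corresponds to a query of roughly this form (a sketch, since the original statement did not survive):
mysql> select k2, array_max(k2) from array_type_table order by k1;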
keywords
ARRAY,MAX,ARRAY_MAX
array_min
array_min
Since Version 1.2.0
array_min
description
Get the minimum element in an array ( NULL values are skipped).
When the array is empty or all elements in the array are
NULL values, the function returns NULL .
example
mysql> create table array_type_table(k1 INT, k2 Array<int>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), (1, [NULL]), (2, [1, 2, 3]), (3, [1, NULL, 3]);
+--------------+-----------------+
| k2 | array_min(`k2`) |
+--------------+-----------------+
| [] | NULL |
| [NULL] | NULL |
| [1, 2, 3] | 1 |
| [1, NULL, 3] | 1 |
+--------------+-----------------+
keywords
ARRAY,MIN,ARRAY_MIN
array_avg
array_avg
Since Version 1.2.0
array_avg
description
Get the average of all elements in an array ( NULL values are skipped).
When the array is empty or all elements in the array are
NULL values, the function returns NULL .
example
mysql> create table array_type_table(k1 INT, k2 Array<int>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), (1, [NULL]), (2, [1, 2, 3]), (3, [1, NULL, 3]);
+--------------+-----------------+
| k2 | array_avg(`k2`) |
+--------------+-----------------+
| [] | NULL |
| [NULL] | NULL |
| [1, 2, 3] | 2 |
| [1, NULL, 3] | 2 |
+--------------+-----------------+
keywords
ARRAY,AVG,ARRAY_AVG
array_sum
array_sum
Since Version 1.2.0
array_sum
description
Get the sum of all elements in an array ( NULL values are skipped).
When the array is empty or all elements in the array are
NULL values, the function returns NULL .
example
mysql> create table array_type_table(k1 INT, k2 Array<int>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), (1, [NULL]), (2, [1, 2, 3]), (3, [1, NULL, 3]);
+--------------+-----------------+
| k2 | array_sum(`k2`) |
+--------------+-----------------+
| [] | NULL |
| [NULL] | NULL |
| [1, 2, 3] | 6 |
| [1, NULL, 3] | 4 |
+--------------+-----------------+
keywords
ARRAY,SUM,ARRAY_SUM
array_size
description
Syntax
BIGINT size(ARRAY<T> arr)
Returns the size of the array, returns NULL for NULL input.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------+------------+
| k1 | k2 | size(`k2`) |
+------+-----------+------------+
| 1 | [1, 2, 3] | 3 |
| 2 | [] | 0 |
| 3 | NULL | NULL |
+------+-----------+------------+
+------+-----------+------------------+
| k1 | k2 | array_size(`k2`) |
+------+-----------+------------------+
| 1 | [1, 2, 3] | 3 |
| 2 | [] | 0 |
| 3 | NULL | NULL |
+------+-----------+------------------+
+------+-----------+-------------------+
| k1 | k2 | cardinality(`k2`) |
+------+-----------+-------------------+
| 1 | [1, 2, 3] | 3 |
| 2 | [] | 0 |
| 3 | NULL | NULL |
+------+-----------+-------------------+
keywords
ARRAY_SIZE, SIZE, CARDINALITY
array_remove
array_remove
Since Version 1.2.0
array_remove
description
Syntax
ARRAY<T> array_remove(ARRAY<T> arr, T val)
Remove all elements that are equal to val from the array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+-----------------------------------------------------+
+-----------------------------------------------------+
| [test, NULL] |
+-----------------------------------------------------+
+------+--------------------+-----------------------+
| k1 | k2 | array_remove(`k2`, 1) |
+------+--------------------+-----------------------+
| 1 | [1, 2, 3] | [2, 3] |
| 2 | [1, 3] | [3] |
| 3 | NULL | NULL |
| 4 | [1, 3] | [3] |
+------+--------------------+-----------------------+
+------+--------------------+--------------------------+
| k1 | k2 | array_remove(`k2`, `k1`) |
+------+--------------------+--------------------------+
| 1 | [1, 2, 3] | [2, 3] |
| 2 | [1, 3] | [1, 3] |
| 3 | NULL | NULL |
| 4 | [1, 3] | [1, 3] |
+------+--------------------+--------------------------+
+------+--------------------------+-------------------------------------------------+
+------+--------------------------+-------------------------------------------------+
+------+--------------------------+-------------------------------------------------+
+------+-----------+--------------------------+
| k1 | k2 | array_remove(`k2`, `k1`) |
+------+-----------+--------------------------+
| 1 | NULL | NULL |
| 1 | [NULL, 1] | [NULL] |
+------+-----------+--------------------------+
keywords
ARRAY,REMOVE,ARRAY_REMOVE
array_slice
array_slice
Since Version 1.2.0
array_slice
description
Syntax
ARRAY<T> array_slice(ARRAY<T> arr, BIGINT off, BIGINT len)
Returns a sub-array of arr starting at position off (1-based; a negative off counts from the end of the array) with length len.
An empty array is returned when off is not within the actual range of the array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+-----------------+-------------------------+
| k2 | array_slice(`k2`, 2, 2) |
+-----------------+-------------------------+
| [1, 2, 3] | [2, 3] |
| [2, 3] | [3] |
| NULL | NULL |
+-----------------+-------------------------+
+-----------------+-------------------------+
| k2 | array_slice(`k2`, 2, 2) |
+-----------------+-------------------------+
| [1, 2, 3] | [2, 3] |
| [2, 3] | [3] |
| NULL | NULL |
+-----------------+-------------------------+
+----------------------------+-------------------------+
| k2 | array_slice(`k2`, 2, 2) |
+----------------------------+-------------------------+
+----------------------------+-------------------------+
+----------------------------+-------------------------+
| k2 | array_slice(`k2`, 2, 2) |
+----------------------------+-------------------------+
+----------------------------+-------------------------+
Negative off:
+-----------+--------------------------+
| k2 | array_slice(`k2`, -2, 1) |
+-----------+--------------------------+
| [1, 2, 3] | [2] |
| [1, 2, 3] | [2] |
| [2, 3] | [2] |
| [2, 3] | [2] |
+-----------+--------------------------+
| k2 | array_slice(`k2`, -2, 1) |
+-----------+--------------------------+
| [1, 2, 3] | [2] |
| [1, 2, 3] | [2] |
| [2, 3] | [2] |
| [2, 3] | [2] |
+-----------+--------------------------+
+----------------------------+--------------------------+
| k2 | array_slice(`k2`, -2, 2) |
+----------------------------+--------------------------+
+----------------------------+--------------------------+
+----------------------------+--------------------------+
| k2 | array_slice(`k2`, -2, 2) |
+----------------------------+--------------------------+
+----------------------------+--------------------------+
+-----------+-------------------------+
| k2 | array_slice(`k2`, 0) |
+-----------+-------------------------+
| [1, 2, 3] | [] |
+-----------+-------------------------+
+-----------+----------------------+
| k2 | array_slice(`k2`, -5) |
+-----------+----------------------+
| [1, 2, 3] | [] |
+-----------+----------------------+
keywords
ARRAY,SLICE,ARRAY_SLICE
array_sort
array_sort
Since Version 1.2.0
array_sort
description
Syntax
ARRAY<T> array_sort(ARRAY<T> arr)
Return the array which has been sorted in ascending order. Return NULL for NULL input.
If the element of array is NULL, it will
be placed in the front of the sorted array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------------------+-----------------------------+
| k1 | k2 | array_sort(`k2`) |
+------+-----------------------------+-----------------------------+
| 1 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] |
| 2 | [6, 7, 8] | [6, 7, 8] |
| 3 | [] | [] |
| 4 | NULL | NULL |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [1, 1, 2, 2, 3, 3, 4, 4, 5] |
+------+-----------------------------+-----------------------------+
+------+------------------------------------------+------------------------------------------+
| k1 | k2 | array_sort(`k2`) |
+------+------------------------------------------+------------------------------------------+
| 1 | ['a', 'b', 'c', 'd', 'e'] | ['a', 'b', 'c', 'd', 'e'] |
| 3 | [''] | [''] |
| 3 | [NULL] | [NULL] |
| 5 | ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c'] | ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] |
| 6 | NULL | NULL |
+------+------------------------------------------+------------------------------------------+
keywords
ARRAY, SORT, ARRAY_SORT
array_position
array_position
Since Version 1.2.0
array_position
description
Syntax
BIGINT array_position(ARRAY<T> arr, T value)
Returns the position of the first occurrence of value in the array (1-based), 0 if value is not present, or NULL if the array is NULL.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+------------------------------+
| id | c_array | array_position(`c_array`, 5) |
+------+-----------------+------------------------------+
| 1 | [1, 2, 3, 4, 5] | 5 |
| 2 | [6, 7, 8] | 0 |
| 3 | [] | 0 |
| 4 | NULL | NULL |
+------+-----------------+------------------------------+
+--------------------------------------+
+--------------------------------------+
| 2 |
+--------------------------------------+
keywords
ARRAY,POSITION,ARRAY_POSITION
array_contains
array_contains
Since Version 1.2.0
array_contains
description
Syntax
BOOLEAN array_contains(ARRAY<T> arr, T value)
Returns 1 if value is present in the array, 0 if it is not, and NULL if the array is NULL.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+------------------------------+
| id | c_array | array_contains(`c_array`, 5) |
+------+-----------------+------------------------------+
| 1 | [1, 2, 3, 4, 5] | 1 |
| 2 | [6, 7, 8] | 0 |
| 3 | [] | 0 |
| 4 | NULL | NULL |
+------+-----------------+------------------------------+
+--------------------------------------+
+--------------------------------------+
| 1 |
+--------------------------------------+
keywords
ARRAY,CONTAIN,CONTAINS,ARRAY_CONTAINS
array_except
array_except
Since Version 1.2.0
array_except
description
Syntax
ARRAY<T> array_except(ARRAY<T> array1, ARRAY<T> array2)
Returns an array of the elements in array1 but not in array2, without duplicates. If the input parameter is null, null is returned.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+--------------+--------------------------+
| k1 | k2 | k3 | array_except(`k2`, `k3`) |
+------+-----------------+--------------+--------------------------+
+------+-----------------+--------------+--------------------------+
+------+-----------------+--------------+--------------------------+
| k1 | k2 | k3 | array_except(`k2`, `k3`) |
+------+-----------------+--------------+--------------------------+
+------+-----------------+--------------+--------------------------+
+------+----------------------------+----------------------------------+--------------------------+
| k1 | k2 | k3 | array_except(`k2`, `k3`) |
+------+----------------------------+----------------------------------+--------------------------+
+------+----------------------------+----------------------------------+--------------------------+
+------+------------------+-------------------+--------------------------+
| k1 | k2 | k3 | array_except(`k2`, `k3`) |
+------+------------------+-------------------+--------------------------+
+------+------------------+-------------------+--------------------------+
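Because the sample rows above did not survive, here is a minimal illustrative call (a sketch):
mysql> select array_except([1, 2, 3], [2, 4]);
-- expected result: [1, 3]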
keywords
ARRAY,EXCEPT,ARRAY_EXCEPT
array_product
array_product
Since Version 1.2.0
array_product
description
Get the product of all elements in an array ( NULL values are skipped).
When the array is empty or all elements in the array are
NULL values, the function returns NULL .
example
mysql> create table array_type_table(k1 INT, k2 Array<int>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), (1, [NULL]), (2, [1, 2, 3]), (3, [1, NULL, 3]);
+--------------+---------------------+
| k2 | array_product(`k2`) |
+--------------+---------------------+
| [] | NULL |
| [NULL] | NULL |
| [1, 2, 3] | 6 |
| [1, NULL, 3] | 3 |
+--------------+---------------------+
keywords
ARRAY,PRODUCT,ARRAY_PRODUCT
array_intersect
array_intersect
Since Version 1.2.0
array_intersect
description
Syntax
ARRAY<T> array_intersect(ARRAY<T> array1, ARRAY<T> array2)
Returns an array of the elements in the intersection of array1 and array2, without duplicates. If the input parameter is null, null
is returned.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+--------------+-----------------------------+
| k1 | k2 | k3 | array_intersect(`k2`, `k3`) |
+------+-----------------+--------------+-----------------------------+
| 2 | [2, 3] | [1, 5] | [] |
| 3 | [1, 1, 1] | [2, 2, 2] | [] |
+------+-----------------+--------------+-----------------------------+
+------+-----------------+--------------+-----------------------------+
| k1 | k2 | k3 | array_intersect(`k2`, `k3`) |
+------+-----------------+--------------+-----------------------------+
+------+-----------------+--------------+-----------------------------+
+------+----------------------------+----------------------------------+-----------------------------+
| k1 | k2 | k3 | array_intersect(`k2`, `k3`) |
+------+----------------------------+----------------------------------+-----------------------------+
+------+----------------------------+----------------------------------+-----------------------------+
+------+------------------+-------------------+-----------------------------+
| k1 | k2 | k3 | array_intersect(`k2`, `k3`) |
+------+------------------+-------------------+-----------------------------+
+------+------------------+-------------------+-----------------------------+
keywords
ARRAY,INTERSECT,ARRAY_INTERSECT
array_range
array_range
Since Version 1.2.0
array_range
description
Syntax
ARRAY<Int> array_range(Int end)
ARRAY<Int> array_range(Int start, Int end)
ARRAY<Int> array_range(Int start, Int end, Int step)
The parameters are all positive integers. start defaults to 0 and step defaults to 1. Returns an array of numbers from start up to (but not including) end, increasing by step.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+--------------------------------+
| array_range(10) |
+--------------------------------+
| [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] |
+--------------------------------+
+------------------------------------------+
| array_range(10, 20) |
+------------------------------------------+
| [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] |
+------------------------------------------+
+-------------------------------------+
| array_range(0, 20, 2) |
+-------------------------------------+
+-------------------------------------+
keywords
ARRAY, RANGE, ARRAY_RANGE
array_distinct
array_distinct
Since Version 1.2.0
array_distinct
description
Syntax
ARRAY<T> array_distinct(ARRAY<T> arr)
Return the array with duplicate elements removed. Return NULL for NULL input.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------------------+---------------------------+
| k1 | k2 | array_distinct(k2) |
+------+-----------------------------+---------------------------+
| 1 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] |
| 2 | [6, 7, 8] | [6, 7, 8] |
| 3 | [] | [] |
| 4 | NULL | NULL |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [1, 2, 3, 4, 5] |
+------+-----------------------------+---------------------------+
+------+------------------------------------------+---------------------------+
| k1 | k2 | array_distinct(`k2`) |
+------+------------------------------------------+---------------------------+
| 1 | ['a', 'b', 'c', 'd', 'e'] | ['a', 'b', 'c', 'd', 'e'] |
| 3 | [''] | [''] |
| 3 | [NULL] | [NULL] |
| 5 | ['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c'] | ['a', 'b', 'c', 'd', 'e'] |
| 6 | NULL | NULL |
+------+------------------------------------------+---------------------------+
keywords
ARRAY, DISTINCT, ARRAY_DISTINCT
array_difference
array_difference
Since Version 1.2.0
array_difference
description
Syntax
ARRAY<T> array_difference(ARRAY<T> arr)
Calculates the difference between adjacent array elements: the first element of the result is 0, and each following element is arr[i] - arr[i-1]. If either operand is NULL, the corresponding result element is NULL.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------------------+---------------------------------+
| k1 | k2 | array_difference(`k2`) |
+------+-----------------------------+---------------------------------+
| 0 | [] | [] |
| 1 | [NULL] | [NULL] |
| 2 | [1, 2, 3] | [0, 1, 1] |
| 3 | [1, NULL, 3] | [0, NULL, NULL] |
| 4 | [0, 1, 2, 3, NULL, 4, 6] | [0, 1, 1, 1, NULL, NULL, 2] |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [0, 1, 1, 1, 1, -1, -1, -1, -1] |
| 6 | [6, 7, 8] | [0, 1, 1] |
+------+-----------------------------+---------------------------------+
keywords
ARRAY, DIFFERENCE, ARRAY_DIFFERENCE
array_union
array_union
Since Version 1.2.0
array_union
description
Syntax
ARRAY<T> array_union(ARRAY<T> array1, ARRAY<T> array2)
Returns an array of the elements in the union of array1 and array2, without duplicates. If the input parameter is null, null is
returned.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+--------------+-------------------------+
| k1 | k2 | k3 | array_union(`k2`, `k3`) |
+------+-----------------+--------------+-------------------------+
+------+-----------------+--------------+-------------------------+
+------+-----------------+--------------+-------------------------+
| k1 | k2 | k3 | array_union(`k2`, `k3`) |
+------+-----------------+--------------+-------------------------+
+------+-----------------+--------------+-------------------------+
| k1 | k2 | k3 | array_union(`k2`, `k3`)
|
+------+----------------------------+----------------------------------+-----------------------------------------
----------+
| 1 | ['hello', 'world', 'c++'] | ['I', 'am', 'c++'] | ['hello', 'world', 'c++', 'I', 'am']
|
| 2 | ['a1', 'equals', 'b1'] | ['a2', 'equals', 'b2'] | ['a1', 'equals', 'b1', 'a2', 'b2']
|
| 3 | ['hasnull', NULL, 'value'] | ['nohasnull', 'nonull', 'value'] | ['hasnull', NULL, 'value', 'nohasnull',
'nonull'] |
+------+----------------------------+----------------------------------+-----------------------------------------
----------+
| k1 | k2 | k3 | array_union(`k2`, `k3`) |
+------+------------------+-------------------+----------------------------+
| 1 | [1.1, 2.1, 3.44] | [2.1, 3.4, 5.4] | [1.1, 2.1, 3.44, 3.4, 5.4] |
+------+------------------+-------------------+----------------------------+
keywords
ARRAY,UNION,ARRAY_UNION
array_join
array_join
Since Version 1.2.0
array_join
description
Syntax
VARCHAR array_join(ARRAY<T> arr, VARCHAR sep[, VARCHAR null_replace])
Combines all elements in the array to generate a new string according to the separator (sep)
and the string to replace NULL
(null_replace).
If sep is NULL, return NULL.
If null_replace is NULL, return NULL.
If sep is an empty string, no delimiter is
applied.
If null_replace is an empty string or not specified, the NULL elements in the array are discarded directly.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
mysql> select k1, k2, array_join(k2, '_', 'null') from array_test order by k1;
+------+-----------------------------+------------------------------------+
+------+-----------------------------+------------------------------------+
| 1 | [1, 2, 3, 4, 5] | 1_2_3_4_5 |
| 2 | [6, 7, 8] | 6_7_8 |
| 3 | [] | |
| 4 | NULL | NULL |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | 1_2_3_4_5_4_3_2_1 |
+------+-----------------------------+------------------------------------+
mysql> select k1, k2, array_join(k2, '_', 'null') from array_test01 order by k1;
+------+-----------------------------------+------------------------------------+
+------+-----------------------------------+------------------------------------+
+------+-----------------------------------+------------------------------------+
mysql> select k1, k2, array_join(k2, '_') from array_test order by k1;
+------+-----------------------------+----------------------------+
| k1 | k2 | array_join(`k2`, '_') |
+------+-----------------------------+----------------------------+
| 1 | [1, 2, 3, 4, 5] | 1_2_3_4_5 |
| 2 | [6, 7, 8] | 6_7_8 |
| 3 | [] | |
| 4 | NULL | NULL |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | 1_2_3_4_5_4_3_2_1 |
+------+-----------------------------+----------------------------+
mysql> select k1, k2, array_join(k2, '_') from array_test01 order by k1;
+------+-----------------------------------+----------------------------+
| k1 | k2 | array_join(`k2`, '_') |
+------+-----------------------------------+----------------------------+
+------+-----------------------------------+----------------------------+
keywords
ARRAY, JOIN, ARRAY_JOIN
array_with_constant
array_with_constant
Since Version 1.2.0
array_with_constant
description
Syntax
ARRAY<T> array_with_constant(n, T)
Returns an array containing the specified value repeated n times.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+---------------------------------+
| array_with_constant(2, 'hello') |
+---------------------------------+
| ['hello', 'hello'] |
+---------------------------------+
+-------------------------------+
| array_with_constant(3, 12345) |
+-------------------------------+
+-------------------------------+
+------------------------------+
| array_with_constant(3, NULL) |
+------------------------------+
+------------------------------+
keywords
ARRAY,WITH_CONSTANT,ARRAY_WITH_CONSTANT
array_enumerate
array_enumerate
Since Version 1.2.0
array_enumerate
description
Returns the indexes of the array elements, e.g. [1, 2, 3, ..., length(arr)].
example
mysql> create table array_type_table(k1 INT, k2 Array<STRING>) duplicate key (k1)
mysql> insert into array_type_table values (0, []), ("1", [NULL]), ("2", ["1", "2", "3"]), ("3", ["1", NULL,
"3"]), ("4", NULL);
+------------------+-----------------------+
| k2 | array_enumerate(`k2`) |
+------------------+-----------------------+
| [] | [] |
| [NULL] | [1] |
| NULL | NULL |
+------------------+-----------------------+
keywords
ARRAY,ENUMERATE,ARRAY_ENUMERATE
array_popback
array_popback
Since Version 1.2.0
array_popback
description
Syntax
ARRAY<T> array_popback(ARRAY<T> arr)
Remove the last element from the array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+-----------------------------------------------------+
+-----------------------------------------------------+
| [test, NULL] |
+-----------------------------------------------------+
keywords
ARRAY,POPBACK,ARRAY_POPBACK
array_compact
array_compact
Since Version 1.2.0
array_compact
description
Removes consecutive duplicate elements from an array. The order of result values is determined by the order in the source
array.
Syntax
array_compact(arr)
Arguments
arr — The array to inspect.
Returned value
The array without consecutive duplicates.
Type: Array.
notice
Only supported in vectorized engine
example
select array_compact([1, 2, 3, 3, null, null, 4, 4]);
+----------------------------------------------------+
+----------------------------------------------------+
| [1, 2, 3, NULL, 4] |
+----------------------------------------------------+
select array_compact(['2015-03-13','2015-03-13']);
+--------------------------------------------------+
| array_compact(ARRAY('2015-03-13', '2015-03-13')) |
+--------------------------------------------------+
| ['2015-03-13'] |
+--------------------------------------------------+
keywords
ARRAY,COMPACT,ARRAY_COMPACT
arrays_overlap
arrays_overlap
Since Version 1.2.0
arrays_overlap
description
Syntax
BOOLEAN arrays_overlap(ARRAY<T> left, ARRAY<T> right)
Check if there is any common element between the left and right arrays. Return the following values:
1 - if there is any common element between the left and right arrays;
0 - if there is no common element between the left and right arrays;
NULL - when the left or right array is NULL, or any element inside the left and right arrays is NULL;
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+--------------+-----------+-------------------------------------+
+--------------+-----------+-------------------------------------+
| [1, 2, 3] | [3, 4, 5] | 1 |
| [1, 2, 3] | [5, 6] | 0 |
| [1, 2, 3] | [1, 2] | 1 |
+--------------+-----------+-------------------------------------+
keywords
ARRAY,ARRAYS,OVERLAP,ARRAYS_OVERLAP
countequal
countequal
Since Version 1.2.0
countequal
description
Syntax
BIGINT countequal(ARRAY<T> arr, T value)
Returns the number of elements in the array that are equal to value, or NULL if the array is NULL.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+-----------------+--------------------------+
| id | c_array | countequal(`c_array`, 5) |
+------+-----------------+--------------------------+
| 1 | [1, 2, 3, 4, 5] | 1 |
| 2 | [6, 7, 8] | 0 |
| 3 | [] | 0 |
| 4 | NULL | NULL |
+------+-----------------+--------------------------+
keywords
ARRAY,COUNTEQUAL,
element_at
element_at
Since Version 1.2.0
element_at
description
Syntax
T element_at(ARRAY<T> arr, BIGINT position)
T arr[position]
Returns an element of an array located at the input position. If there is no element at the position, return NULL.
notice
Only supported in vectorized engine
example
positive position example:
+------+-----------------+--------------------------+
| id | c_array | element_at(`c_array`, 5) |
+------+-----------------+--------------------------+
| 1 | [1, 2, 3, 4, 5] | 5 |
| 2 | [6, 7, 8] | NULL |
| 3 | [] | NULL |
| 4 | NULL | NULL |
+------+-----------------+--------------------------+
+------+-----------------+----------------------------------+
+------+-----------------+----------------------------------+
| 1 | [1, 2, 3, 4, 5] | 4 |
| 2 | [6, 7, 8] | 7 |
| 3 | [] | NULL |
| 4 | NULL | NULL |
+------+-----------------+----------------------------------+
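The second table above (values 4 and 7) is consistent with looking up a negative position, for example with the subscript form (table name is illustrative):
mysql> select id, c_array, c_array[-2] from array_test;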
keywords
ELEMENT_AT, SUBSCRIPT
convert_tz
convert_tz
Description
Syntax
DATETIME CONVERT_TZ(DATETIME dt, VARCHAR from_tz, VARCHAR to_tz)
Convert datetime value. Go from the given input time zone to the specified time zone and return the result value. If the
argument is invalid, the function returns null.
Example
mysql> select convert_tz('2019-08-01 13:21:03', 'Asia/Shanghai', 'America/Los_Angeles');
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| 2019-07-31 22:21:03 |
+---------------------------------------------------------------------------+
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| 2019-07-31 22:21:03 |
+--------------------------------------------------------------------+
keywords
CONVERT_TZ
curdate,current_date
curdate,current_date
Description
Syntax
DATE CURDATE()
example
mysql> SELECT CURDATE();
+------------+
| CURDATE() |
+------------+
| 2019-12-20 |
+------------+
+---------------+
| CURDATE() + 0 |
+---------------+
| 20191220 |
+---------------+
keywords
CURDATE,CURRENT_DATE
curtime,current_time
curtime,current_time
Description
Syntax
TIME CURTIME()
Examples
mysql> select current_time();
+----------------+
| current_time() |
+----------------+
| 15:25:47 |
+----------------+
keywords
CURTIME,CURRENT_TIME
current_timestamp
current_timestamp
Description
Syntax
DATETIME CURRENT_TIMESTAMP()
example
mysql> select current_timestamp();
+---------------------+
| current_timestamp() |
+---------------------+
| 2019-05-27 15:59:33 |
+---------------------+
example
mysql> select current_timestamp(3);
+-------------------------+
| current_timestamp(3) |
+-------------------------+
| 2022-09-06 16:18:00.922 |
+-------------------------+
Note:
keywords
CURRENT_TIMESTAMP,CURRENT,TIMESTAMP
localtime,localtimestamp
localtime,localtimestamp
description
Syntax
DATETIME localtime()
DATETIME localtimestamp()
Example
mysql> select localtime();
+---------------------+
| localtime() |
+---------------------+
| 2022-09-22 17:30:23 |
+---------------------+
+---------------------+
| localtimestamp() |
+---------------------+
| 2022-09-22 17:30:29 |
+---------------------+
keywords
localtime,localtimestamp
now
now
Description
Syntax
DATETIME NOW ()
example
mysql> select now();
+---------------------+
| now() |
+---------------------+
| 2019-05-27 15:58:25 |
+---------------------+
example
mysql> select now(3);
+-------------------------+
| now(3) |
+-------------------------+
| 2022-09-06 16:13:30.078 |
+-------------------------+
Note:
keywords
NOW
year
year
Description
Syntax
INT YEAR(DATETIME date)
Returns the year part of the date type, ranging from 1000 to 9999
example
mysql> select year('1987-01-01');
+-----------------------------+
| year('1987-01-01 00:00:00') |
+-----------------------------+
| 1987 |
+-----------------------------+
keywords
YEAR
quarter
quarter
description
Syntax
INT quarter(DATETIME date)
Example
mysql> select quarter('2022-09-22 17:00:00');
+--------------------------------+
| quarter('2022-09-22 17:00:00') |
+--------------------------------+
| 3 |
+--------------------------------+
keywords
quarter
month
month
Description
Syntax
INT MONTH (DATETIME date)
example
mysql> select month('1987-01-01');
+-----------------------------+
| month('1987-01-01 00:00:00') |
+-----------------------------+
| 1 |
+-----------------------------+
keywords
MONTH
day
day
Description
Syntax
INT DAY(DATETIME date)
Get the day information in the date, and return values range from 1 to 31.
example
mysql> select day('1987-01-31');
+----------------------------+
| day('1987-01-31 00:00:00') |
+----------------------------+
| 31 |
+----------------------------+
keywords
DAY
dayofyear
dayofyear
Description
Syntax
INT DAYOFYEAR (DATETIME date)
example
mysql> select dayofyear('2007-02-03 00:00:00');
+----------------------------------+
| dayofyear('2007-02-03 00:00:00') |
+----------------------------------+
| 34 |
+----------------------------------+
keywords
DAYOFYEAR
dayofmonth
dayofmonth
Description
Syntax
INT DAYOFMONTH (DATETIME date)
Get the day information in the date, and return values range from 1 to 31.
example
mysql> select dayofmonth('1987-01-31');
+-----------------------------------+
| dayofmonth('1987-01-31 00:00:00') |
+-----------------------------------+
| 31 |
+-----------------------------------+
keywords
DAYOFMONTH
dayofweek
dayofweek
Description
Syntax
INT DAYOFWEEK (DATETIME date)
The DAYOFWEEK function returns the index value of the working day of the date, that is, 1 on Sunday, 2 on Monday, and 7 on
Saturday.
example
mysql> select dayofweek('2019-06-25');
+----------------------------------+
| dayofweek('2019-06-25 00:00:00') |
+----------------------------------+
| 3 |
+----------------------------------+
+-----------------------------------+
| dayofweek(CAST(20190625 AS DATE)) |
+-----------------------------------+
| 3 |
+-----------------------------------+
keywords
DAYOFWEEK
week
week
Description
Syntax
INT WEEK(DATE date)
INT WEEK(DATE date, INT mode)
Returns the week number for date. The value of the mode argument defaults to 0. The following table describes how the mode argument works.
example
mysql> select week('2020-1-1');
+------------------+
| week('2020-1-1') |
+------------------+
| 0 |
+------------------+
+---------------------+
| week('2020-7-1', 1) |
+---------------------+
| 27 |
+---------------------+
keywords
WEEK
weekday
weekday
Description
Syntax
INT WEEKDAY (DATETIME date)
The WEEKDAY function returns the index value of the working day of the date, that is, 0 on Monday, 1 on Tuesday, and 6 on
Sunday.
+-----------+-----+-----+-----+-----+-----+-----+-----+
|           | Sun | Mon | Tue | Wed | Thu | Fri | Sat |
+-----------+-----+-----+-----+-----+-----+-----+-----+
| weekday   |  6  |  0  |  1  |  2  |  3  |  4  |  5  |
+-----------+-----+-----+-----+-----+-----+-----+-----+
| dayofweek |  1  |  2  |  3  |  4  |  5  |  6  |  7  |
+-----------+-----+-----+-----+-----+-----+-----+-----+
example
mysql> select weekday('2019-06-25');
+--------------------------------+
| weekday('2019-06-25 00:00:00') |
+--------------------------------+
| 1 |
+--------------------------------+
+---------------------------------+
| weekday(CAST(20190625 AS DATE)) |
+---------------------------------+
| 1 |
+---------------------------------+
keywords
WEEKDAY
weekofyear
weekofyear
Description
Syntax
INT WEEKOFYEAR (DATETIME DATE)
example
mysql> select weekofyear('2008-02-20 00:00:00');
+-----------------------------------+
| weekofyear('2008-02-20 00:00:00') |
+-----------------------------------+
| 8 |
+-----------------------------------+
keywords
WEEKOFYEAR
yearweek
yearweek
Description
Syntax
INT YEARWEEK(DATE date)
INT YEARWEEK(DATE date, INT mode)
Returns year and week for a date. The value of the mode argument defaults to 0.
When the week of the date belongs to the
previous year, the year and week of the previous year are returned;
when the week of the date belongs to the next year, the
year of the next year is returned and the week is 1.
example
mysql> select yearweek('2021-1-1');
+----------------------+
| yearweek('2021-1-1') |
+----------------------+
| 202052 |
+----------------------+
+----------------------+
| yearweek('2020-7-1') |
+----------------------+
| 202026 |
+----------------------+
+------------------------------------+
| yearweek('2024-12-30 00:00:00', 1) |
+------------------------------------+
| 202501 |
+------------------------------------+
keywords
YEARWEEK
dayname
dayname
Description
Syntax
VARCHAR DAYNAME (DATE)
example
mysql> select dayname('2007-02-03 00:00:00');
+--------------------------------+
| dayname('2007-02-03 00:00:00') |
+--------------------------------+
| Saturday |
+--------------------------------+
keywords
DAYNAME
monthname
monthname
Description
Syntax
VARCHAR MONTHNAME (DATE)
example
mysql> select monthname('2008-02-03 00:00:00');
+----------------------------------+
| monthname('2008-02-03 00:00:00') |
+----------------------------------+
| February |
+----------------------------------+
keywords
MONTHNAME
hour
hour
description
Syntax
INT HOUR(DATETIME date)
example
mysql> select hour('2018-12-31 23:59:59');
+-----------------------------+
| hour('2018-12-31 23:59:59') |
+-----------------------------+
| 23 |
+-----------------------------+
keywords
HOUR
minute
minute
description
Syntax
INT MINUTE(DATETIME date)
example
mysql> select minute('2018-12-31 23:59:59');
+-------------------------------+
| minute('2018-12-31 23:59:59') |
+-------------------------------+
| 59                            |
+-------------------------------+
keywords
MINUTE
second
second
description
Syntax
INT SECOND(DATETIME date)
example
mysql> select second('2018-12-31 23:59:59');
+-------------------------------+
| second('2018-12-31 23:59:59') |
+-------------------------------+
| 59                            |
+-------------------------------+
keywords
SECOND
from_days
from_days
Description
Syntax
DATE FROM_DAYS(INT N)
example
mysql> select from_days(730669);
+-------------------+
| from_days(730669) |
+-------------------+
| 2000-07-03 |
+-------------------+
keywords
FROM_DAYS,FROM,DAYS
last_day
last_day
Description
Syntax
DATE last_day(DATETIME date)
Return the last day of the month, the return day may be :
'28'(February and not a leap year),
'29'(February and a leap year),
'30'(April, June, September, November),
'31'(January, March, May, July, August, October, December)
example
mysql> select last_day('2000-02-03');
+---------------------------------+
| last_day('2000-02-03 00:00:00') |
+---------------------------------+
| 2000-02-29                      |
+---------------------------------+
keywords
LAST_DAY,DAYS
to_monday
to_monday
Description
Syntax
DATE to_monday(DATETIME date)
Round a date or datetime down to the nearest Monday, return type is Date or DateV2.
Specially, the inputs 1970-01-01, 1970-01-02, 1970-01-03 and 1970-01-04 will return '1970-01-01'.
example
MySQL [(none)]> select to_monday('2022-09-10');
+----------------------------------+
| to_monday('2022-09-10 00:00:00') |
+----------------------------------+
| 2022-09-05 |
+----------------------------------+
keywords
MONDAY
from_unixtime
from_unixtime
description
syntax
DATETIME FROM_UNIXTIME(INT unix_timestamp[, VARCHAR string_format])
Convert the UNIX timestamp to a datetime; the format of the returned value is specified by string_format.
example
mysql> select from_unixtime(1196440219);
+---------------------------+
| from_unixtime(1196440219) |
+---------------------------+
| 2007-12-01 00:30:19 |
+---------------------------+
+-----------------------------------------+
| from_unixtime(1196440219, '%Y-%m-%d') |
+-----------------------------------------+
| 2007-12-01 |
+-----------------------------------------+
+--------------------------------------------------+
+--------------------------------------------------+
| 2007-12-01 00:30:19 |
+--------------------------------------------------+
keywords
FROM_UNIXTIME,FROM,UNIXTIME
unix_timestamp
unix_timestamp
Description
Syntax
INT UNIX_TIMESTAMP(), UNIX_TIMESTAMP(DATETIME date)
Any date before 1970-01-01 00:00:00 or after 2038-01-19 03:14:07 will return 0.
example
mysql> select unix_timestamp();
+------------------+
| unix_timestamp() |
+------------------+
| 1558589570 |
+------------------+
+---------------------------------------+
| unix_timestamp('2007-11-30 10:30:19') |
+---------------------------------------+
| 1196389819 |
+---------------------------------------+
+---------------------------------------+
| unix_timestamp('2007-11-30 10:30-19') |
+---------------------------------------+
| 1196389819 |
+---------------------------------------+
+---------------------------------------+
| unix_timestamp('2007-11-30 10:30%3A19') |
+---------------------------------------+
| 1196389819 |
+---------------------------------------+
+---------------------------------------+
| unix_timestamp('1969-01-01 00:00:00') |
+---------------------------------------+
| 0 |
+---------------------------------------+
keywords
UNIX_TIMESTAMP,UNIX,TIMESTAMP
utc_timestamp
utc_timestamp
Description
Syntax
DATETIME UTC_TIMESTAMP()
Returns the current UTC date and time in "YYYY-MM-DD HH:MM:SS" or "YYYYMMDDHHMMSS" format, depending on whether the function is used in a string or numeric context.
example
mysql> select utc_timestamp(),utc_timestamp() + 1;
+---------------------+---------------------+
| utc_timestamp() | utc_timestamp() + 1 |
+---------------------+---------------------+
+---------------------+---------------------+
keywords
UTC_TIMESTAMP,UTC,TIMESTAMP
to_date
to_date
description
Syntax
DATE TO_DATE(DATETIME)
example
mysql> select to_date("2020-02-02 00:00:00");
+--------------------------------+
| to_date('2020-02-02 00:00:00') |
+--------------------------------+
| 2020-02-02 |
+--------------------------------+
keywords
TO_DATE
to_days
to_days
Description
Syntax
INT TO_DAYS(DATETIME date)
Returns the number of days from year 0 to the given date.
example
mysql> select to_days('2007-10-07');
+-----------------------+
| to_days('2007-10-07') |
+-----------------------+
| 733321 |
+-----------------------+
keywords
TO_DAYS,TO,DAYS
extract
extract
description
Syntax
INT extract(unit FROM DATETIME)
Extracts the value of the specified unit from a DATETIME. The unit can be year, day, hour, minute, or second.
Example
mysql> select extract(year from '2022-09-22 17:01:30') as year,
+------+-------+------+------+--------+--------+
+------+-------+------+------+--------+--------+
| 2022 | 9 | 22 | 17 | 1 | 30 |
+------+-------+------+------+--------+--------+
keywords
extract
makedate
makedate
Description
Syntax
DATE MAKEDATE(INT year, INT dayofyear)
Returns a date, given year and day-of-year values. dayofyear must be greater than 0 or the result is NULL.
example
mysql> select makedate(2021,1), makedate(2021,100), makedate(2021,400);
+-------------------+---------------------+---------------------+
+-------------------+---------------------+---------------------+
+-------------------+---------------------+---------------------+
keywords
MAKEDATE
str_to_date
Str_to_date
Description
Syntax
DATETIME STR_TO_DATE(VARCHAR str, VARCHAR format)
Convert str to a DATETIME value according to the specified format; if the conversion fails, NULL is returned.
example
mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s');
+---------------------------------------------------------+
+---------------------------------------------------------+
| 2014-12-21 12:34:56 |
+---------------------------------------------------------+
+--------------------------------------------------------------+
+--------------------------------------------------------------+
| 2014-12-21 12:34:56 |
+--------------------------------------------------------------+
+-----------------------------------------+
+-----------------------------------------+
| 2004-10-18 |
+-----------------------------------------+
+------------------------------------------------+
+------------------------------------------------+
| 2020-09-01 00:00:00 |
+------------------------------------------------+
keywords
STR_TO_DATE,STR,TO,DATE
time_round
time_round
description
Syntax
DATETIME TIME_ROUND(DATETIME expr)
The function name TIME_ROUND consists of two parts; each part takes one of the following values:
TIME: the unit to round to, one of YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND.
ROUND: the rounding direction, FLOOR or CEIL.
origin specifies the start time of the period; the default is 1970-01-01T00:00:00, and the start time of WEEK is Sunday, which is 1970-01-04T00:00:00. It can be later than expr.
Please try to choose a common period, such as 3 MONTH or 90 MINUTE. If you set an uncommon period, please also specify origin.
example
+------------------------------+
| year_floor('20200202000000') |
+------------------------------+
| 2020-01-01 00:00:00 |
+------------------------------+
+--------------------------------------------------------+
+--------------------------------------------------------+
| 2020-04-01 00:00:00 |
+--------------------------------------------------------+
+---------------------------------------------------------+
+---------------------------------------------------------+
| 2020-02-03 00:00:00 |
+---------------------------------------------------------+
+-------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------+
| 2020-04-09 00:00:00 |
+-------------------------------------------------------------------------------------------------+
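The result tables above are consistent with the CEIL variants taking an optional period and origin, for example (arguments are illustrative):
mysql> select month_ceil(cast('2020-02-02 13:09:20' as datetime), 3);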
keywords
TIME_ROUND
timediff
timediff
Description
Syntax
TIME TIMEDIFF (DATETIME expr1, DATETIME expr2)
The TIMEDIFF function returns the result of expr1 - expr2 expressed as a time value, with a return value of TIME type.
The results are limited to TIME values ranging from -838:59:59 to 838:59:59.
example
+----------------------------------+
| timediff(now(), utc_timestamp()) |
+----------------------------------+
| 08:00:00 |
+----------------------------------+
+--------------------------------------------------------+
+--------------------------------------------------------+
| 00:00:09 |
+--------------------------------------------------------+
+---------------------------------------+
+---------------------------------------+
| NULL |
+---------------------------------------+
keywords
TIMEDIFF
timestampadd
timestampadd
description
Syntax
DATETIME TIMESTAMPADD(unit, interval, DATETIME datetime_expr)
Adds the integer expression interval to the date or datetime expression datetime_expr.
The unit for interval is given by the unit argument, which should be one of the following values:
example
+------------------------------------------------+
+------------------------------------------------+
| 2019-01-02 00:01:00 |
+------------------------------------------------+
+----------------------------------------------+
+----------------------------------------------+
| 2019-01-09 00:00:00 |
+----------------------------------------------+
keywords
TIMESTAMPADD
timestampdiff
timestampdiff
description
Syntax
INT TIMESTAMPDIFF(unit,DATETIME datetime_expr1, DATETIME datetime_expr2)
Returns datetime_expr2 − datetime_expr1, where datetime_expr1 and datetime_expr2 are date or datetime expressions.
The unit for the result (an integer) is given by the unit argument.
The legal values for unit are the same as those listed in the description of the TIMESTAMPADD() function.
example
+--------------------------------------------------------------------+
+--------------------------------------------------------------------+
| 3 |
+--------------------------------------------------------------------+
+-------------------------------------------------------------------+
+-------------------------------------------------------------------+
| -1 |
+-------------------------------------------------------------------+
+---------------------------------------------------------------------+
+---------------------------------------------------------------------+
| 128885 |
+---------------------------------------------------------------------+
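The results above correspond to queries of the following form (a hedged sketch; the argument values are assumptions chosen to be consistent with the outputs 3, -1 and 128885):
mysql> SELECT TIMESTAMPDIFF(MONTH, '2003-02-01', '2003-05-01');
mysql> SELECT TIMESTAMPDIFF(YEAR, '2002-05-01', '2001-01-01');
mysql> SELECT TIMESTAMPDIFF(MINUTE, '2003-02-01', '2003-05-01 12:05:55');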
keywords
TIMESTAMPDIFF
date_add
date_add
Description
Syntax
DATETIME DATE_ADD(DATETIME date, INTERVAL expr type)
Adds the specified time interval to the date.
example
mysql> select date_add('2010-11-30 23:59:59', INTERVAL 2 DAY);
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-02 23:59:59 |
+-------------------------------------------------+
keywords
DATE_ADD,DATE,ADD
date_sub
date_sub
Description
Syntax
DATETIME DATE_SUB(DATETIME date, INTERVAL expr type)
Subtracts the specified time interval from the date.
example
mysql> select date_sub('2010-11-30 23:59:59', INTERVAL 2 DAY);
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-11-28 23:59:59 |
+-------------------------------------------------+
keywords
DATE_SUB,DATE,SUB
date_trunc
date_trunc
Since Version 1.2.0
date_trunc
Description
Syntax
DATETIME DATE_TRUNC(DATETIME datetime,VARCHAR unit)
unit is the time unit you want to truncate. The optional values are as follows:
[second, minute, hour, day, week, month, quarter, year].
If unit is not one of the above values, NULL is returned.
example
mysql> select date_trunc('2010-12-02 19:28:30', 'second');
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-02 19:28:30 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-02 19:28:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-02 19:00:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-02 00:00:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-11-28 00:00:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-12-01 00:00:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-10-01 00:00:00 |
+-------------------------------------------------+
+-------------------------------------------------+
+-------------------------------------------------+
| 2010-01-01 00:00:00 |
+-------------------------------------------------+
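The remaining headerless results above correspond to truncation at coarser units; a sketch of the queries (assumed to reuse the first example's input value):
mysql> select date_trunc('2010-12-02 19:28:30', 'minute');
mysql> select date_trunc('2010-12-02 19:28:30', 'hour');
mysql> select date_trunc('2010-12-02 19:28:30', 'day');
mysql> select date_trunc('2010-12-02 19:28:30', 'week');
mysql> select date_trunc('2010-12-02 19:28:30', 'month');
mysql> select date_trunc('2010-12-02 19:28:30', 'quarter');
mysql> select date_trunc('2010-12-02 19:28:30', 'year');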
keywords
DATE_TRUNC,DATE,DATETIME
date_format
date_format
Description
Syntax
VARCHAR DATE_FORMAT(DATETIME date, VARCHAR format)
Converts the date value into a string according to the format.
The output string currently supports a maximum length of 128 bytes; NULL is returned if the length of the return value exceeds 128.
The date parameter is a valid date. format specifies the date/time output format.
%f | Microseconds
%H | Hour (00-23)
%h | Hour (01-12)
%I | Hour (01-12)
%k | Hour (0-23)
%l | Hour (1-12)
%M | Month name
%p | AM or PM
%s | Seconds (00-59)
%V | Week (01-53), where Sunday is the first day of the week; used with %X
%v | Week (01-53), where Monday is the first day of the week; used with %x
%W | Weekday name (Sunday..Saturday)
%X | Year for the week where Sunday is the first day of the week, four digits; used with %V
%x | Year for the week where Monday is the first day of the week, four digits; used with %v
%Y | Year, four digits
%y | Year, two digits
%% | A literal %
The following formats are also supported:
yyyyMMdd
yyyy-MM-dd
yyyy-MM-dd HH:mm:ss
example
mysql> select date_format('2009-10-04 22:23:00', '%W %M %Y');
+------------------------------------------------+
+------------------------------------------------+
+------------------------------------------------+
+------------------------------------------------+
+------------------------------------------------+
| 22:23:00 |
+------------------------------------------------+
+------------------------------------------------------------+
+------------------------------------------------------------+
+------------------------------------------------------------+
+------------------------------------------------------------+
+------------------------------------------------------------+
| 22 22 10 10:23:00 PM 22:23:00 00 6 |
+------------------------------------------------------------+
+---------------------------------------------+
+---------------------------------------------+
| 1998 52 |
+---------------------------------------------+
+------------------------------------------+
+------------------------------------------+
| 01 |
+------------------------------------------+
+--------------------------------------------+
+--------------------------------------------+
| %01 |
+--------------------------------------------+
keywords
DATE_FORMAT,DATE,FORMAT
datediff
datediff
Description
Syntax
INT DATEDIFF(DATETIME expr1, DATETIME expr2)
Computes expr1 - expr2 and returns the result expressed in days.
Note: Only the date part of the values participates in the calculation.
example
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
| 1 |
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------+
| -31 |
+-----------------------------------------------------------------------------------+
keywords
DATEDIFF
microseconds_add
microseconds_add
description
Syntax
DATETIMEV2 microseconds_add(DATETIMEV2 basetime, INT delta)
example
mysql> select now(3), microseconds_add(now(3), 100000);
+-------------------------+----------------------------------+
+-------------------------+----------------------------------+
+-------------------------+----------------------------------+
now(3) returns the current time as type DATETIMEV2 with precision 3; microseconds_add(now(3), 100000) returns the time 100000 microseconds after the current time.
keywords
microseconds_add
minutes_add
minutes_add
description
Syntax
DATETIME MINUTES_ADD(DATETIME date, INT minutes)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select minutes_add("2020-02-02", 1);
+---------------------------------------+
| minutes_add('2020-02-02 00:00:00', 1) |
+---------------------------------------+
| 2020-02-02 00:01:00 |
+---------------------------------------+
keywords
MINUTES_ADD
minutes_diff
minutes_diff
description
Syntax
INT minutes_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in minutes.
example
mysql> select minutes_diff('2020-12-25 22:00:00','2020-12-25 21:00:00');
+------------------------------------------------------------+
+------------------------------------------------------------+
| 60 |
+------------------------------------------------------------+
keywords
minutes_diff
minutes_sub
minutes_sub
description
Syntax
DATETIME MINUTES_SUB(DATETIME date, INT minutes)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select minutes_sub("2020-02-02 02:02:02", 1);
+---------------------------------------+
| minutes_sub('2020-02-02 02:02:02', 1) |
+---------------------------------------+
| 2020-02-02 02:01:02 |
+---------------------------------------+
keywords
MINUTES_SUB
seconds_add
seconds_add
description
Syntax
DATETIME SECONDS_ADD(DATETIME date, INT seconds)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select seconds_add("2020-02-02 02:02:02", 1);
+---------------------------------------+
| seconds_add('2020-02-02 02:02:02', 1) |
+---------------------------------------+
| 2020-02-02 02:02:03 |
+---------------------------------------+
keywords
SECONDS_ADD
seconds_diff
seconds_diff
description
Syntax
INT seconds_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in seconds.
example
mysql> select seconds_diff('2020-12-25 22:00:00','2020-12-25 21:00:00');
+------------------------------------------------------------+
+------------------------------------------------------------+
| 3600 |
+------------------------------------------------------------+
keywords
seconds_diff
seconds_sub
seconds_sub
description
Syntax
DATETIME SECONDS_SUB(DATETIME date, INT seconds)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select seconds_sub("2020-01-01 00:00:00", 1);
+---------------------------------------+
| seconds_sub('2020-01-01 00:00:00', 1) |
+---------------------------------------+
| 2019-12-31 23:59:59 |
+---------------------------------------+
keywords
SECONDS_SUB
hours_add
hours_add
description
Syntax
DATETIME HOURS_ADD(DATETIME date, INT hours)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select hours_add("2020-02-02 02:02:02", 1);
+-------------------------------------+
| hours_add('2020-02-02 02:02:02', 1) |
+-------------------------------------+
| 2020-02-02 03:02:02 |
+-------------------------------------+
keywords
HOURS_ADD
hours_diff
hours_diff
description
Syntax
INT hours_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in hours.
example
mysql> select hours_diff('2020-12-25 22:00:00','2020-12-25 21:00:00');
+----------------------------------------------------------+
+----------------------------------------------------------+
| 1 |
+----------------------------------------------------------+
keywords
hours_diff
hours_sub
hours_sub
description
Syntax
DATETIME HOURS_SUB(DATETIME date, INT hours)
The parameter date can be DATETIME or DATE, and the return type is DATETIME.
example
mysql> select hours_sub("2020-02-02 02:02:02", 1);
+-------------------------------------+
| hours_sub('2020-02-02 02:02:02', 1) |
+-------------------------------------+
| 2020-02-02 01:02:02 |
+-------------------------------------+
keywords
HOURS_SUB
days_add
days_add
description
Syntax
DATETIME DAYS_ADD(DATETIME date, INT days)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select days_add(to_date("2020-02-02 02:02:02"), 1);
+---------------------------------------------+
| days_add(to_date('2020-02-02 02:02:02'), 1) |
+---------------------------------------------+
| 2020-02-03 |
+---------------------------------------------+
keywords
DAYS_ADD
days_diff
days_diff
description
Syntax
INT days_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in days.
example
mysql> select days_diff('2020-12-25 22:00:00','2020-12-24 22:00:00');
+---------------------------------------------------------+
+---------------------------------------------------------+
| 1 |
+---------------------------------------------------------+
keywords
days_diff
days_sub
days_sub
description
Syntax
DATETIME DAYS_SUB(DATETIME date, INT days)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select days_sub("2020-02-02 02:02:02", 1);
+------------------------------------+
| days_sub('2020-02-02 02:02:02', 1) |
+------------------------------------+
| 2020-02-01 02:02:02 |
+------------------------------------+
keywords
DAYS_SUB
weeks_add
weeks_add
description
Syntax
DATETIME WEEKS_ADD(DATETIME date, INT weeks)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select weeks_add("2020-02-02 02:02:02", 1);
+-------------------------------------+
| weeks_add('2020-02-02 02:02:02', 1) |
+-------------------------------------+
| 2020-02-09 02:02:02 |
+-------------------------------------+
keywords
WEEKS_ADD
weeks_diff
weeks_diff
description
Syntax
INT weeks_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in weeks.
example
mysql> select weeks_diff('2020-12-25','2020-10-25');
+----------------------------------------------------------+
+----------------------------------------------------------+
| 8 |
+----------------------------------------------------------+
keywords
weeks_diff
weeks_sub
weeks_sub
description
Syntax
DATETIME WEEKS_SUB(DATETIME date, INT weeks)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select weeks_sub("2020-02-02 02:02:02", 1);
+-------------------------------------+
| weeks_sub('2020-02-02 02:02:02', 1) |
+-------------------------------------+
| 2020-01-26 02:02:02 |
+-------------------------------------+
keywords
WEEKS_SUB
months_add
months_add
description
Syntax
DATETIME MONTHS_ADD(DATETIME date, INT months)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select months_add("2020-01-31 02:02:02", 1);
+--------------------------------------+
| months_add('2020-01-31 02:02:02', 1) |
+--------------------------------------+
| 2020-02-29 02:02:02 |
+--------------------------------------+
keywords
MONTHS_ADD
months_diff
months_diff
description
Syntax
INT months_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in months.
example
mysql> select months_diff('2020-12-25','2020-10-25');
+-----------------------------------------------------------+
+-----------------------------------------------------------+
| 2 |
+-----------------------------------------------------------+
keywords
months_diff
months_sub
months_sub
description
Syntax
DATETIME MONTHS_SUB(DATETIME date, INT months)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select months_sub("2020-02-02 02:02:02", 1);
+--------------------------------------+
| months_sub('2020-02-02 02:02:02', 1) |
+--------------------------------------+
| 2020-01-02 02:02:02 |
+--------------------------------------+
keywords
MONTHS_SUB
years_add
years_add
description
Syntax
DATETIME YEARS_ADD(DATETIME date, INT years)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select years_add("2020-01-31 02:02:02", 1);
+-------------------------------------+
| years_add('2020-01-31 02:02:02', 1) |
+-------------------------------------+
| 2021-01-31 02:02:02 |
+-------------------------------------+
keywords
YEARS_ADD
years_diff
years_diff
description
Syntax
INT years_diff(DATETIME enddate, DATETIME startdate)
Returns the difference between enddate and startdate in years.
example
mysql> select years_diff('2020-12-25','2019-10-25');
+----------------------------------------------------------+
+----------------------------------------------------------+
| 1 |
+----------------------------------------------------------+
keywords
years_diff
years_sub
years_sub
description
Syntax
DATETIME YEARS_SUB(DATETIME date, INT years)
The parameter date can be DATETIME or DATE, and the return type is consistent with that of the parameter date.
example
mysql> select years_sub("2020-02-02 02:02:02", 1);
+-------------------------------------+
| years_sub('2020-02-02 02:02:02', 1) |
+-------------------------------------+
| 2019-02-02 02:02:02 |
+-------------------------------------+
keywords
YEARS_SUB
ST_X
ST_X
Description
Syntax
DOUBLE ST_X(POINT point)
When point is a valid POINT type, the corresponding X coordinate value is returned.
example
mysql> SELECT ST_X(ST_Point(24.7, 56.7));
+----------------------------+
| st_x(st_point(24.7, 56.7)) |
+----------------------------+
| 24.7 |
+----------------------------+
keywords
ST_X,ST,X
ST_Y
ST_Y
Description
Syntax
DOUBLE ST_Y(POINT point)
When point is a valid POINT type, the corresponding Y coordinate value is returned.
example
mysql> SELECT ST_Y(ST_Point(24.7, 56.7));
+----------------------------+
| st_y(st_point(24.7, 56.7)) |
+----------------------------+
| 56.7 |
+----------------------------+
keywords
ST_Y,ST,Y
ST_Circle
ST_Circle
Description
Syntax
GEOMETRY ST_Circle(DOUBLE center_lng, DOUBLE center_lat, DOUBLE radius)
Creates a circle on the Earth's sphere. center_lng denotes the longitude of the circle's center, center_lat denotes the latitude of the circle's center, and radius denotes the radius of the circle in meters.
example
mysql> SELECT ST_AsText(ST_Circle(111, 64, 10000));
+--------------------------------------------+
+--------------------------------------------+
+--------------------------------------------+
keywords
ST_CIRCLE,ST,CIRCLE
ST_Distance_Sphere
ST_Distance_Sphere
description
Syntax
DOUBLE ST_Distance_Sphere(DOUBLE x_lng, DOUBLE x_lat, DOUBLE y_lng, DOUBLE y_lat)
Calculate the spherical distance between two points of the earth in meters. The incoming parameters are the longitude of
point X, the latitude of point X, the longitude of point Y and the latitude of point Y.
x_lng and y_lng are Longitude values, must be in the range [-180, 180].
x_lat and y_lat are Latitude values, must be in the range
[-90, 90].
example
mysql> select st_distance_sphere(116.35620117, 39.939093, 116.4274406433, 39.9020987219);
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
| 7336.9135549995917 |
+----------------------------------------------------------------------------+
keywords
ST_DISTANCE_SPHERE,ST,DISTANCE,SPHERE
St_Point
St_Point
Description
Syntax
POINT ST_Point(DOUBLE x, DOUBLE y)
Returns the corresponding Point for the given X and Y coordinate values.
Currently this value is only meaningful on a sphere, where X/Y correspond to longitude/latitude.
example
mysql> SELECT ST_AsText(ST_Point(24.7, 56.7));
+---------------------------------+
| st_astext(st_point(24.7, 56.7)) |
+---------------------------------+
+---------------------------------+
keywords
ST_POINT,ST,POINT
ST_Polygon,ST_PolyFromText,ST_PolygonFromText
ST_Polygon,ST_PolyFromText,ST_PolygonFromText
Description
Syntax
GEOMETRY ST_Polygon (VARCHAR wkt)
Converts a WKT (Well Known Text) into the corresponding in-memory polygon form.
example
+------------------------------------------------------------------+
+------------------------------------------------------------------+
+------------------------------------------------------------------+
keywords
ST_POLYGON,ST_POLYFROMTEXT,ST_POLYGONFROMTEXT,ST,POLYGON,POLYFROMTEXT,POLYGONFROMTEXT
ST_AsText,ST_AsWKT
ST_AsText,ST_AsWKT
Description
Syntax
VARCHAR ST_AsText(GEOMETRY geo)
Converts a geometric figure into its WKT (Well Known Text) representation.
example
mysql> SELECT ST_AsText(ST_Point(24.7, 56.7));
+---------------------------------+
| st_astext(st_point(24.7, 56.7)) |
+---------------------------------+
+---------------------------------+
keywords
ST_ASTEXT,ST_ASWKT,ST,ASTEXT,ASWKT
ST_Contains
ST_Contains
Description
Syntax
BOOL ST_Contains(GEOMETRY shape1, GEOMETRY shape2)
Determines whether the geometry shape1 completely contains the geometry shape2.
example
mysql> SELECT ST_Contains(ST_Polygon("POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))"), ST_Point(5, 5));
+----------------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------------------------------+
keywords
ST_CONTAINS,ST,CONTAINS
ST_GeometryFromText,ST_GeomFromText
ST_GeometryFromText,ST_GeomFromText
Description
Syntax
GEOMETRY ST_GeometryFromText(VARCHAR wkt)
Converts a WKT (Well Known Text) into the corresponding in-memory geometry.
example
mysql> SELECT ST_AsText(ST_GeometryFromText("LINESTRING (1 1, 2 2)"));
+---------------------------------------------------------+
| st_astext(st_geometryfromtext('LINESTRING (1 1, 2 2)')) |
+---------------------------------------------------------+
| LINESTRING (1 1, 2 2) |
+---------------------------------------------------------+
keywords
ST_GEOMETRYFROMTEXT,ST_GEOMFROMTEXT,ST,GEOMETRYFROMTEXT,GEOMFROMTEXT
ST_LineFromText,ST_LineStringFromText
ST_LineFromText,ST_LineStringFromText
Description
Syntax
GEOMETRY ST_LineFromText(VARCHAR wkt)
Converts a WKT (Well Known Text) into a Line-style in-memory representation.
example
mysql> SELECT ST_AsText(ST_LineFromText("LINESTRING (1 1, 2 2)"));
+---------------------------------------------------------+
| st_astext(st_geometryfromtext('LINESTRING (1 1, 2 2)')) |
+---------------------------------------------------------+
| LINESTRING (1 1, 2 2) |
+---------------------------------------------------------+
keywords
ST_LINEFROMTEXT, ST_LINESTRINGFROMTEXT,ST,LINEFROMTEXT,LINESTRINGFROMTEXT
to_base64
to_base64
description
Syntax
VARCHAR to_base64(VARCHAR str)
example
mysql> select to_base64('1');
+----------------+
| to_base64('1') |
+----------------+
| MQ== |
+----------------+
+------------------+
| to_base64('234') |
+------------------+
| MjM0 |
+------------------+
keywords
to_base64
from_base64
from_base64
description
Syntax
VARCHAR from_base64(VARCHAR str)
example
mysql> select from_base64('MQ==');
+---------------------+
| from_base64('MQ==') |
+---------------------+
| 1 |
+---------------------+
+---------------------+
| from_base64('MjM0') |
+---------------------+
| 234 |
+---------------------+
keywords
from_base64
ascii
ascii
Description
Syntax
INT ascii(VARCHAR str)
Returns the ASCII code corresponding to the first character of the string
example
mysql> select ascii('1');
+------------+
| ascii('1') |
+------------+
| 49 |
+------------+
+--------------+
| ascii('234') |
+--------------+
| 50 |
+--------------+
keywords
ASCII
length
length
Description
Syntax
INT length (VARCHAR str)
example
mysql> select length("abc");
+---------------+
| length('abc') |
+---------------+
| 3 |
+---------------+
+------------------+
| length(' 中国') |
+------------------+
| 6 |
+------------------+
keywords
LENGTH
bit_length
bit_length
Description
Syntax
INT bit_length (VARCHAR str)
example
mysql> select bit_length("abc");
+-------------------+
| bit_length('abc') |
+-------------------+
| 24 |
+-------------------+
+----------------------+
| bit_length('中国')   |
+----------------------+
| 48                   |
+----------------------+
keywords
BIT_LENGTH
char_length
char_length
Description
Syntax
INT char_length(VARCHAR str)
Returns the length of the string in characters; for multi-byte characters it returns the number of characters. For example, five two-byte-wide characters return a length of 5. Only UTF-8 encoding is supported in the current version. character_length is an alias for this function.
example
mysql> select char_length("abc");
+--------------------+
| char_length('abc') |
+--------------------+
| 3 |
+--------------------+
+-----------------------+
| char_length('中国')   |
+-----------------------+
| 2                     |
+-----------------------+
keywords
CHAR_LENGTH, CHARACTER_LENGTH
lpad
lpad
Description
Syntax
VARCHAR lpad (VARCHAR str, INT len, VARCHAR pad)
Returns a string of length len based on str, padded at the beginning (on the left). If len is greater than the length of str, pad characters are prepended to str until the string reaches length len. If len is less than the length of str, the function simply truncates str to length len. len is measured in characters, not bytes.
example
mysql> SELECT lpad("hi", 5, "xy");
+---------------------+
| lpad('hi', 5, 'xy') |
+---------------------+
| xyxhi |
+---------------------+
+---------------------+
| lpad('hi', 1, 'xy') |
+---------------------+
| h |
+---------------------+
keywords
LPAD
rpad
rpad
Description
Syntax
VARCHAR rpad (VARCHAR str, INT len, VARCHAR pad)
Returns a string of length len based on str, padded at the end (on the right). If len is greater than the length of str, pad characters are appended to the right of str until the string reaches length len. If len is less than the length of str, the function simply truncates str to length len. len is measured in characters, not bytes.
example
mysql> SELECT rpad("hi", 5, "xy");
+---------------------+
| rpad('hi', 5, 'xy') |
+---------------------+
| hixyx |
+---------------------+
+---------------------+
| rpad('hi', 1, 'xy') |
+---------------------+
| h |
+---------------------+
keywords
RPAD
lower
lower
Description
Syntax
VARCHAR lower (VARCHAR str)
example
mysql> SELECT lower("AbC123");
+-----------------+
| lower('AbC123') |
+-----------------+
| abc123 |
+-----------------+
keywords
LOWER
lcase
lcase
Description
Syntax
VARCHAR lcase(VARCHAR str)
Same as lower: converts all characters in the argument to lowercase.
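example
A minimal sketch of usage (assuming lcase behaves the same as lower):
mysql> SELECT lcase('AbC123');
-- expected: abc123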
keywords
LCASE
upper
upper
description
Syntax
VARCHAR upper(VARCHAR str)
example
mysql> SELECT upper("aBc123");
+-----------------+
| upper('aBc123') |
+-----------------+
| ABC123 |
+-----------------+
keywords
UPPER
ucase
ucase
description
Syntax
VARCHAR ucase(VARCHAR str)
Same as upper: converts all characters in the argument to uppercase.
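example
A minimal sketch of usage (assuming ucase behaves the same as upper):
mysql> SELECT ucase('aBc123');
-- expected: ABC123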
keywords
UCASE
initcap
initcap
description
Syntax
VARCHAR initcap(VARCHAR str)
Convert the first letter of each word to upper case and the rest to lower case.
Words are sequences of alphanumeric
characters separated by non-alphanumeric characters.
example
mysql> select initcap('hello hello.,HELLO123HELlo');
+---------------------------------------+
| initcap('hello hello.,HELLO123HELlo') |
+---------------------------------------+
| Hello Hello.,Hello123hello |
+---------------------------------------+
keywords
INITCAP
repeat
repeat
Description
Syntax
VARCHAR repeat (VARCHAR str, INT count)
Repeats the string str count times. Returns an empty string when count is less than 1, and returns NULL when either str or count is NULL.
example
mysql> SELECT repeat("a", 3);
+----------------+
| repeat('a', 3) |
+----------------+
| aaa |
+----------------+
+-----------------+
| repeat('a', -1) |
+-----------------+
| |
+-----------------+
keywords
REPEAT
reverse
reverse
description
Syntax
VARCHAR reverse(VARCHAR str)
ARRAY<T> reverse(ARRAY<T> arr)
The REVERSE() function reverses a string or array and returns the result.
notice
For the array type, only supported in vectorized engine
example
mysql> SELECT REVERSE('hello');
+------------------+
| REVERSE('hello') |
+------------------+
| olleh |
+------------------+
+------------------+
| REVERSE(' 你好') |
+------------------+
| 好你 |
+------------------+
+------+-----------------------------+-----------------------------+
| k1 | k2 | reverse(`k2`) |
+------+-----------------------------+-----------------------------+
| 1 | [1, 2, 3, 4, 5] | [5, 4, 3, 2, 1] |
| 2 | [6, 7, 8] | [8, 7, 6] |
| 3 | [] | [] |
| 4 | NULL | NULL |
| 5 | [1, 2, 3, 4, 5, 4, 3, 2, 1] | [1, 2, 3, 4, 5, 4, 3, 2, 1] |
+------+-----------------------------+-----------------------------+
+------+-----------------------------------+-----------------------------------+
| k1 | k2 | reverse(`k2`) |
+------+-----------------------------------+-----------------------------------+
| 3 | [NULL, 'a', NULL, 'b', NULL, 'c'] | ['c', NULL, 'b', NULL, 'a', NULL] |
| 4 | ['d', 'e', NULL, ' '] | [' ', NULL, 'e', 'd'] |
| 5 | [' ', NULL, 'f', 'g'] | ['g', 'f', NULL, ' '] |
+------+-----------------------------------+-----------------------------------+
keywords
REVERSE, ARRAY
concat
concat
Description
Syntax
VARCHAR concat (VARCHAR,...)
Concatenates multiple strings; if any of the parameters is NULL, NULL is returned.
example
mysql> select concat("a", "b");
+------------------+
| concat('a', 'b') |
+------------------+
| ab |
+------------------+
+-----------------------+
+-----------------------+
| abc |
+-----------------------+
+------------------------+
+------------------------+
| NULL |
+------------------------+
keywords
CONCAT
concat_ws
concat_ws
Description
Syntax
VARCHAR concat_ws (VARCHAR sep, VARCHAR str,...)
VARCHAR concat_ws(VARCHAR sep, ARRAY array)
Using the first parameter sep as the separator, joins the second and all subsequent parameters (or all strings in an ARRAY) into a single string.
If the separator is NULL, NULL is returned.
The concat_ws function does not skip empty strings, but it does skip NULL values.
example
mysql> select concat_ws("or", "d", "is");
+----------------------------+
+----------------------------+
| doris |
+----------------------------+
+----------------------------+
+----------------------------+
| NULL |
+----------------------------+
+---------------------------------+
+---------------------------------+
| doris |
+---------------------------------+
+-----------------------------------+
+-----------------------------------+
| doris |
+-----------------------------------+
+-----------------------------------+
+-----------------------------------+
| NULL |
+-----------------------------------+
+-----------------------------------------+
+-----------------------------------------+
| doris |
+-----------------------------------------+
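The headerless results above come from calls like the following (a sketch; the exact arguments are assumptions consistent with the shown outputs):
mysql> select concat_ws(NULL, 'd', 'is');           -- NULL
mysql> select concat_ws('or', 'd', NULL, 'is');     -- doris
mysql> select concat_ws('or', ['d', 'is']);         -- doris
mysql> select concat_ws(NULL, ['d', 'is']);         -- NULL
mysql> select concat_ws('or', ['d', NULL, 'is']);   -- doris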
keywords
CONCAT_WS,CONCAT,WS,ARRAY
substr
substr
Description
Syntax
VARCHAR substr(VARCHAR content, INT start, INT length)
Returns the substring of content starting at position start and having a length of length characters.
The index of the first character is 1.
example
mysql> select substr("Hello doris", 2, 1);
+-----------------------------+
| substr('Hello doris', 2, 1) |
+-----------------------------+
| e |
+-----------------------------+
+-----------------------------+
| substr('Hello doris', 1, 2) |
+-----------------------------+
| He |
+-----------------------------+
keywords
SUBSTR
substring
substring
description
Syntax
VARCHAR substring(VARCHAR str, INT pos[, INT len])
The form without a len argument returns a substring of string str starting at position pos.
The form with a len argument returns a substring of str that is len characters long, starting at position pos.
A negative value may be used for pos in either form; in this case the beginning of the substring is pos characters from the end of the string rather than from the beginning.
A value of 0 for pos returns an empty string.
example
mysql> select substring('abc1', 2);
+-----------------------------+
| substring('abc1', 2) |
+-----------------------------+
| bc1 |
+-----------------------------+
+-----------------------------+
| substring('abc1', -2) |
+-----------------------------+
| c1 |
+-----------------------------+
+----------------------+
| substring('abc1', 0) |
+----------------------+
| |
+----------------------+
+-----------------------------+
| substring('abc1', 5) |
+-----------------------------+
| NULL |
+-----------------------------+
+-----------------------------+
| substring('abc1def', 2, 2) |
+-----------------------------+
| bc |
+-----------------------------+
keywords
SUBSTRING, STRING
sub_replace
sub_replace
Description
Syntax
VARCHAR sub_replace(VARCHAR str, VARCHAR new_str, INT start [, INT len])
Returns str with the substring of length len starting at position start replaced by new_str.
When start or len is a negative integer, NULL is returned.
The default value of len is the length of new_str.
example
mysql> select sub_replace("this is origin str","NEW-STR",1);
+-------------------------------------------------+
+-------------------------------------------------+
| tNEW-STRorigin str |
+-------------------------------------------------+
+-----------------------------------+
| sub_replace('doris', '***', 1, 2) |
+-----------------------------------+
| d***is |
+-----------------------------------+
keywords
SUB_REPLACE
append_trailing_char_if_absent
append_trailing_char_if_absent
description
Syntax
VARCHAR append_trailing_char_if_absent(VARCHAR str, VARCHAR trailing_char)
If the str string is non-empty and does not already end with the trailing_char character, it appends trailing_char to the end.
trailing_char must contain exactly one character; NULL is returned if it contains more than one character.
example
MySQL [test]> select append_trailing_char_if_absent('a','c');
+------------------------------------------+
| append_trailing_char_if_absent('a', 'c') |
+------------------------------------------+
| ac |
+------------------------------------------+
+-------------------------------------------+
| append_trailing_char_if_absent('ac', 'c') |
+-------------------------------------------+
| ac |
+-------------------------------------------+
keywords
APPEND_TRAILING_CHAR_IF_ABSENT
ends_with
ends_with
Description
Syntax
BOOLEAN ENDS_WITH (VARCHAR str, VARCHAR suffix)
It returns true if the string ends with the specified suffix, otherwise it returns false.
If any parameter is NULL, it returns NULL.
example
mysql> select ends_with("Hello doris", "doris");
+-----------------------------------+
+-----------------------------------+
| 1 |
+-----------------------------------+
+-----------------------------------+
+-----------------------------------+
| 0 |
+-----------------------------------+
keywords
ENDS_WITH
starts_with
starts_with
Description
Syntax
BOOLEAN STARTS_WITH (VARCHAR str, VARCHAR prefix)
It returns true if the string starts with the specified prefix, otherwise it returns false.
If any parameter is NULL, it returns NULL.
example
MySQL [(none)]> select starts_with("hello world","hello");
+-------------------------------------+
+-------------------------------------+
| 1 |
+-------------------------------------+
+-------------------------------------+
+-------------------------------------+
| 0 |
+-------------------------------------+
keywords
STARTS_WITH
trim
trim
description
Syntax
VARCHAR trim(VARCHAR str)
Removes the spaces that appear continuously at the beginning and end of the parameter str.
example
mysql> SELECT trim(' ab d ') str;
+------+
| str |
+------+
| ab d |
+------+
keywords
TRIM
ltrim
ltrim
Description
Syntax
VARCHAR ltrim (VARCHAR str)
Removes the spaces that appear continuously at the beginning of the parameter str.
example
mysql> SELECT ltrim(' ab d');
+------------------+
| ltrim(' ab d') |
+------------------+
| ab d |
+------------------+
keywords
LTRIM
rtrim
rtrim
description
Syntax
VARCHAR rtrim(VARCHAR str)
Removes the spaces that appear continuously at the end of the parameter str.
example
mysql> SELECT rtrim('ab d ') str;
+------+
| str |
+------+
| ab d |
+------+
keywords
RTRIM
null_or_empty
null_or_empty
description
Syntax
BOOLEAN NULL_OR_EMPTY (VARCHAR str)
It returns true if the string is an empty string or NULL. Otherwise it returns false.
example
MySQL [(none)]> select null_or_empty(null);
+---------------------+
| null_or_empty(NULL) |
+---------------------+
| 1 |
+---------------------+
+-------------------+
| null_or_empty('') |
+-------------------+
| 1 |
+-------------------+
+--------------------+
| null_or_empty('a') |
+--------------------+
| 0 |
+--------------------+
keywords
NULL_OR_EMPTY
not_null_or_empty
not_null_or_empty
description
Syntax
BOOLEAN NOT_NULL_OR_EMPTY (VARCHAR str)
It returns false if the string is an empty string or NULL. Otherwise it returns true.
example
MySQL [(none)]> select not_null_or_empty(null);
+-------------------------+
| not_null_or_empty(NULL) |
+-------------------------+
| 0 |
+-------------------------+
+-----------------------+
| not_null_or_empty('') |
+-----------------------+
| 0 |
+-----------------------+
+------------------------+
| not_null_or_empty('a') |
+------------------------+
| 1 |
+------------------------+
keywords
NOT_NULL_OR_EMPTY
hex
hex
description
Syntax
VARCHAR hex(VARCHAR str)
If the input parameter is a number, the string representation of its hexadecimal value is returned.
If the input parameter is a string, each character is converted into two hexadecimal characters, and all converted characters are concatenated into a string for output.
example
input string
+----------+
| hex('1') |
+----------+
| 31 |
+----------+
+----------+
| hex('@') |
+----------+
| 40 |
+----------+
+-----------+
| hex('12') |
+-----------+
| 3132 |
+-----------+
input number
+---------+
| hex(12) |
+---------+
| C |
+---------+
+------------------+
| hex(-1) |
+------------------+
| FFFFFFFFFFFFFFFF |
+------------------+
keywords
HEX
unhex
unhex
description
Syntax
VARCHAR unhex(VARCHAR str)
Takes a string as input. If the length of the string is 0 or odd, an empty string is returned.
If the string contains characters other than [0-9], [a-f], [A-F], an empty string is returned.
Otherwise, every two characters are treated as a pair of hexadecimal digits, converted to the corresponding character, and the results are concatenated into a string for output.
example
mysql> select unhex('@');
+------------+
| unhex('@') |
+------------+
| |
+------------+
+-------------+
| unhex('41') |
+-------------+
| A |
+-------------+
+---------------+
| unhex('4142') |
+---------------+
| AB |
+---------------+
keywords
UNHEX
elt
elt
Description
Syntax
VARCHAR elt (INT,VARCHAR,...)
Returns the string at specified index. Returns NULL if there is no string at specified index.
example
mysql> select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa |
+----------------------+
mysql> select elt(2, 'aaa', 'bbb');
+-----------------------+
+-----------------------+
| bbb |
+-----------------------+
mysql> select elt(0, 'aaa', 'bbb');
+----------------------+
| elt(0, 'aaa', 'bbb') |
+----------------------+
| NULL |
+----------------------+
mysql> select elt(3, 'aaa', 'bbb');
+-----------------------+
+-----------------------+
| NULL |
+-----------------------+
keywords
ELT
instr
instr
Description
Syntax
INT INSTR(VARCHAR str, VARCHAR substr)
Returns the location where substr first appeared in str (counting from 1). If substr does not appear in str, return 0.
example
mysql> select instr("abc", "b");
+-------------------+
| instr('abc', 'b') |
+-------------------+
| 2 |
+-------------------+
+-------------------+
| instr('abc', 'd') |
+-------------------+
| 0 |
+-------------------+
keywords
INSTR
locate
locate
Description
Syntax
INT LOCATE(VARCHAR substr, VARCHAR str[, INT pos])
Returns the position where substr first appears in str (counting from 1). If the third parameter pos is specified, the search for substr starts from position pos of str. If substr is not found, 0 is returned.
example
mysql> SELECT LOCATE('bar', 'foobarbar');
+----------------------------+
| locate('bar', 'foobarbar') |
+----------------------------+
| 4 |
+----------------------------+
+--------------------------+
| locate('xbar', 'foobar') |
+--------------------------+
| 0 |
+--------------------------+
+-------------------------------+
| locate('bar', 'foobarbar', 5) |
+-------------------------------+
| 7 |
+-------------------------------+
keywords
LOCATE
field
field
Since Version dev
field
description
Syntax
field(Expr e, param1, param2, param3, ...)
In the ORDER BY clause, custom sorting can be used to arrange the data in expr in the order given by param1, param2, param3, ...
Rows whose value is not among the param arguments do not participate in the custom sorting and are placed first.
You can use ASC and DESC to control the overall order.
If there are NULL values, you can use NULLS FIRST or NULLS LAST to control the placement of NULLs.
example
mysql> select k1,k7 from baseall where k1 in (1,2,3) order by field(k1,2,1,3);
+------+------------+
| k1 | k7 |
+------+------------+
| 2 | wangyu14 |
| 1 | wangjing04 |
| 3 | yuanyuan06 |
+------+------------+
+------------+
| class_name |
+------------+
| Suzi |
| Suzi |
| Ben |
| Ben |
| Henry |
| Henry |
+------------+
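The second result above shows a custom ordering of a string column; a hedged sketch of a query that would produce it (the table and column names are assumptions):
mysql> select class_name from class_test order by field(class_name, 'Suzi', 'Ben', 'Henry');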
keywords
field
find_in_set
find_in_set
description
Syntax
INT find_in_set(VARCHAR str, VARCHAR strlist)
Returns the position of the first occurrence of str in strlist (counting from 1). strlist is a comma-separated string. If str is not found, 0 is returned. If any parameter is NULL, NULL is returned.
example
mysql> select find_in_set("b", "a,b,c");
+---------------------------+
| find_in_set('b', 'a,b,c') |
+---------------------------+
| 2 |
+---------------------------+
keywords
FIND_IN_SET,FIND,IN,SET
replace
replace
description
Syntax
VARCHAR REPLACE (VARCHAR str, VARCHAR old, VARCHAR new)
example
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+------------------------------------------------------+
+------------------------------------------------------+
| http://www.baidu.com: |
+------------------------------------------------------+
keywords
REPLACE
left
left
Description
Syntax
VARCHAR left (VARCHAR str, INT len)
It returns the left part of a string of specified length, length is char length not the byte size.
example
mysql> select left("Hello doris",5);
+------------------------+
| left('Hello doris', 5) |
+------------------------+
| Hello |
+------------------------+
keywords
LEFT
right
right
Description
Syntax
VARCHAR RIGHT (VARCHAR str, INT len)
It returns the right part of a string of specified length, length is char length not the byte size.
example
mysql> select right("Hello doris",5);
+-------------------------+
| right('Hello doris', 5) |
+-------------------------+
| doris |
+-------------------------+
keywords
RIGHT
strleft
strleft
Description
Syntax
VARCHAR strleft(VARCHAR str, INT len)
It returns the left part of a string of specified length, length is char length not the byte size.
example
mysql> select strleft("Hello doris",5);
+---------------------------+
| strleft('Hello doris', 5) |
+---------------------------+
| Hello                     |
+---------------------------+
keywords
STRLEFT
strright
strright
Description
Syntax
VARCHAR strright (VARCHAR str, INT len)
It returns the right part of a string of specified length, length is char length not the byte size.
example
mysql> select strright("Hello doris",5);
+----------------------------+
| strright('Hello doris', 5) |
+----------------------------+
| doris                      |
+----------------------------+
keywords
STRRIGHT
split_part
split_part
Description
Syntax
VARCHAR split_part(VARCHAR content, VARCHAR delimiter, INT field)
Splits the string according to the delimiter and returns the specified field. If field is positive, fields are counted from the beginning of content; otherwise they are counted from the end.
example
mysql> select split_part("hello world", " ", 1);
+----------------------------------+
+----------------------------------+
| hello |
+----------------------------------+
+----------------------------------+
+----------------------------------+
| world |
+----------------------------------+
mysql> select split_part("2019年7月8号", "月", 1);
+-----------------------------------------+
| split_part('2019年7月8号', '月', 1)     |
+-----------------------------------------+
| 2019年7                                 |
+-----------------------------------------+
+----------------------------+
| split_part('abca', 'a', 1) |
+----------------------------+
| |
+----------------------------+
+--------------------------------------+
+--------------------------------------+
| string |
+--------------------------------------+
+--------------------------------------+
+--------------------------------------+
| prefix |
+--------------------------------------+
+----------------------------------------+
+----------------------------------------+
| 234 |
+----------------------------------------+
+----------------------------------------+
+----------------------------------------+
| 123# |
+----------------------------------------+
keywords
SPLIT_PART,SPLIT,PART
split_by_string
split_by_string
Since Version 1.2.2
description
Syntax
split_by_string(s, separator)
Splits a string into substrings separated by a string. It uses a constant string separator of multiple characters as the separator.
If the string separator is empty, it will split the string s into an array of single characters.
Arguments
separator — The separator. Type: String
Returned value(s)
Returns an array of selected substrings. Empty substrings may be selected when: a non-empty separator occurs at the beginning or end of the string; there are multiple consecutive separators; the original string s is empty.
Type: Array(String)
notice
Only supported in vectorized engine
example
SELECT split_by_string('1, 2 3, 4,5, abcde', ', ');
select split_by_string('a1b1c1d','1');
+---------------------------------+
| split_by_string('a1b1c1d', '1') |
+---------------------------------+
+---------------------------------+
select split_by_string(',,a,b,c,',',');
+----------------------------------+
| split_by_string(',,a,b,c,', ',') |
+----------------------------------+
+----------------------------------+
SELECT split_by_string(NULL,',');
+----------------------------+
| split_by_string(NULL, ',') |
+----------------------------+
| NULL |
+----------------------------+
select split_by_string('a,b,c,abcde',',');
+-------------------------------------+
| split_by_string('a,b,c,abcde', ',') |
+-------------------------------------+
+-------------------------------------+
+---------------------------------------------+
| split_by_string('1,,2,3,,4,5,,abcde', ',,') |
+---------------------------------------------+
+---------------------------------------------+
select split_by_string(',,,,',',,');
+-------------------------------+
| split_by_string(',,,,', ',,') |
+-------------------------------+
+-------------------------------+
select split_by_string(',,a,,b,,c,,',',,');
+--------------------------------------+
| split_by_string(',,a,,b,,c,,', ',,') |
+--------------------------------------+
+--------------------------------------+
keywords
SPLIT_BY_STRING,SPLIT
substring_index
substring_index
Name
Since Version 1.2
SUBSTRING_INDEX
description
Syntax
VARCHAR substring_index(VARCHAR content, VARCHAR delimiter, INT field)
Splits content into two parts at the position of the field-th occurrence of delimiter and returns one of them according to the following rules:
if field is positive, return the left part;
if field is negative, return the right part;
if field is zero, return an empty string when content is not NULL, otherwise return NULL.
example
mysql> select substring_index("hello world", " ", 1);
+----------------------------------------+
+----------------------------------------+
| hello |
+----------------------------------------+
+----------------------------------------+
+----------------------------------------+
| hello world |
+----------------------------------------+
+-----------------------------------------+
+-----------------------------------------+
| world |
+-----------------------------------------+
+-----------------------------------------+
+-----------------------------------------+
| hello world |
+-----------------------------------------+
+-----------------------------------------+
+-----------------------------------------+
| hello world |
+-----------------------------------------+
+----------------------------------------+
+----------------------------------------+
| |
+----------------------------------------+
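The headerless results above correspond to different field values; a hedged sketch of the calls (arguments are assumptions consistent with the outputs):
mysql> select substring_index("hello world", " ", 2);    -- hello world
mysql> select substring_index("hello world", " ", -1);   -- world
mysql> select substring_index("hello world", " ", -2);   -- hello world
mysql> select substring_index("hello world", " ", -3);   -- hello world
mysql> select substring_index("hello world", " ", 0);    -- empty string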
keywords
SUBSTRING_INDEX, SUBSTRING
money_format
money_format
Description
Syntax
VARCHAR money_format(Number)
The number is output in currency format: the integer part is separated by commas every three digits, and the decimal part is rounded to two places.
example
mysql> select money_format(17014116);
+------------------------+
| money_format(17014116) |
+------------------------+
| 17,014,116.00 |
+------------------------+
+------------------------+
| money_format(1123.456) |
+------------------------+
| 1,123.46 |
+------------------------+
+----------------------+
| money_format(1123.4) |
+----------------------+
| 1,123.40 |
+----------------------+
keywords
MONEY_FORMAT,MONEY,FORMAT
parse_url
parse_url
description
Syntax
VARCHAR parse_url(VARCHAR url, VARCHAR name)
From the URL, the field corresponding to name is resolved. The name options are as follows: 'PROTOCOL', 'HOST', 'PATH',
'REF', 'AUTHORITY', 'FILE', 'USERINFO', 'PORT', 'QUERY', and the result is returned.
example
mysql> SELECT parse_url ('https://doris.apache.org/', 'HOST');
+------------------------------------------------+
| parse_url('https://doris.apache.org/', 'HOST') |
+------------------------------------------------+
| doris.apache.org |
+------------------------------------------------+
keywords
PARSE URL
convert_to
Since Version 1.2
convert_to
description
Syntax
convert_to(VARCHAR column, VARCHAR character)
It is used in the order by clause, e.g. order by convert(column using gbk). Currently the column can only be converted to the 'gbk' character set.
When the order by column contains Chinese, it is not arranged in Pinyin order by default; after the character encoding of the column is converted to gbk, it can be sorted according to Pinyin.
example
mysql> select * from class_test order by class_name;
+----------+------------+-------------+
+----------+------------+-------------+
| 6 | asd | [6] |
| 7 | qwe | [7] |
| 8 | z | [8] |
| 2 | 哈 | [2] |
| 3 | 哦 | [3] |
| 1 | 啊 | [1] |
| 4 | 张 | [4] |
| 5 | 我 | [5] |
+----------+------------+-------------+
+----------+------------+-------------+
+----------+------------+-------------+
| 6 | asd | [6] |
| 7 | qwe | [7] |
| 8 | z | [8] |
| 1 | 啊 | [1] |
| 2 | 哈 | [2] |
| 3 | 哦 | [3] |
| 5 | 我 | [5] |
| 4 | 张 | [4] |
+----------+------------+-------------+
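The second ordering above (the Chinese values arranged by Pinyin) is what a query of roughly this shape produces; the table and column names are assumptions carried over from the first example:
mysql> select * from class_test order by convert_to(class_name, 'gbk');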
keywords
convert_to
extract_url_parameter
extract_url_parameter
description
Syntax
VARCHAR extract_url_parameter(VARCHAR url, VARCHAR name)
Returns the value of the "name" parameter in the URL, if present; otherwise an empty string is returned.
If there are multiple parameters with this name, the first occurrence is returned.
This function works on the assumption that the parameter name is encoded in the URL exactly as it was in the passed argument.
+--------------------------------------------------------------------------------+
| extract_url_parameter('http://doris.apache.org?k1=aa&k2=bb&test=cc#999', 'k2') |
+--------------------------------------------------------------------------------+
| bb |
+--------------------------------------------------------------------------------+
keywords
EXTRACT URL PARAMETER
uuid
uuid
Since Version 1.2.0
uuid
description
Syntax
VARCHAR uuid()
example
mysql> select uuid();
+--------------------------------------+
| uuid() |
+--------------------------------------+
| 29077778-fc5e-4603-8368-6b5f8fd55c24 |
+--------------------------------------+
keywords
UUID
space
space
Description
Syntax
VARCHAR space(Int num)
example
mysql> select length(space(10));
+-------------------+
| length(space(10)) |
+-------------------+
| 10 |
+-------------------+
+--------------------+
| length(space(-10)) |
+--------------------+
| 0 |
+--------------------+
keywords
space
sleep
sleep
Description
Syntax
boolean sleep (int num)
example
mysql> select sleep(10);
+-----------+
| sleep(10) |
+-----------+
| 1 |
+-----------+
keywords
sleep
esquery
esquery
description
Syntax
boolean esquery(varchar field, varchar QueryDSL)
Use the esquery(field, QueryDSL) function to push queries that cannot be expressed in SQL down to Elasticsearch for filtering.
The first parameter, a column name, is used to associate the index; the second parameter is the JSON expression of the basic ES query DSL, enclosed in curly brackets {}. The JSON must have exactly one root key, such as match_phrase, geo_shape, or bool.
example
match_phrase SQL:
"match_phrase": {
}');
geo SQL:
"geo_shape": {
    "location": {
        "shape": {
            "type": "envelope",
            "coordinates": [
                [13, 53],
                [14, 52]
            ]
        },
        "relation": "within"
    }
}');
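For reference, a hedged sketch of a complete call (the table name es_table and the column k4 are assumptions, not taken from this page):
select * from es_table where esquery(k4, '{
    "match_phrase": {
        "k4": "doris on es"
    }
}');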
keywords
esquery
mask
mask
description
syntax
VARCHAR mask(VARCHAR str[, VARCHAR upper[, VARCHAR lower[, VARCHAR number]]])
Returns a masked version of str . By default, upper case letters are converted to "X", lower case letters are converted to "x"
and numbers are converted to "n". For example mask("abcd-EFGH-8765-4321") results in xxxx-XXXX-nnnn-nnnn. You can
override the characters used in the mask by supplying additional arguments: the second argument controls the mask
character for upper case letters, the third argument for lower case letters and the fourth argument for numbers. For
example, mask("abcd-EFGH-8765-4321", "U", "l", "#") results in llll-UUUU-####-####.
example
// table test
+-----------+
| name |
+-----------+
| abc123EFG |
| NULL |
| 456AbCdEf |
+-----------+
+--------------+
| mask(`name`) |
+--------------+
| xxxnnnXXX |
| NULL |
| nnnXxXxXx |
+--------------+
+-----------------------------+
+-----------------------------+
| ###$$$*** |
| NULL |
| $$$*#*#*# |
+-----------------------------+
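The last result above ("###$$$***" and "$$$*#*#*#") likely comes from overriding the mask characters; a sketch of such a call (the exact arguments are assumptions consistent with the outputs):
mysql> select mask(name, '*', '#', '$') from test;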
keywords
mask
mask_first_n
mask_first_n
description
syntax
VARCHAR mask_first_n(VARCHAR str[, INT n])
Returns a masked version of str with the first n values masked. Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_first_n("1234-5678-8765-4321", 4) results in nnnn-5678-8765-4321.
example
// table test
+-----------+
| name |
+-----------+
| abc123EFG |
| NULL |
| 456AbCdEf |
+-----------+
+-------------------------+
| mask_first_n(`name`, 5) |
+-------------------------+
| xxxnn3EFG |
| NULL |
| nnnXxCdEf |
+-------------------------+
keywords
mask_first_n
mask_last_n
mask_last_n
description
syntax
VARCHAR mask_last_n(VARCHAR str[, INT n])
Returns a masked version of str with the last n values masked. Upper case letters are converted to "X", lower case letters are converted to "x" and numbers are converted to "n". For example, mask_last_n("1234-5678-8765-4321", 4) results in 1234-5678-8765-nnnn.
example
// table test
+-----------+
| name |
+-----------+
| abc123EFG |
| NULL |
| 456AbCdEf |
+-----------+
+------------------------+
| mask_last_n(`name`, 5) |
+------------------------+
| abc1nnXXX |
| NULL |
| 456AxXxXx |
+------------------------+
keywords
mask_last_n
multi_search_all_positions
multi_search_all_positions
Description
Syntax
ARRAY<INT> multi_search_all_positions(VARCHAR haystack, ARRAY<VARCHAR> needles)
Searches for the substrings needles in the string haystack and returns an array of the positions of the corresponding substrings in the string, in the order of needles. Positions are indexed starting from 1; 0 means the substring was not found.
example
mysql> select multi_search_all_positions('Hello, World!', ['hello', '!', 'world']);
+----------------------------------------------------------------------+
| multi_search_all_positions('Hello, World!', ['hello', '!', 'world']) |
+----------------------------------------------------------------------+
| [0,13,0] |
+----------------------------------------------------------------------+
+-----------------------------------------------------+
+-----------------------------------------------------+
| [1,2,0] |
+-----------------------------------------------------+
keywords
MULTI_SEARCH,SEARCH,POSITIONS
multi_match_any
multi_match_any
Description
Syntax
TINYINT multi_match_any(VARCHAR haystack, ARRAY<VARCHAR> patterns)
Checks whether the string haystack matches the regular expressions patterns in re2 syntax. Returns 0 if none of the regular expressions match and 1 if any of the patterns matches.
example
mysql> select multi_match_any('Hello, World!', ['hello', '!', 'world']);
+-----------------------------------------------------------+
+-----------------------------------------------------------+
| 1 |
+-----------------------------------------------------------+
+--------------------------------------+
+--------------------------------------+
| 0 |
+--------------------------------------+
keywords
MULTI_MATCH,MATCH,ANY
like
like
description
syntax
BOOLEAN like(VARCHAR str, VARCHAR pattern)
Perform fuzzy matching on the string str, return true if it matches, and false if it doesn't match.
example
// table test
+-------+
| k1 |
+-------+
| b |
| bb |
| bab |
| a |
+-------+
+-------+
| k1 |
+-------+
| a |
| bab |
+-------+
+-------+
| k1 |
+-------+
| a |
+-------+
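The queries that produced the two result sets above are not shown; the following sketches would yield the same results against the test table (the predicates are assumptions):
mysql> select k1 from test where k1 like '%a%';
mysql> select k1 from test where k1 like 'a';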
keywords
LIKE
not like
not like
description
syntax
BOOLEAN not like(VARCHAR str, VARCHAR pattern)
Performs fuzzy matching of the string str against pattern; returns false if it matches and true if it does not.
example
// table test
+-------+
| k1 |
+-------+
| b |
| bb |
| bab |
| a |
+-------+
+-------+
| k1 |
+-------+
| b |
| bb |
+-------+
+-------+
| k1 |
+-------+
| b |
| bb |
| bab |
+-------+
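The queries for the result sets above are not shown; sketches that would yield the same results (the predicates are assumptions):
mysql> select k1 from test where k1 not like '%a%';
mysql> select k1 from test where k1 not like 'a';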
keywords
LIKE, NOT, NOT LIKE
regexp
regexp
description
syntax
BOOLEAN regexp(VARCHAR str, VARCHAR pattern)
Performs regular-expression matching of the string str against pattern; returns true if it matches and false if it does not. pattern is a regular expression.
example
// Find all data starting with 'billie' in the k1 field
+--------------------+
| k1 |
+--------------------+
| billie eillish |
+--------------------+
+----------+
| k1 |
+----------+
| It's ok |
+----------+
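The queries themselves are not shown; sketches that match the comment and the results above (the table name test is an assumption):
mysql> select k1 from test where k1 regexp '^billie';
mysql> select k1 from test where k1 regexp 'ok$';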
keywords
REGEXP
regexp_extract
regexp_extract
Description
Syntax
VARCHAR regexp_extract (VARCHAR str, VARCHAR pattern, int pos)
Matches the string str against pattern and extracts the pos-th matching group. The pattern must exactly match some part of str; otherwise an empty string is returned.
example
mysql> SELECT regexp_extract('AbCdE', '([[:lower:]]+)C([[:lower:]]+)', 1);
+-------------------------------------------------------------+
| regexp_extract('AbCdE', '([[:lower:]]+)C([[:lower:]]+)', 1) |
+-------------------------------------------------------------+
| b |
+-------------------------------------------------------------+
+-------------------------------------------------------------+
| regexp_extract('AbCdE', '([[:lower:]]+)C([[:lower:]]+)', 2) |
+-------------------------------------------------------------+
| d |
+-------------------------------------------------------------+
keywords
REGEXP_EXTRACT,REGEXP,EXTRACT
regexp_extract_all
regexp_extract_all
Description
Syntax
VARCHAR regexp_extract_all (VARCHAR str, VARCHAR pattern)
Matches the string str against pattern and extracts all occurrences of the first sub-pattern (capture group) of pattern. The pattern must exactly match some part of str; the matched parts are returned as an array of strings. If there is no match, or the pattern has no sub-pattern, an empty string is returned.
example
mysql> SELECT regexp_extract_all('AbCdE', '([[:lower:]]+)C([[:lower:]]+)');
+--------------------------------------------------------------+
| regexp_extract_all('AbCdE', '([[:lower:]]+)C([[:lower:]]+)') |
+--------------------------------------------------------------+
| ['b'] |
+--------------------------------------------------------------+
+-----------------------------------------------------------------+
| regexp_extract_all('AbCdEfCg', '([[:lower:]]+)C([[:lower:]]+)') |
+-----------------------------------------------------------------+
| ['b','f'] |
+-----------------------------------------------------------------+
+--------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------+
| ['abc','def','ghi'] |
+--------------------------------------------------------------------------------+
keywords
REGEXP_EXTRACT_ALL,REGEXP,EXTRACT,ALL
regexp_replace
regexp_replace
description
Syntax
VARCHAR regexp_replace(VARCHAR str, VARCHAR pattern, VARCHAR repl)
Matches the string str against pattern and replaces every matching part with repl.
example
mysql> SELECT regexp_replace('a b c', " ", "-");
+-----------------------------------+
+-----------------------------------+
| a-b-c |
+-----------------------------------+
+----------------------------------------+
+----------------------------------------+
| a <b> c |
+----------------------------------------+
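The query for the second result above is not shown; one call that yields this output (a sketch, the exact original may differ):
mysql> SELECT regexp_replace('a b c', '(b)', '<\\1>');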
keywords
REGEXP_REPLACE,REGEXP,REPLACE
regexp_replace_one
regexp_replace_one
description
Syntax
VARCHAR regexp_replace_one(VARCHAR str, VARCHAR pattern, VARCHAR repl)
Matches the string str against pattern and replaces the matching part with repl, replacing only the first match.
example
mysql> SELECT regexp_replace_one('a b c', " ", "-");
+-----------------------------------+
+-----------------------------------+
| a-b c |
+-----------------------------------+
+----------------------------------------+
+----------------------------------------+
| a <b> b |
+----------------------------------------+
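The query for the second result above is not shown; one call that yields this output (a sketch, the exact original may differ):
mysql> SELECT regexp_replace_one('a b b', '(b)', '<\\1>');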
keywords
REGEXP_REPLACE_ONE,REGEXP,REPLACE,ONE
not regexp
not regexp
description
syntax
BOOLEAN not regexp(VARCHAR str, VARCHAR pattern)
Performs regular-expression matching of the string str against pattern; returns false if it matches and true if it does not. pattern is a regular expression.
example
// Find all data in the k1 field that does not start with 'billie'
+--------------------+
| k1 |
+--------------------+
| Emmy eillish |
+--------------------+
// Find all the data in the k1 field that does not end with 'ok':
+------------+
| k1 |
+------------+
| It's true |
+------------+
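The queries themselves are not shown; sketches that match the comments and results above (the table name test is an assumption):
mysql> select k1 from test where k1 not regexp '^billie';
mysql> select k1 from test where k1 not regexp 'ok$';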
keywords
REGEXP, NOT, NOT REGEXP
COLLECT_SET
COLLECT_SET
Since Version 1.2.0
COLLECT_SET
description
Syntax
ARRAY<T> collect_set(expr[,max_size])
Creates an array containing the distinct elements of expr. The optional max_size parameter limits the size of the resulting array to max_size elements. It has an alias group_uniq_array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+------------+-------+
| k1 | k2 | k3 |
+------+------------+-------+
| 1 | 2023-01-01 | hello |
| 2 | 2023-01-01 | NULL |
| 2 | 2023-01-02 | hello |
| 3 | NULL | world |
| 3 | 2023-01-02 | hello |
| 4 | 2023-01-02 | doris |
| 4 | 2023-01-03 | sql |
+------+------------+-------+
+-------------------------+--------------------------+
| collect_set(`k1`) | collect_set(`k1`,2) |
+-------------------------+--------------------------+
| [4,3,2,1] | [1,2] |
+-------------------------+--------------------------+
+------+-------------------------+--------------------------+
| k1 | collect_set(`k2`) | collect_set(`k3`,1) |
+------+-------------------------+--------------------------+
| 1 | [2023-01-01] | [hello] |
| 2 | [2023-01-01,2023-01-02] | [hello] |
| 3 | [2023-01-02] | [world] |
| 4 | [2023-01-02,2023-01-03] | [sql] |
+------+-------------------------+--------------------------+
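The queries that produced the two result sets above are not shown; sketches that would yield them (the table name collect_test is an assumption):
mysql> select collect_set(k1), collect_set(k1, 2) from collect_test;
mysql> select k1, collect_set(k2), collect_set(k3, 1) from collect_test group by k1 order by k1;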
keywords
COLLECT_SET,GROUP_UNIQ_ARRAY,COLLECT_LIST,ARRAY
MIN
MIN
Description
Syntax
MIN(expr)
example
MySQL > select min(scan_rows) from log_statis group by datetime;
+------------------+
| min(`scan_rows`) |
+------------------+
| 0 |
+------------------+
keywords
MIN
STDDEV_SAMP
STDDEV_SAMP
Description
Syntax
STDDEV_SAMP(expr)
example
MySQL > select stddev_samp(scan_rows) from log_statis group by datetime;
+--------------------------+
| stddev_samp(`scan_rows`) |
+--------------------------+
| 2.372044195280762 |
+--------------------------+
keywords
STDDEV SAMP,STDDEV,SAMP
AVG
AVG
Description
Syntax
AVG([DISTINCT] expr)
The optional DISTINCT keyword can be used to compute the average over the distinct values of expr.
example
mysql> SELECT datetime, AVG(cost_time) FROM log_statis group by datetime;
+---------------------+--------------------+
| datetime | avg(`cost_time`) |
+---------------------+--------------------+
+---------------------+--------------------+
keywords
AVG
AVG_WEIGHTED
AVG_WEIGHTED
Description
Syntax
double avg_weighted(x, weight)
Calculates the weighted arithmetic mean, which is the sum of the products of all corresponding values and weights, divided by the sum of all weights.
If the sum of all weights equals 0, NaN will be returned.
example
mysql> select avg_weighted(k2,k1) from baseall;
+--------------------------+
| avg_weighted(`k2`, `k1`) |
+--------------------------+
| 495.675 |
+--------------------------+
keywords
AVG_WEIGHTED
PERCENTILE
PERCENTILE
Description
Syntax
PERCENTILE(expr, DOUBLE p)
Calculates the exact percentile, suitable for small data volumes. The specified column is sorted in descending order first, and then the exact p-th percentile is taken. The value of p is between 0 and 1.
Parameter Description:
expr: required. The value must be an integer (at most bigint).
p: required. The percentile to compute; a constant in the range [0.0, 1.0].
example
MySQL > select `table`, percentile(cost_time,0.99) from log_statis group by `table`;
+----------+-------------------------------+
| table    | percentile(`cost_time`, 0.99) |
+----------+-------------------------------+
| test     | 54.22                         |
+----------+-------------------------------+
+-----------------------+
| percentile(NULL, 0.3) |
+-----------------------+
| NULL |
+-----------------------+
keywords
PERCENTILE
PERCENTILE_ARRAY
PERCENTILE_ARRAY
Description
Syntax
ARRAY_DOUBLE PERCENTILE_ARRAY(BIGINT, ARRAY_DOUBLE p)
Calculate exact percentiles, suitable for small data volumes. Sorts the specified column in descending order first, then takes
the exact pth percentile.
The return value is the result of sequentially taking the specified percentages in the array p.
Parameter Description:
expr: required. A column whose values are of an integer type (at most bigint).
p: required. An array of constant percentiles, each in the range [0.0, 1.0].
example
mysql> select percentile_array(k1,[0.3,0.5,0.9]) from baseall;
+----------------------------------------------+
| percentile_array(`k1`, ARRAY(0.3, 0.5, 0.9)) |
+----------------------------------------------+
| [5.2, 8, 13.6] |
+----------------------------------------------+
keywords
PERCENTILE_ARRAY
HLL_UNION_AGG
HLL_UNION_AGG
description
Syntax
HLL_UNION_AGG(hll)
HLL is an engineering implementation based on the HyperLogLog algorithm, used to save the intermediate results of the HyperLogLog calculation process.
It can only be used as a value column type of the table, and it reduces the amount of data through aggregation to speed up queries.
Based on it, an estimate with an error of about 1% is obtained. The HLL column is generated from other columns or from data imported into the table.
When importing, the hll_hash function is used to specify which column in the data is used to generate the HLL column. It is often used to replace count distinct, and to quickly calculate UV in business scenarios by combining it with rollup.
example
MySQL > select HLL_UNION_AGG(uv_set) from test_uv;
+-------------------------+
| HLL_UNION_AGG(`uv_set`) |
+-------------------------+
| 17721 |
+-------------------------+
keywords
HLL_UNION_AGG,HLL,UNION,AGG
TOPN
TOPN
description
Syntax
topn(expr, INT top_num[, INT space_expand_rate])
The topn function uses the Space-Saving algorithm to calculate the top_num most frequent items in expr. The result is the frequent items and their occurrence counts, which is an approximation.
The space_expand_rate parameter is optional and is used to set the number of counters used in the Space-Saving algorithm.
The higher the value of space_expand_rate, the more accurate the result will be. The default value is 50.
example
MySQL [test]> select topn(keyword,10) from keyword_table where date>= '2020-06-01' and date <= '2020-06-19' ;
+------------------------------------------------------------------------------------------------------------+
| topn(`keyword`, 10) |
+------------------------------------------------------------------------------------------------------------+
| a:157, b:138, c:133, d:133, e:131, f:127, g:124, h:122, i:117, k:117 |
+------------------------------------------------------------------------------------------------------------+
MySQL [test]> select date,topn(keyword,10,100) from keyword_table where date>= '2020-06-17' and date <= '2020-06-19' group by date;
+------------+-----------------------------------------------------------------------------------------------+
| date       | topn(`keyword`, 10, 100)                                                                      |
+------------+-----------------------------------------------------------------------------------------------+
| 2020-06-19 | a:11, b:8, c:8, d:7, e:7, f:7, g:7, h:7, i:7, j:7 |
| 2020-06-18 | a:10, b:8, c:7, f:7, g:7, i:7, k:7, l:7, m:6, d:6 |
| 2020-06-17 | a:9, b:8, c:8, j:8, d:7, e:7, f:7, h:7, i:7, k:7 |
+------------+-----------------------------------------------------------------------------------------------+
keywords
TOPN
TOPN_ARRAY
TOPN_ARRAY
description
Syntax
ARRAY<T> topn_array(expr, INT top_num[, INT space_expand_rate])
The topn_array function uses the Space-Saving algorithm to calculate the top_num most frequent items in expr, and returns an array of those items, which is an approximation.
The space_expand_rate parameter is optional and is used to set the number of counters used in the Space-Saving algorithm.
The higher the value of space_expand_rate, the more accurate the result will be. The default value is 50.
example
mysql> select topn_array(k3,3) from baseall;
+--------------------------+
| topn_array(`k3`, 3) |
+--------------------------+
+--------------------------+
+--------------------------+
| topn_array(`k3`, 3, 100) |
+--------------------------+
+--------------------------+
keywords
TOPN, TOPN_ARRAY
TOPN_WEIGHTED
TOPN_WEIGHTED
description
Syntax
ARRAY<T> topn_weighted(expr, BigInt weight, INT top_num[, INT space_expand_rate])
The topn_weighted function uses the Space-Saving algorithm to return the top_num values of expr ranked by the sum of their weights, which is an approximation.
The space_expand_rate parameter is optional and is used to set the number of counters used in the Space-Saving algorithm.
The higher the value of space_expand_rate, the more accurate the result will be. The default value is 50.
example
mysql> select topn_weighted(k5,k1,3) from baseall;
+------------------------------+
| topn_weighted(`k5`, `k1`, 3) |
+------------------------------+
+------------------------------+
+-----------------------------------+
+-----------------------------------+
+-----------------------------------+
keywords
TOPN, TOPN_WEIGHTED
COUNT
COUNT
Description
Syntax
COUNT([DISTINCT] expr)
example
MySQL > select count(*) from log_statis group by datetime;
+----------+
| count(*) |
+----------+
| 28515903 |
+----------+
+-------------------+
| count(`datetime`) |
+-------------------+
| 28521682 |
+-------------------+
+-------------------------------+
| count(DISTINCT `datetime`) |
+-------------------------------+
| 71045 |
+-------------------------------+
keywords
COUNT
SUM
SUM
Description
Syntax
SUM(expr)
example
MySQL > select sum(scan_rows) from log_statis group by datetime;
+------------------+
| sum(`scan_rows`) |
+------------------+
| 8217360135 |
+------------------+
keywords
SUM
MAX_BY
MAX_BY
description
Syntax
MAX_BY(expr1, expr2)
Returns the value of an expr1 associated with the maximum value of expr2 in a group.
example
MySQL > select * from tbl;
+------+------+------+------+
| k1 | k2 | k3 | k4 |
+------+------+------+------+
| 0 | 3 | 2 | 100 |
| 1 | 2 | 3 | 4 |
| 4 | 3 | 2 | 1 |
| 3 | 4 | 2 | 1 |
+------+------+------+------+
+--------------------+
| max_by(`k1`, `k4`) |
+--------------------+
| 0 |
+--------------------+
keywords
MAX_BY
BITMAP_UNION
BITMAP_UNION
description
example
Create table
The aggregation model needs to be used when creating the table. The data type is bitmap and the aggregation function is
bitmap_union.
) ENGINE = OLAP
COMMENT "OLAP"
Note: When the amount of data is large, it is best to create a corresponding rollup table for high-frequency bitmap_union
queries
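The CREATE TABLE statement is only partially shown above; a minimal sketch of such a table, using the pv_bitmap schema referenced later in this section (column types, distribution and properties are assumptions):
CREATE TABLE pv_bitmap (
    dt INT,
    page VARCHAR(10),
    user_id BITMAP BITMAP_UNION  -- bitmap value column aggregated with bitmap_union
) ENGINE = OLAP
AGGREGATE KEY(dt, page)
COMMENT "OLAP"
DISTRIBUTED BY HASH(dt) BUCKETS 2
PROPERTIES ("replication_num" = "1");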
Data Load
TO_BITMAP (expr) : Convert 0 ~ 18446744073709551615 unsigned bigint to bitmap
BITMAP_EMPTY () : Generate empty bitmap columns, used for insert or import to fill the default value
BITMAP_HASH (expr) or BITMAP_HASH64 (expr) : Convert any type of column to a bitmap by hashing
Stream Load
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=to_bitmap(user_id)" http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=bitmap_hash(user_id)" http://host:8410/api/test/testDb/_stream_load
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=bitmap_empty()" http://host:8410/api/test/testDb/_stream_load
Insert Into
id2's column type is bitmap
INSERT INTO bitmap_table1 (id, id2) VALUES (1001, to_bitmap(1000)), (1001, to_bitmap(2000));
insert into bitmap_table1 select id, bitmap_union(id2) from bitmap_table2 group by id;
Data Query
Syntax
BITMAP_UNION(expr): Calculates the union of the grouped Bitmap values. The return value is a new Bitmap.
BITMAP_UNION_COUNT(expr): Calculates the cardinality of the union of the grouped Bitmap values, equivalent to BITMAP_COUNT(BITMAP_UNION(expr)). It is recommended to use BITMAP_UNION_COUNT first, since its performance is better than that of BITMAP_COUNT(BITMAP_UNION(expr)).
BITMAP_UNION_INT(expr): Counts the number of distinct values in columns of type TINYINT, SMALLINT and INT; the return value is the same as that of COUNT(DISTINCT expr).
Example
The following SQL uses the pv_bitmap table above as an example:
select
intersect_count(user_id, page, 'meituan', 'waimai') as retention -- number of users appearing on both the 'meituan' and 'waimai' pages
from pv_bitmap;
keywords
BITMAP, BITMAP_COUNT, BITMAP_EMPTY, BITMAP_UNION, BITMAP_UNION_INT, TO_BITMAP, BITMAP_UNION_COUNT,
INTERSECT_COUNT
group_bitmap_xor
group_bitmap_xor
description
Syntax
BITMAP GROUP_BITMAP_XOR(expr)
example
mysql> select page, bitmap_to_string(user_id) from pv_bitmap;
+------+-----------------------------+
| page | bitmap_to_string(`user_id`) |
+------+-----------------------------+
| m | 4,7,8 |
| m | 1,3,6,15 |
| m | 4,7 |
+------+-----------------------------+
+------+-----------------------------------------------+
| page | bitmap_to_string(group_bitmap_xor(`user_id`)) |
+------+-----------------------------------------------+
| m | 1,3,6,8,15 |
+------+-----------------------------------------------+
keywords
GROUP_BITMAP_XOR,BITMAP
group_bit_and
group_bit_and
description
Syntax
expr GROUP_BIT_AND(expr)
example
mysql> select * from group_bit;
+-------+
| value |
+-------+
| 3 |
| 1 |
| 2 |
| 4 |
+-------+
+------------------------+
| group_bit_and(`value`) |
+------------------------+
| 0 |
+------------------------+
keywords
GROUP_BIT_AND,BIT
group_bit_or
group_bit_or
description
Syntax
expr GROUP_BIT_OR(expr)
example
mysql> select * from group_bit;
+-------+
| value |
+-------+
| 3 |
| 1 |
| 2 |
| 4 |
+-------+
+-----------------------+
| group_bit_or(`value`) |
+-----------------------+
| 7 |
+-----------------------+
keywords
GROUP_BIT_OR,BIT
group_bit_xor
group_bit_xor
description
Syntax
expr GROUP_BIT_XOR(expr)
example
mysql> select * from group_bit;
+-------+
| value |
+-------+
| 3 |
| 1 |
| 2 |
| 4 |
+-------+
+------------------------+
| group_bit_xor(`value`) |
+------------------------+
| 4 |
+------------------------+
keywords
GROUP_BIT_XOR,BIT
PERCENTILE_APPROX
PERCENTILE_APPROX
Description
Syntax
PERCENTILE_APPROX(expr, DOUBLE p[, DOUBLE compression])
Returns an approximation of the p-th percentile, where the value of p is between 0 and 1.
The compression parameter is optional and can be set to a value in the range [2048, 10000]. The larger the compression value, the more precise the result and the higher the time cost. If it is not set, or is not in the valid range, the PERCENTILE_APPROX function runs with a default compression of 10000.
This function uses fixed size memory, so less memory can be used for columns with high cardinality, and can be used to
calculate statistics such as tp99.
example
MySQL > select `table`, percentile_approx(cost_time,0.99) from log_statis group by `table`;
+----------+---------------------------------------+
| table    | percentile_approx(`cost_time`, 0.99)  |
+----------+---------------------------------------+
| test     | 54.22                                 |
+----------+---------------------------------------+
MySQL > select `table`, percentile_approx(cost_time,0.99, 4096) from log_statis group by `table`;
+----------+---------------------------------------------+
| table    | percentile_approx(`cost_time`, 0.99, 4096)  |
+----------+---------------------------------------------+
| test     | 54.21                                       |
+----------+---------------------------------------------+
keywords
PERCENTILE_APPROX,PERCENTILE,APPROX
STDDEV,STDDEV_POP
STDDEV,STDDEV_POP
Description
Syntax
STDDEV(expr)
example
MySQL > select stddev(scan_rows) from log_statis group by datetime;
+---------------------+
| stddev(`scan_rows`) |
+---------------------+
| 2.3736656687790934 |
+---------------------+
+-------------------------+
| stddev_pop(`scan_rows`) |
+-------------------------+
| 2.3722760595994914 |
+-------------------------+
keywords
STDDEV,STDDEV_POP,POP
GROUP_CONCAT
GROUP_CONCAT
description
Syntax
VARCHAR GROUP_CONCAT([DISTINCT] VARCHAR str[, VARCHAR sep] [ORDER BY {col_name | expr} [ASC | DESC]])
This function is an aggregation function similar to sum(); group_concat concatenates multiple rows of the result set into a single string. The second parameter sep is the separator placed between the strings and can be omitted. This function usually needs to be used with a group by statement.
Since Version 1.2, ORDER BY is supported for sorting multi-row results; the sorting columns and the aggregated column can be different.
example
mysql> select value from test;
+-------+
| value |
+-------+
| a |
| b |
| c |
| c |
+-------+
+-----------------------+
| GROUP_CONCAT(`value`) |
+-----------------------+
| a, b, c, c |
+-----------------------+
+----------------------------+
| GROUP_CONCAT(`value`, ' ') |
+----------------------------+
| a b c c |
+----------------------------+
+-----------------------+
| GROUP_CONCAT(`value`) |
+-----------------------+
| a, b, c |
+-----------------------+
+----------------------------+
| GROUP_CONCAT(`value`, NULL)|
+----------------------------+
| NULL |
+----------------------------+
SELECT abs(k3), group_concat(distinct cast(abs(k2) as varchar) order by abs(k1), k5) FROM bigtable group by abs(k3) order by abs(k3);
+------------+-------------------------------------------------------------------------------+
| abs(`k3`)  | group_concat(distinct cast(abs(`k2`) as varchar) order by abs(`k1`), `k5`)    |
+------------+-------------------------------------------------------------------------------+
| 103        | 255                                                                           |
| 25699      | 1989                                                                          |
+------------+-------------------------------------------------------------------------------+
### keywords
GROUP_CONCAT,GROUP,CONCAT
COLLECT_LIST
COLLECT_LIST
description
Syntax
ARRAY<T> collect_list(expr[, max_size])
Returns an array consisting of all values of expr within the group. The optional max_size parameter limits the size of the resulting array to max_size elements. The order of elements in the array is non-deterministic. NULL values are excluded.
It has an alias group_array.
notice
Only supported in vectorized engine
example
mysql> set enable_vectorized_engine=true;
+------+------------+-------+
| k1 | k2 | k3 |
+------+------------+-------+
| 1 | 2023-01-01 | hello |
| 2 | 2023-01-02 | NULL |
| 2 | 2023-01-02 | hello |
| 3 | NULL | world |
| 3 | 2023-01-02 | hello |
| 4 | 2023-01-02 | sql |
| 4 | 2023-01-03 | sql |
+------+------------+-------+
+-------------------------+--------------------------+
| collect_list(`k1`) | collect_list(`k1`,3) |
+-------------------------+--------------------------+
| [1,2,2,3,3,4,4] | [1,2,2] |
+-------------------------+--------------------------+
+------+-------------------------+--------------------------+
| k1 | collect_list(`k2`) | collect_list(`k3`,1) |
+------+-------------------------+--------------------------+
| 1 | [2023-01-01] | [hello] |
| 2 | [2023-01-02,2023-01-02] | [hello] |
| 3 | [2023-01-02] | [world] |
| 4 | [2023-01-02,2023-01-03] | [sql] |
+------+-------------------------+--------------------------+
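The queries that produced the two result sets above are not shown; sketches that would yield them (the table name collect_test is an assumption):
mysql> select collect_list(k1), collect_list(k1, 3) from collect_test;
mysql> select k1, collect_list(k2), collect_list(k3, 1) from collect_test group by k1 order by k1;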
keywords
COLLECT_LIST,GROUP_ARRAY,COLLECT_SET,ARRAY
MIN_BY
MIN_BY
description
Syntax
MIN_BY(expr1, expr2)
Returns the value of an expr1 associated with the minimum value of expr2 in a group.
example
MySQL > select * from tbl;
+------+------+------+------+
| k1 | k2 | k3 | k4 |
+------+------+------+------+
| 0 | 3 | 2 | 100 |
| 1 | 2 | 3 | 4 |
| 4 | 3 | 2 | 1 |
| 3 | 4 | 2 | 1 |
+------+------+------+------+
+--------------------+
| min_by(`k1`, `k4`) |
+--------------------+
| 4 |
+--------------------+
keywords
MIN_BY
MAX
MAX
description
Syntax
MAX(expr)
example
MySQL > select max(scan_rows) from log_statis group by datetime;
+------------------+
| max(`scan_rows`) |
+------------------+
| 4671587 |
+------------------+
keywords
MAX
ANY_VALUE
ANY_VALUE
Since Version 1.2.0
ANY_VALUE
description
Syntax
ANY_VALUE(expr)
If there is a non NULL value in expr, any non NULL value is returned; otherwise, NULL is returned.
example
mysql> select id, any_value(name) from cost2 group by id;
+------+-------------------+
| id | any_value(`name`) |
+------+-------------------+
| 3 | jack |
| 2 | jack |
+------+-------------------+
keywords
ANY_VALUE, ANY
VARIANCE_SAMP,VARIANCE_SAMP
VARIANCE_SAMP,VARIANCE_SAMP
Description
Syntax
VAR_SAMP(expr)
example
MySQL > select var_samp(scan_rows) from log_statis group by datetime;
+-----------------------+
| var_samp(`scan_rows`) |
+-----------------------+
| 5.6227132145741789 |
+-----------------------+
keywords
VAR SAMP, VARIANCE SAMP,VAR,SAMP,VARIANCE
APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT
Description
Syntax
APPROX_COUNT_DISTINCT (expr)
Returns an approximation of the result of COUNT(DISTINCT col).
It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory is needed for columns of high cardinality.
example
MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
+-----------------------------------+
| approx_count_distinct(`query_id`) |
+-----------------------------------+
| 17721                             |
+-----------------------------------+
keywords
APPROX_COUNT_DISTINCT
VARIANCE,VAR_POP,VARIANCE_POP
VARIANCE,VAR_POP,VARIANCE_POP
Description
Syntax
VARIANCE(expr)
example
MySQL > select variance(scan_rows) from log_statis group by datetime;
+-----------------------+
| variance(`scan_rows`) |
+-----------------------+
| 5.6183332881176211 |
+-----------------------+
+----------------------+
| var_pop(`scan_rows`) |
+----------------------+
| 5.6230744719006163 |
+----------------------+
keywords
VARIANCE,VAR_POP,VARIANCE_POP,VAR,POP
RETENTION
RETENTION
Since Version 1.2.0
RETENTION
Description
Syntax
retention(event1, event2, ... , eventN);
The retention function takes as arguments a set of conditions from 1 to 32 arguments of type UInt8 that indicate whether
a certain condition was met for the event. Any condition can be specified as an argument.
The conditions, except the first, apply in pairs: the result of the second will be true if the first and second are true, of the third
if the first and third are true, etc.
Arguments
event — An expression that returns a UInt8 result (1 or 0).
Returned value
The array of 1 or 0.
example
DROP TABLE IF EXISTS retention_test;
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
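The column definitions of retention_test are only partially shown above; a minimal sketch that matches the columns used below (column types and distribution are assumptions):
CREATE TABLE retention_test (
    uid INT,
    date DATETIME
)
DUPLICATE KEY(uid)
DISTRIBUTED BY HASH(uid) BUCKETS 1
PROPERTIES (
    "replication_num" = "1"
);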
INSERT INTO retention_test(uid, date) VALUES
(0, '2022-10-12'),
(0, '2022-10-13'),
(0, '2022-10-14'),
(1, '2022-10-12'),
(1, '2022-10-13'),
(2, '2022-10-12');
+------+---------------------+
| uid | date |
+------+---------------------+
| 0 | 2022-10-14 00:00:00 |
| 0 | 2022-10-13 00:00:00 |
| 0 | 2022-10-12 00:00:00 |
| 1 | 2022-10-13 00:00:00 |
| 1 | 2022-10-12 00:00:00 |
| 2 | 2022-10-12 00:00:00 |
+------+---------------------+
SELECT
uid,
retention(date = '2022-10-12')
AS r
FROM retention_test
GROUP BY uid
+------+------+
| uid | r |
+------+------+
| 0 | [1] |
| 1 | [1] |
| 2 | [1] |
+------+------+
SELECT
uid,
retention(date = '2022-10-12', date = '2022-10-13')
AS r
FROM retention_test
GROUP BY uid
+------+--------+
| uid | r |
+------+--------+
| 0 | [1, 1] |
| 1 | [1, 1] |
| 2 | [1, 0] |
+------+--------+
SELECT
uid,
retention(date = '2022-10-12', date = '2022-10-13', date = '2022-10-14')
AS r
FROM retention_test
GROUP BY uid
+------+-----------+
| uid | r |
+------+-----------+
| 0 | [1, 1, 1] |
| 1 | [1, 1, 0] |
| 2 | [1, 0, 0] |
+------+-----------+
keywords
RETENTION
SEQUENCE-MATCH
SEQUENCE-MATCH
Description
Syntax
sequence_match(pattern, timestamp, cond1, cond2, ...);
Checks whether the sequence contains an event chain that matches the pattern.
WARNING!
Events that occur at the same second may lie in the sequence in an undefined order, affecting the result.
Arguments
pattern — Pattern string.
Pattern syntax
(?N) — Matches the condition argument at position N. Conditions are numbered in the [1, 32] range. For example, (?1)
matches the argument passed to the cond1 parameter.
.* — Matches any number of events. You do not need conditional arguments to match this element of the pattern.
(?t operator value) — Sets the time in seconds that should separate two events.
We define t as the difference in seconds between two times. For example, pattern (?1)(?t>1800)(?2) matches events that occur more than 1800 seconds from each other, and pattern (?1)(?t>10000)(?2) matches events that occur more than 10000 seconds from each other. An arbitrary number of any events can lie between these events. You can use the >= , > , < , <= , == operators.
timestamp — Column considered to contain time data. Typical data types are Date and DateTime . You can also use any of
the supported UInt data types.
cond1 , cond2 — Conditions that describe the chain of events. Data type: UInt8 . You can pass up to 32 condition arguments.
The function takes only the events described in these conditions into account. If the sequence contains data that isn’t
described in a condition, the function skips them.
Returned value
1, if the pattern is matched.
0, if the pattern is not matched.
example
match examples
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_match_test1(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 13:28:02', 2), (3, '2022-11-02 16:15:01', 1), (4, '2022-11-02 19:05:04', 2), (5, '2022-11-02 20:08:44', 3);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 13:28:02 | 2 |
| 3 | 2022-11-02 16:15:01 | 1 |
| 4 | 2022-11-02 19:05:04 | 2 |
| 5 | 2022-11-02 20:08:44 | 3 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------------------+
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_match_test2(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7), (3, '2022-11-02 16:15:01', 3), (4, '2022-11-02 19:05:04', 4), (5, '2022-11-02 21:24:12', 5);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 0 |
+----------------------------------------------------------------+
+------------------------------------------------------------------+
+------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------+
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------------+
special examples
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_match_test3(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7), (3, '2022-11-02 16:15:01', 3), (4, '2022-11-02 19:05:04', 4), (5, '2022-11-02 21:24:12', 5);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
This is a very simple example. The function found the event chain where number 5 follows number 1. It skipped number 7,3,4
between them, because the number is not described as an event. If we want to take this number into account when
searching for the event chain given in the example, we should make a condition for it.
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------------------+
The result is kind of confusing. In this case, the function couldn’t find the event chain matching the pattern, because the
event for number 4 occurred between 1 and 5. If in the same case we checked the condition for number 6, the sequence
would match the pattern.
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| 1 |
+------------------------------------------------------------------------------+
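The queries for the three special-example results above are not shown. Based on the explanations, they presumably had the following form (a sketch; the exact conditions may differ):
SELECT sequence_match('(?1)(?2)', date, number = 1, number = 5) FROM sequence_match_test3;
SELECT sequence_match('(?1)(?2)', date, number = 1, number = 5, number = 4) FROM sequence_match_test3;
SELECT sequence_match('(?1)(?2)', date, number = 1, number = 5, number = 6) FROM sequence_match_test3;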
keywords
SEQUENCE_MATCH
SEQUENCE-COUNT
SEQUENCE-COUNT
Description
Syntax
sequence_count(pattern, timestamp, cond1, cond2, ...);
Counts the number of event chains that matched the pattern. The function searches event chains that do not overlap. It
starts to search for the next chain after the current chain is matched.
WARNING!
Events that occur at the same second may lie in the sequence in an undefined order, affecting the result.
Arguments
pattern — Pattern string.
Pattern syntax
(?N) — Matches the condition argument at position N. Conditions are numbered in the [1, 32] range. For example, (?1)
matches the argument passed to the cond1 parameter.
.* — Matches any number of events. You do not need conditional arguments to count this element of the pattern.
(?t operator value) — Sets the time in seconds that should separate two events.
We define t as the difference in seconds between two times. For example, pattern (?1)(?t>1800)(?2) matches events that occur more than 1800 seconds from each other, and pattern (?1)(?t>10000)(?2) matches events that occur more than 10000 seconds from each other. An arbitrary number of any events can lie between these events. You can use the >= , > , < , <= , == operators.
timestamp — Column considered to contain time data. Typical data types are Date and DateTime . You can also use any of
the supported UInt data types.
cond1 , cond2 — Conditions that describe the chain of events. Data type: UInt8 . You can pass up to 32 condition arguments.
The function takes only the events described in these conditions into account. If the sequence contains data that isn’t
described in a condition, the function skips them.
Returned value
Number of non-overlapping event chains that are matched.
example
count examples
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test2(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 13:28:02', 2), (3, '2022-11-02 16:15:01', 1), (4, '2022-11-02 19:05:04', 2), (5, '2022-11-02 20:08:44', 3);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 13:28:02 | 2 |
| 3 | 2022-11-02 16:15:01 | 1 |
| 4 | 2022-11-02 19:05:04 | 2 |
| 5 | 2022-11-02 20:08:44 | 3 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 2 |
+----------------------------------------------------------------+
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| 2 |
+---------------------------------------------------------------------------+
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test1(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7), (3, '2022-11-02 16:15:01', 3), (4, '2022-11-02 19:05:04', 4), (5, '2022-11-02 21:24:12', 5);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 0 |
+----------------------------------------------------------------+
+------------------------------------------------------------------+
+------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------+
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------------+
special examples
DUPLICATE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO sequence_count_test3(uid, date, number) values (1, '2022-11-02 10:41:00', 1),
(2, '2022-11-02 11:41:00', 7), (3, '2022-11-02 16:15:01', 3), (4, '2022-11-02 19:05:04', 4), (5, '2022-11-02 21:24:12', 5);
+------+---------------------+--------+
+------+---------------------+--------+
| 1 | 2022-11-02 10:41:00 | 1 |
| 2 | 2022-11-02 11:41:00 | 7 |
| 3 | 2022-11-02 16:15:01 | 3 |
| 4 | 2022-11-02 19:05:04 | 4 |
| 5 | 2022-11-02 21:24:12 | 5 |
+------+---------------------+--------+
+----------------------------------------------------------------+
+----------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------+
This is a very simple example. The function found the event chain where number 5 follows number 1. It skipped number 7,3,4
between them, because the number is not described as an event. If we want to take this number into account when
searching for the event chain given in the example, we should make a condition for it.
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| 0 |
+------------------------------------------------------------------------------+
The result is kind of confusing. In this case, the function couldn’t find the event chain matching the pattern, because the
event for number 4 occurred between 1 and 5. If in the same case we checked the condition for number 6, the sequence
would count the pattern.
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| 1 |
+------------------------------------------------------------------------------+
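The queries for the three special-example results above are not shown. Based on the explanations, they presumably had the following form (a sketch; the exact conditions may differ):
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5) FROM sequence_count_test3;
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5, number = 4) FROM sequence_count_test3;
SELECT sequence_count('(?1)(?2)', date, number = 1, number = 5, number = 6) FROM sequence_count_test3;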
keywords
SEQUENCE_COUNT
GROUPING
GROUPING
Name
GROUPING
Description
Indicates whether a specified column expression in a GROUP BY list is aggregated or not. GROUPING returns 1 for aggregated
or 0 for not aggregated in the result set. GROUPING can be used only in the SELECT <select> list , HAVING , and ORDER BY
clauses when GROUP BY is specified.
GROUPING is used to distinguish the null values that are returned by ROLLUP , CUBE or GROUPING SETS from standard null
values. The NULL returned as the result of a ROLLUP , CUBE or GROUPING SETS operation is a special use of NULL . This acts as a
column placeholder in the result set and means all.
GROUPING( <column_expression> )
<column_expression>
Is a column or an expression that contains a column in a GROUP BY clause.
Example
The following example groups camp and aggregates occupation amounts in the database. The GROUPING function is applied
to the camp column.
role_id INT,
occupation VARCHAR(32),
camp VARCHAR(32),
register_time DATE
UNIQUE KEY(role_id)
PROPERTIES (
);
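The CREATE TABLE statement is only partially shown above; a minimal sketch that matches the listed columns (distribution and properties are assumptions):
CREATE TABLE roles (
    role_id INT,
    occupation VARCHAR(32),
    camp VARCHAR(32),
    register_time DATE
)
UNIQUE KEY(role_id)
DISTRIBUTED BY HASH(role_id) BUCKETS 1
PROPERTIES ("replication_num" = "1");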
SELECT
camp,
COUNT(occupation) AS 'occ_cnt',
GROUPING(camp) AS 'grouping'
FROM
`roles`
GROUP BY
ROLLUP(camp);
The result set shows two null values under camp. The first NULL is in the summary row added by the ROLLUP operation. The summary row shows the occupation counts for all camp groups and is indicated by 1 in the grouping column. The second NULL represents the group of null values from this column in the table.
+----------+---------+----------+
| camp     | occ_cnt | grouping |
+----------+---------+----------+
| NULL | 9 | 1 |
| NULL | 1 | 0 |
| alliance | 4 | 0 |
| horde | 4 | 0 |
+----------+---------+----------+
Keywords
GROUPING
Best Practice
See also GROUPING_ID
GROUPING_ID
GROUPING_ID
Name
GROUPING_ID
Description
Is a function that computes the level of grouping. GROUPING_ID can be used only in the SELECT <select> list , HAVING , or
ORDER BY clauses when GROUP BY is specified.
Syntax
Arguments
<column_expression>
Return Type
BIGINT
Remarks
The GROUPING_ID's <column_expression> must exactly match the expression in the GROUP BY list. For example, if you are
grouping by user_id , use GROUPING_ID (user_id) ; or if you are grouping by name , use GROUPING_ID (name) .
+--------------------+------------------------------------------------------------------------+-----------------------+
| Columns aggregated | GROUPING_ID (a, b, c) input = GROUPING(a) + GROUPING(b) + GROUPING(c) | GROUPING_ID () output |
+--------------------+------------------------------------------------------------------------+-----------------------+
| a                  | 100                                                                    | 4                     |
| b                  | 010                                                                    | 2                     |
| c                  | 001                                                                    | 1                     |
| ab                 | 110                                                                    | 6                     |
| ac                 | 101                                                                    | 5                     |
| bc                 | 011                                                                    | 3                     |
| abc                | 111                                                                    | 7                     |
+--------------------+------------------------------------------------------------------------+-----------------------+
GROUPING_ID() Equivalents
For a single grouping query, GROUPING (<column_expression>) is equivalent to GROUPING_ID(<column_expression>) , and both
return 0.
For example, the following statements are equivalent:
Statement A:
SELECT GROUPING_ID(A,B)
FROM T
GROUP BY CUBE(A,B)
Statement B:
UNION ALL
UNION ALL
UNION ALL
Example
Before starting our example, We first prepare the following data.
uid INT,
name VARCHAR(32),
level VARCHAR(32),
title VARCHAR(32),
department VARCHAR(32),
hiredate DATE
UNIQUE KEY(uid)
PROPERTIES (
"replication_num" = "1"
);
+------+----------+-----------+----------------------+--------------------+------------+
+------+----------+-----------+----------------------+--------------------+------------+
SELECT
department,
CASE
ELSE 'Unknown'
FROM employee
+--------------------+---------------------------+----------------+
+--------------------+---------------------------+----------------+
| Technology | Senior | 3 |
| Sales | Senior | 1 |
| Sales | Assistant | 2 |
| Sales | Trainee | 1 |
| Marketing | Senior | 1 |
| Marketing | Trainee | 2 |
| Marketing | Assistant | 1 |
+--------------------+---------------------------+----------------+
department,
CASE
ELSE 'Unknown'
COUNT(uid)
FROM employee
+--------------------+-----------+--------------+
+--------------------+-----------+--------------+
| Technology | Senior | 3 |
| Sales | Senior | 1 |
| Marketing | Senior | 1 |
+--------------------+-----------+--------------+
Keywords
GROUPING_ID
Best Practice
For more information, see also:
GROUPING
to_bitmap
to_bitmap
description
Syntax
BITMAP TO_BITMAP(expr)
Convert an unsigned bigint (ranging from 0 to 18446744073709551615) to a bitmap containing that value.
NULL will be returned when the input value is not in this range.
Mainly used to load integer values into a bitmap column.
example
mysql> select bitmap_count(to_bitmap(10));
+-----------------------------+
| bitmap_count(to_bitmap(10)) |
+-----------------------------+
| 1 |
+-----------------------------+
+---------------------------------+
| bitmap_to_string(to_bitmap(-1)) |
+---------------------------------+
| |
+---------------------------------+
keywords
TO_BITMAP,BITMAP
bitmap_hash
bitmap_hash
description
Syntax
BITMAP BITMAP_HASH(expr)
Computes the 32-bit hash value of an expr of any type, then returns a bitmap containing that hash value. Mainly used to load non-integer values into a bitmap column.
example
mysql> select bitmap_count(bitmap_hash('hello'));
+------------------------------------+
| bitmap_count(bitmap_hash('hello')) |
+------------------------------------+
| 1 |
+------------------------------------+
keywords
BITMAP_HASH,BITMAP
bitmap_from_string
bitmap_from_string
description
Syntax
BITMAP BITMAP_FROM_STRING(VARCHAR input)
Converts a string into a bitmap. The input string should be a comma-separated list of unsigned bigint values (each ranging from 0 to 18446744073709551615).
For example, the input string "0, 1, 2" will be converted to a bitmap with bits 0, 1 and 2 set.
If the input string is invalid, NULL is returned.
example
mysql> select bitmap_to_string(bitmap_empty());
+----------------------------------+
| bitmap_to_string(bitmap_empty()) |
+----------------------------------+
| |
+----------------------------------+
+-------------------------------------------------+
| bitmap_to_string(bitmap_from_string('0, 1, 2')) |
+-------------------------------------------------+
| 0,1,2 |
+-------------------------------------------------+
+-----------------------------------+
| bitmap_from_string('-1, 0, 1, 2') |
+-----------------------------------+
| NULL |
+-----------------------------------+
+--------------------------------------------------------------------+
| bitmap_to_string(bitmap_from_string('0, 1, 18446744073709551615')) |
+--------------------------------------------------------------------+
| 0,1,18446744073709551615 |
+--------------------------------------------------------------------+
keywords
BITMAP_FROM_STRING,BITMAP
bitmap_to_string
bitmap_to_string
description
Syntax
VARCHAR BITMAP_TO_STRING(BITMAP input)
Converts an input BITMAP to a string. The string is a comma-separated list containing all the bits set in the bitmap.
If the input is null, null is returned.
example
mysql> select bitmap_to_string(null);
+------------------------+
| bitmap_to_string(NULL) |
+------------------------+
| NULL |
+------------------------+
+----------------------------------+
| bitmap_to_string(bitmap_empty()) |
+----------------------------------+
| |
+----------------------------------+
+--------------------------------+
| bitmap_to_string(to_bitmap(1)) |
+--------------------------------+
| 1 |
+--------------------------------+
+---------------------------------------------------------+
| bitmap_to_string(bitmap_or(to_bitmap(1), to_bitmap(2))) |
+---------------------------------------------------------+
| 1,2 |
+---------------------------------------------------------+
keywords
BITMAP_TO_STRING,BITMAP
bitmap_to_array
bitmap_to_array
description
Syntax
ARRAY_BIGINT bitmap_to_array(BITMAP input)
example
mysql> select bitmap_to_array(null);
+------------------------+
| bitmap_to_array(NULL) |
+------------------------+
| NULL |
+------------------------+
+---------------------------------+
| bitmap_to_array(bitmap_empty()) |
+---------------------------------+
| [] |
+---------------------------------+
+-------------------------------+
| bitmap_to_array(to_bitmap(1)) |
+-------------------------------+
| [1] |
+-------------------------------+
+--------------------------------------------------+
| bitmap_to_array(bitmap_from_string('1,2,3,4,5')) |
+--------------------------------------------------+
| [1, 2, 3, 4, 5] |
+--------------------------------------------------+
keywords
BITMAP_TO_ARRAY,BITMAP
bitmap_from_array
bitmap_from_array
description
Syntax
BITMAP BITMAP_FROM_ARRAY(ARRAY input)
example
mysql> select *, bitmap_to_string(bitmap_from_array(c_array)) from array_test;
+------+-----------------------+------------------------------------------------+
| id | c_array | bitmap_to_string(bitmap_from_array(`c_array`)) |
+------+-----------------------+------------------------------------------------+
| 1 | [NULL] | NULL |
| 3 | [1, 2, 3, 4, 5, 6, 7] | 1,2,3,4,5,6,7 |
+------+-----------------------+------------------------------------------------+
keywords
BITMAP_FROM_ARRAY,BITMAP
bitmap_empty
bitmap_empty
description
Syntax
BITMAP BITMAP_EMPTY()
Returns an empty bitmap. Mainly used to supply a default value for a bitmap column when loading data.
example
mysql> select bitmap_count(bitmap_empty());
+------------------------------+
| bitmap_count(bitmap_empty()) |
+------------------------------+
| 0 |
+------------------------------+
keywords
BITMAP_EMPTY,BITMAP
bitmap_or
bitmap_or
description
Syntax
BITMAP BITMAP_OR(BITMAP lhs, BITMAP rhs, ...)
Computes the union of two or more input bitmaps and returns the new bitmap.
example
mysql> select bitmap_count(bitmap_or(to_bitmap(1), to_bitmap(2))) cnt;
+------+
| cnt |
+------+
| 2 |
+------+
+------+
| cnt |
+------+
| 1 |
+------+
+---------------------------------------------------------+
| bitmap_to_string(bitmap_or(to_bitmap(1), to_bitmap(2))) |
+---------------------------------------------------------+
| 1,2 |
+---------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------+
| NULL |
+--------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------+
| 0,1,2,10 |
+------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------+
| 1,2,3,4,5,10 |
+--------------------------------------------------------------------------------------------------------+
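The queries for the last three result sets above are not shown; sketches that are consistent with the outputs shown (the exact originals may differ):
mysql> select bitmap_to_string(bitmap_or(to_bitmap(1), to_bitmap(2), to_bitmap(10), to_bitmap(0), NULL));
mysql> select bitmap_to_string(bitmap_or(to_bitmap(1), to_bitmap(2), to_bitmap(10), to_bitmap(0), bitmap_empty()));
mysql> select bitmap_to_string(bitmap_or(to_bitmap(10), bitmap_from_string('1,2'), bitmap_from_string('1,2,3,4,5')));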
keywords
BITMAP_OR,BITMAP
bitmap_and
bitmap_and
description
Syntax
BITMAP BITMAP_AND(BITMAP lhs, BITMAP rhs, ...)
Compute intersection of two or more input bitmaps, return the new bitmap.
example
mysql> select bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))) cnt;
+------+
| cnt |
+------+
| 0 |
+------+
+------+
| cnt |
+------+
| 1 |
+------+
+----------------------------------------------------------+
| bitmap_to_string(bitmap_and(to_bitmap(1), to_bitmap(1))) |
+----------------------------------------------------------+
| 1 |
+----------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_and(bitmap_from_string('1,2,3'), bitmap_from_string('1,2'), bitmap_from_string('1,2,3,4,5')))                 |
+--------------------------------------------------------------------------------------------------------------------------------------+
| 1,2                                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_and(bitmap_from_string('1,2,3'), bitmap_from_string('1,2'), bitmap_from_string('1,2,3,4,5'), bitmap_empty())) |
+--------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_and(bitmap_from_string('1,2,3'), bitmap_from_string('1,2'), bitmap_from_string('1,2,3,4,5'), NULL))           |
+--------------------------------------------------------------------------------------------------------------------------------------+
| NULL                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------+
keywords
BITMAP_AND,BITMAP
bitmap_union
bitmap_union function
description
Aggregate function, used to calculate the grouped bitmap union. Common usage scenarios include calculating PV and UV.
Syntax
BITMAP BITMAP_UNION(BITMAP value)
Enter a set of bitmap values, find the union of this set of bitmap values, and return.
example
mysql> select page_id, bitmap_union(user_id) from table group by page_id;
Combined with the bitmap_count function, the UV of each web page can be obtained.
When the user_id field is of type int, the above query is semantically equivalent to a count distinct, as sketched below.
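Neither snippet referenced above is shown; minimal sketches (table and column names follow the example above):
mysql> select page_id, bitmap_count(bitmap_union(user_id)) from table group by page_id;
mysql> select page_id, count(distinct user_id) from table group by page_id;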
keywords
BITMAP_UNION, BITMAP
bitmap_xor
bitmap_xor
description
Syntax
BITMAP BITMAP_XOR(BITMAP lhs, BITMAP rhs, ...)
Computes the symmetric difference (XOR) of two or more input bitmaps and returns the new bitmap.
example
mysql> select bitmap_count(bitmap_xor(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'))) cnt;
+------+
| cnt |
+------+
| 2 |
+------+
+----------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_xor(bitmap_from_string('2,3'), bitmap_from_string('1,2,3,4'))) |
+----------------------------------------------------------------------------------------+
| 1,4 |
+----------------------------------------------------------------------------------------+
MySQL> select bitmap_to_string(bitmap_xor(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'),bitmap_from_string('3,4,5'),bitmap_empty()));
+--------------------------------------------------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_xor(bitmap_from_string('2,3'), bitmap_from_string('1,2,3,4'), bitmap_from_string('3,4,5'), bitmap_empty())) |
+--------------------------------------------------------------------------------------------------------------------------------+
| 1,3,5                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------+
MySQL> select bitmap_to_string(bitmap_xor(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'),bitmap_from_string('3,4,5'),NULL));
+--------------------------------------------------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_xor(bitmap_from_string('2,3'), bitmap_from_string('1,2,3,4'), bitmap_from_string('3,4,5'), NULL))       |
+--------------------------------------------------------------------------------------------------------------------------------+
| NULL                                                                                                                             |
+--------------------------------------------------------------------------------------------------------------------------------+
keywords
BITMAP_XOR,BITMAP
bitmap_not
bitmap_not
description
Syntax
BITMAP BITMAP_NOT(BITMAP lhs, BITMAP rhs)
Calculates the set difference of lhs minus rhs and returns the new bitmap.
example
mysql> select bitmap_count(bitmap_not(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'))) cnt;
+------+
| cnt |
+------+
| 0 |
+------+
+----------------------------------------------------------------------------------------+
| bitmap_to_string(bitmap_not(bitmap_from_string('2,3,5'), bitmap_from_string('1,2,3,4'))) |
+----------------------------------------------------------------------------------------+
| 5 |
+----------------------------------------------------------------------------------------+
keywords
BITMAP_NOT,BITMAP
bitmap_and_not
bitmap_and_not
description
Syntax
BITMAP BITMAP_AND_NOT(BITMAP lhs, BITMAP rhs)
Calculates the set of lhs minus the intersection of the two input bitmaps (the elements of lhs that are not in rhs) and returns the new bitmap.
example
mysql> select bitmap_count(bitmap_and_not(bitmap_from_string('1,2,3'),bitmap_from_string('3,4,5'))) cnt;
+------+
| cnt |
+------+
| 2 |
+------+
keywords
BITMAP_AND_NOT,BITMAP
bitmap_subset_limit
bitmap_subset_limit
Description
Syntax
BITMAP BITMAP_SUBSET_LIMIT(BITMAP src, BIGINT range_start, BIGINT cardinality_limit)
Creates a subset of the BITMAP, starting from the value range_start and containing at most cardinality_limit elements.
range_start: the start value of the range
cardinality_limit: the maximum number of elements in the subset
example
mysql> select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5'), 0, 3)) value;
+-----------+
| value |
+-----------+
| 1,2,3 |
+-----------+
+-------+
| value |
+-------+
| 4,5 |
+-------+
keywords
BITMAP_SUBSET_LIMIT,BITMAP_SUBSET,BITMAP
bitmap_subset_in_range
bitmap_subset_in_range
Description
Syntax
BITMAP BITMAP_SUBSET_IN_RANGE(BITMAP src, BIGINT range_start, BIGINT range_end)
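Returns the subset of src whose elements fall in the range [range_start, range_end); range_start is inclusive and range_end is exclusive.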
example
mysql> select bitmap_to_string(bitmap_subset_in_range(bitmap_from_string('1,2,3,4,5'), 0, 9)) value;
+-----------+
| value |
+-----------+
| 1,2,3,4,5 |
+-----------+
+-------+
| value |
+-------+
| 2 |
+-------+
keywords
BITMAP_SUBSET_IN_RANGE,BITMAP_SUBSET,BITMAP
sub_bitmap
sub_bitmap
description
Syntax
BITMAP SUB_BITMAP(BITMAP src, BIGINT offset, BIGINT cardinality_limit)
Starting from the position specified by offset, takes cardinality_limit elements of the bitmap and returns them as a bitmap subset.
example
mysql> select bitmap_to_string(sub_bitmap(bitmap_from_string('1,0,1,2,3,1,5'), 0, 3)) value;
+-------+
| value |
+-------+
| 0,1,2 |
+-------+
+-------+
| value |
+-------+
| 2,3 |
+-------+
+-------+
| value |
+-------+
| 2,3,5 |
+-------+
keywords
SUB_BITMAP,BITMAP_SUBSET,BITMAP
bitmap_count
bitmap_count
description
Syntax
BIGINT BITMAP_COUNT(BITMAP lhs)
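Returns the number of values set in the input bitmap. The queries for the results below are not shown; one call that produces a count of 1 (a sketch):
mysql> select bitmap_count(to_bitmap(10)) cnt;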
+------+
| cnt |
+------+
| 1 |
+------+
+------+
| cnt |
+------+
| 1 |
+------+
keywords
BITMAP_COUNT
bitmap_and_count
bitmap_and_count
description
Syntax
BigIntVal bitmap_and_count(BITMAP lhs, BITMAP rhs, ...)
Calculate the intersection of two or more input bitmaps and return the number of intersections.
example
MySQL> select bitmap_and_count(bitmap_from_string('1,2,3'),bitmap_empty());
+---------------------------------------------------------------+
| bitmap_and_count(bitmap_from_string('1,2,3'), bitmap_empty()) |
+---------------------------------------------------------------+
| 0 |
+---------------------------------------------------------------+
+----------------------------------------------------------------------------+
| bitmap_and_count(bitmap_from_string('1,2,3'), bitmap_from_string('1,2,3')) |
+----------------------------------------------------------------------------+
| 3 |
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
| bitmap_and_count(bitmap_from_string('1,2,3'), bitmap_from_string('3,4,5')) |
+----------------------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------------------+
+------+
| 2    |
+------+
+------+
| 0    |
+------+
+------+
| NULL |
+------+
keywords
BITMAP_AND_COUNT,BITMAP
bitmap_and_not_count
bitmap_and_not_count
description
Syntax
BITMAP BITMAP_AND_NOT_COUNT(BITMAP lhs, BITMAP rhs)
Calculate the set after lhs minus intersection of two input bitmaps, return the new bitmap size.
example
mysql> select bitmap_and_not_count(bitmap_from_string('1,2,3'),bitmap_from_string('3,4,5')) cnt;
+------+
| cnt |
+------+
| 2 |
+------+
keywords
BITMAP_AND_NOT_COUNT,BITMAP
orthogonal_bitmap_union_count
orthogonal_bitmap_union_count
description
Syntax
BITMAP ORTHOGONAL_BITMAP_UNION_COUNT(bitmap_column, column_to_filter, filter_values)
The bitmap union count function. Its syntax is the same as the original bitmap_union_count, but the implementation is different.
example
mysql> select orthogonal_bitmap_union_count(members) from tag_map where tag_group in ( 1150000, 1150001,
390006);
+------------------------------------------+
| orthogonal_bitmap_union_count(`members`) |
+------------------------------------------+
| 286957811 |
+------------------------------------------+
keywords
ORTHOGONAL_BITMAP_UNION_COUNT,BITMAP
bitmap_xor_count
bitmap_xor_count
description
Syntax
BIGINT BITMAP_XOR_COUNT(BITMAP lhs, BITMAP rhs, ...)
XOR two or more bitmap sets and return the size of the result set.
example
mysql> select bitmap_xor_count(bitmap_from_string('1,2,3'),bitmap_from_string('3,4,5'));
+----------------------------------------------------------------------------+
| bitmap_xor_count(bitmap_from_string('1,2,3'), bitmap_from_string('3,4,5')) |
+----------------------------------------------------------------------------+
| 4 |
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
| bitmap_xor_count(bitmap_from_string('1,2,3'), bitmap_from_string('1,2,3')) |
+----------------------------------------------------------------------------+
| 0 |
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
| bitmap_xor_count(bitmap_from_string('1,2,3'), bitmap_from_string('4,5,6')) |
+----------------------------------------------------------------------------+
| 6 |
+----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------+
| 3 |
+-----------------------------------------------------------------------------------------------------------+
MySQL> select (bitmap_xor_count(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'),bitmap_from_string('3,4,5'),bitmap_empty()));
+------+
| 3    |
+------+
MySQL> select
(bitmap_xor_count(bitmap_from_string('2,3'),bitmap_from_string('1,2,3,4'),bitmap_from_string('3,4,5'),NULL));
+-----------------------------------------------------------------------------------------------------------------+
| (bitmap_xor_count(bitmap_from_string('2,3'), bitmap_from_string('1,2,3,4'), bitmap_from_string('3,4,5'), NULL)) |
+-----------------------------------------------------------------------------------------------------------------+
| NULL |
+-----------------------------------------------------------------------------------------------------------------+
keywords
BITMAP_XOR_COUNT,BITMAP
bitmap_or_count
bitmap_or_count
description
Syntax
BigIntVal bitmap_or_count(BITMAP lhs, BITMAP rhs, ...)
Calculates the union of two or more input bitmaps and returns the size of the union set.
example
MySQL> select bitmap_or_count(bitmap_from_string('1,2,3'),bitmap_empty());
+--------------------------------------------------------------+
| bitmap_or_count(bitmap_from_string('1,2,3'), bitmap_empty()) |
+--------------------------------------------------------------+
| 3 |
+--------------------------------------------------------------+
+---------------------------------------------------------------------------+
| bitmap_or_count(bitmap_from_string('1,2,3'), bitmap_from_string('1,2,3')) |
+---------------------------------------------------------------------------+
| 3 |
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| bitmap_or_count(bitmap_from_string('1,2,3'), bitmap_from_string('3,4,5')) |
+---------------------------------------------------------------------------+
| 5 |
+---------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------+
+-----------------------------------------------------------------------------------------------------------+
| 6 |
+-----------------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------------------+
| NULL |
+-------------------------------------------------------------------------------------------------+
keywords
BITMAP_OR_COUNT,BITMAP
bitmap_contains
bitmap_contains
description
Syntax
BOOLEAN BITMAP_CONTAINS(BITMAP bitmap, BIGINT input)
Calculates whether the input value is in the Bitmap column and returns a Boolean value.
example
mysql> select bitmap_contains(to_bitmap(1),2) cnt;
+------+
| cnt |
+------+
| 0 |
+------+
+------+
| cnt |
+------+
| 1 |
+------+
keywords
BITMAP_CONTAINS,BITMAP
bitmap_has_all
bitmap_has_all
description
Syntax
BOOLEAN BITMAP_HAS_ALL(BITMAP lhs, BITMAP rhs)
Returns true if the first bitmap contains all the elements of the second bitmap.
Also returns true if the second bitmap is empty.
example
mysql> select bitmap_has_all(bitmap_from_string("0, 1, 2"), bitmap_from_string("1, 2")) cnt;
+------+
| cnt |
+------+
| 1 |
+------+
+------+
| cnt |
+------+
| 0 |
+------+
keywords
BITMAP_HAS_ALL,BITMAP
bitmap_has_any
bitmap_has_any
description
Syntax
BOOLEAN BITMAP_HAS_ANY(BITMAP lhs, BITMAP rhs)
Calculate whether there are intersecting elements in the two Bitmap columns. The return value is Boolean.
example
mysql> select bitmap_has_any(to_bitmap(1),to_bitmap(2)) cnt;
+------+
| cnt |
+------+
| 0 |
+------+
+------+
| cnt |
+------+
| 1 |
+------+
keywords
BITMAP_HAS_ANY,BITMAP
bitmap_max
bitmap_max
description
Syntax
BIGINT BITMAP_MAX(BITMAP input)
Returns the largest value in the bitmap, or NULL if the bitmap is empty.
example
mysql> select bitmap_max(bitmap_from_string('')) value;
+-------+
| value |
+-------+
| NULL |
+-------+
+------------+
| value |
+------------+
| 9999999999 |
+------------+
keywords
BITMAP_MAX,BITMAP
bitmap_min
bitmap_min
description
Syntax
BIGINT BITMAP_MIN(BITMAP input)
Returns the smallest value in the bitmap, or NULL if the bitmap is empty.
example
mysql> select bitmap_min(bitmap_from_string('')) value;
+-------+
| value |
+-------+
| NULL |
+-------+
+-------+
| value |
+-------+
| 1 |
+-------+
keywords
BITMAP_MIN,BITMAP
intersect_count
intersect_count
description
Syntax
BITMAP INTERSECT_COUNT(bitmap_column, column_to_filter, filter_values)
Calculate the intersection of two or more
bitmaps
Usage: intersect_count(bitmap_column_to_count, filter_column, filter_values ...)
Example: intersect_count(user_id,
event, 'A', 'B', 'C'), meaning find the intersect count of user_id in all A/B/C 3 bitmaps
example
MySQL [test_query_qa]> select dt,bitmap_to_string(user_id) from pv_bitmap where dt in (3,4);
+------+-----------------------------+
| dt | bitmap_to_string(`user_id`) |
+------+-----------------------------+
| 4 | 1,2,3 |
| 3 | 1,2,3,4,5 |
+------+-----------------------------+
MySQL [test_query_qa]> select intersect_count(user_id, dt, 3, 4) from pv_bitmap where dt in (3, 4);
+----------------------------------------+
| intersect_count(`user_id`, `dt`, 3, 4) |
+----------------------------------------+
| 3 |
+----------------------------------------+
keywords
INTERSECT_COUNT,BITMAP
bitmap_intersect
bitmap_intersect
description
Aggregation function, used to calculate the bitmap intersection after grouping. Common usage scenarios such as:
calculating user retention rate.
Syntax
BITMAP BITMAP_INTERSECT(BITMAP value)
Enter a set of bitmap values, find the intersection of the set of bitmap values, and return.
example
Table schema
KeysType: AGG_KEY
Columns: tag varchar, date datetime, user_id bitmap bitmap_union
Find the retention of users between 2020-05-18 and 2020-05-19 under different tags.
mysql> select tag, bitmap_intersect(user_id) from (select tag, date, bitmap_union(user_id) user_id from table
where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag;
Used in combination with the bitmap_to_string function to obtain the specific data of the intersection
Who are the users retained under different tags between 2020-05-18 and 2020-05-19?
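The corresponding statement was lost in compression; a sketch of it, reusing the table and column names from the query above, would be:
mysql> select tag, bitmap_to_string(bitmap_intersect(user_id)) from (select tag, date, bitmap_union(user_id) user_id from table where date in ('2020-05-18', '2020-05-19') group by tag, date) a group by tag;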
keywords
BITMAP_INTERSECT, BITMAP
orthogonal_bitmap_intersect
orthogonal_bitmap_intersect
description
Syntax
BITMAP ORTHOGONAL_BITMAP_INTERSECT(bitmap_column, column_to_filter, filter_values)
The bitmap intersection function. The first parameter is the bitmap column, the second parameter is the dimension
column used for filtering, and the remaining variable-length parameters are the different values of the filter dimension
column whose bitmaps are intersected.
example
mysql> select orthogonal_bitmap_intersect(members, tag_group, 1150000, 1150001, 390006) from tag_map where
tag_group in ( 1150000, 1150001, 390006);
+-------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------+
| NULL |
+-------------------------------------------------------------------------------+
keywords
ORTHOGONAL_BITMAP_INTERSECT,BITMAP
orthogonal_bitmap_intersect_count
orthogonal_bitmap_intersect_count
description
Syntax
BITMAP ORTHOGONAL_BITMAP_INTERSECT_COUNT(bitmap_column, column_to_filter, filter_values)
The bitmap intersection count function. The first parameter is the bitmap column, the second parameter is the dimension
column used for filtering, and the remaining variable-length parameters are the different values of the filter dimension
column whose bitmaps are intersected.
example
mysql> select orthogonal_bitmap_intersect_count(members, tag_group, 1150000, 1150001, 390006) from tag_map where
tag_group in ( 1150000, 1150001, 390006);
+-------------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------------------+
keywords
ORTHOGONAL_BITMAP_INTERSECT_COUNT,BITMAP
bitmap_hash64
bitmap_hash64
description
Syntax
BITMAP BITMAP_HASH64(expr)
Compute the 64-bit hash value of an expression of any type, then return a bitmap containing that hash value. It is mainly
used to load non-integer values (such as strings) into a bitmap column.
example
mysql> select bitmap_count(bitmap_hash64('hello'));
+------------------------------------+
| bitmap_count(bitmap_hash64('hello')) |
+------------------------------------+
| 1 |
+------------------------------------+
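A typical use is to build bitmap columns from non-integer identifiers during an aggregation; a minimal sketch (the table and column names user_tags, tag, users, raw_events and device_id are hypothetical):
mysql> insert into user_tags (tag, users) select tag, bitmap_union(bitmap_hash64(device_id)) from raw_events group by tag;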
keywords
BITMAP_HASH64,BITMAP
bitand
bitand
description
Syntax
BITAND(Integer-type lhs, Integer-type rhs)
Returns the result of the bitwise AND of lhs and rhs.
Integer types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT
example
mysql> select bitand(3,5) ans;
+------+
| ans |
+------+
| 1 |
+------+
+------+
| ans |
+------+
| 4 |
+------+
keywords
BITAND
bitor
bitor
description
Syntax
BITOR(Integer-type lhs, Integer-type rhs)
Returns the result of the bitwise OR of lhs and rhs.
Integer types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT
example
mysql> select bitor(3,5) ans;
+------+
| ans |
+------+
| 7 |
+------+
+------+
| ans |
+------+
| 7 |
+------+
keywords
BITOR
bitxor
bitxor
description
Syntax
BITXOR(Integer-type lhs, Integer-type rhs)
Returns the result of the bitwise XOR of lhs and rhs.
Integer types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT
example
mysql> select bitxor(3,5) ans;
+------+
| ans |
+------+
| 7 |
+------+
+------+
| ans |
+------+
| 6 |
+------+
keywords
BITXOR
bitnot
bitnot
description
Syntax
BITNOT(Integer-type value)
Returns the result of the bitwise NOT of value.
Integer types: TINYINT, SMALLINT, INT, BIGINT, LARGEINT
example
mysql> select bitnot(7) ans;
+------+
| ans |
+------+
| -8 |
+------+
+------+
| ans |
+------+
| 126 |
+------+
keywords
BITNOT
case
case
description
Syntax
CASE expression
    WHEN condition1 THEN result1
    [WHEN condition2 THEN result2]
    ...
    [ELSE result]
END
OR
CASE WHEN condition1 THEN result1
    [WHEN condition2 THEN result2]
    ...
    [ELSE result]
END
Compare the expression with multiple possible values, and return the corresponding results when matching
example
mysql> select user_id, case user_id when 1 then 'user_id = 1' when 2 then 'user_id = 2' else 'user_id not exist'
end test_case from test;
+---------+-------------+
| user_id | test_case |
+---------+-------------+
| 1 | user_id = 1 |
| 2 | user_id = 2 |
+---------+-------------+
mysql> select user_id, case when user_id = 1 then 'user_id = 1' when user_id = 2 then 'user_id = 2' else 'user_id
not exist' end test_case from test;
+---------+-------------+
| user_id | test_case |
+---------+-------------+
| 1 | user_id = 1 |
| 2 | user_id = 2 |
+---------+-------------+
keywords
CASE
coalesce
coalesce
description
Syntax
coalesce(expr1, expr2, ...., expr_n)
Returns the first non-NULL expression in the list (from left to right), or NULL if all expressions are NULL.
example
mysql> select coalesce(NULL, '1111', '0000');
+--------------------------------+
+--------------------------------+
| 1111 |
+--------------------------------+
keywords
COALESCE
if
if
description
Syntax
if(boolean condition, type valueTrue, type valueFalseOrNull)
If condition is true, valueTrue is returned; otherwise valueFalseOrNull is returned.
The return type is the type of the result of the valueTrue/valueFalseOrNull expression
example
mysql> select user_id, if(user_id = 1, "true", "false") test_if from test;
+---------+---------+
| user_id | test_if |
+---------+---------+
| 1 | true |
| 2 | false |
+---------+---------+
keywords
IF
ifnull
ifnull
description
Syntax
ifnull(expr1, expr2)
If the value of expr1 is not null, expr1 is returned, otherwise expr2 is returned
example
mysql> select ifnull(1,0);
+--------------+
| ifnull(1, 0) |
+--------------+
| 1 |
+--------------+
+------------------+
| ifnull(NULL, 10) |
+------------------+
| 10 |
+------------------+
keywords
IFNULL
nvl
nvl
Since Version 1.2.0
nvl
description
Syntax
nvl(expr1, expr2)
If the value of expr1 is not null, expr1 is returned, otherwise expr2 is returned
example
mysql> select nvl(1,0);
+--------------+
| nvl(1, 0) |
+--------------+
| 1 |
+--------------+
+------------------+
| nvl(NULL, 10) |
+------------------+
| 10 |
+------------------+
keywords
NVL
nullif
nullif
description
Syntax
nullif(expr1, expr2)
If the two parameters are equal, NULL is returned. Otherwise, the value of the first parameter is returned. It has the same effect
as the following CASE WHEN expression:
CASE
    WHEN expr1 = expr2 THEN NULL
    ELSE expr1
END
example
mysql> select nullif(1,1);
+--------------+
| nullif(1, 1) |
+--------------+
| NULL |
+--------------+
+--------------+
| nullif(1, 0) |
+--------------+
| 1 |
+--------------+
keywords
NULLIF
jsonb_parse
jsonb_parse
description
jsonb_parse functions parse a JSON string into the binary JSONB format. A series of functions are provided to satisfy
different demands for exception handling.
Syntax
JSONB jsonb_parse(VARCHAR json_str)
JSONB jsonb_parse_error_to_null(VARCHAR json_str)
JSONB jsonb_parse_error_to_value(VARCHAR json_str, VARCHAR default_json_str)
example
1. parse valid JSON string
+--------------------------------------+
| jsonb_parse('{"k1":"v31","k2":300}') |
+--------------------------------------+
| {"k1":"v31","k2":300} |
+--------------------------------------+
ERROR 1105 (HY000): errCode = 2, detailMessage = json parse error: Invalid document: document must be an object
or an array for value: invalid json
+-------------------------------------------+
| jsonb_parse_error_to_null('invalid json') |
+-------------------------------------------+
| NULL |
+-------------------------------------------+
mysql> select jsonb_parse_error_to_value('invalid json', '{}');
+--------------------------------------------------+
| jsonb_parse_error_to_value('invalid json', '{}') |
+--------------------------------------------------+
| {}                                               |
+--------------------------------------------------+
keywords
JSONB, JSON, jsonb_parse, jsonb_parse_error_to_null, jsonb_parse_error_to_value
jsonb_extract
jsonb_extract
Since Version 1.2.0
jsonb_extract
description
jsonb_extract functions extract the field specified by json_path from a JSONB value. A series of functions are provided for
different datatypes.
Syntax
JSONB jsonb_extract(JSONB j, VARCHAR json_path)
description
There are two extra functions to check field existence and type
jsonb_exists_path checks the existence of the field specified by json_path, returning TRUE or FALSE
jsonb_type returns the type (one of the following) of the field specified by json_path, or NULL if it does not exist
object
array
null
bool
int
bigint
double
string
example
refer to jsonb tutorial for more.
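As a minimal illustration of the extract family (the JSON literal below is an assumption, not taken from the tutorial):
mysql> select jsonb_extract_string(jsonb_parse('{"id": 123, "name": "doris"}'), '$.name');
This returns doris, and jsonb_extract_int with path '$.id' on the same document returns 123.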
keywords
JSONB, JSON, jsonb_extract, jsonb_extract_isnull, jsonb_extract_bool, jsonb_extract_int, jsonb_extract_bigint,
jsonb_extract_double, jsonb_extract_string, jsonb_exists_path, jsonb_type
get_json_double
get_json_double
description
Syntax
DOUBLE get_json_double(VARCHAR json_str, VARCHAR json_path)
Parse and get the floating-point content of the specified path in the JSON string.
json_path must start with the $ symbol and use . as the path separator. If a path element itself contains a ., it can be
surrounded by double quotation marks.
Use [ ] to denote array subscripts, starting at 0.
The content of the path cannot contain ", [ or ].
If the format of json_str or json_path is incorrect, or no match can be found, NULL is returned.
In addition, it is recommended to use the JSONB type and the jsonb_extract_XXX functions, which perform the same function.
example
1. Get the value of key as "k1"
+-------------------------------------------------+
+-------------------------------------------------+
| 1.3 |
+-------------------------------------------------+
2. Get the second element of the array whose key is "my. key"
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| 2.2 |
+---------------------------------------------------------------------------+
3. Get the first element in an array whose secondary path is k1. key - > K2
+---------------------------------------------------------------------+
+---------------------------------------------------------------------+
| 1.1 |
+---------------------------------------------------------------------+
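The statements producing the results above were lost in compression; an illustrative complete call (the JSON literal is an assumption) is:
mysql> SELECT get_json_double('{"k1":1.3, "k2":"2"}', '$.k1');
which returns 1.3, matching result 1 above.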
keywords
GET_JSON_DOUBLE,GET, JSON,DOUBLE
get_json_int
get_json_int
Description
Syntax
INT get_json_int(VARCHAR json_str, VARCHAR json_path)
Parse and retrieve the integer content of the specified path in the JSON string.
json_path must start with the $ symbol and use . as the path separator. If a path element itself contains a ., it can be
surrounded by double quotation marks.
Use [ ] to denote array subscripts, starting at 0.
The content of the path cannot contain ", [ or ].
If the format of json_str or json_path is incorrect, or no match can be found, NULL is returned.
In addition, it is recommended to use the JSONB type and the jsonb_extract_XXX functions, which perform the same function.
example
1. Get the value of key as "k1"
+--------------------------------------------+
+--------------------------------------------+
| 1 |
+--------------------------------------------+
2. Get the second element of the array whose key is "my. key"
+------------------------------------------------------------------+
+------------------------------------------------------------------+
| 2 |
+------------------------------------------------------------------+
3. Get the first element in an array whose secondary path is k1. key - > K2
+--------------------------------------------------------------+
+--------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------+
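As with get_json_double, the original statements were lost; an illustrative call (the JSON literal is an assumption) is:
mysql> SELECT get_json_int('{"k1":1, "k2":"2"}', '$.k1');
which returns 1, matching result 1 above.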
keywords
GET_JSON_INT,GET, JSON,INT
get_json_string
get_json_string
description
Syntax
VARCHAR get_json_string(VARCHAR json_str, VARCHAR json_path)
Parse and retrieve the string content of the specified path in the JSON string.
json_path must start with the $ symbol and use . as the path separator. If a path element itself contains a ., it can be
surrounded by double quotation marks.
Use [ ] to denote array subscripts, starting at 0.
The content of the path cannot contain ", [ or ].
If the format of json_str or json_path is incorrect, or no match can be found, NULL is returned.
In addition, it is recommended to use the JSONB type and the jsonb_extract_XXX functions, which perform the same function.
example
1. Get the value of key as "k1"
+---------------------------------------------------+
+---------------------------------------------------+
| v1 |
+---------------------------------------------------+
2. Get the second element of the array whose key is "my. key"
+------------------------------------------------------------------------------+
+------------------------------------------------------------------------------+
| e2 |
+------------------------------------------------------------------------------+
3. Get the first element in an array whose secondary path is k1. key - > K2
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| v1 |
+-----------------------------------------------------------------------+
4. Get all the values in the array where the key is "k1"
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
| ["v1","v3","v4"] |
+---------------------------------------------------------------------------------+
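An illustrative call (the JSON literal is an assumption, since the original statements were lost) is:
mysql> SELECT get_json_string('{"k1":"v1", "k2":"v2"}', '$.k1');
which returns v1, matching result 1 above.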
keywords
GET_JSON_STRING,GET, JSON,STRING
json_array
json_array
Description
Syntax
VARCHAR json_array(VARCHAR,...)
Generate a JSON array containing the specified values; return an empty array if no values are given.
example
MySQL> select json_array();
+--------------+
| json_array() |
+--------------+
| [] |
+--------------+
+--------------------+
| json_array('NULL') |
+--------------------+
| [NULL] |
+--------------------+
keywords
json,array,json_array
json_object
json_object
Description
Syntax
VARCHAR json_object(VARCHAR,...)
Generate a JSON object containing the specified key-value pairs; return an empty object if no arguments are given.
example
MySQL> select json_object();
+---------------+
| json_object() |
+---------------+
| {} |
+---------------+
+--------------------------------+
| json_object('time', curtime()) |
+--------------------------------+
| {"time": "10:49:18"} |
+--------------------------------+
+---------------------------------+
| json_object('username', 'NULL') |
+---------------------------------+
| {"username": NULL} |
+---------------------------------+
keywords
json,object,json_object
json_quote
json_quote
Description
Syntax
VARCHAR json_quote(VARCHAR)
Enclose the string in double quotes as a JSON value, escaping any special characters it contains.
example
MySQL> SELECT json_quote('null'), json_quote('"null"');
+--------------------+----------------------+
| json_quote('null') | json_quote('"null"') |
+--------------------+----------------------+
| "null" | "\"null\"" |
+--------------------+----------------------+
+-------------------------+
| json_quote('[1, 2, 3]') |
+-------------------------+
| "[1, 2, 3]" |
+-------------------------+
+------------------+
| json_quote(null) |
+------------------+
| NULL |
+------------------+
+------------------------+
| json_quote('\n\b\r\t') |
+------------------------+
| "\n\b\r\t" |
+------------------------+
keywords
json,quote,json_quote
json_valid
json_valid
description
The json_valid function returns 0 or 1 to indicate whether a value is valid JSON, and returns NULL if the argument is NULL.
Syntax
JSONB json_valid(VARCHAR json_str)
example
1. parse valid JSON string
+-------------------------------------+
| json_valid('{"k1":"v31","k2":300}') |
+-------------------------------------+
| 1 |
+-------------------------------------+
2. parse invalid JSON string
+----------------------------+
| json_valid('invalid json') |
+----------------------------+
| 0 |
+----------------------------+
3. parse NULL
+------------------+
| json_valid(NULL) |
+------------------+
| NULL |
+------------------+
keywords
JSON, VALID, JSON_VALID
murmur_hash3_32
murmur_hash3_32
description
Syntax
INT MURMUR_HASH3_32(VARCHAR input, ...)
example
mysql> select murmur_hash3_32(null);
+-----------------------+
| murmur_hash3_32(NULL) |
+-----------------------+
| NULL |
+-----------------------+
+--------------------------+
| murmur_hash3_32('hello') |
+--------------------------+
| 1321743225 |
+--------------------------+
+-----------------------------------+
| murmur_hash3_32('hello', 'world') |
+-----------------------------------+
| 984713481 |
+-----------------------------------+
keywords
MURMUR_HASH3_32,HASH
murmur_hash3_64
murmur_hash3_64
description
Syntax
BIGINT MURMUR_HASH3_64(VARCHAR input, ...)
example
mysql> select murmur_hash3_64(null);
+-----------------------+
| murmur_hash3_64(NULL) |
+-----------------------+
| NULL |
+-----------------------+
+--------------------------+
| murmur_hash3_64('hello') |
+--------------------------+
| -3215607508166160593 |
+--------------------------+
+-----------------------------------+
| murmur_hash3_64('hello', 'world') |
+-----------------------------------+
| 3583109472027628045 |
+-----------------------------------+
keywords
MURMUR_HASH3_64,HASH
HLL_CARDINALITY
HLL_CARDINALITY
description
Syntax
HLL_CARDINALITY(hll)
example
MySQL > select HLL_CARDINALITY(uv_set) from test_uv;
+---------------------------+
| hll_cardinality(`uv_set`) |
+---------------------------+
| 3 |
+---------------------------+
keywords
HLL,HLL_CARDINALITY
HLL_EMPTY
HLL_EMPTY
description
Syntax
HLL_EMPTY(value)
example
MySQL > select hll_cardinality(hll_empty());
+------------------------------+
| hll_cardinality(hll_empty()) |
+------------------------------+
| 0 |
+------------------------------+
keywords
HLL,HLL_EMPTY
HLL_HASH
HLL_HASH
description
Syntax
HLL_HASH(value)
example
MySQL > select HLL_CARDINALITY(HLL_HASH('abc'));
+----------------------------------+
| hll_cardinality(HLL_HASH('abc')) |
+----------------------------------+
| 1 |
+----------------------------------+
keywords
HLL,HLL_HASH
conv
conv
description
Syntax
VARCHAR CONV(VARCHAR input, TINYINT from_base, TINYINT to_base)
VARCHAR CONV(BIGINT input, TINYINT from_base,
TINYINT to_base)
Convert the input number to the target base. The input base range should be within [2,36] .
example
MySQL [test]> SELECT CONV(15,10,2);
+-----------------+
| conv(15, 10, 2) |
+-----------------+
| 1111 |
+-----------------+
MySQL [test]> SELECT CONV('ff',16,10);
+--------------------+
| conv('ff', 16, 10) |
+--------------------+
| 255                |
+--------------------+
MySQL [test]> SELECT CONV(230,10,16);
+-------------------+
| conv(230, 10, 16) |
+-------------------+
| E6                |
+-------------------+
keywords
CONV
bin
bin
description
Syntax
STRING bin(BIGINT x)
Convert the decimal number x to binary.
example
mysql> select bin(0);
+--------+
| bin(0) |
+--------+
| 0 |
+--------+
+---------+
| bin(10) |
+---------+
| 1010 |
+---------+
+------------------------------------------------------------------+
| bin(-3) |
+------------------------------------------------------------------+
| 1111111111111111111111111111111111111111111111111111111111111101 |
+------------------------------------------------------------------+
keywords
BIN
sin
sin
description
Syntax
DOUBLE sin(DOUBLE x)
Returns the sine of x , where x is in radians
example
mysql> select sin(0);
+----------+
| sin(0.0) |
+----------+
| 0 |
+----------+
+--------------------+
| sin(1.0) |
+--------------------+
| 0.8414709848078965 |
+--------------------+
+-----------------+
| sin(0.5 * pi()) |
+-----------------+
| 1 |
+-----------------+
keywords
SIN
cos
cos
description
Syntax
DOUBLE cos(DOUBLE x)
Returns the cosine of x , where x is in radians
example
mysql> select cos(1);
+---------------------+
| cos(1.0) |
+---------------------+
| 0.54030230586813977 |
+---------------------+
+----------+
| cos(0.0) |
+----------+
| 1 |
+----------+
+-----------+
| cos(pi()) |
+-----------+
| -1 |
+-----------+
keywords
COS
tan
tan
description
Syntax
DOUBLE tan(DOUBLE x)
Returns the tangent of x , where x is in radians.
example
mysql> select tan(0);
+----------+
| tan(0.0) |
+----------+
| 0 |
+----------+
+--------------------+
| tan(1.0) |
+--------------------+
| 1.5574077246549023 |
+--------------------+
keywords
TAN
asin
asin
description
Syntax
DOUBLE asin(DOUBLE x)
Returns the arc sine of x , or nan if x is not in the range -1 to 1 .
example
mysql> select asin(0.5);
+---------------------+
| asin(0.5) |
+---------------------+
| 0.52359877559829893 |
+---------------------+
+-----------+
| asin(2.0) |
+-----------+
| nan |
+-----------+
keywords
ASIN
acos
acos
description
Syntax
DOUBLE acos(DOUBLE x)
Returns the arc cosine of x , or nan if x is not in the range -1 to 1 .
example
mysql> select acos(1);
+-----------+
| acos(1.0) |
+-----------+
| 0 |
+-----------+
+--------------------+
| acos(0.0) |
+--------------------+
| 1.5707963267948966 |
+--------------------+
+------------+
| acos(-2.0) |
+------------+
| nan |
+------------+
keywords
ACOS
atan
atan
description
Syntax
DOUBLE atan(DOUBLE x)
Returns the arctangent of x ; the result is in radians.
example
mysql> select atan(0);
+-----------+
| atan(0.0) |
+-----------+
| 0 |
+-----------+
+--------------------+
| atan(2.0) |
+--------------------+
| 1.1071487177940904 |
+--------------------+
keywords
ATAN
e
e
description
Syntax
DOUBLE e()
Returns the constant e value.
example
mysql> select e();
+--------------------+
| e() |
+--------------------+
| 2.7182818284590451 |
+--------------------+
keywords
E
Pi
Pi
description
Syntax
DOUBLE Pi()
Returns the constant Pi value.
example
mysql> select Pi();
+--------------------+
| pi() |
+--------------------+
| 3.1415926535897931 |
+--------------------+
keywords
PI
exp
exp
description
Syntax
DOUBLE exp(DOUBLE x)
Returns e raised to the power of x .
example
mysql> select exp(2);
+------------------+
| exp(2.0) |
+------------------+
| 7.38905609893065 |
+------------------+
+--------------------+
| exp(3.4) |
+--------------------+
| 29.964100047397011 |
+--------------------+
keywords
EXP
log
log
description
Syntax
DOUBLE log(DOUBLE b, DOUBLE x)
Returns the logarithm of x to base b .
example
mysql> select log(5,1);
+---------------+
| log(5.0, 1.0) |
+---------------+
| 0 |
+---------------+
+--------------------+
| log(3.0, 20.0)     |
+--------------------+
| 2.7268330278608417 |
+--------------------+
+-------------------+
| log(2.0, 65536.0) |
+-------------------+
| 16 |
+-------------------+
keywords
LOG
log2
log2
description
Syntax
DOUBLE log2(DOUBLE x)
Returns the logarithm of x to base 2 .
example
mysql> select log2(1);
+-----------+
| log2(1.0) |
+-----------+
| 0 |
+-----------+
+-----------+
| log2(2.0) |
+-----------+
| 1 |
+-----------+
+--------------------+
| log2(10.0) |
+--------------------+
| 3.3219280948873622 |
+--------------------+
keywords
LOG2
ln
ln
description
Syntax
DOUBLE ln(DOUBLE x)
Returns the natural logarithm of x (base e ).
example
mysql> select ln(1);
+---------+
| ln(1.0) |
+---------+
| 0 |
+---------+
+---------+
| ln(e()) |
+---------+
| 1 |
+---------+
+--------------------+
| ln(10.0) |
+--------------------+
| 2.3025850929940459 |
+--------------------+
keywords
LN
log10
log10
description
Syntax
DOUBLE log10(DOUBLE x)
Returns the logarithm of x to base 10 .
example
mysql> select log10(1);
+------------+
| log10(1.0) |
+------------+
| 0 |
+------------+
+-------------+
| log10(10.0) |
+-------------+
| 1 |
+-------------+
+--------------------+
| log10(16.0)        |
+--------------------+
| 1.2041199826559248 |
+--------------------+
keywords
LOG10
ceil
ceil
description
Syntax
BIGINT ceil(DOUBLE x)
Returns the smallest integer value greater than or equal to x .
example
mysql> select ceil(1);
+-----------+
| ceil(1.0) |
+-----------+
| 1 |
+-----------+
+-----------+
| ceil(2.4) |
+-----------+
| 3 |
+-----------+
+-------------+
| ceil(-10.3) |
+-------------+
| -10 |
+-------------+
keywords
CEIL
floor
floor
description
Syntax
BIGINT floor(DOUBLE x)
Returns the largest integer value less than or equal to x .
example
mysql> select floor(1);
+------------+
| floor(1.0) |
+------------+
| 1 |
+------------+
+------------+
| floor(2.4) |
+------------+
| 2 |
+------------+
+--------------+
| floor(-10.3) |
+--------------+
| -11 |
+--------------+
keywords
FLOOR
pmod
pmod
description
Syntax
BIGINT PMOD(BIGINT x, BIGINT y)
DOUBLE PMOD(DOUBLE x, DOUBLE y)
Returns the positive result of x mod y in the residue
systems.
Formally, return (x%y+y)%y .
example
MySQL [test]> SELECT PMOD(13,5);
+-------------+
| pmod(13, 5) |
+-------------+
| 3 |
+-------------+
+-------------+
| pmod(-13, 5) |
+-------------+
| 2 |
+-------------+
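For instance, the second result above follows directly from the formula: pmod(-13, 5) = ((-13 % 5) + 5) % 5 = (-3 + 5) % 5 = 2.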
keywords
PMOD
round
round
description
Syntax
round(x), round(x, d)
Rounds the argument x to d decimal places. d defaults to 0 if not specified. If d is negative, the d digits to the left of
the decimal point are set to 0. If x or d is NULL, NULL is returned.
example
mysql> select round(2.4);
+------------+
| round(2.4) |
+------------+
| 2 |
+------------+
+------------+
| round(2.5) |
+------------+
| 3 |
+------------+
+-------------+
| round(-3.4) |
+-------------+
| -3 |
+-------------+
+-------------+
| round(-3.5) |
+-------------+
| -4 |
+-------------+
+---------------------+
| round(1667.2725, 2) |
+---------------------+
| 1667.27 |
+---------------------+
+----------------------+
| round(1667.2725, -2) |
+----------------------+
| 1700 |
+----------------------+
keywords
ROUND
round_bankers
round_bankers
description
Syntax
round_bankers(x), round_bankers(x, d)
Rounds the argument x to d decimal places. d defaults to 0 if not
specified. If d is negative, the d digits to the left of the decimal point are set to 0. If x or d is NULL, NULL is returned.
If the rounding number is halfway between two numbers, the function uses banker’s rounding.
In other cases, the function rounds numbers to the nearest integer.
example
mysql> select round_bankers(0.4);
+--------------------+
| round_bankers(0.4) |
+--------------------+
| 0 |
+--------------------+
+---------------------+
| round_bankers(-3.5) |
+---------------------+
| -4 |
+---------------------+
+---------------------+
| round_bankers(-3.4) |
+---------------------+
| -3 |
+---------------------+
+--------------------------+
| round_bankers(10.755, 2) |
+--------------------------+
| 10.76 |
+--------------------------+
+-----------------------------+
| round_bankers(1667.2725, 2) |
+-----------------------------+
| 1667.27 |
+-----------------------------+
+------------------------------+
| round_bankers(1667.2725, -2) |
+------------------------------+
| 1700 |
+------------------------------+
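As a further illustration of the half-to-even rule (not taken from the original examples): round_bankers(2.5) returns 2 and round_bankers(3.5) returns 4, because exact halves are rounded to the nearest even integer.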
keywords
round_bankers
truncate
truncate
description
Syntax
DOUBLE truncate(DOUBLE x, INT d)
Numerically truncate x according to the number of decimal places d .
example
mysql> select truncate(124.3867, 2);
+-----------------------+
| truncate(124.3867, 2) |
+-----------------------+
| 124.38 |
+-----------------------+
+-----------------------+
| truncate(124.3867, 0) |
+-----------------------+
| 124 |
+-----------------------+
+-------------------------+
| truncate(-124.3867, -2) |
+-------------------------+
| -100 |
+-------------------------+
keywords
TRUNCATE
abs
abs
description
Syntax
DOUBLE abs(DOUBLE x) SMALLINT abs(TINYINT x) INT abs(SMALLINT x)
example
mysql> select abs(-2);
+---------+
| abs(-2) |
+---------+
| 2 |
+---------+
+------------------+
| abs(3.254655654) |
+------------------+
| 3.254655654 |
+------------------+
+---------------------------------+
| abs(-3254654236547654354654767) |
+---------------------------------+
| 3254654236547654354654767 |
+---------------------------------+
keywords
ABS
sqrt
sqrt
description
Syntax
DOUBLE sqrt(DOUBLE x)
Returns the square root of x . x is required to be greater than or equal to 0 .
example
mysql> select sqrt(9);
+-----------+
| sqrt(9.0) |
+-----------+
| 3 |
+-----------+
+--------------------+
| sqrt(2.0) |
+--------------------+
| 1.4142135623730951 |
+--------------------+
+-------------+
| sqrt(100.0) |
+-------------+
| 10 |
+-------------+
keywords
SQRT
cbrt
cbrt
description
Syntax
DOUBLE cbrt(DOUBLE x)
Returns the cube root of x.
example
mysql> select cbrt(8);
+-----------+
| cbrt(8.0) |
+-----------+
| 2 |
+-----------+
+--------------------+
| cbrt(2.0)          |
+--------------------+
| 1.2599210498948734 |
+--------------------+
+---------------+
| cbrt(-1000.0) |
+---------------+
| -10 |
+---------------+
keywords
CBRT
pow
pow
description
Syntax
DOUBLE pow(DOUBLE a, DOUBLE b)
Returns a raised to the b power.
example
mysql> select pow(2,0);
+---------------+
| pow(2.0, 0.0) |
+---------------+
| 1 |
+---------------+
+---------------+
| pow(2.0, 3.0) |
+---------------+
| 8 |
+---------------+
+--------------------+
| pow(3.0, 2.4) |
+--------------------+
| 13.966610165238235 |
+--------------------+
keywords
POW
degrees
degrees
description
Syntax
DOUBLE degrees(DOUBLE x)
Returns the degree of x , converted from radians to degrees.
example
mysql> select degrees(0);
+--------------+
| degrees(0.0) |
+--------------+
| 0 |
+--------------+
+--------------------+
| degrees(2.0) |
+--------------------+
| 114.59155902616465 |
+--------------------+
+---------------+
| degrees(pi()) |
+---------------+
| 180 |
+---------------+
keywords
DEGREES
radians
radians
description
Syntax
DOUBLE radians(DOUBLE x)
Returns the value of x in radians, converted from degrees to radians.
example
mysql> select radians(0);
+--------------+
| radians(0.0) |
+--------------+
| 0 |
+--------------+
+---------------------+
| radians(30.0) |
+---------------------+
| 0.52359877559829882 |
+---------------------+
+--------------------+
| radians(90.0) |
+--------------------+
| 1.5707963267948966 |
+--------------------+
keywords
RADIANS
sign
sign
description
Syntax
TINYINT sign(DOUBLE x)
Returns the sign of x . Negative, zero or positive numbers correspond to -1, 0 or 1 respectively.
example
mysql> select sign(3);
+-----------+
| sign(3.0) |
+-----------+
| 1 |
+-----------+
+-----------+
| sign(0.0) |
+-----------+
| 0 |
+-----------+
+-------------+
| sign(-10.0) |
+-------------+
| -1 |
+-------------+
keywords
SIGN
positive
positive
description
Syntax
BIGINT positive(BIGINT x) DOUBLE positive(DOUBLE x) DECIMAL positive(DECIMAL x)
Return x .
example
mysql> SELECT positive(-10);
+---------------+
| positive(-10) |
+---------------+
| -10 |
+---------------+
+--------------+
| positive(12) |
+--------------+
| 12 |
+--------------+
keywords
POSITIVE
negative
negative
description
Syntax
BIGINT negative(BIGINT x) DOUBLE negative(DOUBLE x) DECIMAL negative(DECIMAL x)
Return -x .
example
mysql> SELECT negative(-10);
+---------------+
| negative(-10) |
+---------------+
| 10 |
+---------------+
+--------------+
| negative(12) |
+--------------+
| -12 |
+--------------+
keywords
NEGATIVE
greatest
greatest
description
Syntax
greatest(col_a, col_b, …, col_n)
column supports the following types: TINYINT SMALLINT INT BIGINT LARGEINT FLOAT DOUBLE STRING DATETIME DECIMAL
Compares the size of n columns and returns the largest among them. If there is NULL in column , it returns NULL .
example
mysql> select greatest(-1, 0, 5, 8);
+-----------------------+
| greatest(-1, 0, 5, 8) |
+-----------------------+
| 8 |
+-----------------------+
+--------------------------+
| greatest(-1, 0, 5, NULL) |
+--------------------------+
| NULL |
+--------------------------+
+-----------------------------+
+-----------------------------+
| 7.6876 |
+-----------------------------+
+-------------------------------------------------------------------------------+
+-------------------------------------------------------------------------------+
| 2022-02-26 20:02:11 |
+-------------------------------------------------------------------------------+
keywords
GREATEST
least
least
description
Syntax
least(col_a, col_b, …, col_n)
column supports the following types: TINYINT SMALLINT INT BIGINT LARGEINT FLOAT DOUBLE STRING DATETIME DECIMAL
Compare the size of n columns and return the smallest among them. If there is NULL in column , return NULL .
example
mysql> select least(-1, 0, 5, 8);
+--------------------+
| least(-1, 0, 5, 8) |
+--------------------+
| -1 |
+--------------------+
+-----------------------+
| least(-1, 0, 5, NULL) |
+-----------------------+
| NULL |
+-----------------------+
+--------------------------+
+--------------------------+
| 4.29 |
+--------------------------+
+----------------------------------------------------------------------------+
+----------------------------------------------------------------------------+
| 2020-01-23 20:02:11 |
+----------------------------------------------------------------------------+
keywords
LEAST
random
random
description
Syntax
DOUBLE random()
Returns a random number between 0-1.
example
mysql> select random();
+---------------------+
| random() |
+---------------------+
| 0.35446706030596947 |
+---------------------+
keywords
RANDOM
mod
mod
description
Syntax
mod(col_a, col_b)
column support type : TINYINT SMALLINT INT BIGINT LARGEINT FLOAT DOUBLE DECIMAL
Find the remainder of a/b. For floating-point types, use the fmod function.
example
mysql> select mod(10, 3);
+------------+
| mod(10, 3) |
+------------+
| 1 |
+------------+
+-----------------+
| fmod(10.1, 3.2) |
+-----------------+
| 0.50000024 |
+-----------------+
keywords
MOD,FMOD
AES
AES_ENCRYPT
Name
AES_ENCRYPT
description
Encrypts data using OpenSSL. This function is consistent with the AES_ENCRYPT function in MySQL. It uses the
AES_128_ECB algorithm by default, and the padding mode is PKCS7.
Syntax
AES_ENCRYPT(str,key_str[,init_vector])
Arguments
str : Content to be encrypted
key_str : Secret key
init_vector : Initialization vector
Return Type
VARCHAR(*)
Remarks
The AES_ENCRYPT function does not use the user's secret key directly; the key is further processed first. The specific steps
are as follows:
1. Determine the number of bytes of the SECRET KEY according to the encryption algorithm used. For example, when
using AES_128_ECB, the SECRET KEY length is 128 / 8 = 16 bytes (when using AES_256_ECB, the SECRET KEY length is
256 / 8 = 32 bytes);
2. Then XOR the i-th byte with the (16*k+i)-th byte of the SECRET KEY entered by the user. If the SECRET KEY is shorter
than 16 bytes, it is padded with 0;
3. Finally, use the newly generated key for encryption.
example
select to_base64(aes_encrypt('text','F3229A0B371ED2D9441B830D21A390C3'));
+--------------------------------+
| to_base64(aes_encrypt('text')) |
+--------------------------------+
| wr2JEDVXzL9+2XtRhgIloA== |
+--------------------------------+
+-----------------------------------------------------+
+-----------------------------------------------------+
| tsmK1HzbpnEdR2//WhO+MA== |
+-----------------------------------------------------+
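A quick round-trip check (illustrative, using the default AES_128_ECB mode and the same key as above):
mysql> select aes_decrypt(aes_encrypt('text','F3229A0B371ED2D9441B830D21A390C3'),'F3229A0B371ED2D9441B830D21A390C3');
This returns the original string text.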
keywords
AES_ENCRYPT
AES_DECRYPT
Name
AES_DECRYPT
Description
Decryption of data using the OpenSSL. This function is consistent with the AES_DECRYPT function in MySQL. Using
AES_128_ECB algorithm by default, and the padding mode is PKCS7.
Syntax
AES_DECRYPT(str,key_str[,init_vector])
Arguments
str : Content to be decrypted
key_str : Secret key
init_vector : Initialization vector
Return Type
VARCHAR(*)
example
select aes_decrypt(from_base64('wr2JEDVXzL9+2XtRhgIloA=='),'F3229A0B371ED2D9441B830D21A390C3');
+------------------------------------------------------+
| aes_decrypt(from_base64('wr2JEDVXzL9+2XtRhgIloA==')) |
+------------------------------------------------------+
| text |
+------------------------------------------------------+
set block_encryption_mode="AES_256_CBC";
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| text |
+---------------------------------------------------------------------------+
keywords
AES_DECRYPT
MD5
MD5
description
Calculates an MD5 128-bit checksum for the string
Syntax
MD5(str)
example
MySQL [(none)]> select md5("abc");
+----------------------------------+
| md5('abc') |
+----------------------------------+
| 900150983cd24fb0d6963f7d28e17f72 |
+----------------------------------+
keywords
MD5
MD5SUM
MD5SUM
description
Calculates an MD5 128-bit checksum for the strings
Syntax
MD5SUM(str[,str])
example
MySQL > select md5("abcd");
+----------------------------------+
| md5('abcd') |
+----------------------------------+
| e2fc714c4727ee9395f324cd2e7f331f |
+----------------------------------+
+----------------------------------+
| md5sum('ab', 'cd') |
+----------------------------------+
| e2fc714c4727ee9395f324cd2e7f331f |
+----------------------------------+
keywords
MD5SUM
SM4
SM4_ENCRYPT
description
Syntax
VARCHAR SM4_ENCRYPT(str,key_str[,init_vector])
example
MySQL > select TO_BASE64(SM4_ENCRYPT('text','F3229A0B371ED2D9441B830D21A390C3'));
+--------------------------------+
| to_base64(sm4_encrypt('text')) |
+--------------------------------+
| aDjwRflBrDjhBZIOFNw3Tg== |
+--------------------------------+
+----------------------------------------------------------------------------------+
+----------------------------------------------------------------------------------+
| G7yqOKfEyxdagboz6Qf01A== |
+----------------------------------------------------------------------------------+
keywords
SM4_ENCRYPT
SM4_DECRYPT
description
Syntax
VARCHAR SM4_DECRYPT(str,key_str[,init_vector])
example
MySQL [(none)]> select SM4_DECRYPT(FROM_BASE64('aDjwRflBrDjhBZIOFNw3Tg=='),'F3229A0B371ED2D9441B830D21A390C3');
+------------------------------------------------------+
| sm4_decrypt(from_base64('aDjwRflBrDjhBZIOFNw3Tg==')) |
+------------------------------------------------------+
| text |
+------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------+
+--------------------------------------------------------------------------------------------------------+
| text |
+--------------------------------------------------------------------------------------------------------+
keywords
SM4_DECRYPT
SM3
SM3
description
Calculates an SM3 256-bit checksum for the string
Syntax
SM3(str)
example
MySQL > select sm3("abcd");
+------------------------------------------------------------------+
| sm3('abcd') |
+------------------------------------------------------------------+
| 82ec580fe6d36ae4f81cae3c73f4a5b3b5a09c943172dc9053c69fd8e18dca1e |
+------------------------------------------------------------------+
keywords
SM3
SM3SUM
SM3SUM
description
Calculates an SM3 256-bit checksum for the strings
Syntax
SM3SUM(str[,str])
example
MySQL > select sm3("abcd");
+------------------------------------------------------------------+
| sm3('abcd') |
+------------------------------------------------------------------+
| 82ec580fe6d36ae4f81cae3c73f4a5b3b5a09c943172dc9053c69fd8e18dca1e |
+------------------------------------------------------------------+
+------------------------------------------------------------------+
| sm3sum('ab', 'cd') |
+------------------------------------------------------------------+
| 82ec580fe6d36ae4f81cae3c73f4a5b3b5a09c943172dc9053c69fd8e18dca1e |
+------------------------------------------------------------------+
keywords
SM3SUM
explode_json_array
explode_json_array
description
Table functions must be used in conjunction with Lateral View.
Expand a JSON array. There are three function names according to the array element type, corresponding to integer,
floating-point and string arrays respectively.
grammar:
explode_json_array_int(json_str)
explode_json_array_double(json_str)
explode_json_array_string(json_str)
example
Original table data:
+------+
| k1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+------+
Lateral View:
mysql> select k1, e1 from example1 lateral view explode_json_array_int('[]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_int('[1,2,3]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_int('[1,"b",3]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | 1 |
| 1 | 3 |
| 2 | NULL |
| 2 | 1 |
| 2 | 3 |
| 3 | NULL |
| 3 | 1 |
| 3 | 3 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_int('["a","b","c"]') tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | NULL |
| 1 | NULL |
| 2 | NULL |
| 2 | NULL |
| 2 | NULL |
| 3 | NULL |
| 3 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_int('{"a": 3}') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('[]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('[1,2,3]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | NULL |
| 1 | NULL |
| 2 | NULL |
| 2 | NULL |
| 2 | NULL |
| 3 | NULL |
| 3 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('[1,"b",3]') tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | NULL |
| 1 | NULL |
| 2 | NULL |
| 2 | NULL |
| 2 | NULL |
| 3 | NULL |
| 3 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('[1.0,2.0,3.0]') tmp1 as e1 order by
k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('[1,"b",3]') tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | NULL |
| 1 | NULL |
| 2 | NULL |
| 2 | NULL |
| 2 | NULL |
| 3 | NULL |
| 3 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('["a","b","c"]') tmp1 as e1 order by
k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 1 | NULL |
| 1 | NULL |
| 2 | NULL |
| 2 | NULL |
| 2 | NULL |
| 3 | NULL |
| 3 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_double('{"a": 3}') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_string('[]') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_string('[1.0,2.0,3.0]') tmp1 as e1 order by
k1, e1;
+------+----------+
| k1 | e1 |
+------+----------+
| 1 | 1.000000 |
| 1 | 2.000000 |
| 1 | 3.000000 |
| 2 | 1.000000 |
| 2 | 2.000000 |
| 2 | 3.000000 |
| 3 | 1.000000 |
| 3 | 2.000000 |
| 3 | 3.000000 |
+------+----------+
mysql> select k1, e1 from example1 lateral view explode_json_array_string('[1,"b",3]') tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 1 | 3 |
| 1 | b |
| 2 | 1 |
| 2 | 3 |
| 2 | b |
| 3 | 1 |
| 3 | 3 |
| 3 | b |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_string('["a","b","c"]') tmp1 as e1 order by
k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | a |
| 1 | b |
| 1 | c |
| 2 | a |
| 2 | b |
| 2 | c |
| 3 | a |
| 3 | b |
| 3 | c |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_json_array_string('{"a": 3}') tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
+------+------+
keywords
explode,json,array,json_array,explode_json,explode_json_array
explode
explode
description
Table functions must be used in conjunction with Lateral View.
Explode an array column to rows. When the array is NULL or empty, explode produces no rows, while explode_outer
returns a single NULL row. Both explode and explode_outer keep the nested NULL elements of the array.
grammar:
explode(expr)
explode_outer(expr)
example
mysql> set enable_vectorized_engine = true
mysql> select e1 from (select 1 k1) as t lateral view explode([1,2,3]) tmp1 as e1;
+------+
| e1 |
+------+
| 1 |
| 2 |
| 3 |
+------+
mysql> select e1 from (select 1 k1) as t lateral view explode_outer(null) tmp1 as e1;
+------+
| e1 |
+------+
| NULL |
+------+
mysql> select e1 from (select 1 k1) as t lateral view explode([]) tmp1 as e1;
mysql> select e1 from (select 1 k1) as t lateral view explode([null,1,null]) tmp1 as e1;
+------+
| e1 |
+------+
| NULL |
| 1 |
| NULL |
+------+
mysql> select e1 from (select 1 k1) as t lateral view explode_outer([null,1,null]) tmp1 as e1;
+------+
| e1 |
+------+
| NULL |
| 1 |
| NULL |
+------+
keywords
EXPLODE,EXPLODE_OUTER,ARRAY
explode_split
explode_split
description
Table functions must be used in conjunction with Lateral View.
Split a string into multiple substrings according to the specified delimiter, producing one row per substring.
grammar:
explode_split(str, delimiter)
example
Original table data:
+------+---------+
| k1 | k2 |
+------+---------+
| 1 | |
| 2 | NULL |
| 3 | , |
| 4 | 1 |
| 5 | 1,2,3 |
| 6 | a, b, c |
+------+---------+
Lateral View:
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 2 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 2 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 3 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 3 | |
| 3 | |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 4 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 4 | 1 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 5 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 5 | 1 |
| 5 | 2 |
| 5 | 3 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_split(k2, ',') tmp1 as e1 where k1 = 6 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 6 | b |
| 6 | c |
| 6 | a |
+------+------+
keywords
explode,split,explode_split
explode_bitmap
explode_bitmap
description
Table functions must be used in conjunction with Lateral View.
grammar:
explode_bitmap(bitmap)
example
Original table data:
+------+
| k1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+------+
Lateral View:
mysql> select k1, e1 from example1 lateral view explode_bitmap(bitmap_empty()) tmp1 as e1 order by k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | NULL |
| 5 | NULL |
| 6 | NULL |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_bitmap(bitmap_from_string("1")) tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| 5 | 1 |
| 6 | 1 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_bitmap(bitmap_from_string("1,2")) tmp1 as e1 order by k1,
e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 3 | 1 |
| 3 | 2 |
| 4 | 1 |
| 4 | 2 |
| 5 | 1 |
| 5 | 2 |
| 6 | 1 |
| 6 | 2 |
+------+------+
mysql> select k1, e1 from example1 lateral view explode_bitmap(bitmap_from_string("1,1000")) tmp1 as e1 order by
k1, e1;
+------+------+
| k1 | e1 |
+------+------+
| 1 | 1 |
| 1 | 1000 |
| 2 | 1 |
| 2 | 1000 |
| 3 | 1 |
| 3 | 1000 |
| 4 | 1 |
| 4 | 1000 |
| 5 | 1 |
| 5 | 1000 |
| 6 | 1 |
| 6 | 1000 |
+------+------+
Used together with a second lateral view that produces a column e2 with values "a" and "b", every combination of e1 and e2 is returned:
+------+------+------+
| k1 | e1 | e2 |
+------+------+------+
| 1 | 1 | a |
| 1 | 1 | b |
| 1 | 1000 | a |
| 1 | 1000 | b |
| 2 | 1 | a |
| 2 | 1 | b |
| 2 | 1000 | a |
| 2 | 1000 | b |
| 3 | 1 | a |
| 3 | 1 | b |
| 3 | 1000 | a |
| 3 | 1000 | b |
| 4 | 1 | a |
| 4 | 1 | b |
| 4 | 1000 | a |
| 4 | 1000 | b |
| 5 | 1 | a |
| 5 | 1 | b |
| 5 | 1000 | a |
| 5 | 1000 | b |
| 6 | 1 | a |
| 6 | 1 | b |
| 6 | 1000 | a |
| 6 | 1000 | b |
+------+------+------+
keywords
explode,bitmap,explode_bitmap
outer combinator
outer combinator
description
Adding the _outer suffix to the name of a table function changes its behavior from non-outer to outer: when the table function
generates 0 rows of data, the outer variant adds one row of NULL data instead.
example
mysql> select e1 from (select 1 k1) as t lateral view explode_numbers(0) tmp1 as e1;
Empty set
mysql> select e1 from (select 1 k1) as t lateral view explode_numbers_outer(0) tmp1 as e1;
+------+
| e1 |
+------+
| NULL |
+------+
keywords
outer
numbers
numbers
description
Table-valued function that generates a temporary table with a single column named 'number', whose row values are [0, n).
grammar:
numbers(
"number" = "n",
"backend_num" = "m"
);
parameter :
number : the number of rows to generate; rows take values in [0, n).
backend_num : optional. The function is executed simultaneously on m BE nodes (multiple BEs need to be deployed).
example
mysql> select * from numbers("number" = "10");
+--------+
| number |
+--------+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+--------+
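A call that also sets backend_num might look as follows; this is only a syntax sketch based on the parameter list above and assumes at least 2 BE nodes are deployed:
select * from numbers("number" = "10", "backend_num" = "2");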
keywords
numbers
explode_numbers
explode_numbers
description
Table functions must be used in conjunction with Lateral View.
grammar:
explode_numbers(n)
example
mysql> select e1 from (select 1 k1) as t lateral view explode_numbers(5) tmp1 as e1;
+------+
| e1 |
+------+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
+------+
keywords
explode,numbers,explode_numbers
s3
S3
Name
S3
description
S3 table-valued-function(tvf), allows users to read and access file contents on S3-compatible object storage, just like
accessing a relational table. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file formats.
grammar
s3(
"uri" = "..",
"access_key" = "...",
"secret_key" = "...",
"format" = "csv",
"keyn" = "valuen",
...
);
parameter description
uri : (required) The S3 uri of the file to read. The S3 tvf decides whether to use the path style access method according to the use_path_style
parameter; the default access method is the virtual-hosted style.
access_key : (required)
secret_key : (required)
use_path_style : (optional) default false . The S3 SDK uses the virtual-hosted style by default. However, some object
storage systems may not enable or support virtual-hosted style access. In that case, the use_path_style parameter can be
added to force the use of the path style access method.
Note: the URI currently supports three schemes: http://, https:// and s3://.
1. If you use http:// or https://, whether the path style is used to access S3 is decided by the use_path_style parameter.
2. If you use s3://, the virtual-hosted style is always used to access S3, and the use_path_style parameter is ignored.
For detailed use cases, you can refer to Best Practice at the bottom.
The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: Json Load
Since version dev, the following 2 parameters are used for loading in csv format:
trim_double_quotes : Boolean type (optional), the default value is false . True means that the outermost double quotes
of each field in the csv file are trimmed.
skip_lines : Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled
when the format is csv_with_names or csv_with_names_and_types .
Example
Read and access csv format files on S3-compatible object storage.
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"format" = "csv",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"format" = "csv",
"use_path_style" = "true");
Keywords
s3, table-valued-function, tvf
Best Practice
Since the S3 table-valued-function does not know the table schema in advance, it will read the file first to parse out the table
schema.
"URI" = "https://endpoint/bucket/file/student.csv",
"ACCESS_KEY"= "ak",
"SECRET_KEY" = "sk",
"FORMAT" = "csv",
"use_path_style"="true");
// Note how the bucket is written in the URI and how the 'use_path_style' parameter is set; the same applies when using http://.
"URI" = "https://bucket.endpoint/file/student.csv",
"ACCESS_KEY"= "ak",
"SECRET_KEY" = "sk",
"FORMAT" = "csv",
"use_path_style"="false");
// The OSS on Alibaba Cloud and The COS on Tencent Cloud will use 'virtual-hosted style' to access s3.
// OSS
"URI" = "http://example-bucket.oss-cn-beijing.aliyuncs.com/your-folder/file.parquet",
"ACCESS_KEY" = "ak",
"SECRET_KEY" = "sk",
"REGION" = "oss-cn-beijing",
"FORMAT" = "parquet",
"use_path_style" = "false");
// COS
"URI" = "https://example-bucket.cos.ap-hongkong.myqcloud.com/your-folder/file.parquet",
"ACCESS_KEY" = "ak",
"SECRET_KEY" = "sk",
"REGION" = "ap-hongkong",
"FORMAT" = "parquet",
"use_path_style" = "false");
"URI" = "s3://bucket.endpoint/file/student.csv",
"ACCESS_KEY"= "ak",
"SECRET_KEY" = "sk",
"FORMAT" = "csv");
csv format
csv format: Read the file on S3 and process it as a csv file. The first line in the file is read to parse out the table
schema. The number of columns in the first line of the file, n, is used as the number of columns in the table schema, the
column names of the table schema are automatically named c1, c2, ..., cn, and the column type is set to String,
for example:
1,ftw,12
2,zs,18
3,ww,20
use S3 tvf
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| 1 | ftw | 12 |
| 2 | zs | 18 |
| 3 | ww | 20 |
+------+------+------+
csv_with_names format
csv_with_names format: The first line of the file is used as the number and names of the columns of
the table schema, and the column type is set to String, for example:
id,name,age
1,ftw,12
2,zs,18
3,ww,20
use S3 tvf
+------+------+------+
| id | name | age |
+------+------+------+
| 1 | ftw | 12 |
| 2 | zs | 18 |
| 3 | ww | 20 |
+------+------+------+
csv_with_names_and_types format
csv_with_names_and_types format: Currently, parsing the column type from a csv file is not supported. When using this
format, S3 tvf will parse the first line of the file as the number and names of the columns of the table schema and set the
column type to String. Meanwhile, the second line of the file is ignored.
id,name,age
INT,STRING,INT
1,ftw,12
2,zs,18
3,ww,20
use S3 tvf
+------+------+------+
| id | name | age |
+------+------+------+
| 1 | ftw | 12 |
| 2 | zs | 18 |
| 3 | ww | 20 |
+------+------+------+
json format
json format: The json format involves many optional parameters, and the meaning of each parameter can be referred to:
Json Load. When S3 tvf queries the json format file, it locates a json object according to the json_root and jsonpaths
parameters, and uses the key in the object as the column name of the table schema, and sets the column type to String. For
example:
use S3 tvf:
"URI" = "http://127.0.0.1:9312/test2/data.json",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "json",
"strip_outer_array" = "true",
"read_json_by_line" = "true",
"use_path_style"="true");
+------+------+------+
| id | name | age |
+------+------+------+
| 1 | ftw | 18 |
| 2 | xxx | 17 |
| 3 | yyy | 19 |
+------+------+------+
"URI" = "http://127.0.0.1:9312/test2/data.json",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "json",
"strip_outer_array" = "true",
"use_path_style"="true");
+------+------+
| id | age |
+------+------+
| 1 | 18 |
| 2 | 17 |
| 3 | 19 |
+------+------+
parquet format
parquet format: S3 tvf supports parsing the column names and column types of the table schema from the parquet file.
Example:
"URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "parquet",
"use_path_style"="true") limit 5;
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
| 1 | goldenrod lavender spring chocolate lace | Manufacturer#1 | Brand#13 | PROMO BURNISHED COPPER |
7 | JUMBO PKG | 901 | ly. slyly ironi |
| 2 | blush thistle blue yellow saddle | Manufacturer#1 | Brand#13 | LARGE BRUSHED BRASS |
1 | LG CASE | 902 | lar accounts amo |
| 3 | spring green yellow purple cornsilk | Manufacturer#4 | Brand#42 | STANDARD POLISHED BRASS |
21 | WRAP CASE | 903 | egular deposits hag |
| 4 | cornflower chocolate smoke green pink | Manufacturer#3 | Brand#34 | SMALL PLATED BRASS |
14 | MED DRUM | 904 | p furiously r |
| 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN |
15 | SM PKG | 905 | wake carefully |
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
"URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "parquet",
"use_path_style"="true");
orc format
orc format: Like the parquet format, the column names and column types of the table schema are parsed from the orc file itself. Example:
"URI" = "http://127.0.0.1:9312/test2/test.snappy.orc",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"Format" = "orc",
"use_path_style"="true") limit 5;
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
| 1 | goldenrod lavender spring chocolate lace | Manufacturer#1 | Brand#13 | PROMO BURNISHED COPPER |
7 | JUMBO PKG | 901 | ly. slyly ironi |
| 2 | blush thistle blue yellow saddle | Manufacturer#1 | Brand#13 | LARGE BRUSHED BRASS |
1 | LG CASE | 902 | lar accounts amo |
| 3 | spring green yellow purple cornsilk | Manufacturer#4 | Brand#42 | STANDARD POLISHED BRASS |
21 | WRAP CASE | 903 | egular deposits hag |
| 4 | cornflower chocolate smoke green pink | Manufacturer#3 | Brand#34 | SMALL PLATED BRASS |
14 | MED DRUM | 904 | p furiously r |
| 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN |
15 | SM PKG | 905 | wake carefully |
+-----------+------------------------------------------+----------------+----------+-------------------------+---
-----+-------------+---------------+---------------------+
uri can use wildcards to read multiple files. Note: if wildcards are used, the format of each file must be consistent (in particular,
csv/csv_with_names/csv_with_names_and_types count as different formats), and S3 tvf uses the first file to parse out the table
schema. For example:
The following two csv files:
// file1.csv
1,aaa,18
2,qqq,20
3,qwe,19
// file2.csv
5,cyx,19
6,ftw,21
"URI" = "http://127.0.0.1:9312/test2/file*.csv",
"ACCESS_KEY"= "minioadmin",
"SECRET_KEY" = "minioadmin",
"ForMAT" = "csv",
"use_path_style"="true");
+------+------+------+
| c1 | c2 | c3 |
+------+------+------+
| 1 | aaa | 18 |
| 2 | qqq | 20 |
| 3 | qwe | 19 |
| 5 | cyx | 19 |
| 6 | ftw | 21 |
+------+------+------+
id int,
name varchar(50),
age int
PROPERTIES("replication_num" = "1");
select cast (id as INT) as id, name, cast (age as INT) as age
from s3(
"uri" = "${uri}",
"ACCESS_KEY"= "${ak}",
"SECRET_KEY" = "${sk}",
"format" = "${format}",
"strip_outer_array" = "true",
"read_json_by_line" = "true",
"use_path_style" = "true");
hdfs
HDFS
Name
hdfs
Description
HDFS table-valued-function(tvf), allows users to read and access file contents on HDFS storage, just like
accessing a relational table. Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file formats.
grammar
hdfs(
"uri" = "..",
"fs.defaultFS" = "...",
"hadoop.username" = "...",
"format" = "csv",
"keyn" = "valuen"
...
);
parameter description
uri : (required) The uri of the file on HDFS.
fs.defaultFS : (required)
hadoop.security.authentication : (optional)
hadoop.username : (optional)
hadoop.kerberos.principal : (optional)
hadoop.kerberos.keytab : (optional)
dfs.client.read.shortcircuit : (optional)
dfs.domain.socket.path : (optional)
The following 6 parameters are used for loading in json format. For specific usage methods, please refer to: Json Load
Since version dev, the following 2 parameters are used for loading in csv format:
trim_double_quotes : Boolean type (optional), the default value is false . True means that the outermost double quotes
of each field in the csv file are trimmed.
skip_lines : Integer type (optional), the default value is 0. It will skip some lines in the head of csv file. It will be disabled
when the format is csv_with_names or csv_with_names_and_types .
Examples
Read and access csv format files on hdfs storage.
"uri" = "hdfs://127.0.0.1:842/user/doris/csv_format_test/student.csv",
"fs.defaultFS" = "hdfs://127.0.0.1:8424",
"hadoop.username" = "doris",
"format" = "csv");
+------+---------+------+
| c1 | c2 | c3 |
+------+---------+------+
| 1 | alice | 18 |
| 2 | bob | 20 |
| 3 | jack | 24 |
| 4 | jackson | 19 |
| 5 | liming | 18 |
+------+---------+------+
"uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_with_names.csv",
"fs.defaultFS" = "hdfs://127.0.0.1:8424",
"hadoop.username" = "doris",
"format" = "csv_with_names");
Keywords
hdfs, table-valued-function, tvf
Best Practice
For more detailed usage of the HDFS tvf, please refer to the S3 tvf; the only difference between them is the way of accessing the
storage system.
iceberg_meta
iceberg_meta
Name
Since Version 1.2
iceberg_meta
description
The iceberg_meta table-valued-function(tvf) is used to read iceberg metadata: operation history, snapshots of the table, file metadata,
etc.
grammar
iceberg_meta(
"table" = "ctl.db.tbl",
"query_type" = "snapshots"
...
);
parameter description
Related parameters :
table: (required) The full iceberg table name, in the format catalog.database.table .
query_type : (required) The type of iceberg metadata. Only snapshots is currently supported.
Example
Read and access the iceberg tabular metadata for snapshots.
Keywords
iceberg_meta, table-valued-function, tvf
Best Practice
Inspect the iceberg table snapshots :
select * from iceberg_meta("table" = "iceberg_ctl.test_db.test_tbl", "query_type" = "snapshots");
Filtered by snapshot_id :
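A sketch of the filtered query; the snapshot_id value is a placeholder:
select * from iceberg_meta("table" = "iceberg_ctl.test_db.test_tbl", "query_type" = "snapshots")
where snapshot_id = 7235124081866536034;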
WINDOW-FUNCTION-LAG
description
The LAG() method is used to calculate the value of a row several rows before the current row within the window.
example
Calculate the previous day's closing price
from stock_ticker
order by closing_date;
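A complete form of the query above could look like the following; the stock_symbol column and the result alias are assumptions for illustration:
select stock_symbol, closing_date, closing_price,
    lag(closing_price, 1, 0) over (partition by stock_symbol order by closing_date) as "yesterday closing"
from stock_ticker
order by closing_date;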
keywords
WINDOW,FUNCTION,LAG
WINDOW-FUNCTION-SUM
description
Calculate the sum of the data in the window
example
Group by property, and calculate the sum of the x column over the current row and the rows immediately before and after it within the group.
select x, property,
sum(x) over (
    partition by property
    order by x
    rows between 1 preceding and 1 following
) as 'moving total'
from int_t where property in ('odd','even');
| x  | property | moving total |
|----|----------|--------------|
| 2 | even | 6 |
| 4 | even | 12 |
| 6 | even | 18 |
| 8 | even | 24 |
| 10 | even | 18 |
| 1 | odd | 4 |
| 3 | odd | 9 |
| 5 | odd | 15 |
| 7 | odd | 21 |
| 9 | odd | 16 |
keywords
WINDOW,FUNCTION,SUM
WINDOW-FUNCTION-LAST_VALUE
description
LAST_VALUE() returns the last value in the window range. Opposite of FIRST_VALUE() .
example
Using the data from the FIRST_VALUE() example:
last_value(greeting)
from mail_merge;
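A complete form of the truncated query above, assuming the mail_merge table from the FIRST_VALUE() example has name, country and greeting columns:
select country, name,
    last_value(greeting) over (
        partition by country
        rows between unbounded preceding and unbounded following
    ) as last_greeting
from mail_merge;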
keywords
WINDOW,FUNCTION,LAST_VALUE
WINDOW-FUNCTION-AVG
description
Calculate the mean of the data within the window
example
Calculate the x-average of the current row and the rows immediately before and after it
select x, property,
avg(x) over (
    partition by property
    order by x
    rows between 1 preceding and 1 following
) as 'moving average'
from int_t where property in ('odd','even');
| x  | property | moving average |
|----|----------|----------------|
| 2 | even | 3 |
| 4 | even | 4 |
| 6 | even | 6 |
| 8 | even | 8 |
| 10 | even | 9 |
| 1 | odd | 2 |
| 3 | odd | 3 |
| 5 | odd | 5 |
| 7 | odd | 7 |
| 9 | odd | 8 |
keywords
WINDOW,FUNCTION,AVG
WINDOW-FUNCTION-MIN
description
The MIN() method is used to calculate the minimum value within the window.
example
Calculate the minimum value from the first row to the row after the current row
select x, property,
min(x) over (
    order by property, x desc
    rows between unbounded preceding and 1 following
) as 'local minimum'
from int_t where property in ('prime','square');
| x | property | local minimum |
|---|----------|---------------|
| 7 | prime | 5 |
| 5 | prime | 3 |
| 3 | prime | 2 |
| 2 | prime | 2 |
| 9 | square | 2 |
| 4 | square | 1 |
| 1 | square | 1 |
keywords
WINDOW,FUNCTION,MIN
WINDOW-FUNCTION-COUNT
description
Count the number of occurrences of data in the window
example
Count the number of occurrences of x from the first row to the current row.
select x, property,
count(x) over (
    partition by property
    order by x
    rows between unbounded preceding and current row
) as 'cumulative total'
from int_t where property in ('odd','even');
| x  | property | cumulative total |
|----|----------|------------------|
| 2 | even | 1 |
| 4 | even | 2 |
| 6 | even | 3 |
| 8 | even | 4 |
| 10 | even | 5 |
| 1 | odd | 1 |
| 3 | odd | 2 |
| 5 | odd | 3 |
| 7 | odd | 4 |
| 9 | odd | 5 |
keywords
WINDOW,FUNCTION,COUNT
WINDOW-FUNCTION
WINDOW FUNCTION
description
Analytical functions (window functions) are a special class of built-in functions. Similar to aggregate functions, analytic
functions also perform calculations on multiple input rows to obtain a data value. The difference is that the analytic function
processes the input data within a specific window, rather than grouping calculations by group by. The data within each
window can be sorted and grouped using the over() clause. The analytic function computes a single value for each row of the
result set, rather than one value per group by grouping. This flexible approach allows the user to add additional columns to
the select clause, giving the user more opportunities to reorganize and filter the result set. Analytic functions can only appear
in select lists and in the outermost order by clause. During the query process, the analytical function will take effect at the
end, that is, after the join, where and group by operations are performed. Analytical functions are often used in financial and
scientific computing to analyze trends, calculate outliers, and perform bucket analysis on large amounts of data.
order_by_clause ::= ORDER BY expr [ASC | DESC] [, expr [ASC | DESC] ...]
Function
Support Functions: AVG(), COUNT(), DENSE_RANK(), FIRST_VALUE(), LAG(), LAST_VALUE(), LEAD(), MAX(), MIN(), RANK(),
ROW_NUMBER(), SUM()
PARTITION BY clause
The Partition By clause is similar to Group By. It groups the input rows according to the specified column or columns, and
rows with the same value will be grouped together.
ORDER BY clause
The Order By clause is basically the same as the outer Order By. It defines the order in which the input rows are sorted, and if
Partition By is specified, Order By defines the order within each Partition grouping. The only difference from the outer Order
By is that the Order By n (n is a positive integer) in the OVER clause is equivalent to doing nothing, while the outer Order By n
means sorting according to the nth column.
Example:
This example shows adding an id column to the select list with values 1, 2, 3, etc., sorted by the date_and_time column in the
events table.
SELECT
FROM events;
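A sketch of that query, using ROW_NUMBER(); the selected detail columns c1 to c4 are placeholders:
select row_number() over (order by date_and_time) as id,
    c1, c2, c3, c4
from events;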
Window clause
The Window clause is used to specify an operation range for the analytical function, the current row is the criterion, and
several rows before and after are used as the object of the analytical function operation. The methods supported by the
Window clause are: AVG(), COUNT(), FIRST_VALUE(), LAST_VALUE() and SUM(). For MAX() and MIN(), the window clause can
specify the starting range UNBOUNDED PRECEDING
syntax:
ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]
example
Suppose we have the following stock data, the stock symbol is JDR, and the closing price is the closing price of each day.
This query uses the analytic function to generate the column moving_average, whose value is the 3-day average price of the
stock, that is, the three-day average price of the previous day, the current day, and the next day. The first day has no value for
the previous day, and the last day does not have the value for the next day, so these two lines only calculate the average of
the two days. Partition By does not play a role here, because all the data are JDR data, but if there is other stock information,
Partition By will ensure that the analysis function value acts within this Partition.
from stock_ticker;
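The query described above can be written as follows; stock_symbol and closing_price are the stock_ticker columns used in the other window-function examples:
select stock_symbol, closing_date, closing_price,
    avg(closing_price) over (
        partition by stock_symbol
        order by closing_date
        rows between 1 preceding and 1 following
    ) as moving_average
from stock_ticker;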
keywords
WINDOW,FUNCTION
WINDOW-FUNCTION-RANK
description
The RANK() function is used to represent rankings. Unlike DENSE_RANK(), RANK() will have vacancies. For example, if there
are two 1s in a row, the third number in RANK() is 3, not 2.
example
rank by x
| x | y | rank |
|----|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 1 |
| 3 | 2 | 3 |
keywords
WINDOW,FUNCTION,RANK
WINDOW-FUNCTION-DENSE_RANK
description
The DENSE_RANK() function is used to represent rankings. Unlike RANK(), DENSE_RANK() does not have vacancies. For
example, if there are two parallel 1s, the third number of DENSE_RANK() is still 2, and the third number of RANK() is 3.
example
Group by the x column and rank column y within each group:
select x, y, dense_rank() over(partition by x order by y) as rank from int_t;
| x | y | rank |
|----|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 1 |
| 3 | 2 | 2 |
keywords
WINDOW,FUNCTION,DENSE_RANK
WINDOW-FUNCTION-MAX
description
The MAX() method is used to calculate the maximum value within the window.
example
Calculate the maximum value from the first row to the row after the current row
select x, property,
max(x) over (
    order by property, x
    rows between unbounded preceding and 1 following
) as 'local maximum'
from int_t where property in ('prime','square');
| x | property | local maximum |
|---|----------|---------------|
| 2 | prime | 3 |
| 3 | prime | 5 |
| 5 | prime | 7 |
| 7 | prime | 7 |
| 1 | square | 7 |
| 4 | square | 9 |
| 9 | square | 9 |
keywords
WINDOW,FUNCTION,MAX
WINDOW-FUNCTION-FIRST_VALUE
description
FIRST_VALUE() returns the first value in the window's range.
example
We have the following data
| name    | country | greeting     |
|---------|---------|--------------|
| John | USA | Hi |
Use FIRST_VALUE() to group by country and return the value of the first greeting in each group:
first_value(greeting)
| country | name    | greeting  |
|---------|---------|-----------|
| USA | John | Hi |
| USA | Pete | Hi |
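A complete form of the query, assuming the columns shown above:
select country, name,
    first_value(greeting) over (partition by country order by name) as greeting
from mail_merge;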
keywords
WINDOW,FUNCTION,FIRST_VALUE
WINDOW-FUNCTION-LEAD
description
The LEAD() method is used to calculate the value of a row several rows after the current row within the window.
example
Calculate whether the next day's closing price is higher or lower than the current day's closing price.
case
(lead(closing_price,1, 0)
end as "trending"
from stock_ticker
order by closing_date;
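A complete form of the query above could look like the following; the stock_symbol column and the result labels are assumptions for illustration:
select stock_symbol, closing_date, closing_price,
    case
        when lead(closing_price, 1, 0) over (partition by stock_symbol order by closing_date) > closing_price
            then "higher"
        else "flat or lower"
    end as "trending"
from stock_ticker
order by closing_date;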
keywords
WINDOW,FUNCTION,LEAD
WINDOW-FUNCTION-ROW_NUMBER
description
Returns a continuously increasing integer starting from 1 for each row of each Partition. Unlike RANK() and DENSE_RANK(),
the value returned by ROW_NUMBER() does not repeat or appear vacant, and is continuously incremented.
example
select x, y, row_number() over(partition by x order by y) as rank from int_t;
| x | y | rank |
|---|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 3 |
| 2 | 1 | 1 |
| 2 | 2 | 2 |
| 2 | 3 | 3 |
| 3 | 1 | 1 |
| 3 | 1 | 2 |
| 3 | 2 | 3 |
keywords
WINDOW,FUNCTION,ROW_NUMBER
WINDOW-FUNCTION-NTILE
description
For NTILE(n), this function divides the rows of a sorted partition into a specific number of groups (in this case, n buckets).
Each group is assigned a bucket number starting at one. For the case that cannot be distributed evenly, rows are
preferentially allocated to the buckets with smaller numbers. The number of rows in any two buckets cannot differ by more than
1. For now, n must be a constant positive integer.
example
select x, y, ntile(2) over(partition by x order by y) as ntile from int_t;
| x | y | ntile |
|---|------|----------|
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 2 |
| 3 | 1 | 1 |
| 3 | 1 | 1 |
| 3 | 2 | 2 |
keywords
WINDOW,FUNCTION,NTILE
WINDOW-FUNCTION-WINDOW-FUNNEL
description
Searches for the longest event chain that happened in order (event1, event2, ..., eventN) along the timestamp_column, within a
sliding window of the given length.
The function searches for data that triggers the first condition in the chain and sets the event counter to 1. This is the
moment when the sliding window starts.
If events from the chain occur sequentially within the window, the counter is incremented. If the sequence of events is
disrupted, the counter is not incremented.
If the data has multiple event chains at varying points of completion, the function will only output the size of the longest
chain.
example
CREATE TABLE windowfunnel_test (
    `xwho` varchar(50) NULL COMMENT 'xwho',
    `xwhen` datetime COMMENT 'xwhen',
    `xwhat` int NULL COMMENT 'xwhat'
)
DUPLICATE KEY(xwho)
DISTRIBUTED BY HASH(xwho) BUCKETS 3
PROPERTIES (
    "replication_num" = "1"
);
INSERT into windowfunnel_test (xwho, xwhen, xwhat) values ('1', '2022-03-12 10:41:00', 1),
| level |
|---|
| 2 |
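A sketch of a window_funnel() query over the table above; the window length, the 'default' mode and the concrete event conditions are illustrative assumptions:
select xwho,
    window_funnel(3600 * 3, 'default', xwhen,
        xwhat = 1, xwhat = 2, xwhat = 3, xwhat = 4) as level
from windowfunnel_test
group by xwho;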
keywords
WINDOW,FUNCTION,WINDOW_FUNNEL
CAST
CAST
Description
cast (input as type)
BIGINT type
Syntax (BIGINT)
cast (input as BIGINT)
example
1. Cast a constant, or a column in a table, to BIGINT
+-------------------+
| CAST(1 AS BIGINT) |
+-------------------+
| 1 |
+-------------------+
Note: In import, because the original type is String, raw data with a floating point value, such as 12.0, will be converted to
NULL when cast. Doris currently does not truncate raw data.
If you want to force this type of raw data to be cast to int, see the following example:
+----------------------------------------+
+----------------------------------------+
| 11 |
+----------------------------------------+
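For example, first casting the string to DOUBLE and then to BIGINT truncates the fractional part instead of producing NULL; the literal "11.2" is an illustrative value consistent with the result shown above:
select cast(cast("11.2" as double) as bigint);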
keywords
CAST
Edit this page
Feedback
SQL Manual SQL Functions DIGITAL-MASKING
DIGITAL-MASKING
DIGITAL_MASKING
description
Syntax
digital_masking(digital_number)
Masks the input digital_number and returns the desensitized result. digital_number is of BIGINT data
type.
example
+------------------------------+
| digital_masking(13812345678) |
+------------------------------+
| 138****5678 |
+------------------------------+
keywords
DIGITAL_MASKING
width_bucket
width_bucket
Description
Constructs equi-width histograms, in which the histogram range is divided into intervals of identical size, and returns the
bucket number into which the value of an expression falls, after it has been evaluated. The function returns an integer value
or null (if any input is null).
Syntax
width_bucket(expr, min_value, max_value, num_buckets)
Arguments
expr -
The expression for which the histogram is created. This expression must evaluate to a numeric value or to a value that
can be implicitly converted to a numeric value.
min_value and max_value -
The low and high end points of the acceptable range. The end points must be within the range of -(2^53 - 1) to 2^53 - 1 (inclusive). In addition, the difference
between these points must be less than 2^53 (i.e. abs(max_value - min_value) < 2^53) .
num_buckets -
The desired number of buckets; must be a positive integer value. A value from the expression is assigned to
each bucket, and the function then returns the corresponding bucket number.
Returned value
It returns the bucket number into which the value of an expression falls.
example
DROP TABLE IF EXISTS width_bucket_test;
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
PROPERTIES (
"storage_format" = "V2"
);
+------+------------+-----------+--------+
| k1 | v1 | v2 | v3 |
+------+------------+-----------+--------+
+------+------------+-----------+--------+
SELECT k1, v1, v2, v3, width_bucket(v1, date('2023-11-18'), date('2027-11-18'), 4) AS w FROM width_bucket_test
ORDER BY k1;
+------+------------+-----------+--------+------+
| k1 | v1 | v2 | v3 | w |
+------+------------+-----------+--------+------+
+------+------------+-----------+--------+------+
SELECT k1, v1, v2, v3, width_bucket(v2, 200000, 600000, 4) AS w FROM width_bucket_test ORDER BY k1;
+------+------------+-----------+--------+------+
| k1 | v1 | v2 | v3 | w |
+------+------------+-----------+--------+------+
+------+------------+-----------+--------+------+
SELECT k1, v1, v2, v3, width_bucket(v3, 200000, 600000, 4) AS w FROM width_bucket_test ORDER BY k1;
+------+------------+-----------+--------+------+
| k1 | v1 | v2 | v3 | w |
+------+------------+-----------+--------+------+
+------+------------+-----------+--------+------+
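As a quick illustration of the bucket arithmetic: with the range [0, 10) split into 5 equal-width buckets, the value 5.35 falls into the third bucket:
SELECT width_bucket(5.35, 0, 10, 5);
-- returns 3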
keywords
WIDTH_BUCKET
SET-PROPERTY
SET-PROPERTY
Name
SET PROPERTY
Description
Set user attributes, including the resources assigned to the user, the import cluster, etc.
The user attribute set here applies to the user, not to a user_identity. That is, if two users 'jack'@'%' and 'jack'@'192.%' are created through
the CREATE USER statement, the SET PROPERTY statement can only be used for the user jack, not for 'jack'@'%' or
'jack'@'192.%'.
key:
max_query_instances: The number of instances that a user can use to execute a query at the same time.
sql_block_rules: Set sql block rules. Once set, queries sent by this user will be rejected if they match the rules.
cpu_resource_limit: Limit the cpu resources for queries. See the introduction to the session variable cpu_resource_limit for
details. -1 means not set.
exec_mem_limit: Limit the memory usage of the query. See the introduction to the session variable exec_mem_limit for
details. -1 means not set.
load_cluster.{cluster_name}.priority: Assign priority to the specified cluster, which can be HIGH or NORMAL
Note: If the attributes `cpu_resource_limit`, `exec_mem_limit` are not set, the value in the session variable
will be used by default.
load_cluster.{cluster_name}.hadoop_http_port: hadoop hdfs name node http port. Where hdfs defaults to 8070, afs defaults
to 8010.
Example
1. Modify the maximum number of user jack connections to 1000
'load_cluster.{cluster_name}.hadoop_palo_path' = '/user/doris/doris_path',
'load_cluster.{cluster_name}.hadoop_configs' =
'fs.default.name=hdfs://dpp.cluster.com:port;mapred.job.tracker=dpp.cluster.com:port;hadoop.job.ugi=user
,password;mapred.job.queue.name=job_queue_name_in_hadoop;mapred.job.priority=HIGH;';
8. Modify the number of available instances for user jack's query to 3000
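Examples 1 and 8 above correspond to statements of roughly the following form; max_user_connections is assumed to be the property behind example 1, while max_query_instances comes from the key list above:
SET PROPERTY FOR 'jack' 'max_user_connections' = '1000';
SET PROPERTY FOR 'jack' 'max_query_instances' = '3000';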
Keywords
SET, PROPERTY
Best Practice
REVOKE
REVOKE
Name
REVOKE
Description
The REVOKE command is used to revoke the privileges assigned by the specified user or role.
user_identity:
The user_identity syntax here is the same as CREATE USER. And must be a user_identity created with CREATE USER. The
host in user_identity can be a domain name. If it is a domain name, the revocation time of permissions may be delayed by
about 1 minute.
It is also possible to revoke the permissions of a specified ROLE; the specified ROLE must exist.
Example
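For example, revoking a previously granted privilege from a user or from a role might look like this; the database, user and role names are illustrative:
REVOKE SELECT_PRIV ON db1.* FROM 'jack'@'192.%';
REVOKE SELECT_PRIV ON db1.* FROM ROLE 'my_role';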
Keywords
REVOKE
Best Practice
GRANT
GRANT
Name
GRANT
Description
The GRANT command is used to grant the specified permissions to the specified user or role.
privilege_list is a list of privileges to be granted, separated by commas. Currently Doris supports the following permissions:
NODE_PRIV: Cluster node operation permissions, including node online and offline operations. Only the root user
has this permission and cannot be granted to other users.
GRANT_PRIV: Privilege for operation privileges. Including creating and deleting users, roles, authorization and
revocation, setting passwords, etc.
Permission classification:
1. *.*.* permissions can be applied to all catalogs, all databases and all tables in them
2. ctl.*.* permissions can be applied to all databases and all tables in them
3. ctl.db.* permissions can be applied to all tables under the specified database
4. ctl.db.tbl permission can be applied to the specified table under the specified database
user_identity:
The user_identity syntax here is the same as CREATE USER. And must be a user_identity created with CREATE USER.
The host in user_identity can be a domain name. If it is a domain name, the effective time of the authority may
be delayed by about 1 minute.
You can also assign permissions to the specified ROLE, if the specified ROLE does not exist, it will be created
automatically.
Example
1. Grant permissions to all catalog and databases and tables to the user
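Example 1 corresponds to a statement such as the following; the user name is illustrative:
GRANT SELECT_PRIV ON *.*.* TO 'jack'@'%';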
Keywords
GRANT
Best Practice
LDAP
LDAP
Name
LDAP
Description
SET LDAP_ADMIN_PASSWORD
The SET LDAP_ADMIN_PASSWORD command is used to set the LDAP administrator password. When using LDAP
authentication, doris needs to use the administrator account and password to query the LDAP service for login user
information.
Example
1. Set the LDAP administrator password
Keywords
LDAP, PASSWORD, LDAP_ADMIN_PASSWORD
Best Practice
CREATE-ROLE
CREATE-ROLE
Name
CREATE ROLE
Description
This statement is used to create a role.
This statement creates an unprivileged role, which can be subsequently granted with the GRANT command.
Example
1. Create a role
Keywords
CREATE, ROLE
Best Practice
DROP-ROLE
DROP-ROLE
Description
This statement is used to remove a role.
Deleting a role does not affect the permissions of users who previously belonged to the role. It is only equivalent to
decoupling the role from the users; the permissions that the users obtained from the role will not change.
Example
1. Drop a role1
Keywords
DROP, ROLE
Best Practice
CREATE-USER
CREATE-USER
Name
CREATE USER
Description
The CREATE USER command is used to create a Doris user.
[password_policy]
user_identity:
'user_name'@'host'
password_policy:
1. PASSWORD_HISTORY [n|DEFAULT]
3. FAILED_LOGIN_ATTEMPTS n
4. PASSWORD_LOCK_TIME [n DAY/HOUR/SECOND|UNBOUNDED]
In Doris, a user_identity uniquely identifies a user. user_identity consists of two parts, user_name and host, where username
is the username. host Identifies the host address where the client connects. The host part can use % for fuzzy matching. If no
host is specified, it defaults to '%', which means the user can connect to Doris from any host.
The host part can also be specified as a domain, with the syntax: 'user_name'@['domain']. Even though it is surrounded by square
brackets, Doris will treat it as a domain and try to resolve its ip address.
If a role (ROLE) is specified, the newly created user will be automatically granted the permissions of the role. If not specified,
the user has no permissions by default. The specified ROLE must already exist.
password_policy is a clause used to specify policies related to password authentication login. Currently, the following
policies are supported:
1. PASSWORD_HISTORY
Whether to allow the current user to use historical passwords when resetting their passwords. For example,
PASSWORD_HISTORY 10 means that it is forbidden to use the password set in the past 10 times as a new password. If set to
PASSWORD_HISTORY DEFAULT , the value in the global variable password_history will be used. 0 means do not enable this
feature. Default is 0.
2. PASSWORD_EXPIRE
Set the expiration time of the current user's password. For example PASSWORD_EXPIRE INTERVAL 10 DAY means the
password will expire in 10 days. PASSWORD_EXPIRE NEVER means that the password does not expire. If set to
PASSWORD_EXPIRE DEFAULT , the value in the global variable default_password_lifetime is used. Defaults to NEVER (or 0),
which means it will not expire.
3. FAILED_LOGIN_ATTEMPTS and PASSWORD_LOCK_TIME
When the current user logs in, if the user logs in with the wrong password for n times, the account will be locked, and the
lock time is set. For example, FAILED_LOGIN_ATTEMPTS 3 PASSWORD_LOCK_TIME 1 DAY means that if you log in wrongly for 3
times, the account will be locked for one day.
A locked account can be actively unlocked through the ALTER USER statement.
Example
1. Create a passwordless user (if host is not specified, it is equivalent to jack@'%')
3. In order to avoid passing plaintext, use case 2 can also be created in the following way
4. Create a user that is allowed to log in from the '192.168' subnet, and specify its role as example_role
7. Create a user, set the password to expire after 10 days, and set the account to be locked for one day after 3 failed
login attempts.
8. Create a user and forbid reuse of the last 8 passwords.
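Sketches of the numbered examples above; the user names, passwords and role name are illustrative:
-- 1. passwordless user (equivalent to jack@'%')
CREATE USER 'jack';
-- 4. user allowed to log in from the 192.168 subnet, with role example_role
CREATE USER 'jack'@'192.168.%' DEFAULT ROLE 'example_role';
-- 7. password expires after 10 days, account locked for one day after 3 failed logins
CREATE USER 'jack' IDENTIFIED BY '123456'
    PASSWORD_EXPIRE INTERVAL 10 DAY
    FAILED_LOGIN_ATTEMPTS 3 PASSWORD_LOCK_TIME 1 DAY;
-- 8. forbid reuse of the last 8 passwords
CREATE USER 'jack' IDENTIFIED BY '123456' PASSWORD_HISTORY 8;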
Keywords
CREATE, USER
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference Account Management DROP-USER
DROP-USER
DROP-USER
Name
DROP USER
Description
Delete a user
`user_identity`:
user@'host'
user@['domain']
Example
Keywords
DROP, USER
Best Practice
SET-PASSWORD
SET-PASSWORD
Name
SET PASSWORD
Description
The SET PASSWORD command can be used to modify a user's login password. If the [FOR user_identity] field does not exist,
then change the current user's password
Note that the user_identity here must exactly match the user_identity specified when creating a user with CREATE USER,
otherwise an error will be reported that the user does not exist. If user_identity is not specified, the current user is
'username'@'ip', which may not match any user_identity. Current users can be viewed through SHOW GRANTS.
The plaintext password is input in the PASSWORD() method; when using a string directly, the encrypted password needs to
be passed.
To modify the passwords of other users, administrator privileges are required.
Example
1. Modify the current user's password
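Example 1 corresponds to either of the following forms; '123456' is an illustrative password:
SET PASSWORD = PASSWORD('123456');
SET PASSWORD FOR 'jack'@'192.%' = PASSWORD('123456');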
Keywords
SET, PASSWORD
Best Practice
ALTER-USER
ALTER USER
Name
ALTER USER
Description
The ALTER USER command is used to modify a user's account attributes, including roles, passwords, and password policies,
etc.
[password_policy]
user_identity:
'user_name'@'host'
password_policy:
1. PASSWORD_HISTORY [n|DEFAULT]
3. FAILED_LOGIN_ATTEMPTS n
4. PASSWORD_LOCK_TIME [n DAY/HOUR/SECOND|UNBOUNDED]
5. ACCOUNT_UNLOCK
In an ALTER USER command, only one of the following account attributes can be modified at the same time:
1. Change password
2. Modify the role
3. Modify PASSWORD_HISTORY
4. Modify PASSWORD_EXPIRE
5. Modify FAILED_LOGIN_ATTEMPTS and PASSWORD_LOCK_TIME
6. Unlock users
Example
4. Unlock a user
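Example 4 corresponds to a statement of this form, using the ACCOUNT_UNLOCK attribute listed above:
ALTER USER 'jack'@'%' ACCOUNT_UNLOCK;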
Keywords
ALTER, USER
Best Practice
If the user previously belonged to role A, when the user role is modified, all permissions corresponding to role A on the
user will be revoked first, and then all permissions corresponding to the new role will be granted.
Note that if the user has been granted a certain permission before, and role A also includes this permission, after
modifying the role, the individually granted permission will also be revoked.
for example:
Suppose roleA has the privilege: select_priv on db1.* , create user user1 and set the role to roleA.
Then give the user this privilege separately: GRANT select_priv, load_priv on db1.* to user1
roleB has the privilege alter_priv on db1.tbl1 . At this time, modify the role of user1 to B.
Then finally user1 has alter_priv on db1.tbl1 and load_priv on db1.* permissions.
ALTER-SYSTEM-DROP-FOLLOWER
ALTER-SYSTEM-DROP-FOLLOWER
Name
ALTER SYSTEM DROP FOLLOWER
Description
This statement deletes a FRONTEND node of the FOLLOWER role (administrator only!)
grammar:
illustrate:
Example
Keywords
ALTER, SYSTEM, DROP, FOLLOWER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-DECOMMISSION-BACKEND
ALTER-SYSTEM-DECOMMISSION-BACKEND
Name
ALTER SYSTEM DECOMMISSION BACKEND
Description
The node decommission operation is used to safely take a node offline. The operation is asynchronous. If successful, the node is
eventually removed from the metadata. If it fails, the decommission will not be completed (administrator only!)
grammar:
illustrate:
Example
1. Offline two nodes
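Example 1 corresponds to a statement such as the following; the host names and heartbeat ports are illustrative:
ALTER SYSTEM DECOMMISSION BACKEND "host1:9050", "host2:9050";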
Keywords
ALTER, SYSTEM, DECOMMISSION, BACKEND, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-DROP-OBSERVER
ALTER-SYSTEM-DROP-OBSERVER
Name
ALTER SYSTEM DROP OBSERVER
Description
This statement deletes a FRONTEND node of the OBSERVER role (administrator only!)
grammar:
illustrate:
Example
Keywords
ALTER, SYSTEM, DROP, OBSERVER, ALTER SYSTEM
Best Practice
CANCEL-ALTER-SYSTEM
CANCEL-ALTER-SYSTEM
Name
CANCEL DECOMMISSION
Description
This statement is used to undo a node offline operation. (Administrator only!)
grammar:
Example
Keywords
CANCEL, DECOMMISSION, CANCEL ALTER
Best Practice
ALTER-SYSTEM-DROP-BROKER
ALTER-SYSTEM-DROP-BROKER
Name
ALTER SYSTEM DROP BROKER
Description
This statement deletes a BROKER node (administrator only!)
grammar:
Example
Keywords
ALTER, SYSTEM, DROP, BROKER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-ADD-OBSERVER
ALTER-SYSTEM-ADD-OBSERVER
Name
ALTER SYSTEM ADD OBSERVER
Description
This statement adds a FRONTEND node of the OBSERVER role (administrator only!)
grammar:
illustrate:
Example
Keywords
ALTER, SYSTEM, ADD, OBSERVER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-MODIFY-BACKEND
ALTER-SYSTEM-MODIFY-BACKEND
Name
ALTER SYSTEM MODIFY BACKEND
Description
Modify BE node properties (administrator only!)
grammar:
illustrate:
Note:
1. A backend can be set with multiple resource tags, but they must contain the "tag.location" type.
Example
ALTER SYSTEM MODIFY BACKEND "host1:heartbeat_port" SET ("tag.location" = "group_a", "tag.compute" = "c1");
Keywords
ALTER, SYSTEM, MODIFY, BACKEND, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-ADD-FOLLOWER
ALTER-SYSTEM-ADD-FOLLOWER
Name
ALTER SYSTEM ADD FOLLOWER
Description
This statement adds a FRONTEND node of the FOLLOWER role (administrator only!)
grammar:
illustrate:
Example
Keywords
ALTER, SYSTEM, ADD, FOLLOWER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-MODIFY-BROKER
ALTER-SYSTEM-MODIFY-BROKER
Description
Example
Keywords
ALTER, SYSTEM, MODIFY, BROKER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-ADD-BROKER
ALTER-SYSTEM-ADD-BROKER
Name
ALTER SYSTEM ADD BROKER
Description
This statement is used to add a BROKER node. (Administrator only!)
grammar:
Example
Keywords
ALTER, SYSTEM, ADD, BROKER, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-ADD-BACKEND
ALTER-SYSTEM-ADD-BACKEND
Name
ALTER SYSTEM ADD BACKEND
Description
This statement is used to add BACKEND nodes to the system. (Administrator only!)
grammar:
-- Add nodes (add this method if you do not use the multi-tenancy function)
illustrate:
Example
1. Add a node
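Example 1 corresponds to a statement of this form; the host and heartbeat port are illustrative:
ALTER SYSTEM ADD BACKEND "host1:9050";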
Keywords
ALTER, SYSTEM, ADD, BACKEND, ALTER SYSTEM
Best Practice
ALTER-SYSTEM-DROP-BACKEND
ALTER-SYSTEM-DROP-BACKEND
Name
ALTER SYSTEM DROP BACKEND
Description
This statement is used to delete the BACKEND node (administrator only!)
grammar:
illustrate:
Example
Keywords
ALTER, SYSTEM, DROP, BACKEND, ALTER SYSTEM
Best Practice
ALTER-CATALOG
ALTER-CATALOG
Name
Since Version 1.2
ALTER CATALOG
Description
This statement is used to set properties of the specified catalog. (administrator only)
illustrate:
Update values of specified keys. If a key does not exist in the catalog properties, it will be added.
illustrate:
Example
1. rename catalog ctlg_hive to hive
Keywords
ALTER,CATALOG,RENAME,PROPERTY
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference DDL Alter ALTER-DATABASE
ALTER-DATABASE
ALTER-DATABASE
Name
ALTER DATABASE
Description
This statement is used to set properties of the specified database. (administrator only)
illustrate:
After renaming the database, use the REVOKE and GRANT commands to modify the appropriate user permissions,
if necessary.
The default data quota for the database is 1024GB, and the default replica quota is 1073741824.
Example
1. Set the specified database data volume quota
3. Set the quota for the number of copies of the specified database
Keywords
ALTER,DATABASE,RENAME
Best Practice
ALTER-TABLE-BITMAP
ALTER-TABLE-BITMAP
Name
ALTER TABLE BITMAP
Description
This statement is used to perform a bitmap index operation on an existing table.
grammar:
Syntax:
ADD INDEX [IF NOT EXISTS] index_name (column [, ...]) [USING BITMAP] [COMMENT 'balabala'];
Notice:
Syntax:
Example
1. Create a bitmap index for siteid on table1
ALTER TABLE table1 ADD INDEX [IF NOT EXISTS] index_name (siteid) [USING BITMAP] COMMENT 'balabala';
Keywords
ALTER, TABLE, BITMAP, INDEX, ALTER TABLE
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference DDL Alter ALTER-TABLE-PARTITION
ALTER-TABLE-PARTITION
ALTER-TABLE-PARTITION
Name
ALTER TABLE PARTITION
Description
This statement is used to modify the partition of an existing table.
This operation is synchronous, and the return of the command indicates the completion of the execution.
grammar:
1. Add partition
grammar:
partition_desc ["key"="value"]
Notice:
grammar:
DROP PARTITION [IF EXISTS] partition_name [FORCE]
Notice:
grammar:
illustrate:
Example
1. Add partition, existing partition [MIN, 2013-01-01), add partition [2013-01-01, 2014-01-01), use default bucketing method
("replication_num"="1");
7. Delete partition
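Sketches of examples 1 and 7; the table and partition names are illustrative:
-- add partition [2013-01-01, 2014-01-01) with the default bucketing method
ALTER TABLE example_db.my_table
ADD PARTITION p1 VALUES LESS THAN ("2014-01-01");

-- drop a partition
ALTER TABLE example_db.my_table
DROP PARTITION p1;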
Keywords
ALTER, TABLE, PARTITION, ALTER TABLE
Best Practice
ALTER-TABLE-COLUMN
ALTER-TABLE-COLUMN
Name
ALTER TABLE COLUMN
Description
This statement is used to perform a schema change operation on an existing table. The schema change is asynchronous, and
the task is returned when the task is submitted successfully. After that, you can use the SHOW ALTER TABLE COLUMN
command to view the progress.
grammar:
grammar:
[AFTER column_name|FIRST]
[TO rollup_index_name]
Notice:
If you add a value column to the aggregation model, you need to specify agg_type
For non-aggregated models (such as DUPLICATE KEY), if you add a key column, you need to specify the KEY keyword
You cannot add columns that already exist in the base index to the rollup index (you can recreate a rollup index if
necessary)
grammar:
[TO rollup_index_name]
Notice:
If you add a value column to the aggregation model, you need to specify agg_type
If you add a key column to the aggregation model, you need to specify the KEY keyword
You cannot add columns that already exist in the base index to the rollup index (you can recreate a rollup index if
necessary)
3. Delete a column from the specified index
grammar:
[FROM rollup_index_name]
Notice:
4. Modify the column type and column position of the specified index
grammar:
MODIFY COLUMN column_name column_type [KEY | agg_type] [NULL | NOT NULL] [DEFAULT "default_value"]
[AFTER column_name|FIRST]
[FROM rollup_index_name]
Notice:
If you modify the value column in the aggregation model, you need to specify agg_type
If you modify the key column for non-aggregate types, you need to specify the KEY keyword
Only the type of the column can be modified, and other attributes of the column remain as they are (that is, other
attributes need to be explicitly written in the statement according to the original attributes, see example 8)
Partitioning and bucketing columns cannot be modified in any way
The following types of conversions are currently supported (loss of precision is guaranteed by the user)
Conversion of TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE types to larger numeric types
Convert TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE/DECIMAL to VARCHAR
VARCHAR supports modifying the maximum length
VARCHAR/CHAR converted to TINYINT/SMALLINT/INT/BIGINT/LARGEINT/FLOAT/DOUBLE
Convert VARCHAR/CHAR to DATE (currently supports "%Y-%m-%d", "%y-%m-%d", "%Y%m%d", "%y%m%d",
"%Y/%m/%d", "%y/%m/%d" six formats)
Convert DATETIME to DATE (only keep year-month-day information, for example: 2019-12-09 21:47:05 <--> 2019-
12-09 )
DATE is converted to DATETIME (hours, minutes and seconds are automatically filled with zeros, for example: 2019-
12-09 <--> 2019-12-09 00:00:00 )
grammar:
[FROM rollup_index_name]
Notice:
Example
1. Add a key column new_col after col1 of example_rollup_index (non-aggregated model)
TO example_rollup_index;
4. Add a value column new_col SUM aggregation type (aggregation model) after col1 of example_rollup_index
TO example_rollup_index;
ADD COLUMN (col1 INT DEFAULT "1", col2 FLOAT SUM DEFAULT "2.3")
TO example_rollup_index;
FROM example_rollup_index;
7. Modify the type of the key column col1 of the base index to BIGINT and move it to the back of the col2 column.
Note: Whether you modify the key column or the value column, you need to declare complete column information
8. Modify the maximum length of the val1 column of base index. The original val1 is (val1 VARCHAR(32) REPLACE DEFAULT
"abc")
9. Reorder the columns in example_rollup_index (set the original column order as: k1,k2,k3,v1,v2)
ORDER BY (k3,k1,k2,v2,v1)
FROM example_rollup_index;
11. Modify the length of a field in the Key column of the Duplicate key table
alter table example_tbl modify column k3 varchar(50) key null comment 'to 50'
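Example 1 corresponds to a statement of this form; the column type and default value are illustrative choices:
ALTER TABLE example_db.my_table
ADD COLUMN new_col INT KEY DEFAULT "0" AFTER col1
TO example_rollup_index;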
Keywords
ALTER, TABLE, COLUMN, ALTER TABLE
Best Practice
ALTER-RESOURCE
ALTER-RESOURCE
Name
ALTER RESOURCE
Description
This statement is used to modify an existing resource. Only the root or admin user can modify resources.
Syntax:
Example
1. Modify the working directory of the Spark resource named spark0:
Supported properties:
AWS_MAX_CONNECTIONS : default 50
AWS_SECRET_KEY : s3 sk
AWS_ACCESS_KEY : s3 ak
Unsupported properties:
AWS_REGION
AWS_BUCKET
AWS_ROOT_PATH
AWS_ENDPOINT
Keywords
ALTER, RESOURCE
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference DDL Alter CANCEL-ALTER-TABLE
CANCEL-ALTER-TABLE
CANCEL-ALTER-TABLE
Name
CANCEL ALTER TABLE
Description
This statement is used to undo an ALTER operation.
grammar:
FROM db_name.table_name
grammar:
FROM db_name.table_name
grammar:
Notice:
This command is an asynchronous operation. You need to use show alter table rollup to check the task status to
confirm whether the execution is successful or not.
grammar:
(To be implemented...)
Example
1. Cancel the ALTER COLUMN operation on my_table
CANCEL ALTER TABLE COLUMN FROM example_db.my_table;
2. Cancel the ADD ROLLUP operation on my_table
CANCEL ALTER TABLE ROLLUP FROM example_db.my_table;
3. Undo the ADD ROLLUP operation under my_table according to the job id.
CANCEL ALTER TABLE ROLLUP FROM example_db.my_table(12801,12802);
Keywords
CANCEL, ALTER, TABLE, CANCEL ALTER
Best Practice
ALTER-TABLE-COMMENT
ALTER-TABLE-COMMENT
Name
ALTER TABLE COMMENT
Description
This statement is used to modify the comment of an existing table. The operation is synchronous, and the command returns
to indicate completion.
grammar :
ALTER TABLE [database.]table alter_clause;
grammar :
MODIFY COMMENT "new table comment";
grammar :
MODIFY COLUMN col1 COMMENT "new column comment";
Example
1. Change the table1's comment to table1_comment
Keywords
ALTER, TABLE, COMMENT, ALTER TABLE
Best Practice
ALTER-VIEW
ALTER-VIEW
Name
ALTER VIEW
Description
This statement is used to modify the definition of a view
grammar:
ALTER VIEW
[db_name.]view_name
AS query_stmt
illustrate:
Views are all logical, and the data in them will not be stored on physical media. When querying, the view will be used as a
subquery in the statement. Therefore, modifying the definition of the view is equivalent to modifying query_stmt.
query_stmt is any supported SQL
Example
1. Modify the view example_view on example_db
GROUP BY k1, k2
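Example 1 corresponds to a statement such as the following; the underlying table and the aggregated column are illustrative:
ALTER VIEW example_db.example_view
AS
SELECT k1, k2, SUM(v1)
FROM example_table
GROUP BY k1, k2;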
Keywords
ALTER, VIEW
Best Practice
ALTER-SQL-BLOCK-RULE
ALTER-SQL-BLOCK-RULE
Name
ALTER SQL BLOCK RULE
Description
Modify SQL blocking rules to allow modification of each item such as
sql/sqlHash/partition_num/tablet_num/cardinality/global/enable.
grammar:
illustrate:
sql and sqlHash cannot be set at the same time. This means that if a rule sets sql or sqlHash, the other attribute cannot be
modified;
sql/sqlHash and partition_num/tablet_num/cardinality cannot be set at the same time. For example, if a rule sets
partition_num, then sql or sqlHash cannot be modified;
Example
1. Modify according to SQL properties
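Example 1 corresponds to a statement of this form; the rule name and the matched SQL pattern are illustrative:
ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql" = "select \\* from test_table", "enable" = "true");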
Keywords
ALTER,SQL_BLOCK_RULE
Best Practice
ALTER-TABLE-REPLACE
ALTER-TABLE-REPLACE
Name
ALTER TABLE REPLACE
Description
Atomic substitution of two tables. This operation applies only to OLAP tables.
ALTER TABLE [db.]tbl1 REPLACE WITH TABLE tbl2 [PROPERTIES('swap' = 'true')];
If the swap parameter is true , the data in the table named tbl1 will be the data in the original table named tbl2 after the
replacement. The data in the table named tbl2 is the data in the original tbl1 table. That is, two tables of data have been
swapped.
If the swap parameter is false , the data in the tbl1 table will be the data in the tbl2 table after the replacement. The table
named tbl2 is deleted.
Theory
The replace table function actually turns the following set of operations into an atomic operation.
If you want to replace table A with table B and swap is true, do the following:
1. Rename table B as table A.
2. Rename table A as table B.
If swap is false, do the following:
1. Delete table A.
2. Rename table B as table A.
Notice
1. The default swap parameter is true . That is, a table replacement operation is equivalent to an exchange of data between
two tables.
2. If the swap parameter is set to false, the replaced table (table A) will be deleted and cannot be restored.
3. The replacement operation can only occur between two OLAP tables and does not check whether the table structure of
the two tables is consistent.
4. The original permission Settings are not changed. Because the permission check is based on the table name.
Example
1. Swap tbl1 with tbl2 without deleting the tbl1 table
ALTER TABLE tbl1 REPLACE WITH TABLE tbl2 PROPERTIES('swap' = 'true');
Keywords
ALTER, TABLE, REPLACE, ALTER TABLE
Best Practice
In some cases, the user wants to be able to rewrite the data of a certain table, but if the data is deleted first and then
imported, the data cannot be viewed for a period of time in between. At this time, the user can first use the CREATE TABLE
LIKE statement to create a new table with the same structure, import the new data into the new table, and use the
replacement operation to atomically replace the old table to achieve the goal. Atomic overwrite write operations at the
partition level, see temp partition documentation.
ALTER-TABLE-PROPERTY
ALTER-TABLE-PROPERTY
Name
ALTER TABLE PROPERTY
Description
This statement is used to modify the properties of an existing table. This operation is synchronous, and the return of the
command indicates the completion of the execution.
Modify the properties of the table, currently supports modifying the bloom filter column, the colocate_with attribute and the
dynamic_partition attribute, the replication_num and default.replication_num.
grammar:
Note:
Can also be merged into the above schema change operation to modify, see the example below
Can also be incorporated into the schema change operation above (note that the syntax for multiple clauses is slightly
different)
PROPERTIES ("bloom_filter_columns"="k1,k2,k3");
3. Change the bucketing method of the table from Hash Distribution to Random Distribution
4. Modify the dynamic partition attribute of the table (support adding dynamic partition attribute to the table without
dynamic partition attribute)
If you need to add dynamic partition attributes to tables without dynamic partition attributes, you need to specify all
dynamic partition attributes
(Note: adding dynamic partition attributes is not supported for non-partitioned tables)
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
);
Note:
7. Enable the function of ensuring the import order according to the value of the sequence column
"function_column.sequence_type" = "Date"
);
Note:
Only supported for non-colocate tables with RANGE partition and HASH distribution
ALTER TABLE example_db.my_table MODIFY COLUMN k1 COMMENT "k1", MODIFY COLUMN k2 COMMENT "k2";
Note:
1. The property with the default prefix indicates the default replica distribution for the modified table. This modification
does not modify the current actual replica distribution of the table, but only affects the replica distribution of newly
created partitions on the partitioned table.
2. For non-partitioned tables, modifying the replica distribution property without the default prefix will modify both the
default replica distribution and the actual replica distribution of the table. That is, after the modification, the show
create table and show partitions from tbl statements will show that the replica distribution has been changed.
3. For partitioned tables, the actual replica distribution of the table is at the partition level, that is, each partition has its own
replica distribution, which can be viewed through the show partitions from tbl statement. If you want to modify the
actual replica distribution, see ALTER TABLE PARTITION .
Example
1. Modify the bloom filter column of the table
Can also be incorporated into the schema change operation above (note that the syntax for multiple clauses is slightly
different)
PROPERTIES ("bloom_filter_columns"="k1,k2,k3");
3. Change the bucketing method of the table from Hash Distribution to Random Distribution
4. Modify the dynamic partition attribute of the table (support adding dynamic partition attribute to the table without
dynamic partition attribute)
If you need to add dynamic partition attributes to tables without dynamic partition attributes, you need to specify all
dynamic partition attributes
(Note: adding dynamic partition attributes is not supported for non-partitioned tables)
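A minimal sketch of such a modification (table name and property values are illustrative):
ALTER TABLE example_db.my_table SET (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32"
);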
7. Enable the function of ensuring the import order according to the value of the sequence column
ALTER TABLE example_db.my_table MODIFY COLUMN k1 COMMENT "k1", MODIFY COLUMN k2 COMMENT "k2";
12. Add a cold and hot separation data migration strategy to the table
NOTE: The policy can be added only if the table is not already associated with a storage policy. A table can have only one
storage policy.
13. Add a cold and hot data migration strategy to a partition of the table
NOTE: The policy can be added only if the partition is not already associated with a storage policy. A partition can have
only one storage policy.
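Minimal sketches for items 12 and 13 (table, partition and policy names are illustrative; the storage policy must already exist):
ALTER TABLE my_table SET ("storage_policy" = "my_storage_policy");
ALTER TABLE my_table MODIFY PARTITION (p1) SET ("storage_policy" = "my_storage_policy");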
Keywords
ALTER, TABLE, PROPERTY, ALTER TABLE
Best Practice
ALTER-TABLE-ROLLUP
ALTER-TABLE-ROLLUP
Name
ALTER TABLE ROLLUP
Description
This statement is used to perform a rollup modification operation on an existing table. The rollup is an asynchronous
operation, and the task is returned when the task is submitted successfully. After that, you can use the SHOW ALTER
command to view the progress.
grammar:
grammar:
[FROM from_index_name]
[PROPERTIES ("key"="value", ...)]
properties: Support setting timeout time, the default timeout time is 1 day.
grammar:
[FROM from_index_name]
Notice:
grammar:
Notice:
Example
1. Create index: example_rollup_index, based on base index (k1,k2,k3,v1,v2). Columnar storage.
FROM example_rollup_index;
3. Create index: example_rollup_index3, based on base index (k1,k2,k3,v1), with a custom rollup timeout of one hour.
PROPERTIES("timeout" = "3600");
Keywords
Best Practice
ALTER-TABLE-RENAME
ALTER-TABLE-RENAME
Name
ALTER TABLE RENAME
Description
This statement is used to rename an existing table or parts of it (the table name, rollup index name, partition name, or
column name). This operation is synchronous, and the return of the command indicates the completion of the execution.
grammar:
grammar:
RENAME new_table_name;
grammar:
grammar:
grammar:
Notice:
Currently only tables with column unique id are supported, which are created with property 'light_schema_change'.
Example
1. Modify the table named table1 to table2
2. Modify the rollup index named rollup1 in the table example_table to rollup2
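Minimal sketches of the two examples above:
ALTER TABLE table1 RENAME table2;
ALTER TABLE example_table RENAME ROLLUP rollup1 rollup2;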
Keywords
ALTER, TABLE, RENAME, ALTER TABLE
Best Practice
ALTER-POLICY
ALTER-POLICY
Name
ALTER STORAGE POLICY
Description
This statement is used to modify an existing cold and hot separation migration strategy. Only root or admin users can modify
resources.
Example
1. Modify the cold and hot separation data migration time point of the policy named coolown_datetime:
2. Modify the cold and hot separation data migration TTL of the policy named coolown_countdown:
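Minimal sketches of the two modifications (the property values are illustrative):
ALTER STORAGE POLICY coolown_datetime PROPERTIES ("cooldown_datetime" = "2023-06-08 00:00:00");
ALTER STORAGE POLICY coolown_countdown PROPERTIES ("cooldown_ttl" = "1d");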
Keywords
ALTER, STORAGE, POLICY
Best Practice
RESTORE
RESTORE
Name
RESTORE
Description
This statement is used to restore the data backed up by the BACKUP command to the specified database. This command is
an asynchronous operation. After the submission is successful, you need to check the progress through the SHOW RESTORE
command. Only restoring tables of the OLAP type is supported.
grammar:
FROM `repository_name`
[ON|EXCLUDE] (
...
illustrate:
There can only be one executing BACKUP or RESTORE task under the same database.
The tables and partitions that need to be restored are identified in the ON clause. If no partition is specified, all partitions
of the table are restored by default. The specified table and partition must already exist in the warehouse backup.
Tables and partitions that do not require recovery are identified in the EXCLUDE clause. All partitions of all other tables in
the warehouse except the specified table or partition will be restored.
The table name backed up in the warehouse can be restored to a new table through the AS statement. But the new table
name cannot already exist in the database. The partition name cannot be modified.
You can restore the backed up tables in the warehouse to replace the existing tables of the same name in the database,
but you must ensure that the table structures of the two tables are exactly the same. The table structure includes: table
name, column, partition, Rollup, etc.
You can specify some partitions of the recovery table, and the system will check whether the partition Range or List can
match.
PROPERTIES currently supports the following properties:
"backup_timestamp" = "2018-05-04-16-45-08": Specifies which time version of the corresponding backup to restore,
required. This information can be obtained with the SHOW SNAPSHOT ON repo; statement.
"replication_num" = "3": Specifies the number of replicas for the restored table or partition. Default is 3. If restoring an
existing table or partition, the number of replicas must be the same as the number of replicas of the existing table or
partition. At the same time, there must be enough hosts to accommodate multiple replicas.
"reserve_replica" = "true": Default is false. When this property is true, the replication_num property is ignored and the
restored table or partition will have the same number of replication as before the backup. Supports multiple tables or
multiple partitions within a table with different replication number.
"reserve_dynamic_partition_enable" = "true": Default is false. When this property is true, the restored table will have
the same value of 'dynamic_partition_enable' as before the backup. if this property is not true, the restored table will
set 'dynamic_partition_enable=false'.
"timeout" = "3600": The task timeout period, the default is one day. in seconds.
"meta_version" = 40: Use the specified meta_version to read the previously backed up metadata. Note that this
parameter is used as a temporary solution and is only used to restore the data backed up by the old version of Doris.
The latest version of the backup data already contains the meta version, no need to specify it.
Example
1. Restore the table backup_tbl in backup snapshot_1 from example_repo to database example_db1, the time version is
"2018-05-04-16-45-08". Revert to 1 copy:
RESTORE SNAPSHOT example_db1.`snapshot_1`
FROM `example_repo`
ON ( `backup_tbl` )
PROPERTIES
(
"backup_timestamp"="2018-05-04-16-45-08",
"replication_num" = "1"
);
2. Restore partitions p1, p2 of table backup_tbl in backup snapshot_2 from example_repo, and table backup_tbl2 to
database example_db1, rename it to new_tbl, and the time version is "2018-05-04-17-11-01". The default reverts to 3
replicas:
RESTORE SNAPSHOT example_db1.`snapshot_2`
FROM `example_repo`
ON
(
`backup_tbl` PARTITION (`p1`, `p2`),
`backup_tbl2` AS `new_tbl`
)
PROPERTIES
(
"backup_timestamp"="2018-05-04-17-11-01"
);
3. Restore all tables except for table backup_tbl in backup snapshot_3 from example_repo to database example_db1, the
time version is "2018-05-04-18-12-18".
RESTORE SNAPSHOT example_db1.`snapshot_3`
FROM `example_repo`
EXCLUDE ( `backup_tbl` )
PROPERTIES
(
"backup_timestamp"="2018-05-04-18-12-18"
);
Keywords
RESTORE
Best Practice
1. There can only be one ongoing recovery operation under the same database.
2. The table backed up in the warehouse can be restored and replaced with the existing table of the same name in the
database, but the table structure of the two tables must be completely consistent. The table structure includes: table
name, columns, partitions, materialized views, and so on.
3. When specifying a partial partition of the recovery table, the system will check whether the partition range can match.
4. For a cluster of the same size, the time taken by a restore operation is roughly the same as that of the corresponding
backup operation. If you want to speed up the restore, you can first restore only one replica by setting the
replication_num property to 1, and then adjust the number of replicas afterwards via ALTER TABLE PROPERTY.
DROP-REPOSITORY
DROP-REPOSITORY
Name
DROP REPOSITORY
Description
This statement is used to delete a created repository. Only root or superuser users can delete repositories.
grammar:
illustrate:
Deleting a warehouse just deletes the warehouse's mapping in Palo, not the actual warehouse data. Once deleted, it can
be mapped to the repository again by specifying the same broker and LOCATION.
Example
1. Delete the repository named bos_repo:
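A minimal sketch:
DROP REPOSITORY `bos_repo`;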
Keywords
DROP, REPOSITORY
Best Practice
CANCEL-RESTORE
CANCEL-RESTORE
Name
CANCEL RESTORE
Description
This statement is used to cancel an ongoing RESTORE task.
grammar:
Notice:
If the cancellation happens around the COMMIT stage of recovery or later, the table being restored may become
inaccessible. In that case, the data can only be recovered by executing the restore job again.
Example
1. Cancel the RESTORE task under example_db.
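A minimal sketch:
CANCEL RESTORE FROM example_db;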
Keywords
CANCEL, RESTORE
Best Practice
BACKUP
BACKUP
Name
BACKUP
Description
This statement is used to back up the data under the specified database. This command is an asynchronous operation. After
the submission is successful, you need to check the progress through the SHOW BACKUP command. Only backing up tables
of type OLAP is supported.
grammar:
TO `repository_name`
[ON|EXCLUDE] (
...
illustrate:
There can only be one executing BACKUP or RESTORE task under the same database.
The ON clause identifies the tables and partitions that need to be backed up. If no partition is specified, all partitions of
the table are backed up by default
Tables and partitions that do not require backup are identified in the EXCLUDE clause. Back up all partition data for all
tables in this database except the specified table or partition.
PROPERTIES currently supports the following properties:
"type" = "full": indicates that this is a full update (default)
"timeout" = "3600": The task timeout period, the default is one day. in seconds.
Example
1. Fully backup the table example_tbl under example_db to the warehouse example_repo:
BACKUP SNAPSHOT example_db.snapshot_label1    -- the snapshot label is user-defined; snapshot_label1 is illustrative
TO example_repo
ON (example_tbl);
2. Under the full backup example_db, the p1, p2 partitions of the table example_tbl, and the table example_tbl2 to the
warehouse example_repo:
BACKUP SNAPSHOT example_db.snapshot_label2    -- illustrative label
TO example_repo
ON
(
example_tbl PARTITION (p1, p2),
example_tbl2
);
3. Full backup of all tables except table example_tbl under example_db to warehouse example_repo:
BACKUP SNAPSHOT example_db.snapshot_label3    -- illustrative label
TO example_repo
EXCLUDE (example_tbl);
4. Create a warehouse named hdfs_repo, rely on Baidu hdfs broker "hdfs_broker", the data root directory is: hdfs: //hadoop-
name-node:54310/path/to/repo/
ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
PROPERTIES
"username" = "user",
"password" = "password"
);
5. Create a repository named s3_repo to link cloud storage directly without going through the broker.
WITH S3
ON LOCATION "s3://s3-repo"
PROPERTIES
"AWS_ENDPOINT" = "http://s3-REGION.amazonaws.com",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "REGION"
);
6. Create a repository named hdfs_repo to link HDFS directly without going through the broker.
WITH hdfs
ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
PROPERTIES
"fs.defaultFS"="hdfs://hadoop-name-node:54310",
"hadoop.username"="user"
);
7. Create a repository named minio_repo to link minio storage directly through the s3 protocol.
WITH S3
ON LOCATION "s3://minio_repo"
PROPERTIES
"AWS_ENDPOINT" = "http://minio.com",
"AWS_ACCESS_KEY" = "MINIO_USER",
"AWS_SECRET_KEY"="MINIO_PASSWORD",
"AWS_REGION" = "REGION",
"use_path_style" = "true"
);
Keywords
BACKUP
Best Practice
1. Only one backup operation can be performed under the same database.
2. The backup operation will back up the underlying table and materialized view of the specified table or partition, and only
one copy will be backed up.
3. The efficiency of a backup operation depends on the amount of data, the number of Compute Nodes, and the number of
files. Every Compute Node that holds a shard of the backup data participates in the upload phase of the backup, so the
more nodes there are, the higher the upload efficiency.
4. The number of files refers to the number of shards and the number of files within each shard. If there are many shards,
or many small files inside the shards, the backup operation may take longer.
CANCEL-BACKUP
CANCEL-BACKUP
Name
CANCEL BACKUP
Description
This statement is used to cancel an ongoing BACKUP task.
grammar:
Example
1. Cancel the BACKUP task under example_db.
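A minimal sketch:
CANCEL BACKUP FROM example_db;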
Keywords
CANCEL, BACKUP
Best Practice
CREATE-REPOSITORY
CREATE-REPOSITORY
Name
CREATE REPOSITORY
Description
This statement is used to create a repository. Repositories are used for backup or restore. Only root or superuser users can
create repositories.
grammar:
ON LOCATION `repo_location`
illustrate:
Repositories are created either through an existing broker, by accessing cloud storage directly via the AWS S3 protocol, or
by accessing HDFS directly.
If the repository is read-only, only restore operations can be performed on it; otherwise both backup and restore operations
are available.
PROPERTIES are different according to different types of broker or S3 or hdfs, see the example for details.
ON LOCATION : if it is S3 , here followed by the Bucket Name.
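A sketch of the general form, inferred from the examples below (the WITH clause selects broker, S3, or HDFS access):
CREATE [READ ONLY] REPOSITORY `repo_name`
WITH [BROKER `broker_name` | S3 | hdfs]
ON LOCATION `repo_location`
PROPERTIES ("key" = "value", ...);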
Example
1. Create a warehouse named bos_repo, rely on BOS broker "bos_broker", and the data root directory is: bos: //palo_backup
ON LOCATION "bos://palo_backup"
PROPERTIES
"bos_endpoint" = "http://gz.bcebos.com",
"bos_accesskey" = "bos_accesskey",
"bos_secret_accesskey"="bos_secret_accesskey"
);
ON LOCATION "bos://palo_backup"
PROPERTIES
"bos_endpoint" = "http://gz.bcebos.com",
"bos_accesskey" = "bos_accesskey",
"bos_secret_accesskey"="bos_accesskey"
);
3. Create a warehouse named hdfs_repo, rely on Baidu hdfs broker "hdfs_broker", the data root directory is: hdfs: //hadoop-
name-node:54310/path/to/repo/
ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
PROPERTIES
"username" = "user",
"password" = "password"
);
4. Create a repository named s3_repo to link cloud storage directly without going through the broker.
WITH S3
ON LOCATION "s3://s3-repo"
PROPERTIES
"AWS_ENDPOINT" = "http://s3-REGION.amazonaws.com",
"AWS_ACCESS_KEY" = "AWS_ACCESS_KEY",
"AWS_SECRET_KEY"="AWS_SECRET_KEY",
"AWS_REGION" = "REGION"
);
5. Create a repository named hdfs_repo to link HDFS directly without going through the broker.
WITH hdfs
ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
PROPERTIES
"fs.defaultFS"="hdfs://hadoop-name-node:54310",
"hadoop.username"="user"
);
6. Create a repository named minio_repo to link minio storage directly through the s3 protocol.
WITH S3
ON LOCATION "s3://minio_repo"
PROPERTIES
"AWS_ENDPOINT" = "http://minio.com",
"AWS_ACCESS_KEY" = "MINIO_USER",
"AWS_SECRET_KEY"="MINIO_PASSWORD",
"AWS_REGION" = "REGION",
"use_path_style" = "true"
);
WITH S3
ON LOCATION "s3://minio_repo"
PROPERTIES
"AWS_ENDPOINT" = "AWS_ENDPOINT",
"AWS_ACCESS_KEY" = "AWS_TEMP_ACCESS_KEY",
"AWS_SECRET_KEY" = "AWS_TEMP_SECRET_KEY",
"AWS_TOKEN" = "AWS_TEMP_TOKEN",
"AWS_REGION" = "AWS_REGION"
Keywords
CREATE, REPOSITORY
Best Practice
1. A cluster can create multiple warehouses. Only users with ADMIN privileges can create repositories.
2. Any user can view the created repositories through the SHOW REPOSITORIES command.
3. When performing data migration operations, it is necessary to create the exact same warehouse in the source cluster
and the destination cluster, so that the destination cluster can view the data snapshots backed up by the source cluster
through this warehouse.
CREATE-ENCRYPT-KEY
CREATE-ENCRYPT-KEY
Name
CREATE ENCRYPTKEY
Description
This statement creates a custom key. Executing this command requires the user to have ADMIN privileges.
grammar:
illustrate:
key_name : The name of the key to be created, which may include the name of the database. For example: db1.my_key .
If key_name contains the database name, the custom key will be created in the corresponding database; otherwise the key
will be created in the database of the current session. The name of the new key cannot be the same as an existing key in the
corresponding database, otherwise the creation will fail.
Example
To use a custom key, prefix the key name with the keyword KEY (or key ), separated from key_name by a space.
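A minimal sketch of creating and using a key (the key name my_key and the key string are illustrative; the results shown
below correspond to these calls):
CREATE ENCRYPTKEY my_key AS "ABCD123456789";
SELECT HEX(AES_ENCRYPT("Doris is Great", KEY my_key));
SELECT AES_DECRYPT(UNHEX('D26DB38579D6A343350EDDC6F2AD47C6'), KEY my_key);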
+------------------------------------------------+
| D26DB38579D6A343350EDDC6F2AD47C6               |
+------------------------------------------------+

+------------------------------------------------+
| Doris is Great                                 |
+------------------------------------------------+
Keywords
CREATE, ENCRYPTKEY
Best Practice
CREATE-TABLE-AS-SELECT
CREATE-TABLE-AS-SELECT
Name
CREATE TABLE AS SELECT
Description
This statement creates the table structure by returning the results from the Select statement and imports the data at the
same time
grammar :
CREATE TABLE table_name [( column_name_list )]
opt_engine:engineName
opt_keys:keys
opt_comment:tableComment
opt_partition:partition
opt_distribution:distribution
opt_rollup:index
opt_properties:tblProperties
opt_ext_properties:extProperties
KW_AS query_stmt:query_def
illustrate:
The user needs to have SELECT permission for the source table and CREATE permission for the target database
After a table is created, data is imported. If the import fails, the table is deleted
You can specify the key type. The default key type is Duplicate Key
Example
PROPERTIES(\"replication_num\" = \"1\")
PROPERTIES(\"replication_num\" = \"1\")
`test`.`join_table` jt on vt.userId=jt.userId
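A minimal sketch of such a statement, assuming source tables test.varchar_table (aliased vt) and test.join_table joined on
userId (the target table name and the select list are illustrative):
CREATE TABLE `test`.`select_join_table`
PROPERTIES("replication_num" = "1")
AS SELECT vt.userId, vt.userName
FROM `test`.`varchar_table` vt
JOIN `test`.`join_table` jt ON vt.userId = jt.userId;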
Keywords
Best Practice
CREATE-POLICY
CREATE-POLICY
Name
CREATE POLICY
Description
Create policies,such as:
1. Create security policies(ROW POLICY) and explain to view the rewritten SQL.
2. Create storage migration policy(STORAGE POLICY), used for cold and hot data transform
Grammar:
1. ROW POLICY
:
illustrate
:
filterType: RESTRICTIVE combines a set of policies with AND; PERMISSIVE combines a set of policies with OR.
When multiple policies are configured, the RESTRICTIVE policies are merged first and then combined with the PERMISSIVE policies.
2. STORAGE POLICY
:
illustrate
:
iii. cooldown_ttl: how long hot data stays hot, i.e. the interval between the time a tablet is created and the time it is
migrated to cold data, formatted as:
1d: 1 day
1h: 1 hour
50000: 50000 seconds
Example
1. Create a set of row security policies
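A sketch of row policies that would produce the rewritten query shown below, assuming a user named test and the
predicates from that rewritten WHERE clause (policy names and the database prefix are illustrative):
CREATE ROW POLICY test_row_policy_1 ON test.table1 AS RESTRICTIVE TO test USING (c1 = 'a');
CREATE ROW POLICY test_row_policy_2 ON test.table1 AS RESTRICTIVE TO test USING (c2 = 'b');
CREATE ROW POLICY test_row_policy_3 ON test.table1 AS PERMISSIVE TO test USING (c3 = 'c');
CREATE ROW POLICY test_row_policy_4 ON test.table1 AS PERMISSIVE TO test USING (c4 = 'd');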
When select * from table1 is executed, it is rewritten as:
select * from (select * from table1 where c1 = 'a' and c2 = 'b' or c3 = 'c' or c4 = 'd')
i. NOTE
To create a cold and hot separation policy, you must first create a resource, and then reference the created resource
name when creating the migration policy.
Dropping a data migration policy is currently not supported. This prevents data that has already been migrated from
becoming unretrievable after its policy is deleted.
ii. Create policy on cooldown_datetime
CREATE STORAGE POLICY testPolicy
PROPERTIES(
"storage_resource" = "s3",
"cooldown_datetime" = "2022-06-08 00:00:00"    -- illustrative timestamp
);
iii. Create policy on cooldown_ttl
CREATE STORAGE POLICY testPolicyTtl    -- policy name illustrative
PROPERTIES(
"storage_resource" = "s3",
"cooldown_ttl" = "1d"
);
Keywords
CREATE, POLICY
Best Practice
CREATE-VIEW
CREATE-VIEW
Name
CREATE VIEW
Description
This statement is used to create a logical view
grammar:
[db_name.]view_name
AS query_stmt
illustrate:
Views are logical views and have no physical storage. All queries on the view are equivalent to the sub-queries
corresponding to the view.
query_stmt can be any supported SQL statement
Example
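A minimal sketch, assuming a source table example_table (view, table and column names are illustrative):
CREATE VIEW example_db.example_view (k1, k2, v1)
AS
SELECT k1, k2, SUM(v1) FROM example_table GROUP BY k1, k2;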
Keywords
CREATE, VIEW
Best Practice
CREATE-DATABASE
CREATE-DATABASE
Name
CREATE DATABASE
Description
This statement is used to create a new database
grammar:
PROPERTIES
Additional information about the database; optional.
If you create an Iceberg database, you need to provide the following information in properties:
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
)
illustrate:
Example
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
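A minimal sketch of creating a plain database and an Iceberg database (the database names db_test and iceberg_test are
illustrative):
CREATE DATABASE db_test;
CREATE DATABASE `iceberg_test`
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);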
Keywords
CREATE, DATABASE
Best Practice
CREATE-FILE
CREATE-FILE
Name
CREATE FILE
Description
This statement is used to create and upload a file to the Doris cluster.
This function is usually used to manage files that need
to be used in some other commands, such as certificates, public and private keys, and so on.
grammar:
[properties]
illustrate:
database: The file belongs to a certain db, if not specified, the db of the current session is used.
properties supports the following parameters:
url: Required. Specifies the download path for a file. Currently only unauthenticated http download paths are
supported. After the command executes successfully, the file will be saved in doris and the url will no longer be
needed.
catalog: Required. The classification name of the file can be customized. However, in some commands, files in the
specified catalog are looked up. For example, in the routine import, when the data source is kafka, the file under the
catalog name kafka will be searched.
md5: optional. md5 of the file. If specified, verification will be performed after the file is downloaded.
Example
1. Create a file ca.pem , classified as kafka
CREATE FILE "ca.pem"
PROPERTIES
(
"url" = "https://test.bj.bcebos.com/kafka-key/ca.pem",
"catalog" = "kafka"
);
2. Create a file client.key in the database my_database, classified as my_catalog
CREATE FILE "client.key"
IN my_database
PROPERTIES
(
"url" = "https://test.bj.bcebos.com/kafka-key/client.key",
"catalog" = "my_catalog",
"md5" = "b5bb901bf10f99205b39a46ac3557dd9"
);
Keywords
CREATE, FILE
Best Practice
1. This command can only be executed by users with admin privileges. A file belongs to a specific database and can be used
by any user who has access rights to that database.
2. This function is mainly used to manage small files such as certificates, so a single file is limited to 1MB, and a Doris
cluster can hold at most 100 uploaded files.
CREATE-INDEX
CREATE-INDEX
Name
CREATE INDEX
Description
This statement is used to create an index
grammar:
CREATE INDEX [IF NOT EXISTS] index_name ON table_name (column [, ...]) [USING BITMAP] [COMMENT 'balabala'];
Notice:
Example
CREATE INDEX IF NOT EXISTS index_name ON table1 (siteid) USING BITMAP COMMENT 'balabala';
Keywords
CREATE, INDEX
Best Practice
CREATE-RESOURCE
CREATE-RESOURCE
Name
CREATE RESOURCE
Description
This statement is used to create a resource. Only the root or admin user can create resources. Currently supports Spark,
ODBC, S3 external resources.
In the future, other external resources may be added to Doris for use, such as Spark/GPU for
query, HDFS/S3 for external storage, MapReduce for ETL, etc.
grammar:
illustrate:
The type of resource needs to be specified in PROPERTIES "type" = "[spark|odbc_catalog|s3]", currently supports spark,
odbc_catalog, s3.
PROPERTIES differs depending on the resource type, see the example for details.
Example
PROPERTIES
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.executor.memory" = "1g",
"spark.yarn.queue" = "queue0",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
);
working_dir and broker need to be specified when Spark is used for ETL, described as follows:
working_dir: The directory used by the ETL. Required when Spark is used as an ETL resource. For example:
hdfs://host:port/tmp/doris.
broker: broker name. Required when spark is used as an ETL resource. Configuration needs to be done in advance using
the ALTER SYSTEM ADD BROKER command.
broker.property_key: The authentication information that the broker needs to specify when reading the intermediate file
generated by ETL.
PROPERTIES (
"type" = "odbc_catalog",
"host" = "192.168.0.1",
"port" = "8086",
"user" = "test",
"password" = "test",
"database" = "test",
"odbc_type" = "oracle",
);
There is also support for implementing custom parameters per ODBC Driver, see the description of the
corresponding ODBC Driver
3. Create S3 resource
PROPERTIES
"type" = "s3",
"AWS_ENDPOINT" = "bj.s3.com",
"AWS_REGION" = "bj",
"AWS_ACCESS_KEY" = "bbb",
"AWS_SECRET_KEY" = "aaaa",
"AWS_MAX_CONNECTIONS" = "50",
"AWS_REQUEST_TIMEOUT_MS" = "3000",
"AWS_CONNECTION_TIMEOUT_MS" = "1000"
);
If S3 resource is used for cold hot separation, we should add more required fields.
CREATE RESOURCE "remote_s3"
PROPERTIES
"type" = "s3",
"AWS_ENDPOINT" = "bj.s3.com",
"AWS_REGION" = "bj",
"AWS_ACCESS_KEY" = "bbb",
"AWS_SECRET_KEY" = "aaaa",
-- required by cooldown
"AWS_ROOT_PATH" = "/path/to/root",
"AWS_BUCKET" = "test-bucket",
);
Required parameters
AWS_ENDPOINT : s3 endpoint
AWS_REGION : s3 region
AWS_ROOT_PATH : s3 root directory
optional parameter
AWS_MAX_CONNECTIONS : the maximum number of s3 connections, the default is 50
AWS_REQUEST_TIMEOUT_MS : s3 request timeout, in milliseconds, the default is 3000
AWS_CONNECTION_TIMEOUT_MS : s3 connection timeout, in milliseconds, the default is 1000
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-
connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver"
);
"type"="hdfs",
"username"="user",
"password"="passwd",
"dfs.nameservices" = "my_ha",
"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
"dfs.client.failover.proxy.provider" =
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
dfs.ha.namenodes.[nameservice ID] : unique identifiers for each NameNode in the nameservice. See hdfs-site.xml
dfs.namenode.rpc-address.[nameservice ID].[name node ID] : the fully-qualified RPC address for each NameNode to listen
on. See hdfs-site.xml
dfs.client.failover.proxy.provider.[nameservice ID] : the Java class that HDFS clients use to contact the Active NameNode,
usually it is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
'type'='hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:7004',
'dfs.nameservices'='HANN',
'dfs.ha.namenodes.HANN'='nn1,nn2',
'dfs.namenode.rpc-address.HANN.nn1'='nn1_host:rpc_port',
'dfs.namenode.rpc-address.HANN.nn2'='nn2_host:rpc_port',
'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
7. Create ES resource
"type"="es",
"hosts"="http://127.0.0.1:29200",
"nodes_discovery"="false",
"enable_keyword_sniff"="true"
);
hosts: ES connection address; it can be one or more nodes, and load-balanced addresses are also accepted
nodes_discovery: whether to enable ES node discovery; the default is true. In a network-isolated environment, set this
parameter to false so that only the specified nodes are connected
http_ssl_enabled: Whether ES cluster enables https access mode, the current FE/BE implementation is to trust all
Keywords
CREATE, RESOURCE
Best Practice
CREATE-TABLE-LIKE
CREATE-TABLE-LIKE
Name
CREATE TABLE LIKE
Description
This statement is used to create an empty table with the exact same table structure as another table, and can optionally
replicate some rollups.
grammar:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [database.]table_name LIKE [database.]table_name [WITH ROLLUP
(r1,r2,r3,...)]
illustrate:
The copied table structure includes Column Definition, Partitions, Table Properties, etc.
The user needs to have SELECT permission on the copied original table
Example
1. Create an empty table with the same table structure as table1 under the test1 library, the table name is table2 (see the
sketch after this list)
2. Create an empty table with the same table structure as test1.table1 under the test2 library, the table name is table2
3. Create an empty table with the same table structure as table1 under the test1 library, the table name is table2, and copy
the two rollups of r1 and r2 of table1 at the same time
4. Create an empty table with the same table structure as table1 under the test1 library, the table name is table2, and copy all
the rollups of table1 at the same time
5. Create an empty table with the same table structure as test1.table1 under the test2 library, the table name is table2, and
copy the two rollups of r1 and r2 of table1 at the same time
6. Create an empty table with the same table structure as test1.table1 under the test2 library, the table name is table2, and
copy all the rollups of table1 at the same time
7. Create an empty table under the test1 library with the same table structure as the MySQL outer table1, the table name is
table2
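Minimal sketches of items 1 and 3 above:
CREATE TABLE test1.table2 LIKE test1.table1;
CREATE TABLE test1.table2 LIKE test1.table1 WITH ROLLUP (r1, r2);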
Keywords
Best Practice
CREATE-MATERIALIZED-VIEW
CREATE-MATERIALIZED-VIEW
Name
CREATE MATERIALIZED VIEW
Description
This statement is used to create a materialized view.
This operation is an asynchronous operation. After the submission is successful, you need to view the job progress through
SHOW ALTER TABLE MATERIALIZED VIEW. After displaying FINISHED, you can use the desc [table_name] all command
to view the schema of the materialized view.
grammar:
illustrate:
MV name : The name of the materialized view, required. Materialized view names for the same table cannot be repeated.
query : The query statement used to construct the materialized view, the result of the query statement is the data of the
materialized view. Currently supported query formats are:
select_expr
: All columns in the schema of the materialized view.
Contains at least one single column.
base view name : The original table name of the materialized view, required.
Must be a single table and not a subquery
group by : The grouping column of the materialized view, optional.
If not filled, the data will not be grouped.
order by : the sorting column of the materialized view, optional.
The declaration order of the sort column must be the same as the column declaration order in select_expr.
If order by is not declared, the sorting column is automatically supplemented according to the rules. If the
materialized view is an aggregate type, all grouping columns are automatically supplemented as sort columns. If
the materialized view is of a non-aggregate type, the first 36 bytes are automatically supplemented as the sort
column.
If the number of auto-supplemented sorts is less than 3, the first three are used as the sort sequence. If query
contains a grouping column, the sorting column must be the same as the grouping column.
properties
Example
Base table structure is
k1 int null,
k2 int null,
k3 bigint null,
k4 bigint null
properties("replication_num" = "1");
Note: If the materialized view contains the partition and distribution columns of the base table, these columns must be
used as key columns in the materialized view.
1. Create a materialized view that contains only the columns of the original table (k1, k2)
The schema of the materialized view is as follows, the materialized view contains only two columns k1, k2 without any
aggregation
The schema of the materialized view is shown in the figure below. The materialized view contains only two columns k2,
k1, where k2 is the sorting column without any aggregation.
3. Create a materialized view with k1, k2 grouping and k3 column aggregated by SUM
The schema of the materialized view is shown in the figure below. The materialized view contains two columns k1, k2,
sum(k3) where k1, k2 are the grouping columns, and sum(k3) is the sum value of the k3 column grouped by k1, k2.
Since the materialized view does not declare a sorting column and it contains aggregated data, the system automatically
supplements the grouping columns k1 and k2 as the sorting columns.
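A minimal sketch of the statement for this example, assuming the base table is the duplicate_table referenced in the next
example (the materialized view name is illustrative):
CREATE MATERIALIZED VIEW k1_k2_sumk3 AS
SELECT k1, k2, SUM(k3)
FROM duplicate_table
GROUP BY k1, k2;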
select k1, k2, k3, k4 from duplicate_table group by k1, k2, k3, k4;
The materialized view schema is as shown below. The materialized view contains columns k1, k2, k3, and k4, and there are
no duplicate rows.
5. Create a non-aggregate materialized view that does not declare a sort column
The materialized view contains k3, k4, k5, k6, k7 columns, and does not declare a sort column, the creation statement is
as follows:
The system automatically adds k3, k4 and k5 as sorting columns. The sum of the byte sizes of these three column types is
4 (INT) + 8 (BIGINT) + 16 (DECIMAL) = 28 < 36, so these three columns are used as sorting columns. In the schema of the
materialized view, the key field of the k3, k4 and k5 columns is true, that is, they are sorting columns, while the key field
of the k6 and k7 columns is false, that is, they are non-sorting columns.
Keywords
Best Practice
CREATE-EXTERNAL-TABLE
CREATE-EXTERNAL-TABLE
Name
CREATE EXTERNAL TABLE
Description
This statement is used to create an external table, see CREATE TABLE for the specific syntax.
The type of external table is mainly identified by the ENGINE type; currently MYSQL, BROKER, HIVE, ICEBERG and HUDI are
supported.
1. If it is mysql, the following information needs to be provided in properties:
PROPERTIES (
"host" = "mysql_server_host",
"port" = "mysql_server_port",
"user" = "your_user_name",
"password" = "your_password",
"database" = "database_name",
"table" = "table_name"
)
There is also an optional property "charset" which sets the character set for the mysql connection; the default value is
"utf8". You can set it to "utf8mb4" instead of "utf8" when needed.
Notice:
"table_name" in "table" entry is the real table name in mysql. The table_name in the CREATE TABLE statement is the
name of the mysql table in Doris, which can be different.
The purpose of creating a mysql table in Doris is to access the mysql database through Doris. Doris itself does not
maintain or store any mysql data.
2. If it is a broker, it means that the access to the table needs to pass through the specified broker, and the following
information needs to be provided in properties:
PROPERTIES (
"broker_name" = "broker_name",
"path" = "file_path1[,file_path2]",
"column_separator" = "value_separator"
"line_delimiter" = "value_delimiter"
In addition, you need to provide the Property information required by the Broker, and pass it through the BROKER
PROPERTIES, for example, HDFS needs to pass in
BROKER PROPERTIES(
"username" = "name",
"password" = "password"
According to different Broker types, the content that needs to be passed in is also different.
Notice:
If there are multiple files in "path", separate them with comma [,]. If the filename contains a comma, use %2c instead.
If the filename contains %, use %25 instead
Now the file content format supports CSV, and supports GZ, BZ2, LZ4, LZO (LZOP) compression formats.
PROPERTIES (
"database" = "hive_db_name",
"table" = "hive_table_name",
"hive.metastore.uris" = "thrift://127.0.0.1:9083"
Where database is the name of the library corresponding to the hive table, table is the name of the hive table, and
hive.metastore.uris is the address of the hive metastore service.
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "iceberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
Example
k1 DATE,
k2 INT,
k3 SMALLINT,
k4 VARCHAR(2048),
k5 DATETIME
ENGINE=mysql
PROPERTIES
"host" = "127.0.0.1",
"port" = "8239",
"user" = "mysql_user",
"password" = "mysql_passwd",
"database" = "mysql_db_test",
"table" = "mysql_table_test",
"charset" = "utf8mb4"
PROPERTIES
"type" = "odbc_catalog",
"user" = "mysql_user",
"password" = "mysql_passwd",
"host" = "127.0.0.1",
"port" = "8239"
);
k1 DATE,
k2 INT,
k3 SMALLINT,
k4 VARCHAR(2048),
k5 DATETIME
ENGINE=mysql
PROPERTIES
"odbc_catalog_resource" = "mysql_resource",
"database" = "mysql_db_test",
"table" = "mysql_table_test"
2. Create a broker external table with data files stored on HDFS, the data is split with "|", and " \n" is newline
k1 DATE,
k2 INT,
k3 SMALLINT,
k4 VARCHAR(2048),
k5 DATETIME
ENGINE=broker
PROPERTIES (
"broker_name" = "hdfs",
"path" =
"hdfs://hdfs_host:hdfs_port/data1,hdfs://hdfs_host:hdfs_port/data2,hdfs://hdfs_host:hdfs_port/data3%2c4",
"column_separator" = "|",
"line_delimiter" = "\n"
BROKER PROPERTIES (
"username" = "hdfs_user",
"password" = "hdfs_password"
k1 TINYINT,
k2 VARCHAR(50),
v INT
ENGINE=hive
PROPERTIES
"database" = "hive_db_name",
"table" = "hive_table_name",
"hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
ENGINE=ICEBERG
PROPERTIES (
"iceberg.database" = "iceberg_db",
"iceberg.table" = "iceberg_table",
"iceberg.hive.metastore.uris" = "thrift://127.0.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
ENGINE=HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
ENGINE=HUDI
PROPERTIES (
"hudi.database" = "hudi_db_in_hive_metastore",
"hudi.table" = "hudi_table_in_hive_metastore",
"hudi.hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
Keywords
CREATE, EXTERNAL, TABLE
Best Practice
CREATE-TABLE
CREATE-TABLE
Description
This command is used to create a table. The subject of this document describes the syntax for creating Doris self-maintained
tables. For external table syntax, please refer to the CREATE-EXTERNAL-TABLE document.
column_definition_list,
[index_definition_list]
[engine_type]
[keys_type]
[table_comment]
[partition_info]
distribution_desc
[rollup_list]
[properties]
[extra_properties]
column_definition_list
column_definition[, column_definition]
column_definition
Column definition:
column_type
TINYINT (1 byte)
SMALLINT (2 bytes)
INT (4 bytes)
BIGINT (8 bytes)
FLOAT (4 bytes)
precision: 1 ~ 27
scale: 0 ~ 9
DATE (3 bytes)
DATETIME (8 bytes)
CHAR[(length)]
VARCHAR[(length)]
HyperLogLog column type, do not need to specify the length and default value. The length is
controlled within the system according to the degree of data aggregation.
BITMAP
The bitmap column type does not need to specify the length and default value. Represents a
collection of integers, and the maximum number of elements supported is 2^64-1.
aggr_type
REPLACE: Replace. For rows with the same dimension column, the index column will be imported in the
order of import, and the last imported will replace the first imported.
REPLACE_IF_NOT_NULL: non-null value replacement. The difference with REPLACE is that there is no
replacement for null values. It should be noted here that the default value should be NULL, not an
empty string. If it is an empty string, you should replace it with an empty string.
HLL_UNION: The aggregation method of HLL type columns, aggregated by HyperLogLog algorithm.
BITMAP_UNION: The aggregation mode of BIMTAP type columns, which performs the union aggregation of
bitmaps.
default_value
Default value of the column. If the load data does not specify a value for this column, the system
will assign a default value to this column.
1. The user specifies a fixed value, such as:
```SQL
k1 INT DEFAULT '1',
k2 CHAR(10) DEFAULT 'aaaa'
```
2. Keywords are provided by the system. Currently, the following keyword is supported:
```SQL
CURRENT_TIMESTAMP
// This keyword is used only for DATETIME type. If the value is missing, the system assigns the
current timestamp.
```
Example:
```
k1 TINYINT,
v2 BITMAP BITMAP_UNION,
v3 HLL HLL_UNION,
v4 INT SUM NOT NULL DEFAULT "1" COMMENT "This is column v4"
```
index_definition_list
index_definition[, index_definition]
index_definition
Index definition:
Example:
...
engine_type
Table engine type. All types in this document are OLAP. For other external table engine types, see CREATE EXTERNAL
TABLE document. Example:
ENGINE=olap
keys_type
Data model.
DUPLICATE KEY (default): The subsequent specified column is the sorting column.
UNIQUE KEY: The subsequent specified column is the primary key column.
Example:
table_comment
Table notes. Example:
partition_info
i. LESS THAN: Only define the upper boundary of the partition. The lower bound is determined by the upper bound of
the previous partition.
ii. FIXED RANGE: Define the left-closed, right-open interval of the partition.
iii. MULTI RANGE (since version 1.2.0): batch-create RANGE partitions by defining left-closed, right-open intervals with a
time unit and step size. The time unit supports year, month, day, week and hour.
```
PARTITION BY RANGE(col)
```
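A sketch of a MULTI RANGE partition clause under the assumptions above (the dates and intervals are illustrative):
PARTITION BY RANGE(col)
(
FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY
)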
* `distribution_desc`
Define the data bucketing method.
1) Hash
Syntax:
Explain:
2) Random
Syntax:
Explain:
rollup_list
Multiple materialized views (ROLLUP) can be created at the same time as the table is built.
ROLLUP (rollup_definition[, rollup_definition, ...])
rollup_definition
rollup_name (col1[, col2, ...]) [DUPLICATE KEY(col1[, col2, ...])] [PROPERTIES("key" = "value")]
Example:
ROLLUP (
properties
replication_num
Number of copies. The default number of copies is 3. If the number of BE nodes is less than 3, you need to specify
that the number of copies is less than or equal to the number of BE nodes.
After version 0.15, this attribute is automatically converted to the replication_allocation attribute; for example,
"replication_num" = "3" becomes "replication_allocation" = "tag.location.default: 3".
replication_allocation
Set the copy distribution according to Tag. This attribute can completely cover the function of the replication_num
attribute.
storage_medium/storage_cooldown_time
Data storage medium. storage_medium is used to declare the initial storage medium of the table data, and
storage_cooldown_time
is used to set the expiration time. Example:
"storage_medium" = "SSD",
"storage_cooldown_time" = "2020-11-20 00:00:00"
This example indicates that the data is stored in the SSD and will be automatically migrated to the HDD storage after
the expiration of 2020-11-20 00:00:00.
colocate_with
When you need to use the Colocation Join function, use this parameter to set the Colocation Group.
"colocate_with" = "group1"
bloom_filter_columns
The user specifies the list of column names that need to be added to the Bloom Filter index. The Bloom Filter index of
each column is independent, not a composite index.
in_memory
Doris has no concept of memory tables.
When this property is set to true , Doris will try to cache the data blocks of the table in the PageCache of the storage
engine, thereby reducing disk IO. However, this property does not guarantee that the data blocks stay resident in memory;
it is only a best-effort hint.
"in_memory" = "true"
function_column.sequence_col
When using the UNIQUE KEY model, you can specify a sequence column. When the KEY columns are the same,
REPLACE will be performed according to the sequence column (the larger value replaces the smaller value,
otherwise it cannot be replaced)
The function_column.sequence_col is used to specify the mapping of the sequence column to a column in the table,
which can be integral and time (DATE, DATETIME). The type of this column cannot be changed after creation. If
function_column.sequence_col is set, function_column.sequence_type is ignored.
"function_column.sequence_col" ='column_name'
function_column.sequence_type
When using the UNIQUE KEY model, you can specify a sequence column. When the KEY columns are the same,
REPLACE will be performed according to the sequence column (the larger value replaces the smaller value,
otherwise it cannot be replaced)
Here we only need to specify the type of sequence column, support time type or integer type. Doris will create a
hidden sequence column.
"function_column.sequence_type" ='Date'
compression
The default compression method for Doris tables is LZ4. After version 1.1, it is supported to specify the compression
method as ZSTD to obtain a higher compression ratio.
"compression"="zstd"
light_schema_change
If set to true, the addition and deletion of value columns can be done more quickly and synchronously.
"light_schema_change"="true"
disable_auto_compaction
If this property is set to 'true', the background automatic compaction process will skip this table.
"disable_auto_compaction" = "false"
Dynamic partition related
dynamic_partition.enable : Used to specify whether the dynamic partition function at the table level is enabled.
The default is true.
dynamic_partition.time_unit: is used to specify the time unit for dynamically adding partitions, which can be
selected as DAY (day), WEEK (week), MONTH (month), HOUR (hour).
dynamic_partition.start : Used to specify how many partitions to delete forward. The value must be less than 0.
The default is Integer.MIN_VALUE.
dynamic_partition.end : Used to specify the number of partitions created in advance. The value must be greater
than 0.
dynamic_partition.prefix : Used to specify the partition name prefix to be created. For example, if the partition
name prefix is p, the partition name will be automatically created as p20200108.
dynamic_partition.buckets : Used to specify the number of partition buckets that are automatically created.
dynamic_partition.create_history_partition : Whether to create a history partition.
dynamic_partition.history_partition_num : Specify the number of historical partitions to be created.
Example
k1 TINYINT,
2. Create a detailed model table, partition, specify the sorting column, and set the number of copies to 1
k1 DATE,
PARTITION BY RANGE(k1)
(
PROPERTIES (
"replication_num" = "1"
);
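A complete sketch of this example (table, column, partition and bucketing details are illustrative):
CREATE TABLE example_db.table_range
(
k1 DATE,
k2 INT,
v1 VARCHAR(2048)
)
DUPLICATE KEY(k1, k2)
PARTITION BY RANGE (k1)
(
PARTITION p1 VALUES LESS THAN ("2020-02-01"),
PARTITION p2 VALUES LESS THAN ("2020-03-01")
)
DISTRIBUTED BY HASH(k2) BUCKETS 8
PROPERTIES (
"replication_num" = "1"
);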
3. Create a table with a unique model of the primary key, set the initial storage medium and cooling time
k1 BIGINT,
k2 LARGEINT,
v1 VARCHAR(2048),
PROPERTIES(
"storage_medium" = "SSD",
);
k1 DATE,
k2 INT,
k3 SMALLINT,
v1 VARCHAR(2048) REPLACE,
5. Create an aggregate model table with HLL and BITMAP column types
k1 TINYINT,
v1 HLL HLL_UNION,
v2 BITMAP BITMAP_UNION
ENGINE=olap
CREATE TABLE t1 (
DUPLICATE KEY(id)
PROPERTIES (
"colocate_with" = "group1"
);
CREATE TABLE t2 (
DUPLICATE KEY(`id`)
PROPERTIES (
"colocate_with" = "group1"
);
7. Create a memory table with bitmap index and bloom filter index
k1 TINYINT,
v1 CHAR(10) REPLACE,
v2 INT SUM,
PROPERTIES (
"bloom_filter_columns" = "k2",
"in_memory" = "true"
);
The table creates partitions 3 days in advance every day, and deletes the partitions 3 days ago. For example, if today is
2020-01-08 , partitions named p20200108 , p20200109 , p20200110 , p20200111 will be created. The partition ranges are:
k1 DATE,
k2 INT,
k3 SMALLINT,
v1 VARCHAR(2048),
PROPERTIES(
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-3",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32"
);
event_day DATE,
citycode SMALLINT,
ROLLUP (
r1(event_day,siteid),
r2(event_day,citycode),
r3(event_day)
PROPERTIES("replication_num" = "3");
10. Set the replica of the table through the replication_allocation property.
k1 TINYINT,
PROPERTIES (
"replication_allocation"="tag.location.group_a:1, tag.location.group_b:2"
);
k1 DATE,
k2 INT,
k3 SMALLINT,
v1 VARCHAR(2048),
PROPERTIES(
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-3",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "32",
"dynamic_partition.replication_allocation" = "tag.location.group_a:3"
);
11. Set the table hot and cold separation policy through the storage_policy property.
k1 BIGINT,
k2 LARGEINT,
v1 VARCHAR(2048)
UNIQUE KEY(k1)
PROPERTIES(
"storage_policy" = "test_create_table_use_policy",
"replication_num" = "1"
);
NOTE: Need to create the s3 resource and storage policy before the table can be successfully associated with the migration
policy
12. Add a hot and cold data migration strategy for the table partition
k1 DATE,
k2 INT,
V1 VARCHAR(2048) REPLACE
NOTE: Need to create the s3 resource and storage policy before the table can be successfully associated with the migration
policy
Since Version 1.2.0
k1 DATE,
k2 INT,
V1 VARCHAR(20)
PROPERTIES(
"replication_num" = "1"
);
k1 DATETIME,
k2 INT,
V1 VARCHAR(20)
PROPERTIES(
"replication_num" = "1"
);
NOTE: Multi Partition can be mixed with conventional manually created partitions. When using it, the table must have
exactly one partition column. The default maximum number of partitions created by Multi Partition is 4096; this limit can
be adjusted through the FE configuration parameter max_multi_partition_num .
Keywords
CREATE, TABLE
Best Practice
Tables in Doris can be divided into partitioned tables and non-partitioned tables. This attribute is determined when the table
is created and cannot be changed afterwards. That is, for partitioned tables, you can add or delete partitions in the
subsequent use process, and for non-partitioned tables, you can no longer perform operations such as adding partitions
afterwards.
At the same time, partitioning columns and bucketing columns cannot be changed after the table is created. You can neither
change the types of partitioning and bucketing columns, nor do any additions or deletions to these columns.
Therefore, it is recommended to confirm the usage method to build the table reasonably before building the table.
Dynamic Partition
The dynamic partition function is mainly used to help users automatically manage partitions. By setting certain rules, the
Doris system regularly adds new partitions or deletes historical partitions. Please refer to Dynamic Partition document for
more help.
Materialized View
Users can create multiple materialized views (ROLLUP) while building a table. Materialized views can also be added after the
table is built. It is convenient for users to create all materialized views at one time by writing in the table creation statement.
If the materialized view is created together with the table, all subsequent data import operations will synchronously
generate data for the materialized view. The number of materialized views may affect the efficiency of data import.
If a materialized view is added later and the table already contains data, the creation time of the materialized view
depends on the current data volume.
For the introduction of materialized views, please refer to the document materialized views.
Index
Users can create indexes on multiple columns while building a table. Indexes can also be added after the table is built.
If an index is added later and the table already contains data, all data needs to be rewritten, so the creation time of the
index depends on the current data volume.
in_memory property
The "in_memory" = "true" attribute was specified when the table was created. Doris will try to cache the data blocks of the
table in the PageCache of the storage engine, which has reduced disk IO. However, this attribute does not guarantee that the
data block is permanently resident in memory, and is only used as a best-effort identification.
CREATE-SQL-BLOCK-RULE
CREATE-SQL-BLOCK-RULE
Name
Description
This statement creates a SQL blocking rule, which is only used to restrict query statements and does not restrict the
execution of explain statements.
grammar:
Parameter Description:
sql: matching rule (based on regular matching, special characters need to be escaped, for example write select * as select
\\* ), optional, the default value is "NULL"
sqlHash: sql hash value, used for exact matching, we will print this value in fe.audit.log , optional, this parameter and
sql can only be selected one, the default value is "NULL"
partition_num: the maximum number of partitions a scan node will scan, the default value is 0L
tablet_num: The maximum number of tablets that a scanning node will scan, the default value is 0L
cardinality: the rough scan line number of a scan node, the default value is 0L
global: Whether to take effect globally (all users), the default is false
Example
PROPERTIES(
"global"="false",
"enable"="true"
);
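A minimal sketch of example 1, using the rule name shown in the error message below (the sql pattern is illustrative):
CREATE SQL_BLOCK_RULE order_analysis_rule
PROPERTIES(
"sql"="select \\* from order_analysis",
"global"="false",
"enable"="true"
);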
Notes:
The sql statement here does not end with a semicolon.
When we execute the sql we just defined in the rule, an exception error will be returned. An example is as follows:
ERROR 1064 (HY000): errCode = 2, detailMessage = sql match regex sql block rule: order_analysis_rule
2. Create test_rule2, limit the maximum number of scanned partitions to 30, and limit the maximum scan base to 10 billion
rows. The example is as follows:
CREATE SQL_BLOCK_RULE test_rule2
PROPERTIES (
"partition_num" = "30",
"cardinality" = "10000000000",
"global" = "false",
"enable" = "true"
);
Keywords
CREATE, SQL_BLOCK_RULE
Best Practice
CREATE-FUNCTION
CREATE-FUNCTION
Name
CREATE FUNCTION
Description
This statement creates a custom function. Executing this command requires the user to have ADMIN privileges.
If function_name contains the database name, then the custom function will be created in the corresponding database;
otherwise the function will be created in the database of the current session. The name and parameters of the new function
cannot be the same as an existing function in the current namespace, otherwise the creation will fail. However, a function
with the same name but different parameters can be created successfully.
grammar:
(arg_type [, ...])
[RETURNS ret_type]
[INTERMEDIATE inter_type]
Parameter Description:
AGGREGATE : If there is this item, it means that the created function is an aggregate function.
ALIAS : If there is this item, it means that the created function is an alias function.
If the above two items are absent, it means that the created function is a scalar function
function_name : The name of the function to be created, which can include the name of the database. For example:
db1.my_func .
arg_type : The parameter type of the function, which is the same as the type defined when creating the table. Variable-
length parameters can be represented by , ... . If it is a variable-length type, the type of the variable-length parameter is
the same as that of the last non-variable-length parameter.
NOTE: ALIAS FUNCTION does not support variable-length arguments and must have at least one argument.
ret_type : Required for creating new functions. If you are aliasing an existing function, you do not need to fill in this
parameter.
inter_type : The data type used to represent the intermediate stage of the aggregation function.
param : used to represent the parameters of the alias function; at least one is required.
origin_function : used to represent the original function corresponding to the alias function.
properties : Used to set properties related to aggregate functions and scalar functions. The properties that can be set
include:
object_file : The URL path of the custom function dynamic library. Currently, only HTTP/HTTPS protocol is
supported. This path needs to remain valid for the entire life cycle of the function. This option is required
symbol : The function signature of the scalar function, which is used to find the function entry from the dynamic library. This option is required for scalar functions
init_fn : The initialization function signature of the aggregate function. Required for aggregate functions
update_fn : update function signature of the aggregate function. Required for aggregate functions
merge_fn : Merge function signature of aggregate function. Required for aggregate functions
serialize_fn : Serialize function signature of aggregate function. Optional for aggregate functions, if not specified,
the default serialization function will be used
finalize_fn : The function signature of the aggregate function to get the final result. Optional for aggregate
functions, if not specified, the default get-result function will be used
md5 : The MD5 value of the function dynamic link library, which is used to verify whether the downloaded content is
correct. This option is optional
prepare_fn : The function signature of the prepare function of the custom function, which is used to find the prepare
function entry from the dynamic library. This option is optional for custom functions
close_fn : The function signature of the close function of the custom function, which is used to find the close
function entry from the dynamic library. This option is optional for custom functions
Example
"symbol" = "_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_",
"object_file" = "http://host:port/libmyadd.so"
);
"symbol" = "_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_",
"prepare_fn" = "_ZN9doris_udf14AddUdf_prepareEPNS_15FunctionContextENS0_18FunctionStateScopeE",
"close_fn" = "_ZN9doris_udf12AddUdf_closeEPNS_15FunctionContextENS0_18FunctionStateScopeE",
"object_file" = "http://host:port/libmyadd.so"
);
"init_fn"="_ZN9doris_udf9CountInitEPNS_15FunctionContextEPNS_9BigIntValE",
"update_fn"="_ZN9doris_udf11CountUpdateEPNS_15FunctionContextERKNS_6IntValEPNS_9BigIntValE",
"merge_fn"="_ZN9doris_udf10CountMergeEPNS_15FunctionContextERKNS_9BigIntValEPS2_",
"finalize_fn"="_ZN9doris_udf13CountFinalizeEPNS_15FunctionContextERKNS_9BigIntValE",
"object_file"="http://host:port/libudasample.so"
);
"symbol" = "_ZN9doris_udf6StrConcatUdfEPNS_15FunctionContextERKNS_6IntValES4_",
"object_file" = "http://host:port/libmyStrConcat.so"
);
5. Create a custom alias function
CREATE ALIAS FUNCTION id_masking(INT) WITH PARAMETER(id) AS CONCAT(LEFT(id, 3), '****', RIGHT(id, 4));
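For illustration, once the alias function above is created it can be called like a built-in function; a minimal usage sketch (the input value is hypothetical):
SELECT id_masking(13812345678);
-- expands to CONCAT(LEFT(13812345678, 3), '****', RIGHT(13812345678, 4))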
Keywords
CREATE, FUNCTION
Best Practice
CREATE-CATALOG
CREATE-CATALOG
Name
CREATE CATALOG
Description
This statement is used to create an external catalog
Syntax:
Create catalog:
CREATE CATALOG catalog_name PROPERTIES (
    'type'='hms|es|jdbc',
    ...
);
Create catalog through resource:
CREATE CATALOG catalog_name WITH RESOURCE resource_name PROPERTIES (
    'key' = 'value'
);
where the resource is created with:
CREATE RESOURCE resource_name PROPERTIES (
    'type'='hms|es|jdbc',
    ...
);
Example
1. Create catalog hive
-- 1.2.0+ Version
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:7004',
'dfs.nameservices'='HANN',
'dfs.ha.namenodes.HANN'='nn1,nn2',
'dfs.namenode.rpc-address.HANN.nn1'='nn1_host:rpc_port',
'dfs.namenode.rpc-address.HANN.nn2'='nn2_host:rpc_port',
'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProv
);
-- 1.2.0 Version
'type'='hms',
'hive.metastore.uris' = 'thrift://127.0.0.1:7004',
'dfs.nameservices'='HANN',
'dfs.ha.namenodes.HANN'='nn1,nn2',
'dfs.namenode.rpc-address.HANN.nn1'='nn1_host:rpc_port',
'dfs.namenode.rpc-address.HANN.nn2'='nn2_host:rpc_port',
'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProv
);
2. Create catalog es
-- 1.2.0+ Version
"type"="es",
"hosts"="http://127.0.0.1:9200"
);
-- 1.2.0 Version
"type"="es",
"hosts"="http://127.0.0.1:9200"
);
-- 1.2.0+ Version
"type"="jdbc",
"user"="root",
"password"="123456",
"jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false",
"driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-
connector-java-8.0.25.jar",
"driver_class" = "com.mysql.cj.jdbc.Driver"
);
-- 1.2.0 Version
"type"="jdbc",
"jdbc.user"="root",
"jdbc.password"="123456",
"jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false",
"jdbc.driver_url" = "https://doris-community-test-1308700295.cos.ap-
hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
"jdbc.driver_class" = "com.mysql.cj.jdbc.Driver"
);
postgresql
"type"="jdbc",
"user"="postgres",
"password"="123456",
"jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/demo",
"driver_url" = "file:/path/to/postgresql-42.5.1.jar",
"driver_class" = "org.postgresql.Driver"
);
"type"="jdbc",
"jdbc.user"="postgres",
"jdbc.password"="123456",
"jdbc.jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/demo",
"jdbc.driver_url" = "file:/path/to/postgresql-42.5.1.jar",
"jdbc.driver_class" = "org.postgresql.Driver"
);
clickhouse
-- 1.2.0+ Version
"type"="jdbc",
"user"="default",
"password"="123456",
"jdbc_url" = "jdbc:clickhouse://127.0.0.1:8123/demo",
"driver_url" = "file:///path/to/clickhouse-jdbc-0.3.2-patch11-all.jar",
"driver_class" = "com.clickhouse.jdbc.ClickHouseDriver"
-- 1.2.0 Version
"type"="jdbc",
"jdbc.jdbc_url" = "jdbc:clickhouse://127.0.0.1:8123/demo",
...
oracle
"type"="jdbc",
"user"="doris",
"password"="123456",
"jdbc_url" = "jdbc:oracle:thin:@127.0.0.1:1521:helowin",
"driver_url" = "file:/path/to/ojdbc6.jar",
"driver_class" = "oracle.jdbc.driver.OracleDriver"
);
"type"="jdbc",
"jdbc.user"="doris",
"jdbc.password"="123456",
"jdbc.jdbc_url" = "jdbc:oracle:thin:@127.0.0.1:1521:helowin",
"jdbc.driver_url" = "file:/path/to/ojdbc6.jar",
"jdbc.driver_class" = "oracle.jdbc.driver.OracleDriver"
);
SQLServer
"user"="SA",
"password"="Doris123456",
"jdbc_url" = "jdbc:sqlserver://localhost:1433;DataBaseName=doris_test",
"driver_url" = "file:/path/to/mssql-jdbc-11.2.3.jre8.jar",
"driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
"type"="jdbc",
"jdbc.user"="SA",
"jdbc.password"="Doris123456",
"jdbc.jdbc_url" = "jdbc:sqlserver://localhost:1433;DataBaseName=doris_test",
"jdbc.driver_url" = "file:/path/to/mssql-jdbc-11.2.3.jre8.jar",
"jdbc.driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
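For illustration, after a catalog has been created it can be switched to and queried like the internal catalog; a minimal usage sketch (the catalog, database, and table names are hypothetical):
SWITCH jdbc_catalog;
SHOW DATABASES;
USE doris_test;
SELECT * FROM test_table LIMIT 10;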
Keywords
CREATE, CATALOG
Best Practice
DROP-INDEX
DROP-INDEX
Name
DROP INDEX
Description
This statement is used to delete the index of the specified name from a table. Currently, only bitmap indexes are supported.
grammar:
Example
1. Delete the index
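A minimal sketch of the statement, assuming a bitmap index named index_name on the table example_db.table1:
DROP INDEX IF EXISTS index_name ON example_db.table1;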
Keywords
DROP, INDEX
Best Practice
DROP-RESOURCE
DROP-RESOURCE
Name
DROP RESOURCE
Description
This statement is used to delete an existing resource. Only the root or admin user can delete resources.
grammar:
Example
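A minimal sketch, assuming a resource named spark0:
DROP RESOURCE 'spark0';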
Keywords
DROP, RESOURCE
Best Practice
DROP-FILE
DROP-FILE
Name
DROP FILE
Description
This statement is used to delete an uploaded file.
grammar:
[properties]
illustrate:
Example
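A minimal sketch, assuming an uploaded file named ca.pem under the catalog kafka:
DROP FILE "ca.pem" PROPERTIES ("catalog" = "kafka");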
Keywords
DROP, FILE
Best Practice
DROP-ENCRYPT-KEY
DROP-ENCRYPT-KEY
Name
DROP ENCRYPTKEY
Description
grammar:
Parameter Description:
key_name : The name of the key to delete, can include the name of the database. For example: db1.my_key .
Delete a custom key. The key to be deleted must match the specified name exactly.
Example
1. Delete a key
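A minimal sketch, assuming a key named my_key:
DROP ENCRYPTKEY my_key;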
Keywords
Best Practice
DROP-DATABASE
DROP-DATABASE
Name
DROP DATABASE
Description
This statement is used to delete a database.
grammar:
illustrate:
During the execution of DROP DATABASE, the deleted database can be recovered through the RECOVER statement. See
the RECOVER statement for details
If you execute DROP DATABASE FORCE, the system will not check the database for unfinished transactions, the
database will be deleted directly and cannot be recovered, this operation is generally not recommended
Example
1. Delete the database db_test
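A minimal sketch of the statement:
DROP DATABASE IF EXISTS db_test;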
Keywords
DROP, DATABASE
Best Practice
DROP-MATERIALIZED-VIEW
DROP-MATERIALIZED-VIEW
Name
Description
This statement is used to drop a materialized view. Synchronous syntax
grammar:
1. IF EXISTS:
Do not throw an error if the materialized view does not exist. If this keyword is not declared, an error will be
reported if the materialized view does not exist.
2. mv_name:
The name of the materialized view to delete. Required.
3. table_name:
The name of the table to which the materialized view to be deleted belongs. Required.
Example
1. Drop the materialized view k1_k2 of the table all_type_table.
2. Delete the materialized view k1_k2 in the table all_type_table when the materialized view does not exist. An error is reported:
ERROR 1064 (HY000): errCode = 2, detailMessage = Materialized view [k1_k2] does not exist in table
[all_type_table]
3. Delete the materialized view k1_k2 in the table all_type_table, if it does not exist, no error will be reported.
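A minimal sketch of this case:
DROP MATERIALIZED VIEW IF EXISTS k1_k2 ON all_type_table;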
Keywords
DROP, MATERIALIZED, VIEW
Best Practice
DROP-POLICY
DROP-POLICY
Name
DROP POLICY
Description
drop policy for row or storage
ROW POLICY
Grammar :
1. Drop row policy
Example
1. Drop the row policy for table1 named test_row_policy_1
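A minimal sketch of the statement:
DROP ROW POLICY test_row_policy_1 ON table1;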
Keywords
DROP, POLICY
Best Practice
TRUNCATE-TABLE
TRUNCATE-TABLE
Name
TRUNCATE TABLE
Description
This statement is used to clear the data of the specified table and partition
grammar:
illustrate:
The statement clears the data, but leaves the table or partition.
Unlike DELETE, this statement can only clear the specified table or partition as a whole, and cannot add filter conditions.
Unlike DELETE, using this method to clear data will not affect query performance.
Example
1. Clear the table tbl under example_db
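A minimal sketch of the statement; the partitioned form is shown as well (the partition names are hypothetical):
TRUNCATE TABLE example_db.tbl;
TRUNCATE TABLE example_db.tbl PARTITION(p1, p2);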
Keywords
TRUNCATE, TABLE
Best Practice
DROP-TABLE
DROP-TABLE
Name
DROP TABLE
Description
This statement is used to drop a table.
grammar:
illustrate:
After executing DROP TABLE for a period of time, the dropped table can be recovered through the RECOVER statement.
See RECOVER statement for details
If you execute DROP TABLE FORCE, the system will not check whether there are unfinished transactions in the table, the
table will be deleted directly and cannot be recovered, this operation is generally not recommended
Example
1. Delete a table
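A minimal sketch, assuming a table named my_table:
DROP TABLE IF EXISTS example_db.my_table;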
Keywords
DROP, TABLE
Best Practice
DROP-FUNCTION
DROP-FUNCTION
Name
DROP FUNCTION
Description
Delete a custom function. The function to be deleted must match the specified name and parameter types exactly.
grammar:
(arg_type [, ...])
Parameter Description:
Example
1. Delete a function
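A minimal sketch, assuming the function my_add(INT, INT) created earlier:
DROP FUNCTION my_add(INT, INT);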
Keywords
DROP, FUNCTION
Best Practice
DROP-SQL-BLOCK-RULE
DROP-SQL-BLOCK-RULE
Name
DROP SQL BLOCK RULE
Description
Delete SQL blocking rules, support multiple rules, separated by ,
grammar:
Example
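A minimal sketch, assuming two existing rules named test_rule1 and test_rule2:
DROP SQL_BLOCK_RULE test_rule1, test_rule2;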
Keywords
DROP, SQL_BLOCK_RULE
Best Practice
DROP-CATALOG
DROP-CATALOG
Name
DROP CATALOG
Description
This statement is used to delete the external catalog.
Syntax:
Example
1. Drop catalog hive
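A minimal sketch of the statement:
DROP CATALOG hive;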
Keywords
DROP, CATALOG
Best Practice
PAUSE-ROUTINE-LOAD
PAUSE-ROUTINE-LOAD
Name
PAUSE ROUTINE LOAD
Description
Used to pause a Routine Load job. A suspended job can be rerun with the RESUME command.
Example
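A minimal sketch, assuming a routine load job named test1:
PAUSE ROUTINE LOAD FOR test1;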
Keywords
Best Practice
MULTI-LOAD
MULTI-LOAD
Name
MULTI LOAD
Description
Users submit multiple import jobs through the HTTP protocol. Multi Load can ensure the atomic effect of multiple import
jobs
Syntax:
On the basis of 'MINI LOAD', 'MULTI LOAD' can support users to import to multiple tables at the same time. The
specific commands are shown above.
'/api/{db}/{table}/_load' adds a table to be imported to an import task. The main difference from 'MINI LOAD' is
that the 'sub_label' parameter needs to be passed in
'/api/{db}/_multi_commit' submits the entire multi-table import task, and starts processing in the background
'/api/{db}/_multi_desc' can display the number of jobs submitted by a multi-table import task
Authorization: Doris currently uses HTTP Basic authentication, so the username and password need to be
specified when importing. Note that the password is passed in clear text; this is only suitable for an intranet
environment.
Expect: The HTTP request sent to Doris needs to carry the 'Expect' header with the value '100-continue'.
Because the request is redirected, sending this header before transmitting the data body avoids transmitting
the data multiple times, thereby improving efficiency.
Content-Length: The request sent to Doris needs to carry the 'Content-Length' header. If the content sent is
less than 'Content-Length', Doris considers the transmission to be faulty and fails to submit the task.
NOTE: If more data is sent than 'Content-Length', Doris only reads the number of bytes given by 'Content-Length'.
Parameter Description:
user: If the user is in the default_cluster, the user is the user_name. Otherwise user_name@cluster_name.
label: Used to specify the label number imported in this batch, which is used for later job status query,
etc.
sub_label: Used to specify the subversion number inside a multi-table import task. For loads imported from
multiple tables, this parameter must be passed in.
columns: used to describe the corresponding column names in the import file.
If it is not passed in, then the order of the columns in the file is considered to be the
same as the order in which the table was created.
column_separator: used to specify the separator between columns, the default is '\t'
NOTE: URL encoding is required; for example, to use '\t' as the delimiter, 'column_separator=%09' should be passed in.
max_filter_ratio: used to specify the maximum ratio of non-standard data allowed to filter, the default is 0,
no filtering is allowed
NOTE:
1. This import method currently completes the import work on a single machine, so it is not suitable for imports
with a large amount of data. It is recommended that the amount of imported data should not exceed 1GB.
2. Currently it is not possible to submit multiple files using `curl -T "{file1, file2}"`, because curl splits
them into multiple requests, and multiple requests cannot share one label number, so this cannot be used.
3. Curl can be used to import data into Doris in a streaming-like way, but the actual import only happens after
the streaming ends, and the amount of data imported this way cannot be too large.
Example
1. Import the data in the local file 'testData1' into the table 'testTbl1' in the database 'testDb', and
import the data of 'testData2' into the table 'testTbl2' in 'testDb' (the user is in default_cluster)
3. Multi-table import: check how much content has been submitted (the user is in default_cluster)
Keywords
MULTI, MINI, LOAD
Best Practice
RESUME-SYNC-JOB
RESUME-SYNC-JOB
Name
RESUME SYNC JOB
Description
Resume a suspended resident data synchronization job in the current database by job_name ; the job will
continue to synchronize data from the latest position before the last suspension.
grammar:
Example
1. Resume the data synchronization job named job_name
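A minimal sketch of the statement:
RESUME SYNC JOB job_name;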
Keywords
RESUME, SYNC, LOAD
Best Practice
CREATE-ROUTINE-LOAD
CREATE-ROUTINE-LOAD
Name
Description
The Routine Load function allows users to submit a resident import task, and import data into Doris by continuously reading
data from a specified data source.
Currently, only data in CSV or Json format can be imported from Kafka, through unauthenticated or SSL-authenticated access.
grammar:
CREATE ROUTINE LOAD [db.]job_name ON tbl_name
[merge_type]
[load_properties]
[job_properties]
FROM data_source [data_source_properties]
[COMMENT "comment"]
[db.]job_name
The name of the import job. Within the same database, only one job with the same name can be running.
tbl_name
merge_type
Data merge type. The default is APPEND, which means that the imported data are ordinary append write operations. The
MERGE and DELETE types are only available for Unique Key model tables. The MERGE type needs to be used with the
[DELETE ON] statement to mark the Delete Flag column. The DELETE type means that all imported data are deleted
data.
load_properties
[column_separator],
[columns_mapping],
[preceding_filter],
[where_predicates],
[partitions],
[DELETE ON],
[ORDER BY]
column_separator
Specifies the column separator, defaults to \t
columns_mapping
It is used to specify the mapping relationship between file columns and columns in the table, as well as various
column transformations. For a detailed introduction to this part, you can refer to the [Column Mapping,
Transformation and Filtering] document.
preceding_filter
Filter raw data. For a detailed introduction to this part, you can refer to the [Column Mapping, Transformation and
Filtering] document.
where_predicates
Filter imported data based on conditions. For a detailed introduction to this part, you can refer to the [Column
Mapping, Transformation and Filtering] document.
partitions
Specify in which partitions of the import destination table. If not specified, it will be automatically imported into the
corresponding partition.
DELETE ON
It needs to be used with the MERGE import mode, only for tables of the Unique Key model. Used to specify the
columns in the imported data that represent the Delete Flag and the calculated relationship.
DELETE ON v3 >100
ORDER BY
Only for tables of the Unique Key model. Used to specify the column in the imported data that represents the Sequence
Col. Mainly used to ensure data order when importing.
job_properties
PROPERTIES (
"key1" = "val1",
"key2" = "val2"
i. desired_concurrent_number
Desired concurrency. A routine import job will be divided into multiple subtasks for execution. This parameter
specifies the maximum number of tasks a job can execute concurrently. Must be greater than 0. Default is 3.
This degree of concurrency is not the actual degree of concurrency. The actual degree of concurrency will be
comprehensively considered by the number of nodes in the cluster, the load situation, and the situation of the data
source.
"desired_concurrent_number" = "3"
ii. max_batch_interval/max_batch_rows/max_batch_size
a. The maximum execution time of each subtask, in seconds. The range is 5 to 60. Default is 10.
b. The maximum number of lines read by each subtask. Must be greater than or equal to 200000. The default is
200000.
c. The maximum number of bytes read by each subtask. The unit is bytes and the range is 100MB to 1GB. The default
is 100MB.
These three parameters are used to control the execution time and processing volume of a subtask. When either one
reaches the threshold, the task ends.
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200"
iii. max_error_number
The maximum number of error lines allowed within the sampling window. Must be greater than or equal to 0. The
default is 0, which means no error lines are allowed.
The sampling window is max_batch_rows * 10 . That is, if the number of error lines is greater than max_error_number
within the sampling window, the routine operation will be suspended, requiring manual intervention to check data
quality problems.
Rows that are filtered out by where conditions are not considered error rows.
iv. strict_mode
Whether to enable strict mode, the default is off. If enabled, the column type conversion of non-null raw data will be
filtered if the result is NULL. Specify as:
"strict_mode" = "true"
Strict mode means strictly filtering column type conversions during the load process. The strict filtering
strategy is as follows:
a. For column type conversion, if strict mode is true, erroneous data will be filtered. Erroneous data here refers to
data whose original value is not null but whose result is null after the column type conversion.
b. When a loaded column is generated by a function transformation, strict mode has no effect on it.
c. For a loaded column type that has a range limit, if the original data can pass the type conversion normally but
cannot pass the range limit, strict mode has no effect on it. For example, if the type is decimal(1,0) and the original
data is 10, it passes type conversion but is outside the declared range of the column; strict mode has no effect on this data.
For the relationship between strict mode and the loading of source data, see the strict mode documentation.
Note: although 10 is an out-of-range value, because its type meets the decimal requirement, strict mode has no
effect on it. 10 will eventually be filtered in other ETL processing flows, but it will not be filtered by strict mode.
v. timezone
Specifies the time zone used by the import job. The default is to use the Session's timezone parameter. This
parameter affects the results of all time zone-related functions involved in the import.
vi. format
Specify the import data format, the default is csv, and the json format is supported.
vii. jsonpaths
When the imported data format is json, the fields in the Json data can be extracted by specifying jsonpaths.
viii. strip_outer_array
When the imported data format is json, strip_outer_array is true, indicating that the Json data is displayed in the form
of an array, and each element in the data will be regarded as a row of data. The default value is false.
-H "strip_outer_array: true"
ix. json_root
When the import data format is json, you can specify the root node of the Json data through json_root. Doris will
extract the elements of the root node through json_root for parsing. Default is empty.
-H "json_root: $.RECORDS"
x. send_batch_parallelism
Integer, Used to set the default parallelism for sending batch, if the value for parallelism exceed
max_send_batch_parallelism_per_job in BE config, then the coordinator BE will use the value of
max_send_batch_parallelism_per_job .
xi. load_to_single_tablet
Boolean type. True means that one task can only load data to one tablet in the corresponding partition at a time.
The default value is false. This parameter can only be set when loading data into the OLAP table with random
partition.
FROM KAFKA
"key1" = "val1",
"key2" = "val2"
i. kafka_broker_list
Kafka's broker connection information. The format is ip:host. Separate multiple brokers with commas.
"kafka_broker_list" = "broker1:9092,broker2:9092"
ii. kafka_topic
Specifies the Kafka topic to subscribe to.
"kafka_topic" = "my_topic"
iii. kafka_partitions/kafka_offsets
Specify the kafka partition to be subscribed to, and the corresponding starting offset of each partition. If a time is
specified, consumption will start at the nearest offset greater than or equal to the time.
If not specified, all partitions under topic will be subscribed from OFFSET_END by default.
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,OFFSET_BEGINNING,OFFSET_END"
"kafka_partitions" = "0,1,2,3",
Note that the time format cannot be mixed with the OFFSET format.
iv. property
Specify custom kafka parameters. The function is equivalent to the "--property" parameter in the kafka shell.
When the value of the parameter is a file, you need to add the keyword "FILE:" before the value.
For how to create a file, please refer to the CREATE FILE command documentation.
For more supported custom parameters, please refer to the configuration items on the client side in the official
CONFIGURATION document of librdkafka. Such as:
"property.client.id" = "12345",
"property.ssl.ca.location" = "FILE:ca.pem"
a. When connecting to Kafka using SSL, you need to specify the following parameters:
"property.security.protocol" = "ssl",
"property.ssl.ca.location" = "FILE:ca.pem",
"property.ssl.certificate.location" = "FILE:client.pem",
"property.ssl.key.location" = "FILE:client.key",
"property.ssl.key.password" = "abcdefg"
in:
"property.ssl.certificate.location"
"property.ssl.key.location"
"property.ssl.key.password"
They are used to specify the client's public key, private key, and password for the private key, respectively.
At this point, you can specify kafka_default_offsets to specify the starting offset. Defaults to OFFSET_END , i.e.
subscribes from the end.
Example:
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
Example
1. Create a Kafka routine import task named test1 for example_tbl of example_db. Specify the column separator and
group.id and client.id, and automatically consume all partitions by default, and start subscribing from the location where
there is data (OFFSET_BEGINNING)
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"property.group.id" = "xxx",
"property.client.id" = "xxx",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
2. Create a Kafka routine import task named test1 for example_tbl of example_db. Import tasks are in strict mode.
PRECEDING FILTER k1 = 1,
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
3. Import data from the Kafka cluster through SSL authentication. Also set the client.id parameter. The import task is in
non-strict mode and the time zone is Africa/Abidjan
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "Africa/Abidjan"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"property.security.protocol" = "ssl",
"property.ssl.ca.location" = "FILE:ca.pem",
"property.ssl.certificate.location" = "FILE:client.pem",
"property.ssl.key.location" = "FILE:client.key",
"property.ssl.key.password" = "abcdefg",
"property.client.id" = "my_client_id"
);
4. Import data in Json format. By default, the field name in Json is used as the column name mapping. Specify to import
three partitions 0, 1, and 2, and the starting offsets are all 0
COLUMNS(category,price,author)
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"format" = "json"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "0,0,0"
);
5. Import Json data, extract fields through Jsonpaths, and specify the root node of the Json document
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"format" = "json",
"jsonpaths" = "[\"$.category\",\"$.author\",\"$.price\",\"$.timestamp\"]",
"json_root" = "$.RECORDS"
"strip_outer_array" = "true"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "0,0,0"
);
6. Create a Kafka routine import task named test1 for example_tbl of example_db. And use conditional filtering.
WITH MERGE
DELETE ON v3 >100
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
COLUMNS(k1,k2,source_sequence,v1,v2),
ORDER BY source_sequence
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "30",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200"
) FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
PROPERTIES
"desired_concurrent_number"="3",
"max_batch_interval" = "30",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200"
) FROM KAFKA
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
);
Keywords
CREATE, ROUTINE, LOAD, CREATE LOAD
Best Practice
kafka_partitions : Specify the list of partitions to be consumed, such as "0, 1, 2, 3".
kafka_offsets : Specify the starting offset of each partition, which must correspond one-to-one with the kafka_partitions
list. For example: "1000, 1000, 2000, 2000"
property.kafka_default_offset : Specifies the default starting offset of the partitions.
When creating an import job, these three parameters can have the following combinations:
STOP-ROUTINE-LOAD
STOP-ROUTINE-LOAD
Name
Description
User stops a Routine Load job. A stopped job cannot be rerun.
Example
1. Stop the routine import job named test1.
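A minimal sketch of the statement:
STOP ROUTINE LOAD FOR test1;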
Keywords
STOP, ROUTINE, LOAD
Best Practice
CLEAN-LABEL
CLEAN-LABEL
Name
CLEAN LABEL
Description
For manual cleanup of historical load jobs. After cleaning, the Label can be reused.
Syntax:
Example
1. Clean label label1 from database db1
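A minimal sketch of the statement:
CLEAN LABEL label1 FROM db1;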
Keywords
CLEAN, LABEL
Best Practice
ALTER-ROUTINE-LOAD
ALTER-ROUTINE-LOAD
Name
ALTER ROUTINE LOAD
Description
grammar:
[job_properties]
FROM data_source
[data_source_properties]
1. [db.]job_name
2. tbl_name
3. job_properties
Specifies the job parameters that need to be modified. Currently, only the modification of the following parameters is
supported:
i. desired_concurrent_number
ii. max_error_number
iii. max_batch_interval
iv. max_batch_rows
v. max_batch_size
vi. jsonpaths
vii. json_root
viii. strip_outer_array
ix. strict_mode
x. timezone
xi. num_as_string
xii. fuzzy_parse
4. data_source
5. data_source_properties
i. kafka_partitions
ii. kafka_offsets
iii. kafka_broker_list
iv. kafka_topic
v. Custom properties, such as property.group.id
Note:
i. kafka_partitions and kafka_offsets are used to modify the offset of the kafka partition to be consumed, only the
currently consumed partition can be modified. Cannot add partition.
Example
1. Change desired_concurrent_number to 1
ALTER ROUTINE LOAD FOR db1.label1
PROPERTIES
(
    "desired_concurrent_number" = "1"
);
2. Modify desired_concurrent_number to 10, modify the offset of the partition, and modify the group id.
ALTER ROUTINE LOAD FOR db1.label1
PROPERTIES
(
    "desired_concurrent_number" = "10"
)
FROM kafka
(
    "property.group.id" = "new_group"
);
Keywords
ALTER, ROUTINE, LOAD
Best Practice
CANCEL-LOAD
CANCEL-LOAD
Name
CANCEL LOAD
Description
This statement is used to undo an import job for the specified label. Or batch undo import jobs via fuzzy matching
CANCEL LOAD
[FROM db_name]
Example
1. Cancel the import job whose label is example_db_test_load_label on the database example_db
CANCEL LOAD
FROM example_db
WHERE LABEL = "example_db_test_load_label";
CANCEL LOAD
FROM example_db
WHERE LABEL like "example_";
CANCEL LOAD
FROM example_db
WHERE STATE = "loading";
Keywords
CANCEL, LOAD
Best Practice
1. Only pending import jobs in PENDING, ETL, LOADING state can be canceled.
2. When performing batch undo, Doris does not guarantee the atomic undo of all corresponding import jobs. That is, it is
possible that only some of the import jobs were successfully undone. The user can view the job status through the
SHOW LOAD statement and try to execute the CANCEL LOAD statement repeatedly.
RESUME-ROUTINE-LOAD
RESUME-ROUTINE-LOAD
Name
RESUME ROUTINE LOAD
Description
Used to restart a suspended Routine Load job. The restarted job will continue to consume from the previously consumed
offset.
Example
1. Restart the routine import job named test1.
Keywords
Best Practice
STOP-SYNC-JOB
STOP-SYNC-JOB
Name
STOP SYNC JOB
Description
Stop a resident data synchronization job that is not already in the stopped state in a database by job_name .
grammar:
Example
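A minimal sketch, assuming a synchronization job named job_name:
STOP SYNC JOB job_name;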
Keywords
Best Practice
PAUSE-SYNC-JOB
PAUSE-SYNC-JOB
Name
PAUSE SYNC JOB
Description
Pause a running resident data synchronization job in a database via job_name . The suspended job will stop synchronizing
data and keep the latest position of consumption until it is resumed by the user.
grammar:
Example
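A minimal sketch, assuming a synchronization job named job_name:
PAUSE SYNC JOB job_name;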
Keywords
PAUSE, SYNC, JOB
Best Practice
BROKER-LOAD
BROKER-LOAD
Name
BROKER LOAD
Description
This command is mainly used to import data on remote storage (such as S3, HDFS) through the Broker service process.
[broker_properties]
[load_properties]
[COMMENT "comment"];
load_label
Each import needs to specify a unique Label. You can use this label to view the progress of the job later.
[database.]label_name
data_desc1
[MERGE|APPEND|DELETE]
DATA INFILE
[NEGATIVE]
[FORMAT AS "file_type"]
[(column_list)]
[SET (column_mapping)]
[WHERE predicate]
[DELETE ON expr]
[ORDER BY source_sequence]
[MERGE|APPEND|DELETE]
Data merge type. The default is APPEND, indicating that this import is a normal append write operation. The MERGE
and DELETE types are only available for Unique Key model tables. The MERGE type needs to be used with the
[DELETE ON] statement to mark the Delete Flag column. The DELETE type indicates that all data imported this time
are deleted data.
DATA INFILE
Specify the file path to be imported. Can be multiple. Wildcards can be used. The path must eventually match to a
file, if it only matches a directory the import will fail.
NEGATIVE
This keyword is used to indicate that this import is a batch of "negative" imports. This method is only for aggregate
data tables with integer SUM aggregate type. This method will reverse the integer value corresponding to the SUM
aggregate column in the imported data. Mainly used to offset previously imported wrong data.
PARTITION(p1, p2, ...)
You can specify to import only certain partitions of the table. Data that is not in the specified partition range will be
ignored.
COLUMNS TERMINATED BY
Specifies the column separator. Only valid in CSV format. Only single-byte delimiters can be specified.
FORMAT AS
Specifies the file type, CSV, PARQUET and ORC formats are supported. Default is CSV.
column list
Used to specify the column order in the original file. For a detailed introduction to this part, please refer to the
Column Mapping, Conversion and Filtering document.
SET (column_mapping)
Specifies the conversion functions for columns.
PRECEDING FILTER predicate
Pre-filter conditions. The data is first concatenated into raw data rows in order according to column list and
COLUMNS FROM PATH AS . Then it is filtered according to the pre-filter conditions. For a detailed introduction to this part,
please refer to the Column Mapping, Conversion and Filtering document.
WHERE predicate
Filter imported data based on conditions. For a detailed introduction to this part, please refer to the Column
Mapping, Conversion and Filtering document.
DELETE ON expr
It needs to be used with the MERGE import mode, only for tables of the Unique Key model. Used to specify the
columns in the imported data that represent the Delete Flag and the calculated relationship.
ORDER BY
Only for tables of the Unique Key model. Used to specify the column in the imported data that represents the Sequence
Col. Mainly used to ensure data order when importing.
PROPERTIES ("key1"="value1", ...)
Specify some parameters of the import format. For example, if the imported file is in json format, you can specify
parameters such as json_root , jsonpaths , fuzzy_parse , etc.
WITH BROKER broker_name
Specify the Broker service name to be used. In public cloud Doris, the Broker service name is bos.
broker_properties
Specifies the information required by the broker. This information is usually used by the broker to be able to access
remote storage systems. Such as BOS or HDFS. See the Broker documentation for specific information.
"key1" = "val1",
"key2" = "val2",
...
load_properties
timeout
max_filter_ratio
The maximum tolerable proportion of data that can be filtered (for reasons such as data irregularity). Zero tolerance
by default. The value range is 0 to 1.
exec_mem_limit
strict_mode
timezone
Specify the time zone for some functions that are affected by time zones, such as
strftime/alignment_timestamp/from_unixtime , etc. Please refer to the timezone documentation for details. If not
specified, the "Asia/Shanghai" timezone is used
load_parallelism
It allows the user to set the parallelism of the load execution plan on a single node when the broker load is
submitted. The default value is 1.
send_batch_parallelism
Used to set the default parallelism for sending batch, if the value for parallelism exceed
max_send_batch_parallelism_per_job in BE config, then the coordinator BE will use the value of
max_send_batch_parallelism_per_job .
load_to_single_tablet
Boolean type, True means that one task can only load data to one tablet in the corresponding partition at a time. The
default value is false. The number of tasks for the job depends on the overall concurrency. This parameter can only be
set when loading data into the OLAP table with random partition.
comment Specify the comment for the import job. The comment can be viewed in the `show load` statement.
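Putting the pieces above together, a minimal hedged sketch of the overall statement shape (the label, path, table, broker name, and credentials here are illustrative):
LOAD LABEL example_db.example_label
(
    DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file.txt")
    INTO TABLE `my_table`
    COLUMNS TERMINATED BY ","
)
WITH BROKER my_hdfs_broker
(
    "username" = "hdfs_user",
    "password" = "hdfs_password"
)
PROPERTIES
(
    "timeout" = "3600"
);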
Example
1. Import a batch of data from HDFS
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file.txt")
"username"="hdfs_user",
"password"="hdfs_password"
);
Import the file file.txt , separated by commas, into the table my_table .
2. Import data from HDFS, using wildcards to match two batches of files, and import them into two tables separately.
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file-10*")
PARTITION (p1)
SET (
k2 = tmp_k2 + 1,
k3 = tmp_k3 + 1
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file-20*")
"username"="hdfs_user",
"password"="hdfs_password"
);
DATA INFILE("hdfs://hdfs_host:hdfs_port/user/doris/data/*/*")
"username" = "",
"password" = "",
"dfs.nameservices" = "my_ha",
"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
"dfs.client.failover.proxy.provider" =
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
Specify the delimiter as Hive's default delimiter \\x01 , and use the wildcard * to specify all files in all directories under the
data directory. Use simple authentication while configuring namenode HA.
4. Import data in Parquet format and specify FORMAT as parquet. The default is to judge by the file suffix
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/file")
FORMAT AS "parquet"
"username"="hdfs_user",
"password"="hdfs_password"
);
5. Import the data and extract the partition field in the file path
DATA INFILE("hdfs://hdfs_host:hdfs_port/input/city=beijing/*/*")
FORMAT AS "csv"
"username"="hdfs_user",
"password"="hdfs_password"
);
The columns in the my_table table are k1, k2, k3, city, utc_date .
The hdfs://hdfs_host:hdfs_port/user/doris/data/input/dir/city=beijing directory includes the following files:
hdfs://hdfs_host:hdfs_port/input/city=beijing/utc_date=2020-10-01/0000.csv
hdfs://hdfs_host:hdfs_port/input/city=beijing/utc_date=2020-10-02/0000.csv
hdfs://hdfs_host:hdfs_port/input/city=tianji/utc_date=2020-10-03/0000.csv
hdfs://hdfs_host:hdfs_port/input/city=tianji/utc_date=2020-10-04/0000.csv
The file only contains three columns of k1, k2, k3 , and the two columns of city, utc_date will be extracted from the
file path.
DATA INFILE("hdfs://host:port/input/file")
SET (
k2 = k2 + 1
PRECEDING FILTER k1 = 1
WHERE k1 > k2
"username"="user",
"password"="pass"
);
Only in the original data, k1 = 1, and after transformation, rows with k1 > k2 will be imported.
7. Import data, extract the time partition field in the file path, and the time contains %3A (in the hdfs path, ':' is not allowed,
all ':' will be replaced by %3A)
DATA INFILE("hdfs://host:port/user/data/*/test.txt")
(k2,k3)
SET (
"username"="user",
"password"="pass"
);
/user/data/data_time=2020-02-17 00%3A00%3A00/test.txt
/user/data/data_time=2020-02-18 00%3A00%3A00/test.txt
k2 INT,
k3 INT
8. Import a batch of data from HDFS, specifying the timeout and filter ratio. Use the broker my_hdfs_broker with
plain-text simple authentication. Rows in the existing data that match rows in the imported data whose v2 is greater
than 100 are deleted, and the other imported rows are appended normally.
WITH HDFS
"hadoop.username"="user",
"password"="pass"
PROPERTIES
"timeout" = "3600",
"max_filter_ratio" = "0.1"
);
Import using the MERGE method. my_table must be a table with Unique Key. When the value of the v2 column in the
imported data is greater than 100, the row is considered a delete row.
The import task timeout is 3600 seconds, and the error rate is allowed to be within 10%.
9. Specify the source_sequence column when importing to ensure the replacement order in the UNIQUE_KEYS table:
DATA INFILE("HDFS://test:802/input/file")
INTO TABLE `my_table`
(k1,k2,source_sequence,v1,v2)
ORDER BY source_sequence
WITH HDFS
"hadoop.username"="user",
"password"="pass"
my_table must be a Unique Key model table with Sequence Col specified. The data will be ordered according to the
value of the source_sequence column in the source data.
10. Import a batch of data from HDFS, specify the file format as json , and specify parameters of json_root and jsonpaths .
DATA INFILE("HDFS://test:port/input/file.json")
FORMAT AS "json"
PROPERTIES(
"json_root" = "$.item",
with HDFS (
"hadoop.username" = "user"
"password" = ""
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
DATA INFILE("HDFS://test:port/input/file.json")
FORMAT AS "json"
PROPERTIES(
"json_root" = "$.item",
with HDFS (
"hadoop.username" = "user"
"password" = ""
PROPERTIES
"timeout"="1200",
"max_filter_ratio"="0.1"
);
11. Load data in csv format from cos(Tencent Cloud Object Storage).
DATA INFILE("cosn://my_bucket/input/file.csv")
"fs.cosn.userinfo.secretId" = "xxx",
"fs.cosn.userinfo.secretKey" = "xxxx",
"fs.cosn.bucket.endpoint_suffix" = "cos.xxxxxxxxx.myqcloud.com"
12. Load CSV data, trim double quotes and skip the first 5 lines
DATA INFILE("cosn://my_bucket/input/file.csv")
"fs.cosn.userinfo.secretId" = "xxx",
"fs.cosn.userinfo.secretKey" = "xxxx",
"fs.cosn.bucket.endpoint_suffix" = "cos.xxxxxxxxx.myqcloud.com"
Keywords
BROKER, LOAD
Best Practice
Broker Load is an asynchronous import process. The successful execution of the statement only means that the import
task is submitted successfully, and does not mean that the data import is successful. The import status needs to be
viewed through the SHOW LOAD command.
Import tasks that have been submitted but not yet completed can be canceled by the CANCEL LOAD command. After
cancellation, the written data will also be rolled back and will not take effect.
All import tasks in Doris are atomic, and the import of multiple tables within the same import task can also guarantee
atomicity. Doris also uses the Label mechanism to ensure that imported data is neither lost nor duplicated. For
details, see the Import Transactions and Atomicity documentation.
Doris can support very rich column transformation and filtering operations in import statements. Most built-in functions
and UDFs are supported. For how to use this function correctly, please refer to the Column Mapping, Conversion and
Filtering document.
max_filter_ratio
Doris import tasks can tolerate a portion of malformed data, which is configured via max_filter_ratio . The default is 0,
which means that the entire import task will fail as soon as there is any erroneous data. If the user wants to ignore some
problematic data rows, this parameter can be set to a value between 0 and 1, and Doris will automatically skip
the rows with an incorrect data format.
For some calculation methods of the tolerance rate, please refer to the Column Mapping, Conversion and Filtering
document.
6. Strict Mode
The strict_mode attribute is used to set whether the import task runs in strict mode. Strict mode affects the results of
column mapping, transformation, and filtering. For a detailed description of strict mode, see the strict mode
documentation.
7. Timeout
The default timeout for Broker Load is 4 hours, counted from the time the task is submitted. If it does not complete within the
timeout period, the task fails.
Broker Load is suitable for importing up to about 100GB of data within one import task. Although theoretically there is no upper limit
on the amount of data imported in one import task, committing an import that is too large results in a longer run
time, and the cost of retrying after a failure increases.
At the same time, limited by the size of the cluster, we limit the maximum amount of imported data to the number of
ComputeNode nodes * 3GB. In order to ensure the rational use of system resources. If there is a large amount of data to
be imported, it is recommended to divide it into multiple import tasks.
Doris also limits the number of import tasks running simultaneously in the cluster, usually ranging from 3 to 10. Import
jobs submitted after that are queued. The maximum queue length is 100. Subsequent submissions will be rejected
outright. Note that the queue time is also calculated into the total job time. If it times out, the job is canceled. Therefore, it
is recommended to reasonably control the frequency of job submission by monitoring the running status of the job.
CREATE-SYNC-JOB
CREATE-SYNC-JOB
Name
CREATE SYNC JOB
Description
The data synchronization (Sync Job) feature supports submitting a resident data synchronization job that incrementally
synchronizes the CDC (Change Data Capture) of the user's data update operations in a MySQL database by
reading the Binlog from a specified remote address.
Currently, the data synchronization job only supports connecting to Canal, obtaining the parsed Binlog data from the Canal
Server and importing it into Doris.
Users can view the data synchronization job status through SHOW SYNC JOB.
grammar:
channel_desc,
channel_desc
...
binlog_desc
1. job_name
The synchronization job name is the unique identifier of the job in the current database. Only one job with the same
job_name can be running.
2. channel_desc
The data channel under the job is used to describe the mapping relationship between the mysql source table and the
doris target table.
grammar:
[columns_mapping]
i. mysql_db.src_tbl
ii. des_tbl
Specify the target table on the doris side. Only unique tables are supported, and the batch delete function of the table
needs to be enabled (see the 'batch delete function' of help alter table for how to enable it).
iii. column_mapping
Specifies the mapping relationship between the columns of the mysql source table and the doris target table. If not
specified, FE will default the columns of the source table and the target table to one-to-one correspondence in order.
Example:
3. binlog_desc
Used to describe the remote data source, currently only one canal is supported.
grammar:
FROM BINLOG
"key1" = "value1",
"key2" = "value2"
i. The properties corresponding to the Canal data source are prefixed with canal. , for example:
canal.debug: optional. When set to true, the batch and the details of each row of data will be printed out.
Example
1. Simply create a data synchronization job named job1 for test_tbl of test_db , connect to the local Canal server,
corresponding to the Mysql source table mysql_db1.tbl1 .
CREATE SYNC JOB `test_db`.`job1`
(
    FROM `mysql_db1`.`tbl1` INTO `test_tbl`
)
FROM BINLOG
(
    "type" = "canal",
    "canal.server.ip" = "127.0.0.1",
    "canal.server.port" = "11111",
    "canal.destination" = "example",
    "canal.username" = "",
    "canal.password" = ""
);
2. Create a data synchronization job named job1 for multiple tables of test_db , corresponding to multiple MySQL source
tables one-to-one, and explicitly specify the column mapping.
FROM BINLOG
"type" = "canal",
"canal.server.ip" = "xx.xxx.xxx.xx",
"canal.server.port" = "12111",
"canal.destination" = "example",
"canal.username" = "username",
"canal.password" = "password"
);
Keywords
CREATE, SYNC, JOB
Best Practice
STREAM-LOAD
STREAM-LOAD
Name
STREAM LOAD
Description
This statement is used to import data into the specified table. The difference from ordinary Load is that this import method is
synchronous import.
This import method can still ensure the atomicity of a batch of import tasks, either all data is imported successfully or all of
them fail.
This operation will update the data of the rollup table related to this base table at the same time.
This is a synchronous operation. After the entire data import work is completed, the import result is returned to the user.
Currently, HTTP chunked and non-chunked uploads are supported. For non-chunked methods, Content-Length must be
used to indicate the length of the uploaded content, which can ensure the integrity of the data.
In addition, it is best for users to set the content of the Expect Header field to 100-continue, which can avoid unnecessary
data transmission in some error scenarios.
Parameter introduction:
Users can pass in import parameters through the Header part of HTTP
1. label: The label imported once, the data of the same label cannot be imported multiple times. Users can avoid the
problem of duplicate data import by specifying Label.
Currently, Doris retains the most recent successful label within 30 minutes.
2. column_separator: used to specify the column separator in the import file, the default is \t. If it is an invisible character,
you need to add \x as a prefix and use hexadecimal to represent the separator.
For example, the separator \x01 of the hive file needs to be specified as -H "column_separator:\x01".
3. line_delimiter: used to specify the newline character in the imported file, the default is \n. Combinations of multiple
characters can be used as newlines.
4. columns: used to specify the correspondence between the columns in the import file and the columns in the table. If the
column in the source file corresponds exactly to the content in the table, then there is no need to specify the content of
this field.
If the source file does not correspond to the table schema, then this field is required for some data conversion. There are
two forms of column, one is directly corresponding to the field in the imported file, which is directly represented by the
field name;
One is derived column, the syntax is column_name = expression. Give a few examples to help understand.
Example 1: There are 3 columns "c1, c2, c3" in the table, and the three columns in the source file correspond to "c3, c2, c1"
in turn; then you need to specify -H "columns: c3, c2, c1"
Example 2: There are 3 columns "c1, c2, c3" in the table, the first three columns in the source file correspond in turn, but
the file has one extra column; then you need to specify -H "columns: c1, c2, c3, xxx";
Example 3: There are three columns "year, month, day" in the table, and there is only one time column in the source file,
which is in "2018-06-01 01:02:03" format;
Then you can specify -H "columns: col, year = year(col), month=month(col), day=day(col)" to complete the import
5. where: used to extract part of the data. If the user needs to filter out the unnecessary data, he can achieve this by setting
this option.
Example 1: To import only the data whose k1 column equals 20180601, specify -H "where: k1 = 20180601"
when importing
6. max_filter_ratio: The maximum tolerable data ratio that can be filtered (for reasons such as data irregularity). Zero
tolerance by default. Data irregularities do not include rows filtered out by where conditions.
7. partitions: used to specify the partition designed for this import. If the user can determine the partition corresponding to
the data, it is recommended to specify this item. Data that does not satisfy these partitions will be filtered out.
8. timeout: Specify the import timeout. in seconds. The default is 600 seconds. The setting range is from 1 second to 259200
seconds.
9. strict_mode: The user specifies whether to enable strict mode for this import. The default is off. The enable mode is -H
"strict_mode: true".
10. timezone: Specify the time zone used for this import. The default is UTC+8 (Asia/Shanghai). This parameter affects the results of
all time zone-related functions involved in the import.
11. exec_mem_limit: Import memory limit. Default is 2GB. The unit is bytes.
12. format: Specify load data format, support csv, json, csv_with_names(support csv file line header filter),
csv_with_names_and_types(support csv file first two lines filter), parquet, orc, default is csv.
13. jsonpaths: The way of importing json is divided into: simple mode and matching mode.
Simple mode: The simple mode is not set the jsonpaths parameter. In this mode, the json data is required to be an object
type, for example:
Matching mode: It is relatively complex for json data and needs to match the corresponding value through the jsonpaths
parameter.
14. strip_outer_array: Boolean type, true indicates that the json data starts with an array object and the array object is
flattened, the default value is false. E.g.:
[ {"k1" : 1, "v1" : 2}, {"k1" : 3, "v1" : 4} ]
When strip_outer_array is true, the final import into Doris will generate two rows of data.
15. json_root: json_root is a valid jsonpath string, used to specify the root node of the json document, the default value is " ".
16. merge_type: The merge type of data, which supports three types: APPEND, DELETE, and MERGE. Among them,
APPEND is the default value, which means that this batch of data needs to be appended to the existing data, and
DELETE means to delete all the data with the same key as this batch of data. Line, the MERGE semantics need to be used
in conjunction with the delete condition, which means that the data that meets the delete condition is processed
according to the DELETE semantics and the rest is processed according to the APPEND semantics, for example: -H
"merge_type: MERGE" -H "delete: flag=1"
17. delete: Only meaningful under MERGE, indicating the deletion condition of the data
function_column.sequence_col: Only applicable to UNIQUE_KEYS. Under the same key column, it ensures that the value
columns are replaced according to the source_sequence column. The source_sequence can be a column in the data source
or a column in the table structure.
18. fuzzy_parse: Boolean type, true means that json will be parsed with the schema of the first row. Enabling this option can
improve the efficiency of json import, but requires that the order of the keys of all json objects is the same as the first
row, the default is false, only use in json format
19. num_as_string: Boolean type, true means that when parsing json data, the numeric type will be converted to a string,
and then imported without losing precision.
20. read_json_by_line: Boolean type, true to support reading one json object per line, the default value is false.
21. send_batch_parallelism: Integer, used to set the parallelism of sending batch data. If the value of parallelism exceeds
max_send_batch_parallelism_per_job
in the BE configuration, the BE as a coordination point will use the value of
max_send_batch_parallelism_per_job
.
22. hidden_columns: Used to specify hidden columns in the imported data when the Header does not contain a columns
description. Multiple hidden columns should be separated by commas.
hidden_columns: __DORIS_DELETE_SIGN__,__DORIS_SEQUENCE_COL__
The system will use the order specified by the user; in the case above, the data should end with
__DORIS_SEQUENCE_COL__ .
23. load_to_single_tablet: Boolean type, True means that one task can only load data to one tablet in the corresponding
partition at a time. The default value is false. This parameter can only be set when loading data into the OLAP table with
random partition.
RETURN VALUES
After the import is complete, the relevant content of this import will be returned in JSON format.
It currently includes the following fields:
Status: The latest status of this import.
Success: The import succeeded and the data is already visible.
Publish Timeout: The import job has been successfully committed, but the data is not immediately visible for some
reason. The user can consider the import successful and does not have to retry it.
Label Already Exists: The Label is already occupied by another job. It may have been imported successfully, or it may
still be importing. The user needs to determine the subsequent operation through the get label state command.
Others: The import failed; the user can specify this Label to retry the job.
Message: Detailed description of the import status. On failure, the specific failure reason is returned.
NumberTotalRows: The total number of rows read from the data stream.
NumberLoadedRows: The number of rows imported this time; only valid when the status is Success.
NumberFilteredRows: The number of rows filtered out by this import, that is, rows with unqualified data quality.
NumberUnselectedRows: The number of rows filtered out by the where condition in this import.
LoadBytes: The size of the source file data imported this time.
LoadTimeMs: The time taken for this import, in milliseconds.
BeginTxnTimeMs: The time taken to request the FE to start a transaction, in milliseconds.
StreamLoadPutTimeMs: The time taken to request the FE to obtain the execution plan for importing data, in milliseconds.
ReadDataTimeMs: The time spent reading data, in milliseconds.
WriteDataTimeMs: The time taken to perform the write data operation, in milliseconds.
CommitAndPublishTimeMs: The time taken to submit a request to the FE and publish the transaction, in milliseconds.
ErrorURL: The specific content of the filtered data; only the first 1000 items are retained.
ERRORS:
Import error details can be viewed with the following statement:
```
SHOW LOAD WARNINGS ON 'url'
```
where url is the URL given by ErrorURL.
24. compress_type
Specify the compress type of the file. Only compressed csv files are supported now. Supported types: gz, lzo, bz2, lz4, lzop, deflate.
25. trim_double_quotes: Boolean type, The default value is false. True means that the outermost double quotes of each field
in the csv file are trimmed.
26. skip_lines: Integer type, the default value is 0. It will skip some lines in the head of csv file. It will be disabled when format
is csv_with_names or csv_with_names_and_types .
Example
1. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', and use Label for deduplication.
Specify a timeout of 100 seconds
2. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', use Label for deduplication, and
only import data whose k1 is equal to 20180601
3. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', allowing a 20% error rate (the
user is in the default_cluster)
curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData
http://host:port/api/testDb/testTbl/_stream_load
4. Import the data in the local file 'testData' into the table 'testTbl' in the database 'testDb', allow a 20% error rate, and
specify the column names of the file (the user is in the default_cluster)
curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData
http://host:port/api/testDb/testTbl/_stream_load
5. Import the data in the local file 'testData' into the p1, p2 partitions of the table 'testTbl' in the database 'testDb', allowing a
20% error rate.
7. Import a table containing HLL columns, which can be columns in the table or columns in the data to generate HLL
columns, or use hll_empty to supplement columns that are not in the data
8. Import data for strict mode filtering and set the time zone to Africa/Abidjan
9. Import a table with a BITMAP column, which can be a column in the table or a column in the data to generate a BITMAP
column, or use bitmap_empty to fill an empty Bitmap
{"category":"C++","author":"avc","title":"C++ primer","price":895}
Import command:
In order to improve throughput, it supports importing multiple pieces of json data at one time, each line is a json object,
and \n is used as a newline by default. You need to set read_json_by_line to true. The json data format is as follows:
{"category":"C++","author":"avc","title":"C++ primer","price":89.5}
{"category":"Java","author":"avc","title":"Effective Java","price":95}
{"category":"Linux","author":"avc","title":"Linux kernel","price":195}
{"category":"xuxb111","author":"1avc","title":"SayingsoftheCentury","price":895},
{"category":"xuxb222","author":"2avc"," title":"SayingsoftheCentury","price":895},
{"category":"xuxb333","author":"3avc","title":"SayingsoftheCentury","price":895}
Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price
curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H
"jsonpaths: [\"$.category\",\" $.price\",\"$.author\"]" -H "strip_outer_array: true" -T testData
http://host:port/api/testDb/testTbl/_stream_load
illustrate:
1) If the json data starts with an array, and each object in the array is a record, you need to set strip_outer_array
to true, which means flatten the array.
2) If the json data starts with an array, and each object in the array is a record,
when setting jsonpath, our ROOT node is actually an object in the array.
"RECORDS":[
{"category":"11","title":"SayingsoftheCentury","price":895,"timestamp":1589191587},
{"category":"22","author":"2avc","price":895,"timestamp":1589191487},
{"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387}
Precise import by specifying jsonpath, such as importing only three attributes of category, author, and price
curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H
"jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS"
-T testData http://host:port/api/testDb/testTbl/_stream_load
13. Delete the data with the same import key as this batch
14. Delete the rows that match data in this batch whose flag column is true, and append the other rows normally
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H
"merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load
file data:
id,name,age
1,doris,20
2,flink,10
Keywords
STREAM, LOAD
Best Practice
Stream Load is a synchronous import process. The successful execution of the statement means that the data is
imported successfully. The execution result of the import is returned synchronously in the HTTP response, displayed in
JSON format. An example is as follows:
"TxnId": 17,
"Label": "707717c0-271a-44c5-be0b-4e71bfeacaa5",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 5,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 28,
"LoadTimeMs": 27,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 18
TxnId: Import transaction ID, which is automatically generated by the system and is globally unique.
Label: Import Label, if not specified, the system will generate a UUID.
Status:
ExistingJobStatus:
This field is only displayed when the Status is "Label Already Exists". The user can know the status of the import job
corresponding to the existing Label through this status. "RUNNING" means the job is still executing, "FINISHED"
means the job was successful.
StreamLoadPutTimeMs: The time taken to request the FE to obtain the execution plan for importing data, in
milliseconds.
WriteDataTimeMs: The time spent performing the write data operation, in milliseconds.
CommitAndPublishTimeMs: The time it takes to submit a request to Fe and publish the transaction, in milliseconds.
ErrorURL: If there is a data quality problem, visit this URL to view the specific error line.
2. How to correctly submit the Stream Load job and process the returned results.
Stream Load is a synchronous import operation, so the user needs to wait for the return result of the command
synchronously, and decide the next processing method according to the return result.
The user's primary concern is the Status field in the returned result.
If it is Success , everything is fine and you can do other operations after that.
If the returned result shows a large number of Publish Timeout, it may indicate that some resources (such as IO) of the
cluster are currently under strain and the imported data cannot take effect immediately. An import task in the Publish
Timeout state has succeeded and does not need to be retried. However, it is recommended to slow down or stop the
submission of new import tasks and observe the cluster load.
If the returned result is Fail , the import failed, and you need to check the problem according to the specific reason.
Once resolved, you can retry with the same Label.
In some cases, the user's HTTP connection may be disconnected abnormally and the final returned result cannot be
obtained. At this point, you can use the same Label to resubmit the import task, and the resubmitted task may have the
following results:
i. Status status is Success , Fail or Publish Timeout . At this point, it can be processed according to the normal
process.
ii. The Status is Label Already Exists. At this time, you need to continue to check the ExistingJobStatus field.
If the value of this field is FINISHED, it means that the import task corresponding to this Label has succeeded, and
there is no need to retry. If it is RUNNING, it means that the import task corresponding to this Label is still running.
At this time, you need to keep resubmitting with the same Label at intervals (such as 10 seconds) until the Status is
no longer Label Already Exists, or until the value of the ExistingJobStatus field is FINISHED.
Import tasks that have been submitted and not yet completed can be canceled with the CANCEL LOAD command. After
cancellation, the written data will also be rolled back and will not take effect.
All import tasks in Doris are atomic, and atomicity is also guaranteed when multiple tables are imported in the same
import task. At the same time, Doris can use the Label mechanism to ensure that imported data is neither lost nor
duplicated. For details, see the Import Transactions and Atomicity documentation.
Doris can support very rich column transformation and filtering operations in import statements. Most built-in functions
and UDFs are supported. For how to use this function correctly, please refer to the Column Mapping, Conversion and
Filtering document.
Doris' import tasks can tolerate a portion of malformed data. The tolerance ratio is set via max_filter_ratio. The default
is 0, which means that the entire import task fails as soon as there is any erroneous data. If the user wants to ignore
some problematic data rows, this parameter can be set to a value between 0 and 1, and Doris will automatically skip
rows with incorrect data format.
For some calculation methods of the tolerance rate, please refer to the Column Mapping, Conversion and Filtering
document.
7. Strict Mode
The strict_mode attribute is used to set whether the import task runs in strict mode. This mode affects the results of
column mapping, transformation, and filtering. For a detailed description of strict mode, see the strict mode
documentation.
8. Timeout
The default timeout for Stream Load is 10 minutes, counted from the time the task is submitted. If the task does not
complete within the timeout period, it fails.
Doris also limits the number of import tasks running at the same time in the cluster, usually ranging from 10-20. Import
jobs submitted after that will be rejected.
MYSQL-LOAD
MYSQL-LOAD
Name
Since Version dev
MYSQL LOAD
Description
mysql-load: Import local data using the MySql client
LOAD DATA
[LOCAL]
INFILE 'file_name'
This statement is used to import data to the specified table. Unlike normal Load, this import method is a synchronous
import.
This import method can still guarantee the atomicity of a batch of import tasks, either all data imports are successful or all
fail.
1. MySQL Load starts with the syntax LOAD DATA , without specifying LABEL
2. Specify LOCAL to read client-side files. If not specified, FE server-side local files are read. Server-side load is disabled by
default; it can be enabled by setting a secure path in the FE configuration mysql_load_server_secure_path.
3. The local file path follows INFILE. It can be a relative or an absolute path. Currently only a single file is supported;
multiple files are not supported.
4. The table name after INTO TABLE can specify the database name, as shown in the case. It can also be omitted, and the
database where the current user is located will be used.
5. PARTITION syntax supports specified partition to import
6. COLUMNS TERMINATED BY specifies the column separator
PROPERTIES
1. max_filter_ratio :The maximum tolerable data ratio that can be filtered (for reasons such as data irregularity). Zero
tolerance by default. Data irregularities do not include rows filtered out by where conditions.
2. timeout: Specify the import timeout. in seconds. The default is 600 seconds. The setting range is from 1 second to 259200
seconds.
3. strict_mode: The user specifies whether to enable strict mode for this import. The default is off.
4. timezone: Specify the time zone used for this import. The default is UTC+8 (the East Eight zone). This parameter affects
the results of all time zone-related functions involved in the import.
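Putting the syntax items and properties above together, a minimal sketch of a complete MySQL Load statement; the column list, separator, and property values are illustrative, not required:
```sql
LOAD DATA LOCAL
INFILE 'testData'
INTO TABLE testDb.testTbl
PARTITION (p1, p2)
COLUMNS TERMINATED BY ','
(k2, k1, v1)
PROPERTIES ("timeout" = "100", "max_filter_ratio" = "0.2");
```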
Example
1. Import the data from the client side local file testData into the table testTbl in the database testDb . Specify a timeout
of 100 seconds
INFILE 'testData'
PROPERTIES ("timeout"="100")
2. Import the data from the server side local file /root/testData (set FE config mysql_load_server_secure_path to be root
already) into the table testTbl in the database testDb . Specify a timeout of 100 seconds
LOAD DATA
INFILE '/root/testData'
PROPERTIES ("timeout"="100")
3. Import data from client side local file testData into table testTbl in database testDb , allowing 20% error rate
INFILE 'testData'
PROPERTIES ("max_filter_ratio"="0.2")
4. Import the data from the client side local file testData into the table testTbl in the database testDb , allowing a 20%
error rate and specifying the column names of the file
INFILE 'testData'
PROPERTIES ("max_filter_ratio"="0.2")
5. Import the data in the local file testData into the p1, p2 partitions in the table of testTbl in the database testDb ,
allowing a 20% error rate.
INFILE 'testData'
PROPERTIES ("max_filter_ratio"="0.2")
6. Import the data in the CSV file testData with a local row delimiter of 0102 and a column delimiter of 0304 into the
table testTbl in the database testDb .
LOAD DATA LOCAL
INFILE 'testData'
7. Import the data from the local file testData into the p1, p2 partitions in the table of testTbl in the database testDb and
skip the first 3 lines.
INFILE 'testData'
IGNORE 3 LINES
8. Import data with strict mode filtering and set the time zone to Africa/Abidjan
INFILE 'testData'
9. Import data is limited to 10GB of import memory and timed out in 10 minutes
INFILE 'testData'
Keywords
MYSQL, LOAD
INSERT
INSERT
Name
INSERT
Description
This statement is used to complete the data insertion operation.
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
Parameters
table_name: The destination table for importing data. It can be of the form db_name.table_name.
partitions: Specify the partitions to be imported, which must be partitions that exist in table_name. Multiple partition
names are separated by commas.
column_name: The specified destination column, must be a column that exists in table_name
query: a common query, the result of the query will be written to the target
hint: an indicator used to indicate the execution behavior of INSERT. Both streaming and the default non-streaming
method use synchronous mode to complete INSERT statement execution.
The non-streaming method will return a label after the execution is completed, which is convenient for users to query
the import status through SHOW LOAD
Notice:
When executing the INSERT statement, the default behavior is to filter data that does not conform to the target table
format, for example when a string is too long. However, for business scenarios that require data not to be filtered, you
can set the session variable enable_insert_strict to true to ensure that INSERT will not be executed successfully when
data is filtered out.
Example
The test table contains two columns c1 , c2 .
The first and second statements have the same effect. When no target column is specified, the column order in the table is
used as the default target column.
The third and fourth statements express the same meaning, use the default value of the
c2 column to complete the data import.
2. Import multiple rows of data into the test table at one time
INSERT INTO test (c1, c2) VALUES (1, 2), (3, 2 * 2);
INSERT INTO test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
The first and second statements have the same effect, import two pieces of data into the test table at one time
The third and fourth statements have the same effect: the default value of the c2 column is used to import two rows of
data into the test table.
4. Import a query result into the test table, specifying the partition and label
INSERT INTO test PARTITION(p1, p2) WITH LABEL `label1` SELECT * FROM test2;
INSERT INTO test WITH LABEL `label1` (c1, c2) SELECT * from test2;
Asynchronous import is actually a synchronous import wrapped in an asynchronous interface. Specifying streaming or
not makes no difference to execution efficiency.
Since the previous import methods of Doris are all asynchronous, in order to be compatible with old usage habits, an
INSERT statement without streaming will still return a label. Users need to view the state of the import job for that
label through the SHOW LOAD command.
Keywords
INSERT
Best Practice
The INSERT operation is a synchronous operation, and the return of the result indicates the end of the operation. Users
need to perform corresponding processing according to the different returned results.
If the result set of the insert corresponding to the select statement is empty, it will return as follows:
Query OK indicates successful execution. 0 rows affected means that no data was imported.
In the case where the result set is not empty. The returned results are divided into the following situations:
mysql> insert into tbl1 with label my_label1 select * from tbl2;
Query OK indicates successful execution. 4 rows affected means that a total of 4 rows of data were imported.
2 warnings indicates the number of lines to be filtered.
label is a user-specified label or an automatically generated label. Label is the ID of this Insert Into import job.
Each import job has a unique Label within a single database.
status indicates whether the imported data is visible. It shows visible if the data is visible, and committed if it is not yet visible.
When you need to view the filtered rows, you can use the following statement:
The URL in the returned result can be used to query the wrong data. For details, see the summary of Viewing
Error Lines later.
Invisibility of data is a temporary state; this batch of data will eventually become visible.
You can view the visible status of this batch of data with the following statement:
If the TransactionStatus column in the returned result is visible, the data is visible.
iii. Execution failed
Execution failure indicates that no data was successfully imported, and the following is returned:
ERROR 1064 (HY000): all partitions have no load data. url: http://10.74.167.16:8042/api/_load_error_log?
file=__shard_2/error_log_insert_stmt_ba8bb9e158e4879-ae8de8507c0bf8a2_ba8bb9e158e4879_ae8de8507c0
Where ERROR 1064 (HY000): all partitions have no load data shows the reason for the failure. The following url
can be used to query the wrong data:
2. Timeout time
Since Version dev
The timeout for INSERT operations is controlled by the session variable insert_timeout. The default is 4 hours. If it times out, the job will be canceled.
The INSERT operation also guarantees the atomicity of imports, see the Import Transactions and Atomicity
documentation.
When using CTE(Common Table Expressions) as the query part in an insert operation, the WITH LABEL and column parts
must be specified.
4. Filter Threshold
Unlike other import methods, INSERT operations cannot specify a filter threshold ( max_filter_ratio ). The default filter
threshold is 1, which means that rows with errors can be ignored.
For business scenarios that require data not to be filtered, you can set the session variable enable_insert_strict to true to
ensure that INSERT will not be executed successfully when any data is filtered out.
5. Performance issues
Single-row insertion using the VALUES method is not recommended. If you must use it this way, combine multiple rows
of data into one INSERT statement for bulk commit.
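A sketch of the bulk-commit pattern described above, reusing the test table from the earlier examples:
```sql
-- one INSERT carrying several rows is preferred over several single-row INSERTs
INSERT INTO test (c1, c2) VALUES (1, 2), (3, 4), (5, 6), (7, 8);
```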
SELECT
SELECT
Name
SELECT
description
Mainly introduces the use of Select syntax
grammar:
SELECT
[FROM table_references
[PARTITION partition_list]
[TABLET tabletid_list]
[REPEATABLE pos_seek]]
[WHERE where_condition]
[HAVING where_condition]
1. Syntax Description:
i. select_expr, ... Columns retrieved and displayed in the result, when using an alias, as is optional.
ii. table_references: the target table(s) retrieved from (one or more tables, including temporary tables generated by subqueries)
iii. where_definition retrieves the condition (expression), if there is a WHERE clause, the condition filters the row data.
where_condition is an expression that evaluates to true for each row to be selected. Without the WHERE clause, the
statement selects all rows. In WHERE expressions, you can use any MySQL supported functions and operators
except aggregate functions
iv. ALL | DISTINCT : specifies whether duplicate rows are returned in the result set. ALL returns all rows;
DISTINCT/DISTINCTROW removes duplicate rows. The default is ALL.
v. `ALL EXCEPT`: Filter on the full (all) result set, except specifies the name of one or more columns to be excluded
from the full result set. All matching column names will be ignored in the output.
vi. INTO OUTFILE 'file_name' : save the result to a new file (which did not exist before), the difference lies in the save
format.
vii. GROUP BY [HAVING] : Group the result set, and filter the grouped results when HAVING appears. Grouping Sets ,
Rollup , Cube are extensions of GROUP BY; please refer to GROUPING SETS DESIGN for details.
viii. Order by : Sort the final result, Order by sorts the result set by comparing the size of one or more columns.
Order by is a time-consuming and resource-intensive operation, because all data needs to be sent to 1 node before it
can be sorted, and the sorting operation requires more memory than the non-sorting operation.
If you need to return the top N sorted results, you need to use the LIMIT clause; in order to limit memory usage, if the
user does not specify the LIMIT clause, the first 65535 sorted results are returned by default.
ix. Limit n : limit the number of lines in the output result, limit m,n means output n records starting from the mth
line.
x. The Having clause does not filter the row data in the table, but filters the results produced by the aggregate
function.
Typically having is used with aggregate functions (eg : COUNT(), SUM(), AVG(), MIN(), MAX() ) and group by
clauses.
xi. SELECT supports explicit partition selection using PARTITION containing a list of partitions or subpartitions (or both)
following the name of the table in table_reference
xii. [TABLET tids] TABLESAMPLE n [ROWS | PERCENT] [REPEATABLE seek] : Limit the number of rows read from the table in
the FROM clause, select a number of Tablets pseudo-randomly from the table according to the specified number of
rows or percentages, and specify the number of seeds in REPEATABLE to return the selected samples again. In
addition, you can also manually specify the TableID, Note that this can only be used for OLAP tables.
Syntax constraints:
1. SELECT can also be used to retrieve calculated rows without referencing any table.
2. All clauses must be ordered strictly according to the above format, and a HAVING clause must be placed after the
GROUP BY clause and before the ORDER BY clause.
3. The alias keyword AS is optional. Aliases can be used for group by, order by and having
4. Where clause: The WHERE statement is executed to determine which rows should be included in the GROUP BY section,
and HAVING is used to determine which rows in the result set should be used.
5. The HAVING clause can refer to aggregate functions such as count, sum, max, min, and avg, while the WHERE clause
cannot. The WHERE clause can refer to other functions except aggregate functions. Column aliases cannot be used in
the WHERE clause to define conditions.
6. GROUP BY followed by WITH ROLLUP allows the results to be aggregated one or more additional times.
Join query:
JOIN
table_references:
table_reference [, table_reference] …
table_reference:
table_factor
| join_table
table_factor:
| ( table_references )
ON conditional_expr }
join_table:
join_condition:
ON conditional_expr
UNION Grammar :
SELECT ...
UNION SELECT
UNION is used to combine the results of multiple statements into a single result set.
The column names in the first SELECT statement are used as the column names in the returned results. The selected
columns listed in the corresponding position of each SELECT statement should have the same data type. (For example, the
first column selected by the first statement should be of the same type as the first column selected by other statements.)
The default behavior of UNION is to remove duplicate rows from the result. The optional DISTINCT keyword has no effect
other than the default, since it also specifies duplicate row removal. With the optional ALL keyword, no duplicate row
removal occurs, and the result includes all matching rows in all SELECT statements
WITH statement:
To specify common table expressions, use the WITH clause with one or more comma-separated subclauses. Each subclause
provides a subquery that generates a result set and associates a name with the subquery. The following example defines
CTEs named cte1 and cte2 in the WITH clause, and refers to them below the top-level SELECT:
WITH
cte1 AS (SELECT a, b FROM table1),
cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
In a statement containing the WITH clause, each CTE name can be referenced to access the corresponding CTE result set.
CTE names can be referenced in other CTEs, allowing CTEs to be defined based on other CTEs.
A CTE can refer to itself to define a recursive CTE. Common applications of recursive CTEs include sequence generation and
traversal of hierarchical or tree-structured data.
example
3. GROUP BY Example
--Query the tb_book table, group by type, and find the average price of each type of book,
4. DISTINCT Use
5. ORDER BY Example
Sort query results in ascending (default) or descending (DESC) order. Ascending NULL is first, descending NULL is last
--Query all records in the tb_book table, sort them in descending order by id, and display three records
Fuzzy queries can be implemented with LIKE, which has two wildcards: % and _ . % matches zero or more characters,
and _ matches exactly one character.
UNION
WITH cte AS
UNION ALL
SELECT 3, 4
Equivalent to
SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
SELECT left_tbl.*
+------+------+------+------+
| a | b | a | c |
+------+------+------+------+
| 2 | y | 2 | z |
| NULL | NULL | 3 | w |
+------+------+------+------+
16. TABLESAMPLE
--Pseudo-randomly sample 1000 rows in t1. Note that several Tablets are actually selected according to the
statistics of the table, and the total number of selected Tablet rows may be greater than 1000, so if you want
to explicitly return 1000 rows, you need to add Limit.
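A sketch of the sampling query the comment above refers to; the parenthesized TABLESAMPLE form and the seed value are assumptions based on the clause described in item xii:
```sql
-- pseudo-randomly sample about 1000 rows from t1; LIMIT caps the output at exactly 1000
SELECT * FROM t1 TABLESAMPLE(1000 ROWS) REPEATABLE 2 LIMIT 1000;
```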
keywords
SELECT
Best Practice
table_references after FROM indicates one or more tables participating in the query. If more than one table is listed, a
JOIN operation is performed. And for each specified table, you can define an alias for it
The selected columns after SELECT can be referenced in ORDER BY and GROUP BY by column name, by column alias, or
by an integer (starting from 1) representing the column position
ORDER BY r, s;
ORDER BY 2, 3;
If ORDER BY appears in a subquery and also applies to the outer query, the outermost ORDER BY takes precedence.
If GROUP BY is used, the grouped columns are automatically sorted in ascending order (as if there were an ORDER BY
statement for the same columns). If you want to avoid the overhead of this automatic sorting by GROUP BY, you can add ORDER BY NULL to suppress it.
When sorting columns in a SELECT using ORDER BY or GROUP BY, the server sorts values using only the initial number
of bytes indicated by the max_sort_length system variable.
HAVING clauses are generally applied last, just before the result set is returned to the MySQL client, and are not optimized
(while LIMIT is applied after HAVING).
The SQL standard requires: HAVING must refer to a column in the GROUP BY list or used by an aggregate function.
However, MySQL extends this by allowing HAVING to refer to columns in the Select clause list, as well as columns from
outer subqueries.
A warning is generated if the column referenced by HAVING is ambiguous. In the following statement, col2 is
ambiguous:
Remember not to use HAVING where WHERE should be used. HAVING is paired with GROUP BY.
The HAVING clause can refer to aggregate functions, while WHERE cannot.
The LIMIT clause can be used to constrain the number of rows returned by a SELECT statement. LIMIT can have one or
two arguments, both of which must be non-negative integers.
/*Then if you want to retrieve all rows after a certain offset is set, you can set a very large constant for
the second parameter. The following query fetches all data from row 96 onwards */
/*If LIMIT has only one parameter, the parameter specifies the number of rows that should be retrieved, and
the offset defaults to 0, that is, starting from the first row*/
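Sketches matching the two comments above; the table name tbl and the "large enough" second argument are illustrative:
```sql
-- two arguments: skip the first 95 rows, then return up to a very large number of rows
SELECT * FROM tbl LIMIT 95, 10000000;
-- one argument: return the first 5 rows, the offset defaults to 0
SELECT * FROM tbl LIMIT 5;
```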
deduplication
The ALL and DISTINCT modifiers specify whether to deduplicate rows in the result set (deduplication applies to rows, not to a single column).
ALL is the default modifier, that is, all rows that meet the requirements are to be retrieved.
Subqueries allow structured queries so that each part of a statement can be isolated.
Some operations require complex unions and associations. Subqueries provide other ways to perform these
operations
4. Speed up queries
Use Doris's partition and bucket as data filtering conditions as much as possible to reduce the scope of data scanning
Make full use of Doris's prefix index fields as data filter conditions to speed up query speed
5. UNION
Using only the union keyword has the same effect as using union distinct. Since the deduplication work is
memory-intensive, queries using the union all operation will be faster and consume less memory. If users want to
perform order by and limit operations on the returned result set, they need to put the union operation in a
subquery, then select from the subquery, and finally put the order by outside the subquery.
select * from (select age from student_01 union all select age from student_02) as t1
+-------------+
| age |
+-------------+
| 18 |
| 19 |
| 20 |
| 21 |
+-------------+
6. JOIN
In the inner join condition, in addition to supporting equal-valued joins, it also supports unequal-valued joins. For
performance reasons, it is recommended to use equal-valued joins.
Other joins only support equivalent joins
Edit this page
Feedback
SQL Manual SQL Reference DML Manipulation DELETE
DELETE
DELETE
Name
DELETE
Description
This statement is used to conditionally delete data in the specified table (base index) partition.
This operation will also delete the data of the rollup index related to this base index.
grammar:
WHERE
illustrate:
1. The optional types of op include: =, >, <, >=, <=, !=, in, not in
2. Only conditions on the key column can be specified when using AGGREGATE (UNIQUE) model.
3. When the selected key column does not exist in a rollup, delete cannot be performed.
4. Conditions can only have an "and" relationship. If you want to achieve an "or" relationship, you need to write the
conditions in two DELETE statements.
5. If it is a partitioned table, you can specify a partition. If not specified, Doris will infer partition from the given conditions. In
two cases, Doris cannot infer the partition from conditions: 1) the conditions do not contain partition columns; 2) The
operator of the partition column is not in. When a partition table does not specify the partition, or the partition cannot be
inferred from the conditions, the session variable delete_without_partition needs to be true to make delete statement be
applied to all partitions.
Notice:
1. This statement may reduce query efficiency for a period of time after execution.
2. The degree of impact depends on the number of delete conditions specified in the statement.
3. The more conditions you specify, the greater the impact.
Example
1. Delete the data row whose k1 column value is 3 in my_table partition p1
WHERE k1 = 3;
2. Delete the data rows where the value of column k1 is greater than or equal to 3 and the value of column k2 is "abc" in
my_table partition p1
DELETE FROM my_table PARTITION p1
3. Delete the data rows where the value of column k1 is greater than or equal to 3 and the value of column k2 is "abc" in
my_table partition p1, p2
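The WHERE clauses of examples 2 and 3 are truncated in this extract; a sketch of what they could look like, using the operators described above (the PARTITIONS form for multiple partitions is an assumption):
```sql
-- example 2: one partition
DELETE FROM my_table PARTITION p1
WHERE k1 >= 3 AND k2 = "abc";

-- example 3: several partitions
DELETE FROM my_table PARTITIONS (p1, p2)
WHERE k1 >= 3 AND k2 = "abc";
```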
Keywords
DELETE
Best Practice
UPDATE
UPDATE
Name
UPDATE
Description
This statement is used to update the data (the update statement currently only supports the Unique Key model).
UPDATE table_name
SET assignment_list
WHERE expression
value:
{expr | DEFAULT}
assignment:
col_name = value
assignment_list:
Parameters
table_name: The target table of the data to be updated. Can be of the form 'db_name.table_name'
assignment_list: The target column to be updated, in the format 'col_name = value, col_name = value'
where expression: the condition for the rows expected to be updated; it must be an expression that returns true or false
Note
The current UPDATE statement only supports row updates on the Unique model, and there may be data conflicts caused by
concurrent updates.
At present, Doris does not deal with such problems, and users need to avoid such problems from the
business side.
Example
The test table is a Unique model table containing four columns: k1, k2, v1, v2, where k1 and k2 are keys, v1 and v2 are
values, and the aggregation method is REPLACE.
1. Update the v1 column in the 'test' table that satisfies the conditions k1 =1 , k2 =2 to 1
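A sketch of the statement for this example, following the UPDATE syntax above:
```sql
UPDATE test SET v1 = 1 WHERE k1 = 1 AND k2 = 2;
```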
Keywords
UPDATE
Best Practice
EXPORT
EXPORT
Name
EXPORT
Description
This statement is used to export the data of the specified table to the specified location.
This is an asynchronous operation that returns as soon as the task is submitted successfully. After submission, you can
use the SHOW EXPORT command to view the progress.
EXPORT TABLE table_name
[PARTITION (p1[,p2])]
TO export_path
[opt_properties]
WITH BROKER
[broker_properties];
illustrate:
table_name
The table name of the table currently being exported. Only the export of Doris local table data is supported.
partition
export_path
opt_properties
column_separator : Specifies the exported column separator, default is \t. Only single byte is supported.
line_delimiter : Specifies the line delimiter for export, the default is \n. Only single byte is supported.
exec_mem_limit : Export the upper limit of the memory usage of a single BE node, the default is 2GB, and the unit is
bytes.
timeout : The timeout period of the export job, the default is 2 hours, the unit is seconds.
tablet_num_per_task : The maximum number of tablets each subtask can allocate to scan.
WITH BROKER
The export function needs to write data to the remote storage through the Broker process. Here you need to define the
relevant connection information for the broker to use.
1. If the export is to Amazon S3, you need to provide the following properties
2. If you use the S3 protocol to directly connect to the remote storage, you need to specify the following properties
(
"AWS_ENDPOINT" = " ",
"AWS_ACCESS_KEY" = " ",
"AWS_SECRET_KEY"=" ",
"AWS_REGION" = " "
)
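Putting the syntax and properties above together, a minimal sketch of an EXPORT statement through a broker (the HDFS path, broker name, and credentials are placeholders); the examples below show the property blocks used for different storage targets:
```sql
EXPORT TABLE test
PARTITION (p1, p2)
TO "hdfs://host:port/path/to/export/"
PROPERTIES
(
    "column_separator" = ",",
    "exec_mem_limit" = "2147483648",
    "timeout" = "3600"
)
WITH BROKER "broker_name"
(
    "username" = "xxx",
    "password" = "yyy"
);
```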
Example
1. Export all data in the test table to hdfs
"username"="xxx",
"password"="yyy"
);
"username"="xxx",
"password"="yyy"
);
3. Export all data in the testTbl table to hdfs, with "," as the column separator, and specify the label
"username"="xxx",
"password"="yyy"
);
"username"="xxx",
"password"="yyy"
);
6. Export all data in the testTbl table to hdfs with the invisible character "\x07" as the column and row separator.
PROPERTIES (
"column_separator"="\\x07",
"line_delimiter" = "\\x07"
"username"="xxx",
"password"="yyy"
8. Export all data in the testTbl table to s3 with the invisible character "\x07" as the column and row separator.
PROPERTIES (
"column_separator"="\\x07",
"line_delimiter" = "\\x07"
) WITH s3 (
"AWS_ENDPOINT" = "xxxxx",
"AWS_ACCESS_KEY" = "xxxxx",
"AWS_SECRET_KEY"="xxxx",
"AWS_REGION" = "xxxxx"
9. Export all data in the testTbl table to cos(Tencent Cloud Object Storage).
PROPERTIES (
"column_separator"=",",
"line_delimiter" = "\n"
"fs.cosn.userinfo.secretId" = "xxx",
"fs.cosn.userinfo.secretKey" = "xxxx",
"fs.cosn.bucket.endpoint_suffix" = "cos.xxxxxxxxx.myqcloud.com"
Keywords
EXPORT
Best Practice
Splitting of subtasks
An Export job will be split into multiple subtasks (execution plans) to execute. How many query plans need to be
executed depends on how many tablets there are in total, and how many tablets can be allocated to a query plan.
Because multiple query plans are executed serially, the execution time of the job can be reduced if one query plan
handles more shards.
However, if there is an error in the query plan (such as the failure of the RPC calling the broker, the jitter in the remote
storage, etc.), too many Tablets will lead to a higher retry cost of a query plan.
Therefore, it is necessary to reasonably arrange the number of query plans and the number of shards that each query
plan needs to scan, so as to balance the execution time and the execution success rate.
It is generally recommended that the amount of data scanned by a query plan is within 3-5 GB.
memory limit
Usually, the query plan of an Export job has only two parts, scan and export, and does not involve calculation logic that
requires much memory. So the default memory limit of 2GB usually suffices.
However, in some scenarios, such as a query plan, too many Tablets need to be scanned on the same BE, or too many
Tablet data versions may cause insufficient memory. At this point, you need to set a larger memory, such as 4GB, 8GB,
etc., through the exec_mem_limit parameter.
Precautions
Exporting a large amount of data at one time is not recommended. The maximum recommended export data
volume for an Export job is several tens of GB. An overly large export results in more junk files and higher retry costs.
If the amount of table data is too large, it is recommended to export by partition.
If the Export job fails, the __doris_export_tmp_xxx temporary directory generated in the remote storage and the
generated files will not be deleted, and the user needs to delete it manually.
If the Export job runs successfully, the __doris_export_tmp_xxx directory generated in the remote storage may be
preserved or cleared according to the file system semantics of the remote storage. For example, in S3 object storage,
after the last file in a directory is removed by the rename operation, the directory will also be deleted. If the directory
is not cleared, it can be removed manually.
CANCEL-EXPORT
CANCEL-EXPORT
Name
Since Version 1.2.2
CANCEL EXPORT
Description
This statement is used to cancel the export job with the specified label, or to cancel export jobs in batches via fuzzy matching.
CANCEL EXPORT
[FROM db_name]
Example
1. Cancel the export job whose label is example_db_test_export_label on the database example_db
CANCEL EXPORT
FROM example_db
CANCEL EXPORT
FROM example_db
CANCEL EXPORT
FROM example_db
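The WHERE clauses of these examples are truncated in this extract; sketches of the label-based, fuzzy-match, and state-based forms (the exact predicates are assumptions based on the description above):
```sql
-- cancel by exact label
CANCEL EXPORT FROM example_db WHERE LABEL = "example_db_test_export_label";

-- batch cancel by fuzzy label matching
CANCEL EXPORT FROM example_db WHERE LABEL LIKE "%example%";

-- cancel by job state
CANCEL EXPORT FROM example_db WHERE STATE = "EXPORTING";
```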
Keywords
CANCEL, EXPORT
Best Practice
1. Only export jobs in the PENDING or EXPORTING state can be canceled.
2. When performing batch cancellation, Doris does not guarantee that all corresponding export jobs are canceled
atomically. That is, it is possible that only some of the export jobs were successfully canceled. The user can view the job
status through the SHOW EXPORT statement and try to execute the CANCEL EXPORT statement repeatedly.
Edit this page
Feedback
SQL Manual SQL Reference DML OUTFILE
OUTFILE
OUTFILE
Name
OUTFILE
description
This statement is used to export query results to a file using the SELECT INTO OUTFILE command. Currently, it supports
exporting to remote storage, such as HDFS, S3, BOS, COS (Tencent Cloud), through the Broker process, through the S3
protocol, or directly through the HDFS protocol.
grammar:
query_stmt
[properties]
illustrate:
1. file_path
filepath points to the path where the file is stored and the file name prefix, such as `hdfs://path/to/my_file_`.
The final file name will consist of `my_file_`, the file sequence number, and the file format suffix. The file
sequence number starts from 0, and the count corresponds to the number of files the result is split into. Such as:
my_file_abcdefg_0.csv
my_file_abcdefg_1.csv
my_file_abcdefg_2.csv
2. format_as
FORMAT AS CSV
Specifies the export format. Supported formats include CSV, PARQUET, CSV_WITH_NAMES,
CSV_WITH_NAMES_AND_TYPES and ORC. Default is CSV.
3. properties
Specify related properties. Currently exporting via the Broker process, or via the S3 protocol is supported.
grammar:
column_separator: column separator. <version since="1.2.0">Supports multi-byte separators, such as: "\\x01", "abc"
</version>
line_delimiter: line delimiter. <version since="1.2.0">Supports multi-byte delimiters, such as: "\\x01", "abc"</version>
max_file_size: the size limit of a single file, if the result exceeds this value, it will be cut into multiple
files.
broker.kerberos_keytab: specifies the path to the keytab file of kerberos. The file must be the absolute path
to the file on the server where the broker process is located. and can be accessed by the Broker process
dfs.nameservices: if hadoop enable HA, please set fs nameservice. See hdfs-site.xml
dfs.ha.namenodes.[nameservice ID] unique identifiers for each NameNode in the nameservice. See hdfs-site.xml
dfs.namenode.rpc-address.[nameservice ID].[name node ID]: the fully-qualified RPC address for each NameNode
to listen on. See hdfs-site.xml
dfs.client.failover.proxy.provider.[nameservice ID] :the Java class that HDFS clients use to contact the
Active NameNode, usually it is org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
hadoop.security.authentication: kerberos
hadoop.kerberos.principal: the Kerberos principal that Doris will use when connecting to HDFS.
For the S3 protocol, you can directly execute the S3 protocol configuration:
AWS_ENDPOINT
AWS_ACCESS_KEY
AWS_SECRET_KEY
AWS_REGION
use_path_style: (optional) default false . The S3 SDK uses the virtual-hosted style by default. However, some
object storage systems may not have virtual-hosted style access enabled or supported. In this case, we can add the
use_path_style parameter to force the use of path-style access.
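Pulling the S3-protocol properties above together, a minimal sketch of a SELECT INTO OUTFILE statement (the bucket, endpoint, credentials, and region are placeholders):
```sql
SELECT * FROM tbl
INTO OUTFILE "s3://bucket/result_"
FORMAT AS CSV
PROPERTIES
(
    "AWS_ENDPOINT" = "http://s3.example.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxxx",
    "AWS_REGION" = "region"
);
```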
example
1. Use the broker method to export, and export the simple query results to the file hdfs://path/to/result.txt . Specifies
that the export format is CSV. Use my_broker and set kerberos authentication information. Specify the column separator
as , and the row separator as \n .
FORMAT AS CSV
PROPERTIES
"broker.name" = "my_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "[email protected]",
"broker.kerberos_keytab" = "/home/doris/my.keytab",
"column_separator" = ",",
"line_delimiter" = "\n",
"max_file_size" = "100MB"
);
If the final generated file is not larger than 100MB, it will be: result_0.csv .
If larger than 100MB, it may be result_0.csv,
result_1.csv, ... .
2. Export the simple query results to the file hdfs://path/to/result.parquet . Specify the export format as PARQUET. Use
my_broker and set kerberos authentication information.
SELECT c1, c2, c3 FROM tbl
FORMAT AS PARQUET
PROPERTIES
"broker.name" = "my_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "[email protected]",
"broker.kerberos_keytab" = "/home/doris/my.keytab",
"schema"="required,int32,c1;required,byte_array,c2;required,byte_array,c2"
);
3. Export the query result of the CTE statement to the file hdfs://path/to/result.txt . The default export format is CSV.
Use my_broker and set hdfs high availability information. Use the default row and column separators.
WITH
x1 AS
x2 AS
PROPERTIES
"broker.name" = "my_broker",
"broker.username"="user",
"broker.password"="passwd",
"broker.dfs.nameservices" = "my_ha",
"broker.dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
"broker.dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
"broker.dfs.client.failover.proxy.provider" =
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
If the final generated file is not larger than 1GB, it will be: result_0.csv .
If larger than 1GB, it may be result_0.csv, result_1.csv, ... .
4. Export the query result of the UNION statement to the file bos://bucket/result.txt . Specify the export format as
PARQUET. Use my_broker and set hdfs high availability information. The PARQUET format does not require a column
delimiter to be specified.
After the export is complete, an identity file is generated.
FORMAT AS PARQUET
PROPERTIES
"broker.name" = "my_broker",
"broker.bos_endpoint" = "http://bj.bcebos.com",
"broker.bos_accesskey" = "xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"broker.bos_secret_accesskey" = "yyyyyyyyyyyyyyyyyyyyyyyyy",
"schema"="required,int32,k1;required,byte_array,k2"
);
5. Export the query result of the select statement to the file s3a://${bucket_name}/path/result.txt . Specify the export
format as csv.
After the export is complete, an identity file is generated.
select k1,k2,v1 from tbl1 limit 100000
FORMAT AS CSV
PROPERTIES
"broker.name" = "hdfs_broker",
"broker.fs.s3a.access.key" = "xxx",
"broker.fs.s3a.secret.key" = "xxxx",
"broker.fs.s3a.endpoint" = "https://cos.xxxxxx.myqcloud.com/",
"column_separator" = ",",
"line_delimiter" = "\n",
"max_file_size" = "1024MB",
"success_file_name" = "SUCCESS"
If the final generated file is not larger than 1GB, it will be: my_file_0.csv .
If larger than 1GB, it may be my_file_0.csv, my_file_1.csv, ... .
Verify on cos
format as csv
properties
"AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
"AWS_ACCESS_KEY" = "xxxx",
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "bd"
7. Use the s3 protocol to export to bos, and enable concurrent export of session variables.
Note: However, since the query
statement has a top-level sorting node, even if the concurrently exported session variable is enabled for this query, it
cannot be exported concurrently.
format as csv
properties
"AWS_ENDPOINT" = "http://s3.bd.bcebos.com",
"AWS_ACCESS_KEY" = "xxxx",
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "bd"
8. Use hdfs export to export simple query results to the file hdfs://${host}:${fileSystem_port}/path/to/result.txt .
Specify the export format as CSV and the user name as work. Specify the column separator as , and the row separator
as \n .
-- the default port of fileSystem_port is 9000
FORMAT AS CSV
PROPERTIES
"fs.defaultFS" = "hdfs://ip:port",
"hadoop.username" = "work"
);
If the Hadoop cluster is highly available and Kerberos authentication is enabled, you can refer to the following SQL
statement:
FORMAT AS CSV
PROPERTIES
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProx
'dfs.namenode.kerberos.principal'='hadoop/[email protected]'
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='[email protected]',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
If the final generated file is not larger than 100MB, it will be: `result_0.csv`.
9. Export the query result of the select statement to the file cosn://${bucket_name}/path/result.txt on Tencent Cloud
Object Storage (COS). Specify the export format as csv.
After the export is complete, an identity file is generated.
FORMAT AS CSV
PROPERTIES
"broker.name" = "broker_name",
"broker.fs.cosn.userinfo.secretId" = "xxx",
"broker.fs.cosn.userinfo.secretKey" = "xxxx",
"broker.fs.cosn.bucket.endpoint_suffix" = "https://cos.xxxxxx.myqcloud.com/",
"column_separator" = ",",
"line_delimiter" = "\n",
"max_file_size" = "1024MB",
"success_file_name" = "SUCCESS"
keywords
OUTFILE
Best Practice
Doris does not manage exported files. Including the successful export, or the remaining files after the export fails, all
need to be handled by the user.
The ability to export to a local file is not available for public cloud users, only for private deployments. And the default
user has full control over the cluster nodes. Doris will not check the validity of the export path filled in by the user. If the
process user of Doris does not have write permission to the path, or the path does not exist, an error will be reported. At
the same time, for security reasons, if a file with the same name already exists in this path, the export will also fail.
Doris does not manage files exported locally, nor does it check disk space, etc. These files need to be managed by the
user, such as cleaning and so on.
This command is a synchronous command, so it is possible that the connection is disconnected during execution,
making it impossible to know whether the export ended normally or whether the exported data is complete.
At this point, you can use the success_file_name parameter to request that a successful file identifier be generated in the
directory after the task is successful. Users can use this file to determine whether the export ends normally.
ADMIN-DIAGNOSE-TABLET
ADMIN-DIAGNOSE-TABLET
Name
ADMIN DIAGNOSE TABLET
Description
This statement is used to diagnose the specified tablet. The results will show information about the tablet and
some potential problems.
grammar:
illustrate:
2. TabletId: Tablet ID
6. MaterializedIndex: The materialized view to which the Tablet belongs and its ID
Example
1. Diagnose tablet 10001
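A sketch of the statement for this example; the exact syntax is assumed from the keywords of this section:
```sql
ADMIN DIAGNOSE TABLET 10001;
```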
Keywords
ADMIN,DIAGNOSE,TABLET
ADMIN-SHOW-CONFIG
ADMIN-SHOW-CONFIG
Name
ADMIN SHOW CONFIG
Description
This statement is used to display the configuration of the current cluster (currently only the configuration items of FE are
supported)
grammar:
Example
2. Use the like predicate to search the configuration of the current Fe node
Keywords
ADMIN, SHOW, CONFIG, ADMIN SHOW
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference Database Administration KILL
KILL
KILL
Name
KILL
Description
Each Doris connection runs in a separate thread. You can kill a thread with the KILL processlist_id statement.
The thread process list identifier can be determined from the ID column of the INFORMATION_SCHEMA PROCESSLIST table,
the Id column of the SHOW PROCESSLIST output, and the PROCESSLIST_ID column of the Performance Schema thread
table.
grammar:
Example
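A minimal sketch: look up the connection id with SHOW PROCESSLIST, then kill it (the id 20 is hypothetical):
```sql
SHOW PROCESSLIST;
KILL 20;
```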
Keywords
KILL
Best Practice
Feedback
SQL Manual SQL Reference Database Administration ADMIN-CHECK-TABLET
ADMIN-CHECK-TABLET
ADMIN-CHECK-TABLET
Name
ADMIN CHECK TABLET
Description
This statement is used to perform the specified check operation on a set of tablets.
grammar:
PROPERTIES("type" = "...");
illustrate:
1. A list of tablet ids must be specified along with the type property in PROPERTIES.
consistency: Check the consistency of the replicas of the tablet. This command is asynchronous; after it is sent, Doris
will start to execute the consistency check job of the corresponding tablet. The final result will be reflected in the
InconsistentTabletNum column in the result of SHOW PROC "/cluster_health/tablet_health";
Example
1. Perform a replica data consistency check on a specified set of tablets.
PROPERTIES("type" = "consistency");
Keywords
ADMIN, CHECK, TABLET
Best Practice
Feedback
SQL Manual SQL Reference Database Administration ADMIN-CLEAN-TRASH
ADMIN-CLEAN-TRASH
ADMIN-CLEAN-TRASH
Name
ADMIN CLEAN TRASH
Description
This statement is used to clean up garbage data in the backend
grammar:
illustrate:
1. Use BackendHost:BackendHeartBeatPort to indicate the backend that needs to be cleaned up. If the ON clause is not
added, all backends are cleaned up.
Example
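Sketches of the two forms described above (the backend addresses are placeholders):
```sql
-- clean up all backends
ADMIN CLEAN TRASH;

-- clean up only the specified backends
ADMIN CLEAN TRASH ON ("192.168.0.1:9050", "192.168.0.2:9050");
```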
Keywords
ENABLE-FEATURE
ENABLE-FEATURE
Description
Example
Keywords
ENABLE, FEATURE
Best Practice
RECOVER
RECOVER
Name
RECOVER
Description
This statement is used to restore a previously deleted database, table, or partition. It supports recovering meta information
by name or id, and you can set a new name for the recovered meta information.
You can get all the meta information that can be recovered with the statement SHOW CATALOG RECYCLE BIN .
grammar:
8. restore table by name and id, and set new table name
9. restore partition by name and id, and set new partition name
illustrate:
This operation can only restore meta information that was deleted within the retention period. The default is 1 day.
(Configurable through the catalog_trash_expire_second parameter in fe.conf.)
If you recover meta information by name without an id, the most recently dropped one with the same name will be recovered.
Example
1. Restore the database named example_db
7. Restore the database named example_db with id example_db_id, and set new name new_example_db
8. Restore the table named example_tbl, and set new name new_example_tbl
9. Restore the partition named p1 with id p1_id in table example_tbl, and new name new_p1
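Sketches of the statements for examples 1 and 8 above; the AS clause for renaming follows the description in this section, and the exact forms are assumptions:
```sql
-- example 1: restore the database by name
RECOVER DATABASE example_db;

-- example 8: restore the table by name and give it a new name
RECOVER TABLE example_tbl AS new_example_tbl;
```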
Keywords
RECOVER
Best Practice
UNINSTALL-PLUGIN
UNINSTALL-PLUGIN
Name
UNINSTALL PLUGIN
Description
This statement is used to uninstall a plugin.
grammar:
Example
1. Uninstall a plugin:
Keywords
UNINSTALL, PLUGIN
Best Practice
ADMIN-SET-REPLICA-STATUS
ADMIN-SET-REPLICA-STATUS
Name
ADMIN SET REPLICA STATUS
Description
This statement is used to set the state of the specified replica.
This command is currently only used to manually set the status of certain replicas to BAD or OK, allowing the system to
automatically repair these replicas
grammar:
3. "status": Required. Specifies the state. Currently only "bad" or "ok" are supported
If the specified replica does not exist, or the status is already bad, it will be ignored.
Note:
A replica set to Bad status may be deleted immediately, so please proceed with caution.
Example
1. Set the replica status of tablet 10003 on BE 10001 to bad.
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10003", "backend_id" = "10001", "status" = "bad");
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10003", "backend_id" = "10001", "status" = "ok");
Keywords
ADMIN, SET, REPLICA, STATUS
Best Practice
ADMIN-SHOW-REPLICA-DISTRIBUTION
ADMIN-SHOW-REPLICA-DISTRIBUTION
Name
ADMIN SHOW REPLICA DISTRIBUTION
Description
This statement is used to display the distribution status of a table or partition replica
grammar:
illustrate:
1. The Graph column in the result shows the replica distribution ratio in the form of a graph
Example
1. View the replica distribution of the table
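Sketches for this example (the table, database, and partition names are placeholders):
```sql
ADMIN SHOW REPLICA DISTRIBUTION FROM tbl1;
ADMIN SHOW REPLICA DISTRIBUTION FROM db1.tbl1 PARTITION (p1, p2);
```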
Keywords
ADMIN, SHOW, REPLICA, DISTRIBUTION, ADMIN SHOW
Best Practice
INSTALL-PLUGIN
INSTALL-PLUGIN
Name
INSTALL PLUGIN
Description
This statement is used to install a plugin.
grammar:
Example
4. Download and install a plugin, and set the md5sum value of the zip file at the same time:
Keywords
INSTALL, PLUGIN
Best Practice
ADMIN-REPAIR-TABLE
ADMIN-REPAIR-TABLE
Name
ADMIN REPAIR TABLE
Description
This statement is used to attempt to preferentially repair the specified table or partition.
grammar:
illustrate:
1. This statement only means to let the system try to repair the tablet replicas of the specified table or partition with high
priority, and does not guarantee that the repair will succeed. Users can view the repair status through the ADMIN
SHOW REPLICA STATUS command.
2. The default timeout is 14400 seconds (4 hours). A timeout means that the system will no longer repair tablet replicas of
the specified table or partition with high priority. This command needs to be issued again to re-enable it.
Example
1. Attempt to repair the specified table
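Sketches for this example (names are placeholders); the partitioned form follows the same pattern:
```sql
ADMIN REPAIR TABLE tbl1;
ADMIN REPAIR TABLE tbl1 PARTITION (p1, p2);
```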
Keywords
ADMIN, REPAIR, TABLE
Best Practice
ADMIN-CANCEL-REPAIR
ADMIN-CANCEL-REPAIR
Name
Description
This statement is used to cancel the repair of the specified table or partition with high priority
grammar:
illustrate:
1. This statement simply means that the system will no longer repair shard copies of the specified table or partition with
high priority. Replicas are still repaired with the default schedule.
Example
1. Cancel high priority repair
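A sketch, with placeholder table and partition names:
ADMIN CANCEL REPAIR TABLE tbl1 PARTITION(p1);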
Keywords
Best Practice
SET-VARIABLE
SET-VARIABLE
Name
SET VARIABLE
Description
This statement is mainly used to modify Doris system variables. These system variables can be modified at the global or
session level, and some can also be modified dynamically. You can also view these system variables with SHOW VARIABLES .
grammar:
illustrate:
1. variable_assignment:
user_var_name = expr
| [GLOBAL | SESSION] system_var_name = expr
Note:
Variables that support both the current session and the global effect include:
time_zone
wait_timeout
sql_mode
enable_profile
query_timeout
Since Version dev
insert_timeout
exec_mem_limit
batch_size
allow_partition_column_nullable
insert_visible_timeout_ms
enable_fold_constant_by_be
default_rowset_type
Example
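A sketch of setting variables at session and global scope (the variable names are taken from the list above; the values are illustrative):
SET exec_mem_limit = 137438953472;
SET GLOBAL exec_mem_limit = 137438953472;
SET time_zone = "Asia/Shanghai";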
Keywords
SET, VARIABLE
Best Practice
ADMIN-SET-CONFIG
ADMIN-SET-CONFIG
Name
ADMIN SET CONFIG
Description
This statement is used to set the configuration items of the cluster (currently only the configuration items of FE are
supported).
The settable configuration items can be viewed through the ADMIN SHOW FRONTEND CONFIG; command.
grammar:
Example
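A sketch, using the FE configuration item disable_balance as an illustration:
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");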
Keywords
Best Practice
description
This statement is used to display tablet storage format information (for administrators only)
Grammar:
example
MySQL [(none)]> admin show tablet storage format;
+-----------+---------+---------+
| BackendId | V1Count | V2Count |
+-----------+---------+---------+
| 10002     | 0       | 2867    |
+-----------+---------+---------+
+-----------+----------+---------------+
| BackendId | TabletId | StorageFormat |
+-----------+----------+---------------+
| 10002     | 39227    | V2            |
| 10002     | 39221    | V2            |
| 10002     | 39215    | V2            |
| 10002     | 39199    | V2            |
+-----------+----------+---------------+
keywords
ADMIN, SHOW, TABLET, STORAGE, FORMAT, ADMIN SHOW
ADMIN-SHOW-REPLICA-STATUS
ADMIN-SHOW-REPLICA-STATUS
Name
ADMIN SHOW REPLICA STATUS
Description
This statement is used to display replica status information for a table or partition.
grammar:
[where_clause];
illustrate
1. where_clause:
WHERE STATUS [!]= "replica_status"
2. replica_status:
OK: replica is healthy
DEAD: The Backend where the replica is located is unavailable
VERSION_ERROR: replica data version is missing
SCHEMA_ERROR: the schema hash of the replica is incorrect
MISSING: replica does not exist
Example
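Sketches, with placeholder database, table and partition names:
ADMIN SHOW REPLICA STATUS FROM db1.tbl1;
ADMIN SHOW REPLICA STATUS FROM tbl1 PARTITION (p1, p2) WHERE STATUS = "VERSION_ERROR";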
Keywords
ADMIN, SHOW, REPLICA, STATUS, ADMIN SHOW
Best Practice
ADMIN-COPY-TABLET
ADMIN-COPY-TABLET
Name
ADMIN COPY TABLET
Description
This statement is used to make a snapshot for the specified tablet, mainly used to load the tablet locally to reproduce the
problem.
syntax:
Notes:
1. backend_id: Specifies the id of the BE node where the replica is located. If not specified, a replica is randomly selected.
2. version: Specifies the version of the snapshot. The version must be less than or equal to the largest version of the replica.
If not specified, the largest version is used.
3. expiration_minutes: Snapshot retention time. The default is 1 hour. It will automatically clean up after a timeout. Unit
minutes.
The returned result looks like the following (the tail of the returned table creation statement, used to reproduce the tablet locally, is shown truncated):
TabletId: 10020
BackendId: 10003
Ip: 192.168.10.1
Path: /path/to/be/storage/snapshot/20220830101353.2.3600
ExpirationMinutes: 60
CreateTableStmt: ... ) ENGINE=OLAP PROPERTIES ( "replication_num" = "1", "version_info" = "2" );
Field descriptions:
TabletId: tablet id
BackendId: BE node id
Ip: BE node ip
Path: the directory where the snapshot is located
Example
2. Take a snapshot of the specified version of the replica on the specified BE node
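A sketch; the tablet id, backend id and version are placeholders:
ADMIN COPY TABLET 10010 PROPERTIES("backend_id" = "10001", "version" = "10");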
Keywords
Best Practice
ADMIN-REBALANCE-DISK
ADMIN-REBALANCE-DISK
Since Version 1.2.0
Name
ADMIN REBALANCE DISK
Description
This statement is used to try to rebalance the disks of the specified backends with high priority, regardless of whether the cluster is balanced
Grammar:
Explain:
1. This statement only means that the system attempts to rebalance disks of specified backends with high priority, no
matter if the cluster is balanced.
2. The default timeout is 24 hours. A timeout means that the system will no longer rebalance disks of the specified backends with
high priority. You need to run this command again to reset it.
Example
1. Attempt to rebalance disks of all backends
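Sketches; the host:port values are placeholders:
ADMIN REBALANCE DISK;
ADMIN REBALANCE DISK ON ("192.168.1.1:9050", "192.168.1.2:9050");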
Keywords
ADMIN,REBALANCE,DISK
Best Practice
ADMIN-CANCEL-REBALANCE-DISK
ADMIN-CANCEL-REBALANCE-DISK
Since Version 1.2.0
Name
Description
This statement is used to cancel rebalancing disks of specified backends with high priority
Grammar:
Explain:
1. This statement only indicates that the system will no longer rebalance the disks of the specified backends with
high priority. The system will still rebalance disks through its default scheduling.
Example
1. Cancel High Priority Disk Rebalance of all of backends of the cluster
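A minimal sketch:
ADMIN CANCEL REBALANCE DISK;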
Keywords
ADMIN,CANCEL,REBALANCE DISK
Best Practice
Name
Description
This command is used to view the execution of the Create Materialized View job submitted through the CREATE-
MATERIALIZED-VIEW statement.
[FROM database]
[WHERE]
[ORDER BY]
[LIMIT OFFSET]
database: View jobs under the specified database. If not specified, the current database is used.
WHERE: You can filter the result column, currently only the following columns are supported:
TableName: Only equal value filtering is supported.
State: Only supports equivalent filtering.
Createtime/FinishTime: Support =, >=, <=, >, <, !=
JobId: 11001
TableName: tbl1
FinishTime: NULL
BaseIndexName: tbl1
RollupIndexName: r1
RollupId: 11002
TransactionId: 5070
State: WAITING_TXN
Msg:
Progress: NULL
Timeout: 86400
State: job status.
WAITING_TXN:
Before officially starting to generate materialized view data, it will wait for the current running import transaction on
this table to complete. And the TransactionId field is the current waiting transaction ID. When all previous imports
for this ID are complete, the job will actually start.
Progress : job progress. The progress here means completed tablets/total tablets . Materialized views are created at
tablet granularity.
Example
1. View the materialized view jobs under the database example_db
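A sketch of the statement for this example, assuming the SHOW ALTER TABLE MATERIALIZED VIEW form of this command:
SHOW ALTER TABLE MATERIALIZED VIEW FROM example_db;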
Keywords
Best Practice
SHOW-ALTER
SHOW-ALTER
Name
SHOW ALTER
Description
This statement is used to display the execution of various modification tasks currently in progress (such as schema change
and rollup/materialized view jobs).
Example
1. Display the task execution of all modified columns of the default db
2. Display the task execution status of the last modified column of a table
SHOW ALTER TABLE COLUMN WHERE TableName = "table1" ORDER BY CreateTime DESC LIMIT 1;
3. Display the task execution of creating or deleting ROLLUP index for the specified db
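Possible statements for examples 1 and 3 above (a sketch; example_db is a placeholder):
SHOW ALTER TABLE COLUMN;
SHOW ALTER TABLE ROLLUP FROM example_db;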
Keywords
SHOW, ALTER
Best Practice
SHOW-BACKUP
SHOW-BACKUP
Name
SHOW BACKUP
Description
This statement is used to view BACKUP tasks
grammar:
illustrate:
Example
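A minimal sketch (example_db is a placeholder):
SHOW BACKUP FROM example_db;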
Keywords
SHOW, BACKUP
Best Practice
SHOW-BACKENDS
SHOW-BACKENDS
Name
SHOW BACKENDS
Description
This statement is used to view the BE nodes in the cluster
SHOW BACKENDS;
illustrate:
5. If ClusterDecommissioned is true, it means that the node is going offline in the current cluster.
9. TotalCapacity represents the total disk space. TotalCapacity = AvailCapacity + DataUsedCapacity + the space occupied by
other non-user data files.
11. ErrMsg is used to display the error message when the heartbeat fails.
12. Status is used to display some status information of BE in JSON format, including the time information of
the last time BE reported its tablet.
13. HeartbeatFailureCounter: The current number of heartbeats that have failed consecutively. If the number
exceeds the `max_backend_heartbeat_failure_tolerance_count` configuration, the isAlive will be set to false.
14. NodeRole is used to display the role of the Backend node. There are currently two roles: mix and computation. A mix
node is the original Backend node, and a computation node is a compute-only node.
Example
Keywords
SHOW, BACKENDS
Best Practice
SHOW-BROKER
SHOW-BROKER
Name
SHOW BROKER
Description
This statement is used to view the currently existing broker processes.
grammar:
SHOW BROKER;
illustrate:
Example
Keywords
SHOW, BROKER
Best Practice
SHOW-CATALOGS
SHOW-CATALOGS
Name
SHOW CATALOGS
Description
This statement is used to view the created catalogs
Syntax:
illustrate:
Return result:
Example
1. View all created catalogs
SHOW CATALOGS;
Keywords
Best Practice
SHOW-CREATE-TABLE
SHOW-CREATE-TABLE
Name
SHOW CREATE TABLE
Description
This statement is used to display the creation statement of the data table.
grammar:
illustrate:
Example
Keywords
Best Practice
SHOW-CHARSET
SHOW-CHARSET
Description
Example
Keywords
SHOW, CHARSET
Best Practice
SHOW-CREATE-CATALOG
SHOW-CREATE-CATALOG
Name
Description
This statement shows the creation statement of a Doris catalog.
grammar:
illustrate:
Example
1. View the creation statement of the hive catalog in Doris
Keywords
SHOW, CREATE, CATALOG
Best Practice
SHOW-CREATE-DATABASE
SHOW-CREATE-DATABASE
Name
SHOW CREATE DATABASE
Description
This statement shows the creation statement of a Doris database.
grammar:
illustrate:
Example
1. View the creation statement of the test database in Doris
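A sketch of the statement for this example:
SHOW CREATE DATABASE test;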
Keywords
Best Practice
SHOW-CREATE-MATERIALIZED-VIEW
SHOW-CREATE-MATERIALIZED-VIEW
Name
SHOW CREATE MATERIALIZED VIEW
Description
grammar :
SHOW CREATE MATERIALIZED VIEW mv_name ON table_name
1. mv_name:
Materialized view name. required.
2. table_name:
The table name of materialized view. required.
Example
Create materialized view
+-----------+----------+-----------------------------------------------------------------+
| TableName | ViewName | CreateStmt                                                      |
+-----------+----------+-----------------------------------------------------------------+
| table3    | id_col1  | create materialized view id_col1 as select id,col1 from table3 |
+-----------+----------+-----------------------------------------------------------------+
Keywords
Best Practice
SHOW-CREATE-LOAD
SHOW-CREATE-LOAD
Name
SHOW CREATE LOAD
Description
This statement is used to show the creation statement of an import job.
grammar:
illustrate:
Example
1. Show the creation statement of the specified import job under the default db
Keywords
SHOW, CREATE, LOAD
Best Practice
SHOW-CREATE-ROUTINE-LOAD
SHOW-CREATE-ROUTINE-LOAD
Name
SHOW CREATE ROUTINE LOAD
Description
This statement is used to show the creation statement of a routine import job.
The kafka partition and offset in the result show the currently consumed partition and the corresponding offset to be
consumed.
grammar:
illustrate:
1. ALL : optional parameter, which means to get all jobs, including historical jobs
Example
1. Show the creation statement of the specified routine import job under the default db
Keywords
SHOW, CREATE, ROUTINE, LOAD
Best Practice
SHOW-CREATE-FUNCTION
SHOW-CREATE-FUNCTION
Name
SHOW CREATE FUNCTION
Description
This statement is used to display the creation statement of the user-defined function
grammar:
illustrate:
1. function_name: the name of the function to display
2. arg_type: the parameter list of the function to display
3. If db_name is not specified, the current default db is used
Example
1. Show the creation statement of the specified function under the default db
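A sketch, assuming a function with the placeholder signature my_add(INT, INT):
SHOW CREATE FUNCTION my_add(INT, INT);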
Keywords
SHOW, CREATE, FUNCTION
Best Practice
SHOW-COLUMNS
SHOW-COLUMNS
Name
SHOW FULL COLUMNS
Description
This statement is used to view the column information of the specified table
grammar:
Example
1. View the column information of the specified table
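A minimal sketch, with tbl as a placeholder table name:
SHOW FULL COLUMNS FROM tbl;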
Keywords
SHOW, COLUMNS
Best Practice
SHOW-COLLATION
SHOW-COLLATION
Description
Example
Keywords
SHOW, COLLATION
Best Practice
SHOW-DATABASES
SHOW-DATABASES
Name
SHOW DATABASES
Description
grammar:
illustrate:
1. SHOW DATABASES will get all database names from current catalog.
2. SHOW DATABASES FROM catalog will get all database names from the catalog named 'catalog'.
3. SHOW DATABASES filter_expr will get filtered database names from current catalog.
Example
1. Display all the database names from current catalog.
SHOW DATABASES;
+--------------------+
| Database |
+--------------------+
| test |
| information_schema |
+--------------------+
2. Display all the database names from the specified catalog.
+---------------+
| Database |
+---------------+
| default |
| tpch |
+---------------+
3. Display the filtered database names from the current catalog using a LIKE expression.
+--------------------+
| Database |
+--------------------+
| information_schema |
+--------------------+
Keywords
SHOW, DATABASES
Best Practice
SHOW-DATA-SKEW
Name
Description
This statement is used to view the data skew of a table or a partition.
grammar:
Description:
1. Exactly one partition must be specified. For non-partitioned tables, the partition name is the same as the
table name.
2. The result will show row count and data volume of each bucket under the specified partition, and the
proportion of the data volume of each bucket in the total data volume.
Example
1. View the data skew of the table
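A sketch, with placeholder database, table and partition names:
SHOW DATA SKEW FROM example_db.test_tbl PARTITION(p1);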
Keywords
SHOW, DATA, SKEW
Best Practice
SHOW-DATABASE-ID
SHOW-DATABASE-ID
Name
SHOW DATABASE ID
Description
This statement is used to find the corresponding database name according to the database id (only for administrators)
grammar:
Example
1. Find the corresponding database name according to the database id
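A sketch, using an illustrative database id:
SHOW DATABASE 10396;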
Keywords
SHOW, DATABASE, ID
Best Practice
SHOW-DYNAMIC-PARTITION
SHOW-DYNAMIC-PARTITION
Name
SHOW DYNAMIC PARTITION
Description
This statement is used to display the status of all dynamic partition tables under the current db
grammar:
Example
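A minimal sketch, assuming the SHOW DYNAMIC PARTITION TABLES form (example_db is a placeholder):
SHOW DYNAMIC PARTITION TABLES FROM example_db;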
Keywords
SHOW, DYNAMIC, PARTITION
Best Practice
SHOW-DELETE
SHOW-DELETE
Name
SHOW DELETE
Description
This statement is used to display the historical delete tasks that have been successfully executed
grammar:
Example
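A minimal sketch (example_db is a placeholder):
SHOW DELETE FROM example_db;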
Keywords
SHOW, DELETE
Best Practice
SHOW-DATA
SHOW-DATA
Name
SHOW DATA
Description
This statement is used to display the amount of data, the number of replicas, and the number of statistical rows.
grammar:
illustrate:
1. If the FROM clause is not specified, the data volume and number of replicas subdivided into each table under the current
db will be displayed. The data volume is the total data volume of all replicas. The number of replicas is the number of
replicas for all partitions of the table and all materialized views.
2. If the FROM clause is specified, the data volume, number of copies and number of statistical rows subdivided into each
materialized view under the table will be displayed. The data volume is the total data volume of all replicas. The number of
replicas is the number of replicas for all partitions of the corresponding materialized view. The number of statistical rows
is the number of statistical rows for all partitions of the corresponding materialized view.
3. When counting the number of rows, the one with the largest number of rows among the multiple copies shall prevail.
4. The Total row in the result set represents the total row. The Quota line represents the quota set by the current
database. The Left line indicates the remaining quota.
5. If you want to see the size of each Partition, see help show partitions .
Example
1. Display the data volume, replica number, aggregate data volume and aggregate replica number of each table in the
default db.
SHOW DATA;
+-----------+-------------+--------------+
| TableName | Size        | ReplicaCount |
+-----------+-------------+--------------+
| tbl1      | 900.000 B   | 6            |
| tbl2      | 500.000 B   | 3            |
| Total     | 1.400 KB    | 9            |
+-----------+-------------+--------------+
2. Display the subdivided data volume, the number of replicas and the number of statistical rows of the specified table
under the specified db
+-----------+-----------+-----------+--------------+----------+
| TableName | IndexName | Size      | ReplicaCount | RowCount |
+-----------+-----------+-----------+--------------+----------+
|           | r2        | 20.000MB  | 30           | 20000    |
|           | Total     | 80.000    | 90           |          |
+-----------+-----------+-----------+--------------+----------+
3. It can be combined and sorted according to the amount of data, the number of copies, the number of statistical rows,
etc.
+-----------+-------------+--------------+
| TableName | Size        | ReplicaCount |
+-----------+-------------+--------------+
| table_c   | 3.102 KB    | 40           |
| table_d   | .000        | 20           |
| table_b   | 324.000 B   | 20           |
| table_a   | 1.266 KB    | 10           |
| Total     | 4.684 KB    | 90           |
+-----------+-------------+--------------+
Keywords
SHOW, DATA
Best Practice
SHOW-ENGINES
SHOW-ENGINES
Description
Example
Keywords
SHOW, ENGINES
Best Practice
SHOW-EVENTS
SHOW-EVENTS
Description
Example
Keywords
SHOW, EVENTS
Best Practice
SHOW-EXPORT
SHOW-EXPORT
Name
SHOW EXPORT
Description
This statement is used to display the execution of the specified export task
grammar:
SHOW EXPORT
[FROM db_name]
WHERE
[ID=your_job_id]
[STATE = ["PENDING"|"EXPORTING"|"FINISHED"|"CANCELLED"]]
[LABEL=your_label]
[ORDER BY...]
[LIMIT limit];
illustrate:
4. If LIMIT is specified, limit matching records are displayed. Otherwise show all
Example
1. Show all export tasks of default db
SHOW EXPORT;
2. Display the export tasks of the specified db, sorted by StartTime in descending order
3. Display the export tasks of the specified db, the state is "exporting", and sort by StartTime in descending order
SHOW EXPORT FROM example_db WHERE STATE = "exporting" ORDER BY StartTime DESC;
5. Display the export tasks of the specified db with the specified label
SHOW EXPORT FROM example_db WHERE LABEL = "mylabel";
Keywords
SHOW, EXPORT
Best Practice
SHOW-ENCRYPT-KEY
SHOW-ENCRYPT-KEY
Name
SHOW ENCRYPTKEYS
Description
View all custom keys under the database. If the user specifies a database, check the corresponding database, otherwise
directly query the database where the current session is located.
Requires the ADMIN privilege on this database.
grammar:
parameter
Example
mysql> SHOW ENCRYPTKEYS;
+-------------------+-------------------+
| EncryptKey Name   | EncryptKey String |
+-------------------+-------------------+
| example_db.my_key | ABCD123456789     |
+-------------------+-------------------+
+-------------------+-------------------+
| EncryptKey Name   | EncryptKey String |
+-------------------+-------------------+
| example_db.my_key | ABCD123456789     |
+-------------------+-------------------+
Keywords
Best Practice
SHOW-FUNCTIONS
SHOW-FUNCTIONS
Name
SHOW FUNCTIONS
Description
View all custom (system-provided) functions under the database. If the user specifies a database, then view the
corresponding database, otherwise directly query the database where the current session is located
grammar
Parameters
Example
mysql> show full functions in testDb\G
Signature: my_add(INT,INT)
Properties:
{"symbol":"_ZN9doris_udf6AddUdfEPNS_15FunctionContextERKNS_6IntValES4_","object_file":"http://host:port/libudfsampl
*************************** 2. row ***************************
Signature: my_count(BIGINT)
Signature: id_masking(BIGINT)
+---------------+
| Function Name |
+---------------+
| year |
| years_add |
| years_diff |
| years_sub |
+---------------+
Keywords
SHOW, FUNCTIONS
Best Practice
SHOW-TYPECAST
SHOW-TYPECAST
Name
SHOW TYPECAST
Description
View all type cast under the database. If the user specifies a database, then view the corresponding database, otherwise
directly query the database where the current session is located
grammar
Parameters
Example
Keywords
SHOW, TYPECAST
Best Practice
SHOW-FILE
SHOW-FILE
Name
SHOW FILE
Description
This statement is used to display a file created in a database
grammar:
illustrate:
Example
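A minimal sketch (example_db is a placeholder):
SHOW FILE FROM example_db;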
Keywords
SHOW, FILE
Best Practice
SHOW-GRANTS
SHOW-GRANTS
Name
SHOW GRANTS
Description
This statement is used to view the privileges granted to users.
grammar:
illustrate:
Example
SHOW GRANTS;
Keywords
SHOW, GRANTS
Best Practice
SHOW-LAST-INSERT
SHOW-LAST-INSERT
Name
SHOW LAST INSERT
Description
This syntax is used to view the result of the latest insert operation in the current session connection
grammar:
TransactionId: 64067
Label: insert_ba8f33aea9544866-8ed77e2844d0cc9b
Database: default_cluster:db1
Table: t1
TransactionStatus: VISIBLE
LoadedRows: 2
FilteredRows: 0
illustrate:
TransactionId: transaction id
Label: the label corresponding to the insert task
Database: the database corresponding to insert
Example
Keywords
Best Practice
Edit this page
Feedback
SQL Manual SQL Reference Show SHOW-LOAD-PROFILE
SHOW-LOAD-PROFILE
SHOW-LOAD-PROFILE
Name
SHOW LOAD PROFILE
Description
This statement is used to view the Profile information of an import operation. This function requires the user to enable
profile settings. For versions before 0.15, perform the following setting:
SET is_report_success=true;
grammar:
This command will list all currently saved import profiles. Each line corresponds to one import. where the QueryId column is
the ID of the import job. This ID can also be viewed through the SHOW LOAD statement. We can select the QueryId
corresponding to the Profile we want to see to see the specific situation
Example
The first-level list shows one line per saved import profile, for example (abridged): JobId 10441 (a Load job),
StartTime 2021-04-10 22:15:37, EndTime 2021-04-10 22:18:54, TotalTime 3m17s.
+-----------------------------------+------------+
| TaskId | ActiveTime |
+-----------------------------------+------------+
| 980014623046410a-88e260f0c43031f1 | 3m14s |
+-----------------------------------+------------+
Instance:
┌-----------------------------------------┐
│[-1: OlapTableSink] │
│ - Counters: │
│ - CloseWaitTime: 1m53s │
│ - ConvertBatchTime: 0ns │
│ - MaxAddBatchExecTime: 1m46s │
│ - NonBlockingSendTime: 3m11s │
│ - NumberBatchAdded: 782 │
│ - NumberNodeChannels: 1 │
│ - OpenTime: 743.822us │
│ - RowsFiltered: 0 │
│ - SendDataTime: 11s761ms │
│ - TotalAddBatchExecTime: 1m46s │
│ - ValidateDataTime: 9s802ms │
└-----------------------------------------┘
┌-----------------------------------------------------┐
│[0: BROKER_SCAN_NODE] │
│ - Counters: │
│ - BytesDecompressed: 0.00 │
│ - BytesRead: 5.77 GB │
│ - DecompressTime: 0ns │
│ - FileReadTime: 34s263ms │
│ - MaterializeTupleTime(*): 45s54ms │
│ - NumDiskAccess: 0 │
│ - PeakMemoryUsage: 33.03 MB │
│ - TotalRawReadTime(*): 1m20s │
│ - WaitScannerTime: 56s528ms │
└-----------------------------------------------------┘
Keywords
SHOW, LOAD, PROFILE
Best Practice
SHOW-LOAD-WARNINGS
SHOW-LOAD-WARNINGS
Name
SHOW LOAD WARNINGS
Description
If an import task fails and the error message is ETL_QUALITY_UNSATISFIED , it means that there is an import quality problem. If
you want to see these import tasks with quality problems, use this statement.
grammar:
SHOW LOAD WARNINGS
[FROM db_name]
WHERE
[LABEL [= "your_label"]]
Example
1. Display the data with quality problems in the import task of the specified db, and specify the label as
"load_demo_20210112"
Keywords
SHOW, LOAD, WARNINGS
Best Practice
SHOW-INDEX
SHOW-INDEX
Name
SHOW INDEX
Description
This statement is used to display information about indexes in a table. Currently, only bitmap indexes are supported.
grammar:
or
Example
1. Display the indexes of the specified table_name
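A sketch, with placeholder names:
SHOW INDEX FROM example_db.table_name;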
Keywords
SHOW, INDEX
Best Practice
SHOW-MIGRATIONS
SHOW-MIGRATIONS
Name
SHOW MIGRATIONS
Description
This statement is used to view the progress of database migration
grammar:
SHOW MIGRATIONS
Example
Keywords
SHOW, MIGRATIONS
Best Practice
SHOW-PARTITION-ID
SHOW-PARTITION-ID
Name
SHOW PARTITION ID
Description
This statement is used to find the corresponding database name, table name, partition name according to the partition id
(only for administrators)
grammar:
Example
1. Find the corresponding database name, table name, partition name according to the partition id
Keywords
SHOW, PARTITION, ID
Best Practice
SHOW-SNAPSHOT
SHOW-SNAPSHOT
Name
SHOW SNAPSHOT
Description
This statement is used to view backups that already exist in the repository.
grammar:
illustrate:
Example
3. View the details of the backup named backup1 in the warehouse example_repo with the time version "2018-05-05-15-34-
26":
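A sketch of the corresponding statement, assuming the SHOW SNAPSHOT ON repository syntax:
SHOW SNAPSHOT ON example_repo WHERE SNAPSHOT = "backup1" AND TIMESTAMP = "2018-05-05-15-34-26";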
Keywords
SHOW, SNAPSHOT
Best Practice
SHOW-ROLLUP
SHOW-ROLLUP
Description
Example
Keywords
SHOW, ROLLUP
Best Practice
SHOW-SQL-BLOCK-RULE
SHOW-SQL-BLOCK-RULE
Name
SHOW SQL BLOCK RULE
Description
View the configured SQL blocking rules. If you do not specify a rule name, you will view all rules.
grammar:
Example
1. View all rules.
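A sketch, assuming the optional FOR rule_name clause (test_rule is a placeholder):
SHOW SQL_BLOCK_RULE;
SHOW SQL_BLOCK_RULE FOR test_rule;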
Keywords
SHOW, SQL_BLOCK_RULE
Best Practice
SHOW-ROUTINE-LOAD
SHOW-ROUTINE-LOAD
Name
SHOW ROUTINE LOAD
Description
This statement is used to display the running status of the Routine Load job
grammar:
Result description:
Id: job ID
State
Progress
For Kafka data sources, displays the currently consumed offset for each partition. For example, {"0":"2"}
indicates that the consumption progress of Kafka partition 0 is 2.
Lag
For Kafka data sources, shows the consumption latency of each partition. For example, {"0":10} means that the
consumption delay of Kafka partition 0 is 10.
Example
1. Show all routine import jobs named test1 (including stopped or canceled jobs). The result is one or more lines.
3. Display all routine import jobs (including stopped or canceled jobs) under example_db. The result is one or more lines.
use example_db;
use example_db;
5. Display the currently running routine import job named test1 under example_db
6. Displays all routine import jobs named test1 under example_db (including stopped or canceled jobs). The result is one or
more lines.
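Sketches of some of the statements above, assuming the SHOW [ALL] ROUTINE LOAD [FOR job_name] form:
SHOW ALL ROUTINE LOAD FOR test1;
SHOW ROUTINE LOAD FOR example_db.test1;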
Keywords
SHOW, ROUTINE, LOAD
Best Practice
SHOW-SYNC-JOB
SHOW-SYNC-JOB
Name
SHOW SYNC JOB
Description
This command is used to display the current status of resident data synchronization jobs in all databases.
grammar:
Example
1. Display the status of all data synchronization jobs in the current database.
2. Display the status of all data synchronization jobs under the database test_db .
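Sketches of the two examples, assuming the SHOW SYNC JOB [FROM db_name] form:
SHOW SYNC JOB;
SHOW SYNC JOB FROM test_db;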
Keywords
Best Practice
SHOW-WHITE-LIST
SHOW-WHITE-LIST
Description
Example
Keywords
Best Practice
SHOW-WARNING
SHOW-WARNING
Description
Example
Keywords
SHOW, WARNING
Best Practice
SHOW-TABLET
SHOW-TABLET
Name
SHOW TABLET
Description
This statement is used to display the specified tablet id information (only for administrators)
grammar:
Example
1. Display the parent-level id information of the tablet with the specified tablet id of 10000
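A sketch of the statement for this example:
SHOW TABLET 10000;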
Keywords
SHOW, TABLET
Best Practice
SHOW-VARIABLES
SHOW-VARIABLES
Name
SHOW VARIABLES
Description
This statement is used to display Doris system variables, which can be queried by conditions
grammar:
illustrate:
Executing the SHOW VARIABLES command does not require any privileges, it only requires being able to connect to the
server.
Use the like statement to match with variable_name.
The % percent wildcard can be used anywhere in the matching pattern
Example
1. The default here is to match the Variable_name, here is the exact match
2. Matching through the percent sign (%) wildcard can match multiple items
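Sketches, using a variable name from the SET VARIABLE section above as an illustration:
SHOW VARIABLES LIKE 'time_zone';
SHOW VARIABLES LIKE '%time%';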
Keywords
SHOW, VARIABLES
Best Practice
SHOW-PLUGINS
SHOW-PLUGINS
Name
SHOW PLUGINS
Description
This statement is used to display installed plugins
grammar:
SHOW PLUGINS
This command will display all user-installed and system built-in plugins
Example
SHOW PLUGINS;
Keywords
SHOW, PLUGINS
Best Practice
SHOW-ROLES
SHOW-ROLES
Name
SHOW ROLES
Description
This statement is used to display all created role information, including role name, included users and permissions.
grammar:
SHOW ROLES
Example
SHOW ROLES
Keywords
SHOW, ROLES
Best Practice
SHOW-PROCEDURE
SHOW-PROCEDURE
Description
Example
Keywords
SHOW, PROCEDURE
Best Practice
SHOW-ROUTINE-LOAD-TASK
SHOW-ROUTINE-LOAD-TASK
Name
SHOW ROUTINE LOAD TASK
Description
View the currently running subtasks of a specified Routine Load job.
TaskId: d67ce537f1be4b86-abf47530b79ab8e6
TxnId: 4
TxnStatus: UNKNOWN
JobId: 10280
Timeout: 20
BeId: 10002
DataSourceProperties: {"0":19}
Example
1. Display the subtask information of the routine import task named test1.
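A sketch of the statement, assuming the WHERE JobName filter form:
SHOW ROUTINE LOAD TASK WHERE JobName = "test1";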
Keywords
Best Practice
With this command, you can view how many subtasks of a Routine Load job are currently running, and on which BE nodes they
are running.
SHOW-PROC
SHOW-PROC
Name
SHOW PROC
Description
The Proc system is a unique feature of Doris. Users who have used Linux may be familiar with this concept. In Linux
systems, proc is a virtual file system, usually mounted in the /proc directory. Users can view the internal data structures of the
system through this file system. For example, you can view the details of a specified pid process through /proc/pid.
Similar to the proc system in Linux, the proc system in Doris is also organized into a directory-like structure to view different
system information according to the "directory path (proc path)" specified by the user.
The proc system is designed mainly for system administrators, so that it is convenient for them to view some running states
inside the system, such as the tablet status of a table, the cluster balance status, and the status of various jobs. It is a
very useful tool.
1. View through the WEB UI interface provided by Doris, visit the address: http://FE_IP:FE_HTTP_PORT
2. Another way is by command
All commands supported by Doris PROC can be seen through SHOW PROC "/";
After connecting to Doris through the MySQL client, you can execute the SHOW PROC statement to view the information of
the specified proc directory. The proc directory is an absolute path starting with "/".
The results of the show proc statement are presented in a two-dimensional table. And usually the first column of the result
table is the next subdirectory of proc.
+---------------------------+
| name |
+---------------------------+
| statistic |
| brokers |
| frontends |
| routine_loads |
| auth |
| jobs |
| bdbje |
| resources |
| monitor |
| transactions |
| colocation_group |
| backends |
| trash |
| cluster_balance |
| current_queries |
| dbs |
| load_error_hub |
| current_backend_instances |
| tasks |
| cluster_health |
| current_query_stmts |
| stream_loads |
+---------------------------+
illustrate:
1. Statistics: It is mainly used to summarize and view the number of databases, tables, partitions, shards, and replicas in the
Doris cluster. and the number of unhealthy copies. This information helps us to control the size of the cluster meta-
information in general. It helps us view the cluster sharding situation from an overall perspective, and can quickly check
the health of the cluster sharding. This further locates problematic data shards.
2. brokers : View cluster broker node information, equivalent to SHOW BROKER
3. frontends: Display all FE node information in the cluster, including IP address, role, status, whether it is the master, etc.
Equivalent to SHOW FRONTENDS
4. routine_loads: Display all routine load job information, including job name, status, etc.
5. auth: User name and corresponding permission information
6. jobs:
7. bdbje: To view the bdbje database list, you need to add enable_bdbje_debug_mode=true to the fe.conf file, and
then start FE through sh start_fe.sh --daemon to enter debug mode. In debug mode, only the http server and
MySQLServer are started and the BDBJE instance is opened, but no metadata loading or subsequent startup steps are
performed.
8. dbs: Mainly used to view the metadata information of each database and the tables in the Doris cluster. This information
includes table structure, partitions, materialized views, data shards and replicas, and more. Through this directory and its
subdirectories, you can clearly display the table metadata in the cluster, and locate some problems such as data skew,
replica failure, etc.
9. resources : View system resources, ordinary accounts can only see resources that they have USAGE_PRIV permission to
use. Only the root and admin accounts can see all resources. Equivalent to SHOW RESOURCES
10. monitor : shows the resource usage of FE JVM
11. transactions : used to view the transaction details of the specified transaction id, equivalent to SHOW TRANSACTION
12. colocation_group : This command can view the existing Group information in the cluster. For details, please refer to the
Colocation Join chapter
13. backends: Displays the node list of BE in the cluster, equivalent to SHOW BACKENDS
14. trash: This statement is used to view the space occupied by garbage data in the backend. Equivalent to SHOW TRASH
15. cluster_balance : To check the balance of the cluster, please refer to Data Copy Management
16. current_queries : View the list of queries being executed, the SQL statement currently running.
17. load_error_hub: Doris supports centralized storage of error information generated by load jobs in an error hub. Then view
the error message directly through the SHOW LOAD WARNINGS; statement. Shown here is the configuration information of
the error hub.
18. current_backend_instances : Displays a list of be nodes that are currently executing jobs
19. tasks : Displays the total number of tasks of various jobs and the number of failures.
20. Cluster_health: Run SHOW PROC '/cluster_health/tablet_health'; statement to view the replica status of the entire
cluster.
21. Current_query_stmts: Returns the currently executing query.
1. For example, "/dbs" displays all databases, and "/dbs/10002" displays all tables under the database with id 10002
2. Display information about the number of all database tables in the cluster.
+-------+----------------------+----------+--------------+----------+-----------+------------+
| DbId  | DbName               | TableNum | PartitionNum | IndexNum | TabletNum | ReplicaNum |
+-------+----------------------+----------+--------------+----------+-----------+------------+
| 10002 | default_cluster:test | 4        | 12           | 12       | 21        | 21         |
| Total | 1                    | 4        | 12           | 12       | 21        | 21         |
+-------+----------------------+----------+--------------+----------+-----------+------------+
3. The following command can view the existing Group information in the cluster.
GroupId: The cluster-wide unique identifier of a group, the first half is the db id, and the second half is the group id.
GroupName: The full name of the Group.
TableIds: The id list of Tables contained in this Group.
4. Use the following commands to further view the data distribution of a Group:
+-------------+---------------------+
| BucketIndex | BackendIds |
+-------------+---------------------+
+-------------+---------------------+
BackendIds: The list of BE node IDs where the data shards in the bucket are located.
5. Display the total number of tasks of various jobs and the number of failures.
+-------------------------+-----------+----------+
| TaskType                | FailedNum | TotalNum |
+-------------------------+-----------+----------+
| CREATE | 0 | 0 |
| DROP | 0 | 0 |
| PUSH | 0 | 0 |
| CLONE | 0 | 0 |
| STORAGE_MEDIUM_MIGRATE | 0 | 0 |
| ROLLUP | 0 | 0 |
| SCHEMA_CHANGE | 0 | 0 |
| CANCEL_DELETE | 0 | 0 |
| MAKE_SNAPSHOT | 0 | 0 |
| RELEASE_SNAPSHOT | 0 | 0 |
| CHECK_CONSISTENCY | 0 | 0 |
| UPLOAD | 0 | 0 |
| DOWNLOAD | 0 | 0 |
| CLEAR_REMOTE_FILE | 0 | 0 |
| MOVE | 0 | 0 |
| REALTIME_PUSH | 0 | 0 |
| PUBLISH_VERSION | 0 | 0 |
| CLEAR_ALTER_TASK | 0 | 0 |
| CLEAR_TRANSACTION_TASK | 0 | 0 |
| RECOVER_TABLET | 0 | 0 |
| STREAM_LOAD | 0 | 0 |
| UPDATE_TABLET_META_INFO | 0 | 0 |
| ALTER | 0 | 0 |
| INSTALL_PLUGIN | 0 | 0 |
| UNINSTALL_PLUGIN | 0 | 0 |
| Total | 0 | 0 |
+-------------------------+-----------+----------+
View the replica status under a database, such as a database with a DbId of 25852112.
Keywords
SHOW, PROC
Best Practice
SHOW-TABLE-STATUS
SHOW-TABLE-STATUS
Name
SHOW TABLE STATUS
Description
This statement is used to view some information about the Table.
grammar:
illustrate:
1. This statement is mainly used to be compatible with MySQL syntax, currently only a small amount of information such as
Comment is displayed
Example
1. View the information of all tables under the current database
2. View the information of the table whose name contains example under the specified database
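Sketches of the two examples (db is a placeholder database name):
SHOW TABLE STATUS;
SHOW TABLE STATUS FROM db LIKE "%example%";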
Keywords
Best Practice
SHOW-REPOSITORIES
SHOW-REPOSITORIES
Name
SHOW REPOSITORIES
Description
This statement is used to view the currently created repositories
grammar:
SHOW REPOSITORIES;
illustrate:
Example
SHOW REPOSITORIES;
Keywords
SHOW, REPOSITORIES
Best Practice
SHOW-QUERY-PROFILE
SHOW-QUERY-PROFILE
Name
SHOW QUERY PROFILE
Description
This statement is used to view the tree profile information of a query operation. This function requires the user to enable
profile settings.
For versions before 0.15, perform the following setting:
SET is_report_success=true;
grammar:
This command will list the profiles of all currently saved query operations.
Getting the tree profile information of the specified query ID returns a simple profile tree. Specifying a fragment ID and
instance ID returns the corresponding detailed profile tree.
Example
1. List all query Profile
Fragments: ┌────────────────────────┐
│[-1: VDataBufferSender] │
│Fragment: 0 │
│MaxActiveTime: 783.263us│
└────────────────────────┘
┌┘
┌───────────────────┐
│[1: VEXCHANGE_NODE]│
│Fragment: 0 │
└───────────────────┘
└┐
┌────────────────────────┐
│[1: VDataStreamSender] │
│Fragment: 1 │
│MaxActiveTime: 847.612us│
└────────────────────────┘
┌────────────────────┐
│[0: VOLAP_SCAN_NODE]│
│Fragment: 1 │
└────────────────────┘
┌┘
┌─────────────┐
│[OlapScanner]│
│Fragment: 1 │
└─────────────┘
┌─────────────────┐
│[SegmentIterator]│
│Fragment: 1 │
└─────────────────┘
Instances: 327167e0db4749a9-adce3b3d770b2bb2
Host: 172.26.0.1:9111
ActiveTime: 847.612us
Instance: ┌───────────────────────────────────────┐
│[1: VDataStreamSender] │
│ - Counters: │
│ - BytesSent: 0.00 │
│ - IgnoreRows: 0 │
│ - LocalBytesSent: 20.00 B │
│ - PeakMemoryUsage: 0.00 │
│ - SerializeBatchTime: 0ns │
│ - UncompressedRowBatchSize: 0.00 │
└───────────────────────────────────────┘
┌───────────────────────────────────────┐
│[0: VOLAP_SCAN_NODE] │
│ - Counters: │
│ - BatchQueueWaitTime: 444.714us │
│ - BytesRead: 37.00 B │
│ - NumDiskAccess: 1 │
│ - NumScanners: 2 │
│ - PeakMemoryUsage: 320.00 KB │
│ - RowsRead: 4 │
│ - RowsReturned: 4 │
│ - ScannerBatchWaitTime: 206.40us │
│ - ScannerSchedCount : 2 │
│ - ScannerWorkerWaitTime: 34.640us│
│ - TabletCount : 2 │
└───────────────────────────────────────┘
┌─────────────────────────────────┐
│[OlapScanner] │
│ - Counters: │
│ - BlockConvertTime: 0ns │
│ - BlockFetchTime: 183.741us│
│ - ReaderInitTime: 180.741us│
│ - RowsDelFiltered: 0 │
│ - RowsPushedCondFiltered: 0│
│ - ScanCpuTime: 388.576us │
│ - ScanTime: 0ns │
│ - ShowHintsTime_V1: 0ns │
└─────────────────────────────────┘
┌─────────────────────────────────────┐
│[SegmentIterator] │
│ - Counters: │
│ - BitmapIndexFilterTimer: 124ns│
│ - BlockLoadTime: 179.202us │
│ - BlockSeekCount: 5 │
│ - BlockSeekTime: 18.792us │
│ - BlocksLoad: 4 │
│ - CachedPagesNum: 2 │
│ - CompressedBytesRead: 0.00 │
│ - DecompressorTimer: 0ns │
│ - IOTimer: 0ns │
│ - IndexLoadTime_V1: 0ns │
│ - NumSegmentFiltered: 0 │
│ - NumSegmentTotal: 2 │
│ - RawRowsRead: 4 │
│ - RowsBitmapIndexFiltered: 0 │
│ - RowsBloomFilterFiltered: 0 │
│ - RowsConditionsFiltered: 0 │
│ - RowsKeyRangeFiltered: 0 │
│ - RowsStatsFiltered: 0 │
│ - RowsVectorPredFiltered: 0 │
│ - TotalPagesNum: 2 │
│ - UncompressedBytesRead: 0.00 │
│ - VectorPredEvalTime: 0ns │
└─────────────────────────────────────┘
Keywords
Best Practice
SHOW-OPEN-TABLES
SHOW-OPEN-TABLES
Description
Example
Keywords
SHOW, OPEN, TABLES
Best Practice
SHOW-TABLETS
SHOW-TABLETS
Name
SHOW TABLETS
Description
This statement is used to list tablets of the specified table (only for administrators)
grammar:
[WHERE where_condition]
[ORDER BY col_name]
1. Syntax Description:
Version = version
state = "NORMAL|ROLLUP|CLONE|DECOMMISSION"
BackendId = backend_id
IndexName = rollup_name
Example
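A minimal sketch, with placeholder names:
SHOW TABLETS FROM example_db.table_name;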
Keywords
SHOW, TABLETS
Best Practice
SHOW-LOAD
SHOW-LOAD
Name
SHOW LOAD
Description
This statement is used to display the execution of the specified import task
grammar:
SHOW LOAD
[FROM db_name]
WHERE
[STATE = ["PENDING"|"ETL"|"LOADING"|"FINISHED"|"CANCELLED"]]
[ORDER BY...]
illustrate:
2. If LABEL LIKE is used, it will match import tasks whose label contains label_matcher
6. If LIMIT is specified, limit matching records are displayed. Otherwise show all
7. If OFFSET is specified, the query results are displayed starting at offset offset. By default the offset is 0.
8. If you are using broker/mini load, the connections in the URL column can be viewed using the following command:
Example
1. Show all import tasks for default db
SHOW LOAD;
2. Display the import tasks of the specified db, the label contains the string "2014_01_02", and display the oldest 10
SHOW LOAD FROM example_db WHERE LABEL LIKE "2014_01_02" LIMIT 10;
3. Display the import tasks of the specified db, specify the label as "load_example_db_20140102" and sort by LoadStartTime
in descending order
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" ORDER BY LoadStartTime DESC;
4. Display the import task of the specified db, specify the label as "load_example_db_20140102", the state as "loading", and
sort by LoadStartTime in descending order
SHOW LOAD FROM example_db WHERE LABEL = "load_example_db_20140102" AND STATE = "loading" ORDER BY
LoadStartTime DESC;
5. Display the import tasks of the specified db and sort them in descending order by LoadStartTime, and display 10 query
results starting from offset 5
Keywords
SHOW, LOAD
Best Practice
SHOW-TABLES
SHOW-TABLES
Name
SHOW TABLES
Description
This statement is used to display all tables under the current db
grammar:
illustrate:
Example
1. View all tables under DB
+---------------------------------+
| Tables_in_demo |
+---------------------------------+
| ads_client_biz_aggr_di_20220419 |
| cmy1 |
| cmy2 |
| intern_theme |
| left_table |
+---------------------------------+
2. View the tables whose names contain the string "cmy" under the db
+----------------+
| Tables_in_demo |
+----------------+
| cmy1 |
| cmy2 |
+----------------+
Keywords
SHOW, TABLES
Best Practice
SHOW-RESOURCES
SHOW-RESOURCES
Name
SHOW RESOURCES
Description
This statement is used to display resources that the user has permission to use. Ordinary users can only display resources
with permission, and root or admin users will display all resources.
grammar:
SHOW RESOURCES
WHERE
[RESOURCETYPE = ["SPARK"]]
[ORDER BY...]
illustrate:
1. If NAME LIKE is used, it will match Resource whose Name contains name_matcher in RESOURCES
6. If OFFSET is specified, the query results are displayed starting at offset offset. By default the offset is 0.
Example
1. Display all resources that the current user has permissions to
SHOW RESOURCES;
2. Display the specified Resource, the name contains the string "20140102", and display 10 attributes
3. Display the specified Resource, specify the name as "20140102" and sort by KEY in descending order
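Sketches for examples 2 and 3 above (the names are placeholders):
SHOW RESOURCES WHERE NAME LIKE "%20140102%" LIMIT 10;
SHOW RESOURCES WHERE NAME = "20140102";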
Keywords
SHOW, RESOURCES
Best Practice
SHOW-PARTITIONS
SHOW-PARTITIONS
Name
SHOW PARTITIONS
Description
This statement is used to display partition information
grammar:
illustrate:
1. Support the filtering of PartitionId, PartitionName, State, Buckets, ReplicationNum, LastConsistencyCheckTime and
other columns
2. TEMPORARY specifies to list temporary partitions
Example
1. Display all non-temporary partition information of the specified table under the specified db
2. Display all temporary partition information of the specified table under the specified db
3. Display the information of the specified non-temporary partition of the specified table under the specified db
4. Display the latest non-temporary partition information of the specified table under the specified db
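Sketches of the four examples above, with placeholder database, table and partition names:
SHOW PARTITIONS FROM example_db.table_name;
SHOW TEMPORARY PARTITIONS FROM example_db.table_name;
SHOW PARTITIONS FROM example_db.table_name WHERE PartitionName = "p1";
SHOW PARTITIONS FROM example_db.table_name ORDER BY PartitionId DESC LIMIT 1;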
Keywords
SHOW, PARTITIONS
Best Practice
SHOW-FRONTENDS
SHOW-FRONTENDS
Name
SHOW FRONTENDS
Description
This statement is used to view FE nodes
grammar:
SHOW FRONTENDS;
illustrate:
Example
Keywords
SHOW, FRONTENDS
Best Practice
SHOW-RESTORE
SHOW-RESTORE
Name
SHOW RESTORE
Description
This statement is used to view RESTORE tasks
grammar:
illustrate:
DOWNLOAD: The snapshot is complete, ready to download the snapshot in the repository
COMMITTING: taking effect
UnfinishedTasks: Displays unfinished subtask ids during the SNAPSHOTING, DOWNLOADING and COMMITTING stages
Example
1. View the latest RESTORE task under example_db.
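A sketch of the statement for this example:
SHOW RESTORE FROM example_db;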
Keywords
SHOW, RESTORE
Best Practice
SHOW-PROPERTY
SHOW-PROPERTY
Description
This statement is used to view the attributes of the user.
user: view the attributes of the specified user. If not specified, the current user's attributes are viewed.
LIKE: fuzzy matching can be performed by property name.
+----------------------+-------+
| Key | Value |
+----------------------+-------+
| max_user_connections | 100 |
+----------------------+-------+
Key: property name.
Value: property value.
Example
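A sketch, assuming a user named jack (a placeholder) and the property shown above:
SHOW PROPERTY FOR 'jack' LIKE '%connection%';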
Keywords
SHOW, PROPERTY
Best Practice
SHOW-TRIGGERS
SHOW-TRIGGERS
Description
Example
Keywords
SHOW, TRIGGERS
Best Practice
SHOW-PROCESSLIST
SHOW-PROCESSLIST
Name
SHOW PROCESSLIST
Description
Display the running threads of the user. It should be noted that except the root user who can see all running threads, other
users can only see their own running threads, and cannot see the running threads of other users.
grammar:
illustrate:
Id: the unique identifier of this thread. When there is a problem with a thread, you can kill it with the KILL command using
this Id value. The information displayed by show processlist comes from the information_schema.processlist table, so this Id
is the primary key of that table.
User: refers to the user who started this thread.
Host: Records the IP and port number of the client sending the request. Through this information, when troubleshooting
the problem, we can locate which client and which process sent the request.
Kill : The kill statement is being executed to kill the specified thread
Example
SHOW PROCESSLIST
Keywords
SHOW, PROCESSLIST
Best Practice
SHOW-TRASH
SHOW-TRASH
Name
SHOW TRASH
Description
This statement is used to view the garbage data footprint within the backend.
grammar:
illustrate:
Example
SHOW TRASH;
2. View the space occupied by garbage data of '192.168.0.1:9050' (specific disk information will be displayed).
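A sketch of the statement for this example, assuming the ON "host:port" form:
SHOW TRASH ON "192.168.0.1:9050";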
Keywords
SHOW, TRASH
Best Practice
SHOW-VIEW
SHOW-VIEW
Name
SHOW VIEW
Description
This statement is used to display all views based on the given table
grammar:
Example
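A minimal sketch, assuming a table named example_table (a placeholder):
SHOW VIEW FROM example_table;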
Keywords
SHOW, VIEW
Best Practice
SHOW-TRANSACTION
SHOW-TRANSACTION
Name
SHOW TRANSACTION
Description
This syntax is used to view transaction details for the specified transaction id or label.
grammar:
SHOW TRANSACTION
[FROM db_name]
WHERE
[id=transaction_id]
[label = label_name];
TransactionId: 4005
Label: insert_8d807d5d-bcdd-46eb-be6d-3fa87aa4952d
TransactionStatus: VISIBLE
LoadJobSourceType: INSERT_STREAMING
Reason:
ErrorReplicasCount: 0
ListenerId: -1
TimeoutMs: 300000
TransactionId: transaction id
Label: the label corresponding to the import task
Coordinator: The node responsible for transaction coordination
TransactionStatus: transaction status
PREPARE: preparation stage
COMMITTED: The transaction succeeded, but the data was not visible
VISIBLE: The transaction succeeded and the data is visible
ABORTED: Transaction failed
Example
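A sketch, using the transaction id shown above:
SHOW TRANSACTION WHERE ID = 4005;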
Keywords
SHOW, TRANSACTION
Best Practice
SHOW-STREAM-LOAD
SHOW-STREAM-LOAD
Name
SHOW STREAM LOAD
Description
This statement is used to display the execution of the specified Stream Load task
grammar:
[FROM db_name]
WHERE
[STATUS = ["SUCCESS"|"FAIL"]]
[ORDER BY...]
illustrate:
1. By default, BE does not record Stream Load records. If you want to view records that need to be enabled on BE, the
configuration parameter is: enable_stream_load_record=true . For details, please refer to BE Configuration Items
Example
1. Show all Stream Load tasks of the default db
2. Display the Stream Load task of the specified db, the label contains the string "2014_01_02", and display the oldest 10
SHOW STREAM LOAD FROM example_db WHERE LABEL LIKE "2014_01_02" LIMIT 10;
3. Display the Stream Load task of the specified db and specify the label as "load_example_db_20140102"
4. Display the Stream Load task of the specified db, specify the status as "success", and sort by StartTime in descending
order
SHOW STREAM LOAD FROM example_db WHERE STATUS = "success" ORDER BY StartTime DESC;
5. Display the import tasks of the specified db and sort them in descending order of StartTime, and display 10 query results
starting from offset 5
SHOW STREAM LOAD FROM example_db ORDER BY StartTime DESC limit 5,10;
SHOW STREAM LOAD FROM example_db ORDER BY StartTime DESC limit 10 offset 5;
Keywords
Best Practice
SHOW-STATUS
SHOW-STATUS
Name
SHOW STATUS
Description
This command is used to view the execution of the Create Materialized View job submitted through the CREATE-
MATERIALIZED-VIEW statement.
[FROM database]
[WHERE]
[ORDER BY]
[LIMIT OFFSET]
database: View jobs under the specified database. If not specified, the current database is used.
WHERE: You can filter the result column, currently only the following columns are supported:
TableName: Only equal value filtering is supported.
State: Only supports equivalent filtering.
JobId: 11001
TableName: tbl1
FinishTime: NULL
BaseIndexName: tbl1
RollupIndexName: r1
RollupId: 11002
TransactionId: 5070
State: WAITING_TXN
Msg:
Progress: NULL
Timeout: 86400
WAITING_TXN:
Before officially starting to generate materialized view data, it will wait for the current running import transaction on
this table to complete. And the TransactionId field is the current waiting transaction ID. When all previous imports
for this ID are complete, the job will actually start.
Example
Keywords
SHOW, STATUS
Best Practice
SHOW-TABLE-ID
SHOW-TABLE-ID
Name
SHOW TABLE ID
Description
This statement is used to find the corresponding database name, table name according to the table id (only for
administrators)
grammar:
Example
1. Find the corresponding database name, table name according to the table id
Keywords
SHOW, TABLE, ID
Best Practice
SHOW-SMALL-FILES
SHOW-SMALL-FILES
Name
SHOW FILE
Description
This statement is used to display files created by the CREATE FILE command within a database.
Example
Keywords
SHOW, SMALL, FILES
Best Practice
SHOW-POLICY
SHOW-POLICY
Name
Description
View the row security policy under the current DB
Example
Example output for a row policy (one row shown, abridged):
PolicyName: test_row_policy_1
DbName: default_cluster:test
TableName: table1
Type: ROW
FilterType: RESTRICTIVE
WherePredicate: `id` IN (1, 2)
User: root
OriginStmt: /* ApplicationName=DataGrip 2021.3.4 */ CREATE ROW POLICY test_row_policy_1 ON test.table1 AS RESTRICTIVE TO root USING (id in (1, 2));
2 rows in set (0.00 sec)
Another example output includes a properties block such as:
{
"AWS_SECRET_KEY": "******",
"AWS_REGION": "bj",
"AWS_ACCESS_KEY": "bbba",
"AWS_MAX_CONNECTIONS": "50",
"AWS_CONNECTION_TIMEOUT_MS": "1000",
"type": "s3",
"AWS_ROOT_PATH": "path/to/rootaaaa",
"AWS_BUCKET": "test-bucket",
"AWS_ENDPOINT": "bj.s3.comaaaa",
"AWS_REQUEST_TIMEOUT_MS": "3000"
}
Keywords
SHOW, POLICY
Best Practice
SHOW-CATALOG-RECYCLE-BIN
SHOW-CATALOG-RECYCLE-BIN
Name
SHOW CATALOG RECYCLE BIN
Description
This statement is used to display the dropped meta informations that can be recovered
grammar:
SHOW CATALOG RECYCLE BIN [ WHERE NAME [= "name" | LIKE "name_matcher"] ]
The meaning of each column is as follows:
DbId: id of database
TableId: id of table
PartitionId: id of partition
DropTime: time when the meta was dropped
Example
1. Display all meta information that can be recovered
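For example:
SHOW CATALOG RECYCLE BIN;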
Keywords
SHOW, CATALOG RECYCLE BIN
Best Practice
SQL Manual SQL Reference Data Types BOOLEAN
BOOLEAN
BOOLEAN
Description
BOOL, BOOLEAN
Like TINYINT, 0 stands for false and 1 for true.
keywords
BOOLEAN
TINYINT
TINYINT
Description
TINYINT
1 byte signed integer, range [-128, 127]
keywords
TINYINT
SMALLINT
SMALLINT
Description
SMALLINT
2-byte signed integer, range [-32768, 32767]
keywords
SMALLINT
INT
INT
Description
INT
4-byte signed integer, range [-2147483648, 2147483647]
keywords
INT
BIGINT
BIGINT
Description
BIGINT
8-byte signed integer, range [-9223372036854775808, 9223372036854775807]
keywords
BIGINT
LARGEINT
LARGEINT
Description
LARGEINT
16-byte signed integer, range [-2^127 + 1 ~ 2^127 - 1]
keywords
LARGEINT
FLOAT
FLOAT
Description
FLOAT
4-byte floating point number
keywords
FLOAT
DOUBLE
DOUBLE
Description
DOUBLE
8-byte floating point number
keywords
DOUBLE
DECIMAL
DECIMAL
Description
DECIMAL (M [,D])
High-precision fixed-point number. M stands for the total number of significant digits (precision), and D stands for the maximum number of digits after the decimal point (scale).
The range of M is [1, 27], the range of D is [0, 9], and the integer part is limited to [1, 18] digits.
keywords
DECIMAL
DECIMALV3
DECIMALV3
Since Version 1.2.1
DECIMALV3
Description
DECIMALV3 (M [,D])
High-precision fixed-point number, M represents the total number of significant digits, and D represents the scale.
Precision Deduction
DECIMALV3 has a very complex set of type inference rules. For different expressions, different rules will be applied for
precision inference.
Arithmetic Expressions
Plus / Minus: DECIMALV3(a, b) + DECIMALV3(x, y) -> DECIMALV3(max(a - b, x - y) + max(b, y), max(b, y)). That is, the
integer part and the decimal part use the larger value of the two operands respectively.
Aggregation functions
SUM / MULTI_DISTINCT_SUM: SUM(DECIMALV3(a, b)) -> DECIMALV3(38, b).
AVG: AVG(DECIMALV3(a, b)) -> DECIMALV3(38, max(b, 4)).
Default rules
Except for the expressions mentioned above, other expressions use default rules for precision deduction. That is, for the
expression expr(DECIMALV3(a, b)) , the result type is also DECIMALV3(a, b).
If the expected result precision is greater than the default precision, you can adjust the result precision by adjusting the
parameter's precision. For example, if the user expects to calculate AVG(col) and get DECIMALV3(x, y) as the result,
where the type of col is DECIMALV3 (a, b), the expression can be rewritten to AVG(CAST(col as DECIMALV3 (x, y)) .
If the expected result precision is less than the default precision, the desired precision can be obtained by approximating
the output result. For example, if the user expects to calculate AVG(col) and get DECIMALV3(x, y) as the result, where
the type of col is DECIMALV3(a, b), the expression can be rewritten as ROUND(AVG(col), y) .
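For instance, a hedged sketch of the two rewrites (col is assumed to be DECIMALV3(10, 2); table t is illustrative):
-- raise the result precision/scale
SELECT AVG(CAST(col AS DECIMALV3(27, 9))) FROM t;
-- reduce the result scale
SELECT ROUND(AVG(col), 1) FROM t;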
1. It can represent a wider range. The value ranges of both precision and scale in DECIMALV3 have been significantly
expanded.
2. Higher performance. The old version of DECIMAL requires 16 bytes in memory and 12 bytes in storage, while DECIMALV3
has made adaptive adjustments as shown below.
+----------------------+------------------------------+
| precision            | space occupied (memory/disk) |
+----------------------+------------------------------+
| 0 < precision <= 9   | 4 bytes                      |
| 9 < precision <= 18  | 8 bytes                      |
| 18 < precision <= 38 | 16 bytes                     |
+----------------------+------------------------------+
3. More complete precision deduction. For different expressions, different precision inference rules are applied to deduce
the precision of the results.
keywords
DECIMALV3
DATE
DATE
Description
DATE function
Syntax
DATE(expr)
Convert the input expression to DATE type.
DATE type
Date type. The current range of values is ['0000-01-01', '9999-12-31'], and the default print form is 'yyyy-MM-dd'.
note
If you use version 1.2 or above, it is strongly recommended to use the DATEV2 type instead of the DATE type, as DATEV2 is more efficient than DATE.
example
SELECT DATE('2003-12-31 01:02:03');
+-----------------------------+
| date('2003-12-31 01:02:03') |
+-----------------------------+
| 2003-12-31 |
+-----------------------------+
keywords
DATE
DATETIME
DATETIME
Description
DATETIME
Date and time type, value range is ['0000-01-01 00:00:00',' 9999-12-31 23:59:59'].
The form of printing is 'yyyy-MM-dd HH:mm:ss'.
note
If you use version 1.2 and above, it is strongly recommended that you use DATETIMEV2 type instead of DATETIME type.
Compared with DATETIME type, DATETIMEV2 is more efficient and supports time accuracy up to microseconds.
keywords
DATETIME
DATEV2
DATEV2
Since Version 1.2.0
DATEV2
Description
Syntax
datev2
DateV2 type, the current range of values is ['0000-01-01',' 9999-12-31'], and the default print form is 'yyyy-MM-dd'.
note
DATEV2 type is more efficient than DATE type. During calculation, DATEV2 can save half of the memory usage compared
with DATE.
example
SELECT CAST('2003-12-31 01:02:03' as DATEV2);
+---------------------------------------+
| cast('2003-12-31 01:02:03' as DATEV2) |
+---------------------------------------+
| 2003-12-31                            |
+---------------------------------------+
keywords
DATE
DATETIMEV2
DATETIMEV2
Since Version 1.2.0
DATETIMEV2
Description
DATETIMEV2([P])
Date and time type.
The optional parameter P indicates the time precision; the value range is [0, 6], that is, up to 6 decimal places (microseconds) are supported. The precision is 0 when not set.
Value range is ['0000-01-01 00:00:00[.000000]', '9999-12-31 23:59:59[.999999]'].
The form of printing is 'yyyy-MM-dd HH:mm:ss.SSSSSS'
note
Compared with the DATETIME type, DATETIMEV2 is more efficient and supports a time precision of up to microseconds.
keywords
DATETIMEV2
CHAR
CHAR
Description
CHAR(M)
A fixed-length string, M represents the length of a fixed-length string. The range of M is 1-255.
keywords
CHAR
VARCHAR
VARCHAR
Description
VARCHAR(M)
A variable-length string. M represents the byte length of the string; the range of M is 1-65533.
Note: Variable-length strings are stored in UTF-8 encoding, so an English character usually occupies 1 byte and a Chinese character occupies 3 bytes.
keywords
VARCHAR
STRING
STRING
Description
STRING (M)
A variable-length string. The default maximum length is 1048576 bytes (1MB). The length of the String type is also limited by the string_type_length_soft_limit_bytes configuration of BE; the actual maximum length that can be stored is the minimum of the two. The String type can only be used in value columns, not in key columns or in partition and bucket columns.
Note: Variable-length strings are stored in UTF-8 encoding, so an English character usually occupies 1 byte and a Chinese character occupies 3 bytes.
keywords
STRING
HLL (HyperLogLog)
HLL (HyperLogLog)
Description
HLL
HLL cannot be used as a key column, and the aggregation type is HLL_UNION when creating the table.
The user does not need to specify the length or default value; the length is controlled internally according to the degree of data aggregation.
HLL columns can only be queried or used through the matching functions hll_union_agg, hll_raw_agg, hll_cardinality, and hll_hash.
HLL is an approximate count of distinct elements, and its performance is better than Count Distinct when the amount of data is large.
The error of HLL is usually around 1%, and sometimes up to 2%.
example
where datekey=20200922
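A minimal hedged sketch of an HLL column used for approximate distinct counting (table and column names are illustrative; user_visit_detail is a hypothetical detail table):
CREATE TABLE uv_table (
    datekey INT,
    uv_set HLL HLL_UNION
)
AGGREGATE KEY(datekey)
DISTRIBUTED BY HASH(datekey) BUCKETS 10
PROPERTIES ("replication_num" = "1");
-- pre-aggregate raw user ids into the HLL column
INSERT INTO uv_table SELECT datekey, hll_hash(user_id) FROM user_visit_detail;
-- approximate distinct count for one day
SELECT datekey, HLL_UNION_AGG(uv_set) AS uv FROM uv_table WHERE datekey = 20200922 GROUP BY datekey;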
keywords
HLL,HYPERLOGLOG
BITMAP
BITMAP
Description
BITMAP
BITMAP cannot be used as a key column, and the aggregation type is BITMAP_UNION when building the table.
The user does not need to specify the length or default value; the length is controlled internally according to the degree of data aggregation.
BITMAP columns can only be queried or used through supporting functions such as bitmap_union_count, bitmap_union, bitmap_hash and bitmap_hash64.
Using BITMAP in offline scenarios will affect the import speed. With a large amount of data, the query speed will be slower than HLL but better than Count Distinct.
Note: if BITMAP does not use a global dictionary in real-time scenarios, using bitmap_hash() may cause an error of about one in a thousand. If this error rate is not tolerable, bitmap_hash64 can be used instead.
example
select hour, BITMAP_UNION_COUNT(pv) over(order by hour) uv from(
where datekey=20200922
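A minimal hedged sketch of a BITMAP column used for exact distinct counting (table and column names are illustrative; user_visit_detail is a hypothetical detail table):
CREATE TABLE pv_bitmap (
    datekey INT,
    hour INT,
    user_id BITMAP BITMAP_UNION
)
AGGREGATE KEY(datekey, hour)
DISTRIBUTED BY HASH(datekey) BUCKETS 10
PROPERTIES ("replication_num" = "1");
-- pre-aggregate raw user ids into the bitmap column
INSERT INTO pv_bitmap SELECT datekey, hour, to_bitmap(user_id) FROM user_visit_detail;
-- distinct users per hour for one day
SELECT hour, BITMAP_UNION_COUNT(user_id) AS uv FROM pv_bitmap WHERE datekey = 20200922 GROUP BY hour;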
keywords
BITMAP
QUANTILE_STATE
QUANTILE_STATE
description
QUANTILE_STATE
QUANTILE_STATE cannot be used as a key column, and the aggregation type is QUANTILE_UNION when building the
table.
The user does not need to specify the length and default value. The length is controlled within the system
according to the degree of data aggregation.
And the QUANTILE_STATE column can only be queried or used through the supporting QUANTILE_PERCENT, QUANTILE_UNION
and TO_QUANTILE_STATE functions.
QUANTILE_STATE is a type for calculating approximate quantile values. Different values with the same key are pre-aggregated during the loading process. When the number of aggregated values does not exceed 2048, all data are recorded in detail. When the number of aggregated values is greater than 2048, the TDigest algorithm (https://github.com/tdunning/t-digest/blob/main/docs/t-digest-paper/histo.pdf) is used to aggregate (cluster) the data, and only the centroid points after clustering are saved.
related functions:
QUANTILE_UNION(QUANTILE_STATE):
This function is an aggregation function, which is used to aggregate the intermediate results of different
quantile calculations. The result returned by this function is still QUANTILE_STATE
The compression parameter is optional and can be set in the range [2048, 10000].
The larger the value, the higher the precision of quantile approximation calculations, the greater the memory
consumption, and the longer the calculation time.
If the compression parameter is not specified, or the specified value is outside the range [2048, 10000], the default value 2048 is used.
QUANTILE_PERCENT(QUANTILE_STATE):
This function converts the intermediate result variable (QUANTILE_STATE) of the quantile calculation into a
specific quantile value
notice
Currently QUANTILE_STATE can only be used in Aggregate Model tables. The switch for the QUANTILE_STATE type feature must be turned on with the following command before use:
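A sketch of that command, which sets at runtime the config mentioned in the next sentence:
ADMIN SET FRONTEND CONFIG ("enable_quantile_state_type" = "true");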
Set this way, the config will be reset after the FE process restarts. For a permanent setting, add enable_quantile_state_type=true to fe.conf.
example
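A minimal hedged sketch (table and column names are illustrative; the second argument of TO_QUANTILE_STATE is the compression parameter described above):
CREATE TABLE resp_time (
    datekey INT,
    latency QUANTILE_STATE QUANTILE_UNION NOT NULL
)
AGGREGATE KEY(datekey)
DISTRIBUTED BY HASH(datekey) BUCKETS 10
PROPERTIES ("replication_num" = "1");
-- pre-aggregate raw latencies into the QUANTILE_STATE column
INSERT INTO resp_time SELECT datekey, TO_QUANTILE_STATE(cost_ms, 2048) FROM request_detail;
-- approximate median per day
SELECT datekey, QUANTILE_PERCENT(QUANTILE_UNION(latency), 0.5) AS p50 FROM resp_time GROUP BY datekey;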
keywords
QUANTILE_STATE
ARRAY
ARRAY
Since Version 1.2.0
ARRAY
description
ARRAY<T>
An array of T-type items; it cannot be used as a key column. Currently ARRAY can only be used in Duplicate Model tables.
Supported element types T include: BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, FLOAT, DOUBLE, DECIMAL, DATE, and others.
example
Create table example:
CREATE TABLE `array_test` (
    `id` INT NULL,
    `c_array` ARRAY<INT> NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "in_memory" = "false",
    "storage_format" = "V2"
);
mysql> INSERT INTO `array_test` VALUES (2, [6,7,8]), (3, []), (4, null);
+------+-----------------+
| id | c_array |
+------+-----------------+
| 1 | [1, 2, 3, 4, 5] |
| 2 | [6, 7, 8] |
| 3 | [] |
| 4 | NULL |
+------+-----------------+
keywords
ARRAY
JSONB
JSONB
Since Version 1.2.0
JSONB
description
JSONB (JSON Binary) datatype.
Use binary JSON format for storage and jsonb function to extract field.
note
There are some advantages of JSONB over a plain JSON STRING.
The JSONB format is more efficient: using jsonb_extract functions on the JSONB format is 2-4 times faster than get_json_xx on the JSON STRING format.
example
A tutorial for JSONB datatype including create table, load data and query.
CREATE DATABASE IF NOT EXISTS testdb;
USE testdb;
CREATE TABLE test_jsonb (
    id INT,
    j JSONB
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES("replication_num" = "1");
Load data
1 \N
2 null
3 true
4 false
5 100
6 10000
7 1000000000
8 1152921504606846976
9 6.18
10 "abcd"
11 {}
13 []
14 [123, 456]
15 ["abc", "def"]
19 ''
20 'abc'
21 abc
22 100x
23 6.a8
24 {x
25 [123, abc]
Because 28% of the rows are invalid, a stream load with the default configuration will fail with the error message "too many filtered rows".
"TxnId": 12019,
"Label": "744d9821-9c9f-43dc-bf3b-7ab048f14e32",
"TwoPhaseCommit": "false",
"Status": "Fail",
"NumberTotalRows": 25,
"NumberLoadedRows": 18,
"NumberFilteredRows": 7,
"NumberUnselectedRows": 0,
"LoadBytes": 380,
"LoadTimeMs": 48,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 45,
"CommitAndPublishTimeMs": 0,
"ErrorURL": "http://172.21.0.5:8840/api/_load_error_log?
file=__shard_2/error_log_insert_stmt_95435c4bf5f156df-426735082a9296af_95435c4bf5f156df_426735082a9296af"
The stream load will succeed after setting the header configuration 'max_filter_ratio: 0.3'.
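A hedged sketch of such a stream load command (host, port, credentials and file name are placeholders):
curl --location-trusted -u root: -H "max_filter_ratio: 0.3" -T test_jsonb.csv http://fe_host:8030/api/testdb/test_jsonb/_stream_load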
"TxnId": 12017,
"Label": "f37a50c1-43e9-4f4e-a159-a3db6abe2579",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 25,
"NumberLoadedRows": 18,
"NumberFilteredRows": 7,
"NumberUnselectedRows": 0,
"LoadBytes": 380,
"LoadTimeMs": 68,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 45,
"CommitAndPublishTimeMs": 19,
"ErrorURL": "http://172.21.0.5:8840/api/_load_error_log?
file=__shard_0/error_log_insert_stmt_a1463f98a7b15caf-c79399b920f5bfa3_a1463f98a7b15caf_c79399b920f5bfa3"
Use SELECT to view the data loaded by stream load. Columns of JSONB type are displayed as plain JSON strings.
+------+---------------------------------------------------------------+
| id | j |
+------+---------------------------------------------------------------+
| 1 | NULL |
| 2 | null |
| 3 | true |
| 4 | false |
| 5 | 100 |
| 6 | 10000 |
| 7 | 1000000000 |
| 8 | 1152921504606846976 |
| 9 | 6.18 |
| 10 | "abcd" |
| 11 | {} |
| 12 | {"k1":"v31","k2":300} |
| 13 | [] |
| 14 | [123,456] |
| 15 | ["abc","def"] |
| 16 | [null,true,false,100,6.18,"abc"] |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} |
+------+---------------------------------------------------------------+
18 rows in set (0.03 sec)
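The row with id 26 in the next result presumably comes from an INSERT of this form (a sketch; the exact statement is not shown here):
INSERT INTO test_jsonb VALUES (26, '{"k1":"v1","k2":200}');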
+------+---------------------------------------------------------------+
| id | j |
+------+---------------------------------------------------------------+
| 1 | NULL |
| 2 | null |
| 3 | true |
| 4 | false |
| 5 | 100 |
| 6 | 10000 |
| 7 | 1000000000 |
| 8 | 1152921504606846976 |
| 9 | 6.18 |
| 10 | "abcd" |
| 11 | {} |
| 12 | {"k1":"v31","k2":300} |
| 13 | [] |
| 14 | [123,456] |
| 15 | ["abc","def"] |
| 16 | [null,true,false,100,6.18,"abc"] |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} |
| 26 | {"k1":"v1","k2":200} |
+------+---------------------------------------------------------------+
19 rows in set (0.03 sec)
Query
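The result below can be produced with a query of this form (jsonb_extract with path '$' returns the whole document):
SELECT id, j, jsonb_extract(j, '$') FROM test_jsonb ORDER BY id;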
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
| id | j | jsonb_extract(`j`, '$')
|
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
| 1 | NULL |
NULL |
| 2 | null |
null |
| 3 | true |
true |
| 4 | false |
false |
| 5 | 100 |
100 |
| 6 | 10000 |
10000 |
| 7 | 1000000000 |
1000000000 |
| 8 | 1152921504606846976 |
1152921504606846976 |
| 9 | 6.18 |
6.18 |
| 10 | "abcd" |
"abcd" |
| 11 | {} |
{} |
| 12 | {"k1":"v31","k2":300} |
{"k1":"v31","k2":300} |
| 13 | [] |
[] |
| 14 | [123,456] |
[123,456] |
| 15 | ["abc","def"] |
["abc","def"] |
| 16 | [null,true,false,100,6.18,"abc"] |
[null,true,false,100,6.18,"abc"] |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] |
[{"k1":"v41","k2":400},1,"a",3.14] |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | {"k1":"v31","k2":300,"a1":
[{"k1":"v41","k2":400},1,"a",3.14]} |
| 26 | {"k1":"v1","k2":200} |
{"k1":"v1","k2":200} |
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
+------+---------------------------------------------------------------+----------------------------+
| id | j | jsonb_extract(`j`, '$.k1') |
+------+---------------------------------------------------------------+----------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | "v31" |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | "v31" |
| 26 | {"k1":"v1","k2":200} | "v1" |
+------+---------------------------------------------------------------+----------------------------+
+------+---------------------------------------------------------------+----------------------------+
| id | j | jsonb_extract(`j`, '$[0]') |
+------+---------------------------------------------------------------+----------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | 123 |
| 15 | ["abc","def"] | "abc" |
| 16 | [null,true,false,100,6.18,"abc"] | null |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | {"k1":"v41","k2":400} |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+----------------------------+
+------+---------------------------------------------------------------+------------------------------------+
| id | j | jsonb_extract(`j`, '$.a1') |
+------+---------------------------------------------------------------+------------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | [{"k1":"v41","k2":400},1,"a",3.14] |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+------------------------------------+
mysql> SELECT id, j, jsonb_extract(j, '$.a1[0]'), jsonb_extract(j, '$.a1[0].k1') FROM test_jsonb ORDER BY id;
+------+---------------------------------------------------------------+-------------------------------+---------
-------------------------+
| id | j | jsonb_extract(`j`, '$.a1[0]') |
jsonb_extract(`j`, '$.a1[0].k1') |
+------+---------------------------------------------------------------+-------------------------------+---------
-------------------------+
| 1 | NULL | NULL |
NULL |
| 2 | null | NULL |
NULL |
| 3 | true | NULL |
NULL |
| 4 | false | NULL |
NULL |
| 5 | 100 | NULL |
NULL |
| 6 | 10000 | NULL |
NULL |
| 7 | 1000000000 | NULL |
NULL |
| 8 | 1152921504606846976 | NULL |
NULL |
| 9 | 6.18 | NULL |
NULL |
| 10 | "abcd" | NULL |
NULL |
| 11 | {} | NULL |
NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
NULL |
| 13 | [] | NULL |
NULL |
| 14 | [123,456] | NULL |
NULL |
| 15 | ["abc","def"] | NULL |
NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | {"k1":"v41","k2":400} |
"v41" |
| 26 | {"k1":"v1","k2":200} | NULL |
NULL |
+------+---------------------------------------------------------------+-------------------------------+---------
-------------------------+
jsonb_extract_string extracts a field as a string; if the field is not a string, it is converted to a string.
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
| id | j | jsonb_extract_string(`j`, '$')
|
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
| 1 | NULL | NULL
|
| 2 | null | null
|
| 3 | true | true
|
| 4 | false | false
|
| 5 | 100 | 100
|
| 6 | 10000 | 10000
|
| 7 | 1000000000 | 1000000000
|
| 8 | 1152921504606846976 | 1152921504606846976
|
| 9 | 6.18 | 6.18
|
| 10 | "abcd" | abcd
|
| 11 | {} | {}
|
| 12 | {"k1":"v31","k2":300} | {"k1":"v31","k2":300}
|
| 13 | [] | []
|
| 14 | [123,456] | [123,456]
|
| 15 | ["abc","def"] | ["abc","def"]
|
| 16 | [null,true,false,100,6.18,"abc"] | [null,true,false,100,6.18,"abc"]
|
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | [{"k1":"v41","k2":400},1,"a",3.14]
|
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | {"k1":"v31","k2":300,"a1":
[{"k1":"v41","k2":400},1,"a",3.14]} |
| 26 | {"k1":"v1","k2":200} | {"k1":"v1","k2":200}
|
+------+---------------------------------------------------------------+-----------------------------------------
----------------------+
+------+---------------------------------------------------------------+-----------------------------------+
| id | j | jsonb_extract_string(`j`, '$.k1') |
+------+---------------------------------------------------------------+-----------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | v31 |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | v31 |
| 26 | {"k1":"v1","k2":200} | v1 |
+------+---------------------------------------------------------------+-----------------------------------+
jsonb_extract_int extracts a field as an int; it returns NULL if the field is not an int.
+------+---------------------------------------------------------------+-----------------------------+
| id | j | jsonb_extract_int(`j`, '$') |
+------+---------------------------------------------------------------+-----------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | 100 |
| 6 | 10000 | 10000 |
| 7 | 1000000000 | 1000000000 |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+-----------------------------+
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_extract_int(`j`, '$.k2') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | 300 |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 300 |
| 26 | {"k1":"v1","k2":200} | 200 |
+------+---------------------------------------------------------------+--------------------------------+
jsonb_extract_bigint extracts a field as a bigint; it returns NULL if the field is not a bigint.
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_extract_bigint(`j`, '$') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | 100 |
| 6 | 10000 | 10000 |
| 7 | 1000000000 | 1000000000 |
| 8 | 1152921504606846976 | 1152921504606846976 |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+--------------------------------+
+------+---------------------------------------------------------------+-----------------------------------+
| id | j | jsonb_extract_bigint(`j`, '$.k2') |
+------+---------------------------------------------------------------+-----------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | 300 |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 300 |
| 26 | {"k1":"v1","k2":200} | 200 |
+------+---------------------------------------------------------------+-----------------------------------+
jsonb_extract_double extracts a field as a double; it returns NULL if the field is not a double.
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_extract_double(`j`, '$') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | 100 |
| 6 | 10000 | 10000 |
| 7 | 1000000000 | 1000000000 |
| 8 | 1152921504606846976 | 1.152921504606847e+18 |
| 9 | 6.18 | 6.18 |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+--------------------------------+
+------+---------------------------------------------------------------+-----------------------------------+
| id | j | jsonb_extract_double(`j`, '$.k2') |
+------+---------------------------------------------------------------+-----------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | 300 |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 300 |
| 26 | {"k1":"v1","k2":200} | 200 |
+------+---------------------------------------------------------------+-----------------------------------+
jsonb_extract_bool extracts a field as a boolean; it returns NULL if the field is not a boolean.
+------+---------------------------------------------------------------+------------------------------+
| id | j | jsonb_extract_bool(`j`, '$') |
+------+---------------------------------------------------------------+------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | 1 |
| 4 | false | 0 |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+------------------------------+
+------+---------------------------------------------------------------+---------------------------------+
| id | j | jsonb_extract_bool(`j`, '$[1]') |
+------+---------------------------------------------------------------+---------------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | NULL |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | 1 |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | NULL |
| 26 | {"k1":"v1","k2":200} | NULL |
+------+---------------------------------------------------------------+---------------------------------+
jsonb_extract_isnull checks whether a field is JSON null: it returns 1 if the field is JSON null, and 0 otherwise.
JSON null is different from SQL NULL. SQL NULL means the field has no value at all, while JSON null means the field holds the special value null.
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_extract_isnull(`j`, '$') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | 1 |
| 3 | true | 0 |
| 4 | false | 0 |
| 5 | 100 | 0 |
| 6 | 10000 | 0 |
| 7 | 1000000000 | 0 |
| 8 | 1152921504606846976 | 0 |
| 9 | 6.18 | 0 |
| 10 | "abcd" | 0 |
| 11 | {} | 0 |
| 12 | {"k1":"v31","k2":300} | 0 |
| 13 | [] | 0 |
| 14 | [123,456] | 0 |
| 15 | ["abc","def"] | 0 |
| 16 | [null,true,false,100,6.18,"abc"] | 0 |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | 0 |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 0 |
| 26 | {"k1":"v1","k2":200} | 0 |
+------+---------------------------------------------------------------+--------------------------------+
+------+---------------------------------------------------------------+-----------------------------+
| id | j | jsonb_exists_path(`j`, '$') |
+------+---------------------------------------------------------------+-----------------------------+
| 1 | NULL | NULL |
| 2 | null | 1 |
| 3 | true | 1 |
| 4 | false | 1 |
| 5 | 100 | 1 |
| 6 | 10000 | 1 |
| 7 | 1000000000 | 1 |
| 8 | 1152921504606846976 | 1 |
| 9 | 6.18 | 1 |
| 10 | "abcd" | 1 |
| 11 | {} | 1 |
| 12 | {"k1":"v31","k2":300} | 1 |
| 13 | [] | 1 |
| 14 | [123,456] | 1 |
| 15 | ["abc","def"] | 1 |
| 16 | [null,true,false,100,6.18,"abc"] | 1 |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | 1 |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 1 |
| 26 | {"k1":"v1","k2":200} | 1 |
+------+---------------------------------------------------------------+-----------------------------+
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_exists_path(`j`, '$.k1') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | 0 |
| 3 | true | 0 |
| 4 | false | 0 |
| 5 | 100 | 0 |
| 6 | 10000 | 0 |
| 7 | 1000000000 | 0 |
| 8 | 1152921504606846976 | 0 |
| 9 | 6.18 | 0 |
| 10 | "abcd" | 0 |
| 11 | {} | 0 |
| 12 | {"k1":"v31","k2":300} | 1 |
| 13 | [] | 0 |
| 14 | [123,456] | 0 |
| 15 | ["abc","def"] | 0 |
| 16 | [null,true,false,100,6.18,"abc"] | 0 |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | 0 |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 1 |
| 26 | {"k1":"v1","k2":200} | 1 |
+------+---------------------------------------------------------------+--------------------------------+
+------+---------------------------------------------------------------+--------------------------------+
| id | j | jsonb_exists_path(`j`, '$[2]') |
+------+---------------------------------------------------------------+--------------------------------+
| 1 | NULL | NULL |
| 2 | null | 0 |
| 3 | true | 0 |
| 4 | false | 0 |
| 5 | 100 | 0 |
| 6 | 10000 | 0 |
| 7 | 1000000000 | 0 |
| 8 | 1152921504606846976 | 0 |
| 9 | 6.18 | 0 |
| 10 | "abcd" | 0 |
| 11 | {} | 0 |
| 12 | {"k1":"v31","k2":300} | 0 |
| 13 | [] | 0 |
| 14 | [123,456] | 0 |
| 15 | ["abc","def"] | 0 |
| 16 | [null,true,false,100,6.18,"abc"] | 1 |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | 1 |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | 0 |
| 26 | {"k1":"v1","k2":200} | 0 |
+------+---------------------------------------------------------------+--------------------------------+
+------+---------------------------------------------------------------+----------------------+
| id | j | jsonb_type(`j`, '$') |
+------+---------------------------------------------------------------+----------------------+
| 1 | NULL | NULL |
| 2 | null | null |
| 3 | true | bool |
| 4 | false | bool |
| 5 | 100 | int |
| 6 | 10000 | int |
| 7 | 1000000000 | int |
| 8 | 1152921504606846976 | bigint |
| 9 | 6.18 | double |
| 10 | "abcd" | string |
| 11 | {} | object |
| 12 | {"k1":"v31","k2":300} | object |
| 13 | [] | array |
| 14 | [123,456] | array |
| 15 | ["abc","def"] | array |
| 16 | [null,true,false,100,6.18,"abc"] | array |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | array |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | object |
| 26 | {"k1":"v1","k2":200} | object |
+------+---------------------------------------------------------------+----------------------+
+------+---------------------------------------------------------------+-------------------------+
| id | j | jsonb_type(`j`, '$.k1') |
+------+---------------------------------------------------------------+-------------------------+
| 1 | NULL | NULL |
| 2 | null | NULL |
| 3 | true | NULL |
| 4 | false | NULL |
| 5 | 100 | NULL |
| 6 | 10000 | NULL |
| 7 | 1000000000 | NULL |
| 8 | 1152921504606846976 | NULL |
| 9 | 6.18 | NULL |
| 10 | "abcd" | NULL |
| 11 | {} | NULL |
| 12 | {"k1":"v31","k2":300} | string |
| 13 | [] | NULL |
| 14 | [123,456] | NULL |
| 15 | ["abc","def"] | NULL |
| 16 | [null,true,false,100,6.18,"abc"] | NULL |
| 17 | [{"k1":"v41","k2":400},1,"a",3.14] | NULL |
| 18 | {"k1":"v31","k2":300,"a1":[{"k1":"v41","k2":400},1,"a",3.14]} | string |
| 26 | {"k1":"v1","k2":200} | string |
+------+---------------------------------------------------------------+-------------------------+
keywords
JSONB, JSON, jsonb_parse, jsonb_parse_error_to_null, jsonb_parse_error_to_value, jsonb_extract, jsonb_extract_isnull,
jsonb_extract_bool, jsonb_extract_int, jsonb_extract_bigint, jsonb_extract_double, jsonb_extract_string, jsonb_exists_path,
jsonb_type
IN
IN
Since Version 1.2.0
IN
description
Syntax
expr IN (value, ...)
expr IN (subquery)
If expr is equal to any value in the IN list, return true; otherwise, return false.
Subquery can only return one column, and the column types returned by subquery must be compatible with expr types.
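A hedged usage sketch (table and column names are illustrative):
SELECT id FROM t1 WHERE id IN (1, 2, 3);
SELECT id FROM t1 WHERE id IN (SELECT id FROM t2);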
notice
Currently, IN subqueries that return bitmap columns are only supported in the vectorized engine.
example
+------+
| id |
+------+
| 2 |
| 1 |
+------+
+------+
| id |
+------+
| 1 |
| 4 |
| 5 |
+------+
+------+
| id |
+------+
| 1 |
| 3 |
+------+
keywords
IN
HELP
HELP
Name
HELP
Description
The directory of help can be queried through this command
grammar:
HELP <item>
Note that all text commands must be first on line and end with ';'
connect (\r) Reconnect to the server. Optional arguments are db and host.
pager (\P) Set PAGER [to_pager]. Print the query results via PAGER.
source (\.) Execute an SQL script file. Takes a file name as an argument.
tee (\T) Set outfile [to_outfile]. Append everything into given outfile.
charset (\C) Switch to another charset. Might be needed for processing binlog with multi-byte charsets.
categories:
sql-functions
sql-statements
Example
help contents
help sql-functions
help date-time-functions
Keywords
HELP
Best Practice
USE
USE
Name
USE
Description
grammar:
USE <[CATALOG_NAME].DATABASE_NAME>
illustrate:
1. USE CATALOG_NAME.DATABASE_NAME switches the current catalog to CATALOG_NAME and then changes the current database to DATABASE_NAME.
Example
Database changed
2. If the demo database exists in catalog hms_catalog, try switching the catalog and accessing it:
Database changed
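Sketches of the statements behind the two examples above (the demo database and hms_catalog names come from the surrounding text):
USE demo;
USE hms_catalog.demo;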
Keywords
USE
Best Practice
DESCRIBE
DESCRIBE
Name
DESCRIBE
Description
This statement is used to display the schema information of the specified table
grammar:
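A sketch of the commonly documented form:
DESC[RIBE] [db_name.]table_name [ALL];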
illustrate:
1. If ALL is specified, the schemas of all indexes (rollup) of the table will be displayed
Example
DESC table_name;
Keywords
DESCRIBE
Best Practice
SWITCH
SWITCH
Name
Description
This statement is used to switch catalog.
Syntax:
SWITCH catalog_name
Example
1. Switch to hive
SWITCH hive;
Keywords
SWITCH, CATALOG
Best Practice
REFRESH
REFRESH
Name
Since Version 1.2.0
REFRESH
Description
This statement refreshes the metadata of the specified Catalog/Database/Table.
syntax:
Example
1. Refresh hive catalog
2. Refresh database1
3. Refresh table1
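Hedged sketches of the statements for the three examples above (catalog, database and table names are illustrative):
REFRESH CATALOG ctl;
REFRESH DATABASE [ctl.]database1;
REFRESH TABLE [ctl.][db.]table1;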
Keywords
REFRESH, CATALOG, DATABASE, TABLE
Best Practice
Cluster upgrade
Doris can be upgraded smoothly via rolling upgrades. The following steps are recommended for a safe upgrade.
The name of the BE binary that appears in this doc is doris_be , which was palo_be in previous versions.
Note:
1. Doris does not support upgrading across two-digit version numbers. For example, you cannot upgrade directly from 0.13 to 0.15, only through 0.13.x -> 0.14.x -> 0.15.x. Three-digit version numbers can be upgraded across versions; for example, 0.13.15 can be upgraded directly to 0.14.13.1 without first upgrading to 0.14.7 or 0.14.12.1.
2. The following approaches are based on highly available deployments. That is, data 3 replicas, FE high availability.
Prepare
1. Turn off the replica repair and balance operation.
There will be node restarts during the upgrade process, so unnecessary cluster balancing and replica repair logic may be
triggered. You can close it first with the following command:
# Turn off the replica balance logic. After it is turned off, the balancing operation of ordinary table replicas will no longer be triggered.
# Turn off the replica balance logic of the colocation table. After it is closed, the replica redistribution
operation of the colocation table will no longer be triggered.
# Turn off the replica scheduling logic. After shutting down, all generated replica repair and balancing tasks
will no longer be scheduled.
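A sketch of the three commands described above, using FE config names as commonly documented (run via mysql-client on the FE):
admin set frontend config("disable_balance" = "true");
admin set frontend config("disable_colocate_balance" = "true");
admin set frontend config("disable_tablet_scheduler" = "true");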
After the cluster is upgraded, just use the above command to set the corresponding configuration to the original value.
2. Important!! Metadata must be backed up before upgrading (the entire metadata directory needs to be backed up)!!
2. Restart the BE node and check the BE log be.INFO to see if the boot was successful.
3. If the startup fails, check the reason first. If the error is not recoverable, you can delete the BE directly through DROP BACKEND, clean up the data, and restart the BE using the previous version of doris_be, then ADD BACKEND again. (This method will result in the loss of one replica of the data; make sure all three replicas are complete before performing this operation!!!)
4. Install the Java UDF function (Since Version 1.2.0). Because the Java UDF function is supported from version 1.2, you need to download the JAR package of the Java UDF function from the official website and put it in the lib directory of BE; otherwise BE may fail to start.
1. Deploy a test FE process (It is recommended to use your own local development machine, or BE node. If it is on the
Follower or Observer node, you need to stop the started process, but it is not recommended to test on the Follower or
Observer node) using the new version alone.
2. Modify the FE configuration file fe.conf for testing and set all ports to different from online.
3. Add configuration in fe.conf: cluster_id=123456
8. If the startup is successful, run sh bin/stop_fe.sh to stop the FE process of the test environment.
9. The purpose of the above 2-6 steps is to prevent the FE of the test environment from being misconnected to the online
environment after it starts.
Note:
Before upgrading from 1.1.x to 1.2.x, you need to delete any existing Native UDFs; otherwise FE startup will fail. Since version 1.2, Native UDF is no longer supported; please use Java UDF instead.
Upgrade preparation
1. After data validation, the new version of BE and FE binary files are distributed to their respective directories.
2. In principle, the version upgrade needs to replace the lib directory and bin directory of FE and BE, and other directories
except conf directory, data directory (doris-meta of FE, storage of BE), and log directory.
rolling upgrade
1. Confirm that the new version of the file is deployed. Restart FE and BE instances one by one.
2. It is suggested to restart all BEs one by one first, and then restart all FEs one by one, because Doris usually guarantees backward compatibility from FE to BE: the old version of FE can access the new version of BE, but the old version of BE may not be able to access the new version of FE.
3. It is recommended to restart the next instance after confirming the previous instance started successfully. Refer to the
Installation Deployment Document for the identification of successful instance startup.
In most cases, rolling back across the third or fourth digit of the version number is supported, but rolling back across the second digit is not. Therefore, it is recommended to upgrade some nodes first and observe the business operation (gray upgrade) to reduce the upgrade risk.
Elastic scaling
Doris can easily expand and shrink FE, BE, Broker instances.
FE Scaling
High availability of FE can be achieved by expanding FE to three or more nodes.
You can also view the FE node through the front-end page connection: http://fe_hostname:fe_http_port/frontend or
http://fe_hostname:fe_http_port/system?Path=//frontends .
The process of FE node expansion and contraction does not affect the current system operation.
Adding FE nodes
FE is divided into three roles: Leader, Follower and Observer. By default, a cluster can have only one Leader and multiple
Followers and Observers. Leader and Follower form a Paxos selection group. If the Leader goes down, the remaining
Followers will automatically select a new Leader to ensure high write availability. Observer synchronizes Leader data, but
does not participate in the election. If only one FE is deployed, FE defaults to Leader.
The first FE to start automatically becomes Leader. On this basis, several Followers and Observers can be added.
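Hedged sketches of the commands involved (hosts and ports are placeholders): start the new FE with the --helper option pointing to the Leader, then register it through mysql-client:
sh bin/start_fe.sh --helper leader_host:edit_log_port --daemon
ALTER SYSTEM ADD FOLLOWER "follower_host:edit_log_port";
ALTER SYSTEM ADD OBSERVER "observer_host:edit_log_port";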
The host is the IP of the Leader node, and edit_log_port comes from the Leader's configuration file fe.conf. The --helper parameter is only required the first time a Follower/Observer starts.
The follower_host and observer_host are the IPs of the Follower or Observer node, together with the edit_log_port from its own configuration file fe.conf.
View the status of Follower or Observer. Connect to any booted FE using mysql-client and execute:
SHOW PROC '/frontends';
You can view the FE currently joined the cluster and its corresponding roles.
1. The number of Follower FEs (including the Leader) must be odd. It is recommended to deploy at most three Followers to form high availability (HA) mode.
2. When FE is in a highly available deployment (1 Leader, 2 Follower), we recommend that the reading service capability
of FE be extended by adding Observer FE. Of course, you can continue to add Follower FE, but it's almost
unnecessary.
3. Usually a FE node can handle 10-20 BE nodes. It is suggested that the total number of FE nodes should be less than
10. Usually three can meet most of the needs.
4. The helper cannot point to the FE itself, it must point to one or more existing running Master/Follower FEs.
Delete FE nodes
1. When deleting Follower FE, make sure that the remaining Follower (including Leader) nodes are odd.
BE Scaling
Users can log in to the Leader FE through mysql-client and view BE status by executing SHOW PROC '/backends';
You can also view the BE node through the front-end page connection: http://fe_hostname:fe_http_port/backend or
http://fe_hostname:fe_http_port/system?Path=//backends .
The expansion and scaling process of BE nodes does not affect the current system operation and the tasks being performed,
and does not affect the performance of the current system. Data balancing is done automatically. Depending on the amount
of data available in the cluster, the cluster will be restored to load balancing in a few hours to a day. For cluster load, see the
Tablet Load Balancing Document.
Add BE nodes
The BE node is added in the same way as in the BE deployment section. The BE node is added by the ALTER SYSTEM ADD
BACKEND command.
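For example (host and port are placeholders):
ALTER SYSTEM ADD BACKEND "be_host:be_heartbeat_service_port";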
1. After BE expansion, Doris will automatically balance the data according to the load, without affecting the use during
the period.
Delete BE nodes
There are two ways to delete BE nodes: DROP and DECOMMISSION
Note: DROP BACKEND will delete the BE directly and the data on it will not be recovered!!! So we strongly do not
recommend DROP BACKEND to delete BE nodes. When you use this statement, there will be corresponding error-proof
operation hints.
DECOMMISSION clause:
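For example (host and port are placeholders):
ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";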
DECOMMISSION notes:
1. This command is used to safely delete BE nodes. After the command is issued, Doris attempts to migrate the data on
the BE to other BE nodes, and when all data is migrated, Doris automatically deletes the node.
2. The command is an asynchronous operation. After execution, you can see through SHOW PROC '/backends'; that the BE node's SystemDecommissioned status is true, which indicates that the node is going offline.
3. The order does not necessarily carry out successfully. For example, when the remaining BE storage space is
insufficient to accommodate the data on the offline BE, or when the number of remaining machines does not meet
the minimum number of replicas, the command cannot be completed, and the BE will always be in the state of
SystemDecommissioned as true.
SHOW PROC '/backends';
4. The progress of DECOMMISSION can be viewed through Tablet Num, and if it is in progress,
Tablet Num will continue to decrease.
5. The operation can be cancelled by:
CANCEL ALTER SYSTEM DECOMMISSION BACKEND "be_host:be_heartbeat_service_port";
After cancellation, the data on that BE stays at the current remaining amount, and Doris will re-balance the load afterwards.
For expansion and scaling of BE nodes in multi-tenant deployment environments, please refer to the Multi-tenant Design
Document.
Broker Scaling
There is no rigid requirement for the number of Broker instances. Usually one physical machine is deployed. Broker addition
and deletion can be accomplished by following commands:
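Hedged sketches of those commands (broker name, host and port are placeholders):
ALTER SYSTEM ADD BROKER broker_name "broker_host:broker_ipc_port";
ALTER SYSTEM DROP BROKER broker_name "broker_host:broker_ipc_port";
ALTER SYSTEM DROP ALL BROKER broker_name;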
Broker is a stateless process that can be started or stopped at will. Of course, when it stops, the job running on it will fail. Just
try again.
load balancing
When deploying multiple FE nodes, users can deploy a load balancing layer on top of multiple FEs to achieve high availability
of Doris.
Code method
Retry and load balance yourself in the application layer code. For example, if a connection is found to be down, it will
automatically retry on other connections. Application layer code retry requires the application to configure multiple doris
front-end node addresses.
JDBC Connector
If you use mysql jdbc connector to connect to Doris, you can use jdbc's automatic retry mechanism:
jdbc:mysql:loadbalance://[host:port],[host:port].../[database][?propertyName1][=propertyValue1][&propertyName2][=propertyValue2]...
ProxySQL method
ProxySQL is a flexible and powerful MySQL proxy layer, a MySQL middleware that can actually be used in a production environment. It can provide read-write separation, query routing, caching for specific SQL statements, dynamic configuration loading, failover, and some SQL filtering functions.
Doris's FE process is responsible for receiving user connections and query requests. It itself is horizontally scalable and highly
available, but it requires users to set up a proxy on multiple FEs to achieve automatic connection load balancing.
# vim /etc/yum.repos.d/proxysql.repo
[proxysql_repo]
baseurl=http://repo.proxysql.com/ProxySQL/proxysql-1.4.x/centos/\$releasever
gpgcheck=1
gpgkey=http://repo.proxysql.com/ProxySQL/repo_pub_key
Perform installation
# yum makecache
View version
# proxysql --version
After startup, it will listen to two ports, the default is 6032 and 6033. Port 6032 is the management port of
ProxySQL, and 6033 is the port for ProxySQL to provide external services (that is, the forwarding port connected
to the real database of the forwarding backend).
# netstat -tunlp
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
ProxySQL Config
ProxySQL has a configuration file /etc/proxysql.cnf and a configuration database file /var/lib/proxysql/proxysql.db .
Special attention is needed here: if the proxysql.db file exists (under the /var/lib/proxysql directory), the ProxySQL service reads and parses the proxysql.cnf file only the first time it starts; after that, it no longer reads proxysql.cnf on startup. If you want changes in proxysql.cnf to take effect after restarting the proxysql service (that is, you want proxysql to read and parse proxysql.cnf again on restart), you need to delete /var/lib/proxysql/proxysql.db first and then restart the service. This effectively re-initializes ProxySQL and produces a fresh proxysql.db file (any previously configured routing rules and so on will be erased).
admin_variables=
admin_credentials="admin:admin" #User name and password for connecting to the management terminal
mysql_variables=
threads=4 #Specify the number of threads opened for the forwarding port
max_connections=2048
default_query_delay=0
default_query_timeout=36000000
have_compress=true
poll_timeout=2000
interfaces="0.0.0.0:6033" #Specify the forwarding port, used to connect to the back-end mysql
database, which is equivalent to acting as a proxy
default_schema="information_schema"
stacksize=1048576
connect_timeout_server=3000
monitor_username="monitor"
monitor_password="monitor"
monitor_history=600000
monitor_connect_interval=60000
monitor_ping_interval=10000
monitor_read_only_interval=1500
monitor_read_only_timeout=500
ping_interval_server_msec=120000
ping_timeout_server=500
commands_stats=true
sessions_sort=true
connect_retries_on_failure=10
mysql_servers =
mysql_users:
mysql_query_rules:
scheduler=
mysql_replication_hostgroups=
View the global_variables table information of the main library (it is in this library after login by default)
+-----+---------------+-------------------------------------+
| seq | name          | file                                |
+-----+---------------+-------------------------------------+
| 0   | main          |                                     |
| 2   | disk          | /var/lib/proxysql/proxysql.db       |
| 3   | stats         |                                     |
| 4   | monitor       |                                     |
| 5   | stats_history | /var/lib/proxysql/proxysql_stats.db |
+-----+---------------+-------------------------------------+
You can turn off this feature to get a quicker startup with -A
Database changed
+--------------------------------------------+
| tables |
+--------------------------------------------+
| global_variables |
| mysql_collations |
| mysql_group_replication_hostgroups |
| mysql_query_rules |
| mysql_query_rules_fast_routing |
| mysql_replication_hostgroups |
| mysql_servers |
| mysql_users |
| proxysql_servers |
| runtime_checksums_values |
| runtime_global_variables |
| runtime_mysql_group_replication_hostgroups |
| runtime_mysql_query_rules |
| runtime_mysql_query_rules_fast_routing |
| runtime_mysql_replication_hostgroups |
| runtime_mysql_servers |
| runtime_mysql_users |
| runtime_proxysql_servers |
| runtime_scheduler |
| scheduler |
+--------------------------------------------+
............
It means that other configurations may have been defined before, you can clear this table or delete the
configuration of the corresponding host
Check whether these 3 nodes are inserted successfully and their status.
hostgroup_id: 10
hostname: 192.168.9.211
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
hostgroup_id: 10
hostname: 192.168.9.212
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
hostgroup_id: 10
hostname: 192.168.9.213
port: 9030
status: ONLINE
weight: 1
compression: 0
max_connections: 1000
max_replication_lag: 0
use_ssl: 0
max_latency_ms: 0
comment:
After the above modification, load it to RUNTIME and save it to disk. The following two steps are very important; otherwise your configuration will be lost after you exit, so it must be saved.
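A sketch of the two steps, using ProxySQL's standard admin statements:
load mysql servers to runtime;
save mysql servers to disk;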
First create a user name for monitoring on the back-end master main data node
Verify the monitoring results: the metrics of the ProxySQL monitoring module are stored in the log tables of the monitor library.
The following checks whether the connection is normal (monitoring of the connect metric):
Note: there may be many connect_error entries; these occur when errors happened before the monitoring credentials were configured. After configuration, if connect_error is NULL, the connection is normal.
MySQL [(none)]> select * from mysql_server_connect_log;
The read_only log is also empty at this time (normally, when the new environment is configured, this read-only
log is empty)
Now, load the modification of the mysql_replication_hostgroups table to RUNTIME to take effect.
the user who sends the SQL statement, the routing rules of the SQL statement, the cache of the SQL query, the rewriting of
the SQL statement, and so on.
This section is the user configuration used by the SQL request, such as the root user. This requires that we need to add
relevant users to the back-end Doris FE node first. Here are examples of two user names root and doris.
Then go back to the mysql-proxy proxy layer node, configure the mysql_users table, and add the two users just now
to the table.
The mysql_users table has many fields; the three main ones are username, password, and default_hostgroup:
- username: the user name used by the front-end to connect to ProxySQL, and by ProxySQL to route SQL statements to MySQL.
- password: the password for that user name. It can be a plain-text password or a hashed password. If you want to use a hashed password, first execute select password(PASSWORD) on a MySQL node and then copy the result into this field.
- default_hostgroup: the default routing destination for the user name. For example, if this field is set to 10 for the root user, SQL statements sent by the root user are routed by default to a node in the hostgroup_id=10 group.
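A minimal sketch of adding the two users to the ProxySQL mysql_users table; the doris password is the illustrative one used in this example:
```
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('root', '', 10);
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('doris', 'P@ssword1!', 10);

-- Apply and persist the user configuration
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
```
After the users are added, querying mysql_users shows rows like the following: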
username: root
password:
active: 1
use_ssl: 0
default_hostgroup: 10
default_schema: NULL
schema_locked: 0
transaction_persistent: 1
fast_forward: 0
backend: 1
frontend: 1
max_connections: 10000
username: doris
password: P@ssword1!
active: 1
use_ssl: 0
default_hostgroup: 10
default_schema: NULL
schema_locked: 0
transaction_persistent: 1
fast_forward: 0
backend: 1
frontend: 1
max_connections: 10000
Although the mysql_users table is not described in detail here, note that only users with active=1 are valid, and active defaults to 1.
In this way, you can connect to ProxySQL with the doris user name and password through a SQL client.
Next, use the root user and the doris user to test whether they are routed to the default hostgroup_id=10 (a write group) and can read data. The connection below goes through forwarding port 6033 and is forwarded to the real back-end database.
Enter password:
ERROR 9001 (HY000) at line 1: Max connect timeout reached while reaching hostgroup 10 after 10000ms
At this point an error occurs, and the request is not forwarded to the real Doris FE on the backend.
From the log you can see that set autocommit=0 was sent to open a transaction, while the following ProxySQL variables are false:
mysql-forward_autocommit=false
mysql-autocommit_false_is_transaction=false
We do not need read/write splitting here, so simply set these two parameters to true with the following statements.
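A sketch of the fix, run on the ProxySQL admin interface:
```
UPDATE global_variables SET variable_value='true'
 WHERE variable_name IN ('mysql-forward_autocommit',
                         'mysql-autocommit_false_is_transaction');
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```
After reloading the variables, reconnecting through port 6033 succeeds and the back-end databases are visible: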
+--------------------+
| Database |
+--------------------+
| doris_audit_db |
| information_schema |
| retail |
+--------------------+
That's it. You can now use the MySQL client, JDBC, etc. to connect to ProxySQL and operate your Doris cluster.
Overview
Nginx can implement load balancing of HTTP and HTTPS protocols, as well as load balancing of TCP protocol. So, the
question is, can the load balancing of the Apache Doris database be achieved through Nginx? The answer is: yes. Next, let's
discuss how to use Nginx to achieve load balancing of Apache Doris.
Environmental preparation
Note: Using Nginx to load balance the Apache Doris database assumes that an Apache Doris environment has already been set up. The IP and port of the Apache Doris FE are listed below. One FE is used here for demonstration; for multiple FEs, simply add the additional FE IP addresses and ports to the configuration.
The Apache Doris FE IP and MySQL-protocol port accessed through Nginx are as follows:
IP: 172.31.7.119
Port: 9030
Install dependencies
sudo apt-get install build-essential
Install Nginx
cd nginx-1.18.0
vim /usr/local/nginx/conf/default.conf
events {
    worker_connections 1024;
}

stream {
    upstream mysqld {
        ## Note: if there are multiple FEs, just add them all here
        server 172.31.7.119:9030;
    }
    ## This configures the proxy listen port, timeouts, etc.
    server {
        listen 6030;
        proxy_connect_timeout 300s;
        proxy_timeout 300s;
        proxy_pass mysqld;
    }
}
Start Nginx
Start the specified configuration file
cd /usr/local/nginx
/usr/local/nginx/sbin/nginx -c conf.d/default.conf
Verify
Parameter explanation: -u specifies the Doris user name; -p specifies the Doris password (empty here, so it is omitted); -h specifies the IP of the Nginx proxy server; -P specifies the port.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
+--------------------+
| Database |
+--------------------+
| information_schema |
| test |
+--------------------+
You can turn off this feature to get a quicker startup with -A
Database changed
+------------------+
| Tables_in_test |
+------------------+
| dwd_product_live |
+------------------+
Data Backup
Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards,
you can restore data from the remote storage system to any Doris cluster through the restore command. Through this
function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between
different clusters.
To use this function, you need to deploy the broker corresponding to the remote storage. Such as BOS, HDFS, etc. You can
view the currently deployed broker through SHOW BROKER; .
The snapshot phase takes a snapshot of the specified table or partition data file. After that, backups are all operations on
snapshots. After the snapshot, changes, imports, etc. to the table no longer affect the results of the backup. Snapshots
only generate a hard link to the current data file, which takes very little time. After the snapshot is completed, the
snapshot files will be uploaded one by one. Snapshot uploads are done concurrently by each Backend.
After the data file snapshot upload is complete, the Frontend first writes the corresponding metadata to a local file and then uploads the local metadata file to the remote warehouse through the broker, completing the final backup job.
If the table is a dynamic partition table, the dynamic partition attribute will be automatically disabled after backup. When
restoring, you need to manually enable the dynamic partition attribute of the table. The command is as follows:
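A sketch of re-enabling the attribute; the table name tbl1 is illustrative:
```
ALTER TABLE tbl1 SET ("dynamic_partition.enable" = "true");
```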
4. Backup and Restore operation will NOT keep the colocate_with property of a table.
Start Backup
1. Create an HDFS remote warehouse example_repo (the broker name hdfs_broker must match a deployed broker):
CREATE REPOSITORY `example_repo`
WITH BROKER `hdfs_broker`
ON LOCATION "hdfs://hadoop-name-node:54310/path/to/repo/"
PROPERTIES
(
    "username" = "user",
    "password" = "password"
);
2. Create a remote warehouse s3_repo directly through the S3 protocol (the repository name is illustrative):
CREATE REPOSITORY `s3_repo`
WITH S3
ON LOCATION "s3://bucket_name/test"
PROPERTIES
(
    "AWS_ENDPOINT" = "http://xxxx.xxxx.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "xxx",
    "AWS_REGION" = "xxx"
);
3. Perform a full backup of the table example_tbl under example_db to the warehouse example_repo:
BACKUP SNAPSHOT example_db.snapshot_label1
TO example_repo
ON (example_tbl)
PROPERTIES ("type" = "full");
4. Perform a full backup under example_db of partitions p1 and p2 of the table example_tbl, plus the table example_tbl2, to the warehouse example_repo (the snapshot label here is illustrative):
BACKUP SNAPSHOT example_db.snapshot_label2
TO example_repo
ON
(
    example_tbl PARTITION (p1, p2),
    example_tbl2
);
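The backup job runs asynchronously; a sketch of checking its progress, whose output is shown below:
```
SHOW BACKUP FROM example_db;
```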
JobId: 17891847
SnapshotName: snapshot_label1
DbName: example_db
State: FINISHED
BackupObjs: [default_cluster:example_db.example_tbl]
UnfinishedTasks:
Progress:
TaskErrMsg:
Status: [OK]
Timeout: 86400
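After the job finishes, the snapshots stored in the warehouse can be listed; a sketch:
```
SHOW SNAPSHOT ON example_repo;
```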
+-----------------+---------------------+--------+
| Snapshot        | Timestamp           | Status |
+-----------------+---------------------+--------+
| snapshot_label1 | 2022-04-08-15-52-29 | OK     |
+-----------------+---------------------+--------+
Best Practices
Backup
Currently, we support full backup with the smallest partition (Partition) granularity (incremental backup may be supported in
future versions). If you need to back up data regularly, you first need to plan the partitioning and bucketing of the table
reasonably when building the table, such as partitioning by time. Then, in the subsequent running process, regular data
backups are performed according to the partition granularity.
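For example, a periodic job might back up only the previous day's partition; the snapshot label, table, and partition names below are illustrative:
```
BACKUP SNAPSHOT example_db.snapshot_20220407
TO example_repo
ON (example_tbl PARTITION (p20220407));
```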
Data Migration
Users can back up the data to the remote warehouse first, and then restore the data to another cluster through the remote
warehouse to complete the data migration. Because data backup is done in the form of snapshots, new imported data after
the snapshot phase of the backup job will not be backed up. Therefore, after the snapshot is completed and until the
recovery job is completed, the data imported on the original cluster needs to be imported again on the new cluster.
It is recommended to import the new and old clusters in parallel for a period of time after the migration is complete. After
verifying the correctness of data and services, migrate services to a new cluster.
Highlights
1. Operations related to backup and recovery are currently only allowed to be performed by users with ADMIN privileges.
Related Commands
1. The commands related to the backup and restore function are as follows. For the following commands, you can use
help cmd; to view detailed help after connecting to Doris through mysql-client.
i. CREATE REPOSITORY
Create a remote repository path for backup or restore. This command needs the Broker process to access the remote storage, and different brokers need different parameters; for details, please refer to the Broker documentation. Alternatively, you can back up directly to remote storage that supports the AWS S3 protocol, or directly to HDFS; please refer to the Create Remote Warehouse documentation.
ii. BACKUP
Perform a backup operation.
iii. SHOW BACKUP
View the most recent backup job under the database. The last fields include:
Status: Used to record status information that may appear during the whole job.
Timeout: The timeout period of the job, in seconds.
iv. SHOW SNAPSHOT
View backups that already exist in the remote repository. More detailed backup information is displayed if a WHERE clause is specified after SHOW SNAPSHOT.
v. CANCEL BACKUP
Cancel the currently executing backup job.
vi. DROP REPOSITORY
Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data.
More Help
For more detailed syntax and best practices for BACKUP, please refer to the BACKUP command manual. You can also type HELP BACKUP on the MySQL client command line for more help.
Data Restore
Doris supports backing up the current data in the form of files to the remote storage system through the broker. Afterwards,
you can restore data from the remote storage system to any Doris cluster through the restore command. Through this
function, Doris can support periodic snapshot backup of data. You can also use this function to migrate data between
different clusters.
To use this function, you need to deploy the broker corresponding to the remote storage. Such as BOS, HDFS, etc. You can
view the currently deployed broker through SHOW BROKER; .
This step will first create and restore the corresponding table partition and other structures in the local cluster. After
creation, the table is visible, but not accessible.
2. Local snapshot
This step is to take a snapshot of the table created in the previous step. This is actually an empty snapshot (because the
table just created has no data), and its purpose is to generate the corresponding snapshot directory on the Backend for
later receiving the snapshot file downloaded from the remote warehouse.
3. Download snapshot
The snapshot files in the remote warehouse will be downloaded to the corresponding snapshot directory generated in
the previous step. This step is done concurrently by each Backend.
4. Effective snapshot
After the snapshot download is complete, we need to map each snapshot to the metadata of the current local table.
These snapshots are then reloaded to take effect, completing the final recovery job.
Start Restore
1. Restore the table backup_tbl from backup snapshot_1 in example_repo to database example_db1, using time version "2022-04-08-15-52-29" and restoring with 1 replica:
RESTORE SNAPSHOT example_db1.`snapshot_1`
FROM `example_repo`
ON ( `backup_tbl` )
PROPERTIES
(
    "backup_timestamp"="2022-04-08-15-52-29",
    "replication_num" = "1"
);
2. Restore partitions p1 and p2 of table backup_tbl from backup snapshot_2 in example_repo, plus table backup_tbl2 renamed to new_tbl, to database example_db1, using time version "2022-04-08-15-55-43". Restore with the default of 3 replicas:
RESTORE SNAPSHOT example_db1.`snapshot_2`
FROM `example_repo`
ON
(
    `backup_tbl` PARTITION (`p1`, `p2`),
    `backup_tbl2` AS `new_tbl`
)
PROPERTIES
(
    "backup_timestamp"="2022-04-08-15-55-43"
);
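The restore job also runs asynchronously; a sketch of checking it, whose output is shown below:
```
SHOW RESTORE FROM example_db1;
```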
JobId: 17891851
Label: snapshot_label1
Timestamp: 2022-04-08-15-52-29
DbName: default_cluster:example_db1
State: FINISHED
AllowLoad: false
ReplicationNum: 3
RestoreObjs: {
    "name": "snapshot_label1",
    "database": "example_db",
    "backup_time": 1649404349050,
    "content": "ALL",
    "olap_table_list": [
        {
            "name": "backup_tbl",
            "partition_names": [
                "p1",
                "p2"
            ]
        }
    ],
    "view_list": [],
    "odbc_table_list": [],
    "odbc_resource_list": []
}
UnfinishedTasks:
Progress:
TaskErrMsg:
Status: [OK]
Timeout: 86400
Related Commands
The commands related to the backup and restore function are as follows. For the following commands, you can use help
cmd; to view detailed help after connecting to Doris through mysql-client.
1. CREATE REPOSITORY
Create a remote repository path for backup or restore. This command needs the Broker process to access the remote storage, and different brokers need different parameters; for details, please refer to the Broker documentation. Alternatively, you can back up directly to remote storage that supports the AWS S3 protocol, or directly to HDFS; please refer to the Create Remote Warehouse documentation.
2. RESTORE
3. SHOW RESTORE
UnfinishedTasks: During SNAPSHOTTING , DOWNLOADING , COMMITTING and other stages, there will be multiple subtasks
going on at the same time. The current stage shown here is the task id of the unfinished subtasks.
TaskErrMsg: If there is an error in the execution of a subtask, the error message of the corresponding subtask will be
displayed here.
Status: Used to record some status information that may appear during the entire job process.
Timeout: The timeout period of the job, in seconds.
4. CANCEL RESTORE
Cancel the currently executing restore job.
5. DROP REPOSITORY
Delete the created remote repository. Deleting a warehouse only deletes the mapping of the warehouse in Doris, and does not delete the actual warehouse data.
Common mistakes
1. Restore Report An Error :[20181: invalid md5 of downloaded file:
/data/doris.HDD/snapshot/20220607095111.862.86400/19962/668322732/19962.hdr, expected:
f05b63cca5533ea0466f62a9897289b5, get: d41d8cd98f00b204e9800998ecf8427e]
If the number of copies of the table backed up and restored is inconsistent, you need to specify the number of copies
when executing the restore command. For specific commands, please refer to RESTORE command manual
2. Restore Report An Error :[COMMON_ERROR, msg: Could not set meta version to 97 since it is lower than minimum
required version 100]
This error is caused by the backup and the restore not coming from the same version; use the specified meta_version to read the metadata of the earlier backup. Note that this parameter is a temporary solution and is only used to restore data backed up by older versions of Doris. The latest backup data already contains the meta version, so there is no need to specify it. For the above error, specify meta_version = 100 in the restore. For the specific commands, please refer to the RESTORE command manual.
More Help
For more detailed syntax and best practices for RESTORE, please refer to the RESTORE command manual. You can also type HELP RESTORE on the MySQL client command line for more help.
Data Recover
In order to avoid disasters caused by misoperation, Doris supports data recovery of accidentally deleted databases/tables/partitions. After a table or database is dropped, Doris does not physically delete the data immediately, but keeps it in Trash for a period of time (the default is 1 day, configurable through the catalog_trash_expire_second parameter in fe.conf). The administrator can use the RECOVER command to restore accidentally deleted data.
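A few hedged examples of the RECOVER command; the database, table, and partition names are illustrative:
```
-- Recover a dropped database
RECOVER DATABASE example_db;
-- Recover a dropped table
RECOVER TABLE example_db.example_tbl;
-- Recover a dropped partition
RECOVER PARTITION p1 FROM example_db.example_tbl;
```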
More Help
For more detailed syntax and best practices for RECOVER, please refer to the RECOVER command manual. You can also type HELP RECOVER on the MySQL client command line for more help.
Sql Interception
This function is only used to limit the query statement, and does not limit the execution of the explain statement.
Two kinds of SQL block rules are supported at the user level:
1. Block specified SQL by matching its sql regex or sqlHash.
2. Block SQL by setting partition_num, tablet_num, and cardinality, checking whether a statement reaches one of these limits.
partition_num, tablet_num, and cardinality can be set together; once any one of them is reached, the SQL statement will be blocked.
Rule
SQL block rule CRUD
Create a SQL block rule. For more creation syntax see CREATE SQL BLOCK RULE:
sql: regex pattern to match; special characters need to be escaped; "NULL" by default
sqlHash: SQL hash value, used for exact matching; it is printed in fe.audit.log. Only one of sql and sqlHash can be set; "NULL" by default
partition_num: maximum number of partitions a scan node may scan, 0L by default
For example, create a rule named order_analysis_rule that blocks a statement by regex (the sql pattern here is illustrative):
CREATE SQL_BLOCK_RULE order_analysis_rule
PROPERTIES(
  "sql"="select \\* from order_analysis",
  "global"="false",
  "enable"="true",
  "sqlHash"=""
);
Note: the sql statement in the rule does not end with a semicolon.
When we execute the sql that we defined in the rule just now, an exception error will be returned. An example is as follows:
ERROR 1064 (HY000): errCode = 2, detailMessage = sql match regex sql block rule: order_analysis_rule
Create test_rule2, which limits the maximum number of scanned partitions to 30 and the maximum scan cardinality to 10 billion rows, as shown in the following example:
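A hedged sketch of such a rule; the rule name is illustrative and the property values follow the limits described above:
```
CREATE SQL_BLOCK_RULE test_rule2
PROPERTIES (
  "partition_num" = "30",
  "cardinality" = "10000000000",
  "global" = "false",
  "enable" = "true"
);
```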
Show the configured SQL block rules, or show all rules if no rule name is specified. Please see the specific grammar in SHOW SQL BLOCK RULE:
SHOW SQL_BLOCK_RULE [FOR RULE_NAME]
Alter a SQL block rule. Any of sql/sqlHash/global/enable/partition_num/tablet_num/cardinality can be changed. Please see the specific grammar in ALTER SQL BLOCK RULE.
sql and sqlHash cannot both be set. This means that if sql or sqlHash is set in a rule, the other property can never be altered.
sql/sqlHash and partition_num/tablet_num/cardinality cannot be set together. For example, if partition_num is set in a rule, then sql or sqlHash can never be altered.
Drop SQL block rules; multiple rules can be dropped at once, separated by commas. Please see the specific grammar in DROP SQL BLOCK RULE.
Noun Interpretation
FE: Frontend, frontend node of Doris. Responsible for metadata management and request access.
BE: Backend, backend node of Doris. Responsible for query execution and data storage.
Fragment: FE will convert the execution of specific SQL statements into corresponding fragments and distribute them to
BE for execution. BE will execute corresponding fragments and gather the result of RunningProfile to send back FE.
Basic concepts
FE splits the query plan into fragments and distributes them to BE for task execution. BE records the statistics of Running
State when executing fragment. BE print the outputs statistics of fragment execution into the log. FE can also collect these
statistics recorded by each fragment and print the results on FE's web page.
Specific operation
Turn on the report switch on FE through the MySQL command:
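A sketch, assuming a recent Doris version where the session variable is named enable_profile (older versions use is_report_success):
```
SET enable_profile = true;
```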
After executing the corresponding SQL statement (the variable was named is_report_success in old versions), we can see the report information of that SQL statement on the FE web page, as shown in the picture below.
The latest 100 statements executed will be listed here. We can view detailed statistics of RunningProfile.
Query:
Summary:
Total: 10s323ms
Here is a detailed list of query ID, execution time, execution statement and other summary information. The next step is
to print the details of each fragment collected from be.
Fragment 0:
- MemoryLimit: 2.00 GB
- BytesReceived: 168.08 KB
- PeakUsedReservation: 0.00
- SendersBlockedTimer: 0ns
- DeserializeRowBatchTimer: 501.975us
- PeakMemoryUsage: 577.04 KB
- ConvertRowBatchTime: 180.171us
- PeakMemoryUsage: 0.00
- MemoryUsed: 0.00
- RowsReturnedRate: 811
The fragment ID is listed here; hostname shows the BE node executing the fragment; active: 10s270ms shows the total execution time of the node; non child: 0.14% means the execution time of the node itself, not including the execution time of its child nodes.
PeakMemoryUsage indicates the peak memory usage of EXCHANGE_NODE; RowsReturned indicates the number of rows returned by EXCHANGE_NODE; RowsReturnedRate = RowsReturned / ActiveTime. These three statistics have the same meaning in other NODEs.
Subsequently, the statistics of the child nodes are printed in turn; here you can distinguish the parent-child relationship by indentation.
Fragment
AverageThreadTokens: Number of threads used to execute fragment, excluding the usage of thread pool
PeakReservation: Peak memory used by buffer pool
MemoryLimit: Memory limit at query
BlockMgr
ODBC_TABLE_SINK
NumSentRows: Total number of rows written to ODBC table
TupleConvertTime: Time spent serializing the sent data into INSERT statements
ResultSendTime: Time spent writing through the ODBC driver
EXCHANGE_NODE
BytesReceived: Size of bytes received by network
SORT_NODE
InMemorySortTime: In memory sort time
InitialRunsCreated: Number of initialize sort run
MergeGetNext: Time for MergeSort to get the next batch from multiple sort_runs (only shown when spilling to disk)
MergeGetNextBatch: Time for MergeSort to get the next batch from one sort_run (only shown when spilling to disk)
SortDataSize: Total sorted data
TotalMergesPerformed: Number of external sort merges
AGGREGATION_NODE
PartitionsCreated: Number of partition split by aggregate
GetResultsTime: Time to get aggregate results from each partition
HASH_JOIN_NODE
ExecOption: The way to construct a HashTable for the right child (synchronous or asynchronous), the right child in Join
may be a table or a subquery, the same is true for the left child
BuildBuckets: The number of Buckets in HashTable
BuildRows: the number of rows of HashTable
ProbeTime: Time consuming to traverse the left child for Hash Probe, excluding the time consuming to call GetNext on
the left child RowBatch
PushDownComputeTime: The calculation time of the predicate pushdown condition
PushDownTime: The total time consumed by the predicate push down. When Join, the right child who meets the
requirements is converted to the left child's in query
CROSS_JOIN_NODE
ExecOption: The way to construct RowBatchList for the right child (synchronous or asynchronous)
BuildRows: The number of rows of RowBatchList (ie the number of rows of the right child)
BuildTime: Time-consuming to construct RowBatchList
UNION_NODE
MaterializeExprsEvaluateTime: When the field types at both ends of the Union are inconsistent, the time spent to
evaluates type conversion exprs and materializes the results
ANALYTIC_EVAL_NODE
OLAP_SCAN_NODE
The OLAP_SCAN_NODE is responsible for specific data scanning tasks. One OLAP_SCAN_NODE will generate one or more
OlapScanner . Each Scanner thread is responsible for scanning part of the data.
Some or all of the predicate conditions in the query will be pushed to OLAP_SCAN_NODE . Some of these predicate conditions
will continue to be pushed down to the storage engine in order to use the storage engine's index for data filtering. The other
part will be kept in OLAP_SCAN_NODE to filter the data returned from the storage engine.
The profile of the OLAP_SCAN_NODE node is usually used to analyze the efficiency of data scanning. It is divided into three layers, OLAP_SCAN_NODE, OlapScanner, and SegmentIterator, according to the calling relationship.
The profile of a typical OLAP_SCAN_NODE is as follows. Some indicators will have different meanings depending on the storage
format (V1 or V2).
- BytesRead: 265.00 B # The amount of data read from the data file. Assuming that 10 32-bit
integers are read, the amount of data is 10 * 4B = 40 Bytes. This data only represents the fully expanded size of
the data in memory, and does not represent the actual IO size.
- PeakMemoryUsage: 0.00 # Peak memory usage during query, not used yet
- RowsRead: 7 # The number of rows returned from the storage engine to the Scanner,
excluding the number of rows filtered by the Scanner.
- RowsReturned: 7 # The number of rows returned from ScanNode to the upper node.
- TotalReadThroughput: 74.70 KB/sec # BytesRead divided by the total time spent in this node (from Open to
Close). For IO bounded queries, this should be very close to the total throughput of all the disks
- ScannerBatchWaitTime: 426.886us  # The time the transfer thread waits for the scanner thread to return a rowbatch.
- ScannerWorkerWaitTime: 17.745us # To count the time that the scanner thread waits for the available
worker threads in the thread pool.
OlapScanner:
- BlockConvertTime: 8.941us # The time it takes to convert a vectorized Block into a RowBlock with a
row structure. The vectorized Block is VectorizedRowBatch in V1 and RowBlockV2 in V2.
- ReaderInitTime: 5.475ms # The time when OlapScanner initializes Reader. V1 includes the time to
form MergeHeap. V2 includes the time to generate various Iterators and read the first group of blocks.
- RowsDelFiltered: 0 # Including the number of rows filtered out according to the Delete
information in the Tablet, and the number of rows filtered for marked deleted rows under the unique key model.
- RowsPushedCondFiltered: 0 # Filter conditions based on the predicates passed down, such as the
conditions passed from BuildTable to ProbeTable in Join calculation. This value is not accurate, because if the
filtering effect is poor, it will no longer be filtered.
- ScanTime: 39.24us # The time returned from ScanNode to the upper node.
- ShowHintsTime_V1: 0ns # V2 has no meaning. Read part of the data in V1 to perform ScanRange
segmentation.
SegmentIterator:
- CachedPagesNum: 30 # In V2 only, when PageCache is enabled, the number of Pages that hit the
Cache.
- CompressedBytesRead: 0.00 # In V1, the size of the data read from the file before decompression. In
V2, the pre-compressed size of the read page that did not hit the PageCache.
- NumSegmentFiltered: 0 # When generating Segment Iterator, the number of Segments that are
completely filtered out through column statistics and query conditions.
- RawRowsRead: 7 # The number of raw rows read in the storage engine. See below for
details.
- RowsBitmapIndexFiltered: 0 # Only in V2, the number of rows filtered by the Bitmap index.
- RowsStatsFiltered: 0 # In V2, the number of rows filtered by the ZoneMap index, including the
deletion condition. V1 also contains the number of rows filtered by BloomFilter.
- RowsConditionsFiltered: 0 # Only in V2, the number of rows filtered by various column indexes.
- UncompressedBytesRead: 0.00 # V1 is the decompressed size of the read data file (if the file does not
need to be decompressed, the file size is directly counted). In V2, only the decompressed size of the Page that
missed PageCache is counted (if the Page does not need to be decompressed, the Page size is directly counted)
The predicate push down and index usage can be inferred from the related indicators of the number of data rows in the
profile. The following only describes the profile in the reading process of segment V2 format data. In segment V1 format, the
meaning of these indicators is slightly different.
When reading a segment V2, if the query has key_ranges (the query range composed of prefix keys), first filter the data
through the SortkeyIndex index, and the number of filtered rows is recorded in RowsKeyRangeFiltered .
After that, use the Bitmap index to perform precise filtering on the columns containing the bitmap index in the query
condition, and the number of filtered rows is recorded in RowsBitmapIndexFiltered .
After that, according to the equivalent (eq, in, is) condition in the query condition, use the BloomFilter index to filter the
data and record it in RowsBloomFilterFiltered . The value of RowsBloomFilterFiltered is the difference between the total
number of rows of the Segment (not the number of rows filtered by the Bitmap index) and the number of remaining
rows after BloomFilter, so the data filtered by BloomFilter may overlap with the data filtered by Bitmap.
After that, use the ZoneMap index to filter the data according to the query conditions and delete conditions and record it
in RowsStatsFiltered .
RowsConditionsFiltered is the number of rows filtered by various column indexes, including the values of RowsBloomFilterFiltered and RowsStatsFiltered.
So far, the Init phase is completed, and the number of rows filtered by the condition to be deleted in the Next phase is
recorded in RowsDelFiltered . Therefore, the number of rows actually filtered by the delete condition are recorded in
RowsStatsFiltered and RowsDelFiltered respectively.
RawRowsRead is the final number of rows to be read after the above filtering.
RowsRead is the number of rows finally returned to Scanner. RowsRead is usually smaller than RawRowsRead , because
returning from the storage engine to the Scanner may go through a data aggregation. If the difference between
RawRowsRead and RowsRead is large, it means that a large number of rows are aggregated, and aggregation may be time-
consuming.
RowsReturned is the number of rows finally returned by ScanNode to the upper node. RowsReturned is usually smaller
than RowsRead . Because there will be some predicate conditions on the Scanner that are not pushed down to the
storage engine, filtering will be performed once. If the difference between RowsRead and RowsReturned is large, it means
that many rows are filtered in the Scanner. This shows that many highly selective predicate conditions are not pushed to
the storage engine. The filtering efficiency in Scanner is worse than that in storage engine.
Through the above indicators, you can roughly analyze the number of rows processed by the storage engine and the size of
the final filtered result row. Through the Rows***Filtered group of indicators, it is also possible to analyze whether the query
conditions are pushed down to the storage engine, and the filtering effects of different indexes. In addition, a simple analysis
can be made through the following aspects.
Many indicators under OlapScanner , such as IOTimer , BlockFetchTime , etc., are the accumulation of all Scanner thread
indicators, so the value may be relatively large. And because the Scanner thread reads data asynchronously, these
cumulative indicators can only reflect the cumulative working time of the Scanner, and do not directly represent the
time consumption of the ScanNode. The time-consuming ratio of ScanNode in the entire query plan is the value
recorded in the Active field. Sometimes it appears that IOTimer has tens of seconds, but Active is actually only a few
seconds. This situation is usually due to:
IOTimer is the accumulated time of multiple Scanners, and there are more Scanners.
The upper node is time-consuming. For example, the upper node takes 100 seconds while the lower ScanNode takes only 10 seconds; the time reflected in the Active field may be only a few milliseconds. Because the ScanNode scans data asynchronously while the upper layer is processing data, the data is already prepared when the upper node requests it from the ScanNode, so the Active time is very short.
NumScanners represents the number of Tasks submitted by the Scanner to the thread pool. It is scheduled by the thread
pool in RuntimeState . The two parameters doris_scanner_thread_pool_thread_num and
doris_scanner_thread_pool_queue_size control the size of the thread pool and the queue length respectively. Too many
or too few threads will affect query efficiency. At the same time, some summary indicators can be divided by the number
of threads to roughly estimate the time consumption of each thread.
TabletCount indicates the number of tablets to be scanned. Too many may mean a lot of random read and data merge
operations.
UncompressedBytesRead indirectly reflects the amount of data read. If the value is large, it means that there may be a lot
of IO operations.
CachedPagesNum and TotalPagesNum can check the hitting status of PageCache. The higher the hit rate, the less time-
consuming IO and decompression operations.
Buffer pool
AllocTime: Memory allocation time
CumulativeAllocationBytes: Cumulative amount of memory allocated
tracing
Tracing records the life cycle of a request execution in the system, including the request and its sub-procedure call links,
execution time and statistics, which can be used for slow query location, performance bottleneck analysis, etc.
Principle
doris is responsible for collecting traces and exporting them to a third-party tracing analysis system, which is responsible for
the presentation and storage of traces.
Quick Start
doris currently supports exporting traces directly to zipkin.
Deploy zipkin
curl -sSL https://zipkin.io/quickstart.sh | bash -s
Configure fe.conf on the FE side:
enable_tracing = true
trace_export_url = http://127.0.0.1:9411/api/v2/spans
Configure be.conf on the BE side:
enable_tracing = true
trace_export_url = http://127.0.0.1:9411/api/v2/spans
# Queue size for caching spans. span export will be triggered once when the number of spans reaches half of the
queue capacity. spans arriving in the queue will be discarded when the queue is full.
max_span_queue_size=2048
max_span_export_batch_size=512
export_span_schedule_delay_millis=500
Start fe and be
sh fe/bin/start_fe.sh --daemon
sh be/bin/start_be.sh --daemon
Executing a query
...
View zipkin UI
The browser opens http://127.0.0.1:9411/zipkin/ to view the query tracing.
Meanwhile, opentelemetry collector provides a rich set of operators to process traces. For example, filterprocessor ,
tailsamplingprocessor. For more details, refer to collector processor.
Download collector
Download otelcol-contrib; more precompiled versions for more platforms are available on the official website:
wget https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.55.0/otelcol-
contrib_0.55.0_linux_amd64.tar.gz
The following configuration file uses the otlp (OpenTelemetry Protocol) protocol to receive traces data, perform batch
processing and filter out traces longer than 50ms, and finally export them to zipkin and file.
cat << EOF > otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:

exporters:
  zipkin:
    endpoint: "http://10.81.85.90:8791/api/v2/spans"
  file:
    path: ./filename.json

processors:
  batch:
  tail_sampling:
    policies:
      [
        {
          name: duration_policy,
          type: latency,
          latency: {threshold_ms: 50}
        }
      ]

extensions:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, tail_sampling]
      exporters: [zipkin, file]
EOF
Start collector
nohup ./otelcol-contrib --config=otel-collector-config.yaml &
Configure fe.conf on the FE side:
enable_tracing = true
trace_exporter = collector
# Configure traces export to collector, 4318 is the default port for collector otlp http
trace_export_url = http://127.0.0.1:4318/v1/traces
Configure be.conf on the BE side:
enable_tracing = true
trace_exporter = collector
# Configure traces export to collector, 4318 is the default port for collector otlp http
trace_export_url = http://127.0.0.1:4318/v1/traces
# Queue size for caching spans. span export will be triggered once when the number of spans reaches half of the
queue capacity. spans arriving in the queue will be discarded when the queue is full.
max_span_queue_size=2048
max_span_export_batch_size=512
export_span_schedule_delay_millis=500
Start fe and be
sh fe/bin/start_fe.sh --daemon
sh be/bin/start_be.sh --daemon
Executing a query
...
View zipkin UI
...
performance optimization
Monitor Metrics
(TODO)
There is no English document, please visit the Chinese document.
If Doris' data disk capacity is not controlled, the process will hang because the disk is full. Therefore, we monitor the disk
usage and remaining capacity, and control various operations in the Doris system by setting different warning levels, and try
to avoid the situation where the disk is full.
Glossary
Data Dir :Data directory, each data directory specified in the storage_root_path of the BE configuration file be.conf .
Usually a data directory corresponds to a disk, so the following disk also refers to a data directory.
Basic Principles
BE will report disk usage to FE on a regular basis (every minute). FE records these statistical values and restricts various
operation requests based on these statistical values.
Two thresholds, High Watermark and Flood Stage, are set in FE. Flood Stage is higher than High Watermark. When the disk
usage is higher than High Watermark, Doris will restrict the execution of certain operations (such as replica balancing, etc.). If
it is higher than Flood Stage, certain operations (such as load data) will be prohibited.
At the same time, a Flood Stage is also set on the BE. Taking into account that FE cannot fully detect the disk usage on BE in
a timely manner, and cannot control certain BE operations (such as Compaction). Therefore, Flood Stage on the BE is used
for the BE to actively refuse and stop certain operations to achieve the purpose of self-protection.
FE Parameter
High Watermark:
When disk usage is higher than storage_high_watermark_usage_percent, or free disk capacity is less than storage_min_left_capacity_bytes, the disk will no longer be used as the destination path for the following operations:
Tablet Balance
Colocation Relocation
Decommission
Flood Stage:
When disk usage is higher than storage_flood_stage_usage_percent, or free disk capacity is less than storage_flood_stage_left_capacity_bytes, the disk will no longer be used as the destination path for the following operations:
Tablet Balance
Colocation Relocation
Replica make up
Restore
Load/Insert
BE Parameter
Flood Stage:
When disk usage is higher than storage_flood_stage_usage_percent, and free disk capacity is less than storage_flood_stage_left_capacity_bytes, the following operations on this disk will be prohibited:
Base/Cumulative Compaction
Data load
Clone Task (Usually occurs when the replica is repaired or balanced.)
Push Task (Occurs during the Loading phase of Hadoop import, and the file is downloaded. )
Alter Task (Schema Change or Rollup Task.)
Download Task (The Downloading phase of the recovery operation.)
By deleting tables or partitions, you can quickly reduce disk space usage and restore the cluster.
Note: Only the DROP operation can quickly reduce disk space usage; the DELETE operation cannot.
BE expansion
After backend expansion, data tablets will be automatically balanced to BE nodes with lower disk usage. The expansion
operation will make the cluster reach a balanced state in a few hours or days depending on the amount of data and the
number of nodes.
You can reduce the number of replicas of a table or partition. For example, the default 3 replicas can be reduced to 2. Although this reduces data reliability, it can quickly lower the disk usage rate and restore the cluster to normal.
This method is usually used for emergency recovery. Please restore the replica count to 3 after disk usage has been reduced by expanding the cluster or deleting data.
Modifying the replica count takes effect instantly, and the Backends will automatically and asynchronously delete the redundant replicas.
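A hedged example; the table and partition names are illustrative:
```
-- Lower the default replica count used for newly created partitions
ALTER TABLE example_tbl SET ("default.replication_num" = "2");
-- Lower the replica count of an existing partition
ALTER TABLE example_tbl MODIFY PARTITION p1 SET ("replication_num" = "2");
```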
When the BE has crashed because the disk is full and cannot be started (this can happen if FE or BE detection is not timely), you need to delete some temporary files in the data directory to ensure that the BE process can start.
Files in the following directories can be deleted directly:
log/: log files in the log directory.
If the BE can still be started, you can use ADMIN CLEAN TRASH ON(BackendHost:BackendHeartBeatPort); to actively
clean up temporary files. all trash files and expired snapshot files will be cleaned up, This will affect the operation of
restoring data from the trash bin.
If you do not manually execute `ADMIN CLEAN TRASH`, the system will still automatically execute the cleanup within a few minutes to tens of minutes. There are two situations:
* If the disk usage does not reach 90% of the **Flood Stage**, expired trash files and expired snapshot files
will be cleaned up. At this time, some recent files will be retained without affecting the recovery of data.
* If the disk usage has reached 90% of the **Flood Stage**, **all trash files** and expired snapshot files will
be cleaned up, **This will affect the operation of restoring data from the trash bin**.
The time interval for automatic execution can be changed by `max_garbage_sweep_interval` and
`min_garbage_sweep_interval` in the configuration items.
When the recovery fails due to lack of trash files, the following results may be returned:
```
```
When none of the above operations can free up capacity, you need to delete data files to free up space. The data file is in
the data/ directory of the specified data directory. To delete a tablet, you must first ensure that at least one replica of the
tablet is normal, otherwise deleting the only replica will result in data loss.
data/0/12345/
Record the tablet id and schema hash. The schema hash is the name of the next-level directory of the previous step.
The following is 352781111:
data/0/12345/352781111
rm -rf data/0/12345/
For repairing and balancing replicas of tables with the Colocation attribute, refer to the Colocation Join documentation.
Noun Interpretation
1. Tablet: a logical shard of a Doris table; a table has multiple tablets.
2. Replica: a copy of a tablet; a tablet has three replicas by default.
3. Healthy Replica: a healthy replica that is alive on a Backend and has a complete version.
4. Tablet Checker (TC): A resident background thread that scans all Tablets regularly, checks the status of these Tablets, and
decides whether to send them to Tablet Scheduler based on the results.
5. Tablet Scheduler (TS): A resident background thread that handles Tablets sent by Tablet Checker that need to be repaired.
At the same time, cluster replica balancing will be carried out.
6. Tablet SchedCtx (TSC): is a tablet encapsulation. When TC chooses a tablet, it encapsulates it as a TSC and sends it to TS.
7. Storage Medium: Storage medium. Doris supports specifying different storage media for partition granularity, including
SSD and HDD. The replica scheduling strategy is also scheduled for different storage media.
[Diagram: the Tablet Checker (1. Check tablets) inspects tablet state from the FE Meta and the Backends, and hands tablets that need repair to the Tablet Scheduler (2. Waiting to be scheduled).]
Replica status
Multiple copies of a Tablet may cause state inconsistencies due to certain circumstances. Doris will attempt to automatically
fix the inconsistent copies of these states so that the cluster can recover from the wrong state as soon as possible.
1. BAD
That is, the copy is damaged. Includes, but is not limited to, the irrecoverable damaged status of copies caused by disk
failures, BUGs, etc.
2. VERSION_MISSING
Version missing. Each batch of imports in Doris corresponds to a data version. A copy of the data consists of several
consecutive versions. However, due to import errors, delays and other reasons, the data version of some copies may be
incomplete.
3. HEALTHY
Health copy. That is, a copy of the normal data, and the BE node where the copy is located is in a normal state (heartbeat
is normal and not in the offline process).
The health status of a Tablet is determined by the status of all its copies. There are the following categories:
1. REPLICA_MISSING
The copy is missing. That is, the number of surviving copies is less than the expected number of copies.
2. VERSION_INCOMPLETE
The number of surviving copies is greater than or equal to the number of expected copies, but the number of healthy
copies is less than the number of expected copies.
3. REPLICA_RELOCATING
There is a full set of live replicas, but the BE nodes where some replicas are located are in an unavailable state (such as decommission).
4. REPLICA_MISSING_IN_CLUSTER
When using multi-cluster, the number of healthy replicas is greater than or equal to the expected number of replicas, but
the number of replicas in the corresponding cluster is less than the expected number of replicas.
5. REDUNDANT
Replica redundancy. Healthy replicas exist in the corresponding cluster, but the number of replicas is larger than expected, or there is a redundant replica on an unavailable node.
6. FORCE_REDUNDANT
This is a special state. It only occurs when the number of existed replicas is greater than or equal to the number of
available nodes, and the number of available nodes is greater than or equal to the number of expected replicas, and when
the number of alive replicas is less than the number of expected replicas. In this case, you need to delete a copy first to
ensure that there are available nodes for creating a new copy.
7. COLOCATE_MISMATCH
Fragmentation status of tables for Collocation attributes. Represents that the distribution of fragmented copies is
inconsistent with the specified distribution of Colocation Group.
8. COLOCATE_REDUNDANT
Fragmentation status of tables for Collocation attributes. Represents the fragmented copy redundancy of the Colocation
table.
9. HEALTHY
Note 1: The main idea of replica repair is to make the number of fragmented replicas reach the desired value by creating
or completing them first. Then delete the redundant copy.
Note 2: A clone task is to complete the process of copying specified data from a specified remote end to a specified
destination.
1. REPLICA_MISSING/REPLICA_RELOCATING
Select a low-load, available BE node as the destination. Choose a healthy copy as the source. Clone tasks copy a
complete copy from the source to the destination. For replica completion, we will directly select an available BE node,
regardless of the storage medium.
2. VERSION_INCOMPLETE
Select a relatively complete copy as the destination. Choose a healthy copy as the source. The clone task attempts to
copy the missing version from the source to the destination.
3. REPLICA_MISSING_IN_CLUSTER
4. REDUNDANT
Usually, after repair, there will be redundant copies in fragmentation. We select a redundant copy to delete it. The
selection of redundant copies follows the following priorities:
5. FORCE_REDUNDANT
Unlike REDUNDANT, the tablet here has a missing replica but there is no additional available node on which to create a new one, so a replica must be deleted first to free up an available node for the new replica. The order of deleting replicas is the same as for REDUNDANT.
6. COLOCATE_MISMATCH
Select one of the replica distribution BE nodes specified in Colocation Group as the destination node for replica
completion.
7. COLOCATE_REDUNDANT
Delete a copy on a BE node that is distributed by a copy specified in a non-Colocation Group.
When selecting replica nodes, Doris does not deploy replicas of the same tablet on different BEs of the same host. This ensures that even if all BEs on one host go down, not all replicas will be lost.
Scheduling priority
Waiting for the scheduled fragments in Tablet Scheduler gives different priorities depending on the status. High priority
fragments will be scheduled first. There are currently several priorities.
1. VERY_HIGH
REDUNDANT. For slices with duplicate redundancy, we give priority to them. Logically, duplicate redundancy is the
least urgent, but because it is the fastest to handle and can quickly release resources (such as disk space, etc.), we
give priority to it.
FORCE_REDUNDANT. Ditto.
2. HIGH
REPLICA_MISSING and most copies are missing (for example, 2 copies are missing in 3 copies)
VERSION_INCOMPLETE and most copies are missing
COLOCATE_MISMATCH We hope that the fragmentation related to the Collocation table can be repaired as soon as
possible.
COLOCATE_REDUNDANT
3. NORMAL
REPLICA_MISSING, but most survive (for example, three copies lost one)
VERSION_INCOMPLETE, but most copies are complete
REPLICA_RELOCATING and most replicas need to be relocated (for example, 2 of 3 replicas)
4. LOW
REPLICA_MISSING_IN_CLUSTER
REPLICA_RELOCATING most copies stable
Manual priority
The system automatically determines the scheduling priority. Sometimes, however, users want the tablets of certain tables or partitions to be repaired faster, so we provide a command with which the user can specify that the tablets of a table or partition are repaired first.
This command tells the TC to give VERY_HIGH priority to the problematic tablets of the specified tables or partitions when scanning tablets.
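A sketch of this command; the table and partition names are illustrative:
```
-- Repair the tablets of the specified table (optionally limited to some partitions) first
ADMIN REPAIR TABLE tbl1 PARTITION (p1, p2);
-- The corresponding cancel command
ADMIN CANCEL REPAIR TABLE tbl1 PARTITION (p1, p2);
```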
Note: This command is only a hint, which does not guarantee that the repair will be successful, and the priority will
change with the scheduling of TS. And when Master FE switches or restarts, this information will be lost.
Priority scheduling
Priority ensures that severely damaged fragments can be repaired first, and improves system availability. But if the high
priority repair task fails all the time, the low priority task will never be scheduled. Therefore, we will dynamically adjust the
priority of tasks according to the running status of tasks, so as to ensure that all tasks have the opportunity to be scheduled.
If the scheduling fails for five consecutive times (e.g., no resources can be obtained, no suitable source or destination can
be found, etc.), the priority will be lowered.
At the same time, to preserve the weight of the initial priority, we stipulate that a task whose initial priority is VERY_HIGH can be lowered at most to NORMAL, and one whose initial priority is LOW can be raised at most to HIGH. The priority adjustment here also applies to priorities set manually by the user.
Replicas Balance
Doris automatically balances replicas within the cluster. Currently supports two rebalance strategies, BeLoad and Partition.
BeLoad rebalance will consider about the disk usage and replica count for each BE. Partition rebalance just aim at replica
count for each partition, this helps to avoid hot spots. If you want high read/write performance, you may need this. Note that
Partition rebalance do not consider about the disk usage, pay more attention to it when you are using Partition rebalance.
The strategy selection config is not mutable at runtime.
BeLoad
The main idea of balancing is to create a replica of some fragments on low-load nodes, and then delete the replicas of these
fragments on high-load nodes. At the same time, because of the existence of different storage media, there may or may not
exist one or two storage media on different BE nodes in the same cluster. We require that fragments of storage medium A be
stored in storage medium A as far as possible after equalization. So we divide the BE nodes of the cluster according to the
storage medium. Then load balancing scheduling is carried out for different BE node sets of storage media.
Similarly, replica balancing ensures that a copy of the same table will not be deployed on the BE of the same host.
BE Node Load
We use Cluster LoadStatistics (CLS) to represent the load balancing of each backend in a cluster. Tablet Scheduler triggers
cluster equilibrium based on this statistic. We currently calculate a load Score for each BE as the BE load score by using disk
usage and number of copies. The higher the score, the heavier the load on the BE.
Disk usage and the number of replicas each have a weight factor, capacityCoefficient and replicaNumCoefficient respectively, and their sum is always 1. capacityCoefficient is adjusted dynamically according to actual disk utilization: when the overall disk utilization of a BE is below 50%, the capacityCoefficient value is 0.5; if the disk utilization is above 75% (configurable through the FE configuration item capacity_used_percent_high_water), the value is 1. If the utilization is between 50% and 75%, the weight coefficient increases smoothly. The formula is as follows:
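A sketch of the interpolation implied by the boundary values above (0.5 at 50% utilization and 1 at 75%):
capacityCoefficient = 2 * disk utilization - 0.5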
The weight coefficient ensures that when disk utilization is too high, the backend load score will be higher to ensure that the
BE load is reduced as soon as possible.
Partition
The main idea of partition rebalancing is to decrease the skew of partitions. The skew of the partition is defined as the
difference between the maximum replica count of the partition over all bes and the minimum replica count over all bes.
So we only consider the replica count and do not consider replica size (disk usage). To minimize the number of moves, we use TwoDimensionalGreedyAlgo, whose two dimensions are cluster and partition. It prefers a move that also reduces the skew of the cluster when rebalancing a max-skew partition.
Skew Info
The skew info is represented by ClusterBalanceInfo. partitionInfoBySkew is a multimap whose key is the partition's skew, so the max-skew partitions can be obtained easily. beByTotalReplicaCount is a multimap whose key is the total replica count of the backend.
When there is more than one max-skew partition, one partition is selected at random to calculate the move.
Equilibrium strategy
Tablet Scheduler uses Load Balancer to select a certain number of healthy fragments as candidate fragments for balance in
each round of scheduling. In the next scheduling, balanced scheduling will be attempted based on these candidate
fragments.
Resource control
Both replica repair and balancing are accomplished by replica copies between BEs. If the same BE performs too many tasks
at the same time, it will bring a lot of IO pressure. Therefore, Doris controls the number of tasks that can be performed on
each node during scheduling. The smallest resource control unit is the disk (that is, a data path specified in be.conf). By
default, we configure two slots per disk for replica repair. A clone task occupies one slot at the source and one slot at the
destination. If the number of slots is zero, no more tasks will be assigned to this disk. The number of slots can be configured
by FE's schedule_slot_num_per_path parameter.
In addition, by default, we provide two separate slots per disk for balancing tasks. The purpose is to prevent high-load nodes
from losing space by balancing because slots are occupied by repair tasks.
Tablet state
The replica status of the entire cluster can be viewed with the SHOW PROC '/cluster_health/tablet_health'; command.
(Example output truncated. The row for DbId 10005, default_cluster:doris_audit_db, shows TabletNum 84 and HealthyNum 84, with all other problem counters at 0.)
The HealthyNum column shows how many Tablets are in a healthy state in the corresponding database.
ReplicaCompactionTooSlowNum column shows how many Tablets are in a too many versions state in the corresponding
database, InconsistentNum column shows how many Tablets are in an inconsistent replica state in the corresponding
database. The last Total line summarizes the entire cluster. Normally TabletNum and HealthyNum should be equal. If they are not, you can drill down to see which tablets are affected. As shown in the (truncated) output above, one table in the ssb1 database is not healthy; drilling down shows which tablet it is.
(Example output truncated; the unhealthy tablet in this example has TabletId 14679.)
The figure above shows the specific unhealthy Tablet ID (14679). Later we'll show you how to view the status of each
copy of a specific Tablet.
Users can view the replica status of a specified table or partition with the following command, and filter the status with a WHERE clause. For example, to look at replicas with status OK on partitions p1 and p2 of table tbl1:
ADMIN SHOW REPLICA STATUS FROM tbl1 PARTITION (p1, p2) WHERE STATUS = "OK";
The output shows the status of all replicas. A replica whose IsBad column is true is damaged; the Status column displays other states. For the specific status descriptions, see HELP ADMIN SHOW REPLICA STATUS.
The ADMIN SHOW REPLICA STATUS command is mainly used to view the health status of copies. Users can also view
additional information about copies of a specified table by using the following commands:
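The command body is omitted here; since the output is later said to match SHOW TABLETS FROM tbl1;, an example is:
SHOW TABLETS FROM tbl1;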
TabletId: 29502429
ReplicaId: 29502432
BackendId: 10006
SchemaHash: 1421156361
Version: 2
VersionHash: 0
LstSuccessVersion: 2
LstSuccessVersionHash: 0
LstFailedVersion: -1
LstFailedVersionHash: 0
LstFailedTime: N/A
DataSize: 784
RowCount: 0
State: NORMAL
LstConsistencyCheckTime: N/A
CheckVersion: -1
CheckVersionHash: -1
VersionCount: 2
PathHash: -5822326203532286804
MetaUrl: url
CompactionStatus: url
The output above shows some additional information, including replica size, row count, version count, and the hash of the data path where the replica is located.
Note: The State column shown here does not represent the health status of the replica, but the status of the replica under certain tasks, such as CLONE, SCHEMA CHANGE, ROLLUP, etc.
In addition, users can check the distribution of replicas in a specified table or partition with the following command.
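For example, using the tbl1 table from the earlier examples (ADMIN SHOW REPLICA DISTRIBUTION is the statement for this):
ADMIN SHOW REPLICA DISTRIBUTION FROM tbl1;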
+-----------+------------+-------+---------+
| BackendId | ReplicaNum | Graph | Percent |
+-----------+------------+-------+---------+
| 10000     | 7          |       | 7.29 %  |
| 10001     | 9          |       | 9.38 %  |
| 10002     | 7          |       | 7.29 %  |
| 10003     | 7          |       | 7.29 %  |
| 10004     | 9          |       | 9.38 %  |
+-----------+------------+-------+---------+
Here we show the number and percentage of replicas of table tbl1 on each BE node, as well as a simple graphical display.
When we want to locate a specific Tablet, we can use the following command to view its status.
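For example, to check the tablet with ID 29502553 (the id matches the proc path shown below; SHOW TABLET is the statement used for this):
SHOW TABLET 29502553;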
The output shows the database, table, partition, rollup index and other information corresponding to this tablet. The user can copy the command from the DetailCmd column and continue executing it:
SHOW PROC '/DBS/29502391/29502428/Partitions/29502427/29502428/29502553';
| 43734060 | 10004 | 2 | 0 | -1 | 0 | -1 | 0 | N/A | -1 | 784 | 0 | NORMAL | false | 2 | -8566523878520798656 |
| 29502555 | 10002 | 2 | 0 |  2 | 0 | -1 | 0 | N/A | -1 | 784 | 0 | NORMAL | false | 2 |  1885826196444191611 |
| 39279319 | 10007 | 2 | 0 | -1 | 0 | -1 | 0 | N/A | -1 | 784 | 0 | NORMAL | false | 2 |  1656508631294397870 |
The output above shows all replicas of the corresponding Tablet. The content shown here is the same as that of SHOW TABLETS FROM tbl1;, but here you can clearly see the status of all replicas of a specific Tablet.
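The command that produces the following columns is not shown here; the tablets waiting to be scheduled (and, analogously, the running and historical tasks under the same proc directory) can typically be viewed with:
SHOW PROC '/cluster_balance/pending_tablets';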
The columns in the result are: TabletId, Type, Status, State, OrigPrio, DynmPrio, SrcBe, SrcPath, DestBe, DestPath, Timeout, Create, LstSched, LstVisit, Finished, Rate, FailedSched, FailedRunning, LstAdjPrio, VisibleVer, VisibleVerHash, CmtVer, CmtVerHash, ErrMsg.
TabletId: The ID of the Tablet waiting to be scheduled. A scheduling task handles only one Tablet.
By default, only the last 1,000 completed tasks are kept. The columns in the result have the same meaning as in pending_tablets. If the State column is FINISHED, the task completed normally. For other states, you can find the specific reason in the ErrMsg column.
You can view the current load of the cluster with the following command:
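The command itself is not included in the text; the cluster load statistics are commonly exposed under the cluster_balance proc directory, for example:
SHOW PROC '/cluster_balance/cluster_load_stat';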
+---------------+
| StorageMedium |
+---------------+
| HDD |
| SSD |
+---------------+
Click on a storage medium to see the equilibrium state of the BE node that contains the storage medium:
Class: Classification by load, LOW/MID/HIGH. Balanced scheduling moves replicas from high-load nodes to low-load nodes.
Users can further view the utilization of each path on a BE, such as the BE with ID 10001:
The disk usage of each data path on the specified BE is shown here.
2. Scheduling resources
Users can view the current slot usage of each node through the following commands:
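A hedged example of such a command, using the working_slots entry under the cluster_balance proc directory:
SHOW PROC '/cluster_balance/working_slots';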
Here, the data path is used as the granularity to show the current use of slots. AvgRate is the historically measured copy rate of clone tasks on the path, in bytes/second.
The following command allows you to view the priority-repair tables or partitions set by the ADMIN REPAIR TABLE command.
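For example (the proc path is the commonly documented entry for priority repairs):
SHOW PROC '/cluster_balance/priority_repair';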
Here, RemainingTimeMs indicates that these priority repairs will be automatically removed from the priority-repair queue after this time, in order to prevent resources from staying occupied when a priority repair keeps failing.
We have collected some statistics of Tablet Checker and Tablet Scheduler during their operation, which can be viewed
through the following commands:
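For example (sched_stat under the cluster_balance proc directory is the commonly documented entry for these statistics):
SHOW PROC '/cluster_balance/sched_stat';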
+---------------------------------------------------+-------------+
| Item | Value |
+---------------------------------------------------+-------------+
+---------------------------------------------------+-------------+
num of replica missing error: the number of tablets whose checked status is replica missing
num of replica version missing error: the number of tablets whose checked status is version missing (this statistic includes num of replica relocating and num of replica missing in cluster error)
num of replica relocating: the number of tablets whose checked status is replica relocating
num of replica redundant error: the number of tablets whose checked status is replica redundant
num of replica missing in cluster error: the number of tablets whose checked status is not in the corresponding cluster
Note: The above values are only historical cumulative values. We also print these statistics regularly in the FE logs, where the values in parentheses represent the change in each statistic since the last time the statistics were printed.
Adjustable parameters
The following adjustable parameters are all configurable parameters in fe.conf.
use_new_tablet_scheduler
Description: Whether to enable the new replica scheduling mode. The new replica scheduling mode is the replica scheduling method introduced in this document. If enabled, disable_colocate_join must be true, because the new scheduling strategy does not yet support scheduling the data shards of colocation tables.
Default value:true
Importance: High
tablet_repair_delay_factor_second
Note: For different scheduling priorities, we delay the start of the repair by different amounts of time, in order to prevent a large number of unnecessary replica repair tasks from being generated during routine restarts and upgrades. This parameter is a reference coefficient. For HIGH priority, the delay is the reference coefficient * 1; for NORMAL priority, the reference coefficient * 2; for LOW priority, the reference coefficient * 3. That is, the lower the priority, the longer the delay. If the user wants replicas to be repaired as soon as possible, this parameter can be reduced appropriately.
Default value: 60 seconds
Importance: High
schedule_slot_num_per_path
Note: The default number of slots allocated to each disk for replica repair. This number represents the number of replica repair tasks that a disk can run simultaneously. If you want replicas to be repaired faster, you can increase this parameter appropriately. The higher the value, the greater the impact on IO.
Default value: 2
Importance: High
balance_load_score_threshold
Description: Threshold of cluster balance. The default is 0.1, i.e. 10%. When the load score of a BE node is no more than 10% higher or lower than the average load score, we consider the node to be balanced. If you want to make the cluster load more even, you can adjust this parameter appropriately.
storage_high_watermark_usage_percent and storage_min_left_capacity_bytes
Description: These two parameters represent the upper limit of the maximum space utilization of a disk and the
lower limit of the minimum space remaining, respectively. When the space utilization of a disk is greater than the
upper limit or the remaining space is less than the lower limit, the disk will no longer be used as the destination
address for balanced scheduling.
disable_balance
Description: Controls whether to turn off balanced scheduling. While replicas are being balanced, some operations, such as ALTER TABLE, are disabled. Balancing can last a long time, so if you want to perform the prohibited operations as soon as possible, you can set this parameter to true to turn off balanced scheduling.
Default value: false
Importance:
clone_worker_count
Description: Affects the speed of replica balancing. When disk pressure is low, you can speed up replica balancing by increasing this parameter.
Default: 3
Importance: Medium
Unadjustable parameters
The following parameters do not support modification for the time being, just for illustration.
The maximum number of waiting tasks and running tasks is 2000. When over 2000, Tablet Checker will no longer
generate new scheduling tasks to Tablet Scheduler.
The maximum number of balanced tasks is 500. When more than 500, there will be no new balancing tasks.
The number of slots per disk for balancing tasks is 2. This slot is independent of the slot used for replica repair.
Tablet Scheduler recalculates the load score of the cluster every 20 seconds.
A clone task timeout time range is 3 minutes to 2 hours. The specific timeout is calculated by the size of the tablet. The
formula is (tablet size)/ (5MB/s). When a clone task fails three times, the task terminates.
The minimum priority adjustment interval is 5 minutes. When a tablet schedule fails five times, priority is lowered. When
a tablet is not scheduled for 30 minutes, priority is raised.
Relevant issues
In some cases, the default replica repair and balancing strategy may cause the network to be full (mostly in the case of
gigabit network cards and a large number of disks per BE). At this point, some parameters need to be adjusted to reduce
the number of simultaneous balancing and repair tasks.
The current balancing strategy for replicas of a colocate table cannot guarantee that replicas of the same Tablet will not be distributed on BEs of the same host. However, the repair strategy for colocate table replicas detects this distribution error and corrects it. It may then happen that, after correction, the balancing strategy regards the replicas as unbalanced and rebalances them, so the colocate group never reaches a stable state because of the continuous alternation between the two states. In this situation, we suggest that when using the colocate attribute, you try to keep the cluster homogeneous, to reduce the probability of replicas being distributed on the same host.
Best Practices
Control and manage the progress of replica repair and balancing of clusters
In most cases, Doris can automatically perform replica repair and cluster balancing by default parameter configuration.
However, in some cases, we need to manually intervene to adjust the parameters to achieve some special purposes. Such as
prioritizing the repair of a table or partition, disabling cluster balancing to reduce cluster load, prioritizing the repair of non-
colocation table data, and so on.
This section describes how to control and manage the progress of replica repair and balancing of the cluster by modifying
the parameters.
In some cases, Doris may not be able to automatically detect some corrupted replicas, resulting in frequent query or import errors on those replicas. In this case, the corrupted replicas need to be deleted manually. This method can be used to delete a replica with a high version count that causes a -235 error, to delete a replica whose files are corrupted, and so on.
First, find the tablet id of the corresponding replica, say 10001, use show tablet 10001; and then execute the show proc statement it returns to see the details of each replica of the corresponding tablet.
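A minimal sketch of these two steps; the proc path below is hypothetical and should be replaced with the DetailCmd actually returned by SHOW TABLET:
SHOW TABLET 10001;
-- then run the DetailCmd returned in the result, for example:
SHOW PROC '/dbs/10004/10016/partitions/10015/10017/10001';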
Assuming that the backend id of the replica to be deleted is 20001, execute the following statement to mark the replica as bad:
ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10001", "backend_id" = "20001", "status" = "bad");
At this point, the show proc statement will show that the IsBad column of the corresponding replica has the value true.
The replica marked as bad will no longer participate in imports and queries, and the replica repair logic will automatically replenish a new replica.
2. Repair a table or partition with priority
You can view the help with help admin repair table;. This command attempts to repair the tablets of the specified table or partition with priority.
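For example, to repair partitions p1 and p2 of table tbl1 with priority (table and partition names are taken from the earlier examples):
ADMIN REPAIR TABLE tbl1 PARTITION (p1, p2);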
The balancing task will take up some network bandwidth and IO resources. If you wish to stop the generation of new
balancing tasks, you can do so with the following command.
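A hedged example, using the disable_balance configuration item described in the parameter section above:
ADMIN SET FRONTEND CONFIG ("disable_balance" = "true");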
Replica scheduling tasks include both balancing and repair tasks. These tasks take up some network bandwidth and IO resources. All replica scheduling tasks (excluding those already running, including those for colocation tables and common tables) can be stopped with the following command.
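A sketch of such a command; the configuration name disable_tablet_scheduler is an assumption based on the commonly documented FE config for switching off the tablet scheduler:
ADMIN SET FRONTEND CONFIG ("disable_tablet_scheduler" = "true");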
Colocation table replica scheduling runs separately and independently from that of regular tables. In some cases, users may wish to stop the balancing and repair of colocation tables first and devote the cluster resources to normal table repair, with the following command.
ADMIN SET FRONTEND CONFIG ("disable_colocate_balance" = "true");
Doris automatically repairs replicas when it detects missing replicas, BE downtime, etc. However, in order to reduce
some errors caused by jitter (e.g., BE being down briefly), Doris delays triggering these tasks.
This is controlled by the tablet_repair_delay_factor_second parameter, which defaults to 60 seconds. Depending on the priority of the repair task, the repair is delayed by 60, 120, or 180 seconds. This time can be extended so that longer exceptions can be tolerated and unnecessary repair tasks are avoided, for example:
ADMIN SET FRONTEND CONFIG ("tablet_repair_delay_factor_second" = "120");
Doris' replica balancing logic first adds a new replica and then deletes the old one in order to migrate a replica. When deleting the old replica, Doris waits for the import tasks that have already started on that replica to finish, to avoid the balancing task affecting those imports. However, this slows down the execution of the balancing logic. In this case, you can make Doris skip this wait and delete the old replica directly by modifying the following parameter.
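A sketch of the corresponding command; the configuration name enable_force_drop_redundant_replica is an assumption and should be verified against the FE configuration list of your version:
ADMIN SET FRONTEND CONFIG ("enable_force_drop_redundant_replica" = "true");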
This operation may cause some import tasks to fail during balancing (requiring a retry), but it will speed up balancing
significantly.
Overall, when we need to bring the cluster back to a normal state quickly, consider handling it along the following lines:
1. Find the tablets that cause high-priority tasks to report errors and set the problematic replicas to bad.
2. Repair some tables with the admin repair statement.
3. Stop the replica balancing logic to avoid taking up cluster resources, and turn it on again after the cluster is restored.
4. Use a more conservative strategy to trigger repair tasks to deal with the avalanche effect caused by frequent BE downtime.
5. Turn off scheduling tasks for colocation tables on demand and focus cluster resources on repairing other high-priority data.
OLAP_SUCCESS 0 Success
OLAP_ERR_UB_FUNC_ERROR -110
OLAP_ERR_TOO_MANY_VERSION -235 The tablet data version exceeds the maximum limit (default 500)
OLAP_ERR_PUSH_BUILD_DELTA_ERROR -907 The pushed incremental file has an incorrect checksum
OLAP_ERR_PUSH_VERSION_ALREADY_EXIST -908 PUSH version already exists
OLAP_ERR_HEADER_FLAG_PUT -1409
OLAP_ERR_ALTER_DELTA_DOES_NOT_EXISTS -1601 Failed to get all data sources, Tablet has no version
OLAP_ERR_ALTER_STATUS_ERR -1602 Failed to check the row number, internal sorting failed, or row block sorting failed; these will return this code
1005 Failed to create the table, give the specific reason in the returned error message
1007 The database already exists, you cannot create a database with the same name
1045 The user name and password do not match, and the system cannot be accessed
1096 The query statement does not specify the data table to be queried or operated
1113 The column set of the newly created table cannot be empty
1115 Unsupported character set used
1141 When revoking user permissions, the permissions that the user does not have are specified
1203 The number of active connections used by the user exceeds the limit
1353 SELECT and view field lists have different number of columns
1364 The field does not allow NULL values, but no default value is set
1507 Delete a partition that does not exist, and no condition is specified to delete it if it exists
1508 Unable to delete all partitions, please use DROP TABLE instead
1735 The specified partition name does not exist in the table
1748 You cannot insert data into a table with empty partitions. Use "SHOW PARTITIONS FROM tbl" to view the current partitions of this table
5002 The column name must be explicitly specified in the column substitution
5003 The Key column should be sorted before the Value column
5009 The PARTITION clause is invalid for INSERT into an unpartitioned table
5010 The number of columns is not equal to the number of select lists in the SELECT statement
5011 Unable to resolve table reference
5026 An unsupported data type was used when creating a table with a SELECT statement
5027 The specified parameter is not set
5037 Before deleting the cluster, all databases in the cluster must be deleted
5037 The BE node with this ID does not exist in the cluster
5038 No cluster name specified
5053 There is no migration task from the source database to the target database
5054 The specified database is connected to the target database, or data is being migrated
5055 Data connection or data migration cannot be performed in the same cluster
5056 The database cannot be deleted: it is linked to another database or data is being migrated
5056 The database cannot be renamed: it is linked to another database or data is being migrated
5063 The data types of the partition columns of the Colocate table must be consistent
5065 The specified time unit is illegal. The correct units include: HOUR / DAY / WEEK / MONTH
5066 The starting value of the dynamic partition should be less than 0
5066 The dynamic partition start value is not a valid number
5066 The end value of the dynamic partition should be greater than 0
5066 The end value of the dynamic partition is not a valid number
5074 Create historical dynamic partition parameters: create_history_partition is invalid, what is expected is: true or false
5079 The specified dynamic partition reserved_history_periods' first date is larger than the second one
Background
In the latest version of the code, we introduced RocksDB in BE to store meta-information of tablet, in order to solve various
functional and performance problems caused by storing meta-information through header file. Currently, each data
directory (root path) has a corresponding RocksDB instance, in which all tablets on the corresponding root path are stored in
the key-value manner.
To facilitate the maintenance of these metadata, we provide an online HTTP interface and an offline meta tool to complete
related management operations.
The HTTP interface is only used to view tablet metadata online, and can be used when the BE process is running.
However, meta tool is only used for off-line metadata management operations. BE must be stopped before it can be used.
Operation
Online
Access BE's HTTP interface to obtain the corresponding Tablet Meta information:
api:
http://{host}:{port}/api/meta/header/{tablet_id}/{schema_hash}
host: the BE hostname; port: the BE http port
Give an example:
http://be_host:8040/api/meta/header/14156/2458238340
If the final query is successful, the Tablet Meta will be returned as json.
Offline
Get the Tablet Meta on a disk with the meta_tool tool.
Command:
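The command body is omitted here; a sketch of the typical invocation, assuming the tool is run from the BE deployment directory and that the flags follow the usual meta_tool conventions (check ./lib/meta_tool --help for your version):
./lib/meta_tool --root_path=/path/to/root_path --operation=get_meta --tablet_id=xxx --schema_hash=xxx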
Load header
The function of loading header is provided to realize manual migration of tablet. This function is based on Tablet Meta in JSON
format, so if changes in the shard field and version information are involved, they can be changed directly in the JSON
content of Tablet Meta. Then use the following commands to load.
Command:
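The command body is omitted here; a sketch under the same assumptions about meta_tool flags, where the JSON file is the edited Tablet Meta:
./lib/meta_tool --root_path=/path/to/root_path --operation=load_meta --json_meta_path=/path/to/tablet_meta.json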
Delete header
In order to realize the function of deleting a tablet meta from a disk of a BE. Support single delete and batch delete.
Single delete:
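A sketch of the single-delete invocation (flags are assumptions; verify with ./lib/meta_tool --help):
./lib/meta_tool --root_path=/path/to/root_path --operation=delete_meta --tablet_id=xxx --schema_hash=xxx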
Batch delete:
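A sketch of the batch-delete invocation (the per-tablet root_path comes from the file itself; flags are assumptions):
./lib/meta_tool --operation=batch_delete_meta --tablet_file=/path/to/tablet_file.txt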
Each line in tablet_file.txt represents the information of a tablet. The format is:
root_path,tablet_id,schema_hash
tablet_file example:
/output/be/data/,14217,352781111
/output/be/data/,14219,352781111
/output/be/data/,14223,352781111
/output/be/data/,14227,352781111
/output/be/data/,14233,352781111
/output/be/data/,14239,352781111
Batch delete will skip the line with incorrect tablet information format in tablet_file . And after the execution is completed,
the number of successful deletions and the number of errors are displayed.
TabletMeta in Pb format
This command is to view the old file-based management PB format Tablet Meta, and to display Tablet Meta in JSON format.
Command:
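The command body is omitted here; a sketch of the typical invocation (operation name and flag are assumptions; verify with ./lib/meta_tool --help):
./lib/meta_tool --operation=show_meta --pb_meta_path=/path/to/pb_meta_file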
Segment meta in Pb format
This command is to view the PB format segment meta and display it in JSON format.
Command:
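The command body is omitted here; a sketch of the typical invocation (operation name and flag are assumptions; verify with ./lib/meta_tool --help):
./lib/meta_tool --operation=show_segment_footer --file=/path/to/segment_file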
Note: Before 0.9.0 (excluding), please use revision 1. For version 0.9.x, use revision 2. For version 0.10.x, use revision 3.
Dashboard templates are updated from time to time. The way to update the template is shown in the last section.
Components
Doris uses Prometheus and Grafana to collect and display input monitoring items.
1. Prometheus
Prometheus is an open source system monitoring and alarm suite. It can collect monitored items by Pull or Push and
store them in its own time series database. And through the rich multi-dimensional data query language, to meet the
different data display needs of users.
2. Grafana
Grafana is an open source data analysis and display platform. Support multiple mainstream temporal database sources
including Prometheus. Through the corresponding database query statements, the display data is obtained from the
data source. With flexible and configurable dashboard, these data can be quickly presented to users in the form of
graphs.
Note: This document only provides a way to collect and display Doris monitoring data using Prometheus and Grafana. In principle, we do not develop or maintain these components. For more details on them, please refer to their official documentation.
Monitoring data
Doris's monitoring data is exposed through the HTTP interface of Frontend and Backend. Monitoring data is presented in the
form of key-value text. Each Key may also be distinguished by different Labels. When the user has built Doris, the monitoring
data of the node can be accessed in the browser through the following interfaces:
Frontend: fe_host:fe_http_port/metrics
Backend: be_host:be_web_server_port/metrics
Users will see the following monitoring item results (for example, FE partial monitoring items):
```
jvm_heap_size_bytes{type="max"} 41661235200
jvm_heap_size_bytes{type="committed"} 19785285632
jvm_heap_size_bytes{type="used"} 10113221064
jvm_non_heap_size_bytes{type="committed"} 105295872
jvm_non_heap_size_bytes{type="used"} 103184784
jvm_young_size_bytes{type="used"} 6505306808
jvm_young_size_bytes{type="peak_used"} 10308026368
jvm_young_size_bytes{type="max"} 10308026368
jvm_old_size_bytes{type="used"} 3522435544
jvm_old_size_bytes{type="peak_used"} 6561017832
jvm_old_size_bytes{type="max"} 30064771072
jvm_direct_buffer_pool_size_bytes{type="count"} 91
jvm_direct_buffer_pool_size_bytes{type="used"} 226135222
jvm_direct_buffer_pool_size_bytes{type="capacity"} 226135221
jvm_young_gc{type="count"} 2186
jvm_young_gc{type="time"} 93650
jvm_old_gc{type="time"} 58268
jvm_thread{type="peak_count"} 831
...
```
This is monitoring data presented in Prometheus format. We take one of these monitoring items as an example to illustrate:
jvm_heap_size_bytes{type="max"} 41661235200
jvm_heap_size_bytes{type="committed"} 19785285632
jvm_heap_size_bytes{type="used"} 10113221064
1. Lines beginning with "#" are comment lines. HELP is the description of the monitored item; TYPE represents the data type of the monitored item, which is Gauge (scalar data) in this example. There are also Counter, Histogram and other data types; see the Prometheus official documentation for details.
2. jvm_heap_size_bytes is the name of the monitored item (Key); type= "max" is a label named type , with a value of max .
A monitoring item can have multiple Labels.
3. The final number, such as 41661235200 , is the monitored value.
Monitoring Architecture
The entire monitoring architecture is shown in the following figure:
1. The yellow part consists of Prometheus-related components. Prometheus Server is the main process of Prometheus. Currently, Prometheus accesses the monitoring interfaces of the Doris nodes by pull, and then stores the time series data in the time series database TSDB (TSDB is included in the Prometheus process and does not need to be deployed separately). Prometheus also supports setting up a Push Gateway, allowing the monitored systems to push data to the Push Gateway, from which Prometheus Server then pulls the data.
2. Alert Manager is a Prometheus alarm component, which needs to be deployed separately (no solution is provided yet,
but can be built by referring to official documents). Through Alert Manager, users can configure alarm strategy, receive
mail, short messages and other alarms.
3. The green part is Grafana related components. Grafana Server is the main process of Grafana. After startup, users can
configure Grafana through Web pages, including data source settings, user settings, Dashboard drawing, etc. This is also
where end users view monitoring data.
Start building
Please start building the monitoring system after you have completed the deployment of Doris.
Prometheus
1. Download the latest version of Prometheus on the Prometheus Website. Here we take version 2.3.2-linux-amd64 as an
example.
2. Unzip the downloaded tar file on the machine that is ready to run the monitoring service.
3. Open the configuration file prometheus.yml. Here we provide an example configuration and explain it (the configuration
file is in YML format, pay attention to uniform indentation and spaces):
Here we use the simplest way of static files to monitor configuration. Prometheus supports a variety of service discovery,
which can dynamically sense the addition and deletion of nodes.
# my global config
global:
  evaluation_interval: 15s   # Global rule trigger interval, default 1m, set to 15s here

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'PALO_CLUSTER' # Each Doris cluster is called a job. The job can be given a name here, which serves as the name of the Doris cluster in the monitoring system.
    metrics_path: '/metrics' # The restful API from which the monitoring items are fetched. Combined with host:port in the targets below, Prometheus collects monitoring items from host:port/metrics_path.
    static_configs: # Configure the target addresses of FE and BE here. All FEs and BEs are written into their respective groups.
      - targets: ['fe_host1:8030', 'fe_host2:8030', 'fe_host3:8030'] # example FE addresses (http_port)
        labels:
          group: fe # The fe group, which contains three Frontends
      - targets: ['be_host1:8040', 'be_host2:8040', 'be_host3:8040'] # example BE addresses (webserver_port)
        labels:
          group: be # The be group, which contains three Backends

  - job_name: 'PALO_CLUSTER_2' # Multiple Doris clusters can be monitored by one Prometheus. Here begins the configuration of another Doris cluster; it is the same as above and abbreviated below.
    metrics_path: '/metrics'
    static_configs:
      - targets: ['other_fe_host1:8030'] # example
        labels:
          group: fe
      - targets: ['other_be_host1:8040'] # example
        labels:
          group: be
4. start Prometheus
Start Prometheus with the following command:
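The command body is missing here; a minimal sketch consistent with the description below (run from the unpacked Prometheus directory; 8181 is the web port mentioned next):
nohup ./prometheus --web.listen-address="0.0.0.0:8181" &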
This command will run Prometheus in the background and specify its Web port as 8181. After startup, data is collected
and stored in the data directory.
5. stop Prometheus
At present, there is no formal way to stop the process; just kill it directly with kill -9. Of course, Prometheus can also be set up as a service and started and stopped as a service.
6. access Prometheus
Prometheus can be easily accessed through web pages. The page of Prometheus can be accessed by opening port 8181
through browser. Click on the navigation bar, Status -> Targets , and you can see all the monitoring host nodes of the
grouped Jobs. Normally, all nodes should be UP , indicating that data acquisition is normal. Click on an Endpoint to see
the current monitoring value. If the node state is not UP, you can first access Doris's metrics interface (see previous
article) to check whether it is accessible, or query Prometheus related documents to try to resolve.
7. So far, a simple Prometheus has been built and configured. For more advanced usage, see Official Documents
Grafana
1. Download the latest version of Grafana on Grafana's official website. Here we take version 5.2.1.linux-amd64 as an
example.
2. Unzip the downloaded tar file on the machine that is ready to run the monitoring service.
3. Open the configuration file conf/defaults.ini. Here we only list the configuration items that need to be changed, and the
other configurations can be used by default.
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
data = data
logs = data/log
protocol = http
http_addr =
http_port = 8182
4. start Grafana
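The command body is omitted here; a minimal sketch, assuming it is run from the Grafana installation directory:
nohup ./bin/grafana-server &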
This command runs Grafana in the background, and the access port is 8182 configured above.
5. stop Grafana
At present, there is no formal way to stop the process; just kill it directly with kill -9. Of course, you can also set Grafana up as a service and start and stop it as a service.
6. access Grafana
Through the browser, open port 8182, you can start accessing the Grafana page. The default username password is
admin.
7. Configure Grafana
When you log in for the first time, you need to set up the data source according to the prompt. Our data source here is the Prometheus instance configured in the previous step.
vi. Click Save & Test at the bottom. If Data source is working , it means that the data source is available.
vii. After confirming that the data source is available, click the + sign in the left navigation bar and start adding a Dashboard. Here we have prepared Doris's dashboard template (at the beginning of this document). When the download is complete, click New dashboard -> Import dashboard -> Upload .json File to import the downloaded JSON file.
viii. After importing, you can keep the default Dashboard name Doris Overview. At the same time, you need to select the data source, where you select the doris_monitor_data_source you created earlier.
ix. Click Import to complete the import. Later, you can see Doris's dashboard display.
8. So far, a simple Grafana has been built and configured. For more advanced usage, see Official Documents
Dashboard
Here we briefly introduce Doris Dashboard. The content of Dashboard may change with the upgrade of version. This
document is not guaranteed to be the latest Dashboard description.
1. Top Bar
Cluster name: Each job name in the Prometheus configuration file represents a Doris cluster. Select a different
cluster, and the chart below shows the monitoring information for the corresponding cluster.
Interval: Some charts show rate-related monitoring items, where you can select the sampling interval used to calculate the rate (Note: a 15s interval may cause some charts to be unable to display).
2. Row.
In Grafana, the concept of Row is a set of graphs. As shown in the figure above, Overview and Cluster Overview are two
different Rows. Row can be folded by clicking Row. Currently Dashboard has the following Rows (in continuous
updates):
3. Charts
i. Hover the I icon in the upper left corner of the mouse to see the description of the chart.
ii. Click on the illustration below to view a monitoring item separately. Click again to display all.
iii. Dragging in the chart can select the time range.
iv. The selected cluster name is displayed in the [] of the title.
v. Some values correspond to the left Y-axis and some to the right, which can be distinguished by the -right suffix at the end of the legend.
vi. Click on the name of the chart -> Edit to edit the chart.
Dashboard Update
1. Click the + sign in the left column of Grafana and select Dashboard.
2. Click New dashboard in the upper left corner, and Import dashboard appears on the right.
3. Click Upload .json File to select the latest template file.
4. Selecting Data Sources
Tablet Local Debug
When a problem occurs with a tablet online, it is necessary to copy the replica data of the tablet from the online cluster to the local environment to reproduce and then locate the problem.
DbName: default_cluster:db1
TableName: tbl1
PartitionName: tbl1
IndexName: tbl1
DbId: 10004
TableId: 10016
PartitionId: 10015
IndexId: 10017
IsSync: true
Order: 1
ReplicaId: 10021
BackendId: 10003
Version: 3
LstSuccessVersion: 3
LstFailedVersion: -1
LstFailedTime: NULL
SchemaHash: 785778507
LocalDataSize: 780
RemoteDataSize: 0
RowCount: 2
State: NORMAL
IsBad: false
VersionCount: 3
PathHash: 7390150550643804973
MetaUrl: http://192.168.10.1:8040/api/meta/header/10020
CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10020
TabletId: 10020
BackendId: 10003
Ip: 192.168.10.1
Path: /path/to/be/storage/snapshot/20220830101353.2.3600
ExpirationMinutes: 60
) ENGINE=OLAP
PROPERTIES (
"replication_num" = "1",
"version_info" = "2"
);
The admin copy tablet command can generate a snapshot file of the corresponding replica and version for the specified
tablet. Snapshot files are stored in the Path directory of the BE node indicated by the Ip field.
There will be a directory named tablet id under this directory, which will be packaged as a whole for later use. (Note that the
directory is kept for a maximum of 60 minutes, after which it is automatically deleted).
cd /path/to/be/storage/snapshot/20220830101353.2.3600
The command will also generate the table creation statement corresponding to the tablet. Note that this is not the original table creation statement: its bucket number and replica number are both 1, and the versionInfo field is specified. This table creation statement is used later when loading the tablet locally.
So far, we have obtained all the necessary information, the list is as follows:
1. Deploy a single-node Doris cluster (1 FE, 1 BE) locally, with the same version as the online cluster. For example, if the online cluster runs DORIS-1.1.1, the local environment also deploys DORIS-1.1.1.
2. Create a table
Create a table in the local environment using the create table statement from the previous step.
Because the number of buckets and replicas of the newly created table is 1, there will only be one tablet with one replica:
TabletId: 10017
ReplicaId: 10018
BackendId: 10003
SchemaHash: 44622287
Version: 1
LstSuccessVersion: 1
LstFailedVersion: -1
LstFailedTime: NULL
LocalDataSize: 0
RemoteDataSize: 0
RowCount: 0
State: NORMAL
LstConsistencyCheckTime: NULL
CheckVersion: -1
VersionCount: -1
PathHash: 7390150550643804973
MetaUrl: http://192.168.10.1:8040/api/meta/header/10017
CompactionStatus: http://192.168.10.1:8040/api/compaction/show?tablet_id=10017
DbName: default_cluster:db1
TableName: tbl1
PartitionName: tbl1
IndexName: tbl1
DbId: 10004
TableId: 10015
PartitionId: 10014
IndexId: 10016
IsSync: true
Order: 0
From this output, we mainly need the following information: TableId, PartitionId, TabletId, SchemaHash.
At the same time, we also need to go to the data directory of the BE node in the debugging environment to confirm the
shard id where the new tablet is located:
This command will enter the directory where the tablet 10017 is located and display the path. Here we will see a path
similar to the following:
/path/to/storage/data/0/10017
Unzip the tablet data package obtained in the first step. The editor opens the 10017.hdr.json file, and modifies the
following fields to the information obtained in the previous step:
"table_id":10015
"partition_id":10014
"tablet_id":10017
"schema_hash":44622287
"shard_id":0
First, stop the BE process of the debug environment (./bin/stop_be.sh). Then copy all the .dat files in the same directory as the 10017.hdr.json file into the /path/to/storage/data/0/10017/44622287 directory. This is the directory of the debug-environment tablet obtained in step 3; 10017 and 44622287 are the tablet id and schema hash respectively.
Delete the original tablet meta with the meta_tool tool. The tool is located in the be/lib directory.
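A sketch of the delete command for this concrete example; the flags follow the typical meta_tool usage assumed earlier (verify with ./lib/meta_tool --help):
./lib/meta_tool --root_path=/path/to/storage --operation=delete_meta --tablet_id=10017 --schema_hash=44622287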
Here /path/to/storage is the data root directory of the BE. If the deletion is successful, a delete successfully log message will appear.
6. Verification
Restart the BE process of the debug environment (./bin/start_be.sh) and query the table. If everything is correct, you can query the data of the loaded tablet or reproduce the online problem.
To avoid losing user data through misoperation, tablet data deleted by users is not deleted directly but is kept in the recycle bin for a period of time. After that period, a regular cleaning mechanism deletes the expired data. The data in the recycle bin includes: the tablet data files (.dat), the tablet index files (.idx) and the tablet metadata file (.hdr). The data is stored in a path of the following format:
/root_path/trash/time_label/tablet_id/schema_hash/
root_path: a data root directory of the corresponding BE node.
trash: the directory of the recycle bin.
time_label: a time label that keeps the data directory in the recycle bin unique and records the time of the data; it is used as a subdirectory.
When a user finds that online data has been deleted by mistake, the deleted tablet needs to be recovered from the recycle bin. This is what the tablet data recovery function is for.
BE provides http interface and restore_tablet_tool.sh script to achieve this function, and supports single tablet operation
(single mode) and batch operation mode (batch mode).
Operation
single mode
1. http request method
BE provides an http interface for single tablet data recovery, the interface is as follows:
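The interface itself is not reproduced above; based on the script usage shown below, the BE restore_tablet API can be invoked for example as follows (host, port, tablet id and schema hash are placeholders):
curl -X POST "http://127.0.0.1:8040/api/restore_tablet?tablet_id=12345&schema_hash=11111"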
If it fails, the corresponding failure reason will be returned. One possible result is as follows:
2. Script mode
restore_tablet_tool.sh can be used to realize the function of single tablet data recovery.
sh tools/restore_tablet_tool.sh -b "http://127.0.0.1:8040" -t 12345 -s 11111
batch mode
The batch recovery mode is used to realize the function of recovering multiple tablet data.
When using, you need to put the restored tablet id and schema hash in a file in a comma-separated format in advance, one
tablet per line.
12345,11111
12346,11111
12347,11111
Then perform the recovery with the following command (assuming the file name is: tablets.txt ):
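The command itself is omitted; a sketch, assuming the batch mode of restore_tablet_tool.sh accepts the file through a -f option (check the script's usage output to confirm):
sh tools/restore_tablet_tool.sh -b "http://127.0.0.1:8040" -f tablets.txt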
Note: This operation is only used to avoid the problem of error reporting due to the inability to find a queryable replica, and it
is impossible to recover the data that has been substantially lost.
If there is data loss, there will be a log similar to the following in the log:
backend [10001] invalid situation. tablet[20000] has few replica[1], replica num setting is [3]
This log indicates that all replicas of tablet 20000 have been damaged or lost.
After confirming that the data cannot be recovered, you can execute the following command to generate blank replicas.
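The command referred to here is not included in the text. In the FE configuration this behavior is commonly controlled by recover_with_empty_tablet; treat the name as an assumption and verify it as described in the note below:
ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "true");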
Note: You can first check whether the current version supports this parameter through the ADMIN SHOW FRONTEND
CONFIG; command.
3. A few minutes after the setup is complete, you should see the following log in the Master FE log fe.log :
tablet 20000 has only one replica 20001 on backend 10001 and it is lost. create an empty replica to recover
it.
The log indicates that the system has created a blank tablet to fill in the missing replica.
4. Judge whether it has been repaired successfully through query.
Before proceeding, please read the Doris metadata design document to understand how Doris metadata works.
Important tips
Current metadata design is not backward compatible. That is, if the new version has a new metadata structure change
(you can see whether there is a new VERSION in the FeMetaVersion.java file in the FE code), it is usually impossible to
roll back to the old version after upgrading to the new version. Therefore, before upgrading FE, be sure to test metadata
compatibility according to the operations in the Upgrade Document.
/path/to/doris-meta/
|-- bdb/
| |-- 00000000.jdb
| |-- je.config.csv
| |-- je.info.0
| |-- je.info.0.lck
| |-- je.lck
| `-- je.stat.csv
`-- image/
|-- ROLE
|-- VERSION
`-- image.xxxx
1. bdb
We use bdbje as a distributed key-value store to hold the metadata journal. This bdb directory is equivalent to the "data directory" of bdbje.
Files with the .jdb suffix are the data files of bdbje. These data files grow as metadata journals accumulate. When Doris periodically completes an image, the old journals are deleted, so normally the total size of these data files ranges from several MB to several GB (depending on how Doris is used, such as the import frequency). When the total size of the data files is larger than 10GB, you may need to check whether image generation failed, or whether distributing the image failed so that the historical journals could not be deleted.
je.info.0 is the running log of bdbje. The time in this log is UTC + 0 time zone. We may fix this in a later version. From
this log, you can also see how some bdbje works.
2. image directory
The image directory is used to store metadata mirrors generated regularly by Doris. Usually, you will see a image.xxxxx
mirror file. Where xxxxx is a number. This number indicates that the image contains all metadata journal before xxxx .
And the generation time of this file (viewed through ls -al ) is usually the generation time of the mirror.
You may also see a image.ckpt file. This is a metadata mirror being generated. The du -sh command should show that
the file size is increasing, indicating that the mirror content is being written to the file. When the mirror is written, it
automatically renames itself to a new image.xxxxx and replaces the old image file.
Only FE with a Master role will actively generate image files on a regular basis. After each generation, FE is pushed to
other non-Master roles. When it is confirmed that all other FEs have received this image, Master FE deletes the metadata
journal in bdbje. Therefore, if image generation fails or image push fails to other FEs, data in bdbje will accumulate.
ROLE file records the type of FE (FOLLOWER or OBSERVER), which is a text file.
VERSION file records the cluster ID of the Doris cluster and the token used to access authentication between nodes,
which is also a text file.
The ROLE file and the VERSION file either both exist or both do not exist (e.g. at the first startup).
Basic operations
1. First start-up
ii. Ensure that path/to/doris-meta already exists, that the permissions are correct, and that the directory is empty.
iii. Start directly with sh bin/start_fe.sh .
iv. After booting, you should be able to see the following log in fe.log:
Palo FE starting...
image does not exist: /path/to/doris-meta/image/image.0
transfer from INIT to UNKNOWN
transfer from UNKNOWN to MASTER
QE service start
thrift server started
The above logs are not necessarily strictly in this order, but they are basically similar.
v. The first start-up of a single-node FE usually does not encounter problems. If you haven't seen the above logs,
generally speaking, you haven't followed the document steps carefully, please read the relevant wiki carefully.
2. Restart
ii. After restarting, you should be able to see the following log in fe.log:
Palo FE starting...
finished to get cluster id: xxxx, role: FOLLOWER and node name: xxxx
QE service start
The above logs are not necessarily strictly in this order, but they are basically similar.
3. Common problems
For the deployment of single-node FE, start-stop usually does not encounter any problems. If you have any questions,
please refer to the relevant Wiki and check your operation steps carefully.
Add FE
Adding FE processes is described in detail in the Elastic Expansion Documents and will not be repeated. Here are some
points for attention, as well as common problems.
1. Notes
Before adding a new FE, make sure that the current Master FE runs properly (the connection is normal, the JVM is normal, image generation is normal, the bdbje data directory is not too large, etc.).
The first time you start a new FE, you must make sure that the --helper parameter is added, pointing to the Master FE. There is no need to add --helper when restarting. (If --helper is specified, the FE will directly ask the helper node for its role; if not, the FE will try to obtain the information from the ROLE and VERSION files in the doris-meta/image/ directory.)
The first time you start a new FE, you must make sure that the meta_dir of the FE is created, has correct
permissions and is empty.
The order of starting a new FE and executing the ALTER SYSTEM ADD FOLLOWER/OBSERVER statement does not matter. If a new FE is started before the statement is executed, the message current node is not added to the group. please add it first. will appear in the new FE's log. Once the statement is executed, it enters the normal process.
Make sure that after the previous FE is added successfully, the next FE is added.
Connect to MASTER FE and execute ALTER SYSTEM ADD FOLLOWER/OBSERVER stmt.
2. Common problems
When you first start a FE to be added, if the data in doris-meta/bdb on the Master FE is large, you may see the words this node is DETACHED in the log of the FE being added. At this point, bdbje is copying data, and you can see that the bdb/ directory of the FE being added is growing. This process usually takes several minutes (depending on the amount of data in bdbje). Afterwards, there may be some bdbje-related error stacks in fe.log. If QE service start and thrift server started appear in the final log, the start is usually successful. You can try to connect to this FE via mysql-client. If these words do not appear, it may be a bdbje replication log timeout; in this case, restarting the FE directly will usually solve the problem.
If an OBSERVER is added: because an OBSERVER-type FE does not participate in the majority writing of metadata, it can theoretically be started and stopped at will. Therefore, if adding an OBSERVER fails, the OBSERVER FE process can be killed directly; after clearing the OBSERVER's metadata directory, add the process again.
If a FOLLOWER is added: because a FOLLOWER participates in the majority writing of metadata, it may already have joined the bdbje electoral group. If there are only two FOLLOWER nodes (including the MASTER), stopping one FE may cause the other to quit because majority writes are no longer possible. In this case, you should first delete the newly added FOLLOWER node from the metadata with the ALTER SYSTEM DROP FOLLOWER command, then kill the FOLLOWER process, empty its metadata and re-add the process.
Delete FE
The corresponding type of FE can be deleted by the ALTER SYSTEM DROP FOLLOWER/OBSERVER command. The following points
should be noted:
For FOLLOWER-type FEs, first make sure that there is an odd number of FOLLOWERs (three or more) before starting the deletion.
i. If the FE of non-MASTER role is deleted, it is recommended to connect to MASTER FE, execute DROP command, and
then kill the process.
ii. If you want to delete MASTER FE, first confirm that there are odd FOLLOWER FE and it works properly. Then kill the
MASTER FE process first. At this point, a FE will be elected MASTER. After confirming that the remaining FE is
working properly, connect to the new MASTER FE and execute the DROP command to delete the old MASTER FE.
Advanced Operations
Failure recovery
FE may fail to start bdbje and synchronize between FEs for some reasons. Phenomena include the inability to write
metadata, the absence of MASTER, and so on. At this point, we need to manually restore the FE. The general principle of
manual recovery of FE is to start a new MASTER through metadata in the current meta_dir , and then add other FEs one by
one. Please follow the following steps strictly:
1. First, stop all FE processes and all business access. Make sure that during metadata recovery, external access will not lead
to other unexpected problems.
After that, we use the FE node with the latest metadata to recover.
Recovering from the metadata of an OBSERVER node is more troublesome, so it is recommended to choose a FOLLOWER node whenever possible.
i. If the node is an OBSERVER, first change the role=OBSERVER in the meta_dir/image/ROLE file to role=FOLLOWER .
(Recovery from the OBSERVER node will be more cumbersome, first follow the steps here, followed by a separate
description)
vi. If successful, through the show frontends; command, you should see all the FEs you added before, and the current
FE is master.
vii. Delete the metadata_failure_recovery=true configuration item in fe.conf, or set it to false , and restart the FE
(Important).
> If you are recovering metadata from an OBSERVER node, after completing the above steps, you will find that the
current FE role is OBSERVER, but `IsMaster` appears as `true`. This is because the "OBSERVER" seen here is
recorded in Doris's metadata, but whether it is master or not, is recorded in bdbje's metadata. Because we
recovered from an OBSERVER node, there was inconsistency. Please take the following steps to fix this problem (we
will fix it in a later version):
> 1. First, all FE nodes except this "OBSERVER" are DROPed out.
> 2. A new FOLLOWER FE is added through the `ADD FOLLOWER` command, assuming that it is on hostA.
> 4. After successful startup, you should see two FEs through the `show frontends;` statement, one is the
previous OBSERVER, the other is the newly added FOLLOWER, and the OBSERVER is the master.
> 5. After confirming that the new FOLLOWER is working properly, the new FOLLOWER metadata is used to perform a
failure recovery operation again.
> 6. The purpose of the above steps is to manufacture a metadata of FOLLOWER node artificially, and then use this
metadata to restart fault recovery. This avoids inconsistencies in recovering metadata from OBSERVER.
>The meaning of `metadata_failure_recovery = true` is to empty the metadata of `bdbje`. In this way, bdbje will
not contact other FEs before, but start as a separate FE. This parameter needs to be set to true only when
restoring startup. After recovery, it must be set to false. Otherwise, once restarted, the metadata of bdbje will
be emptied again, which will make other FEs unable to work properly.
4. After the successful execution of step 3, we delete the previous FEs from the metadata by using the ALTER SYSTEM DROP
FOLLOWER/OBSERVER command and add them again by adding new FEs.
FE type change
If you need to change the existing FOLLOWER/OBSERVER type FE to OBSERVER/FOLLOWER type, please delete FE in the
way described above, and then add the corresponding type FE.
FE Migration
If you need to migrate one FE from the current node to another, there are several scenarios.
After adding a new FOLLOWER / OBSERVER directly, delete the old FOLLOWER / OBSERVER.
When there is only one FE, refer to the Failure Recovery section. Copy the doris-meta directory of FE to the new node
and start the new MASTER in Step 3 of the Failure Recovery section
3. A set of FOLLOWER migrates from one set of nodes to another set of new nodes
Deploy FE on the new node and add the new node first by adding FOLLOWER. The old nodes can be dropped by DROP
one by one. In the process of DROP-by-DROP, MASTER automatically selects the new FOLLOWER node.
Replacement of FE port
FE currently has the following ports
1. edit_log_port
If this port needs to be replaced, it needs to be restored with reference to the operations in the Failure Recovery
section. Because the port has been persisted into bdbje's own metadata (also recorded in Doris's own metadata), it is
necessary to clear bdbje's metadata by setting metadata_failure_recovery=true .
2. http_port
All FE http_ports must be consistent. So if you want to modify this port, all FEs need to be modified and restarted. Modifying this port is more complex in a multi-FOLLOWER deployment (it is a chicken-and-egg problem), so this operation is not recommended. If necessary, follow the operation in the Failure Recovery section directly.
3. rpc_port
After modifying the configuration, restart FE directly. Master FE informs BE of the new port through heartbeat. Only this
port of Master FE will be used. However, it is still recommended that all FE ports be consistent.
4. query_port
After modifying the configuration, restart FE directly. This only affects the port that MySQL clients connect to.
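A minimal sketch of such a change (the value is only an example): edit fe.conf on each FE and restart it.
# in fe.conf
query_port = 9031
rpc_port can be changed the same way, while edit_log_port and http_port require the special handling described above.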
2. Execute the following command to dump metadata from the memory of the Master FE (hereafter called image_mem); a hedged sketch is given after these steps.
3. Replace the image file in the meta_dir/image directory on the OBSERVER FE node with the image_mem file, restart the
OBSERVER FE node, and verify the integrity and correctness of the image_mem file. You can check whether the DB and
Table metadata are normal on the FE web page, whether there are exceptions in fe.log , and whether the metadata
journal is being replayed normally.
Since 1.2.0, it is recommended to use the following method to verify the image file:
If verification succeeds, it will print: Load image success. Image file /absolute/path/to/image.xxxxxx is valid .
4. Replace the image file in the meta_dir/image directory on the FOLLOWER FE node with the image_mem file in turn,
restart the FOLLOWER FE node, and confirm that the metadata and query services are normal.
5. Replace the image file in the meta_dir/image directory on the Master FE node with the image_mem file, restart the
Master FE node, and then confirm that the FE Master switchover is normal and that the Master FE node can generate a new
image file through checkpoint.
Note: If the Image file is large, the entire process can take a long time, so during this time, make sure Master FE does not
generate a new image file via checkpoint. When the image.ckpt file in the meta_dir/image directory on the Master FE node is
observed to be as large as the image.xxx file, the image.ckpt file can be deleted directly.
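A hedged sketch of the dump-and-verify flow above, assuming the FE HTTP dump interface and the --image option of
start_fe.sh (host, port, user, and paths are placeholders):
# dump the metadata image from the Master FE memory (hereafter image_mem)
curl -u root:root_password http://master_fe_host:8030/dump
# verify the copied image file on the target FE before restarting it (since 1.2.0)
sh bin/start_fe.sh --image /path/to/doris-meta/image/image_mem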
Sometimes the FE cannot be started due to metadata errors. In this case, Doris provides a way to help users query the data
stored in BDBJE to facilitate troubleshooting.
First, you need to add configuration in fe.conf: enable_bdbje_debug_mode=true , and then start FE through sh start_fe.sh --
daemon .
At this time, FE will enter the debug mode, only start the http server and MySQL server, and open the BDBJE instance, but
will not load any metadata and other subsequent startup processes.
At this point, we can view the data stored in BDBJE by visiting the web page of the FE, or after connecting to Doris through the MySQL
client, through show proc "/bdbje"; .
+----------+---------------+---------+
+----------+---------------+---------+
| 110589 | 4273 | |
| epochDB | 4 | |
| metricDB | 430694 | |
+----------+---------------+---------+
The first level directory will display all the database names in BDBJE and the number of entries in each database.
+-----------+
| JournalId |
+-----------+
| 1 |
| 2 |
...
| 114858 |
| 114859 |
| 114860 |
| 114861 |
+-----------+
Entering the second level, all the entry keys under the specified database will be listed.
The third level can display the value information of the specified key.
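A hedged sketch of navigating these levels with SHOW PROC (the database name and journal id below are placeholders taken from the sample output above):
show proc "/bdbje";                  -- first level: all databases in BDBJE and their entry counts
show proc "/bdbje/110589";           -- second level: entry keys (journal ids) under a database
show proc "/bdbje/110589/114861";    -- third level: value information of a specific key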
Best Practices
The deployment recommendation of FE is described in the Installation and Deployment Document. Here are some
supplements.
If you don't know the operation logic of FE metadata very well, or you don't have enough experience in the operation
and maintenance of FE metadata, we strongly recommend that only one FOLLOWER-type FE be deployed as MASTER
in practice, and the other FEs are OBSERVER, which can reduce many complex operation and maintenance problems.
Don't worry too much about a single-point MASTER failure for metadata writes. First, if you configure it properly, it is
very difficult for FE, as a Java process, to hang. Second, if the MASTER disk is damaged (the probability is very low), we can
also use the metadata on an OBSERVER to recover manually through failure recovery.
The JVM of the FE process must have sufficient memory. We strongly recommend that the FE JVM memory be at
least 10GB, with 32GB to 64GB recommended, and that monitoring be deployed to watch JVM memory usage. If an OOM occurs in FE,
metadata writes may fail, resulting in failures that cannot be recovered!
FE nodes should have enough disk space to prevent the excessive metadata from causing insufficient disk space. At the
same time, FE logs also take up more than a dozen gigabytes of disk space.
This is usually because the FE cannot elect Master. For example, if three FOLLOWERs are configured, but only one
FOLLOWER is started, this FOLLOWER will cause this problem. Usually, just start the remaining FOLLOWER. If the
problem has not been solved after the start-up, manual recovery may be required in accordance with the way in the
Failure Recovery section.
2. Clock delta: xxxx ms. between Feeder: xxxx and this Replica exceeds max permissible delta: xxxx ms.
Bdbje requires that clock errors between nodes should not exceed a certain threshold. If exceeded, the node will exit
abnormally. The default threshold is 5000ms, which is controlled by the FE parameter `max_bdbje_clock_delta_ms`, and can
be modified as appropriate. But we suggest using NTP and other clock synchronization methods to ensure the clock
synchronization of Doris cluster hosts.
3. Mirror files in the image/ directory have not been updated for a long time
Master FE generates a mirror file by default for every 50,000 metadata journal. In a frequently used cluster, a new image
file is usually generated every half to several days. If you find that the image file has not been updated for a long time (for
example, more than a week), you can see the reasons in sequence as follows:
i. Search for memory is not enough to do checkpoint. Committed memroy XXXX Bytes, used memory XXXX Bytes. in the
fe.log of Master FE. If found, it indicates that the current FE's JVM memory is insufficient for image generation
(usually we need to reserve half of the FE memory for image generation). Then you need to add JVM memory and
restart FE before you can observe. Each time Master FE restarts, a new image is generated directly. This restart
method can also be used to actively generate new images. Note that if there are multiple FOLLOWER deployments,
then when you restart the current Master FE, another FOLLOWER FE will become MASTER, and subsequent image
generation will be the responsibility of the new Master. Therefore, you may need to modify the JVM memory
configuration of all FOLLOWER FE.
ii. Search for begin to generate new image: image.xxxx in the fe.log of Master FE. If it is found, then the image is
generated. Check the subsequent log of this thread, and if checkpoint finished save image.xxxx appears, the image
is written successfully. If Exception when generating new image file occurs, the generation fails and specific error
messages need to be viewed.
4. The size of the bdb/ directory is very large, reaching several Gs or more.
If the bdb/ directory remains large after the problem of new images failing to be generated has been eliminated, it may be
because the Master FE failed to push the image to other FEs. You can search for push image.XXXX to other nodes. totally XX nodes,
push successed YY nodes in the fe.log of the Master FE. If YY is smaller than XX, then some FEs were not pushed successfully.
You can see the specific error Exception when pushing image file. url = xxx in fe.log.
At the same time, you can add the configuration in the FE configuration file: edit_log_roll_num = xxxx . This parameter
sets the number of metadata journals and makes an image once. The default is 50000. This number can be reduced
appropriately to make images more frequent, thus speeding up the deletion of old journals.
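For example (a sketch; the value below is only illustrative), add the following to fe.conf and restart the FE to generate images more frequently:
edit_log_roll_num = 20000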
5. FOLLOWER FE hangs up one after another
Because Doris's metadata adopts a majority-write strategy, a metadata journal must be written to a majority of the
FOLLOWER FEs (for example, with three FOLLOWERs, two must be written successfully) before the write is
considered successful. If the write fails, the FE process exits on its own initiative. So suppose there are three FOLLOWERs:
A, B and C. If C hangs up first and then B hangs up, A will also hang up. So, as described in the Best Practices section, if
you don't have extensive experience in metadata operations and maintenance, it's not recommended to deploy multiple
FOLLOWERs.
6. The message get exception when try to close previously opened bdb database. ignore it appears in fe.log
If there is the word ignore it behind it, there is usually no need to deal with it. If you are interested, you can search for
this error in BDBEnvironment.java , and see the annotations.
7. In the output of show frontends; , the Join column of an FE is listed as true , but the FE is actually abnormal.
The Join column in show frontends; only indicates whether the FE has ever joined the cluster; if it is true , it does not
mean that the FE still exists normally in the cluster. If it is false , it means that the FE has never joined the
cluster.
master_sync_policy is used to specify whether fsync() is called when the Leader FE writes the metadata log, and
replica_sync_policy is used to specify whether the other Follower FEs call fsync() when synchronizing metadata in an
FE HA deployment. In earlier versions of Doris, these two parameters defaulted to WRITE_NO_SYNC , i.e., fsync() was not
called. In the latest version of Doris, the default has been changed to SYNC , that is, fsync() is called. Calling fsync()
significantly reduces the efficiency of metadata disk writes. In some environments, IOPS may drop to several hundred
and the latency may increase to 2-3ms (but this is still enough for Doris metadata manipulation). Therefore, we recommend the
following configuration:
i. For a single Follower FE deployment, master_sync_policy is set to SYNC , which prevents the loss of metadata due to
the downtime of the FE system.
ii. For multi-Follower FE deployment, we can set master_sync_policy and replica_sync_policy to WRITE_NO_SYNC ,
because we think that the probability of simultaneous outage of multiple systems is very low.
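A minimal fe.conf sketch of the two recommendations above (choose one depending on your deployment):
# single Follower FE deployment
master_sync_policy = SYNC
# multi-Follower FE deployment
master_sync_policy = WRITE_NO_SYNC
replica_sync_policy = WRITE_NO_SYNC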
If master_sync_policy is set to WRITE_NO_SYNC in a single Follower FE deployment, then a FE system outage may occur,
resulting in loss of metadata. At this point, if other Observer FE attempts to restart, it may report an error:
Node xxx must rollback xx total commits(numPassedDurableCommits of which were durable) to the earliest point
indicated by transaction xxxx in order to rejoin the replication group, but the transaction rollback limit of
xxx prohibits this.
This means that some transactions that have been persisted need to be rolled back, but the number of entries exceeds the
upper limit. Here our default upper limit is 100, which can be changed by setting txn_rollback_limit . This operation is only
used to attempt to start FE normally, but lost metadata cannot be recovered.
Memory Tracker
Since Version 1.2.0
The Memory Tracker records the memory usage of the Doris BE process, including the memory used in the life cycle of tasks
such as query, import, Compaction, and Schema Change, as well as various caches for memory control and analysis.
Principle
Each query, import, and other task in the system creates its own Memory Tracker when it is initialized. During execution, the
Memory Tracker is put into TLS (Thread Local Storage), and every memory allocation and release of the BE process consumes
the Memory Tracker in the Mem Hook; the results are summarized and displayed.
View statistics
The real-time memory statistics can be viewed through the Doris BE web page http://ip:http_port/mem_tracker.
For the memory statistics of historical queries, you can view the peakMemoryBytes of each query in fe/log/fe.audit.log , or
search for Deregister query/load memory tracker, queryId in be/log/be.INFO to view the memory peak of each query on a single BE.
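A hedged sketch of both lookups (host, port, and paths are placeholders; the BE web port is assumed to be the default):
# real-time memory statistics from the BE web server
curl http://be_host:8040/mem_tracker
# peak memory of finished queries, from the FE audit log and the BE log
grep peakMemoryBytes fe/log/fe.audit.log
grep "Deregister query/load memory tracker, queryId" be/log/be.INFO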
Home /mem_tracker
1. Type: Divide the memory used by Doris BE into the following categories
process: The total memory of the process, the sum of all other types.
global: Global Memory Tracker with the same life cycle and process, such as each Cache, Tablet Manager, Storage Engine,
etc.
query: the in-memory sum of all queries.
load: Sum of all imported memory.
tc/jemalloc_cache: the cache of the general-purpose memory allocator TCMalloc or Jemalloc. You can view the original profile of the
memory allocator in real time at http://ip:http_port/memz.
compaction, schema_change, consistency, batch_load, clone: corresponding to the memory sum of all Compaction,
Schema Change, Consistency, Batch Load, and Clone tasks respectively.
4. Peak Consumption (Bytes): the peak memory value after the BE process starts, in bytes; it is reset after the BE restarts.
5. Peak Consumption (Normalize): the peak memory value after the BE process starts, formatted with G/M/K units; it is reset
after the BE restarts.
Orphan: Tracker consumed by default. Memory that does not specify a tracker will be recorded in Orphan by default. In
addition to the Child Tracker subdivided below, Orphan also includes some memory that is inconvenient to accurately
subdivide and count, including BRPC.
LoadChannelMgr: The sum of the memory of all imported Load Channel stages, used to write the scanned data to
the Segment file on disk, a subset of Orphan.
StorageEngine: the memory consumed by the storage engine while loading the data directories, a subset of Orphan.
IndexPageCache: The index used to cache the data Page, used to speed up Scan.
ChunkAllocator: Used to cache power-of-2 memory blocks, and reuse memory at the application layer.
DeleteBitmap AggCache: Gets aggregated delete_bitmap on rowset_id and version.
1. Limit: The upper limit of memory used by a single query, show session variables to view and modify exec_mem_limit .
2. Label: The label naming rule of the Tracker for a single query is Query#Id=xxx .
3. Parent Label: Parent is the Tracker used to record the memory used by the different operators of the query during execution.
ERROR 1105 (HY000): errCode = 2, detailMessage = Memory limit exceeded:<consuming tracker:<xxx>, xxx. backend
172.1.1.1 process memory used xxx GB, limit xxx GB. If query tracker exceeded, `set exec_mem_limit=8G ` to change
limit, details mem usage see be. INFO.
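If a single query hits this limit, the session variable named in the message can be inspected and raised, for example (a sketch; the value is only illustrative):
show variables like 'exec_mem_limit';
set exec_mem_limit = 8G;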
The process memory exceeds the limit OR the remaining available memory of the
system is insufficient
When the following error is returned, it means that the process memory exceeds the limit, or the remaining available
memory of the system is insufficient. The specific reason depends on the memory statistics.
2. process memory used 2.68 GB exceed limit 2.47 GB or sys mem available 50.95 GB less than low water mark 3.20
GB, failed alloc size 2.00 MB : the reason for exceeding the limit is that the 2.68 GB of memory used by the BE process
exceeds the 2.47 GB limit. The value of the limit comes from mem_limit * system MemTotal in be.conf, which is equal
to 80% of the total memory of the operating system by default. The remaining available memory of the current operating
system, 50.95 GB, is still higher than the minimum water mark of 3.2 GB. This time, the process was trying to allocate 2 MB
of memory.
3. executing msg:<execute:<ExecNode:VAGGREGATION_NODE (id=7)>>, backend 172.24.47.117 process memory used 2.68 GB,
limit 2.47 GB : the location of this memory allocation is ExecNode:VAGGREGATION_NODE (id=7) , followed by the IP of the
current BE node, and the memory statistics of the BE node are printed again.
Log Analysis
At the same time, you can find the following log in log/be.INFO to confirm whether the memory usage of the current
process meets expectations. The log is also divided into three parts:
Memory allocations in most locations of BE will sense the overrun and try to trigger the predetermined callback method,
which prints the process memory overrun log. So if the value of Try Alloc is small, you do not need to pay attention to
Alloc Stacktrace ; just analyze Memory Tracker Summary directly.
5. When the process memory exceeds the limit, BE will trigger memory GC.
W1127 17:23:16.372572 19896 mem_tracker_limiter.cpp:214] System Mem Exceed Limit Check Faild, Try Alloc: 1062688
process memory used 2.68 GB limit 2.47 GB, sys mem available 50.95 GB min reserve 3.20 GB, tc/jemalloc
allocator cache 51.97 MB
Alloc Stacktrace:
@ 0x50028e8 doris::MemTrackerLimiter::try_consume()
@ 0x50027c1 doris::ThreadMemTrackerMgr::flush_untracked_mem<>()
@ 0x595f234 malloc
@ 0x8f316a2 google::LogMessage::Init()
@ 0x5813fef doris::FragmentExecState::coordinator_callback()
@ 0x58383dc doris::PlanFragmentExecutor::send_report()
@ 0x5837ea8 doris::PlanFragmentExecutor::update_status()
@ 0x58355b0 doris::PlanFragmentExecutor::open()
@ 0x5815244 doris::FragmentExecState::execute()
@ 0x5817965 doris::FragmentMgr::_exec_actual()
@ 0x581fffb std::_Function_handler<>::_M_invoke()
@ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
@ 0x5a6843f doris::Thread::supervise_thread()
@ 0x7feb54f931ca start_thread
@ 0x7feb5576add3 __GI___clone
@ (nil) (unknown)
MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18
MB(680718208 B)
MemTrackerLimiter Label=IndexPageCache, Type=global, Limit=-1.00 B(-1 B), Used=1.00 MB(1051092 B), Peak=0(0
B)
MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
Among them, MemAvailable is the operating system's estimate of the total amount of memory it can provide to user processes
without triggering swap, taking into account the current free memory, buffers, cache, memory fragmentation and other factors.
A simple calculation formula is: MemAvailable = MemFree - LowWaterMark + (PageCache - min(PageCache / 2, LowWaterMark)),
which is the same as the available value shown by the free command. For details, please refer to:
https://serverfault.com/questions/940196/why-is-memaavailable-a-lot-less-than-memfreebufferscached
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773
The low water mark defaults to a maximum of 1.6G, calculated based on MemTotal , vm/min_free_kbytes , config::mem_limit and
config::max_sys_mem_available_low_water_mark_bytes , to avoid wasting too much memory. Among them, MemTotal is
the total memory of the system, and the value also comes from /proc/meminfo ; vm/min_free_kbytes is the buffer reserved
by the operating system for the memory GC process, and the value is usually between 0.4% and 5%. vm/min_free_kbytes
may be 5% on some cloud servers, which makes the system's available memory appear smaller than the real
value. Increasing config::max_sys_mem_available_low_water_mark_bytes reserves a larger memory buffer for Full GC on machines with
more than 16G of memory; otherwise, memory is used as fully as possible.
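For illustration only (the numbers below are assumed, not taken from this document): with MemFree = 10 GB, PageCache = 40 GB and
LowWaterMark = 1.6 GB, the formula gives MemAvailable = 10 - 1.6 + (40 - min(20, 1.6)) = 10 - 1.6 + 38.4 = 46.8 GB.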
Log Analysis
After set global enable_profile=true , a log is printed in log/be.INFO when a single query exceeds its memory limit, which
can be used to confirm whether the current query's memory usage meets expectations. The log is also divided into three
parts:
current used 96.88 MB>, executing msg:<execute:<ExecNode:VHASH_JOIN_NODE (id=2)>>. backend 172.24.47.117 process
memory used 1.13 GB, limit 98.92 GB. If query tracker exceed, `set exec_mem_limit=8G` to change limit, details
mem usage see be.INFO.
process memory used 1.13 GB limit 98.92 GB, sys mem available 45.15 GB min reserve 3.20 GB, tc/jemalloc
allocator cache 27.62 MB
Alloc Stacktrace:
@ 0x66cf73a doris::vectorized::HashJoinNode::_materialize_build_side()
@ 0x69cb1ee doris::vectorized::VJoinNodeBase::open()
@ 0x66ce27a doris::vectorized::HashJoinNode::open()
@ 0x5835dad doris::PlanFragmentExecutor::open_vectorized_internal()
@ 0x58351d2 doris::PlanFragmentExecutor::open()
@ 0x5815244 doris::FragmentExecState::execute()
@ 0x5817965 doris::FragmentMgr::_exec_actual()
@ 0x581fffb std::_Function_handler<>::_M_invoke()
@ 0x5a6f2c1 doris::ThreadPool::dispatch_thread()
@ 0x5a6843f doris::Thread::supervise_thread()
@ 0x7f6faa94a1ca start_thread
@ 0x7f6fab121dd3 __GI___clone
@ (nil) (unknown)
BE OOM Analysis
Since Version 1.2.0
Ideally, as described in Memory Limit Exceeded Analysis, we regularly detect the remaining available memory of the operating system and
respond in time when memory is insufficient, such as triggering memory GC to release caches or cancelling queries that exceed
their memory limit. However, because refreshing process memory statistics and memory GC both have a certain lag, and it is
difficult to completely catch all large memory allocations, there is still a risk of OOM.
Solution
Refer to BE Configuration Items to reduce mem_limit and increase max_sys_mem_available_low_water_mark_bytes in
be.conf .
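A minimal be.conf sketch of this adjustment (the values are only illustrative):
# in be.conf
mem_limit = 70%
max_sys_mem_available_low_water_mark_bytes = 6442450944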
Memory analysis
If you want to further understand the memory usage location of the BE process before OOM and reduce the memory usage
of the process, you can refer to the following steps to analyze.
1. Use dmesg -T to confirm the time of the OOM and the process memory at the time of the OOM.
2. Check whether there is a Memory Tracker Summary log at the end of be/log/be.INFO. If it indicates that BE has detected
memory overrun, go to step 3, otherwise go to step 8.
MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18
MB(680718208 B)
MemTrackerLimiter Label=IndexPageCache, Type=global, Limit=-1.00 B(-1 B), Used=1.00 MB(1051092 B), Peak=0(0
B)
MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
3. When the end of be/log/be.INFO before the OOM contains a system-memory-exceeded log, refer to the log analysis method
in Memory Limit Exceeded Analysis to look at the memory usage of each category of the process. If the current type=query
memory usage is high and the query running before the OOM is known, continue to step 4, otherwise continue to
step 5; if the current type=load memory usage is high, continue to step 6; if the current type=global memory usage is
high, continue to step 7.
4. type=query query memory usage is high, and the query before OOM is known, such as test cluster or scheduled task,
restart the BE node, refer to Memory Tracker View real-time memory tracker statistics, retry the query after set global
enable_profile=true , observe the memory usage location of specific operators, confirm whether the query memory
usage is reasonable, and further consider optimizing SQL memory usage, such as adjusting the join order .
5. If type=query memory usage is high and the query before the OOM is unknown, such as in an online cluster, then search
be/log/be.INFO from the back to the front for Deregister query/load memory tracker, queryId and Register
query/load memory tracker, query/load id (see the grep sketch after these steps). If the same query id prints both lines of
logs, the query or import completed successfully; if there is only Register but no Deregister, the query or import was still
running at the time of the OOM. In this way, all queries and imports running before the OOM can be obtained, and the memory
usage of suspicious large-memory queries can be analyzed according to the method in step 4.
7. When type=global memory is used too much for a long time, continue to check the detailed statistics in the
second half of the Memory Tracker Summary log. When DataPageCache, IndexPageCache, SegmentCache,
ChunkAllocator, LastestSuccessChannelCache, etc. use a lot of memory, refer to BE Configuration Items and consider
modifying the size of the corresponding cache; when Orphan memory usage is too large, continue the analysis as follows.
If the sum of the tracker statistics under Parent Label=Orphan accounts for only a small part of the Orphan memory, it
means that there is currently a large amount of memory without accurate statistics, such as memory used by brpc. In this
case, you can consider using the heap profile to further analyze memory locations.
If the tracker statistics under Parent Label=Orphan account for most of Orphan's memory: when Label=TabletManager uses
a lot of memory, further check the number of tablets in the cluster, and if there are too many tablets, delete tables or data
that are no longer used; when Label=StorageEngine uses too much memory, further check the number of segment files in the
cluster, and consider manually triggering compaction if the number of segment files is too large;
8. If be/log/be.INFO does not print the Memory Tracker Summary log before OOM, it means that BE did not detect the
memory limit in time, observe Grafana memory monitoring to confirm the memory growth trend of BE before OOM, if
OOM is reproducible, consider adding memory_debug=true in be.conf , after restarting the cluster, the cluster memory
statistics will be printed every second, observe the last Memory Tracker Summary log before OOM, and continue to step 3
for analysis;
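A hedged sketch of the log searches mentioned in steps 1, 2 and 5 (paths and log names are the defaults assumed here):
# step 1: time of the OOM and process memory at that time
dmesg -T | grep -i "out of memory"
# step 2: did BE detect the memory overrun before the OOM?
grep "Memory Tracker Summary" be/log/be.INFO | tail
# step 5: queries/imports that were still running at the time of the OOM
grep -E "Register query/load memory tracker|Deregister query/load memory tracker" be/log/be.INFO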
Config Dir
The configuration file directory for both FE and BE is conf/ . In addition to storing the default fe.conf , be.conf and other
files, this directory also serves as the common configuration file directory.
Users can store some configuration files in it, and the system will read them automatically.
We can manually fill in various HDFS/Hive parameters in the corresponding statements or functions.
But there are many such parameters, and filling them all in manually is troublesome.
Therefore, users can place the HDFS or Hive configuration file hdfs-site.xml/hive-site.xml directly in the conf/ directory.
Doris will automatically read these configuration files.
The configuration that the user fills in the command will overwrite the configuration items in the configuration file.
In this way, users only need to fill in a small amount of configuration to complete the access to HDFS/Hive.
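For example (a sketch; the paths are placeholders), copy the Hadoop/Hive client configuration into the Doris conf/ directory:
cp /path/to/hadoop/etc/hadoop/hdfs-site.xml /path/to/doris/fe/conf/
cp /path/to/hive/conf/hive-site.xml /path/to/doris/fe/conf/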
FE Configuration
This document mainly introduces the relevant configuration items of FE.
The FE configuration file fe.conf is usually stored in the conf/ directory of the FE deployment path. In version 0.14, another
configuration file fe_custom.conf will be introduced. The configuration file is used to record the configuration items that are
dynamically configured and persisted by the user during operation.
After the FE process is started, it will read the configuration items in fe.conf first, and then read the configuration items in
fe_custom.conf . The configuration items in fe_custom.conf will overwrite the same configuration items in fe.conf .
The location of the fe_custom.conf file can be configured in fe.conf through the custom_config_dir configuration item.
1. FE web page
Open the FE web page http://fe_host:fe_http_port/variable in the browser. You can see the currently effective FE
configuration items in Configure Info .
2. View by command
After the FE is started, you can view the configuration items of the FE in the MySQL client with the following command:
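A sketch of this lookup, using the ADMIN SHOW FRONTEND CONFIG statement referred to elsewhere in this section (the LIKE filter is optional):
ADMIN SHOW FRONTEND CONFIG;
ADMIN SHOW FRONTEND CONFIG LIKE '%dynamic_partition_enable%';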
MasterOnly: Whether it is a unique configuration item of Master FE node. If it is true, it means that the configuration
item is meaningful only at the Master FE node, and is meaningless to other types of FE nodes. If false, it means that
the configuration item is meaningful in all types of FE nodes.
Comment: The description of the configuration item.
1. Static configuration
Add and set configuration items in the conf/fe.conf file. The configuration items in fe.conf will be read when the FE
process starts. Configuration items not in fe.conf will use their default values.
2. Dynamic configuration via MySQL protocol
After the FE starts, you can set the configuration items dynamically through the following commands. This command
requires administrator privilege.
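A sketch of the statement, using the ADMIN SET FRONTEND CONFIG syntax covered by HELP ADMIN SET CONFIG below (the key and value are only examples):
ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true");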
Not all configuration items support dynamic configuration. You can check whether the dynamic configuration is
supported by the IsMutable column in the ADMIN SHOW FRONTEND CONFIG; command result.
If a MasterOnly configuration item is modified, the command will be forwarded directly to the Master FE, and only
the corresponding configuration item on the Master FE will be modified.
Configuration items modified in this way will become invalid after the FE process restarts.
For more help on this command, you can view it through the HELP ADMIN SET CONFIG; command.
This method can also persist the modified configuration items. The configuration items will be persisted in the
fe_custom.conf file and will still take effect after FE is restarted.
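The paragraph above describes a persistence mechanism; a hedged sketch, assuming it refers to the FE HTTP _set_config API
with its persist parameter (host, port and key are placeholders, and availability depends on your version):
curl -X POST "http://fe_host:8030/api/_set_config?dynamic_partition_enable=true&persist=true"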
Examples
1. Modify async_pending_load_task_pool_size
Through ADMIN SHOW FRONTEND CONFIG; you can see that this configuration item cannot be dynamically configured
( IsMutable is false). You need to add in fe.conf :
async_pending_load_task_pool_size = 20
2. Modify dynamic_partition_enable
Through ADMIN SHOW FRONTEND CONFIG; you can see that the configuration item can be dynamically configured
( IsMutable is true), and it is a configuration unique to the Master FE. We can connect to any FE and execute the
ADMIN SET FRONTEND CONFIG command to modify the configuration.
Afterwards, you can view the modified value with ADMIN SHOW FRONTEND CONFIG .
After modification in the above manner, if the Master FE restarts or a Master election is performed, the configuration will
be invalid. You can add the configuration item directly in fe.conf and restart the FE to make the configuration item
permanent.
3. Modify max_distribution_pruner_recursion_depth
Through ADMIN SHOW FRONTEND CONFIG; you can see that the configuration item can be dynamically configured
( IsMutable is true). It is not unique to Master FE.
Similarly, we can modify this configuration with the dynamic configuration command. Because this
configuration is not unique to the Master FE, users need to connect to each FE separately to modify the configuration
dynamically, so that all FEs use the modified configuration value.
Configurations
meta_dir
Default :DORIS_HOME_DIR + "/doris-meta"
Type: string. Description: Doris metadata will be saved here. It is highly recommended that the storage for this directory be reliable and have high write performance.
IsMutable:true
The tryLock timeout configuration of catalog lock. Normally it does not need to change, unless you need to test something.
enable_bdbje_debug_mode
Default :false
If set to true, FE will be started in BDBJE debug mode
max_bdbje_clock_delta_ms
Default :5000 (5s)
Set the maximum acceptable clock skew between non-master FE to Master FE host. This value is checked whenever a non-
master FE establishes a connection to master FE via BDBJE. The connection is abandoned if the clock skew is larger than this
value.
metadata_failure_recovery
Default :false
If true, FE will reset the bdbje replication group (that is, remove all electable node info) and is supposed to start as Master. If none of
the electable nodes can start, we can copy the metadata to another node and set this config to true to try to restart the
FE.
txn_rollback_limit
Default :100
the max txn number which bdbje can rollback when trying to rejoin the group
bdbje_replica_ack_timeout_second
Default :10 (s)
The replica ack timeout when writing to bdbje. When writing some relatively large logs, the ack may time out,
resulting in log write failure. In this case, you can increase this value appropriately.
bdbje_lock_timeout_second
Default :1
The lock timeout of bdbje operations. If there are many LockTimeoutException entries in the FE WARN log, you can try to increase this
value.
bdbje_heartbeat_timeout_second
Default :30
The heartbeat timeout of bdbje between master and follower. The default is 30 seconds, which is the same as the default value in
bdbje. If the network is experiencing transient problems, or some unexpectedly long Java GC is annoying you, you can try to
increase this value to decrease the chance of false timeouts.
replica_ack_policy
Default :SIMPLE_MAJORITY
OPTION :ALL, NONE, SIMPLE_MAJORITY
Replica ack policy of bdbje. For more info, see:
http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.ReplicaAckPolicy.html
replica_sync_policy
Default :SYNC
Options: SYNC, NO_SYNC, WRITE_NO_SYNC
Follower FE sync policy of bdbje.
master_sync_policy
Default :SYNC
Options: SYNC, NO_SYNC, WRITE_NO_SYNC
Master FE sync policy of bdbje. If you only deploy one Follower FE, set this to 'SYNC'. If you deploy more than 3 Follower FE,
you can set this and the following 'replica_sync_policy' to WRITE_NO_SYNC. For more info, see:
http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/Durability.SyncPolicy.html
bdbje_reserved_disk_bytes
The desired upper limit on the number of bytes of reserved space to retain in a replicated JE Environment.
Default: 1073741824
ignore_meta_check
Default: false
IsMutable: true
If true, a non-master FE will ignore the metadata delay gap between the Master FE and itself, even if the metadata delay gap
exceeds meta_delay_toleration_second. Non-master FEs will still offer read service.
This is helpful when you try to stop the Master FE for a relatively long time for some reason, but still wish the non-master
FEs to offer read service.
meta_delay_toleration_second
Default :300 (5 min)
Non-master FE will stop offering service if meta data delay gap exceeds meta_delay_toleration_second
edit_log_port
Default :9010
bdbje port
edit_log_type
Default :BDB
Edit log type.
BDB: write log to bdbje
LOCAL: deprecated..
edit_log_roll_num
Default :50000
IsMutable: true
MasterOnly: true
force_do_metadata_checkpoint
Default :false
IsMutable: true
MasterOnly: true
If set to true, the checkpoint thread will make the checkpoint regardless of the jvm memory used percent
metadata_checkpoint_memory_threshold
Default: 60 (60%)
IsMutable: true
MasterOnly: true
If the jvm memory used percent(heap or old mem pool) exceed this threshold, checkpoint thread will not work to avoid
OOM.
max_same_name_catalog_trash_num
It is used to set the maximum number of meta information with the same name in the catalog recycle bin. When the
maximum value is exceeded, the earliest deleted meta trash will be completely deleted and cannot be recovered. 0 means
not to keep objects of the same name. < 0 means no limit.
Note: The judgment of metadata with the same name will be limited to a certain range. For example, the judgment of the
database with the same name will be limited to the same cluster, the judgment of the table with the same name will be
limited to the same database (with the same database id), the judgment of the partition with the same name will be limited
to the same database (with the same database id) and the same table (with the same table id).
Default: 3
cluster_id
Default :-1
Nodes (FE or BE) will be considered as belonging to the same Palo cluster if they have the same cluster id. The cluster id is usually a
random integer generated when the Master FE starts for the first time. You can also specify one.
heartbeat_mgr_blocking_queue_size
Default: 1024
MasterOnly: true
heartbeat_mgr_threads_num
Default :8
MasterOnly :true
num of thread to handle heartbeat events in heartbeat_mgr.
disable_cluster_feature
Default :true
IsMutable: true
The multi-cluster feature will be deprecated in version 0.12. Setting this config to true will disable all operations related to the cluster
feature, including:
1. create/drop cluster
2. add free backend/add backend to cluster/decommission cluster balance
3. change the backends num of cluster
4. link/migration db
enable_deploy_manager
Default :disable
Set to true if you deploy Doris using thirdparty deploy manager
enable_fqdn_mode
This configuration is mainly used in the k8s cluster environment. When enable_fqdn_mode is true, the name of the pod
where the be is located will remain unchanged after reconstruction, while the ip can be changed.
Default: false
enable_token_check
Default :true
For forward compatibility, will be removed later. check token when download image file.
enable_multi_tags
Default: false
Service
query_port
Default :9030
FE MySQL server port
frontend_address
Status: Deprecated, not recommended use. This parameter may be deleted later
Type: string
Description: Explicitly set the IP address of FE instead of using InetAddress.getByName to get the IP address. Usually in
InetAddress.getByName When the expected results cannot be obtained. Only IP address is supported, not hostname.
priority_networks
Default :none
Declare a selection strategy for servers that have many IPs. Note that at most one IP should match this list. This is a
semicolon-delimited list in CIDR notation, e.g. 10.10.10.0/24 . If no IP matches this rule, one will be chosen randomly.
http_port
Default :8030
HTTP bind port. Defaults to 8030
qe_max_connection
Default :1024
Maximal number of connections per FE.
max_connection_scheduler_threads_num
Default :4096
Maximal number of threads in the connection-scheduler-pool.
The current strategy is to allocate a separate service thread for each request.
check_java_version
Default :true
Doris will check whether the compiled and run Java versions are compatible, if not, it will throw a Java version mismatch
exception message and terminate the startup
rpc_port
Default :9020
FE Thrift Server port
thrift_server_type
This configuration represents the service model used by The Thrift Service of FE, is of type String and is case-insensitive.
If this parameter is 'SIMPLE', then the 'TSimpleServer' model is used, which is generally not suitable for production and is
limited to test use.
If the parameter is 'THREADED', then the 'TThreadedSelectorServer' model is used, which is a non-blocking I/O model,
namely the master-slave Reactor model, which can timely respond to a large number of concurrent connection requests
and performs well in most scenarios.
If this parameter is THREAD_POOL , then the TThreadPoolServer model is used, which is a blocking I/O model that uses a
thread pool to handle user connections. The number of simultaneous connections is limited by the size of the thread pool.
If the number of concurrent requests can be estimated in advance and enough thread resources can be tolerated, this model
will have better performance. This service model is used by default.
thrift_server_max_worker_threads
Default :4096
The thrift server max worker threads
thrift_backlog_num
Default :1024
The backlog_num for the thrift server. When you enlarge this backlog_num, you should ensure that its value is larger than the linux
/proc/sys/net/core/somaxconn config.
thrift_client_timeout_ms
Default :0
The connection timeout and socket timeout config for thrift server.
use_compact_thrift_rpc
Default: true
Whether to use compressed format to send query plan structure. After it is turned on, the size of the query plan structure
can be reduced by about 50%, thereby avoiding some "send fragment timeout" errors.
However, in some high-concurrency
small query scenarios, the concurrency may be reduced by about 10%.
grpc_max_message_size_bytes
Default :1G
Used to set the initial flow window size of the GRPC client channel, and also used as the max message size. When the result set is
large, you may need to increase this value.
max_mysql_service_task_threads_num
Default :4096
When FE starts the MySQL server based on the NIO model, the number of threads responsible for Task events. It only takes effect
when mysql_service_nio_enabled is true.
mysql_service_io_threads_num
Default :4
When FE starts the MySQL server based on the NIO model, the number of threads responsible for IO events.
mysql_nio_backlog_num
Default :1024
The backlog_num for the mysql nio server. When you enlarge this backlog_num, you should enlarge the value in the linux
/proc/sys/net/core/somaxconn file at the same time.
broker_timeout_ms
Default :10000 (10s)
Default broker RPC timeout
backend_rpc_timeout_ms
Timeout millisecond for Fe sending rpc request to BE
Default: 60000
drop_backend_after_decommission
Default: false
IsMutable: true
MasterOnly: true
1. This configuration is used to control whether the system drops the BE after successfully decommissioning the BE. If true,
the BE node will be deleted after the BE is successfully offline. If false, after the BE successfully goes offline, the BE will
remain in the DECOMMISSION state, but will not be dropped.
This configuration can play a role in certain scenarios. Assume that the initial state of a Doris cluster is one disk per BE
node. After running for a period of time, the system has been vertically expanded, that is, each BE node adds 2 new disks.
Because Doris currently does not support data balancing among the disks within the BE, the data volume of the initial
disk may always be much higher than the data volume of the newly added disk. At this time, we can perform manual
inter-disk balancing by the following operations (see the sketch after these steps):
i. Set drop_backend_after_decommission to false.
ii. Perform a decommission operation on the BE node. All data on this BE will be migrated to other BE nodes.
iii. After the decommission operation is completed, the BE will not be dropped. At this time, cancel the decommission
status of the BE. Then the data will start to balance from other BE nodes back to this node, and the data will
be evenly distributed to all disks of the BE.
iv. Perform steps ii and iii for all BE nodes in sequence, and finally achieve the purpose of disk balancing for all nodes.
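A hedged SQL sketch of steps i through iii (the host and heartbeat port are placeholders; the config change could also be made in fe.conf):
ADMIN SET FRONTEND CONFIG ("drop_backend_after_decommission" = "false");
ALTER SYSTEM DECOMMISSION BACKEND "be_host:9050";
-- after the decommission has finished:
CANCEL DECOMMISSION BACKEND "be_host:9050";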
max_backend_down_time_second
Default: 3600 (1 hour)
IsMutable: true
MasterOnly: true
If a backend is down for max_backend_down_time_second, a BACKEND_DOWN event will be triggered.
disable_backend_black_list
Used to disable the BE blacklist function. After this function is disabled, if the query request to the BE fails, the BE will not be
added to the blacklist.
This parameter is suitable for regression testing environments to reduce occasional bugs that cause a
large number of regression tests to fail.
Default: false
max_backend_heartbeat_failure_tolerance_count
The maximum tolerable number of BE node heartbeat failures. If the number of consecutive heartbeat failures exceeds this
value, the BE state will be set to dead.
This parameter is suitable for regression test environments to reduce occasional
heartbeat failures that cause a large number of regression test failures.
Default: 1
enable_access_file_without_broker
Default :false
IsMutable: true
MasterOnly: true
This config is used to try skip broker when access bos or other cloud storage via broker
agent_task_resend_wait_time_ms
Default :5000
IsMutable: true
MasterOnly: true
This configuration will decide whether to resend agent task when create_time for agent_task is set, only when current_time -
create_time > agent_task_resend_wait_time_ms can ReportHandler do resend agent task.
This configuration is currently mainly used to solve the problem of repeated sending of PUBLISH_VERSION agent tasks. The
current default value of this configuration is 5000, which is an experimental value.
Because there is a certain time delay between submitting agent tasks to the AgentTaskQueue and submitting them to BE,
increasing the value of this configuration can effectively reduce the repeated sending of agent tasks, but at the same time it
will delay the re-execution of agent tasks whose submission or execution failed.
max_agent_task_threads_num
Default: 4096
MasterOnly: true
remote_fragment_exec_timeout_ms
Default :5000 (ms)
IsMutable: true
The timeout of executing an async remote fragment. In normal cases, the async remote fragment will be executed in a short
time. If the system is under high load, try to set this timeout longer.
auth_token
Default :empty
Cluster token used for internal authentication.
enable_http_server_v2
Default :The default is true after the official 0.14.0 version is released, and the default is false before
HTTP Server V2 is implemented by SpringBoot. It uses an architecture that separates the front and back ends. Only when
httpv2 is enabled can users use the new front-end UI interface.
http_api_extra_base_path
In some deployment environments, user need to specify an additional base path as the unified prefix of the HTTP API. This
parameter is used by the user to specify additional prefixes.
After setting, user can get the parameter value through the GET
/api/basepath interface. And the new UI will also try to get this base path first to assemble the URL. Only valid when
enable_http_server_v2 is true.
The default is empty, that is, not set
jetty_server_acceptors
Default :2
jetty_server_selectors
Default :4
jetty_server_workers
Default :0
With the above three parameters, Jetty's thread architecture model is very simple, divided into acceptors, selectors and
workers three thread pools. Acceptors are responsible for accepting new connections, and then hand them over to selectors
to process the unpacking of the HTTP message protocol, and finally workers process the request. The first two thread pools
adopt a non-blocking model, and one thread can handle the read and write of many sockets, so the number of thread pools
is small.
For most projects, only 1-2 acceptor threads and 2 to 4 selector threads are sufficient. Workers run blocking
business logic, often with many database operations, and require a large number of threads. The specific number depends on
the ratio of QPS to IO events of the application: the higher the QPS, the more threads are required; the higher the
proportion of IO, the more threads are waiting and the more total threads are required.
Worker thread pool is not set by default, set according to your needs
jetty_server_max_http_post_size
Default : 100 * 1024 * 1024 (100MB)
This is the maximum number of bytes of the file uploaded by the put or post method, the default value: 100MB
jetty_server_max_http_header_size
Default :10240 (10K)
http header size configuration parameter, the default value is 10K
enable_tracing
Default: false
IsMutable: false
MasterOnly: false
Whether to enable tracing.
trace_exporter
Default :zipkin
IsMutable: false
MasterOnly: false
trace_export_url
Default: http://127.0.0.1:9411/api/v2/spans
IsMutable: false
MasterOnly: false
Trace export to zipkin, like: http://127.0.0.1:9411/api/v2/spans
Query Engine
default_max_query_instances
The default value when the user property max_query_instances is less than or equal to 0. This config is used to limit the max
number of instances for a user. A value less than or equal to 0 means unlimited.
max_query_retry_time
Default :1
IsMutable: true
The number of query retries. A query may retry if we encounter RPC exception and no result has been sent to user. You may
reduce this number to avoid Avalanche disaster
max_dynamic_partition_num
Default: 500
IsMutable: true
MasterOnly: true
Used to limit the maximum number of partitions that can be created when creating a dynamic partition table, to avoid
creating too many partitions at one time. The number is determined by "start" and "end" in the dynamic partition
parameters..
dynamic_partition_enable
Default :true
IsMutable: true
MasterOnly: true
dynamic_partition_check_interval_seconds
Default :600 (s)
IsMutable: true
MasterOnly: true
Decide how often to check dynamic partition
Default: 4096
IsMutable: false
MasterOnly: true
Used to limit the maximum number of partitions that can be created when multi creating partitions, to avoid creating too
many partitions at one time.
partition_in_memory_update_interval_secs
Default :300 (s)
IsMutable: true
MasterOnly: true
enable_concurrent_update
Default: false
IsMutable: false
MasterOnly: true
Whether to enable concurrent update
lower_case_table_names
Default :0
IsMutable: false
MasterOnly: true
This configuration can only be set during cluster initialization and cannot be modified during cluster restart and
upgrade after initialization is complete.
0: table names are stored as specified and comparisons are case sensitive.
1: table names are stored in lowercase and comparisons are not case sensitive.
2: table names are stored as given but compared in lowercase.
table_name_length_limit
Default: 64
IsMutable: true
MasterOnly: true
Used to control the maximum table name length
cache_enable_sql_mode
Default :true
IsMutable: true
MasterOnly: false
If this switch is turned on, the SQL query result set will be cached. If the interval between the last visit version time in all
partitions of all tables in the query is greater than cache_last_version_interval_second, and the result set is less than
cache_result_max_row_count, the result set will be cached, and the next same SQL will hit the cache
If set to true, fe will enable sql result caching. This option is suitable for offline data update scenarios
cache_enable_partition_mode
Default: true
IsMutable: true
MasterOnly: false
If set to true, fe will get data from be cache, This option is suitable for real-time updating of partial partitions.
cache_result_max_row_count
Default: 3000
IsMutable: true
MasterOnly: false
In order to avoid occupying too much memory, the maximum number of rows that can be cached is 2000 by default. If this
threshold is exceeded, the cache cannot be set
cache_last_version_interval_second
Default: 900
IsMutable: true
MasterOnly: false
The time interval of the latest partitioned version of the table refers to the time interval between the data update and the
current version. It is generally set to 900 seconds, which distinguishes offline and real-time import
enable_batch_delete_by_default
Default :false
IsMutable: true
MasterOnly: true
Whether to add a delete sign column when creating a unique table.
max_allowed_in_element_num_of_delete
Default :1024
IsMutable: true
MasterOnly: true
max_running_rollup_job_num_per_table
Default :1
IsMutable: true
MasterOnly: true
max_distribution_pruner_recursion_depth
Default: 100
IsMutable: true
MasterOnly: false
This will limit the max recursion depth of the hash distribution pruner.
For example: where a in (5 elements) and b in (4 elements) and c in (3 elements) and d in (2 elements).
a/b/c/d are distribution columns, so the recursion depth will be 5 * 4 * 3 * 2 = 120, which is larger than 100,
so the distribution pruner will not work and will just return all buckets.
Increasing the depth can support distribution pruning for more elements, but may cost more CPU.
enable_local_replica_selection
Default: false
IsMutable: true
If set to true, Planner will try to select replica of tablet on same host as this Frontend.
This may reduce network transmission in certain deployments, but note that in this case all Frontends can only use local replicas to do
the query. If you want to allow falling back to nonlocal replicas when no local replica is available, set
enable_local_replica_selection_fallback to true.
enable_local_replica_selection_fallback
Default: false
IsMutable: true
Used with enable_local_replica_selection. If local replicas are not available, fall back to nonlocal replicas.
expr_depth_limit
Default: 3000
IsMutable: true
Limit on the depth of an expr tree. Exceed this limit may cause long analysis time while holding db read lock. Do not set this if
you know what you are doing
expr_children_limit
Default: 10000
IsMutable: true
Limit on the number of expr children of an expr tree. Exceed this limit may cause long analysis time while holding database
read lock.
be_exec_version
Used to define the serialization format for passing blocks between fragments.
Sometimes some of our code changes will change the data format of the block. In order to make the BE compatible with
each other during the rolling upgrade process, we need to issue a data version from the FE to decide what format to send the
data in.
Specifically, for example, suppose there are 2 BEs in the cluster, one of which supports the latest version v1 after being upgraded, while
the other only supports v0. Since the FE has not been upgraded yet, it uniformly issues v0, and the BEs interact in
the old data format. After all BEs are upgraded, we upgrade the FE. The new FE will then issue v1, and the cluster
will be uniformly switched to the new data format.
The default value is max_be_exec_version . If there are special needs, we can manually set the format version to lower, but it
should not be lower than min_be_exec_version .
Note that we should always keep the value of this variable between BeExecVersionManager::min_be_exec_version and
BeExecVersionManager::max_be_exec_version for all BEs. (That is to say, if a cluster that has completed the update needs to
be downgraded, it should ensure the order of downgrading FE and then downgrading BE, or manually lower the variable in
the settings and downgrade BE)
max_be_exec_version
The latest data version currently supported, cannot be modified, and should be consistent with the
BeExecVersionManager::max_be_exec_version in the BE of the matching version.
min_be_exec_version
The oldest data version currently supported, which cannot be modified, should be consistent with the
BeExecVersionManager::min_be_exec_version in the BE of the matching version.
max_query_profile_num
Default: 100
publish_version_interval_ms
Default :10 (ms)
minimal intervals between two publish version action
publish_version_timeout_second
Default :30 (s)
IsMutable: true
MasterOnly: true
Maximal waiting time for all publish version tasks of one transaction to be finished
query_colocate_join_memory_limit_penalty_factor
Default :1
IsMutable: true
The memory_limit of a colocate join PlanFragment instance = exec_mem_limit / min(query_colocate_join_memory_limit_penalty_factor, instance_num)
rewrite_count_distinct_to_bitmap_hll
Default: true
This variable is a session variable, and the session level takes effect.
Type: boolean
Description: Only for tables of the AGG model. When the variable is true and the user query contains aggregate
functions such as count(distinct c1): if the type of the c1 column is bitmap, count distinct will be rewritten as
bitmap_union_count(c1); if the type of the c1 column is hll, count distinct will be rewritten as hll_union_agg(c1). If
the variable is false, no rewriting occurs.
enable_vectorized_load
Default: true
enable_new_load_scan_node
Default: true
default_max_filter_ratio
Default: 0
IsMutable: true
MasterOnly: true
Maximum percentage of data that can be filtered (due to reasons such as the data being irregular). The default value is 0.
max_running_txn_num_per_db
Default: 1000
IsMutable: true
MasterOnly: true
This configuration is mainly used to control the number of concurrent load jobs of the same database.
When there are too many load jobs running in the cluster, newly submitted load jobs may report errors.
When this error is encountered, it means that the number of load jobs currently running in the cluster exceeds the configured value.
At this time, it is recommended to wait and retry the load jobs on the business side.
If you use the Connector, the value of this parameter can be increased appropriately; values in the thousands are not a problem.
using_old_load_usage_pattern
Default: false
IsMutable: true
MasterOnly: true
If set to true, the insert stmt with processing error will still return a label to user. And user can use this label to check the load
job's status. The default value is false, which means if insert operation encounter errors, exception will be thrown to user
client directly without load label.
disable_load_job
Default: false
IsMutable: true
MasterOnly: true
If this is set to true:
all pending load jobs will fail when calling the begin txn api
all preparing load jobs will fail when calling the commit txn api
all committed load jobs will wait to be published
commit_timeout_second
Default :30
IsMutable: true
MasterOnly: true
Maximal waiting time for all data inserted before one transaction to be committed. This is the timeout in seconds for the
"commit" command.
max_unfinished_load_job
Default :1000
IsMutable: true
MasterOnly: true
Max number of load jobs, including PENDING, ETL, LOADING, QUORUM_FINISHED. If this number is exceeded, load jobs are not
allowed to be submitted.
db_used_data_quota_update_interval_secs
Default: 300 (s)
IsMutable: true
MasterOnly: true
One master daemon thread will update database used data quota for db txn manager every
db_used_data_quota_update_interval_secs
For better data load performance, in the check of whether the amount of data used by the database before data load
exceeds the quota, we do not calculate the amount of data already used by the database in real time, but obtain the
periodically updated value of the daemon thread.
This configuration is used to set the time interval for updating the value of the amount of data used by the database
disable_show_stream_load
Default :false
IsMutable: true
MasterOnly: true
Whether to disable show stream load and clear stream load records in memory.
max_stream_load_record_size
Default :5000
IsMutable: true
MasterOnly: true
Default max number of recent stream load record that can be stored in memory.
fetch_stream_load_record_interval_second
Default :120
:
IsMutable true
MasterOnly:true
max_bytes_per_broker_scanner
Default: 3 * 1024 * 1024 * 1024L (3G)
IsMutable: true
MasterOnly: true
Max bytes a broker scanner can process in one broker load job. Commonly, each Backend has one broker scanner.
default_load_parallelism
Default: 1
IsMutable: true
MasterOnly: true
max_broker_concurrency
Default: 10
IsMutable: true
MasterOnly: true
min_bytes_per_broker_scanner
Default: 67108864L (64M)
IsMutable: true
MasterOnly: true
Minimum bytes that a single broker scanner will read.
period_of_auto_resume_min
Default: 5 (s)
IsMutable: true
MasterOnly: true
The interval at which Routine Load jobs are automatically resumed.
max_tolerable_backend_down_num
Default: 0
IsMutable: true
MasterOnly: true
max_routine_load_task_num_per_be
Default: 5
IsMutable: true
MasterOnly: true
The max concurrent routine load task num per BE. This limits the number of routine load tasks sent to a BE, and it should also be less than the BE config 'routine_load_thread_pool_size' (default 10), which is the routine load task thread pool size on the BE.
max_routine_load_task_concurrent_num
Default: 5
IsMutable: true
MasterOnly: true
the max concurrent routine load task num of a single routine load job
max_routine_load_job_num
Default :100
the max routine load job num, including NEED_SCHEDULED, RUNNING, PAUSE
desired_max_waiting_jobs
Default :100
IsMutable: true
MasterOnly: true
Default number of waiting jobs for routine load and version 2 of load. This is a desired number; in some situations, such as switching the master, the current number may be more than desired_max_waiting_jobs.
disable_hadoop_load
Default: false
IsMutable: true
MasterOnly: true
Load using hadoop cluster will be deprecated in future. Set to true to disable this kind of load.
enable_spark_load
Default: false
IsMutable: true
MasterOnly: true
Whether to enable Spark Load temporarily; it is not enabled by default.
Note: This parameter has been deleted in version 1.2, spark_load is enabled by default
spark_load_checker_interval_second
Default :60
Spark load scheduler run interval, default 60 seconds
async_loading_load_task_pool_size
Default :10
IsMutable: false
MasterOnly: true
The loading_load task executor pool size. This pool size limits the max running loading_load tasks.
async_pending_load_task_pool_size
Default :10
IsMutable: false
MasterOnly: true
The pending_load task executor pool size. This pool size limits the max running pending_load tasks.
Currently, it only limits the pending_load task of broker load and spark load.
async_load_task_pool_size
Default :10
IsMutable: false
MasterOnly: true
This configuration exists only for compatibility with older versions. It has been replaced by async_loading_load_task_pool_size and will be removed in the future.
enable_single_replica_load
Default :false
IsMutable: true
MasterOnly: true
Whether to enable to write single replica for stream load and broker load.
min_load_timeout_second
Default: 1 (1s)
IsMutable: true
MasterOnly: true
Min stream load timeout applicable to all types of load.
max_stream_load_timeout_second
Default: 259200 (3 days)
IsMutable: true
MasterOnly: true
This configuration is specifically used to limit the timeout setting for stream load. It prevents failed stream load transactions from being impossible to cancel for a long time because of a user's large timeout setting.
max_load_timeout_second
Default: 259200 (3 days)
IsMutable: true
MasterOnly: true
Max load timeout applicable to all types of load except for stream load.
stream_load_default_timeout_second
Default: 86400 * 3 (3 days)
IsMutable: true
MasterOnly: true
stream_load_default_precommit_timeout_second
Default: 3600 (s)
IsMutable: true
MasterOnly: true
Default stream load pre-commit timeout.
insert_load_default_timeout_second
Default: 3600 (1 hour)
IsMutable: true
MasterOnly: true
Default insert load timeout
mini_load_default_timeout_second
Default: 3600 (1 hour)
IsMutable: true
MasterOnly: true
broker_load_default_timeout_second
Default: 14400 (4 hours)
IsMutable: true
MasterOnly: true
Default broker load timeout
spark_load_default_timeout_second
Default: 86400 (1 day)
IsMutable: true
MasterOnly: true
hadoop_load_default_timeout_second
Default: 86400 * 3 (3 days)
IsMutable: true
MasterOnly: true
load_running_job_num_limit
Default: 0
IsMutable: true
MasterOnly: true
load_input_size_limit_gb
Default: 0
IsMutable: true
MasterOnly: true
The input data size limit (in GB) of a load job; the default is 0, unlimited.
load_etl_thread_num_normal_priority
Default :10
Concurrency of NORMAL priority etl load jobs. Do not change this unless you know what you are doing.
load_etl_thread_num_high_priority
Default :3
Concurrency of HIGH priority etl load jobs. Do not change this unless you know what you are doing.
load_pending_thread_num_normal_priority
Default :10
Concurrency of NORMAL priority pending load jobs. Do not change this unless you know what you are doing.
load_pending_thread_num_high_priority
Default :3
Concurrency of HIGH priority pending load jobs. Load job priority is defined as HIGH or NORMAL. All mini batch load jobs are HIGH priority; other types of load jobs are NORMAL priority. Priority is set to avoid a slow load job occupying a thread for a long time. This is just an internal optimized scheduling policy. Currently, you cannot specify the job priority manually, and do not change this unless you know what you are doing.
load_checker_interval_second
Default :5 (s)
The load scheduler running interval. A load job will transfer its state from PENDING to LOADING to FINISHED. The load
scheduler will transfer load job from PENDING to LOADING while the txn callback will transfer load job from LOADING to
FINISHED. So a load job will cost at most one interval to finish when the concurrency has not reached the upper limit.
load_straggler_wait_second
Default: 300
IsMutable: true
MasterOnly: true
Maximal wait seconds for a straggler node in load.
e.g.
There are 3 replicas A, B, C; the load is already quorum finished (A, B) at t1 and C is not finished.
If (current_time - t1) > 300s, then palo will treat C as a failure node, call the transaction manager to commit the transaction, and tell the transaction manager that C failed.
Note: this parameter is the default value for all jobs, and the DBA can specify it for a separate job.
label_keep_max_second
Default: 3 * 24 * 3600 (3 days)
IsMutable: true
MasterOnly: true
Labels of finished or cancelled load jobs will be removed after label_keep_max_second.
1. The removed labels can be reused.
2. Setting a short time will lower the FE memory usage. (Because all load jobs' info is kept in memory before being removed.)
In the case of highly concurrent writes, if there is a large backlog of jobs and calls to the frontend service fail, check the log. If metadata writes hold the lock for too long, you can lower this value, for example to 12 hours or even 6 hours.
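Since this is a mutable, master-only FE configuration, it can be inspected and adjusted at runtime; a minimal sketch (the 43200-second value is illustrative):
ADMIN SHOW FRONTEND CONFIG LIKE 'label_keep_max_second';
ADMIN SET FRONTEND CONFIG ("label_keep_max_second" = "43200");
The same pattern applies to any FE configuration item marked IsMutable: true.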
streaming_label_keep_max_second
Default :43200 (12 hour)
IsMutable: true
MasterOnly: true
For some high-frequency load jobs, such as INSERT, STREAMING LOAD, ROUTINE_LOAD_TASK: if the label expires, the record of the completed job or task is deleted.
label_clean_interval_second
Default :1 * 3600 (1 hour)
Load label cleaner will run every label_clean_interval_second to clean the outdated jobs.
transaction_clean_interval_second
Default :30
The transaction will be cleaned after transaction_clean_interval_second seconds if the transaction is visible or aborted. We should make this interval as short as possible, and each clean cycle should finish as soon as possible.
sync_commit_interval_second
The maximum time interval for committing transactions. If there is still data in the channel that has not been submitted after
this time, the consumer will notify the channel to submit the transaction.
Default: 10 (seconds)
sync_checker_interval_second
The interval for checking the running status of data synchronization jobs.
Default: 10 (s)
max_sync_task_threads_num
The maximum number of threads in the data synchronization job thread pool. There is only one such thread pool in the entire FE, and it is used to process all data synchronization tasks that send data to the BE. The implementation of the thread pool is in the SyncTaskPool class.
Default: 10
min_sync_commit_size
The minimum number of events required to commit a transaction. If the number of events received by FE is less than this, it will continue to wait for the next batch of data until the time exceeds sync_commit_interval_second. The default value is 10000 events. If you want to modify this configuration, please make sure that this value is smaller than the canal.instance.memory.buffer.size configuration on the canal side (default 16384); otherwise, before the ack, FE will try to obtain more events than the length of the store queue, causing the store queue to block until it times out.
Default: 10000
min_bytes_sync_commit
The minimum data size required to commit a transaction. If the data size received by FE is smaller than this, it will continue to wait for the next batch of data until the time exceeds sync_commit_interval_second. The default value is 15 MB. If you want to modify this configuration, please make sure that this value is less than the product of canal.instance.memory.buffer.size and canal.instance.memory.buffer.memunit on the canal side (default 16 MB); otherwise, before the ack, FE will try to obtain data larger than the store space, causing the store queue to block until it times out.
max_bytes_sync_commit
The maximum data size accumulated before a transaction is committed. If the data size received by FE is larger than this value, it will immediately commit the transaction and send the accumulated data. If you want to modify this configuration, please make sure that this value is greater than the product of canal.instance.memory.buffer.size and canal.instance.memory.buffer.memunit on the canal side (default 16 MB) and greater than min_bytes_sync_commit.
Default: 64 MB
enable_outfile_to_local
Default :false
Whether to allow the outfile function to export the results to the local disk.
export_tablet_num_per_task
Default: 5
IsMutable: true
MasterOnly: true
Number of tablets per export query plan
export_task_default_timeout_second
Default: 2 * 3600 (2 hours)
IsMutable: true
MasterOnly: true
export_running_job_num_limit
Default: 5
IsMutable: true
MasterOnly: true
Limitation of the concurrency of running export jobs. Default is 5. 0 is unlimited
export_checker_interval_second
Default :5
Export checker's running interval.
Log
log_roll_size_mb
Default :1024 (1G)
The max size of one sys log and audit log
sys_log_dir
Default: PaloFe.DORIS_HOME_DIR + "/log"
This specifies the FE log dir.
sys_log_level
Default :INFO
Log level: INFO, WARNING, ERROR, FATAL
sys_log_roll_num
Default :10
Maximal number of FE log files to be kept within a sys_log_roll_interval. The default is 10, which means there will be at most 10 log files in a day.
sys_log_verbose_modules
Default :{}
Verbose modules. VERBOSE level is implemented by log4j DEBUG level.
e.g.:
sys_log_verbose_modules = org.apache.doris.catalog
This will only print the debug log of files in the package org.apache.doris.catalog and all its sub-packages.
sys_log_roll_interval
Default :DAY
DAY: log suffix is yyyyMMdd
HOUR: log suffix is yyyyMMddHH
sys_log_delete_age
Default :7d
Default is 7 days: if a log's last modified time is more than 7 days ago, it will be deleted.
Supported formats:
7d: 7 days
10h: 10 hours
60m: 60 minutes
120s: 120 seconds
sys_log_roll_mode
Default :SIZE-MB-1024
The size at which the log is split: a new log file is started every 1 GB.
audit_log_dir
Default :DORIS_HOME_DIR + "/log"
This specifies the FE audit log dir.
The audit log fe.audit.log contains all requests with related info such as user, host, cost, status, etc.
audit_log_roll_num
Default :90
Maximal FE audit log files to be kept within an audit_log_roll_interval.
audit_log_modules
qe_slow_log_ms
Default :5000 (5 seconds)
If the response time of a query exceeds this threshold, it will be recorded in the audit log as a slow_query.
audit_log_roll_interval
Default :DAY
DAY: log suffix is yyyyMMdd
HOUR: log suffix is yyyyMMddHH
audit_log_delete_age
Default :30d
Default is 30 days: if a log's last modified time is more than 30 days ago, it will be deleted.
Supported formats:
7d: 7 days
10h: 10 hours
60m: 60 minutes
Storage
min_replication_num_per_tablet
Default: 1
max_replication_num_per_tablet
Default: 32767
default_db_data_quota_bytes
Default: 1PB
IsMutable: true
MasterOnly: true
Used to set the default database data quota size. To set the quota of a single database, you can use:
ALTER DATABASE db_name SET DATA QUOTA quota;
To view the configuration: show data (see HELP SHOW DATA for details).
default_db_replica_quota_size
Default: 1073741824
IsMutable: true
MasterOnly: true
Used to set the default database replica quota. To set the replica quota of a single database, you can use:
ALTER DATABASE db_name SET REPLICA QUOTA quota;
To view the configuration: show data (see HELP SHOW DATA for details).
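A minimal sketch of the two statements referenced above; the database name example_db and the quota values are illustrative:
ALTER DATABASE example_db SET DATA QUOTA 10T;
ALTER DATABASE example_db SET REPLICA QUOTA 102400;
SHOW DATA;
SHOW DATA reports the current data size and replica count, which can be compared against these quotas.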
recover_with_empty_tablet
Default: false
IsMutable: true
MasterOnly: true
In some very special circumstances, such as code bugs, or human misoperation, etc., all replicas of some tablets may be lost.
In this case, the data has been substantially lost. However, in some scenarios, the business still hopes to ensure that the query
will not report errors even if there is data loss, and reduce the perception of the user layer. At this point, we can use the blank
Tablet to fill the missing replica to ensure that the query can be executed normally.
Set this to true so that Doris will automatically use blank replicas to fill tablets whose replicas have all been damaged or lost.
min_clone_task_timeout_sec and max_clone_task_timeout_sec
MasterOnly: true
These two configurations cooperate to control the minimum and maximum timeout of a clone task. Under normal circumstances, the timeout of a clone task is estimated by the amount of data and the minimum transfer rate (5 MB/s). In some special cases, these two configurations can be used to set the upper and lower bounds of the clone task timeout to ensure that the clone task can complete successfully.
disable_storage_medium_check
Default: false
IsMutable: true
MasterOnly: true
If disable_storage_medium_check is true, ReportHandler would not check tablet's storage medium and disable storage cool
down function, the default value is false. You can set the value true when you don't care what the storage medium of the
tablet is.
decommission_tablet_check_threshold
Default: 5000
IsMutable: true
MasterOnly: true
This configuration is used to control whether the Master FE needs to check the status of tablets on a decommissioned BE. If the number of tablets on the decommissioned BE is lower than this threshold, FE will start a periodic check; once all tablets on the decommissioned BE have been recycled, FE will drop this BE immediately.
For performance consideration, please don't set a very high value for this configuration.
partition_rebalance_max_moves_num_per_selection
Default: 10
IsMutable: true
MasterOnly: true
Valid only if PartitionRebalancer is used. If this is changed, cached moves will be cleared.
tablet_rebalancer_type
Default: BeLoad
MasterOnly: true
Rebalancer type (ignore case): BeLoad, Partition. If the type fails to parse, BeLoad is used as the default.
max_balancing_tablets
Default: 100
IsMutable: true
MasterOnly: true
If the number of balancing tablets in TabletScheduler exceeds max_balancing_tablets, no more balance checks are performed.
max_scheduling_tablets
Default :2000
IsMutable: true
MasterOnly: true
disable_balance
Default :false
IsMutable: true
MasterOnly: true
disable_disk_balance
Default: true
IsMutable: true
MasterOnly: true
if set to true, TabletScheduler will not do disk balance.
balance_load_score_threshold
Default: 0.1 (10%)
IsMutable: true
MasterOnly: true
The threshold of the cluster balance score. If a backend's load score is 10% lower than the average score, this backend will be marked as LOW load; if its load score is 10% higher than the average score, it will be marked as HIGH load.
capacity_used_percent_high_water
Default :0.75 (75%)
IsMutable: true
MasterOnly: true
The high water of disk capacity used percent. This is used for calculating load score of a backend
clone_distribution_balance_threshold
Default :0.2
IsMutable: true
MasterOnly: true
Balance threshold of num of replicas in Backends.
clone_capacity_balance_threshold
Default: 0.2
IsMutable: true
MasterOnly: true
Balance threshold of data size in BE.
i. Calculate the average used capacity(AUC) of the entire cluster. (total data size / total backends num)
iv. The Clone checker will try to move replica from high water level BE to low water level BE.
disable_colocate_balance
Default: false
IsMutable: true
MasterOnly: true
This config can be set to true to disable automatic relocation and balancing of colocate tables. If 'disable_colocate_balance' is set to true, ColocateTableBalancer will not relocate and balance colocate tables.
Attention:
disable_tablet_scheduler
Default: false
IsMutable: true
MasterOnly: true
If set to true, the tablet scheduler will not work, so that all tablet repair/balance task will not work.
enable_force_drop_redundant_replica
Default: false
If set to true, the system will immediately drop redundant replicas in the tablet scheduling logic. This may cause some load jobs that are writing to the corresponding replicas to fail, but it will speed up the balance and repair of tablets. When there are a large number of replicas waiting to be balanced or repaired in the cluster, you can try setting this config to speed up the balance and repair of replicas at the expense of some load success rate.
colocate_group_relocate_delay_second
Default: 1800
The relocation of a colocation group may involve a large number of tablets moving within the cluster. Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. Relocation usually occurs after a BE node goes offline or goes down. This parameter is used to delay the determination of BE node unavailability. The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group will not be triggered.
allow_replica_on_same_host
Default: false
Whether to allow multiple replicas of the same tablet to be distributed on the same host. This parameter is mainly used for
local testing, to facilitate building multiple BEs to test certain multi-replica situations. Do not use it for non-test
environments.
repair_slow_replica
Default: false
IsMutable: true
MasterOnly: true
If set to true, the replica with slower compaction will be automatically detected and migrated to other machines. The
detection condition is that the version count of the fastest replica exceeds the value of
min_version_count_indicate_replica_compaction_too_slow , and the ratio of the version count difference from the fastest
replica exceeds the value of valid_version_count_delta_ratio_between_replicas
min_version_count_indicate_replica_compaction_too_slow
Default: 200
The version count threshold used to judge whether replica compaction is too slow
skip_compaction_slower_replica
Default: true
valid_version_count_delta_ratio_between_replicas
Default: 0.5
The valid ratio threshold of the difference between the version count of the slowest replica and the fastest replica. If
repair_slow_replica is set to true, it is used to determine whether to repair the slowest replica
min_bytes_indicate_replica_too_large
Default: 2 * 1024 * 1024 * 1024 (2G)
The data size threshold used to judge whether replica is too large
schedule_slot_num_per_path
Default :2
The default slot number per path in the tablet scheduler. This config may be removed in the future and adjusted dynamically based on clone task statistics.
tablet_repair_delay_factor_second
Default: 60 (s)
IsMutable: true
MasterOnly: true
the factor of delay time before deciding to repair tablet.
tablet_stat_update_interval_second
Default :300(5min)
Update interval of tablet stats. All frontends will get tablet stats from all backends at each interval.
storage_flood_stage_usage_percent
Default: 95 (95%)
IsMutable: true
MasterOnly: true
storage_flood_stage_left_capacity_bytes
MasterOnly: true
If the disk capacity reaches 'storage_flood_stage_usage_percent' and 'storage_flood_stage_left_capacity_bytes', the following operations will be rejected:
1. load jobs
2. restore jobs
storage_high_watermark_usage_percent
Default: 85 (85%)
IsMutable: true
MasterOnly: true
storage_min_left_capacity_bytes
Default : 2 * 1024 * 1024 * 1024 (2GB)
IsMutable: true
MasterOnly: true
'storage_high_watermark_usage_percent' limits the max capacity usage percent of a Backend storage path.
'storage_min_left_capacity_bytes' limits the minimum remaining capacity of a Backend storage path. If both limits are reached, this storage path cannot be chosen as a tablet balance destination. But for tablet recovery, we may exceed these limits to keep data integrity as much as possible.
catalog_trash_expire_second
Default :86400L (1 day)
IsMutable: true
MasterOnly: true
After dropping a database (table/partition), you can recover it by using the RECOVER stmt, and this specifies the maximal data retention time. After this time, the data will be deleted permanently.
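A minimal sketch of the recovery flow this retention window enables; the table name example_tbl is illustrative, and RECOVER must be issued before catalog_trash_expire_second elapses:
DROP TABLE example_tbl;
-- within catalog_trash_expire_second, the dropped object can still be brought back:
RECOVER TABLE example_tbl;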
storage_cooldown_second
default_storage_medium
Default :HDD
When you create a table (or partition), you can specify its storage medium (HDD or SSD). If not set, this specifies the default medium used at creation time.
enable_storage_policy
Whether to enable the Storage Policy feature. This feature allows users to separate hot and cold data. This feature is still under
development. Recommended for test environments only.
Default: false
check_consistency_default_timeout_second
Default: 600 (10 minutes)
IsMutable: true
MasterOnly: true
Default timeout of a single consistency check task. Set long enough to fit your tablet size
consistency_check_start_time
Default :23
IsMutable: true
MasterOnly: true
If the two times are the same, no consistency check will be triggered.
consistency_check_end_time
Default :23
IsMutable: true
MasterOnly: true
If the two times are the same, no consistency check will be triggered.
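For illustration, a sketch of setting a nightly check window from 23:00 to 04:00 with the runtime configuration statement (the hours are illustrative):
ADMIN SET FRONTEND CONFIG ("consistency_check_start_time" = "23");
ADMIN SET FRONTEND CONFIG ("consistency_check_end_time" = "4");
Leaving both values at 23, as in the defaults above, means no consistency check is triggered.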
replica_delay_recovery_second
Default: 0
IsMutable: true
MasterOnly: true
The minimal delay in seconds between a replica failing and FE trying to recover it using clone.
tablet_create_timeout_second
Default :1(s)
IsMutable: true
MasterOnly: true
Maximal waiting time for creating a single replica.
e.g.
If you create a table with m tablets and n replicas for each tablet, the create table request will run at most (m * n * tablet_create_timeout_second) seconds before timing out.
tablet_delete_timeout_second
Default: 2
IsMutable: true
MasterOnly: true
Same meaning as tablet_create_timeout_second, but used when deleting a tablet.
alter_table_timeout_second
Default: 86400 * 30 (1 month)
IsMutable: true
MasterOnly: true
Maximal timeout of ALTER TABLE request. Set long enough to fit your table data size.
max_replica_count_when_schema_change
The maximum number of replicas allowed when OlapTable is doing schema changes. Too many replicas will lead to FE OOM.
Default: 100000
history_job_keep_max_second
Default : 7 * 24 * 3600 (7 day)
:
IsMutable true
MasterOnly:true
The max keep time of some kind of jobs. like schema change job and rollup job.
max_create_table_timeout_second
Default: 60 (s)
IsMutable: true
MasterOnly: true
In order not to wait too long for create table(index), set a max timeout.
External Table
file_scan_node_split_num
Default :128
IsMutable: true
MasterOnly: false
file_scan_node_split_size
Default: 256 * 1024 * 1024
IsMutable: true
MasterOnly: false
enable_odbc_table
Default: false
IsMutable: true
MasterOnly: true
Whether to enable the ODBC table, it is not enabled by default. You need to manually configure it when you use it.
Note: This parameter has been deleted in version 1.2. The ODBC External Table is enabled by default, and the ODBC External
Table will be deleted in a later version. It is recommended to use the JDBC External Table
disable_iceberg_hudi_table
Default: true
IsMutable: true
MasterOnly: false
Starting from version 1.2, we no longer support creating Hudi and Iceberg external tables. Please use the Multi Catalog feature instead.
iceberg_table_creation_interval_second
Default :10 (s)
IsMutable: true
MasterOnly: false
iceberg_table_creation_strict_mode
Default :true
IsMutable: true
MasterOnly: true
If set to TRUE, the column definitions of the Iceberg table and the Doris table must be consistent.
If set to FALSE, Doris only creates columns of supported data types.
max_iceberg_table_creation_record_size
Default: 2000
IsMutable: true
MasterOnly: true
Default max number of recent Iceberg database table creation records that can be stored in memory.
max_hive_partition_cache_num
The maximum number of caches for the hive partition.
Default: 100000
hive_metastore_client_timeout_second
Default: 10
max_external_file_cache_num
Maximum number of file cache entries to use for external tables.
Default: 100000
max_external_schema_cache_num
Maximum number of schema cache entries to use for external tables.
Default: 10000
external_cache_expire_time_minutes_after_access
Set how long the data in the cache expires after the last access. The unit is minutes.
Applies to External Schema Cache as well
as Hive Partition Cache.
Default: 1440
Dynamically configurable: false
es_state_sync_interval_second
Default :10
FE will call the ES API to get ES index shard info every es_state_sync_interval_secs seconds.
External Resources
dpp_hadoop_client_path
Default :/lib/hadoop-client/hadoop/bin/hadoop
dpp_bytes_per_reduce
Default : 100 * 1024 * 1024L (100M)
dpp_default_cluster
Default :palo-dpp
dpp_default_config_str
Default :{
hadoop_configs :
'mapred.job.priority=NORMAL;mapred.job.map.capacity=50;mapred.job.reduce.capacity=50;mapred.hce.replace.streaming=
false;abaci.long.stored.job=true;dce.shuffle.enable=false;dfs.client.authserver.force_stop=true;dfs.client.auth.method=0'
}
dpp_config_str
Default :{
palo-dpp : {
hadoop_palo_path : '/dir',
hadoop_configs :
'fs.default.name=hdfs://host:port;mapred.job.tracker=host:port;hadoop.job.ugi=user,password'
}
}
yarn_config_dir
Default :PaloFe.DORIS_HOME_DIR + "/lib/yarn-config"
Default yarn config file directory. Each time before running the yarn command, we need to check that the config file exists under this path, and if not, create it.
yarn_client_path
Default :DORIS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn"
Default yarn client path
spark_launcher_log_dir
Default : sys_log_dir + "/spark_launcher_log"
The specified spark launcher log dir
spark_resource_path
Default :none
Default spark dependencies path
spark_home_default_dir
Default :DORIS_HOME_DIR + "/lib/spark2x"
Default spark home dir
spark_dpp_version
Default :1.0.0
Default spark dpp version
Else
tmp_dir
custom_config_dir
Default :PaloFe.DORIS_HOME_DIR + "/conf"
Custom configuration file directory
Configure the location of the fe_custom.conf file. The default is in the conf/ directory.
In some deployment environments, the conf/ directory may be overwritten due to system upgrades. This will cause the
user modified configuration items to be overwritten. At this time, we can store fe_custom.conf in another specified
directory to prevent the configuration file from being overwritten.
plugin_dir
Default :DORIS_HOME + "/plugins
plugin install directory
plugin_enable
Default: true
IsMutable: true
MasterOnly: true
small_file_dir
Default :DORIS_HOME_DIR/small_files
Save small files
max_small_file_size_bytes
Default: 1M
IsMutable: true
MasterOnly: true
The max size of a single file store in SmallFileMgr
max_small_file_number
Default: 100
IsMutable: true
MasterOnly: true
The max number of files store in SmallFileMgr
enable_metric_calculator
Default :true
If set to true, the metric collector will run as a daemon timer to collect metrics at a fixed interval.
report_queue_size
Default : 100
IsMutable: true
MasterOnly: true
This threshold is to avoid piling up too many report tasks in FE, which may cause an OOM exception. In some large Doris clusters, e.g. 100 Backends with ten million replicas, a tablet report may cost several seconds after some modification of metadata (drop partition, etc.). And each Backend reports tablet info every 1 min, so receiving an unlimited number of reports is unacceptable. We will optimize the processing speed of tablet reports in the future, but for now, reports are simply discarded if the queue size exceeds the limit.
Some online time cost:
2. sk report: 0-1 ms
3. tablet report
backup_job_default_timeout_ms
Default: 86400 * 1000 (1 day)
IsMutable: true
MasterOnly: true
default timeout of backup job
max_backup_restore_job_num_per_db
Default: 10
This configuration is mainly used to control the number of backup/restore tasks recorded in each database.
enable_quantile_state_type
Default: false
IsMutable: true
MasterOnly: true
enable_date_conversion
Default: false
IsMutable: true
MasterOnly: true
If set to TRUE, FE will convert date/datetime to datev2/datetimev2(0) automatically.
enable_decimal_conversion
Default: false
IsMutable: true
MasterOnly: true
If set to TRUE, FE will convert DecimalV2 to DecimalV3 automatically.
proxy_auth_magic_prefix
Default :x@8
proxy_auth_enable
Default :false
enable_func_pushdown
Default: true
IsMutable: true
MasterOnly: false
Whether to push filter conditions that contain functions down to MySQL when executing queries on ODBC and JDBC external tables.
jdbc_drivers_dir
Default: ${DORIS_HOME}/jdbc_drivers
IsMutable: false
MasterOnly: false
The default dir to put jdbc drivers.
max_error_tablet_of_broker_load
Default: 3
IsMutable: true
MasterOnly: true
default_db_max_running_txn_num
Default :-1
IsMutable: true
MasterOnly: true
Used to set the default database transaction quota size. To set the quota of a single database, you can use:
ALTER DATABASE db_name SET TRANSACTION QUOTA quota;
To view the configuration: show data (see HELP SHOW DATA for details).
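A minimal sketch of limiting the concurrent transactions of a single database; the database name example_db and the quota value are illustrative:
ALTER DATABASE example_db SET TRANSACTION QUOTA 1000;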
prefer_compute_node_for_external_table
Default: false
IsMutable: true
MasterOnly: false
If set to true, query on external table will prefer to assign to compute node. And the max number of compute node is
controlled by min_backend_num_for_external_table .
If set to false, query on external table will assign to any node.
min_backend_num_for_external_table
Default :3
IsMutable: true
MasterOnly: false
Only takes effect when prefer_compute_node_for_external_table is true. If the number of compute nodes is less than this value, queries on external tables will try to get some mix nodes assigned as well, so that the total number of nodes reaches this value. If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.
BE Configuration
This document mainly introduces the relevant configuration items of BE.
The BE configuration file be.conf is usually stored in the conf/ directory of the BE deployment path. In version 0.14, another
configuration file be_custom.conf will be introduced. The configuration file is used to record the configuration items that are
dynamically configured and persisted by the user during operation.
After the BE process is started, it will read the configuration items in be.conf first, and then read the configuration items in
be_custom.conf . The configuration items in be_custom.conf will overwrite the same configuration items in be.conf .
The location of the be_custom.conf file can be configured in be.conf through the custom_config_dir configuration item.
You can view the current configuration items of BE through its web page: http://be_host:be_webserver_port/varz
1. Static configuration
Add and set configuration items in the conf/be.conf file. The configuration items in be.conf will be read when BE starts. Configuration items not in be.conf will use their default values.
2. Dynamic configuration
After BE starts, the configuration items can be dynamically set with the following commands.
In version 0.13 and before, the configuration items modified in this way will become invalid after the BE process restarts.
In 0.14 and later versions, the modified configuration can be persisted through the following command. The modified configuration items are stored in the be_custom.conf file.
Examples
1. Modify max_base_compaction_threads statically
Add the following line to the be.conf file:
max_base_compaction_threads=5
2. Modify streaming_load_max_mb dynamically
After BE starts, the configuration item streaming_load_max_mb can be set dynamically with the following command; a successful call returns:
"status": "OK",
"msg": ""
The configuration will become invalid after the BE restarts. If you want to persist the modified result, use the following command:
Configurations
Services
be_port
Type: int32
Description: The port of the thrift server on BE which used to receive requests from FE
Default value: 9060
heartbeat_service_port
Type: int32
Description: Heartbeat service port (thrift) on BE, used to receive heartbeat from FE
webserver_port
Type: int32
Description: Service port of http server on BE
Default value: 8040
brpc_port
Type: int32
Description: The port of BRPC on BE, used for communication between BEs
Default value: 8060
single_replica_load_brpc_port
Type: int32
Description: The port of BRPC on BE, used for single replica load. There is an independent BRPC thread pool for the communication between the Master replica and the Slave replica during single replica load, which prevents data synchronization between the replicas from preempting the thread resources used for data distribution and query tasks when the load concurrency is large.
single_replica_load_download_port
Description: The HTTP port for segment download on BE, used for single replica load. There is an independent HTTP thread pool for the Slave replica to download segments during single replica load, which prevents data synchronization between the replicas from preempting the thread resources used for other HTTP tasks when the load concurrency is large.
Default value: 8050
priority_networks
Description: Declare a selection strategy for those servers with many IPs. Note that at most one ip should match this list.
This is a semicolon-separated list in CIDR notation, such as 10.10.10.0/24. If there is no IP matching this rule, one will be
randomly selected
Default value: blank
storage_root_path
Type: string
Description: data root path, separated by ';'. You can specify the storage medium of each root path, HDD or SSD, and you can add a capacity limit at the end of each root path, separated by ','. If the user does not use a mix of SSD and HDD disks, they do not need to use the configuration methods in Example 1 and Example 2 below, but only need to specify the storage directory; they also do not need to modify the default storage medium configuration of FE.
eg.1: storage_root_path=/home/disk1/doris.HDD;/home/disk2/doris.SSD;/home/disk2/doris
eg.2: storage_root_path=/home/disk1/doris,medium:hdd;/home/disk2/doris,medium:ssd
1. /home/disk1/doris,medium:hdd indicates that the storage medium is HDD;
2. /home/disk2/doris,medium:ssd indicates that the storage medium is SSD.
heartbeat_service_thread_count
Type: int32
Description: The number of threads that execute the heartbeat service on BE. The default is 1; it is not recommended to modify it.
Default value: 1
ignore_broken_disk
Type: bool
ignore_broken_disk=true
If the path does not exist or the files under the path cannot be read or written (bad disk), the path will be ignored. If there are other available paths, the startup will not be interrupted.
ignore_broken_disk=false
If the path does not exist or the files under the path cannot be read or written (bad disk), the startup will fail and the system will exit.
mem_limit
Type: string
Description: Limits the percentage of the server's maximum memory used by the BE process. It is used to prevent BE memory from occupying too much of the machine's memory. This parameter must be greater than 0. When the percentage is greater than 100%, the value will default to 100%.
Default value: 80%
cluster_id
Type: int32
Description: Configure the cluster id to which the BE belongs.
This value is usually delivered by the FE to the BE via the heartbeat, so there is no need to configure it. It can be configured when it is confirmed that a BE belongs to a certain Doris cluster. The cluster_id file under the data directory needs to be modified to make sure it matches this parameter.
Default value: -1
custom_config_dir
Description: Configure the location of the be_custom.conf file. The default is in the conf/ directory.
In some deployment environments, the conf/ directory may be overwritten due to system upgrades. This will cause
the user modified configuration items to be overwritten. At this time, we can store be_custom.conf in another
specified directory to prevent the configuration file from being overwritten.
Default value: blank
trash_file_expire_time_sec
Description: The interval for cleaning the recycle bin is 72 hours. When the disk space is insufficient, the file retention
period under trash may not comply with this parameter
Default value: 259200
es_http_timeout_ms
Description: The timeout period for connecting to ES via http.
Default value: 5000 (ms)
es_scroll_keepalive
Description: ES scroll keepalive hold time
external_table_connect_timeout_sec
Type: int32
Description: The timeout when establishing connection with external table such as ODBC table.
Default value: 5 seconds
status_report_interval
brpc_max_body_size
Description: This configuration is mainly used to modify the parameter max_body_size of brpc.
Sometimes the query fails and an error message of body_size is too large will appear in the BE log. This may
happen when the SQL mode is "multi distinct + no group by + more than 1T of data".This error indicates that the
packet size of brpc exceeds the configured value. At this time, you can avoid this error by increasing the
configuration.
brpc_socket_max_unwritten_bytes
Description: This configuration is mainly used to modify the parameter socket_max_unwritten_bytes of brpc.
Sometimes the query fails and an error message of The server is overcrowded will appear in the BE log. This means
there are too many messages to buffer at the sender side, which may happen when the SQL needs to send large
bitmap value. You can avoid this error by increasing the configuration.
transfer_large_data_by_brpc
Type: bool
Description: This configuration is used to control whether to serialize the protoBuf request and embed the Tuple/Block
data into the controller attachment and send it through http brpc when the length of the Tuple/Block data is greater
than 1.8G. To avoid errors when the length of the protoBuf request exceeds 2G: Bad request, error_text=[E1003]Fail to
compress request. In the past version, after putting Tuple/Block data in the attachment, it was sent through the default
baidu_std brpc, but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending through http
brpc.
Default value: true
brpc_num_threads
Description: This configuration is mainly used to modify the number of bthreads for brpc. The default value is set to -1,
which means the number of bthreads is #cpu-cores.
User can set this configuration to a larger value to get better QPS performance. For more information, please refer to
https://github.com/apache/incubator-brpc/blob/master/docs/cn/benchmark.md
Default value: -1
thrift_rpc_timeout_ms
Description: thrift default timeout time
Default value: 10000
thrift_client_retry_interval_ms
Type: int64
Description: Used to set retry interval for thrift client in be to avoid avalanche disaster in fe thrift server, the unit is ms.
thrift_connect_timeout_seconds
thrift_server_type_of_fe
Type: string
Description: This configuration indicates the service model used by FE's Thrift service. The type is string and is case-insensitive. This parameter needs to be consistent with the setting of FE's thrift_server_type parameter. Currently there are two values for this parameter, THREADED and THREAD_POOL .
If the parameter is THREADED , the model is a non-blocking I/O model.
If the parameter is THREAD_POOL , the model is a blocking I/O model.
txn_commit_rpc_timeout_ms
Description:txn submit rpc timeout
Default value: 10,000 (ms)
txn_map_shard_size
Description: txn_map_lock fragment size, the value is 2^n, n=0,1,2,3,4. This is an enhancement to improve the
performance of managing txn
Default value: 128
txn_shard_size
Description: txn_lock shard size, the value is 2^n, n=0,1,2,3,4, this is an enhancement function that can improve the
performance of submitting and publishing txn
Default value: 1024
unused_rowset_monitor_interval
Description: Time interval for clearing expired Rowset
Default value: 30 (s)
max_client_cache_size_per_host
Description: The maximum number of client caches per host. There are multiple client caches in BE, but currently we use
the same cache size configuration. If necessary, use different configurations to set up different client-side caches
Default value: 10
string_type_length_soft_limit_bytes
Type: int32
big_column_size_buffer
Type: int64
Description: When using the odbc external table, if a column type of the odbc source table is HLL, CHAR or VARCHAR,
and the length of the column value exceeds this value, the query will report an error 'column value length longer than
buffer length'. You can increase this value
Default value: 65535
small_column_size_buffer
Type: int64
Description: When using the odbc external table, if a column type of the odbc source table is not HLL, CHAR or
VARCHAR, and the length of the column value exceeds this value, the query will report an error 'column value length
longer than buffer length'. You can increase this value
Default value: 100
Query
fragment_pool_queue_size
Description: The upper limit of query requests that can be processed on a single node
Default value: 2048
fragment_pool_thread_num_min
Description: Query the number of threads. By default, the minimum number of threads is 64.
Default value: 64
fragment_pool_thread_num_max
Description: Follow up query requests create threads dynamically, with a maximum of 512 threads created.
Default value: 512
doris_max_pushdown_conjuncts_return_rate
Type: int32
Description: When BE performs HashJoin, it will adopt a dynamic partitioning method to push the join condition to
OlapScanner. When the data scanned by OlapScanner is larger than 32768 rows, BE will check the filter condition. If the
filter rate of the filter condition is lower than this configuration, Doris will stop using the dynamic partition clipping
condition for data filtering.
Default value: 90
doris_max_scan_key_num
Type: int
Description: Used to limit the maximum number of scan keys that a scan node can split in a query request. When a
conditional query request reaches the scan node, the scan node will try to split the conditions related to the key column
in the query condition into multiple scan key ranges. After that, these scan key ranges will be assigned to multiple
scanner threads for data scanning. A larger value usually means that more scanner threads can be used to increase the
parallelism of the scanning operation. However, in high concurrency scenarios, too many threads may bring greater
scheduling overhead and system load, and will slow down the query response speed. An empirical value is 50. This
configuration can be configured separately at the session level. For details, please refer to the description of
max_scan_key_num in Variables.
When the concurrency cannot be improved in high concurrency scenarios, try to reduce this value and observe the
impact.
Default value: 48
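As noted above, the corresponding limit can also be tuned per session instead of cluster-wide; a minimal sketch (the value 20 is illustrative):
SHOW VARIABLES LIKE 'max_scan_key_num';
SET max_scan_key_num = 20;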
doris_scan_range_row_count
Type: int32
Description: When BE performs data scanning, it will split the same scanning range into multiple ScanRanges. This
parameter represents the scan data range of each ScanRange. This parameter can limit the time that a single
OlapScanner occupies the io thread.
Default value: 524288
doris_scanner_queue_size
Type: int32
Description: The length of the RowBatch buffer queue between TransferThread and OlapScanner. When Doris performs
data scanning, it is performed asynchronously. The Rowbatch scanned by OlapScanner will be placed in the scanner
buffer queue, waiting for the upper TransferThread to take it away.
doris_scanner_row_num
Description: The maximum number of data rows returned by each scanning thread in a single execution
doris_scanner_thread_pool_queue_size
Type: int32
Description: The queue length of the Scanner thread pool. In Doris' scanning tasks, each Scanner will be submitted as a
thread task to the thread pool waiting to be scheduled, and after the number of submitted tasks exceeds the length of
the thread pool queue, subsequently submitted tasks will be blocked until there is an empty slot in the queue.
Default value: 102400
doris_scanner_thread_pool_thread_num
Type: int32
Description: The number of threads in the Scanner thread pool. In Doris' scanning tasks, each Scanner will be submitted
as a thread task to the thread pool to be scheduled. This parameter determines the size of the Scanner thread pool.
Default value: 48
doris_max_remote_scanner_thread_pool_thread_num
Type: int32
Description: Max thread number of Remote scanner thread pool. Remote scanner thread pool is used for scan task of all
external data sources.
Default: 512
enable_prefetch
Type: bool
Description: When using PartitionedHashTable for aggregation and join calculations, whether to perform HashBucket prefetch. Recommended to be set to true.
Default value: true
enable_quadratic_probing
Type: bool
Description: When a hash conflict occurs while using PartitionedHashTable, whether to use quadratic probing to resolve the conflict. If the value is false, linear probing is used instead. For the quadratic probing method, please refer to: quadratic_probing
Default value: true
exchg_node_buffer_size_bytes
Type: int32
Description: The size of the Buffer queue of the ExchangeNode node, in bytes. After the amount of data sent from the
Sender side is larger than the Buffer size of ExchangeNode, subsequent data sent will block until the Buffer frees up space
for writing.
Default value: 10485760
max_pushdown_conditions_per_column
Type: int
Description: Used to limit the maximum number of conditions that can be pushed down to the storage engine for a
single column in a query request. During the execution of the query plan, the filter conditions on some columns can be
pushed down to the storage engine, so that the index information in the storage engine can be used for data filtering,
reducing the amount of data that needs to be scanned by the query. Such as equivalent conditions, conditions in IN
predicates, etc. In most cases, this parameter only affects queries containing IN predicates. Such as WHERE colA IN
(1,2,3,4, ...) . A larger number means that more conditions in the IN predicate can be pushed to the storage engine,
but too many conditions may cause an increase in random reads, and in some cases may reduce query efficiency. This
configuration can also be configured individually at the session level. For details, please refer to the description of max_pushdown_conditions_per_column in Variables.
Example
The table structure is 'id INT, col2 INT, col3 varchar(32), ...'.
The query request is 'WHERE id IN (v1, v2, v3, ...)'.
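As with max_scan_key_num, this limit can be overridden per session; a minimal sketch (the value 2048 is illustrative):
SET max_pushdown_conditions_per_column = 2048;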
max_send_batch_parallelism_per_job
Type: int
Description: Max send batch parallelism for OlapTableSink. The value set by the user for send_batch_parallelism is not
allowed to exceed max_send_batch_parallelism_per_job , if exceed, the value of send_batch_parallelism would be
max_send_batch_parallelism_per_job .
Default value: 5
serialize_batch
Type: bool
Description: Whether the rpc communication between BEs serializes RowBatch for data transmission between query
layers
Default value: false
doris_scan_range_max_mb
Type: int32
Description: The maximum amount of data read by each OlapScanner.
Default value: 1024
Compaction
disable_auto_compaction
Type: bool
enable_vectorized_compaction
Type: bool
enable_vertical_compaction
Type: bool
Description: Whether to enable vertical compaction
Default value: true
vertical_compaction_num_columns_per_group
Type: bool
vertical_compaction_max_row_source_memory_mb
Type: bool
Description: In vertical compaction, max memory usage for row_source_buffer
Default value: true
vertical_compaction_max_segment_size
Type: bool
Description: In vertical compaction, max dest segment file size
enable_ordered_data_compaction
Type: bool
Description: Whether to enable ordered data compaction
ordered_data_compaction_min_segment_size
Type: bool
Description: In ordered data compaction, min segment size for input rowset
Default value: true
max_base_compaction_threads
Type: int32
Description: The maximum of thread number in base compaction thread pool.
Default value: 4
generate_compaction_tasks_interval_ms
Description: Minimal interval (ms) to generate compaction tasks
base_compaction_min_rowset_num
Description: One of the triggering conditions of BaseCompaction: The limit of the number of Cumulative files to be
reached. After reaching this limit, BaseCompaction will be triggered
Default value: 5
base_compaction_min_data_ratio
Description: One of the trigger conditions of BaseCompaction: Cumulative file size reaches the proportion of Base file
Default value: 0.3 (30%)
total_permits_for_compaction_score
Type: int64
Description: The upper limit of "permits" held by all compaction tasks. This config can be set to limit memory
consumption for compaction.
Default value: 10000
Dynamically modifiable: Yes
compaction_promotion_size_mbytes
Type: int64
Description: The total disk size of the output rowset of cumulative compaction exceeds this configuration size, and the
rowset will be used for base compaction. The unit is m bytes.
Generally, the configuration should be less than 2G, in order to prevent the cumulative compaction time from being too long and causing a version backlog.
compaction_promotion_ratio
Type: double
Description: When the total disk size of the cumulative compaction output rowset exceeds the configuration ratio of the
base version rowset, the rowset will be used for base compaction.
Generally, it is recommended that the configuration should not be higher than 0.1 and lower than 0.02.
Default value: 0.05
compaction_promotion_min_size_mbytes
Type: int64
Description: If the total disk size of the output rowset of the cumulative compaction is lower than this configuration size,
the rowset will not undergo base compaction and is still in the cumulative compaction process. The unit is m bytes.
Generally, the configuration is within 512m. If the configuration is too large, the size of the early base version is too
small, and base compaction has not been performed.
Default value: 64
compaction_min_size_mbytes
Type: int64
Description: When the cumulative compaction is merged, the selected rowsets to be merged have a larger disk size than
this configuration, then they are divided and merged according to the level policy. When it is smaller than this
configuration, merge directly. The unit is m bytes.
Generally, the configuration is within 128m. Over configuration will cause more cumulative compaction write
amplification.
Default value: 64
default_rowset_type
Type: string
Description: Identifies the storage format selected by BE by default. The configurable parameters are: "ALPHA", "BETA".
It mainly plays the following two roles:
When the storage_format of the table is set to Default, the storage format of BE is selected through this configuration.
Selects the storage format used when BE performs compaction.
Default value: BETA
cumulative_compaction_min_deltas
Description: Cumulative compaction strategy: the minimum number of incremental files
Default value: 5
cumulative_compaction_max_deltas
Description: Cumulative compaction strategy: the maximum number of incremental files
Default value: 1000
base_compaction_trace_threshold
Type: int32
Description: Threshold to logging base compaction's trace information, in seconds
Default value: 10
Base compaction is a long time cost background task, this configuration is the threshold to logging trace information. Trace
information in log file looks like:
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:26:33.513441 (+ 141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics:
{"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_seg
cumulative_compaction_trace_threshold
Type: int32
Description: Threshold to logging cumulative compaction's trace information, in seconds
Similar to base_compaction_trace_threshold .
Default value: 2
compaction_task_num_per_disk
Type: int32
Description: The number of compaction tasks which execute in parallel for a disk(HDD).
Default value: 2
compaction_task_num_per_fast_disk
Type: int32
Description: The number of compaction tasks which execute in parallel for a fast disk(SSD).
Default value: 4
cumulative_compaction_rounds_for_each_base_compaction_round
Type: int32
Description: How many rounds of cumulative compaction for each round of base compaction when compaction tasks
generation.
Default value: 9
cumulative_compaction_policy
Type: string
Description: Configure the merge strategy in the cumulative compaction phase. Currently, two merge strategies are implemented, num_based and size_based. The size_based policy is an optimization of the ordinary policy: version merging is only performed when the disk sizes of the rowsets are of the same order of magnitude. After merging, qualified rowsets are promoted to the base compaction stage. In the case of a large number of small-batch imports, it can reduce the write amplification of base compaction, balance read amplification and space amplification, and reduce the number of file versions.
max_cumu_compaction_threads
Type: int32
Description: The maximum of thread number in cumulative compaction thread pool.
Default value: 10
enable_segcompaction
Type: bool
Description: Enable to use segment compaction during loading
Default value: false
segcompaction_threshold_segment_num
Type: int32
Description: Trigger segcompaction if the num of segments in a rowset exceeds this threshold
Default value: 10
segcompaction_small_threshold
Type: int32
Description: The segment whose row number above the threshold will be compacted during segcompaction
Default value: 1048576
disable_compaction_trace_log
Type: bool
Description: disable the trace log of compaction
If set to true, the cumulative_compaction_trace_threshold and base_compaction_trace_threshold won't work and
log is disabled.
Default value: true
Load
enable_stream_load_record
Type: bool
load_data_reserve_hours
Description: Used for mini load. The mini load data file will be deleted after this time
Default value: 4 (h)
push_worker_count_high_priority
Description: Import the number of threads for processing HIGH priority tasks
Default value: 3
push_worker_count_normal_priority
Description: Import the number of threads for processing NORMAL priority tasks
Default value: 3
load_error_log_reserve_hours
Description: The load error log will be deleted after this time
load_process_max_memory_limit_percent
Description: The percentage of the upper memory limit occupied by all imported threads on a single node, the default is
50%
Set these default values very large, because we don't want to affect load performance when users upgrade Doris. If
necessary, the user should set these configurations correctly
Default value: 50 (%)
load_process_soft_mem_limit_percent
Description: The soft limit refers to a proportion of the load memory limit of a single node. For example, if the load memory limit for all load tasks is 20GB and the soft limit defaults to 50% of this value, i.e. 10GB, then when load memory usage exceeds the soft limit, the job with the largest memory consumption will be selected and flushed to release memory space. The default is 50%.
routine_load_thread_pool_size
Description: The thread pool size of the routine load task. This should be greater than the FE configuration 'max_concurrent_task_num_per_be'.
Default value: 10
single_replica_load_brpc_num_threads
Type: int32
Description: This configuration is mainly used to modify the number of bthreads for single replica load brpc. When the
load concurrency increases, you can adjust this parameter to ensure that the Slave replica synchronizes data files from
the Master replica timely.
Default value: 64
single_replica_load_download_num_workers
Type: int32
Description:This configuration is mainly used to modify the number of http threads for segment download, used for
single replica load. When the load concurrency increases, you can adjust this parameter to ensure that the Slave replica
synchronizes data files from the Master replica timely.
Default value: 64
slave_replica_writer_rpc_timeout_sec
Type: int32
Description: This configuration is mainly used to modify timeout of brpc between master replica and slave replica, used
for single replica load.
Default value: 60
max_segment_num_per_rowset
Type: int32
Description: Used to limit the number of segments in the newly generated rowset when importing. If the threshold is
exceeded, the import will fail with error -238. Too many segments will cause compaction to take up a lot of memory and
cause OOM errors.
Default value: 200
high_priority_flush_thread_num_per_store
Type: int32
Description: The number of flush threads per store path allocated for the high priority import task.
Default value: 1
routine_load_consumer_pool_size
Type: int32
Description: The number of caches for the data consumer used by the routine load.
Default value: 10
load_task_high_priority_threshold_second
Type: int32
Description: When the timeout of an import task is less than this threshold, Doris will consider it to be a high priority task.
High priority tasks use a separate pool of flush threads.
Default: 120
min_load_rpc_timeout_ms
Type: int32
Description: The minimum timeout for each rpc in the load job.
Default: 20
kafka_api_version_request
Type: bool
Description: If the dependent Kafka version is lower than 0.10.0.0, this value should be set to false.
Default: true
kafka_broker_version_fallback
Description: If the dependent Kafka version is lower than 0.10.0.0 and kafka_api_version_request is set to false, the fallback version specified by kafka_broker_version_fallback will be used. Valid values are: 0.9.0.x, 0.8.x.y.
Default: 0.10.0
max_consumer_num_per_group
Description: The maximum number of consumers in a data consumer group, used for routine load
Default: 3
streaming_load_max_mb
Type: int64
Description: Used to limit the maximum amount of csv data allowed in one Stream load.
Stream Load is generally suitable for loading data of no more than a few GB; it is not suitable for loading very large data.
Default value: 10240 (MB)
streaming_load_json_max_mb
Type: int64
Description: it is used to limit the maximum amount of json data allowed in one Stream load. The unit is MB.
Some data formats, such as JSON, cannot be split. Doris must read all the data into the memory before parsing can
begin. Therefore, this value is used to limit the maximum amount of data that can be loaded in a single Stream load.
Default value: 100
Dynamically modifiable: Yes
Thread
delete_worker_count
Description: Number of threads performing data deletion tasks
Default value: 3
clear_transaction_task_worker_count
Description: Number of threads used to clean up transactions
Default value: 1
clone_worker_count
be_service_threads
Type: int32
Description: The number of execution threads of the thrift server service on BE which represents the number of threads
that can be used to execute FE requests.
Default value: 64
download_worker_count
Description: The number of download threads.
Default value: 1
drop_tablet_worker_count
Description: Number of threads to delete tablet
Default value: 3
flush_thread_num_per_store
Description: The number of threads used to refresh the memory table per store
Default value: 2
num_threads_per_core
Description: Control the number of threads that each core runs. Usually choose 2 times or 3 times the number of cores.
This keeps the core busy without causing excessive jitter
Default value: 3
num_threads_per_disk
Description: The maximum number of threads per disk is also the maximum queue depth of each disk
Default value: 0
number_slave_replica_download_threads
Description: Number of threads for slave replica synchronize data, used for single replica load.
Default value: 64
publish_version_worker_count
Description: the count of thread to publish version
Default value: 8
upload_worker_count
Description: Maximum number of threads for uploading files
Default value: 1
webserver_num_workers
send_batch_thread_pool_thread_num
Type: int32
Description: The number of threads in the SendBatch thread pool. In NodeChannels' sending data tasks, the SendBatch
operation of each NodeChannel will be submitted as a thread task to the thread pool to be scheduled. This parameter
determines the size of the SendBatch thread pool.
Default value: 64
send_batch_thread_pool_queue_size
Type: int32
Description: The queue length of the SendBatch thread pool. In NodeChannels' sending data tasks, the SendBatch
operation of each NodeChannel will be submitted as a thread task to the thread pool waiting to be scheduled, and after
the number of submitted tasks exceeds the length of the thread pool queue, subsequent submitted tasks will be blocked
until there is an empty slot in the queue.
Default value: 102400
make_snapshot_worker_count
Description: Number of threads making snapshots
Default value: 5
release_snapshot_worker_count
Description: Number of threads releasing snapshots
Default value: 5
Memory
disable_mem_pools
Type: bool
buffer_pool_clean_pages_limit
Description: Clean up pages that may be saved by the buffer pool
Default value: 50%
buffer_pool_limit
Type: string
Description: The largest allocatable memory of the buffer pool
The maximum amount of memory available in the BE buffer pool. The buffer pool is a new memory management
structure of BE, which manages the memory by the buffer page and enables spill data to disk. The memory for all
concurrent queries will be allocated from the buffer pool. The current buffer pool only works on AggregationNode
and ExchangeNode.
Default value: 20%
chunk_reserved_bytes_limit
Description: The reserved bytes limit of the Chunk Allocator, usually set as a percentage of mem_limit. It defaults to bytes if no unit is given; the number of bytes must be a multiple of 2 and larger than 0, and if it is larger than the physical memory size it will be set to the physical memory size. Increasing this value can improve performance, but it will reserve more free memory that cannot be used by other modules.
Default value: 20%
madvise_huge_pages
Type: bool
max_memory_sink_batch_count
Description: The maximum external scan cache batch count. The cache holds max_memory_cache_batch_count * batch_size rows; with the default of 20 and the default batch_size of 1024, 20 * 1024 rows will be cached.
Default value: 20
memory_limitation_per_thread_for_schema_change
Description: The maximum memory allowed for a single schema change task
Default value: 2 (GB)
memory_max_alignment
Description: Maximum alignment memory
Default value: 16
mmap_buffers
Description: Whether to use mmap to allocate memory
Default value: false
download_cache_buffer_size
Type: int64
Description: The size of the buffer used to receive data when downloading the cache.
Default value: 10485760
zone_map_row_num_threshold
Type: int32
Description: If the number of rows in a page is less than this value, no zonemap will be created to reduce data expansion
Default value: 20
enable_tcmalloc_hook
Type: bool
Description: Whether Hook TCmalloc new/delete, currently consume/release tls mem tracker in Hook.
Default value: true
memory_mode
Type: string
Description: Controls gc of tcmalloc. In performance mode, Doris releases memory from the tcmalloc cache when usage >= 90% of mem_limit; otherwise, Doris releases memory from the tcmalloc cache when usage >= 50% of mem_limit.
Default value: performance
max_sys_mem_available_low_water_mark_bytes
Type: int64
Description: The maximum low water mark of the system /proc/meminfo/MemAvailable, in bytes, default 1.6G. The actual low water mark = min(1.6G, MemTotal * 10%), to avoid wasting too much memory on machines with more than 16G of memory. Turning this up reserves a larger memory buffer for Full GC on machines with more than 16G of memory; turning it down lets Doris use as much memory as possible.
Default value: 1717986918
memory_limitation_per_thread_for_schema_change_bytes
mem_tracker_consume_min_size_bytes
Type: int32
Description: The minimum size for the TCMalloc hook to consume/release a MemTracker. Consumed sizes smaller than this value keep accumulating, to avoid frequent calls to consume/release of the MemTracker. Decreasing this value increases the frequency of consume/release; increasing this value makes MemTracker statistics less accurate. Theoretically, the statistical value of a MemTracker differs from the true value by at most (mem_tracker_consume_min_size_bytes * the number of BE threads where the MemTracker is used).
Default value: 1048576
cache_clean_interval
Description: File handle cache cleaning interval, used to clean up file handles that have not been used for a long time. It is also the cleaning interval of the Segment Cache.
Default value: 1800 (s)
min_buffer_size
Description: Minimum read buffer size
Default value: 1024 (byte)
write_buffer_size
Description: The size of the buffer before flushing.
Imported data is first written to a memory block on the BE and only written back to disk when this memory block reaches the threshold. The default size is 100MB. Too small a threshold may result in a large number of small files on the BE; the threshold can be increased to reduce the number of files. However, too large a threshold may cause RPC timeouts.
Default value: 104,857,600
remote_storage_read_buffer_mb
Type: int32
Description: The cache size used when reading files on hdfs or object storage.
Increasing this value can reduce the number of calls to read remote data, but it will increase memory overhead.
Default value: 16MB
segment_cache_capacity
Type: int32
Description: The maximum number of Segments cached by Segment Cache.
The default value is currently only an empirical value, and may need to be modified according to actual scenarios.
Increasing this value can cache more segments and avoid some IO. Decreasing this value will reduce memory usage.
Default value: 1000000
file_cache_type
Type: string
Description: Type of cache file. whole_file_cache: download the entire segment file; sub_file_cache: the segment file is split into multiple files by size. If set to "", there is no cache; set this parameter when caching is required.
file_cache_alive_time_sec
Type: int64
file_cache_max_size_per_disk
Type: int64
Description: The maximum disk size the cache may occupy. Once this setting is exceeded, the cache that has not been accessed for the longest time will be deleted. If it is 0, the size is not limited. The unit is bytes.
Default value: 0
max_sub_cache_file_size
Type: int64
Description: The maximum size of a split file when caching with sub_file_cache.
download_cache_thread_pool_thread_num
Type: int32
Description: The number of threads in the DownloadCache thread pool. In the download cache task of FileCache, the
download cache operation will be submitted to the thread pool as a thread task and wait to be scheduled. This parameter
determines the size of the DownloadCache thread pool.
Default value: 48
download_cache_thread_pool_queue_size
Type: int32
Description: The queue length of the DownloadCache thread pool. In the download cache task of FileCache, the download cache operation will be submitted to the thread pool as a thread task and wait to be scheduled. After the number of submitted tasks exceeds the length of the thread pool queue, subsequent submitted tasks will be blocked until there is an empty slot in the queue.
Default value: 102400
generate_cache_cleaner_task_interval_sec
Type: int64
Description: Cleaning interval of cache files, in seconds
path_gc_check
Description: Whether to enable the recycle scan data thread check
Default value: true
path_gc_check_interval_second
Description: Recycle scan data thread check interval
Default value: 86400 (s)
path_gc_check_step
Default value: 1000
path_gc_check_step_interval_ms
Default value: 10 (ms)
path_scan_interval_second
Default value: 86400
scan_context_gc_interval_min
Description: This configuration is used for the context gc thread scheduling cycle. Note: the unit is minutes, and the default is 5 minutes.
Default value: 5
Storage
default_num_rows_per_column_file_block
Type: int32
Description: Configure how many rows of data are contained in a single RowBlock.
disable_storage_page_cache
Type: bool
Description: Disable using the page cache for index caching. This configuration only takes effect for the BETA storage format; it is usually recommended to keep it false.
Default value: false
disk_stat_monitor_interval
Description: Disk status check interval
Default value: 5 (s)
max_free_io_buffers
Description: For each io buffer size, the maximum number of buffers that IoMgr will reserve ranges from 1024B to 8MB
buffers, up to about 2GB buffers.
Default value: 128
max_garbage_sweep_interval
Description: The maximum interval for disk garbage cleaning
Default value: 3600 (s)
max_percentage_of_error_disk
Type: int32
Description: The percentage of damaged hard disks that the storage engine allows to exist. Once the proportion of damaged hard disks exceeds this ratio, BE will automatically exit.
Default value: 0
read_size
Description: The read size is the read size sent to the OS. There is a trade-off between latency and throughput: the aim is to keep the disk busy without introducing seeks. For 8 MB reads, random IO and sequential IO have similar performance.
min_garbage_sweep_interval
Description: The minimum interval between disk garbage cleaning
pprof_profile_dir
Description: pprof profile save directory
small_file_dir
Description: Directory used to save files downloaded by SmallFileMgr
Default value: ${DORIS_HOME}/lib/small_file/
user_function_dir
storage_flood_stage_left_capacity_bytes
storage_flood_stage_usage_percent
Description: The storage_flood_stage_usage_percent and storage_flood_stage_left_capacity_bytes configurations limit
the maximum usage of the capacity of the data directory.
Default value: 90 (90%)
storage_medium_migrate_count
Description: the count of thread to clone
Default value: 1
storage_page_cache_limit
Description: Cache for storage page size
Default value: 20%
storage_page_cache_shard_size
Description: Shard size of StoragePageCache, the value must be power of two. It's recommended to set it to a value close
to the number of BE cores in order to reduce lock contentions.
Default value: 16
index_page_cache_percentage
Type: int32
Description: Index page cache as a percentage of total storage page cache, value range is [0, 100]
Default value: 10
storage_strict_check_incompatible_old_format
Type: bool
Description: Used to check incompatible old format strictly
sync_tablet_meta
Description: Whether the storage engine opens sync and keeps it to the disk
Default value: false
pending_data_expire_time_sec
Description: The maximum duration of unvalidated data retained by the storage engine
Default value: 1800 (s)
ignore_rowset_stale_unconsistent_delete
Type: boolean
Description: It is used to decide whether to delete an outdated merged rowset if it cannot form a consistent version path.
The merged expired rowset version path will be deleted after half an hour. In abnormal situations, deleting these versions will result in the problem that a consistent version path for the query cannot be constructed. When the configuration is false, the program check is strict and the program will directly report an error and exit. When configured as true, the program will run normally and ignore this error. In general, ignoring this error will not affect the query; only when FE dispatches an already-merged version will a -230 error appear.
Default value: false
create_tablet_worker_count
Description: Number of worker threads for BE to create a tablet
Default value: 3
check_consistency_worker_count
Description: The number of worker threads to calculate the checksum of the tablet
Default value: 1
max_tablet_version_num
Type: int
Description: Limit the number of versions of a single tablet. It is used to prevent a large number of version accumulation
problems caused by too frequent import or untimely compaction. When the limit is exceeded, the import task will be
rejected.
Default value: 500
number_tablet_writer_threads
Description: Number of tablet write threads
Default value: 16
tablet_map_shard_size
Description: tablet_map_lock fragment size, the value is 2^n, n=0,1,2,3,4, this is for better tablet management
Default value: 1
tablet_meta_checkpoint_min_interval_secs
Description: The polling interval of the TabletMeta Checkpoint thread
Default value: 600 (s)
tablet_meta_checkpoint_min_new_rowsets_num
Description: The minimum number of Rowsets for storing TabletMeta Checkpoints
Default value: 10
tablet_stat_cache_update_interval_second
Description: Update interval of the tablet stat cache
tablet_rowset_stale_sweep_time_sec
Type: int64
Description: It is used to control the expiration time for cleaning up merged rowset versions. When the current time now() minus the max created rowset's create time in a version path is greater than tablet_rowset_stale_sweep_time_sec, the current path is cleaned up and these merged rowsets are deleted. The unit is seconds.
When writes are too frequent and disk space is insufficient, you can configure a smaller tablet_rowset_stale_sweep_time_sec. However, if this time is less than 5 minutes, FE may query a version that has already been merged, causing a -230 query error.
Default value: 1800
tablet_writer_open_rpc_timeout_sec
Description: The RPC timeout for sending a batch (1024 rows) during import. The default is 60 seconds. Since this RPC may involve writing multiple batches of memory, RPC timeouts may be caused by these batch writes, so this timeout can be adjusted to reduce timeout errors (such as "send batch fail" errors). Also, if you increase the write_buffer_size configuration, you need to increase this parameter as well.
Default value: 60
tablet_writer_ignore_eovercrowded
Type: bool
Description: Used to ignore brpc error '[E1011]The server is overcrowded' when writing data.
Default value: false
streaming_load_rpc_max_alive_time_sec
Description: The lifetime of a TabletsChannel. If the channel does not receive any data within this time, the channel will be deleted.
Default value: 1200
alter_tablet_worker_count
Description: The number of threads making schema changes
Default value: 3
ignore_load_tablet_failure
Type: bool
Description: It is used to decide whether to ignore errors and continue to start be in case of tablet loading failure
When BE starts, a separate thread is started for each data directory to load the meta information of the tablet headers. In the default configuration, if a data directory fails to load a tablet, the startup process will terminate, and the following error message will be seen in the be.INFO log:
load tablets from header failed, failed tablets size: xxx, path=xxx
This indicates how many tablets in that data directory failed to load. The log will also contain specific information about the tablets that failed to load. At this point, manual intervention is required to troubleshoot the cause of the error. After troubleshooting, there are usually two ways to recover:
If the tablet information cannot be repaired and the other replicas are normal, delete the faulty tablet with the meta_tool tool.
Set ignore_load_tablet_failure to true; BE will then ignore these faulty tablets and start normally.
report_disk_state_interval_seconds
Description: The interval time for the agent to report the disk status to FE
Default value: 60 (s)
result_buffer_cancelled_interval_time
snapshot_expire_time_sec
Description: Snapshot file cleaning interval.
Default value:172800 (48 hours)
compress_rowbatches
Type: bool
Description: Enable the Snappy compression algorithm for data compression when serializing RowBatch
Default value: true
Since Version 1.2
jvm_max_heap_size
Type: string
Description: The maximum size of JVM heap memory used by BE; this corresponds to the JVM -Xmx parameter
Default value: 1024M
Log
sys_log_dir
Type: string
Description: Storage directory of BE log data
sys_log_level
Description: Log Level: INFO < WARNING < ERROR < FATAL
sys_log_roll_mode
Description: The log rolling mode; logs are split by size, with one log file per 1 GB
Default value: SIZE-MB-1024
sys_log_roll_num
sys_log_verbose_level
Description: Log display level, used to control the log output at the beginning of VLOG in the code
Default value: 10
sys_log_verbose_modules
Description: Log printing module, writing olap will only print the log under the olap module
Default value: empty
aws_log_level
Type: int32
Description: log level of AWS SDK,
Off = 0,
Fatal = 1,
Error = 2,
Warn = 3,
Info = 4,
Debug = 5,
Trace = 6
Default value: 3
log_buffer_level
Description: The log flushing strategy; by default logs are kept in memory
Default value: empty
Else
report_tablet_interval_seconds
Description: The interval time for the agent to report the olap table to the FE
Default value: 60 (s)
report_task_interval_seconds
Description: The interval time for the agent to report the task signature to FE
Default value: 10 (s)
periodic_counter_update_period_ms
Description: Update rate counter and sampling counter cycle
Default value: 500 (ms)
enable_metric_calculator
Description: If set to true, the metric calculator will run to collect BE-related indicator information, if set to false, it will not
run
enable_system_metrics
Description: User control to turn on and off system indicators.
enable_token_check
Description: Used for forward compatibility, will be removed later.
max_runnings_transactions_per_txn_map
Description: Max number of txns for every txn_partition_map in the txn manager; this is a self-protection mechanism to avoid saving too many txns in the manager
max_download_speed_kbps
Description: Maximum download speed limit
download_low_speed_limit_kbps
doris_cgroups
Description: Cgroups assigned to doris
Default value: empty
row_nums_check
Description: Check row nums for BE/CE and schema change
Default value: true
priority_queue_remaining_tasks_increased_frequency
Description: the increased frequency of priority for remaining tasks in BlockingPriorityQueue
Default value: 512
Description: Whether to parse multidimensional arrays; if false, an ERROR will be returned when one is encountered
Default value: true
enable_simdjson_reader
Description: Whether to enable simdjson to parse JSON during stream load
Default value: false
User Property
This document mainly introduces configuration items at the User level. User-level configuration is effective for a single user: each user can set their own user properties, and they do not affect each other.
The specific syntax can be queried through the command: help show property; .
The specific syntax can be queried through the command: help set property; .
User-level configuration items will only take effect for the specified users, and will not affect the configuration of other users.
Application examples
1. Modify the max_user_connections of user Billie
Use SET PROPERTY FOR 'Billie' 'max_user_connections' = '200'; to modify the current maximum number of connections for user Billie to 200, as sketched below.
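A hedged sketch of this syntax; the user name and value follow the example above:
```sql
-- Set a user-level property for user Billie
SET PROPERTY FOR 'Billie' 'max_user_connections' = '200';
-- View the properties currently set for Billie
SHOW PROPERTY FOR 'Billie';
```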
max_user_connections
The maximum number of user connections; the default value is 100. In general, this parameter does not need to be changed unless the number of concurrent queries exceeds the default value.
max_query_instances
The maximum number of query instances that the user can use at a certain point in time. The default value is -1; a negative number means the default_max_query_instances config is used.
resource
quota
default_load_cluster
load_cluster
Authority Management
Doris's new privilege management system is modeled on MySQL's privilege management mechanism. It achieves table-level fine-grained privilege control and role-based privilege access control, and supports a whitelist mechanism.
Noun Interpretation
1. user_identity
In the permission system, a user is identified as a User Identity. A user identity consists of two parts: username and userhost. The username is composed of upper- and lower-case English characters. The userhost represents the IP from which the user connection comes. A user_identity is presented as username@'userhost', representing the username from userhost.
Another form of user_identity is username@['domain'], where domain is a domain name that can be resolved into a set of IPs by DNS. The final expression is still a set of username@'userhost', so we use username@'userhost' to represent it.
2. Privilege
The objects of permissions are nodes, catalogs, databases or tables. Different permissions represent different operating
permissions.
3. Role
Doris can create custom named roles. A role can be seen as a set of permissions. A newly created user can be assigned a role, and the role's permissions are then automatically granted to the user. Subsequent changes to the role's permissions will also be reflected in the permissions of all users that belong to the role.
4. user_property
User attributes are attached directly to a user, not to a user identity. That is, both cmy@'192.%' and cmy@['domain'] have the same set of user attributes, which belong to the user cmy, not to cmy@'192.%' or cmy@['domain'].
User attributes include, but are not limited to, the maximum number of user connections, import cluster configuration,
and so on.
Supported operations
1. Create users: CREATE USER
2. Delete users: DROP USER
3. Authorization: GRANT
4. Withdrawal: REVOKE
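A minimal sketch of these operations; the user identity, password and database name are illustrative placeholders:
```sql
CREATE USER 'jack'@'192.168.%' IDENTIFIED BY 'jack_password';
GRANT SELECT_PRIV ON example_db.* TO 'jack'@'192.168.%';
REVOKE SELECT_PRIV ON example_db.* FROM 'jack'@'192.168.%';
DROP USER 'jack'@'192.168.%';
```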
Permission type
Doris currently supports the following permissions
1. Node_priv
Node change permission, including adding, deleting and decommissioning FE, BE and BROKER nodes. Currently, this permission can only be granted to the Root user.
Users who have both Grant_priv and Node_priv can grant this privilege to other users.
2. Grant_priv
Privilege change permission. Allows the execution of operations including authorization, revocation, and adding/deleting/changing users and roles, etc.
However, a user with this permission can not grant node_priv permission to other users, unless the user itself has
node_priv permission.
3. Select_priv
4. Load_priv
Write permissions to databases and tables. Including Load, Insert, Delete and so on.
5. Alter_priv
Change permissions on databases and tables. It includes renaming libraries/tables, adding/deleting/changing columns,
and adding/deleting partitions.
6. Create_priv
7. Drop_priv
Permission hierarchy
At the same time, according to the scope of application of permissions, we divide them into four levels:
1. GLOBAL LEVEL: Global permissions. That is, permissions on *.*.* granted by GRANT statements. The granted
permissions apply to any table in any database.
2. CATALOG LEVEL: Catalog-level permissions. That is, the permissions on ctl.*.* granted through the GRANT statement. The permissions granted apply to any database table in the specified Catalog.
3. DATABASE LEVEL: Database-level permissions. That is, the permissions on ctl.db.* granted through the GRANT statement. The privileges granted apply to any table in the specified database.
4. TABLE LEVEL: Table-level permissions. That is, the permissions on ctl.db.tbl granted through the GRANT statement. The privileges granted apply to the specified table in the specified database.
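A hedged sketch of granting at each level; the catalog, database, table and user names are illustrative:
```sql
GRANT SELECT_PRIV ON *.*.* TO 'jack'@'%';          -- GLOBAL LEVEL
GRANT SELECT_PRIV ON ctl1.*.* TO 'jack'@'%';       -- CATALOG LEVEL
GRANT SELECT_PRIV ON ctl1.db1.* TO 'jack'@'%';     -- DATABASE LEVEL
GRANT SELECT_PRIV ON ctl1.db1.tbl1 TO 'jack'@'%';  -- TABLE LEVEL
```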
ADMIN/GRANT
ADMIN_PRIV and GRANT_PRIV both carry the authority to "grant privileges", which makes them somewhat special. The operations related to these two privileges are described here one by one.
1. CREATE USER
Users with ADMIN privileges, or GRANT privileges at the GLOBAL and DATABASE levels can create new users.
2. DROP USER
Users with ADMIN authority or GRANT authority at the global level can drop users.
3. CREATE/DROP ROLE
Users with ADMIN authority or GRANT authority at the global level can create or drop role.
4. GRANT /REVOKE
Users with ADMIN or GLOBAL GRANT privileges can grant or revoke the privileges of any user.
Users with GRANT privileges at the DATABASE level can grant or revoke the privileges of any user on the specified
database.
Users with GRANT privileges at TABLE level can grant or revoke the privileges of any user on the specified tables in
the specified database.
5. SET PASSWORD
Users with ADMIN or GLOBAL GRANT privileges can set any user's password.
Ordinary users can set their corresponding User Identity password. The corresponding User Identity can be viewed
by SELECT CURRENT_USER(); command.
Users with GRANT privileges at non-GLOBAL level cannot set the password of existing users, but can only specify the
password when creating users.
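A hedged sketch of the password operations described above; the user identity and password are placeholders:
```sql
-- View which user identity the current session was authenticated as
SELECT CURRENT_USER();
-- Set the password for a specific user identity
SET PASSWORD FOR 'jack'@'192.%' = PASSWORD('new_password');
```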
Some explanations
1. When Doris initializes, the following users and roles are automatically created:
i. Operator role: This role has Node_priv and Admin_priv, i.e. all permissions for Doris. In a subsequent upgrade version,
we may restrict the role's permissions to Node_priv, which is to grant only node change permissions. To meet some
cloud deployment requirements.
ii. admin role: This role has Admin_priv, which is all permissions except for node changes.
iii. root@'%': root user, which allows login from any node, with the role of operator.
iv. admin@'%': admin user, allowing login from any node, role admin.
2. It is not supported to delete or change the permissions of default created roles or users.
3. The user of the operator role has one and only one user, that is, root. Users of admin roles can create multiple.
And authorize:
The permissions of cmy@'ip1' will be changed to SELECT_PRIV, ALTER_PRIV. And when we change the permissions of cmy@['domain'] again, cmy@'ip1' will not follow.
In priority, '192.%' takes precedence over '%', so when user cmy tries to log in to Doris with password '12345' from 192.168.1.1, the login will be rejected.
5. Forget passwords
If you forget your password and cannot log in to Doris, you can log in to Doris without a password using the following
command on the machine where the Doris FE node is located:
After login, the password can be reset through the SET PASSWORD command.
6. No user can reset the password of the root user except the root user himself.
8. Having GRANT_PRIV at GLOBAL level is actually equivalent to having ADMIN_PRIV, because GRANT_PRIV at this level
has the right to grant arbitrary permissions, please use it carefully.
Users can view current_user and user respectively via SELECT current_user(); and SELECT user(); . Here, current_user indicates which identity the current user passed the authentication system as, while user is the user's current actual user_identity.
For example, suppose the user user1@'192.%' is created, and then a user user1 from 192.168.10.1 is logged into the system.
At this time, current_user is user1@'192.%' , and user is user1@'192.168.10.1' .
All privileges are given to a current_user , and the real user has all the privileges of the corresponding current_user .
In version 1.2, verification of user password strength has been added. This feature is controlled by the global variable validate_password_policy. It defaults to NONE/0, i.e. password strength is not checked. If set to STRONG/2, the password must contain at least 3 of "uppercase letters", "lowercase letters", "numbers" and "special characters", and its length must be greater than or equal to 8.
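A minimal sketch, assuming validate_password_policy is set like other global variables:
```sql
-- Require strong passwords from now on
SET GLOBAL validate_password_policy = STRONG;
-- Revert to no password strength checking
SET GLOBAL validate_password_policy = NONE;
```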
Best Practices
Here are some usage scenarios of Doris privilege system.
1. Scene 1
The users of Doris cluster are divided into Admin, RD and Client. Administrators have all the rights of the whole cluster,
mainly responsible for cluster building, node management and so on. The development engineer is responsible for
business modeling, including database building, data import and modification. Users access different databases and
tables to get data.
In this scenario, ADMIN or GRANT privileges can be granted to administrators. Give RD CREATE, DROP, ALTER, LOAD,
SELECT permissions to any or specified database tables. Give Client SELECT permission to any or specified database
table. At the same time, it can also simplify the authorization of multiple users by creating different roles.
2. Scene 2
There are multiple services in a cluster, and each service may use one or more databases. Each service needs to manage its own users. In this scenario, an administrator can create a user with GRANT privileges at the DATABASE level for each database. That user can then grant privileges on the specified database only.
3. Blacklist
Doris itself does not support a blacklist, only a whitelist, but we can simulate a blacklist in some way. Suppose a user named user@'192.%' is created first, allowing users from 192.* to log in. Now, if you want to prohibit logins from 192.168.10.1, you can create another user user@'192.168.10.1' and set a different password. Since 192.168.10.1 has a higher priority than 192.% , the user can no longer log in with the old password from 192.168.10.1 .
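A hedged sketch of this workaround; the passwords are placeholders:
```sql
-- Whitelist the whole 192.* segment
CREATE USER 'user'@'192.%' IDENTIFIED BY '12345';
-- "Blacklist" a single IP: the more specific host takes priority over 192.%,
-- so logins from 192.168.10.1 must use this different password
CREATE USER 'user'@'192.168.10.1' IDENTIFIED BY 'another_password';
```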
More help
For more detailed syntax and best practices for permission management use, please refer to the GRANTS command manual.
Enter HELP GRANTS at the command line of the MySql client for more help information.
LDAP
Access to third-party LDAP services to provide authentication login and group authorization services for Doris.
LDAP authentication login complements Doris authentication login by accessing the LDAP service for password
authentication; Doris uses LDAP to authenticate the user's password first; if the user does not exist in the LDAP service, it
continues to use Doris to authenticate the password; if the LDAP password is correct but there is no corresponding account
in Doris, a temporary user is created to log in to Doris.
LDAP group authorization, is to map the group in LDAP to the Role in Doris, if the user belongs to multiple user groups in
LDAP, after logging into Doris the user will get the permission of all groups corresponding to the Role, requiring the group
name to be the same as the Role name.
Noun Interpretation
LDAP: Lightweight directory access protocol that enables centralized management of account passwords.
Privilege: Permissions act on nodes, databases or tables. Different permissions represent different permission to operate.
Role: Doris can create custom named roles. A role can be thought of as a collection of permissions.
Server-side Configuration
You need to configure the LDAP basic information in the fe/conf/ldap.conf file, and the LDAP administrator password needs
to be set using sql statements.
ldap_host = 127.0.0.1
LDAP service ip.
ldap_port = 389
LDAP service port, the default plaintext transfer port is 389, currently Doris' LDAP function only supports plaintext
password transfer.
ldap_admin_name = cn=admin,dc=domain,dc=com
LDAP administrator account "Distinguished Name". When a user logs into Doris using LDAP authentication, Doris will
bind the administrator account to search for user information in LDAP.
ldap_user_basedn = ou=people,dc=domain,dc=com
Doris base dn when searching for user information in LDAP.
ldap_user_filter = (&(uid={login}))
For Doris' filtering criteria when searching for user information in LDAP, the placeholder "{login}" will be replaced with
the login username. You must ensure that the user searched by this filter is unique, otherwise Doris will not be able to
verify the password through LDAP and the error message "ERROR 5081 (42000): user is not unique in LDAP server." will
appear when logging in.
For example, if you use the LDAP user node uid attribute as the username to log into Doris, you can configure it as:
ldap_user_filter = (&(uid={login}))
To use the LDAP user mailbox prefix as the username, this item can be configured as:
ldap_user_filter = (&(mail={login}@baidu.com))
ldap_group_basedn = ou=group,dc=domain,dc=com
Base dn when Doris searches for group information in LDAP. If this item is not configured, LDAP group authorization will not be enabled.
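The LDAP administrator password mentioned above is set via a SQL statement; a minimal sketch, where the password value is a placeholder:
```sql
set ldap_admin_password = password('ldap_admin_password');
```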
Client-side configuration
Client-side LDAP authentication requires the mysql client-side cleartext authentication plugin to be enabled. When logging into Doris from the command line, the mysql cleartext authentication plugin can be enabled in one of two ways.
LDAP User Doris User Password Login Status Login to Doris users
After LDAP is enabled, when a user logs in using the mysql client, Doris will first verify the user's password through the LDAP service. If the LDAP user exists and the password is correct, Doris will use that user to log in: if the corresponding account exists in Doris, it logs directly into that account; if the corresponding account does not exist, a temporary account is created for the user and logged into. The temporary account has the appropriate permissions (see LDAP Group Authorization) and is only valid for the current connection. Doris does not create the user and does not generate metadata for creating the user.
If no login user exists in the LDAP service, Doris is used for password authentication.
The following assumes that LDAP authentication is enabled, ldap_user_filter = (&(uid={login})) is configured, and all other
configuration items are correct, and the client sets the environment variable LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1
For example:
2: The user exists in LDAP and the corresponding account does not exist in Doris.
LDAP user node presence attribute: uid: jack User password: abcdef
Using the following command, a temporary user is created and logged in as jack@'%'. The temporary user has the basic privilege DatabasePrivs: Select_priv, and Doris will delete the temporary user after the user logs out:
Login user Privileges are related to Doris user and group Privileges, as shown in the following table:
|LDAP Users|Doris Users|Login User Privileges|
|--|--|--|
|Exist|Exist|LDAP group Privileges + Doris user Privileges|
|Does not exist|Exist|Doris user Privileges|
|Exist|Does not exist|LDAP group Privileges|
If the logged-in user is a temporary user and no group permission exists, the user has the select_priv permission of the
information_schema by default
Example:
If an LDAP user's dn is listed in the "member" attribute of an LDAP group node, Doris considers the user to belong to that group. For example, a group node may contain:
objectClass: groupOfNames
member: uid=jack,ou=aidp,dc=domain,dc=com
If jack also belongs to the LDAP groups doris_qa and doris_pm, and Doris has the roles doris_rd, doris_qa and doris_pm, then after logging in using LDAP authentication the user will not only have the original permissions of the account, but will also get the privileges of the roles doris_rd, doris_qa and doris_pm.
backends
backends
Name
backends
description
backends is a built-in system table of doris, which is stored under the information_schema database. You can view the BE
node information through the backends system table.
Example
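A minimal usage sketch (the wide output is omitted here):
```sql
select * from information_schema.backends;
```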
KeyWords
backends, information_schema
Best Practice
rowsets
rowsets
Name
rowsets
description
rowsets is a built-in system table of doris, which is stored under the information_schema database. You can view the rowsets information of each BE through the rowsets system table.
Example
select * from information_schema.rowsets where BACKEND_ID = 10004 limit 10;
BACKEND_ID: 10004
ROWSET_ID: 02000000000000994847fbd41a42297d7c7a57d3bcb46f8c
TABLET_ID: 10771
ROWSET_NUM_ROWS: 66850
TXN_ID: 6
NUM_SEGMENTS: 1
START_VERSION: 3
END_VERSION: 3
INDEX_DISK_SIZE: 2894
DATA_DISK_SIZE: 688855
CREATION_TIME: 1659964582
OLDEST_WRITE_TIMESTAMP: 1659964581
NEWEST_WRITE_TIMESTAMP: 1659964581
KeyWords
rowsets, information_schema
Best Practice
Multi-tenancy
The main purpose of Doris's multi-tenant and resource isolation solution is to reduce interference between multiple users
when performing data operations in the same Doris cluster, and to allocate cluster resources to each user more reasonably.
The scheme is mainly divided into two parts, one is the division of resource groups at the node level in the cluster, and the
other is the resource limit for a single query.
Nodes in Doris
First, let's briefly introduce the node composition of Doris. There are two types of nodes in a Doris cluster: Frontend (FE) and
Backend (BE).
FE is mainly responsible for metadata management, cluster management, user request access and query plan analysis.
FE does not participate in the processing and calculation of user data, so it is a node with low resource consumption. The BE
is responsible for all data calculations and task processing, and is a resource-consuming node. Therefore, the resource
division and resource restriction schemes introduced in this article are all aimed at BE nodes. Because the FE node consumes
relatively low resources and can also be scaled horizontally, there is usually no need to isolate and restrict resources, and the
FE node can be shared by all users.
Assume that the current Doris cluster has 6 BE nodes, host[1-6] respectively. In the initial situation, all nodes belong to the default resource group.
We can use the following command to divide these 6 nodes into 3 resource groups: group_a, group_b, group_c:
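A hedged sketch of these statements; the host names and heartbeat port 9050 are illustrative:
```sql
alter system modify backend "host1:9050" set ("tag.location" = "group_a");
alter system modify backend "host2:9050" set ("tag.location" = "group_a");
alter system modify backend "host3:9050" set ("tag.location" = "group_b");
alter system modify backend "host4:9050" set ("tag.location" = "group_b");
alter system modify backend "host5:9050" set ("tag.location" = "group_c");
alter system modify backend "host6:9050" set ("tag.location" = "group_c");
```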
Here we combine host[1-2] to form a resource group group_a , host[3-4] to form a resource group group_b , and
host[5-6] to form a resource group group_c .
After the resource group is divided, we can distribute different replicas of user data across different resource groups. Assume a user table UserTable for which we want to store one replica in each of the three resource groups. This can be achieved by specifying the replica distribution in the PROPERTIES clause of the table creation statement, as sketched below.
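A hedged sketch of such a statement, assuming a simple illustrative schema; the replication_allocation property is the relevant part:
```sql
create table UserTable (
    k1 int,
    k2 int
)
distributed by hash(k1) buckets 1
properties(
    "replication_allocation" = "tag.location.group_a:1, tag.location.group_b:1, tag.location.group_c:1"
);
```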
In this way, the data in the UserTable table will be stored in the form of 3 copies in the nodes where the resource groups
group_a, group_b, and group_c are located.
The following figure shows the current node division and data distribution:
┌────────────────────────────────────────────────────┐
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ host1 │ │ host2 │ │
│ │ ┌─────────────┐ │ │ │ │
│ group_a │ │ replica1 │ │ │ │ │
│ │ └─────────────┘ │ │ │ │
│ │ │ │ │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
├────────────────────────────────────────────────────┤
├────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ host3 │ │ host4 │ │
│ │ │ │ ┌─────────────┐ │ │
│ group_b │ │ │ │ replica2 │ │ │
│ │ │ │ └─────────────┘ │ │
│ │ │ │ │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
├────────────────────────────────────────────────────┤
├────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ host5 │ │ host6 │ │
│ │ │ │ ┌─────────────┐ │ │
│ group_c │ │ │ │ replica3 │ │ │
│ │ │ │ └─────────────┘ │ │
│ │ │ │ │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└────────────────────────────────────────────────────┘
After the execution of the first two steps is completed, we can limit a user's query by setting the user's resource usage
permissions, and can only use the nodes in the specified resource group to execute.
For example, we can use the following statement to restrict user1 to only use nodes in the group_a resource group for
data query, user2 can only use the group_b resource group, and user3 can use 3 resource groups at the same time:
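A hedged sketch of these statements, using the user names from this example:
```sql
set property for 'user1' 'resource_tags.location' = 'group_a';
set property for 'user2' 'resource_tags.location' = 'group_b';
set property for 'user3' 'resource_tags.location' = 'group_a, group_b, group_c';
```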
After the setting is complete, when user1 initiates a query on the UserTable table, it will only access the data copy on the
nodes in the group_a resource group, and the query will only use the node computing resources in the group_a
resource group. The query of user3 can use copies and computing resources in any resource group.
In this way, we have achieved physical resource isolation for different users' queries by dividing nodes and restricting user resource usage. Furthermore, we can create different users for different business departments and restrict each user to different resource groups, in order to avoid resource interference between different business departments.
For example, there is a business table in the cluster that needs to be shared by all 9 business departments, but it is hoped
that resource preemption between different departments can be avoided as much as possible. Then we can create 3
copies of this table and store them in 3 resource groups. Next, we create 9 users for 9 business departments, and limit the
use of one resource group for every 3 users. In this way, the degree of competition for resources is reduced from 9 to 3.
On the other hand, for the isolation of online and offline tasks. We can use resource groups to achieve this. For example,
we can divide nodes into two resource groups, Online and Offline. The table data is still stored in 3 copies, of which 2
copies are stored in the Online resource group, and 1 copy is stored in the Offline resource group. The Online resource
group is mainly used for online data services with high concurrency and low latency. Some large queries or offline ETL
operations can be executed using nodes in the Offline resource group. So as to realize the ability to provide online and
offline services simultaneously in a unified cluster.
The resource usage of load jobs (including insert, broker load, routine load, stream load, etc.) can be divided into two
parts:
i. Computing resources: responsible for reading data sources, data transformation and distribution.
ii. Write resource: responsible for data encoding, compression and writing to disk.
The write resource must be the node where the replica is located, and the computing resource can theoretically select
any node to complete. Therefore, the allocation of resource groups for load jobs is divided into two steps:
i. Use user-level resource tags to limit the resource groups that computing resources can use.
ii. Use the resource tag of the replica to limit the resource group that the write resource can use.
So if you want all the resources used by the load operation to be limited to the resource group where the data is located,
you only need to set the resource tag of the user level to the same as the resource tag of the replica.
In addition to the resource group solution, Doris also provides a single-query resource restriction function.
At present, Doris's resource restrictions on single queries are mainly divided into two aspects: CPU and memory restrictions.
1. Memory Limitation
Doris can limit the maximum memory overhead that a query is allowed to use. To ensure that the memory resources of
the cluster will not be fully occupied by a query. We can set the memory limit in the following ways:
# Set the session variable exec_mem_limit. All subsequent queries in the session (within the connection) use this memory limit.
set exec_mem_limit=1G;
# Set the global variable exec_mem_limit. All subsequent queries of all new sessions (new connections) use this memory limit.
set global exec_mem_limit=1G;
# Set the variable exec_mem_limit for a single SQL statement via a hint; the variable only affects this SQL (the table name is illustrative).
select /*+ SET_VAR(exec_mem_limit=1073741824) */ count(*) from tbl1;
Because Doris's query engine is based on a full-memory MPP query framework, a query will be terminated when its memory usage exceeds the limit. Therefore, when a query cannot run under a reasonable memory limit, it needs to be addressed through SQL optimization or cluster expansion.
2. CPU limitations
Users can limit the CPU resources of the query in the following ways:
# Set the session variable cpu_resource_limit. Then all queries in the session (within the connection) will
use this CPU limit.
set cpu_resource_limit = 2
# Set the user's attribute cpu_resource_limit, then all queries of this user will use this CPU limit. The
priority of this attribute is higher than the session variable cpu_resource_limit
The value of cpu_resource_limit is a relative value. The larger the value, the more CPU resources can be used. However,
the upper limit of the CPU that can be used by a query also depends on the number of partitions and buckets of the
table. In principle, the maximum CPU usage of a query is positively related to the number of tablets involved in the query.
In extreme cases, assuming that a query involves only one tablet, even if cpu_resource_limit is set to a larger value, only
1 CPU resource can be used.
Through memory and CPU resource limits, we can allocate resources to user queries at a finer granularity within a resource group. For example, offline tasks with low timeliness requirements but a large amount of computation can use fewer CPU resources and more memory resources, while latency-sensitive online tasks use more CPU resources and a reasonable amount of memory.
3. The replica distribution of all tables is modified by default to: "tag.location.default:xx", where xx is the original number of replicas.
4. Users can still specify the number of replicas in the table creation statement via "replication_num" = "xx"; this attribute will be automatically converted to "tag.location.default:xx". This ensures that existing table creation statements do not need to be modified.
5. By default, the memory limit for a single query is 2GB for a single node, and the CPU resources are unlimited, which is
consistent with the original behavior. And the user's resource_tags.location attribute is empty, that is, by default, the
user can access the BE of any Tag, which is consistent with the original behavior.
Here we give an example of the steps to start using the resource division function after upgrading from the original cluster to
version 0.15:
Next, you can use the alter system modify backend statement to set the BE Tag, and use the alter table statement to modify the replica distribution policy of the table. Examples are as follows:
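A hedged sketch of both statements; the host, port, table and group names are illustrative:
```sql
-- Attach a resource tag to a BE
alter system modify backend "host1:9050" set ("tag.location" = "group_a");
-- Change the replica distribution policy of all partitions of an existing table
alter table UserTable modify partition (*) set ("replication_allocation" = "tag.location.group_a:3");
```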
After the tag and copy distribution are set, we can turn on the data repair and equalization logic to trigger data
redistribution.
This process will continue for a period of time depending on the amount of data involved, and it will cause some colocation tables to fail colocation planning (because replicas are being migrated). You can view the progress via show proc "/cluster_balance/" . You can also judge the progress by the number of UnhealthyTabletNum in show proc "/statistic" . When UnhealthyTabletNum drops to 0, the data redistribution is complete.
After the data is redistributed, we can start to set the user's resource tag permissions. By default, the user's resource_tags.location attribute is empty, that is, BEs of any tag can be accessed, so the normal queries of existing users are not affected by the previous steps. When the resource_tags.location property is not empty, the user will be restricted to accessing only BEs with the specified Tag.
Through the above 4 steps, we can smoothly use the resource division function after the original cluster is upgraded.
Config Action
Request
GET /rest/v1/config/fe/
Description
Config Action is used to obtain current FE configuration information.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"rows": [{
"Value": "DAY",
"Name": "sys_log_roll_interval"
}, {
"Value": "23",
"Name": "consistency_check_start_time"
}, {
"Value": "4096",
"Name": "max_mysql_service_task_threads_num"
}, {
"Value": "1000",
"Name": "max_unfinished_load_job"
}, {
"Value": "100",
"Name": "max_routine_load_job_num"
}, {
"Value": "SYNC",
"Name": "master_sync_policy"
}]
},
"count": 0
The returned result is the same as System Action . Is a description of the table.
HA Action
Request
GET /rest/v1/ha
Description
HA Action is used to obtain the high availability group information of the FE cluster.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"Observernodes": [],
"CurrentJournalId": [{
"Value": 433648,
"Name": "FrontendRole"
}],
"Electablenodes": [{
"Value": "host1",
"Name": "host1"
}],
"allowedFrontends": [{
"Name": "192.168.1.1_9213_1597652404352"
}],
"removedFronteds": [],
"CanRead": [{
"Value": true,
"Name": "Status"
}],
"databaseNames": [{
"Name": "DatabaseNames"
}],
"FrontendRole": [{
"Value": "MASTER",
"Name": "FrontendRole"
}],
"CheckpointInfo": [{
"Value": 433435,
"Name": "Version"
}, {
"Value": "2020-09-03T02:07:37.000+0000",
"Name": "lastCheckPointTime"
}]
},
"count": 0
Request
GET /rest/v1/hardware_info/fe/
Description
Hardware Info Action is used to obtain the hardware information of the current FE.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"VersionInfo": {
"Git": "git://host/core@5bc28f4c36c20c7b424792df662fc988436e679e",
"Version": "trunk",
"BuildInfo": "[email protected]",
},
"HarewareInfo": {
"NetworkParameter": "...",
"Processor": "...",
"OS": "...",
"Memory": "...",
"FileSystem": "...",
"NetworkInterface": "...",
"Processes": "...",
"Disk": "..."
},
"count": 0
The contents of each value in the HarewareInfo field are all hardware information text displayed in html format.
Help Action
Request
GET /rest/v1/help
Description
Used to obtain help through fuzzy query.
Path parameters
None
Query parameters
query
Request body
None
Response
{
"msg":"success",
"code":0,
"count":0
Log Action
Request
GET /rest/v1/log
Description
GET is used to obtain the latest part of Doris's WARNING log, and the POST method is used to dynamically set the log level of
FE.
Path parameters
None
Query parameters
add_verbose
Optional parameters for the POST method. Enable the DEBUG level log of the specified package.
del_verbose
Optional parameters for the POST method. Turn off the DEBUG level log of the specified package.
Request body
None
Response
GET /rest/v1/log
"msg": "success",
"code": 0,
"data": {
"LogContents": {
"logPath": "/home/disk1/cmy/git/doris/core-for-ui/output/fe/log/fe.warn.log",
},
"LogConfiguration": {
"VerboseNames": "org",
"AuditNames": "slow_query,query",
"Level": "INFO"
},
"count": 0
Among them, data.LogContents.log means the log content in the latest part of fe.warn.log .
POST /rest/v1/log?add_verbose=org
"msg": "success",
"code": 0,
"data": {
"LogConfiguration": {
"VerboseNames": "org",
"AuditNames": "slow_query,query",
"Level": "INFO"
},
"count": 0
Login Action
Request
POST /rest/v1/login
Description
Used to log in to the service.
Path parameters
None
Query parameters
None
Request body
None
Response
Login success
"code": 200
Login failure
"code": xxx,
"count": 0
Logout Action
Request
POST /rest/v1/logout
Description
Logout Action is used to log out of the current login.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "OK",
"code": 0
Request
GET /rest/v1/query_profile/<query_id>
Description
The Query Profile Action is used to obtain the Query profile.
Path parameters
<query_id>
Optional parameters. When not specified, the latest query list is returned. When specified, return the profile of the
specified query.
Query parameters
None
Request body
None
Response
Not specify <query_id>
GET /rest/v1/query_profile/
"msg": "success",
"code": 0,
"data": {
"column_names": ["Query ID", "User", "Default Db", "Sql Statement", "Query Type", "Start Time", "End
Time", "Total", "Query State"],
"rows": [{
"User": "root",
"__hrefPath": ["/query_profile/d73a8a0b004f4b2f-b4829306441913da"],
"Total": "5ms",
}, {
"User": "root",
"__hrefPath": ["/query_profile/fd706dd066824c21-9d1a63af9f5cb50c"],
"Total": "6ms",
}]
},
"count": 3
System Action
The returned result is the same as , which is a table description.
Specify <query_id>
GET /rest/v1/query_profile/<query_id>
"msg": "success",
"code": 0,
"data": "Query:</br> Summary:</br>...",
"count": 0
Session Action
Request
GET /rest/v1/session
Description
Session Action is used to obtain the current session information.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"column_names": ["Id", "User", "Host", "Cluster", "Db", "Command", "Time", "State", "Info"],
"rows": [{
"User": "root",
"Command": "Sleep",
"State": "",
"Cluster": "default_cluster",
"Host": "10.81.85.89:31465",
"Time": "230",
"Id": "0",
"Info": "",
"Db": "db1"
}]
},
"count": 2
The returned result is the same as System Action . Is a description of the table.
System Action
Request
GET /rest/v1/system
Description
System Action is used to obtain information from the Proc system built into Doris.
Path parameters
None
Query parameters
path
Request body
None
Response
Take /dbs/10003/10054/partitions/10053/10055 as an example:
"msg": "success",
"code": 0,
"data": {
"rows": [{
"SchemaHash": "1294206575",
"LstFailedTime": "\\N",
"LstFailedVersion": "-1",
"MetaUrl": "URL",
"__hrefPaths": ["http://192.168.100.100:8030/rest/v1/system?
path=/dbs/10003/10054/partitions/10053/10055/10056", "http://192.168.100.100:8043/api/meta/header/10056",
"http://192.168.100.100:8043/api/compaction/show?tablet_id=10056"],
"CheckVersionHash": "-1",
"ReplicaId": "10057",
"VersionHash": "4611804212003004639",
"LstConsistencyCheckTime": "\\N",
"LstSuccessVersionHash": "4611804212003004639",
"CheckVersion": "-1",
"Version": "6",
"VersionCount": "2",
"State": "NORMAL",
"BackendId": "10032",
"DataSize": "776",
"LstFailedVersionHash": "0",
"LstSuccessVersion": "6",
"CompactionStatus": "URL",
"TabletId": "10056",
"PathHash": "-3259732870068082628",
"RowCount": "21"
}]
},
"count": 1
The column_names in the data section is the header information, and href_columns indicates which columns of the table are
hyperlink columns. Each element in the rows array represents a row. The __hrefPaths field is not table data; it holds the link
URLs of the hyperlink columns and corresponds to the columns in href_columns one by one.
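A minimal sketch of browsing the Proc system with this action, reusing the example path shown above (host and port are placeholders; credentials omitted):
```
# Walk the built-in Proc system; the path parameter selects the Proc node to display.
curl -X GET "http://fe_host:8030/rest/v1/system?path=/dbs/10003/10054/partitions/10053/10055"
```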
Request
GET /api/colocate
POST/DELETE /api/colocate/group_stable
POST /api/colocate/bucketseq
Description
Used to obtain or modify colocate group information.
Path parameters
None
Query parameters
None
Request body
None
Response
TO DO
Meta Action
Request
GET /image
GET /info
GET /version
GET /put
GET /journal_id
GET /role
GET /check
GET /dump
Description
This is a set of APIs related to FE metadata, except for /dump , they are all used for internal communication between FE
nodes.
Path parameters
TODO
Query parameters
TODO
Request body
TODO
Response
TODO
Examples
TODO
Cluster Action
Request
GET /rest/v2/manager/cluster/cluster_info/conn_info
Description
Used to get the cluster's HTTP and MySQL connection information.
Response
{
"msg": "success",
"code": 0,
"data": {
"http": [
"fe_host:http_ip"
],
"mysql": [
"fe_host:query_ip"
},
"count": 0
Examples
```
GET /rest/v2/manager/cluster/cluster_info/conn_info
Response:
"msg": "success",
"code": 0,
"data": {
"http": [
"127.0.0.1:8030"
],
"mysql": [
"127.0.0.1:9030"
},
"count": 0
```
Node Action
Request
GET /rest/v2/manager/node/frontends
GET /rest/v2/manager/node/backends
GET /rest/v2/manager/node/brokers
GET /rest/v2/manager/node/configuration_name
GET /rest/v2/manager/node/node_list
POST /rest/v2/manager/node/configuration_info
POST /rest/v2/manager/node/set_config/fe
POST /rest/v2/manager/node/set_config/be
POST /rest/v2/manager/node/{action}/be
POST /rest/v2/manager/node/{action}/fe
GET /rest/v2/manager/node/backends
GET /rest/v2/manager/node/brokers
Description
Used to get the fe, be, and broker node information of the cluster.
Response
frontends:
"msg": "success",
"code": 0,
"data": {
"column_names": [
"Name",
"IP",
"HostName",
"EditLogPort",
"HttpPort",
"QueryPort",
"RpcPort",
"Role",
"IsMaster",
"ClusterId",
"Join",
"Alive",
"ReplayedJournalId",
"LastHeartbeat",
"IsHelper",
"ErrMsg",
"Version"
],
"rows": [
...
},
"count": 0
backends:
"msg": "success",
"code": 0,
"data": {
"column_names": [
"BackendId",
"Cluster",
"IP",
"HostName",
"HeartbeatPort",
"BePort",
"HttpPort",
"BrpcPort",
"LastStartTime",
"LastHeartbeat",
"Alive",
"SystemDecommissioned",
"ClusterDecommissioned",
"TabletNum",
"DataUsedCapacity",
"AvailCapacity",
"TotalCapacity",
"UsedPct",
"MaxDiskUsedPct",
"ErrMsg",
"Version",
"Status"
],
"rows": [
...
},
"count": 0
brokers:
"msg": "success",
"code": 0,
"data": {
"column_names": [
"Name",
"IP",
"HostName",
"Port",
"Alive",
"LastStartTime",
"LastUpdateTime",
"ErrMsg"
],
"rows": [
...
},
"count": 0
GET /rest/v2/manager/node/node_list
POST /rest/v2/manager/node/configuration_info
Description
configuration_name is used to get the names of the node configuration items.
node_list is used to get the list of nodes.
configuration_info is used to get the node configuration details.
Query parameters
GET /rest/v2/manager/node/configuration_name
none
GET /rest/v2/manager/node/node_list
none
POST /rest/v2/manager/node/configuration_info
type
The value is fe or be, which specifies to get the configuration information of fe or the configuration information of
be.
Request body
GET /rest/v2/manager/node/configuration_name
none
GET /rest/v2/manager/node/node_list
none
POST /rest/v2/manager/node/configuration_info
"conf_name": [
""
],
"node": [
""
If no body is included, the parameters in the body use the default values.
conf_name specifies which configuration items to return, the default is all configuration items.
node is used to specify which node's configuration information is returned, the default is all fe nodes or be
nodes configuration information.
Response
GET /rest/v2/manager/node/configuration_name
"msg": "success",
"code": 0,
"data": {
"backend":[
""
],
"frontend":[
""
},
"count": 0
GET /rest/v2/manager/node/node_list
"msg": "success",
"code": 0,
"data": {
"backend": [
""
],
"frontend": [
""
},
"count": 0
POST /rest/v2/manager/node/configuration_info?type=fe
"msg": "success",
"code": 0,
"data": {
"column_names": [
"
"
配置项
节点",
",
"节点类型",
"配置值类型",
"MasterOnly",
"
"
配置值
可修改",
"
],
"rows": [
""
},
"count": 0
POST /rest/v2/manager/node/configuration_info?type=be
"msg": "success",
"code": 0,
"data": {
" 配置项
"column_names": [
",
"节点",
"节点类型",
"配置值类型",
"配置值",
"可修改"
],
"rows": [
""
},
"count": 0
Examples
POST /rest/v2/manager/node/configuration_info?type=fe
body:
"conf_name":[
"agent_task_resend_wait_time_ms"
Response:
"msg": "success",
"code": 0,
"data": {
"column_names": [
"
"
配置项
节点",
",
"节点类型",
"配置值类型",
"配置值
"MasterOnly",
",
"可修改"
],
"rows": [
"agent_task_resend_wait_time_ms",
"127.0.0.1:8030",
"FE",
"long",
"true",
"50000",
"true"
},
"count": 0
POST /rest/v2/manager/node/set_config/be
Description
Used to modify fe or be node configuration values
Request body
"config_name":{
"node":[
""
],
"value":"",
"persist":
persist set to true means a permanent modification, and false means a temporary modification. A permanent modification
remains in effect after a restart, while a temporary modification is lost after a restart.
Response
GET /rest/v2/manager/node/configuration_name
"msg": "",
"code": 0,
"data": {
"failed":[
"config_name":"name",
"value"="",
"node":"",
"err_info":""
},
"count": 0
Examples
POST /rest/v2/manager/node/set_config/fe
body:
"agent_task_resend_wait_time_ms":{
"node":[
"127.0.0.1:8030"
],
"value":"10000",
"persist":"true"
},
"alter_table_timeout_second":{
"node":[
"127.0.0.1:8030"
],
"value":"true",
"persist":"true"
}
Response:
"msg": "success",
"code": 0,
"data": {
"failed": [
"config_name": "alter_table_timeout_second",
"node": "10.81.85.89:8837",
"value": "true"
},
"count": 0
Operate be node
POST /rest/v2/manager/node/{action}/be
Description
Used to add, drop, or decommission BE nodes.
action: ADD/DROP/DECOMMISSION
Request body
"hostPorts": ["127.0.0.1:9050"],
"properties": {
"tag.location": "test"
properties: the configuration passed in when adding a node, currently only used to set the tag. If not provided, the default
tag is used.
Response
"msg": "Error",
"code": 1,
"count": 0
msg Success/Error
code 0/1
Examples
1. add be node
post /rest/v2/manager/node/ADD/be
Request body
"hostPorts": ["127.0.0.1:9050"]
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
2. drop be node
post /rest/v2/manager/node/DROP/be
Request body
"hostPorts": ["127.0.0.1:9050"]
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
3. offline be node
post /rest/v2/manager/node/DECOMMISSION/be
Request body
"hostPorts": ["127.0.0.1:9050"]
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
Operate fe node
POST /rest/v2/manager/node/{action}/fe
Description
Used to add or drop FE nodes.
action: ADD/DROP
Request body
"role": "FOLLOWER",
"hostPort": "127.0.0.1:9030"
role FOLLOWER/OBSERVER
Response
"msg": "Error",
"code": 1,
"count": 0
msg Success/Error
code 0/1
Examples
1. add FOLLOWER node
post /rest/v2/manager/node/ADD/fe
Request body
"role": "FOLLOWER",
"hostPort": "127.0.0.1:9030"
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
2. drop FOLLOWER node
post /rest/v2/manager/node/DROP/fe
Request body
"role": "FOLLOWER",
"hostPort": "127.0.0.1:9030"
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
Request
GET /rest/v2/manager/query/query_info
GET /rest/v2/manager/query/trace/{trace_id}
GET /rest/v2/manager/query/sql/{query_id}
GET /rest/v2/manager/query/profile/text/{query_id}
GET /rest/v2/manager/query/profile/graph/{query_id}
GET /rest/v2/manager/query/profile/json/{query_id}
GET /rest/v2/manager/query/profile/fragments/{query_id}
GET /rest/v2/manager/query/current_queries
GET /rest/v2/manager/query/kill/{query_id}
Description
Gets information about select queries for all fe nodes in the cluster.
Query parameters
query_id
Optional, specifies the query ID of the query to be returned, default returns information for all queries.
search
Optional, returns only the query information that contains the specified string; currently only literal string matching is performed.
is_all_node
Optional, if true, returns query information for all fe nodes, if false, returns query information for the current fe node. The
default is true.
Response
"msg": "success",
"code": 0,
"data": {
"column_names": [
"Query ID",
"FE 节点",
"查询用户",
"执行数据库",
"Sql",
"查询类型",
"开始时间",
"结束时间",
"执行时长",
"状态"
],
"rows": [
...
},
"count": 0
Examples
GET /rest/v2/manager/query/query_info
"msg": "success",
"code": 0,
"data": {
"column_names": [
"FE节点
"Query ID",
",
"
"
查询用户
执行数据库 ",
",
"查询类型",
"Sql",
"开始时间",
"结束时间",
"执行时长",
"状态"
],
"rows": [
"d7c93d9275334c35-9e6ac5f295a7134b",
"127.0.0.1:8030",
"root",
"default_cluster:testdb",
"select c.id, c.name, p.age, p.phone, c.date, c.cost from cost c join people p on c.id = p.id
where p.age > 20 order by c.id",
"Query",
"2021-07-29 16:59:12",
"2021-07-29 16:59:12",
"109ms",
"EOF"
},
"count": 0
Description
After executing the Query within the same Session, the query id can be obtained through the trace id.
Path parameters
{trace_id}
Query parameters
Response
"msg": "success",
"code": 0,
"data": "fb1d9737de914af1-a498d5c5dec638d3",
"count": 0
"code": 403,
"count": 0
Get the sql and text profile for the specified query
GET /rest/v2/manager/query/sql/{query_id}
GET /rest/v2/manager/query/profile/text/{query_id}
Description
Get the sql and profile text for the specified query id.
Path parameters
query_id
The query id.
Query parameters
is_all_node
Optional, if true then query for the specified query id in all fe nodes, if false then query for the specified query id in the
currently connected fe nodes. The default is true.
Response
{
"msg": "success",
"code": 0,
"data": {
"sql": ""
},
"count": 0
"msg": "success",
"code": 0,
"data": {
"profile": ""
},
"count": 0
Admin and Root users can view all queries. Ordinary users can only view queries sent by themselves. If the specified query
does not exist or the user has no permission, Bad Request is returned:
"code": 403,
"count": 0
### Examples
1. get sql.
GET /rest/v2/manager/query/sql/d7c93d9275334c35-9e6ac5f295a7134b
Response:
"msg": "success",
"code": 0,
"data": {
"sql": "select c.id, c.name, p.age, p.phone, c.date, c.cost from cost c join people p on c.id = p.id
where p.age > 20 order by c.id"
},
"count": 0
Description
Get the fragment name, instance id and execution time for the specified query id.
Path parameters
query_id
Query parameters
is_all_node
Optional, if true then query for the specified query id in all fe nodes, if false then query for the specified query id in the
currently connected fe nodes. The default is true.
Response
{
"msg": "success",
"code": 0,
"data": [
"fragment_id": "",
"time": "",
"instance_id": {
"": ""
],
"count": 0
Admin and Root users can view all queries. Ordinary users can only view queries sent by themselves. If the specified query
does not exist or the user has no permission, Bad Request is returned:
"code": 403,
"count": 0
### Examples
```
GET /rest/v2/manager/query/profile/fragments/d7c93d9275334c35-9e6ac5f295a7134b
Response:
"msg": "success",
"code": 0,
"data": [
"fragment_id": "0",
"time": "36.169ms",
"instance_id": {
"d7c93d9275334c35-9e6ac5f295a7134e": "36.169ms"
},
"fragment_id": "1",
"time": "20.710ms",
"instance_id": {
"d7c93d9275334c35-9e6ac5f295a7134c": "20.710ms"
},
"fragment_id": "2",
"time": "7.83ms",
"instance_id": {
"d7c93d9275334c35-9e6ac5f295a7134d": "7.83ms"
],
"count": 0
```
Description
Get the tree profile information of the specified query id, same as show query profile command.
Path parameters
query_id
Query parameters
fragment_id and instance_id
Optional. If both are specified, a detailed profile tree is returned, which is equivalent to show query profile
'/query_id/fragment_id/instance_id' .
is_all_node
Optional, if true then query information about the specified query id in all fe nodes, if false then query information about
the specified query id in the currently connected fe nodes. The default is true.
Response
{
"msg": "success",
"code": 0,
"data": {
"graph":""
},
"count": 0
"code": 403,
"count": 0
Description
Get the queries currently running in the cluster.
Path parameters
Query parameters
is_all_node
Optional. Return current running queries from all FE if set to true. Default is true.
Response
{
"msg": "success",
"code": 0,
"data": {
"rows": [
},
"count": 0
Cancel query
POST /rest/v2/manager/query/kill/{query_id}
Description
Cancel query of specified connection.
Path parameters
{query_id}
Query parameters
Response
"msg": "success",
"code": 0,
"data": null,
"count": 0
Backends Action
Request
GET /api/backends
Description
Backends Action returns the Backends list, including Backend's IP, PORT and other information.
Path parameters
None
Query parameters
is_alive
Optional. Whether to return only the live BE nodes. The default is false, which means that all BE nodes are returned.
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"backends": [
"ip": "192.1.1.1",
"http_port": 8040,
"is_alive": true
},
"count": 0
Bootstrap Action
Request
GET /api/bootstrap
Description
Used to determine whether the FE has started. When no parameters are provided, only whether the startup succeeded is
returned. If token and cluster_id are provided, more detailed information is returned.
Path parameters
none
Query parameters
cluster_id
The cluster id. It can be viewed in the file doris-meta/image/VERSION.
token
The cluster token. It can be viewed in the file doris-meta/image/VERSION.
Request body
none
Response
No parameters provided
"msg": "OK",
"code": 0,
"data": null,
"count": 0
A code of 0 means that the FE node has started successfully. Error codes other than 0 indicate other errors.
"msg": "OK",
"code": 0,
"data": {
"queryPort": 9030,
"rpcPort": 9020,
"maxReplayedJournal": 17287
},
"count": 0
Examples
1. No parameters
GET /api/bootstrap
Response:
"msg": "OK",
"code": 0,
"data": null,
"count": 0
2. Provide token and cluster_id
GET /api/bootstrap?cluster_id=935437471&token=ad87f6dd-c93f-4880-bcdb-8ca8c9ab3031
Response:
"msg": "OK",
"code": 0,
"data": {
"queryPort": 9030,
"rpcPort": 9020,
"maxReplayedJournal": 17287
},
"count": 0
Request
POST /api/<db>/_cancel
Description
Used to cancel the load transaction of the specified label.
RETURN VALUES
Return a JSON format string:
Status:
Success: cancel succeeded
Others: cancel failed
Message: Error message if cancel failed
Path parameters
<db>
Query parameters
<label>
Request body
None
Response
Cancel success
"msg": "OK",
"code": 0,
"data": null,
"count": 0
Cancel failed
"code": 1,
"data": null,
"count": 0
Examples
1. Cancel the load transaction of the specified label
POST /api/example_db/_cancel?label=my_label1
Response:
"msg": "OK",
"code": 0,
"data": null,
"count": 0
Request
GET /api/check_decommission
Description
Used to determine whether the specified BE nodes can be decommissioned. For example, whether the remaining nodes can
meet the space requirements and the required number of replicas after the nodes are decommissioned.
Path parameters
None
Query parameters
host_ports
Request body
None
Response
Return a list of nodes that can be decommissioned
"msg": "OK",
"code": 0,
"count": 0
Examples
1. Check whether the specified BE node can be decommissioned
GET /api/check_decommission?host_ports=192.168.10.11:9050,192.168.10.11:9050
Response:
"msg": "OK",
"code": 0,
"data": ["192.168.10.11:9050"],
"count": 0
Request
GET /api/_check_storagetype
Description
It is used to check whether the storage format of the table under the specified database is the row storage format. (The row
format is deprecated)
Path parameters
None
Query parameters
db
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"tbl2": {},
"tbl1": {}
},
"count": 0
If there is content after a table name, it lists the base or rollup tables whose storage format is row storage.
Examples
1. Check whether the storage format of the following table of the specified database is row format
GET /api/_check_storagetype
Response:
"msg": "success",
"code": 0,
"data": {
"tbl2": {},
"tbl1": {}
},
"count": 0
Connection Action
Request
GET /api/connection
Description
Given a connection id, return the query id that is currently being executed for this connection or the last execution
completed.
The connection id can be viewed through the id column in the MySQL command show processlist; .
Path parameters
None
Query parameters
connection_id
Specified connection id
Request body
None
Response
{
"msg": "OK",
"code": 0,
"data": {
"query_id": "b52513ce3f0841ca-9cb4a96a268f2dba"
},
"count": 0
Examples
1. Get the query id of the specified connection id
GET /api/connection?connection_id=101
Response:
"msg": "OK",
"code": 0,
"data": {
"query_id": "b52513ce3f0841ca-9cb4a96a268f2dba"
},
"count": 0
Request
GET /api/basepath
Description
Used to obtain http basepath.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg":"success",
"code":0,
"data":{"enable":false,"path":""},
"count":0
Request
GET /api/fe_version_info
Description
Get fe version info from fe host.
Path parameters
None.
Query parameters
None.
Request body
None.
Response
```
"msg":"success",
"code":0,
"data":{
"feVersionInfo":{
"dorisBuildVersionPrefix":"doris",
"dorisBuildVersionMajor":0,
"dorisBuildVersionMinor":0,
"dorisBuildVersionPatch":0,
"dorisBuildVersionRcVersion":"trunk",
"dorisBuildVersion":"doris-0.0.0-trunk",
"dorisBuildHash":"git://4b7b503d1cb3/data/doris/doris/be/../@a04f9814fe5a09c0d9e9399fe71cc4d765f8bff1",
"dorisBuildShortHash":"a04f981",
"dorisBuildInfo":"root@4b7b503d1cb3",
},
"count":0
```
Examples
```
GET /api/fe_version_info
Response:
"msg":"success",
"code":0,
"data":{
"feVersionInfo":{
"dorisBuildVersionPrefix":"doris",
"dorisBuildVersionMajor":0,
"dorisBuildVersionMinor":0,
"dorisBuildVersionPatch":0,
"dorisBuildVersionRcVersion":"trunk",
"dorisBuildVersion":"doris-0.0.0-trunk",
"dorisBuildHash":"git://4b7b503d1cb3/data/doris/doris/be/../@a04f9814fe5a09c0d9e9399fe71cc4d765f8bff1",
"dorisBuildShortHash":"a04f981",
"dorisBuildInfo":"root@4b7b503d1cb3",
},
"count":0
```
Request
GET /api/_get_ddl
Description
Used to get the table creation statement, partition creation statement and rollup statement of the specified table.
Path parameters
None
Query parameters
db
Specify database
table
Specify table
Request body
None
Response
{
"msg": "OK",
"code": 0,
"data": {
},
"count": 0
Examples
1. Get the DDL statement of the specified table
Response
"msg": "OK",
"code": 0,
"data": {
"create_partition": [],
"create_table": ["CREATE TABLE `tbl1` (\n `k1` int(11) NULL COMMENT \"\",\n `k2` int(11) NULL
COMMENT \"\"\n) ENGINE=OLAP\nDUPLICATE KEY(`k1`, `k2`)\nCOMMENT \"OLAP\"\nDISTRIBUTED BY HASH(`k1`) BUCKETS
1\nPROPERTIES (\n\"replication_num\" = \"1\",\n\"version_info\" = \"1,0\",\n\"in_memory\" =
\"false\",\n\"storage_format\" = \"DEFAULT\"\n);"],
"create_rollup": []
},
"count": 0
Request
GET /api/<db>/_load_info
Description
Used to obtain the information of the load job of the specified label.
Path parameters
<db>
Specify database
Query parameters
label
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"dbName": "default_cluster:db1",
"tblNames": ["tbl1"],
"label": "my_label",
"clusterName": "default_cluster",
"state": "FINISHED",
"failMsg": "",
"trackingUrl": ""
},
"count": 0
Examples
1. Get the load job information of the specified label
GET /api/example_db/_load_info?label=my_label
Response
"msg": "success",
"code": 0,
"data": {
"dbName": "default_cluster:db1",
"tblNames": ["tbl1"],
"label": "my_label",
"clusterName": "default_cluster",
"state": "FINISHED",
"failMsg": "",
"trackingUrl": ""
},
"count": 0
Request
GET /api/<db>/get_load_state
Description
Returns the status of the load transaction of the specified label.
A JSON format string describing the status of the specified transaction is returned:
Label: The specified label.
Status: Whether this request succeeded.
Message: Error messages.
State: UNKNOWN/PREPARE/COMMITTED/VISIBLE/ABORTED
Path parameters
<db>
Specify database
Query parameters
label
Specify label
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": "VISIBLE",
"count": 0
"msg": "success",
"code": 0,
"data": "UNKNOWN",
"count": 0
Examples
1. Get the status of the load transaction of the specified label.
GET /api/example_db/get_load_state?label=my_label
"msg": "success",
"code": 0,
"data": "VISIBLE",
"count": 0
Request
HEAD /api/get_log_file
GET /api/get_log_file
Description
Users can obtain FE log files through the HTTP interface.
The HEAD request is used to obtain the log file list of the specified log type. GET request is used to download the specified log
file.
Path parameters
None
Query parameters
type
file
Request body
None
Response
HEAD
HTTP/1.1 200 OK
file_infos: {"fe.audit.log":24759,"fe.audit.log.20190528.1":132934}
content-type: text/html
connection: keep-alive
The returned header lists all current log files of the specified type and the size of each file.
GET
The content of the specified log file is returned.
Examples
HEAD /api/get_log_file?type=fe.audit.log
Response:
HTTP/1.1 200 OK
file_infos: {"fe.audit.log":24759,"fe.audit.log.20190528.1":132934}
content-type: text/html
connection: keep-alive
In the returned header, the file_infos field displays the file list and the corresponding file size (in bytes) in json format
GET /api/get_log_file?type=fe.audit.log&file=fe.audit.log.20190528.1
Response:
Request
GET /api/get_small_file
Description
Through the file id, download the file in the small file manager.
Path parameters
None
Query parameters
token
file_id
The file id displayed in the file manager. The file id can be viewed with the SHOW FILE command.
Request body
None
Response
< HTTP/1.1 200
"code": 1,
"data": null,
"count": 0
Examples
1. Download the file with the specified id
GET /api/get_small_file?token=98e8c0a6-3a41-48b8-a72b-0432e42a7fe5&file_id=11002
Response:
Health Action
Request
GET /api/health
Description
Returns the number of BE nodes currently alive in the cluster and the total number of BE nodes.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"online_backend_num": 10,
"total_backend_num": 10
},
"count": 0
List Database
Request
GET /api/meta/namespaces/<ns_name>/databases
Description
Get the list of databases.
Path parameters
None
Query parameters
limit
offset
Request body
None
Response
{
"msg": "OK",
"code": 0,
"data": [
],
"count": 3
List Table
Request
GET /api/meta/namespaces/<ns_name>/databases/<db_name>/tables
Description
Get a list of tables in the specified database, arranged in alphabetical order.
Path parameters
<db_name>
Specify database
Query parameters
limit
offset
Request body
None
Response
"msg": "OK",
"code": 0,
"data": [
],
"count": 0
Schema Info
Request
GET /api/meta/namespaces/<ns_name>/databases/<db_name>/tables/<tbl_name>/schema
Description
Get the table structure information of the specified table in the specified database.
Path parameters
<db_name>
<tbl_name>
Query parameters
with_mv
Optional. If not specified, the table structure of the base table is returned by default. If specified, all rollup index will also
be returned.
Request body
None
Response
GET /api/meta/namespaces/default/databases/db1/tables/tbl1/schema
"msg": "success",
"code": 0,
"data": {
"tbl1": {
"schema": [{
"Field": "k1",
"Type": "INT",
"Null": "Yes",
"Extra": "",
"Default": null,
"Key": "true"
},
"Field": "k2",
"Type": "INT",
"Null": "Yes",
"Extra": "",
"Default": null,
"Key": "true"
],
"is_base": true
},
"count": 0
GET /api/meta/namespaces/default/databases/db1/tables/tbl1/schema?with_mv=1
"msg": "success",
"code": 0,
"data": {
"tbl1": {
"schema": [{
"Field": "k1",
"Type": "INT",
"Null": "Yes",
"Extra": "",
"Default": null,
"Key": "true"
},
"Field": "k2",
"Type": "INT",
"Null": "Yes",
"Extra": "",
"Default": null,
"Key": "true"
],
"is_base": true
},
"rollup1": {
"schema": [{
"Field": "k1",
"Type": "INT",
"Null": "Yes",
"Extra": "",
"Default": null,
"Key": "true"
}],
"is_base": false
},
"count": 0
The data field returns the table structure information of the base table or rollup table.
Request
GET /api/_meta_replay_state
Description
Get the status of FE node metadata replay.
Path parameters
None
Query parameters
None
Request body
None
Response
TODO
Examples
TODO
Metrics Action
Request
GET /api/metrics
Description
Used to obtain Doris metrics information.
Path parameters
None
Query parameters
None
Request body
None
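A minimal sketch of scraping the FE metrics endpoint (host and port are placeholders; the output is plain-text monitoring metrics, typically consumable by systems such as Prometheus, though the exact format is not specified in this document):
```
# Dump the FE's monitoring metrics as plain text.
curl -X GET "http://fe_host:8030/api/metrics"
```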
Profile Action
Request
GET /api/profile
Description
Used to obtain the query profile of the specified query id.
If the query_id does not exist, a 404 NOT FOUND error is returned.
If the query_id exists, a query profile like the following is returned:
Query:
Summary:
- Total: 8ms
- User: root
Fragment 0:
- MemoryLimit: 2.00 GB
- PeakUsedReservation: 0.00
- PeakMemoryUsage: 72.00 KB
- RowsProduced: 5
- AverageThreadTokens: 0.00
- PeakReservation: 0.00
BlockMgr:
- BlocksCreated: 0
- BlockWritesOutstanding: 0
- BytesWritten: 0.00
- TotalEncryptionTime: 0ns
- BufferedPins: 0
- TotalReadBlockTime: 0ns
- TotalBufferWaitTime: 0ns
- BlocksRecycled: 0
- TotalIntegrityCheckTime: 0ns
- MaxBlockSize: 8.00 MB
DataBufferSender (dst_fragment_instance_id=a0a9259df9844029-845331577440a3be):
- AppendBatchTime: 9.23us
- ResultRendTime: 956ns
- TupleConvertTime: 5.735us
- NumSentRows: 5
- TotalRawReadTime: 0ns
- CompressedBytesRead: 6.47 KB
- PeakMemoryUsage: 0.00
- RowsPushedCondFiltered: 0
- ScanRangesComplete: 0
- ScanTime: 25.195us
- BitmapIndexFilterTimer: 0ns
- BitmapIndexFilterCount: 0
- NumScanners: 65
- RowsStatsFiltered: 0
- VectorPredEvalTime: 0ns
- BlockSeekTime: 1.299ms
- ScannerThreadsVoluntaryContextSwitches: 0
- RowsDelFiltered: 0
- IndexLoadTime: 911.104us
- NumDiskAccess: 1
- ScannerThreadsTotalWallClockTime: 0ns
- MaterializeTupleTime: 0ns
- ScannerThreadsUserTime: 0ns
- ScannerThreadsSysTime: 0ns
- TotalPagesNum: 0
- BlockLoadTime: 539.289us
- CachedPagesNum: 0
- BlocksLoad: 384
- UncompressedBytesRead: 0.00
- RowsBloomFilterFiltered: 0
- TabletCount : 1
- RowsReturned: 5
- ScannerThreadsInvoluntaryContextSwitches: 0
- DecompressorTimer: 0ns
- RowsVectorPredFiltered: 0
- ReaderInitTime: 6.498ms
- RowsRead: 5
- PerReadThreadRawHdfsThroughput: 0.0 /sec
- BlockFetchTime: 4.318ms
- ShowHintsTime: 0ns
- IOTimer: 1.154ms
- BytesRead: 48.49 KB
- BlockConvertTime: 97.539us
- BlockSeekCount: 0
Path parameters
None
Query parameters
query_id
Specify query id
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
},
"count": 0
Examples
1. Get the query profile of the specified query id
GET /api/profile?query_id=f732084bc8e74f39-8313581c9c3c0b58
Response:
"msg": "success",
"code": 0,
"data": {
},
"count": 0
Request
GET /api/query_detail
Description
Used to obtain information about all queries after a specified time point
Path parameters
None
Query parameters
event_time
Specifies a time point (Unix timestamp, in milliseconds); query information after that time point is returned.
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"query_details": [{
"eventTime": 1596462699216,
"queryId": "f732084bc8e74f39-8313581c9c3c0b58",
"startTime": 1596462698969,
"endTime": 1596462699216,
"latency": 247,
"state": "FINISHED",
"database": "db1",
}, {
"eventTime": 1596463013929,
"queryId": "ed2d0d80855d47a5-8b518a0f1472f60c",
"startTime": 1596463013913,
"endTime": 1596463013929,
"latency": 16,
"state": "FINISHED",
"database": "db1",
}]
},
"count": 0
Examples
GET /api/query_detail?event_time=1596462079958
Response:
"msg": "success",
"code": 0,
"data": {
"query_details": [{
"eventTime": 1596462699216,
"queryId": "f732084bc8e74f39-8313581c9c3c0b58",
"startTime": 1596462698969,
"endTime": 1596462699216,
"latency": 247,
"state": "FINISHED",
"database": "db1",
}, {
"eventTime": 1596463013929,
"queryId": "ed2d0d80855d47a5-8b518a0f1472f60c",
"startTime": 1596463013913,
"endTime": 1596463013929,
"latency": 16,
"state": "FINISHED",
"database": "db1",
}]
},
"count": 0
Request
POST /api/query_schema/<ns_name>/<db_name>
Description
The Query Schema Action returns the table creation statements for the tables referenced by the given SQL. It can be used to
test some query scenarios locally.
Path parameters
<db_name>
Specify the database name. This database will be considered as the default database for the current session, and will be
used if the table name in SQL does not qualify the database name.
Query parameters
None
Request body
text/plain
sql
Response
Return value
) ENGINE=OLAP
COMMENT 'OLAP'
PROPERTIES (
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
) ENGINE=OLAP
COMMENT 'OLAP'
PROPERTIES (
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
Example
1. Write the SQL in local file 1.sql
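Continuing the example, a hedged sketch of posting that SQL file to the endpoint. The namespace default_cluster, the database db1, and the Basic-auth credentials are assumptions not given by the original example:
```
# Send the SQL text as the request body (Content-type: text/plain) and get back
# the CREATE TABLE statements for the tables it references.
curl -X POST -u root: -H "Content-Type: text/plain" \
     --data-binary @1.sql \
     "http://fe_host:8030/api/query_schema/default_cluster/db1"
```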
Request
GET /api/rowcount
Description
Used to manually update the row count statistics of the specified table. While updating the row count statistics, the row
counts of the table and its rollups are also returned in JSON format.
Path parameters
None
Query parameters
db
Specify database
table
Specify table
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"tbl1": 10000
},
"count": 0
Examples
1. Update and get the number of rows in the specified Table
GET /api/rowcount?db=example_db&table=tbl1
Response:
"msg": "success",
"code": 0,
"data": {
"tbl1": 10000
},
"count": 0
Request
GET /api/_set_config
Description
Used to dynamically set the configuration of FE. This interface is equivalent to the ADMIN SET FRONTEND CONFIG command,
but it only sets the configuration of the FE node that receives the request, and it will not automatically forward MasterOnly
configuration items to the Master FE node.
Path parameters
None
Query parameters
confkey1=confvalue1
Specify the configuration name to be set, and its value is the configuration value to be modified.
persist
Whether to persist the modified configuration. The default is false, which means it is not persisted. If it is true, the
modified configuration item will be written into the fe_custom.conf file and will still take effect after FE is restarted.
reset_persist
Whether to clear the previously persisted configuration. This only takes effect when the persist parameter is true. For
compatibility with earlier versions, reset_persist defaults to true.
If persist is set to true and reset_persist is not set or reset_persist is true, the configuration in the fe_custom.conf file will
be cleared before this modified configuration is written to fe_custom.conf .
If persist is set to true and reset_persist is false, this modified configuration item will be incrementally added to
fe_custom.conf .
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"set": {
"key": "value"
},
"err": [
"config_name": "",
"config_value": "",
"err_info": ""
],
"persist":""
},
"count": 0
The set field indicates the configurations that were set successfully. The err field indicates the configurations that failed to
be set. The persist field indicates the persistence information.
Examples
1. Set the values of storage_min_left_capacity_bytes , replica_ack_policy and agent_task_resend_wait_time_ms .
GET /api/_set_config?
storage_min_left_capacity_bytes=1024&replica_ack_policy=SIMPLE_MAJORITY&agent_task_resend_wait_time_ms=true
Response:
"msg": "success",
"code": 0,
"data": {
"set": {
"storage_min_left_capacity_bytes": "1024"
},
"err": [
"config_name": "replica_ack_policy",
"config_value": "SIMPLE_MAJORITY",
},
"config_name": "agent_task_resend_wait_time_ms",
"config_value": "true",
],
"persist": ""
},
"count": 0
storage_min_left_capacity_bytes was set successfully;
replica_ack_policy failed, because the configuration item does not support dynamic modification;
agent_task_resend_wait_time_ms failed, because a boolean value was given for a configuration item of type long.
GET /api/_set_config?max_bytes_per_broker_scanner=21474836480&persist=true&reset_persist=false
Response:
"msg": "success",
"code": 0,
"data": {
"set": {
"max_bytes_per_broker_scanner": "21474836480"
},
"err": [],
"persist": "ok"
},
"count": 0
#You can modify this file manually, and the configurations in this file
max_bytes_per_broker_scanner=21474836480
Request
GET /api/show_data
Description
Used to get the total data volume of the cluster or the data volume of the specified database. Unit byte.
Path parameters
None
Query parameters
db
Request body
None
Response
1. The data volume of the specified database.
"msg": "success",
"code": 0,
"data": {
"default_cluster:db1": 381
},
"count": 0
2. Total data
"msg": "success",
"code": 0,
"data": {
"__total_size": 381
},
"count": 0
Examples
1. Get the data volume of the specified database
GET /api/show_data?db=db1
Response:
"msg": "success",
"code": 0,
"data": {
"default_cluster:db1": 381
},
"count": 0
GET /api/show_data
Response:
"msg": "success",
"code": 0,
"data": {
"__total_size": 381
},
"count": 0
Request
GET /api/show_meta_info
Description
Used to display some metadata information
Path parameters
None
Query parameters
action
Specify the type of metadata information to be obtained. Currently supports the following:
SHOW_DB_SIZE: Obtain the data size of each database, in bytes.
SHOW_HA: Obtain the replay status of FE metadata logs and the status of the electable group.
Request body
None
Response
SHOW_DB_SIZE
"msg": "success",
"code": 0,
"data": {
"default_cluster:information_schema": 0,
"default_cluster:db1": 381
},
"count": 0
SHOW_HA
"msg": "success",
"code": 0,
"data": {
"can_read": "true",
"role": "MASTER",
"is_ready": "true",
"last_checkpoint_version": "1492",
"last_checkpoint_time": "1596465109000",
"current_journal_id": "1595",
"electable_nodes": "",
"observer_nodes": "",
"master": "10.81.85.89"
},
"count": 0
Examples
1. View the data size of each database in the cluster
GET /api/show_meta_info?action=show_db_size
Response:
"msg": "success",
"code": 0,
"data": {
"default_cluster:information_schema": 0,
"default_cluster:db1": 381
},
"count": 0
GET /api/show_meta_info?action=show_ha
Response:
"msg": "success",
"code": 0,
"data": {
"can_read": "true",
"role": "MASTER",
"is_ready": "true",
"last_checkpoint_version": "1492",
"last_checkpoint_time": "1596465109000",
"current_journal_id": "1595",
"electable_nodes": "",
"observer_nodes": "",
"master": "10.81.85.89"
},
"count": 0
Request
GET /api/show_proc
Description
Used to obtain PROC information.
Path parameters
None
Query parameters
path
forward
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": [
],
"count": 0
Examples
1. View /statistic information
GET /api/show_proc?path=/statistic
Response:
"msg": "success",
"code": 0,
"data": [
["10003", "default_cluster:db1", "2", "3", "3", "3", "3", "0", "0", "0"],
["10013", "default_cluster:doris_audit_db__", "1", "4", "4", "4", "4", "0", "0", "0"],
["Total", "2", "3", "7", "7", "7", "7", "0", "0", "0"]
],
"count": 0
GET /api/show_proc?path=/statistic&forward=true
Response:
"msg": "success",
"code": 0,
"data": [
["10003", "default_cluster:db1", "2", "3", "3", "3", "3", "0", "0", "0"],
["10013", "default_cluster:doris_audit_db__", "1", "4", "4", "4", "4", "0", "0", "0"],
["Total", "2", "3", "7", "7", "7", "7", "0", "0", "0"]
],
"count": 0
Request
GET /api/show_runtime_info
Description
Used to obtain Runtime information of FE JVM
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"free_mem": "855642056",
"total_mem": "1037959168",
"thread_cnt": "98",
"max_mem": "1037959168"
},
"count": 0
Examples
1. Get the JVM information of the current FE node
GET /api/show_runtime_info
Response:
"msg": "success",
"code": 0,
"data": {
"free_mem": "855642056",
"total_mem": "1037959168",
"thread_cnt": "98",
"max_mem": "1037959168"
},
"count": 0
Request
POST /api/query/<ns_name>/<db_name>
Description
Statement Execution Action is used to execute a statement and return the result.
Path parameters
<db_name>
Specify the database name. This database will be regarded as the default database of the current session. If the table
name in SQL does not qualify the database name, this database will be used.
Query parameters
None
Request body
{
Response
Returning a result set:
{
"msg": "success",
"code": 0,
"data": {
"type": "result_set",
"data": [
[1],
[2]
],
"meta": [{
"name": "k1",
"type": "INT"
}],
"status": {},
"time": 10
},
"count": 0
The type field is result_set , which means the result set is returned. The results need to be obtained and displayed
based on the meta and data fields. The meta field describes the column information returned. The data field returns
the result row. The column type in each row needs to be judged by the content of the meta field. The status field
returns some information from MySQL, such as the number of warning rows, the status code, etc. The time field returns the
execution time in milliseconds.
"msg": "success",
"code": 0,
"data": {
"type": "exec_status",
"status": {}
},
"count": 0,
"time": 10
The type field is exec_status , which means the execution result is returned. At present, if the return result is
received, it means that the statement was executed successfully.
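The request body example above is truncated in this document. As a hedged sketch, assuming the statement is carried in a JSON field named stmt (an assumption, not confirmed here) and that HTTP Basic authentication is used, a call might look like:
```
# Execute a statement through the FE HTTP interface; the body field name "stmt" is an assumption.
curl -X POST -u root: -H "Content-Type: application/json" \
     -d '{"stmt": "select 1"}' \
     "http://fe_host:8030/api/query/default_cluster/db1"
```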
Request
POST /api/<db>/<table>/_query_plan
Description
Given a SQL, it is used to obtain the query plan corresponding to the SQL.
This interface is currently used by the Spark-Doris-Connector for Spark to obtain Doris's query plan.
Path parameters
<db>
Specify database
<table>
Specify table
Query parameters
None
Request body
{
Response
{
"msg": "success",
"code": 0,
"data": {
"partitions": {
"10039": {
"routings": ["10.81.85.89:9062"],
"version": 2,
"versionHash": 982459448378619656,
"schemaHash": 1294206575
},
"opaqued_query_plan":
"DAABDAACDwABDAAAAAEIAAEAAAAACAACAAAAAAgAAwAAAAAKAAT//////////w8ABQgAAAABAAAAAA8ABgIAAAABAAIACAAMABIIAAEAAAAADwACCw
"status": 200
},
"count": 0
Examples
1. Get the query plan of the specified SQL
POST /api/db1/tbl1/_query_plan
Response:
"msg": "success",
"code": 0,
"data": {
"partitions": {
"10039": {
"routings": ["192.168.1.1:9060"],
"version": 2,
"versionHash": 982459448378619656,
"schemaHash": 1294206575
},
"opaqued_query_plan": "DAABDAACDwABD...",
"status": 200
},
"count": 0
Request
GET /api/<db>/<table>/_count
Description
Used to obtain statistics about the number of rows in the specified table. This interface is currently used by the Spark-Doris-
Connector for Spark to obtain Doris table statistics.
Path parameters
<db>
Specify database
<table>
Specify table
Query parameters
None
Request body
None
Response
{
"msg": "success",
"code": 0,
"data": {
"size": 1,
"status": 200
},
"count": 0
The data.size field indicates the number of rows in the specified table.
Examples
1. Get the number of rows in the specified table.
GET /api/db1/tbl1/_count
Response:
"msg": "success",
"code": 0,
"data": {
"size": 1,
"status": 200
},
"count": 0
Request
GET /api/<db>/<table>/_schema
Description
Used to obtain the table structure information of the specified table. This interface is currently used by the Spark and Flink
Doris Connectors to obtain Doris table structure information.
Path parameters
<db>
Specify database
<table>
Specify table
Query parameters
None
Request body
None
Response
The http interface returns as follows:
"msg": "success",
"code": 0,
"data": {
"properties": [{
"type": "INT",
"name": "k1",
"comment": "",
"aggregation_type":""
}, {
"type": "INT",
"name": "k2",
"comment": "",
"aggregation_type":"MAX"
}],
"keysType":UNIQUE_KEYS,
"status": 200
},
"count": 0
"msg": "success",
"code": 0,
"data": {
"properties": [{
"type": "INT",
"name": "k1",
"comment": ""
}, {
"type": "INT",
"name": "k2",
"comment": ""
}],
"keysType":UNIQUE_KEYS,
"status": 200
},
"count": 0
Note: the difference is that the http method returns the additional aggregation_type field compared to the http v2 method.
The http v2 interface is enabled by setting enable_http_server_v2 . For detailed parameter descriptions, see the FE parameter settings.
Examples
1. Get the table structure information of the specified table via http interface.
GET /api/db1/tbl1/_schema
Response:
"msg": "success",
"code": 0,
"data": {
"properties": [{
"type": "INT",
"name": "k1",
"comment": "",
"aggregation_type":""
}, {
"type": "INT",
"name": "k2",
"comment": "",
"aggregation_type":"MAX"
}],
"keysType":UNIQUE_KEYS,
"status": 200
},
"count": 0
2. Get the table structure information of the specified table via http v2 interface.
GET /api/db1/tbl1/_schema
Response:
"msg": "success",
"code": 0,
"data": {
"properties": [{
"type": "INT",
"name": "k1",
"comment": ""
}, {
"type": "INT",
"name": "k2",
"comment": ""
}],
"keysType":UNIQUE_KEYS,
"status": 200
},
"count": 0
Upload Action
Upload Action currently mainly serves the front-end page of FE, and is used for users to load small test files.
Request
POST /api/<namespace>/<db>/<tbl>/upload
Path parameters
<namespace>
<db>
Specify database
<tbl>
Specify table
Query parameters
column_separator
preview
Optional. If set to true, up to 10 data rows, split according to column_separator, are displayed in the returned result.
Request body
The content of the file to be uploaded, the Content-type is multipart/form-data
Response
{
"msg": "success",
"code": 0,
"data": {
"id": 1,
"uuid": "b87824a4-f6fd-42c9-b9f1-c6d68c5964c2",
"originFileName": "data.txt",
"fileSize": 102400,
"absPath": "/path/to/file/data.txt"
"maxColNum" : 5
},
"count": 1
Request
PUT /api/<namespace>/<db>/<tbl>/upload
Path parameters
<namespace>
<db>
Specify database
<tbl>
Specify table
Query parameters
file_id
Specify the load file id, which is returned by the API that uploads the file.
file_uuid
Specify the file uuid, which is returned by the API that uploads the file.
Header
The options in the header are the same as those in the header in the Stream Load request.
Request body
None
Response
"msg": "success",
"code": 0,
"data": {
"TxnId": 7009,
"Label": "9dbdfb0a-120b-47a2-b078-4531498727cb",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 3,
"NumberLoadedRows": 3,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 12,
"LoadTimeMs": 71,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 1,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 13,
"CommitAndPublishTimeMs": 53
},
"count": 1
Example
PUT /api/default_cluster/db1/tbl1/upload?file_id=1&file_uuid=b87824a4-f6fd-42c9-b9f1-c6d68c5964c2
Import Action
Request
POST /api/import/file_review
Description
View the contents of the file in CSV or PARQUET format.
Path parameters
None
Query parameters
None
Request body
TO DO
Response
TO DO
Request
GET /api/meta/namespaces/<ns>/databases
GET /api/meta/namespaces/<ns>/databases/<db>/tables
GET /api/meta/namespaces/<ns>/databases/<db>/tables/<tbl>/schema
Description
Used to obtain metadata information about the cluster, including the database list, table list, and table schema.
Path parameters
ns
db
tbl
Query parameters
None
Request body
None
Response
{
"msg":"success",
"code":0,
"count":0
Statistic Action
Request
GET /rest/v2/api/cluster_overview
Description
Used to obtain cluster statistics, such as disk usage and the numbers of FE nodes, BE nodes, databases, and tables.
Path parameters
None
Query parameters
None
Request body
None
Response
{
"msg":"success",
"code":0,
"data":{"diskOccupancy":0,"remainDisk":5701197971457,"feCount":1,"tblCount":27,"beCount":1,"dbCount":2},
"count":0
Request
GET be_host:be_http_port/api/be_version_info
Description
Get be version info from be host.
Path parameters
None.
Query parameters
None.
Request body
None.
Response
```
"msg":"success",
"code":0,
"data":{
"beVersionInfo":{
"dorisBuildVersionPrefix":"doris",
"dorisBuildVersionMajor":0,
"dorisBuildVersionMinor":0,
"dorisBuildVersionPatch":0,
"dorisBuildVersionRcVersion":"trunk",
"dorisBuildVersion":"doris-0.0.0-trunk",
"dorisBuildHash":"git://4b7b503d1cb3/data/doris/doris/be/../@a04f9814fe5a09c0d9e9399fe71cc4d765f8bff1",
"dorisBuildShortHash":"a04f981",
"dorisBuildInfo":"root@4b7b503d1cb3"
},
"count":0
```
Examples
```
GET be_host:be_http_port/api/be_version_info
Response:
"msg":"success",
"code":0,
"data":{
"beVersionInfo":{
"dorisBuildVersionPrefix":"doris",
"dorisBuildVersionMajor":0,
"dorisBuildVersionMinor":0,
"dorisBuildVersionPatch":0,
"dorisBuildVersionRcVersion":"trunk",
"dorisBuildVersion":"doris-0.0.0-trunk",
"dorisBuildHash":"git://4b7b503d1cb3/data/doris/doris/be/../@a04f9814fe5a09c0d9e9399fe71cc4d765f8bff1",
"dorisBuildShortHash":"a04f981",
"dorisBuildInfo":"root@4b7b503d1cb3"
},
"count":0
```
RESTORE TABLET
description
To restore the tablet data from trash dir on BE
METHOD: POST
URI: http://be_host:be_http_port/api/restore_tablet?tablet_id=xxx&schema_hash=xxx
example
curl -X POST "http://hostname:8088/api/restore_tablet?tablet_id=123456\&schema_hash=1111111"
keyword
RESTORE, TABLET
PAD ROWSET
description
Pad one empty rowset as one substitute for error replica.
METHOD: POST
URI: http://be_host:be_http_port/api/pad_rowset?tablet_id=xxx&start_version=xxx&end_version=xxx
example
curl -X POST "http://hostname:8088/api/pad_rowset?tablet_id=123456\&start_version=1111111\$end_version=1111112"
keyword
ROWSET, TABLET
status: "Success",
or
status: "Fail",
status: "Success",
dest_disk: "xxxxxx"
or
status: "Success",
dest_disk: "xxxxxx"
or
status: "Success",
dest_disk: "xxxxxx"
The return is the distribution of the number of tablets of each partition across the different disks on the BE node; it only
includes tablet counts.
msg: "OK",
code: 0,
data: {
host: "***",
tablets_distribution: [
partition_id:***,
disks:[
disk_path:"***",
tablets_num:***,
},
disk_path:"***",
tablets_num:***,
},
...
},
partition_id:***,
disks:[
disk_path:"***",
tablets_num:***,
},
disk_path:"***",
tablets_num:***,
},
...
},
...
},
count: ***
The return is the distribution of tablets of the specified partition across the different disks on the BE node; it includes the
tablet count, tablet id, schema hash, and tablet size.
msg: "OK",
code: 0,
data: {
host: "***",
tablets_distribution: [
partition_id:***,
disks:[
disk_path:"***",
tablets_num:***,
tablets:[
tablet_id:***,
schema_hash:***,
tablet_size:***
},
...
},
...
},
count: ***
Compaction Action
This API is used to view the overall compaction status of a BE node or the compaction status of a specified tablet. It can also
be used to manually trigger Compaction.
Return JSON:
"CumulativeCompaction": {
"/home/disk2" : [10003]
},
"BaseCompaction": {
"/home/disk2" : [10003]
This structure represents the id of the tablet that is performing the compaction task in a certain data directory, and the type
of compaction.
"status": "Fail",
"rowsets": [
],
"missing_rowsets": [],
},
Explanation of results:
cumulative policy type: The cumulative compaction policy type which is used by current tablet.
cumulative point: The version boundary between base and cumulative compaction. Versions before (excluding) points
are handled by base compaction. Versions after (inclusive) are handled by cumulative compaction.
last cumulative failure time: The time when the last cumulative compaction failed. After 10 minutes by default, cumulative
compaction is attempted on this tablet again.
last base failure time: The time when the last base compaction failed. After 10 minutes by default, base compaction is
attempted on this tablet again.
Examples
curl -X GET http://192.168.10.24:8040/api/compaction/show?tablet_id=10015
Only one manual compaction task can be executed at a time, and the value range of compact_type is base or
cumulative.
"status": "Fail",
If the compaction execution task fails to be triggered, an error in JSON format is returned:
{
"status": "Fail",
If the compaction task is successfully triggered, the following JSON is returned:
"status": "Success",
"msg": "compaction task is successfully triggered."
Explanation of results:
status: The trigger status of the task. When it is successfully triggered, it is Success; when it fails for some reason (for
example, a suitable version was not obtained), it returns Fail.
Examples
curl -X POST http://192.168.10.24:8040/api/compaction/run?tablet_id=10015\&compact_type=cumulative
"status": "Fail",
If the tablet exists and the tablet is not running, JSON format is returned:
"status" : "Success",
"run_status" : false,
"tablet_id" : 11308,
"compact_type" : ""
If the tablet exists and the tablet is running, JSON format is returned:
"status" : "Success",
"run_status" : true,
"msg" : "this tablet_id is running",
"tablet_id" : 11308,
"compact_type" : "cumulative"
Explanation of results:
run_status: Get the current manual compaction task execution status.
Examples
curl -X GET http://192.168.10.24:8040/api/compaction/run_status?tablet_id=10015
The return is the tablet id and schema hash of a certain number of tablets on the BE node, rendered as a Web page. The
number of returned tablets is determined by the limit parameter. If the limit parameter is absent, no tablets are returned; if
its value is "all", all tablets on the BE node are returned; if its value is non-numeric and not "all", no tablets are returned.
The return is the tablet id and schema hash of a certain number of tablets on the BE node, organized as a JSON object. The
number of returned tablets is determined by the limit parameter. If the limit parameter is absent, no tablets are returned; if
its value is "all", all tablets on the BE node are returned; if its value is non-numeric and not "all", no tablets are returned.
msg: "OK",
code: 0,
data: {
host: "10.38.157.107",
tablets: [
tablet_id: 11119,
schema_hash: 714349777
},
...
tablet_id: 11063,
schema_hash: 714349777
},
count: 30
description
Description: Check whether the brpc connection cache to the specified host is available; the maximum payload is 10 MB.
METHOD: GET
URI: http://be_host:be_http_port/api/check_rpc_channel/{host_to_check}/{remote_brpc_port}/{payload_size}
METHOD: GET
URI: http://be_host:be_http_port/api/reset_rpc_channel/{endpoints}
example
curl -X GET "http://host:port/api/check_rpc_channel/host2/8060/1024000"
When the repair parameter is set to true, tablets with lost segments are set to SHUTDOWN status and treated as bad
replicas, which can then be detected and repaired by the FE. Otherwise, all tablets with missing segments are only returned
and nothing is done.
The return is all tablets on the current BE node that have lost segment:
status: "Success",
num: 3,
bad_tablets: [
11190,
11210,
11216
],
set_bad: true,
host: "172.3.0.101"
Install Error
This document is mainly used to record the common problems of operation and maintenance during the use of Doris. It will
be updated from time to time.
The name of the BE binary that appears in this doc is doris_be , which was palo_be in previous versions.
Q1. Why is there always some tablet left when I log off the BE node through DECOMMISSION?
During the offline process, use show backends to view the tabletNum of the offline node, and you will observe that the
number of tabletNum is decreasing, indicating that data shards are being migrated from this node. When the number is
reduced to 0, the system will automatically delete the node. But in some cases, tabletNum will not change after it drops to a
certain value. This is usually due to one of two reasons:
1. The tablets belong to the table, partition, or materialized view that was just dropped. Objects that have just been deleted
remain in the recycle bin. The offline logic will not process these shards. The time an object resides in the recycle bin can
be modified by modifying the FE configuration parameter catalog_trash_expire_second. These tablets are disposed of
when the object is removed from the recycle bin.
2. There is a problem with the migration task for these tablets. At this point, you need to view the errors of specific tasks
through show proc "/cluster_balance" .
For the above situation, you can first check whether there are unhealthy shards in the cluster through show proc
"/cluster_health/tablet_health"; . If it is 0, you can delete the BE directly through the drop backend statement. Otherwise,
you also need to check the replicas of unhealthy shards in detail.
The value of priority_networks is expressed in CIDR format. It is divided into two parts: the first part is an IP address in dotted
decimal, and the second part is a prefix length. For example, 10.168.1.0/8 will match all 10.xx.xx.xx IP addresses, and 10.168.1.0/16
will match all 10.168.xx.xx IP addresses.
The reason why the CIDR format is used instead of specifying a specific IP directly is to ensure that all nodes can use a
uniform configuration value. For example, there are two nodes: 10.168.10.1 and 10.168.10.2, then we can use 10.168.10.0/24 as
the value of priority_networks.
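Following that example, a minimal sketch of setting the value in the FE and BE configuration files (the conf file paths are the usual deployment locations and are assumptions here):
```
# Both nodes 10.168.10.1 and 10.168.10.2 can share the same CIDR value.
echo "priority_networks = 10.168.10.0/24" >> fe/conf/fe.conf
echo "priority_networks = 10.168.10.0/24" >> be/conf/be.conf
```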
All FE nodes with the Follower role form an electable group, similar to the group concept in the Paxos consensus protocol.
One Follower in the group is elected as the Master. When the Master goes down, a new Follower is automatically elected as
the Master. Observers do not participate in the election, so an Observer will never become Master.
A metadata log write is considered successful only after it has been written to a majority of the Follower nodes. For example,
with 3 Followers, a write must succeed on at least 2 of them. This is why the number of Follower roles needs to be odd.
The Observer role works as its name suggests: it only acts as an observer that synchronizes metadata logs that have already
been successfully written and provides metadata read services. It is not involved in the majority-write logic.
Typically, 1 Follower + 2 Observers or 3 Followers + N Observers are deployed. The former is simple to operate and maintain,
and complex error situations caused by the consistency protocol between Followers almost never occur (most companies
use this method). The latter ensures high availability of metadata writes. For high-concurrency query scenarios, more
Observers can be added as appropriate.
Q4. A new disk is added to the node, why is the data not balanced to the new disk?
The current Doris balancing strategy is based on nodes. That is to say, the cluster load is judged according to the overall load
index of the node (number of shards and total disk utilization), and data shards are migrated from high-load nodes to low-load
nodes. If each node adds a disk, from the overall point of view of the node, the load does not change, so the balancing logic
cannot be triggered.
In addition, Doris currently does not support balancing operations between disks within a single node. Therefore, after adding
a new disk, the data will not be balanced to the new disk.
However, when data is migrated between nodes, Doris takes the disk into account. For example, when a shard is migrated
from node A to node B, the disk with low disk space utilization in node B will be preferentially selected.
1. Create a new table through the create table like statement, and then use the insert into select method to synchronize
data from the old table to the new table. Because when a new table is created, the data shards of the new table will be
distributed in the new disk, so the data will also be written to the new disk. This method is suitable for situations where
the amount of data is small (within tens of GB).
2. The decommission command is used to safely decommission a BE node. This command will first migrate the data
shards on the node to other nodes, and then delete the node. As mentioned earlier, during data migration, the disk with
low disk utilization will be prioritized, so this method can "force" the data to be migrated to the disks of other nodes.
When the data migration is completed, we cancel the decommission operation, so that the data will be rebalanced back
to this node. When we perform the above steps on all BE nodes, the data will be evenly distributed on all disks of all
nodes.
Note that before executing the decommission command, execute the following command to avoid the node being
deleted after being offline.
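A hedged sketch of this workflow, assuming the FE configuration item drop_backend_after_decommission is available in your version; the BE address (host:heartbeat_service_port) is a placeholder:

```sql
-- Keep the BE registered in the cluster even after the decommission finishes.
ADMIN SET FRONTEND CONFIG ("drop_backend_after_decommission" = "false");

-- Start migrating the shards off this node.
ALTER SYSTEM DECOMMISSION BACKEND "10.168.10.1:9050";

-- Once the shards have been moved away, cancel the decommission so data balances back onto the node.
CANCEL DECOMMISSION BACKEND "10.168.10.1:9050";
```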
Doris also provides an HTTP API that can manually migrate the data shards on one disk to another disk.
1. FE
FE logs mainly include fe.log and fe.out. Usually we mainly check fe.log; in special cases some logs may be output to fe.out.
2. BE
be.INFO: the main log. This is actually a symbolic link to the latest be.INFO.xxxx file.
be.WARNING: a subset of the main log; only WARN and FATAL level logs are recorded. This is also a symbolic link to the latest be.WARN.xxxx file.
be.out: log for standard and error output (stdout and stderr).
A typical BE log line looks like this:
I0916 23:21:22.038795 28087 task_worker_pool.cpp:1594] finish report TASK. master host: 10.10.10.10, port: 9222
I0916 23:21:22.038795: log level and datetime. The capital letter I means INFO, W means WARN, and F means FATAL.
28087: thread id. Through the thread id you can follow the context of this thread and check what happened in it.
task_worker_pool.cpp:1594: source file and line number.
Usually we mainly look at the be.INFO log. In special cases, such as BE downtime, you need to check be.out.
1. BE
The BE process is a C/C++ program and may crash due to program bugs (memory out of bounds, illegal address access, etc.) or Out Of Memory (OOM). You can investigate the cause with the following steps:
i. Check be.out
When the BE process exits due to an exception, it prints the current error stack to be.out (note: be.out, not be.INFO or be.WARNING). The error stack usually gives a rough idea of where the program went wrong.
Note that an error stack in be.out usually indicates a program bug that ordinary users may not be able to solve by themselves. You are welcome to ask for help in the WeChat group, GitHub discussions, or the dev mailing list, and post the corresponding error stack so that the problem can be located quickly.
ii. dmesg
If there is no stack in be.out, the process was most likely killed by the system because of OOM. In this case you can use the dmesg -T command to view the Linux system log. A log like Memory cgroup out of memory: Kill process 7187 (doris_be) score 1007 or sacrifice child near the end indicates that the crash was caused by OOM.
Memory problems can have many causes, such as large queries, imports, or compactions. Doris is also continuously optimizing memory usage. You are welcome to ask for help in the WeChat group, GitHub discussions, or the dev mailing list.
Logs starting with F are Fatal logs; for example, F0916 indicates a Fatal log on September 16th. Fatal logs usually indicate a program assertion failure, which causes the process to exit directly (and indicates a bug in the program). You are welcome to ask for help in the WeChat group, GitHub discussions, or the dev mailing list.
2. FE
FE is a Java process and is generally more robust than a C/C++ program. FE usually goes down because of OOM (Out-of-Memory) or metadata write failures. These errors usually leave an error stack in fe.log or fe.out, which can be used for further investigation.
Q7. About the configuration of data directory SSD and HDD, create table encounter error Failed to
find enough host with storage medium and tag
Doris supports one BE node to configure multiple storage paths. Usually, one storage path can be configured for each disk.
At the same time, Doris supports storage media properties that specify paths, such as SSD or HDD. SSD stands for high-
speed storage device and HDD stands for low-speed storage device.
If the Doris cluster has only one type of storage medium, the best practice is to not specify a storage medium in the storage path configuration in be.conf.
The error Failed to find enough host with storage medium and tag usually occurs because only the SSD medium is configured in be.conf while the FE parameter default_storage_medium defaults to HDD, so there is no HDD storage medium in the cluster. There are several ways to fix this: modify default_storage_medium in fe.conf and restart FE; remove the SSD suffix from the path configuration in be.conf; or add the property "storage_medium" = "ssd" when creating the table.
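A small sketch of the third fix, requesting SSD storage at table-creation time (table name, columns, and bucket count are placeholders):

```sql
CREATE TABLE example_db.ssd_table (
    k1 INT,
    v1 VARCHAR(32)
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    -- Matches the SSD-tagged storage paths so replica placement can succeed.
    "storage_medium" = "SSD"
);
```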
By specifying the storage medium properties of the path, we can take advantage of Doris's hot and cold data partition
storage function to store hot data in SSD at the partition level, while cold data is automatically transferred to HDD.
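A hedged sketch of per-partition hot/cold placement (table, column names, and dates are hypothetical):

```sql
-- The newer partition starts on SSD and is moved to HDD after the cooldown time;
-- the older partition is placed on HDD directly.
CREATE TABLE example_db.events_tiered (
    event_date DATE,
    user_id    BIGINT,
    cnt        BIGINT SUM
)
AGGREGATE KEY(event_date, user_id)
PARTITION BY RANGE(event_date) (
    PARTITION p202301 VALUES LESS THAN ("2023-02-01") ("storage_medium" = "HDD"),
    PARTITION p202302 VALUES LESS THAN ("2023-03-01")
        ("storage_medium" = "SSD", "storage_cooldown_time" = "2023-06-01 00:00:00")
)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES ("replication_num" = "3");
```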
It should be noted that Doris does not automatically perceive the actual storage medium type of the disk where the storage
path is located. This type needs to be explicitly indicated by the user in the path configuration. For example, the path
"/path/to/data1.SSD" means that this path is an SSD storage medium. And "data1.SSD" is the actual directory name. Doris
determines the storage media type based on the ".SSD" suffix after the directory name, not the actual storage media type.
That is to say, the user can specify any path as the SSD storage medium, and Doris only recognizes the directory suffix and
does not judge whether the storage medium matches. If no suffix is written, it will default to HDD.
In other words, ".HDD" and ".SSD" are only used to identify the "relative" "low speed" and "high speed" of the storage
directory, not the actual storage medium type. Therefore, if the storage path on the BE node has no medium difference, the
suffix does not need to be filled in.
Q8. Multiple FEs cannot log in when using Nginx to implement web UI load balancing
Doris can deploy multiple FEs. When the Web UI is accessed through Nginx for load balancing, the session problem causes a constant prompt to log in again. This is a session-sharing problem, and Nginx provides a centralized session-sharing solution. Here we use Nginx's ip_hash technique: ip_hash directs requests from the same IP to the same backend, so a client and that backend can establish a stable session. ip_hash is defined in the upstream configuration:
upstream doris.com {
    ip_hash;
}
A complete configuration example is as follows:
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

http {
    log_format main '"$http_user_agent" "$http_x_forwarded_for"';

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # See http://nginx.org/en/docs/ngx_core_module.html#include
    include /etc/nginx/conf.d/*.conf;
    #include /etc/nginx/custom/*.conf;

    upstream doris.com {
        server 172.22.197.238:8030 weight=3;
        ip_hash;
    }

    server {
        listen      80;
        server_name gaia-pro-bigdata-fe02;

        if ($request_uri ~ _load) {
            # rules for requests whose URI matches _load (omitted)
        }

        location / {
            proxy_pass http://doris.com;
            proxy_redirect default;
        }

        location = /50x.html {
            root html;
        }
    }
}
Q9. FE fails to start, "wait catalog to be ready. FE type UNKNOWN" keeps scrolling in fe.log
There are usually two reasons for this problem:
1. The local IP obtained when FE is started this time is inconsistent with the last startup, usually because priority_network
is not set correctly, which causes FE to match the wrong IP address when it starts. Restart FE after modifying
priority_network .
2. A majority of the Follower FE nodes in the cluster are not started. For example, with 3 Followers, only one is started. In this case at least one more FE needs to be started so that the FE electable group can elect a Master and provide services.
If neither of the above resolves the problem, you can recover by following the [metadata operation and maintenance document](../admin-manual/maint-monitor/metadata-operation.md) on the Doris website.
Q10. Lost connection to MySQL server at 'reading initial communication packet', system error: 0
If this problem occurs when connecting to Doris with the MySQL client, it is usually caused by a mismatch between the JDK version used to compile FE and the JDK version used to run FE. Note that when compiling with the Docker image, the default JDK version is OpenJDK 11; you can switch to OpenJDK 8 via a command (see the compilation documentation for details).
Q11. recoveryTracker should overlap or follow on disk last VLSN of 4,422,880 recoveryFirst=
4,422,882 UNEXPECTED_STATE_FATAL
Sometimes this error occurs when FE is restarted (usually only with multiple Followers), and the two values in the error differ by 2, causing FE to fail to start.
This is a bug in bdbje that has not yet been resolved. In this case, you can only restore the metadata by performing the
operation of failure recovery in Metadata Operation and Maintenance Documentation.
Q13. Error starting FE or unit test locally Cannot find external parser table action_table.dat
Run the following command
cp fe-core/target/generated-sources/cup/org/apache/doris/analysis/action_table.dat fe-core/target/classes/org/apache/doris/analysis
Q14. After upgrading Doris to version 1.0 or later, accessing MySQL external tables via ODBC reports the error Failed to set ciphers to use (2026)
This problem occurs after Doris is upgraded to version 1.0 or later and Connector/ODBC 8.0.x or higher is used. Connector/ODBC 8.0.x has multiple distributions: for example, /usr/lib64/libmyodbc8w.so installed via yum depends on libssl.so.10 and libcrypto.so.10.
From Doris 1.0 onwards, OpenSSL has been upgraded to 1.1 and is built into the Doris binary package, which can lead to OpenSSL conflicts and errors like the following:
ERROR 1105 (HY000): errCode = 2, detailMessage = driver connect Error: HY000 [MySQL][ODBC 8.0(w) Driver]SSL connection error: Failed to set ciphers to use (2026)
The solution is to use the Connector/ODBC 8.0.28 version of the ODBC connector and select the Linux - Generic build; this build links against OpenSSL 1.1. Alternatively, use a lower version of the ODBC connector, e.g. Connector/ODBC 5.3.14. For details, see the ODBC external table documentation.
You can verify which OpenSSL version the MySQL ODBC driver uses by checking the shared libraries it links against. If the output contains libssl.so.10, there may be problems; if it contains libssl.so.1.1, the driver is compatible with Doris 1.0 and later.
Q15. After upgrading to version 1.2, BE fails to start with a NoClassDefFoundError
Since Version 1.2, BE depends on Java UDF components; a Java NoClassDefFoundError when starting BE after the upgrade usually indicates that these Java UDF dependencies were not upgraded or deployed along with BE.
Q16. After upgrading to version 1.2, BE startup shows Failed to initialize JNI
Since Version 1.2, if the following error occurs when starting BE after upgrading:
Failed to initialize JNI: Failed to find the library libjvm.so.
you need to set the JAVA_HOME environment variable, or add export JAVA_HOME=your_java_home_path in the first line of the start_be.sh startup script, and then restart the BE node.
Q1. Use Stream Load to access FE's public network address to import data, but is redirected to the
intranet IP?
When the connection target of stream load is the http port of FE, FE will only randomly select a BE node to perform the http
307 redirect operation, so the user's request is actually sent to a BE assigned by FE. The redirect returns the IP of the BE, that
is, the intranet IP. So if you send the request through the public IP of FE, it is very likely that you cannot connect because it is
redirected to the internal network address.
The usual solution is to ensure that the intranet IP addresses are reachable, or to set up a load balancer in front of all BE nodes and send the Stream Load request directly to the load balancer, which then transparently forwards the request to a BE node.
Doris supports modifying database name, table name, partition name, materialized view (Rollup) name, as well as column
type, comment, default value, etc. But unfortunately, modifying column names is currently not supported.
For historical reasons, column names are currently written directly into the data files, and Doris also looks up the corresponding column by its name when querying. Therefore, renaming a column is not just a simple metadata change; it also involves rewriting data, which is a very heavy operation.
We do not rule out some compatible means to support lightweight column name modification operations in the future.
Q3. Does the table of the Unique Key model support creating a materialized view?
No, it is not supported.
The table of the Unique Key model is a business-friendly table. Because of its unique function of deduplication according to
the primary key, it can easily synchronize business databases with frequently changed data. Therefore, many users will first
consider using the Unique Key model when accessing data into Doris.
Unfortunately, tables of the Unique Key model cannot have materialized views. The reason is that the essence of a materialized view is to pre-compute data, so that the pre-computed results can be returned directly at query time to speed up queries. The pre-computed data is usually aggregated metrics such as sum and count. When the data changes, for example through an update or delete, the pre-computed data has lost the detailed information and cannot be updated accordingly. For example, a sum value of 5 may come from 1+4 or from 2+3; because the detail is lost, we cannot tell how the sum was computed, so the update requirement cannot be met.
The -238 error usually occurs when the same batch of imported data is too large, resulting in too many Segment files for a
tablet (default is 200, controlled by the BE parameter max_segment_num_per_rowset ). At this time, it is recommended to
reduce the amount of data imported in one batch, or appropriately increase the BE configuration parameter value to solve
the problem.
This error can occur during a query or an import. It usually means that a replica of the corresponding tablet is abnormal.
First check whether a BE node is down with the show backends command: for example, the isAlive field is false, or LastStartTime is very recent (indicating a recent restart). If a BE is down, go to that BE node and check the be.out log. If the BE went down abnormally, the exception stack is usually printed in be.out to help troubleshoot the problem. If there is no error stack in be.out, use the Linux command dmesg -T to check whether the process was killed by the system because of OOM.
If no BE node is down, use the show tablet 110309738 statement and then execute the show proc statement given in its result to check the status of each replica of the tablet for further investigation.
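A short sketch of these checks; 110309738 is the tablet id from the example error, and the exact SHOW PROC command for its replicas is taken from the DetailCmd column of the SHOW TABLET output:

```sql
-- Check whether any BE is down (see the isAlive and LastStartTime columns).
SHOW BACKENDS;

-- Locate the tablet; the result contains a ready-made SHOW PROC command for listing its replicas.
SHOW TABLET 110309738;
```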
At this point, you need to go to the corresponding BE node to check the usage in the data directory. The trash directory and
snapshot directory can be manually cleaned to free up space. If the data directory occupies a large space, you need to
consider deleting some data to free up space. For details, please refer to Disk Space Management.
Q7. Calling stream load to import data through a Java program may result in a Broken Pipe error
when a batch of data is large.
Apart from Broken Pipe, some other weird errors may occur.
This usually happens after httpv2 is enabled. httpv2 is an HTTP service implemented with Spring Boot and uses Tomcat as the default embedded container. However, Tomcat has some problems handling 307 redirects, so the embedded container was later changed to Jetty. In addition, the Apache HttpClient version in the Java program needs to be 4.5.13 or later; earlier versions also have problems handling redirects.
1. Disable httpv2
Restart FE after adding enable_http_server_v2=false in fe.conf. However, the new version of the web UI and some new httpv2-based interfaces can no longer be used. (Normal imports and queries are not affected.)
2. Upgrade
A -214 error means that a data version of the corresponding tablet is missing. For example, the above error indicates that the replica of tablet 63416 on the BE at 192.168.100.10 is missing a data version. (There may be other similar error codes, which can be checked and repaired in the same way.)
Typically, if your data has multiple replicas, the system will automatically repair these problematic replicas. You can troubleshoot with the following steps:
Normally, the Version of the replicas of a tablet should be identical and equal to the VisibleVersion of the corresponding partition.
You can view the partition version with show partitions from tblx (the partition corresponding to the tablet can be obtained from the show tablet statement).
At the same time, you can open the URL in the CompactionStatus column of the show proc result in a browser to see more detailed version information and check which versions are missing.
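A small sketch of these version checks; 63416 is the tablet id from the example error and tblx is a placeholder table name:

```sql
-- Find the database, table, and partition this tablet belongs to.
SHOW TABLET 63416;

-- Compare each replica's Version against the partition's VisibleVersion.
SHOW PARTITIONS FROM tblx;
```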
If the replicas are not repaired automatically for a long time, use the show proc "/cluster_balance" statement to view the tablet repair and scheduling tasks currently being executed by the system. A large number of tablets waiting to be scheduled can lead to a long repair time. You can follow the records in pending_tablets and running_tablets.
Furthermore, you can use the admin repair statement to give a table or partition repair priority. For details, see help admin repair.
If the replica still cannot be repaired and there are multiple replicas, you can use the admin set replica status command to force the problematic replica offline. For details, see the example of setting the replica status to bad in help admin set replica status. (After being set to bad, the replica will no longer be accessed and will be repaired automatically later. Before doing this, make sure the other replicas are healthy.)
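A hedged sketch of the two repair commands; the table name, tablet id, and backend id are placeholders taken from the SHOW TABLET and SHOW BACKENDS output:

```sql
-- Ask the scheduler to repair this table with higher priority.
ADMIN REPAIR TABLE example_db.tblx;

-- Force a problematic replica offline so it is rebuilt from the healthy replicas.
ADMIN SET REPLICA STATUS PROPERTIES (
    "tablet_id"  = "63416",
    "backend_id" = "10003",
    "status"     = "bad"
);
```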
We may encounter this error when importing or querying. If you go to the corresponding BE log, you may also find similar
errors.
This is an RPC error, and there are usually two possibilities: 1. The corresponding BE node is down. 2. rpc congestion or other
errors.
If the BE node is down, you need to check the specific downtime reason. Only the problem of rpc congestion is discussed
here.
One case is OVERCROWDED, which means that the rpc source has a large amount of unsent data that exceeds the
threshold. BE has two parameters associated with it:
1. brpc_socket_max_unwritten_bytes : The default value is 1GB. If the unsent data exceeds this value, an error will be
reported. This value can be modified appropriately to avoid OVERCROWDED errors. (But this cures the symptoms but
not the root cause, and there is still congestion in essence).
2. tablet_writer_ignore_eovercrowded : Default is false. If set to true, Doris will ignore OVERCROWDED errors during
import. This parameter is mainly to avoid import failure and improve the stability of import.
The second case is that the RPC packet size exceeds max_body_size. This can happen if the query contains a very large String or bitmap value. It can be worked around by increasing the BE configuration parameter brpc_max_body_size in be.conf.
The reason for this problem may be that when importing data from external storage (such as HDFS), there are too many files in the directory and listing them takes too long. The Broker RPC timeout defaults to 10 seconds and needs to be increased appropriately:
broker_timeout_ms = 10000
## The default here is 10 seconds; increase this parameter appropriately.
Check the smallest offset in Kafka, use the ALTER ROUTINE LOAD command to modify the offset, and then resume the job (job_name is a placeholder):
ALTER ROUTINE LOAD FOR example_db.job_name
FROM KAFKA
(
    "kafka_partitions" = "0",
    "kafka_offsets" = "xxx",
    "property.group.id" = "xxx"
);
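After the offsets point at data that still exists in Kafka, the paused job can be resumed; job_name is the same placeholder as above:

```sql
RESUME ROUTINE LOAD FOR example_db.job_name;
```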
SQL Error
Q1. Query error: Failed to get scan range, no queryable replica found in tablet: xxxx
This happens because no queryable replica can be found for the corresponding tablet, usually because a BE is down or a replica is missing. You can first run the show tablet tablet_id statement and then execute the show proc statement given in its result to view the replica information of this tablet and check whether the replicas are complete. You can also check the progress of replica scheduling and repair in the cluster through show proc "/cluster_balance".
For commands related to data replica management, please refer to the Data Replica Management documentation.
After executing statements such as show backends/frontends, some columns in the result may be incomplete. For example, disk capacity information may be missing from the show backends result.
This usually happens when the cluster has multiple FEs. If a user connects to a non-Master FE node and executes these statements, the information will be incomplete, because some information exists only on the Master FE node (for example, BE disk usage). Complete information is therefore only available when connecting directly to the Master FE.
Of course, users can also execute set forward_to_master=true; before executing these statements. After this session variable is set to true, information-viewing statements executed afterwards are automatically forwarded to the Master FE to obtain their results. In this way, the complete result can be obtained no matter which FE the user is connected to.
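A short usage sketch of this session variable:

```sql
-- Forward information-viewing statements to the Master FE for this session.
SET forward_to_master = true;

-- The full result (including disk capacity columns) is now returned even on a non-Master FE.
SHOW BACKENDS;
```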
The Master FE node of Doris will actively send heartbeats to each FE or BE node, and will carry a cluster_id in the heartbeat
information. cluster_id is the unique cluster ID generated by the Master FE when a cluster is initialized. When the FE or BE
receives the heartbeat information for the first time, the cluster_id will be saved locally in the form of a file. The file of FE is in
the image/ directory of the metadata directory, and the BE has a cluster_id file in all data directories. After that, each time the
node receives the heartbeat, it will compare the content of the local cluster_id with the content in the heartbeat. If it is
inconsistent, it will refuse to respond to the heartbeat.
This mechanism is a node authentication mechanism to prevent receiving false heartbeat messages sent by nodes outside
the cluster.
If you need to recover from this error, first make sure that all nodes belong to the correct cluster. Then, for an FE node, you can try to modify the cluster_id value in the image/VERSION file in the metadata directory and restart the FE; for a BE node, delete the cluster_id files in all data directories and restart the BE.
For example, the table is defined as k1, v1, and a batch of imported data is as follows:
1, "abc"
1, "def"
Then the result on replica 1 may be 1, "abc" while the result on replica 2 is 1, "def", so the query results are inconsistent.
To ensure a consistent replacement order between different replicas, you can use the Sequence Column feature.
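A hedged sketch of a Unique Key table that uses a sequence column (table and column names are hypothetical; see the Sequence Column documentation for the exact property names supported by your version):

```sql
-- Rows with the same key are resolved by the larger source_ts value, identically on every replica.
CREATE TABLE example_db.user_latest (
    user_id   BIGINT,
    v1        VARCHAR(64),
    source_ts BIGINT
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "3",
    "function_column.sequence_col" = "source_ts"
);
```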
This is because for the bitmap/hll types in the vectorized execution engine, when the input is all NULL the output is also NULL instead of 0. To change this behavior, you can:
1. First set return_object_data_as_binary=true;
2. Turn off vectorization: set enable_vectorized_engine=false;
3. Turn off the SQL cache: set [global] enable_sql_cache = false;
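The three session settings above, as a runnable sketch:

```sql
SET return_object_data_as_binary = true;
-- Fall back to the non-vectorized engine for this session.
SET enable_vectorized_engine = false;
-- Disable the SQL cache (use SET GLOBAL to apply it cluster-wide).
SET enable_sql_cache = false;
```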
Q6. Error when accessing object storage: curl 77: Problem with the SSL CA cert
If the curl 77: Problem with the SSL CA cert error appears in the be.INFO log. You can try to solve it in the following ways:
Star-Schema-Benchmark
Star Schema Benchmark (SSB) is a lightweight performance test set for data warehouse scenarios. SSB provides a simplified star-schema dataset based on TPC-H and is mainly used to test multi-table JOIN query performance under a star schema. In addition, the industry usually flattens SSB into a wide table model (referred to as SSB flat) to test query engine performance; see ClickHouse.
This document mainly introduces the performance of Doris on the SSB 100G test set.
Note 1: The standard test set including SSB usually has a large gap with the actual business scenario, and some tests will
perform parameter tuning for the test set. Therefore, the test results of the standard test set can only reflect the
performance of the database in a specific scenario. It is recommended that users use actual business data for further
testing.
Note 2: The operations involved in this document are performed on Ubuntu Server 20.04 and have also been verified on CentOS 7.
With 13 queries on the SSB standard test data set, we conducted a comparison test based on Apache Doris 1.2.0-rc01, Apache
Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions.
On the SSB flat wide table, the overall performance of Apache Doris 1.2.0-rc01 has been improved by nearly 4 times compared
with Apache Doris 1.1.3, and nearly 10 times compared with Apache Doris 0.15.0 RC04.
On the SQL test with standard SSB, the overall performance of Apache Doris 1.2.0-rc01 has been improved by nearly 2 times
compared with Apache Doris 1.1.3, and nearly 31 times compared with Apache Doris 0.15.0 RC04.
1. Hardware Environment
Number of machines: 4 Tencent Cloud hosts (1 FE, 3 BEs)
CPU: AMD EPYC™ Milan (2.55GHz/3.5GHz), 16 cores
Memory: 64G
2. Software Environment
Doris deployed: 3 BEs and 1 FE
Kernel version: Linux version 5.4.0-96-generic (buildd@lgw01-amd64-051)
4. Test Results
We use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for comparative testing. The test results are
as follows:
| Query | Apache Doris 1.2.0-rc01 (ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
| --- | --- | --- | --- |
| Q1.1 | 20 | 90 | 250 |
| Q1.2 | 10 | 10 | 30 |
| Q1.3 | 30 | 70 | 120 |
The data set corresponding to the test results is scale 100, about 600 million rows.
The test environment is a common user configuration: 4 cloud servers, 16-core 64G with SSD, deployed as 1 FE and 3 BEs.
We chose a common user configuration to reduce the cost of selection and evaluation, but the whole test process does not consume that many hardware resources.
| Query | Apache Doris 1.2.0-rc01 (ms) | Apache Doris 1.1.3 (ms) | Doris 0.15.0 RC04 (ms) |
| --- | --- | --- | --- |
| Q1.1 | 40 | 18 | 350 |
| Q1.2 | 30 | 100 | 80 |
| Q1.3 | 20 | 70 | 80 |
| Q2.1 | 350 | 940 | 20,680 |
| Q3.4 | 60 | 70 | 160 |
| Q4.1 | 840 | 1,480 | 24,320 |
The data set corresponding to the test results is scale 100, about 600 million rows.
The test environment is a common user configuration: 4 cloud servers, 16-core 64G with SSD, deployed as 1 FE and 3 BEs.
We chose a common user configuration to reduce the cost of selection and evaluation, but the whole test process does not consume that many hardware resources.
6. Environment Preparation
Please first refer to the [official documentation](../install/install-deploy.md) to install and deploy Apache Doris and obtain a working Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).
The scripts mentioned in the following documents are stored in the Apache Doris codebase: ssb-tools
7. Data Preparation
sh build-ssb-dbgen.sh
After successful installation, the dbgen binary will be generated under the ssb-dbgen/ directory.
Note 2: The data will be generated under the ssb-data/ directory with the suffix .tbl . The total file size is about 60GB
and may need a few minutes to an hour to generate.
Note 3: -s 100 indicates that the test set size factor is 100, -c 100 indicates that 100 concurrent threads generate the
data of the lineorder table. The -c parameter also determines the number of files in the final lineorder table. The larger
the parameter, the larger the number of files and the smaller each file.
Before importing, write the connection information into the doris-cluster.conf file. The content of the file includes the FE's IP, HTTP port, user name, password, and the name of the DB into which the data will be imported:
export FE_HOST="xxx"
export FE_HTTP_PORT="8030"
export FE_QUERY_PORT="9030"
export USER="root"
export PASSWORD='xxx'
export DB="ssb"
7.3.2 Execute the Following Script to Generate and Create the SSB Table:
sh create-ssb-tables.sh
Or copy the table creation statements in create-ssb-tables.sql and create-ssb-flat-table.sql and then execute them in the
MySQL client.
The following is the lineorder_flat table creation statement. The lineorder_flat table is created in the create-ssb-flat-table.sh script above with the default number of buckets (48). You can delete this table and adjust the number of buckets according to your cluster's node configuration to obtain better test results.
) ENGINE=OLAP
COMMENT "OLAP"
PARTITION BY RANGE(`LO_ORDERDATE`)
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupxx1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
sh bin/load-ssb-data.sh -c 10
-c 10 means starting 10 concurrent threads for the import (5 by default). In the case of a single BE node, the script imports the lineorder data generated by sh gen-ssb-data.sh -s 100 -c 100 and also generates the ssb-flat table data at the end. Enabling more threads can speed up the import, but it costs extra memory.
Notes.
1. To get a faster import speed, you can add flush_thread_num_per_store=5 in be.conf and then restart BE. This configuration is the number of disk-writing threads per data directory, 2 by default. A larger value can improve write throughput, but may increase IO util. (Reference values: on one mechanical disk, with the default of 2 the IO util during import is about 12%; when set to 5 it is about 26%; on an SSD it is almost 0%.)
2. The flat table data is imported by 'INSERT INTO ... SELECT ... '.
The imported row count should be consistent with the number of rows of the generated data.
FROM lineorder_flat
WHERE LO_ORDERDATE >= 19930101 AND LO_ORDERDATE <= 19931231 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY <
25;
--Q1.2
FROM lineorder_flat
WHERE LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19940131 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY
BETWEEN 26 AND 35;
--Q1.3
FROM lineorder_flat
WHERE weekofyear(LO_ORDERDATE) = 6 AND LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19941231 AND LO_DISCOUNT
BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;
--Q2.1
--Q2.2
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
--Q2.3
FROM lineorder_flat
--Q3.1
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231
--Q3.2
FROM lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <=
19971231
--Q3.3
FROM lineorder_flat
WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >=
19920101 AND LO_ORDERDATE <= 19971231
--Q3.4
FROM lineorder_flat
WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >=
19971201 AND LO_ORDERDATE <= 19971231
--Q4.1
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR IN ('MFGR#1', 'MFGR#2')
--Q4.2
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND
P_MFGR IN ('MFGR#1', 'MFGR#2')
--Q4.3
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_CITY, P_BRAND, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_CATEGORY =
'MFGR#14'
--Q1.1
WHERE
lo_orderdate = d_datekey
--Q1.2
WHERE
lo_orderdate = d_datekey
--Q1.3
SELECT
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear = 6
--Q2.1
WHERE
lo_orderdate = d_datekey
--Q2.2
WHERE
lo_orderdate = d_datekey
--Q2.3
WHERE
lo_orderdate = d_datekey
--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue) AS REVENUE
WHERE
lo_custkey = c_custkey
--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
WHERE
lo_custkey = c_custkey
--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
WHERE
lo_custkey = c_custkey
AND (
AND (
--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue) AS REVENUE
WHERE
lo_custkey = c_custkey
AND (
AND (
--Q4.1
d_year,
c_nation,
WHERE
lo_custkey = c_custkey
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
--Q4.2
d_year,
s_nation,
p_category,
WHERE
lo_custkey = c_custkey
AND (
d_year = 1997
OR d_year = 1998
)
AND (
p_mfgr = 'MFGR#1'
OR p_mfgr = 'MFGR#2'
--Q4.3
d_year,
s_city,
p_brand,
WHERE
lo_custkey = c_custkey
AND (
d_year = 1997
OR d_year = 1998
)
TPC-H Benchmark
TPC-H is a decision support benchmark consisting of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queried data and the data populating the database have broad industry-wide relevance. The benchmark models a decision support system that examines large volumes of data, executes highly complex queries, and answers critical business questions. The performance metric reported by TPC-H is the TPC-H composite query-per-hour metric (QphH@Size), which reflects multiple aspects of the system's ability to process queries: the database size chosen for query execution, the query processing power when queries are submitted in a single stream, and the query throughput when queries are submitted by many concurrent users.
This document mainly introduces the performance of Doris on the TPC-H 100G test set.
Note 1: The standard test set including TPC-H is usually far from the actual business scenario, and some tests will perform
parameter tuning for the test set. Therefore, the test results of the standard test set can only reflect the performance of
the database in a specific scenario. We suggest users use actual business data for further testing.
Note 2: The operations involved in this document are all tested on CentOS 7.x.
On 22 queries on the TPC-H standard test data set, we conducted a comparison test based on Apache Doris 1.2.0-rc01,
Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 versions. Compared with Apache Doris 1.1.3, the overall performance of
Apache Doris 1.2.0-rc01 has been improved by nearly 3 times, and by nearly 11 times compared with Apache Doris 0.15.0
RC04.
1. Hardware Environment
Memory: 64G
Network: 5Gbps
Disk: ESSD cloud disk
2. Software Environment
Doris deployed: 3 BEs and 1 FE
4. Test SQL
TPCH 22 test query statements : TPCH-Query-SQL
Notice:
The following four parameters in the above SQL do not exist in Apache Doris 0.15.0 RC04. When executing, please remove:
1. enable_vectorized_engine=true,
2. batch_size=4096,
3. disable_join_reorder=false
4. enable_projection=true
5. Test Results
Here we use Apache Doris 1.2.0-rc01, Apache Doris 1.1.3 and Apache Doris 0.15.0 RC04 for comparative testing. In the test, we
use Query Time(ms) as the main performance indicator. The test results are as follows:
Query Apache Doris 1.2.0-rc01 (ms) Apache Doris 1.1.3 (ms) Apache Doris 0.15.0 RC04 (ms)
Result Description
The data set corresponding to the test results is scale 100, about 600 million rows.
The test environment is a common user configuration: 4 cloud servers, 16-core 64G with SSD, deployed as 1 FE and 3 BEs.
We chose a common user configuration to reduce the cost of selection and evaluation, but the whole test process does not consume that many hardware resources.
Apache Doris 0.15 RC04 failed to execute Q14 in the TPC-H test, unable to complete the query.
6. Environmental Preparation
Please refer to the official documentation to install and deploy Doris and obtain a normally running Doris cluster (at least 1 FE and 1 BE; 1 FE and 3 BEs are recommended).
7. Data Preparation
Execute the following script to download and compile the tpch-tools tool.
sh build-tpch-dbgen.sh
After successful installation, the dbgen binary will be generated under the TPC-H_Tools_v3.0.0/ directory.
sh gen-tpch-data.sh
Note 2: The data will be generated under the tpch-data/ directory with the suffix .tbl . The total file size is about 100GB
and may need a few minutes to an hour to generate.
Before running the import script, you need to write the FE's IP, port, and other information into the doris-cluster.conf file.
The content of the file includes FE's ip, HTTP port, user name, password and the DB name of the data to be imported:
# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
export DB='tpch1'
sh ./load-tpch-data.sh
./run-tpch-queries.sh
Notice:
1. At present, the query optimizer and statistics of Doris are not yet mature, so we rewrote some of the TPC-H queries to fit Doris's execution framework; this does not affect the correctness of the results.
--Q1
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
--Q2
s_acctbal,
s_name,
n_name,
p_partkey,
p_mfgr,
s_address,
s_phone,
s_comment
from
partsupp join
select
ps_partkey as a_partkey,
min(ps_supplycost) as a_min
from
partsupp,
part,
supplier,
nation,
region
where
p_partkey = ps_partkey
and p_size = 15
group by a_partkey
part,
supplier,
nation,
region
where
p_partkey = ps_partkey
and p_size = 15
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;
--Q3
l_orderkey,
o_orderdate,
o_shippriority
from
) t1 join customer c
on c.c_custkey = t1.o_custkey
group by
l_orderkey,
o_orderdate,
o_shippriority
order by
revenue desc,
o_orderdate
limit 10;
--Q4
o_orderpriority,
count(*) as order_count
from
select
from
lineitem
) t1
on t1.l_orderkey = o_orderkey
where
group by
o_orderpriority
order by
o_orderpriority;
--Q5
n_name,
from
customer,
orders,
lineitem,
supplier,
nation,
region
where
c_custkey = o_custkey
group by
n_name
order by
revenue desc;
--Q6
from
lineitem
where
--Q7
supp_nation,
cust_nation,
l_year,
sum(volume) as revenue
from
select
n1.n_name as supp_nation,
n2.n_name as cust_nation,
from
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
where
s_suppkey = l_suppkey
and (
) as shipping
group by
supp_nation,
cust_nation,
l_year
order by
supp_nation,
cust_nation,
l_year;
--Q8
o_year,
sum(case
else 0
from
select
n2.n_name as nation
from
lineitem,
orders,
customer,
supplier,
part,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
) as all_nations
group by
o_year
order by
o_year;
--Q9
select/*+SET_VAR(exec_mem_limit=37179869184, parallel_fragment_exec_instance_num=2,
enable_vectorized_engine=true, batch_size=4096, disable_join_reorder=false, enable_cost_based_join_reorder=false,
enable_projection=true, enable_remove_no_conjuncts_runtime_filter_policy=true,
runtime_filter_wait_time_ms=100000) */
nation,
o_year,
sum(amount) as sum_profit
from
select
n_name as nation,
from
where
) as profit
group by
nation,
o_year
order by
nation,
o_year desc;
--Q10
c_custkey,
c_name,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
) t1,
nation
where
c_custkey = t1.o_custkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
limit 20;
--Q11
ps_partkey,
from
partsupp,
select s_suppkey
) B
where
ps_suppkey = B.s_suppkey
group by
ps_partkey having
select
from
partsupp,
(select s_suppkey
) A
where
ps_suppkey = A.s_suppkey
order by
value desc;
--Q12
l_shipmode,
sum(case
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
group by
l_shipmode
order by
l_shipmode;
--Q13
c_count,
count(*) as custdist
from
select
c_custkey,
count(o_orderkey) as c_count
from
c_custkey = o_custkey
group by
c_custkey
) as c_orders
group by
c_count
order by
custdist desc,
c_count desc;
--Q14
100.00 * sum(case
else 0
from
part,
lineitem
where
l_partkey = p_partkey
--Q15
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue0
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue0
order by
s_suppkey;
--Q16
p_brand,
p_type,
p_size,
from
partsupp,
part
where
p_partkey = ps_partkey
select
s_suppkey
from
supplier
where
group by
p_brand,
p_type,
p_size
order by
supplier_cnt desc,
p_brand,
p_type,
p_size;
--Q17
from
where
p1.p_brand = 'Brand#23'
select
0.2 * avg(l_quantity)
from
where
l_partkey = p1.p_partkey
);
--Q18
c_name,
c_custkey,
t3.o_orderkey,
t3.o_orderdate,
t3.o_totalprice,
sum(t3.l_quantity)
from
customer join
select * from
lineitem join
select * from
select
l_orderkey
from
lineitem
group by
) t1
on o_orderkey = t1.l_orderkey
) t2
on t2.o_orderkey = l_orderkey
) t3
on c_custkey = t3.o_custkey
group by
c_name,
c_custkey,
t3.o_orderkey,
t3.o_orderdate,
t3.o_totalprice
order by
t3.o_totalprice desc,
t3.o_orderdate
limit 100;
--Q19
from
lineitem,
part
where
p_partkey = l_partkey
and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
or
p_partkey = l_partkey
and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
and l_quantity >= 10 and l_quantity <= 10 + 10
or
p_partkey = l_partkey
and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
);
--Q20
select * from
from lineitem
group by l_partkey,l_suppkey
) t2 join
) t1
) t3
on s_suppkey = t3.ps_suppkey
join nation
order by s_name;
--Q21
from
select * from
select * from
join
select * from
) t1
) t2
on l3.l_orderkey = t2.l_orderkey and l3.l_suppkey <> t2.l_suppkey and l3.l_receiptdate > l3.l_commitdate
) t3
group by
t3.s_name
order by
numwait desc,
t3.s_name
limit 100;
--Q22
avg(c_acctbal) as av
from
customer
where
and substring(c_phone, 1, 2) in
select /*+SET_VAR(exec_mem_limit=8589934592,
parallel_fragment_exec_instance_num=4,runtime_bloom_filter_size=4194304) */
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from
select
substring(c_phone, 1, 2) as cntrycode,
c_acctbal
from
orders right anti join customer c on o_custkey = c.c_custkey join tmp on c.c_acctbal > tmp.av
where
substring(c_phone, 1, 2) in
) as custsale
group by
cntrycode
order by
cntrycode;
Release 1.2.2
Lakehouse
Support reading the Iceberg Snapshot, and viewing the Snapshot history.
Auto Bucket
Set and scale the number of buckets for different partitions automatically to keep the number of tablets in a reasonable range.
New Functions
Add the new function width_bucket .
Behavior Changes
Disable BE's page cache by default: disable_storage_page_cache=true
Turning off this configuration optimizes memory usage and reduces the risk of OOM, but it may reduce the performance of some small queries. If you are sensitive to query latency, or have high-concurrency, small-query scenarios, you can set disable_storage_page_cache=false to enable the page cache again.
Added the new session variable group_by_and_having_use_alias_first, used to control whether the GROUP BY and HAVING clauses use aliases.
Improvement
Compaction
Support Vertical Compaction, to optimize the compaction overhead and efficiency of wide tables.
Support Segment Compaction, to fix the -238 and -235 issues caused by high-frequency imports.
Lakehouse
Other
Optimize some problems of Memtracker, improve memory management accuracy, and optimize memory application.
Bug Fix
Fixed a memory leak when loading data with the Doris Flink Connector.
Fixed a possible BE thread scheduling problem and reduced the Fragment sent timeout errors caused by BE thread exhaustion.
Fixed a data correctness issue with the Unique Key Merge-on-Read table.
Fixed various known issues with the Light Schema Change feature.
Fixed the poor CSV reader performance introduced in version 1.2.1.
Fixed a BE OOM caused by the Spark Load data download phase.
Fixed possible metadata compatibility issues when upgrading from version 1.1 to version 1.2.
Fixed a metadata problem when creating a JDBC Catalog with Resource.
Fixed an FE OOM caused by a large number of failed Broker Load jobs.
Fixed a memory leak when using 2PC stream load.
Other
Add metrics to view the total rowset and segment numbers on BE
@adonis0147
@AshinGau
@BePPPower
@BiteTheDDDDt
@ByteYue
@caiconghui
@cambyzju
@chenlinzhong
@DarvenDuan
@dataroaring
@Doris-Extras
@dutyu
@englefly
@freemandealer
@Gabriel39
@HappenLee
@Henry2SS
@htyoung
@isHuangXin
@JackDrogon
@jacktengg
@Jibing-Li
@kaka11chen
@Kikyou1997
@Lchangliang
@LemonLiTree
@liaoxin01
@liqing-coder
@luozenglin
@morningman
@morrySnow
@mrhhsg
@nextdreamblue
@qidaye
@qzsee
@spaces-X
@stalary
@starocean999
@weizuo93
@wsjz
@xiaokang
@xinyiZzz
@xy720
@yangzhg
@yiguolei
@yixiutt
@Yukang-Lian
@Yulei-Yang
@zclllyybb
@zddr
@zhangstar333
@zhannngchen
@zy-kkk
Release 1.2.1
Supports new type DecimalV3
DecimalV3, which supports higher precision and better performance, has the following advantages over past versions.
Larger representable range: the value range is significantly expanded, and the valid number of digits is [1, 38].
Higher performance: the storage space used adapts to the declared precision.
More complete precision derivation: different precision derivation rules are applied to the result for different expressions.
DecimalV3
Support Iceberg V2
Support Iceberg V2 (only Position Delete is supported, Equality Delete will be supported in subsequent versions).
Support OR condition to IN
Support converting OR condition to IN condition, which can improve the execution efficiency in some scenarios.#15437
#12872
Broker supports Tencent Cloud CHDFS and Baidu Cloud BOS, AFS
Data on CHDFS, BOS, and AFS can be accessed through Broker. #15297 #15448
New function
Add function substring_index . #15373
Bug Fix
In some cases, after upgrading from version 1.1 to version 1.2, the user permission information will be lost. #15144
Fix the problem that the partition value is wrong when using datev2/datetimev2 type for partitioning. #15094
Bug fixes for a large number of released features. For a complete list see: PR List
Upgrade Notice
Known Issues
Do not use JDK11 as the runtime JDK of BE, it will cause BE Crash.
The reading performance of the CSV format has declined in this version, which affects CSV import and read efficiency. We will fix it as soon as possible in the next three-digit (patch) version.
Behavior Changed
The default value of the FE configuration item enable_new_load_scan_node is changed to true. Import tasks will be
performed using the new File Scan Node. No impact on users.#14808
Delete the FE configuration item enable_multi_catalog . The Multi-Catalog function is enabled by default.
The session variable enable_vectorized_engine will no longer take effect. Enabled by default.
To make it valid again, set the FE configuration item disable_enable_vectorized_engine to false, and restart FE to make
enable_vectorized_engine valid again.
Big Thanks
Thanks to ALL who contributed to this release!
@adonis0147
@AshinGau
@BePPPower
@BiteTheDDDDt
@ByteYue
@caiconghui
@cambyzju
@chenlinzhong
@dataroaring
@Doris-Extras
@dutyu
@eldenmoon
@englefly
@freemandealer
@Gabriel39
@HappenLee
@Henry2SS
@hf200012
@jacktengg
@Jibing-Li
@Kikyou1997
@liaoxin01
@luozenglin
@morningman
@morrySnow
@mrhhsg
@nextdreamblue
@qidaye
@spaces-X
@starocean999
@wangshuo128
@weizuo93
@wsjz
@xiaokang
@xinyiZzz
@xutaoustc
@yangzhg
@yiguolei
@yixiutt
@Yulei-Yang
@yuxuan-luo
@zenoyang
@zhangstar333
@zhannngchen
@zhengshengjun
Release 1.2.0
Highlight
1. Full vectorized engine support, greatly improved performance
In the standard SSB-100-flat benchmark, 1.2 is 2 times faster than 1.1; in the complex TPC-H 100 benchmark, 1.2 is 3 times faster than 1.1.
2. Merge-On-Write on the Unique Key model
This mode marks the data that needs to be deleted or updated at write time, thereby avoiding the Merge-On-Read overhead at query time and greatly improving read efficiency on the updatable data model.
3. Multi Catalog
The multi-catalog feature gives Doris the ability to quickly connect to external data sources. Users can connect to external data sources through the CREATE CATALOG command, and Doris automatically maps the database and table information of the external source. After that, users can access the data in these external sources just like ordinary tables, avoiding the complicated manual creation of external table mappings.
i. Hive Metastore: You can access data tables including Hive, Iceberg, and Hudi. It can also be connected to data
sources compatible with Hive Metastore, such as Alibaba Cloud's DataLake Formation. Supports data access on both
HDFS and object storage.
Note: The corresponding permission level will also be changed automatically, see the "Upgrade Notes" section for
details.
4. Light Schema Change
In the new version, adding or dropping columns no longer requires changing the data files synchronously; only the metadata in FE needs to be updated, enabling millisecond-level schema change operations. With this feature, DDL synchronization of upstream CDC data can be implemented; for example, users can use Flink CDC to synchronize both DML and DDL from an upstream database to Doris.
5. JDBC External Table
Users can connect to external data sources through JDBC. Currently supported:
MySQL
PostgreSQL
Oracle
SQL Server
Clickhouse
Note: The ODBC feature will be removed in a later version, please try to switch to the JDBC.
6. JAVA UDF
Supports writing UDF/UDAF in Java, which is convenient for users to use custom functions in the Java ecosystem. At the
same time, through technologies such as off-heap memory and Zero Copy, the efficiency of cross-language data access
has been greatly improved.
7. Remote UDF
Supports accessing remote user-defined function services through RPC, thus completely eliminating language
restrictions for users to write UDFs. Users can use any programming language to implement custom functions to
complete complex data analysis work.
Array type
Array types are supported. It also supports nested array types. In some scenarios such as user portraits and tags, the
Array type can be used to better adapt to business scenarios. At the same time, in the new version, we have also
implemented a large number of data-related functions to better support the application of data types in actual
scenarios.
Jsonb type
Support for a binary JSON data type: JSONB. This type provides a more compact JSON encoding and direct access to the encoded data. Compared with JSON stored as strings, it can improve access performance several times over.
Date V2
Scope of impact:
a. The user needs to specify datev2 and datetimev2 when creating the table; the date and datetime columns of existing tables are not affected.
b. When datev2 and datetimev2 are used in calculations with the original date and datetime types (for example, in an equality join), the original type is cast to the new type for the calculation.
More
1. A new memory management framework
Doris implements a set of Table Valued Function (TVF). TVF can be regarded as an ordinary table, which can appear in all
places where "table" can appear in SQL.
For example, we can use S3 TVF to implement data import on object storage:
insert into tbl select * from s3("s3://bucket/file.*", "ak" = "xx", "sk" = "xxx") where c1 > 2;
TVF can help users make full use of the rich expressiveness of SQL and flexibly process various data.
Documentation:
https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/s3
https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/hdfs
Support for creating multiple partitions within a time range via the FROM TO command.
4. Column renaming
For tables with Light Schema Change enabled, column renaming is supported.
6. Import
Stream Load adds hidden_columns , which can explicitly specify the delete flag column and sequence column.
Added support for Alibaba Cloud oss, Tencent Cloud cos/chdfs and Huawei Cloud obs in broker load.
7. Support viewing the contents of the catalog recycle bin through SHOW CATALOG RECYCLE BIN function.
10. Support to modify the number of Query Profiles that can be saved through configuration.
Document search FE configuration item: max_query_profile_num
11. The DELETE statement supports IN predicate conditions. And it supports partition pruning.
12. The default value of the time column supports using CURRENT_TIMESTAMP
Documentation:
https://doris.apache.org/docs/dev/admin-manual/system-table/backends
https://doris.apache.org/docs/dev/admin-manual/system-table/rowsets
The Restore job supports the reserve_replica parameter, so that the number of replicas of the restored table is the
same as that of the backup.
The Restore job supports reserve_dynamic_partition_enable parameter, so that the restored table keeps the
dynamic partition enabled.
Support backup and restore operations through the built-in libhdfs, no longer rely on broker.
15. Support data balance between multiple disks on the same machine
Documentation:
https://doris.apache.org/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-REBALANCE-DISK
https://doris.apache.org/docs/dev/sql-manual/sql-reference/Database-Administration-Statements/ADMIN-CANCEL-REBALANCE-DISK
cbrt
sequence_match/sequence_count
mask/mask_first_n/mask_last_n
elt
any/any_value
group_bitmap_xor
ntile
nvl
uuid
initcap
regexp_replace_one/regexp_extract_all
multi_search_all_positions/multi_match_any
domain/domain_without_www/protocol
running_difference
bitmap_hash64
murmur_hash3_64
to_monday
not_null_or_empty
window_funnel
group_bit_and/group_bit_or/group_bit_xor
outer combine
and all array functions
Upgrade Notice
Known Issues
Use JDK11 will cause BE crash, please use JDK8 instead.
Behavior Changed
Because the catalog level is introduced, the corresponding user permission level will also be changed automatically. The
rules are as follows:
In GroupBy and Having clauses, match on column names in preference to aliases. (#14408)
Creating columns starting with mv_ is no longer supported. mv_ is a reserved keyword in materialized views (#14361)
Removed the default limit of 65535 rows added by the order by statement, and added the session variable
default_order_by_limit to configure this limit. (#12478)
In the table generated by "Create Table As Select", all string columns use the string type uniformly, and no longer
distinguish varchar/char/string (#14382)
In the audit log, remove the word default_cluster before the db and user names. (#13499) (#11408)
Changed the ORDER BY behavior with UNION clauses: in the new version, the ORDER BY clause is executed after the UNION is executed, unless explicitly scoped by parentheses. (#9745)
During the decommission operation, tablets in the recycle bin are ignored to ensure that the decommission can be completed. (#14028)
The returned result of Decimal will be displayed according to the precision declared in the original column, or according
to the precision specified in the cast function. (#13437)
Increased create_table_timeout value. The default timeout for table creation operations will be increased. (#13520)
Increase the parameter max_replica_count_when_schema_change to limit the number of replicas involved in the alter
job, the default is 100000. (#12850)
Added disable_iceberg_hudi_table. The Iceberg and Hudi external tables are disabled by default; the Multi-Catalog function is recommended instead. (#13932)
Removed disable_stream_load_2pc parameter. 2PC's stream load can be used directly. (#13520)
Modify the variable enable_insert_strict to true by default. This will cause some insert operations that could be
executed before, but inserted illegal values, to no longer be executed. (11866)
Add skip_storage_engine_merge variable for debugging unique or agg model data (#11952)
Documentation: https://doris.apache.org/docs/dev/advanced/variables
The BE startup script will check whether the value of /proc/sys/vm/max_map_count is greater than 2 million; otherwise, the startup fails. (#11052)
FE Metadata Version
FE Meta Version changed from 107 to 114, and cannot be rolled back after upgrading.
During Upgrade
1. Upgrade preparation
Need to replace: lib, bin directory (start/stop scripts have been modified)
BE also needs to configure JAVA_HOME, and already supports JDBC Table and Java UDF.
The repeat function cannot be used and reports the error: vectorized repeat function cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13868)
Schema change fails with the error: desc_tbl is not set. Maybe the FE version is not equal to the BE. (#13822)
Vectorized hash join cannot be used and reports the error: vectorized hash join cannot be executed. You can turn off the vectorized execution engine before upgrading. (#13753)
Performance Impact
By default, JeMalloc is used as the memory allocator of the new version BE, replacing TcMalloc (#13367)
The batch size in the tablet sink is modified to be at least 8K. (#13912)
API Change
BE's http api error return information changed from {"status": "Fail", "msg": "xxx"} to more specific {"status":
"Not found", "msg": "Tablet not found. tablet_id=1202"} (#9771)
In SHOW CREATE TABLE, the content of the comment is changed from double quotes to single quotes. (#10327)
Ordinary users can now obtain query profiles through the HTTP API. (#14016)
Documentation:
https://doris.apache.org/docs/dev/admin-manual/http-actions/fe/manager/query-profile-action
Optimized the way to specify the sequence column: the column name can now be specified directly. (#13872)
Documentation:
https://doris.apache.org/docs/dev/data-operate/update-delete/sequence-column-manual
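A minimal sketch of the column-name form, assuming a Unique Key table and the function_column.sequence_col property described in the linked manual; the table and column names are hypothetical:
CREATE TABLE example_tbl (
    user_id BIGINT,
    update_time DATETIME,
    city VARCHAR(20)
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    -- use an existing column directly as the sequence column
    "function_column.sequence_col" = "update_time"
);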
Added remote storage space usage to the results returned by SHOW BACKENDS and SHOW TABLETS. (#11450)
Refactored BE's error code mechanism; some returned error messages will change. (#8855)
Other
Script related
The stop scripts of FE and BE support exiting FE and BE via the --grace parameter (use kill -15 signal instead of kill -9)
FE start script supports checking the current FE version via --version (#11563)
Added the ADMIN COPY TABLET command to get the data and related table creation statement of a tablet, for local problem debugging. (#12176)
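A hedged sketch of invoking the command; the tablet id and the property shown are placeholders, so consult the ADMIN COPY TABLET reference for the actual options:
-- Dump the data and schema of a single tablet for local debugging
ADMIN COPY TABLET 10010 PROPERTIES ("backend_id" = "10001");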
The table creation statements related to a SQL statement can be obtained through the HTTP API, for local problem reproduction. (#11979)
Compaction can be disabled for a table at creation time, for testing. (#11743)
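A minimal sketch, assuming the table property for this is disable_auto_compaction (verify the property name against the CREATE TABLE documentation); all names below are hypothetical:
CREATE TABLE compaction_test (
    k INT,
    v INT
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    -- keep loaded rowsets unmerged so they can be inspected during testing
    "disable_auto_compaction" = "true"
);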
Big Thanks
Thanks to ALL who contributed to this release! (alphabetically)
@924060929
@a19920714liou
@adonis0147
@Aiden-Dong
@aiwenmo
@AshinGau
@b19mud
@BePPPower
@BiteTheDDDDt
@bridgeDream
@ByteYue
@caiconghui
@CalvinKirs
@cambyzju
@caoliang-web
@carlvinhust2012
@catpineapple
@ccoffline
@chenlinzhong
@chovy-3012
@coderjiang
@cxzl25
@dataalive
@dataroaring
@dependabot[bot]
@dinggege1024
@DongLiang-0
@Doris-Extras
@eldenmoon
@EmmyMiao87
@englefly
@FreeOnePlus
@Gabriel39
@gaodayue
@geniusjoe
@gj-zhang
@gnehil
@GoGoWen
@HappenLee
@hello-stephen
@Henry2SS
@hf200012
@huyuanfeng2018
@jacktengg
@jackwener
@jeffreys-cat
@Jibing-Li
@JNSimba
@Kikyou1997
@Lchangliang
@LemonLiTree
@lexoning
@liaoxin01
@lide-reed
@link3280
@liutang123
@liuyaolin
@LOVEGISER
@lsy3993
@luozenglin
@luzhijing
@madongz
@morningman
@morningman-cmy
@morrySnow
@mrhhsg
@Myasuka
@myfjdthink
@nextdreamblue
@pan3793
@pangzhili
@pengxiangyu
@platoneko
@qidaye
@qzsee
@SaintBacchus
@SeekingYang
@smallhibiscus
@sohardforaname
@song7788q
@spaces-X
@ssusieee
@stalary
@starocean999
@SWJTU-ZhangLei
@TaoZex
@timelxy
@Wahno
@wangbo
@wangshuo128
@wangyf0555
@weizhengte
@weizuo93
@wsjz
@wunan1210
@xhmz
@xiaokang
@xiaokangguo
@xinyiZzz
@xy720
@yangzhg
@Yankee24
@yeyudefeng
@yiguolei
@yinzhijian
@yixiutt
@yuanyuan8983
@zbtzbtzbt
@zenoyang
@zhangboya1
@zhangstar333
@zhannngchen
@ZHbamboo
@zhengshiJ
@zhenhb
@zhqu1148980644
@zuochunwei
@zy-kkk
Release 1.1.5
In this release, the Doris team has fixed about 36 issues and made performance improvements since 1.1.4. This is a bugfix release on the 1.1 line, and all users are encouraged to upgrade.
Behavior Changes
When an alias is the same as the original column name, e.g. "select year(birthday) as birthday", and is used in a GROUP BY, ORDER BY, or HAVING clause, Doris's behavior differed from MySQL in the past. In this release, it follows MySQL's behavior: GROUP BY and HAVING clauses use the original column first, while ORDER BY uses the alias first. This can be confusing, so the simple advice is to avoid using an alias with the same name as an original column.
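A small example of the ambiguity, using a hypothetical table users(birthday DATE):
-- GROUP BY / HAVING resolve "birthday" to the original DATE column,
-- while ORDER BY resolves it to the alias (the year), matching MySQL.
SELECT year(birthday) AS birthday, count(*) AS cnt
FROM users
GROUP BY birthday   -- groups by the original DATE column
ORDER BY birthday;  -- orders by the alias, i.e. year(birthday)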
Features
Added support for murmur_hash3_64. #14636
Improvements
Add timezone cache for convert_tz to improve performance. #14616
Bug Fix
Fix coredump when there is an if constant expression in the select clause. #14858
Update the default value of high_priority_flush_thread_num_per_store to 6, which improves load performance. #14775
Fix an IndexOutOfBounds error in Spark Load when the partition column is not a duplicate key. #14661
Use the average rowset size to calculate batch size instead of total_bytes, since the latter costs a lot of CPU. #14273
Release 1.1.4
In this release, the Doris team has fixed about 60 issues and made performance improvements since 1.1.3. This is a bugfix release on the 1.1 line, and all users are encouraged to upgrade.
Features
Support obs broker load for Huawei Cloud. #13523
Improvements
Do not acquire a mutex in the metric hook, since it affects query performance during heavy load. #10941
BugFix
The WHERE condition did not take effect when Spark Load loaded files. #13804
The if function returned wrong results when there was a nullable column in vectorized mode. #13779
Fix incorrect results when using anti join with other join predicates. #13743
Table names and column names were not recognized correctly in the lateral view clause. #13600
Fixed allowing creation of a materialized view using to_bitmap() on negative value columns when enable_vectorized_alter_table is true. #13448
The nullability property of sort exprs may not be right after substituting them using the child's smap info. #13328
Fix a bug where the last line of data was lost for stream load. #13066
Restore tables or partitions with the same replication number as before the backup. #11942
Release 1.1.3
In this release, the Doris team has fixed more than 80 issues and made performance improvements since 1.1.2. This is a bugfix release on the 1.1 line, and all users are encouraged to upgrade.
Features
Support escaped identifiers for SQL Server and PostgreSQL in ODBC tables.
Improvements
Optimize flush policy to avoid small segments. #12706 #12716
Fixed many memory-control-related issues during query and load. #12682 #12688 #12708 #12776 #12782 #12791 #12794
#12820 #12932 #12954 #12951
BugFix
Core dump on compaction with largeint. #10094
Level1Iterator did not release iterators in the heap, which could cause a memory leak. #12592
Fix decommission failure with 2 BEs and existing colocation table. #12644
BE may core dump because of stack-buffer-overflow when TBrokerOpenReaderResponse too large. #12658
BE may OOM during load when error code -238 occurs. #12666
Fix bug that tablet version may be wrong when doing alter and load. #13070
Upgrade Notes
PageCache and ChunkAllocator are disabled by default to reduce memory usage and can be re-enabled by modifying the
configuration items disable_storage_page_cache and chunk_reserved_bytes_limit .
The Storage Page Cache and the Chunk Allocator are used for caching user data chunks and for memory preallocation, respectively.
These two functions take up a certain percentage of memory and are not freed. This part of memory cannot be flexibly
allocated, which may lead to insufficient memory for other tasks in some scenarios, affecting system stability and availability.
Therefore, we disabled these two features by default in version 1.1.3.
However, in some latency-sensitive reporting scenarios, turning off this feature may lead to increased query latency. If you
are worried about the impact of this feature on your business after upgrade, you can add the following parameters to be.conf
to keep the same behavior as the previous version.
disable_storage_page_cache=false
chunk_reserved_bytes_limit=10%
disable_storage_page_cache: whether to disable the Storage Page Cache. In version 1.1.2 and earlier, the default is false, i.e., the cache is on; in version 1.1.3, the default is true, i.e., off.
chunk_reserved_bytes_limit: the Chunk Allocator's reserved memory size. In version 1.1.2 and earlier, the default is 10% of overall memory; in version 1.1.3, the default is 209715200 (200 MB).
Release 1.1.2
In this release, the Doris team has fixed more than 170 issues and made performance improvements since 1.1.1. This is a bugfix release on the 1.1 line, and all users are encouraged to upgrade.
Features
New MemTracker
Introduced a new, more accurate MemTracker for both the vectorized and non-vectorized engines.
Improvements
Added block reuse in the non-vectorized engine, giving a 50% performance improvement in some cases.
#11392
Bug Fix
Fixed several FE issues that could cause FE failure or data corruption.
Added a reserved-disk config to avoid too many reserved BDB-JE files. (Serious) In an HA environment, BDB-JE retains reserved log files, which are not deleted until disk usage approaches the limit.
Fixed a fatal bug in BDB-JE that could cause FE replicas to fail to start correctly or corrupt data. (Serious)
FE could hang on waitFor_rpc during queries, and BE could hang in highly concurrent scenarios.
#12459 #12458 #12392
Fixed a fatal issue in the vectorized storage engine that could cause wrong results. (Serious)
#11754 #11694
Fixed many planner-related issues that could cause BE cores or leave BE in an abnormal state.
#12080 #12075 #12040 #12003 #12007 #11971 #11933 #11861 #11859 #11855 #11837 #11834 #11821 #11782 #11723 #11569
Release 1.1.1
Features
Improvements
Some data is encoded with bitshuffle, and decompressing it costs a lot of time during queries. In 1.1.1, Doris decompresses data encoded by bitshuffle to accelerate queries, and we found it could reduce latency by 30% for some queries in SSB-Flat.
Bug Fix
Fixed a problem that prevented rolling upgrades from 1.0. (Serious)
This issue was introduced in version 1.1 and may cause a BE core when BE is upgraded but FE is not.
If you encounter this problem, you can try to fix it with #10833.
Fixed a problem where some queries did not fall back to the non-vectorized engine, causing BE to core.
Currently, the vectorized engine cannot handle all SQL queries, and some (like left outer join) run on the non-vectorized engine. Some of these cases were not covered in 1.1, which could cause BE crashes.
Thanks
Thanks to everyone who has contributed to this release:
@jacktengg
@mrhhsg
@xinyiZzz
@yixiutt
@starocean999
@morrySnow
@morningman
@HappenLee
Release 1.1.0
In version 1.1, we realized full vectorization of the computing layer and storage layer, and officially enabled the vectorized execution engine as a stable feature. All queries are executed by the vectorized execution engine by default, and performance is 3-5 times higher than in the previous version. Version 1.1 adds the ability to access Apache Iceberg external tables and supports federated queries over data in Doris and Iceberg, expanding Apache Doris's analysis capabilities on the data lake. On top of the original LZ4, the ZSTD compression algorithm is added, further improving the data compression ratio. Many performance and stability problems in previous versions have been fixed, greatly improving system stability. Downloading and using this version is recommended.
Upgrade Notes
In version 1.0, we introduced the vectorized execution engine as an experimental feature, and users needed to enable it manually when executing queries by setting the session variables: set batch_size = 4096 and set enable_vectorized_engine = true.
In version 1.1, we officially enabled the vectorized execution engine as a stable feature. The session variable enable_vectorized_engine is set to true by default, and all queries are executed through the vectorized execution engine by default.
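For reference, the session-level switches look like this; the SHOW VARIABLES statement is just one way to confirm the new default:
-- 1.0: had to be enabled manually per session
SET batch_size = 4096;
SET enable_vectorized_engine = true;
-- 1.1: enabled by default; confirm with
SHOW VARIABLES LIKE 'enable_vectorized_engine';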
In order to ensure the maintainability of the code structure and reduce the additional learning and development costs caused by redundant historical code, we have decided to no longer support the Segment V1 storage format starting from the next version. This part of the code is expected to be deleted in Apache Doris 1.2.
Normal Upgrade
For normal upgrade operations, you can perform rolling upgrades according to the cluster upgrade documentation on the
official website.
https://doris.apache.org/docs/admin-manual/cluster-management/upgrade
Features
In version 1.1, Apache Doris supports creating Iceberg external tables and querying data, and supports automatic
synchronization of all table schemas in the Iceberg database through the REFRESH command.
Previously, the data compression method in Apache Doris was uniformly specified by the system, with LZ4 as the default. For some scenarios that are sensitive to data storage costs, this could not meet the required compression ratio.
In version 1.1, users can set "compression"="zstd" in the table properties to specify ZSTD as the compression method when creating a table. On a 25 GB test dataset of 110 million lines of text logs, the highest compression ratio is nearly 10x, 53% higher than the original ratio, and the speed of reading data from disk and decompressing it is increased by 30%.
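A minimal sketch of enabling ZSTD at table creation; the table and column names are hypothetical, and the relevant part is the compression property mentioned above:
CREATE TABLE log_tbl (
    log_time DATETIME,
    message  STRING
)
DUPLICATE KEY(log_time)
DISTRIBUTED BY HASH(log_time) BUCKETS 8
PROPERTIES (
    "replication_num" = "1",
    -- use ZSTD instead of the default LZ4 for a better compression ratio
    "compression" = "zstd"
);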
Improvements
In version 1.1, we implemented full vectorization of the compute and storage layers, including:
The storage layer implements vectorization and supports dictionary optimization for low-cardinality string columns
Optimized and resolved numerous performance and stability issues with the vectorization engine.
We tested the performance of Apache Doris version 1.1 and version 0.15 on the SSB and TPC-H standard test datasets:
On all 13 SQLs in the SSB test data set, version 1.1 is better than version 0.15, and the overall performance is improved by about
3 times, which solves the problem of performance degradation in some scenarios in version 1.0;
On all 22 SQLs in the TPC-H test data set, version 1.1 is better than version 0.15, the overall performance is improved by about
4.5 times, and the performance of some scenarios is improved by more than ten times;
SSB Benchmark: https://doris.apache.org/docs/benchmark/ssb
TPC-H Benchmark: https://doris.apache.org/docs/benchmark/tpch
At the same time, for high-frequency small file cumulative compaction, the scheduling and isolation of compaction tasks is
implemented to prevent the heavyweight base compaction from affecting the merging of new data.
Finally, the strategy for merging small files is optimized and a gradient merging approach is adopted: the files participating in each merge belong to the same data magnitude, which prevents versions with large size differences from being merged together and merges them hierarchically, step by step. This reduces the number of times a single file participates in merging and can greatly save the system's CPU consumption.
When the upstream maintains a write rate of 100,000 rows per second (20 concurrent write tasks, 5,000 rows per job, and a checkpoint interval of 1s), version 1.1 behaves as follows:
Quick data consolidation: Tablet version remains below 50 and compaction score is stable. Compared with the -235
problem that frequently occurred during high concurrent writing in the previous version, the compaction merge
efficiency has been improved by more than 10 times.
Significantly reduced CPU resource consumption: The strategy has been optimized for small file Compaction. In the
above scenario of high concurrent writing, CPU resource consumption is reduced by 25%;
Stable query time consumption: The overall orderliness of data is improved, and the fluctuation of query time
consumption is greatly reduced. The query time consumption during high concurrent writing is the same as that of only
querying, and the query performance is improved by 3-4 times compared with the previous version.
Bugfix
Fixed a problem where data could not be queried due to a missing data version. (Serious)
This issue was introduced in version 1.0 and may result in the loss of data versions for multiple replicas.
Fixed a problem where resource isolation was invalid for the resource usage limit of loading tasks. (Moderate)
In 1.1, broker load and routine load will use Backends with the specified resource tags to do the load.
Use HTTP BRPC to transfer network data packets over 2GB. (Moderate)
In the previous version, when the data transmitted between Backends through BRPC exceeded 2GB, it could cause data transmission errors.
Others
String-type data that has already been written can be accessed normally.
Download to Use
Download Link
https://doris.apache.org/download
Feedback
If you encounter any problems in use, please feel free to contact us at any time through the GitHub discussion forum or the dev mailing list.
Thanks
Thanks to everyone who has contributed to this release:
@adonis0147
@airborne12
@amosbird
@aopangzi
@arthuryangcs
@awakeljw
@BePPPower
@BiteTheDDDDt
@bridgeDream
@caiconghui
@cambyzju
@ccoffline
@chenlinzhong
@daikon12
@DarvenDuan
@dataalive
@dataroaring
@deardeng
@Doris-Extras
@emerkfu
@EmmyMiao87
@englefly
@Gabriel39
@GoGoWen
@gtchaos
@HappenLee
@hello-stephen
@Henry2SS
@hewei-nju
@hf200012
@jacktengg
@jackwener
@Jibing-Li
@JNSimba
@kangshisen
@Kikyou1997
@kylinmac
@Lchangliang
@leo65535
@liaoxin01
@liutang123
@lovingfeel
@luozenglin
@luwei16
@luzhijing
@mklzl
@morningman
@morrySnow
@nextdreamblue
@Nivane
@pengxiangyu
@qidaye
@qzsee
@SaintBacchus
@SleepyBear96
@smallhibiscus
@spaces-X
@stalary
@starocean999
@steadyBoy
@SWJTU-ZhangLei
@Tanya-W
@tarepanda1024
@tianhui5
@Userwhite
@wangbo
@wangyf0555
@weizuo93
@whutpencil
@wsjz
@wunan1210
@xiaokang
@xinyiZzz
@xlwh
@xy720
@yangzhg
@Yankee24
@yiguolei
@yinzhijian
@yixiutt
@zbtzbtzbt
@zenoyang
@zhangstar333
@zhangyifan27
@zhannngchen
@zhengshengjun
@zhengshiJ
@zingdle
@zuochunwei
@zy-kkk