跳到主要内容
版本:2.3.2

SQL语法参考手册

数据库操作

数据类型

类型描述大小
BIGINT整型8字节
BIGINT UNSIGNED无符号整型8字节
BOOLEAN布尔类型1字节
TIMESTAMP时间戳8字节
STRINGUTF-8编码的字符串-----
DOUBLE双精度浮点型8字节

其他数据类型

以下数据类型无法直接存储,但会在SQL表达式中出现

类型描述备注
BINARY二进制数据,可以使用Cast子句转换成STRINGsha224, sha256, sha384, sha512函数的返回值均属于此类型
INTERVAL时间间隔时间加减运算和date_bin函数参数需要
ARRAY数组类型聚合函数array_agg返回类型为此

常量

类型语法说明
BIGINT[{+-}]123数值类型
BIGINT UNSIGNED[+]123数值类型
DOUBLE123.45数值类型,目前暂不支持科学记数法
BOOLEAN{true | false | t | f}
STRING'abc'不支持双引号格式,引号中连续两个''转义成‘
TIMESTAMPTIMESTAMP '1900-01-01T12:00:00Z'时间戳,TIMESTAMP 关键字表示后面的字符串常量需要被解释为 TIMESTAMP 类型。
Geometry点击跳转几何类型
--NULL空值

TIMESTAMP 常量语法

时间戳是按RFC3339标准

T代表间隔,仅可以用空格代替

Z代表零时区

+08:00 代表东八区

如下:

  • 1997-01-31T09:26:56.123Z # 标准RFC3339 UTC 时区
  • 1997-01-31T09:26:56.123+08:00 # 标准RFC3339 东八区
  • 1997-01-31 09:26:56.123+08:00 # 接近RFC3339, 只是用空格代替T
  • 1997-01-31T09:26:56.123 # 接近RFC3339, 没有指定时区,默认UTC
  • 1997-01-31 09:26:56.123 # 接近RFC3339, 用空格代替T, 且没有指定时区
  • 1997-01-31 09:26:56 # 接近RFC3339, 精确度是秒级

注意CAST (BIGINT AS TIMESTAMP) 是转化为纳秒级的时间戳,如下:

SELECT CAST (1 AS TIMESTAMP);
+-------------------------------+
| Int64(1) |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+

INTERVAL 常量

示例

  1. INTERVAL '1' 一秒
  2. INTERVAL '1 SECONDE' 一秒
  3. INTERVAL '1 MILLISECONDS' 一毫秒
  4. INTERVAL '1 MINUTE' 一分钟
  5. INTERVAL '0.5 MINUTE' 半分钟
  6. INTERVAL '1 HOUR' 一小时
  7. INTERVAL '1 DAY' 一天
  8. INTERVAL '1 DAY 1' 一天零一秒
  9. INTERVAL '1 WEEK' 一周
  10. INTERVAL '1 MONTH' 一月(30天)
  11. INTERVAL '0.5 MONTH' 半月(15天)
  12. INTERVAL '1 YEAR' 一年(12个月)
  13. INTERVAL '1 YEAR 1 DAY 1 HOUR 1 MINUTE' 一年零一天零一小时一分
  14. INTERVAL '1 DECADES' 一个十年

注意:

INTERVAL '1 YEAR' 并不是365天或366天,而是12个月。 INTERVAL '1 MONTH' 并不是28天或29天或31天,而是30天。

Geometry

WKT

WKT格式是一种文本格式,用于描述二维和三维几何对象的空间特征。 WKT是“Well-Known Text”的缩写,是一种开放的国际标准。 WKT格式包括一些基本的几何对象,例如点、线、多边形和圆形,以及一些复合对象,例如多边形集合和几何对象集合。

语法

<geometry tag> <wkt data>

<geometry tag> ::= POINT | LINESTRING | POLYGON | MULTIPOINT |
MULTILINESTRING | MULTIPOLYGON | GEOMETRYCOLLECTION

<wkt data> ::= <point> | <linestring> | <polygon> | <multipoint> |
<multilinestring> | <multipolygon> | <geometrycollection>
几何对象语法描述
POINT (<x1> <y1>)
线LINESTRING (<x1> <y1>, <x2> <y2>, ...)
多边形POLYGON ((<x1> <y1>, <x2> <y2>, ...))
多点MULTIPOINT (<x1> <y1>, <x2> <y2>, ...)
多线MULTILINESTRING ((<x1> <y1>, <x2> <y2>, ...), (<x1> <y1>, <x2> <y2>, ...))
多多边形MULTIPOLYGON (((<x1> <y1>, <x2> <y2>, ...)), ((<x1> <y1>, <x2> <y2>, ...)))
几何对象集合GEOMETRYCOLLECTION (<geometry tag1> <wkt data1>, <geometry tag2> <wkt data2>, ...)

示例

几何对象图片示例
POINT (30 10)
线LINESTRING (30 10, 10 30, 40 40)
多边形POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
多点MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
多线MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))
多面MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))
几何对象集合GEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20, 10 40), POLYGON ((40 40, 20 45, 45 30, 40 40)))

创建数据库

语法

CREATE DATABASE [IF NOT EXISTS] db_name [WITH db_options];

db_options:
db_option ...

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
| PRECISION {'ms' | 'us' | 'ns'}
}

参数说明

  1. TTL:表示数据文件保存的时间,默认无限,用带单位的数据表示,支持天(d),小时(h),分钟(m),如TTL 10d,TTL 50h,TTL 100m,当不带单位时,默认为天,如TTL 30。
  2. SHARD:表示数据分片个数,默认为1。
  3. VNODE_DURATION:表示数据在shard中的时间范围,默认为365天,同样使用带单位的数据来表示,数据意义与TTL的value一致。
  4. REPLICA:表示数据在集群中的副本数,默认为1(副本数不大于分布式数据节点的数量)。
  5. PRECISION:数据库的时间戳精度,ms 表示毫秒,us 表示微秒,ns 表示纳秒,默认为ns纳秒。

示例

> CREATE DATABASE oceanic_station;
Query took 0.062 seconds.

查看数据库

语法

SHOW DATABASES;

示例

+-----------------+
| database_name |
+-----------------+
| oceanic_station |
| public |
+-----------------+

使用数据库

如果你通过HTTP API来使用数据库, 你可以在url中指定参数 db=database_name 来使用数据库。

在 CnosDB-Cli 中,可以使用如下命令切换数据库:

\c dbname
public ❯ \c oceanic_station
oceanic_station ❯

删除数据库

语法

DROP DATABASE [IF EXISTS] db_name [AFTER '7d'];

删除数据库会将指定database的所有table数据及元数据全部删除。

当不带AFTER时,会立即删除;

当带AFTER时,为延迟删除,会在指定时间后删除,时间支持天(d),小时(h),分钟(m),如10d,50h,100m,当不带单位时,默认为天。延迟删除期间数据库不可见且不可用。

语法

RECOVER DATABASE [IF EXISTS] db_name;

取消延迟删除,数据库恢复正常状态。

注意:只有对延迟删除的资源,且在延迟删除期间,执行RECOVER语句才有作用。

示例

DROP DATABASE oceanic_station AFTER7d’;

RECOVER DATABASE oceanic_station;

DROP DATABASE oceanic_station;

修改数据库参数

语法

ALTER DATABASE db_name [alter_db_options]

alter_db_options:
SET db_option

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
}

示例

ALTER DATABASE oceanic_station SET TTL '30d';

查看数据库参数

语法

DESCRIBE DATABASE dbname;

示例

DESCRIBE DATABASE oceanic_station;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

表操作

创建表

可以使用 CREATE TABLE 创建表。

CnosDB 支持创建普通表和外部表。

创建普通(TSKV)表

语法

CREATE TABLE [IF NOT EXISTS] tb_name
(field_definition [, field_definition ] ... [, TAGS(tg_name [, tg_name] ...)]);

field_definition:
column_name data_type [field_codec_type]

field_codec_type:
CODEC(code_type)

使用说明

  1. 创建表时无需创建timestamp列,系统自动添加名为"time"的timestamp列。
  2. 各列的名字需要互不相同。
  3. 创建表时如果不指定压缩算法,则使用系统默认的压缩算法。
  4. 目前各种类型支持的压缩算法如下,每种类型第一个为默认指定的算法,NULL表示不使用压缩算法。
  • BIGINT/BIGINT UNSIGNED: DELTA, QUANTILE, SDT, DEADBAND, NULL
  • DOUBLE: GORILLA, QUANTILE, SDT, DEADBAND, NULL
  • STRING: SNAPPY, ZSTD, GZIP, BZIP, ZLIB, NULL
  • BOOLEAN: BITPACK, NULL

想了解更多有关压缩算法的内容可以看压缩算法详情

示例

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.033 seconds.

创建外部表

语法

-- Column definitions can not be specified for PARQUET files

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tb_name
( field_definition [, field_definition] ... ) tb_option;

field_definition:
column_name data_type [ NULL ]

tb_option: {
STORED AS { PARQUET | NDJSON | CSV | AVRO }
| [ WITH HEADER ROW ]
| [ DELIMITER 'a_single_char' ]
| [ PARTITIONED BY ( column_name, [, ... ] ) ]
| LOCATION '/path/to/file'
}

使用说明

  1. 外部表并不存在数据库中,而是将一个操作系统文件当作数据库普通表来访问。
  2. 数据均是只读的,不能执行 DML 操作,也不能建索引。

参数说明

  1. STORED AS:表示文件以什么格式储存,目前支持 PARQUET,NDJSON,CSV,AVRO格式。
  2. WITH HEADER ROW:仅在csv文件格式下生效,表示带有csv表头。
  3. DELIMITER:仅在csv格式下生效,表示列数据的分隔符。
  4. PARTITIONED BY:使用创建表时指定的列来进行分区。
  5. LOCATION:表示关联的文件的位置。

示例

CREATE EXTERNAL TABLE cpu (
cpu_hz DECIMAL(10,6) NOT NULL,
temp DOUBLE NOT NULL,
version_num BIGINT NOT NULL,
is_old BOOLEAN NOT NULL,
weight DECIMAL(12,7) NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION 'tests/data/csv/cpu.csv';
Query took 0.031 seconds.

删除表

语法

DROP TABLE [ IF EXISTS ] tb_name;

示例

DROP TABLE IF EXISTS air;
Query took 0.033 seconds.

显示当前数据库所有表

语法

SHOW TABLES;

示例

SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+

查看表的模式

外部表和普通表的模式都可以使用该语句查看。

语法

DESCRIBE DATABASE table_name;

示例

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+

修改表

说明

目前我们支持修改普通表。

  1. 添加列:添加 field,tag 列。
  2. 删除列:删除 field 列,当删除列导致删除某一行的最后一个 field 值时,我们认为这一行没有值,SELECT 时将不显示这一行。
  3. 修改列:修改列定义,目前支持修改列名、修改列的压缩算法。

语法

ALTER TABLE tb_name alter_table_option;

alter_table_option: {
ADD TAG col_name
| ADD FIELD col_name [data_type] [CODEC(code_type)]
| ALTER col_name SET CODEC(code_type)
| DROP col_name
| RENAME COLUMN col_name TO new_col_name
}

不支持修改 time 列名。 避免在执行 rename tag column 时执行写入操作,可能会引起 series 冲突。

示例

ALTER TABLE air ADD TAG height;
ALTER TABLE air ADD FIELD humidity DOUBLE CODEC(DEFAULT);
ALTER TABLE air ALTER humidity SET CODEC(QUANTILE);
ALTER TABLE air DROP humidity;
ALTER TABLE air RENAME COLUMN height to height_v2;

插入数据

CnosDB支持两种数据写入的方法,一种是使用INSERT INTO语句,另一种是使用HTTP API的write接口,写入Line Protocol格式数据。

本页面只展示INSERT相关的语法。

INSERT

语法

INSERT [INTO] tb_name [ ( column_name [, ...] ) ] VALUES (  const [, ...] ) [, ...] | query; 

说明

CnosDB 要求插入的数据列必须要有时间戳,且VALUES列表必须为常量。 如果有列没有被选中,那么值为NULL

注意

时间列不能为NULL,Tag列和Field列可以为NULL

例如INSERT INTO air (TIME, station, visibility) VALUES(1666132800000000000, NULL, NULL)

如果 VALUES 列表需要表达式,请使用INSERT SELECT语法。

插入一条记录

TIME 列的数据既可以用时间字符串表示,也可以用数字类型的时间戳表示,请注意。

示例

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69, 77);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.044 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 06:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.032 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

注意

关于时区表示,请参考Timestamp

插入多条记录

VALUES关键字后面可以跟多个列表,用,分隔开。

例子

INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 05:40:00', 'XiaoMaiDao', 55, 68, 76),
('2022-10-19 04:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 2 |
+------+
Query took 0.037 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

插入查询结果(INSERT SELECT)

你还可以使用 INSERT SELECT语法,向表中插入查询的数据。

示例

CREATE TABLE air_visibility (
visibility DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT air_visibility (TIME, station, visibility) 
SELECT TIME, station, visibility FROM air;
+------+
| rows |
+------+
| 4 |
+------+
Query took 0.045 seconds.
SELECT * FROM air_visibility;
+----------------------------+------------+------------+
| time | station | visibility |
+----------------------------+------------+------------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 |
+----------------------------+------------+------------+

插入重复数据

CnosDB的存储引擎可以看成一种KV存储,其中Timestamp 和 Tags 构成了 KEY,Fields 构成了一系列Value。

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
INSERT INTO air (TIME, station, visibility, temperature) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69);

上面语句相当于插入了如下k-v对:

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')56
(1666165200290401000, 'XiaoMaiDao')69

结果为

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 56.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+

当同一field字段出现重复的k-v 对时,会发生覆盖。

INSERT INTO air (TIME, station, visibility) VALUES
(1666165200290401000, 'XiaoMaiDao', 66);

相当于插入

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')66

key 为 (1666165200290401000, 'XiaoMaiDao') 的 visibility-value 发生变化,变为 66。

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+
INSERT INTO air (TIME, station, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 77);

相当于插入

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')77
select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | 77.0 |
+----------------------------+------------+------------+-------------+----------+

更新数据

更新 tag 列

语法

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause

assignment clause :
tag_name = value_expression

使用说明

  1. CnosDB支持单独更新单个或多个 tag 列值,不支持同时更新 tag 列及 field 列。
  2. CnosDB支持更新 tag 列值为 NULL。
  3. value_expression 只能为编译期能确定值的表达式,如:'常量'、'1 + 2'、'CAST('1999-12-31 00:00:00.000' as timestamp)' 等。
  4. where_clause 中不能包含 field 列或 time 列,且不能为空,如果想更新表中所有数据,需要使用 'where true',这代表你接受在表数据量比较大时带来的性能问题。
  5. 不支持修改成已经存在 series(所有的 tag 列值构成 series)。
  6. 避免在写入数据时执行更新 tag 操作,可能会引起 series 冲突。

示例

update air set station = 'ShangHai' where station = 'LianYunGang';

更新 field 列

语法

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause

assignment clause :
field_name = value_expression

使用说明

  1. CnosDB支持单独更新单个或多个 field 列值,不支持同时更新 tag 列及 field 列。

示例

update air set pressure = pressure + 100 where pressure = 68 and time < '2023-01-14T16:03:00';

删除数据

按 tag 和 时间列过滤并删除数据。

语法

DELETE FROM table_name where_clause

使用说明

  1. where_clause 中仅能包含 tag 列和 time 列,不能包含 field 列。

示例

delete from air where station = 'LianYunGang' and time < '2023-01-14T16:03:00';

查询数据

CnosDB SQL 的灵感来自于 DataFusion ,我们支持DataFusion的大部分SQL语法。

注意:为了查询能更高效,没有指定排序的查询,每次行顺序都不一定相同,如果需要按字段排序的话,请参看ORDER BY子句。

示例数据

为了进一步学习CnosDB,本节将提供示例数据供您下载,并教您如何将数据导入数据库。后面章节中引用的数据源都来自此示例数据。

下载数据

如果在 cnosdb-cli 中,请输入\q退出

在shell中执行以下命令将在本地生成一个名称为oceanic_station的Line Protocol格式的数据文件

curl -o oceanic_station.txt https://dl.cnosdb.com/sample/oceanic_station.txt

导入数据

  • 启动CLI

    cnosdb-cli
  • 创建数据库

    create database oceanic_station;
  • 切换到指定数据库

    \c oceanic_station
  • 导入数据

    执行\w指令,\w后面为数据文件的绝对路径或相对cnosdb-cli的工作路径。

    \w oceanic_station.txt

SQL 语法

语法

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ OFFSET count ]
[ LIMIT { count | ALL } ];

-- from_item
-- 1.
tb_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
-- 2.
from_item join_type from_item
{ ON join_condition | USING ( join_column [, ...] ) }

-- join_type
[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
FULL [ OUTER ] JOIN
CROSS JOIN

-- grouping_element
()

SELECT 子句

SELECT *

通配符 * 可以用于代指全部列。

示例

SELECT * FROM air;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

ALL/DISTINCT

语法

SELECT [ ALL | DISTINCT ] select_expression [, ...];

SELECT关键字后可以使用DISTINCT去掉重复字段,只返回去重后的值。 使用ALL会返回字段中所有重复的值。不指定此选项时,默认值为ALL

示例

SELECT DISTINCT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
+-------------+------------+
SELECT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+------------+

别名

可以用 AS 关键字为列表达式或表取别名。

为列表达式取别名

语法

expression [ [ AS ] column_alias ]

示例

SELECT station s, visibility AS v FROM air;
+-------------+----+
| s | v |
+-------------+----+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+----+

为表取别名

你也可以用关键字AS为表取别名。

语法

FROM tb_name [AS] alias_name

示例

SELECT a.visibility, s.temperature
FROM air AS a JOIN sea s ON a.temperature = s.temperature limit 10;
+------------+-------------+
| visibility | temperature |
+------------+-------------+
| 67 | 62 |
| 50 | 78 |
| 50 | 78 |
| 65 | 79 |
+------------+-------------+

SELECT限制

  • 如果SELECT子句仅包含Tag列,相当于 SELECT DISTINCT Tag列

    示例

    -- station是Tag列,temperature是Field列
    SELECT station, temperature FROM air;
    +-------------+-------------+
    | station | temperature |
    +-------------+-------------+
    | XiaoMaiDao | 69 |
    | XiaoMaiDao | 78 |
    | XiaoMaiDao | 62 |
    | XiaoMaiDao | 79 |
    | XiaoMaiDao | 53 |
    | XiaoMaiDao | 72 |
    | XiaoMaiDao | 71 |
    | LianYunGang | 69 |
    | LianYunGang | 80 |
    | LianYunGang | 74 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    +-------------+-------------+
    -- station 是Tag列
    SELECT station FROM air;
    +-------------+
    | station |
    +-------------+
    | XiaoMaiDao |
    | LianYunGang |
    +-------------+

LIMIT 子句

语法

LIMIT n

限制返回结果集的行数为n,n必须非负。

示例

SELECT *
FROM air LIMIT 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
+---------------------+-------------+------------+-------------+----------+

OFFSET 子句

语法

OFFSET m

返回的结果集跳过 m 条记录, 默认 m=0。

示例

SELECT *
FROM air OFFSET 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

OFFSET可以和LIMIT语句配合使用,用于指定跳过的行数,格式为LIMIT n OFFSET m。 其中:LIMIT n控制输出m行数据,OFFSET m表示在开始返回数据之前跳过的行数。 OFFSET 0与省略OFFSET子句效果相同。

示例

SELECT *
FROM air LIMIT 3 OFFSET 3;
+---------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
+---------------------+------------+------------+-------------+----------+

WITH 子句

语法

WITH cte AS cte_query_definiton [, ...] query

可选。WITH子句包含一个或多个常用的表达式CTE(Common Table Expression)。 CTE充当当前运行环境中的临时表,您可以在之后的查询中引用该表。CTE使用规则如下:

  • 在同一WITH子句中的CTE必须具有唯一的名字。
  • 在WITH子句中定义的CTE仅对在其后定义的同一WITH子句中的其他CTE可以使用。 假设A是子句中的第一个CTE,B是子句中的第二个CTE:

示例

SELECT station, avg 
FROM ( SELECT station, AVG(visibility) AS avg
FROM air
GROUP BY station) AS x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+
WITH x AS 
(SELECT station, AVG(visibility) AS avg FROM air GROUP BY station)
SELECT station, avg
FROM x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+

UNION 子句

UNION 子句用于合并多个 SELECT 语句的分析结果。

语法

select_clause_set_left
[ UNION | UNION ALL| EXCEPT | INTERSECT]
select_clause_set_right
[sort_list_columns] [limit_clause]

UNION 会对合并的结果集去重。 UNION ALL 保留合并的结果集中相同的数据。 EXCEPT 会作两个结果集的差,从左查询中返回右查询没有找到的所有非重复值。 INTERSECT 返回两个结果集的交集(即两个查询都返回的所有非重复值)。

注意

UNION 内每个 SELECT 子句必须拥有相同数量的列,对应列的数据类型相同。

示例

  • UNION ALL

    SELECT visibility FROM air WHERE temperature < 60
    UNION ALL
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    +------------+
  • UNION

    SELECT visibility FROM air WHERE temperature < 60
    UNION
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • EXCEPT

    SELECT visibility FROM air
    EXCEPT
    SELECT visibility FROM air WHERE temperature < 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • INTERSECT

    SELECT visibility FROM air
    INTERSECT
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+

ORDER BY 子句

按引用的表达式对结果进行排序。默认情况使用升序 (ASC)。通过在 ORDER BY 的表达式后添加 DESC 按降序排序。

示例

SELECT * FROM air ORDER BY temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY temperature DESC;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY station, temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
+---------------------+-------------+------------+-------------+----------+

表达式

表达式是符号和运算符的一种组合,CnosDB 将处理该组合以获得单个数据值。 简单表达式可以是一个常量、变量、列或标量函数。 可以用运算符将两个或更多的简单表达式联接起来组成复杂表达式。

语法

<expresion> :: = { 
constant
| [ table_name. ] column
| scalar_function
| ( expression )
| expression { binary_operator } expression
| case_when_expression
| window_function | aggregate_function
}

常量

表示单个特定数据值的符号。 详细内容请阅览常量

示例

select 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+

标量函数

详细内容请阅览函数

单目运算符

运算符说明
NOT如果子表达式为true,则整个表达式false,如果整个表达式为false,则整个表达式为true
IS NULL如果子表达式为null,则整个表达式为true
IS NOT NULL如果子表达式为null,则整个表达式为false

二元运算符

二元运算符和两个表达式组合在一起,形成一个新的表达式。

支持的二元运算符有:

运算符说明
+数字类型表达式相加
-数字类型表达式相减
*数字类型表达式相乘
/数字类型表达式相除
%整数类型表达式取余
||字符串类型表达式拼接
=比较表达式是否相等
!=、 <>比较表达式是否不相等
<比较表达式是否小于
<=比较表达式是否小于等于
>比较表达式是否大于
>=比较表达式是否大于等于
AND先求左表达式的值,如果为true,计算右表达式的值,都为true为true
OR先求左表达式的值,如果为false,计算右表达式的值,都为false为false
LIKE判断左表达式是否符合右表达式的模式

BETWEEN AND 表达式

语法

expr BETWEEN expr AND expr

示例

SELECT DISTINCT PRESSURE FROM AIR WHERE PRESSURE BETWEEN 50 AND 60;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 60 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

注意:BETWEEN x AND y 会列出x和y之间的数,包括x和y

IN 表达式

IN 操作符判断列表中是否有值与表达式相等。

示例

SELECT station, temperature, visibility FROM air WHERE temperature  IN (68, 69);
+-------------+-------------+------------+
| station | temperature | visibility |
+-------------+-------------+------------+
| XiaoMaiDao | 69 | 56 |
| LianYunGang | 69 | 78 |
+-------------+-------------+------------+

注意

IN 列表暂时只支持常量

CASE WHEN 表达式

当表达式需要按照不同情况得不同的值时,可以使用 CASE WHEN 表达式。

语法

CASE
( WHEN expression THEN result1 [, ...] )
ELSE result
END;

示例

SELECT DISTINCT 
CASE WHEN PRESSURE >= 60 THEN 50
ELSE PRESSURE
END PRESSURE
FROM AIR;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

运算符优先级

如果一个复杂表达式有多个运算符,则运算符优先级将确定操作序列。 执行顺序可能对结果值有明显的影响。

运算符的优先级别如下表中所示。 在较低级别的运算符之前先对较高级别的运算符进行求值。 在下表中,1 代表最高级别,8 代表最低级别。

级别运算符
1*(乘)、/(除)、%(取模)
2+(正)、-(负)、+(加)、+(串联)、-(减)
3=、>=、<=、<>、!=、>、<(比较运算符)
4NOT
5AND
6BETWEEN、IN、LIKE、OR

SHOW

语法

SHOW {DATABASES | TABLES | QUERIES}

显示所有数据库,或显示所有表, 或正在执行的SQL。

示例

SHOW DATABASES;
+---------------+
| database_name |
+---------------+
| public |
+---------------+
SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+
SHOW QUERIES;
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| query_id | query_text | user_id | user_name | tenant_id | tenant_name | state | duration |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| 36 | select * FROM air join sea ON air.temperature = sea.temperature; | 108709109615072923019194003831375742761 | root | 13215126763611749424716665303609634152 | cnosdb | SCHEDULING | 12.693345666 |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+

关于 SHOW QUERIES 语句的详细信息,可以在系统表 QUERIES 查看。

SHOW SERIES

返回指定表的series

语法

SHOW SERIES [ON database_name] FROM table_name [WHERE expr] [order_by_clause] [limit_clause] 

示例

SHOW SERIES FROM air WHERE station = 'XiaoMaiDao' ORDER BY key LIMIT 1;
+------------------------+
| key |
+------------------------+
| air,station=XiaoMaiDao |
+------------------------+

注意

WEHER子句中的表达式列,只能是tag列或time列,ORDER BY 子句的表达式只能是 key

SHOW TAG VALUES

语法

SHOW TAG VALUES [ON database_name] FROM table_name WITH KEY [<operator> "<tag_key>" | [[NOT] IN ("<tag_key1>", ..)]] [WHERE expr] [order_by_clause] [limit_clause];

operator 包括 =, !=

示例

SHOW TAG VALUES FROM air WITH KEY = "station" WHERE station = 'XiaoMaiDao' ORDER BY key, value LIMIT 1;
+---------+------------+
| key | value |
+---------+------------+
| station | XiaoMaiDao |
+---------+------------+
SHOW TAG VALUES FROM air WITH KEY NOT IN ("station1");
+---------+-------------+
| key | value |
+---------+-------------+
| station | XiaoMaiDao |
| station | LianYunGang |
+---------+-------------+

EXPLAIN

语法

EXPLAIN [ ANALYZE ] [ VERBOSE ] <statement>;

说明

EXPLAIN 语句仅用于显示查询的执行计划,而不执行查询。

EXPLAIN ANALYZE 执行查询,并显示查询的执行计划。

EXPLAIN ANALYZE VERBOSE 执行查询,并显示更详细的执行计划,包括读取的行数等。

示例

EXPLAIN SELECT station, temperature, visibility FROM air;
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #air.station, #air.temperature, #air.visibility |
| | TableScan: air projection=[station, visibility, temperature] |
| physical_plan | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature] |
| | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE VERBOSE SELECT station, temperature, visibility FROM air;
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=26.75µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=13.225875ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_point_to_record_batch=3.918163ms, elapsed_field_scan=3.992161ms, elapsed_series_scan=1.657416ms] |
| | |
| Plan with Full Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.865034 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=26.75µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.864225 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=13.225875ms, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13, elapsed_point_to_record_batch{partition=0}=3.918163ms, elapsed_field_scan{partition=0}=3.992161ms, elapsed_series_scan{partition=0}=1.657416ms] |
| | |
| Output Rows | 13 |
| Duration | 13.307708ms |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

DESCRIBE

语法

DESCRIBE {DATABASE db_name | TABLE tb_name};

描述数据库的参数,描述表的模式。

示例

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+
DESCRIBE DATABASE public;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

Join 子句

CnosDB支持 INNER JOINLEFT OUTER JOINRIGHT OUTER JOINFULL OUTER JOIN

目前暂不支持 CROSS JOIN

INNER JOIN

关键字 JOININNER JOIN 定义了一个只显示两个表中匹配的行的连接。

示例

SELECT * FROM air INNER JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

LEFT JOIN

用关键字 LEFT JOINLEFT OUTER JOIN 定义一个左连接。该连接包括左表中的所有行,如果右表没有匹配行,则连接的右侧为空值。

示例

SELECT * FROM air LEFT JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

RIGHT JOIN

用关键字 RIGHT JOINRIGHT OUTER JOIN 定义一个右连接。该连接包括右表中的所有行,如果左表没有匹配行,则连接的左侧为空值。

示例

SELECT * FROM air RIGHT JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

FULL JOIN

关键字 FULL JOINFULL OUTER JOIN 定义了一个全连接,实际上它是 LEFT OUTER JOIN 和 RIGHT OUTER JOIN 的联合。 它会显示连接左侧和右侧的所有行,并将在连接的任一侧不匹配的地方产生空值。

示例

SELECT * FROM air FULL JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

GROUP BY 子句

GROUP BY 子句必须在 WHERE 子句(如果有的话)的条件之后,ORDER BY 子句(如果有的话)之前。

示例

SELECT station, AVG(temperature) 
FROM air
GROUP BY station;
+-------------+----------------------+
| station | AVG(air.temperature) |
+-------------+----------------------+
| XiaoMaiDao | 69.14285714285714 |
| LianYunGang | 72.16666666666667 |
+-------------+----------------------+

HAVING 子句

语法

group_by_clause 
[ HAVING condition ];

在 SELECT 查询中,HAVING 子句必须紧随 GROUP BY 子句,并出现在 ORDER BY 子句(如果有的话)之前。

HAVING 与 WHERE 的区别

HAVING 在 GROUP BY 子句之后使你能够指定过滤条件,从而控制查询结果中哪些组可以出现在最终结果里面。

WHERE 在 GROUP BY 子句之前对被选择的列施加条件,而 HAVING 子句则对 GROUP BY 子句所产生的组施加条件。

示例

SELECT station, AVG(temperature)  AS avg_t 
FROM air
GROUP BY station
HAVING avg_t > 70;
+-------------+-------------------+
| station | avg_t |
+-------------+-------------------+
| LianYunGang | 72.16666666666667 |
+-------------+-------------------+

复杂的分组操作

CnosDB 提供了 ROLLUPCUBE 等复杂分组操作,使您能以不同的方式操作查询结果

ROLLUP

您可以在单个查询中使用 ROLLUP 选项来生成多个分组集。

ROLLUP 假定输入列之间存在层次结构。

如果你的group by 子句是:

语法

SELECT ...
FROM ...
GROUP BY ROLLUP(column_1,column_2);

它与如下的语句等同:

语法

SELECT ...
FROM ...


UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1

UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1, column2;

ROLLUP 生成在此层次结构中有意义的所有分组集。 每次 column_1 的值发生变化时,它都会生成一个小计行;

因此,我们经常在报告中使用 ROLLUP 来生成小计和总计。 ROLLUP 中列的顺序非常重要。

示例

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY ROLLUP (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| | | 70.53846153846153 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
+-------------+------------+----------------------+

CUBE

与 ROLLUP 类似,CUBE 是 GROUP BY 子句的扩展。 它允许您为 GROUP BY 子句中指定的分组列的所有组合生成小计。

CUBE为指定表达式集的每个可能组合创建分组集。首先会对(A、B、C)进行group by,

然后依次是(A、B),(A、C),(A),(B、C),(B),(C),最后对全表进行group by操作。

SELECT ... 
FROM ...
GROUP BY CUBE (column1, column2);

等价于:

SELECT ...
FROM ...
GROUP BY column1

UNION ALL

SELECT ...
FROM ...
GROUP BY column2

UNION ALL

SELECT ...
FROM ...
GROUP BY column1, column2

UNION ALL

SELECT ...
FROM ...
;

示例

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY CUBE (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
| | 56 | 69 |
| | 50 | 78 |
| | 67 | 66 |
| | 65 | 79 |
| | 53 | 53 |
| | 74 | 72 |
| | 71 | 71 |
| | 78 | 69 |
| | 79 | 80 |
| | 59 | 72 |
| | 80 | 70 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| | | 70.53846153846153 |
+-------------+------------+----------------------+

聚合函数

一般聚合函数

COUNT

语法

COUNT(x)

功能:返回选定元素中检索过的行的数目。

包含DISTINCT关键字,会对去重后的结果计数。

COUNT() 和 COUNT(literal value) 是等价的,如果sql的投影中仅含有 `/literal value`,则sql会被重写为 COUNT(time)。

COUNT(tag) 等价于 COUNT(DISTINCT tag)。

COUNT(field) 返回非NULL值的个数。

参数类型:任意

返回类型:BIGINT

示例

SELECT COUNT(*) FROM air;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 13 |
+-----------------+
SELECT COUNT(temperature) FROM air;
+------------------------+
| COUNT(air.temperature) |
+------------------------+
| 13 |
+------------------------+
SELECT COUNT(DISTINCT temperature) FROM air;
+---------------------------------+
| COUNT(DISTINCT air.temperature) |
+---------------------------------+
| 10 |
+---------------------------------+

SUM

语法

SUM(NUMERICS)

功能:返回从选定元素计算出的总和值。

参数类型:数值类型。

返回类型:与参数类型相同。

示例

SELECT SUM(temperature) FROM air;
+----------------------+
| SUM(air.temperature) |
+----------------------+
| 917 |
+----------------------+

MIN

语法

MIN(STRING | NUMERICS | TIMESTAMP)

功能:返回选定元素中最小值。

参数类型:数值类型或STRING或TIMESTAMP。

返回类型:与参数类型相同。

示例

 SELECT MIN(time), MIN(station), MIN(temperature) FROM air;
+---------------------+------------------+----------------------+
| MIN(air.time) | MIN(air.station) | MIN(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:21:00 | LianYunGang | 53 |
+---------------------+------------------+----------------------+

MAX

语法

MAX(STRINGS | NUMERICS | TIMESTAMPS)

功能:返回选定元素中最大值。

参数类型:数值类型或STRING或TIMESTAMP。

返回类型:与参数类型相同。

示例

SELECT MAX(time), MAX(station), MAX(temperature) FROM air;
+---------------------+------------------+----------------------+
| MAX(air.time) | MAX(air.station) | MAX(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:39:00 | XiaoMaiDao | 80 |
+---------------------+------------------+----------------------+

AVG

语法

AVG(NUMERICS)

功能:返回选定元素的平均值。

参数类型:数值类型。

返回类型:数值类型。

示例

SELECT AVG(temperature) FROM air;
+----------------------+
| AVG(air.temperature) |
+----------------------+
| 70.53846153846153 |
+----------------------+

ARRAY_AGG

语法

ARRAY_AGG(expr)

功能:返回一个数组,该数组由选定元素的所有值组成,元素类型必须相同。

参数类型:任意。

返回类型:参数类型的数组。

示例

SELECT ARRAY_AGG(temperature) from air;
+------------------------------------------------------+
| ARRAYAGG(air.temperature) |
+------------------------------------------------------+
| [69, 78, 62, 79, 53, 72, 71, 69, 80, 74, 70, 70, 70] |
+------------------------------------------------------+

注意:该聚合函数结果,无法以CSV格式返回。

FIRST

first(time,  value)

获取一列按另一列排序的第一个值。

参数:

  • time: Timestamp

  • value: any

返回值: 同value类型相同

示例

select first(time, pressure) from air;
+------------------------------+
| first(air.time,air.pressure) |
+------------------------------+
| 63.0 |
+------------------------------+

LAST

last(time,  value)

获取一列按另一列排序的最后一个值。

参数:

  • time: Timestamp

  • value: any

返回值: 同value类型相同

示例

select last(time, pressure) from air;
+-----------------------------+
| last(air.time,air.pressure) |
+-----------------------------+
| 55.0 |
+-----------------------------+

MODE

mode(value)

计算一列的众数。

参数: value: any

返回值: 同value类型相同

示例

select mode(pressure) from air;
+--------------------+
| mode(air.pressure) |
+--------------------+
| 69.0 |
+--------------------+

INCREASE

increase(time, value order by time)

计算 value 在时间序列中的增量

参数: value 数字类型

返回值: 同value类型相同

示例

CREATE DATABASE IF NOT EXISTS TEST_INCREASE;
ALTER DATABASE TEST_INCREASE SET TTL '100000D';
CREATE TABLE IF NOT EXISTS test_increase.test_increase(f0 BIGINT, TAGS(t0));
INSERT INTO test_increase.test_increase(time, t0, f0)
VALUES
('1999-12-31 00:00:00.000', 'a', 1),
('1999-12-31 00:00:00.005', 'a', 2),
('1999-12-31 00:00:00.010', 'a', 3),
('1999-12-31 00:00:00.015', 'a', 4),
('1999-12-31 00:00:00.020', 'a', 5),
('1999-12-31 00:00:00.025', 'a', 6),
('1999-12-31 00:00:00.030', 'a', 7),
('1999-12-31 00:00:00.035', 'a', 8),
('1999-12-31 00:00:00.000', 'b', 1),
('1999-12-31 00:00:00.005', 'b', 2),
('1999-12-31 00:00:00.010', 'b', 3),
('1999-12-31 00:00:00.015', 'b', 4),
('1999-12-31 00:00:00.020', 'b', 1),
('1999-12-31 00:00:00.025', 'b', 2),
('1999-12-31 00:00:00.030', 'b', 3),
('1999-12-31 00:00:00.035', 'b', 4);
SELECT t0, INCREASE(time, f0 ORDER BY time) AS increase
FROM test_increase.test_increase GROUP BY t0 ORDER BY t0;
+----+----------+
| t0 | increase |
+----+----------+
| a | 7 |
| b | 7 |
+----+----------+

统计聚合函数

VAR | VAR_SAMP

语法

VAR(NUMERICS)

功能:计算给定样本的方差

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT VAR(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 51.43589743589741 |
+---------------------------+

VAR_POP

语法

VAR_POP(NUMERICS)

功能:计算总体方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT VAR_POP(temperature) FROM air;
+------------------------------+
| VARIANCEPOP(air.temperature) |
+------------------------------+
| 47.47928994082838 |
+------------------------------+

STDDEV | STDDEV_SAMP

STDDEV(NUMERICS)

功能:计算样本标准差。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT STDDEV(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 7.1718824192744135 |
+-------------------------+

STDDEV_POP

语法

STDDEV_POP(NUMERICS)

功能:计算出的总体标准差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT STDDEV_POP(temperature) FROM air;
+----------------------------+
| STDDEVPOP(air.temperature) |
+----------------------------+
| 6.890521746633442 |
+----------------------------+

COVAR | COVAR_SAMP

语法

COVAR(NUMERICS, NUMERICS)

功能:返回样本的协方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT COVAR(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| -5.121794871794841 |
+------------------------------------------+

COVAR_POP

语法

COVAR_POP(NUMERICS, NUMERICS)

功能:返回组中数字对的总体协方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT COVAR_POP(temperature, pressure) FROM air;
+---------------------------------------------+
| COVARIANCEPOP(air.temperature,air.pressure) |
+---------------------------------------------+
| -4.727810650887546 |
+---------------------------------------------+

CORR

语法

CORR**(NUMERICS, NUMERICS)

功能:返回表示一组数字对之间的关联情况的皮尔逊系数。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT CORR(temperature, pressure) FROM air;
+-------------------------------------------+
| CORRELATION(air.temperature,air.pressure) |
+-------------------------------------------+
| -0.07955796767766017 |
+-------------------------------------------+

近似聚合函数

APPROX_DISTINCT

语法

APPROX_DISTINCT(x)

功能:返回不同输入值的近似值(HyperLogLog)。

参数类型:STRING

返回类型:BIGINT

示例

SELECT APPROX_DISTINCT(station) FROM air;
+-----------------------------+
| APPROXDISTINCT(air.station) |
+-----------------------------+
| 2 |
+-----------------------------+

APPROX_PERCENTILE_CONT

语法

APPROX_PERCENTILE_CONT(x, p)  

功能:返回输入值x的近似百分位(TDigest),p是百分位,是0到1(包括1)之间的64位浮点数。

参数类型:x为数值类型,p为DOUBLE类型。

返回类型:DOUBLE

示例

SELECT APPROX_PERCENTILE_CONT(temperature, 0.1) FROM air;
+----------------------------------------------------+
| APPROXPERCENTILECONT(air.temperature,Float64(0.1)) |
+----------------------------------------------------+
| 60.4 |
+----------------------------------------------------+

APPROX_PERCENTILE_CONT_WITH_WEIGHT

语法

APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, w, p)  

功能:x返回带权重的输入值的近似百分比(TDigest),其中w是权重列表达式,p是0到1(包括)之间的浮点64。

APPROX_PERCENTILE_CONT(x, p) 相当于 APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, 1, p)。

参数类型:x,w为数值类型,p为DOUBLE类型。

返回类型:DOUBLE

示例

SELECT APPROX_PERCENTILE_CONT_WITH_WEIGHT(temperature,2, 0.1) FROM air;
+-----------------------------------------------------------------------+
| APPROXPERCENTILECONTWITHWEIGHT(air.temperature,Int64(2),Float64(0.1)) |
+-----------------------------------------------------------------------+
| 54.35 |
+-----------------------------------------------------------------------+

APPROX_MEDIAN(NUMERICS)

语法

APPROX_MEDIAN(NUMERICS)

功能:返回输入值的近似中值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT APPROX_MEDIAN(temperature) FROM air;
+-------------------------------+
| APPROXMEDIAN(air.temperature) |
+-------------------------------+
| 70 |
+-------------------------------+

SAMPLE

语法

SAMPLE(<column_key>, <N>)

功能:从给定的列 column_key 中随机选择 N 条记录。

参数类型

  • column_key:任意类型

  • N:整数类型

返回类型:数组

示例

select sample(visibility, 5) from air;
+--------------------------------------+
| sample(air.visibility,Int64(5)) |
+--------------------------------------+
| [65.0, 74.0, 76.0, 77.0, 72.0, 77.0] |
+--------------------------------------+

ASAP_SMOOTH

asap_smooth(time, value, resolution order by time)

ASAP smoothing算法旨在创建人类可读的图形,保留输入数据的粗糙形状和较大趋势,同时最小化点之间的局部方差。 采用 (Timestamp,value)对,将它们标准化为目标时间间隔,并返回 ASAP smooth 值。

参数:

  • time: Timestamp

  • value: Double

  • resolution: Bigint ,要返回的大概点数( (Timestamp, value) 对),确定结果图的水平分辨率。

返回值: TimeVector

Struct {
time: List[Timestamp], -- 毫秒级
value: List[Double],
resolution: Int Unsigned,
}

示例

select asap_smooth(time, pressure, 10) from air group by date_trunc('month', time);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| asap_smooth(air.time,air.pressure,Int64(10)) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10} |
| {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10} |
| {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

两阶段聚合函数

stats_agg

对二维数据执行线性回归分析,例如计算相关系数和协方差。 并且还可以分别计算每个维度的常见统计数据,例如平均值和标准差。 stats_agg 提供了与sum, count, corr, covar_pop 等聚合函数相同的功能, 适用于一条SQL中,包含多个分析函数的场景。

注意: 两列都不为NULL时才纳入聚合。

stats_agg

stats_agg(y, x)

功能:进行统计聚合。

参数类型

  • y: double 类型
  • x: double 类型

返回类型:结构体类型

{ 
n: bigint, -- count
sx: double, -- sum(x)- sum(x)
sx2: double, -- sum((x-sx/n)^2) (sum of squares)
sx3: double, -- sum((x-sx/n)^3)
sx4: double, -- sum((x-sx/n)^4)
sy: double, -- sum(y)
sy2: double, -- sum((y-sy/n)^2) (sum of squares)
sy3: double, -- sum((y-sy/n)^3)
sy4: double, -- sum((y-sy/n)^4)
sxy: double, -- sum((x-sx/n)*(y-sy/n)) (sum of products)
}

示例

create table if not exists test_stats(x bigint, y bigint);
alter database public set ttl '1000000d';
insert into test_stats(time, x, y) values
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 1, 4),
(5, 1, 5),
(6, 2, 1),
(7, 2, 2),
(8, 2, 3),
(9, 2, 4),
(10, 2, 5);
select stats_agg(y, x) from test_stats;
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stats_agg(test_stats.y,test_stats.x) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {n: 10, sx: 15.0, sx2: 2.5, sx3: -2.7755575615628914e-16, sx4: 0.6249999999999999, sy: 30.0, sy2: 20.0, sy3: -1.7763568394002505e-15, sy4: 68.0, sxy: 0.0} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+

num_vals

计算二维统计聚合后的数据行数。

返回类型:BIGINT UNSIGNED

select num_vals(stats_agg(y, x)) from test_stats;
+------------------------------------------------+
| num_vals(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------+
| 10 |
+------------------------------------------------+

average_y, average_x

计算二维统计聚合后指定维度的平均值。

返回类型:DOUBLE

select average_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| average_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 1.5 |
+-------------------------------------------------+

sum_y, sum_x

计算二维统计聚合后指定维度的和,方式为 population。

返回类型:DOUBLE

select sum_x(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| sum_x(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 15.0 |
+---------------------------------------------+

stddev_samp_y, stddev_samp_x

计算二维统计聚合后指定维度的标准差,方式为 sample。

返回类型:DOUBLE

select stddev_samp_x(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------+
| stddev_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------+
| 0.5270462766947299 |
+-----------------------------------------------------+

stddev_pop_y, stddev_pop_x

计算二维统计聚合后指定维度的标准差,方式为 population。

返回类型:DOUBLE

select stddev_pop_x(stats_agg(y, x)) from test_stats;
+----------------------------------------------------+
| stddev_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+----------------------------------------------------+
| 0.5 |
+----------------------------------------------------+

var_samp_y, var_samp_x

计算二维统计聚合后指定维度的方差,方式为 sample。

返回类型:DOUBLE

select var_samp_x(stats_agg(y, x)) from test_stats;
+--------------------------------------------------+
| var_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+--------------------------------------------------+
| 0.2777777777777778 |
+--------------------------------------------------+

var_pop_y, var_pop_x

计算二维统计聚合后指定维度的方差,方式为 population。

返回类型:DOUBLE

select var_pop_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| var_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 0.25 |
+-------------------------------------------------+

skewness_samp_y, skewness_samp_x

计算二维统计聚合后指定维度的偏度值,方式为 sample。

返回类型:DOUBLE

select skewness_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| skewness_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| -2.1065000811460203e-16 |
+-------------------------------------------------------+

skewness_pop_y, skewness_pop_x

计算二维统计聚合后指定维度的偏度值,方式为 population。

返回类型:DOUBLE

select skewness_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| skewness_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| -2.220446049250313e-16 |
+------------------------------------------------------+

kurtosis_samp_y, kurtosis_samp_x

计算二维统计聚合后指定维度的峰度值,方式为 sample。

返回类型:DOUBLE

select kurtosis_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| kurtosis_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.8999999999999998 |
+-------------------------------------------------------+

kurtosis_pop_y, kurtosis_pop_x

计算二维统计聚合后指定维度的峰度值,方式为 population。

返回类型:DOUBLE

select kurtosis_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| kurtosis_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.9999999999999998 |
+------------------------------------------------------+

correlation

计算二维统计聚合后的相关。

返回类型:DOUBLE

select correlation(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| correlation(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| 0.0 |
+---------------------------------------------------+

covariance_samp, covariance_pop

计算二维统计聚合后的协方差。

返回类型:DOUBLE

select covariance_samp(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| covariance_samp(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.0 |
+-------------------------------------------------------+
select covariance_pop(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| covariance_pop(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.0 |
+------------------------------------------------------+

determination_coeff

计算二维统计聚合后的决定系数。

返回类型:DOUBLE

select determination_coeff(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------------+
| determination_coeff(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------------+
| 0.0 |
+-----------------------------------------------------------+

slope

根据二维统计聚合,计算线性拟合线的斜率。

返回类型:DOUBLE

select slope(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| slope(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 0.0 |
+---------------------------------------------+

intercept

计算二维统计聚合后y的截距。

返回类型:DOUBLE

select intercept(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| intercept(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 3.0 |
+-------------------------------------------------+

x_intercept

计算二维统计聚合后x的截距。

返回类型:DOUBLE

select x_intercept(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| x_intercept(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| -inf |
+---------------------------------------------------+

gauge_agg

分析Gauge数据。与Counter不同,Gauge可以减少也可以增加。

gauge_agg

gauge_agg(time, value)

这是分析 Gauge 数据的第一步。使用 gauge_agg 创建中间聚合数据, 接下来其他函数使用中间聚合数据进行计算。

参数

  • time: Timestamp

  • value: DOUBLE

返回值

Struct {
first: Struct {
ts: Timestamp,
value: Double
},
second: Struct {
ts: Timestamp,
value: Double
},
penultimate: Struct {
ts: Timestamp,
val: Double
},
last: Struct {
ts: Timestamp,
val: Double
},
num_elements: Bigint Unsingned
}

示例:

select gauge_agg(time, pressure) from air group by date_trunc('month', time);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gauge_agg(air.time,air.pressure) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {first: {ts: 2023-03-01T00:00:00, val: 54.0}, second: {ts: 2023-03-01T00:00:00, val: 59.0}, penultimate: {ts: 2023-03-14T16:00:00, val: 55.0}, last: {ts: 2023-03-14T16:00:00, val: 80.0}, num_elements: 13122} |
| {first: {ts: 2023-01-14T16:00:00, val: 63.0}, second: {ts: 2023-01-14T16:00:00, val: 68.0}, penultimate: {ts: 2023-01-31T23:57:00, val: 77.0}, last: {ts: 2023-01-31T23:57:00, val: 54.0}, num_elements: 16640} |
| {first: {ts: 2023-02-01T00:00:00, val: 54.0}, second: {ts: 2023-02-01T00:00:00, val: 60.0}, penultimate: {ts: 2023-02-28T23:57:00, val: 74.0}, last: {ts: 2023-02-28T23:57:00, val: 59.0}, num_elements: 26880} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

delta

获取一段时间内Gauge的变化。这是简单的增量,通过从第一个值减去最后一个看到的值来计算。

返回值: Double

select delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+-----------------------------------------+
| delta(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------+
| 26.0 |
| -9.0 |
| 5.0 |
+-----------------------------------------+

time_delta

获取持续时间,最后一个Gauge的时间减去第一个Gauge的时间。

返回值: INTERVAL

select time_delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------------------------+
| time_delta(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------------------+
| 0 years 0 mons 13 days 16 hours 0 mins 0.000000000 secs |
| 0 years 0 mons 17 days 7 hours 57 mins 0.000000000 secs |
| 0 years 0 mons 27 days 23 hours 57 mins 0.000000000 secs |
+----------------------------------------------------------+

rate

计算Gauge变化和时间变化的比率。

返回值: Double

单位:

时间单位是ns时,比率单位是 /ns,

时间单位是ms时,比率单位就是 /ms

时间单位是s时,比率单位就是 /s

select rate(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------+
| rate(gauge_agg(air.time,air.pressure)) |
+----------------------------------------+
| 2.2018970189701897e-14 |
| 9.349414325974008e-15 |
| -4.133905465849807e-16 |
+----------------------------------------+

first_time

取得Gauge中最小的时间戳

返回类型: TIMESTAMP

select first_time(gauge_agg(time, pressure)) from air;
+----------------------------------------------+
| first_time(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------+
| 2023-01-14T16:00:00 |
+----------------------------------------------+

last_time

取得Gauge中最大的时间戳

返回类型: TIMESTAMP

select last_time(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| last_time(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 2023-03-14T16:00:00 |
+---------------------------------------------+

first_val

取得Gauge中最小时间戳对应的值

返回类型: gauge_agg 中指定的列的类型

select first_val(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| first_val(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 68.0 |
+---------------------------------------------+

last_val

取得Gauge中最大时间戳对应的值

返回类型: gauge_agg 中指定的列的类型

select last_val(gauge_agg(time, pressure)) from air;
+--------------------------------------------+
| last_val(gauge_agg(air.time,air.pressure)) |
+--------------------------------------------+
| 80.0 |
+--------------------------------------------+

idelta_left

计算Gauge最早的瞬时变化。这等于第二个值减去第一个值。

返回类型:gauge_agg 中指定的列的类型

 select time, station, pressure from air where station = 'XiaoMaiDao' order by time limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 63.0 |
| 2023-01-14T16:03:00 | XiaoMaiDao | 58.0 |
| 2023-01-14T16:06:00 | XiaoMaiDao | 65.0 |
| 2023-01-14T16:09:00 | XiaoMaiDao | 52.0 |
+---------------------+------------+----------+
select idelta_left(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+-----------------------------------------------+
| idelta_left(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------------+
| -5.0 |
+-----------------------------------------------+

idelta_right

计算Gauge最晚的瞬时变化。这等于最后一个值值减去倒数第二个值。

返回类型:gauge_agg 中指定的列的类型

select time, station, pressure from air where station = 'XiaoMaiDao' order by time desc limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-03-14T16:00:00 | XiaoMaiDao | 55.0 |
| 2023-03-14T15:57:00 | XiaoMaiDao | 62.0 |
| 2023-03-14T15:54:00 | XiaoMaiDao | 75.0 |
| 2023-03-14T15:51:00 | XiaoMaiDao | 61.0 |
+---------------------+------------+----------+
select idelta_right(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+------------------------------------------------+
| idelta_right(gauge_agg(air.time,air.pressure)) |
+------------------------------------------------+
| -7.0 |
+------------------------------------------------+

compact_state_agg

给定一个在离散状态之间切换的系统或值,

汇总每个状态所花费的时间。

例如,您可以使用compact_state_agg函数来跟踪系统

处于错误、运行或启动状态的时间。

compact_state_agg 设计用于处理相对较少的状态。它可能在行之间状态过多的数据集上表现不佳。

如果您需要跟踪进入和退出每个状态的时间,请使用 state_agg 函数。

如果您需要根据心跳信号跟踪系统的活跃度,请考虑使用 heartbeat_agg 函数。

compact_state_agg

compact_state_agg(ts, state)

统计每个状态所花费的时间,并聚合成StateAggData类型

参数: ts 为时间戳类型,state为任意类型

返回值: StateAggData 类型

Struct {
state_duration: List[
Struct{
state: any,
interval: duration
},
...
]
state_periods: List[
Struct(
state: any,
periods: List[
Struct {
start_time: timestamp,
end_time: timestamp
},
...
]
),
......
]
}

示例:

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select compact_state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| compact_state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}, {state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}], state_periods: []} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

统计某个状态的持续时间,或统计某个状态在某个时间段内的持续时间。

参数

state_agg_data: StateAggData

state: any 与compact_state_agg 的state类型相同。

begin_time: 可选,指定时间段内的开始时间。

interval_time: 可选,指定时间段的持续时间,不指定时,时间段为无穷大。

返回值: INTERVAL 类型

示例:

select duration_in(compact_state_agg(time, state), 'running') from states;
+--------------------------------------------------------------------------+
| duration_in(compact_state_agg(states.time,states.state),Utf8("running")) |
+--------------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+--------------------------------------------------------------------------+

state_agg

给定一个在离散状态之间切换的系统或值,跟踪状态之间的转换。

state_agg

state_agg(ts, state)

统计每个状态所花费的时间,,并聚合成StateAggData类型。

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}], state_periods: [{state: starting, periods: [{start_time: 2020-01-01T10:00:00, end_time: 2020-01-01T10:30:00}, {start_time: 2020-01-03T18:30:00, end_time: 2020-01-03T19:30:00}]}, {state: error, periods: [{start_time: 2020-01-03T16:00:00, end_time: 2020-01-03T18:30:00}]}, {state: running, periods: [{start_time: 2020-01-01T10:30:00, end_time: 2020-01-03T16:00:00}, {start_time: 2020-01-03T19:30:00, end_time: 2020-01-05T12:00:00}]}]} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

统计某个状态的持续时间,或统计某个状态在某个时间段内的持续时间。

参数

state_agg_data: StateAggData

state: any 与compact_state_agg 的state类型相同。

begin_time: 可选,指定时间段内的开始时间。

interval_time: 可选,指定时间段的持续时间,不指定时,时间段为无穷大。

返回值: INTERVAL 类型

示例:

统计 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running') from states;
+------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running")) |
+------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+------------------------------------------------------------------+

统计 从2020-01-01 11:00:00 开始 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00') 
from states;
+----------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00")) |
+----------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 21 hours 30 mins 0.000000000 secs |
+----------------------------------------------------------------------------------------------+

统计 从2020-01-01 11:00:00 开始的四天内 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00', interval '4 day')
from states;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00"),IntervalMonthDayNano("73786976294838206464")) |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 20 hours 30 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------+

state_at

state_at(state_agg_data, ts)

统计某一时刻所处的状态。

参数

  • state_agg_data: StateAggData

  • ts: Timestamp

返回值: any 类型,与state_agg_data 统计的状态类型相同。

select state_at(state_agg(time, state), Timestamp '2020-01-01 10:30:00') from states;
+---------------------------------------------------------------------------+
| state_at(state_agg(states.time,states.state),Utf8("2020-01-01 10:30:00")) |
+---------------------------------------------------------------------------+
| running |
+---------------------------------------------------------------------------+

candlestick_agg

进行金融资产数据分析。该功能使编写涉及 candlestick 财务分析查询变得更容易。

candlestick_agg 能得到股票的开盘价和收盘价,何时最高价。

candlestick_agg 从原始报价数据生成中间聚合数据CandleStackData ,

然后可以对此中间聚合数据使用访问函数和汇总函数。

candlestick_agg

candlestick_agg(time, price, volume)

从原始报价查询生成中间聚合数据 CandleStackData 。

参数:

  • time: Timestamp

  • price: Double 价格

  • volume: Double 交易量

返回值: CandleStackData

Struct {
open: Struct {
ts: Timestamp,
val: Double,
},
close: Struct {
ts: Timestamp,
val: Double,
},
high: Struct {
ts: Timestamp,
val: Double,
},
low: Struct {
ts: Timestamp,
val: Double
},
volume: Struct {
vol: Double,
vwap: Double,
}
}

示例

alter database public set ttl '1000000d';
create table if not exists tick(price bigint ,volume bigint);
insert tick(time, price, volume)
values
('1999-12-31 00:00:00.000', 111, 444),
('1999-12-31 00:00:00.005', 222, 444),
('1999-12-31 00:00:00.010', 333, 222),
('1999-12-31 00:00:10.015', 444, 111),
('1999-12-31 00:00:10.020', 222, 555),
('1999-12-31 00:10:00.025', 333, 555),
('1999-12-31 00:10:00.030', 444, 333),
('1999-12-31 01:00:00.035', 555, 222);
select candlestick_agg(time, price, volume) from tick;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| candlestick_agg(tick.time,tick.price,tick.volume) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {open: {ts: 1999-12-31T00:00:00, val: 111.0}, close: {ts: 1999-12-31T01:00:00.035, val: 555.0}, low: {ts: 1999-12-31T00:00:00, val: 111.0}, high: {ts: 1999-12-31T01:00:00.035, val: 555.0}, volume: {vol: 2886.0, vwap: 850149.0}} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

close

close(candlestick_agg_data)

获取收盘价。

返回值: DOUBLE

示例

select close(candlestick_agg(time, price, volume)) from tick;
+----------------------------------------------------------+
| close(candlestick_agg(tick.time,tick.price,tick.volume)) |
+----------------------------------------------------------+
| 555.0 |
+----------------------------------------------------------+

close_time

close_time(candlestick_agg_data)

获取收盘时间。

返回值: Timestamp

示例

select close_time(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------------+
| close_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+---------------------------------------------------------------+

high

high(candlestick_agg_data)

获取最高价。

返回值: DOUBLE

示例

select high(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| high(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 555.0 |
+---------------------------------------------------------+

high_time

high_time(candlestick_agg_data)

获取最高价所在的时间。

返回值: DOUBLE

示例

select high_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| high_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+--------------------------------------------------------------+

low

low(candlestick_agg_data)

获取最低价。

返回值: DOUBLE

示例

select low(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------+
| low(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------+
| 111.0 |
+--------------------------------------------------------+

low_time

low_time(candlestick_agg_data)

获取最低价所在的时间。

返回值: Timestamp

示例

select low_time(candlestick_agg(time, price, volume)) from tick;
+-------------------------------------------------------------+
| low_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+-------------------------------------------------------------+

open

open(candlestick_agg_data)

获取最低价。

返回值: DOUBLE

示例

select open(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| open(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 111.0 |
+---------------------------------------------------------+

open_time

open_time(candlestick_agg_data)

获取最低价所在的时间。

返回值: Timestamp

示例

select open_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| open_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+--------------------------------------------------------------+

volume

volume(candlestick_agg_data)

获取总共交易量。

返回值: DOUBLE

示例

select volume(candlestick_agg(time, price, volume)) from tick;
+-----------------------------------------------------------+
| volume(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-----------------------------------------------------------+
| 2886.0 |
+-----------------------------------------------------------+

vwap

vwap(candlestick_agg_data)

获取成交量加权平均价格。

返回值: DOUBLE

示例

select vwap(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| vwap(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 294.5769230769231 |
+---------------------------------------------------------+

函数

数学函数

abs(x)

功能:返回x的绝对值。

参数类型:数值类型

返回类型:与函数参数类型一致

示例

SELECT abs(-1);
+----------------+
| abs(Int64(-1)) |
+----------------+
| 1 |
+----------------+

acos(x)

功能:返回x的反余弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT acos(3);
+----------------+
| acos(Int64(3)) |
+----------------+
| NaN |
+----------------+
SELECT acos(0.5);
+--------------------+
| acos(Float64(0.5)) |
+--------------------+
| 1.0471975511965976 |
+--------------------+

asin(x)

功能:返回x的反正弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT asin(0.5);
+--------------------+
| asin(Float64(0.5)) |
+--------------------+
| 0.5235987755982988 |
+--------------------+
SELECT asin(5);
+----------------+
| asin(Int64(5)) |
+----------------+
| NaN |
+----------------+

atan(x)

功能:返回x的反正切值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT atan(5);
+-------------------+
| atan(Int64(5)) |
+-------------------+
| 1.373400766945016 |
+-------------------+

atan2(y,x)

功能:返回y/x的反正切值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT atan2(10, 2);
+---------------------------+
| atan2(Int64(10),Int64(2)) |
+---------------------------+
| 1.3734008 |
+---------------------------+

ceil(x)

功能:向上取整。

参数类型:数值类型

返回类型:BIGINT

示例

SELECT ceil(1.6);
+--------------------+
| ceil(Float64(1.6)) |
+--------------------+
| 2 |
+--------------------+

floor(x)

功能:向下取整

参数类型:数值类型

返回类型:BIGINT

示例

SELECT floor(-3.1);
+----------------------+
| floor(Float64(-3.1)) |
+----------------------+
| -4 |
+----------------------+

cos(x)

功能:返回x的余弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT cos(1);
+--------------------+
| cos(Int64(1)) |
+--------------------+
| 0.5403023058681398 |
+--------------------+

sin(x)

功能:x的正弦值

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT sin(5);
+---------------------+
| sin(Int64(5)) |
+---------------------+
| -0.9589242746631385 |
+---------------------+

exp(x)

功能:返回e的x次方。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT exp(1);
+-------------------+
| exp(Int64(1)) |
+-------------------+
| 2.718281828459045 |
+-------------------+

ln(x)

功能:自然对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT ln(2.718281828459045);
+--------------------------------+
| ln(Float64(2.718281828459045)) |
+--------------------------------+
| 1 |
+--------------------------------+

log(x) | log10(x)

功能:以10为底的对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT log(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+
SELECT log10(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+

log2(x)

功能:以 2 为底的对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT log2(4);
+----------------+
| log2(Int64(4)) |
+----------------+
| 2 |
+----------------+

power(x,y) | pow(x,y)

功能:x的y次方

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT power(2, 3);
+--------------------------+
| power(Int64(2),Int64(3)) |
+--------------------------+
| 8 |
+--------------------------+

round(x)

功能:四舍五入到最接近的整数

参数类型:数值类型

返回类型:BIGINT

示例

SELECT round(3.5);
+---------------------+
| round(Float64(3.5)) |
+---------------------+
| 4 |
+---------------------+

signum(x)

功能:参数的符号 (-1, 0, +1)

参数类型:数值类型

返回类型:BIGINT

示例

SELECT signum(-3);
+-------------------+
| signum(Int64(-3)) |
+-------------------+
| -1 |
+-------------------+

sqrt(x)

功能:x的平方根

参数类型:数值类型

返回类型:与函数参数类型一致

示例

SELECT sqrt(4);
+----------------+
| sqrt(Int64(4)) |
+----------------+
| 2 |
+----------------+

tan(x)

功能:x的正切值

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT tan(1);
+-------------------+
| tan(Int64(1)) |
+-------------------+
| 1.557407724654902 |
+-------------------+

trunc(x)

功能:向零取整

参数类型:数值类型

返回类型:BIGINT

示例

SELECT trunc(-3.9);
+----------------------+
| trunc(Float64(-3.9)) |
+----------------------+
| -3 |
+----------------------+

struct

语法

struct(expr1 [, ...] ) 

功能:创建一个具有指定字段值的 STRUCT。

参数类型:数值类型

注意:struct函数目前功能并不完善


条件函数

coalesce

语法

coalesce(expr[,...exp])

功能:返回其第一个不为空的参数。只有当所有参数都为 null 时才返回 Null。当检索数据以进行显示时,它通常用于将默认值替换为空值。

参数类型:任意

返回类型:第一个不为null的参数类型

示例

SELECT coalesce(temperature, null, station) FROM air;
+--------------------------------------------+
| coalesce(air.temperature,NULL,air.station) |
+--------------------------------------------+
| 69.0 |
| 78.0 |
| 62.0 |
| 79.0 |
| 53.0 |
| 72.0 |
| 71.0 |
| 69.0 |
| 80.0 |
| 74.0 |
| 70.0 |
| 70.0 |
| 70.0 |
+--------------------------------------------+

nullif

语法

nullif(expr1, expr2) 

功能:如果 expr1 等于 expr2,则返回 NULL;否则返回 expr1。

参数类型:expr1,expr2为数值类型,且为带列值的表达式

返回类型:expr1的类型或NULL

示例

SELECT nullif(temperature, 70) FROM air;
+-----------------------------------+
| nullif(air.temperature,Int64(70)) |
+-----------------------------------+
| 69 |
| 78 |
| 62 |
| 79 |
| 53 |
| 72 |
| 71 |
| 69 |
| 80 |
| 74 |
| |
| |
| |
+-----------------------------------+

字符串函数

ascii

语法

ascii(str) 

功能: 将 str 中的第一个字符转换成其ASCII 码后返回。

参数类型:STRING

返回类型:BIGINT

示例

SELECT ascii('abc');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+
SELECT ascii('a');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+

bit_length

语法

bit_length(str) 

功能:返回字符串数据的位长度或二进制数据的位数。

参数类型:STRING

返回类型:BIGINT

示例

SELECT bit_length('abc');
+------------------------+
| bitlength(Utf8("abc")) |
+------------------------+
| 24 |
+------------------------+

btrim

语法

btrim(string [, matching_string ] ) 

功能:函数通过删除前导空格和尾随空格或删除与可选的指定字符串匹配的字符来剪裁字符串。

参数类型:STRING

返回类型: STRING

示例

SELECT btrim('     abc                  ');
+-------------------------------------------+
| btrim(Utf8(" abc ")) |
+-------------------------------------------+
| abc |
+-------------------------------------------+
SELECT btrim('111abc111','1');
+------------------------------------+
| btrim(Utf8("111abc111"),Utf8("1")) |
+------------------------------------+
| abc |
+------------------------------------+

trim

语法

trim(str) 

功能:删除str首尾的空白字符

参数类型:STRING

返回类型:STRING


char_length | character_length

语法

char_length(expr) 

功能:以字符数形式返回指定字符串的长度。

参数类型:STRING

返回类型:BIGINT

示例

SELECT char_length('你好');
+-------------------------------+
| characterlength(Utf8("你好")) |
+-------------------------------+
| 2 |
+-------------------------------+

chr

语法

chr(expr) 

功能:返回位于提供的 UTF-16 码位的字符。

参数类型: BIGINT

返回类型: STRING

示例

SELECT chr(20005);
+-------------------+
| chr(Int64(20005)) |
+-------------------+
| 严 |
+-------------------+

concat

语法

concat(expr1, expr2 [, ...exp] ) 

功能:联接两个或两个以上表达式并返回生成的表达式。

参数类型:STRING

返回类型: STRING

示例

SELECT concat('a', 'b', 'c');
+---------------------------------------+
| concat(Utf8("a"),Utf8("b"),Utf8("c")) |
+---------------------------------------+
| abc |
+---------------------------------------+

concat_ws

语法

concat_ws(sep , expr1 [, ...] ) 

功能:返回由 sep 分隔的串联字符串。

参数类型:STRING

返回类型:STRING

示例

SELECT concat_ws(' ', 'a', 'b', 'c');
+--------------------------------------------------------------+
| concatwithseparator(Utf8(" "),Utf8("a"),Utf8("b"),Utf8("c")) |
+--------------------------------------------------------------+
| a b c |
+--------------------------------------------------------------+

initcap

语法

initcap(expr) 

功能:将参数中每个单词的首字母大写。

参数类型:STRING

返回类型:BIGINT

示例

SELECT initcap('hello world');
+------------------------------+
| initcap(Utf8("hello world")) |
+------------------------------+
| Hello World |
+------------------------------+

left

语法

left(str, len) 

功能:返回 str 中最左边的 len 个字符。

参数类型:str为STRING类型,len为BIGINT类型

返回类型:STRING

示例

SELECT left('abcde', 3);
+------------------------------+
| left(Utf8("abcde"),Int64(3)) |
+------------------------------+
| abc |
+------------------------------+

lpad

语法

lpad(expr, len [, pad] ) 

功能:返回 expr左侧填充了 pad,填充后长度为 len。

参数类型:expr, pad 类型为 STRING, len类型为BIGINT

返回类型:BIGINT

当len为负数时,len表现为0,当len过大,函数执行失败

示例

SELECT lpad('abc', 10, '1');
+---------------------------------------+
| lpad(Utf8("abc"),Int64(10),Utf8("1")) |
+---------------------------------------+
| 1111111abc |
+---------------------------------------+

rpad

语法

rpad(expr, len [, pad] ) 

功能:返回右侧填充了 pad 的 expr,填充后整个字符的长度为 len。

参数类型:expr, pad 类型为 STRING, len类型为BIGINT

返回类型:STRING

示例

SELECT rpad('aaa', 10, 'b');
+---------------------------------------+
| rpad(Utf8("aaa"),Int64(10),Utf8("b")) |
+---------------------------------------+
| aaabbbbbbb |
+---------------------------------------+

lower

语法

lower(expr) 

功能:返回字母小写。

参数类型:STRING

返回类型:STRING

示例

SELECT lower('ABC');
+--------------------+
| lower(Utf8("ABC")) |
+--------------------+
| abc |
+--------------------+

upper

语法

upper(expr)

功能:返回将 expr 的所有字符均更改为大写后的结果。

参数类型:STRING

返回类型:STRING


ltrim

语法

ltrim(str[, trimstr] ) 

功能:返回 str,其中删除了 trimStr 内的前导字符。默认trimestr为空白符

参数类型:STRING

返回类型:STRING

示例

SELECT ltrim('   abc');
+-----------------------+
| ltrim(Utf8(" abc")) |
+-----------------------+
| abc |
+-----------------------+

md5

语法

md5(expr) 

功能:以十六进制字符串形式返回 expr 的 MD5 128 位校验和。

参数类型:STRING

返回类型:STRING

示例

SELECT md5('abc');
+----------------------------------+
| md5(Utf8("abc")) |
+----------------------------------+
| 900150983cd24fb0d6963f7d28e17f72 |
+----------------------------------+

octet_length

语法

octet_length(expr) 

功能:返回字符串数据的字节长度。

参数类型:STRING

返回类型:BIGINT

示例

SELECT octet_length('你好');
+---------------------------+
| octetlength(Utf8("你好")) |
+---------------------------+
| 6 |
+---------------------------+

random

语法

random( [seed] ) 

功能:返回介于 0 和 1 之间的随机值。

参数类型:无

返回类型:DOUBLE

示例

SELECT random();
+---------------------+
| random() |
+---------------------+
| 0.37577771377596325 |
+---------------------+

repeat

语法

repeat(expr, n) 

功能:返回重复 expr, n 次的字符串。

参数类型:expr类型为STRING,n类型为BIGINT

返回类型:BIGINT

示例

SELECT repeat('a', 5);
+----------------------------+
| repeat(Utf8("a"),Int64(5)) |
+----------------------------+
| aaaaa |
+----------------------------+

replace

语法

replace(str, search, replace ) 

功能:将所有 search 项都替换为 replace。

参数类型:STRING

返回类型:BIGINT

示例

SELECT replace('aaa', 'a', 'b');
+------------------------------------------+
| replace(Utf8("aaa"),Utf8("a"),Utf8("b")) |
+------------------------------------------+
| bbb |
+------------------------------------------+

reverse

语法

reverse(expr) 

功能:返回一个反向字符串或一个包含逆序的元素的数组。

参数类型:STRING

返回类型:BIGINT

示例

SELECT reverse('你好');
+-----------------------+
| reverse(Utf8("你好")) |
+-----------------------+
| 好你 |
+-----------------------+

语法

right(str, len) 

功能:返回字符串 str 中最右边的 len 个字符。

参数类型:STRING

返回类型:BIGINT

示例

 SELECT right('aaabbb', 3);
+--------------------------------+
| right(Utf8("aaabbb"),Int64(3)) |
+--------------------------------+
| bbb |
+--------------------------------+

digest

语法

digest(expr, algorithm)

功能:把表达式用给定算法计算散列值

参数类型:expr, algorithm都为STRING类型

algorithm指定计算散列的算法,仅支持 md5, sha224, sha256, sha384, sha512, blake2s, blake2b, blake3

返回类型:BINARY

示例

SELECT digest('abc', 'md5');
+----------------------------------+
| digest(Utf8("abc"),Utf8("md5")) |
+----------------------------------+
| 900150983cd24fb0d6963f7d28e17f72 |
+----------------------------------+

rtrim

语法

rtrim( str [, trimstr] ) 

功能:返回删除了尾随字符trimstr的str,trimstr默认是空白字符。

参数类型:STRING

返回类型:STRING

示例

SELECT rtrim('aaabbb', 'b');
+---------------------------------+
| rtrim(Utf8("aaabbb"),Utf8("b")) |
+---------------------------------+
| aaa |
+---------------------------------+

sha224

语法

sha224(str)

功能:计算字符串的 sha224 散列值

返回类型:BINARY

参数类型:STRING

示例

 SELECT sha224('abc');
+----------------------------------------------------------+
| sha224(Utf8("abc")) |
+----------------------------------------------------------+
| 23097d223405d8228642a477bda255b32aadbce4bda0b3f7e36c9da7 |
+----------------------------------------------------------+

sha256

语法

sha256(str)

功能:计算字符串的 sha256 散列值

返回类型:BINARY

参数类型:STRING

示例

SELECT sha256('abc');
+------------------------------------------------------------------+
| sha256(Utf8("abc")) |
+------------------------------------------------------------------+
| ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad |
+------------------------------------------------------------------+

sha384

语法

sha384(str)

功能:计算字符串的 sha384 散列值

返回类型:BINARY

参数类型:STRING

示例

SELECT sha384('abc');
+--------------------------------------------------------------------------------------------------+
| sha384(Utf8("abc")) |
+--------------------------------------------------------------------------------------------------+
| cb00753f45a35e8bb5a03d699ac65007272c32ab0eded1631a8b605a43ff5bed8086072ba1e7cc2358baeca134c825a7 |
+--------------------------------------------------------------------------------------------------+

sha512

语法

sha512(str)

功能:计算字符串的 sha512 散列值

返回类型:BINARY

参数类型:STRING


split_part

语法

split_part(str, delim, n) 

功能:将 str 按照 delim 做拆分,返回第n部分。

参数类型:str,delim类型为STRING,partNum类型为BIGINT

返回类型:STRING

示例

SELECT split_part('abc|def|ghi', '|', 2);
+---------------------------------------------------+
| splitpart(Utf8("abc|def|ghi"),Utf8("|"),Int64(2)) |
+---------------------------------------------------+
| def |
+---------------------------------------------------+

starts_with

语法

starts_with(expr, startExpr) 

功能:如果 expr 以 startExpr 开头,则返回 true。

参数类型:STRING

返回类型:BOOLEAN

示例

SELECT starts_with('abcdefg', 'abc');
+-----------------------------------------+
| startswith(Utf8("abcdefg"),Utf8("abc")) |
+-----------------------------------------+
| true |
+-----------------------------------------+

strpos

语法

strpos(str, substr ) 

功能:返回子字符串在指定字符串中的位置。

参数类型:STRING

返回类型:BIGINT

示例

SELECT strpos('abcdef', 'def');
+------------------------------------+
| strpos(Utf8("abcdef"),Utf8("def")) |
+------------------------------------+
| 4 |
+------------------------------------+

substr

语法

substr(expr, pos [, len] ) 

功能:返回 expr 的子字符串(从 pos 开始,长度为 len)。

参数类型:expr 类型为STRING,pos,len类型为BIGINT

返回类型:STRING

示例

SELECT substr('abcdef', 4, 3);
+------------------------------------------+
| substr(Utf8("abcdef"),Int64(4),Int64(3)) |
+------------------------------------------+
| def |
+------------------------------------------+

to_hex

语法

to_hex(value)

功能:将十进制数字转换为十六进制表示形式。

参数类型:BIGINT

返回类型:STRING

示例

SELECT to_hex(100);
+-------------------+
| tohex(Int64(100)) |
+-------------------+
| 64 |
+-------------------+

translate

语法

translate(expr, from, to) 

功能:返回一个 expr,其中 from 中的所有字符都替换为 to 中的字符。

参数类型:STRING

返回类型:STRING

示例

SELECT translate('aaabbb', 'bbb', 'ccc');
+---------------------------------------------------+
| translate(Utf8("aaabbb"),Utf8("bbb"),Utf8("ccc")) |
+---------------------------------------------------+
| aaaccc |
+---------------------------------------------------+

时间函数

date_part

语法

date_part(field, expr) 

功能:提取部分日期、时间戳或间隔。

参数类型

field 类型为STRING,且只能是('year', 'quarter', 'month', 'week', 'day', 'doy', 'dow', 'hour', 'minute', ' second')中的一种。

expr 类型为 TIMESTAMP

返回类型:BIGINT

示例

SELECT date_part('hour', TIMESTAMP '2022-11-21T09:18:17');
+----------------------------------------------------+
| datepart(Utf8("hour"),Utf8("2022-11-21T09:18:17")) |
+----------------------------------------------------+
| 9 |
+----------------------------------------------------+

date_trunc

语法

date_trunc(field, expr) 

功能:返回已截断到 field 中指定的单位的值。

参数类型:field 类型为STRING,且只能是('year', 'quarter', 'month', 'week', 'day', 'doy', 'dow', 'hour', 'minute', ' second')中的一种。

expr 类型为TIMESTAMP

示例

SELECT date_trunc('month', TIMESTAMP '2022-11-21T09:18:17');
+------------------------------------------------------+
| datetrunc(Utf8("month"),Utf8("2022-11-21T09:18:17")) |
+------------------------------------------------------+
| 2022-11-01T00:00:00 |
+------------------------------------------------------+

date_bin

语法

date_bin(interval, source, origin)

功能:按 interval 分组成与origin对齐的均匀 bucket, 返回source所在的bucket 的 timestamp。

参数类型

interval 是 STRING 类型,会解析成时间间隔,

source, origin 是 TIMESTAMP 类型。

返回类型:TIMESTAMP

示例

SELECT date_bin(INTERVAL '1' DAY, TIMESTAMP '2022-11-21T09:10:24', TIMESTAMP '2022-11-01T00:00:00');
+------------------------------------------------------------------------------------------------+
| datebin(IntervalDayTime("4294967296"),Utf8("2022-11-21T09:10:24"),Utf8("2022-11-01T00:00:00")) |
+------------------------------------------------------------------------------------------------+
| 2022-11-21T00:00:00 |
+------------------------------------------------------------------------------------------------+

to_timestamp

语法

to_timestamp(expr) 

功能:返回使用可选格式设置强制转换为某个时间戳的 expr。

参数类型:STRING或BIGINT

返回类型:TIMESTAMP类型,精度随参数确定,BIGINT类型的参数,返回的是纳秒级的TIMESTAMP

示例

SELECT to_timestamp('1970-01-01T00:00:00');
+------------------------------------------+
| totimestamp(Utf8("1970-01-01T00:00:00")) |
+------------------------------------------+
| 1970-01-01T00:00:00 |
+------------------------------------------+
SELECT to_timestamp(1);
+-------------------------------+
| totimestamp(Int64(1)) |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+

to_timestamp_millis

语法

to_timestamp_millis(expr) 

功能:转化为毫秒级的时间戳

参数类型:BIGINT 或 STRING

返回类型:毫秒级的TIMESTAMP

示例

SELECT to_timestamp_millis('1970-01-01T00:00:00.00301');
+------------------------------------------------------+
| totimestampmillis(Utf8("1970-01-01T00:00:00.00301")) |
+------------------------------------------------------+
| 1970-01-01T00:00:00.003 |
+------------------------------------------------------+
SELECT to_timestamp_millis(1);
+-----------------------------+
| totimestampmillis(Int64(1)) |
+-----------------------------+
| 1970-01-01T00:00:00.001 |
+-----------------------------+

to_timestamp_micros

语法

to_timestamp_micros(expr) 

功能:转为微秒精度的时间戳。

参数:BIGINT 或 STRING

返回类型: 微秒精度的TIMESTAMP

示例

SELECT to_timestamp_micros(1)
+-----------------------------+
| totimestampmicros(Int64(1)) |
+-----------------------------+
| 1970-01-01T00:00:00.000001 |
+-----------------------------+

to_timestamp_seconds

语法

to_timestamp_seconds(expr) 

功能:转为秒级的时间戳

参数:BIGINT 或 STRING

返回类型:秒精度的TIMESTAMP

示例

SELECT to_timestamp_seconds(1);
+------------------------------+
| totimestampseconds(Int64(1)) |
+------------------------------+
| 1970-01-01T00:00:01 |
+------------------------------+

from_unixtime

语法

from_unixtime(unixTime) 

功能:返回 unixTime。

参数: BIGINT

返回类型: unix时间,秒级

示例

SELECT from_unixtime(1);
+------------------------+
| fromunixtime(Int64(1)) |
+------------------------+
| 1970-01-01T00:00:01 |
+------------------------+

now

语法

now()

功能:返回当前时间戳

返回类型:TIMESTAMP

示例

SELECT now();
+----------------------------------+
| now() |
+----------------------------------+
| 2022-11-21T04:44:19.742107+00:00 |
+----------------------------------+

time_window

语法

time_window(time_expr, window_duration [, slide_duration])

time_column 为 Timestamp 类型

window_duration 为INTERVAL类型,指定时间窗口的窗口大小

slide_duration 为INTERVAL类型,指定时间窗口滑动的大小,不指定此参数时,滑动大小为时间窗口大小,变成滚动窗口

time_window(time, window_duration, slide_duration) 生成的窗口为:

start, end
time, time_column + window_duration
time - slide_duration, time + window_duration - slide_duration
time - 2 * slide_duration, time + window_duration - 2 * slide_duration
...
time - n * slide_duration, time + window_duration - n * slide_duration

且窗口满足 start <= time < end

示例

CREATE TABLE test(a BIGINT, TAGS(b));
INSERT INTO test(time, a, b) VALUES ('2023-04-23T00:00:00.000000Z', 1, 'b');
SELECT time FROM test;
+---------------------+
| time |
+---------------------+
| 2023-04-23T00:00:00 |
+---------------------+
SELECT time_window(time, interval '3 day') FROM test;
+---------------------------------------------------------------------+
| TIME_WINDOW(test.time,IntervalMonthDayNano("55340232221128654848")) |
+---------------------------------------------------------------------+
| {start: 2023-04-23T00:00:00, end: 2023-04-26T00:00:00} |
+---------------------------------------------------------------------+
SELECT time_window(time, interval '5 day', interval '3 day') FROM test;
+------------------------------------------------------------------------------------------------------------------+
| TIME_WINDOW(test.time,IntervalMonthDayNano("92233720368547758080"),IntervalMonthDayNano("55340232221128654848")) |
+------------------------------------------------------------------------------------------------------------------+
| {start: 2023-04-23T00:00:00, end: 2023-04-28T00:00:00} |
| {start: 2023-04-20T00:00:00, end: 2023-04-25T00:00:00} |
+------------------------------------------------------------------------------------------------------------------+

空间函数

CnosDB 提供了 ST_Geometry SQL 系列的空间函数,关于Geometry类型,请查看Geometry数据类型一节。

ST_AsBinary

ST_AsBinary(geometry)

功能:将空间几何对象 Geometry 以 OGC/ISO Well-Known Binary(WKB) 格式返回。

参数类型:Geometry

返回类型: Binary

示例:

SELECT ST_AsBinary('POINT(0 3)');
+--------------------------------------------+
| st_AsBinary(Utf8("POINT(0 3)")) |
+--------------------------------------------+
| 010100000000000000000000000000000000000840 |
+--------------------------------------------+

ST_GeomFromWKB

ST_GeomFromWKB(wkb)

功能:把WKB格式二进制转为Geometry类型

参数类型:Binary

返回类型: Geometry

示例

SELECT ST_GeomFromWKB(ST_AsBinary('POINT(0 3)'))
+-------------------------------------------------+
| st_GeomFromWKB(st_AsBinary(Utf8("POINT(0 3)"))) |
+-------------------------------------------------+
| POINT(0 3) |
+-------------------------------------------------+

ST_Distance

ST_Distance(geometry1, gemometry2)

功能: ST_Distance 返回两个几何体的 2D 投影之间的最小欧氏距离。

参数类型:Binary

返回类型: Double

示例

两点间距离

SELECT ST_Distance('POINT(0 0)', 'LINESTRING (30 10, 10 30, 40 40)');
+----------------------------------------------------+
| st_distance(Utf8("POINT(1 0)"),Utf8("POINT(0 0)")) |
+----------------------------------------------------+
| 1.0 |
+----------------------------------------------------+

点到直线距离

SELECT ST_Distance('POINT(0 0)', 'LINESTRING (30 10, 10 30, 40 40)');
+--------------------------------------------------------------------------+
| st_distance(Utf8("POINT(0 0)"),Utf8("LINESTRING (30 10, 10 30, 40 40)")) |
+--------------------------------------------------------------------------+
| 28.284271247461902 |
+--------------------------------------------------------------------------+

平面和平面之间的距离

SELECT ST_Distance('POLYGON((0 2,1 1,0 -1,0 2))', 'POLYGON((-1 -3,-2 -1,0 -3,-1 -3))');
+--------------------------------------------------------------------------------------------+
| st_distance(Utf8("POLYGON((0 2,1 1,0 -1,0 2))"),Utf8("POLYGON((-1 -3,-2 -1,0 -3,-1 -3))")) |
+--------------------------------------------------------------------------------------------+
| 1.4142135623730951 |
+--------------------------------------------------------------------------------------------+

ST_Area

ST_Area(geometry)

功能:返回几何对象 2D 投影的笛卡尔面积。面积单位与用于表示输入几何体坐标的单位相同。 对于点、线串、多点和多线串,此函数返回 0。 对于几何体集合,它返回集合中几何体的面积之和。

参数类型 Geometry

返回类型: Double

示例

SELECT ST_Area('POLYGON ((40 40, 20 45, 45 30, 40 40))');

+---------------------------------------------------------+ | st_Area(Utf8("POLYGON ((40 40, 20 45, 45 30, 40 40))")) | +---------------------------------------------------------+ | 87.5 | +---------------------------------------------------------+

部分几何图形不支持计算面积,对这些几何体计算面积会返回 0,如:Point、MultiPoint、LineString、MultiLineString、Line。 如果参数内容格式非法,返回值为 NULL。

ST_Equals

ST_Equals(A, B)

功能:比较两个几何体,如果两个几何体完全相同,则返回true

ST_Equals(A, B) 等价于 ST_Within(A, B) && ST_Within(B, A)

参数类型: Geometry

返回类型: Boolean

示例

select ST_Equals(
'LINESTRING(0 0, 10 10)',
'LINESTRING(0 0, 5 5, 10 10)'
) st_equals;
+-----------+
| st_equals |
+-----------+
| true |
+-----------+

ST_Contains

ST_Contains(A, B)

功能:如果几何对象A包含几何对象B,返回True

ST_Contains(A, B) => ST_Within(B, A)

参数:Geometry

返回类型:Boolean

示例

select ST_Contains(
'POLYGON((0 0,0 3,3 0,0 0))',
'POLYGON((0 0,0 1,1 0,0 0))'
) st_contains;
+-------------+
| st_contains |
+-------------+
| true |
+-------------+

ST_Intersects

ST_Intersects(A, B)

功能:如果两个几何对象相交,则返回True

参数: Geometry

返回类型: Boolean

示例

select ST_Intersects(
'LINESTRING(3 2, 7 6)',
'LINESTRING(3 4, 8 4)'
) st_intersects;
+---------------+
| st_intersects |
+---------------+
| true |
+---------------+

ST_Disjoint

ST_Disjoint(A, B)

功能: 如果两个几何对象不相接,返回True

参数: Geometry

返回类型:Boolean

示例

select ST_Disjoint(
'LINESTRING(0 0,-3 -3)',
'LINESTRING(0 1,1 0)'
);
+-------------+
| st_disjoint |
+-------------+
| true |
+-------------+

ST_Within

ST_Within(A, B)

功能:如果给定的Geometry对象A完全在对象B之内,则返回True

参数:Geometry

返回类型:Boolean

示例

select ST_Within(
'POLYGON((1 1, 1 2, 2 2, 2 1, 1 1))',
'POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'
);
+-----------+
| st_within |
+-----------+
| true |
+-----------+

窗口函数

语法

function([...expr] ) OVER ([PARTITION BY expr] [ORDER BY expr] [window_frame]);

function: {aggregate_function | analytic_function}

window_frame: { frame_mode frame_start |
frame_mode BETWEEN frame_start AND frame_end } }
frame_mode: {RANGE | ROWS}

frame_start: {UNBOUNDED PRECEDING | offset_start PRECEDING | CURRENT ROW | offset_start FOLLOWING }

frame_end: {offset_stop PRECEDING | CURRENT ROW | offset_stop FOLLOWING | UNBOUNDED FOLLOWING}

函数类型

排名函数

函数名
DENSE_RANK
PERCENT_RANK
RANK
ROW_NUMBER

其中DENSE_RANK | RANK | PERCENT_RANK 需要 ORDER BY 子句

其中 RANK, DENSE_RANK, ROW_NUMBER 指定window_frame 无效

聚合函数

详见聚合函数

分析窗口函数

函数名
CUME_DIST
LAG
LEAD
NTH_VALUE

PARTITION BY 子句

一个或多个表达式,用于指定一个行分区,如果没有该子句,则分区由所有行组成

ORDER BY 子句

指定行在分区中的顺序

window_frame 子句

frame 是当前分区的一个子集,在分区里进一步细分窗口

指定ROWS,则窗口以行为单位算偏移量

指定RANGE,则必须指定 ORDER BY 子句,窗口以ORDER BY 表达式的值为单位算偏移量

  • UNBOUND PRECEDING ROWS 模式下为分区的第一行,RANGE模式下为分区ORDER BY表达式的第一个值
  • offset PRECEDING ROWS 模式下为当前行的前offset行,RANGE 模式下为当前值的前offset值
  • CURRENT ROW ROWS 模式下为当前行,RANGE模式下为当前值
  • offset FOLLOWING ROWS 模式下为当前行的后offset行,RANGE 模式下为当前值的后offset值
  • UNBOUND FOLLOWING ROWS 模式下为分区的最后一行,RANGE模式下为ORDER BY表达式的最后一个值

使用限制

  • 窗口函数只能出现在SELECT语句中。
  • 窗口函数中不能嵌套使用窗口函数和聚合函数。

窗口函数列表

包括聚合函数

ROW_NUMBER

语法

ROW_NUMBER() OVER([partition_clause] [orderby_clause])

功能:根据窗口分区中的行顺序,为每一行分配唯一的顺序编号(从 1 开始)。

参数类型:无

返回类型:BIGINT

示例

SELECT temperature, station, 
ROW_NUMBER() OVER (PARTITION BY station)
FROM air;
+-------------+-------------+--------------+
| temperature | station | ROW_NUMBER() |
+-------------+-------------+--------------+
| 69 | LianYunGang | 1 |
| 80 | LianYunGang | 2 |
| 74 | LianYunGang | 3 |
| 70 | LianYunGang | 4 |
| 70 | LianYunGang | 5 |
| 70 | LianYunGang | 6 |
| 69 | XiaoMaiDao | 1 |
| 78 | XiaoMaiDao | 2 |
| 62 | XiaoMaiDao | 3 |
| 79 | XiaoMaiDao | 4 |
| 53 | XiaoMaiDao | 5 |
| 72 | XiaoMaiDao | 6 |
| 71 | XiaoMaiDao | 7 |
+-------------+-------------+--------------+

RANK

语法

RANK() OVER([partition_clause] [orderby_clause])

功能:返回某个值相对于分区中所有值的排名(跳跃排名)。

参数类型:无

返回类型:BIGINT

示例

SELECT station, temperature, 
RANK() OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+--------+
| station | temperature | RANK() |
+-------------+-------------+--------+
| LianYunGang | 69 | 1 |
| LianYunGang | 70 | 2 |
| LianYunGang | 70 | 2 |
| LianYunGang | 70 | 2 |
| LianYunGang | 74 | 5 |
| LianYunGang | 80 | 6 |
| XiaoMaiDao | 53 | 1 |
| XiaoMaiDao | 62 | 2 |
| XiaoMaiDao | 69 | 3 |
| XiaoMaiDao | 71 | 4 |
| XiaoMaiDao | 72 | 5 |
| XiaoMaiDao | 78 | 6 |
| XiaoMaiDao | 79 | 7 |
+-------------+-------------+--------+

DENSE_RANK

语法

DENSE_RANK() OVER([partition_clause] [orderby_clause])

功能:返回某个值相对于分区中所有值的排名(连续排名)。

参数类型:无

返回类型:BIGINT

示例

SELECT station, temperature, 
DENSE_RANK() OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+--------------+
| station | temperature | DENSE_RANK() |
+-------------+-------------+--------------+
| LianYunGang | 69 | 1 |
| LianYunGang | 70 | 2 |
| LianYunGang | 70 | 2 |
| LianYunGang | 70 | 2 |
| LianYunGang | 74 | 3 |
| LianYunGang | 80 | 4 |
| XiaoMaiDao | 53 | 1 |
| XiaoMaiDao | 62 | 2 |
| XiaoMaiDao | 69 | 3 |
| XiaoMaiDao | 71 | 4 |
| XiaoMaiDao | 72 | 5 |
| XiaoMaiDao | 78 | 6 |
| XiaoMaiDao | 79 | 7 |
+-------------+-------------+--------------+

PERCENT_RANK

语法

PERCENT_RANK() OVER([partition_clause] [orderby_clause])

功能:计算分区中某个值的百分比排名。

参数类型:无

返回类型:DOUBLE

示例

 SELECT station, temperature, 
PERCENT_RANK() OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+---------------------+
| station | temperature | PERCENT_RANK() |
+-------------+-------------+---------------------+
| LianYunGang | 69 | 0 |
| LianYunGang | 70 | 0.2 |
| LianYunGang | 70 | 0.2 |
| LianYunGang | 70 | 0.2 |
| LianYunGang | 74 | 0.8 |
| LianYunGang | 80 | 1 |
| XiaoMaiDao | 53 | 0 |
| XiaoMaiDao | 62 | 0.16666666666666666 |
| XiaoMaiDao | 69 | 0.3333333333333333 |
| XiaoMaiDao | 71 | 0.5 |
| XiaoMaiDao | 72 | 0.6666666666666666 |
| XiaoMaiDao | 78 | 0.8333333333333334 |
| XiaoMaiDao | 79 | 1 |
+-------------+-------------+---------------------+

CUME_DIST

语法

CUME_DIST() OVER ([partition_clause] [orderby_clause])

功能:返回某个值相对于分区中的所有值的位置。

参数类型:无

返回类型:DOUBLE

示例

SELECT station, temperature, 
CUME_DIST() OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+---------------------+
| station | temperature | CUME_DIST() |
+-------------+-------------+---------------------+
| LianYunGang | 69 | 0.16666666666666666 |
| LianYunGang | 70 | 0.6666666666666666 |
| LianYunGang | 70 | 0.6666666666666666 |
| LianYunGang | 70 | 0.6666666666666666 |
| LianYunGang | 74 | 0.8333333333333334 |
| LianYunGang | 80 | 1 |
| XiaoMaiDao | 53 | 0.14285714285714285 |
| XiaoMaiDao | 62 | 0.2857142857142857 |
| XiaoMaiDao | 69 | 0.42857142857142855 |
| XiaoMaiDao | 71 | 0.5714285714285714 |
| XiaoMaiDao | 72 | 0.7142857142857143 |
| XiaoMaiDao | 78 | 0.8571428571428571 |
| XiaoMaiDao | 79 | 1 |
+-------------+-------------+---------------------+

LAG

语法

lag( expr [, offset [, default] ] ) OVER([partition_clause] orderby_clause)

功能:返回分区中当前行前offset行的expr的值。

参数类型:expr为任意类型,

offset为BIGINT,为负数时,从分区中后offset行返回值,默认为1

default 需要与expr对应的数据类型相同,默认为NULL

返回类型:与expr相同的类型

示例

SELECT station, temperature, 
LAG(temperature, 2) OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+-------------------------------+
| station | temperature | LAG(air.temperature,Int64(2)) |
+-------------+-------------+-------------------------------+
| LianYunGang | 69 | |
| LianYunGang | 70 | |
| LianYunGang | 70 | 69 |
| LianYunGang | 70 | 70 |
| LianYunGang | 74 | 70 |
| LianYunGang | 80 | 70 |
| XiaoMaiDao | 53 | |
| XiaoMaiDao | 62 | |
| XiaoMaiDao | 69 | 53 |
| XiaoMaiDao | 71 | 62 |
| XiaoMaiDao | 72 | 69 |
| XiaoMaiDao | 78 | 71 |
| XiaoMaiDao | 79 | 72 |
+-------------+-------------+-------------------------------+

LEAD

语法

lead(expr [, offset [, default] ] ) OVER ([partition_clause] orderby_clause)

功能:返回分区中当前行后offset行的expr的值。

参数类型:expr为任意类型,

offset为BIGINT,为负数时,从分区中前offset行返回值,默认为1

default需要与expr类型相同,默认是NULL

返回类型:与expr类型相同

示例

SELECT station, temperature, 
LEAD(temperature, 2) OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+--------------------------------+
| station | temperature | LEAD(air.temperature,Int64(2)) |
+-------------+-------------+--------------------------------+
| LianYunGang | 69 | 70 |
| LianYunGang | 70 | 70 |
| LianYunGang | 70 | 74 |
| LianYunGang | 70 | 80 |
| LianYunGang | 74 | |
| LianYunGang | 80 | |
| XiaoMaiDao | 53 | 69 |
| XiaoMaiDao | 62 | 71 |
| XiaoMaiDao | 69 | 72 |
| XiaoMaiDao | 71 | 78 |
| XiaoMaiDao | 72 | 79 |
| XiaoMaiDao | 78 | |
| XiaoMaiDao | 79 | |
+-------------+-------------+--------------------------------+

FIRST_VALUE

语法

FIRST_VALUE(expr) OVER ([partition_clause] [orderby_clause])

功能:返回一组值(该组通常是有序集合)中的第一个值。

参数类型:expr为任意类型,ignore_nulls为BOOLEAN类型,默认值为false

返回类型:与expr类型相同

示例

SELECT station, temperature, 
FIRST_VALUE(temperature) OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+------------------------------+
| station | temperature | FIRST_VALUE(air.temperature) |
+-------------+-------------+------------------------------+
| LianYunGang | 69 | 69 |
| LianYunGang | 70 | 69 |
| LianYunGang | 70 | 69 |
| LianYunGang | 70 | 69 |
| LianYunGang | 74 | 69 |
| LianYunGang | 80 | 69 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 62 | 53 |
| XiaoMaiDao | 69 | 53 |
| XiaoMaiDao | 71 | 53 |
| XiaoMaiDao | 72 | 53 |
| XiaoMaiDao | 78 | 53 |
| XiaoMaiDao | 79 | 53 |
+-------------+-------------+------------------------------+

LAST_VALUE

语法

LAST_VALUE(expr) OVER ([partition_clause] [orderby_clause])

功能:返回当前窗口中的最后一个值。

参数类型:expr为任意类型,ignore_nulls为BOOLEAN类型,默认值为false

返回类型:与expr类型相同

示例

SELECT station, temperature, 
LAST_VALUE(temperature) OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+-----------------------------+
| station | temperature | LAST_VALUE(air.temperature) |
+-------------+-------------+-----------------------------+
| LianYunGang | 69 | 69 |
| LianYunGang | 70 | 70 |
| LianYunGang | 70 | 70 |
| LianYunGang | 70 | 70 |
| LianYunGang | 74 | 74 |
| LianYunGang | 80 | 80 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 62 | 62 |
| XiaoMaiDao | 69 | 69 |
| XiaoMaiDao | 71 | 71 |
| XiaoMaiDao | 72 | 72 |
| XiaoMaiDao | 78 | 78 |
| XiaoMaiDao | 79 | 79 |
+-------------+-------------+-----------------------------+

NTH_VALUE

语法

NTH_VALUE(expr, number) OVER ([partition_clause] [orderby_clause])

功能:返回相对于窗口的第一行的窗口框架的指定行的表达式值。

参数类型:expr为任意类型,number为BIGINT

返回类型:与expr类型相同

示例

SELECT station, temperature, 
NTH_VALUE(temperature, 2) OVER (PARTITION BY station ORDER BY temperature)
FROM air;
+-------------+-------------+-------------------------------------+
| station | temperature | NTH_VALUE(air.temperature,Int64(2)) |
+-------------+-------------+-------------------------------------+
| LianYunGang | 69 | |
| LianYunGang | 70 | 70 |
| LianYunGang | 70 | 70 |
| LianYunGang | 70 | 70 |
| LianYunGang | 74 | 70 |
| LianYunGang | 80 | 70 |
| XiaoMaiDao | 53 | |
| XiaoMaiDao | 62 | 62 |
| XiaoMaiDao | 69 | 62 |
| XiaoMaiDao | 71 | 62 |
| XiaoMaiDao | 72 | 62 |
| XiaoMaiDao | 78 | 62 |
| XiaoMaiDao | 79 | 62 |
+-------------+-------------+-------------------------------------+

高级函数

插值函数

在数据库中,插值是用于处理数据中缺失值的技术。当数据中存在缺失值时,这些技术可以帮助我们估计或推测这些缺失值,从而填补数据的空白部分。

time_window_gapfill

time_window_gapfill 与 time_window 类似,但具有填补缺失数据的功能。interpolate 和 locf 必须与 time_window_gapfill 一起使用,它们控制如何处理缺失值。

time_window_gapfill 必须作为查询或子查询中的顶级表达式使用。例如,不能将 time_window_gapfill 嵌套在另一个函数中,如sum( time_window_gapfill(...))。

语法

  • time_window_gapfill
time_window_gapfill(<time column>, <window interval>[, <sliding interval>[, <start time>]]): <time window struct>

策略

  • interpolate

线性插值的核心思想是假设已知数据点之间的关系是线性的,然后根据已知数据点之间的线性关系来估算未知数据点的值。具体地,线性插值通过使用已知数据点的纵坐标之间的线性变化率来推断未知数据点的纵坐标。

线性插值适用于连续变量的估算,例如在时间序列中填补缺失值或在空间数据中进行插值。然而,线性插值的准确性和适用性取决于数据的特性和实际情况。在某些情况下,数据可能具有非线性关系,或存在其他更适合的插值方法。因此,在应用线性插值之前,需要仔细考虑数据的性质和插值的目的,以确保插值结果合理和准确。

interpolate(<expr>)
  • locf

该函数用于在时间窗口内进行缺失值填补(Gap filling),并使用 "Last Observation Carried Forward"(LOCF)操作来填充缺失值。

"Last Observation Carried Forward"(LOCF)是一种用于填充缺失值的方法,它使用最近的可观察值来进行填充。具体处理方式如下:

  1. 找到缺失值之前的最近一个非缺失值。
  2. 将该非缺失值的值复制到缺失值所在的位置。
  3. 继续向后遍历,直到遇到下一个非缺失值。
  4. 如果遇到下一个非缺失值,则重复步骤1和2,将该非缺失值的值复制到缺失值位置。
  5. 如果在数据序列的末尾仍有缺失值,则最后一个非缺失值将一直被复制,直到填充完所有缺失值。

简而言之,LOCF 方法通过将最近的可观察值复制到缺失值位置来填充缺失值,使得数据在时间上保持连续性。这种方法假设缺失值之后的数据与最后观察到的值相同或非常接近。

需要注意的是,LOCF 方法可能会引入一定的偏差,特别是当缺失值之后的数据发生剧烈变化时。因此,在使用 LOCF 进行缺失值填充时,需要谨慎考虑数据的特点和分析的目的,以确保填补的值能够合理反映实际情况。

locf(<expr>)

示例

---- 准备数据
DROP DATABASE IF EXISTS gapfill_db;
CREATE DATABASE gapfill_db WITH TTL '1000000d';
CREATE TABLE gapfill_db.m2(f0 BIGINT, f1 DOUBLE, TAGS(t0, t1, t2));

INSERT gapfill_db.m2(TIME, f0, f1, t0, t1)
VALUES
('1999-12-31 00:00:00.000', 111, 444, 'tag11', 'tag21'),
('1999-12-31 00:00:00.005', 222, 333, 'tag12', 'tag22'),
('1999-12-31 00:00:00.010', 333, 222, 'tag13', 'tag23'),
('1999-12-31 00:00:00.015', 444, 111, 'tag14', 'tag24'),
('1999-12-31 00:00:00.020', 222, 555, 'tag11', 'tag21'),
('1999-12-31 00:00:00.025', 333, 444, 'tag12', 'tag22'),
('1999-12-31 00:00:00.030', 444, 333, 'tag13', 'tag23'),
('1999-12-31 00:00:00.035', 555, 222, 'tag14', 'tag24');
---- interpolate
SELECT
t0,
time_window_gapfill(time, interval '10 milliseconds') as minute,
interpolate(avg(f1))
from gapfill_db.m2
where time between timestamp '1999-12-31T00:00:00.000Z' and timestamp '1999-12-31T00:00:00.055Z'
group by t0, minute;
+-------+-------------------------+-----------------------+
| t0 | minute | AVG(gapfill_db.m2.f1) |
+-------+-------------------------+-----------------------+
| tag11 | 1999-12-31T00:00:00 | 444.0 |
| tag11 | 1999-12-31T00:00:00.010 | 499.5 |
| tag11 | 1999-12-31T00:00:00.020 | 555.0 |
| tag11 | 1999-12-31T00:00:00.030 | |
| tag11 | 1999-12-31T00:00:00.040 | |
| tag11 | 1999-12-31T00:00:00.050 | |
| tag12 | 1999-12-31T00:00:00 | 333.0 |
| tag12 | 1999-12-31T00:00:00.010 | 388.5 |
| tag12 | 1999-12-31T00:00:00.020 | 444.0 |
| tag12 | 1999-12-31T00:00:00.030 | |
| tag12 | 1999-12-31T00:00:00.040 | |
| tag12 | 1999-12-31T00:00:00.050 | |
| tag13 | 1999-12-31T00:00:00 | |
| tag13 | 1999-12-31T00:00:00.010 | 222.0 |
| tag13 | 1999-12-31T00:00:00.020 | 277.5 |
| tag13 | 1999-12-31T00:00:00.030 | 333.0 |
| tag13 | 1999-12-31T00:00:00.040 | |
| tag13 | 1999-12-31T00:00:00.050 | |
| tag14 | 1999-12-31T00:00:00 | |
| tag14 | 1999-12-31T00:00:00.010 | 111.0 |
| tag14 | 1999-12-31T00:00:00.020 | 166.5 |
| tag14 | 1999-12-31T00:00:00.030 | 222.0 |
| tag14 | 1999-12-31T00:00:00.040 | |
| tag14 | 1999-12-31T00:00:00.050 | |
+-------+-------------------------+-----------------------+
---- locf
SELECT
t0,
time_window_gapfill(time, interval '10 milliseconds') as minute,
locf(avg(f1))
from gapfill_db.m2
where time between timestamp '1999-12-31T00:00:00.000Z' and timestamp '1999-12-31T00:00:00.055Z'
group by t0, minute;
+-------+-------------------------+-----------------------+
| t0 | minute | AVG(gapfill_db.m2.f1) |
+-------+-------------------------+-----------------------+
| tag11 | 1999-12-31T00:00:00 | 444.0 |
| tag11 | 1999-12-31T00:00:00.010 | 444.0 |
| tag11 | 1999-12-31T00:00:00.020 | 555.0 |
| tag11 | 1999-12-31T00:00:00.030 | 555.0 |
| tag11 | 1999-12-31T00:00:00.040 | 555.0 |
| tag11 | 1999-12-31T00:00:00.050 | 555.0 |
| tag12 | 1999-12-31T00:00:00 | 333.0 |
| tag12 | 1999-12-31T00:00:00.010 | 333.0 |
| tag12 | 1999-12-31T00:00:00.020 | 444.0 |
| tag12 | 1999-12-31T00:00:00.030 | 444.0 |
| tag12 | 1999-12-31T00:00:00.040 | 444.0 |
| tag12 | 1999-12-31T00:00:00.050 | 444.0 |
| tag13 | 1999-12-31T00:00:00 | |
| tag13 | 1999-12-31T00:00:00.010 | 222.0 |
| tag13 | 1999-12-31T00:00:00.020 | 222.0 |
| tag13 | 1999-12-31T00:00:00.030 | 333.0 |
| tag13 | 1999-12-31T00:00:00.040 | 333.0 |
| tag13 | 1999-12-31T00:00:00.050 | 333.0 |
| tag14 | 1999-12-31T00:00:00 | |
| tag14 | 1999-12-31T00:00:00.010 | 111.0 |
| tag14 | 1999-12-31T00:00:00.020 | 111.0 |
| tag14 | 1999-12-31T00:00:00.030 | 222.0 |
| tag14 | 1999-12-31T00:00:00.040 | 222.0 |
| tag14 | 1999-12-31T00:00:00.050 | 222.0 |
+-------+-------------------------+-----------------------+

系统视图

CnosDB 提供了系统视图用来查看集群状态和集群Schema信息。

有两个特殊的数据库存放这些视图:

  • CLUSTER_SCHEMA 关于数据库集群
  • INFORMATION_SCHEMA 关于租户信息

CLUSTER_SCHEMA

该数据库属于整个集群,只有管理员可以访问。

数据库中包含有关集群的元数据信息,例如租户信息,用户信息。

TENANTS

该视图可用于查询整个集群的所有租户信息。

视图定义

字段数据类型描述
TENANT_NAMESTRING租户名称
TENANT_OPTIONSSTRING租户配置,json形式的字符串

示例

SELECT * FROM cluster_schema.tenants;
+-------------+---------------------------------------------------+
| tenant_name | tenant_options |
+-------------+---------------------------------------------------+
| cnosdb | {"comment":"system tenant","limiter_config":null} |
+-------------+---------------------------------------------------+

USERS

视图定义

该视图可以查询整个集群的所有用户信息。

字段数据类型描述
USER_NAMESTRING用户名称
IS_ADMINBOOLEAN是否为系统管理员
USER_OPTIONSSTRING用户配置,JSON形式的字符串

示例

SELECT * FROM cluster_schema.users;
+-----------+----------+-------------------------------------------------------------------------------------------------+
| user_name | is_admin | user_options |
+-----------+----------+-------------------------------------------------------------------------------------------------+
| root | true | {"password":"*****","must_change_password":true,"rsa_public_key":null,"comment":"system admin"} |
+-----------+----------+-------------------------------------------------------------------------------------------------+

INFORMATION_SCHEMA

该数据库属于某个租户,在创建Tenant时,自动创建该DB,对租户下的所有成员可见。

DATABASES

该视图存放租户下数据库的信息。

视图定义

字段名称数据类型描述
TENANT_NAMESTRING数据库所属的租户名
DATABASE_NAMESTRING数据库名称
TTLSTRING表示数据文件保存的时间
SHARDBIGINT UNSIGNED表示数据分片个数
VNODE_DURATIONSTRING表示数据在SHARD中的时间范围
PREPLICABIGINT UNSIGNED表示数据在集群中的副本数
PERCISIONSTRING表示数据库的时间精度

示例

SELECT * FROM information_schema.databases;
+-------------+---------------+----------+-------+----------------+---------+-----------+
| tenant_name | database_name | ttl | shard | vnode_duration | replica | percision |
+-------------+---------------+----------+-------+----------------+---------+-----------+
| cnosdb | public | 365 Days | 1 | 365 Days | 1 | NS |
+-------------+---------------+----------+-------+----------------+---------+-----------+

TABLES

该视图存放租户下所有表的信息。

视图定义

字段名称数据类型描述
TABLE_TENANTSTRING表所属的租户
TABLE_DATABASESTRING表所属的数据库
TABLE_NAMESTRING表名
TABLE_TYPESTRING表是基础表,还是视图
TABLE_ENGINESTRING表存储引擎,目前支持外部表和内部tskv表
TABLE_OPTIONSTRING内容为JSON字符串,记录表的所有参数

示例

SELECT * FROM information_schema.tables;
+--------------+----------------+------------+------------+--------------+---------------+
| table_tenant | table_database | table_name | table_type | table_engine | table_options |
+--------------+----------------+------------+------------+--------------+---------------+
| cnosdb | public | wind | BASE TABLE | TSKV | TODO |
| cnosdb | public | air | BASE TABLE | TSKV | TODO |
| cnosdb | public | sea | BASE TABLE | TSKV | TODO |
+--------------+----------------+------------+------------+--------------+---------------+

COLUMNS

该视图存放租户下所有列的定义。

视图定义

字段名称数据类型描述
TABLE_TENANTSTRING表所属的租户
TABLE_DATABASESTRING表所属的数据库
TABLE_NAMESTRING表所属的表名
COLUMN_NAMESTRING列名
ORDINAL_POSITIONSTRING列在表中的顺序位置
COLUMN_TYPESTRING列的类型,tskv表独有的,支持 TIME、TAG、FIELD,通常字段为FIELD类型
IS_NULLABLESTRING如果列可能包含NULL,则为"YES",否则为"NO"
DATA_TYPESTRING列的数据类型
COMPRESSION_CODECSTRING列使用的压缩算法

示例

SELECT * FROM information_schema.columns;
+-------------+---------------+------------+-------------+-------------+------------------+----------------+-------------+-----------+-------------------+
| tenant_name | database_name | table_name | column_name | column_type | ordinal_position | column_default | is_nullable | data_type | compression_codec |
+-------------+---------------+------------+-------------+-------------+------------------+----------------+-------------+-----------+-------------------+
| cnosdb | public | wind | time | TIME | 0 | NULL | false | TIMESTAMP | DEFAULT |
| cnosdb | public | wind | station | TAG | 1 | NULL | true | STRING | DEFAULT |
| cnosdb | public | wind | speed | FIELD | 2 | NULL | true | DOUBLE | DEFAULT |
| cnosdb | public | wind | direction | FIELD | 3 | NULL | true | DOUBLE | DEFAULT |
| cnosdb | public | air | time | TIME | 0 | NULL | false | TIMESTAMP | DEFAULT |
| cnosdb | public | air | station | TAG | 1 | NULL | true | STRING | DEFAULT |
| cnosdb | public | air | visibility | FIELD | 2 | NULL | true | DOUBLE | DEFAULT |
| cnosdb | public | air | temperature | FIELD | 3 | NULL | true | DOUBLE | DEFAULT |
| cnosdb | public | air | pressure | FIELD | 4 | NULL | true | DOUBLE | DEFAULT |
| cnosdb | public | sea | time | TIME | 0 | NULL | false | TIMESTAMP | DEFAULT |
| cnosdb | public | sea | station | TAG | 1 | NULL | true | STRING | DEFAULT |
| cnosdb | public | sea | temperature | FIELD | 2 | NULL | true | DOUBLE | DEFAULT |
+-------------+---------------+------------+-------------+-------------+------------------+----------------+-------------+-----------+-------------------+

ENABLED_ROLES

此视图展示当前用户在当前租户下的角色信息。

视图定义

字段数据类型描述
ROLE_NAMESTRING角色名称

示例

SELECT * FROM information_schema.enabled_roles;
+-----------+
| role_name |
+-----------+
| owner |
+-----------+

ROLES

此视图展示当前租户下所有可用的角色(包含系统角色和自定义角色)。 此视图只对当前租户的Owner可见。

视图定义

字段数据类型描述
ROLE_NAMESTRING租户下的角色名称
ROLE_TYPESTRING角色类型,自定义角色或系统角色
INHERIT_ROLESTRING自定义角色继承的系统角色名称,如果是系统角色则为NULL

示例

SELECT * FROM information_schema.roles;
+-----------+-----------+--------------+
| role_name | role_type | inherit_role |
+-----------+-----------+--------------+
| owner | system | |
| member | system | |
+-----------+-----------+--------------+

DATABASE_PRIVILEGES

视图定义

此视图展示所在租户下所有已被授予给指定角色的作用在db上的权限。 此视图的所有记录对当前租户的Owner可见。 对于非Owner成员,只展示对应角色的记录。

字段数据类型描述
TENANT_NAMESTRING被授予权限的数据库所属的租户名称
DATABASE_NAMESTRING被授予权限的数据库名称
PRIVILEGE_TYPESTRING被授予的权限类型,READ/WRITE/ALL
ROLE_NAMESTRING被授予权限的角色名称

示例

CREATE ROLE rrr INHERIT member;
GRANT READ ON DATABASE air TO ROLE rrr;
SELECT * FROM information_schema.database_privileges;
+-------------+---------------+----------------+-----------+
| tenant_name | database_name | privilege_type | role_name |
+-------------+---------------+----------------+-----------+
| cnosdb | air | Read | rrr |
+-------------+---------------+----------------+-----------+

MEMBERS

此视图展示所在租户下的成员信息。

此视图的所有记录对当前租户的所有成员可见。

视图定义

字段数据类型描述
USER_NAMESTRING租户下的用户成员名称
ROLE_NAMESTRING成员的角色名称

示例

SELECT * FROM information_schema.members;
+-----------+-----------+
| user_name | role_name |
+-----------+-----------+
| root | owner |
+-----------+-----------+

QUERIES(INFORMATION_SCHEMA)

此视图展示SQL语句实时快照,用于实时监控SQL作业。

此视图的所有记录对当前租户的owner可见。

对于非Owner成员,只展示当前成员提交的SQL。

视图定义

字段数据类型描述
QUERY_IDSTRINGSQL语句的ID
QUERY_TEXTSTRINGSQL语句的内容
USER_IDSTRING提交SQL的用户ID
USER_NAMESTRING提交SQL的用户名称
TENANT_IDSTRING租户ID
TENANT_NAMESTRING租户名称
STATESTRING语句的运行状态,分为ACCEPTING,DISPATCHING,ANALYZING,OPTMIZING,SCHEDULING
DURATIONBIGINT UNSIGNED语句持续运行的时间

示例

SELECT * FROM information_schema.queries;
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| query_id | query_text | user_id | user_name | tenant_id | tenant_name | state | duration |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| 36 | select * FROM air join sea ON air.temperature = sea.temperature; | 108709109615072923019194003831375742761 | root | 13215126763611749424716665303609634152 | cnosdb | SCHEDULING | 12.693345666 |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+

SHOW QUERIES

你还可以使用 SHOW QUERIES 语句来查看正在执行的SQL语句, 该语句这是对QUERIES视图的包装。

示例

SHOW QUERIES;
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| query_id | query_text | user_id | user_name | tenant_id | tenant_name | state | duration |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| 36 | select * FROM air join sea ON air.temperature = sea.temperature; | 108709109615072923019194003831375742761 | root | 13215126763611749424716665303609634152 | cnosdb | SCHEDULING | 12.693345666 |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+

USAGE_SCHEMA

该数据库,属于某个租户,在创建Tenant时,自动创建该DB,对租户下的所有成员可见。

对于普通用户,只能看到 USAGE_SCHEMA 中的表中属于当前用户租户的一部分,

对于系统管理员,能看到 USAGE_SCHEMA 中表的全部。

VNODE_DISK_STORAGE

该视图记录集群中各个 vnode 所占磁盘空间大小,单位Byte。

视图定义

字段数据类型描述
TIMETIMESTAMP记录时间
DATABASESTRINGvnode 所属的数据库
NODE_IDSTRINGdata节点的ID
TENANTSTRINGvnode 所属的租户名称
VNODE_IDSTRINGvnode 的 ID
VALUEBIGINT UNSIGNEDvnode 所占磁盘大小

普通用户只能访问当前用户所在的租户信息。

HTTP_DATA_IN

该视图记录HTTP请求的Body大小

视图定义

字段数据类型描述
TIMETIMESTAMP记录时间
DATABASESTRINGDatabase名称
NODE_IDSTRINGData节点的 ID
TENANTSTRINGDatabase 所属的租户名称
USERSTRING用户名称
HOSTSTRING服务端口
APISTRINGHttp的API
VALUEBIGINT UNSIGNED写入流量的总大小

普通用户只能访问当前用户所在的租户信息。

HTTP_DATA_OUT

该视图记录HTTP回应的Body总大小

视图定义

字段数据类型描述
TIMETIMESTAMP记录时间
DATABASESTRINGDatabase名称
NODE_IDSTRINGData节点的 ID
TENANTSTRINGDatabase 所属的租户名称
USERSTRING用户名称
APISTRINGHttp的API
HOSTSTRING服务端口
VALUEBIGINT UNSIGNED读取流量的总大小

普通用户只能访问当前用户所在的租户信息。

HTTP_QUERIES

该视图记录用户查询DB的次数。

视图定义

字段数据类型描述
TIMETIMESTAMP记录时间
DATABASESTRINGDatabase名称
NODE_IDSTRINGData节点的 ID
TENANTSTRINGDatabase 所属的租户名称
USERSTRING用户名称
APISTRINGHttp的API
HOSTSTRING服务端口
VALUEBIGINT UNSIGNED用户查询次数

普通用户只能访问当前用户所在的租户信息。

HTTP_WRITES

该视图记录用户通过HTTP写入DB的次数。

注意,INSERT 语句是记录在(#ht)

视图定义

字段数据类型描述
TIMETIMESTAMP记录时间
DATABASESTRINGDatabase名称
NODE_IDSTRINGData节点的 ID
TENANTSTRINGDatabase 所属的租户名称
USERSTRING用户名称
HOSTSTRING服务端口
APISTRINGHttp的API
VALUEBIGINT UNSIGNED用户写入次数

普通用户只能访问当前用户所在的租户信息。

COORD_DATA_IN

记录通过Coordinator的接受数据大小

视图定义

字段类型描述
timeTIMESTAMP记录时间
databaseSTRINGDatabase名称
node_idSTRINGData节点的 ID
tenantSTRINGDatabase 所属的租户名称
valueBIGINT UNSIGNED测量值

普通用户只能访问当前用户所在的租户信息。

COORD_DATA_OUT

记录通过Coordinator的输出数据大小

视图定义

字段类型描述
timeTIMESTAMP记录时间
databaseSTRINGDatabase名称
node_idSTRINGData节点的 ID
tenantSTRINGDatabase 所属的租户名称
valueBIGINT UNSIGNED测量值

普通用户只能访问当前用户所在的租户信息。

COORD_QUERIES

记录通过Coordinator的接受数据次数

视图定义

字段类型描述
timeTIMESTAMP记录时间
databaseSTRINGDatabase名称
node_idSTRINGData节点的 ID
tenantSTRINGDatabase 所属的租户名称
valueBIGINT UNSIGNED测量值

普通用户只能访问当前用户所在的租户信息。

COORD_WRITES

记录通过Coordinator的输出数据次数

视图定义

字段类型描述
timeTIMESTAMP记录时间
databaseSTRINGDatabase名称
node_idSTRINGData节点的 ID
tenantSTRINGDatabase 所属的租户名称
valueBIGINT UNSIGNED测量值

普通用户只能访问当前用户所在的租户信息。

示例

SELECT * FROM usage_schema.http_data_in ORDER BY time DESC LIMIT 2;
+----------------------------+--------------+--------------+---------+--------+------+-------+-----------+
| time | api | host | node_id | tenant | user | value | database |
+----------------------------+--------------+--------------+---------+--------+------+-------+-----------+
| 2023-10-18T08:41:09.948999 | api/v1/write | 0.0.0.0:8902 | 1001 | cnosdb | root | 144 | sqlancer2 |
| 2023-10-18T08:41:09.948995 | api/v1/write | 0.0.0.0:8902 | 1001 | cnosdb | root | 251 | sqlancer1 |
+----------------------------+--------------+--------------+---------+--------+------+-------+-----------+

创建流表

创建流表,需要一个表作为source表,流表暂不支持 ALTER

语法

CREATE STREAM TABLE [IF NOT EXISTS] table_name(field_definition [, field_definition] ...)
WITH (db = 'db_name', table = 'table_name', event_time_column = 'time_column')
engine = tskv;

field_definition:
column_name data_type

db和table参数,指定源表

event_time_column 指定事件时间列,该列数据类型必须是 TIMESTAMP 类型

目前仅支持普通表为source表,流表字段定义的字段名和字段类型必须是属于source表,且与source表定义相同

示例

创建 source 表

CREATE DATABASE oceanic_station;
\c oceanic_station
CREATE TABLE air(pressure DOUBLE, temperature DOUBLE, visibility DOUBLE, TAGS(station));

创建流表

CREATE STREAM TABLE air_stream(time TIMESTAMP, station STRING, pressure DOUBLE, temperature DOUBLE, visibility DOUBLE) 
WITH (db = 'oceanic_station', table = 'air', event_time_column = 'time')
engine = tskv;

删除流表

与删除普通表语法相同,请参考删除表

流查询

流查询只支持 INSERT SELECT 语句,SELECT 语句中 FROM 子句是流表,插入到目标表。

写入数据到源表时,触发流式查询。

流查询的 SELECT 子句不支持 JOIN。

流查询的语句会持久化运行,通过KILL QUERY 取消执行

示例

以流式降采样场景为示例,source表时间间隔为一分钟,降采样时间区间为1小时

创建流查询的目标表

CREATE TABLE air_down_sampling_1hour(max_pressure DOUBLE, avg_temperature DOUBLE, sum_temperature DOUBLE, count_pressure BIGINT, TAGS(station));

创建流查询语句

INSERT INTO air_down_sampling_1hour(time, station, max_pressure, avg_temperature, sum_temperature, count_pressure) 
SELECT
date_bin(INTERVAL '1' HOUR, time, TIMESTAMP '2023-01-14T16:00:00') time,
station,
MAX(pressure) max_pressure,
AVG(temperature) avg_temperature,
SUM(temperature) sum_temperature,
COUNT(pressure) count_pressure
FROM air_stream
GROUP BY date_bin(INTERVAL '1' HOUR, time, TIMESTAMP '2023-01-14T16:00:00'), station;

写入数据时触发流查询语句

数据来源

\w oceanic_station.txt

查看目标表结果

SELECT * FROM air_down_sampling_1hour LIMIT 10;
+---------------------+------------+--------------+-----------------+-----------------+----------------+
| time | station | max_pressure | avg_temperature | sum_temperature | count_pressure |
+---------------------+------------+--------------+-----------------+-----------------+----------------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 80.0 | 68.05 | 1361.0 | 20 |
| 2023-01-14T17:00:00 | XiaoMaiDao | 79.0 | 63.75 | 1275.0 | 20 |
| 2023-01-14T18:00:00 | XiaoMaiDao | 79.0 | 66.35 | 1327.0 | 20 |
| 2023-01-14T19:00:00 | XiaoMaiDao | 78.0 | 68.05 | 1361.0 | 20 |
| 2023-01-14T20:00:00 | XiaoMaiDao | 80.0 | 64.35 | 1287.0 | 20 |
| 2023-01-14T21:00:00 | XiaoMaiDao | 77.0 | 61.05 | 1221.0 | 20 |
| 2023-01-14T22:00:00 | XiaoMaiDao | 80.0 | 64.8 | 1296.0 | 20 |
| 2023-01-14T23:00:00 | XiaoMaiDao | 80.0 | 66.35 | 1327.0 | 20 |
| 2023-01-15T00:00:00 | XiaoMaiDao | 80.0 | 65.15 | 1303.0 | 20 |
| 2023-01-15T01:00:00 | XiaoMaiDao | 80.0 | 69.55 | 1391.0 | 20 |
+---------------------+------------+--------------+-----------------+-----------------+----------------+

KILL QUERY

语法

KILL [QUERY] query_id;

先通过 SHOW QUERIES 获取 query_id

示例

SHOW QUERIES;
+----------+------+------------------------------------------------------------------+------------+----------+
| query_id | user | query | state | duration |
+----------+------+------------------------------------------------------------------+------------+----------+
| 4 | root | select * from air join sea on air.temperature = sea.temperature; | SCHEDULING | 2703 |
| 5 | root | show queries; | SCHEDULING | 0 |
+----------+------+------------------------------------------------------------------+------------+----------+
KILL 4;
Query took 0.016 seconds.