跳到主要内容
版本:2.3.2

SQL语法参考手册

数据库操作

数据类型

类型描述大小
BIGINT整型8字节
BIGINT UNSIGNED无符号整型8字节
BOOLEAN布尔类型1字节
TIMESTAMP时间戳8字节
STRINGUTF-8编码的字符串-----
DOUBLE双精度浮点型8字节

其他数据类型

以下数据类型无法直接存储,但会在SQL表达式中出现

类型描述备注
BINARY二进制数据,可以使用Cast子句转换成STRINGsha224, sha256, sha384, sha512函数的返回值均属于此类型
INTERVAL时间间隔时间加减运算和date_bin函数参数需要
ARRAY数组类型聚合函数array_agg返回类型为此

常量

类型语法说明
BIGINT[{+-}]123数值类型
BIGINT UNSIGNED[+]123数值类型
DOUBLE123.45数值类型,目前暂不支持科学记数法
BOOLEAN{true | false | t | f}
STRING'abc'不支持双引号格式,引号中连续两个''转义成‘
TIMESTAMPTIMESTAMP '1900-01-01T12:00:00Z'时间戳,TIMESTAMP 关键字表示后面的字符串常量需要被解释为 TIMESTAMP 类型。
Geometry点击跳转几何类型
--NULL空值

TIMESTAMP 常量语法

时间戳是按RFC3339标准

T代表间隔,仅可以用空格代替

Z代表零时区

+08:00 代表东八区

如下:

  • 1997-01-31T09:26:56.123Z # 标准RFC3339 UTC 时区
  • 1997-01-31T09:26:56.123+08:00 # 标准RFC3339 东八区
  • 1997-01-31 09:26:56.123+08:00 # 接近RFC3339, 只是用空格代替T
  • 1997-01-31T09:26:56.123 # 接近RFC3339, 没有指定时区,默认UTC
  • 1997-01-31 09:26:56.123 # 接近RFC3339, 用空格代替T, 且没有指定时区
  • 1997-01-31 09:26:56 # 接近RFC3339, 精确度是秒级

注意CAST (BIGINT AS TIMESTAMP) 是转化为纳秒级的时间戳,如下:

SELECT CAST (1 AS TIMESTAMP);
+-------------------------------+
| Int64(1) |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+

INTERVAL 常量

示例

  1. INTERVAL '1' 一秒
  2. INTERVAL '1 SECONDE' 一秒
  3. INTERVAL '1 MILLISECONDS' 一毫秒
  4. INTERVAL '1 MINUTE' 一分钟
  5. INTERVAL '0.5 MINUTE' 半分钟
  6. INTERVAL '1 HOUR' 一小时
  7. INTERVAL '1 DAY' 一天
  8. INTERVAL '1 DAY 1' 一天零一秒
  9. INTERVAL '1 WEEK' 一周
  10. INTERVAL '1 MONTH' 一月(30天)
  11. INTERVAL '0.5 MONTH' 半月(15天)
  12. INTERVAL '1 YEAR' 一年(12个月)
  13. INTERVAL '1 YEAR 1 DAY 1 HOUR 1 MINUTE' 一年零一天零一小时一分
  14. INTERVAL '1 DECADES' 一个十年

注意:

INTERVAL '1 YEAR' 并不是365天或366天,而是12个月。 INTERVAL '1 MONTH' 并不是28天或29天或31天,而是30天。

Geometry

WKT

WKT格式是一种文本格式,用于描述二维和三维几何对象的空间特征。 WKT是“Well-Known Text”的缩写,是一种开放的国际标准。 WKT格式包括一些基本的几何对象,例如点、线、多边形和圆形,以及一些复合对象,例如多边形集合和几何对象集合。

语法

<geometry tag> <wkt data>

<geometry tag> ::= POINT | LINESTRING | POLYGON | MULTIPOINT |
MULTILINESTRING | MULTIPOLYGON | GEOMETRYCOLLECTION

<wkt data> ::= <point> | <linestring> | <polygon> | <multipoint> |
<multilinestring> | <multipolygon> | <geometrycollection>
几何对象语法描述
POINT (<x1> <y1>)
线LINESTRING (<x1> <y1>, <x2> <y2>, ...)
多边形POLYGON ((<x1> <y1>, <x2> <y2>, ...))
多点MULTIPOINT (<x1> <y1>, <x2> <y2>, ...)
多线MULTILINESTRING ((<x1> <y1>, <x2> <y2>, ...), (<x1> <y1>, <x2> <y2>, ...))
多多边形MULTIPOLYGON (((<x1> <y1>, <x2> <y2>, ...)), ((<x1> <y1>, <x2> <y2>, ...)))
几何对象集合GEOMETRYCOLLECTION (<geometry tag1> <wkt data1>, <geometry tag2> <wkt data2>, ...)

示例

几何对象图片示例
POINT (30 10)
线LINESTRING (30 10, 10 30, 40 40)
多边形POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
多点MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
多线MULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))
多面MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))
几何对象集合GEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20, 10 40), POLYGON ((40 40, 20 45, 45 30, 40 40)))

创建数据库

语法

CREATE DATABASE [IF NOT EXISTS] db_name [WITH db_options];

db_options:
db_option ...

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
| PRECISION {'ms' | 'us' | 'ns'}
}

参数说明

  1. TTL:表示数据文件保存的时间,默认无限,用带单位的数据表示,支持天(d),小时(h),分钟(m),如TTL 10d,TTL 50h,TTL 100m,当不带单位时,默认为天,如TTL 30。
  2. SHARD:表示数据分片个数,默认为1。
  3. VNODE_DURATION:表示数据在shard中的时间范围,默认为365天,同样使用带单位的数据来表示,数据意义与TTL的value一致。
  4. REPLICA:表示数据在集群中的副本数,默认为1(副本数不大于分布式数据节点的数量)。
  5. PRECISION:数据库的时间戳精度,ms 表示毫秒,us 表示微秒,ns 表示纳秒,默认为ns纳秒。

示例

> CREATE DATABASE oceanic_station;
Query took 0.062 seconds.

查看数据库

语法

SHOW DATABASES;

示例

+-----------------+
| database_name |
+-----------------+
| oceanic_station |
| public |
+-----------------+

使用数据库

如果你通过HTTP API来使用数据库, 你可以在url中指定参数 db=database_name 来使用数据库。

在 CnosDB-Cli 中,可以使用如下命令切换数据库:

\c dbname
public ❯ \c oceanic_station
oceanic_station ❯

删除数据库

语法

DROP DATABASE [IF EXISTS] db_name [AFTER '7d'];

删除数据库会将指定database的所有table数据及元数据全部删除。

当不带AFTER时,会立即删除;

当带AFTER时,为延迟删除,会在指定时间后删除,时间支持天(d),小时(h),分钟(m),如10d,50h,100m,当不带单位时,默认为天。延迟删除期间数据库不可见且不可用。

语法

RECOVER DATABASE [IF EXISTS] db_name;

取消延迟删除,数据库恢复正常状态。

注意:只有对延迟删除的资源,且在延迟删除期间,执行RECOVER语句才有作用。

示例

DROP DATABASE oceanic_station AFTER7d’;

RECOVER DATABASE oceanic_station;

DROP DATABASE oceanic_station;

修改数据库参数

语法

ALTER DATABASE db_name [alter_db_options]

alter_db_options:
SET db_option

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
}

示例

ALTER DATABASE oceanic_station SET TTL '30d';

查看数据库参数

语法

DESCRIBE DATABASE dbname;

示例

DESCRIBE DATABASE oceanic_station;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

表操作

创建表

可以使用 CREATE TABLE 创建表。

CnosDB 支持创建普通表和外部表。

创建普通(TSKV)表

语法

CREATE TABLE [IF NOT EXISTS] tb_name
(field_definition [, field_definition ] ... [, TAGS(tg_name [, tg_name] ...)]);

field_definition:
column_name data_type [field_codec_type]

field_codec_type:
CODEC(code_type)

使用说明

  1. 创建表时无需创建timestamp列,系统自动添加名为"time"的timestamp列。
  2. 各列的名字需要互不相同。
  3. 创建表时如果不指定压缩算法,则使用系统默认的压缩算法。
  4. 目前各种类型支持的压缩算法如下,每种类型第一个为默认指定的算法,NULL表示不使用压缩算法。
  • BIGINT/BIGINT UNSIGNED: DELTA, QUANTILE, SDT, DEADBAND, NULL
  • DOUBLE: GORILLA, QUANTILE, SDT, DEADBAND, NULL
  • STRING: SNAPPY, ZSTD, GZIP, BZIP, ZLIB, NULL
  • BOOLEAN: BITPACK, NULL

想了解更多有关压缩算法的内容可以看压缩算法详情

示例

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.033 seconds.

创建外部表

语法

-- Column definitions can not be specified for PARQUET files

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tb_name
( field_definition [, field_definition] ... ) tb_option;

field_definition:
column_name data_type [ NULL ]

tb_option: {
STORED AS { PARQUET | NDJSON | CSV | AVRO }
| [ WITH HEADER ROW ]
| [ DELIMITER 'a_single_char' ]
| [ PARTITIONED BY ( column_name, [, ... ] ) ]
| LOCATION '/path/to/file'
}

使用说明

  1. 外部表并不存在数据库中,而是将一个操作系统文件当作数据库普通表来访问。
  2. 数据均是只读的,不能执行 DML 操作,也不能建索引。

参数说明

  1. STORED AS:表示文件以什么格式储存,目前支持 PARQUET,NDJSON,CSV,AVRO格式。
  2. WITH HEADER ROW:仅在csv文件格式下生效,表示带有csv表头。
  3. DELIMITER:仅在csv格式下生效,表示列数据的分隔符。
  4. PARTITIONED BY:使用创建表时指定的列来进行分区。
  5. LOCATION:表示关联的文件的位置。

示例

CREATE EXTERNAL TABLE cpu (
cpu_hz DECIMAL(10,6) NOT NULL,
temp DOUBLE NOT NULL,
version_num BIGINT NOT NULL,
is_old BOOLEAN NOT NULL,
weight DECIMAL(12,7) NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION 'tests/data/csv/cpu.csv';
Query took 0.031 seconds.

删除表

语法

DROP TABLE [ IF EXISTS ] tb_name;

示例

DROP TABLE IF EXISTS air;
Query took 0.033 seconds.

显示当前数据库所有表

语法

SHOW TABLES;

示例

SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+

查看表的模式

外部表和普通表的模式都可以使用该语句查看。

语法

DESCRIBE DATABASE table_name;

示例

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+

修改表

说明

目前我们支持修改普通表。

  1. 添加列:添加 field,tag 列。
  2. 删除列:删除 field 列,当删除列导致删除某一行的最后一个 field 值时,我们认为这一行没有值,SELECT 时将不显示这一行。
  3. 修改列:修改列定义,目前支持修改列名、修改列的压缩算法。

语法

ALTER TABLE tb_name alter_table_option;

alter_table_option: {
ADD TAG col_name
| ADD FIELD col_name [data_type] [CODEC(code_type)]
| ALTER col_name SET CODEC(code_type)
| DROP col_name
| RENAME COLUMN col_name TO new_col_name
}

不支持修改 time 列名。 避免在执行 rename tag column 时执行写入操作,可能会引起 series 冲突。

示例

ALTER TABLE air ADD TAG height;
ALTER TABLE air ADD FIELD humidity DOUBLE CODEC(DEFAULT);
ALTER TABLE air ALTER humidity SET CODEC(QUANTILE);
ALTER TABLE air DROP humidity;
ALTER TABLE air RENAME COLUMN height to height_v2;

插入数据

CnosDB支持两种数据写入的方法,一种是使用INSERT INTO语句,另一种是使用HTTP API的write接口,写入Line Protocol格式数据。

本页面只展示INSERT相关的语法。

INSERT

语法

INSERT [INTO] tb_name [ ( column_name [, ...] ) ] VALUES (  const [, ...] ) [, ...] | query; 

说明

CnosDB 要求插入的数据列必须要有时间戳,且VALUES列表必须为常量。 如果有列没有被选中,那么值为NULL

注意

时间列不能为NULL,Tag列和Field列可以为NULL

例如INSERT INTO air (TIME, station, visibility) VALUES(1666132800000000000, NULL, NULL)

如果 VALUES 列表需要表达式,请使用INSERT SELECT语法。

插入一条记录

TIME 列的数据既可以用时间字符串表示,也可以用数字类型的时间戳表示,请注意。

示例

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69, 77);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.044 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 06:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.032 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

注意

关于时区表示,请参考Timestamp

插入多条记录

VALUES关键字后面可以跟多个列表,用,分隔开。

例子

INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 05:40:00', 'XiaoMaiDao', 55, 68, 76),
('2022-10-19 04:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 2 |
+------+
Query took 0.037 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

插入查询结果(INSERT SELECT)

你还可以使用 INSERT SELECT语法,向表中插入查询的数据。

示例

CREATE TABLE air_visibility (
visibility DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT air_visibility (TIME, station, visibility) 
SELECT TIME, station, visibility FROM air;
+------+
| rows |
+------+
| 4 |
+------+
Query took 0.045 seconds.
SELECT * FROM air_visibility;
+----------------------------+------------+------------+
| time | station | visibility |
+----------------------------+------------+------------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 |
+----------------------------+------------+------------+

插入重复数据

CnosDB的存储引擎可以看成一种KV存储,其中Timestamp 和 Tags 构成了 KEY,Fields 构成了一系列Value。

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
INSERT INTO air (TIME, station, visibility, temperature) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69);

上面语句相当于插入了如下k-v对:

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')56
(1666165200290401000, 'XiaoMaiDao')69

结果为

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 56.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+

当同一field字段出现重复的k-v 对时,会发生覆盖。

INSERT INTO air (TIME, station, visibility) VALUES
(1666165200290401000, 'XiaoMaiDao', 66);

相当于插入

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')66

key 为 (1666165200290401000, 'XiaoMaiDao') 的 visibility-value 发生变化,变为 66。

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+
INSERT INTO air (TIME, station, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 77);

相当于插入

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')77
select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | 77.0 |
+----------------------------+------------+------------+-------------+----------+

更新数据

更新 tag 列

语法

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause

assignment clause :
tag_name = value_expression

使用说明

  1. CnosDB支持单独更新单个或多个 tag 列值,不支持同时更新 tag 列及 field 列。
  2. CnosDB支持更新 tag 列值为 NULL。
  3. value_expression 只能为编译期能确定值的表达式,如:'常量'、'1 + 2'、'CAST('1999-12-31 00:00:00.000' as timestamp)' 等。
  4. where_clause 中不能包含 field 列或 time 列,且不能为空,如果想更新表中所有数据,需要使用 'where true',这代表你接受在表数据量比较大时带来的性能问题。
  5. 不支持修改成已经存在 series(所有的 tag 列值构成 series)。
  6. 避免在写入数据时执行更新 tag 操作,可能会引起 series 冲突。

示例

update air set station = 'ShangHai' where station = 'LianYunGang';

更新 field 列

语法

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause

assignment clause :
field_name = value_expression

使用说明

  1. CnosDB支持单独更新单个或多个 field 列值,不支持同时更新 tag 列及 field 列。

示例

update air set pressure = pressure + 100 where pressure = 68 and time < '2023-01-14T16:03:00';

删除数据

按 tag 和 时间列过滤并删除数据。

语法

DELETE FROM table_name where_clause

使用说明

  1. where_clause 中仅能包含 tag 列和 time 列,不能包含 field 列。

示例

delete from air where station = 'LianYunGang' and time < '2023-01-14T16:03:00';

查询数据

CnosDB SQL 的灵感来自于 DataFusion ,我们支持DataFusion的大部分SQL语法。

注意:为了查询能更高效,没有指定排序的查询,每次行顺序都不一定相同,如果需要按字段排序的话,请参看ORDER BY子句。

示例数据

为了进一步学习CnosDB,本节将提供示例数据供您下载,并教您如何将数据导入数据库。后面章节中引用的数据源都来自此示例数据。

下载数据

如果在 cnosdb-cli 中,请输入\q退出

在shell中执行以下命令将在本地生成一个名称为oceanic_station的Line Protocol格式的数据文件

curl -o oceanic_station.txt https://dl.cnosdb.com/sample/oceanic_station.txt

导入数据

  • 启动CLI

    cnosdb-cli
  • 创建数据库

    create database oceanic_station;
  • 切换到指定数据库

    \c oceanic_station
  • 导入数据

    执行\w指令,\w后面为数据文件的绝对路径或相对cnosdb-cli的工作路径。

    \w oceanic_station.txt

SQL 语法

语法

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ OFFSET count ]
[ LIMIT { count | ALL } ];

-- from_item
-- 1.
tb_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
-- 2.
from_item join_type from_item
{ ON join_condition | USING ( join_column [, ...] ) }

-- join_type
[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
FULL [ OUTER ] JOIN
CROSS JOIN

-- grouping_element
()

SELECT 子句

SELECT *

通配符 * 可以用于代指全部列。

示例

SELECT * FROM air;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

ALL/DISTINCT

语法

SELECT [ ALL | DISTINCT ] select_expression [, ...];

SELECT关键字后可以使用DISTINCT去掉重复字段,只返回去重后的值。 使用ALL会返回字段中所有重复的值。不指定此选项时,默认值为ALL

示例

SELECT DISTINCT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
+-------------+------------+
SELECT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+------------+

别名

可以用 AS 关键字为列表达式或表取别名。

为列表达式取别名

语法

expression [ [ AS ] column_alias ]

示例

SELECT station s, visibility AS v FROM air;
+-------------+----+
| s | v |
+-------------+----+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+----+

为表取别名

你也可以用关键字AS为表取别名。

语法

FROM tb_name [AS] alias_name

示例

SELECT a.visibility, s.temperature
FROM air AS a JOIN sea s ON a.temperature = s.temperature limit 10;
+------------+-------------+
| visibility | temperature |
+------------+-------------+
| 67 | 62 |
| 50 | 78 |
| 50 | 78 |
| 65 | 79 |
+------------+-------------+

SELECT限制

  • 如果SELECT子句仅包含Tag列,相当于 SELECT DISTINCT Tag列

    示例

    -- station是Tag列,temperature是Field列
    SELECT station, temperature FROM air;
    +-------------+-------------+
    | station | temperature |
    +-------------+-------------+
    | XiaoMaiDao | 69 |
    | XiaoMaiDao | 78 |
    | XiaoMaiDao | 62 |
    | XiaoMaiDao | 79 |
    | XiaoMaiDao | 53 |
    | XiaoMaiDao | 72 |
    | XiaoMaiDao | 71 |
    | LianYunGang | 69 |
    | LianYunGang | 80 |
    | LianYunGang | 74 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    +-------------+-------------+
    -- station 是Tag列
    SELECT station FROM air;
    +-------------+
    | station |
    +-------------+
    | XiaoMaiDao |
    | LianYunGang |
    +-------------+

LIMIT 子句

语法

LIMIT n

限制返回结果集的行数为n,n必须非负。

示例

SELECT *
FROM air LIMIT 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
+---------------------+-------------+------------+-------------+----------+

OFFSET 子句

语法

OFFSET m

返回的结果集跳过 m 条记录, 默认 m=0。

示例

SELECT *
FROM air OFFSET 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

OFFSET可以和LIMIT语句配合使用,用于指定跳过的行数,格式为LIMIT n OFFSET m。 其中:LIMIT n控制输出m行数据,OFFSET m表示在开始返回数据之前跳过的行数。 OFFSET 0与省略OFFSET子句效果相同。

示例

SELECT *
FROM air LIMIT 3 OFFSET 3;
+---------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
+---------------------+------------+------------+-------------+----------+

WITH 子句

语法

WITH cte AS cte_query_definiton [, ...] query

可选。WITH子句包含一个或多个常用的表达式CTE(Common Table Expression)。 CTE充当当前运行环境中的临时表,您可以在之后的查询中引用该表。CTE使用规则如下:

  • 在同一WITH子句中的CTE必须具有唯一的名字。
  • 在WITH子句中定义的CTE仅对在其后定义的同一WITH子句中的其他CTE可以使用。 假设A是子句中的第一个CTE,B是子句中的第二个CTE:

示例

SELECT station, avg 
FROM ( SELECT station, AVG(visibility) AS avg
FROM air
GROUP BY station) AS x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+
WITH x AS 
(SELECT station, AVG(visibility) AS avg FROM air GROUP BY station)
SELECT station, avg
FROM x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+

UNION 子句

UNION 子句用于合并多个 SELECT 语句的分析结果。

语法

select_clause_set_left
[ UNION | UNION ALL| EXCEPT | INTERSECT]
select_clause_set_right
[sort_list_columns] [limit_clause]

UNION 会对合并的结果集去重。 UNION ALL 保留合并的结果集中相同的数据。 EXCEPT 会作两个结果集的差,从左查询中返回右查询没有找到的所有非重复值。 INTERSECT 返回两个结果集的交集(即两个查询都返回的所有非重复值)。

注意

UNION 内每个 SELECT 子句必须拥有相同数量的列,对应列的数据类型相同。

示例

  • UNION ALL

    SELECT visibility FROM air WHERE temperature < 60
    UNION ALL
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    +------------+
  • UNION

    SELECT visibility FROM air WHERE temperature < 60
    UNION
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • EXCEPT

    SELECT visibility FROM air
    EXCEPT
    SELECT visibility FROM air WHERE temperature < 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • INTERSECT

    SELECT visibility FROM air
    INTERSECT
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+

ORDER BY 子句

按引用的表达式对结果进行排序。默认情况使用升序 (ASC)。通过在 ORDER BY 的表达式后添加 DESC 按降序排序。

示例

SELECT * FROM air ORDER BY temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY temperature DESC;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY station, temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
+---------------------+-------------+------------+-------------+----------+

表达式

表达式是符号和运算符的一种组合,CnosDB 将处理该组合以获得单个数据值。 简单表达式可以是一个常量、变量、列或标量函数。 可以用运算符将两个或更多的简单表达式联接起来组成复杂表达式。

语法

<expresion> :: = { 
constant
| [ table_name. ] column
| scalar_function
| ( expression )
| expression { binary_operator } expression
| case_when_expression
| window_function | aggregate_function
}

常量

表示单个特定数据值的符号。 详细内容请阅览常量

示例

select 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+

标量函数

详细内容请阅览函数

单目运算符

运算符说明
NOT如果子表达式为true,则整个表达式false,如果整个表达式为false,则整个表达式为true
IS NULL如果子表达式为null,则整个表达式为true
IS NOT NULL如果子表达式为null,则整个表达式为false

二元运算符

二元运算符和两个表达式组合在一起,形成一个新的表达式。

支持的二元运算符有:

运算符说明
+数字类型表达式相加
-数字类型表达式相减
*数字类型表达式相乘
/数字类型表达式相除
%整数类型表达式取余
||字符串类型表达式拼接
=比较表达式是否相等
!=、 <>比较表达式是否不相等
<比较表达式是否小于
<=比较表达式是否小于等于
>比较表达式是否大于
>=比较表达式是否大于等于
AND先求左表达式的值,如果为true,计算右表达式的值,都为true为true
OR先求左表达式的值,如果为false,计算右表达式的值,都为false为false
LIKE判断左表达式是否符合右表达式的模式

BETWEEN AND 表达式

语法

expr BETWEEN expr AND expr

示例

SELECT DISTINCT PRESSURE FROM AIR WHERE PRESSURE BETWEEN 50 AND 60;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 60 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

注意:BETWEEN x AND y 会列出x和y之间的数,包括x和y

IN 表达式

IN 操作符判断列表中是否有值与表达式相等。

示例

SELECT station, temperature, visibility FROM air WHERE temperature  IN (68, 69);
+-------------+-------------+------------+
| station | temperature | visibility |
+-------------+-------------+------------+
| XiaoMaiDao | 69 | 56 |
| LianYunGang | 69 | 78 |
+-------------+-------------+------------+

注意

IN 列表暂时只支持常量

CASE WHEN 表达式

当表达式需要按照不同情况得不同的值时,可以使用 CASE WHEN 表达式。

语法

CASE
( WHEN expression THEN result1 [, ...] )
ELSE result
END;

示例

SELECT DISTINCT 
CASE WHEN PRESSURE >= 60 THEN 50
ELSE PRESSURE
END PRESSURE
FROM AIR;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

运算符优先级

如果一个复杂表达式有多个运算符,则运算符优先级将确定操作序列。 执行顺序可能对结果值有明显的影响。

运算符的优先级别如下表中所示。 在较低级别的运算符之前先对较高级别的运算符进行求值。 在下表中,1 代表最高级别,8 代表最低级别。

级别运算符
1*(乘)、/(除)、%(取模)
2+(正)、-(负)、+(加)、+(串联)、-(减)
3=、>=、<=、<>、!=、>、<(比较运算符)
4NOT
5AND
6BETWEEN、IN、LIKE、OR

SHOW

语法

SHOW {DATABASES | TABLES | QUERIES}

显示所有数据库,或显示所有表, 或正在执行的SQL。

示例

SHOW DATABASES;
+---------------+
| database_name |
+---------------+
| public |
+---------------+
SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+
SHOW QUERIES;
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| query_id | query_text | user_id | user_name | tenant_id | tenant_name | state | duration |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| 36 | select * FROM air join sea ON air.temperature = sea.temperature; | 108709109615072923019194003831375742761 | root | 13215126763611749424716665303609634152 | cnosdb | SCHEDULING | 12.693345666 |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+

关于 SHOW QUERIES 语句的详细信息,可以在系统表 QUERIES 查看。

SHOW SERIES

返回指定表的series

语法

SHOW SERIES [ON database_name] FROM table_name [WHERE expr] [order_by_clause] [limit_clause] 

示例

SHOW SERIES FROM air WHERE station = 'XiaoMaiDao' ORDER BY key LIMIT 1;
+------------------------+
| key |
+------------------------+
| air,station=XiaoMaiDao |
+------------------------+

注意

WEHER子句中的表达式列,只能是tag列或time列,ORDER BY 子句的表达式只能是 key

SHOW TAG VALUES

语法

SHOW TAG VALUES [ON database_name] FROM table_name WITH KEY [<operator> "<tag_key>" | [[NOT] IN ("<tag_key1>", ..)]] [WHERE expr] [order_by_clause] [limit_clause];

operator 包括 =, !=

示例

SHOW TAG VALUES FROM air WITH KEY = "station" WHERE station = 'XiaoMaiDao' ORDER BY key, value LIMIT 1;
+---------+------------+
| key | value |
+---------+------------+
| station | XiaoMaiDao |
+---------+------------+
SHOW TAG VALUES FROM air WITH KEY NOT IN ("station1");
+---------+-------------+
| key | value |
+---------+-------------+
| station | XiaoMaiDao |
| station | LianYunGang |
+---------+-------------+

EXPLAIN

语法

EXPLAIN [ ANALYZE ] [ VERBOSE ] <statement>;

说明

EXPLAIN 语句仅用于显示查询的执行计划,而不执行查询。

EXPLAIN ANALYZE 执行查询,并显示查询的执行计划。

EXPLAIN ANALYZE VERBOSE 执行查询,并显示更详细的执行计划,包括读取的行数等。

示例

EXPLAIN SELECT station, temperature, visibility FROM air;
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #air.station, #air.temperature, #air.visibility |
| | TableScan: air projection=[station, visibility, temperature] |
| physical_plan | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature] |
| | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE VERBOSE SELECT station, temperature, visibility FROM air;
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=26.75µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=13.225875ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_point_to_record_batch=3.918163ms, elapsed_field_scan=3.992161ms, elapsed_series_scan=1.657416ms] |
| | |
| Plan with Full Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.865034 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=26.75µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.864225 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=13.225875ms, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13, elapsed_point_to_record_batch{partition=0}=3.918163ms, elapsed_field_scan{partition=0}=3.992161ms, elapsed_series_scan{partition=0}=1.657416ms] |
| | |
| Output Rows | 13 |
| Duration | 13.307708ms |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

DESCRIBE

语法

DESCRIBE {DATABASE db_name | TABLE tb_name};

描述数据库的参数,描述表的模式。

示例

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+
DESCRIBE DATABASE public;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

Join 子句

CnosDB支持 INNER JOINLEFT OUTER JOINRIGHT OUTER JOINFULL OUTER JOIN

目前暂不支持 CROSS JOIN

INNER JOIN

关键字 JOININNER JOIN 定义了一个只显示两个表中匹配的行的连接。

示例

SELECT * FROM air INNER JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

LEFT JOIN

用关键字 LEFT JOINLEFT OUTER JOIN 定义一个左连接。该连接包括左表中的所有行,如果右表没有匹配行,则连接的右侧为空值。

示例

SELECT * FROM air LEFT JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

RIGHT JOIN

用关键字 RIGHT JOINRIGHT OUTER JOIN 定义一个右连接。该连接包括右表中的所有行,如果左表没有匹配行,则连接的左侧为空值。

示例

SELECT * FROM air RIGHT JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

FULL JOIN

关键字 FULL JOINFULL OUTER JOIN 定义了一个全连接,实际上它是 LEFT OUTER JOIN 和 RIGHT OUTER JOIN 的联合。 它会显示连接左侧和右侧的所有行,并将在连接的任一侧不匹配的地方产生空值。

示例

SELECT * FROM air FULL JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

GROUP BY 子句

GROUP BY 子句必须在 WHERE 子句(如果有的话)的条件之后,ORDER BY 子句(如果有的话)之前。

示例

SELECT station, AVG(temperature) 
FROM air
GROUP BY station;
+-------------+----------------------+
| station | AVG(air.temperature) |
+-------------+----------------------+
| XiaoMaiDao | 69.14285714285714 |
| LianYunGang | 72.16666666666667 |
+-------------+----------------------+

HAVING 子句

语法

group_by_clause 
[ HAVING condition ];

在 SELECT 查询中,HAVING 子句必须紧随 GROUP BY 子句,并出现在 ORDER BY 子句(如果有的话)之前。

HAVING 与 WHERE 的区别

HAVING 在 GROUP BY 子句之后使你能够指定过滤条件,从而控制查询结果中哪些组可以出现在最终结果里面。

WHERE 在 GROUP BY 子句之前对被选择的列施加条件,而 HAVING 子句则对 GROUP BY 子句所产生的组施加条件。

示例

SELECT station, AVG(temperature)  AS avg_t 
FROM air
GROUP BY station
HAVING avg_t > 70;
+-------------+-------------------+
| station | avg_t |
+-------------+-------------------+
| LianYunGang | 72.16666666666667 |
+-------------+-------------------+

复杂的分组操作

CnosDB 提供了 ROLLUPCUBE 等复杂分组操作,使您能以不同的方式操作查询结果

ROLLUP

您可以在单个查询中使用 ROLLUP 选项来生成多个分组集。

ROLLUP 假定输入列之间存在层次结构。

如果你的group by 子句是:

语法

SELECT ...
FROM ...
GROUP BY ROLLUP(column_1,column_2);

它与如下的语句等同:

语法

SELECT ...
FROM ...


UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1

UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1, column2;

ROLLUP 生成在此层次结构中有意义的所有分组集。 每次 column_1 的值发生变化时,它都会生成一个小计行;

因此,我们经常在报告中使用 ROLLUP 来生成小计和总计。 ROLLUP 中列的顺序非常重要。

示例

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY ROLLUP (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| | | 70.53846153846153 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
+-------------+------------+----------------------+

CUBE

与 ROLLUP 类似,CUBE 是 GROUP BY 子句的扩展。 它允许您为 GROUP BY 子句中指定的分组列的所有组合生成小计。

CUBE为指定表达式集的每个可能组合创建分组集。首先会对(A、B、C)进行group by,

然后依次是(A、B),(A、C),(A),(B、C),(B),(C),最后对全表进行group by操作。

SELECT ... 
FROM ...
GROUP BY CUBE (column1, column2);

等价于:

SELECT ...
FROM ...
GROUP BY column1

UNION ALL

SELECT ...
FROM ...
GROUP BY column2

UNION ALL

SELECT ...
FROM ...
GROUP BY column1, column2

UNION ALL

SELECT ...
FROM ...
;

示例

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY CUBE (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
| | 56 | 69 |
| | 50 | 78 |
| | 67 | 66 |
| | 65 | 79 |
| | 53 | 53 |
| | 74 | 72 |
| | 71 | 71 |
| | 78 | 69 |
| | 79 | 80 |
| | 59 | 72 |
| | 80 | 70 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| | | 70.53846153846153 |
+-------------+------------+----------------------+

聚合函数

一般聚合函数

COUNT

语法

COUNT(x)

功能:返回选定元素中检索过的行的数目。

包含DISTINCT关键字,会对去重后的结果计数。

COUNT() 和 COUNT(literal value) 是等价的,如果sql的投影中仅含有 `/literal value`,则sql会被重写为 COUNT(time)。

COUNT(tag) 等价于 COUNT(DISTINCT tag)。

COUNT(field) 返回非NULL值的个数。

参数类型:任意

返回类型:BIGINT

示例

SELECT COUNT(*) FROM air;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 13 |
+-----------------+
SELECT COUNT(temperature) FROM air;
+------------------------+
| COUNT(air.temperature) |
+------------------------+
| 13 |
+------------------------+
SELECT COUNT(DISTINCT temperature) FROM air;
+---------------------------------+
| COUNT(DISTINCT air.temperature) |
+---------------------------------+
| 10 |
+---------------------------------+

SUM

语法

SUM(NUMERICS)

功能:返回从选定元素计算出的总和值。

参数类型:数值类型。

返回类型:与参数类型相同。

示例

SELECT SUM(temperature) FROM air;
+----------------------+
| SUM(air.temperature) |
+----------------------+
| 917 |
+----------------------+

MIN

语法

MIN(STRING | NUMERICS | TIMESTAMP)

功能:返回选定元素中最小值。

参数类型:数值类型或STRING或TIMESTAMP。

返回类型:与参数类型相同。

示例

 SELECT MIN(time), MIN(station), MIN(temperature) FROM air;
+---------------------+------------------+----------------------+
| MIN(air.time) | MIN(air.station) | MIN(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:21:00 | LianYunGang | 53 |
+---------------------+------------------+----------------------+

MAX

语法

MAX(STRINGS | NUMERICS | TIMESTAMPS)

功能:返回选定元素中最大值。

参数类型:数值类型或STRING或TIMESTAMP。

返回类型:与参数类型相同。

示例

SELECT MAX(time), MAX(station), MAX(temperature) FROM air;
+---------------------+------------------+----------------------+
| MAX(air.time) | MAX(air.station) | MAX(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:39:00 | XiaoMaiDao | 80 |
+---------------------+------------------+----------------------+

AVG

语法

AVG(NUMERICS)

功能:返回选定元素的平均值。

参数类型:数值类型。

返回类型:数值类型。

示例

SELECT AVG(temperature) FROM air;
+----------------------+
| AVG(air.temperature) |
+----------------------+
| 70.53846153846153 |
+----------------------+

ARRAY_AGG

语法

ARRAY_AGG(expr)

功能:返回一个数组,该数组由选定元素的所有值组成,元素类型必须相同。

参数类型:任意。

返回类型:参数类型的数组。

示例

SELECT ARRAY_AGG(temperature) from air;
+------------------------------------------------------+
| ARRAYAGG(air.temperature) |
+------------------------------------------------------+
| [69, 78, 62, 79, 53, 72, 71, 69, 80, 74, 70, 70, 70] |
+------------------------------------------------------+

注意:该聚合函数结果,无法以CSV格式返回。

FIRST

first(time,  value)

获取一列按另一列排序的第一个值。

参数:

  • time: Timestamp

  • value: any

返回值: 同value类型相同

示例

select first(time, pressure) from air;
+------------------------------+
| first(air.time,air.pressure) |
+------------------------------+
| 63.0 |
+------------------------------+

LAST

last(time,  value)

获取一列按另一列排序的最后一个值。

参数:

  • time: Timestamp

  • value: any

返回值: 同value类型相同

示例

select last(time, pressure) from air;
+-----------------------------+
| last(air.time,air.pressure) |
+-----------------------------+
| 55.0 |
+-----------------------------+

MODE

mode(value)

计算一列的众数。

参数: value: any

返回值: 同value类型相同

示例

select mode(pressure) from air;
+--------------------+
| mode(air.pressure) |
+--------------------+
| 69.0 |
+--------------------+

INCREASE

increase(time, value order by time)

计算 value 在时间序列中的增量

参数: value 数字类型

返回值: 同value类型相同

示例

CREATE DATABASE IF NOT EXISTS TEST_INCREASE;
ALTER DATABASE TEST_INCREASE SET TTL '100000D';
CREATE TABLE IF NOT EXISTS test_increase.test_increase(f0 BIGINT, TAGS(t0));
INSERT INTO test_increase.test_increase(time, t0, f0)
VALUES
('1999-12-31 00:00:00.000', 'a', 1),
('1999-12-31 00:00:00.005', 'a', 2),
('1999-12-31 00:00:00.010', 'a', 3),
('1999-12-31 00:00:00.015', 'a', 4),
('1999-12-31 00:00:00.020', 'a', 5),
('1999-12-31 00:00:00.025', 'a', 6),
('1999-12-31 00:00:00.030', 'a', 7),
('1999-12-31 00:00:00.035', 'a', 8),
('1999-12-31 00:00:00.000', 'b', 1),
('1999-12-31 00:00:00.005', 'b', 2),
('1999-12-31 00:00:00.010', 'b', 3),
('1999-12-31 00:00:00.015', 'b', 4),
('1999-12-31 00:00:00.020', 'b', 1),
('1999-12-31 00:00:00.025', 'b', 2),
('1999-12-31 00:00:00.030', 'b', 3),
('1999-12-31 00:00:00.035', 'b', 4);
SELECT t0, INCREASE(time, f0 ORDER BY time) AS increase
FROM test_increase.test_increase GROUP BY t0 ORDER BY t0;
+----+----------+
| t0 | increase |
+----+----------+
| a | 7 |
| b | 7 |
+----+----------+

统计聚合函数

VAR | VAR_SAMP

语法

VAR(NUMERICS)

功能:计算给定样本的方差

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT VAR(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 51.43589743589741 |
+---------------------------+

VAR_POP

语法

VAR_POP(NUMERICS)

功能:计算总体方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT VAR_POP(temperature) FROM air;
+------------------------------+
| VARIANCEPOP(air.temperature) |
+------------------------------+
| 47.47928994082838 |
+------------------------------+

STDDEV | STDDEV_SAMP

STDDEV(NUMERICS)

功能:计算样本标准差。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT STDDEV(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 7.1718824192744135 |
+-------------------------+

STDDEV_POP

语法

STDDEV_POP(NUMERICS)

功能:计算出的总体标准差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT STDDEV_POP(temperature) FROM air;
+----------------------------+
| STDDEVPOP(air.temperature) |
+----------------------------+
| 6.890521746633442 |
+----------------------------+

COVAR | COVAR_SAMP

语法

COVAR(NUMERICS, NUMERICS)

功能:返回样本的协方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT COVAR(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| -5.121794871794841 |
+------------------------------------------+

COVAR_POP

语法

COVAR_POP(NUMERICS, NUMERICS)

功能:返回组中数字对的总体协方差。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT COVAR_POP(temperature, pressure) FROM air;
+---------------------------------------------+
| COVARIANCEPOP(air.temperature,air.pressure) |
+---------------------------------------------+
| -4.727810650887546 |
+---------------------------------------------+

CORR

语法

CORR**(NUMERICS, NUMERICS)

功能:返回表示一组数字对之间的关联情况的皮尔逊系数。

参数类型:数值类型。

返回类型:DOUBLE

示例

SELECT CORR(temperature, pressure) FROM air;
+-------------------------------------------+
| CORRELATION(air.temperature,air.pressure) |
+-------------------------------------------+
| -0.07955796767766017 |
+-------------------------------------------+

近似聚合函数

APPROX_DISTINCT

语法

APPROX_DISTINCT(x)

功能:返回不同输入值的近似值(HyperLogLog)。

参数类型:STRING

返回类型:BIGINT

示例

SELECT APPROX_DISTINCT(station) FROM air;
+-----------------------------+
| APPROXDISTINCT(air.station) |
+-----------------------------+
| 2 |
+-----------------------------+

APPROX_PERCENTILE_CONT

语法

APPROX_PERCENTILE_CONT(x, p)  

功能:返回输入值x的近似百分位(TDigest),p是百分位,是0到1(包括1)之间的64位浮点数。

参数类型:x为数值类型,p为DOUBLE类型。

返回类型:DOUBLE

示例

SELECT APPROX_PERCENTILE_CONT(temperature, 0.1) FROM air;
+----------------------------------------------------+
| APPROXPERCENTILECONT(air.temperature,Float64(0.1)) |
+----------------------------------------------------+
| 60.4 |
+----------------------------------------------------+

APPROX_PERCENTILE_CONT_WITH_WEIGHT

语法

APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, w, p)  

功能:x返回带权重的输入值的近似百分比(TDigest),其中w是权重列表达式,p是0到1(包括)之间的浮点64。

APPROX_PERCENTILE_CONT(x, p) 相当于 APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, 1, p)。

参数类型:x,w为数值类型,p为DOUBLE类型。

返回类型:DOUBLE

示例

SELECT APPROX_PERCENTILE_CONT_WITH_WEIGHT(temperature,2, 0.1) FROM air;
+-----------------------------------------------------------------------+
| APPROXPERCENTILECONTWITHWEIGHT(air.temperature,Int64(2),Float64(0.1)) |
+-----------------------------------------------------------------------+
| 54.35 |
+-----------------------------------------------------------------------+

APPROX_MEDIAN(NUMERICS)

语法

APPROX_MEDIAN(NUMERICS)

功能:返回输入值的近似中值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT APPROX_MEDIAN(temperature) FROM air;
+-------------------------------+
| APPROXMEDIAN(air.temperature) |
+-------------------------------+
| 70 |
+-------------------------------+

SAMPLE

语法

SAMPLE(<column_key>, <N>)

功能:从给定的列 column_key 中随机选择 N 条记录。

参数类型

  • column_key:任意类型

  • N:整数类型

返回类型:数组

示例

select sample(visibility, 5) from air;
+--------------------------------------+
| sample(air.visibility,Int64(5)) |
+--------------------------------------+
| [65.0, 74.0, 76.0, 77.0, 72.0, 77.0] |
+--------------------------------------+

ASAP_SMOOTH

asap_smooth(time, value, resolution order by time)

ASAP smoothing算法旨在创建人类可读的图形,保留输入数据的粗糙形状和较大趋势,同时最小化点之间的局部方差。 采用 (Timestamp,value)对,将它们标准化为目标时间间隔,并返回 ASAP smooth 值。

参数:

  • time: Timestamp

  • value: Double

  • resolution: Bigint ,要返回的大概点数( (Timestamp, value) 对),确定结果图的水平分辨率。

返回值: TimeVector

Struct {
time: List[Timestamp], -- 毫秒级
value: List[Double],
resolution: Int Unsigned,
}

示例

select asap_smooth(time, pressure, 10) from air group by date_trunc('month', time);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| asap_smooth(air.time,air.pressure,Int64(10)) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10} |
| {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10} |
| {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

两阶段聚合函数

stats_agg

对二维数据执行线性回归分析,例如计算相关系数和协方差。 并且还可以分别计算每个维度的常见统计数据,例如平均值和标准差。 stats_agg 提供了与sum, count, corr, covar_pop 等聚合函数相同的功能, 适用于一条SQL中,包含多个分析函数的场景。

注意: 两列都不为NULL时才纳入聚合。

stats_agg

stats_agg(y, x)

功能:进行统计聚合。

参数类型

  • y: double 类型
  • x: double 类型

返回类型:结构体类型

{ 
n: bigint, -- count
sx: double, -- sum(x)- sum(x)
sx2: double, -- sum((x-sx/n)^2) (sum of squares)
sx3: double, -- sum((x-sx/n)^3)
sx4: double, -- sum((x-sx/n)^4)
sy: double, -- sum(y)
sy2: double, -- sum((y-sy/n)^2) (sum of squares)
sy3: double, -- sum((y-sy/n)^3)
sy4: double, -- sum((y-sy/n)^4)
sxy: double, -- sum((x-sx/n)*(y-sy/n)) (sum of products)
}

示例

create table if not exists test_stats(x bigint, y bigint);
alter database public set ttl '1000000d';
insert into test_stats(time, x, y) values
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 1, 4),
(5, 1, 5),
(6, 2, 1),
(7, 2, 2),
(8, 2, 3),
(9, 2, 4),
(10, 2, 5);
select stats_agg(y, x) from test_stats;
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stats_agg(test_stats.y,test_stats.x) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {n: 10, sx: 15.0, sx2: 2.5, sx3: -2.7755575615628914e-16, sx4: 0.6249999999999999, sy: 30.0, sy2: 20.0, sy3: -1.7763568394002505e-15, sy4: 68.0, sxy: 0.0} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+

num_vals

计算二维统计聚合后的数据行数。

返回类型:BIGINT UNSIGNED

select num_vals(stats_agg(y, x)) from test_stats;
+------------------------------------------------+
| num_vals(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------+
| 10 |
+------------------------------------------------+

average_y, average_x

计算二维统计聚合后指定维度的平均值。

返回类型:DOUBLE

select average_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| average_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 1.5 |
+-------------------------------------------------+

sum_y, sum_x

计算二维统计聚合后指定维度的和,方式为 population。

返回类型:DOUBLE

select sum_x(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| sum_x(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 15.0 |
+---------------------------------------------+

stddev_samp_y, stddev_samp_x

计算二维统计聚合后指定维度的标准差,方式为 sample。

返回类型:DOUBLE

select stddev_samp_x(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------+
| stddev_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------+
| 0.5270462766947299 |
+-----------------------------------------------------+

stddev_pop_y, stddev_pop_x

计算二维统计聚合后指定维度的标准差,方式为 population。

返回类型:DOUBLE

select stddev_pop_x(stats_agg(y, x)) from test_stats;
+----------------------------------------------------+
| stddev_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+----------------------------------------------------+
| 0.5 |
+----------------------------------------------------+

var_samp_y, var_samp_x

计算二维统计聚合后指定维度的方差,方式为 sample。

返回类型:DOUBLE

select var_samp_x(stats_agg(y, x)) from test_stats;
+--------------------------------------------------+
| var_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+--------------------------------------------------+
| 0.2777777777777778 |
+--------------------------------------------------+

var_pop_y, var_pop_x

计算二维统计聚合后指定维度的方差,方式为 population。

返回类型:DOUBLE

select var_pop_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| var_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 0.25 |
+-------------------------------------------------+

skewness_samp_y, skewness_samp_x

计算二维统计聚合后指定维度的偏度值,方式为 sample。

返回类型:DOUBLE

select skewness_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| skewness_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| -2.1065000811460203e-16 |
+-------------------------------------------------------+

skewness_pop_y, skewness_pop_x

计算二维统计聚合后指定维度的偏度值,方式为 population。

返回类型:DOUBLE

select skewness_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| skewness_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| -2.220446049250313e-16 |
+------------------------------------------------------+

kurtosis_samp_y, kurtosis_samp_x

计算二维统计聚合后指定维度的峰度值,方式为 sample。

返回类型:DOUBLE

select kurtosis_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| kurtosis_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.8999999999999998 |
+-------------------------------------------------------+

kurtosis_pop_y, kurtosis_pop_x

计算二维统计聚合后指定维度的峰度值,方式为 population。

返回类型:DOUBLE

select kurtosis_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| kurtosis_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.9999999999999998 |
+------------------------------------------------------+

correlation

计算二维统计聚合后的相关。

返回类型:DOUBLE

select correlation(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| correlation(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| 0.0 |
+---------------------------------------------------+

covariance_samp, covariance_pop

计算二维统计聚合后的协方差。

返回类型:DOUBLE

select covariance_samp(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| covariance_samp(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.0 |
+-------------------------------------------------------+
select covariance_pop(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| covariance_pop(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.0 |
+------------------------------------------------------+

determination_coeff

计算二维统计聚合后的决定系数。

返回类型:DOUBLE

select determination_coeff(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------------+
| determination_coeff(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------------+
| 0.0 |
+-----------------------------------------------------------+

slope

根据二维统计聚合,计算线性拟合线的斜率。

返回类型:DOUBLE

select slope(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| slope(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 0.0 |
+---------------------------------------------+

intercept

计算二维统计聚合后y的截距。

返回类型:DOUBLE

select intercept(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| intercept(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 3.0 |
+-------------------------------------------------+

x_intercept

计算二维统计聚合后x的截距。

返回类型:DOUBLE

select x_intercept(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| x_intercept(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| -inf |
+---------------------------------------------------+

gauge_agg

分析Gauge数据。与Counter不同,Gauge可以减少也可以增加。

gauge_agg

gauge_agg(time, value)

这是分析 Gauge 数据的第一步。使用 gauge_agg 创建中间聚合数据, 接下来其他函数使用中间聚合数据进行计算。

参数

  • time: Timestamp

  • value: DOUBLE

返回值

Struct {
first: Struct {
ts: Timestamp,
value: Double
},
second: Struct {
ts: Timestamp,
value: Double
},
penultimate: Struct {
ts: Timestamp,
val: Double
},
last: Struct {
ts: Timestamp,
val: Double
},
num_elements: Bigint Unsingned
}

示例:

select gauge_agg(time, pressure) from air group by date_trunc('month', time);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gauge_agg(air.time,air.pressure) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {first: {ts: 2023-03-01T00:00:00, val: 54.0}, second: {ts: 2023-03-01T00:00:00, val: 59.0}, penultimate: {ts: 2023-03-14T16:00:00, val: 55.0}, last: {ts: 2023-03-14T16:00:00, val: 80.0}, num_elements: 13122} |
| {first: {ts: 2023-01-14T16:00:00, val: 63.0}, second: {ts: 2023-01-14T16:00:00, val: 68.0}, penultimate: {ts: 2023-01-31T23:57:00, val: 77.0}, last: {ts: 2023-01-31T23:57:00, val: 54.0}, num_elements: 16640} |
| {first: {ts: 2023-02-01T00:00:00, val: 54.0}, second: {ts: 2023-02-01T00:00:00, val: 60.0}, penultimate: {ts: 2023-02-28T23:57:00, val: 74.0}, last: {ts: 2023-02-28T23:57:00, val: 59.0}, num_elements: 26880} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

delta

获取一段时间内Gauge的变化。这是简单的增量,通过从第一个值减去最后一个看到的值来计算。

返回值: Double

select delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+-----------------------------------------+
| delta(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------+
| 26.0 |
| -9.0 |
| 5.0 |
+-----------------------------------------+

time_delta

获取持续时间,最后一个Gauge的时间减去第一个Gauge的时间。

返回值: INTERVAL

select time_delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------------------------+
| time_delta(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------------------+
| 0 years 0 mons 13 days 16 hours 0 mins 0.000000000 secs |
| 0 years 0 mons 17 days 7 hours 57 mins 0.000000000 secs |
| 0 years 0 mons 27 days 23 hours 57 mins 0.000000000 secs |
+----------------------------------------------------------+

rate

计算Gauge变化和时间变化的比率。

返回值: Double

单位:

时间单位是ns时,比率单位是 /ns,

时间单位是ms时,比率单位就是 /ms

时间单位是s时,比率单位就是 /s

select rate(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------+
| rate(gauge_agg(air.time,air.pressure)) |
+----------------------------------------+
| 2.2018970189701897e-14 |
| 9.349414325974008e-15 |
| -4.133905465849807e-16 |
+----------------------------------------+

first_time

取得Gauge中最小的时间戳

返回类型: TIMESTAMP

select first_time(gauge_agg(time, pressure)) from air;
+----------------------------------------------+
| first_time(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------+
| 2023-01-14T16:00:00 |
+----------------------------------------------+

last_time

取得Gauge中最大的时间戳

返回类型: TIMESTAMP

select last_time(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| last_time(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 2023-03-14T16:00:00 |
+---------------------------------------------+

first_val

取得Gauge中最小时间戳对应的值

返回类型: gauge_agg 中指定的列的类型

select first_val(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| first_val(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 68.0 |
+---------------------------------------------+

last_val

取得Gauge中最大时间戳对应的值

返回类型: gauge_agg 中指定的列的类型

select last_val(gauge_agg(time, pressure)) from air;
+--------------------------------------------+
| last_val(gauge_agg(air.time,air.pressure)) |
+--------------------------------------------+
| 80.0 |
+--------------------------------------------+

idelta_left

计算Gauge最早的瞬时变化。这等于第二个值减去第一个值。

返回类型:gauge_agg 中指定的列的类型

 select time, station, pressure from air where station = 'XiaoMaiDao' order by time limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 63.0 |
| 2023-01-14T16:03:00 | XiaoMaiDao | 58.0 |
| 2023-01-14T16:06:00 | XiaoMaiDao | 65.0 |
| 2023-01-14T16:09:00 | XiaoMaiDao | 52.0 |
+---------------------+------------+----------+
select idelta_left(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+-----------------------------------------------+
| idelta_left(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------------+
| -5.0 |
+-----------------------------------------------+

idelta_right

计算Gauge最晚的瞬时变化。这等于最后一个值值减去倒数第二个值。

返回类型:gauge_agg 中指定的列的类型

select time, station, pressure from air where station = 'XiaoMaiDao' order by time desc limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-03-14T16:00:00 | XiaoMaiDao | 55.0 |
| 2023-03-14T15:57:00 | XiaoMaiDao | 62.0 |
| 2023-03-14T15:54:00 | XiaoMaiDao | 75.0 |
| 2023-03-14T15:51:00 | XiaoMaiDao | 61.0 |
+---------------------+------------+----------+
select idelta_right(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+------------------------------------------------+
| idelta_right(gauge_agg(air.time,air.pressure)) |
+------------------------------------------------+
| -7.0 |
+------------------------------------------------+

compact_state_agg

给定一个在离散状态之间切换的系统或值,

汇总每个状态所花费的时间。

例如,您可以使用compact_state_agg函数来跟踪系统

处于错误、运行或启动状态的时间。

compact_state_agg 设计用于处理相对较少的状态。它可能在行之间状态过多的数据集上表现不佳。

如果您需要跟踪进入和退出每个状态的时间,请使用 state_agg 函数。

如果您需要根据心跳信号跟踪系统的活跃度,请考虑使用 heartbeat_agg 函数。

compact_state_agg

compact_state_agg(ts, state)

统计每个状态所花费的时间,并聚合成StateAggData类型

参数: ts 为时间戳类型,state为任意类型

返回值: StateAggData 类型

Struct {
state_duration: List[
Struct{
state: any,
interval: duration
},
...
]
state_periods: List[
Struct(
state: any,
periods: List[
Struct {
start_time: timestamp,
end_time: timestamp
},
...
]
),
......
]
}

示例:

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select compact_state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| compact_state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}, {state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}], state_periods: []} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

统计某个状态的持续时间,或统计某个状态在某个时间段内的持续时间。

参数

state_agg_data: StateAggData

state: any 与compact_state_agg 的state类型相同。

begin_time: 可选,指定时间段内的开始时间。

interval_time: 可选,指定时间段的持续时间,不指定时,时间段为无穷大。

返回值: INTERVAL 类型

示例:

select duration_in(compact_state_agg(time, state), 'running') from states;
+--------------------------------------------------------------------------+
| duration_in(compact_state_agg(states.time,states.state),Utf8("running")) |
+--------------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+--------------------------------------------------------------------------+

state_agg

给定一个在离散状态之间切换的系统或值,跟踪状态之间的转换。

state_agg

state_agg(ts, state)

统计每个状态所花费的时间,,并聚合成StateAggData类型。

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}], state_periods: [{state: starting, periods: [{start_time: 2020-01-01T10:00:00, end_time: 2020-01-01T10:30:00}, {start_time: 2020-01-03T18:30:00, end_time: 2020-01-03T19:30:00}]}, {state: error, periods: [{start_time: 2020-01-03T16:00:00, end_time: 2020-01-03T18:30:00}]}, {state: running, periods: [{start_time: 2020-01-01T10:30:00, end_time: 2020-01-03T16:00:00}, {start_time: 2020-01-03T19:30:00, end_time: 2020-01-05T12:00:00}]}]} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

统计某个状态的持续时间,或统计某个状态在某个时间段内的持续时间。

参数

state_agg_data: StateAggData

state: any 与compact_state_agg 的state类型相同。

begin_time: 可选,指定时间段内的开始时间。

interval_time: 可选,指定时间段的持续时间,不指定时,时间段为无穷大。

返回值: INTERVAL 类型

示例:

统计 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running') from states;
+------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running")) |
+------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+------------------------------------------------------------------+

统计 从2020-01-01 11:00:00 开始 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00') 
from states;
+----------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00")) |
+----------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 21 hours 30 mins 0.000000000 secs |
+----------------------------------------------------------------------------------------------+

统计 从2020-01-01 11:00:00 开始的四天内 'running' 状态的持续时间。

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00', interval '4 day')
from states;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00"),IntervalMonthDayNano("73786976294838206464")) |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 20 hours 30 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------+

state_at

state_at(state_agg_data, ts)

统计某一时刻所处的状态。

参数

  • state_agg_data: StateAggData

  • ts: Timestamp

返回值: any 类型,与state_agg_data 统计的状态类型相同。

select state_at(state_agg(time, state), Timestamp '2020-01-01 10:30:00') from states;
+---------------------------------------------------------------------------+
| state_at(state_agg(states.time,states.state),Utf8("2020-01-01 10:30:00")) |
+---------------------------------------------------------------------------+
| running |
+---------------------------------------------------------------------------+

candlestick_agg

进行金融资产数据分析。该功能使编写涉及 candlestick 财务分析查询变得更容易。

candlestick_agg 能得到股票的开盘价和收盘价,何时最高价。

candlestick_agg 从原始报价数据生成中间聚合数据CandleStackData ,

然后可以对此中间聚合数据使用访问函数和汇总函数。

candlestick_agg

candlestick_agg(time, price, volume)

从原始报价查询生成中间聚合数据 CandleStackData 。

参数:

  • time: Timestamp

  • price: Double 价格

  • volume: Double 交易量

返回值: CandleStackData

Struct {
open: Struct {
ts: Timestamp,
val: Double,
},
close: Struct {
ts: Timestamp,
val: Double,
},
high: Struct {
ts: Timestamp,
val: Double,
},
low: Struct {
ts: Timestamp,
val: Double
},
volume: Struct {
vol: Double,
vwap: Double,
}
}

示例

alter database public set ttl '1000000d';
create table if not exists tick(price bigint ,volume bigint);
insert tick(time, price, volume)
values
('1999-12-31 00:00:00.000', 111, 444),
('1999-12-31 00:00:00.005', 222, 444),
('1999-12-31 00:00:00.010', 333, 222),
('1999-12-31 00:00:10.015', 444, 111),
('1999-12-31 00:00:10.020', 222, 555),
('1999-12-31 00:10:00.025', 333, 555),
('1999-12-31 00:10:00.030', 444, 333),
('1999-12-31 01:00:00.035', 555, 222);
select candlestick_agg(time, price, volume) from tick;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| candlestick_agg(tick.time,tick.price,tick.volume) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {open: {ts: 1999-12-31T00:00:00, val: 111.0}, close: {ts: 1999-12-31T01:00:00.035, val: 555.0}, low: {ts: 1999-12-31T00:00:00, val: 111.0}, high: {ts: 1999-12-31T01:00:00.035, val: 555.0}, volume: {vol: 2886.0, vwap: 850149.0}} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

close

close(candlestick_agg_data)

获取收盘价。

返回值: DOUBLE

示例

select close(candlestick_agg(time, price, volume)) from tick;
+----------------------------------------------------------+
| close(candlestick_agg(tick.time,tick.price,tick.volume)) |
+----------------------------------------------------------+
| 555.0 |
+----------------------------------------------------------+

close_time

close_time(candlestick_agg_data)

获取收盘时间。

返回值: Timestamp

示例

select close_time(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------------+
| close_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+---------------------------------------------------------------+

high

high(candlestick_agg_data)

获取最高价。

返回值: DOUBLE

示例

select high(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| high(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 555.0 |
+---------------------------------------------------------+

high_time

high_time(candlestick_agg_data)

获取最高价所在的时间。

返回值: DOUBLE

示例

select high_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| high_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+--------------------------------------------------------------+

low

low(candlestick_agg_data)

获取最低价。

返回值: DOUBLE

示例

select low(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------+
| low(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------+
| 111.0 |
+--------------------------------------------------------+

low_time

low_time(candlestick_agg_data)

获取最低价所在的时间。

返回值: Timestamp

示例

select low_time(candlestick_agg(time, price, volume)) from tick;
+-------------------------------------------------------------+
| low_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+-------------------------------------------------------------+

open

open(candlestick_agg_data)

获取最低价。

返回值: DOUBLE

示例

select open(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| open(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 111.0 |
+---------------------------------------------------------+

open_time

open_time(candlestick_agg_data)

获取最低价所在的时间。

返回值: Timestamp

示例

select open_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| open_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+--------------------------------------------------------------+

volume

volume(candlestick_agg_data)

获取总共交易量。

返回值: DOUBLE

示例

select volume(candlestick_agg(time, price, volume)) from tick;
+-----------------------------------------------------------+
| volume(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-----------------------------------------------------------+
| 2886.0 |
+-----------------------------------------------------------+

vwap

vwap(candlestick_agg_data)

获取成交量加权平均价格。

返回值: DOUBLE

示例

select vwap(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| vwap(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 294.5769230769231 |
+---------------------------------------------------------+

函数

数学函数

abs(x)

功能:返回x的绝对值。

参数类型:数值类型

返回类型:与函数参数类型一致

示例

SELECT abs(-1);
+----------------+
| abs(Int64(-1)) |
+----------------+
| 1 |
+----------------+

acos(x)

功能:返回x的反余弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT acos(3);
+----------------+
| acos(Int64(3)) |
+----------------+
| NaN |
+----------------+
SELECT acos(0.5);
+--------------------+
| acos(Float64(0.5)) |
+--------------------+
| 1.0471975511965976 |
+--------------------+

asin(x)

功能:返回x的反正弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT asin(0.5);
+--------------------+
| asin(Float64(0.5)) |
+--------------------+
| 0.5235987755982988 |
+--------------------+
SELECT asin(5);
+----------------+
| asin(Int64(5)) |
+----------------+
| NaN |
+----------------+

atan(x)

功能:返回x的反正切值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT atan(5);
+-------------------+
| atan(Int64(5)) |
+-------------------+
| 1.373400766945016 |
+-------------------+

atan2(y,x)

功能:返回y/x的反正切值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT atan2(10, 2);
+---------------------------+
| atan2(Int64(10),Int64(2)) |
+---------------------------+
| 1.3734008 |
+---------------------------+

ceil(x)

功能:向上取整。

参数类型:数值类型

返回类型:BIGINT

示例

SELECT ceil(1.6);
+--------------------+
| ceil(Float64(1.6)) |
+--------------------+
| 2 |
+--------------------+

floor(x)

功能:向下取整

参数类型:数值类型

返回类型:BIGINT

示例

SELECT floor(-3.1);
+----------------------+
| floor(Float64(-3.1)) |
+----------------------+
| -4 |
+----------------------+

cos(x)

功能:返回x的余弦值。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT cos(1);
+--------------------+
| cos(Int64(1)) |
+--------------------+
| 0.5403023058681398 |
+--------------------+

sin(x)

功能:x的正弦值

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT sin(5);
+---------------------+
| sin(Int64(5)) |
+---------------------+
| -0.9589242746631385 |
+---------------------+

exp(x)

功能:返回e的x次方。

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT exp(1);
+-------------------+
| exp(Int64(1)) |
+-------------------+
| 2.718281828459045 |
+-------------------+

ln(x)

功能:自然对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT ln(2.718281828459045);
+--------------------------------+
| ln(Float64(2.718281828459045)) |
+--------------------------------+
| 1 |
+--------------------------------+

log(x) | log10(x)

功能:以10为底的对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT log(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+
SELECT log10(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+

log2(x)

功能:以 2 为底的对数

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT log2(4);
+----------------+
| log2(Int64(4)) |
+----------------+
| 2 |
+----------------+

power(x,y) | pow(x,y)

功能:x的y次方

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT power(2, 3);
+--------------------------+
| power(Int64(2),Int64(3)) |
+--------------------------+
| 8 |
+--------------------------+

round(x)

功能:四舍五入到最接近的整数

参数类型:数值类型

返回类型:BIGINT

示例

SELECT round(3.5);
+---------------------+
| round(Float64(3.5)) |
+---------------------+
| 4 |
+---------------------+

signum(x)

功能:参数的符号 (-1, 0, +1)

参数类型:数值类型

返回类型:BIGINT

示例

SELECT signum(-3);
+-------------------+
| signum(Int64(-3)) |
+-------------------+
| -1 |
+-------------------+

sqrt(x)

功能:x的平方根

参数类型:数值类型

返回类型:与函数参数类型一致

示例

SELECT sqrt(4);
+----------------+
| sqrt(Int64(4)) |
+----------------+
| 2 |
+----------------+

tan(x)

功能:x的正切值

参数类型:数值类型

返回类型:DOUBLE

示例

SELECT tan(1);
+-------------------+
| tan(Int64(1)) |
+-------------------+
| 1.557407724654902 |
+-------------------+

trunc(x)

功能:向零取整

参数类型:数值类型

返回类型:BIGINT

示例

SELECT trunc(-3.9);
+----------------------+
| trunc(Float64(-3.9)) |
+----------------------+
| -3 |
+----------------------+

struct

语法

struct(expr1 [, ...] ) 

功能:创建一个具有指定字段值的 STRUCT。

参数类型:数值类型

注意:struct函数目前功能并不完善


条件函数

coalesce

语法

coalesce(expr[,...exp])

功能:返回其第一个不为空的参数。只有当所有参数都为 null 时才返回 Null。当检索数据以进行显示时,它通常用于将默认值替换为空值。

参数类型:任意

返回类型:第一个不为null的参数类型

示例

SELECT coalesce(temperature, null, station) FROM air;
+--------------------------------------------+
| coalesce(air.temperature,NULL,air.station) |
+--------------------------------------------+
| 69.0 |
| 78.0 |
| 62.0 |
| 79.0 |
| 53.0 |
| 72.0 |
| 71.0 |
| 69.0 |
| 80.0 |
| 74.0 |
| 70.0 |
| 70.0 |
| 70.0 |
+--------------------------------------------+

nullif

语法

nullif(expr1, expr2) 

功能:如果 expr1 等于 expr2,则返回 NULL;否则返回 expr1。

参数类型:expr1,expr2为数值类型,且为带列值的表达式

返回类型:expr1的类型或NULL

示例

SELECT nullif(temperature, 70) FROM air;
+-----------------------------------+
| nullif(air.temperature,Int64(70)) |
+-----------------------------------+
| 69 |
| 78 |
| 62 |
| 79 |
| 53 |
| 72 |
| 71 |
| 69 |
| 80 |
| 74 |
| |
| |
| |
+-----------------------------------+

字符串函数

ascii

语法

ascii(str) 

功能: 将 str 中的第一个字符转换成其ASCII 码后返回。

参数类型:STRING

返回类型:BIGINT

示例

SELECT ascii('abc');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+
SELECT ascii('a');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+

bit_length

语法

bit_length(str) 

功能:返回字符串数据的位长度或二进制数据的位数。

参数类型:STRING

返回类型:BIGINT

示例

SELECT bit_length('abc');
+------------------------+
| bitlength(Utf8("abc")) |
+------------------------+
| 24 |
+------------------------+

btrim

语法

btrim(string [, matching_string ] ) 

功能:函数通过删除前导空格和尾随空格或删除与可选的指定字符串匹配的字符来剪裁字符串。

参数类型:STRING

返回类型: STRING

示例

SELECT btrim('     abc                  ');
+-------------------------------------------+
| btrim(Utf8(" abc ")) |
+-------------------------------------------+
| abc |
+-------------------------------------------+
SELECT btrim('111abc111','1');
+------------------------------------+
| btrim(Utf8("111abc111"),Utf8("1")) |
+------------------------------------+
| abc |
+------------------------------------+

trim

语法

trim(str) 

功能:删除str首尾的空白字符

参数类型:STRING

返回类型:STRING


char_length | character_length

语法

char_length(expr) 

功能:以字符数形式返回指定字符串的长度。

参数类型:STRING

返回类型:BIGINT

示例

SELECT char_length('你好');
+-------------------------------+
| characterlength(Utf8("你好")) |
+-------------------------------+
| 2 |
+-------------------------------+

chr

语法

chr(expr) 

功能:返回位于提供的 UTF-16 码位的字符。

参数类型: BIGINT

返回类型: STRING

示例

SELECT chr(20005);
+-------------------+
| chr(Int64(20005)) |
+-------------------+
| 严 |
+-------------------+

concat

语法

concat(expr1, expr2 [, ...exp] ) 

功能:联接两个或两个以上表达式并返回生成的表达式。

参数类型:STRING

返回类型: STRING

示例

SELECT concat('a', 'b', 'c');
+---------------------------------------+
| concat(Utf8("a"),Utf8("b"),Utf8("c")) |
+---------------------------------------+
| abc |
+---------------------------------------+

concat_ws

语法

concat_ws(sep , expr1 [, ...] ) 

功能:返回由 sep 分隔的串联字符串。

参数类型:STRING

返回类型:STRING

示例

SELECT concat_ws(' ', 'a', 'b', 'c');
+--------------------------------------------------------------+
| concatwithseparator(Utf8(" "),Utf8("a"),Utf8("b"),Utf8("c")) |
+--------------------------------------------------------------+
| a b c |
+--------------------------------------------------------------+

initcap

语法

initcap(expr) 

功能:将参数中每个单词的首字母大写。

参数类型:STRING

返回类型:BIGINT

示例

SELECT initcap('hello world');
+------------------------------+
| initcap(Utf8("hello world")) |
+------------------------------+
| Hello World |
+------------------------------+

left

语法

left(str, len) 

功能:返回 str 中最左边的 len 个字符。

参数类型:str为STRING类型,len为BIGINT类型

返回类型:STRING

示例

SELECT left('abcde', 3);
+------------------------------+
| left(Utf8("abcde"),Int64(3)) |
+------------------------------+
| abc |
+------------------------------+

lpad

语法

lpad(expr, len [, pad] ) 

功能:返回 expr左侧填充了 pad,填充后长度为 len。

参数类型:expr, pad 类型为 STRING, len类型为BIGINT

返回类型:BIGINT

当len为负数时,len表现为0,当len过大,函数执行失败

示例

SELECT lpad('abc', 10, '1');
+---------------------------------------+
| lpad(Utf8("abc"),Int64(10),Utf8("1")) |
+---------------------------------------+
| 1111111abc |
+---------------------------------------+

rpad

语法

rpad(expr, len [, pad] ) 

功能:返回右侧填充了 pad 的 expr,填充后整个字符的长度为 len。

参数类型:expr, pad 类型为 STRING, len类型为BIGINT

返回类型:STRING

示例

SELECT rpad('aaa', 10, 'b');
+---------------------------------------+
| rpad(Utf8("aaa"),Int64(10),Utf8("b")) |
+---------------------------------------+
| aaabbbbbbb |
+---------------------------------------+

lower

语法

lower(expr) 

功能:返回字母小写。

参数类型:STRING

返回类型:STRING

示例

SELECT lower('ABC');
+--------------------+
| lower(Utf8("ABC")) |
+--------------------+
| abc |
+--------------------+

upper

语法

upper(expr)

功能:返回将 expr 的所有字符均更改为大写后的结果。

参数类型:STRING

返回类型:STRING


ltrim

语法

ltrim(str[, trimstr] ) 

功能:返回 str,其中删除了 trimStr 内的前导字符。默认trimestr为空白符

参数类型:STRING

返回类型:STRING

示例

SELECT ltrim('   abc');
+-----------------------+
| ltrim(Utf8(" abc")) |
+-----------------------+
| abc |
+-----------------------+

md5

语法

md5(expr) 

功能:以十六进制字符串形式返回 expr 的 MD5 128 位校验和。

参数类型:STRING

返回类型:STRING

示例

SELECT md5('abc');
+----------------------------------+
| md5(Utf8("abc")) |
+----------------------------------+
| 900150983cd24fb0d6963f7d28e17f72 |
+----------------------------------+

octet_length

语法

octet_length(expr) 

功能:返回字符串数据的字节长度。

参数类型:STRING

返回类型:BIGINT

示例

SELECT octet_length('你好');
+---------------------------+
| octetlength(Utf8("你好")) |
+---------------------------+
| 6 |
+---------------------------+

random