Skip to main content
Version: 2.3.2

SQL Reference

Database

Data Types

TypeDescriptionSize
BIGINTInteger8 Bytes
BIGINT UNSIGNEDUnsigned Integer8 Bytes
BOOLEANBoolean Type1 Byte
TIMESTAMPTime Stamp8 Bytes
STRINGUTF-8 Encoded String-----
DOUBLEDouble Precision Floating Point8 Bytes

Other Data Types

The following data types can't be stored directly, but can appear in SQL expressions.

TypeDescriptionRemarks
BINARYBinary data,can be converted to STRING using Cast clause.The return values of functions sha224, sha256, sha384, sha512 belong to this type.
INTERVALTime IntervalRequired by time addition or subtraction and function data_bin's parameters.
ARRAYArray TypeAggregate function array_agg's return type .

Constant

TypeSyntaxDescription
BIGINT[{+-}]123Numeric type
BIGINT UNSIGNED[+]123Numeric type
DOUBLE123.45Numerical type, scientific notation is not supported at present.
BOOLEAN{true | false | t | f}
STRING'abc'The double quotation mark format is not supported. Two consecutive '' in
TIMESTAMPTIMESTAMP '1900-01-01T12:00:00Z'Timestamp, the keyword TIMESTAMP indicates that the following string constant need to be interpreted as TIMESTAMP type.
GeometryClick to jumpGeometric type
--NULLNull Value

TIMESTAMP constant syntax

The time stamp is based on RCF3339 standard.

T represents interval, which can only be replaced by space

Z represents zero time zone

+08:00 represents the East 8th District

as follows:

  • 1997-01-31T09:26:56.123Z # Standard RCF3339, UTC time zone
  • 1997-01-31T09:26:56.123+08:00 # Standard RCF3339, East 8th District
  • 1997-01-31 09:26:56.123+08:00 # Close to RCF3339, just replace T by space
  • 1997-01-31T09:26:56.123 # Close to RCF3339, no time zone is specified, defaults to UTC
  • 1997-01-31 09:26:56.123 # Close to RCF3339, replace T by space, and no time zone is specified
  • 1997-01-31 09:26:56 # Close to RCF3339, replace T by space, and no time zone is specified, the accuracy is on the order of seconds

Note: CAST (BIGINT AS TIMESTAMP) is a timestamp converted to nanosecond, as follows:

SELECT CAST (1 AS TIMESTAMP);
+-------------------------------+
| Int64(1) |
+-------------------------------+
| 1970-01-01T00:00:00.000000001 |
+-------------------------------+

INTERVAL Constant

Example

  1. INTERVAL '1' One second
  2. INTERVAL '1 SECONDE' One second
  3. INTERVAL '1 MILLISECONDS' One millisecond
  4. INTERVAL '1 MINUTE' One minute
  5. INTERVAL '0.5 MINUTE' Half a minute
  6. INTERVAL '1 HOUR' One hour
  7. INTERVAL '1 DAY' One day
  8. INTERVAL '1 DAY 1' One day and one second
  9. INTERVAL '1 WEEK' One week
  10. INTERVAL '1 MONTH' One month(30 days)
  11. INTERVAL '0.5 MONTH' Half a month(15 days)
  12. INTERVAL '1 YEAR' One year(12 months)
  13. INTERVAL '1 YEAR 1 DAY 1 HOUR 1 MINUTE' One year, one day, one hour, one minute
  14. INTERVAL '1 DECADES' One decade(10 years)

Notice:

INTERVAL '1 YEAR' is not 365 days or 366 days, but 12-months. INTERVAL '1 MONTH' is not 29 days or 31 days, but 30 days.

Geometry

WKT

The WKT format is a text format used to describe the spatial characteristics of 2D and 3D geometric objects. WKT stands for "Well-Known Text" and is an open international standard. The WKT format includes some basic geometric objects, such as points, lines, polygons and circles, and some composite objects, such as collections of polygons and collections of geometric objects.

Syntax

<geometry tag> <wkt data>
<geometry tag> ::= POINT | LINESTRING | POLYGON | MULTIPOINT |
MULTILINESTRING | MULTIPOLYGON | GEOMETRYCOLLECTION

<wkt data> ::= <point> | <linestring> | <polygon> | <multipoint> |
<multilinestring> | <multipolygon> | <geometrycollection>
geometry objectsyntax descriptions
PointPOINT (<x1> <y1>)
LinestringLINESTRING (<x1> <y1>, <x2> <y2>, ...)
PolygonPOLYGON ((<x1> <y1>, <x2> <y2>, ...))
Multi-pointMULTIPOINT (<x1> <y1>, <x2> <y2>, ...)
Multi-linestringMULTILINESTRING ((<x1> <y1>, <x2> <y2>, ...), (<x1> <y1>, <x2> <y2>, ...))
Multi-polygonMULTIPOLYGON (((<x1> <y1>, <x2> <y2>, ...)), ((<x1> <y1>, <x2> <y2>, ...)))
Geometry-collectionGEOMETRYCOLLECTION (<geometry tag1> <wkt data1>, <geometry tag2> <wkt data2>, ...)

Example

geometry objectimageexample
PointPOINT (30 10)
LinestringLINESTRING (30 10, 10 30, 40 40)
PolygonPOLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
POLYGON ((35 10, 45 45, 15 40, 10 20, 35 10), (20 30, 35 35, 30 20, 20 30))
Multi-pointMULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOINT (10 40, 40 30, 20 20, 30 10)
Multi-linestringMULTILINESTRING ((10 10, 20 20, 10 40), (40 40, 30 30, 40 20, 30 10))
Multi-polygonMULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))
Geometry-collectionGEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20, 10 40), POLYGON ((40 40, 20 45, 45 30, 40 40)))

Create Database

Syntax

CREATE DATABASE [IF NOT EXISTS] db_name [WITH db_options];

db_options:
db_option ...

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
| PRECISION {'ms' | 'us' | 'ns'}
}

Parameters Description

  1. TTL: represents the saving time of the data file, defaults to 365 days, expressed in data with units. It supports day(d), hour(h), minute(m), such as TTL 10d, TTL 50h, TTL 100m.When no unit, the default is day, such as TTL 30.
  2. SHARD: represents the number of data partitions, defaults to 1.
  3. VNODE_DURATION: represents the time range of data in the shard, defaults to 365 days, and also expressed by data with units.Its data meaning is consistent with the value of TTL.
  4. REPLICA: represents the number of replicas of data in the cluster, defaults to 1 (the number of replicas is not larger than the number of distributed data nodes).
  5. PRECISION: The timestamp accuracy of the database. ms represents milliseconds, us represents microseconds, ns represents nanoseconds,defaults to ns.

Example

> CREATE DATABASE oceanic_station;
Query took 0.062 seconds.

Show All Databases

Syntax

SHOW DATABASES;

Example

SHOW DATABASES;
+-----------------+
| database_name |
+-----------------+
| oceanic_station |
| public |
+-----------------+

Use Database

If you use the database through HTTP API, you can specify the parameter db=database_name in the url to use the database.

In CnosDB-Cli, you can use the following command to switch to the specified database.

\c dbname
public ❯ \c oceanic_station
oceanic_station ❯

Drop Database

Syntax

DROP DATABASE [IF EXISTS] db_name [AFTER '7d'];

When not with AFTER, it will be deleted immediately.

When with AFTER, it is delayed deletion, which will be deleted after the specified time. The time supports days (d), hours (h), and minutes (m), such as 10d, 50h, 100m. When there is no unit, the default is day. The tenant is not visible and unavailable during the delayed deletion period.

Syntax

RECOVER DATABASE [IF EXISTS] db_name;

Delay deletion is cancelled and the tenant returns to normal.

Notice: Only resources that are delayed deletion and during the delayed deletion period can be recovered by executing the RECOVER statement.

Example

DROP DATABASE oceanic_station AFTER7d’;

RECOVER DATABASE oceanic_station;
Query took 0.030 seconds.

Alter Database Parameters

Syntax

ALTER DATABASE db_name [alter_db_options]

alter_db_options:
SET db_option

db_option: {
TTL value
| SHARD value
| VNODE_DURATION value
| REPLICA value
}

Example

ALTER DATABASE oceanic_station SET TTL '30d';

Describe Database Parameters

Syntax

DESCRIBE DATABASE dbname;

Example

DESCRIBE DATABASE oceanic_station;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

Table

Create Table

You can use CREATE TABLE to create tables.

CnosDB supports the creation of common tables and external tables.

Create Common (TSKV) Table

Syntax

CREATE TABLE [IF NOT EXISTS] tb_name
(field_definition [, field_definition ] ... [, TAGS(tg_name [, tg_name] ...)]);

field_definition:
column_name data_type [field_codec_type]

field_codec_type:
CODEC(code_type)

Instructions

  1. There is no need to create a timestamp column when creating a table. The system automatically adds a timestamp column named "time".
  2. The names of the columns need to be different.
  3. If the compression algorithm is not specified when creating a table, the system default compression algorithm is used.
  4. At present, the compression algorithms supported by various types are as follows. The first one of each type is the default specified algorithm. NULL means no compression algorithm is used.
  • BIGINT/BIGINT UNSIGNED: DELTA, QUANTILE, SDT, DEADBAND, NULL
  • DOUBLE: GORILLA, QUANTILE, SDT, DEADBAND, NULL
  • STRING: SNAPPY, ZSTD, GZIP, BZIP, ZLIB, NULL
  • BOOLEAN: BITPACK, NULL

For more information about the compression algorithm, see the details of the compression algorithm.

Example

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.033 seconds.

Create External Table

Syntax

-- Column definitions can not be specified for PARQUET files

CREATE EXTERNAL TABLE [ IF NOT EXISTS ] tb_name
( field_definition [, field_definition] ... ) tb_option;

field_definition:
column_name data_type [ NULL ]

tb_option: {
STORED AS { PARQUET | JSON | CSV | AVRO }
| [ WITH HEADER ROW ]
| [ DELIMITER 'a_single_char' ]
| [ PARTITIONED BY ( column_name, [, ... ] ) ]
| LOCATION '/path/to/file'
}

Instruction

  1. External tables do not exist in the database, but an operating system file is accessed as a common database table.
  2. The data is read-only and cannot be DML operated or indexed.

Parameter Description

  1. STORED AS: represents the format in which the file is stored. Currently, PARQUET, JSON, CSV and AVRO formats are supported.
  2. WITH HEADER ROW: Effective only in csv file format, representing with csv header.
  3. DELIMITER: only effective in csv format, representing the delimiter of column data.
  4. PARTITIONED BY: use the column specified when creating the table to partition.
  5. LOCATION: represents the location of the associated file.

Example

CREATE EXTERNAL TABLE cpu (
cpu_hz DECIMAL(10,6) NOT NULL,
temp DOUBLE NOT NULL,
version_num BIGINT NOT NULL,
is_old BOOLEAN NOT NULL,
weight DECIMAL(12,7) NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION 'tests/data/csv/cpu.csv';
Query took 0.031 seconds.

Drop Table

Syntax

DROP TABLE [ IF EXISTS ] tb_name;

Example

DROP TABLE IF EXISTS air;
Query took 0.033 seconds.

Show Tables of Current Database

Syntax

SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+

Describe Table

You can use DESCRIBE TABLE to view the table structure.

Syntax

DESCRIBE DATABASE table_name;

Example

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+

Alter Table

Explanation

At present, we support altering common tables.

  1. Add Column: add field and tag columns.
  2. Drop Column: drop the field column. When dropping a column results in dropping the last field value of a row, we think that this row has no value, and this row will not be showed in SELECT.
  3. Alter Column: alter the column definition. Currently supports changing column names and modifying compression algorithms for columns.

Syntax

ALTER TABLE tb_name alter_table_option;

alter_table_option: {
ADD TAG col_name
| ADD FIELD col_name [CODEC(code_type)]
| ALTER col_name SET CODEC(code_type)
| DROP col_name
| RENAME COLUMN col_name TO new_col_name
}

Example

ALTER TABLE air ADD TAG height;
ALTER TABLE air ADD FIELD humidity DOUBLE CODEC(DEFAULT);
ALTER TABLE air ALTER humidity SET CODEC(QUANTILE);
ALTER TABLE air DROP humidity;
ALTER TABLE air RENAME COLUMN height to height_v2;

INSERT

CnosDB supports two data inserting methods: one is to use the INSERT INTO statement, and the other is to use the HTTP API write interface to insert Line Protocol format data.

This page only shows INSERT related syntax.

INSERT

Syntax

INSERT [INTO] tb_name [ ( column_name [, ...] ) ] VALUES (  const [, ...] ) [, ...] | query; 

Explanation

CnosDB requires that the inserted data column must have a timestamp, and the VALUES list must be a constant. If a column is not selected, the value is NULL.

Note

The time column cannot be NULL, and the Tag column and Field column can be NULL.

Example: INSERT INTO air (TIME, station, visibility) VALUES(1666132800000000000, NULL, NULL)

If the VALUES list requires an expression, please use the INSERT SELECT syntax.

Insert One Record

Please note that data in the TIME column can be represented by either a time string or a numeric timestamp.

Example

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69, 77);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.044 seconds.
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 06:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 1 |
+------+
Query took 0.032 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

Note

For more information about timezone, please refer to Timestamp.

Insert Multiple Records

The keyword VALUES can be followed by multiple lists separated by ,.

Example

INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 05:40:00', 'XiaoMaiDao', 55, 68, 76),
('2022-10-19 04:40:00', 'XiaoMaiDao', 55, 68, 76);
+------+
| rows |
+------+
| 2 |
+------+
Query took 0.037 seconds.
SELECT * FROM air;
+----------------------------+------------+------------+-------------+-----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+-----------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 | 68 | 76 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 | 69 | 77 |
+----------------------------+------------+------------+-------------+-----------+

Insert Query Results (INSERT SELECT)

You can also use INSERT SELECT to insert query data into the table.

Example

CREATE TABLE air_visibility (
visibility DOUBLE,
TAGS(station)
);
Query took 0.027 seconds.
INSERT air_visibility (TIME, station, visibility)
SELECT TIME, station, visibility FROM air;
+------+
| rows |
+------+
| 4 |
+------+
Query took 0.045 seconds.
SELECT * FROM air_visibility;
+----------------------------+------------+------------+
| time | station | visibility |
+----------------------------+------------+------------+
| 2022-10-18 20:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 21:40:00 | XiaoMaiDao | 55 |
| 2022-10-18 22:40:00 | XiaoMaiDao | 55 |
| 2022-10-19 07:40:00.290401 | XiaoMaiDao | 56 |
+----------------------------+------------+------------+

Insert Duplicate Data

The storage engine of CnosDB can be regarded as a KV storage, in which Timestamp and Tags constitute the KEY, and Fields constitute a series of values.

CREATE TABLE air (
visibility DOUBLE,
temperature DOUBLE,
pressure DOUBLE,
TAGS(station)
);
INSERT INTO air (TIME, station, visibility, temperature) VALUES
(1666165200290401000, 'XiaoMaiDao', 56, 69);

The SQL statement above is equivalent to inserting the following k-v pairs into the database.

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')56
(1666165200290401000, 'XiaoMaiDao')69

The result of the query is as follows:

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 56.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+

Overwriting occurs when duplicate k-v pairs occur in the same field.

INSERT INTO air (TIME, station, visibility) VALUES
(1666165200290401000, 'XiaoMaiDao', 66);

This is equivalent to inserting the following k-v pairs into the database.

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')66

Key in (1666165200290401000, 'XiaoMaiDao') visibility-value has changed, change to 66.

select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | |
+----------------------------+------------+------------+-------------+----------+
INSERT INTO air (TIME, station, pressure) VALUES
(1666165200290401000, 'XiaoMaiDao', 77);

This is equivalent to inserting the following k-v pairs into the database.

keyvisibility-valuetemperature-valuepressure-value
(1666165200290401000, 'XiaoMaiDao')77
select * from air;
----
+----------------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+----------------------------+------------+------------+-------------+----------+
| 2022-10-19T07:40:00.290401 | XiaoMaiDao | 66.0 | 69.0 | 77.0 |
+----------------------------+------------+------------+-------------+----------+

Update Data

Update tag column

Syntax

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause
assignment clause :
tag_name = value_expression

Instructions

  1. CnosDB supports updating single or multiple tag column values separately. It does not support updating tag column and field column at the same time.
  2. CnosDB supports updating the tag column value to NULL.
  3. value_expression can only be an expression whose value can be determined at compile time, such as constant, 1 + 2, CAST('1999-12-31 00:00:00.000 as timestamp)', and so on.
  4. The where_clause must not contain a field or time column, and it must not be null. If you want to update all the data in the table, you need to use where true, which means you accept that there will be performance problems when the table is large.
  5. Changing to an existing series is not supported (all tag column values make up series).
  6. Avoid performing update tag operations while writing data, which may cause series conflicts.

Example

update air set station = 'ShangHai' where station = 'LianYunGang';

Update field column

Syntax

UPDATE table_name SET ( assignment_clause [, ...] ) where_clause

assignment clause :
field_name = value_expression

Instructions

  1. CnosDB supports updating single or multiple field column values separately. It does not support updating tag column and field column at the same time.

Example

update air set pressure = pressure + 100 where pressure = 68 and time < '2023-01-14T16:03:00';

Delete Data

Filter and delete data by tag and time columns.

Syntax

DELETE FROM table_name where_clause

Instructions

  1. where_clause can only contain tag and time columns, not field columns.

Example

delete from air where station = 'LianYunGang' and time < '2023-01-14T16:03:00';

Data Query

CnosDBSQL is inspired by DataFusion, We support most of the SQL syntax of DataFusion.

Note

In order to query more efficiently, the order of each row may not be the same for queries without specified sorting.

Sample Data

To further study CnosDB, this section will provide sample data for you to download and teach you how to import data into the database. The data sources referenced in the following chapters are all from this sample data.

Download Data

If in cnosdb cli, enter \q to exit.

Executing the following command in the shell will generate a local data file named oceanic_station in Line Protocol format.

curl -o oceanic_station.txt https://dl.cnosdb.com/sample/oceanic_station.txt

Import Data

  • Start the CLI

    cnosdb-cli
  • Create the database

    create database oceanic_station;
  • Switch to the specified database

    \c oceanic_station
  • Import data

    Execute the \w command, followed by the absolute path of the data file or the working path relative to cnosdb-cli.

    \w oceanic_station.txt

SQL Syntax

Syntax

[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ OFFSET count ]
[ LIMIT { count | ALL } ];

-- from_item
-- 1.
tb_name [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
-- 2.
from_item join_type from_item
{ ON join_condition | USING ( join_column [, ...] ) }

-- join_type
[ INNER ] JOIN
LEFT [ OUTER ] JOIN
RIGHT [ OUTER ] JOIN
FULL [ OUTER ] JOIN
CROSS JOIN

-- grouping_element
()

SELECT Clause

SELECT *

The wildcard * can be used to refer to all columns.

Example

SELECT * FROM air;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

ALL/DISTINCT

Syntax

SELECT [ ALL | DISTINCT ] select_expression [, ...];

After the keyword SELECT, you can use DISTINCT to remove duplicate fields and return only the values after duplicate removal. Using ALL returns all duplicate values in the field. When this option is not specified, the default value is ALL.

Example

SELECT DISTINCT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
+-------------+------------+
SELECT station, visibility FROM air;
+-------------+------------+
| station | visibility |
+-------------+------------+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+------------+

Alias

You can use the keyword AS to alias a column expression or table.

Alias Column Expression

Syntax

expression [ [ AS ] column_alias ]

Example

SELECT station s, visibility AS v FROM air;
+-------------+----+
| s | v |
+-------------+----+
| XiaoMaiDao | 56 |
| XiaoMaiDao | 50 |
| XiaoMaiDao | 67 |
| XiaoMaiDao | 65 |
| XiaoMaiDao | 53 |
| XiaoMaiDao | 74 |
| XiaoMaiDao | 71 |
| LianYunGang | 78 |
| LianYunGang | 79 |
| LianYunGang | 59 |
| LianYunGang | 67 |
| LianYunGang | 80 |
| LianYunGang | 59 |
+-------------+----+

Alias Table

You can also use the keyword AS to alias the table.

Syntax

FROM tb_name [AS] alias_name

Example

SELECT a.visibility, s.temperature
FROM air AS a JOIN sea s ON a.temperature = s.temperature limit 10;
+------------+-------------+
| visibility | temperature |
+------------+-------------+
| 67 | 62 |
| 50 | 78 |
| 50 | 78 |
| 65 | 79 |
+------------+-------------+

SELECT Limitation

  • If the SELECT clause contains only the Tag column, it is equivalent to the SELECT DISTINCT Tag column.

    Example

    -- station is a Tag column, temperature is a Field Namecolumn.
    SELECT station, temperature FROM air;
    +-------------+-------------+
    | station | temperature |
    +-------------+-------------+
    | XiaoMaiDao | 69 |
    | XiaoMaiDao | 78 |
    | XiaoMaiDao | 62 |
    | XiaoMaiDao | 79 |
    | XiaoMaiDao | 53 |
    | XiaoMaiDao | 72 |
    | XiaoMaiDao | 71 |
    | LianYunGang | 69 |
    | LianYunGang | 80 |
    | LianYunGang | 74 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    | LianYunGang | 70 |
    +-------------+-------------+
    -- station is a Tag column
    SELECT station FROM air;
    +-------------+
    | station |
    +-------------+
    | XiaoMaiDao |
    | LianYunGang |
    +-------------+

LIMIT Clause

Syntax

LIMIT n

Limit the number of rows returned from the result set to n, and n must be non-negative.

Example

SELECT *
FROM air LIMIT 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
+---------------------+-------------+------------+-------------+----------+

OFFSET Clause

Syntax

OFFSET m

The returned result set skips m records. default m=0.

Example

SELECT *
FROM air OFFSET 10;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
+---------------------+-------------+------------+-------------+----------+

OFFSET can be used with the LIMIT statement to specify the number of lines to skip. The format is LIMIT n OFFSET m. LIMIT n controls the output of n rows of data, and OFFSET m indicates the number of rows skipped before starting to return data. OFFSET 0 has the same effect as omitting the OFFSET clause.

Example

SELECT *
FROM air LIMIT 3 OFFSET 3;
+---------------------+------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+------------+------------+-------------+----------+
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
+---------------------+------------+------------+-------------+----------+

WITH Clause

Syntax

WITH cte AS cte_query_definiton [, ...] query

Optional. The WITH clause contains one or more commonly used expressions CTE (Common Table Expression). CTE acts as a temporary table in the current running environment, which you can refer to in subsequent queries.The rules for using CTE are as follows:

  • CTE in the same WITH clause must have a unique name.
  • The CTE defined in the WITH clause can only be used for other CTEs in the same WITH clause defined later. Suppose A is the first CTE in the clause and B is the second CTE in the clause:

Example

SELECT station, avg 
FROM ( SELECT station, AVG(visibility) AS avg
FROM air
GROUP BY station) AS x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+
WITH x AS 
(SELECT station, AVG(visibility) AS avg FROM air GROUP BY station)
SELECT station, avg
FROM x;
+-------------+--------------------+
| station | avg |
+-------------+--------------------+
| XiaoMaiDao | 62.285714285714285 |
| LianYunGang | 70.33333333333333 |
+-------------+--------------------+

UNION Clause

The UNION clause is used to combine the analysis results of multiple SELECT statements.

Syntax

select_clause_set_left
[ UNION | UNION ALL| EXCEPT | INTERSECT]
select_clause_set_right
[sort_list_columns] [limit_clause]

UNION will de-duplicate the merged result set. UNION ALL will retain the same data in the merged result set. EXCEPT will make the difference between the two result sets, return all non-duplicate values not found in the right query from the left query. INTERSECT returns the intersection of the two result sets (that means, all non-duplicate values are returned by both queries).

Note

Each SELECT clause in the UNION must have the same number of columns, and the corresponding columns have the same data type.

Example

  • UNION ALL

    SELECT visibility FROM air WHERE temperature < 60
    UNION ALL
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    +------------+
  • UNION

    SELECT visibility FROM air WHERE temperature < 60
    UNION
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 53 |
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • EXCEPT

    SELECT visibility FROM air
    EXCEPT
    SELECT visibility FROM air WHERE temperature < 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+
  • INTERSECT

    SELECT visibility FROM air
    INTERSECT
    SELECT visibility FROM air WHERE temperature > 50 LIMIT 10;
    +------------+
    | visibility |
    +------------+
    | 56 |
    | 50 |
    | 67 |
    | 65 |
    | 53 |
    | 74 |
    | 71 |
    | 78 |
    | 79 |
    | 59 |
    +------------+

ORDER BY Clause

Sort the results by the referenced expression. Ascending (ASC) is used by default. Sort in descending order by adding DESC after the expression of ORDER BY.

Example

SELECT * FROM air ORDER BY temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY temperature DESC;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
+---------------------+-------------+------------+-------------+----------+
SELECT * FROM air ORDER BY station, temperature;
+---------------------+-------------+------------+-------------+----------+
| time | station | visibility | temperature | pressure |
+---------------------+-------------+------------+-------------+----------+
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 |
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 |
+---------------------+-------------+------------+-------------+----------+

Expression

An expression is a kind of combination of symbols and operators that CnosDB will process to obtain a single data value. A simple expression can be a constant, variable, column, or scalar function. Complex expressions can be formed by concatenating two or more simple expressions with operators.

Syntax

<expresion> :: = { 
constant
| [ table_name. ] column
| scalar_function
| ( expression )
| expression { binary_operator } expression
| case_when_expression
| window_function | aggregate_function
}

Constant

A symbol representing a single specific data value. You can refer to constant.

Example

select 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+

Functions

You can refer to function.

Unary Operator

OperatorDescription
NOTIf the subexpression is true, the whole expression is false, and if the whole expression is false, the whole expression is true
IS NULLIf the subexpression is null, the whole expression is true
IS NOT NULLIf the subexpression is null, the whole expression is false

Binary Operator

Binary operators and two expressions are combined to form a new expression.

Binary operators supported now:

OperatorDescription
+Numeric type expressions add
-Number type expressions are subtracted
*Number type expressions multiply
/Number type expressions divide
%Integer type expressions are modulo
||String type expression concatenation
=Comparing expressions for equality
!=、 <>Comparing expressions for inequality
<Compare expressions to see if they are less than
<=Comparing expressions to see if they are less than or equal to
>Compare expressions for greater than
>=Compares expressions for greater than or equal to
ANDEvaluate the left expression first, and if it's true, evaluate the right expression, both true and true
ORFirst evaluate the left expression, and if it is false, evaluate the right expression, both false and false
LIKEDetermines whether the left expression matches the pattern of the right expression

BETWEEN AND Expression

Syntax

expr BETWEEN expr AND expr

Example

SELECT DISTINCT PRESSURE FROM AIR WHERE PRESSURE BETWEEN 50 AND 60;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 60 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

Note: BETWEEN x AND y lists the numbers between x and y, including x and y.

IN Expression

The IN operator determines whether any value in the list is equal to the expression.

Example

SELECT station, temperature, visibility FROM air WHERE temperature  IN (68, 69);
+-------------+-------------+------------+
| station | temperature | visibility |
+-------------+-------------+------------+
| XiaoMaiDao | 69 | 56 |
| LianYunGang | 69 | 78 |
+-------------+-------------+------------+

Note

IN only supports a list of constants, not a list of expressions.

CASE WHEN Expression

The CASE WHEN expression is used when the expression needs different values depending on the situation.

Syntax

CASE
( WHEN expression THEN result1 [, ...] )
ELSE result
END;

Example

SELECT DISTINCT 
CASE WHEN PRESSURE >= 60 THEN 50
ELSE PRESSURE
END PRESSURE
FROM AIR;
+----------+
| pressure |
+----------+
| 52 |
| 54 |
| 57 |
| 50 |
| 51 |
| 56 |
| 58 |
| 59 |
| 53 |
| 55 |
+----------+

Operator Precedence

If a complex expression has more than one operator, operator precedence determines the sequence of operations. The order of execution may have a noticeable effect on the resulting value.

The precedence levels of the operators are given in the following table. Operators at higher levels are evaluated before operators at lower levels. In the following table, 1 represents the highest level and 8 represents the lowest level.

PrecedenceOperator
1*(plus)、/(division)、%(modular)
2+ (positive), - (negative), + (plus), + (series), - (minus)
3=、>=、<=、<>、!=、>、<(Comparison operator)
4NOT
5AND
6BETWEEN、IN、LIKE、OR

SHOW

Syntax

SHOW {DATABASES | TABLES | QUERIES}

Show all databases or all tables or SQL in progress.

Example

SHOW DATABASES;
+---------------+
| database_name |
+---------------+
| public |
+---------------+
SHOW TABLES;
+------------+
| table_name |
+------------+
| air |
| sea |
| wind |
+------------+
SHOW QUERIES;
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| query_id | query_text | user_id | user_name | tenant_id | tenant_name | state | duration |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+
| 36 | select * FROM air join sea ON air.temperature = sea.temperature; | 108709109615072923019194003831375742761 | root | 13215126763611749424716665303609634152 | cnosdb | SCHEDULING | 12.693345666 |
+----------+------------------------------------------------------------------+-----------------------------------------+-----------+----------------------------------------+-------------+------------+--------------+

For more information about SHOW QUERIES, you can reference to SHOW QUERIES.

SHOW SERIES

Return the series in the specified table.

Syntax

SHOW SERIES [ON database_name] FROM table_name [WHERE expr] [order_by_clause] [limit_clause] 

Example

SHOW SERIES FROM air WHERE station = 'XiaoMaiDao' ORDER BY key LIMIT 1;
+------------------------+
| key |
+------------------------+
| air,station=XiaoMaiDao |
+------------------------+

Notice

The expression column in the WEHER clause can only be the tag column or the time column, and the expression in the ORDER BY clause can only be the key.

SHOW TAG VALUES

Syntax

SHOW TAG VALUES [ON database_name] FROM table_name WITH KEY [<operator> "<tag_key>" | [[NOT] IN ("<tag_key1>", ..)]] [WHERE expr] [order_by_clause] [limit_clause];

operator include =, !=.

Example

SHOW TAG VALUES FROM air WITH KEY = "station" WHERE station = 'XiaoMaiDao' ORDER BY key, value LIMIT 1;
+---------+------------+
| key | value |
+---------+------------+
| station | XiaoMaiDao |
+---------+------------+
SHOW TAG VALUES FROM air WITH KEY NOT IN ("station1");
+---------+-------------+
| key | value |
+---------+-------------+
| station | XiaoMaiDao |
| station | LianYunGang |
+---------+-------------+

EXPLAIN

Syntax

EXPLAIN [ ANALYZE ] [ VERBOSE ] <statement>;

Explanation

EXPLAIN is only used to display the execution plan of a query, and does not execute the query.

EXPLAIN ANALYZE executes the query and displays the execution plan of the query.

EXPLAIN ANALYZE VERBOSE executes the query and displays a more detailed execution plan, including the number of rows read.

Example

EXPLAIN SELECT station, temperature, visibility FROM air;
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: #air.station, #air.temperature, #air.visibility |
| | TableScan: air projection=[station, visibility, temperature] |
| physical_plan | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature] |
| | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE SELECT station, temperature, visibility FROM air;
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=20.375µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=15.929624ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_series_scan=1.698791ms, elapsed_point_to_record_batch=4.572954ms, elapsed_field_scan=5.119076ms] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
EXPLAIN ANALYZE VERBOSE SELECT station, temperature, visibility FROM air;
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[output_rows=13, elapsed_compute=26.75µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[output_rows=13, elapsed_compute=13.225875ms, spill_count=0, spilled_bytes=0, mem_used=0, elapsed_point_to_record_batch=3.918163ms, elapsed_field_scan=3.992161ms, elapsed_series_scan=1.657416ms] |
| | |
| Plan with Full Metrics | ProjectionExec: expr=[station@0 as station, temperature@2 as temperature, visibility@1 as visibility], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.865034 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=26.75µs, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13] |
| | TskvExec: limit=None, predicate=ColumnDomains { column_to_domain: Some({}) }, projection=[station,visibility,temperature], metrics=[start_timestamp{partition=0}=2022-10-25 03:00:14.864225 UTC, end_timestamp{partition=0}=2022-10-25 03:00:14.879596 UTC, elapsed_compute{partition=0}=13.225875ms, spill_count{partition=0}=0, spilled_bytes{partition=0}=0, mem_used{partition=0}=0, output_rows{partition=0}=13, elapsed_point_to_record_batch{partition=0}=3.918163ms, elapsed_field_scan{partition=0}=3.992161ms, elapsed_series_scan{partition=0}=1.657416ms] |
| | |
| Output Rows | 13 |
| Duration | 13.307708ms |
+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

DESCRIBE

Syntax

DESCRIBE {DATABASE db_name | TABLE tb_name};

Describe the parameters of the database and the pattern of the table.

Example

DESCRIBE TABLE air;
+-------------+-----------------------+-------------+-------------------+
| column_name | data_type | column_type | compression_codec |
+-------------+-----------------------+-------------+-------------------+
| time | TIMESTAMP(NANOSECOND) | TIME | DEFAULT |
| station | STRING | TAG | DEFAULT |
| pressure | DOUBLE | FIELD | DEFAULT |
| temperature | DOUBLE | FIELD | DEFAULT |
| visibility | DOUBLE | FIELD | DEFAULT |
+-------------+-----------------------+-------------+-------------------+
DESCRIBE DATABASE public;
+-----+-------+----------------+---------+-----------+
| ttl | shard | vnode_duration | replica | precision |
+-----+-------+----------------+---------+-----------+
| INF | 1 | 365 Days | 1 | NS |
+-----+-------+----------------+---------+-----------+

Join Clause

Join Operation

CnosDB supports INNER JOINLEFT OUTER JOINRIGHT OUTER JOINFULL OUTER JOIN.

CROSS JOIN is not supported currently.

INNER JOIN

The keyword JOIN or INNER JOIN defines a join that only displays matching rows in two tables.

Example

SELECT * FROM air INNER JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

LEFT JOIN

Define a left join with the keyword LEFT JOIN or LEFT OUTER JOIN . This join includes all the rows in the left table. If there are no matching rows in the right table, the right side of the join is null.

Example

SELECT * FROM air LEFT JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

RIGHT JOIN

Define a right join with the keyword RIGHT JOIN or RIGHT OUTER JOIN . This join includes all the rows in the right table. If there are no matching rows in the left table, the left side of the join is null.

Example

SELECT * FROM air RIGHT JOIN sea ON air.temperature = sea.temperature;
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
+---------------------+------------+------------+-------------+----------+---------------------+-------------+-------------+

FULL JOIN

The keyword FULL JOIN or FULL OUTER JOIN defines a full connection, which is actually the union of LEFT OUTER JOIN and RIGHT OUTER JOIN. It will display all the rows on the left and right of the join, and will generate null values where either side of the join does not match.

Example

SELECT * FROM air FULL JOIN sea ON air.temperature = sea.temperature;
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| time | station | visibility | temperature | pressure | time | station | temperature |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+
| 2022-01-28 13:27:00 | XiaoMaiDao | 67 | 62 | 59 | 2022-01-28 13:18:00 | LianYunGang | 62 |
| | | | | | 2022-01-28 13:21:00 | LianYunGang | 63 |
| | | | | | 2022-01-28 13:24:00 | LianYunGang | 77 |
| | | | | | 2022-01-28 13:27:00 | LianYunGang | 54 |
| | | | | | 2022-01-28 13:30:00 | LianYunGang | 55 |
| | | | | | 2022-01-28 13:33:00 | LianYunGang | 64 |
| | | | | | 2022-01-28 13:36:00 | LianYunGang | 56 |
| | | | | | 2022-01-28 13:21:00 | XiaoMaiDao | 57 |
| | | | | | 2022-01-28 13:24:00 | XiaoMaiDao | 64 |
| | | | | | 2022-01-28 13:27:00 | XiaoMaiDao | 51 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:30:00 | XiaoMaiDao | 78 |
| 2022-01-28 13:24:00 | XiaoMaiDao | 50 | 78 | 66 | 2022-01-28 13:33:00 | XiaoMaiDao | 78 |
| | | | | | 2022-01-28 13:36:00 | XiaoMaiDao | 57 |
| 2022-01-28 13:30:00 | XiaoMaiDao | 65 | 79 | 77 | 2022-01-28 13:39:00 | XiaoMaiDao | 79 |
| 2022-01-28 13:21:00 | XiaoMaiDao | 56 | 69 | 77 | | | |
| 2022-01-28 13:33:00 | XiaoMaiDao | 53 | 53 | 68 | | | |
| 2022-01-28 13:36:00 | XiaoMaiDao | 74 | 72 | 68 | | | |
| 2022-01-28 13:39:00 | XiaoMaiDao | 71 | 71 | 80 | | | |
| 2022-01-28 13:21:00 | LianYunGang | 78 | 69 | 71 | | | |
| 2022-01-28 13:24:00 | LianYunGang | 79 | 80 | 51 | | | |
| 2022-01-28 13:27:00 | LianYunGang | 59 | 74 | 59 | | | |
| 2022-01-28 13:30:00 | LianYunGang | 67 | 70 | 72 | | | |
| 2022-01-28 13:33:00 | LianYunGang | 80 | 70 | 68 | | | |
| 2022-01-28 13:36:00 | LianYunGang | 59 | 70 | 54 | | | |
+---------------------+-------------+------------+-------------+----------+---------------------+-------------+-------------+

GROUP BY Clause

The GROUP BY clause must be after the condition of the WHERE clause (if there is one) and before the ORDER BY clause (if there is one).

Example

SELECT station, AVG(temperature) 
FROM air
GROUP BY station;
+-------------+----------------------+
| station | AVG(air.temperature) |
+-------------+----------------------+
| XiaoMaiDao | 69.14285714285714 |
| LianYunGang | 72.16666666666667 |
+-------------+----------------------+

HAVING Clause

Syntax

group_by_clause 
[ HAVING condition ];

In the SELECT query, the HAVING clause must follow the GROUP BY clause and appear before the ORDER BY clause (if there is one).

Differences between HAVING and WHERE

HAVING enables you to specify filter conditions after the GROUP BY clause, so as to control which groups in the query results can appear in the final results.

WHERE sets conditions on the selected column before the GROUP BY clause, while HAVING clause sets conditions on the group generated by the GROUP BY clause.

Example

SELECT station, AVG(temperature)  AS avg_t 
FROM air
GROUP BY station
HAVING avg_t > 70;
+-------------+-------------------+
| station | avg_t |
+-------------+-------------------+
| LianYunGang | 72.16666666666667 |
+-------------+-------------------+

Complex Grouping Operation

CnosDB provides ROLLUP, CUBE and other complex grouping operations, enabling you to operate query results in different ways.

ROLLUP

You can use the ROLLUP option in a single query to generate multiple group sets.

ROLLUP assumes a hierarchy between input columns.

If you GRUOP BY Clause is as follows:

Syntax

SELECT ...
FROM ...
GROUP BY ROLLUP(column_1,column_2);

It is equivalent to the following statement:

Syntax

SELECT ...
FROM ...


UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1

UNION ALL

SELECT ...
FROM ...
GROUP BY
column_1, column2;

ROLLUP generates all grouping sets that are meaningful in this hierarchy. Whenever the value of column_1 changes, it will generate a subtotal line;

Therefore, we often use ROLLUP in reports to generate subtotals and totals. The order of columns in ROLLUP is very important.

Example

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY ROLLUP (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| | | 70.53846153846153 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
+-------------+------------+----------------------+

CUBE

Similar to ROLLUP, CUBE is an extension of the GROUP BY clause. It allows you to generate subtotals for all combinations of grouping columns specified in the GROUP BY clause.

CUBE creates a grouping set for each possible combination of the specified expression set. First, GROUP BY (A, B, C), then (A, B), (A, C), (A), (B, C), (B), (C), and finally GROUP BY the entire table.

Syntax

SELECT ... 
FROM ...
GROUP BY CUBE (column1, column2);

Equivalent to:

SELECT ...
FROM ...
GROUP BY column1

UNION ALL

SELECT ...
FROM ...
GROUP BY column2

UNION ALL

SELECT ...
FROM ...
GROUP BY column1, column2

UNION ALL

SELECT ...
FROM ...
;

Example

SELECT station, visibility, avg(temperature) 
FROM air
GROUP BY CUBE (station, visibility);
+-------------+------------+----------------------+
| station | visibility | AVG(air.temperature) |
+-------------+------------+----------------------+
| XiaoMaiDao | 56 | 69 |
| XiaoMaiDao | 50 | 78 |
| XiaoMaiDao | 67 | 62 |
| XiaoMaiDao | 65 | 79 |
| XiaoMaiDao | 53 | 53 |
| XiaoMaiDao | 74 | 72 |
| XiaoMaiDao | 71 | 71 |
| LianYunGang | 78 | 69 |
| LianYunGang | 79 | 80 |
| LianYunGang | 59 | 72 |
| LianYunGang | 67 | 70 |
| LianYunGang | 80 | 70 |
| | 56 | 69 |
| | 50 | 78 |
| | 67 | 66 |
| | 65 | 79 |
| | 53 | 53 |
| | 74 | 72 |
| | 71 | 71 |
| | 78 | 69 |
| | 79 | 80 |
| | 59 | 72 |
| | 80 | 70 |
| XiaoMaiDao | | 69.14285714285714 |
| LianYunGang | | 72.16666666666667 |
| | | 70.53846153846153 |
+-------------+------------+----------------------+

Aggregate Function

Common Aggregate Functions

COUNT

Syntax

COUNT(x)

Function: Return the number of rows retrieved in the selected element.

Contain the DISTINCT keyword, which counts the results after deduplication.

COUNT() and COUNT(literal value) are equivalent, and if the sql projection contains only '/literal value', the sql is rewritten as COUNT(time).

COUNT(tag) and COUNT(DISTINCT tag) are equivalent.

COUNT(field) Returns the number of non-null values.

Parameter Type:Any type

Return Type:BIGINT

Example

SELECT COUNT(*) FROM air;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 13 |
+-----------------+
SELECT COUNT(temperature) FROM air;
+------------------------+
| COUNT(air.temperature) |
+------------------------+
| 13 |
+------------------------+
SELECT COUNT(DISTINCT temperature) FROM air;
+---------------------------------+
| COUNT(DISTINCT air.temperature) |
+---------------------------------+
| 10 |
+---------------------------------+

SUM

Syntax

SUM(NUMERICS)

Function: Return the sum calculated from the selected element.

Parameter Type: Numeric type

Return Type: Consistent with parameter type.

Example

SELECT SUM(temperature) FROM air;
+----------------------+
| SUM(air.temperature) |
+----------------------+
| 917 |
+----------------------+

MIN

Syntax

MIN(STRING | NUMERICS | TIMESTAMP)

Function: Return the minimum value of the selected element.

Parameter Type: Numeric type or STRING or TIMESTAMP

Return Type: Consistent with parameter type.

Example

 SELECT MIN(time), MIN(station), MIN(temperature) FROM air;
+---------------------+------------------+----------------------+
| MIN(air.time) | MIN(air.station) | MIN(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:21:00 | LianYunGang | 53 |
+---------------------+------------------+----------------------+

MAX

Syntax

MAX(STRINGS | NUMERICS | TIMESTAMPS)

Function: Return the maximum value in the selected element.

Parameter Type: Numeric type or STRING or TIMESTAMP.

Return Type: Consistent with parameter type.

Example

SELECT MAX(time), MAX(station), MAX(temperature) FROM air;
+---------------------+------------------+----------------------+
| MAX(air.time) | MAX(air.station) | MAX(air.temperature) |
+---------------------+------------------+----------------------+
| 2022-01-28T13:39:00 | XiaoMaiDao | 80 |
+---------------------+------------------+----------------------+

AVG

Syntax

AVG(NUMERICS)

Function: Return the average value of the selected element.

Parameter Type: Numeric type

Return Type: Numeric type

Example

SELECT AVG(temperature) FROM air;
+----------------------+
| AVG(air.temperature) |
+----------------------+
| 70.53846153846153 |
+----------------------+

ARRAY_AGG

Syntax

ARRAY_AGG(expr)

Function: Return an array consisting of all the values of the selected element. The element types must be the same.

Parameter Type: any type

Return Type: Array of parameter type

Example

SELECT ARRAY_AGG(temperature) from air;
+------------------------------------------------------+
| ARRAYAGG(air.temperature) |
+------------------------------------------------------+
| [69, 78, 62, 79, 53, 72, 71, 69, 80, 74, 70, 70, 70] |
+------------------------------------------------------+

Note

The aggregate function result cannot be returned in CSV format.

FIRST

first(time,  value)

Gets the first value of one column sorted by another.

Parameter Type:

  • time: Timestamp

  • value: any

Return Type: Same as value type.

Example

select first(time, pressure) from air;
+------------------------------+
| first(air.time,air.pressure) |
+------------------------------+
| 63.0 |
+------------------------------+

LAST

last(time,  value)

Gets the last value of one column sorted by another.

Parameters Type:

  • time: Timestamp

  • value: any

Return Type: Same as value type.

Example

select last(time, pressure) from air;
+-----------------------------+
| last(air.time,air.pressure) |
+-----------------------------+
| 55.0 |
+-----------------------------+

MODE

mode(value)

Calculate the mode of a column.

Parameter Type: value: any

Return Type: Same as value type.

Example

select mode(pressure) from air;
+--------------------+
| mode(air.pressure) |
+--------------------+
| 69.0 |
+--------------------+

INCREASE

increase(time, value order by time)

Calculate the increment of value in the time series.

Parammeter Type: value: numeric type

Return Type: Same as value type.

Example

CREATE DATABASE IF NOT EXISTS TEST_INCREASE;
ALTER DATABASE TEST_INCREASE SET TTL '100000D';
CREATE TABLE IF NOT EXISTS test_increase.test_increase(f0 BIGINT, TAGS(t0));
INSERT INTO test_increase.test_increase(time, t0, f0)
VALUES
('1999-12-31 00:00:00.000', 'a', 1),
('1999-12-31 00:00:00.005', 'a', 2),
('1999-12-31 00:00:00.010', 'a', 3),
('1999-12-31 00:00:00.015', 'a', 4),
('1999-12-31 00:00:00.020', 'a', 5),
('1999-12-31 00:00:00.025', 'a', 6),
('1999-12-31 00:00:00.030', 'a', 7),
('1999-12-31 00:00:00.035', 'a', 8),
('1999-12-31 00:00:00.000', 'b', 1),
('1999-12-31 00:00:00.005', 'b', 2),
('1999-12-31 00:00:00.010', 'b', 3),
('1999-12-31 00:00:00.015', 'b', 4),
('1999-12-31 00:00:00.020', 'b', 1),
('1999-12-31 00:00:00.025', 'b', 2),
('1999-12-31 00:00:00.030', 'b', 3),
('1999-12-31 00:00:00.035', 'b', 4);
SELECT t0, INCREASE(time, f0 ORDER BY time) AS increase
FROM test_increase.test_increase GROUP BY t0 ORDER BY t0;
+----+----------+
| t0 | increase |
+----+----------+
| a | 7 |
| b | 7 |
+----+----------+

Statistical Aggregate Functions

VAR | VAR_SAMP

Syntax

VAR(NUMERICS)

Function: Calculate the variance of a given sample

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT VAR(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 51.43589743589741 |
+---------------------------+

VAR_POP

Syntax

VAR_POP(NUMERICS)

Function: Calculate the variance of population.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT VAR_POP(temperature) FROM air;
+------------------------------+
| VARIANCEPOP(air.temperature) |
+------------------------------+
| 47.47928994082838 |
+------------------------------+

STDDEV | STDDEV_SAMP

Syntax

STDDEV(NUMERICS)

Function: Calculate the standard deviation of the sample.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT STDDEV(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 7.1718824192744135 |
+-------------------------+

STDDEV_POP

Syntax

STDDEV_POP(NUMERICS)

Function: Calculate the standard deviation of population.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT STDDEV_POP(temperature) FROM air;
+----------------------------+
| STDDEVPOP(air.temperature) |
+----------------------------+
| 6.890521746633442 |
+----------------------------+

COVAR | COVAR_SAMP

Syntax

COVAR(NUMERICS, NUMERICS)

Function: Return the covariance of the sample.

Parameter Type: Numeric type

Numeric type: DOUBLE

Example

SELECT COVAR(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| -5.121794871794841 |
+------------------------------------------+

COVAR_POP

Syntax

COVAR_POP(NUMERICS, NUMERICS)

Function: Return the overall covariance of number pairs in a group.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT COVAR_POP(temperature, pressure) FROM air;
+---------------------------------------------+
| COVARIANCEPOP(air.temperature,air.pressure) |
+---------------------------------------------+
| -4.727810650887546 |
+---------------------------------------------+

CORR

Syntax

CORR**(NUMERICS, NUMERICS)

Function: Return the Pearson coefficient representing the association between a set of number pairs.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT CORR(temperature, pressure) FROM air;
+-------------------------------------------+
| CORRELATION(air.temperature,air.pressure) |
+-------------------------------------------+
| -0.07955796767766017 |
+-------------------------------------------+

Approximate Aggregate Functions

APPROX_DISTINCT

Syntax

APPROX_DISTINCT(x)

Function: Return approximations of different input values (HyperLogLog).

Parameter Type: STRING

Return Type: BIGINT

Example

SELECT APPROX_DISTINCT(station) FROM air;
+-----------------------------+
| APPROXDISTINCT(air.station) |
+-----------------------------+
| 2 |
+-----------------------------+

APPROX_PERCENTILE_CONT

Syntax

APPROX_PERCENTILE_CONT(x, p)  

Function: Returns the approximate percentile (TDigest) of the input value x, where p is the percentile and is a 64 bit floating point number between 0 and 1 (including 1).

Parameter Type: x is numeric type, p is DOUBLE type

Return Type: DOUBLE

Example

SELECT APPROX_PERCENTILE_CONT(temperature, 0.1) FROM air;
+----------------------------------------------------+
| APPROXPERCENTILECONT(air.temperature,Float64(0.1)) |
+----------------------------------------------------+
| 60.4 |
+----------------------------------------------------+

APPROX_PERCENTILE_CONT_WITH_WEIGHT

Syntax

APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, w, p)  

Function: x returns the approximate percentage (TDigest) of the weighted input value, where w is the weight column expression and p is a floating point 64 between 0 and 1 inclusive.

APPROX_PERCENTILE_CONT(x, p) is equivalent to APPROX_PERCENTILE_CONT_WITH_WEIGHT(x, 1, p)

Parameter Type: x. w is numeric type, p is DOUBLE type.

Return Type: DOUBLE

Example

SELECT APPROX_PERCENTILE_CONT_WITH_WEIGHT(temperature,2, 0.1) FROM air;
+-----------------------------------------------------------------------+
| APPROXPERCENTILECONTWITHWEIGHT(air.temperature,Int64(2),Float64(0.1)) |
+-----------------------------------------------------------------------+
| 54.35 |
+-----------------------------------------------------------------------+

APPROX_MEDIAN(NUMERICS)

Syntax

APPROX_MEDIAN(NUMERICS)

Function: Return the approximate median of the input value.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT APPROX_MEDIAN(temperature) FROM air;
+-------------------------------+
| APPROXMEDIAN(air.temperature) |
+-------------------------------+
| 70 |
+-------------------------------+

SAMPLE

Syntax

SAMPLE(<column_key>, <N>)

Function: Randomly select N records from the given column_key.

Parameter Type:

  • column_key: Any type
  • N: Int

Return Type: Array

Example

select sample(visibility, 5) from air;
+--------------------------------------+
| sample(air.visibility,Int64(5)) |
+--------------------------------------+
| [65.0, 74.0, 76.0, 77.0, 72.0, 77.0] |
+--------------------------------------+

ASAP_SMOOTH

asap_smooth(time, value, resolution order by time)

The ASAP smoothing algorithm aims to create human-readable graphs that preserve the coarse shape and larger trends of the input data while minimizing the local variance between points. Take the (Timestamp, value) pair, normalize them to the target time interval, and return the ASAP smooth value.

Parameter Type:

  • time: Timestamp

  • value: Double

  • resolution: Bigint, the approximate number of points to return ((Timestamp, value) pair), determines the horizontal resolution of the resulting plot.

Return Type: TimeVector

Struct {
time: List[Timestamp], -- ms
value: List[Double],
resolution: Int Unsigned,
}

Example

select asap_smooth(time, pressure, 10) from air group by date_trunc('month', time);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| asap_smooth(air.time,air.pressure,Int64(10)) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10} |
| {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10} |
| {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Two-stage Aggregation Function

stats_agg

Perform linear regression analysis on two-dimensional data, such as calculating correlation coefficients and covariances. Common statistics such as mean and standard deviation can also be calculated for each dimension separately. stats_agg provides the same functionality as aggregation functions such as sum, count, corr, covar_pop, and so on. Suitable for use with multiple analysis functions in a single SQL.

Notice: Neither column is included in the aggregation if it is NULL.

stats_agg

stats_agg(y, x)

Function: Perform statistical aggregation.

Parameter Type:

  • y: double
  • x: double

Return Type: Struct as follows.

{ 
n: bigint, -- count
sx: double, -- sum(x)- sum(x)
sx2: double, -- sum((x-sx/n)^2) (sum of squares)
sx3: double, -- sum((x-sx/n)^3)
sx4: double, -- sum((x-sx/n)^4)
sy: double, -- sum(y)
sy2: double, -- sum((y-sy/n)^2) (sum of squares)
sy3: double, -- sum((y-sy/n)^3)
sy4: double, -- sum((y-sy/n)^4)
sxy: double, -- sum((x-sx/n)*(y-sy/n)) (sum of products)
}

Example

create table if not exists test_stats(x bigint, y bigint);
alter database public set ttl '1000000d';
insert into test_stats(time, x, y) values
(1, 1, 1),
(2, 1, 2),
(3, 1, 3),
(4, 1, 4),
(5, 1, 5),
(6, 2, 1),
(7, 2, 2),
(8, 2, 3),
(9, 2, 4),
(10, 2, 5);
select stats_agg(y, x) from test_stats;
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| stats_agg(test_stats.y,test_stats.x) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {n: 10, sx: 15.0, sx2: 2.5, sx3: -2.7755575615628914e-16, sx4: 0.6249999999999999, sy: 30.0, sy2: 20.0, sy3: -1.7763568394002505e-15, sy4: 68.0, sxy: 0.0} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------+

num_vals

Calculate the number of data rows after two-dimensional statistical aggregation

Return Type: BIGINT UNSIGNED

select num_vals(stats_agg(y, x)) from test_stats;
+------------------------------------------------+
| num_vals(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------+
| 10 |
+------------------------------------------------+

average_y, average_x

Calculate the average of the specified dimensions after the aggregation of 2-D statistics.

Return Type: Double

select average_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| average_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 1.5 |
+-------------------------------------------------+

sum_y, sum_x

Calculate the sum of the specified dimensions after the two-dimensional statistical aggregation, and the method is population.

Return Type: DOUBLE

select sum_x(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| sum_x(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 15.0 |
+---------------------------------------------+

stddev_samp_y, stddev_samp_x

Calculate the standard deviation of the specified dimension after two-dimensional statistical aggregation, and the method is sample.

Return Type: DOUBLE

select stddev_samp_x(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------+
| stddev_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------+
| 0.5270462766947299 |
+-----------------------------------------------------+

stddev_pop_y, stddev_pop_x

Calculate the standard deviation of the specified dimension after the two-dimensional statistical aggregation, and the method is population.

Return Type: DOUBLE

select stddev_pop_x(stats_agg(y, x)) from test_stats;
+----------------------------------------------------+
| stddev_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+----------------------------------------------------+
| 0.5 |
+----------------------------------------------------+

var_samp_y, var_samp_x

Calculate the variance of the specified dimension after aggregating the two-dimensional statistics, and the method is sample.

Return Type: DOUBLE

select var_samp_x(stats_agg(y, x)) from test_stats;
+--------------------------------------------------+
| var_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+--------------------------------------------------+
| 0.2777777777777778 |
+--------------------------------------------------+

var_pop_y, var_pop_x

Calculate the variance of the specified dimension after aggregating the two-dimensional statistics, and the method is population.

Return Type: DOUBLE

select var_pop_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| var_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 0.25 |
+-------------------------------------------------+

skewness_samp_y, skewness_samp_x

Calculate the skewness value of the specified dimension after two-dimensional statistical aggregation, and the method is sample.

Return Type: DOUBLE

select skewness_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| skewness_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| -2.1065000811460203e-16 |
+-------------------------------------------------------+

skewness_pop_y, skewness_pop_x

Calculate the skewness value of the specified dimension after the two-dimensional statistical aggregation, and the method is population.

Return Type: DOUBLE

select skewness_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| skewness_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| -2.220446049250313e-16 |
+------------------------------------------------------+

kurtosis_samp_y, kurtosis_samp_x

Calculate the kurtosis value of the specified dimension after two-dimensional statistical aggregation, and the method is sample.

Return Type: DOUBLE

select kurtosis_samp_x(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| kurtosis_samp_x(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.8999999999999998 |
+-------------------------------------------------------+

kurtosis_pop_y, kurtosis_pop_x

Calculate the kurtosis value of the specified dimension after two-dimensional statistical aggregation, and the method is population.

Return Type: DOUBLE

select kurtosis_pop_x(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| kurtosis_pop_x(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.9999999999999998 |
+------------------------------------------------------+

correlation

The correlation after aggregation of two-dimensional statistics is calculated.

Return Type: DOUBLE

select correlation(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| correlation(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| 0.0 |
+---------------------------------------------------+

covariance_samp, covariance_pop

The covariance after aggregation of 2-D statistics is calculated.

Return Type: DOUBLE

select covariance_samp(stats_agg(y, x)) from test_stats;
+-------------------------------------------------------+
| covariance_samp(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------------+
| 0.0 |
+-------------------------------------------------------+
select covariance_pop(stats_agg(y, x)) from test_stats;
+------------------------------------------------------+
| covariance_pop(stats_agg(test_stats.y,test_stats.x)) |
+------------------------------------------------------+
| 0.0 |
+------------------------------------------------------+

determination_coeff

The coefficient of determination after 2D statistical aggregation is calculated.

Return Type: DOUBLE

select determination_coeff(stats_agg(y, x)) from test_stats;
+-----------------------------------------------------------+
| determination_coeff(stats_agg(test_stats.y,test_stats.x)) |
+-----------------------------------------------------------+
| 0.0 |
+-----------------------------------------------------------+

slope

Based on the 2-D statistical aggregation, the slope of the linear fitting line is calculated.

Return Type: DOUBLE

select slope(stats_agg(y, x)) from test_stats;
+---------------------------------------------+
| slope(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------+
| 0.0 |
+---------------------------------------------+

intercept

Calculate the intercept of y after 2D statistical aggregation.

Return Type: DOUBLE

select intercept(stats_agg(y, x)) from test_stats;
+-------------------------------------------------+
| intercept(stats_agg(test_stats.y,test_stats.x)) |
+-------------------------------------------------+
| 3.0 |
+-------------------------------------------------+

x_intercept

Calculate the intercept of x after two-dimensional statistical aggregation.

Return Type: DOUBLE

select x_intercept(stats_agg(y, x)) from test_stats;
+---------------------------------------------------+
| x_intercept(stats_agg(test_stats.y,test_stats.x)) |
+---------------------------------------------------+
| -inf |
+---------------------------------------------------+

gauge_agg

Analyze Gauge data. Unlike Counter, Gauge can be decreased or increased.

gauge_agg

gauge_agg(time, value)

This is the first step in analyzing the Gauge data. Create intermediate aggregates using gauge_agg, The other functions then use the intermediate aggregated data for their calculations.

Parameter Type:

  • time: Timestamp

  • value: DOUBLE

Return Type:

Struct {
first: Struct {
ts: Timestamp,
value: Double
},
second: Struct {
ts: Timestamp,
value: Double
},
penultimate: Struct {
ts: Timestamp,
val: Double
},
last: Struct {
ts: Timestamp,
val: Double
},
num_elements: Bigint Unsingned
}

Example

select gauge_agg(time, pressure) from air group by date_trunc('month', time);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| gauge_agg(air.time,air.pressure) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {first: {ts: 2023-03-01T00:00:00, val: 54.0}, second: {ts: 2023-03-01T00:00:00, val: 59.0}, penultimate: {ts: 2023-03-14T16:00:00, val: 55.0}, last: {ts: 2023-03-14T16:00:00, val: 80.0}, num_elements: 13122} |
| {first: {ts: 2023-01-14T16:00:00, val: 63.0}, second: {ts: 2023-01-14T16:00:00, val: 68.0}, penultimate: {ts: 2023-01-31T23:57:00, val: 77.0}, last: {ts: 2023-01-31T23:57:00, val: 54.0}, num_elements: 16640} |
| {first: {ts: 2023-02-01T00:00:00, val: 54.0}, second: {ts: 2023-02-01T00:00:00, val: 60.0}, penultimate: {ts: 2023-02-28T23:57:00, val: 74.0}, last: {ts: 2023-02-28T23:57:00, val: 59.0}, num_elements: 26880} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

delta

Get Gauge changes over a period of time. This is simply increment, calculated by subtracting the last value seen from the first value.

Return Type: Double

select delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+-----------------------------------------+
| delta(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------+
| 26.0 |
| -9.0 |
| 5.0 |
+-----------------------------------------+

time_delta

Get the duration, the time of the last Gauge minus the time of the first Gauge.

Return Type: INTERVAL

select time_delta(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------------------------+
| time_delta(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------------------+
| 0 years 0 mons 13 days 16 hours 0 mins 0.000000000 secs |
| 0 years 0 mons 17 days 7 hours 57 mins 0.000000000 secs |
| 0 years 0 mons 27 days 23 hours 57 mins 0.000000000 secs |
+----------------------------------------------------------+

rate

Calculate the ratio of Gauge change and time change.

Return Type: Double

Unit:

When the time unit is ns, the ratio unit is /ns,

When the time unit is ms, the ratio unit is /ms

When the time unit is s, the ratio unit is /s

select rate(gauge_agg(time, pressure)) from air group by date_trunc('month', time);
+----------------------------------------+
| rate(gauge_agg(air.time,air.pressure)) |
+----------------------------------------+
| 2.2018970189701897e-14 |
| 9.349414325974008e-15 |
| -4.133905465849807e-16 |
+----------------------------------------+

first_time

Get the smallest timestamp in the gauge

返回类型: TIMESTAMP

select first_time(gauge_agg(time, pressure)) from air;
+----------------------------------------------+
| first_time(gauge_agg(air.time,air.pressure)) |
+----------------------------------------------+
| 2023-01-14T16:00:00 |
+----------------------------------------------+

last_time

Get the largest timestamp in the Gauge

返回类型: TIMESTAMP

select last_time(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| last_time(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 2023-03-14T16:00:00 |
+---------------------------------------------+

first_val

Obtain the value corresponding to the smallest timestamp in the gauge

返回类型: The type of column specified in gauge_agg

select first_val(gauge_agg(time, pressure)) from air;
+---------------------------------------------+
| first_val(gauge_agg(air.time,air.pressure)) |
+---------------------------------------------+
| 68.0 |
+---------------------------------------------+

last_val

Get the value corresponding to the largest timestamp in the Gauge.

返回类型: Type of column specified in gauge_agg

select last_val(gauge_agg(time, pressure)) from air;
+--------------------------------------------+
| last_val(gauge_agg(air.time,air.pressure)) |
+--------------------------------------------+
| 80.0 |
+--------------------------------------------+

idelta_left

Calculates the earliest instantaneous change in Gauge. This is equal to the second value minus the first.

返回类型:Type of column specified in gauge_agg

 select time, station, pressure from air where station = 'XiaoMaiDao' order by time limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 63.0 |
| 2023-01-14T16:03:00 | XiaoMaiDao | 58.0 |
| 2023-01-14T16:06:00 | XiaoMaiDao | 65.0 |
| 2023-01-14T16:09:00 | XiaoMaiDao | 52.0 |
+---------------------+------------+----------+
select idelta_left(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+-----------------------------------------------+
| idelta_left(gauge_agg(air.time,air.pressure)) |
+-----------------------------------------------+
| -5.0 |
+-----------------------------------------------+

idelta_right

Calculates the latest instantaneous change in Gauge. This is equal to the last value value minus the penultimate value.

返回类型:Type of column specified in gauge_agg

select time, station, pressure from air where station = 'XiaoMaiDao' order by time desc limit 4;
+---------------------+------------+----------+
| time | station | pressure |
+---------------------+------------+----------+
| 2023-03-14T16:00:00 | XiaoMaiDao | 55.0 |
| 2023-03-14T15:57:00 | XiaoMaiDao | 62.0 |
| 2023-03-14T15:54:00 | XiaoMaiDao | 75.0 |
| 2023-03-14T15:51:00 | XiaoMaiDao | 61.0 |
+---------------------+------------+----------+
select idelta_right(gauge_agg(time, pressure)) from air where station = 'XiaoMaiDao';
+------------------------------------------------+
| idelta_right(gauge_agg(air.time,air.pressure)) |
+------------------------------------------------+
| -7.0 |
+------------------------------------------------+

compact_state_agg

Given a system or a value that switches between discrete states,

Sum up the time taken for each state.

For example, you can use the compact_state_agg function to keep track of the system.

Time spent in the error, running, or start state.

compact_state_agg is designed to handle a relatively small number of states. It may not perform well on datasets with too many states between rows.

If you need to track the time to enter and exit each state, use the state_agg function.

If you need to track the activity of your system based on heartbeat signals, consider using the heartbeat_agg function.

compact_state_agg

compact_state_agg(ts, state)

The time spent in each state is counted and aggregated into the StateAggData type.

Parameter Type:

-ts: Timestamp

-state: Any

Return Type: StateAggData type

Struct {
state_duration: List[
Struct{
state: any,
interval: duration
},
...
]
state_periods: List[
Struct(
state: any,
periods: List[
Struct {
start_time: timestamp,
end_time: timestamp
},
...
]
),
......
]
}

Example

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select compact_state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| compact_state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}, {state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}], state_periods: []} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

Count the duration of a state, or count the duration of a state in a certain period of time.

Parameter Type:

  • state_agg_data: StateAggData

  • state: any The same type of state as in compact_state_agg.

  • begin_time: This is optional and specifies the start time of the period.

  • interval_time: This is optional and specifies the duration of the time interval or infinity if not specified.

Return Type: INTERVAL 类型

Example

select duration_in(compact_state_agg(time, state), 'running') from states;
+--------------------------------------------------------------------------+
| duration_in(compact_state_agg(states.time,states.state),Utf8("running")) |
+--------------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+--------------------------------------------------------------------------+

state_agg

Given a system or value that switches between discrete states, the transitions between states are tracked.

state_agg

state_agg(ts, state)

The time spent in each state is counted, and aggregated into the StateAggData type.

alter database public set ttl '1000000d';

create table if not exists states(state STRING);

insert into states values
('2020-01-01 10:00:00', 'starting'),
('2020-01-01 10:30:00', 'running'),
('2020-01-03 16:00:00', 'error'),
('2020-01-03 18:30:00', 'starting'),
('2020-01-03 19:30:00', 'running'),
('2020-01-05 12:00:00', 'stopping');
select state_agg(time, state) from states;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| state_agg(states.time,states.state) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {state_duration: [{state: starting, duration: 0 years 0 mons 0 days 1 hours 30 mins 0.000000000 secs}, {state: running, duration: 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs}, {state: stopping, duration: 0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs}, {state: error, duration: 0 years 0 mons 0 days 2 hours 30 mins 0.000000000 secs}], state_periods: [{state: starting, periods: [{start_time: 2020-01-01T10:00:00, end_time: 2020-01-01T10:30:00}, {start_time: 2020-01-03T18:30:00, end_time: 2020-01-03T19:30:00}]}, {state: error, periods: [{start_time: 2020-01-03T16:00:00, end_time: 2020-01-03T18:30:00}]}, {state: running, periods: [{start_time: 2020-01-01T10:30:00, end_time: 2020-01-03T16:00:00}, {start_time: 2020-01-03T19:30:00, end_time: 2020-01-05T12:00:00}]}]} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

duration_in

duration_in(state_agg_data, state [,begin_time, interval_time]) 

Count the duration of a state, or count the duration of a state in a certain period of time.

Parameter Type:

  • state_agg_data: StateAggData

  • state: any is the same type as the state of compact_state_agg.

  • begin_time: This is optional and specifies the start time of the period.

  • interval_time: This is optional; it specifies the duration of the time interval or infinity if not specified.

Return Type: INTERVAL

Example

Count the time of 'running' status.

select duration_in(state_agg(time, state), 'running') from states;
+------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running")) |
+------------------------------------------------------------------+
| 0 years 0 mons 3 days 22 hours 0 mins 0.000000000 secs |
+------------------------------------------------------------------+

Count the duration of the 'running' state starting 2020-01-01 11:00:00.

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00') 
from states;
+----------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00")) |
+----------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 21 hours 30 mins 0.000000000 secs |
+----------------------------------------------------------------------------------------------+

Count the duration of the 'running' state for four days starting on 2020-01-01 11:00:00.

select duration_in(state_agg(time, state), 'running', Timestamp '2020-01-01 11:00:00', interval '4 day')
from states;
+-------------------------------------------------------------------------------------------------------------------------------------------+
| duration_in(state_agg(states.time,states.state),Utf8("running"),Utf8("2020-01-01 11:00:00"),IntervalMonthDayNano("73786976294838206464")) |
+-------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 3 days 20 hours 30 mins 0.000000000 secs |
+-------------------------------------------------------------------------------------------------------------------------------------------+

state_at

state_at(state_agg_data, ts)

Count the state you are in at a certain time.

Parameter Type:

  • state_agg_data: StateAggData

  • ts: Timestamp

Return Type: any, the same type as the state of compact_state_agg.

select state_at(state_agg(time, state), Timestamp '2020-01-01 10:30:00') from states;
+---------------------------------------------------------------------------+
| state_at(state_agg(states.time,states.state),Utf8("2020-01-01 10:30:00")) |
+---------------------------------------------------------------------------+
| running |
+---------------------------------------------------------------------------+

candlestick_agg

Perform financial asset data analysis. This feature makes it easier to write financial analysis queries involving candlestick.

candlestick_agg gets the open and close price of the stock and the high price.

candlestick_agg generates intermediate aggregate data CandleStackData from raw quote data,

You can then use access and summary functions for this intermediate aggregate data.

candlestick_agg

candlestick_agg(time, price, volume)

Generate intermediate aggregate data CandleStackData from the original quote query.

Parameter Type:

  • time: Timestamp

  • price: Double

  • volume: Double

Return Type: CandleStackData

Struct {
open: Struct {
ts: Timestamp,
val: Double,
},
close: Struct {
ts: Timestamp,
val: Double,
},
high: Struct {
ts: Timestamp,
val: Double,
},
low: Struct {
ts: Timestamp,
val: Double
},
volume: Struct {
vol: Double,
vwap: Double,
}
}

Example

alter database public set ttl '1000000d';
create table if not exists tick(price bigint ,volume bigint);
insert tick(time, price, volume)
values
('1999-12-31 00:00:00.000', 111, 444),
('1999-12-31 00:00:00.005', 222, 444),
('1999-12-31 00:00:00.010', 333, 222),
('1999-12-31 00:00:10.015', 444, 111),
('1999-12-31 00:00:10.020', 222, 555),
('1999-12-31 00:10:00.025', 333, 555),
('1999-12-31 00:10:00.030', 444, 333),
('1999-12-31 01:00:00.035', 555, 222);
select candlestick_agg(time, price, volume) from tick;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| candlestick_agg(tick.time,tick.price,tick.volume) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {open: {ts: 1999-12-31T00:00:00, val: 111.0}, close: {ts: 1999-12-31T01:00:00.035, val: 555.0}, low: {ts: 1999-12-31T00:00:00, val: 111.0}, high: {ts: 1999-12-31T01:00:00.035, val: 555.0}, volume: {vol: 2886.0, vwap: 850149.0}} |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

close

close(candlestick_agg_data)

Get the closing price.

Return Type: DOUBLE

Example

select close(candlestick_agg(time, price, volume)) from tick;
+----------------------------------------------------------+
| close(candlestick_agg(tick.time,tick.price,tick.volume)) |
+----------------------------------------------------------+
| 555.0 |
+----------------------------------------------------------+

close_time

close_time(candlestick_agg_data)

Get the closing time.

Return Type: Timestamp

Example

select close_time(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------------+
| close_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+---------------------------------------------------------------+

high

high(candlestick_agg_data)

Get the highest price.

Return Type: DOUBLE

Example

select high(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| high(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 555.0 |
+---------------------------------------------------------+

high_time

high_time(candlestick_agg_data)

Get the time of the highest price.

Return Type: DOUBLE

Example

select high_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| high_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T01:00:00.035 |
+--------------------------------------------------------------+

low

low(candlestick_agg_data)

Get the lowest price.

Return Type: DOUBLE

Example

select low(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------+
| low(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------+
| 111.0 |
+--------------------------------------------------------+

low_time

low_time(candlestick_agg_data)

Get the time of the lowest price.

Return Type: Timestamp

Example

select low_time(candlestick_agg(time, price, volume)) from tick;
+-------------------------------------------------------------+
| low_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+-------------------------------------------------------------+

open

open(candlestick_agg_data)

Get the opening price.

Return Type: DOUBLE

Example

select open(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| open(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 111.0 |
+---------------------------------------------------------+

open_time

open_time(candlestick_agg_data)

Get the time of the opening price.

Return Type: Timestamp

Example

select open_time(candlestick_agg(time, price, volume)) from tick;
+--------------------------------------------------------------+
| open_time(candlestick_agg(tick.time,tick.price,tick.volume)) |
+--------------------------------------------------------------+
| 1999-12-31T00:00:00 |
+--------------------------------------------------------------+

volume

volume(candlestick_agg_data)

Get the total volume.

Return Type: DOUBLE

Example

select volume(candlestick_agg(time, price, volume)) from tick;
+-----------------------------------------------------------+
| volume(candlestick_agg(tick.time,tick.price,tick.volume)) |
+-----------------------------------------------------------+
| 2886.0 |
+-----------------------------------------------------------+

vwap

vwap(candlestick_agg_data)

Get the volume weighted average price.

Return Type: DOUBLE

Example

select vwap(candlestick_agg(time, price, volume)) from tick;
+---------------------------------------------------------+
| vwap(candlestick_agg(tick.time,tick.price,tick.volume)) |
+---------------------------------------------------------+
| 294.5769230769231 |
+---------------------------------------------------------+

Functions

Mathematical Functions

abs(x)

Function:Return the absolute value of x.

Parameter Type: Numeric type.

Return Type: Consistent with function parameter type.

Example

SELECT abs(-1);
+----------------+
| abs(Int64(-1)) |
+----------------+
| 1 |
+----------------+

acos(x)

Function: Return the arccosine of x.

Parameter Type: Numeric type.

Return Type: DOUBLE.

Example

SELECT acos(3);
+----------------+
| acos(Int64(3)) |
+----------------+
| NaN |
+----------------+
SELECT acos(0.5);
+--------------------+
| acos(Float64(0.5)) |
+--------------------+
| 1.0471975511965976 |
+--------------------+

asin(x)

Function: Return the arcsine of x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT asin(0.5);
+--------------------+
| asin(Float64(0.5)) |
+--------------------+
| 0.5235987755982988 |
+--------------------+
SELECT asin(5);
+----------------+
| asin(Int64(5)) |
+----------------+
| NaN |
+----------------+

atan(x)

Function:Return the arctangent of x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT atan(5);
+-------------------+
| atan(Int64(5)) |
+-------------------+
| 1.373400766945016 |
+-------------------+

atan2(y,x)

Function:Return the arctangent of y/x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT atan2(10, 2);
+---------------------------+
| atan2(Int64(10),Int64(2)) |
+---------------------------+
| 1.3734008 |
+---------------------------+

ceil(x)

Function: Round up.

Parameter Type: Numeric type

Return Type: BIGINT

Example

SELECT ceil(1.6);
+--------------------+
| ceil(Float64(1.6)) |
+--------------------+
| 2 |
+--------------------+

floor(x)

Function: Round down.

Parameter Type: Numeric type

Return Type: BIGINT

Example

SELECT floor(-3.1);
+----------------------+
| floor(Float64(-3.1)) |
+----------------------+
| -4 |
+----------------------+

cos(x)

Function: Return the cosine of x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT cos(1);
+--------------------+
| cos(Int64(1)) |
+--------------------+
| 0.5403023058681398 |
+--------------------+

sin(x)

Function: Return the sine of x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT sin(5);
+---------------------+
| sin(Int64(5)) |
+---------------------+
| -0.9589242746631385 |
+---------------------+

exp(x)

Function: Return e to the x power.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT exp(1);
+-------------------+
| exp(Int64(1)) |
+-------------------+
| 2.718281828459045 |
+-------------------+

ln(x)

Function: Natural logarithm.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT ln(2.718281828459045);
+--------------------------------+
| ln(Float64(2.718281828459045)) |
+--------------------------------+
| 1 |
+--------------------------------+

log(x) | log10(x)

Function: Base 10 logarithm.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT log(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+
SELECT log10(10);
+----------------+
| log(Int64(10)) |
+----------------+
| 1 |
+----------------+

log2(x)

Function: Base 2 logarithm.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT log2(4);
+----------------+
| log2(Int64(4)) |
+----------------+
| 2 |
+----------------+

power(x,y) | pow(x,y)

Function: x to the y power.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT power(2, 3);
+--------------------------+
| power(Int64(2),Int64(3)) |
+--------------------------+
| 8 |
+--------------------------+

round(x)

Function: Rounded to the nearest whole number.

Parameter Type: Numeric type

Return Type: BIGINT

Example

SELECT round(3.5);
+---------------------+
| round(Float64(3.5)) |
+---------------------+
| 4 |
+---------------------+

signum(x)

Function: Signs of parameter (-1,0,+1).

Parameter Type: Numeric type

Return Type: BIGINT

Example

SELECT signum(-3);
+-------------------+
| signum(Int64(-3)) |
+-------------------+
| -1 |
+-------------------+

sqrt(x)

Function: Square root of x.

Parameter Type: Numeric type

Return Type: Consistent with function parameter type.

Example

SELECT sqrt(4);
+----------------+
| sqrt(Int64(4)) |
+----------------+
| 2 |
+----------------+

tan(x)

Function: Tangent value of x.

Parameter Type: Numeric type

Return Type: DOUBLE

Example

SELECT tan(1);
+-------------------+
| tan(Int64(1)) |
+-------------------+
| 1.557407724654902 |
+-------------------+

trunc(x)

Function: Round to zero.

Parameter Type: Numeric type

Return Type: BIGINT

Example

SELECT trunc(-3.9);
+----------------------+
| trunc(Float64(-3.9)) |
+----------------------+
| -3 |
+----------------------+

struct

Syntax

struct(expr1 [, ...] ) 

Function: Create a STRUCT with the specified field value.

Parameter Type: Numeric type

Note

Function struct is not perfect at present.


Conditional Functions

coalesce

Syntax

coalesce(expr[,...exp])

Function: Return its first non null parameter. Null is returned only when all parameters are null. When retrieving data for display, it is often used to replace the default value with a null value.

Parameter Type: Any type

Return Type: First non null parameter type

Example

SELECT coalesce(temperature, null, station) FROM air;
+--------------------------------------------+
| coalesce(air.temperature,NULL,air.station) |
+--------------------------------------------+
| 69.0 |
| 78.0 |
| 62.0 |
| 79.0 |
| 53.0 |
| 72.0 |
| 71.0 |
| 69.0 |
| 80.0 |
| 74.0 |
| 70.0 |
| 70.0 |
| 70.0 |
+--------------------------------------------+

nullif

Syntax

nullif(expr1, expr2) 

Function: If expr1 is equal to expr2, NULL is returned; Otherwise, expr1 is returned.

Parameter Type: expr1 and expr2 are numeric expressions with column values

Return Type: The type of expr1 or NULL

Example

SELECT nullif(temperature, 70) FROM air;
+-----------------------------------+
| nullif(air.temperature,Int64(70)) |
+-----------------------------------+
| 69 |
| 78 |
| 62 |
| 79 |
| 53 |
| 72 |
| 71 |
| 69 |
| 80 |
| 74 |
| |
| |
| |
+-----------------------------------+

String Functions

ascii

Syntax

ascii(str) 

Function: Convert the first character in str to its ASCII code and return it.

Parameter Type: STRING

Return Type: BIGINT

Example

SELECT ascii('abc');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+
SELECT ascii('a');
+------------------+
| ascii(Utf8("a")) |
+------------------+
| 97 |
+------------------+

bit_length

Syntax

bit_length(str) 

Function: Returns the bit length of string data or the bit size of binary data.

Parameter Type: STRING

Return Type: BIGINT

Example

SELECT bit_length('abc');
+------------------------+
| bitlength(Utf8("abc")) |
+------------------------+
| 24 |
+------------------------+

btrim

Syntax

btrim(string [, matching_string ] ) 

Function: The function trims a string by removing leading and trailing spaces or by removing characters that match an optional specified string.

Parameter Type: STRING

Return Type: STRING

Example

SELECT btrim('     abc                  ');
+-------------------------------------------+
| btrim(Utf8(" abc ")) |
+-------------------------------------------+
| abc |
+-------------------------------------------+
SELECT btrim('111abc111','1');
+------------------------------------+
| btrim(Utf8("111abc111"),Utf8("1")) |
+------------------------------------+
| abc |
+------------------------------------+

trim

Syntax

trim(str) 

Function: Remove blank characters at the begin and end of str.

Parameter Type: STRING

Return Type: STRING


char_length | character_length

Syntax

char_length(expr) 

Function: Return the length of the specified string in characters.

Parameter Type: STRING

Return Type: BIGINT

Example

SELECT char_length('你好');
+-------------------------------+
| characterlength(Utf8("你好")) |
+-------------------------------+
| 2 |
+-------------------------------+

chr

Syntax

chr(expr) 

Function: Return the character at the provided UTF-16 code.

Parameter Type: BIGINT

Return Type: STRING

Example

SELECT chr(20005);
+-------------------+
| chr(Int64(20005)) |
+-------------------+
| 严 |
+-------------------+

concat

Syntax

concat(expr1, expr2 [, ...exp] ) 

Function: Joins two or more expressions and returns the generated expression.

Parameter Type: STRING

Return Type: STRING

Example

SELECT concat('a', 'b', 'c');
+---------------------------------------+
| concat(Utf8("a"),Utf8("b"),Utf8("c")) |
+---------------------------------------+
| abc |
+---------------------------------------+

concat_ws

Syntax

concat_ws(sep , expr1 [, ...] ) 

Function: Return a concatenated string separated by sep.

Parameter Type: STRING

Return Type: STRING

Example

SELECT concat_ws(' ', 'a', 'b', 'c');
+--------------------------------------------------------------+
| concatwithseparator(Utf8(" "),Utf8("a"),Utf8("b"),Utf8("c")) |
+--------------------------------------------------------------+
| a b c |
+--------------------------------------------------------------+

initcap

Syntax

initcap(expr) 

Function: Capitalize the first letter of each word in the parameter.

Parameter Type: STRING

Return Type: BIGINT

Example

SELECT initcap('hello world');
+------------------------------+
| initcap(Utf8("hello world")) |
+------------------------------+
| Hello World |
+------------------------------+

left

Syntax

left(str, len) 

Function: Return the leftmost len characters in str.

Parameter Type: str is STRING type, len is BIGINT type

Return Type: STRING

Example

SELECT left('abcde', 3);
+------------------------------+
| left(Utf8("abcde"),Int64(3)) |
+------------------------------+
| abc |
+------------------------------+

lpad

Syntax

lpad(expr, len [, pad] ) 

Function: Return expr filled with pad on the left. After filling, the length of the whole string is len.

Parameter Type: expr, pad type is STRING, len type is BIGINT

Return Type: BIGINT

When len is a negative number, len represents 0. When len is too large, function execution fails.

Example

SELECT lpad('abc', 10, '1');
+---------------------------------------+
| lpad(Utf8("abc"),Int64(10),Utf8("1")) |
+---------------------------------------+
| 1111111abc |
+---------------------------------------+

rpad

Syntax

rpad(expr, len [, pad] ) 

Function: Return expr filled with pad on the right. After filling, the length of the whole string is len.

Parameter Type: expr, pad is STRING type, len is BIGINT type.

Return Type: STRING

Example

SELECT rpad('aaa', 10, 'b');
+---------------------------------------+
| rpad(Utf8("aaa"),Int64(10),Utf8("b")) |
+---------------------------------------+
| aaabbbbbbb |
+---------------------------------------+

lower

Syntax

lower(expr) 

Function: Return lowercase string.

Parameter Type: STRING

Return Type: STRING

Example

SELECT lower('ABC');
+--------------------+
| lower(Utf8("ABC")) |
+--------------------+
| abc |
+--------------------+

upper

Syntax

upper(expr)

Function: Return uppercase string.

Parameter Type: STRING

Return Type: STRING


ltrim

Syntax

ltrim(str[, trimstr] ) 

Function: Returns str, in which the leading characters in trimstr are deleted. The default trimestr is blank character.

Parameter Type: STRING

Return Type: STRING

Example

SELECT ltrim('   abc');
+-----------------------+
| ltrim(Utf8(" abc")) |
+-----------------------+
| abc |
+-----------------------+

md5