Aggregate Functions
Aggregate functions are commonly used functions in databases for aggregating calculations and summaries.They take a set of values as input and return a single aggregated result.Aggregate functions can be used to perform various operations, such as calculating totals, averages, maximums, minimums, etc.
General
avg
Returns the average of the values in the specified column.
avg(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View avg
Example
SELECT station, avg(temperature) FROM air group by station;
+-------------+----------------------+
| station | AVG(air.temperature) |
+-------------+----------------------+
| XiaoMaiDao | 64.93894989583701 |
| LianYunGang | 65.12753786942551 |
+-------------+----------------------+
count
Returns the number of rows in the specified column.
count(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View count
Example
SELECT station, count(temperature) FROM air group by station;
+-------------+------------------------+
| station | COUNT(air.temperature) |
+-------------+------------------------+
| LianYunGang | 28321 |
| XiaoMaiDao | 28321 |
+-------------+------------------------+
Push down count function
Count will only be pushed down to the TSKV layer when the SQL statement is 'SELECT count(*) FROM table_name;' or 'SELECT count(field) FROM table_name;'. This allows for obtaining the row count by reading the statistical information from the underlying files, avoiding actual data reads and improving efficiency.
However, there may be duplicate timestamp data that results in a count higher than the actual number of rows. To address this, an exact_count that will not be pushed down has been added. Note: exact_count can only be used to replace the above statements; using it in other statements may result in an error.
View Push down count function
Example
CREATE TABLE air(visibility DOUBLE, temperature DOUBLE, pressure DOUBLE, TAGS(station));
// 写入有重复时间戳的数据
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 01:40:00', 'XiaoMaiDao', 55, 68, 71),
('2022-10-19 01:40:00', 'XiaoMaiDao', 55, 68, 72),
('2022-10-19 02:40:00', 'XiaoMaiDao', 55, 68, 73),
('2022-10-19 03:40:00', 'XiaoMaiDao', 55, 68, 75),
('2022-10-19 04:40:00', 'XiaoMaiDao', 55, 68, 77),
('2022-10-19 05:40:00', 'XiaoMaiDao', 55, 68, 80);
SELECT count(*) FROM air; // 直接读取底层文件统计信息,重复时间戳不会去重
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 6 |
+-----------------+
SELECT count(pressure) FROM air; // 直接读取底层文件统计信息,重复时间戳不会去重
+---------------------+
| COUNT(air.pressure) |
+---------------------+
| 6 |
+---------------------+
SELECT exact_count(*) FROM air; // 精确count(*),不会下推,重复时间戳会去重
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 5 |
+-----------------+
SELECT exact_count(pressure) FROM air; // 精确count(field),不会下推,重复时间戳会去重
+---------------------+
| COUNT(air.pressure) |
+---------------------+
| 5 |
+---------------------+
max
Returns the number of rows in the specified column.
max(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View max
Example
SELECT station, max(temperature) FROM air group by station;
+-------------+----------------------+
| station | MAX(air.temperature) |
+-------------+----------------------+
| LianYunGang | 80.0 |
| XiaoMaiDao | 80.0 |
+-------------+----------------------+
mean
Alias for avg
.
median
返回指定列中的中值。
median(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View median
Example
SELECT median(temperature) FROM air;
+-------------------------+
| MEDIAN(air.temperature) |
+-------------------------+
| 65.0 |
+-------------------------+
min
Returns the minimum value in the specified column.
min(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View min
Example
SELECT min(temperature) FROM air;
+----------------------+
| MIN(air.temperature) |
+----------------------+
| 50.0 |
+----------------------+
sum
Returns the sum of all values in the specified column.
sum(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View sum
Example
SELECT sum(temperature) FROM air;
+----------------------+
| SUM(air.temperature) |
+----------------------+
| 3683613.0 |
+----------------------+
Query took 0.035 seconds.
array_agg
Returns an array created from expression elements.If sorting requirements are given, insert elements in the order required for sorting.
array_agg(expression [ORDER BY expression])
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View array_agg
Example
Assume the content in the table is as follows:
SELECT time,temperature FROM air ORDER BY time limit 10;
+---------------------+-------------+
| time | temperature |
+---------------------+-------------+
| 2023-01-14T16:00:00 | 78.0 |
| 2023-01-14T16:00:00 | 80.0 |
| 2023-01-14T16:03:00 | 54.0 |
| 2023-01-14T16:03:00 | 64.0 |
| 2023-01-14T16:06:00 | 54.0 |
| 2023-01-14T16:06:00 | 79.0 |
| 2023-01-14T16:09:00 | 75.0 |
| 2023-01-14T16:09:00 | 55.0 |
| 2023-01-14T16:12:00 | 50.0 |
| 2023-01-14T16:12:00 | 74.0 |
+---------------------+-------------+
Run the following SQL:
WITH sample AS (SELECT time,temperature FROM air ORDER BY time limit 10)
SELECT array_agg(temperature) from sample;
The result will be:
+--------------------------------------------------------------+
| ARRAY_AGG(sample.temperature) |
+--------------------------------------------------------------+
| [78.0, 80.0, 54.0, 64.0, 54.0, 79.0, 75.0, 55.0, 50.0, 74.0] |
+--------------------------------------------------------------+
This result is an array containing the temperature values of all rows in the table.
array_agg
can also be used in conjunction with the ORDER BY
clause to determine the order of values in the array. If we want the resulting array of temperature
to be sorted in ascending order, we can write it like this:
WITH sample AS (SELECT time,temperature FROM air ORDER BY time limit 10)
SELECT array_agg(temperature ORDER BY temperature ASC) from sample;
This will return
+--------------------------------------------------------------+
| ARRAY_AGG(sample.temperature) |
+--------------------------------------------------------------+
| [50.0, 54.0, 54.0, 55.0, 64.0, 74.0, 75.0, 78.0, 79.0, 80.0] |
+--------------------------------------------------------------+
first_value
Return the first element in the aggregate group in the order of the request.If no order is specified, returns an arbitrary element from the group.
first_value(expression [ORDER BY expression])
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View first_value
Example
SELECT station, first_value(temperature ORDER BY time) FROM air GROUP BY station;
+-------------+------------------------------+
| station | FIRST_VALUE(air.temperature) |
+-------------+------------------------------+
| LianYunGang | 78.0 |
| XiaoMaiDao | 80.0 |
+-------------+------------------------------+
first
Return the first record sorted by time.
first(time_expression, expression)
Parameters | Description |
---|---|
time_expression | Must be for the time column. |
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View first
Example
SELECT first(time, temperature) FROM air;
+---------------------------------+
| first(air.time,air.temperature) |
+---------------------------------+
| 80.0 |
+---------------------------------+
last_value
Return the first element in the aggregate group in the order of the request.If no order is specified, returns an arbitrary element from the group.
last_value(expression [ORDER BY expression])
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View last_value
Example
SELECT station, last_value(temperature ORDER BY time) FROM air GROUP BY station;
+-------------+-----------------------------+
| station | LAST_VALUE(air.temperature) |
+-------------+-----------------------------+
| XiaoMaiDao | 55.0 |
| LianYunGang | 50.0 |
+-------------+-----------------------------+
last
Return the last record sorted by time.
last(expression_x, expression_y)
Parameters | Description |
---|---|
time_expression | Must be for the time column. |
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View last
Example
SELECT last(time, temperature) FROM air;
+--------------------------------+
| last(air.time,air.temperature) |
+--------------------------------+
| 50.0 |
+--------------------------------+
mode
Calculate the most frequently occurring value in a set of data.
mode(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View mode
Example
SELECT mode(temperature) FROM air;
+-----------------------+
| mode(air.temperature) |
+-----------------------+
| 80.0 |
+-----------------------+
increase
Calculate the increment of the time series in the range vector, similar to the increase function in Prometheus.
increase(time_expression, expression ORDER BY time_expression)
Parameters | Description |
---|---|
time_expression | Must be for the time column. |
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View mode
Example
SELECT station, INCREASE(time, temperature ORDER BY time) FROM air GROUP BY station ORDER BY station;
+-------------+------------------------------------+
| station | increase(air.time,air.temperature) |
+-------------+------------------------------------+
| LianYunGang | 964366.0 |
| XiaoMaiDao | 961627.0 |
+-------------+------------------------------------+
Statistics
corr
Calculate the Pearson correlation coefficient between two columns, which is a method of measuring the degree of linear correlation between two variables.The value of Pearson correlation coefficient ranges between -1 and 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no linear correlation.
corr(expression1, expression2)
Parameters | Description |
---|---|
expression1 | The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
expression1 | The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
View corr
Example
SELECT corr(temperature, pressure) FROM air;
+-------------------------------------------+
| CORRELATION(air.temperature,air.pressure) |
+-------------------------------------------+
| 0.003247570100691381 |
+-------------------------------------------+
covar
Return a set of covariance of number pairs.
covar(expression1, expression2)
Parameters | Description |
---|---|
expression1 | The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
expression1 | The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
View covar
Example
SELECT covar(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| 0.2589293257928204 |
+------------------------------------------+
covar_pop
Return the total covariance of a set of number pairs.
covar_pop(expression1, expression2)
Parameters | Description |
---|---|
expression1 | The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
expression1 | The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
View covar_pop
Example
SELECT covar_pop(temperature, pressure) FROM air;
+----------------------------------------------+
| COVARIANCE_POP(air.temperature,air.pressure) |
+----------------------------------------------+
| 0.25892475446190355 |
+----------------------------------------------+
covar_samp
Return a set of sample covariance of number pairs.
covar_samp(expression1, expression2)
Parameters | Description |
---|---|
expression1 | The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
expression1 | The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
View covar_samp
Example
SELECT covar_samp(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| 0.2589293257928204 |
+------------------------------------------+
stddev
Returns the standard deviation of a set of numbers.
stddev(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View stddev
Example
SELECT stddev(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 8.938534326752999 |
+-------------------------+
stddev_pop
Returns the overall standard deviation of a set of numbers.
stddev_pop(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View stddev_pop
Example
SELECT stddev_pop(temperature) FROM air;
+-----------------------------+
| STDDEV_POP(air.temperature) |
+-----------------------------+
| 8.938455422637864 |
+-----------------------------+
stddev_samp
Returns the sample standard deviation of a set of numbers.
stddev_samp(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View stddev_samp
Example
SELECT stddev_samp(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 8.938534326752999 |
+-------------------------+
var
Return a set of statistical variances for a set of numbers.
var(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View var
Example
SELECT var(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 79.89739591054169 |
+---------------------------+
var_pop
Return the total population variance of a set of numbers.
var_pop(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View var_pop
Example
SELECT var_pop(temperature) FROM air;
+-------------------------------+
| VARIANCE_POP(air.temperature) |
+-------------------------------+
| 79.89598534248422 |
+-------------------------------+
var_samp
Returns the statistical sample variance of a set of numbers.
var_samp(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View var_samp
Example
SELECT var_samp(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 79.89739591054169 |
+---------------------------+
Approx
approx_distinct
Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
approx_distinct(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View asap_smooth
Example
This SQL query uses the approx_distinct
function to estimate the number of unique values in the station
column of the air
table.The function of approx_distinct(station)
is to calculate approximately how many different station
values exist in the air
table.The query result shows that the approx_unique_station
value is 2, which means that according to the estimate of the approx_distinct
function, there are approximately 2 different meteorological stations in the air
table (i.e. the number of unique values in the station
column is approximately 2).
SELECT approx_distinct(station) AS approx_unique_station FROM air;
+-----------------------+
| approx_unique_station |
+-----------------------+
| 2 |
+-----------------------+
approx_median
Returns the approximate median of the input values (the 50th percentile).It is an alias for approx_percentile_cont(x, 0.5)
.
approx_median(expression)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View approx_median
Example
SELECT approx_median(temperature) as approx_median_temperature FROM air;
+---------------------------+
| approx_median_temperature |
+---------------------------+
| 64.91965582214088 |
+---------------------------+
approx_percentile_cont
Returns the weighted approximate percentile of the input values using the t-digest algorithm.
approx_percentile_cont(expression, percentile, centroids)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
percentile | The percentile to be calculated.Must be a floating point value between 0 and 1 (inclusive). |
centroids | Optional, the centroid number used in the t-digest algorithm.Default value is 100. |
If there are this number or fewer unique values, exact results can be expected.The more centroids there are, the more accurate the approximate value, but more memory is needed for calculation.
View approx_percentile_cont
Example
SELECT approx_percentile_cont(temperature, 0.1,100) FROM air;
+-----------------------------------------------------------------+
| APPROX_PERCENTILE_CONT(air.temperature,Float64(0.1),Int64(100)) |
+-----------------------------------------------------------------+
| 53.0 |
+-----------------------------------------------------------------+
approx_percentile_cont_with_weight
Returns the weighted approximate percentile of the input values using the t-digest algorithm.
approx_percentile_cont_with_weight(expression, weight, percentile)
Parameters | Description |
---|---|
expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
weight | Expression used as a weight.Can be a constant, column, or function, and any combination of arithmetic operators. |
percentile | The percentile to be calculated.Must be a floating point value between 0 and 1 (inclusive). |
If there are this number or fewer unique values, exact results can be expected.The more centroids there are, the more accurate the approximate value, but more memory is needed for calculation.
View approx_percentile_cont_with_weight
Example
SELECT approx_percentile_cont_with_weight(temperature, 0.1,0.5) FROM air;
+-------------------------------------------------------------------------------+
| APPROX_PERCENTILE_CONT_WITH_WEIGHT(air.temperature,Float64(0.1),Float64(0.5)) |
+-------------------------------------------------------------------------------+
| 80.0 |
+-------------------------------------------------------------------------------+
sample
Select n random records from the given column.
sample(expression, n)
Parameters | Description |
---|---|
expression | Expression to operate on.Must be a column in the table. |
n | Number of records to be returned. |
View sample
Example
SELECT sample(temperature, 5) FROM air;
+--------------------------------------+
| sample(air.temperature,Int64(5)) |
+--------------------------------------+
| [74.0, 76.0, 53.0, 56.0, 65.0, 58.0] |
+--------------------------------------+
- 社区版
- 企业版
asap_smooth
The asap_smooth
function is used for smoothing time series data. The ASAP (As Smooth As Possible) smoothing algorithm aims to quickly smooth time series data while preserving key trends and patterns in the data, making the visualization of the data clearer without losing important information due to excessive smoothing.
asap_smooth(time, value, resolution ORDER BY time)
Parameters | Description |
---|---|
time | Timestamp of each data point. |
value | The value of each timestamp. |
resolution | Approximate points to return.Determine the horizontal resolution of the resulting image. |
View asap_smooth
Example
SELECT asap_smooth(time, pressure, 10) FROM air GROUP BY date_trunc('month', time);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| asap_smooth(air.time,air.pressure,Int64(10)) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {time: [2023-01-14T16:00:00, 2023-01-16T14:13:00, 2023-01-18T12:26:00, 2023-01-20T10:39:00, 2023-01-22T08:52:00, 2023-01-24T07:05:00, 2023-01-26T05:18:00, 2023-01-28T03:31:00, 2023-01-30T01:44:00, 2023-01-31T23:57:00], value: [64.79507211538461, 65.31009615384616, 65.25841346153847, 64.8485576923077, 65.09495192307692, 65.02524038461539, 64.8389423076923, 65.2421875, 65.02103365384616, 65.1141826923077], resolution: 10} |
| {time: [2023-02-01T00:00:00, 2023-02-04T02:39:40, 2023-02-07T05:19:20, 2023-02-10T07:59:00, 2023-02-13T10:38:40, 2023-02-16T13:18:20, 2023-02-19T15:58:00, 2023-02-22T18:37:40, 2023-02-25T21:17:20, 2023-02-28T23:57:00], value: [65.20982142857143, 64.90625, 64.94828869047619, 64.97916666666667, 64.88504464285714, 64.8203125, 64.64434523809524, 64.88802083333333, 65.0, 64.76004464285714], resolution: 10} |
| {time: [2023-03-01T00:00:00, 2023-03-02T12:26:40, 2023-03-04T00:53:20, 2023-03-05T13:20:00, 2023-03-07T01:46:40, 2023-03-08T14:13:20, 2023-03-10T02:40:00, 2023-03-11T15:06:40, 2023-03-13T03:33:20, 2023-03-14T16:00:00], value: [65.29115853658537, 64.58307926829268, 64.7530487804878, 64.76753048780488, 65.14405487804878, 65.4298780487805, 65.1920731707317, 65.10365853658537, 64.86356707317073, 64.83841463414635], resolution: 10} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+