Skip to main content
Version: latest

Aggregate Functions

Aggregate functions are commonly used functions in databases for aggregating calculations and summaries.They take a set of values as input and return a single aggregated result.Aggregate functions can be used to perform various operations, such as calculating totals, averages, maximums, minimums, etc.

General

avg

Returns the average of the values in the specified column.

avg(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View avg Example
SELECT station, avg(temperature) FROM air group by station;
+-------------+----------------------+
| station | AVG(air.temperature) |
+-------------+----------------------+
| XiaoMaiDao | 64.93894989583701 |
| LianYunGang | 65.12753786942551 |
+-------------+----------------------+

count

Returns the number of rows in the specified column.

count(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View count Example
SELECT station, count(temperature) FROM air group by station;
+-------------+------------------------+
| station | COUNT(air.temperature) |
+-------------+------------------------+
| LianYunGang | 28321 |
| XiaoMaiDao | 28321 |
+-------------+------------------------+

Push down count function

Count will only be pushed down to the TSKV layer when the SQL statement is 'SELECT count(*) FROM table_name;' or 'SELECT count(field) FROM table_name;'. This allows for obtaining the row count by reading the statistical information from the underlying files, avoiding actual data reads and improving efficiency.

However, there may be duplicate timestamp data that results in a count higher than the actual number of rows. To address this, an exact_count that will not be pushed down has been added. Note: exact_count can only be used to replace the above statements; using it in other statements may result in an error.

View Push down count function Example
CREATE TABLE air(visibility DOUBLE, temperature DOUBLE, pressure DOUBLE, TAGS(station));

// 写入有重复时间戳的数据
INSERT INTO air (TIME, station, visibility, temperature, pressure) VALUES
('2022-10-19 01:40:00', 'XiaoMaiDao', 55, 68, 71),
('2022-10-19 01:40:00', 'XiaoMaiDao', 55, 68, 72),
('2022-10-19 02:40:00', 'XiaoMaiDao', 55, 68, 73),
('2022-10-19 03:40:00', 'XiaoMaiDao', 55, 68, 75),
('2022-10-19 04:40:00', 'XiaoMaiDao', 55, 68, 77),
('2022-10-19 05:40:00', 'XiaoMaiDao', 55, 68, 80);

SELECT count(*) FROM air; // 直接读取底层文件统计信息,重复时间戳不会去重
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 6 |
+-----------------+

SELECT count(pressure) FROM air; // 直接读取底层文件统计信息,重复时间戳不会去重
+---------------------+
| COUNT(air.pressure) |
+---------------------+
| 6 |
+---------------------+

SELECT exact_count(*) FROM air; // 精确count(*),不会下推,重复时间戳会去重
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 5 |
+-----------------+

SELECT exact_count(pressure) FROM air; // 精确count(field),不会下推,重复时间戳会去重
+---------------------+
| COUNT(air.pressure) |
+---------------------+
| 5 |
+---------------------+

max

Returns the number of rows in the specified column.

max(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View max Example
SELECT station, max(temperature) FROM air group by station;

+-------------+----------------------+
| station | MAX(air.temperature) |
+-------------+----------------------+
| LianYunGang | 80.0 |
| XiaoMaiDao | 80.0 |
+-------------+----------------------+

mean

Alias for avg.

median

返回指定列中的中值。

median(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View median Example
SELECT  median(temperature) FROM air;
+-------------------------+
| MEDIAN(air.temperature) |
+-------------------------+
| 65.0 |
+-------------------------+

min

Returns the minimum value in the specified column.

min(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View min Example
SELECT  min(temperature) FROM air;
+----------------------+
| MIN(air.temperature) |
+----------------------+
| 50.0 |
+----------------------+

sum

Returns the sum of all values in the specified column.

sum(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View sum Example
SELECT  sum(temperature) FROM air;
+----------------------+
| SUM(air.temperature) |
+----------------------+
| 3683613.0 |
+----------------------+
Query took 0.035 seconds.

array_agg

Returns an array created from expression elements.If sorting requirements are given, insert elements in the order required for sorting.

array_agg(expression [ORDER BY expression])
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View array_agg Example

Assume the content in the table is as follows:

SELECT time,temperature FROM air ORDER BY time limit 10;
+---------------------+-------------+
| time | temperature |
+---------------------+-------------+
| 2023-01-14T16:00:00 | 78.0 |
| 2023-01-14T16:00:00 | 80.0 |
| 2023-01-14T16:03:00 | 54.0 |
| 2023-01-14T16:03:00 | 64.0 |
| 2023-01-14T16:06:00 | 54.0 |
| 2023-01-14T16:06:00 | 79.0 |
| 2023-01-14T16:09:00 | 75.0 |
| 2023-01-14T16:09:00 | 55.0 |
| 2023-01-14T16:12:00 | 50.0 |
| 2023-01-14T16:12:00 | 74.0 |
+---------------------+-------------+

Run the following SQL:

WITH  sample AS (SELECT time,temperature FROM air ORDER BY time limit 10)
SELECT array_agg(temperature) from sample;

The result will be:

+--------------------------------------------------------------+
| ARRAY_AGG(sample.temperature) |
+--------------------------------------------------------------+
| [78.0, 80.0, 54.0, 64.0, 54.0, 79.0, 75.0, 55.0, 50.0, 74.0] |
+--------------------------------------------------------------+

This result is an array containing the temperature values of all rows in the table.

array_agg can also be used in conjunction with the ORDER BY clause to determine the order of values in the array. If we want the resulting array of temperature to be sorted in ascending order, we can write it like this:

WITH  sample AS (SELECT time,temperature FROM air ORDER BY time limit 10)
SELECT array_agg(temperature ORDER BY temperature ASC) from sample;

This will return

+--------------------------------------------------------------+
| ARRAY_AGG(sample.temperature) |
+--------------------------------------------------------------+
| [50.0, 54.0, 54.0, 55.0, 64.0, 74.0, 75.0, 78.0, 79.0, 80.0] |
+--------------------------------------------------------------+

first_value

Return the first element in the aggregate group in the order of the request.If no order is specified, returns an arbitrary element from the group.

first_value(expression [ORDER BY expression])
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View first_value Example
SELECT station, first_value(temperature ORDER BY time) FROM air GROUP BY station;
+-------------+------------------------------+
| station | FIRST_VALUE(air.temperature) |
+-------------+------------------------------+
| LianYunGang | 78.0 |
| XiaoMaiDao | 80.0 |
+-------------+------------------------------+

first

Return the first record sorted by time.

first(time_expression, expression)
ParametersDescription
time_expressionMust be for the time column.
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View first Example
SELECT first(time, temperature) FROM air;
+---------------------------------+
| first(air.time,air.temperature) |
+---------------------------------+
| 80.0 |
+---------------------------------+

last_value

Return the first element in the aggregate group in the order of the request.If no order is specified, returns an arbitrary element from the group.

last_value(expression [ORDER BY expression])
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View last_value Example
SELECT station, last_value(temperature ORDER BY time) FROM air GROUP BY station;
+-------------+-----------------------------+
| station | LAST_VALUE(air.temperature) |
+-------------+-----------------------------+
| XiaoMaiDao | 55.0 |
| LianYunGang | 50.0 |
+-------------+-----------------------------+

last

Return the last record sorted by time.

last(expression_x, expression_y)
ParametersDescription
time_expressionMust be for the time column.
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View last Example

SELECT last(time, temperature) FROM air;
+--------------------------------+
| last(air.time,air.temperature) |
+--------------------------------+
| 50.0 |
+--------------------------------+

mode

Calculate the most frequently occurring value in a set of data.

mode(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View mode Example
SELECT mode(temperature) FROM air;
+-----------------------+
| mode(air.temperature) |
+-----------------------+
| 80.0 |
+-----------------------+

increase

Calculate the increment of the time series in the range vector, similar to the increase function in Prometheus.

increase(time_expression, expression ORDER BY time_expression)
ParametersDescription
time_expressionMust be for the time column.
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View mode Example
SELECT station, INCREASE(time, temperature ORDER BY time) FROM air GROUP BY station ORDER BY station;
+-------------+------------------------------------+
| station | increase(air.time,air.temperature) |
+-------------+------------------------------------+
| LianYunGang | 964366.0 |
| XiaoMaiDao | 961627.0 |
+-------------+------------------------------------+

Statistics

corr

Calculate the Pearson correlation coefficient between two columns, which is a method of measuring the degree of linear correlation between two variables.The value of Pearson correlation coefficient ranges between -1 and 1, where 1 indicates perfect positive correlation, -1 indicates perfect negative correlation, and 0 indicates no linear correlation.

corr(expression1, expression2)
ParametersDescription
expression1The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
expression1The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
View corr Example
SELECT corr(temperature, pressure) FROM air;
+-------------------------------------------+
| CORRELATION(air.temperature,air.pressure) |
+-------------------------------------------+
| 0.003247570100691381 |
+-------------------------------------------+

covar

Return a set of covariance of number pairs.

covar(expression1, expression2)
ParametersDescription
expression1The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
expression1The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
View covar Example
SELECT covar(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| 0.2589293257928204 |
+------------------------------------------+

covar_pop

Return the total covariance of a set of number pairs.

covar_pop(expression1, expression2)
ParametersDescription
expression1The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
expression1The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
View covar_pop Example
SELECT covar_pop(temperature, pressure) FROM air;
+----------------------------------------------+
| COVARIANCE_POP(air.temperature,air.pressure) |
+----------------------------------------------+
| 0.25892475446190355 |
+----------------------------------------------+

covar_samp

Return a set of sample covariance of number pairs.

covar_samp(expression1, expression2)
ParametersDescription
expression1The first expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
expression1The second expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators.
View covar_samp Example
SELECT covar_samp(temperature, pressure) FROM air;
+------------------------------------------+
| COVARIANCE(air.temperature,air.pressure) |
+------------------------------------------+
| 0.2589293257928204 |
+------------------------------------------+

stddev

Returns the standard deviation of a set of numbers.

stddev(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View stddev Example
SELECT stddev(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 8.938534326752999 |
+-------------------------+

stddev_pop

Returns the overall standard deviation of a set of numbers.

stddev_pop(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View stddev_pop Example
SELECT stddev_pop(temperature) FROM air;
+-----------------------------+
| STDDEV_POP(air.temperature) |
+-----------------------------+
| 8.938455422637864 |
+-----------------------------+

stddev_samp

Returns the sample standard deviation of a set of numbers.

stddev_samp(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View stddev_samp Example
SELECT stddev_samp(temperature) FROM air;
+-------------------------+
| STDDEV(air.temperature) |
+-------------------------+
| 8.938534326752999 |
+-------------------------+

var

Return a set of statistical variances for a set of numbers.

var(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View var Example
SELECT var(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 79.89739591054169 |
+---------------------------+

var_pop

Return the total population variance of a set of numbers.

var_pop(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View var_pop Example
SELECT var_pop(temperature) FROM air;
+-------------------------------+
| VARIANCE_POP(air.temperature) |
+-------------------------------+
| 79.89598534248422 |
+-------------------------------+

var_samp

Returns the statistical sample variance of a set of numbers.

var_samp(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View var_samp Example
SELECT var_samp(temperature) FROM air;
+---------------------------+
| VARIANCE(air.temperature) |
+---------------------------+
| 79.89739591054169 |
+---------------------------+

Approx

approx_distinct

Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.

approx_distinct(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View asap_smooth Example

This SQL query uses the approx_distinct function to estimate the number of unique values in the station column of the air table.The function of approx_distinct(station) is to calculate approximately how many different station values exist in the air table.The query result shows that the approx_unique_station value is 2, which means that according to the estimate of the approx_distinct function, there are approximately 2 different meteorological stations in the air table (i.e. the number of unique values in the station column is approximately 2).

SELECT approx_distinct(station) AS approx_unique_station FROM air;
+-----------------------+
| approx_unique_station |
+-----------------------+
| 2 |
+-----------------------+

approx_median

Returns the approximate median of the input values (the 50th percentile).It is an alias for approx_percentile_cont(x, 0.5).

approx_median(expression)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
View approx_median Example
SELECT approx_median(temperature) as approx_median_temperature FROM air;
+---------------------------+
| approx_median_temperature |
+---------------------------+
| 64.91965582214088 |
+---------------------------+

approx_percentile_cont

Returns the weighted approximate percentile of the input values using the t-digest algorithm.

approx_percentile_cont(expression, percentile, centroids)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
percentileThe percentile to be calculated.Must be a floating point value between 0 and 1 (inclusive).
centroidsOptional, the centroid number used in the t-digest algorithm.Default value is 100.
tip

If there are this number or fewer unique values, exact results can be expected.The more centroids there are, the more accurate the approximate value, but more memory is needed for calculation.

View approx_percentile_cont Example
SELECT approx_percentile_cont(temperature, 0.1,100) FROM air;
+-----------------------------------------------------------------+
| APPROX_PERCENTILE_CONT(air.temperature,Float64(0.1),Int64(100)) |
+-----------------------------------------------------------------+
| 53.0 |
+-----------------------------------------------------------------+

approx_percentile_cont_with_weight

Returns the weighted approximate percentile of the input values using the t-digest algorithm.

approx_percentile_cont_with_weight(expression, weight, percentile)
ParametersDescription
expressionExpression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators.
weightExpression used as a weight.Can be a constant, column, or function, and any combination of arithmetic operators.
percentileThe percentile to be calculated.Must be a floating point value between 0 and 1 (inclusive).
tip

If there are this number or fewer unique values, exact results can be expected.The more centroids there are, the more accurate the approximate value, but more memory is needed for calculation.

View approx_percentile_cont_with_weight Example
SELECT approx_percentile_cont_with_weight(temperature, 0.1,0.5) FROM air;
+-------------------------------------------------------------------------------+
| APPROX_PERCENTILE_CONT_WITH_WEIGHT(air.temperature,Float64(0.1),Float64(0.5)) |
+-------------------------------------------------------------------------------+
| 80.0 |
+-------------------------------------------------------------------------------+

sample

Select n random records from the given column.

sample(expression, n)
ParametersDescription
expressionExpression to operate on.Must be a column in the table.
nNumber of records to be returned.
View sample Example
SELECT sample(temperature, 5) FROM air;
+--------------------------------------+
| sample(air.temperature,Int64(5)) |
+--------------------------------------+
| [74.0, 76.0, 53.0, 56.0, 65.0, 58.0] |
+--------------------------------------+