Quality functions
Used to assess the quality of time series data
Sample Data
View example
CREATE TABLE wzz(value double);
INSERT wzz VALUES (1, 12.34), (3, 34.54 ), (4, 1.43), (6, 14.03), (10, 12.30), (13, 11.54), (14, 112.20), (16, 14.44), (18, 134.02), (19, 116.34), (22, 1234.45), (24,10.36), (26, 124.21), (31, 6.34), (33, acos(12345));
SELECT * FROM wzz;
+-------------------------------+---------+
| time | value |
+-------------------------------+---------+
| 1970-01-01T00:00:00.000000001 | 12.34 |
| 1970-01-01T00:00:00.000000003 | 34.54 |
| 1970-01-01T00:00:00.000000004 | 1.43 |
| 1970-01-01T00:00:00.000000006 | 14.03 |
| 1970-01-01T00:00:00.000000010 | 12.3 |
| 1970-01-01T00:00:00.000000013 | 11.54 |
| 1970-01-01T00:00:00.000000014 | 112.2 |
| 1970-01-01T00:00:00.000000016 | 14.44 |
| 1970-01-01T00:00:00.000000018 | 134.02 |
| 1970-01-01T00:00:00.000000019 | 116.34 |
| 1970-01-01T00:00:00.000000022 | 1234.45 |
| 1970-01-01T00:00:00.000000024 | 10.36 |
| 1970-01-01T00:00:00.000000026 | 124.21 |
| 1970-01-01T00:00:00.000000031 | 6.34 |
| 1970-01-01T00:00:00.000000033 | NaN |
+-------------------------------+---------+
completeness
Used to calculate the integrity of time series, which measures the proportion of data that is not missing.
The function completeness
first counts the number of rows of data, cnt.Then consider the possibility of NaN and Inf in the data column, perform linear smoothing on them, and count the special values specialcnt.Then scan the data to calculate the missing count misscnt.The formula for calculating completeness is
completeness(time_expresion, numeric_expression)
Options | Description |
---|---|
time_expresion | The time expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
numeric_expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
View example
The following example uses the example data that starts this document.
Querying the integrity of time series data:
SELECT completeness(time, value) FROM wzz;
+----------------------------------+
| completeness(wzz.time,wzz.value) |
+----------------------------------+
| 0.8235294117647058 |
+----------------------------------+
consistency
Calculate the consistency of time series, which measures the proportion of sparsely redundant evenly distributed time series data.
consistency(time_expresion, numeric_expression)
Options | Description |
---|---|
time_expresion | The time expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
numeric_expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
With the same function completeness
, after missing value filling, redundancy is calculated by scanning the data for redundant counts.The calculation formula for consistency consistency is:
View example
The following example uses the example data that starts this document.
SELECT consistency(time, value) FROM wzz;
+---------------------------------+
| consistency(wzz.time,wzz.value) |
+---------------------------------+
| 0.8666666666666667 |
+---------------------------------+
timeliness
Used to calculate the timeliness of time series, which measures the proportion of timely arrival of time series data without delay.
timeliness(time_expresion, numeric_expression)
Options | Description |
---|---|
time_expresion | The time expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
numeric_expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
With the same function completeness
, after missing value filling, latecnt is calculated by scanning the data for redundant counts.The calculation formula for timeliness timeliness
:
View example
The following example uses the example data that starts this document.
SELECT timeliness(time, value) FROM wzz;
+--------------------------------+
| timeliness(wzz.time,wzz.value) |
+--------------------------------+
| 0.9333333333333333 |
+--------------------------------+
validity
Used to calculate the effectiveness of time series, which measures the proportion of data satisfying the constraints.
validity(time_expresion, numeric_expression)
Options | Description |
---|---|
time_expresion | The time expression to operate.Can be a constant, column, or function, and any combination of arithmetic operators. |
numeric_expression | Expression to operate on.Can be a constant, column, or function, and any combination of arithmetic operators. |
首先统计数据的行数 cnt 。然后进行缺失值填充,去除其中的 NaN 和 Inf 。然后通过自定义计算方法得到计数 valuecnt、variationcnt、speedcnt、speedchangecnt 。则有效性 validity 的计算公式:
View example
The following example uses the example data that starts this document.
SELECT validity(time, value) FROM wzz;
+------------------------------+
| validity(wzz.time,wzz.value) |
+------------------------------+
| 0.8 |
+------------------------------+