Skip to main content
Version: 2.3.x

Downsampling

The data write cycle is generally based on the actual table write frequency, which is usually related to the device that collects the data, sometimes it may need to process a large number of data points per second, and processing so much data for a long time may cause storage problems.A more natural solution would be to lower data samples.A more natural solution would be to lower data samples.

Downsampling in the timing database refers to the downsampling of timing data, the original fine-grained data downsampling to get the coarser-grained data, in order to save storage costs, downsampling data will only retain some statistical characteristics of the original data.This chapter describes how to use CnosDB for automated data sampling.

Definition

Stream Query:is a special query in CnosDB for processing stream data calculation, the stream query requires that the SELECT function must contain the GROUP BY time() phrase.

Note: This article does not describe the syntax of how to create a stream query in detail, for more details, please click Stream query to jump to the corresponding interface.

Data Samples

Let's take the air table in the oceanic station library as an example:

select * from air limit 5;
+---------------------+------------+----------+-------------+------------+
| time | station | pressure | temperature | visibility |
+---------------------+------------+----------+-------------+------------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 63.0 | 80.0 | 79.0 |
| 2023-01-14T16:03:00 | XiaoMaiDao | 58.0 | 64.0 | 78.0 |
| 2023-01-14T16:06:00 | XiaoMaiDao | 65.0 | 79.0 | 67.0 |
| 2023-01-14T16:09:00 | XiaoMaiDao | 52.0 | 55.0 | 73.0 |
| 2023-01-14T16:12:00 | XiaoMaiDao | 59.0 | 74.0 | 64.0 |
+---------------------+------------+----------+-------------+------------+
Query took 0.028 seconds.

Targets

Assuming that the frequency of air meter data writing is 1min, but we only want to know the change of each index every 1h, such as the maximum value of the pressure, the average value of the temperature, the sum of the temperature, and the number of data rows in the specified time window.Then the corresponding sql is created as follows:

INSERT INTO air_down_sampling_1hour(time, station, max_pressure, avg_temperature, sum_temperature, count_pressure) 
SELECT
date_bin(INTERVAL '1' HOUR, time, TIMESTAMP '2023-01-14T16:00:00') time,
station,
MAX(pressure) max_pressure,
AVG(temperature) avg_temperature,
SUM(temperature) sum_temperature,
COUNT(pressure) count_pressure
FROM air_stream
GROUP BY date_bin(INTERVAL '1' HOUR, time, TIMESTAMP '2023-01-14T16:00:00'), station;

Results

SELECT * FROM air_down_sampling_1hour LIMIT 10;

+---------------------+------------+--------------+-----------------+-----------------+----------------+
| time | station | max_pressure | avg_temperature | sum_temperature | count_pressure |
+---------------------+------------+--------------+-----------------+-----------------+----------------+
| 2023-01-14T16:00:00 | XiaoMaiDao | 80.0 | 68.05 | 1361.0 | 20 |
| 2023-01-14T17:00:00 | XiaoMaiDao | 79.0 | 63.75 | 1275.0 | 20 |
| 2023-01-14T18:00:00 | XiaoMaiDao | 79.0 | 66.35 | 1327.0 | 20 |
| 2023-01-14T19:00:00 | XiaoMaiDao | 78.0 | 68.05 | 1361.0 | 20 |
| 2023-01-14T20:00:00 | XiaoMaiDao | 80.0 | 64.35 | 1287.0 | 20 |
| 2023-01-14T21:00:00 | XiaoMaiDao | 77.0 | 61.05 | 1221.0 | 20 |
| 2023-01-14T22:00:00 | XiaoMaiDao | 80.0 | 64.8 | 1296.0 | 20 |
| 2023-01-14T23:00:00 | XiaoMaiDao | 80.0 | 66.35 | 1327.0 | 20 |
| 2023-01-15T00:00:00 | XiaoMaiDao | 80.0 | 65.15 | 1303.0 | 20 |
| 2023-01-15T01:00:00 | XiaoMaiDao | 80.0 | 69.55 | 1391.0 | 20 |
+---------------------+------------+--------------+-----------------+-----------------+----------------+