Data Migration
This article describes how to use the DataX cnosdbwriter plugin to import OpenTSDB data into CnosDB.
Migration Tool DataX
DataX is Alibaba's open-source offline data synchronization tool/platform, which enables efficient data synchronization between a wide range of heterogeneous data sources.
To handle the differences between data sources, DataX abstracts synchronization into Reader plugins, which read data from the source data source, and Writer plugins, which write data to the target side. Between the reading and writing plugins, the DataX framework provides general functions such as type conversion and performance statistics. Users only need to specify a Reader plugin and a Writer plugin in a configuration file to synchronize heterogeneous data. A typical DataX configuration file looks like this:
{
    "job": {
        "content": [
            {
                "reader": {
                    Reader configuration
                    ...
                },
                "writer": {
                    Writer configuration
                    ...
                }
            }
        ],
        "setting": {
            other configuration
            ...
        }
    }
}
We provide a Writer plugin CnosDBWriter that you can configure to import data from other data sources into CnosDB via DataX.
Introduction to the cnosdbwriter plugin
The cnosdbwriter plugin reads the records produced by the Reader plugin, converts them into schemaless write statements, and sends them to CnosDB.
- Supports the OpenTSDB data format (via format = "opentsdb").
- Supports limiting the maximum number of rows (via batchSize) and the maximum number of bytes (via bufferSize) written per batch.
- Supports configuring the timestamp precision of the input data (seconds, milliseconds, microseconds, nanoseconds).
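As a rough illustration of the write statements the plugin produces, the sketch below renders one record into an InfluxDB-style line protocol string of the kind CnosDB's write API accepts. The helper function and record values are illustrative, not part of the plugin:

```python
def to_line_protocol(table, tags, fields, time_ns):
    """Render one record as a schemaless (line protocol) write statement."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{table},{tag_part} {field_part} {time_ns}"

line = to_line_protocol(
    "cpu",
    {"host": "myhost", "cpu": "cpu0"},
    {"usage_nice": 1.509054},
    1685548840000000000,  # timestamp in nanoseconds
)
print(line)
# cpu,cpu=cpu0,host=myhost usage_nice=1.509054 1685548840000000000
```

A real implementation would also escape special characters in tag and field values; this sketch only shows the overall shape of a write statement.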
Plugin configuration
Parameter | Description | Required | Default Value |
---|---|---|---|
cnosdbWriteAPI | API of CnosDB write URL, string | No | http://127.0.0.1:8902/api/v1/write |
tenant | Tenant, string | No | cnosdb |
database | Database, string | No | public |
username | Username, string | No | root |
password | Password, string | No | root |
batchSize | Maximum number of rows written to CnosDB per batch, unsigned integer | No | 1000 |
bufferSize | Maximum number of bytes written to CnosDB per batch, unsigned integer | No | 8388608 |
format | Format of the data produced by the Reader, string. Required if the Reader uses a special format, such as opentsdbreader. Optional values are datax and opentsdb. | No | datax |
table | Table name, string. Not required when format is opentsdb. | Yes | - |
tags | Map type, mapping each Tag name to the index of the corresponding input column (unsigned integer, starting from 0). Only takes effect when format is datax. See the instructions below for the format details. | Yes | - |
fields | Map type, mapping each Field name to the index of the corresponding input column (unsigned integer, starting from 0). Only takes effect when format is datax. See the instructions below for the format details. | Yes | - |
timeIndex | Index of the input column holding the time values, unsigned integer starting from 0. Only takes effect when format is datax. | Yes | - |
precision | Timestamp precision of the input data, string. Optional values are s, ms, us, and ns, for seconds, milliseconds, microseconds, and nanoseconds, respectively. | No | ms |
tagsExtra | Map type, configuring additional tags that are appended as extra columns to each row of data imported into CnosDB. See the instructions below for the format details. | No | - |
fieldsExtra | Map type, configuring which CnosDB tables and columns the data from certain Reader columns is written to. Only takes effect when format is opentsdb. See the instructions below for the format details. | No | - |
Notice:
When either the batchSize or the bufferSize limit is reached, the buffered data is sent to CnosDB immediately and the buffer is cleared. For example, with batchSize=1000 and bufferSize=1048576, the data is sent as soon as the buffer holds 1000 rows, even if it is smaller than 1MB; likewise, the data is sent as soon as the buffer reaches 1MB, even if it holds fewer than 1000 rows.
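The flush rule can be sketched as a simple either-or check; the function name and parameter names below are illustrative, mirroring the batchSize and bufferSize options:

```python
def should_flush(row_count, byte_count, batch_size=1000, buffer_size=8388608):
    """A batch is sent as soon as either limit is reached."""
    return row_count >= batch_size or byte_count >= buffer_size

# 1000 rows triggers a flush even though the buffer is far below the byte limit
print(should_flush(1000, 4096))    # True
# reaching the byte limit triggers a flush even with few rows
print(should_flush(10, 8388608))   # True
# neither limit reached: keep buffering
print(should_flush(10, 4096))      # False
```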
The default value of format is datax, in which case CnosDBWriter assumes the input data is table-shaped, so you must set the table, tags, fields, and timeIndex configuration items manually. The table configuration item specifies which CnosDB table the data is written to.
The tags configuration item specifies which input columns correspond to CnosDB Tag columns. Suppose the columns at index 1 and 2 of the table are tags named host and unit, respectively:
    "tags": {
        "host": 1,
        "unit": 2
    }
The fields configuration item specifies which input columns correspond to CnosDB Field columns. Suppose the columns at index 3 and 4 of the table are fields named usage_min and usage_max. You can set them like this:
    "fields": {
        "usage_min": 3,
        "usage_max": 4
    }
The timeIndex configuration item specifies which input column corresponds to the CnosDB time column. Suppose column 0 of the table holds the time values; then we set:
    "timeIndex": 0
The precision configuration item corresponds to the time precision provided by the Reader plugin and defaults to milliseconds. Since CnosDB stores timestamps in nanoseconds by default, CnosDBWriter converts input timestamps to nanoseconds when necessary.
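For example, with the default precision of ms, a millisecond timestamp is scaled up to nanoseconds before being stored. The sketch below illustrates this conversion; it is not the plugin's actual code:

```python
# nanoseconds per unit for each supported precision value
SCALE_TO_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

def to_nanoseconds(timestamp, precision="ms"):
    """Convert an input timestamp to the nanoseconds CnosDB stores."""
    return timestamp * SCALE_TO_NS[precision]

print(to_nanoseconds(1685548810000))    # 1685548810000000000 (from ms)
print(to_nanoseconds(1685548810, "s"))  # 1685548810000000000 (from s)
```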
The tagsExtra configuration item has the format {tag name: tag value}. If a Tag defined here also exists in the input data, the definition here is ignored and the Tag from the input data is used instead. The following example adds host=localhost and source=datax tags to each row:
    {
        "host": "localhost",
        "source": "datax"
    }
The format configuration item can also be set to opentsdb, in which case CnosDBWriter assumes the input data has only one column, containing an OpenTSDB JSON format write request, and parses the data from that request; table, tags, fields, and timeIndex do not need to be configured. In this case the fieldsExtra configuration item applies, with the format {source column: {"table": target table, "field": target column}}. The following example writes the data of the OpenTSDB column cpu_usage to the usage column of the CnosDB table cpu:
    {
        "cpu_usage": {
            "table": "cpu",
            "field": "usage"
        }
    }
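To illustrate how fieldsExtra redirects an OpenTSDB metric, the sketch below applies such a mapping to a metric name. The helper function is illustrative; the "value" fallback column matches the plugin's default behavior described in the example section below:

```python
FIELDS_EXTRA = {"cpu_usage": {"table": "cpu", "field": "usage"}}

def route_metric(metric, fields_extra):
    """Return the (table, field) an OpenTSDB metric is written to.

    Without a fieldsExtra entry, the metric name becomes the table
    name and the value goes to the default 'value' column.
    """
    extra = fields_extra.get(metric)
    if extra is not None:
        return extra["table"], extra["field"]
    return metric, "value"

print(route_metric("cpu_usage", FIELDS_EXTRA))  # ('cpu', 'usage')
print(route_metric("mem_free", FIELDS_EXTRA))   # ('mem_free', 'value')
```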
Data type conversion
DataX converts the Reader's types to internal types in order to normalize type conversion between source and destination and to ensure that data is not distorted. See DataX Docs - Type Conversion. The internal types are as follows:
- Long: fixed-point numbers (Int, Short, Long, BigInteger, etc.).
- Double: floating-point numbers (Float, Double, BigDecimal (arbitrary precision), etc.).
- String: string type, with no underlying length limit, using the Unicode universal character set.
- Date: date type.
- Bool: boolean type.
- Bytes: binary data, which can hold unstructured data such as MP3 files.
CnosDBWriter will convert these internal types to CnosDB internal data types with the following conversion rules:
DataX Internal Type | CnosDB Data Type |
---|---|
Date (time column) | TIMESTAMP(NANOSECOND) |
Date (not time column) | BIGINT |
Long | BIGINT |
Double | DOUBLE |
Bytes | Unsupported |
String | STRING |
Bool | BOOLEAN |
Example: Importing Data from OpenTSDB into CnosDB
Preparation
- Install Python 2 or 3, JDK 1.8, and DataX; see DataX Docs - Quick Start.
- Install CnosDB as described in the Deployment section.
We assume that DataX is installed at the path {YOUR_DATAX_HOME}.
Configuration
Reader Plugin OpenTSDBReader Configuration
Suppose we have a running OpenTSDB and the data to be exported is as follows:
- Node address: http://127.0.0.1:4242
- Metrics: cpu_usage_nice, cpu_usage_idle
- Start Time: 2023-06-01 00:00:00
- End Time: 2023-06-02 00:00:00
- Time Precision: ms
Then the corresponding Reader plugin OpenTSDBReader configuration is as follows:
{
    "name": "opentsdbreader",
    "parameter": {
        "endpoint": "http://localhost:4242",
        "column": [
            "cpu_usage_nice",
            "cpu_usage_idle"
        ],
        "beginDateTime": "2023-06-01 00:00:00",
        "endDateTime": "2023-06-02 00:00:00"
    }
}
Note that the time precision is not specified in the OpenTSDBReader configuration; it is specified in the precision item of the CnosDBWriter configuration instead.
By default, each entry of the column configuration item is used as a CnosDB table name, so the cpu_usage_nice and cpu_usage_idle tables would be created in CnosDB, with the metric values written to the value column of each table. Instead, we can configure fieldsExtra to write cpu_usage_nice and cpu_usage_idle to the same CnosDB table, as shown in the fieldsExtra configuration below.
The data in OpenTSDB is as follows:
curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_nice' | jq
[
    {
        "metric": "cpu_usage_nice",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 0.0,
            "1685548820000": 0.0,
            "1685548830000": 0.0,
            "1685548840000": 1.509054,
            "1685548850000": 4.885149,
            "1685548860000": 19.758805,
            "1685548870000": 27.269705,
            "1685548880000": 32.713946,
            "1685548890000": 37.621445,
            "1685548900000": 26.837964
        }
    }
]
curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_idle' | jq
[
    {
        "metric": "cpu_usage_idle",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 26.837964,
            "1685548820000": 37.621445,
            "1685548830000": 32.713946,
            "1685548840000": 27.269705,
            "1685548850000": 1.509054,
            "1685548860000": 19.758805,
            "1685548870000": 4.885149,
            "1685548880000": 0.0,
            "1685548890000": 0.0,
            "1685548900000": 0.0
        }
    }
]
Writer Plugin CnosDBWriter Configuration
Suppose we have a running CnosDB with the following parameters:
- API Address: http://127.0.0.1:8902/api/v1/write
- Tenant: cnosdb
- Database: public
- User: root
- Password: root
When using OpenTSDBReader with CnosDBWriter, you need to set the format item of the CnosDBWriter configuration to opentsdb so that CnosDBWriter writes the data to CnosDB correctly.
Then the corresponding CnosDBWriter configuration is as follows:
{
    "name": "cnosdbwriter",
    "parameter": {
        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
        "tenant": "cnosdb",
        "database": "public",
        "username": "root",
        "password": "root",
        "format": "opentsdb",
        "fieldsExtra": {
            "cpu_usage_nice": {
                "table": "cpu",
                "field": "usage_nice"
            },
            "cpu_usage_idle": {
                "table": "cpu",
                "field": "usage_idle"
            }
        }
    }
}
Start the import task
- Create a DataX configuration file and populate the reader and writer entries with the OpenTSDBReader and CnosDBWriter configurations above. Save it as {YOUR_DATAX_HOME}/bin/opentsdb_to_cnosdb.json:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "opentsdbreader",
                    "parameter": {
                        "endpoint": "http://localhost:4242",
                        "column": [
                            "cpu_usage_nice",
                            "cpu_usage_idle"
                        ],
                        "beginDateTime": "2023-06-01 00:00:00",
                        "endDateTime": "2023-06-02 00:00:00"
                    }
                },
                "writer": {
                    "name": "cnosdbwriter",
                    "parameter": {
                        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
                        "tenant": "cnosdb",
                        "database": "public",
                        "username": "root",
                        "password": "root",
                        "format": "opentsdb",
                        "fieldsExtra": {
                            "cpu_usage_nice": {
                                "table": "cpu",
                                "field": "usage_nice"
                            },
                            "cpu_usage_idle": {
                                "table": "cpu",
                                "field": "usage_idle"
                            }
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
- Run datax.py to start the import task:
cd {YOUR_DATAX_HOME}/bin
python ./datax.py ./opentsdb_to_cnosdb.json
The output is as follows:
...
2023-07-01 12:34:56.789 [job-0] INFO JobContainer -
Task start time           : 2023-07-01 12:34:55
Task end time             : 2023-07-01 12:34:56
Total task time           : 1s
Average task throughput   : 508B/s
Record write speed        : 20rec/s
Total records read        : 20
Total read/write failures : 0
The data in CnosDB is as follows:
SELECT * FROM cpu ORDER BY time ASC;
+---------------------+--------+------+------------+------------+
| time | host | cpu | usage_nice | usage_idle |
+---------------------+--------+------+------------+------------+
| 2023-06-01T00:00:10 | myhost | cpu0 | 0.0 | 26.837964 |
| 2023-06-01T00:00:20 | myhost | cpu0 | 0.0 | 37.621445 |
| 2023-06-01T00:00:30 | myhost | cpu0 | 0.0 | 32.713946 |
| 2023-06-01T00:00:40 | myhost | cpu0 | 1.509054 | 27.269705 |
| 2023-06-01T00:00:50 | myhost | cpu0 | 4.885149 | 1.509054 |
| 2023-06-01T00:01:00 | myhost | cpu0 | 19.758805 | 19.758805 |
| 2023-06-01T00:01:10 | myhost | cpu0 | 27.269705 | 4.885149 |
| 2023-06-01T00:01:20 | myhost | cpu0 | 32.713946 | 0.0 |
| 2023-06-01T00:01:30 | myhost | cpu0 | 37.621445 | 0.0 |
| 2023-06-01T00:01:40 | myhost | cpu0 | 26.837964 | 0.0 |
+---------------------+--------+------+------------+------------+
Check the status of the import task:
The log files for DataX jobs are located by default in the {YOUR_DATAX_HOME}/log directory. In these log files, you can view the start time, end time, and status of the task, as well as any output and error messages. In addition, the import progress can be checked by querying the target table in CnosDB:
SELECT COUNT(usage_idle) as c FROM "cpu";
+----+
| c |
+----+
| 10 |
+----+
Cancel or stop the import task:
You can stop the import task by terminating the DataX process:
pkill datax