Data Migration
This article describes how to use DataX to import data from other data sources into CnosDB.
Migration Tool DataX
DataX is Alibaba's open source offline data synchronization tool/platform that enables efficient data synchronization between various heterogeneous data sources.
To cope with the differences between data sources, DataX abstracts synchronization into a Reader plug-in that reads data from the source and a Writer plug-in that writes data to the target, while common functions such as type conversion and performance statistics are provided by the DataX framework itself. Users can synchronize heterogeneous data simply by specifying a Reader plug-in and a Writer plug-in in a configuration file. A DataX configuration file generally looks like this:
{
    "job": {
        "content": [
            {
                "reader": {
                    Reader configuration
                    ...
                },
                "writer": {
                    Writer configuration
                    ...
                }
            }
        ],
        "setting": {
            Other configuration
            ...
        }
    }
}
We provide a Writer plugin, CnosDBWriter, which you can configure to import data from other data sources into CnosDB via DataX.
Introduction to the CnosDBWriter plugin
The CnosDBWriter plugin reads the records produced by the Reader plugin, generates schemaless write statements from them, and sends them to CnosDB.
- Supports the OpenTSDB data format (via format = "opentsdb").
- Supports limiting the maximum number of rows (via batchSize) and the maximum number of bytes (via bufferSize) written per batch.
- Supports configuring the timestamp precision of the input data (seconds, milliseconds, microseconds, nanoseconds).
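As a quick illustration, these capabilities map onto the writer's parameter block as follows (an illustrative fragment, not a complete job; batchSize, bufferSize, and precision are shown at their documented defaults):

```json
{
  "name": "cnosdbwriter",
  "parameter": {
    "format": "opentsdb",
    "batchSize": 1000,
    "bufferSize": 8388608,
    "precision": "ms"
  }
}
```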
Plugin configuration
Configuration Item | Description | Required | Default |
---|---|---|---|
cnosdbWriteAPI | URL of the CnosDB write API, string | No | http://127.0.0.1:8902/api/v1/write |
tenant | Tenant, string | No | cnosdb |
database | Database, string | No | public |
username | Username, string | No | root |
password | Password, string | No | root |
batchSize | Maximum number of rows written to CnosDB per batch, unsigned integer | No | 1000 |
bufferSize | Maximum number of bytes written to CnosDB per batch, unsigned integer | No | 8388608 |
format | Format produced by the Reader, string. Required if the Reader uses a special format, such as opentsdbreader. Optional values are datax and opentsdb. | No | datax |
table | Table, string. Not required when format is opentsdb. | Yes | - |
tags | Map type; maps each Tag name to the sequence number of the corresponding input column (unsigned integer, starting from 0). Only takes effect when format is datax. See the instructions below for details. | Yes | - |
fields | Map type; maps each Field name to the sequence number of the corresponding input column (unsigned integer, starting from 0). Only takes effect when format is datax. See the instructions below for details. | Yes | - |
timeIndex | Sequence number of the input column corresponding to the time field, unsigned integer starting from 0. Only takes effect when format is datax. | Yes | - |
precision | Timestamp precision of the input data, string. Optional values are s, ms, us, and ns, for seconds, milliseconds, microseconds, and nanoseconds respectively. | No | ms |
tagsExtra | Map type; configures additional Tags that are appended to every row of data imported into CnosDB. See the instructions below for details. | No | - |
fieldsExtra | Map type; configures which tables and columns in CnosDB certain Reader columns are written to. Only takes effect when format is opentsdb. See the instructions below for details. | No | - |
Notice:
The batch is sent to CnosDB and the buffer cleared as soon as either of the configuration items batchSize or bufferSize is reached. For example, with batchSize=1000 and bufferSize=1048576, the data is sent when the buffer holds 1000 rows even if it has not reached 1 MB, and when the buffer reaches 1 MB even if it holds fewer than 1000 rows.
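The either-condition flush rule can be sketched in a few lines of Python (hypothetical names; this is not the plugin's actual source, just the behavior described above):

```python
class BatchBuffer:
    """Buffers write lines and flushes when either limit is reached."""

    def __init__(self, batch_size=1000, buffer_size=8 * 1024 * 1024):
        self.batch_size = batch_size    # max rows per batch (batchSize)
        self.buffer_size = buffer_size  # max bytes per batch (bufferSize)
        self.rows = []
        self.bytes = 0
        self.flushes = 0                # counts simulated sends to CnosDB

    def append(self, line: str):
        self.rows.append(line)
        self.bytes += len(line.encode("utf-8"))
        # Either condition alone triggers a send and clears the buffer.
        if len(self.rows) >= self.batch_size or self.bytes >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.rows:
            self.flushes += 1           # a real writer would POST the batch here
            self.rows.clear()
            self.bytes = 0
```

With batch_size=3 and buffer_size=100, three short lines trigger a flush on the row limit, while a single 150-byte line triggers a flush on the byte limit.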
The default value of the configuration item format is datax. In this case, CnosDBWriter assumes the input data is tabular, so the four configuration items table, tags, fields, and timeIndex must be set manually.
The configuration item table specifies which CnosDB table the data is written to.
The configuration item tags specifies which columns of the tabular data correspond to CnosDB Tag columns. Assuming the first and second columns of the table are Tags, named "host" and "unit", it can be set like this:
"tags": {
    "host": 1,
    "unit": 2
}
The configuration item fields specifies which columns of the tabular data correspond to CnosDB Field columns. Assuming the third and fourth columns of the table are Fields, named "usage_min" and "usage_max", it can be set like this:
"fields": {
    "usage_min": 3,
    "usage_max": 4
}
The configuration item timeIndex specifies which column of the tabular data corresponds to the CnosDB time column. Assuming the 0th column of the table is the time, it can be set like this:
"timeIndex": 0
The configuration item precision corresponds to the timestamp precision of the data provided by the Reader plugin, with the default being milliseconds. CnosDB stores timestamps with nanosecond precision (TIMESTAMP(NANOSECOND)) by default, so CnosDBWriter converts the input timestamps accordingly.
The configuration item tagsExtra has the format { tag name: tag value }. If a configured Tag already exists in the input data, the configured value is ignored and the Tag from the input data is used instead. The following example adds the Tags host=localhost and source=datax to each row of data:
{ "host": "localhost", "source": "datax" }
The configuration item format can also be set to opentsdb, in which case CnosDBWriter assumes that the input data has only one column, containing an OpenTSDB JSON-format write request. There is no need to configure table, tags, fields, or timeIndex, because the data is parsed from the OpenTSDB write request. The configuration item fieldsExtra takes effect in this case, with the format { source column: { "table": target table, "field": target column } }. The following example writes the data of the OpenTSDB column cpu_usage to the column usage of the table cpu in CnosDB:
{ "cpu_usage": { "table": "cpu", "field": "usage" } }
Data type conversion
DataX converts the Reader's types to internal types in order to normalize type conversion between source and destination and to ensure that data is not distorted. See DataX Docs - Type Conversion. The internal types are as follows:
- Long: fixed-point numbers (Int, Short, Long, BigInteger, etc.).
- Double: floating-point numbers (Float, Double, BigDecimal (infinite precision), etc.).
- String: string type, with no underlying length limit, using the universal character set (Unicode).
- Date: date type.
- Bool: boolean value.
- Bytes: binary data, which can hold unstructured data such as MP3 files.
CnosDBWriter will convert these internal types to CnosDB internal data types with the following conversion rules:
DataX Internal Type | CnosDB Data Type |
---|---|
Date (time column) | TIMESTAMP(NANOSECOND) |
Date (not time column) | BIGINT |
Long | BIGINT |
Double | DOUBLE |
Bytes | Unsupported |
String | STRING |
Bool | BOOLEAN |
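To make the mapping concrete, the following Python sketch shows how one DataX record could be rendered as a schemaless, line-protocol style write under these rules. The function and its parameters are illustrative, not the plugin's actual code:

```python
def record_to_line(record, table, tags, fields, time_index, precision="ms"):
    """Render one tabular DataX record as a line-protocol style write.

    tags / fields map names to column indices, mirroring the
    tags / fields / timeIndex configuration items described above.
    """
    # Normalize the timestamp to nanoseconds, per the precision item.
    factor = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}[precision]
    ts = int(record[time_index]) * factor

    tag_part = ",".join(f"{k}={record[i]}" for k, i in tags.items())

    def fmt(v):
        # Bool -> BOOLEAN (checked before int: bool is a subclass of int)
        if isinstance(v, bool):
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"          # Long/Date -> BIGINT
        if isinstance(v, float):
            return repr(v)          # Double -> DOUBLE
        return f'"{v}"'             # String -> STRING, quoted

    field_part = ",".join(f"{k}={fmt(record[i])}" for k, i in fields.items())
    return f"{table},{tag_part} {field_part} {ts}"
```

For example, a record `[1685548810000, "myhost", "cpu0", 1.509054, 27.269705]` with timeIndex 0, tags host/cpu, and fields usage_nice/usage_idle produces one write line for the cpu table.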
Example: Importing Data from OpenTSDB into CnosDB
Prerequisites
- Install Python 2 or 3, JDK 1.8, and DataX; see DataX Docs - Quick Start.
- Install CnosDB as described in the Deployment section.
We assume that DataX is installed at the path {YOUR_DATAX_HOME}.
Parameters
Reader Plugin OpenTSDBReader Configuration
Suppose we have a running OpenTSDB and the data to be exported is as follows:
- Node address: http://127.0.0.1:4242
- Metrics: cpu_usage_nice, cpu_usage_idle
- Start Time: 2023-06-01 00:00:00
- End Time: 2023-06-02 00:00:00
- Precision: ms
Then the corresponding Reader plugin OpenTSDBReader configuration is as follows:
{
    "name": "opentsdbreader",
    "parameter": {
        "endpoint": "http://localhost:4242",
        "column": [
            "cpu_usage_nice",
            "cpu_usage_idle"
        ],
        "beginDateTime": "2023-06-01 00:00:00",
        "endDateTime": "2023-06-02 00:00:00"
    }
}
Note that the time precision is not set in the OpenTSDBReader configuration; it is set in the precision item of the CnosDBWriter configuration.
By default, each entry of the configuration item column is used as a table name in CnosDB. Eventually the tables cpu_usage_nice and cpu_usage_idle would be created in CnosDB, and the metric data would always be written to the value column of each table. By configuring fieldsExtra, cpu_usage_nice and cpu_usage_idle can instead be written to the same table in CnosDB; see the fieldsExtra configuration below.
The data in OpenTSDB is as follows:
curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_nice' |jq
[
    {
        "metric": "cpu_usage_nice",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 0.0,
            "1685548820000": 0.0,
            "1685548830000": 0.0,
            "1685548840000": 1.509054,
            "1685548850000": 4.885149,
            "1685548860000": 19.758805,
            "1685548870000": 27.269705,
            "1685548880000": 32.713946,
            "1685548890000": 37.621445,
            "1685548900000": 26.837964
        }
    }
]
curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_idle' |jq
[
    {
        "metric": "cpu_usage_idle",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 26.837964,
            "1685548820000": 37.621445,
            "1685548830000": 32.713946,
            "1685548840000": 27.269705,
            "1685548850000": 1.509054,
            "1685548860000": 19.758805,
            "1685548870000": 4.885149,
            "1685548880000": 0.0,
            "1685548890000": 0.0,
            "1685548900000": 0.0
        }
    }
]
Writer Plugin CnosDBWriter Configuration
Suppose we have a running CnosDB with the following parameters:
- API Address: http://127.0.0.1:8902/api/v1/write
- Tenant: cnosdb
- Database: public
- User: root
- Password: root
When using OpenTSDBReader together with CnosDBWriter, the CnosDBWriter configuration item format must be set to opentsdb so that CnosDBWriter writes the data to CnosDB correctly.
Then the corresponding CnosDBWriter configuration is as follows:
{
    "name": "cnosdbwriter",
    "parameter": {
        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
        "tenant": "cnosdb",
        "database": "public",
        "username": "root",
        "password": "root",
        "format": "opentsdb",
        "fieldsExtra": {
            "cpu_usage_nice": {
                "table": "cpu",
                "field": "usage_nice"
            },
            "cpu_usage_idle": {
                "table": "cpu",
                "field": "usage_idle"
            }
        }
    }
}
Start the import task
- We create a DataX configuration file, populate the reader and writer entries with the OpenTSDBReader and CnosDBWriter configurations above, and save it as {YOUR_DATAX_HOME}/bin/opentsdb_to_cnosdb.json:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "opentsdbreader",
                    "parameter": {
                        "endpoint": "http://localhost:4242",
                        "column": [
                            "cpu_usage_nice",
                            "cpu_usage_idle"
                        ],
                        "beginDateTime": "2023-06-01 00:00:00",
                        "endDateTime": "2023-06-02 00:00:00"
                    }
                },
                "writer": {
                    "name": "cnosdbwriter",
                    "parameter": {
                        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
                        "tenant": "cnosdb",
                        "database": "public",
                        "username": "root",
                        "password": "root",
                        "format": "opentsdb",
                        "fieldsExtra": {
                            "cpu_usage_nice": {
                                "table": "cpu",
                                "field": "usage_nice"
                            },
                            "cpu_usage_idle": {
                                "table": "cpu",
                                "field": "usage_idle"
                            }
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
- Run datax.py to start the import task:
cd {YOUR_DATAX_HOME}/bin
python ./datax.py ./opentsdb_to_cnosdb.json
The output is as follows:
...
2023-07-01 12:34:56.789 [job-0] INFO JobContainer -
Task start time           : 2023-07-01 12:34:55
Task end time             : 2023-07-01 12:34:56
Total task time           : 1s
Average task traffic      : 508B/s
Record write speed        : 20rec/s
Total records read        : 20
Total read/write failures : 0
The data in CnosDB is as follows:
SELECT * FROM cpu ORDER BY time ASC;
+---------------------+--------+------+------------+------------+
| time | host | cpu | usage_nice | usage_idle |
+---------------------+--------+------+------------+------------+
| 2023-06-01T00:00:10 | myhost | cpu0 | 0.0 | 26.837964 |
| 2023-06-01T00:00:20 | myhost | cpu0 | 0.0 | 37.621445 |
| 2023-06-01T00:00:30 | myhost | cpu0 | 0.0 | 32.713946 |
| 2023-06-01T00:00:40 | myhost | cpu0 | 1.509054 | 27.269705 |
| 2023-06-01T00:00:50 | myhost | cpu0 | 4.885149 | 1.509054 |
| 2023-06-01T00:01:00 | myhost | cpu0 | 19.758805 | 19.758805 |
| 2023-06-01T00:01:10 | myhost | cpu0 | 27.269705 | 4.885149 |
| 2023-06-01T00:01:20 | myhost | cpu0 | 32.713946 | 0.0 |
| 2023-06-01T00:01:30 | myhost | cpu0 | 37.621445 | 0.0 |
| 2023-06-01T00:01:40 | myhost | cpu0 | 26.837964 | 0.0 |
+---------------------+--------+------+------------+------------+
Check the status of the import task:
The log files for DataX job runs are located in the {YOUR_DATAX_HOME}/log directory by default. In these log files you can see the start time, end time, and status of the task, as well as any output and error messages. In addition, you can check the import progress by querying the target table in CnosDB.
SELECT COUNT(usage_idle) as c FROM "cpu";
+----+
| c |
+----+
| 10 |
+----+
Cancel or stop the import task:
You can shut down the import task by terminating the DataX process:
pkill datax