Version: 2.3.2

Data Migration

This article describes how to use DataX with the cnosdbwriter plugin to import OpenTSDB data into CnosDB.

Migration Tool DataX

DataX is Alibaba's open-source offline data synchronization tool/platform, which enables efficient data synchronization between various heterogeneous data sources.

To handle the differences between data sources, DataX abstracts synchronization into Reader plugins, which read data from the source data source, and Writer plugins, which write data to the target side. The DataX framework provides general functions such as type conversion and performance statistics between the Reader and Writer plugins. Users only need to specify a Reader plugin and a Writer plugin in a configuration file to synchronize heterogeneous data. A typical DataX configuration file looks like this:

{
    "job": {
        "content": [
            {
                "reader": {
                    // Reader configuration
                    ...
                },
                "writer": {
                    // Writer configuration
                    ...
                }
            }
        ],
        "setting": {
            // Other configuration
            ...
        }
    }
}

We provide a Writer plugin, CnosDBWriter, which you can configure to import data from other data sources into CnosDB via DataX.

Introduction to the cnosdbwriter plugin

The cnosdbwriter plugin reads the protocol data generated by the Reader plugin, generates schema-less write statements from it, and sends them to CnosDB.

  • Supports the OpenTSDB data format (via format = "opentsdb").
  • Supports limiting the maximum number of rows (batchSize) and the maximum number of bytes (bufferSize) written per batch.
  • Supports configuring the timestamp precision of the input data (seconds, milliseconds, microseconds, or nanoseconds).
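To make "schema-less write statement" concrete, the sketch below builds one write line from a table name, tags, fields, and a timestamp, assuming an InfluxDB-style line protocol. This is an illustration only: the helper name, key ordering, and the absence of character escaping are assumptions, not the plugin's actual implementation.

```python
def to_line_protocol(table, tags, fields, timestamp_ns):
    """Build a single schema-less write statement (line-protocol style).

    Illustrative sketch: the real cnosdbwriter may escape special
    characters and order keys differently.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{table},{tag_part} {field_part} {timestamp_ns}"

line = to_line_protocol(
    "cpu",
    {"host": "myhost", "cpu": "cpu0"},
    {"usage_nice": 1.509054},
    1685548840000000000,
)
# -> cpu,cpu=cpu0,host=myhost usage_nice=1.509054 1685548840000000000
```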

Plugin configuration

| Parameter | Description | Required | Default Value |
| --- | --- | --- | --- |
| cnosdbWriteAPI | URL of the CnosDB write API, string | No | http://127.0.0.1:8902/api/v1/write |
| tenant | Tenant, string | No | cnosdb |
| database | Database, string | No | public |
| username | Username, string | No | root |
| password | Password, string | No | root |
| batchSize | Maximum number of rows written to CnosDB per batch, unsigned integer | No | 1000 |
| bufferSize | Maximum number of bytes written to CnosDB per batch, unsigned integer | No | 8388608 |
| format | The format used by the Reader, string. Required if the Reader uses a special format, such as opentsdbreader. Optional values: datax, opentsdb. | No | datax |
| table | Table, string. Not required when format is opentsdb. | Yes | - |
| tags | Map type, mapping each tag name to the sequence number of the corresponding input column (unsigned integer, starting from 0). Only used when format is datax. See the notes below for the format details. | Yes | - |
| fields | Map type, mapping each field name to the sequence number of the corresponding input column (unsigned integer, starting from 0). Only used when format is datax. See the notes below for the format details. | Yes | - |
| timeIndex | Sequence number of the input column holding the time, unsigned integer starting from 0. Only used when format is datax. | Yes | - |
| precision | Timestamp precision of the input data, string. Optional values: s, ms, us, ns for seconds, milliseconds, microseconds, and nanoseconds, respectively. | No | ms |
| tagsExtra | Map type, configuring additional tags that are appended to every row and imported into CnosDB. See the notes below for the format details. | No | - |
| fieldsExtra | Map type, configuring which CnosDB table and column the data from certain Reader columns is written to. Only used when format is opentsdb. See the notes below for the format details. | No | - |

Notice:

  • When either the batchSize or the bufferSize condition is met, the buffered batch is sent to CnosDB immediately and the buffer is cleared. For example, with batchSize=1000 and bufferSize=1048576: if the number of buffered rows reaches 1000, the data is sent even if it has not reached 1 MB; if the buffer reaches 1 MB, the data is sent even if the number of rows has not reached 1000.
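    The flush rule can be sketched as follows (a hypothetical helper, not the plugin's code): a batch is sent as soon as either limit is hit.

    ```python
    def should_flush(row_count, byte_count, batch_size=1000, buffer_size=1048576):
        """Return True when the buffered batch must be sent to CnosDB.

        Either limit alone is sufficient, matching the batchSize and
        bufferSize behavior described above.
        """
        return row_count >= batch_size or byte_count >= buffer_size

    # 1000 rows buffered but well under 1 MB: flush anyway.
    assert should_flush(1000, 4096)
    # Only 10 rows but the buffer already holds 1 MB: flush anyway.
    assert should_flush(10, 1048576)
    # Neither limit reached: keep buffering.
    assert not should_flush(999, 1048575)
    ```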

  • The default value of format is datax. In this case, CnosDBWriter assumes the input data is table-shaped, so you need to manually set the four configuration items table, tags, fields, and timeIndex.

    • The table configuration option specifies which CnosDB table the data is written to.

    • The tags configuration option specifies which input columns correspond to CnosDB tag columns. Suppose columns 1 and 2 of the table are tags named "host" and "unit", respectively:

      "tags": {
          "host": 1,
          "unit": 2
      }
    • The fields configuration option specifies which input columns correspond to CnosDB field columns. Suppose columns 3 and 4 of the table are fields named "usage_min" and "usage_max". You can set them like this:

      "fields": {
          "usage_min": 3,
          "usage_max": 4
      }
    • The timeIndex configuration option specifies which input column corresponds to the CnosDB time column. Suppose column 0 of the table holds the time; then set:

      "timeIndex": 0
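    Putting the three mappings together, the sketch below shows how a writer configured with these example tags, fields, and timeIndex values would pick values out of one DataX record. This is a hypothetical illustration of the mapping only, not the plugin's code.

    ```python
    # Configuration from the examples above.
    TAGS = {"host": 1, "unit": 2}
    FIELDS = {"usage_min": 3, "usage_max": 4}
    TIME_INDEX = 0

    def map_record(record):
        """Split one DataX record (a list of columns) into time, tags, fields."""
        return {
            "time": record[TIME_INDEX],
            "tags": {name: record[i] for name, i in TAGS.items()},
            "fields": {name: record[i] for name, i in FIELDS.items()},
        }

    row = map_record([1685548840000, "myhost", "cpu0", 1.509054, 27.269705])
    # row["tags"] == {"host": "myhost", "unit": "cpu0"}
    ```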
  • The precision configuration option corresponds to the time precision provided by the Reader plugin and defaults to milliseconds. Since CnosDB stores timestamps in nanoseconds by default, CnosDBWriter converts timestamps when necessary.
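    The conversion is a simple scaling of the incoming timestamps. A sketch of the assumed behavior, where the scale factors follow directly from the definitions of s, ms, us, and ns:

    ```python
    # Nanoseconds per unit of each supported input precision.
    SCALE_TO_NS = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

    def to_nanoseconds(timestamp, precision="ms"):
        """Convert an input timestamp to the nanoseconds CnosDB stores."""
        return timestamp * SCALE_TO_NS[precision]

    # A millisecond timestamp from OpenTSDB becomes nanoseconds.
    assert to_nanoseconds(1685548810000, "ms") == 1685548810000000000
    ```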

  • The tagsExtra configuration has the format {tag name: tag value}. If a tag defined here also exists in the input data, the tagsExtra value is ignored and the tag from the input data is used instead. The following example adds the tags host=localhost and source=datax to each row:

    { "host": "localhost", "source": "datax" }
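    The precedence rule (the input row's own tags win over tagsExtra) can be sketched with a dictionary merge, where later entries override earlier ones. Illustrative only; the helper name is an assumption.

    ```python
    def merge_tags(tags_extra, input_tags):
        """Combine tagsExtra with a row's own tags; the input row wins on conflict."""
        return {**tags_extra, **input_tags}

    tags_extra = {"host": "localhost", "source": "datax"}
    # The row already carries host=myhost, so tagsExtra's host is ignored.
    merged = merge_tags(tags_extra, {"host": "myhost", "cpu": "cpu0"})
    assert merged == {"host": "myhost", "cpu": "cpu0", "source": "datax"}
    ```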
  • The format configuration option can also be set to opentsdb, in which case CnosDBWriter assumes the input has only one column, containing an OpenTSDB JSON-format write request. The table, tags, fields, and timeIndex options are not needed; the data is parsed from the OpenTSDB JSON write request.

    • In this case the fieldsExtra configuration applies, with the format {source column: {"table": target table, "field": target column}}. The following example writes the data of the OpenTSDB metric cpu_usage to the usage column of the CnosDB table cpu:

      { "cpu_usage": { "table": "cpu", "field": "usage" } }
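The effect of fieldsExtra on an OpenTSDB data point can be sketched as follows: without a matching entry, the metric name becomes the table and the value goes to a value column; a matching entry redirects it. This is a hypothetical helper illustrating the mapping described above, not the plugin's code.

```python
FIELDS_EXTRA = {"cpu_usage": {"table": "cpu", "field": "usage"}}

def target_for(metric, fields_extra=FIELDS_EXTRA):
    """Return the (table, field) pair an OpenTSDB metric is written to."""
    if metric in fields_extra:
        entry = fields_extra[metric]
        return entry["table"], entry["field"]
    # Default: the metric name is the table, values land in "value".
    return metric, "value"

assert target_for("cpu_usage") == ("cpu", "usage")
assert target_for("mem_free") == ("mem_free", "value")
```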

Data type conversion

DataX converts Reader types to internal types in order to normalize type conversion between source and destination and to ensure that the data is not distorted. See DataX Docs - Type Conversion. The internal types are as follows:

  • Long: fixed-point numbers (Int, Short, Long, BigInteger, etc.).
  • Double: floating-point numbers (Float, Double, BigDecimal (arbitrary precision), etc.).
  • String: string type, with no underlying length limit, using the Unicode universal character set.
  • Date: date type.
  • Bool: Boolean type.
  • Bytes: binary data, which can hold unstructured data such as MP3 files.

CnosDBWriter will convert these internal types to CnosDB internal data types with the following conversion rules:

| DataX Internal Type | CnosDB Data Type |
| --- | --- |
| Date (time column) | TIMESTAMP(NANOSECOND) |
| Date (not the time column) | BIGINT |
| Long | BIGINT |
| Double | DOUBLE |
| Bytes | Unsupported |
| String | STRING |
| Bool | BOOLEAN |
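The conversion rules in the table can be sketched as a lookup. Illustrative only: the real plugin operates on DataX column objects rather than type-name strings.

```python
def cnosdb_type(datax_type, is_time_column=False):
    """Map a DataX internal type name to the CnosDB data type it becomes."""
    if datax_type == "Date":
        # Only the designated time column becomes a timestamp.
        return "TIMESTAMP(NANOSECOND)" if is_time_column else "BIGINT"
    if datax_type == "Bytes":
        raise TypeError("Bytes columns are not supported by CnosDBWriter")
    mapping = {"Long": "BIGINT", "Double": "DOUBLE",
               "String": "STRING", "Bool": "BOOLEAN"}
    return mapping[datax_type]

assert cnosdb_type("Date", is_time_column=True) == "TIMESTAMP(NANOSECOND)"
assert cnosdb_type("Long") == "BIGINT"
```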

Example - Importing Data from OpenTSDB into CnosDB

Preparation

We assume that DataX is installed in the path {YOUR_DATAX_HOME}.

Configuration

Reader Plugin OpenTSDBReader Configuration

Suppose we have a running OpenTSDB and the data to be exported is as follows:

  • Node address: http://127.0.0.1:4242
  • Metric: sys.cpu.nice
  • Start Time: 2023-06-01 00:00:00
  • End Time: 2023-06-02 00:00:00
  • Timestamp Precision: ms

Then the corresponding Reader plugin OpenTSDBReader configuration is as follows:

{
    "name": "opentsdbreader",
    "parameter": {
        "endpoint": "http://localhost:4242",
        "column": [
            "cpu_usage_nice",
            "cpu_usage_idle"
        ],
        "beginDateTime": "2023-06-01 00:00:00",
        "endDateTime": "2023-06-02 00:00:00"
    }
}

Note that the time precision is not set in the OpenTSDBReader configuration but in the precision entry of the CnosDBWriter configuration.

By default, each entry of the column configuration item is used as a table name in CnosDB: tables cpu_usage_nice and cpu_usage_idle would be created, and the metric values would be written to the value column of each. We can instead configure fieldsExtra to write cpu_usage_nice and cpu_usage_idle into the same CnosDB table, as shown in the fieldsExtra configuration below.

The data in OpenTSDB is as follows:

curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_nice' | jq
[
    {
        "metric": "cpu_usage_nice",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 0.0,
            "1685548820000": 0.0,
            "1685548830000": 0.0,
            "1685548840000": 1.509054,
            "1685548850000": 4.885149,
            "1685548860000": 19.758805,
            "1685548870000": 27.269705,
            "1685548880000": 32.713946,
            "1685548890000": 37.621445,
            "1685548900000": 26.837964
        }
    }
]

curl 'http://localhost:4242/api/query?start=2023/06/01-00:00:00&end=2023/06/01-01:00:00&m=none:cpu_usage_idle' | jq
[
    {
        "metric": "cpu_usage_idle",
        "tags": {
            "host": "myhost",
            "cpu": "cpu0"
        },
        "aggregateTags": [],
        "dps": {
            "1685548810000": 26.837964,
            "1685548820000": 37.621445,
            "1685548830000": 32.713946,
            "1685548840000": 27.269705,
            "1685548850000": 1.509054,
            "1685548860000": 19.758805,
            "1685548870000": 4.885149,
            "1685548880000": 0.0,
            "1685548890000": 0.0,
            "1685548900000": 0.0
        }
    }
]

Writer Plugin CnosDBWriter Configuration

Suppose we have a running CnosDB with the following parameters:

  • API Address: http://127.0.0.1:8902/api/v1/write
  • Tenant: cnosdb
  • Database: public
  • User: root
  • Password: root

When using OpenTSDBReader with CnosDBWriter, you need to set the CnosDBWriter format configuration to opentsdb so that CnosDBWriter writes the data to CnosDB correctly.

Then the corresponding CnosDBWriter configuration is as follows:

{
    "name": "cnosdbwriter",
    "parameter": {
        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
        "tenant": "cnosdb",
        "database": "public",
        "username": "root",
        "password": "root",
        "format": "opentsdb",
        "fieldsExtra": {
            "cpu_usage_nice": {
                "table": "cpu",
                "field": "usage_nice"
            },
            "cpu_usage_idle": {
                "table": "cpu",
                "field": "usage_idle"
            }
        }
    }
}

Start the import task

  1. Create a DataX configuration file and populate the reader and writer entries with the OpenTSDBReader and CnosDBWriter configurations above. Save it as {YOUR_DATAX_HOME}/bin/opentsdb_to_cnosdb.json:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "opentsdbreader",
                    "parameter": {
                        "endpoint": "http://localhost:4242",
                        "column": [
                            "cpu_usage_nice",
                            "cpu_usage_idle"
                        ],
                        "beginDateTime": "2023-06-01 00:00:00",
                        "endDateTime": "2023-06-02 00:00:00"
                    }
                },
                "writer": {
                    "name": "cnosdbwriter",
                    "parameter": {
                        "cnosdbWriteAPI": "http://127.0.0.1:8902/api/v1/write",
                        "tenant": "cnosdb",
                        "database": "public",
                        "username": "root",
                        "password": "root",
                        "format": "opentsdb",
                        "fieldsExtra": {
                            "cpu_usage_nice": {
                                "table": "cpu",
                                "field": "usage_nice"
                            },
                            "cpu_usage_idle": {
                                "table": "cpu",
                                "field": "usage_idle"
                            }
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
  2. Run datax.py to start the import task:
cd {YOUR_DATAX_HOME}/bin
python ./datax.py ./opentsdb_to_cnosdb.json

The output is as follows:

...
2023-07-01 12:34:56.789 [job-0] INFO JobContainer -
Task start time           : 2023-07-01 12:34:55
Task end time             : 2023-07-01 12:34:56
Total task elapsed time   : 1s
Average task traffic      : 508B/s
Record write speed        : 20rec/s
Total records read        : 20
Total read/write failures : 0

The data in CnosDB is as follows:

SELECT * FROM cpu ORDER BY time ASC;
+---------------------+--------+------+------------+------------+
| time                | host   | cpu  | usage_nice | usage_idle |
+---------------------+--------+------+------------+------------+
| 2023-06-01T00:00:10 | myhost | cpu0 | 0.0        | 26.837964  |
| 2023-06-01T00:00:20 | myhost | cpu0 | 0.0        | 37.621445  |
| 2023-06-01T00:00:30 | myhost | cpu0 | 0.0        | 32.713946  |
| 2023-06-01T00:00:40 | myhost | cpu0 | 1.509054   | 27.269705  |
| 2023-06-01T00:00:50 | myhost | cpu0 | 4.885149   | 1.509054   |
| 2023-06-01T00:01:00 | myhost | cpu0 | 19.758805  | 19.758805  |
| 2023-06-01T00:01:10 | myhost | cpu0 | 27.269705  | 4.885149   |
| 2023-06-01T00:01:20 | myhost | cpu0 | 32.713946  | 0.0        |
| 2023-06-01T00:01:30 | myhost | cpu0 | 37.621445  | 0.0        |
| 2023-06-01T00:01:40 | myhost | cpu0 | 26.837964  | 0.0        |
+---------------------+--------+------+------------+------------+

Check the status of the import task:

The log files for DataX jobs are located in the {YOUR_DATAX_HOME}/log directory by default. In these log files we can view the start time, end time, and status of the task, together with any output and error messages. In addition, the import progress can be checked by querying the target table in CnosDB:

SELECT COUNT(usage_idle) as c FROM "cpu";
+----+
| c  |
+----+
| 10 |
+----+

Cancel or stop the import task:

You can shut down the import task by terminating the DataX process:

pkill datax