跳到主要内容
版本:latest

导入导出

CnosDB 可以使用 SQL COPY INTO 把数据导出到本地或对象存储上,也可以把数据从对象存储和本地文件系统导入。

支持的文件格式有PARQUETCSVJSON,目前支持的对象存储有AWS S3, Google Cloud Storage, Azure Blob Storage

导出数据

导出到本地文件

语法

COPY INTO <path>
FROM [<database>.]<table_name>
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)

示例

导出PARQUET文件到本地目录
COPY INTO 'file:///tmp/air/'
FROM "air"
FILE_FORMAT = (
TYPE = 'PARQUET'
);
导出CSV文件到本地目录,并且设置分隔符为,,且不设置表头
COPY INTO 'file:///tmp/air'
FROM "air"
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = false
);

导出到 AWS S3

语法

COPY INTO <s3://<bucket>[/<path>]>
FROM [database.]<table_name>
CONNECTION = (
region = '<string>'
, access_key_id = '<string>'
, secret_key = '<string>'
[, token = '<string>' ]
[, virtual_hosted_style = true | false ]
)
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
regionAWS 地区代码
access_key_id访问密钥 ID
secret_key密钥
token(可选)临时授权令牌
virtual_hosted_style是否启用 Virtual-Hosted Style 访问模式,默认为 true

示例

将数据导出成CSV存储到 AWS S3,指定分隔符为,,并设置表头
COPY INTO 's3://tmp/air'
FROM "air"
CONNECTION = (
region = 'us‑east‑1',
access_key_id = '****************',
secret_key = '****************'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);
将数据导出成PARQUET存储到 AWS S3
COPY INTO 's3://tmp/air'
FROM "air"
CONNECTION = (
region = 'us‑east‑1',
access_key_id = '****************',
secret_key = '****************'
)
FILE_FORMAT = (
TYPE = 'PARQUET'
);

导出到 Google Cloud Storage

语法

COPY INTO 'gcs://<bucket>[/<path>]'
FROM [<database>.]<table_name>
CONNECTION = (
gcs_base_url = '<string>'
[, disable_oauth = true | false]
[, client_email = '<string>']
[, private_key = '<string>']
)
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
gcs_base_urlGoogle Cloud Storage 的基础 URL
disable_oauth(可选)是否禁用 OAuth,默认为 false
client_email(可选)服务账号的电子邮件地址,仅在不禁用 OAuth 时需要
private_key(可选)服务账号的私钥,仅在不禁用 OAuth 时需要

示例

将数据导出成 JSON 存储到 Google Cloud Storage
COPY INTO 'gcs://tmp/air'
FROM "air"
CONNECTION = (
gcs_base_url = 'https://storage.googleapis.com',
disable_oauth = false,
client_email = 'service_account@example.com',
private_key = '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----'
)
FILE_FORMAT = (
TYPE = 'JSON'
);
将数据导出成 CSV 存储到 Google Cloud Storage
COPY INTO 'gcs://tmp/air'
FROM air
CONNECTION = (
gcs_base_url = 'https://storage.googleapis.com',
disable_oauth = false,
client_email = 'service_account@example.com',
private_key = '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);

导出到 Azure Blob Storage

语法

COPY INTO 'azblob://<container>[/<path>]'
FROM [<database>.]<table_name>
CONNECTION = (
account = '<string>'
[, access_key = '<string>']
[, bearer_token = '<string>']
[, use_emulator = true | false]
)
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)

参数说明

这些参数提供了与Azure存储账户的连接和身份验证所需的信息。根据具体情况,你可能会使用access_key进行身份验证,或者使用bearer_token来进行身份验证。

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
accountAzure存储账户的名称,指定要连接的存储账户。
access_key存储账户的访问密钥,用于进行身份验证和授权。
bearer_token身份验证所需的令牌,可以替代access_key进行身份验证。
use_emulator默认为false,当为 true 时,url使用环境变量 AZURITE_BLOB_STORAGE_URLhttp://127.0.0.1:10000

示例

将数据导出成 CSV 存储到 Azure Blob Storage,指定分隔符为,,并设置表头
COPY INTO 'azblob://tmp/air'
FROM "air"
CONNECTION = (
account = '***********',
access_key = '****************'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);
将数据导出成 PARQUET 存储到 Microsoft Azure Blob Storage
COPY INTO 'azblob://tmp/air'
FROM "air"
CONNECTION = (
account = '***********',
access_key = '****************'
)
FILE_FORMAT = (
TYPE = 'PARQUET'
);

导入数据

导入之前,请确定目标表已经存在,并且列名和列的类型对应。

从本地导入数据

语法

COPY INTO [database.]<table_name>[(<time_col>, <field_col> [,field_col] ...[,TAGS (<tag_col> [, tag_col] ...)])]
FROM '<path>'
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)
COPY_OPTIONS = (
auto_infer_schema = true | false
)

参数说明

名称说明
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
auto_infer_schema是否自动推断文件的 schema,如果为false则使用目标表的schema

示例

从本地目录导入PARQUET文件
COPY INTO "air"
FROM 'file:///tmp/air/'
FILE_FORMAT = (
TYPE = 'PARQUET'
);
从本地目录导入CSV文件

DELIMITER=',' 表示文件以,分隔。

WITH_HEADER = true 表示文件有表头。

auto_infer_schema = true 表示会自动推断表头

COPY INTO "air"
FROM "file:///tmp/air/"
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER=',',
WITH_HEADER = true
)
COPY_OPTIONS = (
auto_infer_schema = false
);

从 AWS S3 导入数据

语法

COPY INTO [database.]<table_name>[(<time_col>, <field_col> [,field_col] ...[,TAGS (<tag_col> [, tag_col] ...)])]
FROM 's3://<bucket>[/<path>]'
CONNECTION = (
region = '<string>'
, access_key_id = '<string>'
, secret_key = '<string>'
[, token = '<string>' ]
[, virtual_hosted_style = true | false ]
)
FILE_FORMAT (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)
COPY_OPTIONS = (
auto_infer_schema = true | false
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
auto_infer_schema是否自动推断文件的 schema,如果为false则使用目标表的schema
regionAWS 地区代码
access_key_id访问密钥 ID
secret_key密钥
token(可选)临时授权令牌
virtual_hosted_style是否启用 Virtual-Hosted Style 访问模式,默认为 true

示例

AWS S3 上的CSV数据导入

DELIMITER=',' 表示文件以,分隔。

WITH_HEADER = true 表示文件有表头。

auto_infer_schema = true 表示会自动推断表头

COPY INTO "air"
FROM 's3://temp/air'
CONNECTION = (
region = 'us‑east‑1',
access_key_id = '****************',
secret_key = '****************'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);
AWS S3 上的 PARQUET 数据导入
COPY INTO "air"
FROM 's3://tmp/air'
CONNECTION = (
region = 'us‑east‑1',
access_key_id = '****************',
secret_key = '****************'
)
FILE_FORMAT = (
TYPE = 'PARQUET'
);

从 Google Cloud Storage 导入数据

语法

COPY INTO [database.]<table_name>[(<time_col>, <field_col> [,field_col] ...[,TAGS (<tag_col> [, tag_col] ...)])]
FROM 'gcs://<bucket>[/<path>]'
CONNECTION = (
gcs_base_url = '<string>'
[, disable_oauth = true | false]
[, client_email = '<string>']
[, private_key = '<string>']
)
FILE_FORMAT (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)
COPY_OPTIONS = (
auto_infer_schema = true | false
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
auto_infer_schema是否自动推断文件的 schema,如果为false则使用目标表的schema
gcs_base_urlGoogle Cloud Storage 的基础 URL
disable_oauth(可选)是否禁用 OAuth,默认为 false
client_email(可选)服务账号的电子邮件地址,仅在不禁用 OAuth 时需要
private_key(可选)服务账号的私钥,仅在不禁用 OAuth 时需要

示例

Google Cloud Storage 上的 JSON 数据导入
COPY INTO 'gcs://tmp/air'
FROM air
CONNECTION = (
gcs_base_url = 'https://storage.googleapis.com',
disable_oauth = false,
client_email = 'service_account@example.com',
private_key = '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----'
)
FILE_FORMAT = (
TYPE = 'JSON'
);
Google Cloud Storage 上的 CSV 数据导入
COPY INTO 'gcs://tmp/air'
FROM air
CONNECTION = (
gcs_base_url = 'https://storage.googleapis.com',
disable_oauth = false,
client_email = 'service_account@example.com',
private_key = '-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);

从 Azure Blob Storage 导入数据

语法

COPY INTO [database.]<table_name>[(<time_col>, <field_col> [,field_col] ...[,TAGS (<tag_col> [, tag_col] ...)])]
FROM 'azblob://<container>[/<path>]'
CONNECTION = (
account = '<string>'
[, access_key = '<string>']
[, bearer_token = '<string>']
[, use_emulator = true | false]
)
FILE_FORMAT = (
TYPE = {'PARQUET' | 'CSV' | 'JSON'}
[, DELIMITER = '<character>']
[, WITH_HEADER = true | false]
)
COPY_OPTIONS = (
auto_infer_schema = true | false
)

参数说明

名称描述
TYPE设置文件格式,分别为:PARQUETCSVJSON,示例:FILE_FORMAT = (TYPE='CSV')
DELIMITER仅支持CSV,设置文件格式,示例:DELIMITER = ','
WITH_HEADER仅支持CSV,是否带有表头,默认为 true,示例:示例:FILE_FORMAT = (TYPE='CSV', WITH_HEADER=true)
accountAzure存储账户的名称,指定要连接的存储账户。
access_key存储账户的访问密钥,用于进行身份验证和授权。
bearer_token身份验证所需的令牌,可以替代access_key进行身份验证。
use_emulator默认为false,当为 true 时,url使用环境变量 AZURITE_BLOB_STORAGE_URLhttp://127.0.0.1:10000

示例

将 Azure Blob Storage 上的 CSV 数据导入
COPY INTO "air"
FROM 'azblob://tmp/air'
CONNECTION = (
account = '***********',
access_key = '****************'
)
FILE_FORMAT = (
TYPE = 'CSV',
DELIMITER = ',',
WITH_HEADER = true
);
将 Azure Blob Storage 上的 PARQUET 数据导入
COPY INTO "air"
FROM 'azblob://tmp/air'
CONNECTION = (
account = '***********',
access_key = '****************'
)
FILE_FORMAT = (
TYPE = 'PARQUET'
);