Skip to main content
Version: latest

Arrow Flight SQL

Arrow Flight SQL Introduction

Arrow Flight SQL is a protocol for interacting with SQL databases using the Arrow in-memory format and the Flight RPC framework.

Our current environments that support the Arrow Flight SQL client are

Benefits of Arrow Flight SQL

  1. Powerful functionality.Powerful functionality. Functionality similar to APIs such as JDBC and ODBC, including executing queries, creating prepared statements
  2. Safe.Flight, supporting features such as out-of-the-box encryption and authentication.
  3. Performance.Communicates with client-side servers that implement Arrow Flight without data transformation, while allowing further optimizations such as parallel data access

Although it can be used directly for database access, it cannot directly replace JDBC/ODBC. However, Flight SQL can be used as a specific wired protocol/driver implementation, supporting JDBC/ODBC drivers, and reducing the implementation burden on the database.

Flow of Arrow Flight SQL Queries

The client uses arrow flight sql client to connect to the database, query data, and execute SQL in the following flow.

  1. Create FlightSql client.
  2. Verify the user name and password.
  3. Execute the SQL and get the FlightInfo structure.
  4. Get the FlightData data stream through the FlightEndPoint in the FlightInfo structure.

FlightInfo contains detailed information about the location of the data, The client can get the data from the appropriate server. Server information is encoded as a prime Flash Endpoint message in FlowInfo. Each Endpoint represents a certain location that contains a subset of response data.

A FlightEndpoint contains a list of server addresses, a Ticket, and a binary Token used by the server to identify request data. FlightEndPoint has no defined order, if the dataset is sorted, Only data will be returned in one FlightEndPoint.

The flow chart is as follows:

流程图

Different Client Usage

tip

This section provides a separate description of how different clients are used.

  • Installing Apache Arrow

    You can find a detailed installation tutorial in the official documentation On Mac systems, it's easy to install with the brew command

    brew install apache-arrow
    brew install apache-arrow-glib
  • Configuring CMakeLists.txt

    cmake_minimum_required(VERSION 3.24)
    project(arrow_flight_cpp)

    set(CMAKE_CXX_STANDARD 20)

    find_package(Arrow REQUIRED)
    find_package(ArrowFlight REQUIRED)
    find_package(ArrowFlightSql REQUIRED)

    include_directories(${ARROW_INCLUDE_DIR})
    add_executable(arrow_flight_cpp main.cpp)
    target_link_libraries(arrow_flight_cpp PRIVATE Arrow::arrow_shared)
    target_link_libraries(arrow_flight_cpp PRIVATE ArrowFlight::arrow_flight_shared)
    target_link_libraries(arrow_flight_cpp PRIVATE ArrowFlightSql::arrow_flight_sql_shared)
  • Usage of C++ Arrow library

    Most of arrow's functions return the arrow::Result<T> type, so you need to write the code in a function that returns a value of the type arrow::Result<T>, as follows:

     arrow::Result <std::unique_ptr<FlightClient>> get_location() {
    ARROW_ASSIGN_OR_RAISE(auto location, Location::ForGrpcTcp("localhost", 8904));
    ARROW_ASSIGN_OR_RAISE(auto client, FlightClient::Connect(location))
    }

    ARROW_ASSIGN_OR_RAISEThe effect of the macro is to first evaluate the expression with a return value of type arrow::Result<T> on the right, and then return it early if an exception occurs, assigning the corresponding Status value.

    For convenience, the sample code is written in the lambda function.

    int main() {
    auto fun = []() {
    // code
    }
    fun();
    return 0;
    }
  • Verify identity to obtain a token and create a FlightSqlClient

    ARROW_ASSIGN_OR_RAISE(auto location, Location::ForGrpcTcp("localhost", 8904))
    ARROW_ASSIGN_OR_RAISE(auto client, FlightClient::Connect(location))
    auto user = "root";
    auto password = "";
    //Base64加密认证
    auto auth = client->AuthenticateBasicToken({}, user, password);
    ARROW_RETURN_NOT_OK(auth); // 如果result出现异常,直接return
    FlightCallOptions call_options;
    call_options.headers.push_back(auth.ValueOrDie()); //把认证放到调用选项中
    auto sql_client = std::make_unique<FlightSqlClient>(std::move(client));
  • Execute sql to get FlightInfo

    ARROW_ASSIGN_OR_RAISE(auto info, sql_client->Execute(call_options, "select now();"));
    const auto endpoints = info->endpoints();
  • Retrieve data via FlightEndPoint

    for (auto i = 0; i < endpoints.size(); i++) {
    auto &ticket = endpoints[i].ticket;
    // stream中包含数据
    ARROW_ASSIGN_OR_RAISE(auto stream, sql_client->DoGet(call_options, ticket));
    // 获取数据的Schema
    auto schema = stream->GetSchema();
    ARROW_RETURN_NOT_OK(schema);
    std::cout << "Schema:" << schema->get()->ToString() << std::endl;
    // 取得并打印数据
    while(true) {
    ARROW_ASSIGN_OR_RAISE(FlightStreamChunk chunk, stream->Next());
    if (chunk.data == nullptr) {
    break;
    }
    std::cout << chunk.data->ToString();
    }
    }

Overall code

#include <iostream>
#include <arrow/flight/api.h>
#include <arrow/flight/sql/api.h>
using namespace arrow::flight;
using namespace arrow::flight::sql;
using namespace arrow;

int main() {

auto fun = []() {
ARROW_ASSIGN_OR_RAISE(auto location, Location::ForGrpcTcp("localhost", 8904))
ARROW_ASSIGN_OR_RAISE(auto client, FlightClient::Connect(location))

auto user = "root";
auto password = "";
auto auth = client->AuthenticateBasicToken({}, user, password);
auto sql_client = std::make_unique<FlightSqlClient>(std::move(client));
ARROW_RETURN_NOT_OK(auth);
FlightCallOptions call_options;
call_options.headers.push_back(auth.ValueOrDie());

ARROW_ASSIGN_OR_RAISE(auto info, sql_client->Execute(call_options, "select now();"));
const auto endpoints = info->endpoints();
for (auto i = 0; i < endpoints.size(); i++) {
auto &ticket = endpoints[i].ticket;

ARROW_ASSIGN_OR_RAISE(auto stream, sql_client->DoGet(call_options, ticket));

auto schema = stream->GetSchema();
ARROW_RETURN_NOT_OK(schema);

std::cout << "Schema:" << schema->get()->ToString() << std::endl;
while(true) {
ARROW_ASSIGN_OR_RAISE(FlightStreamChunk chunk, stream->Next());
if (chunk.data == nullptr) {
break;
}
std::cout << chunk.data->ToString();
}
}
return Status::OK();
};

auto status = fun();

return 0;
}