高性能分析数据库StarRocks的安装与使用详解_ar

引言

在大数据时代，选择一个高性能的分析数据库对业务的成功至关重要。starrocks作为一款次世代mpp（massively parallel processing）数据库，以其卓越的实时分析和多维分析能力而闻名。本篇文章将带您探讨starrocks的安装与使用，并探讨其作为快速向量数据库的潜力。

什么是starrocks

starrocks是一种高度并行的分析数据库管理系统，专为多维分析、实时分析和临时查询而设计。它凭借其向量化执行引擎，在clickbench基准测试中展现了卓越的性能，被广泛应用于各种分析场景。

特性与优势

子秒查询响应：利用向量化引擎，starrocks可以提供极快的查询响应时间。

多维度分析：支持高效处理多维度的数据分析任务。

实时分析：具备强大的实时数据分析能力，适合动态数据场景。

灵活的查询能力：支持复杂的ad-hoc查询，适合多种业务需求。

下载

文中使用版本为3.2.4，可通过官网自行下载

准备部署文件 | starrocks

文章中使用的是存算一体架构，starrocks也支持存算分离架构

安装与配置

要开始使用starrocks，我们首先需要设置必要的软件环境。以下是安装步骤：

# 安装python mysql客户端
pip install pymysql

测试语句

create database example_db;

use example_db;

-- 新建用户并授权

create user 'testuser'@'%' identified by '123456';

grant all on databasename.* to 'testuser'@'%';

-- 仅包含一个 be,所以需要加properties( "replication_num" = "1" )

create table user_access (

    uid int,

    name varchar(64),

    age int,

    phone varchar(16),

    last_access datetime,

    credits double

)

properties( "replication_num" = "1" );

create table orders1 (

    order_id bigint not null,

    dt date not null,

    user_id int not null,

    good_id int not null,

    cnt int not null,

    revenue int not null

)

primary key (order_id)

distributed by hash (order_id)

properties( "replication_num" = "1" )

;

create table orders2 (

    order_id bigint not null,

    dt date not null,

    merchant_id int not null,

    user_id int not null,

    good_id int not null,

    good_name string not null,

    price int not null,

    cnt int not null,

    revenue int not null,

    state tinyint not null

)

primary key (order_id,dt,merchant_id)

partition by date_trunc('day', dt)

distributed by hash (merchant_id)

order by (dt,merchant_id)

properties (

    "enable_persistent_index" = "true",

"replication_num" = "1"

);

create table detail (

    event_time datetime not null comment "datetime of event",

    event_type int not null comment "type of event",

    user_id int comment "id of user",

    device_code int comment "device code",

    channel int comment "")

order by (event_time, event_type)

properties( "replication_num" = "1" );

create table aggregate_tbl (

    site_id largeint not null comment "id of site",

    date date not null comment "time of event",

    city_code varchar(20) comment "city_code of user",

    pv bigint sum default "0" comment "total page views"

)

aggregate key(site_id, date, city_code)

distributed by hash(site_id)

properties( "replication_num" = "1" );

create table orders4 (

    create_time date not null comment "create time of an order",

    order_id bigint not null comment "id of an order",

    order_state int comment "state of an order",

    total_price bigint comment "price of an order"

)

unique key(create_time, order_id)

distributed by hash(order_id);

properties( "replication_num" = "1" );

describe user_access;

show create table user_access;

-- 从本地文件导入数据

create table `table1`

(

    `id` int(11) not null comment "用户 id",

    `name` varchar(65533) null comment "用户姓名",

    `score` int(11) not null comment "用户得分"

)

engine=olap

primary key(`id`)

distributed by hash(`id`)

properties( "replication_num" = "1" );

-- 查看 fe 节点的 ip 地址和 http 端口号。

show frontends；

-- 导入作业

curl --location-trusted -u root: -h "label:123" -h "expect:100-continue" -h "column_separator:," -h "columns: id, name, score" -t d:\\data\\test.csv -xput http://192.168.5.66:8030/api/example_db/table1/_stream_load

select * from table1;

接下来，我们将使用一个例子来说明如何在python中使用starrocks库。

代码示例

假设我们想要在starrocks中储存和查询向量数据。以下是一个简单的使用示例：

from langchain_community.vectorstores import starrocks

# 假设我们通过api代理服务连接到starrocks数据库
starrocks_client = starrocks(api_endpoint="{ai_url}")  # 使用api代理服务提高访问稳定性

# 插入样本向量数据
vector_data = [0.1, 0.2, 0.3, 0.4]
starrocks_client.insert_vector("your_vector_table", vector_data)

# 查询向量
query_result = starrocks_client.query_vector("your_vector_table", query_vector=[0.1, 0.2, 0.3])

print("query result:", query_result)

这段代码展示了如何连接到starrocks数据库并执行基本的向量插入和查询操作。