2024-08-04
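Broker Load is an asynchronous load method. The data sources it supports depend on the data sources supported by the Broker process.
After a user submits a load job, the FE generates the corresponding plan and, based on the current number of BEs and the size of the files, distributes the plan across multiple BEs. Each BE loads part of the data.
During execution, each BE pulls data from the Broker, transforms it, and loads it into the system. Once all BEs have finished, the FE makes the final decision on whether the load succeeded.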
                 +
                 | 1. user create broker load
                 v
            +----+----+
            |         |
            |   FE    |
            |         |
            +----+----+
                 |
                 | 2. BE etl and load the data
    +--------------------------+
    |            |             |
+---v---+     +--v----+    +---v---+
|       |     |       |    |       |
|  BE   |     |  BE   |    |   BE  |
|       |     |       |    |       |
+---+-^-+     +---+-^-+    +--+-^--+
    | |           | |         | |
    | |           | |         | | 3. pull data from broker
+---v-+-+     +---v-+-+    +--v-+--+
|       |     |       |    |       |
|Broker |     |Broker |    |Broker |
|       |     |       |    |       |
+---+-^-+     +---+-^-+    +---+-^-+
    | |           | |          | |
+---v-+-----------v-+----------v-+-+
|       HDFS/BOS/AFS cluster       |
|                                  |
+----------------------------------+
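How the FE actually splits a job across BEs is internal to Doris. As a rough, hypothetical illustration of step 2 only (not Doris's real scheduler), the shell sketch below balances files across BEs by size: files are sorted largest first, and each one goes to the BE with the fewest bytes assigned so far. The function name `assign_files` and its input format (one `file_name size_in_bytes` pair per line) are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical illustration only: this is NOT Doris's real scheduling logic.
# Reads "file_name size_in_bytes" lines on stdin and prints "file_name be_index",
# giving each file (largest first) to the BE with the fewest bytes assigned so far.
assign_files() {
  local num_bes=$1
  sort -k2,2nr | awk -v n="$num_bes" '
    BEGIN { for (i = 0; i < n; i++) load[i] = 0 }
    {
      # Pick the least-loaded BE (lowest index wins ties).
      best = 0
      for (i = 1; i < n; i++) if (load[i] < load[best]) best = i
      load[best] += $2
      print $1, best
    }'
}
```

For example, `printf 'a.csv 100\nb.csv 60\nc.csv 50\n' | assign_files 2` sends a.csv to BE 0 and both smaller files to BE 1, so each BE ends up with a similar number of bytes.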
Download Hadoop 3.3:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
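The downloaded tarball then needs to be unpacked. A minimal sketch, assuming it is unpacked under /opt/software/hadoop (the HADOOP_HOME set later in this article is /opt/software/hadoop/hadoop-3.3.3); `install_hadoop` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a Hadoop release tarball into DEST and print the
# resulting HADOOP_HOME (the release tarball has a top-level hadoop-X.Y.Z/ directory).
install_hadoop() {
  local tarball=$1 dest=$2
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  printf '%s/%s\n' "$dest" "$(basename "$tarball" .tar.gz)"
}
```

Usage: `install_hadoop hadoop-3.3.3.tar.gz /opt/software/hadoop` prints /opt/software/hadoop/hadoop-3.3.3.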
Passwordless SSH login to localhost:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
The first time you run ssh localhost, a host-key confirmation prompt appears; answer yes.
In a Docker container, the sshd service has to be started manually:
/usr/sbin/sshd
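To confirm the key setup, a sketch like the following checks that the public key is listed in authorized_keys and that the file has mode 600. `check_authorized_keys` is a hypothetical helper, and `stat -c` assumes GNU coreutils (Linux).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the PUBKEY line appears verbatim in AUTH_KEYS
# and AUTH_KEYS has mode 600 (stat -c is GNU coreutils syntax).
check_authorized_keys() {
  local pubkey=$1 auth=$2
  [ -f "$pubkey" ] && [ -f "$auth" ] || return 1
  # -x: match the whole line, -F: treat the key as a fixed string
  grep -qxF -- "$(cat "$pubkey")" "$auth" || return 1
  [ "$(stat -c '%a' "$auth")" = "600" ]
}
```

Usage: `check_authorized_keys ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`. After the host key has been accepted once, `ssh -o BatchMode=yes localhost true` should also succeed; BatchMode makes ssh fail instead of prompting for a password.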
core-site.xml configuration:
[root@17a5da45700b hadoop]# cat core-site.xml
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
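The same core-site.xml can also be written non-interactively. A minimal sketch; `write_core_site` is a hypothetical helper and the license comment is omitted. Note the camel case in `fs.defaultFS`: Hadoop property names are case-sensitive, so a misspelled key is simply ignored and the default file system stays `file:///`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal core-site.xml into CONF_DIR
# (for example $HADOOP_HOME/etc/hadoop). Property names are case-sensitive.
write_core_site() {
  local conf_dir=$1
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
EOF
}
```

Usage: `write_core_site "$HADOOP_HOME/etc/hadoop"` (overwrites any existing core-site.xml there).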
hdfs-site.xml configuration:
[root@17a5da45700b hadoop]# cat hdfs-site.xml
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
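To read a value back from one of these files, a small awk-based sketch works under the simplifying assumption that `<name>` and `<value>` each sit on their own line, as in the files above. `get_prop` is a hypothetical helper; on a working installation, `hdfs getconf -confKey dfs.replication` reports the effective value instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of property NAME from a Hadoop *-site.xml.
# Assumes one <name> line and one <value> line per <property>, as in the files above.
get_prop() {
  local file=$1 name=$2
  awk -v want="$name" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
    }
    /<value>/ && n == want {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
    /<\/property>/ { n = "" }
  ' "$file"
}
```

Usage: `get_prop "$HADOOP_HOME/etc/hadoop/hdfs-site.xml" dfs.replication` prints the configured replication factor; an unknown property prints nothing.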
hadoop-env.sh configuration: add the following lines to etc/hadoop/hadoop-env.sh.
export JAVA_HOME=/data1/jdk1.8.0_331
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
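Appending these lines by hand is easy to repeat by mistake. A minimal idempotent sketch: `add_export` and `configure_hadoop_env` are hypothetical helpers that append an `export` line only if the variable is not already exported in the file. The JDK path is the one used above and will differ on other machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "export KEY=VALUE" to FILE unless KEY is already exported there.
# Commented-out template lines such as "# export JAVA_HOME=" do not count.
add_export() {
  local file=$1 key=$2 value=$3
  if ! grep -q "^export ${key}=" "$file" 2>/dev/null; then
    printf 'export %s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Apply the settings from this section to ENV_FILE (e.g. $HADOOP_HOME/etc/hadoop/hadoop-env.sh).
configure_hadoop_env() {
  local env_file=$1 var
  add_export "$env_file" JAVA_HOME /data1/jdk1.8.0_331
  for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
             YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
    add_export "$env_file" "$var" root
  done
}
```

Running `configure_hadoop_env "$HADOOP_HOME/etc/hadoop/hadoop-env.sh"` twice leaves exactly six export lines.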
Format the HDFS file system:
bin/hdfs namenode -format
Start HDFS:
sbin/start-dfs.sh
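To confirm that start-dfs.sh actually brought the daemons up, the JDK's `jps` tool lists running Java processes as `PID ClassName`. A minimal sketch, assuming the single-node layout used here (NameNode, DataNode and SecondaryNameNode on one host); `missing_hdfs_daemons` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output on stdin and print each expected
# HDFS daemon that is not running (one name per line).
missing_hdfs_daemons() {
  awk '
    { seen[$2] = 1 }
    END {
      n = split("NameNode DataNode SecondaryNameNode", want, " ")
      for (i = 1; i <= n; i++) if (!(want[i] in seen)) print want[i]
    }'
}
```

Usage: `jps | missing_hdfs_daemons` prints nothing when all three daemons are running; otherwise the logs under $HADOOP_HOME/logs are the first place to look.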
Create the HDFS directories:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root
List the HDFS directory to make sure the HDFS service is working:
[root@17a5da45700b hadoop-3.3.3]# bin/hdfs dfs -ls /user
Found 2 items
drwxr-xr-x - root supergroup 0 2022-06-21 03:00 /user/hive
drwxr-xr-x - root supergroup 0 2022-06-15 09:38 /user/root
Hadoop environment variables:
export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.3
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
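After adding these lines to a shell profile (for example ~/.bashrc) and reloading it, a sketch like the following can confirm the environment. `check_hadoop_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that HADOOP_HOME is set, that $HADOOP_HOME/bin is on PATH,
# and that the `hdfs` command resolves. Prints the first problem found to stderr.
check_hadoop_env() {
  [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
  case ":$PATH:" in
    *":$HADOOP_HOME/bin:"*) ;;
    *) echo "$HADOOP_HOME/bin is not on PATH" >&2; return 1 ;;
  esac
  command -v hdfs >/dev/null || { echo "hdfs command not found" >&2; return 1; }
}
```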
To install Doris, follow the official guide: https://doris.apache.org/zh-cn/docs/get-starting/get-starting.html#%e5%8d%95%e6%9c%ba%e9%83%a8%e7%bd%b2
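Some Doris default ports conflict with Hadoop: YARN uses 8030 (ResourceManager scheduler) and 8040 (NodeManager localizer), which are also the Doris FE HTTP port and BE web server port. Change the Doris defaults:
vim be/conf/be.conf
Change webserver_port = 8040 to webserver_port = 18040.
vim fe/conf/fe.conf
Change http_port = 8030 to http_port = 18030.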
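The same edits can be made without an editor. A minimal sketch using GNU `sed -i`; `set_conf` is a hypothetical helper, and `DORIS_HOME` in the usage lines is an assumed variable pointing at the Doris installation directory. FE and BE must be restarted for the new ports to take effect.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "KEY = VALUE" in a Doris .conf file, replacing an existing
# assignment of KEY or appending one if KEY is absent. Uses GNU sed -i.
set_conf() {
  local file=$1 key=$2 value=$3
  if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage: `set_conf "$DORIS_HOME/be/conf/be.conf" webserver_port 18040` and `set_conf "$DORIS_HOME/fe/conf/fe.conf" http_port 18030`.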
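Connect to the FE with the MySQL client (9030 is the FE query port) and create the database and table:
mysql -h 127.0.0.1 -P9030 -uroot
create database test;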
use test;
create table `test` (
`id` varchar(32) null default "",
`user_name` varchar(32) null default "",
`member_list` decimal(10,3)
) engine=olap
duplicate key(`id`)
comment 'olap'
distributed by hash(`id`) buckets 10
properties (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "v2",
"disable_auto_compaction" = "false"
);
Sample data in the local file stream_load_data.csv:
5,sim5,1.500
6,sim6,1.006
7,sim7,1.070
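The sample file can be created and sanity-checked from the shell. A sketch with hypothetical helper names: each row must have exactly three comma-separated fields to match the column list (id, user_name, member_list) in the LOAD statement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the three sample rows to FILE.
write_sample_csv() {
  printf '%s\n' '5,sim5,1.500' '6,sim6,1.006' '7,sim7,1.070' > "$1"
}

# Hypothetical helper: print how many rows of a CSV file do not have exactly 3 fields.
count_bad_rows() {
  awk -F',' 'NF != 3 { bad++ } END { print bad + 0 }' "$1"
}
```

The generated file is 39 bytes (three 13-byte lines), which matches the file size reported in the SHOW LOAD job details later in this article.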
# Create the HDFS directory
bin/hdfs dfs -mkdir /user/root/doris_test
# Upload the local file stream_load_data.csv to the doris_test directory on HDFS
bin/hdfs dfs -put /data1/hadoop-3.3.0/stream_load_data.csv /user/root/doris_test
Run the following statements to load the data with Broker Load:
use test;
load label test.label_20220404
(
data infile("hdfs://127.0.0.1:9000/user/root/doris_test/stream_load_data.csv")
into table `test`
columns terminated by ","
format as "csv"
(id,user_name,member_list)
)
with hdfs (
"fs.defaultFS"="hdfs://127.0.0.1:9000",
"hadoop.username"="root"
)
properties
(
"timeout"="1200",
"max_filter_ratio"="0.1"
);
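A label already used by a successful job in the same database cannot be reused, so re-running the statement above unchanged fails. A sketch of a hypothetical `make_load_sql` helper that prints the same statement with a timestamped label, ready to be piped into the mysql client.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Broker Load statement equivalent to the one above,
# with a timestamp-based label so that repeated runs use distinct labels.
# Args: DB TABLE HDFS_PATH NAMENODE_URI HADOOP_USER
make_load_sql() {
  local db=$1 table=$2 hdfs_path=$3 namenode=$4 user=$5
  local label="label_$(date +%Y%m%d%H%M%S)"
  cat <<EOF
LOAD LABEL ${db}.${label}
(
    DATA INFILE("${namenode}${hdfs_path}")
    INTO TABLE \`${table}\`
    COLUMNS TERMINATED BY ","
    FORMAT AS "csv"
    (id, user_name, member_list)
)
WITH HDFS (
    "fs.defaultFS" = "${namenode}",
    "hadoop.username" = "${user}"
)
PROPERTIES
(
    "timeout" = "1200",
    "max_filter_ratio" = "0.1"
);
EOF
}
```

Usage (requires a running FE): `make_load_sql test test /user/root/doris_test/stream_load_data.csv hdfs://127.0.0.1:9000 root | mysql -h 127.0.0.1 -P9030 -uroot test`.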
Check the status of the load job with SHOW LOAD:
show load\G
JobId: 10041
Label: label_20220404
State: FINISHED
Progress: ETL:100%; LOAD:100%
Type: BROKER
EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=3
TaskInfo: cluster:N/A; timeout(s):1200; max_filter_ratio:0.1
ErrorMsg: NULL
CreateTime: 2022-10-21 00:33:34
EtlStartTime: 2022-10-21 00:33:38
EtlFinishTime: 2022-10-21 00:33:38
LoadStartTime: 2022-10-21 00:33:38
LoadFinishTime: 2022-10-21 00:33:38
URL: NULL
JobDetails: {"Unfinished backends":{"a32767db5a4249e8-96d523ac04909465":[]},"ScannedRows":3,"TaskNumber":1,"LoadBytes":96,"All backends":{"a32767db5a4249e8-96d523ac04909465":[10003]},"FileNumber":1,"FileSize":39}
TransactionId: 5
ErrorTablets: {}
6 rows in set (0.01 sec)
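Instead of reading the full SHOW LOAD output by eye, the job can be polled by label. A sketch with hypothetical helpers: `load_state` extracts the State field from vertical (`\G`) output, and `wait_for_load` polls with `SHOW LOAD FROM <db> WHERE LABEL = ...` until the job is FINISHED or CANCELLED. It assumes the FE query port 9030 and the passwordless root user used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the State field from `SHOW LOAD ...\G` output on stdin.
load_state() {
  awk -F': ' 'tolower($1) ~ /^[ \t]*state$/ { print $2; exit }'
}

# Hypothetical helper: poll the job with label LABEL in database DB every 5 seconds
# until it reaches a final state; prints that state. Requires a running FE.
wait_for_load() {
  local db=$1 label=$2 state
  while :; do
    state="$(mysql -h 127.0.0.1 -P9030 -uroot \
      -e "SHOW LOAD FROM ${db} WHERE LABEL = '${label}'\G" | load_state)"
    case "$state" in
      FINISHED|CANCELLED) echo "$state"; return 0 ;;
      "") echo "no load job found for label ${label}" >&2; return 1 ;;
    esac
    sleep 5
  done
}
```

Usage: `wait_for_load test label_20220404` prints FINISHED for the job shown above.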
Query the table to check the load result:
mysql> select * from test;
+------+-----------+-------------+
| id | user_name | member_list |
+------+-----------+-------------+
| 5 | sim5 | 1.500 |
| 6 | sim6 | 1.006 |
| 7 | sim7 | 1.070 |
+------+-----------+-------------+
For details, see: https://doris.apache.org/zh-cn/docs/dev/sql-manual/sql-reference/data-manipulation-statements/load/broker-load