
Getting Started with Doris Broker Load


How broker load works

Broker load is an asynchronous import method. The data sources it supports depend on the data sources the broker process supports.

After the user submits a load job, the FE generates the corresponding plan and, based on the current number of BEs and the size of the files, distributes the plan to multiple BEs; each BE imports a portion of the data.

While executing, each BE pulls data from the broker, transforms it, and writes it into the system. Once all BEs have finished, the FE makes the final decision on whether the load succeeded.

                 +
                 | 1. user create broker load
                 v
            +----+----+
            |         |
            |   fe    |
            |         |
            +----+----+
                 |
                 | 2. be etl and load the data
    +--------------------------+
    |            |             |
+---v---+     +--v----+    +---v---+
|       |     |       |    |       |
|  be   |     |  be   |    |   be  |
|       |     |       |    |       |
+---+-^-+     +---+-^-+    +--+-^--+
    | |           | |         | |
    | |           | |         | | 3. pull data from broker
+---v-+-+     +---v-+-+    +--v-+--+
|       |     |       |    |       |
|broker |     |broker |    |broker |
|       |     |       |    |       |
+---+-^-+     +---+-^-+    +---+-^-+
    | |           | |          | |
+---v-+-----------v-+----------v-+-+
|       hdfs/bos/afs cluster       |
|                                  |
+----------------------------------+
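
Which sources are actually available depends on the broker processes registered with the cluster. If brokers are deployed, they can be listed up front; a quick check from the shell, assuming the default FE query port 9030:

# list the broker processes registered with the FE
mysql -h 127.0.0.1 -P 9030 -uroot -e 'show broker;'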

Importing HDFS data with broker load

1. Set up the HDFS environment
  1. Download Hadoop 3.3

    wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.3/hadoop-3.3.3.tar.gz
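
    After downloading, unpack the tarball. The target directory below is an assumption, chosen to match the HADOOP_HOME configured in step 10:

     # unpack Hadoop into the directory later used as HADOOP_HOME (assumed path)
     mkdir -p /opt/software/hadoop
     tar -xzf hadoop-3.3.3.tar.gz -C /opt/software/hadoop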
    
  2. Passwordless SSH login to localhost

      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      chmod 0600 ~/.ssh/authorized_keys
      ssh localhost
    

    Running ssh localhost triggers a security prompt on the first connection; answer yes.

    In a Docker environment, the sshd service has to be started manually:

     /usr/sbin/sshd
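
    If sshd refuses to start because host keys are missing (common in minimal containers), generate them first:

     # generate any missing host keys, then start sshd
     ssh-keygen -A
     /usr/sbin/sshd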
    
  3. core-site.xml configuration

    [root@17a5da45700b hadoop]# cat core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    
        <property>
            <name>hadoop.proxyuser.root.hosts</name>
            <value>*</value>
        </property>
    
        <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>*</value>
        </property>
    </configuration>
    
    
  4. hdfs-site.xml configuration

    [root@17a5da45700b hadoop]# cat  hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
    
  5. hadoop-env.sh configuration: add the following to hadoop-env.sh.

    export JAVA_HOME=/data1/jdk1.8.0_331
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    
  6. Format the HDFS filesystem

    bin/hdfs namenode -format
    
  7. Start HDFS

    sbin/start-dfs.sh
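
    If the daemons started correctly, jps should list a NameNode, a DataNode, and a SecondaryNameNode (the PIDs below are placeholders):

     jps
     # 1234 NameNode
     # 1356 DataNode
     # 1478 SecondaryNameNode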
    
  8. Create HDFS directories

     bin/hdfs dfs -mkdir /user
     bin/hdfs dfs -mkdir /user/root	
    
  9. List the HDFS directories to confirm the service is healthy.

    [root@17a5da45700b hadoop-3.3.3]#  bin/hdfs dfs -ls /user
    Found 2 items
    drwxr-xr-x   - root supergroup          0 2022-06-21 03:00 /user/hive
    drwxr-xr-x   - root supergroup          0 2022-06-15 09:38 /user/root
    
  10. Configure the Hadoop environment variables.

    export HADOOP_HOME=/opt/software/hadoop/hadoop-3.3.3
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
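
    After adding these exports to your shell profile, reload it and confirm the Hadoop binaries resolve (the profile path is an assumption; use wherever the exports were added):

     # reload the profile and check that hdfs is on PATH
     source ~/.bashrc
     hdfs version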
    
2. Doris environment setup
  1. Install Doris following the official docs: https://doris.apache.org/zh-cn/docs/get-starting/get-starting.html#%e5%8d%95%e6%9c%ba%e9%83%a8%e7%bd%b2

  2. Doris and Hadoop have a few port conflicts, so some default Doris ports need to be changed.

    vim be/conf/be.conf
    

    Change webserver_port = 8040 to webserver_port = 18040.

    vim fe/conf/fe.conf 
    

    Change http_port = 8030 to http_port = 18030.
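
    The new ports only take effect after a restart. A minimal sketch, assuming the standard layout with fe/ and be/ under the Doris root:

    # restart FE and BE so the new ports take effect
    fe/bin/stop_fe.sh && fe/bin/start_fe.sh --daemon
    be/bin/stop_be.sh && be/bin/start_be.sh --daemon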

Creating the Doris database and table

mysql -h 127.0.0.1 -P 9030 -uroot
create database test;
use test;
create table `test` (
  `id` varchar(32) null default "",
  `user_name` varchar(32) null default "",
  `member_list` decimal(10,3)
) engine=olap
duplicate key(`id`)
comment 'olap'
distributed by hash(`id`) buckets 10
properties (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "v2",
"disable_auto_compaction" = "false"
);
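
A quick check that the table exists with the intended schema, continuing in the same mysql session:

desc test;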


Preparing the HDFS data
  1. Create stream_load_data.csv and add the following rows:
5,sim5,1.500
6,sim6,1.006
7,sim7,1.070
  2. Import the CSV data into HDFS
# create the HDFS directory
bin/hdfs dfs -mkdir /user/root/doris_test
# upload the local stream_load_data.csv to the doris_test directory in HDFS
bin/hdfs dfs -put /data1/hadoop-3.3.0/stream_load_data.csv /user/root/doris_test
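
Before submitting the load job, confirm the file actually landed in HDFS:

# verify the upload
bin/hdfs dfs -ls /user/root/doris_test
bin/hdfs dfs -cat /user/root/doris_test/stream_load_data.csv
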
Running the broker load import

Submit the following statement to import the data via broker load:

use test;
   load label test.label_20220404
        (
            data infile("hdfs://127.0.0.1:9000/user/root/doris_test/stream_load_data.csv")
            into table `test`
            columns terminated by ","
            format as "csv"          
            (id,user_name,member_list)
        ) 
        with hdfs (
            "fs.defaultfs"="hdfs://127.0.0.1:9000",
            "hadoop.username"="root"
        )
        properties
        (
            "timeout"="1200",
            "max_filter_ratio"="0.1"
        );
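
The statement returns as soon as the job is created; the import itself runs in the background. Its progress can be tracked by label:

mysql> show load from test where label = "label_20220404"\G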

Note: once the job completes, show load reports State: FINISHED, with ETL and LOAD progress at 100%:

         JobId: 10041
         Label: label_20220404
         State: FINISHED
      Progress: ETL:100%; LOAD:100%
          Type: BROKER
       EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=3
      TaskInfo: cluster:N/A; timeout(s):1200; max_filter_ratio:0.1
      ErrorMsg: NULL
    CreateTime: 2022-10-21 00:33:34
  EtlStartTime: 2022-10-21 00:33:38
 EtlFinishTime: 2022-10-21 00:33:38
 LoadStartTime: 2022-10-21 00:33:38
LoadFinishTime: 2022-10-21 00:33:38
           URL: NULL
    JobDetails: {"Unfinished backends":{"a32767db5a4249e8-96d523ac04909465":[]},"ScannedRows":3,"TaskNumber":1,"LoadBytes":96,"All backends":{"a32767db5a4249e8-96d523ac04909465":[10003]},"FileNumber":1,"FileSize":39}
 TransactionId: 5
  ErrorTablets: {}
6 rows in set (0.01 sec)
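
A job that is still pending or running can be cancelled by its label, e.g. if it was submitted with the wrong parameters:

mysql> cancel load from test where label = "label_20220404";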

Query the table to check the import result:

mysql> select * from test;
+------+-----------+-------------+
| id   | user_name | member_list |
+------+-----------+-------------+
| 5    | sim5      |       1.500 |
| 6    | sim6      |       1.006 |
| 7    | sim7      |       1.070 |
+------+-----------+-------------+

For details, see: https://doris.apache.org/zh-cn/docs/dev/sql-manual/sql-reference/data-manipulation-statements/load/broker-load
