A simple Kafka + ZooKeeper setup, with simple Java producer and consumer clients

https://blog.csdn.net/adam_ling/article/details/56284448

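February 21, 2017 15:28:12, TraitorousAdam
First, a note: this is a hands-on introductory article. I have only just started working with Kafka myself. After reading a great many articles online, I felt I should leave some notes of my own from building the platform, summarizing the problems I ran into and how I solved them.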

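This article covers configuring the server environment from scratch and then running the whole pipeline end to end with a simple Java producer and consumer client.

Why Kafka: our lab recently started a new research direction that mines user behavior, recognizes user state with machine learning models (SVM, LR, RF and the like), and gives users real-time feedback. The clients generate large volumes of user behavior data and image feature data. The server has to compute on this data in real time and send feedback to the client, and the data also has to be persisted for model training. Here Kafka mainly serves as the source of user behavior data. Later we may use Spark Streaming for online learning, or first store the data in a NoSQL database such as HBase or MongoDB and then use Spark for offline learning, but that is a topic for later.

For background on Kafka, I suggest reading the official site; there are also piles of blog posts by others. Here I mainly cover how to set up a small, simple cluster.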

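This setup uses two servers. (Because of its election mechanism, ZooKeeper recommends 2*n+1 servers: the ensemble only works while a majority of its servers are running, so two servers tolerate no failure at all. This is only an experiment, so I start with two and will add a third when a spare server becomes available.)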
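To make the 2*n+1 recommendation concrete, here is a minimal sketch of the majority arithmetic. The class QuorumMath and its method names are hypothetical and only illustrate the counting; they are not part of ZooKeeper.

```java
// Sketch of majority-quorum arithmetic for a ZooKeeper ensemble.
// QuorumMath and its methods are hypothetical names used for illustration.
final class QuorumMath {

    // Smallest number of servers that forms a strict majority.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that can fail while a majority is still running.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        // Print the figures for ensembles of one to five servers.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("servers=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

Two servers need both members up, while three servers survive one failure. A fourth server adds no extra tolerance over three, which is why odd sizes are recommended.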

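Both servers run Ubuntu 16.04 LTS desktop with OpenJDK 1.8.0_121. Their IPs are 192.168.1.37 and 192.168.1.199 (on the same LAN).

On to the main topic. First download the ZooKeeper and Kafka packages from their official sites and extract them. Here I use the latest versions at the time of writing, zookeeper-3.4.9 and kafka_2.11-0.10.1.1.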

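Configuring and starting ZooKeeper
1.1 Configuration file
First copy the sample configuration, then modify its parameters. Here I only add the server list and leave the other parameters at their defaults (keep it as simple as possible first; the goal is just to get it running).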

cp /zookeeper-3.4.9/conf/zoo_sample.cfg /zookeeper-3.4.9/conf/zoo.cfg
vim /zookeeper-3.4.9/conf/zoo.cfg
#The number of milliseconds of each tick
tickTime=2000
#The number of ticks that the initial 
#synchronization phase can take
initLimit=10
#The number of ticks that can pass between 
#sending a request and getting an acknowledgement
syncLimit=5
#the directory where the snapshot is stored.
#do not use /tmp for storage, /tmp here is just 
#example sakes.
dataDir=/tmp/zookeeper
#the port at which the clients will connect
clientPort=2181
#the maximum number of client connections.
#increase this if you need to handle more clients
#maxClientCnxns=60

#Be sure to read the maintenance section of the 
#administrator guide before turning on autopurge.

#http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
#Purge task interval in hours
#Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
#server list
server.1=192.168.1.37:20881:30881
server.2=192.168.1.199:20881:30881

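A few important parameters: clientPort is the port exposed to external systems (such as Kafka).

Each entry in the server list has the format server.X=ip:portA:portB

X is the identifier of each server, portA is the port the server uses to exchange information with the leader of the ensemble, and portB is the port used for leader election.

In my setup, 192.168.1.37 is server 1 and 192.168.1.199 is server 2.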
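The server.X=ip:portA:portB format can be illustrated with a small parser. This is only a sketch using java.util.Properties; the class ZooServerList, the nested Server type and the field names quorumPort (portA) and electionPort (portB) are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Sketch: read "server.X=ip:portA:portB" entries out of zoo.cfg-style text.
// All names in this class are hypothetical and used for illustration only.
final class ZooServerList {

    // One parsed server entry.
    static final class Server {
        final String host;
        final int quorumPort;   // portA: communication with the leader
        final int electionPort; // portB: leader election

        Server(String host, int quorumPort, int electionPort) {
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    // Returns the entries keyed by server id X, sorted by id.
    static Map<Integer, Server> parse(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText)); // lines starting with '#' are skipped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<Integer, Server> servers = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue; // tickTime, dataDir, clientPort and so on
            }
            int id = Integer.parseInt(key.substring("server.".length()));
            String[] parts = props.getProperty(key).trim().split(":");
            if (parts.length != 3) {
                throw new IllegalArgumentException("expected ip:portA:portB for " + key);
            }
            servers.put(id, new Server(parts[0],
                    Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return servers;
    }

    public static void main(String[] args) {
        String cfg = "clientPort=2181\n"
                + "server.1=192.168.1.37:20881:30881\n"
                + "server.2=192.168.1.199:20881:30881\n";
        for (Map.Entry<Integer, Server> e : parse(cfg).entrySet()) {
            Server s = e.getValue();
            System.out.printf("id=%d host=%s portA=%d portB=%d%n",
                    e.getKey(), s.host, s.quorumPort, s.electionPort);
        }
    }
}
```

The id X parsed here is the same number that each machine later writes into its myid file.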

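1.2 Configuring myid
Next, in the dataDir directory given in the .cfg file (where ZooKeeper stores its data), create a file named myid whose content is the server's id number (create the directory first if it does not exist yet).

For the machine 192.168.1.37, that is: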

echo 1 > /tmp/zookeeper/myid
1.3 Starting ZooKeeper
Now start ZooKeeper on 192.168.1.37 first. In the zookeeper-3.4.9 directory, run the zkServer.sh script; the startup information appears in the terminal.

sudo bin/zkServer.sh start conf/zoo.cfg

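Copy the ZooKeeper configuration file from 192.168.1.37 to 192.168.1.199 with scp, run echo 2 > /tmp/zookeeper/myid on that server, and then start ZooKeeper there with the same startup script.

At this point, a simple ZooKeeper ensemble is up and running.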

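Configuring and starting the Kafka broker
2.1 Configuring the Kafka broker (192.168.1.37)
The Kafka broker configuration file is /kafka_2.11-0.10.1.1/config/server.properties. I will not list it in full; here are some key configuration parameters: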

#...
broker.id=1
#...
############################# Socket Server Settings #############################


#The address the socket server listens on. It will get the value returned from 
#java.net.InetAddress.getCanonicalHostName() if not configured.
#FORMAT:
#listeners = security_protocol://host_name:port
#EXAMPLE:
#listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
#Hostname and port the broker will advertise to producers and consumers. If not set, 
#it uses the value for "listeners" if configured.  Otherwise, it will use the value
#returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092
advertised.listeners=PLAINTEXT://192.168.1.37:9092
############################# Zookeeper #############################


#Zookeeper connection string (see zookeeper docs for details).
#This is a comma separated host:port pairs, each corresponding to a zk
#server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
#You can also append an optional chroot string to the urls to specify the
#root directory for all kafka znodes.
zookeeper.connect=192.168.1.37:2181,192.168.1.199:2181


#Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000

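a) broker.id is this server's identifier within Kafka; it must be unique across the cluster.
b) advertised.listeners is the address published to producers and consumers; producer and consumer programs use it to reach Kafka.
c) zookeeper.connect = ip:port,... is the list of ZooKeeper servers, separated by commas.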
2.2 Starting Kafka
In the kafka_2.11-0.10.1.1 directory, run the following script:
bin/kafka-server-start.sh config/server.properties

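The server 192.168.1.199 can use the same configuration as in 2.1, except that broker.id in server.properties must be changed to 2 and advertised.listeners must point to 192.168.1.199. Then run the startup script on 192.168.1.199.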

2.3 Creating a topic
The command to create a topic is:
bin/kafka-topics.sh --create --zookeeper 192.168.1.37:2181,192.168.1.199:2181 --replication-factor 2 --partitions 1 --topic test

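--zookeeper is the ZooKeeper server list, --replication-factor is the number of replicas, --partitions is the number of partitions of the topic, and --topic is the name of the topic to create.

Once the topic is created, other commands can show information about it; look them up on the official site.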

2.4 Java producer client (taken from the official site)
Add the Kafka client dependency in Maven:

        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>0.10.1.0</version>
        </dependency>

Here is the core code (in fact copied from the official site):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Main {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.1.199:9092,192.168.1.37:9092"); // broker list
        props.put("acks", "all");             // wait until all in-sync replicas acknowledge
        props.put("retries", 0);              // do not retry failed sends
        props.put("batch.size", 16384);       // batch size in bytes
        props.put("linger.ms", 1);            // wait up to 1 ms to fill a batch
        props.put("buffer.memory", 33554432); // total send buffer in bytes
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);

        for (int i = 0; i < 10; i++) {
            System.out.println(i);
            // send record to topic 'test'
            producer.send(new ProducerRecord<String, String>("test", "key" + i, "value" + i));
        }

        // close() waits for buffered records to be sent before shutting down
        producer.close();
    }
}

2.5 Java consumer client (taken from the official site)

import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class Main {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.1.199:9092,192.168.1.37:9092"); // broker list
        props.put("group.id", "testGroup");
        props.put("enable.auto.commit", "true");      // commit offsets automatically
        props.put("auto.commit.interval.ms", "1000"); // commit once per second
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Arrays.asList("test")); // subscribe "test" topic
        while (true) {
            // wait up to 100 ms for new records
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
            }
        }
    }
}

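Run the two Java programs, and the messages produced by the Producer program should appear in the Consumer console. Start the consumer first: a new consumer group begins at the latest offset by default, so it only sees messages sent after it has joined.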
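Every entry in bootstrap.servers is expected to have the form host:port, and a missing port is an easy mistake to make when typing the broker list by hand. As a rough sketch, the list can be checked before it is handed to the client. The class BrokerListCheck and its method are hypothetical helpers, not part of the Kafka client API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: find entries of a comma-separated broker list that are not "host:port".
// BrokerListCheck is a hypothetical helper, not part of kafka-clients.
final class BrokerListCheck {

    static List<String> malformedEntries(String brokerList) {
        List<String> bad = new ArrayList<>();
        for (String raw : brokerList.split(",")) {
            String entry = raw.trim();
            int colon = entry.lastIndexOf(':');
            // Need a non-empty host before the colon and something after it.
            boolean ok = colon > 0 && colon < entry.length() - 1;
            if (ok) {
                try {
                    int port = Integer.parseInt(entry.substring(colon + 1));
                    ok = port > 0 && port <= 65535;
                } catch (NumberFormatException e) {
                    ok = false; // the part after the colon is not a number
                }
            }
            if (!ok) {
                bad.add(entry);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // The second entry has no port and is reported as malformed.
        System.out.println(malformedEntries("192.168.1.199:9092,192.168.1.37"));
    }
}
```

An empty result means every entry at least looks like host:port; it says nothing about whether a broker is actually reachable at that address.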

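Closing remarks
Here I have only gotten Kafka running. I have not yet explored Kafka's overall architecture or how each configuration parameter affects the system as a whole; I will fill in that research over time.
If you stay at the level of merely knowing how to use a tool, you end up as nothing more than a porter of code, easily left behind by the times.
Only now do I appreciate how precious time is and how vast knowledge is. I should not let the years slip by, but keep pushing myself to work hard.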
