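Installing the analysis-ik Chinese word-segmentation plugin for Elasticsearch
Environment: Elasticsearch 2.3.2 and analysis-ik 1.9.3 are used as the example.
At first I downloaded the latest ik release, but after installing it Elasticsearch refused to start and reported a version incompatibility.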
/etc/init.d/elasticsearch start
Starting elasticsearch: Exception in thread "main" java.lang.IllegalArgumentException: Plugin [analysis-ik] is incompatible with Elasticsearch [2.3.2]. Was designed for version [5.0.0]
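After looking again, the fix turned out to be simple, and there is no need to rebuild and repackage with mvn.
Go to https://github.com/medcl/elasticsearch-analysis-ik/releases, download the zip that matches your Elasticsearch version, and unzip it into /usr/share/elasticsearch/plugins/ik. That is all.
Configure the dictionaries (ik ships with the Sogou dictionary)
Config file: /usr/share/elasticsearch/plugins/ik/config/ik/IKAnalyzer.cfg.xml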
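To avoid hitting the incompatibility error only at startup, the version check can be done before unpacking. The following is a minimal sketch, not the plugin's own installer: it assumes the downloaded zip carries a plugin-descriptor.properties file at its root with an elasticsearch.version key (as Elasticsearch 2.x plugins generally do), and the function names read_descriptor and install_plugin are made up for illustration.

```python
import zipfile
from pathlib import Path


def read_descriptor(zip_path):
    """Return the key=value pairs from plugin-descriptor.properties in the zip.

    Assumption: the descriptor sits at the root of the archive.
    """
    with zipfile.ZipFile(zip_path) as zf:
        text = zf.read("plugin-descriptor.properties").decode("utf-8")
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def install_plugin(zip_path, es_version, plugin_dir):
    """Unzip the plugin into plugin_dir, but only if it targets es_version."""
    built_for = read_descriptor(zip_path).get("elasticsearch.version")
    if built_for != es_version:
        raise ValueError(
            f"plugin targets Elasticsearch {built_for}, not {es_version}"
        )
    target = Path(plugin_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target
```

With the setup in this post, the call would look something like install_plugin("elasticsearch-analysis-ik-1.9.3.zip", "2.3.2", "/usr/share/elasticsearch/plugins/ik"), where the zip file name is whatever the releases page gave you.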
<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic;custom/sougou.dic</entry>
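A typo in the ext_dict entry is easy to miss, so it can help to check that every listed dictionary file really exists. This is a rough sketch under one assumption: the paths in ext_dict are resolved relative to the directory that contains IKAnalyzer.cfg.xml. The helper names ext_dict_paths and missing_dicts are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ext_dict_paths(cfg_path):
    """Return the dictionary paths listed in the ext_dict entry.

    Assumption: entries are separated by ';' and are relative to the
    directory holding IKAnalyzer.cfg.xml.
    """
    cfg = Path(cfg_path)
    root = ET.parse(str(cfg)).getroot()
    for entry in root.iter("entry"):
        if entry.get("key") == "ext_dict":
            names = [n.strip() for n in (entry.text or "").split(";")]
            return [cfg.parent / n for n in names if n]
    return []


def missing_dicts(cfg_path):
    """Return the listed dictionary paths that do not exist on disk."""
    return [p for p in ext_dict_paths(cfg_path) if not p.is_file()]
```

Running missing_dicts("/usr/share/elasticsearch/plugins/ik/config/ik/IKAnalyzer.cfg.xml") should return an empty list if all three dictionaries are in place.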
Open the file ES_HOME/config/elasticsearch.yml
and append the following to the end of it:
index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true
index.analysis.analyzer.default.type: ik
Restart elasticsearch
service elasticsearch restart
Test (replace <any-index-name> with the name of any existing index)
http://localhost:9200/<any-index-name>/_analyze?analyzer=ik&pretty=true&text=深圳热销限时促销优惠600元
{
  "tokens" : [
    { "token" : "深圳", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "圳", "start_offset" : 1, "end_offset" : 2, "type" : "CN_WORD", "position" : 1 },
    { "token" : "热销", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
    { "token" : "热", "start_offset" : 2, "end_offset" : 3, "type" : "CN_WORD", "position" : 3 },
    { "token" : "销", "start_offset" : 3, "end_offset" : 4, "type" : "CN_WORD", "position" : 4 },
    { "token" : "限时", "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 5 },
    { "token" : "促销", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 6 },
    { "token" : "促", "start_offset" : 6, "end_offset" : 7, "type" : "CN_WORD", "position" : 7 },
    { "token" : "销", "start_offset" : 7, "end_offset" : 8, "type" : "CN_WORD", "position" : 8 },
    { "token" : "优惠", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 9 },
    { "token" : "惠", "start_offset" : 9, "end_offset" : 10, "type" : "CN_WORD", "position" : 10 },
    { "token" : "600", "start_offset" : 10, "end_offset" : 13, "type" : "ARABIC", "position" : 11 },
    { "token" : "元", "start_offset" : 13, "end_offset" : 14, "type" : "COUNT", "position" : 12 }
  ]
}
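Pasting Chinese text straight into a URL works in a browser, but from a script the text parameter should be percent-encoded. Here is a small sketch of building the same _analyze request and pulling just the token strings out of a response like the one above. The function names analyze_url and token_strings, and the default host, are assumptions for illustration only.

```python
import json
from urllib.parse import quote, urlencode


def analyze_url(index, text, analyzer="ik", host="http://localhost:9200"):
    """Build the _analyze URL with the query string percent-encoded as UTF-8."""
    query = urlencode({"analyzer": analyzer, "pretty": "true", "text": text})
    return f"{host}/{quote(index)}/_analyze?{query}"


def token_strings(response_body):
    """Return only the token texts from an _analyze JSON response."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]
```

The URL can then be fetched with urllib.request.urlopen (or curl) against a running node, and the response body passed to token_strings to see the segmentation at a glance.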