一:logstash概述:
简单来说logstash就是一根具备实时数据传输能力的管道,负责将数据信息从管道的输入端传输到管道的输出端;与此同时这根管道还可以让你根据自己的需求在中间加上滤网,Logstash提供里很多功能强大的滤网以满足你的各种应用场景。
logstash常用于日志系统中做日志采集设备,最常用于ELK中作为日志收集器使用
二:logstash作用:
集中、转换和存储你的数据,是一个开源的服务器端数据处理管道,可以同时从多个数据源获取数据,并对其进行转换,然后将其发送到你最喜欢的“存储
三:logstash的架构:
logstash的基本流程架构:input | filter | output 如需对数据进行额外处理,filter可省略。
3.1 Input(输入):采集各种样式,大小和相关来源数据,从各个服务器中收集数据。
数据往往以各种各样的形式,或分散或集中地存在于很多系统中。Logstash 支持各种输入选择 ,可以在同一时间从众多常用来源捕捉事件。能够以连续的流式传输方式,轻松地从您的日志、指标、Web 应用、数据存储以及各种 AWS 服务采集数据。
3.2 Filter(过滤器)
用于在将event通过output发出之前对其实现某些处理功能。grok。
grok:用于分析结构化文本数据。目前 是logstash中将非结构化数据日志数据转化为结构化的可查询数据的不二之选
[root@node1 ~]# rpm -ql logstash | grep "patterns$" grok定义模式结构化的位置。
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns/grok-patterns
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns/mcollective-patterns
[root@node1 ~]#
3.3 Output(输出):将我们过滤出的数据保存到那些数据库和相关存储中,。
3.4 总结:
inpust:必须,负责产生事件(Inputs generate events),常用:File、syslog、redis、beats(如:Filebeats)
filters:可选,负责数据处理与转换(filters modify them),常用:grok、mutate、drop、clone、geoip
outpus:必须,负责数据输出(outputs ship them elsewhere),常用:elasticsearch、file、graphite、statsd
四:安装logstash环境。依据rpm包进行下载。
准备了四个节点,实验备用;
节点 ip
192.168.126.128 node1
192.168.126.129 node2
192.168.126.130 node3
192.168.126.131 node4
[root@node1 ~]# rpm -ivh logstash-7.9.1.rpm
warning: logstash-7.9.1.rpm: Header V4 RSA/SHA512 Signature, key ID d88e42b4: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:logstash-1:7.9.1-1 ################################# [100%]
Using provided startup.options file: /etc/logstash/startup.options
/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/pleaserun-0.0.31/lib/pleaserun/platform/base.rb:112: warning: constant ::Fixnum is deprecated
Successfully created system startup script for Logstash
[root@node1 ~]# vim /etc/profile.d/logstash.sh 添加可执行文件路径
export PATH=$PATH:/usr/share/logstash/bin
[root@node1 ~]# source /etc/profile.d/logstash.sh
[root@node1 ~]# java -version 基于logstash是jruby语言编写,即需要java环境。
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
五:logstash的工作流程。
input { 从哪个地方读取,输入数据。
filter { 依据grok模式对数据进行分析结构化
output { 将分析好的数据输出存储到哪些地方
实例一:我们以标准输入,来输出数据。
[root@node1 ~]# cd /etc/logstash/conf.d/ 默认logstash的配制文件在这个目录下
[root@node1 conf.d]# ls
[root@node1 conf.d]# vim shil.conf
input {
stdin { 标准输入
output {
stdout { 标准输入
codec => rubydebug 编码格式ruby
[root@node1 conf.d]# logstash -f /etc/logstash/conf.d/shil.conf --config.debug 使用--config.debug进行验证配置是否有错误
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2020-10-13 13:11:00.251 [main] runner - --config.debug was specified, but log.level was not set to 'debug'! No config info will be logged.
[INFO ] 2020-10-13 13:11:00.261 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.262-b10 on 1.8.0_262-b10 +indy +jit [linux-x86_64]"}
[INFO ] 2020-10-13 13:11:00.319 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2020-10-13 13:11:00.340 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2020-10-13 13:11:00.803 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-10-13 13:11:00.845 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"593d27c7-7f01-4bbc-a68c-e60c555d2f73", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2020-10-13 13:11:02.670 [Converge PipelineAction::Create
] Reflections - Reflections took 44 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-10-13 13:11:03.627 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/shil.conf"], :thread=>"#"}
[INFO ] 2020-10-13 13:11:04.503 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.87}
[INFO ] 2020-10-13 13:11:04.567 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2020-10-13 13:11:04.682 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-10-13 13:11:04.935 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
[root@node1 conf.d]# logstash -f /etc/logstash/conf.d/shil.conf --config.debug
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[WARN ] 2020-10-13 13:11:00.251 [main] runner - --config.debug was specified, but log.level was not set to 'debug'! No config info will be logged.
[INFO ] 2020-10-13 13:11:00.261 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.262-b10 on 1.8.0_262-b10 +indy +jit [linux-x86_64]"}
[INFO ] 2020-10-13 13:11:00.319 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/usr/share/logstash/data/queue"}
[INFO ] 2020-10-13 13:11:00.340 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/usr/share/logstash/data/dead_letter_queue"}
[WARN ] 2020-10-13 13:11:00.803 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-10-13 13:11:00.845 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"593d27c7-7f01-4bbc-a68c-e60c555d2f73", :path=>"/usr/share/logstash/data/uuid"}
[INFO ] 2020-10-13 13:11:02.670 [Converge PipelineAction::Create
] Reflections - Reflections took 44 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-10-13 13:11:03.627 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/shil.conf"], :thread=>"#"}
[INFO ] 2020-10-13 13:11:04.503 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.87}
[INFO ] 2020-10-13 13:11:04.567 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2020-10-13 13:11:04.682 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-10-13 13:11:04.935 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
^C[WARN ] 2020-10-13 13:16:23.243 [SIGINT handler] runner - SIGINT received. Shutting down.
[INFO ] 2020-10-13 13:16:24.389 [Converge PipelineAction::Stop
] javapipeline - Pipeline terminated {"pipeline.id"=>"main"}
^C[FATAL] 2020-10-13 13:16:24.429 [SIGINT handler] runner - SIGINT received. Terminating immediately..
[ERROR] 2020-10-13 13:16:24.490 [LogStash::Runner] Logstash - org.jruby.exceptions.ThreadKill
[root@node1 conf.d]# logstash -f /etc/logstash/conf.d/shil.conf
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-10-13 13:16:44.536 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.262-b10 on 1.8.0_262-b10 +indy +jit [linux-x86_64]"}
[WARN ] 2020-10-13 13:16:44.920 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-10-13 13:16:46.360 [Converge PipelineAction::Create
] Reflections - Reflections took 37 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-10-13 13:16:47.108 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/shil.conf"], :thread=>"#"}
[INFO ] 2020-10-13 13:16:47.839 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.73}
[INFO ] 2020-10-13 13:16:47.900 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2020-10-13 13:16:48.010 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-10-13 13:16:48.227 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
hello world 我们输入一些字段。
{
"host" => "node1", 当前主机
"message" => "hello world", 发布的消息
"@version" => "1", 版本号
"@timestamp" => 2020-10-13T06:08:07.476Z
实例二:我们通过grok来对日志进行分析,读取,标准输出。
2.1我们自定义gork模式对日志进行过滤。
语法格式:
%{SYNTAX:SEMANTIC}
SYNTAX:预定义模式名称;
SEMANTIC:匹配到的文本的自定义标识符;
[root@node1 conf.d]# vim groksimple.conf
input {
stdin {}
filter {
grok {
match => { "message" => "%{IP:clientip} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
output {
stdout {
codec => rubydebug
[root@node1 conf.d]# logstash -f /etc/logstash/conf.d/groksimple.conf
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-10-13 14:29:41.936 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.262-b10 on 1.8.0_262-b10 +indy +jit [linux-x86_64]"}
[WARN ] 2020-10-13 14:29:42.412 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-10-13 14:29:44.025 [Converge PipelineAction::Create
] Reflections - Reflections took 42 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-10-13 14:29:44.995 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/groksimple.conf"], :thread=>"#"}
[INFO ] 2020-10-13 14:29:45.749 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.74}
[INFO ] 2020-10-13 14:29:45.820 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2020-10-13 14:29:45.902 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-10-13 14:29:46.098 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600}
1.1.1.1 get /index.html 30 0.23 我们标准输入一些日志信息。
{
"@timestamp" => 2020-10-13T06:30:11.973Z,
"host" => "node1",
"@version" => "1",
"request" => "/index.html",
"message" => "1.1.1.1 get /index.html 30 0.23",
"duration" => "0.23",
"clientip" => "1.1.1.1",
"method" => "get",
"bytes" => "30"
实例三:将一些webserver服务器产生的日志进行过滤标准输出
例如:apache产生的日志,在grok中有特定的过滤apache日志的结构。
[root@node1 conf.d]# vim httpdsimple.conf
input {
file { 从哪个文件中获取
path => ["/var/log/httpd/access_log"] 文件路径
type => "apachelog" 文件类型
start_position => "beginning" 从最开始取数据
filter {
grok { 过滤分析格式
match => {"message" => "%{COMBINEDAPACHELOG}"} 过滤httpd日志格式。
output {
stdout {
codec => rubydebug
[root@node4 conf.d]# logstash -f /etc/logstash/conf.d/httpd.conf --path.data=/tmp
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console
[INFO ] 2020-10-13 17:57:53.743 [main] runner - Starting Logstash {"logstash.version"=>"7.9.1", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 25.262-b10 on 1.8.0_262-b10 +indy +jit [linux-x86_64]"}
[INFO ] 2020-10-13 17:57:53.803 [main] writabledirectory - Creating directory {:setting=>"path.queue", :path=>"/tmp/queue"}
[INFO ] 2020-10-13 17:57:53.819 [main] writabledirectory - Creating directory {:setting=>"path.dead_letter_queue", :path=>"/tmp/dead_letter_queue"}
[WARN ] 2020-10-13 17:57:54.315 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified
[INFO ] 2020-10-13 17:57:54.352 [LogStash::Runner] agent - No persistent UUID file found. Generating new UUID {:uuid=>"99ab593e-1436-49c0-874a-e815644cb316", :path=>"/tmp/uuid"}
[INFO ] 2020-10-13 17:57:56.722 [Converge PipelineAction::Create
] Reflections - Reflections took 51 ms to scan 1 urls, producing 22 keys and 45 values
[INFO ] 2020-10-13 17:57:58.642 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>250, "pipeline.sources"=>["/etc/logstash/conf.d/httpd.conf"], :thread=>"#"}
[INFO ] 2020-10-13 17:57:59.663 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>1.01}
[INFO ] 2020-10-13 17:58:00.166 [[main]-pipeline-manager] file - No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/tmp/plugins/inputs/file/.sincedb_15940cad53dd1d99808eeaecd6f6ad3f", :path=>["/var/log/httpd/access_log"]}
[INFO ] 2020-10-13 17:58:00.198 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
[INFO ] 2020-10-13 17:58:00.316 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[INFO ] 2020-10-13 17:58:00.356 [[main]
[INFO ] 2020-10-13 17:58:00.852 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9603}
在浏览器直接访问10.5.100.183
{
"@timestamp" => 2020-10-13T10:01:02.347Z,
"message" => "- - - [13/Oct/2020:18:01:01 +0800] \"GET / HTTP/1.1\" 304 - \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36\"",
"@version" => "1",
"tags" => [[0] "_grokparsefailure"],
"path" => "/var/log/httpd/access_log",
"type" => "apachelog",
"host" => "node4"
"@timestamp" => 2020-10-13T10:01:02.407Z,
"message" => "- - - [13/Oct/2020:18:01:01 +0800] \"GET /favicon.ico HTTP/1.1\" 404 209 \"http://10.5.100.183/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36\"",
"@version" => "1",
"tags" => [[0] "_grokparsefailure"],
"path" => "/var/log/httpd/access_log",
"type" => "apachelog",
"host" => "node4"
logstash的工作流程我们完整的走了一遍,从input—>filter—>output依次进行。更多内容在下篇如何实现
读取httpd日志—>存储在redis—>读取redis数据—>存储在elasticsearch中。
关键词: logstash概述 日志采集设备 日志收集器 实时数据传输 logstash的工作流程 logstash作用 logstash的架构 Filter过滤器 logstash 安装环境