Deploy Fluentd on Kubernetes
Kubernetes Logging With Fluentd
1. Introduction
Fluentd solves a major problem in today's distributed and complex infrastructure: logging. This tutorial is a how-to on deploying logging in your Kubernetes infrastructure. System logs and application logs help you understand the activities inside your Kubernetes cluster. Once logs are collected, they can be used for:
- Security – logs may be needed for compliance
- Monitoring – application and system logs can help you understand what is happening inside your cluster and help detect potential problems, e.g. monitoring memory usage
- Troubleshooting and debugging – logs help you solve problems
Like most modern applications, Kubernetes supports logging to help with debugging and monitoring. Kubernetes usually reads logs from the underlying container engine, such as Docker. How many logs Kubernetes collects therefore depends on the logging level enabled at the underlying container engine.
There are different types of logging:
- Local logging: writing to the standard output and standard error streams inside the container itself. The problem with this method of logging is that when the container dies or is evicted, you may lose access to the logs.
- Node-level logging: the container engine redirects everything from the container's stdout and stderr to another location. For example, the Docker container engine redirects the two streams to a logging driver (see the sketch after this list). Log rotation is a good way to ensure that the logs don't clog the node. This method is better than local logging but still not a perfect solution because logs remain localized on every node. The ideal solution is to send all the logs to a central node for centralized management.
- Cluster-level logging: this requires a separate backend to store, analyze, and query logs. The backend can be either inside or outside the cluster. A node-level logging agent (e.g. Fluentd) runs on each node and sends log data to a central logging node. Typically, the logging agent is a container that has access to a directory with log files from all of the application containers on that node. Kubernetes does not provide a native backend to store and analyze logs, but many existing logging solutions exist that integrate well with a Kubernetes cluster, such as Elasticsearch and Stackdriver.
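As a concrete example of node-level logging, Docker's json-file logging driver has built-in log rotation. Below is a minimal sketch of /etc/docker/daemon.json, assuming Docker is the container engine; the size and file limits are illustrative, not values from this tutorial:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

With these options, Docker keeps at most three 10 MB log files per container, so logs cannot clog the node.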
2. Fluentd, ElasticSearch and Kibana
This tutorial discusses how to perform Kubernetes cluster-level logging using Fluentd, Elasticsearch and Kibana. Fluentd is the logging agent deployed on every node. Fluentd collects the standard output and standard error of each container and sends the logs to Elasticsearch for analysis. Visualization is done in Kibana. The diagram below (most diagrams are from the Fluentd website) gives a pictorial view of Fluentd, Elasticsearch and Kibana.
2.1 What is Elasticsearch?
Elasticsearch is a search engine that provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
2.2 What is Kibana?
Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of content indexed on an Elasticsearch cluster. Users can create bar, line and scatter plots, or pie charts and maps on top of large volumes of data.
2.3 What is Fluentd?
Fluentd is a free and open-source log collector that instantly enables you to have a 'Log Everything' architecture. It has three main attributes:
- Fluentd unifies all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations.
- Fluentd treats logs as JSON, a popular machine-readable format.
- Fluentd is extensible and currently has over 600 plugins.
Fluentd agents are deployed on every node to gather all of the logs that are stored on individual nodes in the Kubernetes cluster. The logs can usually be found under the /var/log/containers directory.
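To see what Fluentd will pick up, you can list that directory on any node. The file name below is the one that appears in the Elasticsearch output later in this tutorial, with the container ID shortened here for readability:

ls /var/log/containers/
kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acf...log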
Below is the simplified architecture of Fluentd. Fluentd is pluggable, extensible and reliable; it can do buffering, high availability (HA) and load balancing.
Input: tells Fluentd what to log.
Engine: the main engine containing the common concerns of logging, e.g. buffering, error handling and message routing.
Output: where to send the logs, in the correct format, e.g. MongoDB, PostgreSQL or Elasticsearch.
The input and output are pluggable and plugins can be classified into Read, Parse, Buffer, Write and Format plugins. Plugins are further discussed below.
As can be seen from the architecture, Fluentd collects logs from the different sources/applications to be logged; it can collect data from a virtually unlimited number of sources. Collected data is then output to the desired storage backend such as MySQL, MongoDB or PostgreSQL. This is illustrated in the diagram.
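To make the input-engine-output flow concrete, here is a minimal, self-contained configuration sketch. The path and tag are illustrative; this is not the configuration used later in this tutorial:

<source>
  # read new lines appended to a log file, like `tail -f`
  type tail
  path /var/log/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.logs
  format none
</source>

<match app.logs>
  # print matched events to Fluentd's own stdout (useful for testing)
  type stdout
</match>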
2.4 Understanding Fluentd Logging
Fluentd accomplishes its mission with plugins. Some plugins are built in, but custom plugins can also be developed since Fluentd is extensible. The different types of plugins are illustrated in the diagram below. A good reference for each of the plugins is the Fluentd website.
A brief description of each is presented here:
Interface | Description |
---|---|
Input | Entry point of data. This interface gathers or receives data from external sources, e.g. log file content, data over TCP, built-in metrics, etc. It can also periodically pull data from data sources. A Fluentd event consists of a tag, a time and a record. The input plugin is responsible for generating Fluentd events from the specified data sources. |
Parser | Enables users to create their own parser formats to read custom data formats, converting the unstructured data gathered from the Input interface into structured data. Parsers are optional and depend on Input plugins. |
Filter | Enables Fluentd to modify the event streams produced by the Input plugins, e.g. by filtering out events on field values, enriching events with new fields, or deleting/masking fields for privacy. A sketch follows this table. |
Buffer | By default, the data ingested by the Input plugins resides in memory until it is routed and delivered to an Output interface. |
Output | Defines a destination for the data. There are three types of output plugins: Non-Buffered, Buffered, and Time Sliced. |
Formatter | Lets the user extend and re-use custom output formats. |
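For instance, the built-in record_transformer filter plugin can enrich events with new fields. A minimal sketch, where the tag and the added field are illustrative:

<filter app.logs>
  type record_transformer
  <record>
    # add the node's hostname to every event
    hostname "#{Socket.gethostname}"
  </record>
</filter>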
2.5 Understanding how Fluentd Sends Kubernetes Logs to ElasticSearch
The installation instructions to deploy Fluentd on Kubernetes are below, but it's important to first understand how Fluentd is configured. Fluentd contacts Elasticsearch on a well-defined URL and port, configured inside the Fluentd container. Three plugins are used here: Input, Filter and Output. The diagram below depicts the configuration architecture, and the different plugins are explained. The configuration file is called td-agent.conf; the 'td' in td-agent.conf stands for Treasure Data, the company behind Fluentd.
The configuration file is located at /etc/td-agent/td-agent.conf
2.5.1 Input Plugin
Here is the configuration in td-agent.conf to collect logs from /var/log/containers:
<source>
  # tail container log files on the node, like `tail -f`
  type tail
  path /var/log/containers/*.log
  # remember the last read position across restarts
  pos_file fluentd-docker.pos
  time_format %Y-%m-%dT%H:%M:%S
  # tag events with the log file path, e.g. kubernetes.var.log.containers...
  tag kubernetes.*
  # Docker writes container log lines as JSON
  format json
  # read existing log content from the beginning, not just new lines
  read_from_head true
</source>
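Each line read from these files becomes a Fluentd event consisting of a tag (the * in kubernetes.* expands to the log file path), a time and a record. An illustrative event for the Kibana container log that appears in the Elasticsearch output later in this tutorial (container ID shortened):

tag:    kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acf...log
time:   2018-06-12T18:10:06+00:00
record: {"log":"WARNING: Tini has been relocated to /sbin/tini.\n","stream":"stderr"}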
2.5.2 Filter Plugin
To get more Kubernetes-specific information out of Docker container logs, a plugin called kubernetes_metadata is required.
To install the filter plugin:
gem install fluent-plugin-kubernetes_metadata_filter
Here is the configuration in td-agent.conf to scrape additional Kubernetes metadata:
<filter kubernetes.var.log.containers.**.log>
type kubernetes_metadata
</filter>
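After this filter runs, each event carries an additional kubernetes field (pod name, namespace, labels, host) and a docker field with the container ID, as can be seen in the Elasticsearch query results in Step 4 below. An illustrative enriched record (shortened):

{
  "log": "WARNING: Tini has been relocated to /sbin/tini.\n",
  "stream": "stderr",
  "docker": { "container_id": "ecb40acf..." },
  "kubernetes": {
    "container_name": "kibana-logging",
    "namespace_name": "kube-system",
    "pod_name": "kibana-logging-5874ff6996-5wqfg",
    "labels": { "k8s-app": "kibana-logging" },
    "host": "fluentdslave1"
  }
}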
2.5.3 Output Plugin
For the output, the Elasticsearch plugin will be installed. Full details of the Elasticsearch plugin can be found here.
Prepare the system for the Ruby gem to run, then install the plugin:

apt install ruby
sudo apt-get install make libcurl4-gnutls-dev
sudo apt-get install build-essential
sudo apt-get install ruby2.3-dev
gem install fluent-plugin-elasticsearch
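A quick sanity check that the gem installed correctly:

gem list fluent-plugin-elasticsearch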
The configuration in td-agent.conf to send log files to Elasticsearch is here:
<match **>
  type elasticsearch
  # Credentials are read from the container environment, if set.
  user "#{ENV['FLUENT_ELASTICSEARCH_USER']}"
  password "#{ENV['FLUENT_ELASTICSEARCH_PASSWORD']}"
  log_level info
  include_tag_key true
  # The elasticsearch-logging Kubernetes service and port.
  host elasticsearch-logging
  port 9200
  # Index events into daily logstash-YYYY.MM.DD indices.
  logstash_format true
  # Set the chunk limit the same as for fluentd-gcp.
  buffer_chunk_limit 2M
  # Cap buffer memory usage to 2MiB/chunk * 32 chunks = 64 MiB
  buffer_queue_limit 32
  flush_interval 5s
  # Never wait longer than 30 seconds between retries.
  max_retry_wait 30
  # Disable the limit on the number of retries (retry forever).
  disable_retry_limit
  # Use multiple threads for processing.
  num_threads 8
</match>
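Before relying on this configuration, you can confirm from inside the Fluentd container that the elasticsearch-logging service name resolves and the port is reachable; a healthy Elasticsearch answers with a small JSON document describing the cluster:

curl http://elasticsearch-logging:9200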
Note that Fluentd, Elasticsearch and Kibana will be deployed as different containers, so the Fluentd configuration above lives in the Fluentd container.
3. Installing Fluentd, Elasticsearch and Kibana
To deploy these services, let's use Kubernetes manifest files which are already publicly available. We need to create a deployment and a service for each application. You can find the manifest files cloned to this GitHub location. Only minor modifications were made to the YAML templates.
The Kubernetes installation was performed following Kubernetes with KOPS, one of the earlier blog tutorials. One master and one slave node were used, but you can use as many nodes as desired. Fluentd is deployed as a DaemonSet, so whenever an additional node is added, it will join the cluster and start sending logs to Elasticsearch on the master node.
Step 1: Clone the repository on your master Kubernetes node and then create the deployments and service objects:
kubectl create -f elastic-search-rc.yaml
kubectl create -f elasticsearch-svc.yaml
kubectl create -f kibana-rc.yaml
kubectl create -f kibana-svc.yaml
Step 2: Create the Fluentd DaemonSet:
kubectl create -f fluentd-daemonset.yaml
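For reference, a Fluentd DaemonSet manifest typically mounts the node's log directories into the container so the agent can tail them. Below is a simplified sketch, not the exact fluentd-daemonset.yaml from the repository; the image tag and names are illustrative:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
    spec:
      containers:
      - name: fluentd-es
        image: fluentd-elasticsearch:1.24   # illustrative image tag
        volumeMounts:
        # node directories holding the container logs
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers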
Step 3: Check that all is well and that all the Kubernetes objects are properly deployed:
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                    READY   STATUS    RESTARTS   AGE
kube-system   elasticsearch-logging-h68v6             1/1     Unknown   0          59d
kube-system   elasticsearch-logging-mpdkv             1/1     Running   5          59d
kube-system   etcd-fluentdmaster                      1/1     Running   11         63d
kube-system   fluentd-es-1.24-2z7w5                   1/1     Running   5          59d
kube-system   kibana-logging-5874ff6996-5wqfg         1/1     Running   5          59d
kube-system   kube-apiserver-fluentdmaster            1/1     Running   11         63d
kube-system   kube-controller-manager-fluentdmaster   1/1     Running   11         63d
kube-system   kube-dns-6f4fd4bdf-655tv                3/3     Running   30         63d
kube-system   kube-proxy-4ff9h                        1/1     Running   10         63d
kube-system   kube-proxy-vclr9                        1/1     Running   6          59d
kube-system   kube-scheduler-fluentdmaster            1/1     Running   11         63d
kube-system   weave-net-6w9sd                         2/2     Running   19         59d
kube-system   weave-net-m24wm                         2/2     Running   3          6h

$ kubectl get svc --all-namespaces
NAMESPACE     NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default       kubernetes              ClusterIP   10.96.0.1       <none>        443/TCP         63d
kube-system   elasticsearch-logging   ClusterIP   10.111.123.66   <none>        9200/TCP        63d
kube-system   kibana-logging          NodePort    10.96.204.66    <none>        80:30560/TCP    63d
kube-system   kube-dns                ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP   63d
Step 4: Test Elasticsearch with basic query searches. If Elasticsearch is not working properly, Kibana will give errors when loaded in the browser. Note that the IP address is the service address of Elasticsearch, as can be seen in the kubectl get svc output above.
curl 10.111.123.66:9200/_search?q=*pretty
curl 10.111.123.66:9200/_search?q=*warning

The warning search will give a long output as follows:

$ curl 10.111.123.66:9200/_search?q=*warning
{"took":20,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":5,"max_score":1.0,"hits":[{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmo","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:50Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:50Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1Ma5tybVhNMr2IQkj","_score":1.0,"_source":{"log":"WARNING: Tini has been relocated to /sbin/tini.\n","stream":"stderr","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728f91c","labels":{"k8s-app":"kibana-logging","pod-template-hash":"1430992552"},"host":"fluentdslave1","master_url":"https://10.96.0.1:443/api"},"@timestamp":"2018-06-12T18:10:06+00:00","tag":"kubernetes.var.log.containers.kibana-logging-5874ff6996-5wqfg_kube-system_kibana-logging-ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15.log"}},{"_index":"logstash-2018.06.12","_type":"fluentd","_id":"AWP1MbMPybVhNMr2IQmu","_score":1.0,"_source":{"type":"log","@timestamp":"2018-06-12T18:10:52Z","tags":["warning","elasticsearch","admin"],"pid":6,"message":"No living connections","log":"{\"type\":\"log\",\"@timestamp\":\"2018-06-12T18:10:52Z\",\"tags\":[\"warning\",\"elasticsearch\",\"admin\"],\"pid\":6,\"message\":\"No living connections\"}\n","stream":"stdout","docker":{"container_id":"ecb40acfaf294458d95a48c7c0f6993536bfe598a70dd12a27fa22a596490a15"},"kubernetes":{"container_name":"kibana-logging","namespace_name":"kube-system","pod_name":"kibana-logging-5874ff6996-5wqfg","pod_id":"d6402b2f-3f9b-11e8-902e-08002728
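You can also confirm that Fluentd is creating the daily Logstash-style indices (a consequence of logstash_format true in the output plugin configuration) with Elasticsearch's _cat API:

curl 10.111.123.66:9200/_cat/indices?v

You should see indices named like logstash-2018.06.12, matching the _index values in the query output above.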
You can exec into any of the containers if you need to troubleshoot a service:
$ kubectl exec -it fluentd-es-1.24-2z7w5 --namespace=kube-system -- /bin/bash
Step 5: If all goes well, put the IP of the Kibana service (obtained with kubectl get svc --all-namespaces) in your browser's URL bar and you will see the Kibana Dashboard.
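Because kibana-logging is a NodePort service (80:30560/TCP in the kubectl get svc output above), the dashboard can also be reached through any node's IP on the node port; for example:

kubectl get svc kibana-logging --namespace=kube-system
# then browse to http://<node-ip>:30560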
4. Conclusion
This tutorial discussed how to deploy Fluentd, Kibana and Elasticsearch on a Kubernetes cluster. You'll have a fully functional Kubernetes cluster together with logging by following this tutorial. Fluentd is very important and is almost becoming the standard for logging in modern architectures, replacing syslog. If you like the tutorials, do subscribe to our blog and YouTube channel for more coming your way.