Understanding Fluentd’s Unified Logging Layer

sandeepseeram

May 8, 2022, 3 min read

Fluentd is an open-source data collector that lets you unify data collection and consumption for better use and understanding of data. Fluentd is written in a combination of C and Ruby and requires very few system resources to operate.

A vanilla Fluentd instance runs on 30-40 MB of memory and can process up to 13,000 events per second per core. In IoT environments with tighter memory requirements (around 450-600 KB), we can use Fluent Bit, a lightweight forwarder.

The Fluentd FAQ compares Fluentd and Fluent Bit in a table (source: https://www.fluentd.org/faqs).

Fluentd Has a Pluggable Architecture

Fluentd has a flexible plugin system that allows the community to extend its functionality. More than 500 community-contributed plugins connect dozens of data sources and data outputs. By leveraging these plugins, you can start making better use of your logs right away.
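Fluentd's real plugins are Ruby classes registered under a type name (such as `file` or `grep`), and a configuration section picks one with `@type`. The Python sketch below only illustrates that lookup idea; `PLUGINS`, `register`, `build_output` and both plugin classes are hypothetical and are not Fluentd's API.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a type name (what "@type" would name) to a plugin class.
PLUGINS: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a plugin class under `name`."""
    def wrap(cls: type) -> type:
        PLUGINS[name] = cls
        return cls
    return wrap


@register("stdout")
class StdoutOutput:
    def emit(self, tag: str, record: dict) -> str:
        line = f"{tag}: {record}"
        print(line)
        return line


@register("null")
class NullOutput:
    def emit(self, tag: str, record: dict) -> None:
        return None  # discard the event


def build_output(section: dict):
    """Instantiate the plugin class named by the "@type" key of a config section."""
    plugin_type = section.get("@type")
    if plugin_type not in PLUGINS:
        raise ValueError(f"unknown plugin type: {plugin_type!r}")
    return PLUGINS[plugin_type]()


out = build_output({"@type": "stdout"})
out.emit("app.access", {"path": "/"})  # prints: app.access: {'path': '/'}
```

The point of the design is that adding a new destination means registering one more class; the configuration, not the core, decides which one runs.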

Fluentd as a Unified Logging Layer

Fluentd structures data as JSON as much as possible, which lets it unify all stages of log data collection and processing: collecting, filtering, buffering and outputting logs across multiple sources and destinations. Downstream data processing is much easier with JSON, since it has enough structure to be accessible while retaining flexible schemas.
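To see why structuring helps, consider turning one raw text line into a JSON record. In Fluentd this is the job of parser plugins (for example `regexp`); the Python sketch below just uses the standard `re` and `json` modules, and the line layout and `LINE_RE` pattern are assumptions chosen for illustration.

```python
import json
import re

# Hypothetical layout of a simplified access-log line: <host> <method> <path> <status>
LINE_RE = re.compile(r"^(?P<host>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$")


def to_record(line: str) -> dict:
    """Turn one raw log line into a structured record, or raise ValueError."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    record = m.groupdict()
    record["status"] = int(record["status"])  # keep numbers numeric for downstream queries
    return record


raw = "10.0.0.1 GET /index.html 200"
print(json.dumps(to_record(raw)))
# {"host": "10.0.0.1", "method": "GET", "path": "/index.html", "status": 200}
```

Once the line is a JSON object, later stages can test a field such as `status >= 500` directly instead of re-parsing text.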

With everything defined in a configuration file, Fluentd is flexible enough to connect to and collect many types of logs from your applications and infrastructure.

Lifecycle of a Fluentd Event:

The lifecycle of a Fluentd event involves five components:

1. Setup

2. Inputs

3. Filters

4. Matches

5. Labels

Setup:

Fluentd uses a main configuration file to connect all of its components. This file defines the inputs, also called listeners, with <source> directives.

The same file sets up the matching rules used for data routing (<match> directives): each rule names the output that receives the events it matches.
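The routing described above can be pictured with a toy model. This Python sketch is not Fluentd's implementation: it compares tags exactly instead of using Fluentd's wildcard patterns, and the `Router` class and its methods are hypothetical. It does reflect one rule from the Fluentd documentation: an event goes to the first `<match>` that matches it, in configuration order, so later rules for the same tag never receive it.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, float, dict]                 # (tag, time, record)
FilterFn = Callable[[Event], Optional[Event]]   # return None to drop the event


class Router:
    """Toy router: matching filters run first, then the first matching output wins."""

    def __init__(self) -> None:
        self.filters: List[Tuple[str, FilterFn]] = []
        self.matches: List[Tuple[str, List[Event]]] = []  # (tag, sink)

    def add_filter(self, tag: str, fn: FilterFn) -> None:
        self.filters.append((tag, fn))

    def add_match(self, tag: str, sink: List[Event]) -> None:
        self.matches.append((tag, sink))

    def emit(self, event: Event) -> bool:
        """Route one event; return True if some output received it."""
        for tag, fn in self.filters:
            if tag == event[0]:
                result = fn(event)
                if result is None:
                    return False        # dropped by a filter
                event = result
        for tag, sink in self.matches:
            if tag == event[0]:
                sink.append(event)      # first match wins; later rules never see it
                return True
        return False                    # no rule matched this tag


archive: List[Event] = []
debug: List[Event] = []
router = Router()
router.add_filter("app", lambda e: None if e[2].get("level") == "debug" else e)
router.add_match("app", archive)
router.add_match("app", debug)  # unreachable for "app": the rule above matches first
router.emit(("app", 0.0, {"level": "info", "msg": "started"}))
router.emit(("app", 1.0, {"level": "debug", "msg": "noise"}))
print(len(archive), len(debug))  # 1 0
```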

Let’s go through the installation and setup process:

Note: td-agent is the stable distribution of Fluentd; in enterprise environments it is recommended to use td-agent rather than installing Fluentd directly.

To install td-agent 3 on a Red Hat/CentOS system, run the installation script:

$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh

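The next step is to install the required plugins. A list of more than 500 community-developed plugins is available at https://www.fluentd.org/plugins.

For this example, I am going to install fluent-plugin-cloudwatch, an input plugin that pulls Amazon CloudWatch metrics into Fluentd. With td-agent, plugins are installed with the bundled td-agent-gem command: td-agent-gem install fluent-plugin-cloudwatch.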

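In the td-agent.conf file (/etc/td-agent/td-agent.conf on a default install), we set up a source of events, in our case Amazon CloudWatch. In the block below, values in square brackets are placeholders and the (default: ...) notes only document each parameter's default; replace or remove them in a real configuration.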

<source>
  @type cloudwatch
  tag cloudwatch
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  cw_endpoint ENDPOINT

  namespace [namespace]
  statistics [statistics] (default: Average)
  metric_name [metric name]
  dimensions_name [dimensions_name]
  dimensions_value [dimensions value]
  period [period] (default: 300)
  interval [interval] (default: 300)
  delayed_start [bool] (default: false)
  emit_zero [bool] (default: false)
</source>
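Assuming `interval` is the polling interval in seconds, as its name and default of 300 suggest, an input like this one repeatedly asks CloudWatch for data and emits events tagged `cloudwatch`. The Python sketch below shows only that polling idea: `poll_metric` and its fake `fetch` callback are hypothetical, and the `{"value": ...}` record layout is an assumption, not the plugin's actual output format.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

Event = Tuple[str, int, dict]  # (tag, time, record)


def poll_metric(fetch: Callable[[], Optional[float]],
                tag: str = "cloudwatch",
                interval: float = 300,
                rounds: int = 3,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.time) -> Iterator[Event]:
    """Call `fetch` once per `interval` seconds and yield one event per datapoint.

    `fetch` is a stand-in for the CloudWatch API call; it returns None when
    there is no datapoint, and this sketch then emits nothing for that round.
    """
    for i in range(rounds):
        if i:
            sleep(interval)           # wait between polls, not before the first one
        value = fetch()
        if value is not None:
            yield (tag, int(clock()), {"value": value})


samples = iter([12.5, None, 40.0])
events = list(poll_metric(lambda: next(samples), sleep=lambda s: None, clock=lambda: 1651968000))
print(events)
# [('cloudwatch', 1651968000, {'value': 12.5}), ('cloudwatch', 1651968000, {'value': 40.0})]
```

Injecting `sleep` and `clock` keeps the sketch testable without waiting five minutes between polls.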

The following example matches events tagged cloudwatch and stores them as files under /var/log/td-agent/awscloudwatch. We will call this path the buffer storage.

<match cloudwatch>
  @type copy
  <store>
    @type file
    path /var/log/td-agent/awscloudwatch
  </store>
</match>
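With a single <store>, copy behaves like the file output on its own; its purpose is to duplicate each event into several stores. The Python sketch below models that fan-out with in-memory stores instead of files. `MemoryStore`, `CopyOutput`, the metric record and the one-JSON-array-per-line format are assumptions for illustration; Fluentd's file output uses its own format and buffers data before writing it to disk.

```python
import json
from typing import List


class MemoryStore:
    """Hypothetical stand-in for one <store>; the article's store is @type file."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, tag: str, time: int, record: dict) -> None:
        # One JSON array per event: [tag, time, record]
        self.lines.append(json.dumps([tag, time, record]))


class CopyOutput:
    """Sends every event to every configured store, like the idea behind @type copy."""

    def __init__(self, stores: List[MemoryStore]) -> None:
        self.stores = stores

    def emit(self, tag: str, time: int, record: dict) -> None:
        for store in self.stores:
            store.write(tag, time, record)


file_store, backup_store = MemoryStore(), MemoryStore()
out = CopyOutput([file_store, backup_store])
out.emit("cloudwatch", 1651968000, {"value": 12.5})  # made-up metric record
print(file_store.lines[0])  # ["cloudwatch", 1651968000, {"value": 12.5}]
```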

Inputs: A Fluentd event consists of three components:

Tag – the origin of an event

Time – when the event occurred

Record – the content of the event log

Input plugins assemble these three components into events.
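The three components map naturally onto a small data type. This Python sketch mirrors them with a dataclass and a hypothetical `read_lines` input that wraps raw text lines into events; the `message` key name and the injectable `clock` are assumptions chosen for illustration, not Fluentd's API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    tag: str                                     # origin of the event, e.g. "myapp.access"
    time: int                                    # when it happened (Unix seconds)
    record: dict = field(default_factory=dict)   # the log content itself


def read_lines(lines: Iterable[str], tag: str,
               clock: Callable[[], float] = time.time) -> Iterator[Event]:
    """Hypothetical input: wrap each raw line in an Event stamped with the current time."""
    for line in lines:
        yield Event(tag=tag, time=int(clock()), record={"message": line.rstrip("\n")})


events = list(read_lines(["user login\n"], "myapp.access", clock=lambda: 1651968000))
print(events[0])
# Event(tag='myapp.access', time=1651968000, record={'message': 'user login'})
```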

Filters: A filter applies rules to the events coming from the inputs, for example to allow or drop an event before it reaches an output.

<filter test.example>
  @type grep
  <exclude>
    key action
    pattern ^logout$
  </exclude>
</filter>

In this example, the filter drops every event tagged test.example whose action field is exactly logout, so user logout actions never reach the outputs.
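The exclude rule above can be approximated in a few lines of Python. This is a sketch of the behavior, not the grep plugin's code: Python's `re.search` stands in for Ruby's regular-expression matching (the two agree for a simple anchored pattern like `^logout$` on single-line values), and keeping records that lack the key is this sketch's own choice.

```python
import re
from typing import Iterable, Iterator


def grep_exclude(records: Iterable[dict], key: str, pattern: str) -> Iterator[dict]:
    """Yield only records whose `key` field does NOT match `pattern`.

    Records that lack `key` are kept, since there is nothing to match against.
    """
    regex = re.compile(pattern)
    for record in records:
        value = record.get(key)
        if value is not None and regex.search(str(value)):
            continue  # excluded, like an <exclude> rule in the grep filter
        yield record


events = [
    {"user": "alice", "action": "login"},
    {"user": "alice", "action": "logout"},
    {"user": "bob", "action": "logout_all"},  # kept: ^logout$ requires an exact match
]
for kept in grep_exclude(events, "action", r"^logout$"):
    print(kept)
```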

Matches:

The match directive selects events from the inputs and specifies the action to take when an event matches. The most common action is sending events to another system, for example with the file or forward output plugins.

A match directive must include:

1. Match pattern

2. @type parameter

An event is sent to an output only if its tag matches the pattern.

For example, the following directive matches the cloudwatch tag and uses the @type parameter (copy, with a file store) to write events to a path.

<match cloudwatch>
  @type copy
  <store>
    @type file
    path /var/log/td-agent/awscloudwatch
  </store>
</match>
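Fluentd's documentation describes the wildcards allowed in a match pattern: `*` matches a single tag part, `**` matches zero or more tag parts, `{X,Y,Z}` matches X, Y, or Z, and several patterns separated by whitespace match if any one does. The following Python sketch approximates those rules by translating a pattern into a regular expression. It is a simplified re-creation, not Fluentd's Ruby matcher; `glob_to_regex` and `tag_matches` are hypothetical names, and nested braces and escaping are not handled.

```python
import re
from typing import List


def glob_to_regex(pat: str) -> str:
    """Translate one Fluentd-style tag pattern into a Python regex (simplified)."""
    out: List[str] = []
    i, dot = 0, False
    while i < len(pat):
        if pat.startswith("**", i):
            if dot:
                out.append(r"(?![^.])")      # "a.**" also matches the bare tag "a"
                dot = False
            if pat.startswith(".", i + 2):
                out.append(r"(?:.*\.|\A)")   # "**.b" matches "b" as well as "x.y.b"
                i += 3
            else:
                out.append(".*")             # zero or more tag parts
                i += 2
            continue
        if dot:
            out.append(r"\.")
            dot = False
        c = pat[i]
        if c == ".":
            dot = True                        # emit the dot lazily so "**" can absorb it
        elif c == "*":
            out.append(r"[^.]*")             # stays within a single tag part
        elif c == "{":
            end = pat.index("}", i)          # no nested braces in this sketch
            alternatives = pat[i + 1:end].split(",")
            out.append("(?:" + "|".join(glob_to_regex(a) for a in alternatives) + ")")
            i = end
        else:
            out.append(re.escape(c))
        i += 1
    if dot:
        out.append(r"\.")
    return "".join(out)


def tag_matches(pattern: str, tag: str) -> bool:
    """True if `tag` matches any of the whitespace-separated patterns."""
    return any(re.fullmatch(glob_to_regex(p), tag) for p in pattern.split())


print(tag_matches("cloudwatch", "cloudwatch"))  # True: exact tag
print(tag_matches("a.*", "a.b.c"))              # False: * covers one part only
print(tag_matches("a.**", "a"))                 # True: ** allows zero parts
print(tag_matches("app.{web,db}", "app.db"))    # True
```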

Labels: A label groups filters and outputs to simplify tag handling. The <label> directive separates event flows without requiring a tag prefix. A typical use case is error handling.

Fluentd provides a built-in <label @ERROR>: error events emitted by plugins, for example when a record is invalid or a buffer is full, are automatically routed to this label.
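The label mechanism can be modeled as separate event streams with a fallback for failures. In this Python sketch (not Fluentd's implementation), `LabelRouter`, `process`, `parse_amount` and the `@BILLING` label are hypothetical; only the idea that failed events land in `@ERROR` instead of being lost comes from the text. In real Fluentd you still declare a <label @ERROR> section with a <match> inside it to decide where those events go.

```python
from typing import Callable, Dict, List

ERROR_LABEL = "@ERROR"


class LabelRouter:
    """Toy model: each label owns its own event list, plus a built-in @ERROR label."""

    def __init__(self) -> None:
        self.labels: Dict[str, List[dict]] = {ERROR_LABEL: []}

    def add_label(self, name: str) -> None:
        self.labels.setdefault(name, [])

    def process(self, label: str, record: dict,
                transform: Callable[[dict], dict]) -> None:
        """Apply `transform`; on failure, route the original record to @ERROR."""
        sink = self.labels[label]  # unknown labels raise KeyError in this sketch
        try:
            sink.append(transform(record))
        except Exception as exc:   # in this sketch, any failure becomes an error event
            self.labels[ERROR_LABEL].append({"error": str(exc), "record": record})


def parse_amount(record: dict) -> dict:
    return {**record, "amount": float(record["amount"])}


router = LabelRouter()
router.add_label("@BILLING")
router.process("@BILLING", {"amount": "12.50"}, parse_amount)
router.process("@BILLING", {"amount": "n/a"}, parse_amount)
print(len(router.labels["@BILLING"]), len(router.labels["@ERROR"]))  # 1 1
```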

In summary, Fluentd is a flexible log shipper, aggregator, transformer and ingestor that can be used in environments of any size.

Sandeep Seeram
