Elasticsearch, Logstash, and Kibana – The ELK stack
ELK Hunt
A powerful search engine, a tool for processing and normalizing logs, and another for visualizing the results – Elasticsearch, Logstash, and Kibana form the ELK stack, which helps admins manage logfiles on high-volume systems.
Even a single, small LAMP server will produce a number of logfiles, and if you have a large array of servers, you can generally look forward to a volume of logfiles that is likely to exceed the capabilities of most built-in log management tools – if you want to analyze the data in your logs, that is. The different file formats output by the typical zoo of applications also add complexity.
The ELK stack, a combination of Elasticsearch [1], Logstash [2], and Kibana [3], addresses these difficulties. Elasticsearch is an extremely powerful search server that receives its data from Logstash, an application that extracts data from server logs, normalizes it, and dumps the results in an Elasticsearch index. Finally, the Kibana analytics and data visualization tool offers extremely flexible views of the information.
The lab environment consisted of several Debian Jessie servers, one running an ELK stack, as well as Filebeat [4], a service that acquires the local logs and sends them to Logstash. Filebeat can also collect logs from remote sources; we used it on another server that was already set up and upgraded as a central log host. The server also takes care of Syslog forwarding.
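On the log clients, a minimal filebeat.yml in the style of the Filebeat 1.x releases current at the time could look like the following sketch; the log paths and the Logstash host are placeholders:

```yaml
filebeat:
  prospectors:
    -
      # Which local logfiles Filebeat should watch (placeholders)
      paths:
        - /var/log/syslog
        - /var/log/auth.log
      input_type: log

output:
  logstash:
    # Central ELK host; 5044 is the customary Beats input port
    hosts: ["elk-test1.example.com:5044"]
```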
Three other servers work as Elasticsearch nodes to improve storage space and search performance across the board. Currently, an ELK stack is taking care of the logs from Postfix, Dovecot, Apache, Nginx, and Open-Xchange in the lab.
Elasticsearch
Elasticsearch [1] by Elastic is an extremely powerful full-text search engine; it is implemented in Java, based on Apache Lucene, and provides its feature set via a REST API. Elasticsearch automatically indexes all text (documents). Even without defined fields or data types, it can find search terms in a large volume of data. Elasticsearch supports complex requests with many dependencies and understands metrics (e.g., the frequency of occurrence of certain criteria).
The main components are released under the Apache license and are available for free via the GitHub repository and the project's website. This is also where users will find the source code and packages for Debian- and RPM-based distributions. Elasticsearch has additional commercial modules, such as Shield (see the "Security!" section), Marvel (monitoring), or Watcher (alerting).
Elastic does not sell individual licenses for the plugins; instead, users need to take out a subscription that includes all the components and support. The website does not cite prices for the individual subscription models [5]. If you are interested in a subscription, you need to contact the vendor to request a quotation.
The test team installed version 2.1.0 dated November 24, 2015, using the Debian package from the homepage. The Elasticsearch repository was added to our own server's package sources to keep everything up to date. The package is easily integrated with the system – but it does not complain if you are missing a Java Runtime Environment. This is something you definitely need to install retroactively; openjdk-8-jre worked perfectly in our lab. The installation routine sets up a service unit for systemd to start and stop the daemon.
Well Distributed
Linking up multiple machines with an Elasticsearch installation to create a cluster is easily done. The nodes synchronize their indexes in the cluster and autonomously distribute incoming search requests from clients. Adding a second Elasticsearch node simply replicates the data, so storage space only starts to grow from the third node onward. Elasticsearch automatically breaks down its indexes into shards, which means that the service can store large collections of data distributed across multiple servers, ensuring replication if a node fails.
Moreover, access is distributed, which improves performance and ensures that large collections of data are searched quickly. Admins do not need to decide whether or not they want the ELK to scale before installing and setting up. At any time you can extend your setup and add more Elasticsearch nodes to your cluster. The software supports mechanisms for distributing the data out of the box, which removes the need for an additional clustering or load balancing component.
Elasticsearch is configured in the /etc/elasticsearch/elasticsearch.yml file, which is broken down into various sections. The listings for this article [6] have an example of the first section, as well as the setup file for the other nodes. The cluster name is listed below the Cluster section (e.g., cluster.name: elk-test), and the Node section contains the node designations – elk-test1, elk-test2, … elk-test4 in this example (e.g., node.name: elk-test1).
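For the first node, the relevant lines of elasticsearch.yml therefore boil down to something like this sketch:

```yaml
# /etc/elasticsearch/elasticsearch.yml on the first node
cluster.name: elk-test
node.name: elk-test1
```

The other nodes use the same cluster name but their own node.name; nodes with a matching cluster.name join the same cluster.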
The test team also made changes below Network. By default, the Elasticsearch service is tied to port 9200 on localhost (IPv4 and IPv6). Because we have multiple nodes, we told Elasticsearch to listen on all network interfaces. As of this writing, it is not possible to define a list of interfaces and thus restrict access, but the vendor has received such a feature request.
If you have multiple IP addresses, you can use the publish_host variable in the Network section to define which IP the computer uses to communicate with the other Elasticsearch nodes. In contrast, bind_host defines the addresses on which the service listens. The setting is particularly important if you need to scale massively. In this case, you will probably want the Elasticsearch nodes to exchange data on one network but use a different outward-facing IP for client access.
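In the test setup, the Network section simply tells the service to listen everywhere; the commented lines sketch how the two roles could be separated in a larger setup (the addresses are placeholders):

```yaml
# Listen on all interfaces (test setup)
network.host: 0.0.0.0

# Alternatively, separate the two roles:
# network.bind_host: 0.0.0.0        # addresses the service listens on
# network.publish_host: 10.0.0.11   # address announced to the other nodes
```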
The Discovery section of the configuration file, which is where you list all the nodes, is also interesting if you have more than one Elasticsearch node. Once a node is set up, users can run the curl command-line tool (e.g., curl http://localhost:9200) or use their web browsers to check whether the search service is running (Figure 1).
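A Discovery section for the four test nodes could be sketched like this; Elasticsearch 2.x uses Zen discovery, and the hostnames here are assumed to match the node designations from the example:

```yaml
# Where to look for the other cluster members (unicast list)
discovery.zen.ping.unicast.hosts: ["elk-test1", "elk-test2", "elk-test3", "elk-test4"]
```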
Security!
One thing you notice on first contact is that Elasticsearch does not use any authentication mechanisms and that the data passes through the network in the clear. It also lacks rights management to determine which client is allowed to access what part of the index.
The Shield [7] plugin gives you all of these security features and can be particularly interesting if you are running Elasticsearch in a cluster with multiple server instances. You can use the /usr/share/elasticsearch/bin/plugin script to install the license and Shield on each of your nodes – as described on the website. Then restart all of your Elasticsearch services. You can test Shield and the other commercial plugins for 30 days free of charge.
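With the 2.x releases used in the test, the install steps boil down to something like the following sketch, run as root on each node; the service restart assumes the systemd unit set up by the package:

```shell
cd /usr/share/elasticsearch

# Install the license plugin first, then Shield itself
bin/plugin install license
bin/plugin install shield

# Restart the search service so the plugins are loaded
systemctl restart elasticsearch
```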
Shield extends the search service to include user management and a rights system. It also encrypts the data streams between the Elasticsearch nodes with SSL and prevents unauthorized nodes from joining the cluster. You need to manage the SSL certificates yourself, but you will find some support in the Shield documentation on the website.
As an alternative, you can use iptables to decide who is allowed to access your Elasticsearch server or servers. For example, you could specify that only certain machines on your internal network are allowed to access the nodes (Listing 1), but this does not solve the problem of unencrypted data transfer. In the case of logfiles, which may contain confidential information, this is not exactly ideal. Because Elasticsearch provides a web server, you could install a reverse proxy in the middle to enable both SSL encryption and authentication based on htpasswd.
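A reverse proxy of this kind could be sketched as follows with nginx; the server name, certificate paths, and htpasswd file are placeholders for illustration:

```nginx
server {
    listen 443 ssl;
    server_name elk.example.com;

    # Self-managed certificate (placeholder paths)
    ssl_certificate     /etc/nginx/ssl/elk.crt;
    ssl_certificate_key /etc/nginx/ssl/elk.key;

    location / {
        # Basic authentication against an htpasswd file
        auth_basic           "Elasticsearch";
        auth_basic_user_file /etc/nginx/htpasswd;

        # Forward requests to the local Elasticsearch instance
        proxy_pass http://127.0.0.1:9200;
    }
}
```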
Listing 1
Iptables Rules for Elasticsearch
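Rules of the kind Listing 1 describes might look like this sketch; the addresses stand in for the permitted clients and the node network, and should be adjusted to your environment:

```shell
# Allow two internal hosts to query Elasticsearch (REST API, port 9200)
iptables -A INPUT -p tcp --dport 9200 -s 192.168.1.10 -j ACCEPT
iptables -A INPUT -p tcp --dport 9200 -s 192.168.1.11 -j ACCEPT

# Allow the other cluster nodes to reach the transport port (9300)
iptables -A INPUT -p tcp --dport 9300 -s 192.168.1.0/24 -j ACCEPT

# Drop everything else aimed at the two Elasticsearch ports
iptables -A INPUT -p tcp --dport 9200 -j DROP
iptables -A INPUT -p tcp --dport 9300 -j DROP
```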