Elasticsearch, Logstash, and Kibana – The ELK stack
ELK Hunt
A powerful search engine, a tool for processing and normalizing logs, and another for visualizing the results – Elasticsearch, Logstash, and Kibana form the ELK stack, which helps admins manage logfiles on high-volume systems.
Even a single, small LAMP server will produce a number of logfiles, and if you have a large array of servers, you can generally look forward to a volume of logfiles that is likely to exceed the capabilities of most built-in log management tools – if you want to analyze the data in your logs, that is. The different file formats output by the typical zoo of applications also add complexity.
The ELK stack, a combination of Elasticsearch [1], Logstash [2], and Kibana [3], addresses these difficulties. Elasticsearch is an extremely powerful search server that receives its data from Logstash, an application that extracts data from server logs, normalizes it, and dumps the results in an Elasticsearch index. Finally, the Kibana analytics and data visualization tool offers extremely flexible views of the information.
The lab environment consisted of several Debian Jessie servers, one running an ELK stack, as well as Filebeat [4], a service that acquires the local logs and sends them to Logstash. Filebeat can also collect logs from remote sources; we used it on another server that was already set up and upgraded as a central log host. The server also takes care of Syslog forwarding.
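On the log clients, a minimal filebeat.yml in the style of the Filebeat 1.x releases current at the time could look like the following sketch; the log paths and the Logstash host are placeholders:

```yaml
filebeat:
  prospectors:
    -
      # Which local logfiles Filebeat should watch (placeholders)
      paths:
        - /var/log/syslog
        - /var/log/auth.log
      input_type: log

output:
  logstash:
    # Central ELK host; 5044 is the customary Beats input port
    hosts: ["elk-test1.example.com:5044"]
```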
Three other servers work as Elasticsearch nodes to improve storage space and search performance across the board. Currently, an ELK stack is taking care of the logs from Postfix, Dovecot, Apache, Nginx, and Open-Xchange in the lab.
Elasticsearch
Elasticsearch [1] by Elastic is an extremely powerful full-text search engine; it is implemented in Java, based on Apache Lucene, and provides its feature set via a REST API. Elasticsearch automatically indexes all text (documents). Even without defined fields or data types, it can find search terms in a large volume of data. Elasticsearch supports complex requests with many dependencies and understands metrics (e.g., the frequency of occurrence of certain criteria).
The main components are released under the Apache license and are available for free via the GitHub repository and the project's website. This is also where users will find the source code and packages for Debian- and RPM-based distributions. Elasticsearch has additional commercial modules, such as Shield (see the "Security!" section), Marvel (monitoring), or Watcher (alerting).
Elastic does not sell individual licenses for the plugins; instead, users need to take out a subscription that includes all the components and support. The website does not cite prices for the individual subscription models [5]. If you are interested in a subscription, you need to contact the vendor to request a quotation.
The test team installed version 2.1.0 dated November 24, 2015, using the Debian package from the homepage. The Elasticsearch repository was added to our own server's package sources to keep everything up to date. The package is easily integrated with the system – but it does not complain if you are missing a Java Runtime Environment. This is something you definitely need to install retroactively; openjdk-8-jre worked perfectly in our lab. The installation routine sets up a service unit for systemd to start and stop the daemon.
Well Distributed
Linking up multiple machines with an Elasticsearch installation to create a cluster is easily done. The nodes synchronize their indexes in the cluster and autonomously distribute incoming search requests from clients. Adding a second Elasticsearch node simply replicates the data, so storage space only starts to grow from the third node onward. Elasticsearch automatically breaks down its indexes into shards, which means that the service can store large collections of data distributed across multiple servers, ensuring replication if a node fails.
Moreover, access is distributed, which improves performance and ensures that large collections of data are searched quickly. Admins do not need to decide whether or not they want the ELK to scale before installing and setting up. At any time you can extend your setup and add more Elasticsearch nodes to your cluster. The software supports mechanisms for distributing the data out of the box, which removes the need for an additional clustering or load balancing component.
Elasticsearch is configured in the /etc/elasticsearch/elasticsearch.yml file, which is broken down into various sections. The listings for this article [6] have an example of the first section, as well as the setup file for the other nodes. The cluster name is listed below the Cluster section (e.g., cluster.name: elk-test), and the Node section contains the node designations – elk-test1, elk-test2, … elk-test4 in this example (e.g., node.name: elk-test1).
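For the first node, the relevant lines of elasticsearch.yml therefore boil down to something like this sketch:

```yaml
# /etc/elasticsearch/elasticsearch.yml on the first node
cluster.name: elk-test
node.name: elk-test1
```

The other nodes use the same cluster name but their own node.name; nodes with a matching cluster.name join the same cluster.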
The test team also made changes below Network. By default, the Elasticsearch service is tied to port 9200 on localhost (IPv4 and IPv6). Because we have multiple nodes, we told Elasticsearch to listen on all network interfaces. As of this writing, it is not possible to define a list of interfaces and thus restrict access, but the vendor has received such a feature request.
If you have multiple IP addresses, you can use the publish_host variable in the Network section to define which IP the computer uses to communicate with the other Elasticsearch nodes. In contrast, bind_host defines the addresses on which the service listens. The setting is particularly important if you need to scale massively. In this case, you will probably want the Elasticsearch nodes to exchange data on one network but use a different outward-facing IP for client access.
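In the test setup, the Network section simply tells the service to listen everywhere; the commented lines sketch how the two roles could be separated in a larger setup (the addresses are placeholders):

```yaml
# Listen on all interfaces (test setup)
network.host: 0.0.0.0

# Alternatively, separate the two roles:
# network.bind_host: 0.0.0.0        # addresses the service listens on
# network.publish_host: 10.0.0.11   # address announced to the other nodes
```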
The Discovery section of the configuration file, which is where you list all the nodes, is also interesting if you have more than one Elasticsearch node. Once a node is set up, users can run the curl command-line tool (e.g., curl http://localhost:9200) or use their web browsers to check whether the search service is running (Figure 1).
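A Discovery section for the four test nodes could be sketched like this; Elasticsearch 2.x uses Zen discovery, and the hostnames here are assumed to match the node designations from the example:

```yaml
# Where to look for the other cluster members (unicast list)
discovery.zen.ping.unicast.hosts: ["elk-test1", "elk-test2", "elk-test3", "elk-test4"]
```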
Security!
One thing you notice on first contact is that Elasticsearch does not use any authentication mechanisms and that the data passes through the network in the clear. It also lacks rights management to determine which client is allowed to access what part of the index.
The Shield [7] plugin gives you all of these security features and can be particularly interesting if you are running Elasticsearch in a cluster with multiple server instances. You can use the /usr/share/elasticsearch/bin/plugin script to install the license and Shield on each of your nodes – as described on the website. Then restart all of your Elasticsearch services. You can test Shield and the other commercial plugins for 30 days free of charge.
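With the 2.x releases used in the test, the install steps boil down to something like the following sketch, run as root on each node; the service restart assumes the systemd unit set up by the package:

```shell
cd /usr/share/elasticsearch

# Install the license plugin first, then Shield itself
bin/plugin install license
bin/plugin install shield

# Restart the search service so the plugins are loaded
systemctl restart elasticsearch
```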
Shield extends the search service to include user management and a rights system. It also encrypts the data streams between the Elasticsearch nodes with SSL and prevents unauthorized nodes from joining the cluster. You need to manage the SSL certificates yourself, but you will find some support in the Shield documentation on the website.
As an alternative, you can use iptables to decide who is allowed to access your Elasticsearch server or servers. For example, you could specify that only certain machines on your internal network are allowed to access the nodes (Listing 1), but this does not solve the problem of unencrypted data transfer. In the case of logfiles, which may contain confidential information, this is not exactly ideal. Because Elasticsearch provides a web server, you could install a reverse proxy in the middle to enable both SSL encryption and authentication based on htpasswd.
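A reverse proxy of this kind could be sketched as follows with nginx; the server name, certificate paths, and htpasswd file are placeholders for illustration:

```nginx
server {
    listen 443 ssl;
    server_name elk.example.com;

    # Self-managed certificate (placeholder paths)
    ssl_certificate     /etc/nginx/ssl/elk.crt;
    ssl_certificate_key /etc/nginx/ssl/elk.key;

    location / {
        # Basic authentication against an htpasswd file
        auth_basic           "Elasticsearch";
        auth_basic_user_file /etc/nginx/htpasswd;

        # Forward requests to the local Elasticsearch instance
        proxy_pass http://127.0.0.1:9200;
    }
}
```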
Listing 1
Iptables Rules for Elasticsearch
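Rules of the kind Listing 1 describes might look like this sketch; the addresses stand in for the permitted clients and the node network, and should be adjusted to your environment:

```shell
# Allow two internal hosts to query Elasticsearch (REST API, port 9200)
iptables -A INPUT -p tcp --dport 9200 -s 192.168.1.10 -j ACCEPT
iptables -A INPUT -p tcp --dport 9200 -s 192.168.1.11 -j ACCEPT

# Allow the other cluster nodes to reach the transport port (9300)
iptables -A INPUT -p tcp --dport 9300 -s 192.168.1.0/24 -j ACCEPT

# Drop everything else aimed at the two Elasticsearch ports
iptables -A INPUT -p tcp --dport 9200 -j DROP
iptables -A INPUT -p tcp --dport 9300 -j DROP
```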