
How to set up Curator to archive old Elasticsearch indices


If you don’t have a proper archival process, the data in your Elasticsearch cluster will grow uncontrollably. You risk losing valuable log data if your disk subsystem runs out of space. In the Elasticsearch log file, you might see messages like these:

[INFO ][cluster.routing.allocation.decider] [myELK-Node2] low disk watermark [85%] exceeded on [aULsc9C7R1ecGfHvy0gNqg][myELK-Node1] free: 2gb[10%], replicas will not be assigned to this node

[WARN ][cluster.routing.allocation.decider] [myELK-Node2] high disk watermark [90%] exceeded on [G19eWLL9Skqcq8Mb0p-xTg][myELK-Node2] free: 1.9gb[9.8%], shards will be relocated away from this node

[INFO ][cluster.routing.allocation.decider] [myELK-Node2] high disk watermark exceeded on one or more nodes, rerouting shards

That is not pretty.
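
To catch this before Elasticsearch starts complaining, you could check disk usage yourself. The following is only a minimal sketch in Python: the 85% and 90% thresholds are taken from the log messages above, the comparison is a simplification of what Elasticsearch does internally, and the function names are made up for illustration.

```python
import shutil

# Thresholds as they appear in the log messages above (assumed, not read from the cluster).
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0


def used_percent(path):
    """Return the percentage of disk space in use on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def watermark_status(percent_used, low=LOW_WATERMARK, high=HIGH_WATERMARK):
    """Classify a usage percentage as 'ok', 'low' or 'high' watermark exceeded."""
    if percent_used > high:
        return "high"
    if percent_used > low:
        return "low"
    return "ok"
```

You would call watermark_status(used_percent(path)) with the path of your Elasticsearch data directory and alert on anything other than ‘ok’.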

There are a few ways to delete unused or old indexes.

  1. You can use the HEAD plugin
  2. You can use CURL commands or SENSE Dashboard
  3. You can use the tool curator.

Using HEAD is pretty straightforward. Locate the index you want to delete, click on ‘actions’ -> ‘delete’, confirm the deletion by typing ‘DELETE’, and you are done.

Using the SENSE dashboard is more fun. Launch Sense and execute ‘DELETE <index name>’.

In order to see the available indexes, just type “.” after the delete command and sense will pull up all the available indexes.
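
The same DELETE request can also be sent from a script instead of Sense or curl. Here is a minimal sketch in Python using only the standard library. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication. The function names are hypothetical, and the wildcard guard is my own precaution, not something Elasticsearch requires.

```python
import urllib.request

# Assumption: Elasticsearch listens on the local host, on the default HTTP port.
ES_URL = "http://localhost:9200"


def build_delete_request(index_name, base_url=ES_URL):
    """Build (but do not send) the HTTP DELETE request for a single index."""
    # Refuse wildcards so that a typo cannot remove every index at once.
    if not index_name or "*" in index_name or index_name == "_all":
        raise ValueError("refusing to delete with an empty or wildcard index name")
    return urllib.request.Request(f"{base_url}/{index_name}", method="DELETE")


def delete_index(index_name, base_url=ES_URL):
    """Send the DELETE request and return the raw response body as text."""
    request = build_delete_request(index_name, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling delete_index("logstash-2015.10.29") would remove that one index. There is no undo, so try it on a test cluster first.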

The above two methods are great but they are ‘manual’. In order to automate deleting old indexes, you need a better way. Say hello to curator.

Curator is a Python tool that helps you manage indices. While deleting has been the primary use case for Curator, it can do more than that. You can use Curator for the following tasks:

  1. Shard routing allocation
  2. Close indices and open closed indices
  3. Optimize indices
  4. Change number of replicas
  5. Take backup of indices (snapshot)
  6. List the available indices and backups (snapshots)
  7. Delete indices and backups (snapshots)

To quickly summarize the install process: you need to install Python 3.4 or above first, and then use pip to install Curator:

pip install elasticsearch-curator

For further installation instructions, see https://www.elastic.co/guide/en/elasticsearch/client/curator/current/installation.html.

Test the command first:

Once you have a working Curator, type the following command to list the available indices (note: I am assuming you are running the command on the server where Elasticsearch is running):

curator show --show-indices

If you want to delete indices older than 50 days, you would just execute the following command:

curator delete --older-than 50

2015-12-21 17:17:49,115 INFO Deleting indices…

2015-12-21 17:17:49,692 INFO delete_index operation succeeded on logstash-2015.10.29

2015-12-21 17:17:50,097 INFO delete_index operation succeeded on logstash-2015.10.30

2015-12-21 17:17:50,472 INFO delete_index operation succeeded on logstash-2015.10.31

2015-12-21 17:17:50,878 INFO delete_index operation succeeded on logstash-2015.11.01

2015-12-21 17:17:51,236 INFO delete_index operation succeeded on logstash-2015.11.02

2015-12-21 17:17:51,236 INFO logstash-2015.11.03 is within the threshold period (50 days).

2015-12-21 17:17:51,252 INFO logstash-2015.11.04 is within the threshold period (50 days).

2015-12-21 17:17:51,252 INFO logstash-2015.11.05 is within the threshold period (50 days).

2015-12-21 17:17:51,252 INFO Specified indices deleted.

2015-12-21 17:17:51,252 INFO Done in 0:00:02.168456

Some important arguments:

--older-than: Delete indices older than this number of time units

--prefix: Prefix of the names of the indices to match. Default is logstash-

--time-unit: Unit of time that --older-than is counted in (such as days or hours). Default is days

For example, if you want to delete the marvel indices that are older than 15 days, you would use the following command:

curator delete --prefix .marvel- --older-than 15
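
To make the interplay of the prefix and the age threshold concrete, here is a rough sketch in Python of how such a selection could work. This is not Curator’s actual implementation, and its exact cutoff arithmetic may differ. The function name is hypothetical, and I am assuming index names end in a date formatted as YYYY.MM.DD, as the logstash indices above do.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, days, prefix="logstash-", today=None):
    """Return the names that start with `prefix` and carry a date older than `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    selected = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # a different family of indices, leave it alone
        try:
            index_day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # no parseable date after the prefix, skip it
        if index_day < cutoff:
            selected.append(name)
    return selected
```

Only names that match the prefix and carry a parseable date are ever considered, which is why the marvel indices are untouched unless you ask for them with the prefix argument.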

Once you have the curator command for deletion, all you have to do is schedule it either using cron (in Unix) or Windows Task Scheduler (or any other job automation tool you use).
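
If you would rather have the scheduler call a small script than a bare command line, a wrapper could look like the sketch below. It only assembles the same curator commands shown in this post and runs them. The function names and the crontab path in the comment are hypothetical, and the flags are the ones used above, so they may need adjusting for your Curator version.

```python
import subprocess

# Hypothetical crontab entry that runs a script built on these functions at 01:00 daily:
#   0 1 * * * /usr/bin/python3 /opt/scripts/curator_cleanup.py


def build_curator_command(older_than, prefix=None):
    """Assemble the curator delete command as an argument list."""
    command = ["curator", "delete"]
    if prefix is not None:
        command += ["--prefix", prefix]
    command += ["--older-than", str(older_than)]
    return command


def run_cleanup(older_than, prefix=None):
    """Run curator and return its exit code so the scheduler can report failures."""
    result = subprocess.run(build_curator_command(older_than, prefix), check=False)
    return result.returncode
```

A script for the two examples in this post would call run_cleanup(50) and run_cleanup(15, prefix=".marvel-") and exit with a non-zero status if either call does.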

It’s that simple.

Note: So, can you simply use a curl command and automate it using a scheduler? Perhaps you could. But the flexibility Curator offers makes the automation much easier to set up.

