Kelson Martins Blog

Recently, after building a new ELK stack, I needed to move data from an old Elasticsearch cluster into a new one.
There are a few tools that are up for the job, including the Logstash elasticsearch input plugin, the Elasticsearch Reindex API, and elasticdump.
Curious to evaluate a new tool, I decided to experiment with elasticdump, and this post shows how I used it to achieve the stated requirement of moving data from one Elasticsearch node into another.

First things first. What is elasticdump?

Elasticdump is an open-source tool whose official description states its goal as moving and saving Elasticsearch indexes.
Elasticdump works by requesting data from an input and shipping it into an output, where either the input or the output may be an Elasticsearch URL or a file.

Elasticsearch scenario

In our scenario, we are required to move data from one Elasticsearch cluster into another.
Based on elasticdump's features, we have 2 options to achieve this goal:
1) Using an Elasticsearch URL for both input and output.
This is the most straightforward option: a single statement moves the data across the 2 clusters.
2) Using an Elasticsearch URL as input and a file as output, followed by a second run with the file as input and an Elasticsearch URL as output.
This approach requires at least 2 executions: one to save the data from the cluster into a file, and another to load the generated file into the new cluster.
It may be useful if you want to back up the indexes before taking further action.

Installing Elasticdump

To install elasticdump, we can make use of an npm package or a Docker image; this post uses npm.
For those of you who wonder what npm is: it is short for Node Package Manager, and it is first and foremost an online repository hosting open-source Node.js projects.
You can install npm through:
# Ubuntu
sudo apt-get install npm

# CentOS
sudo yum install npm

With npm installed, install elasticdump with:

npm install elasticdump -g

Using elasticdump

Using elasticdump is as simple as performing something similar to:
elasticdump \
  --input={{INPUT}} \
  --output={{OUTPUT}} \
  --type={{TYPE}}
{{INPUT}} or {{OUTPUT}} can be either one of:
Elasticsearch URL: {protocol}://{host}:{port}/{index}
File: {FilePath}
{{TYPE}} can be one of the following: analyzer, mapping, data (data is the default)

Export my data – Option 1

OK. Let’s get our hands dirty and export some data.
In my case, I want to export data from a docker-daemon index and push it into a remote index.
Using an Elasticsearch URL for both input and output, this is my command:
elasticdump \
  --input=http://user:password@source_node:9200/docker-daemon \
  --output=http://user:password@target_node:9200/docker-daemon
If you follow the output, you will see something similar to:
Thu, 21 Sep 2017 14:40:29 GMT | starting dump
Thu, 21 Sep 2017 14:40:31 GMT | got 53 objects from source elasticsearch (offset: 0)
Thu, 21 Sep 2017 14:40:33 GMT | sent 53 objects to destination elasticsearch, wrote 53
Thu, 21 Sep 2017 14:40:33 GMT | got 0 objects from source elasticsearch (offset: 53)
Thu, 21 Sep 2017 14:40:33 GMT | Total Writes: 53
Thu, 21 Sep 2017 14:40:33 GMT | dump complete
Let’s now confirm that the index was transferred successfully.
On the target Elasticsearch node, perform:
[root@target_node ~]# curl -u user:password localhost:9200/_cat/indices?v | grep docker-daemon
Positive output would be:
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100  2394  100  2394    0     0   245k      0 --:--:-- --:--:-- --:--:--  259k
green  open   logstash-docker-daemon        eilJdiZvSGixTNIfMwP-kw   5   2         41            0    292.3kb        292.3kb

Export my data – Option 2

In option 2, we first export the index into a file before moving it to the new node.
This can be achieved by something similar to:
elasticdump \
  --input=http://user:password@source_node:9200/docker-daemon \
  --output=/data/docker-daemon.json

elasticdump \
  --input=/data/docker-daemon.json \
  --output=http://user:password@target_node:9200/docker-daemon
Note that we first export the data from the index into the /data/docker-daemon.json file.
We then use this file as input to be moved into the new node.
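Worth noting: elasticdump writes the file as newline-delimited JSON, one document per line, so ordinary shell tools can sanity-check a dump before reloading it. A quick sketch (the two sample lines below are made up for illustration, not real output):

```shell
# Simulate a small dump file with two fake documents (illustration only).
printf '%s\n' \
  '{"_index":"docker-daemon","_source":{"message":"first"}}' \
  '{"_index":"docker-daemon","_source":{"message":"second"}}' \
  > /tmp/docker-daemon-sample.json

# One JSON object per line, so counting lines counts documents.
wc -l < /tmp/docker-daemon-sample.json
```

Comparing this count against the document count of the source index is a cheap check that the export grabbed everything.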

Analyzers and Mappings

What was shown was the most basic method of moving an index from one node into another.
In a probable scenario when moving an index, you will want to move the index with its appropriate analyzers and field mappings.
If this is the case, you will want to move these before moving the data itself. This is achieved by cascading 3 statements, one per --type, as shown:
elasticdump \
  --input=http://user:password@source_node:9200/docker-daemon \
  --output=http://user:password@target_node:9200/docker-daemon \
  --type=analyzer
elasticdump \
  --input=http://user:password@source_node:9200/docker-daemon \
  --output=http://user:password@target_node:9200/docker-daemon \
  --type=mapping
elasticdump \
  --input=http://user:password@source_node:9200/docker-daemon \
  --output=http://user:password@target_node:9200/docker-daemon \
  --type=data
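The 3 statements above differ only in which part of the index they move, selected by elasticdump's --type flag, so the cascade can be wrapped in a small shell loop. A sketch, with source_node/target_node as placeholder hosts; the loop echoes each command rather than running it, so you can review the result (or pipe it to sh) before executing:

```shell
# Echo one elasticdump command per type. Order matters: documents should
# land in an index whose analyzers and mappings already exist.
for type in analyzer mapping data; do
  echo elasticdump \
    --input="http://user:password@source_node:9200/docker-daemon" \
    --output="http://user:password@target_node:9200/docker-daemon" \
    --type="$type"
done
```

Dropping the echo turns the dry run into the real migration.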

Extra Options

What was shown is the basic manipulation of elasticdump.
A series of other parameters may be used depending on your requirement.
Some commonly used parameters include:
--searchBody: Useful when you do not want to export an entire index. Example: --searchBody '{"query":{"term":{"containerName":"nginx"}}}'
--limit: Indicates how many objects to move per batch operation. Defaults to 100.
--delete: Delete documents from the input as they are moved.
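As an illustration of combining these, here is a dry-run sketch (echoed, not executed) that would export only the nginx documents in batches of 500; containerName, the host, and the output path are placeholders carried over from the earlier examples:

```shell
# Echo the command so the sketch is safe to run as-is; drop the echo to execute.
echo elasticdump \
  --input="http://user:password@source_node:9200/docker-daemon" \
  --output=/data/docker-daemon-nginx.json \
  --limit=500 \
  --searchBody='{"query":{"term":{"containerName":"nginx"}}}'
```

Note the single quotes around the --searchBody value: they keep the shell from mangling the double quotes inside the JSON query.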
A full list of these can be found on the official elasticdump page.

Final Considerations

Moving Elasticsearch indexes across nodes and clusters should not be a burden, and elasticdump proves that.
The tool is well documented and easy to use, so if you need to move Elasticsearch indexes around, look no further.
As mentioned in the introduction, there are other alternatives, such as the Logstash elasticsearch input plugin or the Elasticsearch Reindex API, but these are out of the scope of this article.
