Kelson Martins Blog

Elastic introduced Elasticsearch Cross Cluster Search on version 5.3, which will soon be replacing the now deprecated Tribe node.
Similar to the Tribe node, the goal of Elasticsearch Cross Cluster Search is to perform a federated search across multiple clusters and differently from the Tribe node, Cross Cluster Search node do not join a remote cluster.

Scenario

This post will demonstrate the use of Cross Cluster Search through an example scenario containing 2 clusters, having each cluster 2 nodes.
 I bootstrapped these 2 clusters locally, where in my scenario they have the following distribution:
Node
Address
Port
Transport Port
Cluster
elasticsearch01
127.0.0.1
9201
9301
America
elasticsearch02
127.0.0.1
9202
9302
America
elasticsearch03
127.0.0.1
9203
9303
Europe
elasticsearch04
127.0.0.1
9204
9304
Europe

Now let’s check their health:

[[email protected] elasticsearch02]# curl localhost:9201/_cluster/health?pretty
{
  "cluster_name" : "America",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 6,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
Note that above we have our cluster “America” running with 2 nodes.
Similarly, “Europe” cluster can be seen below also running with 2 nodes.
[[email protected] elasticsearch04]# curl localhost:9203/_cluster/health?pretty
{
  "cluster_name" : "Europe",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 6,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
These clusters are working independently and up to this point, no cross communication happens.
To work further on our example, I inserted a message on each cluster on an index named “logstash-ccs-*”, and these messages can be seen as follow:
First, see a sample message from the cluster “America”
[[email protected] elasticsearch04]# curl -XGET localhost:9201/logstash-ccs-*/_search?pretty
{
  "took" : 85,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "logstash-ccs-2017.11.28",
        "_type" : "doc",
        "_id" : "LXEsAmABzrBCe4vHDNB9",
        "_score" : 1.0,
        "_source" : {
          "@version" : "1",
          "host" : "gateway",
          "@timestamp" : "2017-11-28T10:28:47.550Z",
          "@metdata" : {
            "ip_address" : "172.18.0.1"
          },
          "message" : "message america cluster",
          "port" : 43982
        }
      }
    ]
  }
}

Now, let’s check our event on the cluster “Europe”:

[[email protected] elasticsearch04]# curl -XGET localhost:9203/logstash-ccs-*/_search?pretty
{
  "took" : 14,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "logstash-ccs-2017.11.28",
        "_type" : "doc",
        "_id" : "T785AmABFtLcLj8V4ejd",
        "_score" : 1.0,
        "_source" : {
          "@version" : "1",
          "host" : "gateway",
          "@timestamp" : "2017-11-28T10:43:55.802Z",
          "@metdata" : {
            "ip_address" : "172.18.0.1"
          },
          "message" : "message europe cluster\r",
          "port" : 41146
        }
      }
    ]
  }
}
Now, what if we want to be able to perform queries against one single cluster but still retrieve data from remote clusters as well? With Cross Cluster Search we can.

Configuring Cross Cluster Search

There are 2 ways in which we can configure our cluster for cross-cluster search. They are:

1) Configuration via elasticsearch.yml

To do that, we must only list the remote clusters that should be connected to. In our example scenario, this would be achieved with the following snippet:
search:
    remote:
        america:
            seeds: 127.0.0.1:9301
            seeds: 127.0.0.1:9302
        europe:
            seeds: 127.0.0.1:9303
            seeds: 127.0.0.1:9304
Note
One important note when configuring Cross Cluster Search via elasticsearch.yml is that as of now, Elasticsearch node will not go up if the remote cluster that it should connect to is not up and running.
To exemplify, if I add the above snippet configuration into the elasticsearch.yml of one of the nodes in the “America” cluster, the node will not start if the “Europe” cluster is not up and running. So keep that in mind.
2)  The second approach (my personal choice) is to use the Cluster Settings API to add or remove remote nodes.
In our example scenario, this would be achieved by the following curl command:
curl -XPUT -H'Content-Type: application/json' localhost:9201/_cluster/settings -d '
{
    "persistent": {
        "search.remote": {
            "america": {
                "seeds": ["127.0.0.1:9301","127.0.0.1:9302"]
            },
            "europe": {
                "seeds": ["127.0.0.1:9303","127.0.0.1:9304"]
            }
        }
    }
}'
Let’s test it?

Validating Elasticsearch Cross Cluster Search

After configuring Cross Cluster Search, let’s validate our configuration. To do that, we can perform the following:
[[email protected] elasticsearch01]# curl -XGET -H 'Content-Type: application/json' localhost:9201/_remote/info?pretty

This command will request the information regarding the configured remote clusters. If the clusters were properly configured, you will get a response similar to:

[[email protected] elasticsearch01]# curl -XGET -H 'Content-Type: application/json' localhost:9201/_remote/info?pretty
{
  "america" : {
    "seeds" : [
      "127.0.0.1:9301",
      "127.0.0.1:9302"
    ],
    "http_addresses" : [
      "127.0.0.1:9201",
      "127.0.0.1:9202"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s"
  },
  "europe" : {
    "seeds" : [
      "127.0.0.1:9303",
      "127.0.0.1:9304"
    ],
    "http_addresses" : [
      "127.0.0.1:9203",
      "127.0.0.1:9204"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s"
  }
}
See that in the response we get data from both America and Europe Clusters, each with 2 connected nodes respectively, validating our configuration.
From this point forward,  we are able to perform requests that gather data from the remote cluster and to do that, we may use the following approach in our Elasticsearch curl requests:
GET /cluster_name:index,cluster_name:index/_search
GET */index/_search

In our scenario, this can be translated into the following:

curl -XGET localhost:9201/america:logstash-ccs-*,europe:logstash-ccs-*/_search?pretty

Or even better:

curl -XGET localhost:9201/*:logstash-ccs-*/_search?pretty

To prove that, let’s execute the second approach, using the wildcard ‘*’ to state that our query must get results from all remote hosts.

[[email protected] elasticsearch01]# curl -XGET localhost:9201/*:logstash-ccs-*/_search?pretty
{
  "took" : 40,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "america:logstash-ccs-2017.11.28",
        "_type" : "doc",
        "_id" : "LXEsAmABzrBCe4vHDNB9",
        "_score" : 1.0,
        "_source" : {
          "@version" : "1",
          "host" : "gateway",
          "@timestamp" : "2017-11-28T10:28:47.550Z",
          "@metdata" : {
            "ip_address" : "127.0.0.1"
          },
          "message" : "message america cluster",
          "port" : 43982
        }
      },
      {
        "_index" : "europe:logstash-ccs-2017.11.28",
        "_type" : "doc",
        "_id" : "T785AmABFtLcLj8V4ejd",
        "_score" : 1.0,
        "_source" : {
          "@version" : "1",
          "host" : "gateway",
          "@timestamp" : "2017-11-28T10:43:55.802Z",
          "@metdata" : {
            "ip_address" : "127.0.0.1"
          },
          "message" : "message europe cluster\r",
          "port" : 41146
        }
      }
    ]
  }
}

Note that we receive 2 results, one from each cluster as identified by the “_index” field of each result.

Disabling Cross Cluster Search

If you configured Cross Cluster Search through the Cluster Settings API, you may remove remote clusters by setting its seeds to null.
In our scenario, this would be achieved by the following:
[[email protected] elasticsearch01]# curl -XPUT -H 'Content-Type: application/json' localhost:9201/_cluster/settings -d '
{
    "persistent": {
        "search.remote": {
            "america": {
                "seeds": null
            },
            "europe": {
                "seeds": null
            }
        }
    }
}'

Upon execution, we can validate the command requesting the remote cluster information with the following:

[[email protected] elasticsearch04]# curl -XGET -H'Content-Type: application/json' localhost:9201/_remote/info?pretty

Conclusion

This post aimed to provide a quick introduction to Elasticsearch Cross Cluster Search capabilities.
Although I believe that the main topics were covered, reading the official documentation will surely equip you with new tricks that were out of the scope of this post.
Kibana Cross Search is also possible but this will be covered in a future post.
As a final note, you may find value as I did from the following Slide Deck created by Luca Cavanna (Elastic Engineer), containing valuable information on inner workings of Cross Cluster Search.

Software engineer, geek, traveler, wannabe athlete and a lifelong learner. Works at @IBM

Next Post