ES Server - Elasticsearch v2.0.0

  • cluster.routing.allocation.cluster_concurrent_rebalance: 2
    • Determines how many shards may be rebalanced concurrently. Set it according to the hardware in use (number of CPUs, I/O capacity, etc.); a poorly chosen value can hurt Elasticsearch indexing performance. The default is 2, meaning that at most 2 shards can be moving at any point in time. Keeping this value low throttles shard rebalancing so that it does not affect indexing.
  • index.store.throttle.max_bytes_per_sec: 10mb
    • Controls the maximum number of bytes per second written to the file system. I was using small documents, around 10 KB each; this value can be increased once the documents get bigger.
  • index.number_of_replicas: 0
    • Disables replica allocation so that the cluster can run with a single node.
  • index.routing.allocation.total_shards_per_node: 2
    • The maximum number of shards (replicas and primaries) of the index that will be allocated to a single node. The default is unbounded. This imposes a hard limit, which can result in some shards not being allocated, so use it with caution. It can be raised later to allow more shards and help the search feature, because replica shards can serve queries alongside the primaries, while index and delete requests are applied on the primary first and then replicated.
  • index.refresh_interval: 5s
    • Indexing performs better if you leave refresh enabled. This is because ES has a separate refresh thread that does the flushing, instead of having your bulk indexing threads do it when RAM is full.
  • indices.cluster.send_refresh_mapping: false
    • When the index manager sends a node an index request to process, the node updates its own mapping and then sends that mapping to the master. While the master processes it, the node may receive a cluster state that includes an older version of the mapping. A conflict is not harmful (the cluster state will eventually have the correct mapping), but by default the node sends a refresh to the master just in case. To make index requests more efficient, we have set this property to false on our data nodes. We are currently running a single node, so we do not need the refresh enabled.
  • node.max_local_storage_nodes: 1
    • Allows at most one node to start from the same data path on this machine, so a second local node cannot be started by accident.
  • action.destructive_requires_name: true
    • The delete index API can also be applied to more than one index, or to all indices by using _all or * as the index name. Setting this property to true requires explicit index names, which prevents deleting all indices via wildcards or _all.
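
The hard limit imposed by index.routing.allocation.total_shards_per_node is easier to reason about with a little arithmetic. The following is a minimal sketch in plain Java, with no Elasticsearch dependency; the class and method names are hypothetical, and the index with 5 primary shards is only an assumed example. It computes a lower bound on the number of shard copies that stay unassigned, since other allocation rules may leave even more unallocated.

```java
/**
 * Back-of-the-envelope check for index.routing.allocation.total_shards_per_node.
 * Hypothetical helper for illustration, not part of Elasticsearch.
 */
final class ShardLimitSketch {

    /** Total shard copies of an index: each primary plus its replicas. */
    static int totalShardCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    /**
     * Lower bound on the shard copies left unassigned when each node may
     * hold at most shardsPerNodeLimit copies of the index.
     */
    static int minUnassigned(int primaries, int replicasPerPrimary,
                             int nodes, int shardsPerNodeLimit) {
        int capacity = nodes * shardsPerNodeLimit;
        return Math.max(0, totalShardCopies(primaries, replicasPerPrimary) - capacity);
    }

    public static void main(String[] args) {
        // One node, limit of 2, no replicas (the values listed above);
        // the 5 primary shards are an assumed example.
        System.out.println(minUnassigned(5, 0, 1, 2)); // prints 3
        // An index with 2 primaries fits within the limit.
        System.out.println(minUnassigned(2, 0, 1, 2)); // prints 0
    }
}
```

With the values above, an index needs at most 2 shards in total to be fully allocated on the single node; anything larger requires raising the limit or adding nodes.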

ES Java Client - Elasticsearch v2.0.0

  • TransportClient instead of NodeClient
    • Connects to the cluster without acting as a new node, which reduces the noise in the cluster and allows faster requests.
  • Send bulk requests in parallel and do not wait for ES responses
    • ES executes the bulk requests in the background and takes time to send back a response, so just keep sending bulk requests and let the listeners parse the responses in another thread.
  • BulkProcessor with a BulkProcessor.Listener instead of client.prepareBulk()
    • The BulkProcessor builder lets you configure the size of the bulk request and other flush parameters, and the listener receives the responses.
  • BulkProcessor actions: 1000-10000 (current is 5K)
    • Do not set more than 10K actions per bulk request; it is not recommended.
  • BulkProcessor with bulk size in MB
    • Flushes the buffer automatically when it reaches X MB, even if it contains fewer than N actions.
  • BulkProcessor with concurrentRequests enabled (x >= 1)
    • To avoid blocking threads on bulk requests, it is recommended to set this property to x = 4 * num_available_cores.
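
To make the flush rules above concrete, here is a minimal sketch in plain Java of a buffer that flushes on an action count or a byte size, whichever is reached first, plus the 4 * cores rule of thumb. It is a toy model, not the Elasticsearch client: the class BulkBufferSketch and its methods are hypothetical, and the thresholds in main are made-up values. In the 2.x Java client the corresponding knobs are, as far as I know, the setBulkActions, setBulkSize and setConcurrentRequests options of the bulk processor builder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Toy model of a bulk buffer that flushes on an action-count or byte-size
 * threshold, whichever is reached first. Hypothetical class for illustration,
 * not part of the Elasticsearch client.
 */
final class BulkBufferSketch {
    private final int maxActions;                  // e.g. 5000, as in the post
    private final long maxBytes;                   // the "X MB" threshold, in bytes
    private final Consumer<List<String>> onFlush;  // receives each flushed batch

    private List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    BulkBufferSketch(int maxActions, long maxBytes, Consumer<List<String>> onFlush) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.onFlush = onFlush;
    }

    /** Adds one document and flushes if either threshold is reached. */
    void add(String doc) {
        pending.add(doc);
        pendingBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Hands the pending batch to the callback and starts a new one. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        onFlush.accept(batch);
    }

    /** The rule of thumb from the post: 4 concurrent requests per available core. */
    static int suggestedConcurrency(int availableCores) {
        return 4 * availableCores;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // Tiny thresholds so the behaviour is visible: 3 actions or 1024 bytes.
        BulkBufferSketch buffer = new BulkBufferSketch(3, 1024, b -> batchSizes.add(b.size()));
        for (int i = 0; i < 7; i++) {
            buffer.add("{\"id\":" + i + "}");
        }
        buffer.flush(); // flush the remainder
        System.out.println(batchSizes); // prints [3, 3, 1]
        System.out.println(suggestedConcurrency(Runtime.getRuntime().availableProcessors()));
    }
}
```

The callback stands in for the listener: in a real setup it would run on another thread so that sending the next batch never waits on parsing the previous response.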




Felipe Forbeck
Senior Developer, Java, Scala, Akka, NoSQL dbs, Brazilian Jiu-Jitsu.