# Fault Recovery


## Recovery from yellow state

A `yellow` cluster status means that all primary shards are allocated but at least one replica shard is unassigned.

**Query index status**

```
curl -s -XGET 'http://<host>:9200/_cat/indices?v'
curl -s -XGET 'http://<host>:9200/_cluster/health?level=indices'
```
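To list only the problem indices, you can narrow the `_cat/indices` output with the `h` (columns) parameter and filter it. The sketch below is a minimal example under these assumptions: a POSIX shell, and a response fetched with `?h=health,index` and no `v` header row, so each line holds exactly two fields. The function name `non_green_indices` is hypothetical.

```shell
#!/bin/sh
# non_green_indices (hypothetical helper): reads "health index" lines on stdin,
# as returned by _cat/indices?h=health,index, and prints every index whose
# health is not green, as "health index" pairs.
non_green_indices() {
  awk '$1 != "green" { print $1, $2 }'
}

# Example (assumes a reachable cluster at <host>):
#   curl -s 'http://<host>:9200/_cat/indices?h=health,index' | non_green_indices
```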

**Query unassigned shards**

```
curl -s -XGET 'http://<host>:9200/_cat/shards?v' | grep UNASSIGNED
curl -s -XGET 'http://<host>:9200/_cluster/health?level=shards'
```
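Grepping the default `_cat/shards` output works, but requesting specific columns also shows why each shard is unassigned. A minimal sketch, assuming the response was fetched with `?h=index,shard,prirep,state,unassigned.reason` (no `v` header row); the helper name `unassigned_shards` is hypothetical.

```shell
#!/bin/sh
# unassigned_shards (hypothetical helper): reads lines of
# "index shard prirep state unassigned.reason" on stdin and prints
# "index shard prirep reason" for every shard in the UNASSIGNED state.
# prirep is "p" for a primary shard and "r" for a replica.
unassigned_shards() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# Example (assumes a reachable cluster at <host>):
#   curl -s 'http://<host>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | unassigned_shards
```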

* **Unreasonable index replica setting**

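Elasticsearch never places two copies of the same shard on one node, so each shard needs one data node for its primary plus one for each replica. If `number_of_replicas` is set so that this total exceeds the number of data nodes (that is, `number_of_replicas` is greater than or equal to the number of data nodes), the extra replicas cannot be allocated and the cluster stays `yellow`. To fix this, reduce the number of replicas to at most the number of data nodes minus one.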

```
# unassigned_index: name of the index that has unassigned replica shards
# number_of_replicas: the new replica count, at most (number of data nodes - 1);
# the value 1 below is only an example
curl -XPUT \
  'http://<host>:9200/unassigned_index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{
    "index": {
      "number_of_replicas": 1
    }
  }'
```
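The replica bound described above can be computed rather than worked out by hand. This is a sketch under stated assumptions: the data node count is passed in as an argument (for example, taken from the `number_of_data_nodes` field of `_cluster/health`), and the usage example assumes `jq` is installed. Both function names are hypothetical.

```shell
#!/bin/sh
# max_replicas (hypothetical helper): given the number of data nodes, print the
# largest number_of_replicas that can be fully allocated. One primary plus
# N replicas needs N + 1 distinct data nodes.
max_replicas() {
  data_nodes=$1
  if [ "$data_nodes" -lt 1 ]; then
    echo 0
  else
    echo $((data_nodes - 1))
  fi
}

# replicas_body (hypothetical helper): print the JSON body for
# PUT /<index>/_settings that sets number_of_replicas to $1.
replicas_body() {
  printf '{"index": {"number_of_replicas": %d}}\n' "$1"
}

# Example (assumes jq is installed and the cluster is reachable at <host>):
#   nodes=$(curl -s 'http://<host>:9200/_cluster/health' | jq -r .number_of_data_nodes)
#   curl -XPUT 'http://<host>:9200/unassigned_index/_settings' \
#     -H 'Content-Type: application/json' \
#     -d "$(replicas_body "$(max_replicas "$nodes")")"
```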

Normally, once the replica count is valid, Elasticsearch allocates the unassigned replica shards automatically and the cluster status returns to `green`. In some cases, you may need to allocate an unassigned replica shard manually with the `allocate_replica` command of the Reroute API.

```
# unassigned_index: name of the index that has the unassigned replica shard
# "shard": number of the unassigned shard (0, 1, ...), as listed by _cat/shards
# "node": target node name, or a node ID such as kVWViI1PQt2Bk2rP7PlrbQ
# The index, shard, and node values below are examples.
curl -XPOST \
  'http://<host>:9200/_cluster/reroute' \
  -H 'Content-Type: application/json' \
  -d '{
    "commands": [{
      "allocate_replica": {
        "index": "unassigned_index",
        "shard": 0,
        "node": "nodeName"
      }
    }]
  }'
```
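When the index, shard number, and target node come from diagnostic output, generating the reroute body can be less error-prone than editing JSON by hand. A minimal sketch; `reroute_replica_body` is a hypothetical helper, and it assumes the index and node names contain no characters that need JSON escaping. The `shard` value is emitted as a JSON number.

```shell
#!/bin/sh
# reroute_replica_body (hypothetical helper): print a _cluster/reroute body
# with a single allocate_replica command.
#   $1 = index name, $2 = shard number, $3 = target node name or node ID
# Assumes the names need no JSON escaping (no quotes or backslashes).
reroute_replica_body() {
  printf '{"commands": [{"allocate_replica": {"index": "%s", "shard": %d, "node": "%s"}}]}\n' \
    "$1" "$2" "$3"
}

# Example (assumes a reachable cluster at <host>):
#   curl -XPOST 'http://<host>:9200/_cluster/reroute' \
#     -H 'Content-Type: application/json' \
#     -d "$(reroute_replica_body unassigned_index 0 kVWViI1PQt2Bk2rP7PlrbQ)"
```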

The cluster attempts to allocate a shard at most `index.allocation.max_retries` times in a row (5 by default) before giving up and leaving the shard unassigned. Repeated failures like this can be caused by structural problems, such as an analyzer that refers to a stop word file that does not exist on all nodes. Once the problem has been fixed, you can retry allocation manually by calling the [Reroute API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html) with the `retry_failed` parameter, which attempts a single retry round for these shards.

```
curl -XPOST 'http://<host>:9200/_cluster/reroute?retry_failed=true'
```
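Before retrying, it can help to check that some shards actually failed allocation. A sketch, assuming the `_cat/shards` output was requested with `?h=index,shard,prirep,state,unassigned.reason` (no `v` header row), and that shards whose allocation attempts failed typically report the reason `ALLOCATION_FAILED`. The helper `failed_allocations` is hypothetical.

```shell
#!/bin/sh
# failed_allocations (hypothetical helper): reads
# "index shard prirep state unassigned.reason" lines on stdin and prints
# "index shard prirep" for unassigned shards whose reason is ALLOCATION_FAILED.
failed_allocations() {
  awk '$4 == "UNASSIGNED" && $5 == "ALLOCATION_FAILED" { print $1, $2, $3 }'
}

# Example (assumes a reachable cluster at <host>): retry only if something failed.
#   failed=$(curl -s 'http://<host>:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' \
#     | failed_allocations)
#   if [ -n "$failed" ]; then
#     curl -XPOST 'http://<host>:9200/_cluster/reroute?retry_failed=true'
#   fi
```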

In more severe failures, primary shards can also become unassigned, which turns the cluster status `red`. To allocate primary shards manually, refer to [Reroute](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html).
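For unassigned primaries, the Reroute API provides the `allocate_stale_primary` and `allocate_empty_primary` commands. Both require `"accept_data_loss": true`, because a stale copy may be missing recent writes and an empty primary discards the shard's data entirely. The sketch below only builds an `allocate_stale_primary` body; the helper name is hypothetical, and you should check the command against the Reroute documentation for your Elasticsearch version before sending it.

```shell
#!/bin/sh
# stale_primary_body (hypothetical helper): print a _cluster/reroute body that
# promotes a stale copy of a primary shard on the given node.
#   $1 = index name, $2 = shard number, $3 = node name or node ID
# accept_data_loss must be true: writes missing from the stale copy are lost.
stale_primary_body() {
  printf '{"commands": [{"allocate_stale_primary": {"index": "%s", "shard": %d, "node": "%s", "accept_data_loss": true}}]}\n' \
    "$1" "$2" "$3"
}

# Example (assumes a reachable cluster at <host>; destructive, review first):
#   curl -XPOST 'http://<host>:9200/_cluster/reroute' \
#     -H 'Content-Type: application/json' \
#     -d "$(stale_primary_body unassigned_index 0 nodeName)"
```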

* **Node disk usage exceeds threshold**

For how node disk usage affects shard allocation, refer to [Disk-based Shard Allocation](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/disk-allocator.html).
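To find the nodes that are close to or past a disk threshold, you can filter the `_cat/allocation` output. A sketch, assuming the response was fetched with `?h=node,disk.percent` (no `v` header row) so each line is a node name followed by an integer percentage; the helper name `nodes_over_disk_threshold` is hypothetical.

```shell
#!/bin/sh
# nodes_over_disk_threshold (hypothetical helper): reads "node disk.percent"
# lines on stdin, as returned by _cat/allocation?h=node,disk.percent, and
# prints each node whose disk usage is at or above the threshold in $1 (percent).
# Rows with an empty percentage (such as the UNASSIGNED summary row) are skipped.
nodes_over_disk_threshold() {
  awk -v limit="$1" '$2 != "" && $2 + 0 >= limit + 0 { print $1, $2 }'
}

# Example (assumes a reachable cluster at <host>):
#   curl -s 'http://<host>:9200/_cat/allocation?h=node,disk.percent' \
#     | nodes_over_disk_threshold 85
```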

If high disk usage is leaving shards unassigned, you can temporarily relax the disk usage policy (the `cluster.routing.allocation.disk.watermark.*` settings) for short-term relief, or add more data nodes.
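Relaxing the policy means raising the disk watermarks through the cluster settings API. A minimal sketch that builds the request body, assuming transient settings as used in 6.x (recent versions deprecate transient settings in favor of persistent ones); the helper name and the example percentages are hypothetical. The low watermark should stay at or below the high one, and the original values should be restored (for example, by setting them to `null`) once disk space has been freed.

```shell
#!/bin/sh
# watermark_body (hypothetical helper): print a _cluster/settings body that
# sets the low and high disk watermarks as transient settings.
#   $1 = low watermark (for example 90%), $2 = high watermark (for example 95%)
watermark_body() {
  printf '{"transient": {"cluster.routing.allocation.disk.watermark.low": "%s", "cluster.routing.allocation.disk.watermark.high": "%s"}}\n' \
    "$1" "$2"
}

# Example (assumes a reachable cluster at <host>):
#   curl -XPUT 'http://<host>:9200/_cluster/settings' \
#     -H 'Content-Type: application/json' \
#     -d "$(watermark_body 90% 95%)"
```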

Additionally, if you are certain that some historical indices are no longer needed, you can delete them to free disk space and restore the cluster status.
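Because deletion is irreversible, a dry run that only lists candidate indices is a safer first step. This sketch assumes a hypothetical naming convention in which time-based indices end in a `YYYY.MM.DD` date (such as `logs-2018.01.15`) and that names were fetched with `_cat/indices?h=index`; adapt the pattern to your own index names. The helper name `indices_older_than` is hypothetical.

```shell
#!/bin/sh
# indices_older_than (hypothetical helper): reads index names on stdin and
# prints those whose trailing YYYY.MM.DD date is earlier than the cutoff in $1
# (given as YYYYMMDD). Names without such a suffix are ignored.
indices_older_than() {
  awk -v cutoff="$1" '
    match($1, /[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) {
      d = substr($1, RSTART, RLENGTH)
      gsub(/\./, "", d)
      if (d + 0 < cutoff + 0) print $1
    }'
}

# Example (assumes a reachable cluster at <host>): review the list first,
# then delete each index explicitly.
#   curl -s 'http://<host>:9200/_cat/indices?h=index' | indices_older_than 20180201
#   curl -XDELETE 'http://<host>:9200/logs-2018.01.15'
```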