Elasticsearch is a distributed, open-source, near-real-time full-text search and analytics engine, mostly used because it retrieves data quickly from a huge pile of records. It is a search engine built on Apache Lucene.
So, let’s say we run a search query in Elasticsearch like :-
"field" : "value_to_match"
and suppose more than 10,000 records match. A plain search returns only the first page of hits (10 by default), and paging deeper with from and size fails once from + size goes past 10,000: Elasticsearch rejects the request with an error instead of returning the result. This built-in limit (the index.max_result_window setting, 10,000 by default) exists because serving such deep pages can take a lot of heap memory and can be dangerous.
So, Elasticsearch has a feature to overcome this issue, known as the Scroll API.
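To make the limit concrete, here is a minimal Python sketch of the check. The 10,000 figure is the default of the index.max_result_window setting and it applies to from + size; the function name is made up for illustration and this is not Elasticsearch's own code.

```python
# Default value of the index.max_result_window index setting.
DEFAULT_MAX_RESULT_WINDOW = 10_000


def exceeds_result_window(from_, size, max_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return True when a from/size page reaches past the result window."""
    return from_ + size > max_window


print(exceeds_result_window(0, 10))      # an ordinary first page
print(exceeds_result_window(9_990, 10))  # the last page that still fits
print(exceeds_result_window(9_995, 10))  # reaches past record 10,000
```

Only the third call reports a problem, because 9,995 + 10 is larger than 10,000.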
While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
The steps can be broken down as follows :-
1.) First you have to define a batch size and create a scroll context. Here, batch refers to the number of documents Elasticsearch will return in each Scroll API hit.
2.) Let’s say you have a result with 10 records (just for example) and you want to use the Scroll API in this case with a batch size of 2 records.
3.) So, the first step is to create a scroll context like :-
Here “size” refers to the batch size and “1m” is the context time, i.e. one minute. The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value does not need to be long enough to process all the data; it just needs to be long enough to process the previous batch of results. Each scroll request (with the scroll parameter) sets a new expiry time.
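A minimal sketch of what such an initial request could look like, in Python using only the standard library. The host http://localhost:9200, the index name my_index and the helper name are assumptions for illustration; the match query reuses the field/value placeholder from earlier. The helper only builds the request, it does not send it.

```python
import json
from urllib.parse import quote, urlencode


def build_initial_scroll_request(base_url, index, query, batch_size=2, keep_alive="1m"):
    """Build (method, url, body) for the search that opens a scroll context."""
    # The scroll keep-alive travels as a URL parameter of the search request.
    params = urlencode({"scroll": keep_alive})
    url = f"{base_url.rstrip('/')}/{quote(index)}/_search?{params}"
    # "size" is the batch size: how many documents each scroll hit returns.
    body = json.dumps({"size": batch_size, "query": query})
    return "POST", url, body


method, url, body = build_initial_scroll_request(
    "http://localhost:9200",                 # assumed local node
    "my_index",                              # hypothetical index name
    {"match": {"field": "value_to_match"}},  # placeholder query
)
print(method, url)
print(body)
```

Sending the request is left out on purpose so the sketch works with any HTTP client.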
4.) The result of the above query is as follows :-
5.) The result can be explained as follows :-
–> “_scroll_id” :- the scroll id, which is passed as a parameter to get the next set of results.
–> “hits.total” :- total number of records matching the given search query.
–> “hits.hits” :- list containing the current batch of records, which will hold 2 records in this case as the batch size is specified as 2.
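These fields can be pulled out of a response roughly like this. The sample response is invented and trimmed for illustration (a real one carries more fields), and the function name is hypothetical. The sketch also accounts for the shape of the total: a plain integer on older versions, an object with a value key on Elasticsearch 7 and later.

```python
def summarize_scroll_page(response):
    """Return (scroll_id, total, documents) from one scroll response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # Elasticsearch 7+ reports {"value": ..., "relation": ...}.
        total = total["value"]
    documents = [hit["_source"] for hit in hits["hits"]]
    return response["_scroll_id"], total, documents


# Invented sample: 10 matching records, the first batch of 2 returned.
sample_response = {
    "_scroll_id": "example-scroll-id",
    "hits": {
        "total": {"value": 10, "relation": "eq"},
        "hits": [
            {"_id": "1", "_source": {"field": "value_to_match"}},
            {"_id": "2", "_source": {"field": "value_to_match"}},
        ],
    },
}

scroll_id, total, documents = summarize_scroll_page(sample_response)
print(scroll_id, total, len(documents))
```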
6.) So, 8 records are still left to fetch after this query. In order to get the next set of data we have to query like this :-
The query has only two parameters: scroll, the context time for the next batch of data, and scroll_id, the scroll id received in the previous API hit. Result :-
The result of the above query gives the next set of records along with a scroll_id, which may differ from the previous one, so always use the most recently returned scroll_id. We make the above request again with that scroll_id to get the next set of data, and so on until a batch comes back empty, which means we have all the data records.
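A sketch of this follow-up request in Python, assuming a local node at http://localhost:9200 and a hypothetical helper name. Note that the request goes to _search/scroll without an index name, since the scroll id already identifies the search.

```python
import json


def build_next_scroll_request(base_url, scroll_id, keep_alive="1m"):
    """Build (method, url, body) for fetching the next batch of a scroll."""
    url = f"{base_url.rstrip('/')}/_search/scroll"
    # Only two parameters: the renewed keep-alive and the previous scroll id.
    body = json.dumps({"scroll": keep_alive, "scroll_id": scroll_id})
    return "POST", url, body


next_method, next_url, next_body = build_next_scroll_request(
    "http://localhost:9200",  # assumed local node
    "example-scroll-id",      # placeholder for the id from the previous hit
)
print(next_method, next_url)
print(next_body)
```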
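The whole loop can be sketched in Python as below. This is a hedged illustration, not the script mentioned in this post: start_search and fetch_next are placeholder callables for whatever actually talks to Elasticsearch, and the in-memory fake_page function stands in for a cluster so the sketch runs on its own.

```python
def scroll_all(start_search, fetch_next):
    """Yield every hit, batch by batch, until a batch comes back empty.

    start_search() returns the first response and fetch_next(scroll_id)
    returns the following one. Both are hypothetical placeholders.
    """
    response = start_search()
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always continue with the most recently returned scroll id.
        response = fetch_next(response["_scroll_id"])


# In-memory stand-in for a cluster: 10 fake records served 2 at a time.
RECORDS = [{"_id": str(n)} for n in range(10)]
BATCH_SIZE = 2


def fake_page(offset):
    """Mimic a scroll response; the fake scroll id encodes the next offset."""
    return {
        "_scroll_id": str(offset + BATCH_SIZE),
        "hits": {"hits": RECORDS[offset:offset + BATCH_SIZE]},
    }


all_hits = list(scroll_all(lambda: fake_page(0), lambda sid: fake_page(int(sid))))
print(len(all_hits))
```

Once all batches are read, the scroll context can be released early with a DELETE request to _search/scroll carrying the scroll_id, rather than waiting for it to expire.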
I have written a Python script that implements the Scroll API and shows a demo of it.
You can find the code at this link.
Hope you find the article useful and helpful.
Happy Hacking 🙂