Elasticsearch : Diving into the Scroll API for handling huge data records !!

Elasticsearch is an open-source, distributed, near-real-time full-text search and analytics engine, used mostly because it retrieves data quickly from a huge pile of records. It is a search engine built on top of Apache Lucene.

So, let’s say we run a search query in Elasticsearch like this :-

POST /index_name/_search
{
  "query": {
    "match": {
      "field": "value_to_match"
    }
  }
}

and suppose more than 10,000 records match. A regular search returns only one page of hits (10 by default), and paging deeper with "from" and "size" fails with an error once from + size goes beyond 10,000, which is the default value of the index.max_result_window setting. Elasticsearch has this inbuilt limit because returning results that deep can take a lot of heap memory and can be dangerous.

So, Elasticsearch has a feature to overcome this issue, known as the Scroll API.
While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

The steps can be broken down as follows :-
1.) First you have to define a batch size and create a scroll context. Here, batch refers to the number of documents Elasticsearch will return in each Scroll API hit.
2.) Let’s say you have a result with 10 records (just for example) and you want to use the Scroll API with a batch size of 2 records.
3.) So, first step is to create a scroll context like :-
elastic_1

Here “size” is the batch size and ‘1m’ is the scroll keep-alive time (one minute). The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value does not need to be long enough to process all data; it just needs to be long enough to process the previous batch of results. Each scroll request (with the scroll parameter) sets a new expiry time.
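
Since the screenshot may not always be visible, here is a minimal Python sketch of how such a request could be put together. The helper name build_initial_scroll_request is made up for this post, and the index name and query are the placeholder values from the search example earlier; it only builds the request and does not talk to a cluster.

```python
import json


def build_initial_scroll_request(index, query, batch_size=2, keep_alive="1m"):
    """Return (method, path, body) for the search that opens a scroll context.

    Hypothetical helper for illustration: it only assembles the request.
    """
    # The keep-alive time goes in the `scroll` query-string parameter.
    path = f"/{index}/_search?scroll={keep_alive}"
    # `size` is the batch size: how many documents come back per request.
    body = {"size": batch_size, "query": query}
    return "POST", path, json.dumps(body)


method, path, body = build_initial_scroll_request(
    "index_name", {"match": {"field": "value_to_match"}}
)
print(method, path)
print(body)
```

The same body could be sent with any HTTP client or pasted into a REST console.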

4.) The result of the above query is as follows :-
elastic_2

5.) The result can be explained as follows :-
–> “_scroll_id” :- scroll id which is used as a parameter to get the next set of results.
–> “hits.total” :- total number of records matching the given search query.
–> “hits.hits” :- list containing the current batch of data, which holds 2 records in this case as the batch size is specified as 2.
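
To make the fields above concrete, here is a small Python sketch that pulls them out of a response. The sample response is made up for illustration (the scroll id and document contents are not real output), and the helper name is hypothetical. One assumption worth flagging: newer Elasticsearch versions report the total as an object with a "value" key instead of a plain number, so the sketch handles both shapes.

```python
def summarize_scroll_response(response):
    """Return (scroll_id, total, batch) from a search/scroll response dict."""
    scroll_id = response["_scroll_id"]
    total = response["hits"]["total"]
    # Newer versions return {"value": N, "relation": "eq"}; older ones a plain int.
    if isinstance(total, dict):
        total = total["value"]
    batch = response["hits"]["hits"]
    return scroll_id, total, batch


# Made-up sample: 10 matching records, batch size 2.
sample = {
    "_scroll_id": "example-scroll-id",
    "hits": {
        "total": 10,
        "hits": [
            {"_id": "1", "_source": {"field": "value_to_match"}},
            {"_id": "2", "_source": {"field": "value_to_match"}},
        ],
    },
}

scroll_id, total, batch = summarize_scroll_response(sample)
print(scroll_id, total, len(batch))
```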

6.) So, after this query there are 8 records still left to fetch. To get the next set of data, we have to query like this :-
elastic_3

The query has only two parameters: the scroll keep-alive time for the next batch of data and the scroll_id received in the previous API hit. Result :-
elastic_5.png

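The result of the above query gives the next batch of results along with a scroll_id, which may differ from the previous one. So we have to make the above request again, always with the most recently returned scroll_id, to get the next batch, and so on until a response comes back with no hits, which means all the data records have been fetched.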
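
The whole loop can be sketched in Python as below. This is only a sketch under a few assumptions: send is a hypothetical function that performs the HTTP call and returns the parsed JSON (here it is replaced by an in-memory fake so the example runs without a cluster), and the final DELETE to clear the scroll context is an extra step not covered in the walkthrough above, added because it frees the context without waiting for the keep-alive to expire.

```python
def scroll_all(send, index, query, batch_size=2, keep_alive="1m"):
    """Yield every matching hit, following scroll ids until a batch is empty.

    `send(method, path, body)` is a hypothetical transport callable that
    returns the parsed JSON response as a dict.
    """
    response = send("POST", f"/{index}/_search?scroll={keep_alive}",
                    {"size": batch_size, "query": query})
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            # Ask for the next batch with the latest scroll id.
            response = send("POST", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = response["_scroll_id"]
    finally:
        # Extra step: clear the scroll context once we are done.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


def make_fake_send(docs):
    """In-memory stand-in for Elasticsearch that serves `docs` in batches."""
    state = {"pos": 0, "size": 0, "calls": []}

    def send(method, path, body):
        state["calls"].append((method, path))
        if method == "DELETE":
            return {"succeeded": True}
        if "size" in body:  # the initial search request sets the batch size
            state["size"] = body["size"]
        start = state["pos"]
        batch = docs[start:start + state["size"]]
        state["pos"] = start + len(batch)
        return {"_scroll_id": f"fake-scroll-{state['pos']}",
                "hits": {"total": len(docs), "hits": batch}}

    return send, state


docs = [{"_id": str(i)} for i in range(10)]
send, state = make_fake_send(docs)
hits = list(scroll_all(send, "index_name", {"match": {"field": "value_to_match"}}))
print(len(hits), "hits fetched in", len(state["calls"]), "requests")
```

With 10 records and a batch size of 2, the fake transport sees one search request, five scroll requests (the last one returns an empty batch) and one clear request.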

I have written a Python script that implements and demonstrates scrolling in Elasticsearch.
You can find the code at this link.

Hope you find the article useful and helpful.

Happy Hacking 🙂

I am going to Akademy’17 !!

I am very happy to share that next month I will be attending Akademy 2017, the yearly KDE Community summit, which has been held since 2003 and which this year takes place in Almería, Spain, from July 22nd until July 27th, 2017.

I am very grateful to KDE for providing me this wonderful opportunity to meet and connect with awesome people from the KDE community. Moreover, KDE has also provided me financial help by accepting my travel request, so that I can travel to Spain without any financial problem.

I am planning to travel from 21 July 2017 to 27 July 2017. Another good part of this event is that KDE has also given me the opportunity to give a short talk (10 min) about my experience with the open source world and how I am contributing to the community. So, I will give a short talk titled “Getting started with GCompris”, which will highlight how and when I started contributing to GCompris, an awesome FOSS project under KDE.

akad
So, at last, I am once again very grateful to the KDE community for providing me this wonderful opportunity.
See you in Almería !!

SOK 2017 wrap-up !!

So, this month SoK 2017 came to an end, with a whole lot of learning & community bonding.
I must say that the four months of the season were awesome & I gained a lot from them.
Thanks to KDE for giving me the wonderful opportunity of working on an awesome open source project named GCompris.
GCompris is a high-quality educational software suite comprising numerous activities for children aged 2 to 10.

My mentors were :- Johnny Jazeix, Emmanuel Charruau & Sagar Aggarwal.
They were very helpful & cooperative, and always helped me in times of trouble.

My goals for the season were as follows:-
1.) The main aim of this activity is to assist & help children to memorize multiplication tables, addition & subtraction in a fun & competitive way.
2.) The children will basically be asked different sets of questions, which they have to answer in the space provided.
3.) There will be a single base activity called the Question & answer activity & three sub-activities. The focus of the SoK will be to have the generic activity and the 3 mathematical activities mentioned above.

Goals achieved :-
1.) Normal mode of the activity is complete. Questions are displayed to the user in a grid format with some space to write answers.
2.) School mode is also almost complete: the user can go to the settings window & select the questions to be displayed to the students. The selected questions will be displayed in a grid format, same as in normal mode.
3.) After writing answers to the questions, the user has to click on the finish button.
The total score & time taken will be displayed.

 

1.)  Home screen of Activity

screen1

 

2.) Here the user can write answers in the space provided

screen2

 

3.) Total score & time taken displayed for the user

screen3

 

4.) The user can select modes in the settings window

screen4

 

5.) In school mode a list will be displayed from which the user can choose questions

screen5

 

Task Pending :-
In school mode (where the user selects the questions), the selected questions should be displayed in a random order on each day of the week using the JS random function.

So, finally, I would like to conclude that Season of KDE proved to be a great source of learning for me. I learned & gained a lot from this wonderful opportunity.
I again want to thank KDE & the open source community for organizing such awesome events.

Cheers.

 

 

Thank you Season of KDE !!!

First of all, thanks to KDE for giving me the wonderful opportunity of working on an awesome open source project named GCompris under SoK 2016-17.
GCompris is a high-quality educational software suite comprising numerous activities for children aged 2 to 10.

My goals for this year's SoK are as follows :-

Work flow of the activity :-
1.) The main aim of this activity is to assist & help children to memorise multiplication tables, addition & subtraction in a fun & competitive way.
2.) The children will basically be asked different sets of questions, which they have to answer in the space provided.
3.) There will be a single base activity called the Question & answer activity & three sub-activities. The focus of the SoK will be to have the generic activity and the 3 mathematical activities mentioned above.

I am very excited & enthusiastic to grab this awesome opportunity provided by KDE to contribute to the open source community & learn from the awesome people there.
