Using Spot Instances in Amazon EMR without the risk of losing the job

We all want to cut the cost of machines running in the cloud, and one option Amazon provides for this is Spot Instances. But is it practical to use them for our EMR jobs?

You can bid on Spot Instances for your Hadoop jobs on EMR, but the usual objection is that there is always a risk of losing the machines and therefore of the job failing. That is not entirely correct, since EMR lets us launch a job with a few spot (task) nodes and a few core nodes.


The EC2 instances used to run an Elastic MapReduce job flow fall into one of three categories, or instance groups:

Master – The Master instance group contains a single EC2 instance. This instance schedules Hadoop tasks on the Core and Task nodes.

Core – The Core instance group contains one or more EC2 instances. These instances use HDFS to store the data for the job flow. They also run mapper and reducer tasks as specified in the job flow. This group can be expanded in order to accelerate a running job flow.

Task – The Task instance group contains zero or more EC2 instances and runs mapper and reducer tasks. Since they don’t store any data, this group can expand or contract during the course of a job flow.

You can choose either On-Demand or Spot Instances for each of your job flows, and this applies to all three instance groups above. However, as the definitions show, if you lose the master or a core machine then your job is bound to fail. Theoretically, you can have something like:

elastic-mapreduce --create --alive --plain-output \
--instance-group master --instance-type m1.small --instance-count 1 --bid-price 0.098 \
--instance-group core --instance-type m1.small --instance-count 10 --bid-price 0.028 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.018

Realistically, though, when you request spot instances, keep in mind that if the current spot price exceeds your maximum bid, the instances will either not be provisioned or be removed from the running job flow. So if the spot price rises above your bid at any time and you lose the MASTER node or any CORE node, the job will fail. Both CORE and TASK nodes run TaskTrackers, but only CORE nodes run DataNodes, so you need at least one CORE node.
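The failure condition above can be sketched as a small model. This is only an illustration, not EMR code: the function names are hypothetical, a group is treated as lost as soon as the spot price exceeds its bid, and the bids are the ones from the command above.

```python
# Simplified, hypothetical model of the rule described above:
# a spot group is lost when the spot price exceeds its bid, and the
# job flow fails if the master or core group is lost.

def surviving_groups(bids, spot_price):
    """Return the names of the groups still running at `spot_price`.

    `bids` maps group name -> maximum bid in USD/hour,
    or None when the group uses On-Demand instances.
    """
    return {
        name for name, bid in bids.items()
        if bid is None or spot_price <= bid
    }


def job_survives(bids, spot_price):
    """The job flow keeps running only if master and core are both up."""
    return {"master", "core"} <= surviving_groups(bids, spot_price)


# Every group on spot, with the bids from the example command.
all_spot = {"master": 0.098, "core": 0.028, "task": 0.018}
# Master and core On-Demand, only the task group on spot.
mixed = {"master": None, "core": None, "task": 0.018}

for price in (0.015, 0.025, 0.030):
    print(price, job_survives(all_spot, price), job_survives(mixed, price))
```

Once the price passes the core bid, the all-spot job flow is gone, while a job flow with On-Demand master and core nodes only loses its task capacity.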

To hedge against the complete loss of a job flow, multiple instance groups can be created where the `CORE` group is a smaller complement of traditional on-demand systems and the `TASK` group is the group of spot instances. In this configuration, the `TASK` group will only benefit the mapper phases of a job flow, as work from the `TASK` group is handed back up to the `CORE` group for reduction.

So if you have to run a job that would ideally need 40 slave machines, you can run, say, 10 machines (the CORE group) as traditional on-demand instances and the other 30 as spot instances (the TASK group). The syntax for creating the multiple instance groups is below:

elastic-mapreduce --create --alive --plain-output \
--instance-group master --instance-type m1.small --instance-count 1 \
--instance-group core --instance-type m1.small --instance-count 10 \
--instance-group task --instance-type m1.small --instance-count 30 --bid-price 0.018
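If you launch job flows from Python rather than the command line, the same layout might be expressed as the `InstanceGroups` structure that boto3's EMR `run_job_flow` call accepts. This is a sketch under assumptions: boto3 is not mentioned in this post, the helper name `instance_groups` is hypothetical, and the snippet only builds the structure without calling AWS.

```python
def instance_groups(core_count, task_count, task_bid,
                    instance_type="m1.small"):
    """Build an InstanceGroups list: On-Demand master and core, Spot task.

    Hypothetical helper; the dictionary keys follow the shape boto3's
    EMR run_job_flow expects under Instances["InstanceGroups"].
    """
    return [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_count},
        # Only the task group bids on the spot market.
        {"Name": "task", "InstanceRole": "TASK", "Market": "SPOT",
         "BidPrice": str(task_bid),  # the API takes the bid as a string
         "InstanceType": instance_type, "InstanceCount": task_count},
    ]


groups = instance_groups(core_count=10, task_count=30, task_bid=0.018)
for group in groups:
    print(group["InstanceRole"], group["Market"], group["InstanceCount"])

# Not executed here: the list would be passed to the EMR client as
#   Instances={"InstanceGroups": groups,
#              "KeepJobFlowAliveWhenNoSteps": True}
# along with the other arguments run_job_flow requires.
```

Keeping the bid out of the master and core entries is the whole point of the layout: those groups never depend on the spot price.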

This lets you save cost by running SPOT instances as part of your cluster while making sure that the job does not fail. However, keep in mind that, depending on your bid price and the time the job takes to complete, the SPOT instances may come and go, so in the worst case you might end up incurring the same cost and taking longer to complete the job. It all depends on your bid price, so choose it wisely.
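A rough back-of-the-envelope comparison shows where the savings and the worst case come from. The numbers are assumptions for illustration only: the On-Demand rate is hypothetical, the bid price stands in as an upper bound on the hourly spot charge, the master node is ignored, and run time is assumed to scale linearly with the number of slave nodes.

```python
def cluster_cost(hours, on_demand_nodes, spot_nodes,
                 on_demand_rate, spot_rate):
    """Total cost in USD for the slave nodes of one job flow."""
    hourly = on_demand_nodes * on_demand_rate + spot_nodes * spot_rate
    return hours * hourly


ON_DEMAND_RATE = 0.06  # hypothetical USD/hour, not taken from the post
SPOT_RATE = 0.018      # the task-group bid, used as an upper bound

# Assumption: the job takes 2 hours on 40 slave nodes.
all_on_demand = cluster_cost(2, 40, 0, ON_DEMAND_RATE, SPOT_RATE)
mixed = cluster_cost(2, 10, 30, ON_DEMAND_RATE, SPOT_RATE)

# Worst case: the spot nodes never contribute, so the 10 core nodes
# do all 80 node-hours of work alone and the job takes 8 hours.
degraded = cluster_cost(8, 10, 0, ON_DEMAND_RATE, SPOT_RATE)

print(f"all on-demand: ${all_on_demand:.2f} in 2 hours")
print(f"10 core + 30 spot: ${mixed:.2f} in 2 hours")
print(f"spot nodes lost throughout: ${degraded:.2f} in 8 hours")
```

Under these assumptions the degraded run costs the same as running everything On-Demand but takes four times as long, which is the worst case described above.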

Posted on December 26, 2012, in EMR, Hadoop.