Skip to main content

Posts

Showing posts from January, 2015

User based queue mapping for Capacity Scheduler

When I  started to use Capacity Scheduler hierarchical queue features on top of Hortonworks' HDP 2.0 I have immediately realized that I need automatic assignment of job to queue based on username. Sounds easy and useful? Yes! But could not find any configuration parameter and example for that. I found only references to use mapred.job.queuename config option. This can be configured in HIVE via set mapred.job.queuename=yourqueue or using -Dmapred.job.queuename=yourqueue as a hadoop command argument. After some hours of unavailing googling I have checked the corresponding code part and have been shocked. This is available only since HADOOP-2.6 (HDP-2.2). Check YARN-2411 for details. According to the CHANGELOG this is a relatively new feature. So sadly this is not available to me until an upgrade. :( See below an example based on YARN-2411 to use it in Hadoop 2.6 or higher for Hortonworks HDP-2.2 1. user1 is mapped to queue1, group1 is mapped to queue2: yarn.schedul

Hortonworks Hadoop HDP 2.0 lost default capacity scheduler config

As a result of my fault, and also result of strange behaviour of Ambari UI, I have overwritten my default capacity scheduler configuration data on my Hadoop Hortonworks HDP 2.0 cluster. Looking around I have found the xml file containing the original value as /var/lib/ambari-agent/cache/stacks/HDP/2.0._/services/YARN/configuration/capacity-scheduler.xml However on the UI you need a properties file style format. Here it is. yarn.scheduler.capacity.maximum-applications=10000 yarn.scheduler.capacity.maximum-am-resource-percent=0.2 yarn.scheduler.capacity.root.queues=default yarn.scheduler.capacity.root.capacity=100 yarn.scheduler.capacity.root.default.capacity=100 yarn.scheduler.capacity.root.default.user-limit-factor=1 yarn.scheduler.capacity.root.default.maximum-capacity=100 yarn.scheduler.capacity.root.default.state=RUNNING yarn.scheduler.capacity.root.default.acl_submit_jobs=* yarn.scheduler.capacity.root.default.acl_administer_jobs=* yarn.scheduler.capacity.root.acl_