
User-based queue mapping for Capacity Scheduler

When I started to use the Capacity Scheduler's hierarchical queue features on top of Hortonworks' HDP 2.0, I immediately realized that I needed automatic assignment of jobs to queues based on the username.

Sounds easy and useful? Yes! But I could not find any configuration parameter or example for it.

The only references I found were to the mapred.job.queuename config option. It can be set in Hive via set mapred.job.queuename=yourqueue, or passed as a hadoop command argument with -Dmapred.job.queuename=yourqueue.
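For example (a minimal sketch; the jar, main class and queue name are placeholders, and the command-line form assumes the job's driver parses generic options via ToolRunner):

-- in the Hive CLI
set mapred.job.queuename=yourqueue;

# as a hadoop command argument (hypothetical jar and class)
hadoop jar myjob.jar com.example.MyJob -Dmapred.job.queuename=yourqueue /input /output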

After some hours of fruitless googling I checked the corresponding code and was shocked: this is only available since Hadoop 2.6 (HDP 2.2). Check YARN-2411 for details. According to the CHANGELOG it is a relatively new feature, so sadly it is not available to me until an upgrade.

:(

Below is an example based on YARN-2411, usable on Hadoop 2.6 or higher (Hortonworks HDP 2.2):

1. user1 is mapped to queue1, group1 is mapped to queue2:

yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.queue-mappings=u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:

yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.queue-mappings=u:%user:%user
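The two styles can also be combined in one comma-separated list; as far as I can tell the mappings are evaluated left to right and the first match wins, so a specific user override should come before the generic %user rule (user1 and queue1 here are just placeholders):

yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.queue-mappings=u:user1:queue1,u:%user:%user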


Update: As a workaround I configured the default queue with a very low 10% capacity but the possibility to grow to 100%, and created another queue (prio) with 90% capacity (also up to 100%) that can be used only by a special user. I also enabled ProportionalCapacityPreemptionPolicy, so the system kills over-allocated resources in the default queue whenever the prio queue needs them.

Example config:
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=20000
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=10000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.acl_administer_jobs=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.acl_submit_jobs=*
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=10
yarn.scheduler.capacity.root.prio.acl_submit_applications=specialuser
yarn.scheduler.capacity.root.prio.acl_submit_jobs=specialuser
yarn.scheduler.capacity.root.prio.capacity=90
yarn.scheduler.capacity.root.prio.maximum-capacity=100
yarn.scheduler.capacity.root.prio.state=RUNNING
yarn.scheduler.capacity.root.prio.user-limit-factor=10
yarn.scheduler.capacity.root.queues=default,prio
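
With this config the special user still has to target the prio queue explicitly on submission, while everyone else lands in default. A rough sketch of how that looks (the examples jar name and the pi arguments are only illustrative):

# run as specialuser, submitting explicitly to the prio queue
hadoop jar hadoop-mapreduce-examples.jar pi -Dmapred.job.queuename=prio 10 100

# everyone else submits as usual and ends up in the default queue
hadoop jar myjob.jar com.example.MyJob /input /output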
