Tuesday, 4 August 2015

Ansible: Using multiple tags and untagged tag together

I have lots of Ansible playbooks with many roles in each. When you install different minor versions of the same software stack, there are only small differences between the steps, so it does not make much sense to copy-paste the whole role; I just wanted to use tags. The idea was to keep the common tasks untagged and to tag only the version-specific tasks. To make it clear, here is an example: if you have a long OS-related role that does SSH config, web config, database installation and creation and much more, but sometimes you need Java 6 and sometimes Java 7, it is easy to add the Java tasks and tag them accordingly (see the sketch after the commands below). My theory was that I could run
ansible-playbook --tags=untagged,java6
to install the stack with java6 and
ansible-playbook --tags=untagged,java7
to install the same stack with java7. However, this does not work.
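For illustration, the kind of role I had in mind looks roughly like this; the file path, task names and package names below are made-up placeholders:

# roles/os/tasks/main.yml
- name: Configure sshd                  # untagged: common task for every version
  template: src=sshd_config.j2 dest=/etc/ssh/sshd_config

- name: Install Java 6                  # version-specific task
  yum: name=java-1.6.0-openjdk state=present
  tags:
    - java6

- name: Install Java 7                  # version-specific task
  yum: name=java-1.7.0-openjdk state=present
  tags:
    - java7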

I checked the Ansible source code and found out why it is not working. Since I was not sure whether this is a bug or by design, I opened an issue and described the problem.

Brian Coca was kind enough to answer quickly: as you can read, this is by design, but he was also willing to treat it as a feature request. I hope it will be accepted. However, if you need this modified behaviour now, you can either check the issue or read the solution below.

Ansible 1.9


The corresponding code part is the tasks_to_run_in_play() function in playbook/__init__.py

Original code:
elif 'untagged' in self.only_tags:
    if task_set == u:
        should_run = True

Proposed fix:
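# A task that carries its own tags no longer matches this branch and falls
# through to the normal matching of the task's tags against --tags.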
elif 'untagged' in self.only_tags and task_set == u:
    should_run = True

Friday, 26 June 2015

Hortonworks HDP 2: moving master components to other nodes

Needing to decommission a node from a Hadoop cluster based on HDP 2.2.4.2, I realised that Ambari 2.0, delivered with HDP 2.2.4.2, cannot move the History Server and Falcon Server master components to another node. The functionality is simply missing. I could use the Ambari web UI for every other service I wanted to move, but not for these two.

So, looking around, I found this mail, which I am summarising in the command set below.

  • Stop Falcon Server and History Server via the Ambari UI.
  • Execute the commands below and do not forget to specify values for the first five variables :)
    AMBARI_SERVER_HOST=
    CLUSTERNAME=mycluster
    HOSTNAME=
    TARGET_HOSTNAME=
    PASS=

    curl -i -u admin:${PASS} -H 'X-Requested-By: ambari' -X DELETE http://${AMBARI_SERVER_HOST}:8080/api/v1/clusters/${CLUSTERNAME}/hosts/${HOSTNAME}/host_components/HISTORYSERVER

    curl -i -u admin:${PASS} -H 'X-Requested-By: ambari' -X POST -d'{"HostRoles":{"component_name":"HISTORYSERVER"}}' http://${AMBARI_SERVER_HOST}:8080/api/v1/clusters/${CLUSTERNAME}/hosts/${TARGET_HOSTNAME}/host_components

    curl -i -u admin:${PASS} -H 'X-Requested-By: ambari' -X DELETE http://${AMBARI_SERVER_HOST}:8080/api/v1/clusters/${CLUSTERNAME}/hosts/${HOSTNAME}/host_components/FALCON_SERVER

    curl -i -u admin:${PASS} -H 'X-Requested-By: ambari' -X POST -d'{"HostRoles":{"component_name":"FALCON_SERVER"}}' http://${AMBARI_SERVER_HOST}:8080/api/v1/clusters/${CLUSTERNAME}/hosts/${TARGET_HOSTNAME}/host_components
  • After the commands have been executed, go to the Ambari UI and click Re-Install for the services on the new host.
  • As noted in the e-mail, please update the values of mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address of MapReduce2 via the Ambari UI (see the example values after this list).
  • Please also update *.broker.url under Falcon -> Config -> Falcon startup.properties.
  • When the installation has finished, start the services.
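For reference, this is roughly what the updated values could look like, assuming the new master host is called newmaster.example.com and the stock default ports are kept (the hostname is only a placeholder):

mapreduce.jobhistory.address=newmaster.example.com:10020
mapreduce.jobhistory.webapp.address=newmaster.example.com:19888
*.broker.url=tcp://newmaster.example.com:61616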

Sunday, 24 May 2015

Ansible ec2 module "region must be specified" issue

Some months ago I made an Ansible-based autoinstall for Hortonworks' HDP 2.2.
Since HDP 2.2.4.2 is out, I wanted to update my install process and test how it works. However, I had to realize that my previously working Ansible playbooks were failing with an error message.

TASK: [Launching Ambari instance] *********************************************
failed: [localhost] => {"failed": true}
msg: region must be specified

FATAL: all hosts have already failed -- aborting

First I checked my Ansible, Eucalyptus and boto config, but everything was fine. So I checked the code of Ansible's ec2 module and found the error message in the source.

# tail -n +1205 /usr/share/pyshared/ansible/modules/core/cloud/amazon/ec2.py|head -17

    ec2 = ec2_connect(module)

    ec2_url, aws_access_key, aws_secret_key, region = get_ec2_creds(module)

    if region:
        try:
            vpc = boto.vpc.connect_to_region(
                region,
                aws_access_key_id=aws_access_key,
                aws_secret_access_key=aws_secret_key
            )
        except boto.exception.NoAuthHandlerFound, e:
            module.fail_json(msg = str(e))
    else:
        module.fail_json(msg="region must be specified")

The code shows that if region is not specified you get this error message, which is exactly what the message itself says.

BUT WHY DO YOU NEED A REGION? IT WAS NOT NECESSARY BEFORE! SEEMS TO BE A BUG!!!

After checking the Ansible bug tracker I found an issue describing my problem. One of the commenters also found the same code part suspicious. So I modified it a bit to avoid the vpc-related issue, and after the modification all my playbooks started to work fine again.
1207c1207
< 
---
>     vpc=None
1219,1220c1219,1220
<     else:
<         module.fail_json(msg="region must be specified")
---
>     #else:
>     #    module.fail_json(msg="region must be specified")
1237d1236
< 

Side-by-side diff
ec2 = ec2_connect(module)                                       ec2 = ec2_connect(module)
                                                              |     vpc=None
    ec2_url, aws_access_key, aws_secret_key, region = get_ec2       ec2_url, aws_access_key, aws_secret_key, region = get_ec2

    if region:                                                      if region:
        try:                                                            try:
            vpc = boto.vpc.connect_to_region(                               vpc = boto.vpc.connect_to_region(
                region,                                                         region,
                aws_access_key_id=aws_access_key,                               aws_access_key_id=aws_access_key,
                aws_secret_access_key=aws_secret_key                            aws_secret_access_key=aws_secret_key
            )                                                               )
        except boto.exception.NoAuthHandlerFound, e:                    except boto.exception.NoAuthHandlerFound, e:
            module.fail_json(msg = str(e))                                  module.fail_json(msg = str(e))
    else:                                                     |     #else:
        module.fail_json(msg="region must be specified")      |     #    module.fail_json(msg="region must be specified")
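If you run against plain AWS rather than Eucalyptus, an alternative to patching the module is simply to pass a region to the task, which keeps the unpatched check happy. A minimal sketch, where the region, AMI ID, key and security group names are placeholders:

- name: Launching Ambari instance
  ec2:
    region: us-east-1
    image: ami-xxxxxxxx
    instance_type: m3.large
    key_name: mykey
    group: default
    wait: yes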

Sunday, 8 February 2015

Google Now Launcher: No settings, no cards, not working anymore.

My Moto G 1st Gen runs Android 5.0.2 and I was using the Google Now Launcher on top. It had been working fine for a long time, but one day all my cards suddenly disappeared and search did not work anymore. The "Reminders" and "Customize" menu items were inactive, and trying to open the "Settings" of Google Now Launcher just crashed it.

Since googling around did not reveal any exact solution, only some hints, here is what I did in the end.

Please note: You will lose all your Home Screen and Google Now customizations!!!

So go to Settings - Apps and swipe to the "All" page. Search for and select "Google App".
 
First try "CLEAR CACHE".


If it is still not working, select "MANAGE SPACE" and then first try "CLEAR GOOGLE SEARCH DATA". Still no luck? Select "CLEAR LAUNCHER DATA", but this is where you lose your customizations. Finally, if none of the above worked, you can select "CLEAR ALL DATA".

After this you might have to reconfigure Google Now, and you will also have to recreate your Home screen.

Friday, 16 January 2015

User based queue mapping for Capacity Scheduler

When I started to use the Capacity Scheduler's hierarchical queue features on top of Hortonworks' HDP 2.0, I immediately realized that I needed automatic assignment of jobs to queues based on username.

Sounds easy and useful? Yes! But I could not find any configuration parameter or example for it.

I found only references to the mapred.job.queuename config option. This can be set in Hive via set mapred.job.queuename=yourqueue or with -Dmapred.job.queuename=yourqueue as a hadoop command-line argument.

After some hours of unavailing googling I checked the corresponding code and was shocked: this is available only since Hadoop 2.6 (HDP 2.2). Check YARN-2411 for details. According to the CHANGELOG it is a relatively new feature, so sadly it is not available to me until an upgrade.

:(

Below is an example based on YARN-2411, usable with Hadoop 2.6 or higher (Hortonworks HDP 2.2).

1. user1 is mapped to queue1, group1 is mapped to queue2:

yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.queue-mappings=u:user1:queue1,g:group1:queue2

2. To map users to queues with the same name as the user:

yarn.scheduler.capacity.queue-mappings-override.enable=true
yarn.scheduler.capacity.queue-mappings=u:%user:%user


Update: As a workaround I configured the default queue with a very low capacity of 10% (but allowed to grow to 100%) and created another queue, prio, which has 90% capacity (also up to 100%) and can be used only by a special user. I also enabled the ProportionalCapacityPreemptionPolicy, so the system will preempt over-allocated resources from the default queue whenever the prio queue needs them.

Example config:
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill=20000
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval=10000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round=0.1
yarn.resourcemanager.scheduler.monitor.enable=true
yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.acl_administer_jobs=*
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.acl_submit_applications=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.default.acl_submit_applications=*
yarn.scheduler.capacity.root.default.acl_submit_jobs=*
yarn.scheduler.capacity.root.default.capacity=10
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor=10
yarn.scheduler.capacity.root.prio.acl_submit_applications=specialuser
yarn.scheduler.capacity.root.prio.acl_submit_jobs=specialuser
yarn.scheduler.capacity.root.prio.capacity=90
yarn.scheduler.capacity.root.prio.maximum-capacity=100
yarn.scheduler.capacity.root.prio.state=RUNNING
yarn.scheduler.capacity.root.prio.user-limit-factor=10
yarn.scheduler.capacity.root.queues=default,prio

Thursday, 15 January 2015

Hortonworks Hadoop HDP 2.0 lost default capacity scheduler config

Through my own fault, and also as a result of strange behaviour in the Ambari UI, I overwrote the default Capacity Scheduler configuration on my Hortonworks HDP 2.0 Hadoop cluster. Looking around, I found the XML file containing the original values at
/var/lib/ambari-agent/cache/stacks/HDP/2.0._/services/YARN/configuration/capacity-scheduler.xml

However, on the UI you need a properties-file-style format. Here it is.

yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.root.queues=default
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.default.capacity=100
yarn.scheduler.capacity.root.default.user-limit-factor=1
yarn.scheduler.capacity.root.default.maximum-capacity=100
yarn.scheduler.capacity.root.default.state=RUNNING
yarn.scheduler.capacity.root.default.acl_submit_jobs=*
yarn.scheduler.capacity.root.default.acl_administer_jobs=*
yarn.scheduler.capacity.root.acl_administer_queues=*
yarn.scheduler.capacity.root.unfunded.capacity=50