Why ec2-ami-tools - ec2-upload-bundle - hangs

My colleagues and I had been fighting an issue for some days. Since it made me really, really upset, I decided to investigate it now.

The issue: if you are behind a proxy (though not necessarily) - http or socks, it does not matter much - you might have a problem with ec2-ami-tools. When you start ec2-upload-bundle it does nothing and just looks like it is hanging. We waited for some minutes but nothing happened. Actually it is trying to do something but cannot start the upload.
In contrast, on my home network ec2-upload-bundle starts uploading almost immediately and shows a message to indicate it.
So what is wrong? Why is this happening from behind a proxy? Is it proxy related?

We tested many possible workarounds without any luck.
First we set http_proxy and https_proxy, of course, but it did not work.
Then we also tried to socksify the process, again without any luck.
At that point we decided to eliminate the proxies. We got access to a machine in the DMZ, set up dynamic port forwarding, and tried to use that as a socks proxy server. Surprisingly, even that did not work.
Everything else - Firefox, ssh - was working fine, but not ec2-ami-tools.
Here we ran out of ideas for a while.

But of course problems are there to be solved.

So I ran tcpdump, had a look, and saw that the process was trying to access a special IP address, 169.254.169.254. This is a link-local IP address that Amazon uses within the cloud to serve instance-specific data. It is unclear why ec2-upload-bundle tries to reach it. It is even less clear how the program is written so that it can hang on it. The capture also shows that it retries before the TCP timeout (at 0, 3 and 9 seconds), but then it gives up and hangs.

Workaround: configure this 169.254.169.254 IP on the ssh server or http/socks proxy server. You can also try to reject these connections locally with a firewall. Either way, you have to prevent the connection from timing out, since if the connection times out ec2-upload-bundle will hang. See the note for an alternative workaround.

Note: This works on my home machine because my Ubuntu Linux normally has a 169.254.0.0/16 route, and this is enough to prevent the timeout. But only if you are not using a proxy. If a proxy is in between, then your own routing is not used to reach this IP; the proxy's routing is used instead. So you have to add this route to the proxy/ssh server if you can.

route add -net 169.254.0.0 netmask 255.255.0.0 dev eth0 # Replace eth0 with your interface.
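
As a small illustration of why a 169.254.0.0/16 route is enough, the sketch below checks that the address seen in the tcpdump falls inside that range. It uses only Ruby's standard ipaddr library; the variable names are made up for this example and have nothing to do with ec2-ami-tools.

```ruby
require 'ipaddr'

# The link-local range covered by the route, and the address from the tcpdump.
link_local = IPAddr.new('169.254.0.0/16')
metadata   = IPAddr.new('169.254.169.254')

# True when the address belongs to the routed range.
covered = link_local.include?(metadata)
puts covered  # true
```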


Comment: I have less motivation now to investigate it further, but I will probably check what is wrong in the code and why ec2-upload-bundle hangs in such a case.

Update 1: I have checked the source code of ec2-ami-tools and it seems the error is in instance-data.rb. The initialize method checks whether the metadata is available and sets @instance_data_accessible accordingly. But there is no timeout defined here, so you have to wait ... It seems that even the TCP timeout is not honored by the open method. Wrapping the call in Timeout::timeout(10) { } can solve this error.
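
A minimal sketch of the timeout idea, not the actual ec2-ami-tools code: probe_with_timeout is a hypothetical helper name, and the slow metadata request is simulated with sleep so the example runs without any network access. It assumes that the accessibility check can be expressed as a block.

```ruby
require 'timeout'

# Hypothetical helper (not part of ec2-ami-tools): runs the probe block and
# reports whether it completed without error within `seconds`.
def probe_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
  true
rescue StandardError
  # Timeout::Error is a StandardError, so a timed-out probe lands here too.
  false
end

# Simulated slow probe standing in for the metadata request.
accessible = probe_with_timeout(0.2) { sleep 2 }
puts accessible  # false, because the probe is cut off after 0.2 seconds
```

With something like this, the flag would be set to false after a bounded wait instead of the tool waiting forever.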

Update 2: Later in instance-data.rb, the read_meta_data(path) method should call the open() method only if @instance_data_accessible is true. But this is not the case. I do not know Ruby at all, but this seems to be a coding error. I just added some print statements to see what was going on there, and I do not really understand the code. Besides, the open() calls here also hang indefinitely. So this is a complete hang here.
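
The guard described in Update 2 could look roughly like the sketch below. InstanceDataSketch is a hypothetical stand-in, not the real class from instance-data.rb, and the fetcher block replaces the real open() call so the example runs offline. Only the names @instance_data_accessible and read_meta_data(path) come from the source.

```ruby
# Hypothetical stand-in for the class in instance-data.rb; the real class
# and its method bodies differ. The fetcher is injected so the sketch needs
# no network access.
class InstanceDataSketch
  def initialize(accessible, &fetcher)
    @instance_data_accessible = accessible
    @fetcher = fetcher
  end

  # Returns nil right away when the metadata service was found unreachable,
  # instead of attempting another request that could hang.
  def read_meta_data(path)
    return nil unless @instance_data_accessible
    @fetcher.call(path)
  end
end

# Outside the cloud: the fetcher must never be called.
calls = []
outside = InstanceDataSketch.new(false) { |path| calls << path; 'value' }
result = outside.read_meta_data('ami-id')
puts result.inspect  # nil
```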
