DevOps MeetUp in Mannheim

Yesterday was the very first DevOps MeetUp in Mannheim. VersionEye sponsored the event with the location, pizza & beer. Besides that, I gave an intro talk on Docker. I showed the basics and also demonstrated how VersionEye uses Docker for infrastructure deployment. Here are my slides.

The 2nd talk was by Michael Bahr, about Ansible. He gave a great intro and showed how you can use Ansible to orchestrate Docker containers. His slides are here on SlideShare.

The event was a great success. More people showed up than expected. The room was packed full.


And even more people joined during the Docker talk. By the end of the talk the upper left corner was completely full.

The event started at 7 PM. After the 2 talks most of the people stayed for beer & pizza, and the networking part went on until midnight.


It was a great kick off for DevOps Mannheim. Looking forward to the next event 🙂

Geek2Geek at Delodi

The last Geek2Geek event was at Delodi in Berlin. This time we talked about configuration management tools. Probably you have heard about Chef and Puppet already. We had 2 talks about the new kids on the block: Ansible and Salt.

A couple of weeks ago I pushed out a tweet about Ansible and Frederic responded. He mentioned that he is using Salt Stack, but that he would like to learn more about Ansible. I suggested that we could do a Geek2Geek event on that topic: if he agreed to do the Salt talk, I would do the Ansible talk. And so we did! That’s how I found the topic for this Geek2Geek event 🙂

Frederic had arrived in Germany just the day before, coming in from San Francisco, and was still a bit jet-lagged. That’s why he did the first talk, about Salt.


Frederic gave us a very good overview of the architecture of Salt. You can find his slides here.

After a short break I did the Ansible talk. The special thing about Ansible is that you don’t have a master and you don’t need to install a client on the servers. It works via SSH, and the configuration is done in YAML. Here are my slides on Ansible.

The Geek2Geek group was a bit smaller this time than last time. Altogether I counted 30 people. It was a good round with interesting conversations afterwards.


A special thanks to our sponsors.

Guarana Brause

Guarana Brause sponsored the pizza this time. Many thanks for that. Guarana Brause is a powder with natural caffeine. You can mix the powder with water or any other drink. It’s much better than coffee and very popular in the tech scene of Berlin.



Delodi

Delodi sponsored the drinks and, of course, the space. Delodi is a cool tech company in the heart of Berlin. They build solutions for their clients, mostly with PHP & JavaScript.


They have a really nice office in Kreuzberg at Oranienstrasse, with a lot of free space. And by the way, they are currently looking for another senior PHP developer. If you want to work with smart people on cool projects in the heart of Berlin, you should contact Delodi.

Rebuilding Capistrano-like deployment with Ansible

Probably you are familiar with Capistrano, a deployment tool written in Ruby. I used it in several projects to deploy Rails applications. For the default Rails stack with a SQL database it always worked fine; for a non-default Rails stack, not always so well. This is how a Capistrano deployment directory looks on the server:

(Screenshot: the top-level deployment directory with “current”, “releases” and “shared”.)

The “current” symlink points to a timestamped directory inside the “releases” directory, which contains the actual application code. The “releases” directory looks like this:

(Screenshot: the “releases” directory with one timestamped subdirectory per deployment.)

Each subdirectory is named after the timestamp of its deployment. The “shared” directory looks like this:

(Screenshot: the “shared” directory with “log”, “pids” and other shared subdirectories.)

It contains directories which are shared across all release directories. For example the “log” directory for log output, or the “pids” directory which contains the PID of the currently running Ruby application server, for example Unicorn.

Capistrano creates this directory structure automatically for you. Every time you perform a new deployment it creates a new timestamped directory under “releases”, and if the deployment was successful it points the “current” symlink to the newest release directory.
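To make this structure concrete, here is a small shell sketch that recreates the same layout; the base path and the timestamp are just example values.

```shell
# Recreate the Capistrano-style layout with plain shell commands.
# /tmp/capistrano_demo and the timestamp are example values.
base=/tmp/capistrano_demo
mkdir -p "$base/releases/20140924190522"
mkdir -p "$base/shared/log" "$base/shared/pids"
# "current" is a symlink to the newest timestamped release
ln -sfn "$base/releases/20140924190522" "$base/current"
ls -l "$base"
```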

A couple of months ago I learned Ansible, to automate the whole IT infrastructure of VersionEye. Ansible is really great! I automated everything with it. But there was a break in the flow: usually I executed an Ansible script to set up a new server, and then I had to execute Capistrano to deploy the application. One day I thought: why not implement the whole deployment with Ansible as well? Why not just execute one single command which sets up EVERYTHING?

So I did. I implemented a Capistrano-like deployment with Ansible. This is how I did it.

First of all I ensure that the “log” and “pids” directories exist in the “shared” folder. The following tasks create the whole directory path if it does not exist. If it already exists, nothing happens.

# the paths assume the same base directory as the release path below
- name: Create log directory
  file: >
    path=/var/www/versioneye/shared/log
    state=directory

- name: Create pids directory
  file: >
    path=/var/www/versioneye/shared/pids
    state=directory

The next part is a bit more tricky, because we need a timestamp as a variable. This is how it works with Ansible.

- name: Get release timestamp
  command: date +%Y%m%d%H%M%S
  register: timestamp

This command takes the current timestamp and registers it in the “timestamp” variable. Now we can use that variable to create a new variable with the full path to the new “release” directory.

- name: Name release directory
  command: echo "/var/www/versioneye/releases/{{ timestamp.stdout }}"
  register: release_path

And now we can create the new “release” directory.

- name: Create release directory
  file: >
    path={{ release_path.stdout }}
    state=directory

Alright. In the next step we can check out our source code from git into the new “release” directory we just created.

# "{{ git_repo }}" is a placeholder variable for your repository URL
- name: checkout git repo into release directory
  git: >
    repo={{ git_repo }}
    dest={{ release_path.stdout }}
  sudo: no

Remember: Ansible works via SSH. With the right configuration you can automatically forward your SSH agent. That means if you are able to check out that git repository on your localhost, you will be able to check it out on any remote server via Ansible as well.
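For reference, agent forwarding can be switched on in an ansible.cfg file in the project root; this is a minimal sketch using the standard OpenSSH option:

```ini
# ansible.cfg — forward the local SSH agent to the remote host
[ssh_connection]
ssh_args = -o ForwardAgent=yes
```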

Now we want to overwrite the “log” and the “pids” directories in the application directory and link them to our “shared” folder.

# the "src" paths assume the same base directory as the release path above
- name: link log directory
  file: >
    src=/var/www/versioneye/shared/log
    path={{ release_path.stdout }}/log
    state=link
    force=yes
  sudo: no

- name: link pids directory
  file: >
    src=/var/www/versioneye/shared/pids
    path={{ release_path.stdout }}/pids
    state=link
    force=yes
  sudo: no

Now let’s install the dependencies.

- name: install dependencies
  shell: cd {{ release_path.stdout }}; bundle install
  sudo: no

And pre compile the assets.

- name: assets precompile
  shell: cd {{ release_path.stdout }}; bundle exec rake assets:precompile --trace
  sudo: no

And finally update the “current” symlink and restart Unicorn, the ruby application server.

# the "current" path assumes the same base directory as the release path above
- name: Update app version
  file: >
    src={{ release_path.stdout }}
    path=/var/www/versioneye/current
    state=link
  notify: restart unicorn

That’s it.

This is how VersionEye was deployed for a couple of months, before we moved on to Docker containers. Now we use Ansible to deploy Docker containers. But that’s another blog post 😉

Intro to Ansible

Ansible is a tool for managing your IT Infrastructure.

If you have only 1 single server to manage, you probably log in via SSH and execute a couple of shell commands. If you have 2 servers with the same setup, you lose a lot of time if you do everything by hand. 2 servers are already a reason to think about automation.

How do you handle the setup of 10, 100 or even 1000 servers? Assume you have to install Ruby 2.1.1 and Oracle Java 7 on 100 Linux servers. And by the way, neither package is available via “apt-get install”. Good luck doing it manually 😀

That’s what I asked myself at the time I moved away from Heroku. I took a look at Chef and Puppet. But honestly, I couldn’t get warm with either of them. Both are very complex and, for my purpose, totally over-engineered. A friend of mine finally recommended Ansible.


I had never heard of it and I was skeptical in the beginning. But after I finished watching these videos, I was convinced! It’s simple and it works as expected!

Key Features

Here are some facts:

  • Ansible doesn’t need a master server!
  • You don’t need to install anything on your servers!
  • Ansible works via SSH.
  • Just tell Ansible the IP addresses of your servers and run the script!
  • With Ansible your IT infrastructure is just another code repository.
  • Configuration is done in YAML files.

Sounds like magic? No, it’s not. It’s Python 😉 Ansible is implemented in Python and works via the SSH protocol. If you have configured passwordless login on your servers with public keys, then Ansible only needs the IP addresses of the servers.


You don’t need to install Ansible on your servers, only on your workstation. There are different ways to install it. If you are a Python developer anyway, I assume you have pip, the package manager for Python, installed. In that case you can install it like this:

sudo pip install ansible

On Mac OS X you can install it via the package manager Homebrew.

$ brew update
$ brew install ansible

And there are many more ways to install it, described in the official Ansible documentation.

Getting started

With Ansible your IT infrastructure is just another code repository. You can keep everything in one directory and put it under version control with git.

Let’s create a new directory for ansible.

$ mkdir infrastructure
$ cd infrastructure

Ansible has to know where your servers are and how they are grouped together. We will keep that information in the hosts file in the root of the “infrastructure” directory. Here is an example:
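A minimal hosts file could look like this; the IP addresses are placeholders.

```ini
[dev_servers]
192.168.1.10
192.168.1.11

[www_servers]
192.168.1.20
```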



As you can see, there are 3 servers defined. 2 of them are in the “dev_servers” group and one is in the “www_servers” group. You can define as many groups with as many IP addresses as you want. We will use the group names in the playbook to assign roles to them.


A playbook assigns roles (software) to server groups. It defines which role (software) should be installed on which server group. Playbooks are stored in YAML files. Let’s create a site.yml file in the root of the “infrastructure” directory, with this content:

- hosts: dev_servers
  user: ubuntu
  sudo: true
  roles:
    - java
    - memcached

- hosts: www_servers
  user: ubuntu
  sudo: true
  roles:
    - nginx

In the above example we defined that on all dev_servers the 2 roles “java” and “memcached” should be installed, and on all web servers (www) the role “nginx” should be installed. The “hosts” entries in site.yml have to match the group names from the hosts file.

In addition, we define for each host group the user (ubuntu) which should be used to log in to the server via SSH. And we defined that “sudo” should be used in front of each command. As you can see there is no password defined. I assume that you can log in to your servers without a password, because of key-based authentication. If you use AWS, that is the default anyway.


All right. We defined the IP addresses in the hosts file and we assigned roles to the servers in the site.yml playbook. Now we have to create the roles. A role describes exactly what to install and how to install it. A role can be defined in a single YAML file, but also in a directory with subdirectories. I prefer a directory per role. Let’s create the first role:

$ mkdir java
$ cd java

A role can contain “tasks”, “files”, “vars” and “handlers” as subdirectories, but at least the “tasks” directory.

$ mkdir tasks 
$ cd tasks

Each of these subdirectories has to have a main.yml file. This is the main file for the role. This is how it looks for the java role:

- name: update debian packages
  apt: update_cache=true

- name: install Java JDK
  apt: name=openjdk-7-jdk state=present

Ansible is organized in modules. Currently there are more than 100 modules available. The “apt” module, for example, wraps “apt-get” on Debian machines. In the above example you can see that the task directives are always 2 lines. The first line is the name of the task; this is what you will see on the command line when you run Ansible. The 2nd line is always a module with attributes. For example:

apt: name=openjdk-7-jdk state=present

This is the “apt” module, and we basically tell it that the Debian package “openjdk-7-jdk” has to be installed on the server. You can find the full documentation of the apt module here.

Let’s create another role for memcached.

$ mkdir memcached
$ cd memcached
$ mkdir tasks
$ cd tasks

And add a main.yml file with this content:

- name: update debian packages
  apt: update_cache=true

- name: install memcached
  apt: name=memcached state=present

Easy right? Now let’s create a more complex role.

$ mkdir nginx
$ cd nginx
$ mkdir tasks
$ mkdir files
$ mkdir handlers

This role has, besides the tasks, also files and handlers. In the “files” directory you can put files which should be copied to the server. This is especially useful for configuration files. In this case we put the nginx.conf into the “files” directory. The main.yml file in the “tasks” directory looks like this:

- name: update debian packages
  apt: update_cache=true

- name: install NGinx
  apt: name=nginx state=present

- name: copy nginx.conf to the server
  copy: src=nginx.conf dest=/etc/nginx/nginx.conf
  notify: restart nginx

The first task updates the Debian apt cache, which is similar to “apt-get update”. The 2nd task installs Nginx. And the 3rd task copies nginx.conf from the “files” subdirectory to “/etc/nginx/nginx.conf” on the server.

And the last line notifies the handler “restart nginx”. Handlers are defined in the “handlers” subdirectory and are usually used to restart a service. The main.yml in the “handlers” subdirectory looks like this:

- name: restart nginx
  service: name=nginx state=restarted

It uses the “service” module to restart the Nginx web server. That is necessary because we installed a new nginx.conf configuration file.


Alright. We have defined 3 roles and 3 servers now. Let’s set up our infrastructure. Execute this command in the root of the “infrastructure” directory:

ansible-playbook -i hosts site.yml

The “ansible-playbook” command takes a playbook file as parameter and executes it, in this case the site.yml file. In addition to the playbook file, we tell the command where our servers are with “-i hosts”.


This was a very simple example as an intro, but you can easily do much more complex things: for example manipulating values in existing configuration files on servers, checking out private git repositories, or executing shell commands with the shell module. Ansible is very powerful and you can do amazing things with it!
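As a sketch of two of those use cases, the core “lineinfile” and “shell” modules can be used like this; the file paths and values are placeholder examples.

```yaml
# Example tasks; file paths and values are placeholders
- name: set swappiness in an existing config file
  lineinfile: dest=/etc/sysctl.conf regexp='^vm.swappiness' line='vm.swappiness = 10'

- name: execute an arbitrary shell command
  shell: bundle install chdir=/var/www/example_app
```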

I’m using Ansible to manage the whole infrastructure of VersionEye. Currently I have 36 roles and 15 playbooks defined for VersionEye. I can set up the whole infrastructure with 1 single command! Or just parts of it. I even use Ansible for deployments. Deploying the VersionEye crawlers into the Amazon cloud is 1 single command for me. And I even rebuilt the Capistrano deployment process for Rails apps with Ansible.

Let me know if you find this tutorial helpful or you have additional questions. Either here in the comments or on Twitter.

Big Refactoring

In the last 2 months the daily traffic at VersionEye has doubled. The traffic on the site is constantly growing. We upgraded the EC2 instances with more RAM and more CPUs. Of course that is only a very short-term solution. To be able to scale better, we did quite some refactoring work in the last weeks.

The web application VersionEye started as a default Rails app. Exactly with this command:

rails new versioneye

That was more than 2 years ago. Over time the application grew. We added parsers and crawlers to the Rails app, because it was easy to reuse the models and it was a fast way to add new features. At one point we added Grape to the project to implement the API. For a long time the Rails monolith was running on 1 single machine, together with the database and the crawlers. The server was lovingly hand-crafted by myself 🙂

With growing traffic the infrastructure had to change as well. First of all, the backend systems (MongoDB, ElasticSearch and MemcacheD) got their own EC2 instances. The next step was to deploy the crawlers on a separate EC2 instance.

Ansible – a simple way to automate IT

2 servers are already reason enough to automate the IT infrastructure. I took a look at Chef and Puppet, but honestly I couldn’t get warm with either of them. They are too complicated and a bit over-engineered for my current purpose. Marvin and Christian from Playtestcloud recommended that I take a look at Ansible. So I did, and I really liked it. Ansible requires no master, and nothing needs to be installed on the remote servers. The only requirement is that you are able to log in to the remote servers via SSH.

Currently the whole infrastructure of VersionEye is described in 17 Ansible roles. With 1 single command the whole infrastructure, or parts of it, can be set up. The Ansible scripts are all managed in a separate git repository. That way all changes are tracked and documented. IT infrastructure is just another code repository now! That’s pretty cool!

Small code repositories

In the past the API was deployed together with the Rails application, and as the API became a little more popular, that caused some issues. If somebody used the API heavily, it slowed down the web application, and vice versa.

In the last weeks we split up the big monolithic Rails app. First of all we extracted the models, mailers, services and parsers into the ‘versioneye-core’ project. As a next step we pulled the Grape API out into a separate application. The web application and the API now use ‘versioneye-core’ as a regular dependency. Now the web application and the API can be deployed & scaled independently, without affecting each other.
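In a Gemfile, such a shared dependency could look like this; the git URL is an assumption for illustration, only the gem name comes from the post.

```ruby
# Sketch: versioneye-core as a regular gem dependency (URL assumed)
gem 'versioneye-core', :git => 'https://github.com/versioneye/versioneye-core.git'
```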

Splitting up the code into smaller repositories was not the only thing; the code itself was refactored as well. Big classes have been split up into smaller classes, and test coverage was increased for each repository.

Next steps

I think we are on the right track. Meanwhile the whole code base of VersionEye is distributed over 12 git repositories. There is still room for improvement, and as the service grows we will continue to split it up into smaller services / git repositories.

But the very next step will be to introduce RabbitMQ and EventMachine. Many tasks can be offloaded to RabbitMQ to get done in parallel, for example sending out emails, parsing files and importing meta data from external APIs.