Django application with Puppet

03 July 2016

This post is a quick tutorial on how to provision a GeoDjango application using Puppet. While writing it, I started with running code and then refactored it into something better.

First, what is Puppet? From their website:

Puppet provides a standard way of delivering and operating software, no matter where it runs. With the Puppet approach, you define what you want your apps and infrastructure to look like using a common language.

So it’s a tool for automated deployment. Alternatives include Fabric and Ansible. I chose Puppet first because I already use it for automation at work and I was keen to look more closely at how it all works.

Puppet differs from the other tools mentioned in how it deploys. There are two entities: a Puppet master and a Puppet agent. The master keeps the configuration that describes how each agent should look. When Puppet runs, the agent pulls its configuration from the master and applies it to itself. In other words, the agent doesn’t hold its configuration directly; it pulls it from the master. Other tools take a different approach: they push configuration over SSH.

To play with Puppet I chose one of my own projects: geodjango + leaflet. As I said before, running Puppet normally takes two machines: a Puppet master and a Puppet agent. Fortunately, there is a way to develop Puppet modules (a module is responsible for configuring one thing, such as PostgreSQL or APT) on a single machine: Vagrant.

This tool is so awesome that it lets one virtual machine play both roles: Vagrant’s puppet provisioner applies the manifests locally with puppet apply, so no separate master is needed. How to do this? After installing Vagrant and VirtualBox, place a file called Vagrantfile inside your project folder:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|
  config.vm.network "private_network", ip: "192.168.33.10"
  config.vm.box = "ubuntu/trusty64"

  config.vm.provision :shell do |shell|
    shell.inline = "mkdir -p /etc/puppet/modules;
                    puppet module install puppetlabs-stdlib;
                    puppet module install ripienaar-concat;
                    puppet module install puppetlabs-apt;
                    puppet module install puppetlabs/postgresql;
                    puppet module install puppetlabs/vcsrepo;
                    puppet module install puppetlabs-git;
                    puppet module install arioch-redis;
                    puppet module install ajcrowe-supervisord;
                    puppet module install jfryman-nginx"
  end

  config.vm.provision "puppet" do |puppet|
    puppet.options = ["--templatedir", "/vagrant/templates"]
  end

end

In this file, I set the machine’s IP address (192.168.33.10) and the OS that runs inside Vagrant (ubuntu/trusty64). Right after that, I tell Vagrant to execute shell commands that create the directory structure for Puppet modules and install the modules I will need later. At the end, I tell Vagrant to run Puppet with a template directory. Provisioning more than once can make puppet module install fail for modules that are already installed; to get around that, add the --force flag to the end of every install command, as in puppet module install puppetlabs-stdlib --force;.

Now I can move on to the Puppet code itself. Vagrant’s puppet provisioner looks for manifests in a folder called manifests. The name of the pp file is not important right now, so I kept the default: default.pp. So what is in this file?

At the top I declared a bunch of PostgreSQL resources:

# required for postgresql resources to work
class { 'postgresql::server':  }
# required by geodjango
class {'postgresql::server::postgis': }
# create db
postgresql::server::db { 'geodjango':
  user     => $title,
  password => $title,
}

postgresql_psql { 'Add password to role':
  db      => 'geodjango',
  command => "ALTER ROLE geodjango WITH PASSWORD 'geodjango';",
  require => Postgresql::Server::Role['geodjango'],
}
# create geodjango role
postgresql::server::role {'geodjango':;}

postgresql::server::database_grant { 'grant ALL privileges for user geodjango':
  privilege => 'ALL',
  db        => 'geodjango',
  role      => 'geodjango',
}

postgresql_psql { 'Enable postgis extension':
  db      => 'geodjango',
  command => 'CREATE EXTENSION postgis;',
  unless  => "SELECT extname FROM pg_extension WHERE extname ='postgis'",
  require => Postgresql::Server::Db['geodjango'],
}

As you can see, the Puppet syntax is straightforward. To read more about classes in Puppet go there. One thing may be unclear: require => Postgresql::Server::Role['geodjango']. It tells Puppet that the postgresql::server::role resource must be applied first. This is how you create dependencies.
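
The same ordering can also be written with a chaining arrow instead of the require attribute. This is only a sketch of the alternative syntax, assuming both resources are declared elsewhere in the same manifest, as they are above:

```puppet
# Apply the role first, then run the ALTER ROLE statement.
# Equivalent in effect to `require => Postgresql::Server::Role['geodjango']`.
Postgresql::Server::Role['geodjango'] -> Postgresql_psql['Add password to role']
```

Which form to use is mostly a matter of taste: require keeps the dependency next to the resource that needs it, while arrows collect the ordering in one place.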

So I’ve set up the database needed for the GeoDjango application, but there are more dependencies: the GIS libraries. Here is how to install them via Puppet:

package {
  'binutils':  ensure                 => present;
  'libproj-dev': ensure               => present;
  'gdal-bin': ensure                  => present;
  'postgresql-server-dev-9.3': ensure => present;
  'build-essential': ensure           => latest;
  'python3': ensure                   => latest;
  'python3.4-dev': ensure             => latest;
  'python3-setuptools': ensure        => latest;
  'python3-pip': ensure               => latest;
  'python3.4-venv': ensure            => latest;
  'python-pip': ensure                => present;
}
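
Packages that share the same attributes can also be declared with a single array title. The fragment below is a sketch of that shorter form for the packages that only need to be present; it would replace the matching entries above, not sit next to them, because declaring the same package twice is an error:

```puppet
# One resource declaration, several titles, same attributes for all of them.
package { ['binutils', 'libproj-dev', 'gdal-bin', 'python-pip']:
  ensure => present,
}
```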

My application uses Redis, so I need that too. The default Redis config is fine for me, so I don’t need to pass any additional arguments to this class:

class { 'redis':;}

I don’t like running an application as the root user, so I created a dedicated user just for my application. I also like to keep my code on machines under the /opt/name_of_project path, so I created that too:

user { 'geodjango':
  ensure     => present,
  managehome => true,
}

file { ['/opt/geodjango/','/opt/geodjango/geodjango']:
  ensure => 'directory',
  owner  => 'geodjango'
}

To run my application I need its source code, which lives in git. To download it to the Vagrant machine I use:

include git

vcsrepo { '/opt/geodjango/geodjango':
  ensure   => latest,
  provider => git,
  source   => 'https://github.com/krzysztofzuraw/geodjango-leaflet.git',
  user     => 'geodjango',
  force    => true,
}

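In vcsrepo, ensure => latest keeps the checkout updated with new commits on every Puppet run. I also added the force parameter, which allows vcsrepo to delete whatever already sits at that path when it has to create the repository; that matters here because the directory already exists on the machine.

It’s good practice in the Python world to have an isolated environment per application. Python 3 ships a tool for that in its standard library, called venv. How to create such a virtual environment? By invoking a command like this in the shell: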

python3 -m venv /opt/geodjango/env

Because this is a command that runs in the shell, Puppet has a special resource for handling such cases: exec. How to use it? It’s simple:

exec { 'create venv':
  command => 'python3 -m venv /opt/geodjango/env',
  path    => '/usr/local/bin:/usr/bin:/bin',
  require => Vcsrepo['/opt/geodjango/geodjango'],
}

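I’m telling Puppet to execute the command, looking up executables in path. The require argument makes sure the command runs only after the repo has been checked out. Keep in mind that require only controls ordering, so this exec still runs on every Puppet run; running it only when the repo changes would take subscribe together with refreshonly, as in the exec resources near the end of this post.

Now that the virtual environment exists, it’s time to install the Python packages the application needs to work properly. I list them in a requirements.txt file. To install packages from that file via Puppet I need: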
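
An exec like this can be made idempotent with the creates attribute, which skips the command when the given file already exists. The version below is a sketch of that variant, not the code from my repo; it assumes that venv writes bin/activate into the new environment, and it is meant to replace the resource above rather than be declared alongside it:

```puppet
exec { 'create venv':
  command => 'python3 -m venv /opt/geodjango/env',
  path    => '/usr/local/bin:/usr/bin:/bin',
  # Skip the command when the environment has already been created.
  creates => '/opt/geodjango/env/bin/activate',
  require => Vcsrepo['/opt/geodjango/geodjango'],
}
```

With creates in place, a second Puppet run leaves an existing environment untouched instead of invoking venv again.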

exec { 'install requirements':
  command => '/opt/geodjango/env/bin/pip install --requirement /opt/geodjango/geodjango/requirements.txt',
  path    => '/usr/local/bin:/usr/bin:/bin',
  require => Exec['create venv']
}

I specify full paths here, both for pip and for the requirements file.

Now that everything is installed, I need a tool for running my GeoDjango application. I could do this by invoking Django’s runserver command as a daemon, but there is a tool designed especially for that: supervisor. How does it work? You specify in an ini file which commands supervisor should run. In addition, you can see whether your command started successfully or not. To use supervisor you need:

include ::supervisord

supervisord::program { 'django':
  command     => '/opt/geodjango/env/bin/gunicorn geodjango_leaflet.wsgi -b 127.0.0.1:9000',
  user        => 'geodjango',
  directory   => '/opt/geodjango/geodjango',
  subscribe   => Vcsrepo['/opt/geodjango/geodjango'],
}

At the top, I included the supervisord class. The d at the end stands for daemon. Right below that I set up the program django, which is a simple gunicorn command run by the user geodjango inside the specified directory.

My app now runs via gunicorn managed by supervisor, but one thing is still missing: a web server. I use nginx in my apps, so I’m going to set that up:

class {'nginx':
  confd_purge  => true,
  vhost_purge  => true,
}

$nginx_settings = {
  'upstream_name'    => 'geodjango',
  'upstream_address' => '127.0.0.1:9000',
}

file { ['/etc/nginx/sites-available/geodjango.conf', '/etc/nginx/sites-enabled/geodjango.conf']:
  ensure  => file,
  content => template('nginx.erb'),
  notify  => Service['nginx'],
}

Starting from the top: I configured the nginx class to purge unmanaged conf.d files as well as vhost ones. Right after that, I defined the Puppet variable $nginx_settings, which is a hash. I use this variable in the file resource, where I tell Puppet to create the file in sites-available as well as in sites-enabled. The content of this file comes from the template nginx.erb:

upstream <%= @nginx_settings['upstream_name'] %> {
  server <%= @nginx_settings['upstream_address'] %>;
}

server {

    location /static {
        alias /opt/geodjango/static;
    }

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_pass http://<%= @nginx_settings['upstream_name'] %>;
    }
}

As you can see, I use nginx_settings inside my template. That works because a template can see the variables in the local scope of the code that calls it, in this case default.pp. It’s good to know that Puppet can use two types of templates: ERB style (Ruby), which I used in this example, and Puppet style (EPP).

There are three more things to do: run database migrations, load initial data into the database, and collect static files. I prefer to do them manually, but here is the Puppet code if you are interested:
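
For comparison, here is a rough sketch of how the upstream part of the template could look in EPP, using the inline_epp function so that everything fits in one manifest. I haven’t used this in my project, and it assumes a Puppet version with EPP support (Puppet 4 or later), which may be newer than what the trusty64 box ships. The variable name $upstream_block is made up for this example:

```puppet
# The EPP template declares its parameters in the first tag and reads them
# as ordinary Puppet variables ($upstream_name) instead of Ruby instance
# variables (@nginx_settings).
$upstream_block = inline_epp(
  '<%- | String $upstream_name, String $upstream_address | -%>
upstream <%= $upstream_name %> {
  server <%= $upstream_address %>;
}
',
  {
    'upstream_name'    => 'geodjango',
    'upstream_address' => '127.0.0.1:9000',
  }
)

# Print the rendered text during the Puppet run.
notice($upstream_block)
```

The main difference from ERB is that an EPP template with a parameter tag only sees the values passed to it explicitly, not the whole calling scope.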

exec { 'run django migrations':
  command     => '/opt/geodjango/env/bin/python /opt/geodjango/geodjango/manage.py migrate --no-input',
  path        => '/usr/local/bin:/usr/bin:/bin',
  require     => Exec['install requirements'],
  subscribe   => Postgresql_psql['Add password to role'],
  refreshonly => true,
}

exec { 'load initial data to db':
  command     => '/opt/geodjango/env/bin/python /opt/geodjango/geodjango/manage.py loaddata',
  path        => '/usr/local/bin:/usr/bin:/bin',
  require     => Exec['install requirements'],
  subscribe   => Postgresql_psql['Add password to role'],
  refreshonly => true,
}

exec { 'collect static files':
  command     => '/opt/geodjango/env/bin/python /opt/geodjango/geodjango/manage.py collectstatic --noinput',
  path        => '/usr/local/bin:/usr/bin:/bin',
  require     => Exec['install requirements'],
  subscribe   => Vcsrepo['/opt/geodjango/geodjango'],
  refreshonly => true,
}

All three commands are Django management commands (loaddata is my own custom one). To use them with Puppet you declare them as exec resources. Because of refreshonly => true, each exec runs only when the resource it subscribes to changes.

That’s all for this time. To sum these two articles up: I really enjoyed playing with Puppet, especially the clear syntax it provides. I also like that you can even write tests for Puppet code! Having two machines (Puppet master and agent) for provisioning is good because your agent machine gets updated in real time, but it requires resources.

What is more, I currently use Vagrant with the default config, which is not good: there is not enough RAM on the client machine, and that forces the Puppet run to stop. I could set a higher value, but my computer isn’t good enough. To bypass this I plan to use Docker with a Puppet master and agent. Lastly, installing the Puppet modules in the Vagrantfile every time isn’t a good idea; that’s another thing to change, maybe with something like librarian-puppet.

Source code for this is available here.