PaaS bakeoff: Comparing Stackato, OpenShift, Dotcloud and Heroku for Django hosting and deployment

If you’ve been following this blog, you’ll know that I’m a big fan of PaaS providers – heck, I even built one, which gave me even greater respect for all the work that goes into making a platform that is flexible, scalable, reliable and easy to use.

During the last few weeks I’ve been kicking the tires on these PaaS solutions, both publicly hosted ones like Heroku and Dotcloud as well as open source ones like OpenShift and CloudFoundry.

Last night I gave a talk, Django Deployment Revisited, at the Django Boston meetup group, and discussed four different PaaS providers: Stackato, Dotcloud, OpenShift and Heroku. As an example, I showed for each provider how to deploy Mezzanine, a Django-based blogging and CMS application.

Here are the slides from the presentation (sorry, no audio):

Django deployment with PaaS from Appsembler

 

Show me the code!

All the code used in the examples is available in the paasbakeoff GitHub repo – with a different branch for each PaaS provider.

One criterion for a PaaS is how many files you need to add or modify in order to get your Django project deployed. What became apparent as I was giving the talk is that all of the providers function quite similarly in regards to how you get your Django project working with them. It really boils down to these things:

DATABASES

All of the providers will provision a PostgreSQL or MySQL database (Heroku is PostgreSQL-only) for you without you needing to do anything except issue one command.

The actual database creation happens automatically, except on Dotcloud, where you specify the name of the database in your settings.py and have complete control over how it’s created via a createdb.py script. You can see this either as an advantage (complete flexibility) or a disadvantage (one more thing to manage). It’s the classic control-vs-ease-of-use tradeoff, a recurring theme when adopting a PaaS solution.

The way you tell Django to use this provisioned database is to modify your settings.py file (or use a separate production_settings.py) to override the DATABASES setting. All of the providers expose environment variables that contain the connection details:

Stackato

DATABASE_URL

You can also use VCAP_SERVICES to retain CloudFoundry Core compatibility.
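If you go the VCAP_SERVICES route, the variable holds a JSON blob describing each bound service. A minimal sketch of digging the PostgreSQL credentials out of it – the top-level key and credential field names follow the CloudFoundry v1 convention and may differ on your install:

```python
import json
import os

def postgres_credentials():
    """Extract PostgreSQL credentials from the VCAP_SERVICES JSON blob."""
    services = json.loads(os.environ['VCAP_SERVICES'])
    for service_type, instances in services.items():
        # Keys look like "postgresql-9.1", so match on the prefix
        # instead of hard-coding a version.
        if service_type.startswith('postgresql'):
            return instances[0]['credentials']
    raise KeyError('no postgresql service bound')
```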

OpenShift

OPENSHIFT_MYSQL_DB_URL
OPENSHIFT_POSTGRESQL_DB_URL

DotCloud

DOTCLOUD_DB_SQL_LOGIN
DOTCLOUD_DB_SQL_PASSWORD
DOTCLOUD_DB_SQL_HOST
DOTCLOUD_DB_SQL_PORT
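Wiring those four variables into Django’s DATABASES setting is straightforward. A sketch for PostgreSQL – the database name ‘mywebsite’ is a placeholder, since on Dotcloud you name and create the database yourself (e.g. in your createdb.py script):

```python
import os

# Build Django's DATABASES setting from the variables Dotcloud injects.
# The database NAME is whatever you chose when creating it yourself.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mywebsite',
        'USER': os.environ.get('DOTCLOUD_DB_SQL_LOGIN', ''),
        'PASSWORD': os.environ.get('DOTCLOUD_DB_SQL_PASSWORD', ''),
        'HOST': os.environ.get('DOTCLOUD_DB_SQL_HOST', ''),
        'PORT': os.environ.get('DOTCLOUD_DB_SQL_PORT', ''),
    }
}
```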

Heroku

DATABASE_URL

Also see Kenneth Reitz’s convenient dj-database-url package, which handles parsing the DATABASE_URL string with one line of code.
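Under the hood this is just URL parsing. A stdlib-only sketch of roughly what dj-database-url does for you (the real package also handles other schemes and edge cases, so prefer it in practice):

```python
from urllib.parse import urlparse

def parse_database_url(url):
    """Split a URL like postgres://user:pass@host:5432/dbname
    into Django's DATABASES fields."""
    parts = urlparse(url)
    return {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': parts.path.lstrip('/'),
        'USER': parts.username or '',
        'PASSWORD': parts.password or '',
        'HOST': parts.hostname or '',
        'PORT': parts.port or '',
    }
```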

Heroku lets you attach multiple PostgreSQL databases (master/slave, or staging/production), and each database gets its own color-coded database URL (e.g. HEROKU_POSTGRESQL_GREEN, HEROKU_POSTGRESQL_RED, etc.). Most Django projects are only going to use one database, so Heroku provides a pg:promote command that lets you promote that database to be the canonical DATABASE_URL.

STATIC_ROOT

While it’s possible to have Django serve static assets (images, CSS, JavaScript) itself, it’s advisable to serve them with an HTTP server like Apache or Nginx for performance reasons. All of the PaaS providers have a built-in way to do this except Heroku, which requires that you serve them from Amazon S3.

Stackato

Somewhat unusually, Stackato uses uWSGI to serve the static assets. In the stackato.yml file:

processes:
    web: $STACKATO_UWSGI --static-map /static=$HOME/mywebsite/static

OpenShift

In the settings.py file:

STATIC_ROOT = os.path.join(os.environ.get('OPENSHIFT_REPO_DIR'), 'wsgi', 'static')

In /wsgi/static/.htaccess:

RewriteEngine On
RewriteRule ^application/static/(.+)$ /static/$1 [L]

Dotcloud

In settings.py:

STATIC_ROOT = '/home/dotcloud/volatile/static/'

In nginx.conf:

location /static/ { root /home/dotcloud/volatile ; }

Heroku

In settings.py:

STATICFILES_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

See complete example of S3FileStorage

 

MEDIA_ROOT

Similar to STATIC_ROOT, the files in MEDIA_ROOT need not only to be served up by Apache, Nginx or uWSGI, but also to be persisted across subsequent deploys. By default, files that are uploaded through your Django application will be stored in the application container, which is thrown away on every deploy. So we need to tell Django to store these files in a data directory that won’t be discarded.

Stackato

You first need to create a ‘filesystem’ service by adding it to your stackato.yml file:

services:
    postgresql-mywebsite: postgresql
    filesystem-mywebsite: filesystem

Then in your settings.py:

MEDIA_ROOT = os.environ['STACKATO_FILESYSTEM']

OpenShift

OpenShift provides a persisted data dir that can be referenced with the environment variable OPENSHIFT_DATA_DIR:

MEDIA_ROOT = os.path.join(os.environ.get('OPENSHIFT_DATA_DIR'), 'media')

You then need to symlink this directory to the static directory that is being served up by Apache (see above in STATIC_ROOT).

In .openshift/action_hooks/build:

#!/bin/bash
if [ ! -d $OPENSHIFT_DATA_DIR/media ]; then
    mkdir $OPENSHIFT_DATA_DIR/media
fi

ln -sf $OPENSHIFT_DATA_DIR/media $OPENSHIFT_REPO_DIR/wsgi/static/media

Dotcloud

Add the following to your settings.py:

MEDIA_ROOT = '/home/dotcloud/data/media/'

Add another line to your nginx.conf:

location /static/ { root /home/dotcloud/volatile; }
location /media/ { root /home/dotcloud/data; }

Heroku

In settings.py:

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage'

See complete example of S3FileStorage

 

WSGI

The PaaS providers use mod_wsgi, uWSGI or Gunicorn to serve the Django application.

Stackato

Stackato uses uWSGI by default, but you can use gunicorn instead if you prefer. You simply place a wsgi.py file in your project that references your Django settings file.

OpenShift

OpenShift uses mod_wsgi and expects to find a file /wsgi/application that looks something like this.
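In case the linked example isn’t handy, here is roughly what that file looked like on OpenShift at the time – a sketch using the old-style WSGIHandler bootstrap, with ‘mywebsite’ as a placeholder project name:

```python
#!/usr/bin/python
# wsgi/application -- OpenShift's mod_wsgi entry point.
# Put the project on the path, point Django at the settings module,
# then expose the WSGI callable that mod_wsgi looks for.
import os
import sys

sys.path.append(os.path.join(os.environ['OPENSHIFT_REPO_DIR'],
                             'wsgi', 'mywebsite'))
os.environ['DJANGO_SETTINGS_MODULE'] = 'mywebsite.settings'

import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
```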

Dotcloud

Like Stackato, Dotcloud expects a wsgi.py file in the root of the project directory.

Heroku

Heroku recommends using gunicorn. Simply add gunicorn to your requirements.txt and INSTALLED_APPS, and create a file called Procfile in the root of your repo, that contains the following:

web: gunicorn hellodjango.wsgi -b 0.0.0.0:$PORT

Requirements

All of the providers expect a requirements.txt file in the root of the project directory, except for OpenShift, which uses the more Pythonic approach of defining dependencies in a setup.py file. You can still reference your requirements.txt file using this trick.
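The trick is essentially to read requirements.txt from within setup.py, so the dependency list lives in one place. A hedged sketch (a simple line filter; real projects may prefer pip’s own requirements parsing):

```python
def parse_requirements(path):
    """Return the non-blank, non-comment lines of a pip requirements file."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith('#')]

# Then in setup.py:
#
# from setuptools import setup
# setup(
#     name='mywebsite',
#     version='1.0',
#     install_requires=parse_requirements('requirements.txt'),
# )
```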

Configuration

Stackato (example stackato.yml) and Dotcloud (example dotcloud.yml) both use a YAML file to define configuration information about your app (e.g. which database to create and bind).

OpenShift doesn’t seem to have a configuration file, so you have to add the cartridges (e.g. a database) with a separate command.

Heroku uses a Procfile but most of the configuration is done using the config and addons commands.

Management commands

When it comes time to provide instructions for what should be done when you do a deploy, each provider has a slightly different way of handling these management commands (syncdb, collectstatic, migrate, etc.).

Stackato uses post-staging hooks in the stackato.yml file.
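The hook section in stackato.yml looks something like this (the exact commands are illustrative; syncdb and collectstatic are the usual suspects for a Django app):

```yaml
hooks:
    post-staging:
        - python manage.py syncdb --noinput
        - python manage.py collectstatic --noinput
```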

Dotcloud uses a simple postinstall bash script.

OpenShift uses a deploy bash script in the .openshift/action_hooks directory.

Heroku does most of these things for you automatically, and you can disable them by adding a collectstatic_disabled marker file to the .heroku directory.

Background processes with Celery

Many advanced Django applications require the use of background job processing using Celery, a distributed task queue. Which PaaS providers support Celery?

Stackato supposedly had Celery support at one time, as evidenced by this thread, but the latest commit on the celery-demo app indicates that it no longer works.

OpenShift supposedly has Celery support according to this thread and this closed bug, but I don’t see any definitive documentation about how to set it up on OpenShift.

Dotcloud has a complete documentation page on how to use Django and Celery on Dotcloud.

Heroku lets you run Celery as just another worker.
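With django-celery, that’s just one more line in the Procfile (the exact worker command depends on your Celery setup; this assumes django-celery’s manage.py integration):

```
web: gunicorn hellodjango.wsgi -b 0.0.0.0:$PORT
worker: python manage.py celeryd -E --loglevel=INFO
```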

So who took the 1st prize trophy home?

All of the PaaS providers are winners in my book, because they’re making our jobs as developers easier! But there are clearly pros/cons for each one:

Stackato

Pros:

  • runs anywhere (EC2, VirtualBox, VMWare, HPCloud, etc.)
  • recent versions of MySQL and PostgreSQL and support for most other services
  • you can apt-get Ubuntu packages

Cons:

  • long deploy times due to rebuilding the virtualenv on every deploy
  • no hosted offering, so if you want to use it you need to deploy it yourself to EC2 or HP Cloud

OpenShift

Pros:

  • Open source and backed by a company (Red Hat) known for open source community building
  • Zero downtime deploys with Jenkins builds and hot_deploys

Cons:

  • Older versions of Python (2.6) and PostgreSQL (8.4)
  • Somewhat clunky git repo handling: you add your app’s source as a remote of the OpenShift-created repo, rather than adding the OpenShift git repo as a remote of your own
  • Missing built-in services that other PaaS providers have (Redis, Memcached, RabbitMQ)

Dotcloud

Pros:

  • Broadest service catalog of the four (Redis, MongoDB, RabbitMQ, Solr, etc.)
  • Wide range of Python versions, including 3.x
  • Complete documentation for running Django with Celery

Cons:

  • Flexibility adds some complexity

Heroku

Pros:

  • Good documentation (including e-book Heroku Hacker’s Guide)
  • Large community of developers using Heroku (more likely you’ll be able to get your question answered)
  • Large ecosystem of 3rd party add-ons
  • Easiest deployment – Heroku auto-detects Django app and sets most things up automagically (syncdb, collectstatic, etc.)

Cons:

  • No persistent filesystem – static and media files have to live on Amazon S3
  • No built-in MySQL (available via Amazon RDS)

Feature comparison matrix

This is by no means an exhaustive list, just the things I could think of off the top of my head. If you have suggestions for other things to include, let me know in the comments below.

Feature      | Stackato             | OpenShift          | Dotcloud                   | Heroku
-------------|----------------------|--------------------|----------------------------|------------------------
Python       | 2.7, 3.2             | 2.6 (2.7)          | 2.6.5, 2.7.2, 3.1.2, 3.2.2 | 2.7.2
PostgreSQL   | 9.1                  | 8.4                | 9.0                        | 9.1.6
MySQL        | 5.5                  | 5.1                | 5.1                        | (Yes, via RDS)
Persisted FS | Yes                  | Yes                | Yes                        | (Yes, via S3)
Redis        | Yes, 2.4             | No                 | Yes, 2.4.11                | (Yes, via addon)
MongoDB      | Yes, 2.0             | Yes, 2.2           | Yes, 2.2.1                 | (Yes, via addon)
Memcached    | Yes, 1.4             | No                 | Yes                        | (Yes, via addon)
RabbitMQ     | Yes, 2.4             | No                 | Yes, 2.8.5                 | (Yes, via addon)
Solr         | No                   | No                 | Yes, 3.4.0                 | (Yes, via Websolr)
Cron         | Yes                  | Yes                | Yes                        | Yes
Extensible   | Yes, apt-get install | Yes, DIY cartridge | Yes, custom service        | Yes, buildpacks
WebSockets   | Yes                  | Yes                | Yes                        | Yes, via Pusher add-on
Hot deploys  | No                   | Yes, w/ hot_deploy | Yes, with Granite          | Yes, with preboot

 

If it ain’t broke, don’t fix it

There were a lot of questions at the end about reliability, portability and extensibility, which I think sums up why people are still not jumping on these PaaS platforms. When you’ve got something that works (a Fabric file that pushes to AWS), why change it?

Several people contacted me afterwards and said that after my talk they are reconsidering their opinion of PaaS providers, and might dump the Linode, Rackspace or AWS servers they’re babysitting in favor of a PaaS deployment solution.

The Future of PaaS

PaaS is still in its infancy and it will be interesting to see over the next few years what happens in the developer ecosystem as these platforms mature. There will no doubt be more consolidation, and hopefully some standardization around common formats.

Imagine being able to define a generic deploy.yml file in your code repo that is consumed by each PaaS provider and translated into their specific way of doing things.
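Such a file might look something like this – pure speculation on my part, since no provider consumes this format today:

```yaml
# Hypothetical provider-agnostic deploy.yml
runtime: python-2.7
framework: django
services:
    database: postgresql-9.1
    cache: redis
static:
    url: /static/
    root: mywebsite/static
hooks:
    post-deploy:
        - python manage.py syncdb --noinput
        - python manage.py collectstatic --noinput
```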

At the last DjangoCon 2012 sprint, we started working on a project called django-deployer, an attempt at a PaaS-agnostic deployment tool for Django. We added support for Stackato and Dotcloud, and then the sprint was over; I haven’t had time to revisit it since. But if anyone is interested in working on this, let me know!

What’s next

I only had time in this presentation to cover four PaaS providers, but there are others that have Python/Django support including Amazon Elastic Beanstalk, Google App Engine, CloudFoundry, AppFog and even Microsoft Azure!

What would you like the next blog post to be?  Leave a comment below to express your preference!

  • Additional PaaS providers compared like we already did with these four
  • Pricing comparison showing what an average Django application costs on each provider
  • Deployment time durations – statistics about deployment times (how long is the first push, subsequent deploys)
  • Scaling your app on a PaaS
  • something else?
Sign up for the SaaS Developers Kit to get notified about useful developer tips!