I just got back from the Container Camp conference here in San Francisco. It was an informative conference with presentations covering a wide range of interesting topics in the containerization ecosystem. There was a lot of discussion about the Docker application suite, since Docker is the dominant player, but there was also good coverage of alternative approaches and presentations on the full spectrum of tooling required to use containers in production.
Some Things are Hard
One of the common threads among many of the presentations was that while Docker simplifies a complex problem, it exposes other difficult issues, and in a certain sense makes some of them more complex. I would put networking at the top of this list, as Docker in many ways makes networking even trickier than it already is. Fortunately, the community is very aware of this, and there are a number of very promising projects that I believe will resolve these issues. (And of course expose the next hard thing.)
Another big – and very much undecided – area of discussion is what to do about databases and other large-scale services. Most people are in favor of containerizing most of the tools in addition to the core application, but there is a good deal of debate about how to deal with databases and other large services that generally require one or more dedicated machines.
Containerization has obvious benefits for services that are small enough to be run together on a single machine, but what about the type of services that span multiple physical machines? There are definitely advocates for both approaches and no clear winner at this point. At Funding Circle we are happy with our current DB setup, so we will probably not move to containers in that area unless a very strong case presents itself. If I were working on a greenfield project I might spend more thought on this subject, but even then it would not be top of mind.
A few key concepts, and their related applications, were mentioned by almost every speaker:

- etcd (almost every presenter mentioned it)
- Software Defined Networking
I won’t go into further detail about these since each topic is worthy of its own series of posts.
Host OS vs. Container OS
Three of the presenters were developers of host OSes designed for running containers and each has developed a unique approach:
CoreOS is one of the early movers in the containerization world and was the first to focus on running lightweight containers like Docker. It is a very lightweight host OS that auto-updates the base software and libraries that make up the OS in one fell swoop. There's no package management system; all applications outside of the base OS are installed via containers.
The CoreOS team has also built many of the most popular tools to manage containers, and have even built an interesting Docker alternative: rkt.
Project Atomic is RedHat's project to make the OSes in their ecosystem Docker-friendly by default. They say it is production-ready in RHEL, and will be production-ready in CentOS by June 2015. I expect they will see good early adoption, as it will be an easy migration path for many of those looking into containerizing their existing infrastructure.
I had not heard of Rancher before but they have an interesting approach – they take what CoreOS started to the extreme: Docker becomes both the init system and the package management system.
A “System Docker” runs as PID 1 and starts and manages all of the system services as Docker containers. Software is installed as Docker containers, and updates are a simple pull against a Docker repo followed by a container restart. As far as I understand, there is literally nothing in the base OS except for the kernel and Docker.
They also provide a platform product that simplifies running Docker in a clustered environment.
From my perspective, the decision on host OS can be deferred and is probably one of the less critical decision points, although some tooling decisions will lead to a particular host OS. It will be interesting to watch how this area plays out over the long run.
What I found notable in relation to the OS is that almost no one discussed what OS to use in the container images themselves, and those who did mention it did so only in passing.
The first thing most people do is build an image that looks just like the servers they are running. They start with a full version of CentOS, Ubuntu or the like, and include all the dependencies and applications they are used to. These images quickly grow to gargantuan scales. Our own worst offender clocked in at nearly 2.5GB.
Once you realize this is not a viable approach, you will likely try to bring the size down by stripping out unnecessary software or squashing the layers, but even these stripped-down images can wind up around half a gigabyte or more, as many of the standard OS base images clock in at hundreds of MB.
For our second generation Docker infrastructure we are using Alpine Linux as our base image, which clocks in at just about 5MB. This is a reasonable starting point for creating an image. While it's possible to get even smaller, this seems like a worthwhile tradeoff for the additional comfort of a good set of familiar tools, a complete package management system, and almost everything you could want to run an application.
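As a sketch of what starting from Alpine looks like, a minimal Dockerfile might be the following (the application files and the package being installed are hypothetical, not our actual setup):

```dockerfile
# Start from the tiny (~5MB) Alpine Linux base image
FROM alpine:3.1

# apk is Alpine's package manager; install only what the app needs,
# then clear the package cache to keep the image small
RUN apk add --update ruby && rm -rf /var/cache/apk/*

# Copy in the application and define its entry point
COPY app /app
CMD ["ruby", "/app/server.rb"]
```

The resulting image stays within tens of megabytes rather than hundreds, while still giving you a shell and a package manager for debugging.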
Technically, you don't require an OS at all to run services in a container. With statically linked binaries you can fairly easily do without the OS, but there are many existing applications that cannot realistically be converted to static compilation. It is even possible to build an OS-less image that works with dynamically linked binaries and libraries, but this seems like more effort than it's worth in most cases.
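For example, a statically linked binary can be packaged with no OS at all using Docker's empty `scratch` base image. A hedged sketch, assuming a Go program (the binary name is hypothetical):

```dockerfile
# Build a fully static binary on the host first, e.g.:
#   CGO_ENABLED=0 go build -o server .
# Then package it into an image containing nothing but the binary:
FROM scratch
COPY server /server
CMD ["/server"]
```

The image is exactly as large as the binary itself, but there is no shell, no package manager, and no libc inside the container, which is why this approach only suits applications built for it.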
I expect that for the foreseeable future most implementations will need some sort of OS in the container, and Alpine Linux seems to be a great option. I highly recommend looking at Alpine or other similar micro distributions as early in your containerization efforts as possible.
What surprised me the most was that at a conference dedicated entirely to containers, only about 10% of the attendees were actually using Docker, or any type of container, in production. I expected a higher level of adoption by now among those interested enough in the concept to attend this type of conference. It does show that interest in Docker and containers is very strong, so I expect rates of production use to climb quickly in the coming year.
Here at Funding Circle we have been using Docker in production for about 6 months, and we first started working with Docker almost a year ago. Now that we've gained a better understanding of the role containers can play, we are starting our second generation Docker implementation using the latest and greatest techniques and tools in the container ecosystem.
Cloud 66 is a consulting / contracting company offering DevOps as a service. The presentation by Khash Sajadi covered their experience implementing Docker on thousands of servers for numerous clients. He discussed the hurdles they faced with their various clients, both from an adoption standpoint and technically, which they've distilled into 11 major topics that need to be addressed in each implementation:
- OS Vendor for Host and Container
- Scheduling Engine
- Service Discovery
- Continuous Integration / Delivery
- OS package repository: should it be private or public?
- Should data & the DB be in a container?
- Logging - Active & passive logging
- Load balancing: Inside / Outside
- Debugging

Note: I believe there is a 12th item, in addition to the 11 identified by Sajadi, that is important to consider while planning any infrastructure implementation:

- Auto Scaling
Due to the limited time of each presentation he was only able to go into detail on three of the topics. The topics he dived into were really informative and covered the practicalities of adopting containerization in a production environment.
It was nice to see that we've already addressed almost all of these topics in our planning here at Funding Circle. I feel confident we are on the right path for our second generation containerization efforts, but we still have a good deal of work to do to get it right.
Docker as a desktop application package management system
My favorite presentation at Container Camp was “The Willy Wonka of Containers” by Jessie Frazelle from Docker. While it wasn't particularly relevant to what we are doing at Funding Circle, her talk was a unique and thought-provoking presentation. She has gone down the path of using Docker as a package management system for desktop use, to the point where she has packaged Chrome and some 3D games into containers. This is not something I had ever considered using Docker for.
I'm sure other folks have already considered this use case (it's an extension of what many are already doing on servers), but Jessie takes it to a degree that I've not seen anybody else do.
It is a great experiment and shows the kind of potential containerization has when you push it to the extreme edges of what it’s designed for. It makes me excited to think about what other fascinating applications will come out of the ecosystem.
She has an in-depth blog post if you are interested in more detail on this topic.
Most interesting sponsor product
Sysdig has a really interesting monitoring / performance dashboard with real-time streaming of system-level data and alerts – you can drill down to an almost ridiculous level of granularity, but also get a 10,000-foot view of your infrastructure. It can also show you the state of your entire system at a specific point in time, and lots of other cool stuff I haven't had time to look into yet. I plan to dig deeper into their offerings soon to see if it is something that will be applicable for us.
All the presentations are now available on Container Camp’s YouTube channel.
Are you interested in working on Docker in a production environment? Come join us!