Lately, I've been working on Satellite, an open-source monitoring solution for Mesos clusters. It has become clear that Satellite needs to be easier to install and configure, and that we may need to change the very nature of its configuration files. This has led me to think about software configuration in general, and particularly about which styles work best in which situations.
In a healthy software project, a line is drawn between the "Application" itself and its "Configuration". For good reason, we treat components of the project differently depending on which side of the line they fall.
|Application|Configuration|
|Included in the main distribution package of your software.|Not included in the main distribution package of your software.|
|Applicable to more than a single installation (including developers' workstations).|May apply only to a single installation of the software.|
|Always in source control. This means that changes to code have high visibility and can be easily reviewed before they go into effect. Provided that source control is used well, it also offers a robust way to identify problems that a proposed change would introduce.|Doesn't necessarily need to be in source control (and may not even consist of files). Changes tend not to undergo the same sort of code review process (and in fact, the authors of the Application may not have any way to know how the application is being configured).|
|Changing code tends to trigger the project's established test procedures (especially automated tests).|Changes don't necessarily produce publicly visible test results.|
|Contains lots and lots of logic and behavior.|Consists only of values, and contains no logic or behavior.|
The last item in the "Configuration" column is a bit contentious. Here are some reasons to keep your Configuration free of anything but values:
- If you treat all of your code with the same care that you treat Application code, you maintain a greater degree of trust that the software will continue to function as intended wherever it is deployed. Of course, even if you limit your Configuration to only values, it's always possible for unanticipated values to create problems. In general, however, a stricter limit on the scope of what it's possible to change via Configuration also limits the ways in which an application can misbehave.
- If your Configuration consists only of values, it is very easy to understand how a given installation is configured when you are troubleshooting it. If you can see the list of configuration settings (for example, a static JSON file), you have the whole story.
- When your Configuration can be represented as data, it is easier for a wider variety of people (and perhaps processes) to reconfigure your Application. For example, many more people (and processes) will feel confident about editing a JSON document than a small Python program.
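To make the "values only" idea concrete, here is a minimal sketch in Ruby. The setting names are hypothetical (they are not Satellite's real keys), and the validation rules are just one assumption about what an Application might enforce. The point is that the Configuration is a plain JSON document, while every bit of logic, including the checks on those values, lives in Application code.

```ruby
require "json"

# Hypothetical settings for illustration; not any real project's keys.
CONFIG_JSON = <<~JSON
  {
    "listen_port": 8080,
    "poll_interval_seconds": 30,
    "alert_email": "ops@example.com"
  }
JSON

# The Application owns all logic, including which settings exist
# and what type each one must have.
ALLOWED_KEYS = {
  "listen_port" => Integer,
  "poll_interval_seconds" => Integer,
  "alert_email" => String
}.freeze

# Parses a JSON document and rejects unknown keys or wrongly typed values.
def load_config(json_text)
  config = JSON.parse(json_text)

  unknown = config.keys - ALLOWED_KEYS.keys
  raise ArgumentError, "unknown settings: #{unknown.join(', ')}" unless unknown.empty?

  ALLOWED_KEYS.each do |key, type|
    raise ArgumentError, "#{key} must be a #{type}" unless config[key].is_a?(type)
  end

  config
end

config = load_config(CONFIG_JSON)
puts config["listen_port"]
```

Because the document holds nothing but values, anyone troubleshooting an installation can read it top to bottom and know exactly what was configured.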
Rails: an example of when to avoid "Code as Configuration".
I love Ruby on Rails, but not unconditionally. In my opinion, it needlessly blurs the distinction between Application and Configuration. Rails allows arbitrary Ruby code to be embedded in its configuration files (e.g. database.yml) via ERB tags.
Normally people limit the embedded Ruby to injecting a few innocuous references to environment variables, and that's not the end of the world. However, I've definitely seen folks embed some non-trivial code blocks (little undercover Ruby programs) inside their Rails YAML.
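As an illustration, here is a small sketch of what ERB inside a YAML file looks like and roughly how it gets evaluated. The file contents, keys, and environment variable names are hypothetical, and the two-step "render, then parse" code is a simplified stand-in for what the framework does when it reads the file.

```ruby
require "erb"
require "yaml"

# Hypothetical database.yml contents; keys and variable names are illustrative.
DATABASE_YML = <<~'YAML'
  production:
    adapter: postgresql
    host: <%= ENV.fetch("DB_HOST", "localhost") %>
    pool: <%= ENV.fetch("DB_POOL", "5").to_i * 2 %>
YAML

# Step 1: run the embedded Ruby. Anything inside <%= %> is executed as code.
rendered = ERB.new(DATABASE_YML).result

# Step 2: only now is the result parsed as plain YAML data.
config = YAML.safe_load(rendered)

puts config["production"]["adapter"]
puts config["production"]["pool"]
```

The `host` line is the harmless kind of usage. The `pool` line already contains a little arithmetic, which is the first step toward logic hiding in a file that looks like static data.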
This is an antipattern that I discourage wherever possible. Rails would be better off avoiding ERB interpolation when reading files with an extension that implies static data (.yml). Instead, where necessary, actual Ruby modules (hopefully living in .rb files) should define more formal processes to choose between static configuration documents.
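One way to picture that alternative is sketched below. The `ConfigSelector` module, its method names, and the `database.<environment>.yml` file naming are all hypothetical, invented for this example. The idea is that the choice between documents is ordinary, testable Ruby living in a .rb file, while each YAML document stays purely static.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical module: chooses among static YAML documents by environment name.
module ConfigSelector
  KNOWN_ENVIRONMENTS = %w[development test production].freeze

  # Returns the path of the static document for the given environment.
  def self.path_for(environment, config_dir)
    unless KNOWN_ENVIRONMENTS.include?(environment)
      raise ArgumentError, "unknown environment: #{environment}"
    end

    File.join(config_dir, "database.#{environment}.yml")
  end

  # Reads the chosen document as plain data; no code in the file is evaluated.
  def self.load_for(environment, config_dir)
    YAML.safe_load(File.read(path_for(environment, config_dir)))
  end
end

# Usage: write one static document to a temporary directory and load it.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "database.production.yml"),
             "adapter: postgresql\nhost: db.example.com\n")
  puts ConfigSelector.load_for("production", dir)["host"]
end
```

The selection logic can be code reviewed and unit tested like any other Application code, and the YAML files remain something a person or a process can edit without reading Ruby.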
Riemann: an example of when "Code as Configuration" works great.
Some tools actively promote their own use of Code as Configuration. For example, Riemann is a brilliant way to monitor and react to events in distributed systems. Here's the fourth paragraph of the Riemann project homepage:
"Since Riemann's configuration is a Clojure program, its syntax is concise, regular, and extendable. Configuration-as-code minimizes boilerplate and gives you the flexibility to adapt to complex situations."
In fact, Riemann wouldn't make sense any other way!
Here's how I reconcile my love for Riemann with my distaste for Code as Configuration in general. Let's do our best to compare parts of a Riemann-based monitoring system with parts of a Rails application:
|Rails application|Riemann monitoring system|
|Rails framework code|Riemann library code|
|Specific models, views, controllers|Riemann configuration .clj files|
|YAML configuration (e.g. database.yml)|Specific values (e.g. hostnames, environment variables)|
The word "Configuration" clearly means something a little different in the context of a Riemann system than it does in the context of a Rails application. If you think about it, Riemann by itself contains no domain-specific logic whatsoever. It simply provides a bunch of functions, leaving it up to your "configuration" to call any of them and make Riemann do anything.
Looking at it this way, when you develop a Riemann-based monitoring system, you're not simply configuring a pre-existing application. You are actually creating a new Application that uses Riemann as a library! You should treat it that way: check your Riemann config into source control, and unit test its functions.
When is "Code as Configuration" a good idea?
When your "Configuration" is actually your App!