I've given overviews of how we run deployments at Vast to many people over the years and figured it was time I wrote some of this down. Right or wrong, I've chosen to separate code deployment from system configuration management. Much of that decision stems from the lack of adequate 'push' functionality in systems like puppet and chef: I wanted more control over ~when~ code would start running on a server than I felt those tools provided. We use chef as our system configuration management service, and rundeck plus a home-grown bash shell framework called vastexec to handle code deployment.

To get a service into production we start by requiring a teamcity generated artifact. The next requirement is a vastexec plugin (more specifics on that later). The plugin is written by the software engineer (typically with some operations involvement) and checked into git, where it, too, becomes part of a teamcity artifact: the plugins artifact. From that point there's a certain amount of hand-off, where the developer lets operations know the resource footprint they expect and any dependencies the service might have, e.g. java version, libraries, etc. Operations will then create, at the minimum, a role for that server, though we usually end up creating a cookbook for it as well.

We have everything lined up by environment and role, which effectively act as queryable tags for us. I can do a knife search for various attributes on a system and treat the result as an authoritative list, which is actually how we handle our deploys through rundeck. I wrote a provider (https://github.com/looprock/rundeck-chef-provider) for rundeck which compiles the attributes into rundeck tags. When we specify a new project inside rundeck, we then use those tags as node filters which determine what systems that software will be deployed to. The great thing about this setup is that the chef client needs to successfully run its cookbooks before those systems show up with the proper tags via the provider, so we know at that point they'll be properly bootstrapped to support the application when vastexec attempts to execute against them.

I've outsourced the problems of bootstrapping and the execution layers to chef and rundeck, and tied them to a central source of truth, chef itself. That just leaves the application logic itself, which is handled by the vastexec plugins.

Vastexec

Vast has a lot of legacy code as well as a steady pipeline of new services getting cranked out. The services span multiple languages, with significant platform and requirement differences. We've typically had 'standards' for how we do things per language, though, as is the case with most 'standards', practice usually deviates from them over time. I wanted to wrangle all of this into some kind of consistency so that we could codify how we create and manage projects and jobs inside of rundeck. Vastexec was the answer to that, providing a unified interface that can still handle application level complexity.

Vastexec is composed of two parts. The first is a primary script, which is operations managed and provides a lot of built-in functions that ops has engineered and battle-tested. The second part consists of plugins, which contain service specific code and are maintained primarily by the developers. Both components are bash based. The primary reason for using bash rather than another language is that I was aiming for the lowest common denominator. Almost all our engineers were at least familiar with the command line, so it was much easier to take what they were doing there and codify it inside a plugin than to force them to filter that logic through another language. There is, however, nothing stopping anyone from bundling a script in another language and launching that from a plugin.
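
To make the split between the primary script and the plugins concrete, here is a minimal sketch of how a primary script could load a plugin and call the function named after the product. This is not the real vastexec source: PLUGIN_DIR and run_plugin are hypothetical names, and the plugin is written to a temporary directory only so the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a plugin loader; not the real vastexec source.

# Assumed plugin location; a temp dir keeps the sketch self-contained.
PLUGIN_DIR="$(mktemp -d)"

# A trivial plugin: the file name matches the function it defines.
cat > "${PLUGIN_DIR}/hello.plugin" <<'EOF'
hello()
{
echo "Hello world"
}
EOF

# run_plugin product: source <product>.plugin, then call the function
# with the same name as the product.
run_plugin() {
    local product="$1"
    local plugin="${PLUGIN_DIR}/${product}.plugin"

    if [[ ! -f "${plugin}" ]]; then
        echo "ERROR: no plugin found for ${product}" >&2
        return 1
    fi

    # shellcheck source=/dev/null
    source "${plugin}"

    # Refuse to continue if the file did not define the expected function.
    if ! declare -F "${product}" > /dev/null; then
        echo "ERROR: ${plugin} does not define ${product}()" >&2
        return 1
    fi

    # Plugins see the product name as $PRODUCT.
    PRODUCT="${product}"
    "${product}"
}

run_plugin hello
```

Because the plugin is sourced rather than executed, it can call any function the primary script has already defined, which is what lets ops own the shared helpers while developers own the service logic.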

Vastexec options

usage: /usr/local/bin/vastexec [options]

Execute a plugin

OPTIONS:
-p REQUIRED: specify a product
-r specify a release
-e specify an environment
-h Show this message
-s Show all plugins
-b rollback to previous version
-f force (if supported by plugin)
-v verbose (if supported by plugin)
-o generic option parameter

Command example

vastexec -p coolvastthing -r 1.4.1-SNAPSHOT-14277 -e prod
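
For readers who want to see how an option set like this maps onto bash, the following is a rough sketch using the getopts builtin. It is an illustration, not the real parser: ACTION, ENVIRONMENT, OPTION, ROLLBACK, FORCE and VERBOSE are assumed variable names, and I'm assuming -o takes an argument.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of vastexec-style option parsing, not the real script.

parse_args() {
    PRODUCT="" RELEASE="" ENVIRONMENT="" OPTION=""
    ROLLBACK=0 FORCE=0 VERBOSE=0 ACTION="run"
    local opt OPTIND=1

    # The leading ":" puts getopts in silent mode so we report errors ourselves.
    while getopts ":p:r:e:o:hsbfv" opt; do
        case "${opt}" in
            p) PRODUCT="${OPTARG}" ;;
            r) RELEASE="${OPTARG}" ;;
            e) ENVIRONMENT="${OPTARG}" ;;
            o) OPTION="${OPTARG}" ;;   # assumption: -o carries a value
            b) ROLLBACK=1 ;;
            f) FORCE=1 ;;
            v) VERBOSE=1 ;;
            h) ACTION="help" ;;
            s) ACTION="list" ;;
            :) echo "ERROR: -${OPTARG} requires an argument" >&2; return 1 ;;
            \?) echo "ERROR: unknown option -${OPTARG}" >&2; return 1 ;;
        esac
    done

    # -p is only required when we are actually going to run a plugin.
    if [[ "${ACTION}" == "run" && -z "${PRODUCT}" ]]; then
        echo "ERROR: -p is required" >&2
        return 1
    fi
}

if parse_args -p coolvastthing -r 1.4.1-SNAPSHOT-14277 -e prod; then
    echo "product=${PRODUCT} release=${RELEASE} env=${ENVIRONMENT}"
fi
```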

Possibly useful global variables

BUILD - the build ID used in teamcity URL /${BUILD}:id/

DLDIR - a unique temporary directory defined when vastexec is executed

DSTAMP - The current date in the format: YYYYMMDD

TEAMCITY - the teamcity server
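
As an illustration of the two variables that are computed at run time, DSTAMP and DLDIR could be set up along these lines. The directory name prefix is an assumption; the only properties taken from the descriptions above are the YYYYMMDD format and the uniqueness of the directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real vastexec may build these differently.

# DSTAMP: the current date as YYYYMMDD.
DSTAMP="$(date +%Y%m%d)"

# DLDIR: a unique temporary directory for this run. mktemp replaces the
# trailing X characters with a random suffix and creates the directory.
DLDIR="$(mktemp -d "${TMPDIR:-/tmp}/vastexec.${DSTAMP}.XXXXXX")"

echo "DSTAMP=${DSTAMP}"
echo "DLDIR=${DLDIR}"
```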

Global functions

add_lb - Add a system back to the load-balanced pool

backup source_dir backup_dir - Backup the current deployment dir for rollback. It will move the entire source_dir into backup_dir. All arguments are mandatory and should be absolute paths.

createsymlinkrollback working_dir release_base_name suffix - For symlink style releases, document the last release version before a new deploy. release_base_name should be in the form of TCNAME/PRODUCT. Suffix means anything after the release version, e.g. '-installer'.

ensuredirs "dir [ dir2 dir3 ... dirN ]" user group - Make sure dir is present, creating it if it's not, and chown it to user:group. The dir, user and group arguments are mandatory, and dir is an absolute path. You can pass multiple directories as a single quoted, space-separated argument, e.g. "/data/foo /data/bar".

extract file dest_dir user group - Deploy a Teamcity artifact on the system.

fetch url_without_filename filename - Pull an artifact from Teamcity.

findpid pidfile user pid_search_string - Try to identify a pid, first by checking the pidfile, alternately via ps (error-prone!). pidfile should be an absolute path to the pidfile. $PID variable will be set with the result.

portdie port - Wait 10 times, 6 seconds apart for the port to die.

portstart port - Wait 10 times, 6 seconds apart for the port to start listening.

procdie pid - Wait 10 times, 6 seconds apart for the process to shut down on its own. If it doesn't exit by itself, kill it. Best practice: pass $PID as pid here if you've used the findpid function above.

procstart pid - Wait 10 times, 6 seconds apart for the process to start

purgeoldreleases work_dir release_base_name latest_release_dest - Delete old release directories, leaving only the two latest ones. release_base_name should be in the form of TCNAME/PRODUCT.

rm_lb - remove a system from the load-balanced pool via the /machine file

showplugins - list all the available plugins

startsvd product - Starts the supervisor managed process for product and makes sure it comes up.

stopservice init_script_name pid - stop a service and make sure it's dead

stopsvd product - Stops the supervisor managed process for product and makes sure it goes down.

testexists [file or space delimited list of things to check] - abort if files, dirs, links don't exist

urlcheck [healthcheck URL] - looks for a 200 response, but will fall back to looking for "OK" as the first part of the page content.
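
Several of the functions above share the same "try 10 times, 6 seconds apart" pattern. The sketch below shows one way that pattern could be written in bash. It is not the actual vastexec code: wait_for, port_listening and portstart_sketch are hypothetical names, and the port probe relies on bash's /dev/tcp redirection, which is an assumption about how the check is done.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the retry pattern; not the real vastexec functions.

# wait_for attempts delay command [args...]
# Run the command until it succeeds, sleeping between failed attempts.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        if "$@"; then
            return 0
        fi
        if (( i < attempts )); then
            echo "WAITING: attempt ${i}/${attempts} failed, retrying in ${delay}s" >&2
            sleep "${delay}"
        fi
    done
    return 1
}

# port_listening port: succeeds if something accepts TCP connections on
# 127.0.0.1:port. The subshell closes the descriptor again on exit.
port_listening() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# A stand-in for portstart: 10 attempts, 6 seconds apart.
portstart_sketch() {
    if wait_for 10 6 port_listening "$1"; then
        echo "OK: port $1 is listening"
    else
        echo "ERROR: port $1 never started listening" >&2
        return 1
    fi
}
```

With these definitions, portstart_sketch 8090 would return non-zero after about a minute if nothing ever listens on the port, which gives a plugin a clear point at which to abort.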

Anatomy of a vastexec plugin

A plugin is a shell script that defines a single function. Ideally a plugin will be idempotent and able to build an application up from scratch, including actions like creating necessary directories and adding users. It's also important to use the port check, process check, and urlcheck functions to verify the service is in the state it should be at any given step. Using these tools you should be able to keep from putting a broken service in production! The name of the file should be [function].plugin, e.g. hello.plugin.

Examples

hello.plugin

hello()
{
echo "Hello world"
}

A fairly basic plugin for deploying a single artifact:

coolvastthing.plugin

coolvastthing()
{
# $PRODUCT is the same as the function: coolvastthing
declare WORKDIR="/data"
declare DESTDIR="$WORKDIR/$PRODUCT" # /data/coolvastthing
declare LOGDIR="${WORKDIR}/logs/${PRODUCT}"
declare USER="cvt"
declare GROUP="cvt"
declare TCID="bt666"
declare TCNAME="cool-vast-thing" # the artifact path is different in teamcity so we're using this variable
declare PIDFILE="/tmp/${TCNAME}-service.pid"
declare TESTPORT="8090"
declare TESTURL="http://localhost:${TESTPORT}/healthcheck"

# take the node out of the load-balancer
rm_lb

# make sure DLDIR, DESTDIR, and LOGDIR all exist and have the right permissions
ensuredirs "${DLDIR} ${DESTDIR} ${LOGDIR}" $USER $GROUP

# this creates a file ${DESTDIR}/.rollback_version which we'll use to roll back the release if needed
createsymlinkrollback ${DESTDIR} ${TCNAME} "-installer"

# find the pid of the current process. If no pidfile is found, we'll look in the process list for a process owned by the user which contains the string
# "cp cool-vast-thing". This is a good match for finding our java processes, which generally contain "-cp <teamcity artifact>"
findpid ${PIDFILE} ${USER} "cp ${TCNAME}"

# stop the process; ensure it's dead and not listening on its testport
stopservice ${PRODUCT} ${PID}
portdie ${TESTPORT}

FILEBASE="${TCNAME}-${RELEASE}-installer"
FILENAME="${FILEBASE}.tar.gz"
URLPATH="http://${TEAMCITY}/repository/download/${TCID}/${BUILD}:id/target/"
fetch ${URLPATH} ${FILENAME}

# extract the contents of cool-vast-thing-1.4.1-SNAPSHOT-14277-installer.tar.gz to /data/coolvastthing/cool-vast-thing-1.4.1-SNAPSHOT-14277-installer
extract ${FILENAME} ${DESTDIR}/${FILEBASE} ${USER} ${GROUP}

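# link the release to current; -f and -n replace an existing 'current' symlink
# instead of creating a new link inside the directory it points to
ln -sfn ${DESTDIR}/${FILEBASE} $DESTDIR/current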

# start the process again
echo "OK: restarting service $PRODUCT"
service ${PRODUCT} start
sleep 2
# find the new pid
findpid ${PIDFILE} ${USER} "cp ${TCNAME}"
# make sure the process starts
procstart ${PID}
# make sure the port is listening
portstart ${TESTPORT}
# verify the service is reporting OK on the healthcheck URL
urlcheck ${TESTURL}

# clean up after ourselves
# remove old releases, leaving only the last two
purgeoldreleases ${DESTDIR} ${TCNAME} ${FILEBASE}
echo "OK: deleting temporary download dir: ${DLDIR}"
rm -rf ${DLDIR}

# return the system to the load-balancer
add_lb
}
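
The symlink and rollback steps in the plugin above can be illustrated in isolation. This is a sketch under assumptions, not the real createsymlinkrollback or the -b rollback path: record_rollback, activate_release and rollback_release are hypothetical names, and for simplicity .rollback_version stores the full path of the previous release.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of symlink style releases with a rollback record.

# record_rollback dest_dir: remember what "current" points at before changing it.
record_rollback() {
    local dest_dir="$1"
    if [[ -L "${dest_dir}/current" ]]; then
        readlink "${dest_dir}/current" > "${dest_dir}/.rollback_version"
    fi
}

# activate_release dest_dir release_dir: point "current" at release_dir.
# -n stops ln from following an existing "current" link into its target.
activate_release() {
    local dest_dir="$1" release_dir="$2"
    ln -sfn "${release_dir}" "${dest_dir}/current"
}

# rollback_release dest_dir: point "current" back at the recorded release.
rollback_release() {
    local dest_dir="$1" previous
    if [[ ! -s "${dest_dir}/.rollback_version" ]]; then
        echo "ERROR: no rollback version recorded in ${dest_dir}" >&2
        return 1
    fi
    previous="$(cat "${dest_dir}/.rollback_version")"
    if [[ ! -d "${previous}" ]]; then
        echo "ERROR: previous release ${previous} is gone" >&2
        return 1
    fi
    activate_release "${dest_dir}" "${previous}"
}

# Demo in a throwaway directory with two made-up releases.
DEMO="$(mktemp -d)"
mkdir -p "${DEMO}/app-1.0" "${DEMO}/app-1.1"
activate_release "${DEMO}" "${DEMO}/app-1.0"   # initial deploy
record_rollback "${DEMO}"                      # note 1.0 before upgrading
activate_release "${DEMO}" "${DEMO}/app-1.1"   # upgrade
rollback_release "${DEMO}"                     # back to 1.0
readlink "${DEMO}/current"
```

Keeping the two latest release directories around, as purgeoldreleases does, is what makes a rollback like this a matter of re-pointing a link rather than re-downloading an artifact.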

We also use 'base' plugin inheritance to handle similar jobs with slightly different variables:

frontend-site1.plugin

frontend-site1()
{
declare TCID="frontend_site1"
declare TESTPORT="9090"
source frontend-base.sh
}
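
The base file itself isn't shown here, so the following is only a sketch of how this kind of inheritance works in bash. Variables declared inside the plugin function are visible to a file sourced from within that function, so the shared file can stay generic. The contents of frontend-base.sh, the frontend-site2 plugin and its values are all made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the real frontend-base.sh is not shown in this post.

BASE_DIR="$(mktemp -d)"

cat > "${BASE_DIR}/frontend-base.sh" <<'EOF'
# Shared deploy steps. TCID and TESTPORT are declared by the calling plugin
# function and are visible here because this file is sourced inside it.
declare TESTURL="http://localhost:${TESTPORT}/healthcheck"
echo "deploying ${TCID}; would check ${TESTURL}"
EOF

frontend-site1()
{
    declare TCID="frontend_site1"
    declare TESTPORT="9090"
    source "${BASE_DIR}/frontend-base.sh"
}

# A second site only changes the variables (these values are made up).
frontend-site2()
{
    declare TCID="frontend_site2"
    declare TESTPORT="9091"
    source "${BASE_DIR}/frontend-base.sh"
}

frontend-site1
frontend-site2
```

Since declare inside a function creates local variables, nothing set by one plugin leaks into the next one run in the same shell.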

Feel free to hit me up with any further questions! doug at webuilddevops