Content Dark Launch

In March 2017 I gave a talk at WordCamp Raleigh on content dark launch, a technique that has saved us many hours of stress when launching new sections of content on our WordPress-based intranet site.

If video is not your speed, I’ve written up most of the talk below as well.

 

Table of Contents

  1. Background
  2. The pain
  3. Looking for a solution
  4. How we do content dark launching
  5. Where we are now
  6. Unanticipated benefits

 

[Image: on/off switch. Photo by cea+ on Flickr]

1. Background

I work on an intranet. It’s big, and it’s old. The last time I plugged a wildcard into the search engine, I got 1.3 million results. The intranet has been around more than 20 years – if it were an American, it would be old enough to vote, and to drink.

A couple of years ago, we decided to move some of it into WordPress – as a co-worker recently said, we’re moving the good stuff into WordPress, the stuff people really care about. So far we’ve got about a thousand pages and posts in our WordPress site and although nowhere near all 1.3 million “things” will move into the site, we have really only just begun.

We’ve given our mostly non-technical content owners a way to stage their content safely on a content staging site, and move it over to the live site using a plugin called RAMP.

RAMP is great. It gives us a nice UI for moving content in a very granular way, and in a way that’s logical to content people. You create a batch with the pages and posts that you want to move and it moves them. There’s no database copy, so you can have pages in progress on the staging site while moving other, completed pages to the live site. That’s not possible when you’re relying on database copy to migrate content.

The original idea behind using WordPress, in general, is that it would be largely self-service. It’s a walk-up-and-use product, unlike the big CMS in use on sas.com. Content owners can be trained by other content owners and can be successful at creating and maintaining pages without the help of a developer. This has worked out well – for most of our content owners, WordPress is pretty easy to use even after customization.

As we work through each business unit’s content, we fill out the information architecture (IA) that we’ve built based on user testing. Each section of the IA is a launch. These content launches are where the content owners needed the most help – launches are definitely not a self-service operation as we’d hoped. The content folks can handle maintenance, but a brand new section of the site is too complicated.

2. The Pain

The SAS Camp site is my favorite example. It’s a four-page site. Four little pages. However, those four pages are packed with features that all have to be migrated over to the live site, too. Those pages are full of dependencies.

RAMP tries really hard to make sure that if a page has any kind of dependency that it can detect – an image in the body copy is a good example – that dependency will be there on the live site after the page has been RAMPed. So, RAMP will automatically add stuff to content batches. Because of the way we’ve customized the site, RAMP might add quite a few pages, posts, and media to a batch.

I’ve had to customize RAMP to work with our customizations – and my work does have the occasional bug. So we can wind up with unwieldy batches and, despite RAMP’s best intentions, broken stuff on the live site because of these bugs.

Planning these complicated launches is a nightmare, and even with many hours spent trying to plan launches in detail, launch day often wound up going something like this:

  1. Try to follow a launch plan made with incomplete information
  2. Spend hours in a Skype call in crisis mode / boredom mode on launch day
  3. Have visibly broken stuff on the live site, frantically try to fix
  4. Go home, collapse. (Collapsing all evening doesn’t go over well when one has a six-year-old.)

After a few of these stressful launches I realized that our launch day pains could be distilled down to three root causes:

  1. Stuff we didn’t know about (because non-technical content owners just don’t know to tell us) is only discovered after it’s visibly broken on the live site after launch
  2. Stuff is broken after RAMPing
  3. RAMPing takes a loooooong time

3. Looking for a solution

At first I thought the solution was to make launches smaller. Thinking like a developer steeped in Agile processes and continuous deployment, I thought that if we could get launches down to the smallest viable set of content it would make things much less complicated.  In Agile software development, we try to distill our development stories down to the smallest piece of code that can be deployed and used.

However, as soon as I started asking our content owners how small they could make their launches, I knew I was wrong. Partly by the look on our client’s face, and partly just in my gut, I knew.

After all, this was a technical problem that needed to be solved. As a developer, the way I provide value is by helping the business achieve its goals. Content owners know their content and know what content has to logically be grouped together in a launch. If there are technical problems with launching groups of pages together, it’s my job to solve those problems, not ask the business to work around them. It also hurts my pride to think that there’s a technical problem that I can’t solve!

Around this time I had started to follow on Twitter the founder of a startup that provides a SaaS feature flagging service for software companies, and I could see a possible solution in what I was learning from her.

What is feature flagging?

In software, feature flagging (also known as feature toggles) is the practice of wrapping a small software feature in a piece of code that allows the developer to toggle its availability off and on. When such a feature is launched, but toggled “off”, this is referred to as “dark launching”.

A great example of a company that uses feature flags all the time is Facebook. If you’re a Facebook user, you may have seen people in your timeline talking about some visible change to Facebook (often one they are upset about!). Then there is usually a pile-on in the comments of other people saying that they cannot see this upsetting change (that they are fully prepared to be equally upset about). This is an example of using a feature flag to gradually deploy a feature out to some percentage of users at a time. The change rolls out to one set of users after another, giving the Facebook engineers a chance to see if the change breaks things, or makes people upset. I am not sure if they ever completely recall a feature that makes people upset, but in theory they could, and the feature toggle would make this quite easy to do.
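To make the idea concrete, here is a minimal Python sketch of a feature flag with a percentage rollout. It is only an illustration: the flag names, the `FLAGS` table, and the hashing scheme are all made up for this example, and it is not a description of how Facebook or any feature flagging service actually implements this.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users who see the feature.
# 0 means dark launched (deployed but hidden); 100 means fully launched.
FLAGS = {"new_timeline": 25, "pet_survey": 0}


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str) -> bool:
    """Return True if this user falls inside the flag's rollout percentage."""
    return bucket(flag, user_id) < FLAGS.get(flag, 0)


def render_timeline(user_id: str) -> str:
    # The toggle wraps only the entry point to the new code path.
    if is_enabled("new_timeline", user_id):
        return "new timeline"
    return "old timeline"
```

Because the bucket is derived from a hash, a given user gets the same answer on every request. Raising the percentage widens the audience, and setting it to 0 recalls the feature without a new deploy.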

The feature flagging folks are fond of saying that the only way to truly test a feature is in the live environment. It’s easy to see how this could be true at a company like Facebook – the scale at which they operate can’t easily be emulated in a test environment. There’s real value, even at a smaller scale, to being able to see how a new feature operates in the live environment, but to be able to do so safely without breaking things for all users at once – or, if things are broken, to toggle the feature off quickly.

The Dream

So, I thought about how to apply this to our content launches, and came up with this as a goal:

  • Separate deploy from launch. If they are two separate things, then the pain of deploy doesn’t have to occur along with the time pressure of launching.
  • Deploy gradually – and automatically – as content becomes ready
  • Flip a switch on launch day

4. How we do content dark launching

The Transporter

The first thing I thought about was how to allow us to deploy gradually, over time as the content was prepared, long before launch day. Again thinking like a developer, I thought this process should be as automated as possible. I came up with the idea of the Transporter.

 

Unfortunately, I was too busy running launches to actually implement my idea! So my co-worker Sarah stepped in and created the Transporter.

The Transporter is a RAMP extension. Here’s how it works:

  • After we identify a section as being ready to dark launch, we tag all pages and content items with a unique dark launch category.
  • In the Transporter configuration page, we make this category active – this is a setting.
  • We RAMP this setting from our content staging site over to live.
  • On a daily basis, the Transporter creates a RAMP batch (on stage) with all content in that category that has been changed in the last day.
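The daily selection step can be sketched roughly as follows. This is a plain Python illustration of the rule described above, not the Transporter’s code and not RAMP’s API: `ContentItem`, `build_daily_batch`, and the category names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentItem:
    # Hypothetical stand-in for a page, post, or media item on the staging site.
    title: str
    categories: set
    modified: datetime


def build_daily_batch(items, active_categories, now, window=timedelta(days=1)):
    """Return titles of items in an active dark launch category changed within the window."""
    cutoff = now - window
    return [
        item.title
        for item in items
        # Keep items tagged with at least one active category and changed recently.
        if item.categories & active_categories and item.modified >= cutoff
    ]
```

An item tagged with an active category but untouched for several days is left out, which is what keeps each day’s batch small.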

What Sarah found with RAMP is that it’s very easy to create batches programmatically, but impossible to send batches programmatically. So every morning the analyst in our group checks for Transporter batches and sends any that need to be sent. As it turns out, it’s good to have a human involved in sending the batches for the very reasons that we have problems with launches: there’s always stuff we don’t know about, and it’s usually not noticed until a RAMP batch is built, sent, and checked on the live site.

Doing this on a daily (or close) basis means that we have lots of time to catch problems and fix them. The slow nature of RAMP isn’t a big deal, either: each batch is pretty small, and we’re not under deadline anyway.

Finding the edges

Next I started to think about hiding the content during the dark launch phase. I found this useful blog post by Martin Fowler, who writes a lot about continuous deployment and related topics including feature toggles:

“Toggle tests should only appear at the minimum amount of toggle points to ensure the new feature is properly hidden. There could be many screens in the pet survey feature, but if there’s only one link on the home page that gets you there, then that’s the only element that needs to be protected with the toggle tag. Don’t try to protect every code path in the new feature code with a toggle, focus on just the entry points that would lead users there and toggle those entry points.”

Well, for content on our web site, the entry points boil down to these three places:

  • Mega menu
  • Links within already launched content
  • Sometimes, the left nav

Figuring out where the entry points are for a content section is something I call “finding the edges”.

The mega menu is always easy. It’s manually created, rarely changes, and easy to RAMP. There’s really no need to dark launch mega menu changes, we can just do them on launch day.

Links within already launched content are much trickier. It’s usually not ideal to make these changes on launch day, and such links are pretty common.

We’ll get to the left nav in a minute.

The Deflector

The next thing I thought about was how to selectively re-route links within already launched content when they point over to dark launched content.

For this my idea was more vague – I thought maybe we’d use the Rewrite API, but that was unnecessary. Sarah and I talked it through and again she did the implementation. We call her tool the Deflector (yes, there’s a Star Trek theme here). Here is how it works:

  • Pages to be “deflected” are defined in the deflector settings. For each page in a dark launch section that should be “deflected”, a corresponding page on the older part of our intranet is mapped to that page. Since most of our content is being moved from older parts of the intranet into our WordPress site, there’s always a place we can redirect people to.
  • If a link is followed from within our site, the request will have a referrer. If the requested page is also defined in the Deflector settings, the Deflector will redirect that request over to the page on the older part of the intranet, using the mapping in those settings.
  • If a request for a dark launched page has no referrer, then the request didn’t come from within the site, so the Deflector allows it through.

This means that anyone who knows the direct URL can get through to the page by pasting it into a browser, or clicking on it in an email. So anyone who needs to can review the dark launched content, but no one can stumble onto it by clicking a link in a publicly available part of the site.

From a technical perspective, it’s actually pretty simple. The Deflector checks the $_SERVER array for any URL requests coming into the site that are listed in the Deflector configuration. If the URL is listed in the config, and if the $_SERVER array shows that there’s a referrer from within the site, it redirects the request over to the page on the old part of the intranet.
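The redirect decision described above boils down to a small function. Here is a sketch in Python rather than the PHP the real tool would use. The host name, the `DEFLECTIONS` mapping, and the `deflect` function are assumptions for illustration only. The sketch also does not treat a referrer that is itself a dark launched page any differently, since I haven’t described how that case is handled.

```python
from typing import Optional
from urllib.parse import urlparse

SITE_HOST = "intranet.example.com"  # hypothetical host of the WordPress site

# Hypothetical mapping: dark launched path -> page on the older intranet.
DEFLECTIONS = {
    "/camp/register/": "https://old.example.com/camp/register.html",
}


def deflect(path: str, referrer: Optional[str]) -> Optional[str]:
    """Return a redirect target, or None to let the request through."""
    target = DEFLECTIONS.get(path)
    if target is None:
        return None  # not a dark launched page
    if not referrer:
        return None  # direct visit (pasted URL, email link): allow
    if urlparse(referrer).hostname != SITE_HOST:
        return None  # referrer is outside the site: allow
    return target  # followed a link from within the site: send to the old page
```

For example, `deflect("/camp/register/", "https://intranet.example.com/benefits/")` returns the old intranet URL, while the same path with no referrer returns `None` and the page is served normally.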

So: truth in advertising, we haven’t used the Deflector yet. It’s tested and we know it works, and we’re ready to fire it up. Just haven’t done it yet.

The left nav

Sometimes, we do get into a situation where there are dark launched pages within an existing, publicly visible section of the site.

The left nav is automatically generated by a custom walker. This means that any page in our page tree will show up in its right place in the left nav. It’s not a manually built menu, because that would be impractical in a site of this size.

So, if we are dark launching a new section within a publicly visible section of the site, we need to hide the dark launched pages from the left nav when we’re in the non-dark-launched area… but allow the dark launched pages to be visible in the left nav when in the dark launched section.

Sarah wrote a filter that uses the Transporter settings to achieve this. It simply filters what the custom walker shows, based on those settings.
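In spirit, the filter does something like the following. This is a simplified Python sketch, not Sarah’s code: it treats the nav as a flat list of paths instead of a page tree, and `DARK_PATHS` and `filter_left_nav` are hypothetical names.

```python
# Hypothetical set of page paths that belong to an active dark launch.
DARK_PATHS = {"/hr/camp/", "/hr/camp/register/"}


def filter_left_nav(nav_paths, current_path, dark_paths=DARK_PATHS):
    """Drop dark launched pages from the nav unless the visitor is already on one."""
    if current_path in dark_paths:
        return list(nav_paths)  # inside the dark launched section: show everything
    return [path for path in nav_paths if path not in dark_paths]
```

A visitor browsing the public part of the section never sees the new pages in the nav, while a reviewer who arrived by direct URL can navigate the whole dark launched section.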

What about search?

There is of course another possible ‘edge’ or route into content on the site – the search engine. However, the way our search engine works – the way most search engines work – is that it follows links to index content. Content that isn’t accessible via a link doesn’t get indexed, and never shows up in search results.

So, if our dark launched content is not accessible via links, the search engine will never find it.
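A toy link-following crawler shows why. The sketch below is a generic breadth-first crawl in Python over a made-up link graph. It is not our search engine, and the paths are hypothetical.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "/": ["/hr/", "/news/"],
    "/hr/": ["/hr/benefits/"],
    "/hr/benefits/": [],
    "/news/": ["/"],
    "/hr/camp/": ["/hr/camp/register/"],  # dark launched, nothing links here
    "/hr/camp/register/": ["/hr/camp/"],
}


def crawl(start, links):
    """Breadth-first crawl: return every page reachable by following links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Crawling from `/` reaches the four public pages only. The two camp pages link to each other, but since no public page links to them, they never enter the index.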

5. Where we are now

Establishing the content dark launch process has been part of an overall maturation in our processes for the site. Around it we have set up a process that now includes opportunities for our governance staff to ensure the new content meets standards for the site, and gives the dev team enough information and time to get everything fully in place prior to launch day.

Readiness review

  1. We review for readiness. Is the structure of the new pages largely in place? Are the pages in the right templates? Those things are harder to change after dark launch has begun.
  2. Are the landing pages that are above the dark launch section in the page tree okay to be visible? A new section often means changes to these landing pages, and often those changes cannot be dark launched. We need to know that going in.
  3. The dev team analyzes the content to identify all the entry points, and all the content items involved.

Dark launch begins

  1. The dev team creates the dark launch category, tags the content items, configures the Transporter.
  2. In the future, we’ll also configure the Deflector.
  3. Dark launch begins and batches are sent daily when content is changed. Problems in RAMPing are resolved as they occur.

Launch planning

  1. Launch date is set, with at least a three day content freeze built into the schedule prior to launch.
  2. Governance staff member who works with content owners provides the dev team with exhaustive documentation of what is launching.
  3. Dev team prepares detailed launch plan based on this documentation and experience in dark launching the content.

3-2-1-GO!

  • Content freeze 3 days prior to launch
  • Final dark launches occur
  • On the day of launch, RAMP Transporter and Deflector configuration changes to “flip the switch” and make it all visible. Mega menu changes occur at this time as well.

Problems solved?

Remember the three root causes of our problems…

  • Stuff we didn’t know about (because non-technical content owners just don’t know to tell us) is only discovered after it’s visibly broken on the live site after launch
  • Stuff is broken after RAMPing
  • RAMPing takes a loooooong time

Well, we still have all of these problems. However, they just don’t matter as much, because with dark launching we have time to deal with all of them. No more scrambling to frantically fix things while the site is broken and we’re under deadline.

6. Unanticipated benefits

Although content dark launching was something I devised in response to a very large pain point, as is often the case, the benefits go beyond just making that pain better. I didn’t plan to discuss this in my talk, but at the end I put into words a few thoughts on this that came to me as I was speaking. I won’t try to reproduce that here, but if you want to hear it, scrub forward to the 33:00 point in the video. It’s brief, only about a minute at the very end of the talk.