Gated Migrations: my gift to Steemit devs, and the world! A superior testing framework.

[ This was posted to Steemit earlier today, located here: https://steemit.com/steemgigs/@libertyteeth/gated-migrations-my-gift-to-steemit-devs-and-the-world-a-superior-testing-framework ]

My Background

I have spent decades in software development, wearing the three major hats of Engineering: Developer, Builder, and Tester. This has given me a broad view of the software development lifecycle, and contributed to my development of the system I describe below, which I have termed “Gated Migrations.”

New Definition of “IPO” 🙂

I view everything as having three attributes:

  1. inputs;
  2. processing; and
  3. outputs.

Thus, the input to developers is the specification; their process is to write code; and their output goes to the build team.

The build team’s process is to repeatedly and deterministically build from the source code the developers provide; and their output goes to the testers.

The testers’ process is to verify functionality is working; their output goes to the customers.

Where Did This Come From?

At one point, between “software companies,” I worked at a company which provided services for people who phoned in. They were not a “technology” company; they were a phone bank, with some developers. It was the worst company I’ve ever worked for, in terms of their use of technology.

They had several environments, all of which were tested manually (and the manual testers played games at their desks while waiting for the next test cycle – really unprofessional, and with management’s consent!):

  1. Production;
  2. Staging;
  3. QA;
  4. Dev testing.

The final environment in that pipeline was “Production”, which had all the hardware. There was “Staging”, which was very similar to Production but lacked some hardware, so not all tests could be performed there. I would not do it this way; Staging should be identical to Production, because you want all the tests to run and pass before deploying to the customer-facing environment.

Before that there was “QA”, with even less hardware, and then “Dev testing”, which was on the developer’s machine and generally tested only the software; some devs had some hardware.

So, Tell Me About the Gates?

Microsoft’s Team Foundation Server (TFS) has a feature called “Gated Checkins”, where rules can be configured to run tests when a developer does a checkin, and to refuse the checkin if the tests fail. This is excellent, but it couldn’t run hardware tests. Developers were constantly checking in code that broke the tests; fortunately, those breaks were (mostly!) discovered in environments prior to Production.

Bugs would often make it to Production, and then a “failure to properly plan” on the developers’ part became an “emergency” for the rest of the company. “Fighting fires” was a routine operation, and it should not be.

One major benefit of the Gated Checkins feature was that code that broke the tests would never make it into the codebase, which is much better from an organizational perspective. Other source control systems have similar features.

The idea of the “gates” in “Gated Migrations” is that tests run in an environment until they’re done; then, if more code has been checked in, the environment is immediately refreshed and more tests are performed. I’ve used Mozilla’s Tinderbox before, and the idea is similar, although that was mostly for builds, whereas this tests the code changes all the way through to Production, automatically. More recently I’ve used Hudson, which turned into Jenkins and is an excellent CI (Continuous Integration) framework; I would probably use Jenkins as the basis for this system.

One caveat to fully automated deployment: having a human decide when Production gets updated is probably a good thing. But I designed this to be flexible, so it could help at multiple companies, not only the one I was at; that’s a feature I try to build into the automation I develop – I want it to apply as widely as possible, within reason. It may take a little more time to produce, but the added configurability makes it more powerful and usable in many more situations, meaning it’s more commercially viable if I wanted to go that route. And even if I don’t (I’m giving it away here), even as open source, I would want it to help as many people as possible.

The Process, Broken Down

So, a developer has made changes to code. They check it in, and a Gated Checkins process verifies that the code passes rudimentary tests, and allows the checkin if so.

Note that “Gated Checkins” is the Microsoft TFS term for it. Other version control systems have “hooks” and “conditions”, which are similar: if the code fails the test, it’s not checked in. This is not essential to the “Gated Migrations” system; it works in conjunction with it, helping to reduce inefficiency – the idea being that the checkin tests should run more quickly than an entire pass through an environment.
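As a rough illustration of the “hook” flavor of this (not TFS itself), here is a minimal sketch of a pre-commit-style Git hook in Python; the test command and path are placeholders for whatever quick suite a project actually has:

```python
#!/usr/bin/env python3
"""Minimal sketch of a Git pre-commit hook that refuses a checkin when
rudimentary tests fail -- the same spirit as TFS "Gated Checkins"."""
import subprocess
import sys

# A fast, rudimentary suite (placeholder); the gate should run more quickly
# than a full pass through an environment.
TEST_COMMAND = ["python", "-m", "pytest", "-q", "tests/smoke"]

def main():
    result = subprocess.run(TEST_COMMAND)
    if result.returncode != 0:
        print("Gate closed: rudimentary tests failed; checkin refused.",
              file=sys.stderr)
        return 1      # non-zero exit aborts the commit
    return 0          # zero exit lets the commit through

if __name__ == "__main__":
    sys.exit(main())
```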

The system waits until the first environment is available. At that point, it checks to see if other unrelated code changes had been performed. For instance, let’s say there are modules A, B, and C. One developer changes A; another developer changes B; and three developers make different changes to C (call them C1, C2, and C3), all before the environment becomes available.

This would then (assuming no additional code changes happen while the tests are ongoing) require three passes through this (and, indeed, each) environment; a rough sketch of the batching logic follows the list:

  1. The first pass would include A, B, and C1;
  2. the second pass would test C2;
  3. and the third pass would test C3.
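Here’s a minimal sketch of that batching rule in Python – the function name and the (module, checkin) tuple format are made up for illustration, purely to make “one change per module per pass” concrete:

```python
def plan_passes(pending):
    """Group pending checkins into passes with at most one change per module.

    `pending` is an ordered list of (module, checkin_id) tuples.
    Returns a list of passes, each a list of checkins to deploy together.
    """
    passes = []
    for module, checkin in pending:
        # Put the checkin in the first pass that doesn't already touch its module.
        for batch in passes:
            if all(m != module for m, _ in batch):
                batch.append((module, checkin))
                break
        else:
            passes.append([(module, checkin)])
    return passes

# The example from the text: A, B, and three competing changes to C.
print(plan_passes([("A", "A1"), ("B", "B1"),
                   ("C", "C1"), ("C", "C2"), ("C", "C3")]))
# [[('A', 'A1'), ('B', 'B1'), ('C', 'C1')], [('C', 'C2')], [('C', 'C3')]]
```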

What’s a Pass?

Each “pass” would entail several steps:

  1. Refresh the environment;
  2. Deploy the new modules;
  3. Notify automated and manual testers;
  4. Wait for test completion;
  5. Wait for new code needing to be tested; and
  6. Repeat.

Refreshing the environment is generally as simple as reverting a VM (or several) to a state where the module hadn’t been deployed yet, and also perhaps sending a refresh signal to specific hardware, or automating a power cycle for specific hardware (and waiting for the device to be ready).
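As one concrete possibility – purely an assumption on my part, since every lab is different – here’s a sketch that reverts a libvirt-managed VM to a snapshot and optionally power-cycles attached hardware:

```python
import subprocess
import time

def refresh_environment(vm_name, snapshot, power_cycle_cmd=None):
    """Revert a test VM to a clean snapshot and optionally power-cycle attached
    hardware. The VM name, snapshot name, and power-cycle command are whatever
    your lab actually uses; the libvirt/virsh call is just one example."""
    # Revert the VM to the state before the module was deployed.
    subprocess.run(["virsh", "snapshot-revert", vm_name, snapshot], check=True)

    if power_cycle_cmd:
        # E.g. a smart-PDU or vendor CLI command; entirely site-specific.
        subprocess.run(power_cycle_cmd, check=True)
        time.sleep(60)  # crude "wait for the device to be ready"; better to poll
```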

Deploying the new modules can happen in several ways. Some source code might be deployed “as is”, i.e., HTML, CSS, etc. Others might be executables, built by the build team. And still others could be installers, also generally packaged by the build team. So the system would either copy the files or executables, or run the installer. If anything failed at this step, it would report an error and the associated module’s checkin would be flagged as potentially suspect – although “something else” may have gone wrong, for instance a power or network glitch; detecting such cases would make the system more robust.
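A rough sketch of that dispatch; the module record format, paths, and installer switch are all made up for illustration:

```python
import shutil
import subprocess

def deploy_module(module):
    """Deploy one module into the refreshed environment.

    `module` is a hypothetical dict such as:
      {"name": "C", "kind": "files", "source": "build/C", "target": "/srv/app/C"}
      {"name": "A", "kind": "installer", "installer": "build/A-setup.exe"}
    Returns True on success; the caller flags the checkin as suspect on failure.
    """
    try:
        if module["kind"] == "files":         # HTML, CSS, etc. -- copy as-is
            shutil.copytree(module["source"], module["target"], dirs_exist_ok=True)
        elif module["kind"] == "executable":  # built binaries -- copy into place
            shutil.copy2(module["source"], module["target"])
        elif module["kind"] == "installer":   # packaged installers -- run them
            subprocess.run([module["installer"], "/quiet"], check=True)
        else:
            raise ValueError("unknown module kind: %r" % module["kind"])
        return True
    except Exception as exc:
        # Could also be a power or network glitch; a more robust system would
        # try to distinguish environmental failures from bad checkins here.
        print("Deployment failed for %s: %s" % (module["name"], exc))
        return False
```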

It would deploy as many separate modules as possible, while testing only one code change at a time per module (as described above with A, B, C1, C2 and C3). The idea is to “change only one variable at a time”: if we were to test changes by two different developers to the same module (e.g., C1 and C2) and a test failed, it would not be certain which developer to notify; in that situation, I’d have it notify both developers, with wording to indicate that either or both changes might have caused the failure. And of course this scales to “N” developers, not just two; some tests may take hours, meaning several devs could check in during that time.

Also note that some tests may cover multiple modules, so if there were a test that exercised both A and B, and it failed during the first pass above (testing A, B, and C1), the developers who checked in to A and B would both be notified, with wording as above.
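A tiny sketch of that blame rule, assuming we track which modules each test covers and whose change is deployed for each module in the current pass (all names here are illustrative):

```python
def developers_to_notify(failed_test, test_coverage, changes_in_pass):
    """Return the developers who might have caused a test failure.

    `test_coverage` maps test name -> set of modules it exercises,
    e.g. {"test_a_and_b": {"A", "B"}}.
    `changes_in_pass` maps module -> developer whose change is deployed,
    e.g. {"A": "alice", "B": "bob", "C": "carol"}.
    """
    suspects = {changes_in_pass[m]
                for m in test_coverage[failed_test]
                if m in changes_in_pass}
    # Wording matters: any or all of these developers may be responsible.
    return sorted(suspects)

print(developers_to_notify("test_a_and_b",
                           {"test_a_and_b": {"A", "B"}},
                           {"A": "alice", "B": "bob", "C": "carol"}))
# ['alice', 'bob']
```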

The system could also flag those failing checkins internally, so that it wouldn’t deploy them for future tests. In other words, if A and B failed in the first pass of the example, then when it went to test C2 on the second pass, the A and B modules it deployed to that environment would be “A minus 1” and “B minus 1” – that is, the most recent checkins to have passed the tests. Note that this paragraph describes “extra credit”: the system could instead just keep failing until a dev fixed those errors and checked in new code. I’d probably leave this out of the first version, but put it on the roadmap. There may also be situations where modules are interdependent; for instance, A makes calls to a function in B, and that function changes. If B failed but A didn’t, then on the next pass “B minus 1” is deployed with the old API, A tries to call the new one, and now A fails! Investigating that failure of A would be time wasted, and it might also cause “A minus 1” to be deployed next along with “B” (which would fail, since they’re again mismatched). This could even cause an alternating cycle until new code is checked in to either A or B, and we’d want to avoid that situation as well.
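A minimal sketch of that “minus 1” bookkeeping – just tracking the newest checkin per module that has passed, with the interdependency problem deliberately left for the roadmap; the class and method names are made up:

```python
class LastGoodTracker:
    """Track, per module, the newest checkin that passed the environment's
    tests, so failing checkins can be held back from later passes."""

    def __init__(self):
        self.last_good = {}      # module -> checkin id that most recently passed

    def record_result(self, module, checkin, passed):
        if passed:
            self.last_good[module] = checkin

    def version_to_deploy(self, module, candidate=None):
        """Deploy the candidate if it's the change under test this pass;
        otherwise fall back to "minus 1", the last checkin known to pass."""
        if candidate is not None:
            return candidate
        return self.last_good.get(module)  # None if nothing has passed yet
```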

Fortunately, a lot of what I describe in the above is “edge cases”, i.e., they don’t happen generally. However, we need to code in the “edges” so the system doesn’t “fall off the table.”

Notifying the manual and automated testers would happen in two parts: an email, page, text, instant message, etc. for the manual testers; and an API call for the automated ones. When manual testers finished, they’d click a link on a web page (or perhaps in the email the system sent); when automated tests finished, they’d make another API call to inform the system that they were complete.
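A sketch of the bookkeeping side of that; the actual email/page/IM delivery and the HTTP endpoint behind the API callbacks are left as stand-ins:

```python
class TestCycleNotifier:
    """Track which manual testers and automated suites still owe a completion
    signal for the current deployment."""

    def __init__(self, manual_testers, automated_suites):
        self.manual = set(manual_testers)
        self.automated = set(automated_suites)
        self.pending = self.manual | self.automated

    def notify_all(self, send_message, trigger_suite):
        # `send_message` could send email/page/text/IM; `trigger_suite` could
        # call the automation framework's API -- both are placeholders here.
        for person in self.manual:
            send_message(person, "New build deployed; please start your tests.")
        for suite in self.automated:
            trigger_suite(suite)

    def mark_complete(self, who):
        # Called when a manual tester clicks the completion link, or an
        # automated suite calls back into the system's API.
        self.pending.discard(who)

    def all_complete(self):
        return not self.pending
```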

The system would wait until it heard back from every manual and automated tester it had notified. It would also have a configurable timeout; for instance, if tests generally took an hour, it could be configured to notify management after 2 hours. It could also take metrics on run times and set the timeout dynamically, with configurable thresholds, e.g. “200%” or “150%”. And I’d have it generate two emails, “warning” and “emergency”.

In fact, another “extra credit” item could be to force-stop hanging tests. There could be three settings – the “warning” and “emergency” above, plus a third, “forced restart” – set to, for instance, “120%”, “200%”, and “300%”. So if the tests usually took an hour, it would alert management with a warning if they ran 12 minutes over; with an emergency if they ran an hour over; and, if nobody responded, it would reset the environment if they ran two hours over, refreshing it for the next set of deployments to be tested.
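Here’s that escalation as a small function, using the 120%/200%/300% figures from the example and a running average of past durations:

```python
def check_test_duration(elapsed_minutes, average_minutes,
                        warn_pct=120, emergency_pct=200, restart_pct=300):
    """Return which action, if any, the elapsed test time calls for.

    With a one-hour average: a warning at 72 minutes (12 over), an emergency at
    120 minutes (an hour over), and a forced environment reset at 180 minutes
    (two hours over), matching the example above.
    """
    if elapsed_minutes >= average_minutes * restart_pct / 100:
        return "forced_restart"
    if elapsed_minutes >= average_minutes * emergency_pct / 100:
        return "emergency"
    if elapsed_minutes >= average_minutes * warn_pct / 100:
        return "warning"
    return None

assert check_test_duration(72, 60) == "warning"
assert check_test_duration(120, 60) == "emergency"
assert check_test_duration(180, 60) == "forced_restart"
```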

Another aspect to the above “extra credit” would be that there should be a “hang on, we’re investigating this environment” button so it would not do the “forced restart”. Some bugs may require developers to use a debugger in the environment, for instance with hardware the dev doesn’t have on their machine.

Once management is alerted to a “warning” or “emergency”, they could look into whether a test hung, or perhaps a manual tester was out, or on break/lunch/vacation, etc. Hmm, another “extra credit”: the system could integrate with HR and “know” when people are out. It could have primary/secondary/etc. testers assigned for each test, so it would know whom to notify in order to get the tests completed in the least amount of time, regardless of who was in the office that day.

Once the system was notified that testing had completed, it would check for new code, and if any had been checked in while the tests were running, it would immediately refresh the environment and start the process over again. One goal is to keep the tests running as close to 100% of the time as possible.
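Pulling the steps together, one environment’s loop might look roughly like this; `hooks` is a hypothetical object bundling the pieces sketched throughout this post and is not any existing API:

```python
import time

def run_environment_gate(env, hooks):
    """One environment's main loop: refresh, deploy, test, repeat.

    `env` is an environment name; `hooks` supplies pending_checkins,
    plan_passes, refresh, deploy, flag_suspect, notify_testers,
    wait_for_completion, and record_results -- all placeholders.
    """
    while True:
        pending = hooks.pending_checkins(env)     # new checkins since last pass?
        if not pending:
            time.sleep(30)                        # idle until new code arrives
            continue
        for batch in hooks.plan_passes(pending):  # one change per module per pass
            hooks.refresh(env)                    # step 1: revert VMs / hardware
            if not all(hooks.deploy(env, module) for module in batch):  # step 2
                hooks.flag_suspect(batch)         # deployment itself failed
                continue
            hooks.notify_testers(env, batch)      # step 3: manual + automated
            hooks.wait_for_completion(env)        # step 4: with the timeouts above
            hooks.record_results(env, batch)      # pass/fail, metrics, promotion
        # Steps 5 and 6: loop around immediately to pick up anything checked in
        # while the tests ran, keeping the environment close to 100% busy.
```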

Environmental Expansion: Cloning and Splitting

The system would maintain metrics on environment usage. If an environment was in use more than 90% of the time over a certain period (configurable; perhaps a day or a week – and the “90%” value should be configurable as well), it would email management suggesting that they clone the environment. For instance, more tests generally run in Staging than on a developer’s machine, so Staging passes will generally take longer. So they could set up “Staging1” and “Staging2”, and the system would then deploy to them in round-robin fashion: whichever was available would be deployed to, and if both were available, Staging1 would be deployed to. (Again, this expands to “N”.)

Additionally, environments could be “split”, e.g. into “Staging1A” and “Staging1B”, where 1A would run the tests for module A, 1B the tests for module B, and so on. The system could be configured to automatically alert management when an environment split would be advantageous; for instance, it could send an email when tests run longer than an hour.
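A sketch of the utilization check and the “first available clone” choice; the 90% figure and the Staging names come from the example above, everything else is assumed:

```python
def should_suggest_clone(busy_minutes, window_minutes, threshold_pct=90):
    """True if an environment was busy more than `threshold_pct` of the
    (configurable) observation window, e.g. a day or a week."""
    return busy_minutes / window_minutes * 100 > threshold_pct

def pick_environment(clones, busy):
    """Pick the first available clone in preference order ("Staging1" before
    "Staging2"); returns None if every clone is occupied."""
    for name in clones:
        if name not in busy:
            return name
    return None

print(should_suggest_clone(busy_minutes=1350, window_minutes=1440))  # ~94% -> True
print(pick_environment(["Staging1", "Staging2"], busy={"Staging1"}))  # Staging2
```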

Feedback is Essential!

The more time that passes between the developer checking the code in and getting feedback that a test failed, the more time (generally) it will take to fix. If the developer is informed immediately, they generally still have the code “in their head” and can make a fix more quickly than if they have to go back over it – for instance, if they had already moved on to fixing a bug in another module. So it’s a good goal to have tests completed ASAP, and both cloning and splitting the environments can help achieve that.

Of course, management has to balance that against expenses; each environment has setup and maintenance costs – for instance, additional VM(s), and perhaps more hardware to be tested with, which could be very expensive, for instance nuclear plant control systems (I’ve worked at a nuclear plant, but it was at the beginning of my career, so I didn’t have this system in mind back then).

This aligns with what I learned in Psych 101: correction should come as soon as possible after the errant behavior. If you hit your dog an hour after it peed on the floor, for instance, it might not associate your wrath with the urination.

Branching

Source control/version control systems like Subversion, Git, Perforce, etc. can be configured with multiple branches. For instance, with the above environments we could have the following branches:

  1. dev
  2. qa
  3. staging
  4. production

A developer would (generally) only have permission to check in to the dev branch. The system would perform those tests, and if the code passed all of them, it would automatically merge the code to the qa branch, which the QA environment would be configured to deploy from. Similarly for staging and production, as long as the tests passed in the previous environment.
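As a sketch of that automatic promotion, assuming a Git repository with the branches listed above (and remembering the earlier caveat that a human probably approves the final step to production):

```python
import subprocess

# Promotion order matching the branches listed above.
PROMOTION_ORDER = ["dev", "qa", "staging", "production"]

def promote(repo_path, from_branch):
    """Merge a branch that just passed its environment's tests into the next
    branch in line (dev -> qa -> staging -> production)."""
    to_branch = PROMOTION_ORDER[PROMOTION_ORDER.index(from_branch) + 1]

    def git(*args):
        subprocess.run(["git", "-C", repo_path, *args], check=True)

    git("checkout", to_branch)
    git("merge", "--no-ff", from_branch)  # keep a merge commit for the audit trail
    git("push", "origin", to_branch)
    return to_branch
```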

If there was a bug in a branch after dev, then the team lead or management would have permission to check in to the other branches to fix it; or the developer could be given temporary or permanent permission – if the team is small, for instance.

I Can’t Develop This 🙁

I’m recovering from four concussions in the past four years, which have disabled me. The major symptoms are forgetfulness, headaches, irritability, and sleep issues. The forgetfulness is the worst, because while developing software one needs to maintain multiple variables in one’s head; as these leak out, my development speed slows way down. This causes headaches, and the irritability causes me to yell and swear and cry at the computer. That would not be acceptable in a business environment, and I don’t like behaving that way, so I’ve stopped trying to develop software for now.

If anybody wants to run with this idea, it’s all yours.

If you’d like me involved, I’d be happy to help in any way I can.

 

Credits

This post was inspired by this comment which I had made to @grow-pro. Thanks to him for the encouragement to flesh this out!

Thanks also to him for the wonderful logo at the top!  My mock-up is the smaller black-and-white “birds between gates” and he did a great job taking my vision and turning it into something more visually appealing.  Thanks again!