In 2015, I wrote an article, Visual CSS Regression Testing 101 for Front End Developers, where I covered the two competing philosophies of Visual Regression testing, Comparative vs. Baseline. Since then, PhantomCSS was sunsetted as PhantomJS was not as good as running Headless Chrome and BBC's Wraith works but wasn't ever as useful as I'd of liked.
What is Visual Regression Testing?
There are other primers on the concept, but it's worth quickly covering visual regression testing. In the course of development, CSS/JS/templating changes can potentially have unintended changes on your website or web app. Visual Regression Testing seeks to automate the laborious task of comparing visual elements to see if any unexpected changes have occurred. This is performed by running scripts with headless web browsers to render the webpage, then capturing its renderings, and using a show diff tool to compare the screenshots, flagging changed elements for review. Once approved, the latest changes are "approved" as the gold master and then saved to compare against next time you run the test.
Now, four years later, Backstop.JS emerged, mixing (mostly) the best of both Wraith and PhantomCSS.
Back when I first investigated visual regression, I spent time discussing baseline and comparative tools. Baseline visual regression tools in the talk I attended were complete screen renders, whereas the comparative tools could query individual DOM elements. In hindsight, The distinction between baseline and comparative is somewhat of a moot one, as comparative tools can do baseline checks as they're able to query the screen, be it the entire
body. That said, tools like Wraith that only do full-page screen renders can't make individual element selection thus are far more limited. At this point, I doubt either term gets much play, nor does it need the distinction as people have gravitated naturally to a tool that can query DOM elements.
Backstop.JS gets major points out the gate as easy to use. Just run the global npm installer, then navigate to your project directory and run
backstop init. It'll create a boilerplate template ready for you to start writing tests. This a serious upgrade, considering I once wrote a 12-step guide on how to install PhantomCSS.
Running tests is also easy, run
backstop test from the root directory and backstop will take care of the rest. Approving a batch of changes is easy, just punch in
Next up is formatting: All the tests are created in using JSON, which is easy to read and familiar. I've never been super into YAML, and I like JSON. Everyone likes JSON.
Where Backstop shines is how quick I went from never having written a test to having queried a roster of visual elements found our company website. Start up by declaring a set of screen sizes, and I created my own mobile, tablet, desktop, and large desktop screen sizes.
My first tests were entire pages, then I quickly graduated to advanced Backstop, testing our mobile menu. The mobile menu had a few considerations:
- It must be clicked
- It only makes sense to test it on a mobile resolution
- There's a delay for the animation
And there you have it; my mobile navigation is being tested against JS breakage and CSS changes. I'm fairly impressed. There's even integration for Running custom scripts. The only hiccups I've had is with AJAX content. I used remove element to hack out the DOM elements, which created reliable elements to test around the AJAX content, and for the AJAX content itself, I used the readySelector.
Lastly, chaining events is a bit cumbersome as you'll be coding up scenarios, but its still much less overhead than the days of PhantomJS.
Chaining Backstop to deploys
The next step is to chain backstop test to deployments. The demo shows Backstop playing with Jenkins deployments. At my office, we use bitbucket pipelines. It's a matter of translations.
The gif work flow is pretty straight forward with Visual Regression testing, ignore the test folders, and track the gold masters. Backstop creates a new timestamped directory for each test in
/backstop_databitmaps_test for each test. Depending on the number of tests, you run, it's easy to churn out hundreds of megabytes of images, so be prepared to have a trash collection method if you're running via a deployment method that might require such.