Thursday, April 4, 2013

Three tales of browser-based test suites [Part 2] - test harder

In Part 1 I wrote about my initial experience testing with Selenium. In this part I will write about what I learned about browser-based tests on my first project working for ThoughtWorks. It was a step up in terms of coverage and complexity and that also brought its own set of problems.

The application we were working on is essentially a massive single-page JavaScript app but written by multiple teams. This made testing the different parts in integration quite a challenge. To handle that, an effort had been started to build up a suite of cuke4duke tests instrumenting the browser via Selenium.

All the teams had build monitors in their rooms and the department lead was thankfully very insistent that people follow the rule to not commit on a red build. To make communication about the status of the build a little bit easier, we were using a Skype chat room. This is something I now try to do on bigger teams all the time, as the persistent chat history allows people to stay up to date even if they come in late or are out in meetings.

Over time the tests grew until they took about 30 minutes to run. In combination with a bit of flakiness in third-party systems and the general brittleness of Selenium, this led to a situation where people were unable to commit for multiple hours, and in one case even for two whole days. This was obviously not a good situation, so we set out to find ways to improve it.

A lot of effort was put into creating fake versions of the third-party backend services. This wasn't possible for all the external systems, but the fakes we did have already helped to reduce spurious test failures.
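To give an idea of what such a fake can look like, here is a minimal sketch using only the HTTP server that ships with the JDK. It is not the code from that project: the class name FakeBackend, the path and the canned JSON are all made up for illustration, and a real fake would need to mimic the actual service's endpoints and error cases.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a third-party backend: always answers with canned JSON.
public class FakeBackend implements AutoCloseable {
    private final HttpServer server;

    public FakeBackend(String path, String cannedJson) throws IOException {
        // Port 0 lets the OS pick a free port, so several fakes can run side by side.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        byte[] body = cannedJson.getBytes(StandardCharsets.UTF_8);
        server.createContext(path, exchange -> {
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    // Base URL the application under test should be pointed at instead of the real service.
    public String baseUrl() {
        return "http://127.0.0.1:" + server.getAddress().getPort();
    }

    @Override
    public void close() {
        server.stop(0);
    }
}
```

A test would create the fake in its setup, configure the application to talk to baseUrl() and close the fake afterwards. The response is then deterministic, so a failure points at our own code rather than at someone else's outage.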

Developers and QAs also worked more closely together to figure out better ways to handle timing-related issues with Selenium and to focus on what specifically was tested on that layer. I don't think we were aggressive enough here though, as there were quite a lot of browser tests that duplicated what unit tests already covered, and some that even duplicated other functional tests.

Another problem was cuke4duke itself, mainly when trying to parallelize tests. Since the Gherkin tests didn't need too much updating for it, the choice was made to switch to JBehave.

There was also a dedicated pair chosen each week that would take care of build issues and work on improving the testing framework. This made it possible to do things that might otherwise seem a bit daunting (like switching to JBehave) but also meant a considerable investment in what was "just" maintenance.

With all of this in place one of the operations guys got into setting up more and more environments to run Selenium Grid against. This sped things up considerably. As mentioned in the last post I would now recommend doing this with SauceLabs but the general idea is still the same. If you can run tests in parallel, things go a lot faster.
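I don't remember exactly how the tests were distributed over the grid nodes, so the following is only a sketch of one simple scheme: splitting a list of test names round-robin into one batch per worker. The class TestSharder and the test names are invented for the example, and this only works if the tests are independent of each other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a suite into one batch per parallel worker.
public class TestSharder {
    public static List<List<String>> shard(List<String> tests, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            shards.add(new ArrayList<>());
        }
        // Round-robin assignment keeps the batch sizes within one test of each other.
        for (int i = 0; i < tests.size(); i++) {
            shards.get(i % workers).add(tests.get(i));
        }
        return shards;
    }
}
```

With five tests and two workers, the first worker gets three tests and the second gets two. If all tests take roughly the same time, the wall-clock time drops to about the length of the largest batch.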

When I left, there were about 600 JBehave tests that took about 10 minutes to run as part of the CI run. As a comparison, the 2000-ish JavaScript tests took a little over a minute on a dev machine. And I'm tempted to blame part of that minute on being run as part of Maven. These days, I'd love to try Lineman for such a front-end heavy project.

So, again, here are my takeaways:

  • Deciding which tests can be moved out of the browser takes some dedication. After spending time writing all those tests, people are very reluctant to give up on them, especially since the risk of dropping them tends to feel higher than it actually is.
  • Using Gherkin on top of a decent page model seemed to be useful for the QAs and helped with collaboration on the tests. But in a smaller team I'd probably still opt for running tests straight from JUnit.
  • This was an interesting case of balancing between investing in the testing setup and eventually ignoring brittle, long-running tests. The latter seems a lot more common and it took a very strong-minded department lead to avoid that.
  • Again, how long the tests took to run was very important. The shorter the run, the more likely it was that devs would actually run the tests before committing. I still think 10 minutes is the absolute maximum there.
  • Having one massive JavaScript application developed by multiple teams is probably not something I would recommend.
  • Having a build chat was a very simple but extremely useful tool. People were more likely to fix problems quickly rather than just trying to find someone to blame and then ignoring the problem.
  • I also spent quite some time trying to understand Selenium's quirks to avoid the dreaded Thread.sleep() that seems all too common. In larger test suites those fixed sleeps quickly add up.
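On the Thread.sleep() point: the usual alternative is to poll for the condition you actually care about and stop as soon as it holds. Selenium's Java bindings offer WebDriverWait together with ExpectedConditions for this. The sketch below only uses the JDK to show the idea; the class Waits and the method waitUntil are names I made up for it.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition instead of sleeping for a fixed time.
public class Waits {
    /** Returns true as soon as the condition holds, or false once the timeout has passed. */
    public static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for one poll interval, but never past the deadline.
            long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
        }
    }
}
```

A fixed five-second sleep always costs five seconds, even if the page was ready after 200 milliseconds. A polling wait with a five-second timeout costs only as long as the page really needs, and still fails after five seconds if something is broken.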

In Part 3 I would like to talk about a project where QAs and devs didn't share ownership of the tests. Considering that I'll probably be too lazy to write that for another couple of months, here's the abstract: don't do it that way.