Testing a Rails app on Semaphore with parallel_tests

We use Semaphore for automated testing: push changes and the tests run automatically. Great. But there is a caveat as the test suite grows: Semaphore kills processes that run longer than an hour. When we hit that limit, we integrated parallel_tests to reduce the runtime. The result is great: overall runtime dropped to about one third.

Setting up parallel tests

Preparing database.yml can’t be done statically because the database name is randomly generated for each run. I wrote a rake task that appends the TEST_ENV_NUMBER variable to the database name:

  namespace :testing do
    # Appends an ERB reference to TEST_ENV_NUMBER to the test database name,
    # so that every parallel_tests process uses its own database.
    task :prepare_db_config do |_t, _args|
      if Rails.env.test?
        suffix = "<%= ENV['TEST_ENV_NUMBER'] %>"
        db_config = YAML.load_file('config/database.yml')
        # Append only once, so running the task again is harmless.
        db_config['test']['database'] += suffix unless db_config['test']['database'].end_with?(suffix)
        File.open('config/database.yml', 'w') { |f| f.write db_config.to_yaml }
      end
    end
  end
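To see what the task does to the configuration without touching a real file, here is a minimal sketch in plain Ruby. The helper name `with_test_env_suffix` and the sample database name are made up for illustration; only the suffix string is taken from the task above. Note that the suffix is stored as literal ERB text and is only evaluated later, when Rails reads database.yml.

```ruby
require 'yaml'

# ERB snippet the rake task appends to the test database name.
SUFFIX = "<%= ENV['TEST_ENV_NUMBER'] %>"

# Hypothetical helper: returns a copy of the config whose test database
# name ends with SUFFIX exactly once. The input hash is not modified.
def with_test_env_suffix(db_config)
  test = db_config.fetch('test')
  name = test.fetch('database')
  name += SUFFIX unless name.end_with?(SUFFIX)
  db_config.merge('test' => test.merge('database' => name))
end

# Made-up stand-in for the randomly named database of a CI run.
sample = YAML.safe_load(<<~YML)
  test:
    adapter: postgresql
    database: app_test_ab12cd
YML

once  = with_test_env_suffix(sample)
twice = with_test_env_suffix(once) # second call changes nothing
puts once['test']['database']
```

Applying the helper twice gives the same result as applying it once, which is the property the `unless` guard in the rake task is there for.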

The test environment is prepared by adding the following lines at the end of the setup thread:

  unset SPEC_OPTS # optional, only needed if you want to use Insights with parallel_tests
  bundle exec rake testing:prepare_db_config
  bundle exec rake parallel:drop parallel:create parallel:migrate

Because we have two threads (i.e. separate machines that can do stuff simultaneously), we pass each thread a different regexp to tell it which tests to run:

Thread 1
  SPEC_OPTS="--format progress --format json --out rspec_report.json" bundle exec rspec spec/features/
  bundle exec rake parallel:spec[spec/models/]

As you can see, feature tests are not run with parallel_tests. When we tried, we often observed test failures due to lost or closed server connections and couldn’t fix them in a reasonable amount of time. If you have a solution, I’d appreciate your comment.

The second thread gets the complementary regexp (and also checks whether everyone behaved by running rubocop):

Thread 2
  bundle exec rake rubocop
  bundle exec rake parallel:spec[^\(?\!spec/features\|spec/models\)]

If you are not sure whether you missed tests with your regexps, run your tests once with RSpec and then once with parallel_tests for each regexp. If the numbers of the latter add up to the number of tests run by RSpec, you know you are right.
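A cheaper first check is to test the patterns against the list of spec files before running anything. The sketch below assumes that each pattern is matched as a regexp against the relative path of a spec file, which is how I understand parallel_tests treats its pattern argument. The patterns are the ones from the two threads with the shell escaping removed, the features directory is expressed as a regexp for comparison, and the helper name `assignment` and the file list are made up.

```ruby
# Patterns from both threads, written as plain Ruby regexps.
PATTERNS = {
  'thread 1, rspec'    => %r{spec/features/},
  'thread 1, parallel' => %r{spec/models/},
  'thread 2, parallel' => %r{^(?!spec/features|spec/models)}
}.freeze

# Hypothetical helper: maps every path to the names of all patterns it matches.
def assignment(paths, patterns)
  paths.to_h do |path|
    [path, patterns.select { |_name, regexp| path.match?(regexp) }.keys]
  end
end

# Made-up file list; in a real project use Dir['spec/**/*_spec.rb'].
paths = %w[
  spec/features/login_spec.rb
  spec/models/user_spec.rb
  spec/controllers/users_controller_spec.rb
  spec/lib/importer_spec.rb
]

result = assignment(paths, PATTERNS)

# A file matched by no pattern is never run, one matched twice runs twice.
problems = result.reject { |_path, names| names.size == 1 }
puts problems.empty? ? 'every file is covered exactly once' : problems.inspect
```

This only checks file paths, not example counts, so it complements the comparison of test numbers rather than replacing it.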

Use Insights with parallel tests

Semaphore recently launched a nice feature, Insights, which collects runtime data for tests. For RSpec it automatically injects the required parameters to generate the report. Unfortunately this does not work for the parallel command which runs most of our tests. But after some trial and error I found out how to set it up.

1. Generate JSON reports with parallel

parallel_tests can forward parameters to each RSpec process. But there is a problem: logging is not properly synced. If all processes write to the same file, the result is not JSON but a mess. So the idea is to use the TEST_ENV_NUMBER variable in the report file name of each process. I couldn’t figure out how to specify that within the .rspec_parallel file (and I doubt it can be done, because the parameters are forwarded within single quotes, which prevents variable interpolation), but it worked by passing the required parameters directly to the rake task. Extend the parallel calls in your test threads this way:

  bundle exec rake parallel:spec[spec/models/,"--format progress --format json --out parallel_report_\$TEST_ENV_NUMBER.json"]

The first format ensures you get the dots in STDOUT, the second tells RSpec to log the execution details in JSON format to some file (escaping the $ is crucial here).
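To make the resulting file names concrete, here is a small sketch of what each process ends up writing. It relies on the default parallel_tests behaviour of setting TEST_ENV_NUMBER to an empty string for the first process and to 2, 3, ... for the following ones; the helper name `report_name` and the process count of four are assumptions for illustration.

```ruby
# Hypothetical helper mirroring the --out argument from the rake call.
def report_name(test_env_number)
  "parallel_report_#{test_env_number}.json"
end

# Default numbering, assuming four processes: the first one gets ''.
env_numbers = ['', '2', '3', '4']
names = env_numbers.map { |number| report_name(number) }
puts names

# A glob with '*' also matches the first report, whose number is empty.
glob = 'parallel_report_*.json'
all_matched = names.all? { |name| File.fnmatch?(glob, name) }
puts all_matched
```

Because the first process produces `parallel_report_.json`, any later step that collects the reports should use a pattern that allows an empty number.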

2. Combining the reports

Insights expects all runtime data to be in a file called rspec_report.json in the root directory of your project. So we need another rake task to combine all logs:

  namespace :testing do
    # Merges the JSON reports of all parallel processes (and the plain RSpec
    # report, if there is one) into the file given as rspec_log_path.
    task :combine_runtime_log, [:rspec_log_path, :parallel_logs_pattern] => [:environment] do |_t, args|
      if Rails.env.test?
        rspec_log_path = Pathname(args[:rspec_log_path])
        # One parsed report per file matching the pattern.
        log_hashes = Dir[args[:parallel_logs_pattern]].map { |path| JSON.parse(Pathname(path).read) }
        log_hashes << JSON.parse(rspec_log_path.read) if rspec_log_path.exist?
        # Concatenate all 'examples' arrays into the first report.
        result_hash = log_hashes.reduce { |a, e| a.tap { |m| m['examples'] += e['examples'] } }
        rspec_log_path.write result_hash.to_json
      end
    end
  end

The interesting data is found under the 'examples' key of the JSON; the other keys are the same in all files. The task above is called in both test threads as the last command:

  bundle exec rake testing:combine_runtime_log[rspec_report.json,parallel_report_*.json]
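The merge itself can be tried out in isolation. The following is a minimal sketch of the same idea as a plain Ruby method, without Rake or the file system. The method name `combine_reports` and the two tiny sample reports are made up, and real RSpec reports contain more keys than shown here.

```ruby
require 'json'

# Hypothetical helper: merges parsed RSpec JSON reports by concatenating
# their 'examples' arrays. All other keys are taken from the first report.
# Returns nil if there is nothing to merge.
def combine_reports(reports)
  return nil if reports.empty?

  first, *rest = reports
  rest.reduce(first) do |merged, report|
    # Hash#merge and Array#+ build new objects, so the inputs stay untouched.
    merged.merge('examples' => merged['examples'] + report['examples'])
  end
end

# Made-up miniature reports standing in for two parallel_report files.
report_a = JSON.parse('{"version":"x","examples":[{"id":"a","run_time":0.1}]}')
report_b = JSON.parse('{"version":"x","examples":[{"id":"b","run_time":0.2},{"id":"c","run_time":0.3}]}')

combined = combine_reports([report_a, report_b])
puts combined.to_json
```

Returning nil for an empty list makes the case where no report exists explicit, so the caller can decide whether to skip writing the file or to fail.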

The task is written and called such that it also works if there is no rspec_report.json, which is important when all tests are run in parallel. It could also be called by the post thread and would then only have to be specified once. But then it would be executed even if one test thread fails, which we decided against. Happy insighting!

Update (11/30/2015)

Shortly after I wrote this, Semaphore started to set the SPEC_OPTS environment variable on the machines, which overrides the format and logging parameters passed to parallel_tests. So in order to get things running again, you need to unset this variable in your setup thread and pass the options explicitly to the tests that are run with RSpec directly (I adjusted the code snippets above).

Comments (1)

Michael Grosser, November 30th, 2015 at 17:40

Thx for posting, looks neat 🙂

FYI
You should be able to uncheck 'write database.yml' and then avoid having to have the rake task by using ENV variables in the database.yml

Also to get a more even runtime, try using runtime to partition your suite instead of path prefixes -> -n 2 + --only-group 1 / --only-group 2

Might be better to blow up when not in test env instead of silently noop-ing.

Instead of posting an update it might be easier for readers if you change the article to be in proper working order.
