Introduction
Last time, in Part 1, we looked at a set of simple Ruby classes you can use in your Rails application to perform operations in parallel on separate worker threads. The classes handle re-joining when the workflow is complete, as well as taking care of ActiveRecord connections for you.
Today we'll discuss some small improvements we can make to those classes - specifically the use of a ThreadPool (and the concurrency limiting that comes with it).
Why use a ThreadPool?
The allocation (creation) of threads is not a free operation. Whenever a new thread is allocated, your system needs to create a whole bunch of memory structures internally (roughly 50K per thread for the stack). Then, when your thread completes, your system needs to garbage collect those structures. These operations consume valuable time and memory on your system - and depending on how frequently you spawn threads, they can have a real impact.
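To get a feel for that cost on your own machine, here is a quick (environment-dependent) benchmark that does nothing but spawn and join short-lived threads:

```ruby
require "benchmark"

# Spawn and join 1_000 empty threads; every millisecond measured here
# is pure thread allocation and teardown overhead.
elapsed = Benchmark.realtime do
  1_000.times { Thread.new {}.join }
end

puts format("1000 threads created and joined in %.3fs", elapsed)
```

The absolute number varies by OS and Ruby version, but it is never zero - which is exactly the overhead a thread pool amortizes away.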
To get around this, we allocate pool(s) of N threads in advance and re-use those threads whenever we need them. Whenever an operation is required, we pick a thread from the pool, use it to perform the operation, and return the result.
What about Concurrency Limiting for a Task?
Well, now that you have a set of ThreadPools (each with as many threads as you like), simply assigning tasks to a specific ThreadPool automatically limits the number of concurrent jobs to the number of threads in that pool. For instance - want 100 jobs to run 3 at a time? Then simply allocate them to a ThreadPool of size 3!
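As a quick sanity check of that claim, here is a self-contained sketch (separate from the classes below) that pushes 100 jobs through 3 worker threads and records the peak number of jobs running at once:

```ruby
require "thread"

jobs = Queue.new
100.times { |i| jobs << i }
3.times { jobs << :stop }   # one stop signal per worker

active = 0
peak   = 0
mutex  = Mutex.new

# Three workers drain the queue; at most 3 jobs ever run concurrently.
workers = Array.new(3) do
  Thread.new do
    loop do
      job = jobs.pop
      break if job == :stop
      mutex.synchronize { active += 1; peak = [peak, active].max }
      sleep(0.001)          # simulate a small unit of work
      mutex.synchronize { active -= 1 }
    end
  end
end

workers.each(&:join)
puts "peak concurrency: #{peak}"
```

No matter how many jobs you enqueue, `peak` can never exceed the number of worker threads - that is the entire rate-limiting mechanism.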
Implementation
The basic implementation revolves around a few core concepts:
- We add a simple ThreadPool class
- We re-use [the Classes](LINK TO PREVIOUS POST) from the last blog in this series. NOTE: we'll make some changes to the ThreadRunner class!
Class: ThreadPool
First we need to define a class called ThreadPool which encapsulates our thread-pool logic.
require 'thread'

class ThreadPool
  # guards lazy creation of the shared pool instances
  INSTANCE_MUTEX = Mutex.new

  # a pool of 15 threads
  def self.pool_of_15
    # initialize thread pool (at most once)
    INSTANCE_MUTEX.synchronize do
      @pool_of_15 ||= ThreadPool.new(pool_size: 15)
    end
  end

  # a pool of 3 threads
  def self.pool_of_3
    # initialize thread pool (at most once)
    INSTANCE_MUTEX.synchronize do
      @pool_of_3 ||= ThreadPool.new(pool_size: 3)
    end
  end

  def initialize(pool_size:)
    @pool_size = pool_size
    @operations = Queue.new
    @thread_pool = Array.new(pool_size) do
      Thread.new do
        catch(:exit) do
          loop do
            block, lambda_context = @operations.pop
            block.call(lambda_context)
          end
        end
      end
    end
  end

  def schedule(lambda_context = nil, &block)
    # reset the exit code so callers can poll for completion;
    # shutdown schedules without a context, so guard against nil
    lambda_context.exit_code = nil if lambda_context
    @operations << [block, lambda_context]
  end

  def shutdown
    @pool_size.times { schedule { throw :exit } }
    @thread_pool.map(&:join)
  end
end
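To see the pool in action outside Rails, here is a compact, self-contained variant of the class above (DemoPool and Ctx are illustrative names; Ctx is a stand-in for the LambdaContext class from Part 1):

```ruby
require "thread"

# Minimal stand-in for the LambdaContext class from Part 1.
Ctx = Struct.new(:exit_code, :result)

class DemoPool
  def initialize(pool_size:)
    @pool_size  = pool_size
    @operations = Queue.new
    @thread_pool = Array.new(pool_size) do
      Thread.new do
        catch(:exit) do
          loop do
            block, ctx = @operations.pop
            block.call(ctx)
          end
        end
      end
    end
  end

  def schedule(ctx = nil, &block)
    ctx.exit_code = nil if ctx
    @operations << [block, ctx]
  end

  def shutdown
    @pool_size.times { schedule { throw :exit } }
    @thread_pool.each(&:join)
  end
end

pool = DemoPool.new(pool_size: 3)
contexts = Array.new(10) { Ctx.new }

# ten jobs, but never more than three in flight at once
contexts.each_with_index do |ctx, i|
  pool.schedule(ctx) do |c|
    c.result    = i * i
    c.exit_code = 0
  end
end

pool.shutdown                 # joins after all queued work has drained
puts contexts.map(&:result).inspect
```

Because the `:exit` markers are enqueued after the real jobs and the queue is FIFO, `shutdown` only returns once every scheduled job has finished.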
Class: ThreadRunner
We now update our previous ThreadRunner class:
class ThreadRunner
  attr_accessor :lambda_contexts

  def initialize(lambda_contexts: [])
    @lambda_contexts = lambda_contexts
  end

  # executes all lambda_contexts; returns true on success, false on failure
  def execute(thread_pool: ThreadPool.pool_of_15)
    @lambda_contexts.each do |lambda_context|
      # enqueue the job in the thread pool
      thread_pool.schedule(lambda_context) do |context|
        with_connection do
          begin
            params = context.parameters
            context.stdout = context.lambda_definition.call(*params)
            context.exit_code = 0
          rescue => exc
            context.stderr = exc.message
            context.exit_code = -1
            context.stderr_backtrace = exc.backtrace.join("\n")
          end
        end
      end
    end
    # now we wait for all lambda_contexts to complete
    # let's run a tight loop here for now
    while @lambda_contexts.any? { |lambda_context| lambda_context.exit_code.nil? }
      # sleep for an arbitrary length
      # we could also introduce a timeout here
      sleep(0.02)
    end
    # true only if every operation exited cleanly
    @lambda_contexts.all? { |lambda_context| lambda_context.exit_code == 0 }
  end

  # returns errors after execution is complete
  def after_execution_get_full_errors
    errors = @lambda_contexts.select do |lc|
      !lc.stderr_backtrace.blank?
    end
    errors.map do |lc|
      "EXCEPTION: #{lc.stderr} AT #{lc.stderr_backtrace}"
    end
  end

  private

  # ensures that connections are handled in the case of
  # Ruby Timeouts (evil)
  def with_connection
    ActiveRecord::Base.connection_pool.with_connection do
      yield
    end
  rescue Timeout::Error => exc
    ActiveRecord::Base.clear_active_connections!
    ActiveRecord::Base.connection.close
    raise exc
  end
end
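The sleep-based wait loop in execute is simple and works well enough; an alternative worth knowing (shown here as a standalone sketch, not part of the original classes) is a countdown latch built on a ConditionVariable, which blocks until all jobs report in without polling:

```ruby
require "thread"

# A tiny countdown latch: wait blocks until count_down has been
# called `count` times.
class CountdownLatch
  def initialize(count)
    @count = count
    @mutex = Mutex.new
    @cond  = ConditionVariable.new
  end

  def count_down
    @mutex.synchronize do
      @count -= 1
      @cond.broadcast if @count <= 0
    end
  end

  def wait
    # re-check the count on every wakeup to guard against spurious wakeups
    @mutex.synchronize { @cond.wait(@mutex) while @count > 0 }
  end
end

latch   = CountdownLatch.new(5)
results = Queue.new

5.times do |i|
  Thread.new do
    results << i * 2
    latch.count_down
  end
end

latch.wait                  # returns only once all 5 jobs have finished
puts "completed jobs: #{results.size}"
```

The polling loop is easier to extend with a timeout, though, which is why the class above keeps it.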
Example Usage
Now, using the classes above, our workflow becomes:
# create the thread_runner object
thread_runner = ThreadRunner.new

# add lambdas to the thread_runner
2.times do |idx|
  # define the method lambda
  lambda_def = lambda { |svr_idx| create_server(svr_idx) }
  thread_runner.lambda_contexts << LambdaContext.new(lambda_def, idx)
end

# execute all lambdas in parallel
# and wait for them to complete (with a max concurrency of 3)
all_ok = thread_runner.execute(thread_pool: ThreadPool.pool_of_3)

# OR wait for them to complete (with a max concurrency of 15)
all_ok = thread_runner.execute(thread_pool: ThreadPool.pool_of_15)

# spit out errors if any failed
unless all_ok
  # do something
  puts thread_runner.after_execution_get_full_errors
end
... profit!
Conclusion
In this blog post, we looked at an enhancement to our previous multithreading solution in Ruby: a ThreadPool and, with it, concurrency limiting.
Now go code!
- Check out 'Ruby Mutex Mayhem'.
- Also, read Part 1 of this series, 'Rails Threading Gains'.