Log slow Rails SQL queries

The Rails framework ships with an instrumentation API that allows developers to hook into various internal events, be it the view rendering pipeline, request processing, or custom application events. There are plenty of instrumentation frameworks and products on the market that work with Rails; however, the majority of them still rely on the core API, powered by ActiveSupport::Notifications.

In this blog post we will instrument slow SQL queries, meaning those that take longer than a certain amount of time to execute, and print them to STDOUT or to a JSON-formatted log file, whichever suits your needs.

First, let's take a look at how ActiveSupport::Notifications subscriptions work. The easiest way to create a subscription is via a Rails initializer file. Here's the example from the documentation:

ActiveSupport::Notifications.subscribe('render') do |name, start, finish, id, payload|
  name    # => String, name of the event (such as 'render' from above)
  start   # => Time, when the instrumented block started execution
  finish  # => Time, when the instrumented block ended execution
  id      # => String, unique ID for the instrumenter that fired the event
  payload # => Hash, the payload
end

All subscribers follow the same block signature, which includes the event name, start/finish timestamps, and the payload. The payload is a hash holding various metric data defined by the event producer. In our case, we want to subscribe to all SQL queries executed via ActiveRecord::Base and log the ones that are too slow.
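Note that the start and finish arguments are plain Time objects, so the event duration in milliseconds falls out of simple subtraction. A minimal, framework-free sketch (the timestamps are made up for illustration):

```ruby
require "time"

# Hypothetical start/finish timestamps, as a subscriber block would receive them
start  = Time.parse("2024-01-01T00:00:00.000Z")
finish = Time.parse("2024-01-01T00:00:00.750Z")

# Subtracting two Time objects yields seconds as a Float;
# multiply by 1000 to get milliseconds
duration_ms = ((finish - start) * 1000).round(1)
puts duration_ms # => 750.0
```

This is exactly the calculation the logger below performs to decide whether a query crossed the slowness threshold.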

Create a file config/initializers/rails_log_slow_queries.rb:

ActiveSupport::Notifications.subscribe("sql.active_record") do |name, start, finish, id, payload|
  # Print out payload for testing purposes
  pp payload
end

When you restart the Rails server and visit a few pages, you might notice a few extra lines in the log that look like this:

{
  :sql=>"SELECT  \"users\".* FROM \"users\" WHERE \"users\".\"id\" = $1 ORDER BY \"users\".\"id\" ASC LIMIT $2",
  :name=>"User Load",
  :binds=>[
    #<ActiveRecord::Relation::QueryAttribute:0x00007fdfda949d98 @name="id", @value_before_type_cast=1, @type=#<ActiveModel::Type::Integer:0x00007fdfeb20b7a0 @precision=nil, @scale=nil, @limit=nil, @range=-2147483648...2147483648>, @original_attribute=nil, @value=1, @value_for_database=1>, #<ActiveRecord::Attribute::WithCastValue:0x00007fdfda948fd8 @name="LIMIT", @value_before_type_cast=1, @type=#<ActiveModel::Type::Value:0x00007fdfeb127f50 @precision=nil, @scale=nil, @limit=nil>, @original_attribute=nil, @value=1>
  ],
  :type_casted_binds=>[1, 1],
  :statement_name=>nil,
  :connection_id=>70299906117300
}

This shows all the ActiveRecord SQL queries flowing through the application, which is great: we have access to the raw query (as well as the binds), and we also have timing information (the start/finish arguments). Now, let's go a bit further and set up a logger class that logs the slow queries:

# lib/slow_query_logger.rb

require "json"
require "digest"
require "logger"

class SlowQueryLogger
  def initialize(output = nil, opts = {})
    # Log any query slower than 500 ms by default. The threshold may come
    # in as a string (e.g. from an environment variable) or be nil when
    # the variable is unset, so fall back before converting.
    @threshold = (opts[:threshold] || 500).to_i

    @logger = Logger.new(output || STDOUT)
    @logger.formatter = method(:formatter)
  end

  def call(name, start, finish, id, payload)
    # Skip transaction start/end statements
    return if payload[:sql].match(/BEGIN|COMMIT/)

    duration = ((finish - start) * 1000).round(4)
    return unless duration >= @threshold

    data = {
      time:       start.iso8601,
      pid:        Process.pid,
      pname:      $PROGRAM_NAME,
      # RequestContext is an app-specific helper; swap in however your app
      # tracks the current request ID (or drop the field entirely)
      request_id: defined?(RequestContext) ? RequestContext.request_id : nil,
      duration:   duration,
      query:      payload[:sql].strip.gsub(/(^([\s]+)?$\n)/, ""),
      kind:       payload[:sql].match(/INSERT|UPDATE|DELETE/) ? "write" : "read",
      length:     payload[:sql].size,
      cached:     payload[:cache] ? true : false,
      hash:       Digest::SHA1.hexdigest(payload[:sql])
    }.compact

    @logger.info(data)
  end

  private

  def formatter(severity, time, progname, data)
    JSON.dump(data) + "\n"
  end
end

The logger class above wraps Ruby's standard Logger and conforms to the ActiveSupport::Notifications call signature. Each log message includes a bunch of useful info: the process ID, program name, request ID (useful for correlating with HTTP logs), the compacted SQL query (for better readability), the kind of query (read/write), its size, and a query hash. The hash is useful because you can search your logs for the SHA1 value to see all occurrences of a given query over time.
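The hash works as a stable fingerprint because parameterized queries keep their bind placeholders ($1, $2) in the SQL text, so the same query shape hashes identically regardless of the bound values. A quick illustration (the queries are made up):

```ruby
require "digest"

q1 = 'SELECT "users".* FROM "users" WHERE "users"."id" = $1'
q2 = 'SELECT "users".* FROM "users" WHERE "users"."email" = $1'

# Identical SQL text always yields the identical fingerprint...
same = Digest::SHA1.hexdigest(q1) == Digest::SHA1.hexdigest(q1.dup)
# ...while a different query shape yields a different one
diff = Digest::SHA1.hexdigest(q1) == Digest::SHA1.hexdigest(q2)

puts same # => true
puts diff # => false
```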

Finally, to make use of the logger, we set up the subscription as follows:

# config/initializers/rails_log_slow_queries.rb
if ENV["RAILS_LOG_SLOW_QUERIES"] == "1"
  require "slow_query_logger"

  # Setup the logger
  output = Rails.root.join("log/slow_queries.log")
  logger = SlowQueryLogger.new(output, threshold: ENV["RAILS_LOG_SLOW_QUERIES_THRESHOLD"])

  # Subscribe to notifications
  ActiveSupport::Notifications.subscribe("sql.active_record", logger)
end
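With the guard above, logging is opt-in via environment variables. One way to enable it with a custom threshold (the 250 ms value is just an example):

```shell
# Enable slow query logging and lower the threshold to 250 ms
export RAILS_LOG_SLOW_QUERIES=1
export RAILS_LOG_SLOW_QUERIES_THRESHOLD=250

# Then start the app as usual, e.g.:
# bin/rails server
```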

As a result, the log/slow_queries.log file will contain lines like this (formatting added):

{
  "time": "2020-02-18T23:19:03-06:00",
  "pid": 15028,
  "pname": "puma: cluster worker 0: 15020 [railsapp]",
  "duration": 0.322,
  "query": "SELECT  \"users\".* FROM \"users\" WHERE \"users\".\"id\" = $1 LIMIT $2",
  "kind": "read",
  "length": 63,
  "cached": false,
  "hash": "6e277ee23336967d98ad775ae6b24a3b58163696"
}

That's it. The JSON log file can then be shipped to a log provider for further analysis and alerting. You could also set RAILS_LOG_SLOW_QUERIES_THRESHOLD to 0 and log ALL queries, but that would be unnecessary in most cases.
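Since each line is a standalone JSON object, ad-hoc analysis doesn't even require a log provider. A small sketch that groups entries by query fingerprint (the sample lines are fabricated, but match the format the logger emits):

```ruby
require "json"

# Two fabricated log lines in the logger's output format
lines = [
  '{"duration":612.4,"kind":"read","hash":"6e277ee23336967d98ad775ae6b24a3b58163696"}',
  '{"duration":980.1,"kind":"read","hash":"6e277ee23336967d98ad775ae6b24a3b58163696"}'
]

entries = lines.map { |line| JSON.parse(line) }

# Count occurrences and worst-case duration per query fingerprint
stats = entries.group_by { |e| e["hash"] }.transform_values do |group|
  { count: group.size, max_ms: group.map { |e| e["duration"] }.max }
end

pp stats
```

Reading from the real file is a one-line change (`File.readlines("log/slow_queries.log")` in place of the sample array).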

Gist file for reference