Ruby vs. MySQL (or, the day I learned about Benchmark)

ruby rails

Tue Aug 05 03:27:00 +0000 2008

I was writing a small Rails app for work to allow faculty and students to sign up for various events. So I had a Signup model :has_many timeslots that :has_many people. I needed to retrieve the email addresses for people that had signed up for timeslots after a certain time (i.e. all the people that had registered for a timeslot after Saturday at 10:00). Since the timeslots may not be in chronological order, I couldn’t find timeslots by a timeslot id.

So as one who is a few steps from Rails expert status, I had three methods to retrieve the email addresses. As you can see, they all do basically the same thing but in different ways. Option 1 uses nearly pure SQL, while options 2 and 3 rely more on Ruby Array methods. I assumed the SQL would be faster, but I wasn’t sure how much.

def email_timeslot_and_later_1  
  Person.find( :all,  
                 :select => "people.email",  
                 :joins => "INNER JOIN timeslots ON people.timeslot_id = timeslots.id",  
                 :conditions => ["signup_id = ?", self.signup_id],  
                 :group => "people.email",  
                 :conditions => ["time > ?", self.time]  
                 ).collect(&:email).join(", ")  
end

def email_timeslot_and_later_2
  Timeslot.find(:all, :conditions => [ "time >= ? ", self.now ], :include => :people).collect(&:people).flatten.collect(&:email).join(", ")  
end

def email_timeslot_and_later_3  
  Person.find(:all, :conditions => [ "timeslot_id in (?) ", Timeslot.find(:all, :conditions => ["time > ?", self.now]) ] ).collect(&:email).join(", ")  
end

Then I stumbled across the benchmark method in Active Record (it’s been there for a while, I know, but I don’t have all the methods memorized yet:-) ). So I ran the following code:

Timeslot.benchmark("test email 1") do  
	1000.times { Timeslot.find(23).email_timeslot_and_later_1 }  
end

Timeslot.benchmark("test email 2") do  
	1000.times { Timeslot.find(23).email_timeslot_and_later_2 }  
end

Timeslot.benchmark("test email 3") do  
	1000.times { Timeslot.find(23).email_timeslot_and_later_3 }  
end

And the results?

test email 1 (19.00181)  
test email 2 (264.22156)  
test email 3 (134.85444)

The SQL-based ActiveRecord find command is substantially faster (at least 7 times) than those involving the combination of returned model collections. The morals of the story:

  • Use SQL whenever possible. It’s not cheating :-)
  • Utilize the benchmark command more often to find bottlenecks and speed bumps
  • Study the documentation more. I continue to be surprised at what Rails has built-in
blog comments powered by Disqus