Everybody knows that this line :
Model.where(column: 'value').update_all(column: 'Another Value')
is better than the following lines :
Model.where(column: 'value').each do |item| item.column = 'Another Value' item.save end
Not because it's shorter, but because it will get executed as one
UPDATE statement, Not as multiple
But it seems that I did this mistake while working on a worker that should bulk insert thousands of records.
My task was to clone N amount of records and update one value for the new clones (records), So I started to work and here was my code :
ActiveRecord::Base.transaction do Model.where(column: 'value').each do |item| new_item = item.dup # Duplicating the Object new_item.column = 'Another Value' new_item.save end end
So even though I am using
DB Transactions thinking that It would be the most optimized solution since I'm committing my records at once but I forgot the fact that I'm executing multiple
So the solution is to execute one
SQL INSERT statement but unfortunately
Rails doesn't support that or
ActiveRecords doesn't. So I had to use a 3rd party gem to do so, and the gem is bulk_insert.
bulk_insert allows you to insert multiple records in one single
INSERT statement along with some other features where you can check in this README file. And to test it, I created a small
Rails App with a local
PostgreSQL DB to benchmark the alternative solutions. The benchmark was testing the time to bulk insert 100k records.
|Solution||Time to insert 100K records|
|Multiple INSERT statements with DB Transaction||~ 88 seconds|
|Using bulk_insert gem||~ 8 seconds|
I admit that the
ORMs spoiled me and I need to pay more attention to what's happening behind the scenes.