Avoid unexpected duplication in MySQL database

Screen Shot 2018-05-24 at 4.32.18 PM

Let’s assume that your application has to insert thousand records at the same time, and you might want those records not to be duplicated. For example, you need to perform insertion from a CSV file, to insert contacts, and you want there’s no two contacts with the same email.

You might come up with the solution like: “Ah, ActiveRecord supports validations!”. I think validation like

class User
  validates_uniqueness_of :email
end

I think the validation will work well in majority cases. However, let’s assume you’re doing that task in background job. Then you have another file, which has a lot of duplicated users, and new users, of course. You wanna do the second insertion right after that (or some actions like that at almost same time). Do you think that at the same moment, two jobs will insert the same users, and validations for those two records will both pass. Then? You have the duplicated users in your database.

It’s time to try the second solution. Let’s put the hard jobs on MySQL side. MySQL provides kind of index called UNIQUE index that allows you to enforce the uniqueness of values in one or more columns
You can use MYSQL command

CREATE UNIQUE INDEX idx_email
ON users(email);

You can ask MYSQL to have unique index on multiple columns

CREATE UNIQUE INDEX idx_email_phone
ON users(email, phone);

How can you do with migrations?

add_index :users, [:email, :phone], unique: true

Another thing that We should mention here, MySQL allow multiple rows with NULL even there is UNIQUE INDEX constraint.

Again, you can come up with two possible solutions.

MySQL side

change_column :users, :email, :string, null: false

Or use a validation

validates :email, presence: true

The last thing we should be aware of is that MySQL might raise an exception if there’s duplicated rows. Thus, we should have a rescue block

begin
  # Your code come here
rescue StandardError => e
  # You can write exception detail to a log file using logger
  log = Logger.new("log/mysql.log")
  log.info("#{Time.now.strftime('%Y-%m-%d %H:%M:%S')} - #{e.class}: #{e.message} \n #{email}")
end

That’s all for handling duplicated records
Happy coding!

Published by

Colin Dao

I am a hardworking Rubyist in Hanoi, Vietnam

Leave a comment