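MySQL: Removing Duplicate Rows from a Relation Table in Order to Create a Composite Unique Index
Preface
Yesterday I ran into a problem: a relation table needed to be refactored and optimized. However, because the existing code paid no attention to concurrency, the table had accumulated a lot of dirty data, namely duplicate rows.
The table is named thread_recommend (the post recommendation table). It is the (recommendation) relation table between the two entities user_id and thread_id, and its structure is simple: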
- /* Records of posts recommended by users */
- CREATE TABLE `thread_recommend` (
- `id` int(11) NOT NULL AUTO_INCREMENT,
- `thread_id` int(11) DEFAULT NULL COMMENT 'ID of the post recommended by the user',
- `user_id` int(11) DEFAULT NULL COMMENT 'ID of the user who recommended the post',
- `status` int(11) DEFAULT '1' COMMENT 'Status: 0 = recommendation cancelled, 1 = recommended',
- `created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time of recommendation',
- PRIMARY KEY (`id`),
- KEY `userid` (`user_id`) USING BTREE
- ) ENGINE=InnoDB;
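The problem is that, because the code is sloppy, under high concurrency (or when database load causes requests to back up) several rows with the same thread_id and user_id combination get inserted.
After that, you know how it goes: all kinds of magical bugs that contradict the intended behaviour come pouring out, for example:
I just cancelled my recommendation, so why does it still show me as recommending the post!!
Why doesn't the displayed total recommendation count match the actual number of recommending users!!
Solution 1: use an INSERT ... WHERE NOT EXISTS statement
Disclaimer: this is not the best solution, and it is not recommended.
Code first (a real query against another relation table is used as the example here; the principle is the same):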
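- /* FROM DUAL yields one candidate row even when `user_topic` is empty; selecting FROM `user_topic` itself would never insert into an empty table. */
- INSERT INTO `user_topic` (`user_id`, `topic_id`)
- SELECT :userId, :topicid FROM DUAL
- WHERE NOT EXISTS (SELECT * FROM `user_topic`
- WHERE `user_topic`.`user_id` = :userId
- AND `user_topic`.`topic_id` = :topicid)
- LIMIT 1;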
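(For the same approach, see http://stackoverflow.com/a/31...)
With this "insert only if the row does not exist yet" statement, the affected row count is 1 when the row was inserted and 0 when it already existed, which makes it possible to:
- run the follow-up logic only when the affected row count is 1 (for example, incrementing the cached counter, adding this userId to the cached list of users who recommended the post, and so on)
- return an error from the API when the affected row count is 0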
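The row-count check described above can be tried out directly in a MySQL session. This is only a minimal sketch, not the author's production code: the user variables `@user_id` and `@topic_id` and their values are made-up stand-ins for the bound parameters, and `ROW_COUNT()` is MySQL's function that reports how many rows the previous statement affected.

```sql
-- Hypothetical values standing in for the bound parameters.
SET @user_id = 42, @topic_id = 7;

-- Insert the pair only if no row with the same pair exists yet.
INSERT INTO `user_topic` (`user_id`, `topic_id`)
SELECT @user_id, @topic_id FROM DUAL
WHERE NOT EXISTS (SELECT 1 FROM `user_topic`
                  WHERE `user_id` = @user_id
                    AND `topic_id` = @topic_id);

-- 1 if the row was inserted, 0 if the pair was already present.
SELECT ROW_COUNT() AS inserted_rows;
```

In application code the same number normally comes back from the driver as the statement's affected-row count, so the extra `SELECT` is only needed when experimenting by hand.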
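Solution 2: clean up the dirty data and create a composite unique index
This solution is the core of this article, and it is what we currently consider the best practice.
Step 1: find the rows that are duplicated on the (user_id, thread_id) combination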
- /* The subquery selects only the grouped columns, so it also runs when ONLY_FULL_GROUP_BY is enabled. */
- SELECT a.* FROM `thread_recommend` a
- INNER JOIN (SELECT `thread_id`, `user_id` FROM `thread_recommend` GROUP BY `thread_id`, `user_id` HAVING COUNT(id) > 1) b ON a.`thread_id` = b.`thread_id` AND a.`user_id` = b.`user_id`
- ORDER BY a.`user_id` ASC, a.`thread_id` ASC, a.`id` DESC;
Or a simpler version:
- SELECT * FROM `thread_recommend`
- WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
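Running it returns every row that belongs to a duplicated pair.
Wow! All the duplicates are right here, and I can't wait to get rid of them!
What needs to happen now: within each group of duplicates, delete every row with a larger ID and keep only the one with the smallest ID.
Before deleting, first list the rows that would be deleted and check them: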
- SELECT * FROM `thread_recommend`
- WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
- AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
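As an extra cross-check on that list, the number of rows to delete can also be computed independently. The sketch below is an addition, not part of the original procedure, and it assumes that neither `user_id` nor `thread_id` contains NULL (rows with a NULL in either column are ignored by `COUNT(DISTINCT ...)` and are also never matched by the row-constructor `IN` used above).

```sql
-- rows_to_delete should equal the number of rows returned by the
-- listing query above: each pair with n copies contributes n - 1 rows.
SELECT COUNT(*)                                           AS total_rows,
       COUNT(DISTINCT `user_id`, `thread_id`)             AS distinct_pairs,
       COUNT(*) - COUNT(DISTINCT `user_id`, `thread_id`)  AS rows_to_delete
FROM `thread_recommend`;
```

If the two numbers disagree, it is worth finding out why before running any DELETE.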
Next step: take the query that lists the rows to be deleted, change its SELECT * FROM to DELETE FROM, and delete!
- DELETE FROM `thread_recommend`
- WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
- AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend` GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
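Oops! It fails with an error: You can't specify target table 'thread_recommend' for update in FROM clause
This is a small MySQL limitation: a statement cannot modify a table and read from the same table in a subquery. Following the workaround at http://stackoverflow.com/a/14... we wrap the table in a derived table and the SQL works: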
- DELETE FROM `thread_recommend`
- WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM (SELECT * FROM `thread_recommend`) a GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
- AND `id` NOT IN (SELECT MIN(`id`) FROM (SELECT * FROM `thread_recommend`) b GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
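An alternative formulation, shown here only as a sketch and not taken from the original post, is MySQL's multi-table DELETE with a self-join. It needs no subquery on the target table, so the error above does not come up, and it does not rely on the optimizer keeping a plain `(SELECT * FROM ...)` derived table separate (on some MySQL versions such a derived table may reportedly be merged back into the outer query, in which case the error can reappear). The aliases `a` and `b` are arbitrary.

```sql
-- Delete every row `a` for which another row `b` with the same
-- (user_id, thread_id) pair and a smaller id exists.
-- The row with the smallest id in each group has no such `b`, so it is kept.
DELETE a
FROM `thread_recommend` a
INNER JOIN `thread_recommend` b
        ON a.`user_id`   = b.`user_id`
       AND a.`thread_id` = b.`thread_id`
       AND a.`id`        > b.`id`;
```

Like the `IN`-based version, this leaves rows with a NULL `user_id` or `thread_id` untouched, because NULL never compares equal in the join condition.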
Finally, add the composite unique index!
- ALTER TABLE `thread_recommend`
- ADD UNIQUE KEY `thread_id_user_id_unique`(`thread_id`,`user_id`) USING BTREE;
Of course, if the cleanup above has not been completed, this statement will fail with an error!
Done!
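Rather than learning about leftovers from the ALTER TABLE error, the duplicate check can be rerun first; an empty result means the index can be created. Once the index exists, writes can lean on it. The second statement below is one possible pattern and an assumption on my part, not something the original post prescribes; `:threadId` and `:userId` are hypothetical bound parameters.

```sql
-- Pre-check: should return no rows before the unique index is added.
SELECT `user_id`, `thread_id`, COUNT(*) AS copies
FROM `thread_recommend`
GROUP BY `user_id`, `thread_id`
HAVING COUNT(*) > 1;

-- After the index exists: insert the pair, or flip an existing row
-- back to "recommended" instead of creating a duplicate.
INSERT INTO `thread_recommend` (`thread_id`, `user_id`, `status`)
VALUES (:threadId, :userId, 1)
ON DUPLICATE KEY UPDATE `status` = 1;
```

With ON DUPLICATE KEY UPDATE, MySQL reports 1 affected row for a fresh insert, 2 when an existing row was changed, and 0 when the existing row already had the same values, so the application can still tell the cases apart.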