Research notes - kotaroito's notes

distinct

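A keyword that removes duplicate rows from the result, judged on the combination of the selected columns. It is said to be slow because the rows have to be sorted (or otherwise grouped) to find the duplicates.
Let's try it on a table with about 300,000 records.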

select distinct user_id, content from data;
299990 rows in set (7.82 sec)

select user_id, content from data;
300000 rows in set (1.21 sec)

The advice is to use it only when it is really needed and you understand what it does (from the book 初めてのSQL).
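To make the "combination of the selected columns" point concrete, here is a minimal sketch. The table distinct_demo and its rows are made up for illustration and are not the data table measured above.

```sql
-- Hypothetical demo table (an assumption, not the author's `data` table).
CREATE TABLE distinct_demo (
  user_id INT NOT NULL,
  content VARCHAR(32) NOT NULL
);

INSERT INTO distinct_demo (user_id, content) VALUES
  (1, 'hello'),
  (1, 'hello'),   -- exact duplicate of the row above
  (1, 'world'),   -- same user_id, different content
  (2, 'hello');   -- same content, different user_id

-- DISTINCT compares the whole selected row, so only the exact duplicate is dropped.
SELECT DISTINCT user_id, content FROM distinct_demo;   -- 3 rows

-- With a single column selected, duplicates are judged on that column alone.
SELECT DISTINCT user_id FROM distinct_demo;            -- 2 rows
```

If the rows are already known to be unique (for example because of a unique key), the distinct adds work without changing the result.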

where and having

The where clause holds filters that apply to the rows stored in the table; the having clause holds filters that apply to the aggregated data. where is evaluated before grouping.

select user_id, count(user_id) as count from oneword_data
where user_id > 15000
group by user_id having count > 30
;
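A small sketch of the evaluation order, using a made-up table having_demo (hypothetical, not oneword_data). The where filter removes rows first, and only then does having look at the per-group counts.

```sql
-- Hypothetical demo table and rows, for illustration only.
CREATE TABLE having_demo (
  user_id INT NOT NULL,
  content VARCHAR(32) NOT NULL
);

INSERT INTO having_demo (user_id, content) VALUES
  (1, 'a'), (1, 'b'), (1, 'c'),
  (2, 'd'), (2, 'e'), (2, 'f'),
  (3, 'g');

-- Step 1: WHERE drops every row of user_id 1 before any grouping happens.
-- Step 2: GROUP BY builds the groups user_id 2 (3 rows) and user_id 3 (1 row).
-- Step 3: HAVING drops the group for user_id 3 because its count is below 2.
SELECT user_id, COUNT(*) AS num
FROM having_demo
WHERE user_id > 1
GROUP BY user_id
HAVING COUNT(*) >= 2;
-- returns one row: user_id = 2, num = 3
```

An aggregate such as COUNT(*) cannot be used in where, because the groups do not exist yet at that point; that kind of condition has to go in having.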

group by

An expression can be given instead of a column name.

select user_id%10, count(user_id) num
from user_data
group by user_id%10
;
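The same idea on a tiny made-up table (groupby_demo is hypothetical), so the buckets can be checked by hand. The alias bucket and the order by are additions for readability.

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE groupby_demo (user_id INT NOT NULL);

INSERT INTO groupby_demo (user_id) VALUES (10), (20), (11), (21), (31), (12);

-- Group rows by the last digit of user_id rather than by user_id itself.
SELECT user_id % 10 AS bucket, COUNT(*) AS num
FROM groupby_demo
GROUP BY user_id % 10
ORDER BY bucket;
-- bucket 0 -> 2 rows (10, 20)
-- bucket 1 -> 3 rows (11, 21, 31)
-- bucket 2 -> 1 row  (12)
```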

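Build conditions with a search expression. In a like pattern, % matches any run of characters (including none) and _ matches exactly one character.
Find strings that start with test (test alone also matches).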

select content from oneword_data
where content like 'test%'
;

Find strings made of test followed by exactly one arbitrary character.

select content from oneword_data
where content like 'test_'
;
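A quick sketch of the two wildcards on a made-up table (like_demo is hypothetical). The last query assumes the default backslash escape is enabled, which lets a pattern match a literal underscore.

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE like_demo (content VARCHAR(32) NOT NULL);

INSERT INTO like_demo (content) VALUES
  ('test'), ('test1'), ('test12'), ('mytest'), ('test_');

-- % matches zero or more characters: matches test, test1, test12, test_
SELECT content FROM like_demo WHERE content LIKE 'test%';

-- _ matches exactly one character: matches test1 and test_
SELECT content FROM like_demo WHERE content LIKE 'test_';

-- \_ matches a literal underscore: matches test_ only
SELECT content FROM like_demo WHERE content LIKE 'test\_';
```

Note that like always compares against the whole value, so 'mytest' never matches a pattern beginning with test.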

regexp

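Search with a regular expression.
Find strings that start with test (test alone also matches). The ^ anchor is needed because regexp otherwise matches anywhere inside the string.

select content from oneword_data
where content regexp '^test'
;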

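Find strings made of test followed by exactly one arbitrary character. Both anchors are needed to get the same result as like 'test_'.

select content from oneword_data
where content regexp '^test.$'
;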
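Unlike like, regexp succeeds if the pattern matches anywhere in the value, so anchoring changes the result. A minimal sketch on a made-up table (regexp_demo is hypothetical):

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE regexp_demo (content VARCHAR(32) NOT NULL);

INSERT INTO regexp_demo (content) VALUES
  ('test'), ('test1'), ('test12'), ('mytest'), ('mytest1');

-- Unanchored: matches every value that contains "test" (all five rows).
SELECT content FROM regexp_demo WHERE content REGEXP 'test';

-- Anchored at the start: matches test, test1, test12.
SELECT content FROM regexp_demo WHERE content REGEXP '^test';

-- Unanchored "test.": matches test1, test12, mytest1.
SELECT content FROM regexp_demo WHERE content REGEXP 'test.';

-- Anchored at both ends: matches test1 only, like LIKE 'test_'.
SELECT content FROM regexp_demo WHERE content REGEXP '^test.$';
```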

Speed of like and regexp

Which one is faster?

select content from oneword_data where content regexp '^test1.*0$';
13333 rows in set (1.12 sec)
select content from oneword_data where content like 'test1%0';
13333 rows in set (0.59 sec)

like is nearly twice as fast.

What about this one?

select content from oneword_data where content regexp '^test.*0$';
30000 rows in set (0.95 sec)
select content from oneword_data where content like 'test%0';
30000 rows in set (0.68 sec)

The regular expression got faster, but like is still the faster of the two.

One more try.

select content from oneword_data where content regexp '^test1.2.*';
13330 rows in set (1.19 sec)
select content from oneword_data where content like 'test1_2%';
13330 rows in set (0.58 sec)

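like is faster again.
I don't know how either is implemented, so I can't say why, but if like can express the condition, it seems better to write the SQL with like.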
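One factor that may matter beyond raw matching cost, and which the timings above did not test, is index use. MySQL can use a B-tree index for a like pattern that begins with a constant prefix, while regexp is evaluated against each row. The sketch below assumes a made-up table speed_demo with an index on content (both hypothetical); the plan actually chosen depends on the data and the MySQL version, so no expected output is shown.

```sql
-- Hypothetical demo table with an index on the searched column.
CREATE TABLE speed_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  content VARCHAR(64) NOT NULL,
  KEY idx_content (content)
);

-- Constant prefix 'test1': the optimizer can consider a range scan on idx_content.
EXPLAIN SELECT content FROM speed_demo WHERE content LIKE 'test1%0';

-- Same condition as a regular expression: the pattern is checked row by row.
EXPLAIN SELECT content FROM speed_demo WHERE content REGEXP '^test1.*0$';
```

Comparing the type, key and rows columns of the two plans on real data shows whether the index is helping the like version.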

A subquery that returns many rows is extremely slow.

select * from oneword_data
where user_id in (
  select user_id from oneword_data
  where user_id > 15000
)
;
147880 rows in set (6.69 sec)

select * from oneword_data
where user_id > 15000
;
147880 rows in set (1.23 sec)

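With user_id in (1111, 2222), the condition is presumably converted internally to user_id=1111 OR user_id=2222. Seen that way, the result makes sense.
Another likely cause: MySQL versions before 5.6 commonly rewrite in (subquery) as a correlated subquery that is evaluated again for each outer row.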
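When the subquery cannot simply be folded into the where clause as in the second query above, one common workaround is to join against the distinct list of ids instead of using in. This is a sketch on a made-up table (subquery_demo and the alias ids are hypothetical), not a measured result.

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE subquery_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  content VARCHAR(32) NOT NULL,
  KEY idx_user_id (user_id)
);

INSERT INTO subquery_demo (user_id, content) VALUES
  (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd');

-- Original shape: IN with a subquery.
SELECT * FROM subquery_demo
WHERE user_id IN (
  SELECT user_id FROM subquery_demo WHERE user_id > 1
);

-- Rewritten shape: join against a derived table of distinct ids.
-- DISTINCT keeps the join from multiplying rows for repeated user_id values.
SELECT d.*
FROM subquery_demo AS d
JOIN (
  SELECT DISTINCT user_id FROM subquery_demo WHERE user_id > 1
) AS ids ON ids.user_id = d.user_id;
-- both queries return the 3 rows with user_id 2 or 3
```

Whether the join is actually faster should be checked with explain and a timing run on the real table.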

filesort

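When you run EXPLAIN SELECT... in MySQL, you often see the string "Using filesort" in the Extra field. Have you ever wondered what filesort actually is? Put bluntly, filesort is a quicksort.
When a query contains ORDER BY, MySQL performs the quicksort entirely in memory up to a certain size. That size is sort_buffer_size, which can be changed per session. When the memory needed for the sort exceeds sort_buffer_size, a temporary file (not a temporary table) is created and the quicksort is carried out using both memory and the file.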

Source: http://nippondanji.blogspot.com/2009/03/using-filesort.html

So it is worth checking whether a SQL statement triggers a filesort.
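A sketch of how that check might look. The table filesort_demo and the index name are made up for illustration, and the 4 MB value is an arbitrary example; what explain reports depends on the data and the MySQL version, so the comments describe what is typical rather than guaranteed.

```sql
-- Hypothetical demo table, for illustration only.
CREATE TABLE filesort_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  content VARCHAR(32) NOT NULL
);

-- No index on user_id: the Extra column typically shows "Using filesort".
EXPLAIN SELECT user_id FROM filesort_demo ORDER BY user_id;

-- With an index on the ORDER BY column, rows can be read in index order,
-- so the filesort can usually be avoided for this query.
ALTER TABLE filesort_demo ADD INDEX idx_user_id (user_id);
EXPLAIN SELECT user_id FROM filesort_demo ORDER BY user_id;

-- Inspect and change the in-memory sort limit for the current session only.
SHOW VARIABLES LIKE 'sort_buffer_size';
SET SESSION sort_buffer_size = 4194304;  -- 4 MB, example value
```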

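Summary

- Don't use distinct indiscriminately
- Put filters on aggregated results in having
- Between like and regexp, like was the faster one in these tests
- Avoid in with a subquery that returns many rows
- Use explain select ... to check the execution plan of a SQL statement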