Protocol of memcached

I wanted to look into key-value stores (Redis, MongoDB, Cassandra, and so on), but it seemed like a good idea to get a better understanding of memcached first. So I read through the protocol document.

http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt

Clients of memcached communicate with server through TCP connections.
(A UDP interface is also available; details are below under "UDP
protocol.") A given running memcached server listens on some
(configurable) port; clients connect to that port, send commands to
the server, read responses, and eventually close the connection.

So it is basically TCP-based (with a UDP interface available as an alternative).

There is no need to send any command to end the session. A client may
just close the connection at any moment it no longer needs it. Note,
however, that clients are encouraged to cache their connections rather
than reopen them every time they need to store or retrieve data. This
is because memcached is especially designed to work very efficiently
with a very large number (many hundreds, more than a thousand if
necessary) of open connections.

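There is no need to send a command to end a session; you simply close the connection. However, memcached is designed to handle several hundred to more than a thousand open connections efficiently, so clients are encouraged to keep their connections open and reuse them instead of reconnecting for every operation.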
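As a rough illustration of "cache your connections", here is a minimal sketch in Python. The ConnectionCache class and its method names are my own invention, not part of any memcached client library, and 11211 is assumed as the port because it is the usual memcached default.

```python
import socket


class ConnectionCache:
    """Hypothetical helper: keep one open socket per (host, port) and reuse it."""

    def __init__(self, connect=socket.create_connection):
        # `connect` is injectable so the sketch can be tried without a real server.
        self._connect = connect
        self._sockets = {}

    def get(self, host, port=11211):
        """Return the cached socket for this server, opening it on first use."""
        key = (host, port)
        sock = self._sockets.get(key)
        if sock is None:
            sock = self._connect(key)
            self._sockets[key] = sock
        return sock

    def close_all(self):
        """End every session. No command is sent first; closing is enough."""
        for sock in self._sockets.values():
            sock.close()
        self._sockets.clear()
```

A real client would also need to notice dead sockets and reconnect, which this sketch leaves out.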

There are two kinds of data sent in the memcache protocol: text lines
and unstructured data. Text lines are used for commands from clients
and responses from servers. Unstructured data is sent when a client
wants to store or retrieve data. The server will transmit back
unstructured data in exactly the same way it received it, as a byte
stream.

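So it is a line-based protocol in which unstructured data is sent as a byte stream. That explains why the "set" command specifies bytes, and why it is fine to just drop the connection abruptly.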
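To make the two kinds of data concrete, here is a small sketch that builds the bytes for a "set" request: one text line followed by the data block. The function name build_set is hypothetical; the field order (key, flags, exptime, bytes) follows the storage command format in protocol.txt, and the optional "noreply" argument is left out.

```python
def build_set(key: bytes, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a "set" request: a command line, then the raw data block."""
    # The text line announces the length of the data block in bytes.
    header = b"set %b %d %d %d\r\n" % (key, flags, exptime, len(value))
    # The data block is sent unchanged and is also terminated by \r\n.
    return header + value + b"\r\n"


request = build_set(b"greeting", b"hi\r\nthere")
# request == b"set greeting 0 0 9\r\nhi\r\nthere\r\n"
```

Note that the value itself contains \r\n here, and the length in the command line (9) counts those two bytes like any others.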

Text lines are always terminated by \r\n. Unstructured data is _also_
terminated by \r\n, even though \r, \n or any other 8-bit characters
may also appear inside the data. Therefore, when a client retrieves
data from a server, it must use the length of the data block (which it
will be provided with) to determine where the data block ends, and not
the fact that \r\n follows the end of the data block, even though it
does.

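Text lines are delimited by \r\n, but a data block may itself contain \r\n, so the terminator alone cannot tell you where the block ends. That is why "set" has to specify bytes, and why a client reading a value must rely on the length it is given.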
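On the reading side, the same idea looks roughly like this. It is only a sketch that works on a buffer already read from the socket; parse_value is a hypothetical name, and the "VALUE <key> <flags> <bytes>" reply line is taken from the retrieval section of protocol.txt.

```python
def parse_value(buf: bytes):
    """Parse one VALUE reply at the start of buf.

    Returns (key, flags, data, rest), where rest is whatever follows the block.
    """
    header, sep, rest = buf.partition(b"\r\n")
    if not sep:
        raise ValueError("incomplete header line")
    parts = header.split(b" ")
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("not a VALUE line")
    key, flags, length = parts[1], int(parts[2]), int(parts[3])
    if len(rest) < length + 2:
        raise ValueError("incomplete data block")
    # Slice by the announced length; never search the data for \r\n.
    data = rest[:length]
    if rest[length:length + 2] != b"\r\n":
        raise ValueError("data block is not followed by \\r\\n")
    return key, flags, data, rest[length + 2:]


reply = b"VALUE greeting 0 9\r\nhi\r\nthere\r\nEND\r\n"
key, flags, data, rest = parse_value(reply)
# data == b"hi\r\nthere", rest == b"END\r\n"
```

Splitting the reply on \r\n instead would have cut the value in two at "hi".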

Summary

memcached uses a simple line-based protocol and exchanges unstructured data as a byte stream.

Also, since the data is just bytes, serialization and deserialization are left entirely to the client.
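One way a client might handle this is sketched below. It assumes the client records how a value was encoded in the flags field, which the server stores and returns without interpreting. The FLAG_RAW and FLAG_JSON constants and the encode/decode names are made up for this example, and JSON is just one possible format.

```python
import json

# Hypothetical flag values; the server treats flags as an opaque integer.
FLAG_RAW = 0
FLAG_JSON = 1


def encode(value):
    """Turn a Python value into (data_bytes, flags) ready for a "set"."""
    if isinstance(value, bytes):
        return value, FLAG_RAW
    return json.dumps(value).encode("utf-8"), FLAG_JSON


def decode(data: bytes, flags: int):
    """Reverse encode() using the flags returned with the value."""
    if flags == FLAG_JSON:
        return json.loads(data.decode("utf-8"))
    return data


data, flags = encode({"user": "alice", "visits": 3})
restored = decode(data, flags)
```

Different client libraries choose their own flag values and formats, so values written by one client are not necessarily readable by another.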