Monday, December 21, 2009

Google Basic Building Block - Protocol Buffers

“Protocol Buffers” is one of Google’s basic building blocks. It is a way of encoding structured data in an efficient yet extensible format, together with a compiler that generates convenient wrappers for manipulating the objects in a variety of languages. Protocol Buffers are used extensively at Google for almost all RPC protocols, and for storing structured information in a variety of persistent storage systems.

When to use Protocol Buffers:
- RPC Protocols/Messages
- Persistent Storage of structured information
- As Client/Server Framework

According to Jeff Dean’s keynote at LADIS 2009 (http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf):

Serialization/Deserialization
- high performance (200+ MB/s encode/decode)
- fairly compact (uses variable length encodings)
- format used to store data persistently (not just for RPCs)
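
To make "variable length encodings" concrete, here is a minimal Python sketch of the base-128 varint scheme that the Protocol Buffers wire format uses for integers. The function names (encode_varint, decode_varint, encode_varint_field) are made up for this illustration and are not part of the protobuf library; the sketch only handles non-negative integers and wire type 0.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (illustrative helper)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode a field key (field number plus wire type 0) followed by a varint value."""
    key = (field_number << 3) | 0    # wire type 0 means "varint"
    return encode_varint(key) + encode_varint(value)


print(encode_varint(300).hex())             # ac02
print(encode_varint_field(1, 150).hex())    # 089601
```

Small values take a single byte and larger ones grow by one byte per 7 bits, which is where the compactness comes from.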

Low-level MapReduce interfaces are in terms of byte arrays
- Hardly ever use textual formats, though: slow, hard to parse
- Most input & output is in encoded Protocol Buffer format
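
An encoded Protocol Buffer message does not mark its own end, so a byte stream holding many records needs some framing around each one. The Python sketch below shows one common approach, a varint length prefix before each record. It is only an assumption for illustration: the keynote does not describe the record format Google's MapReduce uses, and the helper names (write_record, read_records) are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Optional


def write_varint(stream: BinaryIO, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    while value > 0x7F:
        stream.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    stream.write(bytes([value]))


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None if the stream is already at its end."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a length prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_record(stream: BinaryIO, payload: bytes) -> None:
    """Write one record: varint length, then the encoded message bytes."""
    write_varint(stream, len(payload))
    stream.write(payload)


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each length-prefixed record until the stream is exhausted."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError("stream ended inside a record")
        yield payload


# Round trip three records through an in-memory byte stream.
buffer = io.BytesIO()
for record in (b"\x08\x96\x01", b"", b"x" * 200):
    write_record(buffer, record)
buffer.seek(0)
records = list(read_records(buffer))
```

In real use the payloads would be the bytes produced by a generated message class; here they are opaque byte strings so the sketch runs without protobuf installed.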

Language Support:
- C++
- Java
- Python

Optimization for different use cases (e.g. option optimize_for = SPEED;):

- SPEED (default): The protocol buffer compiler will generate code for serializing, parsing, and performing other common operations on your message types. This code is extremely highly optimized.

- CODE_SIZE: The protocol buffer compiler will generate minimal classes and will rely on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code will thus be much smaller than with SPEED, but operations will be slower. Classes will still implement exactly the same public API as they do in SPEED mode. This mode is most useful in apps that contain a very large number of .proto files and do not need all of them to be blindingly fast.

- LITE_RUNTIME: The protocol buffer compiler will generate classes that depend only on the "lite" runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library (around an order of magnitude smaller) but omits certain features like descriptors and reflection. This is particularly useful for apps running on constrained platforms like mobile phones. The compiler will still generate fast implementations of all methods as it does in SPEED mode. Generated classes will only implement the MessageLite interface in each language, which provides only a subset of the methods of the full Message interface.

For details of Protocol Buffers, please refer to http://code.google.com/apis/protocolbuffers/.

We may choose either Protocol Buffers or Thrift as our building block. After a brief read of the Protocol Buffers code, and comparing it with our experience of using Thrift, I prefer Thrift, which provides a better RPC implementation and coding interfaces.

There are also some performance comparisons of Thrift and Protocol Buffers:
http://timyang.net/programming/thrift-protocol-buffers-performance-java/
http://timyang.net/programming/thrift-protocol-buffers-performance-2/
http://rapd.wordpress.com/2009/04/18/json-vs-thrift-vs-pbuffer/

How to install protobuf (an example):

1. Download protobuf-2.2.0a.tar.gz
$ cd /usr/local/src
$ tar -zxvf /root/pkgs/protobuf-2.2.0a.tar.gz

Read README.txt and INSTALL.txt for details.

2. Build and install the C++ Protocol Buffer runtime and the Protocol Buffer compiler (protoc)
$ cd /usr/local/src/protobuf-2.2.0a
$ ./configure --prefix=/usr/local/protobuf
$ make
$ make check
$ make install

Add the library directory to the Linux dynamic linker path so that applications can find libprotobuf.so.
$ echo "/usr/local/protobuf/lib" > /etc/ld.so.conf.d/protobuf.conf
$ ldconfig

3. Create /etc/profile.d/local.sh
I added this local.sh myself. It adds some system-level environment variables for the convenience of applications.
# apache-ant
ANT_HOME=/usr/local/apache-ant
PATH=$PATH:$ANT_HOME/bin
export ANT_HOME

# google protocol buffer
GOOGLE_PROTOBUF_HOME=/usr/local/protobuf
PATH=$PATH:$GOOGLE_PROTOBUF_HOME/bin
export GOOGLE_PROTOBUF_HOME
# apps use pkg-config to compile and link against protobuf (e.g. pkg-config --cflags --libs protobuf)
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:$GOOGLE_PROTOBUF_HOME/lib/pkgconfig/

export PATH

4. Install protobuf Java
$ cd /usr/local/src/protobuf-2.2.0a/java
Read README.txt (section "Installation - Without Maven").

Generate DescriptorProtos.java:
$ protoc --java_out=src/main/java -I../src ../src/google/protobuf/descriptor.proto

Write a new build.xml:
-------------------
<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<project basedir="." default="jar-libprotobuf" name="libprotobuf">
<property environment="env"/>

<!-- javac options -->
<property name="javac.version" value="1.6"/>
<property name="javac.source" value="${javac.version}"/>
<property name="javac.target" value="${javac.version}"/>
<property name="javac.deprecation" value="off"/>
<property name="javac.debug" value="off"/>
<property name="javac.debuglevel" value="source,lines,vars"/>
<property name="javac.optimize" value="on"/>
<property name="javac.args" value=""/>
<property name="javac.args.warnings" value="-Xlint:unchecked"/>

<!-- jar options -->
<property name="jar.index" value="true"/>

<!-- protobuf names -->
<property name="version" value="2.2.0a"/>
<property name="Name" value="libprotobuf"/>
<property name="final.name" value="${Name}-java-${version}"/>

<!-- dir locations -->
<property name="src.dir" value="${basedir}/src"/>
<property name="src.main.dir" value="${src.dir}/main"/>
<property name="src.test.dir" value="${src.dir}/test"/>
<property name="build.dir" value="${basedir}/build"/>
<property name="build.classes.dir" value="${build.dir}/classes"/>

<!-- TARGET init -->
<target name="init">
<mkdir dir="${build.dir}"/>
</target>

<!-- TARGET clean -->
<target name="clean">
<delete dir="${build.dir}"/>
</target>

<!-- TARGET cleanall -->
<target name="cleanall" depends="clean">
<delete>
<fileset dir="." includes="*.jar"/>
</delete>
</target>

<!-- TARGET compile-libprotobuf -->
<target name="compile-libprotobuf" depends="init" >
<echo message="${ant.project.name}: ${ant.file}"/>
<mkdir dir="${build.classes.dir}"/>
<javac source="${javac.source}" target="${javac.target}"
destdir="${build.classes.dir}"
srcdir="${src.main.dir}"
debug="${javac.debug}"
debuglevel="${javac.debuglevel}"
optimize="${javac.optimize}"
deprecation="${javac.deprecation}">
<compilerarg line="${javac.args} ${javac.args.warnings}" />
</javac>
</target>

<!-- TARGET jar-libprotobuf -->
<target name="jar-libprotobuf" depends="compile-libprotobuf">
<jar basedir="${build.classes.dir}" destfile="${build.dir}/${final.name}.jar" index="${jar.index}">
</jar>
<copy todir="${basedir}">
<fileset file="${build.dir}/${final.name}.jar"/>
</copy>
</target>


<!-- for libprotobuf-lite -->

<property name="build.lite.dir" value="${build.dir}/lite"/>
<property name="build.lite.classes.dir" value="${build.lite.dir}/classes"/>
<property name="final.lite.name" value="${Name}-lite-java-${version}"/>

<!-- TARGET compile-libprotobuf-lite -->
<target name="compile-libprotobuf-lite" depends="init" >
<echo message="${ant.project.name}: ${ant.file}"/>
<mkdir dir="${build.lite.dir}"/>
<mkdir dir="${build.lite.classes.dir}"/>
<javac source="${javac.source}" target="${javac.target}"
destdir="${build.lite.classes.dir}"
srcdir="${src.main.dir}"
includes="**/AbstractMessageLite.java
**/ByteString.java
**/CodedInputStream.java
**/CodedOutputStream.java
**/ExtensionRegistryLite.java
**/FieldSet.java
**/GeneratedMessageLite.java
**/InvalidProtocolBufferException.java
**/Internal.java
**/MessageLite.java
**/UninitializedMessageException.java
**/WireFormat.java"
debug="${javac.debug}"
debuglevel="${javac.debuglevel}"
optimize="${javac.optimize}"
deprecation="${javac.deprecation}">
<compilerarg line="${javac.args} ${javac.args.warnings}" />
</javac>
</target>

<!-- TARGET jar-libprotobuf-lite -->
<target name="jar-libprotobuf-lite" depends="compile-libprotobuf-lite">
<jar basedir="${build.lite.classes.dir}" destfile="${build.lite.dir}/${final.lite.name}.jar" index="${jar.index}">
</jar>
<copy todir="${basedir}">
<fileset file="${build.lite.dir}/${final.lite.name}.jar"/>
</copy>
</target>

</project>
-------------------

$ ant
$ cp libprotobuf-java-2.2.0a.jar /usr/local/protobuf/lib/

$ ant jar-libprotobuf-lite
$ cp libprotobuf-lite-java-2.2.0a.jar /usr/local/protobuf/lib/

5. Build examples
$ cd /usr/local/src/protobuf-2.2.0a/examples
Read README.txt for details.

JAVA:
$ export CLASSPATH=.:$CLASSPATH:/usr/local/protobuf/lib/libprotobuf-java-2.2.0a.jar
$ make java

CPP:
$ make cpp

Python:
$ make python

Then we can read the example code (AddPerson and ListPeople) and run it.