Introduction

While reading the SRE book I came across Protocol Buffers.
It is said to be faster than XML or JSON, so I would like to try it out.

The official documentation is at
https://developers.google.com/protocol-buffers/

Installation

pip install protobuf

# Download and extract the source release
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.8.0/protobuf-python-3.8.0.tar.gz
tar xzf protobuf-python-3.8.0.tar.gz
cd protobuf-3.8.0/
# Build and install protoc and the C++ runtime
./autogen.sh
./configure
make
make install
# Build, test, and install the Python bindings
cd python/
python3.6 setup.py build
python3.6 setup.py test
python3.6 setup.py install


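Defining the schema (proto file)

To exchange data, you define the schema of that data in a .proto file.

The definition can then be compiled into libraries for C++, Java, Python, and other languages.

car.proto

Let's create a schema for exchanging information about cars.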
syntax = "proto2";

message Car {
  required string model = 1;
  required string vendor = 2;
  required int32 price = 3;
  optional string owner = 4;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 5;
}
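The file starts with a syntax declaration that specifies the language version (proto2 here). If it is omitted, protoc assumes proto2 and prints a warning.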
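
The numbers after each field name (= 1, = 2, ...) are what actually identify a field in the serialized bytes; the field names themselves are not sent. As a rough illustration, here is a minimal sketch that hand-encodes the model, vendor, and price fields of Car using only the standard library. The function names are hypothetical, this is not how the protobuf library is implemented, and it assumes only strings and non-negative integers.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Tag (wire type 2, length-delimited), then length, then UTF-8 bytes."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag (wire type 0, varint), then the value. Non-negative values only."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def encode_car(model: str, vendor: str, price: int) -> bytes:
    """Hand-rolled encoding of Car fields 1 (model), 2 (vendor), 3 (price)."""
    return (
        encode_string_field(1, model)
        + encode_string_field(2, vendor)
        + encode_int_field(3, price)
    )


encoded = encode_car("prius", "toyota", 3000000)
as_json = json.dumps({"model": "prius", "vendor": "toyota", "price": 3000000})
print(len(encoded), len(as_json))
```

Because only the field numbers and values are written, the binary form is noticeably shorter than the equivalent JSON text, which is one reason field numbers should not be changed once a schema is in use.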

Compiling
This time I will use it from Python.
protoc --python_out=./ car.proto
The generated file looked like this:

car_pb2.py
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: car.proto

import sys
_b=sys.version_info[0]<3 and (lambda x:x) or (lambda x:x.encode('latin1'))
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from google.protobuf import reflection as _reflection
from google.protobuf import symbol_database as _symbol_database
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()




DESCRIPTOR = _descriptor.FileDescriptor(
  name='car.proto',
  package='',
  syntax='proto2',
  serialized_options=None,
  serialized_pb=_b('\n\tcar.proto\"\xc9\x01\n\x03\x43\x61r\x12\r\n\x05model\x18\x01 \x02(\t\x12\x0e\n\x06vendor\x18\x02 \x02(\t\x12\r\n\x05price\x18\x03 \x02(\x05\x12\r\n\x05owner\x18\x04 \x01(\t\x12\x1f\n\x05phone\x18\x05 \x03(\x0b\x32\x10.Car.PhoneNumber\x1a\x41\n\x0bPhoneNumber\x12\x0e\n\x06number\x18\x01 \x02(\t\x12\"\n\x04type\x18\x02 \x01(\x0e\x32\x0e.Car.PhoneType:\x04HOME\"!\n\tPhoneType\x12\n\n\x06MOBILE\x10\x00\x12\x08\n\x04HOME\x10\x01')
)



_CAR_PHONETYPE = _descriptor.EnumDescriptor(
  name='PhoneType',
  full_name='Car.PhoneType',
  filename=None,
  file=DESCRIPTOR,
  values=[
    _descriptor.EnumValueDescriptor(
      name='MOBILE', index=0, number=0,
      serialized_options=None,
      type=None),
    _descriptor.EnumValueDescriptor(
      name='HOME', index=1, number=1,
      serialized_options=None,
      type=None),
  ],
  containing_type=None,
  serialized_options=None,
  serialized_start=182,
  serialized_end=215,
)
_sym_db.RegisterEnumDescriptor(_CAR_PHONETYPE)


_CAR_PHONENUMBER = _descriptor.Descriptor(
  name='PhoneNumber',
  full_name='Car.PhoneNumber',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='number', full_name='Car.PhoneNumber.number', index=0,
      number=1, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='type', full_name='Car.PhoneNumber.type', index=1,
      number=2, type=14, cpp_type=8, label=1,
      has_default_value=True, default_value=1,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[],
  enum_types=[
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=115,
  serialized_end=180,
)

_CAR = _descriptor.Descriptor(
  name='Car',
  full_name='Car',
  filename=None,
  file=DESCRIPTOR,
  containing_type=None,
  fields=[
    _descriptor.FieldDescriptor(
      name='model', full_name='Car.model', index=0,
      number=1, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='vendor', full_name='Car.vendor', index=1,
      number=2, type=9, cpp_type=9, label=2,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='price', full_name='Car.price', index=2,
      number=3, type=5, cpp_type=1, label=2,
      has_default_value=False, default_value=0,
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='owner', full_name='Car.owner', index=3,
      number=4, type=9, cpp_type=9, label=1,
      has_default_value=False, default_value=_b("").decode('utf-8'),
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
    _descriptor.FieldDescriptor(
      name='phone', full_name='Car.phone', index=4,
      number=5, type=11, cpp_type=10, label=3,
      has_default_value=False, default_value=[],
      message_type=None, enum_type=None, containing_type=None,
      is_extension=False, extension_scope=None,
      serialized_options=None, file=DESCRIPTOR),
  ],
  extensions=[
  ],
  nested_types=[_CAR_PHONENUMBER, ],
  enum_types=[
    _CAR_PHONETYPE,
  ],
  serialized_options=None,
  is_extendable=False,
  syntax='proto2',
  extension_ranges=[],
  oneofs=[
  ],
  serialized_start=14,
  serialized_end=215,
)

_CAR_PHONENUMBER.fields_by_name['type'].enum_type = _CAR_PHONETYPE
_CAR_PHONENUMBER.containing_type = _CAR
_CAR.fields_by_name['phone'].message_type = _CAR_PHONENUMBER
_CAR_PHONETYPE.containing_type = _CAR
DESCRIPTOR.message_types_by_name['Car'] = _CAR
_sym_db.RegisterFileDescriptor(DESCRIPTOR)

Car = _reflection.GeneratedProtocolMessageType('Car', (_message.Message,), {

  'PhoneNumber' : _reflection.GeneratedProtocolMessageType('PhoneNumber', (_message.Message,), {
    'DESCRIPTOR' : _CAR_PHONENUMBER,
    '__module__' : 'car_pb2'
    # @@protoc_insertion_point(class_scope:Car.PhoneNumber)
    })
  ,
  'DESCRIPTOR' : _CAR,
  '__module__' : 'car_pb2'
  # @@protoc_insertion_point(class_scope:Car)
  })
_sym_db.RegisterMessage(Car)
_sym_db.RegisterMessage(Car.PhoneNumber)


# @@protoc_insertion_point(module_scope)

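Using Protocol Buffers

Here is the program that exercises it:
  1. Create a car message, fill in its data, and serialize it
  2. Create a car2 message, parse the serialized data into it, and print its contents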

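import car_pb2

# Build a Car message and fill in its fields
car = car_pb2.Car()
car.model = "prius"
car.vendor = "toyota"
car.price = 3000000
car.owner = "yuki08"
# phone is a repeated field: add() an entry rather than assigning to the nested class
phone = car.phone.add()
phone.number = "050-9999-9999"
data = car.SerializeToString()

# Parse the serialized bytes into a fresh message and print its contents
car2 = car_pb2.Car()
car2.ParseFromString(data)
print(car2.model)
print(car2.vendor)
print(car2.owner)
print(car2.phone[0].number)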

Output

# python3.6 car_protocol_buffer.py
prius
toyota
yuki08
050-9999-9999
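
To give a sense of the work that ParseFromString takes care of, here is a minimal sketch of a decoder for the two wire types the simple Car fields use (varint and length-delimited). It uses only the standard library; the function names and the hand-assembled SAMPLE bytes are assumptions for illustration, not the library's implementation, and it only returns raw field numbers and values because mapping them back to names and types needs the schema.

```python
def decode_varint(data: bytes, pos: int):
    """Read a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(data: bytes):
    """Return a list of (field_number, raw_value) pairs from a message."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (int32, enum, ...)
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:  # length-delimited (string, nested message, ...)
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, value))
    return fields


# Hand-assembled bytes for model="prius", vendor="toyota", price=3000000
SAMPLE = b"\n\x05prius\x12\x06toyota\x18\xc0\x8d\xb7\x01"
for number, value in decode_fields(SAMPLE):
    print(number, value)
```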

Exchanging the data turned out to be very easy.
Not having to think about the parsing code is a big advantage.