Protocol Buffers (protobuf) Knowledge

Kip Landergren

My Protocol Buffers (protobuf) knowledge base that evolves as I learn more.

Overview

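Protocol Buffers (AKA “protobuf”) is a binary serialization toolset composed of a schema language, the protoc compiler with its plug-ins, and a binary wire format (each covered under Components).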

An overview of the protocol buffer compilation process, starting with the data schema defined in a .proto file, combined with user options and passed to protoc, the compiler, to generate native bindings.

This centralizes data schema definition and makes it easy for a server written in one language to communicate with multiple clients, each written in a different language. Serialized data can also be written to disk or used for inter-process communication.

Common data format demonstrated by two different client libraries communicating with each other.

Core Idea

Create a toolset for:

Taken together, this creates a single source of truth for data and service definitions, replacing per-language implementations that previously drifted out of sync with one another.

Key Concepts

interface definition language / schema definition language
the language used to specify the data structure to be represented
serialization / deserialization
the process of converting a data structure into a format that can be easily transmitted, stored, and reconstructed later
services and remote procedure call (rpc)
the specification of operations between a server and client that can be performed using the data structures
binary encoding / output format
the binary format the data is output as
code generation
the automatically generated native bindings based on the schema definition that represent the data structures and the services that use them
data exchange formats, compatibility, and versioning
how a set of servers can communicate over time with changes to their formats while preserving existing functionality and safely rolling out new functionality
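
To make the compatibility concept concrete, here is a minimal sketch in plain Python (no protobuf library) of a reader that skips field numbers it does not know about. The payload bytes, the field names (id, name, active), and the helper functions are all hypothetical, chosen only for illustration. The tag layout (field number shifted left by three bits, combined with a wire type) follows my understanding of the publicly documented encoding. Unlike the generated bindings, this sketch handles only two wire types and simply discards unknown fields.

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def decode_known_fields(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode varint and length-delimited fields, ignoring unknown numbers."""
    out: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            out[known[field_number]] = value
        # Unknown field numbers were still parsed past, so decoding continues.
    return out


# Hypothetical payload from a "new" writer:
# field 1 = 150 (varint), field 2 = b"kip" (bytes), field 3 = 1 (added later).
NEW_PAYLOAD = bytes([0x08, 0x96, 0x01, 0x12, 0x03]) + b"kip" + bytes([0x18, 0x01])

OLD_SCHEMA = {1: "id", 2: "name"}               # reader built before field 3 existed
NEW_SCHEMA = {1: "id", 2: "name", 3: "active"}  # reader that knows field 3

old_view = decode_known_fields(NEW_PAYLOAD, OLD_SCHEMA)
new_view = decode_known_fields(NEW_PAYLOAD, NEW_SCHEMA)
print(old_view)
print(new_view)
```

The old reader still recovers the fields it understands, which is the property that lets a new writer roll out ahead of its readers.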

Components

The Language

proto2

proto3

protoc and related plug-ins

protoc uses a plug-in architecture for code generation. It contains native support for some language bindings and is otherwise augmented by separate plug-ins. These plug-ins support code generation for both data objects and remote procedure call (RPC) frameworks like gRPC.
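
As a rough illustration of how the built-in generators and plug-ins are selected, the sketch below assembles protoc command lines in Python. The flags used (--proto_path, --NAME_out, and --plugin=protoc-gen-NAME=PATH) are the ones I understand protoc to accept. The helper protoc_command, the file person.proto, the gen output directory, and the plug-in path are hypothetical. Nothing is executed here; the commands are only built and printed.

```python
from __future__ import annotations

import shlex


def protoc_command(
    proto_file: str,
    outputs: dict[str, str],
    plugins: dict[str, str] | None = None,
    proto_path: str = ".",
) -> list[str]:
    """Build a protoc argument list.

    outputs maps a generator name to an output directory,
    e.g. {"python": "gen"} becomes --python_out=gen.
    plugins maps a plug-in name to its executable path,
    e.g. {"go": "/path/protoc-gen-go"} becomes --plugin=protoc-gen-go=/path/protoc-gen-go.
    """
    cmd = ["protoc", f"--proto_path={proto_path}"]
    for name, path in (plugins or {}).items():
        cmd.append(f"--plugin=protoc-gen-{name}={path}")
    for name, out_dir in outputs.items():
        cmd.append(f"--{name}_out={out_dir}")
    cmd.append(proto_file)
    return cmd


# Built-in generator: Python bindings ship with protoc itself.
builtin_cmd = protoc_command("person.proto", {"python": "gen"})

# Plug-in generator: Go bindings come from the separate protoc-gen-go
# executable (the path below is a placeholder).
plugin_cmd = protoc_command(
    "person.proto",
    {"go": "gen"},
    plugins={"go": "/usr/local/bin/protoc-gen-go"},
)

print(shlex.join(builtin_cmd))
print(shlex.join(plugin_cmd))
```

If a plug-in executable named protoc-gen-NAME is already on the PATH, my understanding is that the explicit --plugin flag can be left out and --NAME_out alone is enough.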

The Wire Format
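
A minimal sketch of the wire format, hand-rolled in plain Python rather than taken from generated bindings. It assumes the publicly documented encoding as I understand it: each field is a tag (field number shifted left three bits, OR'd with a wire type) followed by a value, where wire type 0 is a base-128 varint and wire type 2 is a length-prefixed byte string. The function names and the example message are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (little-endian groups)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # final byte: high bit clear
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag packs the field number and wire type into a single varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Wire type 0: tag followed by a varint value."""
    return encode_tag(field_number, 0) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: tag, then a varint length, then the raw bytes."""
    return encode_tag(field_number, 2) + encode_varint(len(payload)) + payload


# Hypothetical message: field 1 = 150, field 2 = "testing".
message = encode_varint_field(1, 150) + encode_bytes_field(2, "testing".encode("utf-8"))
print(message.hex(" "))  # prints "08 96 01 12 07 74 65 73 74 69 6e 67"
```

Note that field names never appear in the output; only field numbers do, which is why the numbers in a .proto file have to stay stable once data has been written.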

Versioning

Protocol Buffers has to manage evolving:

The considerations are that:

As of January 2024, the two supported versions of the IDL are proto2 and proto3. The goals of proto3 were to:

While this was useful, there were still problems:

To address these, Protocol Buffers will soon adopt the concept of editions. Editions, inspired by how Rust uses them, are groups of features that allow the user to opt in to compiler behavior. A future user will be able to evolve a .proto file at their own pace by specifying an edition (the first of which will essentially be a no-op) and then opting into features as desired.

Usage

Protocol Buffers is set up to give you the means to describe your data, enforce its safe evolution, and automatically generate polyglot client and server libraries that can communicate with each other. It is opinionated about some factors—like default values and optional fields—but largely leaves you the work of defining and managing your API.
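
One place the opinions about default values and optional fields show up is on the wire. The sketch below is plain Python, not generated code, and the function names are hypothetical. It assumes my understanding of the two presence models: with implicit presence (a plain proto3 scalar), a value equal to the default is left off the wire, while with explicit presence (an optional field) anything that was set is written, including zero.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0 (varint), followed by the varint value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_implicit(field_number: int, value: int) -> bytes:
    """Implicit presence: the default value (0) produces no bytes at all."""
    if value == 0:
        return b""
    return encode_int_field(field_number, value)


def encode_explicit(field_number: int, value: int | None) -> bytes:
    """Explicit presence: None means unset; any set value is written, even 0."""
    if value is None:
        return b""
    return encode_int_field(field_number, value)


implicit_zero = encode_implicit(1, 0)      # indistinguishable from "never set"
explicit_zero = encode_explicit(1, 0)      # zero is recorded on the wire
explicit_unset = encode_explicit(1, None)  # nothing written

print(implicit_zero.hex(), explicit_zero.hex(), explicit_unset.hex())
```

With implicit presence a reader cannot tell "set to zero" from "not set", so an API that needs that distinction has to opt into explicit presence or model it some other way.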

Strengths:

Considerations:

Best for:

Protocol Buffers Terminology

protoc
the protocol buffers compiler
native bindings
the automatically generated software libraries
wire format
the protocol buffers binary output format