Bazel Knowledge
Kip Landergren
My Bazel knowledge base that evolves as I learn more.
Overview
bazel is an artifact-based build system. It offers the ability to distribute build jobs over remote instances and caches build artifacts for reuse. It is controlled by BUILD files that contain rules defining how the artifacts should be constructed.
Background and Context
Build tooling started as an extension of what the developer was already doing manually: passing files to a compiler as inputs and generating a binary as output.
As projects grow (e.g. in number of generated binaries, dependencies, or compilers / languages), what these manual build tasks must manage becomes more complex:
- more compilers and tools are needed
- specific environment configurations are required
- differences between developer and production environments become wider
And the problems with their operation and maintenance become manifold:
- tedium
- slow / complex to get correct
- multiplicative (more scripts for more specific actions!)
- environment-specific configuration or assumptions lead to bugs
- hard to get consistency across automated builds
- how do you scale across unused compute resources?
Additionally, from an engineering organization perspective, there is an ever-present desire for increased product velocity (which trickles down to developer velocity).
This results in an increased emphasis on:
- automated builds
- fast releases
- debugging / maintenance
- faster build times
- correctness
And so, to address these needs, tasks grow even larger in scope, complexity, and brittleness.
There is a fundamental issue at play here: because the tasks are so flexible—they offer no guarantees about their operation and output—the containing system is hamstrung in terms of what execution improvements it can confidently make.
Build systems that exhibit these characteristics are called task-based build systems.
In contrast, bazel is an artifact-based build system. An artifact-based build system instead focuses on declaring what the outputs should be and leaves the “how” of creating them to the build system itself. By creating build primitives that developers can invoke to specify what their desired outputs are, including metadata like their dependencies, bazel forces the developer to give up some of that task freedom in exchange for powerful build system capabilities.
Those capabilities include:
- parallelization
- guarantees around reproducibility
- remote caching
- remote execution
bazel accomplishes this with the following setup:
- BUILD files at the package level define what the targets are
- targets are defined using rules, many of which are built-in, that give bazel an understanding of what is being built
- a directed acyclic graph of dependencies is created via the metadata specified on target definition
This graph, in combination with the rules (whose operation comes with guarantees, like a lack of side effects), allows bazel to determine an optimal build sequence. Additionally, the reproducibility requirement makes it possible to introduce remote shared caching and remote execution.
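To make the role of the dependency graph concrete, here is a minimal Python sketch of how a build order with parallelizable steps can be derived from declared dependencies. It is not how bazel is implemented; the target labels and the `build_waves` helper are hypothetical, and the standard library's `graphlib` stands in for the real scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical target graph: each target maps to the set of targets it depends on.
DEPS = {
    "//app:server": {"//lib:http", "//lib:log"},
    "//lib:http": {"//lib:log"},
    "//lib:log": set(),
    "//tools:lint": set(),
}


def build_waves(deps):
    """Group targets into waves; targets in the same wave could build in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        # Every target returned here has all of its dependencies already built.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the graph above, `build_waves(DEPS)` yields three waves: `//lib:log` and `//tools:lint` first (they have no dependencies), then `//lib:http`, then `//app:server`. A task-based system cannot do this safely because it does not know what each task reads or writes.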
Core Idea
bazel reduces the developer’s ability to define the “how” of how a target is built and instead directs them toward specifying the “what” of what needs to be built.
Key Concepts
- artifact-based build system
- sandboxing
- determinism
- reproducibility
- hermeticity
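As a rough illustration of why determinism and hermeticity matter for caching: if an action's output depends only on its command and declared inputs, a digest of those can serve as a cache key. The Python sketch below assumes exactly that. `action_key`, `ActionCache`, and `fake_compile` are hypothetical names, and the key format is made up for illustration rather than taken from bazel.

```python
import hashlib


def action_key(command, inputs):
    """Digest of everything allowed to influence the output.

    `inputs` maps file name -> file contents (bytes).
    """
    h = hashlib.sha256()
    h.update(command.encode())
    for name in sorted(inputs):  # sorted so dict ordering cannot change the key
        h.update(b"\0" + name.encode() + b"\0")
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


class ActionCache:
    """In-memory stand-in for a shared cache keyed by action digest."""

    def __init__(self):
        self._store = {}
        self.executions = 0

    def run(self, command, inputs, action):
        key = action_key(command, inputs)
        if key not in self._store:
            # Cache miss: actually execute the action and remember its output.
            self.executions += 1
            self._store[key] = action(inputs)
        return self._store[key]


def fake_compile(inputs):
    # Stand-in for a compiler: a pure function of its declared inputs.
    return b"".join(inputs[name] for name in sorted(inputs))
```

Running the same command on the same inputs twice executes `fake_compile` once; changing a file's contents or the command produces a new key and a new execution. If the action read something undeclared (an environment variable, the clock), the cached output could be wrong, which is what sandboxing and hermeticity are meant to rule out.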
Tooling
Gazelle
Gazelle is a BUILD file generator. It analyzes a project's source files, infers each target's dependencies from their imports, and writes or updates BUILD files so that those dependencies are declared for bazel to find and use when building a target.
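The general idea behind a BUILD file generator can be sketched in a few lines of Python: scan a source file's imports, map them to labels, and render a rule. This is only an illustration of the technique, not Gazelle's implementation. The helper names, the module-to-label mapping, and the labels themselves are all hypothetical.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level modules imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            found.add(node.module.split(".")[0])
    return sorted(found)


def render_rule(name, src, source, label_for):
    """Render a py_library-style rule; `label_for` maps module name -> label."""
    deps = [label_for[m] for m in imported_modules(source) if m in label_for]
    lines = ["py_library(", f'    name = "{name}",', f'    srcs = ["{src}"],']
    if deps:
        lines.append("    deps = [")
        lines += [f'        "{d}",' for d in deps]
        lines.append("    ],")
    lines.append(")")
    return "\n".join(lines)
```

Modules without an entry in `label_for` (for example, standard library modules) are simply left out of `deps`. A real generator also has to merge its output with hand-written BUILD content.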