Software Development Reference
Kip Landergren
My cheat sheet for software development covering basic understanding, terminology, and workflows.
Contents
- Hash Functions
- Booleans
- Numeral Systems
- Powers of 2
- Powers of 10
- Units
- Bit Shifting
- Encoding
- Permission Bits
- Software Development Terminology
- Git Workflows
- Fork and Pull / Integration Manager / Integrator
- Centralized / Shared Repository
- Others
- Versioning
- Software Versioning Terminology
- Code Review
- Networking
- Data Structures
- Computational Complexity
- Algorithms
- FAQ
Hash Functions
h: 2^n → 2^k
Generally:
- roughly 2^(k/2) operations before a collision is expected, due to the birthday paradox
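The birthday bound can be observed on a deliberately weakened hash. The sketch below truncates SHA-1 to k = 16 bits (the truncation and the helper names are assumptions made for illustration, not part of any standard) and counts how many inputs are hashed before two of them share a digest. The count typically lands near 2^(16/2) = 256 rather than near 2^16.

```python
import hashlib

def truncated_hash(data: bytes, k_bits: int) -> int:
    """Return the top k_bits of the SHA-1 digest of data as an integer."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - k_bits)

def find_collision(k_bits: int):
    """Hash "0", "1", "2", ... until two inputs share a truncated digest.

    Returns (first_input, second_input, inputs_hashed).
    """
    seen = {}  # truncated digest -> input that produced it
    i = 0
    while True:
        h = truncated_hash(str(i).encode(), k_bits)
        if h in seen:
            return seen[h], i, i + 1
        seen[h] = i
        i += 1

a, b, tries = find_collision(16)
print(f"inputs {a} and {b} collide after {tries} hashes (2^(16/2) = {2 ** 8})")
```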
Non-exhaustive list of desirable properties, depending on the application:
- avalanche effect - where output changes significantly relative to input
- randomness - can we predict output?
- uniformity - for every possible value are there the same number of inputs?
- one-way function - easy to compute, hard to invert
Applications:
- verification:
- file / data integrity
- password
- digital signature
- data identifier
Cryptographic Hash Functions
SHA-1
- fixed 160-bit (20 byte, 40 hex digits) message digest
- demonstrates good avalanche effect, where output changes significantly relative to input
- deprecated for security-sensitive use; a practical collision (SHAttered) was published in 2017
$ echo -n 'hello world' | sha1sum
2aae6c35c94fcfb415dbe95f408b9ce91ee846ed -
$ echo -n 'hello world!' | sha1sum
430ce34d020724ed75a196dfc2ad67c77772d169 -
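The same digests can be reproduced with Python's standard `hashlib` module. As a rough illustration of the avalanche effect, the sketch below also counts how many of the 160 output bits differ between the two inputs; `sha1_hex` and `differing_bits` are helper names made up for this example.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """SHA-1 digest of text (UTF-8 encoded) as 40 hex digits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions at which two equal-length hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

d1 = sha1_hex("hello world")
d2 = sha1_hex("hello world!")
print(d1)
print(d2)
print(differing_bits(d1, d2), "of 160 bits differ")
```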
Booleans
0 | false |
1 | true |
Beware! Unix command exit codes are:
0 | success |
non-zero (1-255) | error |
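The two conventions are easy to mix up when a script inspects a process's exit status. A small sketch, where `succeeded` is a hypothetical helper: in Python a zero integer is falsy, yet a zero exit code (for example the `returncode` of a finished `subprocess.run` call) means success.

```python
def succeeded(exit_code: int) -> bool:
    """Unix convention: exit code 0 means success, anything else is an error."""
    return exit_code == 0

# Boolean convention: 0 is false, 1 is true.
print(bool(0), bool(1))            # False True

# The exit-code convention runs the opposite way.
print(succeeded(0), succeeded(1))  # True False
```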
Numeral Systems
Positional Notation
ijk_radix ⇒ (i × radix^2) + (j × radix^1) + (k × radix^0)
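Positional notation translates directly into code. A minimal sketch, with `from_digits` as an illustrative name: each digit is multiplied by the radix raised to its position, counting from 0 at the rightmost digit.

```python
def from_digits(digits, radix):
    """Evaluate a list of digit values (most significant first) in the given radix."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if not 0 <= digit < radix:
            raise ValueError(f"digit {digit} is not valid in base {radix}")
        total += digit * radix ** position
    return total

print(from_digits([1, 0, 1], 2))   # 5
print(from_digits([1, 0, 1], 8))   # 65
print(from_digits([1, 0, 1], 16))  # 257
```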
Binary - Base 2
bit | contraction of “binary digit” |
101_2 ⇒ (1 × 2^2) + (0 × 2^1) + (1 × 2^0)
0000 | 0 |
1111 | 15 |
Octal - Base 8
101_8 ⇒ (1 × 8^2) + (0 × 8^1) + (1 × 8^0)
- 3-bit encoding
Hexadecimal - Base 16
101_16 ⇒ (1 × 16^2) + (0 × 16^1) + (1 × 16^0)
- 4-bit encoding
- capitalization is by convention; it does not matter
- considered “binary shorthand” because of the ease in exchanging 4 bits for 1 hex digit
- “hexa” is Greek for “six”, “decimal” is derived from Latin; the etymology is confusing
Hex value | Decimal Value |
---|---|
00 | 0 |
FF | 255 |
Conversions
Decimal | Binary | Octal | Hexadecimal |
---|---|---|---|
0 | 0000 | 0 | 0 |
1 | 0001 | 1 | 1 |
2 | 0010 | 2 | 2 |
3 | 0011 | 3 | 3 |
4 | 0100 | 4 | 4 |
5 | 0101 | 5 | 5 |
6 | 0110 | 6 | 6 |
7 | 0111 | 7 | 7 |
8 | 1000 | 10 | 8 |
9 | 1001 | 11 | 9 |
10 | 1010 | 12 | A |
11 | 1011 | 13 | B |
12 | 1100 | 14 | C |
13 | 1101 | 15 | D |
14 | 1110 | 16 | E |
15 | 1111 | 17 | F |
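Python's built-ins cover these conversions. The sketch below regenerates a few rows of the table and parses strings back into integers; `table_row` is an illustrative helper, and the 4-digit zero padding mirrors the table's binary column.

```python
def table_row(n: int) -> str:
    """Format n as decimal | binary | octal | hexadecimal."""
    return f"{n} | {n:04b} | {n:o} | {n:X}"

for n in (0, 8, 10, 15):
    print(table_row(n))

# Parsing goes the other way: int(text, base).
print(int("1111", 2), int("17", 8), int("F", 16))  # 15 15 15

# One hex digit always corresponds to exactly 4 bits.
print(f"{0xA5:08b}")  # 10100101
```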
Powers of 2
2^0 | 1 |
2^1 | 2 |
2^2 | 4 |
2^3 | 8 |
2^4 | 16 |
2^5 | 32 |
2^6 | 64 |
2^7 | 128 |
2^8 | 256 |
2^9 | 512 |
2^10 | 1024 |
2^11 | 2048 |
2^12 | 4096 |
2^16 | 65,536 |
2^32 | 4,294,967,296 |
2^64 | 18,446,744,073,709,551,616 |
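These values fall out of a single shift, since 2^n is a 1 followed by n zero bits. A short sketch (the helper names are illustrative): n bits can represent 2^n distinct values, so the largest unsigned value is 2^n - 1.

```python
def power_of_two(n: int) -> int:
    """2^n computed with a left shift."""
    return 1 << n

def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1

print(power_of_two(10))   # 1024
print(power_of_two(16))   # 65536
print(max_unsigned(8))    # 255
print(max_unsigned(32))   # 4294967295
```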
Powers of 10
10^0 | 1 |
10^1 | 10 |
10^2 | 100 |
10^3 | 1000 |
10^4 | 10,000 |
10^5 | 100,000 |
10^6 | 1,000,000 |
10^7 | 10,000,000 |
10^8 | 100,000,000 |
10^9 | 1,000,000,000 |
10^10 | 10,000,000,000 |
Units
byte | 8 bits (most common) |
nibble | 4 bits; a play on words in that “a nibble is a portion of a bite (byte)” |
word | the natural unit of data for a processor |
Why 8 bits to a byte?
Convenience, given that 8 is 2^3, and history: 7-bit ASCII fits in a byte with one bit left over.
Bit Shifting
For int-like types, left bit shifting by n multiplies the value by 2^n:
10 << 1 # 10 * 2^1 = 20
10 << 2 # 10 * 2^2 = 40
10 << 3 # 10 * 2^3 = 80
10 << 4 # 10 * 2^4 = 160
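The lines above behave the same way in Python, with two caveats worth a sketch: Python integers never overflow, so a fixed-width result has to be simulated with a mask, and a right shift by n is a floor division by 2^n. The 8-bit width and the `shl8` helper are arbitrary choices for illustration.

```python
def shl8(value: int, n: int) -> int:
    """Left shift as an 8-bit unsigned register would see it (high bits are lost)."""
    return (value << n) & 0xFF

print(10 << 4)      # 160, same as 10 * 2**4
print(160 >> 4)     # 10, same as 160 // 2**4
print(shl8(10, 4))  # 160, still fits in 8 bits
print(shl8(10, 5))  # 64, because 320 does not fit and wraps modulo 256
print(-7 >> 1)      # -4, right shift rounds toward negative infinity
```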
Encoding
ASCII
- 7 bits wide, 8th can be used as parity bit
- American Standard Code for Information Interchange
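As a rough sketch of how the 8th bit can carry parity, the function below packs a 7-bit ASCII code with an even-parity bit in the most significant position. Even parity and the bit placement are assumptions for the example; real links may use odd parity or other layouts.

```python
def with_even_parity(char: str) -> int:
    """Return the 8-bit value of an ASCII character with an even-parity top bit."""
    code = ord(char)
    if code > 0x7F:
        raise ValueError(f"{char!r} is not a 7-bit ASCII character")
    ones = bin(code).count("1")
    parity = ones % 2  # 1 when the 7 data bits hold an odd number of ones
    return (parity << 7) | code

print(ord("A"), f"{ord('A'):07b}")     # 65 1000001
print(f"{with_even_parity('A'):08b}")  # 01000001 (two ones already, parity 0)
print(f"{with_even_parity('C'):08b}")  # 11000011 (three ones, parity 1)
```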
Permission Bits
rwx | 7 | read, write, and execute |
rw- | 6 | read and write |
r-x | 5 | read and execute |
r-- | 4 | read |
-wx | 3 | write and execute |
-w- | 2 | write |
--x | 1 | execute |
--- | 0 | no permissions |
permission | octal | field |
---|---|---|
rwx------ | 0700 | user |
---rwx--- | 0070 | group |
------rwx | 0007 | other |
Selected common permissions:
0777 | rwxrwxrwx | everyone rwx |
0755 | rwxr-xr-x | owner rwx, everyone else r-x |
0655 | rw-r-xr-x | owner rw-, everyone else r-x |
0644 | rw-r--r-- | owner rw-, everyone else r-- |
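The octal digits map to rwx triplets bit by bit (read = 4, write = 2, execute = 1). A minimal sketch of that mapping, with `perm_string` as an illustrative helper; the standard library's `stat.filemode` renders the same triplets, prefixed by a file-type character.

```python
import stat

def perm_string(mode: int) -> str:
    """Render the low 9 permission bits of mode as e.g. 'rwxr-xr-x'."""
    out = []
    for shift in (6, 3, 0):  # user, group, other
        triplet = (mode >> shift) & 0o7
        out.append("r" if triplet & 4 else "-")
        out.append("w" if triplet & 2 else "-")
        out.append("x" if triplet & 1 else "-")
    return "".join(out)

print(perm_string(0o755))                   # rwxr-xr-x
print(perm_string(0o644))                   # rw-r--r--
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
```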
Directory permissions have special meanings:
- read: user can read filenames
- write: user can add, rename, and delete files, provided execute is also set
- execute: user can enter directory and access files
This StackOverflow answer goes into more detail.
Software Development Terminology
- abstraction
- the separation of the use from the implementation
- angle brackets
- < >
- backtick (diacritical mark term: grave accent)
- `
- braces (AKA “curly braces”, “curly brackets”, “moustache brackets”)
- { }
- brackets (AKA “square brackets”)
- [ ]
- caret
- the text cursor
- caret (diacritical mark term: circumflex)
- ^
- declarative programming
- code style that expresses logic without specifying control flow
- dynamic programming language
- a programming language that defers certain programming behaviors until runtime; not to be confused with a dynamically typed language
- functional programming
- programs constructed by the application and composition of functions
- imperative programming
- code style that expresses desired changes in program state
- paradigm
- the classification of a programming language according to its features; e.g. “functional”
- parens
- ( )
- peek
- return value at location; could be by index (e.g. an array) or by operation (e.g. via `top` on a stack)
- robustness
- the likelihood that small changes in a specification will require correspondingly small changes in the program
- runtime configuration / feature flags
- the ability to modify settings and behavior for live traffic and usage
- static programming language
- a programming language that performs certain programming behaviors at compilation time
- stream
- a sequence of data objects
- telemetry
- data about a running system, such as timing data, collected automatically for monitoring
- transpiler
- a source-to-source converter; a type of compiler
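A few of these terms are easier to see side by side. The sketch below is illustrative only: it sums the squares of the even numbers in an imperative style (explicit state changes) and in a functional style (composition of functions), then shows a peek on a list used as a stack.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Imperative: spell out the control flow and mutate an accumulator.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n

# Functional: compose filter, map, and reduce without reassigning state.
functional_total = reduce(
    lambda acc, x: acc + x,
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
    0,
)

print(total, functional_total)  # 56 56

# Peek: read the top of a stack without removing it.
stack = [10, 20, 30]
top = stack[-1]
print(top, len(stack))          # 30 3
```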
Git Workflows
Fork and Pull / Integration Manager / Integrator
Contributors do not (necessarily) have commit access to a repository and instead fork it to create their own copy, work within that forked version, and communicate to the original repo’s maintainer (possibly through pull requests) that their forked repository’s branch has contributions ready for pulling and merging into the original repository.
Centralized / Shared Repository
Multiple contributors have commit access to a repository and work on clones of it.
git-flow
Useful for:
- aggregating multiple features into a single, testable, isolated release branch
- maintaining specific release versions
- unambiguous process steps
Consider alternatives if:
- `git rebase` is preferred
- a lightweight workflow is needed
Authoritative:
- A Successful git Branching Model by Vincent Driessen (nvie)
- nvie/gitflow GitHub repo with git extensions implementing the branching model
Perspectives:
- Why aren't you using git-flow? | Hacker News (2010 August)
- GitFlow considered harmful | Hacker News (2015 June)
- That “git-flow” (2016 February)
- What is wrong with “A successful Git branching model”? | Hacker News (2016 February)
- Please stop recommending Git Flow! | Hacker News (2020 March)
GitHub Flow
Useful for:
- continuous deployment to production, like with a web app
- code review through GitHub’s Pull Requests
Consider alternatives if:
- specific releases must be maintained
Authoritative:
Perspectives:
- Git email flow vs Github flow | Hacker News (2021 March)
Others
- Git workflows - The Linux Kernel Archives
- GitLab Flow
- Comparing Workflows
- Git Feature Branch Workflow
- Trunk Based Development
- a simple git branching model (written in 2013)
- Patterns for Managing Source Code Branches
- Fossil vs Git | Hacker News
Versioning
For Libraries
Recognize:
- library versioning is about compatibility first, and new functionality second
- the main consumers are other software maintainers and package managers
Preferred versioning policy:
- use Semantic Versioning
- if you want to break backwards compatibility (i.e. increment the `MAJOR` version), create a new, differently named library instead
- avoid an automated determination and change of version; a human should be in the loop
Strive for:
- a publicly declared versioning scheme
- a publicly declared compatibility guarantee
- a unit test suite determining version compatibility
- a detailed `CHANGELOG`
For Applications
Recognize:
- application versioning may have many varying consumers:
- end users
- the organization that creates and maintains the application, which itself may be split into:
- engineering interests
- product interests
- business interests
- marketing interests
- entities—which may be software themselves—that manage the application on behalf of the end users (e.g. an app store or an operating system)
Strive for:
- a context-appropriate scheme, like:
- `MAJOR.MINOR`, where a `MAJOR` increment indicates a major product change and a `MINOR` increment indicates a minor product change or bugfix
- Calendar Versioning, for when releases have a dependency or relationship to time
- channel-based releases, like `ALPHA` or `BETA`
- named releases
- target-based versioning, like for operating systems (e.g. `macOS-10.15`) or other software
Best Practices
Recognize:
- Hyrum’s Law, which states: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”
- the human consumer:
- comparing two integers for which is larger is straightforward
- “Spring 2021” means different things to a United States vs. an Australian audience
- some may view the next logical sequence member of `2.8`, `2.9`, `??` to be `3.0`, not `2.10`
Strive for:
- a versioning policy reflective of use and context
- making a human ultimately responsible for the version change decision; being informed by automated processes, like passing a test suite, is encouraged
- committing version metadata to source code
- automatically generating a build identifier using SCM software (e.g. `git describe --first-parent` or `git describe --tags`) for easy reference
Consider:
Schemes
Semantic Versioning (SemVer) | `MAJOR.MINOR.PATCH`, where a MAJOR increment breaks backwards compatibility, a MINOR increment adds backwards-compatible functionality, and a PATCH increment makes backwards-compatible bugfixes |
Semantic Import Versioning | Go’s approach of encouraging libraries to follow SemVer with the added requirement that the import path must be file-local to its use and scoped to the major version |
Calendar Versioning (CalVer) | versioning convention based on project release calendar; no hard and fast rules |
Serial Versioning | each version is an increment of the previous |
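Comparing versions as text gets the ordering wrong once a component reaches two digits, which is why schemes such as SemVer are compared component by component. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release or build metadata; `parse_version` and `is_breaking_upgrade` are hypothetical helpers, not part of any SemVer library.

```python
def parse_version(text: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints for numeric comparison."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(p) for p in parts)

def is_breaking_upgrade(old: str, new: str) -> bool:
    """True when the MAJOR component increases between old and new."""
    return parse_version(new)[0] > parse_version(old)[0]

print("2.10.0" > "2.9.0")                                # False (string comparison)
print(parse_version("2.10.0") > parse_version("2.9.0"))  # True (numeric comparison)
print(is_breaking_upgrade("2.9.0", "3.0.0"))             # True
```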
Software Versioning Terminology
- bug-for-bug compatibility
- preserving aberrant behavior in new implementations to not disrupt compatibility with consumers
- dependency hell
- the frustration from problems involved in managing software dependencies; Dependency hell Wikipedia article
- diamond dependency
- a problem where two dependencies have sub-dependency version conflicts
- evergreen software
- constantly updated software, possibly without user control
- transitive dependency
- a sub-dependency not directly referenced or used by the grandparent
- upgrade cliff
- releasing beta or early access APIs to versions so that libraries can support multiple versions
Additional Resources
- Spec-ulation Keynote - Rich Hickey
- The Principles of Versioning in Go
- CppCon 2017: Titus Winters “C++ as a "Live at Head" Language”
Code Review
Gerrit
Resources
- Gerrit Code Review
- openstack/nova code review
- OpenDev Developer’s Guide using a gerrit-based development and review workflow
- Abandoning Gitflow and GitHub in favour of Gerrit | Hacker News (2016)
Networking
OSI Model
Layer 7 | L7 | Application Layer |
Layer 6 | L6 | Presentation Layer |
Layer 5 | L5 | Session Layer |
Layer 4 | L4 | Transport Layer |
Layer 3 | L3 | Network Layer |
Layer 2 | L2 | Data Link Layer |
Layer 1 | L1 | Physical Layer |
Data Structures
Array
For dynamic (growable) arrays:
access | O(1) |
mutate (at beginning) | O(n) |
mutate (at middle) | O(n) |
mutate (at end) | O(1) amortized |
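The O(1) cost at the end is amortized: an append occasionally triggers a reallocation that copies every element. A rough sketch of a growable array, assuming a doubling growth policy (real implementations, including CPython's list, use different growth factors); the `DynamicArray` class and its `copies` counter exist only to make the cost visible.

```python
class DynamicArray:
    """A growable array backed by a fixed-size list that doubles when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity
        self.copies = 0  # elements moved during reallocations

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[index]  # O(1) access

    def append(self, value):
        if self._size == self._capacity:  # full: reallocate at double capacity
            self._capacity *= 2
            new_slots = [None] * self._capacity
            for i in range(self._size):
                new_slots[i] = self._slots[i]
                self.copies += 1
            self._slots = new_slots
        self._slots[self._size] = value
        self._size += 1

    def insert_front(self, value):
        self.append(value)  # make room at the end
        for i in range(self._size - 1, 0, -1):
            self._slots[i] = self._slots[i - 1]  # shift everything right: O(n)
        self._slots[0] = value

arr = DynamicArray()
for n in range(1000):
    arr.append(n)
print(len(arr), arr[999], arr.copies)  # copies stays below 2 * 1000
```

Each reallocation copies as many elements as were appended since the previous one, so the total copy work stays proportional to the number of appends. Inserting at the front has no such shortcut because every existing element must move.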
Computational Complexity
Algorithms
Graph Traversal
FAQ
How to handle validating state?
Scenario:
An object exists, and its state is valid. The object is modified; is its state still valid?
The questions arise:
- should proposed state changes be validated in the modification method?
- if they are validated in the modification method, how could we really know if the object would have been invalid, were it given a chance to assume the new state?
- if they are validated after modification, why is the object allowed to exist in an invalid state?
- how can the logic for what constitutes validity be reused in all appropriate cases?
- how should atomic updates be handled, where multiple components of state are modified at once and updating them individually would pass through an invalid state?
This leads to an alternative path:
Create an immutable object which validates its state on construction, failing to create the object if invalid.
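One way to sketch this in Python is a frozen dataclass that validates in `__post_init__`. The `Interval` type and its invariant are invented for illustration. Because instances cannot be mutated, a multi-field change is expressed by constructing a new object, so the invariant is checked against the complete new state at once.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Interval:
    """An immutable interval whose invariant (start <= end) is checked on construction."""
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError(f"start {self.start} is after end {self.end}")

valid = Interval(start=1, end=5)

# Both fields change atomically: replace() builds and validates a new object.
shifted = replace(valid, start=10, end=20)
print(shifted)

try:
    replace(valid, start=9)  # would leave start > end
except ValueError as err:
    print("rejected:", err)

try:
    valid.start = 9          # in-place mutation is not allowed
except FrozenInstanceError:
    print("cannot mutate a frozen instance")
```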
What does the “rc” of files like .bashrc mean?
The “rc” suffix stands for “run commands”.