Structured Query Language (SQL) Knowledge

Kip Landergren

February 05, 2021 (Updated: February 23, 2025)

My Structured Query Language (SQL) knowledge base explaining an overview and some statements.

Overview
Core Idea
Key Concepts
SQL Terminology
Statements
- select
- where
- from
- join
- aggregations
SQL Advice and Performance Tips

Overview

Structured Query Language (SQL) is a declarative—meaning you state what you want, not how to get it—language for querying and manipulating data from a database. SQL operators are compositional around relations, meaning that they take relations as input and return relations as outputs. This allows for expressivity when creating queries.

Core Idea

Use declarative statements to perform operations on relations that will return relations. Understand that there will be many equivalent statements, and the performance of each can be examined by working with the query planner.

Key Concepts

relational algebra
set theory
logic theory

SQL Terminology

FROM: specifies the relation(s) on which the select projects
GROUP BY: allows partitioning relations into groups by values of a given attribute or set of attributes; when used in conjuncture with aggregations like min or max, those aggregations will be computed independently
HAVING: allows apply conditions to the results of the aggregate values; applies to the group generated by the group by (whereas the where applies to single rows at a time)
JOIN: returns a relation generated from combinations of tuples based on the provided conditions
SELECT: specifies the attributes to be projected from a relation and returned
WHERE: applies a condition to individual tuples from the specified relation(s)
aggregate function: reduces multiple inputs to a single output value; computes a single result from multiple input rows
alias: a temporary name assigned to a table or column
column reference: correlation.columnname
dangling tuple: tuple from a join that does not have a match
delimited identifier / quoted identifier: a sequence of characters enclosed by double quotes and treated as case sensitive; is always considered an identifier and never a key word
execution plan: the actual sequence of steps necessary to execute the SQL statement
identifier: tokens that name the tables, columns, or other database objects within a command
key word: tokens that have a fixed meaning in SQL (e.g. SELECT, UPDATE, or VALUES)
query planner: component of DBMS that determines sequence of steps necessary to execute the SQL statement
scalar: the result of a value expression
schema: contains tables and other named objects (data types, functions, and operators). schemas are analogous to directories at the operating system level (but cannot be nested).
statement: the SQL being executed by the DBMS
token: can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol

Statements

select

A select statement specifies a new relation to be returned. Think of select statement as having three parts:

from, specifying the relations
where, specfiying how to combine and filter
select, specifying what attributes to return

Note: a select statement in SQL is not the same as the select operator in relational algebra.

Add distinct to treat returned relation as a set rather than a multi-set.

SQL is by default unordered. If order matters, add a clause.

table variables - use to make queries more readable, and to disambiguate relations in the where clause.

Operator <> for not equal.

Set operator union example:

select cName from Collect
union
select sName from Student;

Set operator intersect may not be available; can emulate via selecting from the same relation multiple times.

Set operator except is the difference operator.

Subqueries in select produce a value that comes out of the query. As long as it returns one value that value is used in the result tuple. This does not mean one value for all tuples, but a single value returned for the executed subquery.

where

Subqueries in where clause allow for combining and filtering relations to then select from. Note that while some subqueries can be rewritten using a join, that may not always be the case depending on what data is desired (pay attention to duplicates!).

Subqueries can be modified with keywords in and not in to further refine results.

The keyword exists (and not exists) allows for testing of condition before selection of relation.

The keyword all allows for testing condition against all members of a relation.

The keyword any allows for testing condition against at least one member of a relation.

from

Relations listed in the from clause are implicit cross products. It is possible to have an explicit join if needed.

A subquery in the from statement generates a new relation that we can then use in the rest of the query. These new relations often have to be marked with a new identifier.

The use of , means take the cross product. e.g.:

select * from Foo, Bar;

join

Kinds:

(inner) join on a condition (e.g. “theta join” from relational algebra); this is the default in SQL
(natural) join which equates columns across tables of the same name
inner join via using(attrs) which is like natural join but with specifying the attributes you want to be equated
outer join, which comes in flavors:
- left
- right
- full which are similar to the theta join except for when tuples do not match the condition they are padded with NULL values.

So, diving in on a left outer join, this would include tuples from the left relation of the join that do not match the condition, padded with NULL values.

A right outer join does the same thing, but for the right relation of the join.

A full outer join takes unmatched tuples from both the left and right relations of the join (aka the union).

An outer join is not associative; order matters.

aggregations

Performs sets of values over multiple rows of a relation. Includes:

min
max
sum
avg
count, which allows convenient functions like count(distinct fooId)

allows addition of new clauses:

group by
having

SQL Advice and Performance Tips

Many SQL statements are equivalent. The query processor should convert statements into the most efficient form, but you can provide “hints” by structuring the query with judicious use of conditions within a join statement or within a where statement.

A query processor will also typically follow the order of parentheses specified in the statement.

Prefer:

  from Student join Apply using(sID)

over:

  from Student natural join Apply

due to the clarity and robustness gained by explicitly specifying sID.

Think of group by as “partition by”.

If a relation can contain NULL values, be aware of how operators works:

  select distinct foo from Bar;

may return NULL values, but:

  select count(distinct foo) from Bar;

may not.