Structured Query Language (SQL) Knowledge
Kip Landergren
(Updated: )
My Structured Query Language (SQL) knowledge base explaining an overview and some statements.
Contents
Overview
Structured Query Language (SQL) is a declarative—meaning you state what you want, not how to get it—language for querying and manipulating data from a database. SQL operators are compositional around relations, meaning that they take relations as input and return relations as outputs. This allows for expressivity when creating queries.
Core Idea
Use declarative statements to perform operations on relations that will return relations. Understand that there will be many equivalent statements, and the performance of each can be examined by working with the query planner.
Key Concepts
- relational algebra
- set theory
- logic theory
SQL Terminology
FROM
- specifies the relation(s) on which the select projects
GROUP BY
- allows partitioning relations into groups by values of a given attribute or set of attributes; when used in conjuncture with aggregations like
min
ormax
, those aggregations will be computed independently HAVING
- allows apply conditions to the results of the aggregate values; applies to the group generated by the
group by
(whereas thewhere
applies to single rows at a time) JOIN
- returns a relation generated from combinations of tuples based on the provided conditions
SELECT
- specifies the attributes to be projected from a relation and returned
WHERE
- applies a condition to individual tuples from the specified relation(s)
- aggregate function
- reduces multiple inputs to a single output value; computes a single result from multiple input rows
- column reference
- correlation.columnname
- dangling tuple
- tuple from a join that does not have a match
- delimited identifier / quoted identifier
- a sequence of characters enclosed by double quotes and treated as case sensitive; is always considered an identifier and never a key word
- execution plan
- the actual sequence of steps necessary to execute the SQL statement
- identifier
- tokens that name the tables, columns, or other database objects within a command
- key word
- tokens that have a fixed meaning in SQL (e.g. SELECT, UPDATE, or VALUES)
- query planner
- component of DBMS that determines sequence of steps necessary to execute the SQL statement
- scalar
- the result of a value expression
- schema
- contains tables and other named objects (data types, functions, and operators). schemas are analogous to directories at the operating system level (but cannot be nested).
- statement
- the SQL being executed by the DBMS
- token
- can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol
Statements
select
A select
statement specifies a new relation to be returned. Think of select
statement as having three parts:
from
, specifying the relationswhere
, specfiying how to combine and filterselect
, specifying what attributes to return
Note: a select
statement in SQL is not the same as the select operator in relational algebra.
Add distinct
to treat returned relation as a set rather than a multi-set.
SQL is by default unordered. If order matters, add a clause.
table variables - use to make queries more readable, and to disambiguate relations in the where clause.
Operator <>
for not equal.
Set operator union
example:
select cName from Collect
union
select sName from Student;
Set operator intersect
may not be available; can emulate via selecting from the same relation multiple times.
Set operator except
is the difference operator.
Subqueries in select produce a value that comes out of the query. As long as it returns one value that value is used in the result tuple. This does not mean one value for all tuples, but a single value returned for the executed subquery.
where
Subqueries in where
clause allow for combining and filtering relations to then select
from. Note that while some subqueries can be rewritten using a join
, that may not always be the case depending on what data is desired (pay attention to duplicates!).
Subqueries can be modified with keywords in
and not in
to further refine results.
The keyword exists
(and not exists
) allows for testing of condition before selection of relation.
The keyword all
allows for testing condition against all members of a relation.
The keyword any
allows for testing condition against at least one member of a relation.
from
Relations listed in the from
clause are implicit cross products. It is possible to have an explicit join
if needed.
A subquery in the from
statement generates a new relation that we can then use in the rest of the query. These new relations often have to be marked with a new identifier.
The use of ,
means take the cross product. e.g.:
select * from Foo, Bar;
join
Kinds:
(inner) join
on a condition (e.g. “theta join” from relational algebra); this is the default in SQL- (natural)
join
which equates columns across tables of the same name inner join
viausing(attrs)
which is like natural join but with specifying the attributes you want to be equatedouter join
, which comes in flavors:left
right
full
which are similar to the theta join except for when tuples do not match the condition they are padded withNULL
values.
So, diving in on a left outer join
, this would include tuples from the left relation of the join that do not match the condition, padded with NULL
values.
A right outer join
does the same thing, but for the right relation of the join.
A full outer join
takes unmatched tuples from both the left and right relations of the join (aka the union
).
An outer join
is not associative; order matters.
aggregations
Performs sets of values over multiple rows of a relation. Includes:
min
max
sum
avg
count
, which allows convenient functions likecount(distinct fooId)
allows addition of new clauses:
group by
having
SQL Advice and Performance Tips
Many SQL statements are equivalent. The query processor should convert statements into the most efficient form, but you can provide “hints” by structuring the query with judicious use of conditions within a join
statement or within a where
statement.
A query processor will also typically follow the order of parentheses specified in the statement.
Prefer:
from Student join Apply using(sID)
over:
from Student natural join Apply
due to the clarity and robustness gained by explicitly specifying sID
.
Think of group by
as “partition by”.
If a relation can contain NULL
values, be aware of how operators works:
select distinct foo from Bar;
may return NULL
values, but:
select count(distinct foo) from Bar;
may not.