Structured Query Language (SQL) Knowledge
Kip Landergren
(Updated: )
My Structured Query Language (SQL) knowledge base explaining an overview and some statements.
Contents
Overview
Structured Query Language (SQL) is a declarative—meaning you state what you want, not how to get it—language for querying and manipulating data from a database. SQL operators are compositional around relations, meaning that they take relations as input and return relations as outputs. This allows for expressivity when creating queries.
Core Idea
Use declarative statements to perform operations on relations that will return relations. Understand that there will be many equivalent statements, and the performance of each can be examined by working with the query planner.
Key Concepts
- relational algebra
- set theory
- logic theory
SQL Terminology
FROM- specifies the relation(s) on which the select projects
GROUP BY- allows partitioning relations into groups by values of a given attribute or set of attributes; when used in conjuncture with aggregations like
minormax, those aggregations will be computed independently HAVING- allows apply conditions to the results of the aggregate values; applies to the group generated by the
group by(whereas thewhereapplies to single rows at a time) JOIN- returns a relation generated from combinations of tuples based on the provided conditions
SELECT- specifies the attributes to be projected from a relation and returned
WHERE- applies a condition to individual tuples from the specified relation(s)
- aggregate function
- reduces multiple inputs to a single output value; computes a single result from multiple input rows
- alias
- a temporary name assigned to a table or column
- column reference
- correlation.columnname
- dangling tuple
- tuple from a join that does not have a match
- delimited identifier / quoted identifier
- a sequence of characters enclosed by double quotes and treated as case sensitive; is always considered an identifier and never a key word
- execution plan
- the actual sequence of steps necessary to execute the SQL statement
- identifier
- tokens that name the tables, columns, or other database objects within a command
- key word
- tokens that have a fixed meaning in SQL (e.g. SELECT, UPDATE, or VALUES)
- query planner
- component of DBMS that determines sequence of steps necessary to execute the SQL statement
- scalar
- the result of a value expression
- schema
- contains tables and other named objects (data types, functions, and operators). schemas are analogous to directories at the operating system level (but cannot be nested).
- statement
- the SQL being executed by the DBMS
- token
- can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol
Statements
select
A select statement specifies a new relation to be returned. Think of select statement as having three parts:
from, specifying the relationswhere, specfiying how to combine and filterselect, specifying what attributes to return
Note: a select statement in SQL is not the same as the select operator in relational algebra.
Add distinct to treat returned relation as a set rather than a multi-set.
SQL is by default unordered. If order matters, add a clause.
table variables - use to make queries more readable, and to disambiguate relations in the where clause.
Operator <> for not equal.
Set operator union example:
select cName from Collect
union
select sName from Student;
Set operator intersect may not be available; can emulate via selecting from the same relation multiple times.
Set operator except is the difference operator.
Subqueries in select produce a value that comes out of the query. As long as it returns one value that value is used in the result tuple. This does not mean one value for all tuples, but a single value returned for the executed subquery.
where
Subqueries in where clause allow for combining and filtering relations to then select from. Note that while some subqueries can be rewritten using a join, that may not always be the case depending on what data is desired (pay attention to duplicates!).
Subqueries can be modified with keywords in and not in to further refine results.
The keyword exists (and not exists) allows for testing of condition before selection of relation.
The keyword all allows for testing condition against all members of a relation.
The keyword any allows for testing condition against at least one member of a relation.
from
Relations listed in the from clause are implicit cross products. It is possible to have an explicit join if needed.
A subquery in the from statement generates a new relation that we can then use in the rest of the query. These new relations often have to be marked with a new identifier.
The use of , means take the cross product. e.g.:
select * from Foo, Bar;
join
Kinds:
(inner) joinon a condition (e.g. “theta join” from relational algebra); this is the default in SQL- (natural)
joinwhich equates columns across tables of the same name inner joinviausing(attrs)which is like natural join but with specifying the attributes you want to be equatedouter join, which comes in flavors:leftrightfullwhich are similar to the theta join except for when tuples do not match the condition they are padded withNULLvalues.
So, diving in on a left outer join, this would include tuples from the left relation of the join that do not match the condition, padded with NULL values.
A right outer join does the same thing, but for the right relation of the join.
A full outer join takes unmatched tuples from both the left and right relations of the join (aka the union).
An outer join is not associative; order matters.
aggregations
Performs sets of values over multiple rows of a relation. Includes:
minmaxsumavgcount, which allows convenient functions likecount(distinct fooId)
allows addition of new clauses:
group byhaving
SQL Advice and Performance Tips
Many SQL statements are equivalent. The query processor should convert statements into the most efficient form, but you can provide “hints” by structuring the query with judicious use of conditions within a join statement or within a where statement.
A query processor will also typically follow the order of parentheses specified in the statement.
Prefer:
from Student join Apply using(sID)
over:
from Student natural join Apply
due to the clarity and robustness gained by explicitly specifying sID.
Think of group by as “partition by”.
If a relation can contain NULL values, be aware of how operators works:
select distinct foo from Bar;
may return NULL values, but:
select count(distinct foo) from Bar;
may not.