Structured Query Language (SQL) Knowledge

Kip Landergren

(Updated: )

My Structured Query Language (SQL) knowledge base explaining an overview and some statements.

Contents

Overview

Structured Query Language (SQL) is a declarative—meaning you state what you want, not how to get it—language for querying and manipulating data from a database. SQL operators are compositional around relations, meaning that they take relations as input and return relations as outputs. This allows for expressivity when creating queries.

Core Idea

Use declarative statements to perform operations on relations that will return relations. Understand that there will be many equivalent statements, and the performance of each can be examined by working with the query planner.

Key Concepts

SQL Terminology

FROM
specifies the relation(s) on which the select projects
GROUP BY
allows partitioning relations into groups by values of a given attribute or set of attributes; when used in conjuncture with aggregations like min or max, those aggregations will be computed independently
HAVING
allows apply conditions to the results of the aggregate values; applies to the group generated by the group by (whereas the where applies to single rows at a time)
JOIN
returns a relation generated from combinations of tuples based on the provided conditions
SELECT
specifies the attributes to be projected from a relation and returned
WHERE
applies a condition to individual tuples from the specified relation(s)
aggregate function
reduces multiple inputs to a single output value; computes a single result from multiple input rows
column reference
correlation.columnname
dangling tuple
tuple from a join that does not have a match
delimited identifier / quoted identifier
a sequence of characters enclosed by double quotes and treated as case sensitive; is always considered an identifier and never a key word
execution plan
the actual sequence of steps necessary to execute the SQL statement
identifier
tokens that name the tables, columns, or other database objects within a command
key word
tokens that have a fixed meaning in SQL (e.g. SELECT, UPDATE, or VALUES)
query planner
component of DBMS that determines sequence of steps necessary to execute the SQL statement
scalar
the result of a value expression
schema
contains tables and other named objects (data types, functions, and operators). schemas are analogous to directories at the operating system level (but cannot be nested).
statement
the SQL being executed by the DBMS
token
can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol

Statements

select

A select statement specifies a new relation to be returned. Think of select statement as having three parts:

Note: a select statement in SQL is not the same as the select operator in relational algebra.

Add distinct to treat returned relation as a set rather than a multi-set.

SQL is by default unordered. If order matters, add a clause.

table variables - use to make queries more readable, and to disambiguate relations in the where clause.

Operator <> for not equal.

Set operator union example:

select cName from Collect
union
select sName from Student;

Set operator intersect may not be available; can emulate via selecting from the same relation multiple times.

Set operator except is the difference operator.

Subqueries in select produce a value that comes out of the query. As long as it returns one value that value is used in the result tuple. This does not mean one value for all tuples, but a single value returned for the executed subquery.

where

Subqueries in where clause allow for combining and filtering relations to then select from. Note that while some subqueries can be rewritten using a join, that may not always be the case depending on what data is desired (pay attention to duplicates!).

Subqueries can be modified with keywords in and not in to further refine results.

The keyword exists (and not exists) allows for testing of condition before selection of relation.

The keyword all allows for testing condition against all members of a relation.

The keyword any allows for testing condition against at least one member of a relation.

from

Relations listed in the from clause are implicit cross products. It is possible to have an explicit join if needed.

A subquery in the from statement generates a new relation that we can then use in the rest of the query. These new relations often have to be marked with a new identifier.

The use of , means take the cross product. e.g.:

select * from Foo, Bar;

join

Kinds:

So, diving in on a left outer join, this would include tuples from the left relation of the join that do not match the condition, padded with NULL values.

A right outer join does the same thing, but for the right relation of the join.

A full outer join takes unmatched tuples from both the left and right relations of the join (aka the union).

An outer join is not associative; order matters.

aggregations

Performs sets of values over multiple rows of a relation. Includes:

allows addition of new clauses:

SQL Advice and Performance Tips

Many SQL statements are equivalent. The query processor should convert statements into the most efficient form, but you can provide “hints” by structuring the query with judicious use of conditions within a join statement or within a where statement.

A query processor will also typically follow the order of parentheses specified in the statement.

Prefer:

  from Student join Apply using(sID)

over:

  from Student natural join Apply

due to the clarity and robustness gained by explicitly specifying sID.

Think of group by as “partition by”.

If a relation can contain NULL values, be aware of how operators works:

  select distinct foo from Bar;

may return NULL values, but:

  select count(distinct foo) from Bar;

may not.