SQL: Pick up the Basic within a day
Overview
This blog is for people who have learnt SQL at some points of their study (just like me): We can quickly recap on various important concepts in SQL.
Basic Concepts
Primary key:
- Always unique for each row of the table, it must be NOT NULL (automatically set when
PRIMARY KEY
is specified) - Helps to identify each row even when row attributes are the same
- A table must have and only have 1 primary key
- Types:
- Surrogate key : An artificial key that has no mapping to anything (or business value) in real world.
- Natural key: A key that has mapping to real world thing: example: social security number/ NRIC / Passport number
- Always unique for each row of the table, it must be NOT NULL (automatically set when
composiite key: 2 column entries combined to form a key
- Motivation: sometimes individuals of 2 entries cannot uniquely identify a row;
Foreign key:
- Stores the primary key of a row in another database table
- The foreign key's column name NOT necessary to coincide with the foreign table's primary key column name
- A table can have more than 1 foreign key (or no foreign key at all)
advance concept:
Q: is it possible that TABLE A's foreign key is TABLE B's primary key and TABLE B's foreign key is TABLE A's primary key?
A: Yes! cyclic dependency is valid in SQL.
Example:
employee
'semp_id
isdepartment
'smanager_id
;department
'sbranch_id
isemployee
'sdepartment_id
.Q: is it possible that TABLE A's foreign key relates to itself?
A: Yes! used to define relationships between rows within a table.
Example:
employee
'ssuper_id
refers to a row inemployee
's table.
Data Types:
- INT: – Whole number
- DECIMAL(M,N): – Decimal numbers - exact value, M digits, N after decimal point
- VARCHAR(K): – Sring of text of length K
- BLOB: – Binary Large Object, stores large data
- DATE: – 'YYYY-MM-DD'
- TIMESTAMP – 'YYYY-MM-DD HH:MM:SS'
Difference between DROP and DELETE:
DELETE DROP Data Manipulation Language command Data Definition Language Command To remove tuples from a table To remove entire schema, table, domain or constraints from the database
Basic Operations
Create Database
1
2
3SHOW DATABASES;
CREATE DATABASE July_05;
USE July_05;Logical Query Processing (IMPT)
- Step 1. FROM (includes JOINS)
- Step 2. WHERE
- Step 3. GROUP BY
- Step 4. HAVING
- Step 5. SELECT
- Step 6. ORDER BY
CAUTION about column ordering: columns evaluated at later steps must be created in earlier steps
- Step 1. FROM (includes JOINS)
Table Opeartions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15CREATE TABLE student (
student_id INT PRIMARY KEY,
student_name VARCHAR(50),
major VARCHAR(20),
thr_id INT,
-- can also remove the PRIMARY KEY above and add a line below
-- PRIMARY KEY(student_id)
FOREIGN KEY(thr_id) REFERENCES teacher(emp_id) ON DELETE SET NULL
);
-- Show all columns properties using the DESCRIBE keyword
DESCRIBE student;
-- Name Null Type
-- student_id INT
-- name VARCHAR(50)
-- major VARCHAR(20)Delete or modify a table
1
2
3
4
5
6DROP TABLE student;
ALTER TABLE student ADD gpa DECIMAL(3,2);
ALTER TABLE student DROP COLUMN gpa;
ALTER TABLE student MODIFY COLUMN major TINYINT(1) UNSIGNED;
ALTER TABLE student ADD CONSTRAINT pk_id PRIMARY KEY (student_id);
ALTER TABLE student ADD CONSTRAINT fk_id FOREIGN KEY (thr_id) REFERENCES teacher(emp_id) ON DELETE SET NULL;Row Insertion
1
2
3-- Two ways of insertion
INSERT INTO student VALUES(2, 'Kate', 'Sociology');
INSERT INTO student(student_id, name) VALUES(3, 'Claire');More properties of column
1
2
3
4
5
6
7
8
9CREATE TABLE student (
student_id INT AUTO_INCREMENT, -- id increase automaically if not specified
student_id2 INT IDENTITY(1, 1) -- similar to AUTO_INCREMENT except
-- IDENTITY(seed, increment) enables one to self define the starting value (seed) and the increment amount (increment)
name VARCHAR(50) NOT NULL, -- name value cannot be empty
major VARCHAR(20) UNIQUE, -- each row's major value must be unique across the table
info VARCHAR(10) DEFAULT 'undecided', --info has 'undecided' as default value
PRIMARY KEY(student_id)
);Update the table
1
2
3
4
5
6
7-- Modify the content
UPDATE student
SET major = 'Biochemistry', name = 'What'
WHERE major = 'Biology' or major = 'Chemistry'; -- if no WHERE is applied, the set applies to all
-- Delete entries
DELETE FROM student
WHERE student_id = 5;SELECT keyword
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26-- partial selection
SELECT student.name, student.major
FROM student
ORDER BY major, name DESC; -- by default ascending order, DESC change to descending
-- or both major and name descending by
SELECT name, major
FROM student
ORDER BY major DESC, student_id DESC
- other optional selection technique
FROM student
...
WHERE major = 'chemistry' OR major = 'Bio';
...
WHERE name IN ('kate', 'Claire', 'Jack'); -- the use of IN keyword
...
WHERE birth_day BETWEEN '1970-01-01' AND '1975-01-01';
...
WHERE (birth_day >= '1970-01-01' AND sex = 'F') OR salary > 80000;
...
LIMIT 2 OFFSET 1;
...
SELECT TOP(100) -- select the 100 rows in the front
SELECT ... INTO samples -- select those columns into the "sample" tablecomparison keyword
1
<, > , <=, >=, =, <> (means not equal to), AND, OR, ANY, ALL
Functions to call
1
2
3
4
5
6SELECT COUNT(sex), sex
SELECT AVG(salary)
SELECT SUM(salary)
FROM employee
WHERE sex = 'F' AND birth_date > '1971-01-01';
GROUP BY sex;Wildcard
1 | -- It is often used to find the string containing certain characters; |
- UNION
- Motivation: row combine (fixed columns)
- Used to combine the multiple select statement into 1;
- Vertical join (add rows of the latter
SELECT
below the rows of formerSELECT
) - Warning: each entry within the same column must have the same data-type
1
2
3
4
5
6
7SELECT client.client_name AS Non_Employee_Entities, client.branch_id AS Branch_ID
-- here the renaming using AS is very important to make the unioned row's column more logical
-- e.g the client.branch_id and branch_supplier.branch_id unioned to be Branch_ID and branch_id separately in the table returned
FROM client
UNION
SELECT branch_supplier.supplier_name, branch_supplier.branch_id
FROM branch_supplier;1
2
3
4
5
6
7
8
9
10
11
12
13SELECT * FROM
(
(SELECT CITY, LENGTH(CITY)
FROM STATION
WHERE LENGTH(CITY) = (SELECT MIN(LENGTH(CITY)) FROM STATION)
ORDER BY CITY)
UNION
(SELECT CITY, LENGTH(CITY)
FROM STATION
WHERE LENGTH(CITY) = (SELECT MAX(LENGTH(CITY)) FROM STATION)
ORDER BY CITY)
) AS K -- note the use of AS is MUST included
ORDER BY CITY
- JOIN
- Motivation: column combine (fixed row)
- The second table is used as an auxilary table for additional column entries in the first table
1
2
3
4SELECT employee.emp_id, employee.first_name, branch.branch_name
FROM employee
JOIN branch -- LEFT JOIN, RIGHT JOIN
ON employee.emp_id = branch.mgr_id;Different types of join:1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16SELECT * FROM
(
(SELECT
*
FROM STATION AS P
ORDER BY LENGTH(P.CITY) DESC
) AS A
LEFT JOIN
(SELECT
*
FROM STATION AS K
ORDER BY LENGTH(K.CITY) DESC
) AS B
ON A.ID = B.ID
) -- here should not have AS
ORDER BY A.CITY -- Must specify A or ambiguous warning- INNER JOIN: the usual type of JOIN;
Only those rows that match the ON criteria in both tables will be included and joined - LEFT JOIN:
All those rows in the left table are included but rows in the right table are included only when they match theON
criteria - RIGHT JOIN:
the symmetric idea with LEFT JOIN - OUTER JOIN:
All the rows in both tables are included (empty columns in the resultant table rows are treated with NULL)
- INNER JOIN: the usual type of JOIN;
- Nested query
1
2
3
4
5SELECT employee.first_name, employee.last_name
FROM employee
WHERE employee.emp_id IN (SELECT works_with.emp_id
FROM works_with
WHERE works_with.total_sales > 50000); - ON DELETE
1
2ON DELETE SET NULL -- set the foreign key to null if the primary key which the foreign key refers to gets deleted
ON DELETE CASCADE -- delete the entire row if the primary key gets deleted, especially important if set null cannot be done (i.e the foreign key cannot be set to null) - Trigger test
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30CREATE TABLE trigger_test (
message VARCHAR(100)
);
-- the following code needs to be manually typed in mySQL code
DELIMITER $$ -- change the delimiter to $$
CREATE
TRIGGER my_trigger BEFORE INSERT
ON employee
FOR EACH ROW BEGIN
INSERT INTO trigger_test VALUES('added new employee'); -- note the use of ; delimiter here
END$$ -- we need to use the $$ as delimiter which is declared in line 168
DELIMITER ; -- change the delimiter back to ;
-- Conditional trigger_test
DELIMITER $$
CREATE
TRIGGER my_trigger BEFORE INSERT -- can also be UPDATE, DELETE
ON employee
FOR EACH ROW BEGIN
IF NEW.sex = 'M' THEN
INSERT INTO trigger_test VALUES('added male employee');
ELSEIF NEW.sex = 'F' THEN
INSERT INTO trigger_test VALUES('added female');
ELSE
INSERT INTO trigger_test VALUES('added other employee');
END IF;
END$$
DELIMITER ;
-- possible to drop the trigger case (done in client terminal):
DROP TRIGGER my_trigger - CTE: Common Table Expression
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26WITH Number -- here Number is the name of the CTE, can be anything
AS
(
SELECT
CustomerId
, NTILE(1000) OVER(ORDER BY CustomerId) AS N
FROM dbo.Customers
)
,
TopCustomer -- here we define the second CTE here, notice the comma "," above, indicates that the WITH keyword is still effective
AS
(
SELECT
MAX(CustomerId) AS CustId
FROM Number
GROUP BY N
)
SELECT -- this SELECT is together with the CTE Expression, not separate query
C2.*
INTO dbo.CustomersSample
FROM TopCustomer AS C1
INNER JOIN dbo.Customers AS C2
ON C1.CustId = C2.CustomerId
SELECT * FROM dbo.CustomersSample -- with the above cte method, we created a randomized sample in the dbo.customers table - Functions and procedures
- Procedure Creation and Execution
1
2
3
4
5
6
7
8
9
10
11DELIMITER $$
CREATE PROCEDURE FizzBuzz()
BEGIN
DECLARE N INT DEFAULT 1;
WHILE N <= 100 DO
SET N = N + 1;
END WHILE;
END$$
DELIMITER ;
CALL FizzBuzz(); - Function Creation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16DELIMITER $$
CREATE FUNCTION multi( -- if function alrea exists, CREATE is changed to ALTER
n INT
, m INT
)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE result INT;
SET result = m * n;
RETURN result;
END$$
DELIMITER ;
SELECT your_db_name.multi(2,3) AS result; - Check if the function exists
1
SHOW FUNCTION STATUS WHERE db = 'your database name';
- Procedure Creation and Execution
String and numeric operations on values
String operations
1 | SELECt |
Regex Matching1
1 | SELECT CONCAT(first_name, ' ', last_name) FROM customer |
Numeric Functions
1 | SELECT |
1 | -- A Blunder |
Some Advanced operations: Window functions
- OVER clause determines window (the set of rows to operate on)
- PARTITION BY splits the result set into partitions on which the window function is applied
- Functions Available:
- Aggregate - COUNT, SUM, MIN, MAX, AVG
- Ranking - ROW_NUMBER, RANK, DENSE_RANK, NTILE
- Offset - FIRST_VALUE, LAST_VALUE, LEAD, LAG
- Statistical - PERCENT_RANK, CUME_DIST, PERCENTILE_CONT, PERCENTILE_DIST
- Windows Functions also have FRAMES
- ROWS
- RANGE
1. Demo on PARTITION BY
1 | -- the non-window function way |
2. On Ranking Functions
Ranking functions are available as part of Window Functions:
ROW_NUMBER() unique incrementing integers
RANK() same rank for same values, but keep the counting rolling
1, 1 (duplicate), 3, 4, 5 DENSE_RANK(): same rank for same values, but only increase rank by 1 when values change
1, 1 (duplicate), 2, 3, 4 RANK() vs DENSE_RANK(): RANK() will have rows with identical rank/ gaps in rank if we get tied values
NTILE(N) assigns tile number based on the number of tiles required, just assign each row with a value from 0 - N ,in increase order
- Example: 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, …. ,N, N, N, N
- Usage: for partitioning/selective sampling of the data
1 | SELECT -- note that here we use ORDER BY instead of PARTITION BY because order/rank is sort of important rather than the fixed set of value |
3. GROUP BY
1 | SELECT |
4. Window FRAMES
1 | SELECT |
5. Lag and Lead
- Useful for trend analysis
- LAG - return the value from the previous row
- LEAD - return the value from the next row
- Format:
1
LAG([Column], [Offset], [Value if NULL])
- Demo:
1
2
3
4
5
6
7
8
9SELECT
Sales_Customer_Id
, Sales_Date
, LAG(Sales_Amount, 2, 0) OVER(PARTITION BY Sales_Customer_Id ORDER BY Sales_Date) AS PrevValue
-- get the Sales_Amount 2 days before, if no value is in the entry 2 days before, set it to 0 (default is NULL)
, Sales_Amount
, LEAD(Sales_Amount, 2, 0) OVER(PARTITION BY Sales_Customer_Id ORDER BY Sales_Date) AS NextValue
-- idea is the same, just change it to later
FROM dbo.Sales
6. Rolling window
1 | SELECT |
7. Variable Specification
1 | SET GLOBAL some_global_variable = 1; |
Conclusion
The above codes demonstrate the majorities of the SQL codes formats an engineer would ever need in its daily CRUD operations already. Thanks for reading!
SQL: Pick up the Basic within a day