Lunch Break Funky functions in SQL
Data Science for Everyone
SQL stands for Structured Query Language. People often pronounce it as either “S-Q-L” or “sequel.” SQL is used in programming and is designed for managing data stored in a database using SQL queries.
The most common type of database management system (DBMS) is a relational database management system (RDBMS), where we store structured data, i.e., data incorporating relations among entities and variables. These entities and variables are organized as a set of tables with columns and rows. Some of the most popular RDBMS are MYSQL, Oracle, IBM DB2, and Microsoft SQL Server database.
Loading the databases used in this Exercise
SELECT * FROM staff;
SELECT * FROM company_divisions;
SELECT * FROM company_regions;
Data Wrangling in SQL
- Data analytics is about answering questions ..for instance
what are the distinct departments in this company?
- we can use the function DISTINCT To answer this
SELECT ____(department)
FROM staff
ORDER BY department;
Concatenations
- say you just want to pull a staff name , his position and his department
- you can simply use SELECT Statement for this
SELECT
last_name,
job_title ,
department FROM staff;
- however you can choose to concatenate these into one column to be more explicit.
SELECT
last_name,|| ' - ' || department AS title_with_department
job_title FROM staff;
The SQL LIKE Operator for Pattern Matching
Like it or not, the LIKE operator is essential in SQL. It gives data practitioners the power to filter data on specific string matches.
column_name LIKE pattern
Pull information for employees with Assistant roles only
SELECT id , last_name , email ,job_title
FROM staff
WHERE job_title LIKE '%Assistant%';
How many employees with Assistant roles
SELECT
COUNT(*) AS employees_with_Assistant_role
FROM staff
WHERE job_title LIKE '%Assistant%';
What are those Assistant roles?
SELECT DISTINCT(job_title)
FROM staff
WHERE job_title LIKE '%Assistant%'
ORDER BY 1;
The magic here is in the clause
WHERE job_title LIKE '%Assistant%'
, which means “find all job_title that contains the word ‘Assistant’.” The'%Assistant%'
here is known as a pattern for matching.The
%
is not the only wildcard you can use in conjunction with the LIKE operator. You could use the underscore sign_
too.
%
matches any number of characters. _
matches any single character. The syntax is easy to remember. It is simply
Filterig, Join and Aggregration
- Assume that people who earn at latest 100,000 salary is Executive. We want to know the average salary for executives for each department.
SELECT department, ROUND(AVG(salary),2) AS average_salary
FROM staff
WHERE salary >= 100000
GROUP BY department
ORDER BY 2 DESC;
- who earn the most in the company?
SELECT last_name, department, salary
FROM staff
WHERE salary = (
SELECT MAX(s2.salary)
FROM staff s2
);
- who earn the most in his/her own department
SELECT s.department, s.last_name, s.salary
FROM staff s
WHERE s.salary = (
SELECT MAX(s2.salary)
FROM staff s2
WHERE s2.department = s.department
)ORDER BY 1;
Basic Statistics
- What about gender distribution?
SELECT gender, COUNT(*) AS total_employees
FROM staff
GROUP BY gender;
- How many employees in each department
SELECT department, COUNT(*) AS total_employee
FROM staff
GROUP BY department
ORDER BY department;
- How many distinct departments ?
SELECT DISTINCT(department)
FROM staff
ORDER BY department;
- What is the highest and lowest salary of employees?
SELECT MAX(salary) AS Max_Salary, MIN(salary) AS Min_Salary
FROM staff;
- what about salary distribution by gender group?
SELECT gender, MIN(salary) As Min_Salary, MAX(salary) AS Max_Salary, AVG(salary) AS Average_Salary
FROM staff
GROUP BY gender;
- How much total salary company is spending each year?
SELECT SUM(salary)
FROM staff;
- want to know distribution of min, max average salary by department
SELECT
department, MIN(salary) As Min_Salary,
MAX(salary) AS Max_Salary,
AVG(salary) AS Average_Salary,
COUNT(*) AS total_employees
FROM staff
GROUP BY department
ORDER BY 4 DESC;
- how spread out those salary around the average salary in each department ?
SELECT
department, MIN(salary) As Min_Salary,
MAX(salary) AS Max_Salary,
AVG(salary) AS Average_Salary,
VAR_POP(salary) AS Variance_Salary,
STDDEV_POP(salary) AS StandardDev_Salary,
COUNT(*) AS total_employees
FROM staff
GROUP BY department
ORDER BY 4 DESC;
- which department has the highest salary spread out ?
SELECT
department, MIN(salary) As Min_Salary,
MAX(salary) AS Max_Salary,
ROUND(AVG(salary),2) AS Average_Salary,
ROUND(VAR_POP(salary),2) AS Variance_Salary,
ROUND(STDDEV_POP(salary),2) AS StandardDev_Salary,
COUNT(*) AS total_employees
FROM staff
GROUP BY department
ORDER BY 6 DESC;
- Let’s see Health department salary
SELECT department, salary
FROM staff
WHERE department LIKE 'Health'
ORDER BY 2 ASC;
VIEWS VS COMMON TABLE EXPRESSIONS
common table expressions are named temporary result sets that are created using simple SQL statements and then referenced within SELECT, INSERT, UPDATE, or DELETE statements.
A view is a stored SQL query that is executed each time you reference it in another query. Note that a view doesn’t store the output of a particular query – it stores the query itself.
SQL CTE vs. View: When to Use Each One
Although there are some differences between them, common table expressions and views seem to perform very similarly. So, when should you use each one?
- Ad-hoc queries. For queries that are referenced occasionally (or just once), it’s usually better to use a CTE. If you need the query again, you can just copy the CTE and modify it if necessary.
- Frequently used queries. If you tend to reference the same query often, creating a corresponding view is a good idea. However, you’ll need create view permission in your database to create a view.
- Access management. A view might be used to restrict particular users’ database access while still allowing them to get the information they need. You can give users access to specific views that query the data they’re allowed to see without exposing the whole database. In such a case, a view provides an additional access layer.
we will make 3 buckets to see the salary earning status for Health Department
CREATE VIEW health_dept_earning_status
AS
SELECT
CASE
WHEN salary >= 100000 THEN 'high earner'
WHEN salary >= 50000 AND salary < 100000 THEN 'middle earner'
ELSE 'low earner'
END AS earning_status
FROM staff
WHERE department LIKE 'Health';
- we can see that there are 24 high earners, 14 middle earners and 8 low earners
SELECT earning_status, COUNT(*)
FROM health_dept_earning_status
GROUP BY 1;
WITH cte
AS (
SELECT
CASE
WHEN salary >= 100000 THEN 'high earner'
WHEN salary >= 50000 AND salary < 100000 THEN 'middle earner'
ELSE 'low earner'
END AS earning_status
FROM staff
WHERE department LIKE 'Health')
- we can see that there are 24 high earners, 14 middle earners and 8 low earners
SELECT earning_status, COUNT(*)
FROM cte
GROUP BY 1;