Advanced SQL Tips for Data Analysts
SQL remains one of the most important skills for data analysts. While basic SELECT statements will get you started, mastering advanced SQL techniques can dramatically improve your efficiency and the insights you can extract from your data.
1. Window Functions
Window functions perform calculations across a set of table rows related to the current row. They're incredibly powerful for analytics.
SELECT
department,
employee_name,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
salary - AVG(salary) OVER (PARTITION BY department) as diff_from_avg
FROM employees;
This query shows each employee's salary alongside their department's average salary and the difference between the two.
2. Common Table Expressions (CTEs)
CTEs make complex queries more readable by breaking them into named temporary result sets.
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(amount) as total_sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
sales_growth AS (
SELECT
month,
total_sales,
LAG(total_sales) OVER (ORDER BY month) as prev_month_sales
FROM monthly_sales
)
SELECT
month,
total_sales,
prev_month_sales,
(total_sales - prev_month_sales) / prev_month_sales * 100 as growth_percentage
FROM sales_growth
WHERE prev_month_sales IS NOT NULL;
3. CASE Statements for Conditional Logic
CASE statements allow you to add conditional logic to your queries.
SELECT
product_name,
price,
CASE
WHEN price < 50 THEN 'Budget'
WHEN price BETWEEN 50 AND 100 THEN 'Mid-range'
ELSE 'Premium'
END as price_category
FROM products;
4. Efficient Joins
Understanding different join types and their performance implications is crucial:
- INNER JOIN: Returns only matching rows
- LEFT JOIN: Returns all rows from the left table and matching rows from the right
- RIGHT JOIN: Returns all rows from the right table and matching rows from the left
- FULL OUTER JOIN: Returns all rows when there's a match in either table
5. Subqueries vs. Joins
Sometimes a subquery is more readable or efficient than a join:
SELECT
department_name,
(SELECT COUNT(*) FROM employees e WHERE e.department_id = d.id) as employee_count
FROM departments d;
6. Analytical Functions
Functions like RANK(), DENSE_RANK(), and NTILE() are powerful for ranking and segmentation:
SELECT
product_name,
category,
price,
RANK() OVER (PARTITION BY category ORDER BY price DESC) as price_rank
FROM products;
7. Efficient Aggregations
GROUP BY with ROLLUP, CUBE, or GROUPING SETS allows for multiple levels of aggregation in a single query:
SELECT
COALESCE(region, 'All Regions') as region,
COALESCE(product_category, 'All Categories') as category,
SUM(sales) as total_sales
FROM sales
GROUP BY ROLLUP(region, product_category);
8. Date and Time Functions
Mastering date/time functions is essential for time-series analysis:
SELECT
DATE_TRUNC('week', order_date) as week,
COUNT(*) as orders,
SUM(amount) as revenue
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('week', order_date)
ORDER BY week;
9. String Manipulation
Functions like SUBSTRING, CONCAT, REGEXP_REPLACE help with text data:
SELECT
email,
SUBSTRING(email FROM '@(.*)$') as domain,
COUNT(*) as count
FROM users
GROUP BY SUBSTRING(email FROM '@(.*)$')
ORDER BY count DESC;
10. Query Optimization
Understanding execution plans and indexing is crucial for performance:
- Use EXPLAIN to analyze query execution plans
- Filter early to reduce the working dataset
- Be cautious with functions in WHERE clauses as they can prevent index usage
- Consider materialized views for complex, frequently-run queries
Conclusion
Mastering these advanced SQL techniques will not only make you more efficient but also enable you to extract deeper insights from your data. The best way to improve is through practice—try implementing these techniques in your daily work and you'll quickly see the benefits.