Finding High-Performing Employees Based on Department Averages
This challenge focuses on using subqueries within the WHERE clause in SQL to identify employees who significantly outperform their department's average salary. Understanding how to leverage subqueries for comparison is a crucial skill for data analysis and reporting, allowing you to pinpoint exceptional individuals within specific groups.
Problem Description
You are tasked with writing a SQL query that identifies employees whose salaries are above the average salary of their respective departments. The query should return the employee's ID, name, department, and salary. The subquery will calculate the average salary for each department, and the main query will filter employees based on this calculated average.
Key Requirements:
- The query must use a subquery within the
WHEREclause. - The subquery must calculate the average salary for each department.
- The main query must filter employees to include only those whose salary is greater than the average salary of their department.
- The query should handle cases where a department might have no employees (the average salary would be NULL in this case – you should still be able to correctly identify employees in departments with employees).
Expected Behavior:
The query should return a result set containing the employee_id, employee_name, department_id, and salary for all employees whose salary exceeds the average salary of their department. If an employee's salary is equal to the average, they should not be included in the result.
Edge Cases to Consider:
- Departments with no employees.
- Null salary values (assume these should be excluded from the average calculation).
- Large datasets – consider query performance.
Examples
Example 1:
Input:
Employee Table:
employee_id | employee_name | department_id | salary
------------|---------------|---------------|--------
1 | Alice | 10 | 60000
2 | Bob | 10 | 75000
3 | Charlie | 20 | 50000
4 | David | 20 | 55000
5 | Eve | 30 | 80000
Department Table:
department_id | department_name
---------------|----------------
10 | Sales
20 | Marketing
30 | Engineering
Output:
employee_id | employee_name | department_id | salary
------------|---------------|---------------|--------
2 | Bob | 10 | 75000
4 | David | 20 | 55000
5 | Eve | 30 | 80000
Explanation:
The average salary for Sales (department 10) is (60000 + 75000) / 2 = 67500. Bob's salary (75000) is greater than 67500.
The average salary for Marketing (department 20) is (50000 + 55000) / 2 = 52500. David's salary (55000) is greater than 52500.
The average salary for Engineering (department 30) is 80000. Eve's salary (80000) is not greater than 80000, so she is not included.
Example 2:
Input:
Employee Table:
employee_id | employee_name | department_id | salary
------------|---------------|---------------|--------
1 | Alice | 10 | 60000
2 | Bob | 10 | 60000
3 | Charlie | 20 | 50000
Department Table:
department_id | department_name
---------------|----------------
10 | Sales
20 | Marketing
Output:
employee_id | employee_name | department_id | salary
------------|---------------|---------------|--------
1 | Alice | 10 | 60000
2 | Bob | 10 | 60000
Explanation:
The average salary for Sales (department 10) is (60000 + 60000) / 2 = 60000. Both Alice and Bob's salaries are greater than 60000.
The average salary for Marketing (department 20) is 50000. No employee's salary is greater than 50000.
Constraints
- The
employee_idis a unique integer. employee_nameis a string.department_idis an integer.salaryis a numeric value (integer or decimal).- The database system is assumed to be a standard SQL implementation (e.g., MySQL, PostgreSQL, SQL Server).
- The query should execute within a reasonable time frame (e.g., less than 1 second) on a dataset with up to 10,000 employees.
Notes
- Consider using
AVG()function to calculate the average salary. - The subquery should be placed in the
WHEREclause to filter the employees. - Think about how to handle departments with no employees to avoid errors. The
AVG()function will returnNULLin such cases, which should be handled appropriately. - Focus on clarity and readability of your SQL query. Proper formatting and indentation are encouraged.
- Pseudocode:
// Assume Employee table has columns: employee_id, employee_name, department_id, salary
// Assume Department table has columns: department_id, department_name
// 1. Calculate the average salary for each department using a subquery.
// - SELECT department_id, AVG(salary) AS avg_salary
// FROM Employee
// GROUP BY department_id
// 2. Select employee information from the Employee table.
// - SELECT employee_id, employee_name, department_id, salary
// FROM Employee
// 3. Filter the employees based on the average salary of their department using a subquery in the WHERE clause.
// - WHERE salary > (SELECT AVG(salary) FROM Employee WHERE department_id = Employee.department_id)