Hone logo
Hone
Problems

Finding Shortest Paths in a Social Network with SQL

Social networks are essentially graphs where users are nodes and connections (friendships, follows, etc.) are edges. This challenge asks you to use SQL to perform graph traversal, specifically finding the shortest path between two users in a simulated social network. This is a common problem in social network analysis, recommendation systems, and network optimization.

Problem Description

You are given a table representing a social network. The table, friendships, has two columns: user1 and user2, representing a connection between two users. Your task is to write a SQL query that finds the shortest path (minimum number of hops) between two given users, start_user and end_user. The query should return a list of users representing the shortest path, ordered from start_user to end_user. If no path exists, return an empty list.

Key Requirements:

  • Shortest Path: The solution must find the shortest path, not just a path.
  • Path Reconstruction: The query must return the actual path as a list of user IDs.
  • Directed Graph: Treat the friendships table as a directed graph. A connection from user1 to user2 does not imply a connection from user2 to user1.
  • Self-Loops: Ignore any rows where user1 and user2 are the same.
  • No Cycles: The algorithm should avoid infinite loops in case of cycles in the graph.

Expected Behavior:

The query should take start_user and end_user as input parameters. It should return a list of user IDs representing the shortest path from start_user to end_user. The order of the user IDs in the list should be from start_user to end_user. If no path exists, the query should return an empty list.

Edge Cases to Consider:

  • start_user and end_user are the same: Return a list containing only start_user.
  • start_user or end_user do not exist in the friendships table: Return an empty list.
  • The graph is disconnected: Return an empty list if end_user is not reachable from start_user.
  • Multiple shortest paths exist: Return one of the shortest paths (the specific one doesn't matter).

Examples

Example 1:

Input: friendships table:
user1 | user2
-------|-------
1     | 2
2     | 3
1     | 4
4     | 3
start_user = 1
end_user = 3
Output: [1, 2, 3]
Explanation: The shortest path from 1 to 3 is 1 -> 2 -> 3.

Example 2:

Input: friendships table:
user1 | user2
-------|-------
1     | 2
2     | 3
start_user = 1
end_user = 4
Output: []
Explanation: User 4 is not reachable from user 1.

Example 3:

Input: friendships table:
user1 | user2
-------|-------
1     | 2
2     | 3
3     | 1
start_user = 1
end_user = 3
Output: [1, 2, 3]
Explanation: Although there's a cycle, the shortest path is still 1 -> 2 -> 3.

Example 4:

Input: friendships table:
user1 | user2
-------|-------
1     | 2
2     | 3
1     | 3
start_user = 1
end_user = 3
Output: [1, 3]
Explanation: The shortest path is 1 -> 3.

Constraints

  • The friendships table will contain at least 1 row and up to 10,000 rows.
  • user1 and user2 will be integers between 1 and 1000 (inclusive).
  • start_user and end_user will be integers between 1 and 1000 (inclusive).
  • The query should execute within 10 seconds for any valid input.

Notes

  • This problem requires a recursive or iterative approach to graph traversal within SQL. Common techniques include using Common Table Expressions (CTEs) to simulate recursion or iterative queries with UNION ALL.
  • Consider using a breadth-first search (BFS) approach to guarantee finding the shortest path.
  • Be mindful of potential infinite loops if the graph contains cycles. Implement a mechanism to prevent revisiting nodes.
  • The SQL dialect is assumed to be standard SQL, compatible with most major database systems (PostgreSQL, MySQL, SQL Server, etc.). However, specific syntax for recursive CTEs might vary slightly.
  • Focus on clarity and efficiency in your SQL query. While a functional solution is the primary goal, optimizing for performance is also important given the constraints.
Loading editor...
plaintext