Hone logo
Hone
Problems

Distributed Cache Implementation in Python

Building a distributed cache is a common requirement in modern applications to improve performance and reduce database load. This challenge asks you to design and implement a simplified distributed cache system in Python, focusing on core functionalities like setting, getting, and deleting keys across multiple cache nodes. The goal is to understand the fundamental concepts of distributed caching and how to handle data consistency.

Problem Description

You are tasked with creating a basic distributed cache system. The system should consist of multiple cache nodes, each storing a subset of the cached data. The cache should support the following operations:

  • set(key, value, node_id): Stores a key-value pair in the specified node_id. node_id is an integer representing the node where the data should be stored.
  • get(key, node_id): Retrieves the value associated with the given key from the specified node_id. If the key is not found in that node, it should return None.
  • delete(key, node_id): Removes the key-value pair associated with the given key from the specified node_id.

The system should be designed to handle multiple concurrent requests. For simplicity, assume that node IDs are always valid (i.e., within the range of available nodes). You don't need to implement node discovery or failover mechanisms for this challenge. Focus on the core caching logic.

Key Requirements:

  • Implement a DistributedCache class.
  • The DistributedCache class should internally manage a dictionary representing the cache nodes. Each node is identified by its node_id and stores its data as a dictionary.
  • The set, get, and delete methods should be implemented as described above.
  • Handle the case where a key is not found gracefully (return None for get).

Expected Behavior:

The cache should behave as a simple key-value store distributed across multiple nodes. Data stored on a specific node should only be accessible through the get and delete operations targeting that node.

Edge Cases to Consider:

  • Empty cache: Ensure that get returns None when the cache is empty or the key doesn't exist.
  • Invalid node ID: While the problem states node IDs are always valid, consider how your code would behave if an invalid node ID were passed (e.g., raise an exception or ignore the operation).
  • Concurrent access: While not strictly required to implement full concurrency control, consider how your code might behave under concurrent access (e.g., using locks if necessary for more complex scenarios).

Examples

Example 1:

Input:
cache = DistributedCache(num_nodes=3)
cache.set("key1", "value1", 0)
cache.set("key2", "value2", 1)
print(cache.get("key1", 0))
print(cache.get("key2", 1))
print(cache.get("key3", 2))
Output:
value1
value2
None

Explanation: "key1" is stored on node 0, "key2" on node 1, and "key3" is not present in any node.

Example 2:

Input:
cache = DistributedCache(num_nodes=2)
cache.set("key1", "value1", 0)
cache.delete("key1", 0)
print(cache.get("key1", 0))
Output:
None

Explanation: "key1" is stored on node 0 and then deleted from node 0. A subsequent get on node 0 returns None.

Example 3: (Edge Case)

Input:
cache = DistributedCache(num_nodes=1)
print(cache.get("key1", 0))
Output:
None

Explanation: The cache has only one node, and "key1" is not stored on that node.

Constraints

  • num_nodes will be a positive integer between 1 and 100 (inclusive).
  • key will be a string.
  • value can be any Python object.
  • node_id will be an integer between 0 and num_nodes - 1 (inclusive).
  • The solution should be reasonably efficient for a small number of nodes and keys (e.g., up to 1000 keys across all nodes). Optimizing for extreme scale is not required.

Notes

  • You can use Python dictionaries to represent the cache nodes.
  • Consider the structure of your DistributedCache class and how it manages the nodes.
  • This is a simplified model of a distributed cache. Real-world distributed caches have much more complex features like data replication, consistency protocols, and fault tolerance.
  • Focus on clarity and correctness of the core caching operations. Error handling beyond the specified behavior (e.g., invalid input types) is not required.
Loading editor...
python