Automated File Backup System
This challenge asks you to implement a basic automated file backup system in Python. The system should monitor a specified directory for changes and periodically copy files to a backup directory. This is a common and essential task for data protection and disaster recovery.
Problem Description
You are tasked with creating a Python script that performs automated backups of files from a source directory to a destination directory. The script should:
- Monitor a Source Directory: The script will be given a source directory path as input. It should continuously monitor this directory for new, modified, or deleted files.
- Backup to a Destination Directory: The script will also be given a destination directory path. It should copy any new or modified files from the source directory to the destination directory. If a file exists in the destination directory with the same name, it should be overwritten with the latest version from the source.
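- Handle Deletions: If a file is deleted from the source directory, it should also be deleted from the destination directory. Files in the destination directory that have no counterpart in the source directory should be removed as well (see Example 2).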
- Periodic Check: The script should check for changes at a specified interval (e.g., every 5 seconds).
- Error Handling: The script should gracefully handle potential errors, such as invalid directory paths or permission issues, and log these errors.
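The requirements above can be sketched as a polling loop that compares two snapshots of the source directory. This is a minimal sketch, not a reference solution. It assumes a flat source directory (subdirectories are ignored), and it treats a change in modification time or size as "modified". The names snapshot, sync_once, run, and the max_passes parameter are hypothetical and chosen only for illustration.

```python
import logging
import os
import shutil
import time

log = logging.getLogger("backup")


def snapshot(source_dir):
    """Map each regular file name in source_dir to a (mtime_ns, size) signature."""
    state = {}
    with os.scandir(source_dir) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                info = entry.stat(follow_symlinks=False)
                state[entry.name] = (info.st_mtime_ns, info.st_size)
    return state


def sync_once(source_dir, dest_dir, previous):
    """Run one backup pass and return the state to compare against next time."""
    current = snapshot(source_dir)
    os.makedirs(dest_dir, exist_ok=True)

    recorded = {}
    for name, signature in current.items():
        if previous.get(name) == signature:
            recorded[name] = signature  # unchanged since the last pass
            continue
        try:
            # copy2 overwrites an existing destination file and keeps timestamps.
            shutil.copy2(os.path.join(source_dir, name), os.path.join(dest_dir, name))
            recorded[name] = signature
        except OSError:
            log.exception("could not copy %s", name)
            recorded[name] = None  # never equals a signature, so the copy is retried

    # Files seen on the previous pass but missing now were deleted from the source.
    for name in previous.keys() - current.keys():
        try:
            os.remove(os.path.join(dest_dir, name))
        except FileNotFoundError:
            pass  # already absent from the destination
        except OSError:
            log.exception("could not delete %s", name)
    return recorded


def run(source_dir, dest_dir, interval, max_passes=None):
    """Poll every `interval` seconds; max_passes (hypothetical) bounds the loop for testing."""
    state = {}
    passes = 0
    while max_passes is None or passes < max_passes:
        try:
            state = sync_once(source_dir, dest_dir, state)
        except OSError:
            # Covers a missing or unreadable source directory, among other failures.
            log.exception("backup pass failed")
        passes += 1
        time.sleep(interval)
```

Calling run("/path/to/source", "/path/to/backup", 5) would poll every 5 seconds until interrupted. Keeping only one small tuple per file in memory is one way to stay well within the memory limit for thousands of files.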
Key Requirements:
- The script must be able to handle a large number of files (up to 10,000; see Constraints).
- The backup process should be efficient to minimize resource usage.
- The script should be robust and handle unexpected situations gracefully.
Expected Behavior:
- The script should run continuously in the background, monitoring the source directory.
- Changes in the source directory should be reflected in the destination directory within the specified interval.
- The script should log any errors encountered during the backup process.
- The script should not consume excessive system resources.
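One possible way to meet the logging expectation without letting the log itself grow unbounded is a size-limited rotating log file. The sketch below is only an illustration: the logger name "backup", the function configure_logging, and the default size and backup count are assumptions, not part of the challenge.

```python
import logging
from logging.handlers import RotatingFileHandler


def configure_logging(log_path, max_bytes=1_000_000, backups=3):
    """Attach a size-limited rotating file handler to the 'backup' logger."""
    logger = logging.getLogger("backup")
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With these defaults the log occupies at most roughly 4 MB on disk (the active file plus three rotated copies).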
Edge Cases to Consider:
- Empty source directory.
- Destination directory not existing (should create it).
- Permissions issues when accessing source or destination directories.
- Files with unusual characters in their names.
- Very large files.
- Simultaneous modifications to the same file.
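For the last two edge cases, one common approach is to copy into a temporary file, check that the source did not change during the copy, and only then move the temporary file into place. The sketch below assumes the temporary file lives next to the final destination file (so the rename stays on one filesystem). The ".part" suffix and the function name copy_if_stable are hypothetical.

```python
import os
import shutil


def copy_if_stable(src_path, dest_path):
    """Copy src_path to dest_path via a temp file; return False if src changed mid-copy."""
    before = os.stat(src_path)
    tmp_path = dest_path + ".part"  # hypothetical temp-file naming convention
    try:
        # copyfile streams the data, so a very large file is never held in memory.
        shutil.copyfile(src_path, tmp_path)
        after = os.stat(src_path)
        if (before.st_mtime_ns, before.st_size) != (after.st_mtime_ns, after.st_size):
            return False  # source was modified while copying; retry on a later pass
        shutil.copystat(src_path, tmp_path)
        # os.replace overwrites dest_path in a single rename step.
        os.replace(tmp_path, dest_path)
        return True
    finally:
        # Clean up the temp file if the copy was abandoned or failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because the destination file is replaced in one step, a reader of the backup directory never sees a half-written copy. File names are passed through unchanged, so names with spaces or other unusual characters need no special handling here.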
Examples
Example 1:
Input: source_dir = "/path/to/source", dest_dir = "/path/to/backup", interval = 5
Initial State: source_dir contains file "file1.txt" with content "initial content". dest_dir is empty.
After 5 seconds: source_dir is unchanged. dest_dir contains "file1.txt" with content "initial content".
After another 5 seconds: "file1.txt" in source_dir has been modified and now has content "updated content". dest_dir now contains "file1.txt" with content "updated content".
Explanation: The script monitors the source directory and copies the latest version of "file1.txt" to the destination directory.
Example 2:
Input: source_dir = "/path/to/source", dest_dir = "/path/to/backup", interval = 5
Initial State: source_dir contains "file1.txt" and "file2.txt". dest_dir contains "file1.txt" and "file3.txt".
After 5 seconds: source_dir now contains only "file1.txt". dest_dir now contains only "file1.txt".
Explanation: "file2.txt" was deleted from the source directory, so it was also deleted from the destination directory. "file3.txt" was not present in the source directory, so it was deleted from the destination directory.
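Example 2 implies that the destination is compared against the source directly, since "file3.txt" was never in the source at all. A minimal sketch of that comparison, assuming a flat directory layout and using the hypothetical name prune_destination:

```python
import os


def prune_destination(source_dir, dest_dir):
    """Delete top-level files in dest_dir that have no same-named file in source_dir."""

    def file_names(path):
        with os.scandir(path) as entries:
            return {e.name for e in entries if e.is_file(follow_symlinks=False)}

    orphans = sorted(file_names(dest_dir) - file_names(source_dir))
    for name in orphans:
        os.remove(os.path.join(dest_dir, name))
    return orphans  # names that were removed, useful for logging
```

Using a set difference keeps the comparison linear in the number of files, which matters as the directory approaches thousands of entries.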
Example 3: (Edge Case - Destination Directory Doesn't Exist)
Input: source_dir = "/path/to/source", dest_dir = "/path/to/nonexistent_backup", interval = 5
Initial State: source_dir contains "file1.txt". dest_dir does not exist.
After script execution: dest_dir is created. "file1.txt" is copied to "/path/to/nonexistent_backup/file1.txt".
Explanation: The script automatically creates the destination directory if it doesn't exist.
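Creating the destination can be done with os.makedirs and exist_ok=True, which also creates missing parent directories and does nothing if the directory is already there. The wrapper below is a small sketch. The function name ensure_destination and the logger name "backup" are assumptions made for illustration.

```python
import logging
import os

log = logging.getLogger("backup")


def ensure_destination(dest_dir):
    """Create dest_dir (and missing parents) if needed; return True when it is usable."""
    try:
        os.makedirs(dest_dir, exist_ok=True)
    except OSError:
        # Raised, for example, on permission problems or when the path is an existing file.
        log.exception("cannot create destination %s", dest_dir)
        return False
    return True
```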
Constraints
- source_dir and dest_dir must be valid string paths.
- interval must be a positive integer representing the check interval in seconds.
- The script should be able to handle files up to 1GB in size.
- The script should use a reasonable amount of memory (less than 500MB) even when backing up a large number of files.
- The script should be able to handle a source directory containing up to 10,000 files.
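The input constraints can be checked once at startup, before the monitoring loop begins. This is a sketch under assumptions: the function name validate_args and the choice of exception types are illustrative, and the challenge does not say how invalid input should be reported.

```python
import os


def validate_args(source_dir, dest_dir, interval):
    """Raise TypeError or ValueError if the inputs violate the stated constraints."""
    if not isinstance(source_dir, str) or not isinstance(dest_dir, str):
        raise TypeError("source_dir and dest_dir must be strings")
    if not source_dir or not dest_dir:
        raise ValueError("source_dir and dest_dir must not be empty")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        raise ValueError("interval must be a positive integer (seconds)")
    if not os.path.isdir(source_dir):
        raise ValueError(f"source_dir is not an existing directory: {source_dir!r}")
```

The destination is deliberately not required to exist here, because the script is expected to create it.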
Notes
- Consider using the os and shutil modules for file system operations.
- You can use the time module for implementing the periodic check.
- Think about how to efficiently track changes in the source directory. Hashing files can be useful.
- Logging errors is crucial for debugging and monitoring the backup process. Use the logging module.
- While a full-fledged synchronization tool is beyond the scope of this challenge, focus on the core concepts of monitoring, copying, and deleting files.
- Consider using a simple in-memory data structure (e.g., a dictionary) to store the state of the files in the source directory.
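The hashing and dictionary notes can be combined as follows. This sketch hashes each file in fixed-size chunks so that memory use stays flat even for very large files, and keeps the last known digest per path in a plain dictionary. SHA-256, the 1 MiB chunk size, and the names file_digest and changed_files are assumptions chosen for illustration.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, known_digests):
    """Return the paths whose digest differs from known_digests, updating it in place."""
    changed = []
    for path in paths:
        digest = file_digest(path)
        if known_digests.get(path) != digest:
            known_digests[path] = digest
            changed.append(path)
    return changed
```

Hashing reads every byte of every file on each pass, which is expensive for large files at a short interval. A cheaper variant would hash only files whose size or modification time has changed since the previous pass.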