Data Archiving System in Python
Data archiving is a crucial process for managing large datasets, ensuring compliance, and freeing up valuable storage space. This challenge asks you to implement a simplified data archiving system in Python that moves older files from a source directory to an archive directory based on a specified age threshold. This is a common task in data management and backup systems.
Problem Description
You are tasked with creating a Python script that archives files older than a given number of days from a source directory to an archive directory. The script should:
- Take three arguments:
source_directory: The path to the directory containing the files to be archived.archive_directory: The path to the directory where archived files will be moved.age_in_days: The age (in days) beyond which files should be archived.
- Identify files older than
age_in_days: The script should iterate through all files in thesource_directoryand determine their last modification time. - Move files to the archive directory: For each file identified as older than the specified age, the script should move it to the
archive_directory. The original file name should be preserved in the archive directory. - Handle directory creation: If the
archive_directorydoes not exist, the script should create it. - Error Handling: The script should gracefully handle potential errors, such as:
- The
source_directorynot existing. - Permissions issues when reading from the
source_directoryor writing to thearchive_directory. - Files that cannot be moved (e.g., due to permissions or being in use). These files should be logged to the console (printed to standard output) with a message indicating the failure and the reason.
- The
Examples
Example 1:
Input: source_directory="/tmp/source", archive_directory="/tmp/archive", age_in_days=7
Assume /tmp/source contains two files: file1.txt (modified 10 days ago) and file2.txt (modified 3 days ago). Assume /tmp/archive does not exist.
Output:
/tmp/archive created
Moving file1.txt to /tmp/archive
Explanation: The script creates /tmp/archive. It identifies file1.txt as older than 7 days and moves it to /tmp/archive. file2.txt is not older than 7 days, so it is not moved.
Example 2:
Input: source_directory="/tmp/source", archive_directory="/tmp/archive", age_in_days=30
Assume /tmp/source contains file1.txt (modified 45 days ago), file2.txt (modified 15 days ago), and /tmp/archive already exists.
Output:
Moving file1.txt to /tmp/archive
Explanation: The script identifies file1.txt as older than 30 days and moves it to /tmp/archive. file2.txt is not older than 30 days, so it is not moved.
Example 3: (Edge Case - Permissions Error)
Input: source_directory="/root/source", archive_directory="/tmp/archive", age_in_days=1
Assume /root/source contains file1.txt (modified 2 days ago) and the script is run by a user without read permissions for /root/source.
Output:
Error: Permission denied when accessing /root/source. Skipping archiving.
Explanation: The script detects a permission error when trying to access /root/source and prints an error message to standard output, skipping the archiving process.
Constraints
source_directory,archive_directorymust be strings representing valid file paths.age_in_daysmust be a non-negative integer.- The script should handle files of any size.
- The script should be reasonably efficient for directories containing a large number of files (e.g., thousands). Avoid unnecessary iterations or complex data structures.
- The script should not delete the original files from the
source_directory. It should move them. - The script should not archive directories, only files.
Notes
- You can use the
os,time, andshutilmodules in Python. - Consider using
os.path.getmtime()to get the last modification time of a file. - Use
shutil.move()to move files. - Think about how to handle potential exceptions (e.g.,
FileNotFoundError,PermissionError). - The script should be designed to be robust and handle unexpected situations gracefully. Logging errors is important.
- Assume the input arguments are provided correctly (no need to validate the input arguments themselves, just handle errors that arise from them).