Hone logo
Hone
Problems

Data Archiving System in Python

Data archiving is a crucial process for managing large datasets, ensuring compliance, and freeing up valuable storage space. This challenge asks you to implement a simplified data archiving system in Python that moves older files from a source directory to an archive directory based on a specified age threshold. This is a common task in data management and backup systems.

Problem Description

You are tasked with creating a Python script that archives files older than a given number of days from a source directory to an archive directory. The script should:

  1. Take three arguments:
    • source_directory: The path to the directory containing the files to be archived.
    • archive_directory: The path to the directory where archived files will be moved.
    • age_in_days: The age (in days) beyond which files should be archived.
  2. Identify files older than age_in_days: The script should iterate through all files in the source_directory and determine their last modification time.
  3. Move files to the archive directory: For each file identified as older than the specified age, the script should move it to the archive_directory. The original file name should be preserved in the archive directory.
  4. Handle directory creation: If the archive_directory does not exist, the script should create it.
  5. Error Handling: The script should gracefully handle potential errors, such as:
    • The source_directory not existing.
    • Permissions issues when reading from the source_directory or writing to the archive_directory.
    • Files that cannot be moved (e.g., due to permissions or being in use). These files should be logged to the console (printed to standard output) with a message indicating the failure and the reason.

Examples

Example 1:

Input: source_directory="/tmp/source", archive_directory="/tmp/archive", age_in_days=7

Assume /tmp/source contains two files: file1.txt (modified 10 days ago) and file2.txt (modified 3 days ago). Assume /tmp/archive does not exist.

Output:
/tmp/archive created
Moving file1.txt to /tmp/archive

Explanation: The script creates /tmp/archive. It identifies file1.txt as older than 7 days and moves it to /tmp/archive. file2.txt is not older than 7 days, so it is not moved.

Example 2:

Input: source_directory="/tmp/source", archive_directory="/tmp/archive", age_in_days=30

Assume /tmp/source contains file1.txt (modified 45 days ago), file2.txt (modified 15 days ago), and /tmp/archive already exists.

Output:
Moving file1.txt to /tmp/archive

Explanation: The script identifies file1.txt as older than 30 days and moves it to /tmp/archive. file2.txt is not older than 30 days, so it is not moved.

Example 3: (Edge Case - Permissions Error)

Input: source_directory="/root/source", archive_directory="/tmp/archive", age_in_days=1

Assume /root/source contains file1.txt (modified 2 days ago) and the script is run by a user without read permissions for /root/source.

Output:
Error: Permission denied when accessing /root/source.  Skipping archiving.

Explanation: The script detects a permission error when trying to access /root/source and prints an error message to standard output, skipping the archiving process.

Constraints

  • source_directory, archive_directory must be strings representing valid file paths.
  • age_in_days must be a non-negative integer.
  • The script should handle files of any size.
  • The script should be reasonably efficient for directories containing a large number of files (e.g., thousands). Avoid unnecessary iterations or complex data structures.
  • The script should not delete the original files from the source_directory. It should move them.
  • The script should not archive directories, only files.

Notes

  • You can use the os, time, and shutil modules in Python.
  • Consider using os.path.getmtime() to get the last modification time of a file.
  • Use shutil.move() to move files.
  • Think about how to handle potential exceptions (e.g., FileNotFoundError, PermissionError).
  • The script should be designed to be robust and handle unexpected situations gracefully. Logging errors is important.
  • Assume the input arguments are provided correctly (no need to validate the input arguments themselves, just handle errors that arise from them).
Loading editor...
python