Simple Website Traffic Analyzer
This challenge focuses on implementing basic analytics for a simplified website. You'll be provided with a list of website visit logs, and your task is to write a Python script that analyzes these logs to calculate key metrics like total visits, unique visitors, and the most popular pages. This is a fundamental problem in data analysis and a stepping stone to more complex web analytics implementations.
Problem Description
You are given a list of strings, where each string represents a website visit log entry. Each log entry follows the format: "IP_Address,Page_URL". Your task is to write a Python function analyze_traffic(logs) that takes this list of logs as input and returns a dictionary containing the following analytics:
- `total_visits`: The total number of visits recorded in the logs.
- `unique_visitors`: The number of unique IP addresses that visited the website.
- `popular_pages`: A dictionary where keys are page URLs and values are the number of times each page was visited. The dictionary should be sorted in descending order of visit count.
The function should handle malformed log entries gracefully. A malformed entry should be ignored without crashing the program, and it should not count toward any of the metrics.
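As a minimal sketch of what "ignoring malformed entries" could look like, the helper below parses a single entry. The name `parse_entry` is hypothetical, and the definition of "malformed" is an assumption: an entry is treated as malformed if it is not a string, contains no comma, or has an empty IP or page part.

```python
def parse_entry(entry):
    """Return (ip, page) for a well-formed entry, or None if it is malformed.

    Assumption: "malformed" means not a string, no comma, or an empty part.
    """
    try:
        # Split on the first comma only; unpacking raises ValueError
        # when the entry contains no comma at all.
        ip, page = entry.split(",", 1)
    except (ValueError, AttributeError):
        # AttributeError covers entries that are not strings (e.g. None).
        return None

    ip, page = ip.strip(), page.strip()
    if not ip or not page:
        return None
    return ip, page


print(parse_entry("10.0.0.1,/products"))  # ('10.0.0.1', '/products')
print(parse_entry("invalid_log_entry"))   # None
```

Returning `None` rather than raising keeps the calling loop simple: it can skip any entry for which the helper returns `None`.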
Examples
Example 1:
Input: ["192.168.1.1,/home", "192.168.1.2,/about", "192.168.1.1,/home", "192.168.1.3,/contact"]
Output: {'total_visits': 4, 'unique_visitors': 3, 'popular_pages': {'/home': 2, '/about': 1, '/contact': 1}}
Explanation: There are 4 total visits from 3 unique IP addresses. The /home page was visited twice, while /about and /contact were visited once each. The `popular_pages` dictionary is sorted by visit count.
Example 2:
Input: ["10.0.0.1,/products", "10.0.0.2,/services", "10.0.0.1,/products", "10.0.0.3,/blog", "invalid_log_entry", "10.0.0.2,/services"]
Output: {'total_visits': 5, 'unique_visitors': 3, 'popular_pages': {'/products': 2, '/services': 2, '/blog': 1}}
Explanation: The "invalid_log_entry" is ignored, leaving 5 valid visits from 3 unique IP addresses. /products and /services were each visited twice, and /blog was visited once.
Example 3: (Edge Case - Empty Input)
Input: []
Output: {'total_visits': 0, 'unique_visitors': 0, 'popular_pages': {}}
Explanation: If the input list is empty, all metrics should be zero or empty.
Constraints
- The input `logs` will be a list of strings.
- Each string in `logs` is expected to be in the format "IP_Address,Page_URL".
- IP addresses are assumed to be valid IPv4 addresses (though validation is not required).
- Page URLs are strings.
- The length of the `logs` list can vary from 0 to 1000.
- The `popular_pages` dictionary must be sorted in descending order of visit count. Use Python's built-in sorting capabilities for this.
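The sorting constraint can be illustrated with a short sketch. It assumes Python 3.7 or later, where a `dict` preserves insertion order, so "sorted" here means that the keys are inserted in descending order of count. The sample counts are made up for illustration.

```python
# Hypothetical page counts, in the order the pages were first seen.
page_counts = {"/blog": 1, "/products": 2, "/services": 2}

# sorted() is stable, even with reverse=True, so pages with equal
# counts keep their original relative order.
sorted_pages = dict(
    sorted(page_counts.items(), key=lambda item: item[1], reverse=True)
)

print(sorted_pages)  # {'/products': 2, '/services': 2, '/blog': 1}
```

The problem statement does not say how ties should be ordered; relying on the stability of `sorted()` keeps tied pages in first-seen order, which matches the examples.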
Notes
- Consider using a `try-except` block to handle potential errors when parsing log entries.
- A `set` can be used efficiently to track unique visitors.
- A `dictionary` is suitable for counting page visits.
- Remember to sort the `popular_pages` dictionary by value (visit count) in descending order. You can use `sorted()` with a `lambda` function for this.
- Focus on clarity and readability in your code. Good variable names and comments are encouraged.
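Putting the notes above together, one possible sketch of `analyze_traffic` follows. It is not a reference solution. It assumes Python 3.7+ (ordered dictionaries) and treats an entry as malformed if it is not a string, has no comma, or has an empty IP or page part; that definition is an assumption, since the problem statement only gives one example of a malformed entry.

```python
def analyze_traffic(logs):
    """Compute total visits, unique visitors, and page counts from log entries."""
    unique_ips = set()   # a set tracks each visitor IP once
    page_counts = {}     # page URL -> number of visits
    total_visits = 0

    for entry in logs:
        try:
            # Unpacking raises ValueError if the entry has no comma.
            ip, page = entry.split(",", 1)
        except (ValueError, AttributeError):
            continue  # skip malformed entries (assumed definition)

        ip, page = ip.strip(), page.strip()
        if not ip or not page:
            continue  # an empty IP or page is also treated as malformed

        total_visits += 1
        unique_ips.add(ip)
        page_counts[page] = page_counts.get(page, 0) + 1

    # Sort by visit count, descending; ties keep first-seen order.
    popular_pages = dict(
        sorted(page_counts.items(), key=lambda item: item[1], reverse=True)
    )

    return {
        "total_visits": total_visits,
        "unique_visitors": len(unique_ips),
        "popular_pages": popular_pages,
    }


logs = ["192.168.1.1,/home", "192.168.1.2,/about",
        "192.168.1.1,/home", "192.168.1.3,/contact"]
print(analyze_traffic(logs))
# {'total_visits': 4, 'unique_visitors': 3, 'popular_pages': {'/home': 2, '/about': 1, '/contact': 1}}
```

`collections.Counter` and its `most_common()` method would shorten the counting and sorting steps; the plain dictionary is used here only to stay close to the notes.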