Concurrent Web Page Fetcher
Asynchronous I/O is crucial for building efficient and responsive applications, especially when dealing with network operations. This challenge asks you to implement a concurrent web page fetcher in Go using goroutines and channels to download multiple web pages simultaneously, demonstrating your understanding of asynchronous programming and Go's concurrency features.
Problem Description
You are tasked with creating a program that fetches the content of multiple web pages concurrently. The program should take a list of URLs as input and download the content of each page in parallel. The fetched content (as a string) and the URL from which it was fetched should be sent to a channel. The main function should then read from this channel and print the URL and the length of the fetched content. Error handling is essential; if a URL cannot be fetched, the error should be printed to standard error, and the program should continue processing other URLs.
Key Requirements:
- Concurrency: Utilize goroutines to fetch each URL concurrently.
- Channels: Employ channels to communicate the fetched content and URLs between the goroutines and the main function.
- Error Handling: Gracefully handle errors during the fetching process and print them to standard error.
- Output: Print the URL and the length of the fetched content for each successfully fetched page.
- Efficiency: The program should be designed to maximize concurrency and minimize overall execution time.
Expected Behavior:
The program should accept a slice of URLs as input. It should then launch a goroutine for each URL to fetch its content. The fetched content and the corresponding URL should be sent to a channel. The main function should read from the channel, print the URL and the length of the content, and handle any errors encountered during fetching. The program should complete when all URLs have been processed.
Edge Cases to Consider:
- Invalid URLs: Handle cases where the provided URLs are malformed or unreachable.
- Network Errors: Account for potential network errors such as timeouts, connection refused, and DNS resolution failures.
- Empty Input: Handle the case where the input slice of URLs is empty.
- Large Number of URLs: Consider the potential for resource exhaustion if a very large number of URLs are provided.
Examples
Example 1:
Input: ["https://www.example.com", "https://www.google.com"]
Output:
www.example.com: 1291
www.google.com: 1379
Explanation: The program fetches the content of example.com and google.com concurrently. It then prints the URL and the length of the content for each.
Example 2:
Input: ["https://www.example.com", "https://invalid-url.com"]
Output:
www.example.com: 1291
2023/10/27 10:00:00 Error fetching https://invalid-url.com: Get "https://invalid-url.com": dial tcp invalid-url.com:443: connect: connection refused
Explanation: The program fetches example.com successfully. It attempts to fetch invalid-url.com, but encounters a connection error. The error is printed to standard error, and the program continues.
Example 3:
Input: []
Output: (No output)
Explanation: The input slice is empty. The program completes without any errors or output.
Constraints
- The program should be able to handle at least 10 URLs concurrently without significant performance degradation.
- The program should gracefully handle network errors and continue processing other URLs.
- The fetched content should be treated as a string.
- The URL input will be a slice of strings.
- The program should complete within a reasonable time (e.g., less than 10 seconds) for a list of 20 URLs.
Notes
- The net/http package is the recommended way to fetch web pages in Go.
- Consider using a sync.WaitGroup to ensure that all goroutines complete before the program exits.
- Channels are essential for safely communicating data between goroutines.
- Error handling is critical for robustness. Use defer to ensure resources are cleaned up properly.
- Think about how to structure your code to make it modular and easy to understand. Consider creating a separate function for fetching the content of a single URL.