Extract Links and Images

We need to extract links from the HTML both the links people can click AND the images that are displayed. This gives us a complete picture of what resources each page references.

For example, this HTML page has both a link and an image:

<html>
  <body>
    <a href="https://blog.boot.dev">Go to Boot.dev</a>
    <img src="/logo.png" alt="Boot.dev Logo" />
  </body>
</html>

We need to extract both https://blog.boot.dev (from the link) and /logo.png (from the image).

Assignment

Write tests for a new function called getURLsFromHTML. Here's the signature:

func getURLsFromHTML(htmlBody string, baseURL *url.URL) ([]string, error) {

htmlBody is an HTML string
baseURL is the (already parsed) root URL of the website we're crawling. This will allow us to rewrite relative URLs into absolute URLs.
It returns an un-normalized list of all the URLs found within the HTML, and an error if one occurs.

In your tests, make sure that:

relative URLs are converted to absolute URLs
you find all the <a> tags in a body of HTML

Here's one example test case to give you an idea:

func TestGetURLsFromHTMLAbsolute(t *testing.T) {
	inputURL := "https://blog.boot.dev"
	inputBody := `<html><body><a href="https://blog.boot.dev"><span>Boot.dev</span></a></body></html>`

    baseURL, err := url.Parse(inputURL)
    if err != nil {
        t.Errorf("couldn't parse input URL: %v", err)
        return
    }

	actual, err := getURLsFromHTML(inputBody, baseURL)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	expected := []string{"https://blog.boot.dev"}
	if !reflect.DeepEqual(actual, expected) {
		t.Errorf("expected %v, got %v", expected, actual)
	}
}

Implement the getURLsFromHTML function. Continue using the goquery library. Here are a few hints you may find useful:

The Find method returns a Selection type. That type has a lot of helper methods. Take a look at the Each method. That's an iterator that will help you check the matching elements. For example:

	doc.Find("a[href]").Each(func(_ int, s *goquery.Selection) {
        // For each '<a href>' it finds, it will run this function.
    }

In HTML, "anchor" elements are links with their href attribute containing the actual URL. e.g:

<a href="https://www.boot.dev">Learn Backend Development</a>

Write tests for a new function called getImagesFromHTML. Here's the signature:

func getImagesFromHTML(htmlBody string, baseURL *url.URL) ([]string, error) {

htmlBody is an HTML string
baseURL is the (already parsed) root URL of the website we're crawling. This will allow us to rewrite relative URLs into absolute URLs.
It returns an un-normalized list of all the image URLs found within the HTML, and an error if one occurs.

In your tests, make sure:

relative URLs are converted to absolute URLs
Handle cases where attributes might be missing

Here's one example test case to give you an idea:

func TestGetImagesFromHTMLRelative(t *testing.T) {
	inputURL := "https://blog.boot.dev"
	inputBody := `<html><body><img src="/logo.png" alt="Logo"></body></html>`

    baseURL, err := url.Parse(inputURL)
    if err != nil {
        t.Errorf("couldn't parse input URL: %v", err)
        return
    }

	actual, err := getImagesFromHTML(inputBody, baseURL)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}

	expected := []string{"https://blog.boot.dev/logo.png"}
	if !reflect.DeepEqual(actual, expected) {
		t.Errorf("expected %v, got %v", expected, actual)
	}
}

Implement the getImagesFromHTML function.
Make sure you have at least 3 test cases for each function you're testing.

Run and submit the CLI tests from the root of your module.