Let's start crawling real web pages! For these remaining steps, you'll need a website you can crawl, preferably a small one with fewer than 100 pages so the crawling doesn't take all day. You can use my personal blog, https://wagslane.dev, if you don't have another in mind.
func getHTML(rawURL string) (string, error)
For now, your function should:
You may find io.ReadAll helpful in reading the response.
I'd argue that it's not necessary to create unit tests for a function like getHTML. It consists primarily of side effects (internet access), so it's not a pure function like normalizeURL and getURLsFromHTML. Most of the "logic" in getHTML is just a couple of standard library function calls, and there's not much reason to test the standard library.
Run and submit the CLI tests.
Notice that they're grabbing some HTML from the main page of Wikipedia.