Our link tracker will need to know how to read a page of HTML text and extract links.
For example, the following HTML page has a single link to https://blog.boot.dev:
<html>
<body>
<a href="https://blog.boot.dev"><span>Go to Boot.dev</span></a>
</body>
</html>
We'll use a third-party HTML parsing library called JSDOM to find and extract links.
We want to write a new function called getURLsFromHTML in the crawl.ts file. It takes two arguments: the first is an HTML string, and the second is the root URL of the website we're crawling, which allows us to rewrite relative URLs into absolute URLs. It returns an un-normalized array of all the URLs found within the HTML.
function getURLsFromHTML(html: string, baseURL: string)
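To make the contract concrete, here's what a call might look like. This is a sketch, not the required behavior down to the character: the page and its relative href are hypothetical, and the output assumes relative hrefs are resolved against baseURL.

const html = '<html><body><a href="/learn"><span>Go to Boot.dev</span></a></body></html>'
getURLsFromHTML(html, 'https://blog.boot.dev')
// => [ 'https://blog.boot.dev/learn' ]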
Here are some ideas for writing your tests: extracting an absolute URL, converting a relative URL to an absolute one using the base URL, finding several links on a single page, and handling a page with no links at all.
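For instance, the first two ideas might look something like this in vitest. This is a sketch: it assumes you export getURLsFromHTML from crawl.ts, and the expected strings assume relative hrefs are resolved with the standard URL constructor.

import { expect, test } from 'vitest'
import { getURLsFromHTML } from './crawl'

test('getURLsFromHTML finds an absolute URL', () => {
  const html = '<html><body><a href="https://blog.boot.dev/path">Boot.dev Blog</a></body></html>'
  expect(getURLsFromHTML(html, 'https://blog.boot.dev')).toEqual(['https://blog.boot.dev/path'])
})

test('getURLsFromHTML converts a relative URL to an absolute one', () => {
  const html = '<html><body><a href="/path">Boot.dev Blog</a></body></html>'
  expect(getURLsFromHTML(html, 'https://blog.boot.dev')).toEqual(['https://blog.boot.dev/path'])
})

Before these tests can even compile, you'll need to install jsdom and its type definitions: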
npm install jsdom
npm install -D @types/jsdom
This will install jsdom as a "dependency" (as opposed to vitest, which is a "devDependency" and was installed with the -D flag). "Dev dependencies" are not required to run your application; they're only required for development (like testing). Regular dependencies are required to run the program itself.
I'll try not to give too many hints: you should go read the JSDOM docs! That said, here are a few:

- import { JSDOM } from 'jsdom'
- new JSDOM(htmlBody) creates a new "document object model"
- dom.window.document.querySelectorAll('a') returns a NodeList (an array-like collection) of <a> "anchor" elements

In HTML, "anchors" are links, e.g.:

<a href="https://boot.dev">Learn Backend Development</a>
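Putting those hints together, one possible shape for the function is sketched below. This is a sketch under assumptions, not the official solution: it assumes you resolve relative hrefs with the standard URL constructor, and it skips hrefs that can't be parsed.

import { JSDOM } from 'jsdom'

export function getURLsFromHTML(html: string, baseURL: string): string[] {
  const urls: string[] = []
  const dom = new JSDOM(html)

  // NodeList of every <a> element in the document
  const anchors = dom.window.document.querySelectorAll('a')

  for (const anchor of anchors) {
    // getAttribute returns the href exactly as written in the HTML,
    // which may be relative ("/path") or absolute
    const href = anchor.getAttribute('href')
    if (!href) continue

    try {
      // new URL(href, baseURL) resolves relative hrefs against baseURL
      // and leaves absolute hrefs as they are
      urls.push(new URL(href, baseURL).toString())
    } catch {
      // ignore hrefs that can't be parsed as URLs
    }
  }

  return urls
}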
Once you're satisfied that your function works as expected, move on to the next step!