Why write a web crawler with Go and not use concurrency? Let's speed this bad boy up with some goroutines!
```go
type config struct {
	pages              map[string]PageData // crawl results, keyed by normalized URL
	baseURL            *url.URL            // the root URL of the crawl
	mu                 *sync.Mutex         // guards concurrent access to the pages map
	concurrencyControl chan struct{}       // buffered channel capping in-flight requests
	wg                 *sync.WaitGroup     // lets main wait for all goroutines to finish
}
```
A few notes on the fields:

- All the crawled data lives in one shared place (the `pages` map)
- The starting point of the crawl is parsed once and shared (`baseURL`)
- The `pages` map is thread-safe (`mu` Mutex)
- The number of simultaneous requests is capped (the `concurrencyControl` channel). It's a buffered channel of empty structs. When a new goroutine starts, we'll send an empty struct into the channel. When it's done, we'll receive an empty struct from the channel. This will cause new goroutines to block and wait until the buffer has space for their "send". (For example, a buffer size of 5 means at most 5 requests at once) See the sketch after this list.
- The `main` function waits until all in-flight goroutines (HTTP requests) are done before exiting the program (`wg` WaitGroup)
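To see the pattern in isolation, here's a minimal runnable sketch of the buffered-channel limit plus the wait group. The URL list and the `fmt.Println` are stand-ins for real HTTP requests, and the buffer size of 5 mirrors the example above:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Buffered channel: at most 5 goroutines can hold a "slot" at once.
	concurrencyControl := make(chan struct{}, 5)
	var wg sync.WaitGroup

	urls := []string{"https://example.com/a", "https://example.com/b", "https://example.com/c"}
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			concurrencyControl <- struct{}{} // blocks while all 5 slots are taken
			defer func() {
				<-concurrencyControl // free the slot...
				wg.Done()            // ...and mark this goroutine finished
			}()
			fmt.Println("fetching", u) // stand-in for the real HTTP request
		}(u)
	}

	wg.Wait() // main blocks until every in-flight goroutine is done
}
```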
Next, make `crawlPage` a method on the config:

```go
func (cfg *config) crawlPage(rawCurrentURL string)
```

We remove some parameters because they're available via the struct now.
I created this method to call as a helper inside of `crawlPage`:

```go
func (cfg *config) addPageVisit(normalizedURL string) (isFirst bool)
```
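If you want a reference point, here's one plausible shape for that helper, assuming the `config` struct above. `PageData`'s fields aren't shown in this lesson, so its zero value stands in for whatever you actually store:

```go
func (cfg *config) addPageVisit(normalizedURL string) (isFirst bool) {
	cfg.mu.Lock() // the pages map is shared across goroutines
	defer cfg.mu.Unlock()
	if _, visited := cfg.pages[normalizedURL]; visited {
		return false // we've seen this page before; don't crawl it again
	}
	cfg.pages[normalizedURL] = PageData{} // placeholder entry for a first visit
	return true
}
```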
The `addPageVisit` method returns a boolean to indicate if it's the first time we've seen the page.

Use `defer` at the top of the goroutine to decrement the wait group (`cfg.wg.Done()`) and to receive from the (`<-cfg.concurrencyControl`) channel. This ensures that the `wg` is decremented and the channel is emptied even when the goroutine errors and returns early. Don't forget to update the `main` function; a sketch of how the pieces fit together follows.
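To tie it together, here's a rough sketch of how `crawlPage` and `main` might look with the deferred cleanup at the top. This is not the official solution: `extractLinks` is a hypothetical stand-in for the project's real HTML parsing, the URL-normalization step is reduced to `currentURL.String()`, and error handling is trimmed. It builds on the `config` struct and `addPageVisit` above, and needs the `io`, `log`, `net/http`, `net/url`, and `sync` imports:

```go
func (cfg *config) crawlPage(rawCurrentURL string) {
	cfg.concurrencyControl <- struct{}{} // take a slot; blocks while the buffer is full
	defer func() {
		<-cfg.concurrencyControl // empty the channel even on early return
		cfg.wg.Done()            // decrement the wait group even on early return
	}()

	currentURL, err := url.Parse(rawCurrentURL)
	if err != nil {
		return
	}
	if currentURL.Hostname() != cfg.baseURL.Hostname() {
		return // stay on the starting domain
	}

	if isFirst := cfg.addPageVisit(currentURL.String()); !isFirst {
		return // already recorded; skip the HTTP request
	}

	resp, err := http.Get(rawCurrentURL)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return
	}

	for _, nextURL := range extractLinks(string(body)) {
		cfg.wg.Add(1) // register before spawning so main can't exit early
		go cfg.crawlPage(nextURL)
	}
}

// extractLinks is a hypothetical stand-in for the project's real HTML parsing.
func extractLinks(html string) []string { return nil }

func main() {
	baseURL, err := url.Parse("https://example.com") // hypothetical start URL
	if err != nil {
		log.Fatal(err)
	}
	cfg := &config{
		pages:              make(map[string]PageData),
		baseURL:            baseURL,
		mu:                 &sync.Mutex{},
		concurrencyControl: make(chan struct{}, 5), // at most 5 requests at once
		wg:                 &sync.WaitGroup{},
	}
	cfg.wg.Add(1)
	go cfg.crawlPage(baseURL.String())
	cfg.wg.Wait() // exit only after every in-flight goroutine is done
}
```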
When you're satisfied with the results, you can move on.