Identify situations where the async module would be needed
Utilize async control flow functions for asynchronous function calls
What is Async?
It helps to know the dictionary definitions of synchronous and asynchronous:
synchronous: happening, existing, or arising at exactly the same time
asynchronous: not simultaneous or concurrent in time
Async is a JavaScript module available for Node.js and the browser. It provides a set of utilities for working with functions asynchronously. We'll be focusing our attention to Async's control flow functions, mainly series, parallel, concat, and waterfall.
Why Use Async?
There are situations where functions will need to be run without conflicting with each other. This is one of the benefits/drawbacks of asynchronous behavior in JavaScript. A great example is data scraping multiple pages. If you were to data scrape pages using a loop, the requests would all run at the same time.
var request =require('request');var urls = ['http://www.google.com','http://www.yelp.com','http://www.twitter.com'];var allData = [];//this would work correctly to a point, but would be disastrous if there were, say, 1000 URLs in the urls arrayurls.forEach(function(url) {request(url,function(err, response, data) { allData += data; });});//this would not work correctlyconsole.log(allData);
Using the async module will allow us to control the flow of how these functions are run, and when we can access all of the data.
Using Async
First, let's install the async module.
npm install async
Series
The .series function can pass in an array of functions, along with a callback function to execute once the array of functions is completed. Each function in the array takes in a callback function as a parameter, then executes the callback when the function is finished (passing an error and results). Functions will be executed in order until completed, then the final callback is executed.
The .parallel function also accepts an array of functions, along with a callback function. However, the functions aren't executed one after another. They start in parallel, and the final callback is executed once the array of functions is done executing. Note that the functions are started, but not executed, in parallel. So if the functions don't take any time to complete, they'll execute in series (just as before).
The .waterfall function is similar to .series, but allows arguments to be passed from one function to the next. This is ideal if multiple API calls are being made, and the results of one call are required for the next call.
Lastly, .concat allows you to pass in an array of values and an iterator (instead of passing in an array of functions). This allows functions to be more reusable, as in the instance of making requests. These functions occur in parallel.
var async =require('async');var request =require('request');//urls to make requests tovar urlsToGet = ['https://www.reddit.com/search.json?q=politics','https://www.reddit.com/search.json?q=puppies','https://www.reddit.com/search.json?q=drake'];//create an iterator to get the first title for each URLvargetFirstTitle=function(url, callback) {request(url,function(err, response, data) {var firstTitle =JSON.parse(data).data.children[0].data.title;callback(null, firstTitle); });};//get each title, then print the resultsasync.concat(urlsToGet, getFirstTitle,function(err, results) {console.log(results);});
Concat Example 2: Seattle Neighborhoods
For those who created the Seattle neighborhoods scraper, we didn't forget about you! Here's an example that takes the neighborhood data we scraped and retrieves the description for each Seattle neighborhood. Comments are provided, so take a moment and decipher the code.
// importing modulesvar request =require('request');var cheerio =require('cheerio');var async =require('async');// // make a request to the Visit Seattle and scrape neighorhood names and linksrequest('http://www.visitseattle.org/things-to-do/neighborhoods/',function (error, response, data) {var $ =cheerio.load(data);// scrape neighborhood names and links, and store as an array of objectsvar neighborhoods =$('.info-window-content').map(function(index, element) {return { name:$(element).find('h4').text(), link:$(element).find('a').attr('href') }; }).get();// console.log("printing neighborhoods from request:", neighborhoods);// // pass the neighborhood data to fetchDescriptionsfetchDescriptions(neighborhoods);});// this function takes neighborhood data and requests descriptions from each linkfunctionfetchDescriptions(neighborhoodData) {// using async.parallel to retrieve data from all neighborhoods at onceasync.concat(neighborhoodData, getNeighborhoodDescription,function(err, results) {// here are the neighborhoods with descrptions. Let's print them outresults.forEach(function(result) {console.log(result.name);console.log(result.description);console.log('----------------'); }); });}// this function runs for each neighborhood link. We'll go and scrape a descrptionfunctiongetNeighborhoodDescription(neighborhood, cb) {request(neighborhood.link,function(error, response, data) {var $ =cheerio.load(data);// get the first paragraph (the description) and store in the neighborhood objectneighborhood.description =$('h2').first().text();// return the appended neighborhood object to the async.concat callcb(null, neighborhood); });}
Wrapup
This is just a sampling of the various utilities included in the async module. Some other useful functions include .forever, .until, .queue, and .times. For usage samples, see the documentation: