Web Scraping for the Business Novice

by Tom Fast

By the latest estimates, there are over 30B websites, up from 3M in 1999. The successful companies and business people of the 21st century will need to find ways to organize and exploit this source of information. One technique for this job is called web scraping. Web scraping used to be a very tedious and technical exercise, but with new software packages that is no longer the case.

Why might you want to use web scraping? The applications are numerous: you want to pull client contact info from a national organization's website, you want to pull historical options prices into a financial model, or maybe you just want to pull your favorite baseball team's stats off MLB.com to impress your colleagues.

There are several software packages on the market, but the one I chose was Mozenda. The company offers a 14-day free trial, which is more than enough to test your scraping project; after the trial period, a basic package costs $99 a month. You can also view product videos that demonstrate the process on the company’s website.

So how does it work? The program's interface is completely graphical; plucking key data items off a page feels much like working in PowerPoint. While understanding basic website logic is necessary, coding is not. In fair disclosure, I do code HTML and Visual Basic, but I did not need those skills to master basic scraping. The web scraping program identifies the page coding surrounding your desired data and saves that logic so it can be run on multiple pages (for example, if you wanted to pull data from every Major League Baseball team page rather than just your favorite's). Once you choose your data items, you can cycle through any number of websites as part of the data project. You can load Excel lists of URLs or simply scrape the URLs off another webpage. The scraping projects can be run on command or scheduled to run automatically. The data can then be downloaded to Excel or a database program for your use.
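
For readers curious about what a tool like this does behind the graphical interface, here is a minimal Python sketch of the same idea: find the page coding around a data item, apply that rule to several pages, and export the results to a spreadsheet-friendly file. It is not how Mozenda itself works. The class name, the function names, the "wins" label, and the example.com pages are all made up for illustration, and the sample pages are short strings standing in for real downloaded pages.

```python
import csv
import io
from html.parser import HTMLParser


class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell carrying a given class name."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # The "page coding surrounding your data": a <td> with the target class.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.target_class in classes:
            self._inside = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data


def scrape(html, target_class):
    """Return the stripped text of each matching cell on one page."""
    parser = CellExtractor(target_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


def scrape_many(pages, target_class):
    """Apply the same extraction rule to many pages (a dict of URL -> HTML)."""
    rows = []
    for url, html in pages.items():
        for value in scrape(html, target_class):
            rows.append((url, value))
    return rows


def to_csv(rows):
    """Render the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "value"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample pages standing in for downloaded team pages.
SAMPLE_PAGES = {
    "https://example.com/team-a": '<table><tr><td class="wins">90</td></tr></table>',
    "https://example.com/team-b": '<table><tr><td class="wins">85</td></tr></table>',
}

if __name__ == "__main__":
    print(to_csv(scrape_many(SAMPLE_PAGES, "wins")))
```

In a real project the page text would be downloaded from each URL rather than typed in, and the rule would be tailored to the site in question. A graphical scraping package spares you from writing any of this by hand.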

Give web scraping a try; I think it will be a great asset to your information-gathering toolkit. Contact me via LinkedIn if you'd like to discuss business applications of scraping.
