Harvesting the Web:  Using PowerShell to Scrape Screens, Exploit Web Services and Save Time… Even If You Don’t Know PowerShell or Web Services!

  • For many IT pros, our days start with the same activity: surfing to catch up. We skim our Twitter feeds, view and perhaps download The Picture of the Day from some amazing photo site, check the weather, peek at eBay to see if that prized item is selling under $100 yet, or whatever. That’s great, but what isn’t so great is that most of us are gathering those data the same way we’ve been doing it for twenty years: by clicking around in a web browser. Hey, it’s the 21st century, and IT pros hoping to remain employed are automating things, so it’s time to turn The Robot Hand of PowerShell to our daily web data-gathering tasks, whether personal or business-related.

    PowerShell includes a number of little-known but essential web-harvesting cmdlets (Invoke-WebRequest and Invoke-RestMethod, among others) that let you automate web surfing and data interpretation. But unless you’re already familiar with the workings of “screen scraping,” “web services,” “SOAP” and the like, you probably don’t have the time to figure them out… unless you attend this session, created and delivered by best-selling tech author and speaker Mark Minasi, “the Chief Explainer of the Really Complex.”

    This talk starts by demonstrating simple web page harvesting: downloading files and pulling specifically targeted text from web pages with the help of “regular expressions” or “regexes.” Then we move to a quick explanation of accessing information from “web services,” a more automation-friendly source of information. Many web services are publicly available on the Web, or in some cases you might be drawing data from an in-house web service. Much of the data returned by web services is packaged as XML or “JSON” objects, so you’ll next learn about the PowerShell tools for cracking open XML and JSON objects. Again, you needn’t be an expert in any of these things to make them work, as Mark makes it all easy, whether regex, JSON, or REST. Attend this session and become The Automation Master of the Web!
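To give a rough flavor of the three techniques the abstract names, here is a minimal sketch; it is not taken from the session itself. The sample HTML, JSON, and XML strings, the field names, and the example.com URLs in the comments are all made-up stand-ins so the snippet runs without a network connection.

```powershell
# Hypothetical sample page text. With network access, something like
#   $html = (Invoke-WebRequest -Uri 'https://example.com/photos').Content
# could supply real page text instead (the URL is an assumption).
$html = @'
<html><body>
<a href="/img/sunrise.jpg">Sunrise</a>
<a href="/img/harbor.png">Harbor</a>
<a href="/about.html">About</a>
</body></html>
'@

# Screen scraping: use a regex with a named group to pull out only image links.
$pattern = 'href="(?<path>[^"]+\.(?:jpg|png))"'
$imagePaths = [regex]::Matches($html, $pattern) |
    ForEach-Object { $_.Groups['path'].Value }

# JSON: ConvertFrom-Json turns JSON text into objects with ordinary properties.
# (Invoke-RestMethod performs this conversion itself for JSON responses.)
$json = '{"city":"Ottawa","forecast":[{"day":"Mon","highC":21},{"day":"Tue","highC":18}]}'
$weather = $json | ConvertFrom-Json
$warmest = $weather.forecast | Sort-Object highC -Descending | Select-Object -First 1

# XML: the [xml] cast parses the text so elements can be walked with dot notation.
[xml]$doc = '<items><item id="1"><price>95.50</price></item><item id="2"><price>120.00</price></item></items>'
$bargains = $doc.items.item | Where-Object { [decimal]$_.price -lt 100 }

$imagePaths
"Warmest day: $($warmest.day)"
"Item under 100: $($bargains.id)"
```

Swapping the hard-coded strings for the output of Invoke-WebRequest or Invoke-RestMethod is the only change needed to point the same parsing steps at a live page or service.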

 


  • Speaker Sponsors

     

    The following companies have graciously sponsored some of our speakers for PowerShell Saturday 010! (Sponsor logos: Concurrency, Integrity, Cistel, SAPIEN.)