Multilingual programming for retrieving web pages

Tower of Babylon

© Lead Image © Maksym Shevchenko, 123RF.com


Article from Issue 201/2017
Author(s):

We show you how to whip up a script that pulls an HTTP document off the web and how to find out which language offers the easiest approach.

Few programming tasks illuminate the differences between commonly used languages as clearly as retrieving a web document. In shell scripts, admins often turn to the curl utility, which fetches the data behind a URL without much ado and writes it to standard output.

But what if the URL points to a black hole? Or the server denies access? And what if the server returns a redirect? For example, curl http://google.com does not return the expected HTML page with the search form but just a note that the desired page may be available on www.google.com. Armed with the -L option, however, curl follows the redirect and returns the data it finds at the new location.
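Each of these questions corresponds to a separate outcome that a script has to tell apart. The following is a minimal sketch, assuming Python's standard urllib module as the example language (the excerpt does not say which languages the article compares); the function names fetch and describe are hypothetical and chosen only for illustration.

```python
import urllib.error
import urllib.request


def fetch(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Return (final_url, body); urllib follows redirects by default."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # geturl() reports the address reached after any redirects.
        return response.geturl(), response.read()


def describe(url: str) -> str:
    """Fetch a URL and summarize the outcome in one line (hypothetical helper)."""
    try:
        final_url, body = fetch(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return f"server refused: HTTP {err.code}"
    except urllib.error.URLError as err:
        # No usable answer at all, e.g. a DNS failure or a refused connection.
        return f"unreachable: {err.reason}"
    return f"{len(body)} bytes from {final_url}"
```

Because urllib follows redirects on its own, calling describe("http://google.com") should behave much like curl with the -L option and report the address it finally landed on, rather than the redirect note.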

What happens with a huge file, such as a 4K movie weighing in at many gigabytes? Will the process exhaust your RAM because it attempts to swallow everything in a single gulp? Does encryption work automatically for an HTTPS URL, which relies on SSL or its successor TLS, and does the utility check the server's certificate correctly so that it does not fall victim to a man-in-the-middle attack? Like good old curl, popular programming languages offer all of this, although often only as an add-on package and often requiring quirky approaches.
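The memory and certificate questions can be illustrated in the same way. This sketch again assumes Python's standard library; download, verified_context, and the 64KB chunk size are hypothetical choices for illustration, not taken from the article. It reads the response in fixed-size pieces so that only one chunk is held in memory at a time, and it passes an explicit SSL context so that the certificate check is visible in the code.

```python
import ssl
import urllib.request

CHUNK_SIZE = 64 * 1024  # bytes held in memory at once; an arbitrary choice


def verified_context() -> ssl.SSLContext:
    """Return a context that checks the certificate chain and the hostname."""
    return ssl.create_default_context()


def download(url: str, dest_path: str, timeout: float = 30.0) -> int:
    """Stream url into dest_path and return the number of bytes written."""
    written = 0
    # The context only comes into play for https:// URLs.
    with urllib.request.urlopen(
        url, timeout=timeout, context=verified_context()
    ) as response:
        with open(dest_path, "wb") as out:
            while True:
                chunk = response.read(CHUNK_SIZE)
                if not chunk:  # an empty read marks the end of the body
                    break
                out.write(chunk)
                written += len(chunk)
    return written
```

Recent Python versions already verify certificates for HTTPS URLs by default; spelling out the context mainly documents the intent and offers a single place to adjust trust settings.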

[...]


