Multilingual programming for retrieving web pages
Functional Node.js
If you want to retrieve a URL in a snippet of JavaScript in the browser, or do something similar in Node.js code on the server side, for example, on an Amazon Lambda server [2], you need to switch your brain to functional programming mode. After all, event-based systems do not follow the paradigm of "Do this, wait until it is finished, then do that." Instead, they want to receive their instructions in the form of "Do this, then this, then this… and go."
The reason for this is the event loop, which runs only short callbacks and then wants control back. It jumps in again whenever data slowly trickles in from external interfaces. This structure makes code harder to read, and it takes considerable experience to design software components so that they interact well and remain easy to maintain.
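To make the hand-back of control tangible, here is a minimal sketch. The order array is a hypothetical helper introduced purely for illustration; it records when each statement actually runs.

```javascript
// Records the order in which the statements actually run.
const order = [];

order.push('start');

// The callback is only queued here. The event loop runs it later,
// once the current synchronous code has handed control back.
setTimeout(() => {
  order.push('callback');
  console.log(order.join(' -> ')); // prints "start -> end -> callback"
}, 0);

order.push('end');
```

Even with a delay of zero, the callback runs only after the last synchronous line has finished.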
The dreaded pyramid of doom [3], composed of nested callbacks, can be resolved by several helper constructs. Node 7.6 now even comes with support for the async and await keywords, which force asynchronous code into a synchronous straitjacket to make things look tidier [4].
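The following sketch contrasts the two styles. The functions addOne, addOneAsync, pyramid, and flat are hypothetical stand-ins invented for this illustration; a timer takes the place of real I/O.

```javascript
// Hypothetical asynchronous step: reports value + 1 via a Node-style callback.
function addOne(value, callback) {
  setTimeout(() => callback(null, value + 1), 10);
}

// Callback style: every further step nests one level deeper.
function pyramid(start, done) {
  addOne(start, (err, a) => {
    if (err) return done(err);
    addOne(a, (err, b) => {
      if (err) return done(err);
      addOne(b, (err, c) => {
        if (err) return done(err);
        done(null, c);
      });
    });
  });
}

// Promise wrapper so that the same step can be awaited.
function addOneAsync(value) {
  return new Promise((resolve, reject) => {
    addOne(value, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// async/await style: the same three steps, written top to bottom.
async function flat(start) {
  const a = await addOneAsync(start);
  const b = await addOneAsync(a);
  const c = await addOneAsync(b);
  return c;
}

pyramid(0, (err, result) => console.log('pyramid:', result));
flat(0).then((result) => console.log('flat:', result));
```

Both variants arrive at the value 3. The second one is still asynchronous under the hood; only the notation reads sequentially.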
Listing 4 shows a get call of the HTTP module in Node.js. In addition to the URL for the web document, it expects a function. This is called later with a response object and defines a closure with a variable (content) and three callbacks for the events data, error, and end.
Listing 4
http-get.js
The data event gets triggered whenever a bunch of data arrives from the server. Its callback collects the data chunks one by one and reassembles them in the content variable. The error callback gets involved in case of an error and writes the reason to the log in line 11. When the server signals the end of the transmission, the event loop jumps to the end callback, which in line 15 outputs the content of content, where all the body data in the HTTP response is now located. Note that the Node.js http library does not automatically follow redirects.
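The mechanics described here can be sketched roughly as follows. This is not the article's Listing 4, only an approximation under some assumptions: fetchBody is a hypothetical wrapper name, and a throwaway local server on a random free port stands in for a real website so that the sketch runs without network access. Redirects are not handled.

```javascript
const http = require('http');

// Fetches url and passes the complete response body to done(body).
// Errors are only written to the log in this sketch.
function fetchBody(url, done) {
  http.get(url, (res) => {
    let content = ''; // closure variable that collects the body
    res.setEncoding('utf8');
    // data: append each arriving chunk
    res.on('data', (chunk) => { content += chunk; });
    // error: log the reason
    res.on('error', (err) => console.error('response error:', err.message));
    // end: the body is complete
    res.on('end', () => done(content));
  }).on('error', (err) => console.error('request error:', err.message));
}

// Stand-in server (an assumption for this sketch, not part of the article).
const server = http.createServer((req, res) => {
  res.end('hello from the local server\n');
});

server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  fetchBody(`http://127.0.0.1:${port}/`, (body) => {
    process.stdout.write(body);
    server.close();
  });
});
```

To fetch a real page, pass its http:// URL to fetchBody instead; for https:// URLs, the https module offers a get call of the same shape.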
Good Old Perl
Good Old Perl traditionally retrieves web documents with the CPAN LWP::UserAgent module. SSL support is not automatic but gets magically added if the admin retroactively installs the CPAN LWP::Protocol::https module, which depends on the availability of an OpenSSL installation and a list of root certificates.
Listing 5 also shows a peculiarity, along with correct error handling: Like some other libraries presented here, it automatically follows redirects and identifies the encoding of google.de as ISO-8859-1, but it returns a UTF-8 string from decoded_content() (as opposed to content()). That is a good thing, because processing the data in the program code often relies on UTF-8; anything else tends to end in ugly-looking mangled text.
Listing 5
http-get.pl
To output a UTF-8 string as such without modification using print, the script first needs to switch stdout to UTF-8 mode with the help of binmode. This rather elaborate procedure exists for compatibility reasons and at least ensures that old scripts from the early days of Perl's UTF-8 support don't freak out when they meet new versions of Perl.
Yeah, old age is not a piece of cake, when all of your joints are aching and the young folks are turning somersaults!
Infos
[1] Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/magazine/201
[2] "Equipping Alexa with Self-Programmed Skills" by Michael Schilli, Linux Magazine, issue 199, June 2017: http://www.linux-magazine.com/Issues/2017/199/Programming-Snapshot-Alexa
[3] "Pyramid of Doom" by Mike Schilli, Linux Magazine, issue 170, January 2015: http://www.linux-magazine.com/Issues/2015/170/Perl-Asynchronous-Code
[4] "Node 7.6 Brings Default Async/Await Support" by Sergio De Simone: https://www.infoq.com/news/2017/02/node-76-async-await