Ssscrape 1.0 Collects Dynamic Web Data
The Ssscrape tool screen-scrapes data from RSS and Atom feeds, blogs and podcasts. The open source software is now available in version 1.0.
Ssscrape monitors feeds and similar collections for updates, downloads new items, and cleans the content by converting HTML to plain text. It stores its data in a MySQL database. The tool can also gather statistics about feed activity and report errors. A scheduler handles the periodic checks, and a monitor displays the running activities.
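To give a rough idea of what converting HTML to plain text involves, here is a minimal sketch that uses only the Python standard library. It is not taken from Ssscrape, which relies on Beautiful Soup and targets Python 2.4; the names TextExtractor and html_to_text are hypothetical, and the code assumes Python 3.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []      # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join fragments and collapse all whitespace runs to single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML fragment as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>First &amp; second</p><style>p{}</style>"
    print(html_to_text(sample))
```

Because fragments are joined with a space, text split by inline tags next to punctuation can pick up an extra space; a production cleaner would need to distinguish block-level from inline elements.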
Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a Web crawler: a program that gathers information from the Web. The scraper is written in Python, with Twisted used for network programming and Beautiful Soup, which is not always standards-based, used for parsing HTML/XML content.
Ssscrape was developed in the Information and Language Processing Systems (ILPS) department of the University of Amsterdam and is licensed under the LGPLv3. Ssscrape 1.0 requires Python 2.4 and is available for download as a tarball from the project page.