Setting up a data analytics environment in Linux with Python

Down in the Mine

© Photo by Luca Maffeis on Unsplash

© Photo by Luca Maffeis on Unsplash

Article from Issue 259/2022
Author(s):

The Knowledge Discovery in Data Mining (KDD) method breaks the business of data analytics into easy-to-understand steps. We'll show you how to get started with KDD and Python.

Data analytics is a major force in the current zeitgeist. Analytics are the eyes and ears on a very wide variety of domains (society, climate, health, etc.) to perform an even wider variety of tasks (such as understanding commercial trends, the spread of COVID-19, and finding exoplanets). In this article, I will discuss some fundamentals of data analytics and show how to get started with analytics in Python. Finally, I will show the whole process at work on a simple data analytics problem.

A Primer on Data Analytics

Data analytics uses tools from statistics and computer science (CS), such as artificial intelligence (AI) and machine learning (ML), to extract information from collected data. The collected data is usually very complex and voluminous, and it cannot be interpreted easily (or at all) by humans. Therefore, the data on its own is useless. Information lies hidden within the data, and it takes many forms: repeating patterns, trends, classifications, or even predictive models. You can use this data to uncover insights and build knowledge of the problem you are studying. For example, suppose you wish to measure the traffic in a parking lot that is monitored by a network of IoT sensors covering the whole city. Reading a single occupancy sensor doesn't say anything about the traffic on its own. Neither do the readings of all the parking sensors of the city without any more context. But the timestamped percentage of occupied places within the monitored parking lot does tell us something, and we use this information to derive insights, such as the times of day with maximum traffic.

Learning the mathematical background and analytics tools is only half the journey. Field expertise (experience on the problem that is being studied) is equally important. Some data scientists come from a statistics background, others are computer scientists who pick up the statistics as they go, and many are people starting from a field of expertise who need to learn both the statistics and the computing tools.

[...]

Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Data Visualization in Python

    Python's powerful Matplotlib, Bokeh, PyQtGraph, and Pandas libraries lend programmers a helping hand when visualizing complex data and their relationships.

  • PyScript

    PyScript lets you use your favorite Python libraries on client-side web pages.

  • JSON Deep Dive

    JSON data format is a standard feature of today's Internet – and a common option for mobile and desktop apps – but many users still regard it as something of a mystery. We'll take a close look at JSON format and some of the free tools you can use for reading and manipulating JSON data.

  • CircuitPython

    The CircuitPython run-time environment runs on almost all microcomputers and microcontrollers, making it perfect for cross-platform programming.

  • Julia on the Pi

    Create GUIs and a web app that connects to sensors.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News