What Comes After

Welcome

Article from Issue 290/2025
Author(s): Joe Casad

Dear Reader,

It is well known that many of our most excruciating arguments about religion and philosophy are secretly arguments about definitions. To that I will add: it is quite a novel thing when the definition is in the foreground and everyone knows that is what they are arguing about.

A doozy of a definition argument is brewing right now at the Open Source Initiative (OSI). The OSI recently released version 1.0 of its definition of "Open Source AI" [1]. At a glance, the definition appears to fall squarely within the basic open source ethos. The user is accorded the right to:

  • Use the system for any purpose and without having to ask for permission.
  • Study how the system works and understand how its results were created.
  • Modify the system for any purpose, including to change its output.
  • Share the system for others to use with or without modifications, for any purpose.

So far so good, right? But the thing is, it all depends on what you call "the system." To the OSI, "the system" is the initial software used as a starting point for developing an AI model – not the model itself. In other words, the definition allows for secrecy in the training data used to build the model. If you can't fully see how the model was trained, you really can't study it and "understand how its results were created." As security expert Bruce Schneier and others have pointed out, "the training data is the source code – it's how the model gets programmed" [2].

To be fair, the Open Source AI Definition (OSAID) does call for sharing "sufficiently detailed information about the data used to train the system so that a skilled person can build a substantially equivalent system." But they are a little cagey in the follow-through, requiring only what they call a "complete description" of the data used for training (as opposed to the data itself). They also make reference to "unshareable data" without fully explaining why the data would be unshareable. Somewhere in the commentary, they give an example of medical data, but they don't really restrict the term to legally unshareable data and appear open to the possibility that "unshareable" could be a business choice. If the data is obtained from a third party, you have to say where you got it, but again, you don't have to provide the data. In the FAQ that accompanies the definition [3], the OSI states that, for purposes of the definition, "training data does not equate to software source code."

As many have pointed out, the OSI is largely funded by high-tech companies that are actively competing against each other in AI. Google, Microsoft, and Meta are all sponsors, and these mega-vendors have a strong incentive to protect their business interests when it comes to AI development. On the other hand, part of the goal of the OSI is to stay relevant. If the definition is too strict, no one will follow it, and any potential benefits, such as disclosure of source code and free distribution, will be lost. The OSI has always been the pragmatist, as opposed to the Free Software Foundation, which has always taken a harder line on rights and principles.

Many experts have pointed to the dangers of drifting into a future where society is run by AI programs trained on secret data. Those dangers are as stark as ever. If you're wondering whether the open source definition will be a useful tool in combating the problem, the answer so far appears to be no.

Some are asking whether the open source definition is still even useful for promoting free software. IBM/Red Hat's legal gambit for restricting access to RHEL source code is one example of big companies looking for loopholes to slip around the open source concept. DRM is another, and the whole idea of the web-based service architecture poses a significant problem for the definition. (If the software runs on a web server, is it distributed in a way that would trigger the need to share changes as defined in the GPL?) Bruce Perens, creator of the original open source definition, has already said he is looking for "what comes after open source" [4].

Looks like we'll all be looking.

Editor in Chief, Joe Casad

Infos

  1. Open Source AI Definition: https://opensource.org/ai/open-source-ai-definition
  2. "AI Industry is Trying to Subvert the Definition of Open Source AI" by Bruce Schneier, November 8, 2024: https://securityboulevard.com/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai/
  3. OSAID FAQ: https://hackmd.io/@opensourceinitiative/osaid-faq
  4. "What Comes After Open Source? Bruce Perens is Working On It" by Thomas Claburn, The Register, December 27, 2023: https://www.theregister.com/2023/12/27/bruce_perens_post_open/
