I saw a new article - Gentoo refusing AI generated updates

lyallp

If this is true, then....

GO Gentoo! Awesome!

I guess there will be some push back.

The other thing is I guess it will be kinda hard to differentiate AI generated and human generated.

_________________
...Lyall

rfx · Posted: Wed Apr 17, 2024 6:17 am

Do you have a source for this information?
Over the last 30 years I have often read that the world is about to end. Hasn't happened yet.

lyallp · Posted: Wed Apr 17, 2024 6:20 am

https://www.theregister.com/2024/04/16/gentoo_linux_ai_ban/

The end is getting close, with the xz hack, the Middle East, inflation, climate change, cost of living, sheesh, my keyboard just wore out listing the issues....

_________________
...Lyall

Chiitoo · Posted: Wed Apr 17, 2024 6:44 am

Here's a quote from the mail quoting the approved statement:

kgdrenefort · Posted: Wed Apr 17, 2024 8:37 am

Hello,

This may sound naive, and I want to say up front that I agree with the decision. Still, as a baby wannabe dev (I'm bad and not skilled or experienced enough to have a proper idea of all of this), I'm wondering: why not use AI for… code checking?

I know that is not something you set up by snapping your fingers, of course.

I do understand that using AI is, mostly, a bad idea. I could use it myself for the Gentoo French translation, either for full translation (what is the point of a human, then?) or partially (which I did a few times, because I wasn't able to properly translate a difficult sentence, simple as that), but I avoid it.

But what about checking code for "obvious" but unseen errors, bugs, or worse, security holes?

As Linus said about AI coming into the dev/IT field:

saturnalia0 · Tux's lil' helper Joined: 13 Oct 2016 Posts: 136

Policy (WIP): https://wiki.gentoo.org/wiki/Project:Council/AI_policy

Let me start by saying I just contribute helping others online and donating when I can, I'm not the one actually maintaining the software, so take my opinion for what it's worth...

As for the stated quality concerns, I understand that some people blindly use whatever AI spits out, just like some people copy and paste from StackOverflow without understanding what they were copying. Should StackOverflow be banned? How would one even enforce either one of these? Human code reviews are still required to merge...

As for the stated ethical concerns, yes, AI requires computational power, so do thousands/millions of Gentoo boxes compiling things. Should Gentoo switch to binary distribution, then? Using an IDE and browser on X.org on a modern i5/i7 with a GPU instead of a console on an 8086 also uses more energy. Should contributions be restricted to the ones typed with vim on a tty? As long as it's productive usage, the energy use makes sense... As for the other concerns over spam and scams, I don't understand how accepting code partially generated with CoPilot contributes to it.

As for the stated copyright concerns, I'm no legal expert, but it seems like the horse has left the barn on this one. Every big tech company I know of already has code that had some sort of AI generation involved in the development process merged deeply into their trunks. I just don't see the entire software industry getting sued over this.

I'm usually a late adopter of things, and it was no different with AI code completion. Nowadays I use it daily and it increases productivity dramatically for me. Just like touch typing, visual/insert modes, autocompletion and an IDE. CoPilot is like auto-complete on steroids, you still need to read and understand what it's spitting out. So it seems to me that banning it is counterproductive.

Rad · Posted: Wed Apr 17, 2024 1:36 pm

I see this as just as much of a mistake. It affects tooling and programmer productivity and it is mostly unenforceable anyhow. One claim seems to be that AI is pretty bad, but it is already often better than (above) average people, and this trend will likely just become more extreme. No offense intended to the most skilled Gentoo devs and users who do their thing with the greatest skill and care and rarely/never forget anything while doing huge documentation works or test cases or filling in boilerplate code or other things.

Are you going to refuse particularly good contributions for fear that they might be AI-generated? How do you check for this anyhow, will Gentoo run an AI to check it's not accepting AI work...? Spam-wise, I guess we could also talk about the harm that Linux distros in general do by being able to send emails in any way other than slow human keyboard input. Will humans, like AI, have to prove they licensed their learning from patterns? What if AI is humanlike in intelligence and by all reasonable tests equally sentient, does such an AI have to forever comply/pay for what patterns it learned unless it stays 100% away from anything that is still copyrightable? And so on.

But realistically, all of this won't be solved on Gentoo's side anyhow. It would simply make far more sense to integrate into tooling or even operate open source AI and to take the productivity gains while doing a little something so that some of the most powerful tools ever -AI and AI models- stay open source. If national legislation requires x to be done, then maybe you do x. Else let people use what is reasonably shareable as open source between developers (for reasons of reproduction, shared tooling, and so on). Yes, simply use AI whenever and wherever it is effective, fun, useful, all the other things people might do with/on/for Gentoo.

szatox · Advocate Joined: 27 Aug 2013 Posts: 3140

Rad · Posted: Wed Apr 17, 2024 2:09 pm

pingtoo · Posted: Wed Apr 17, 2024 2:34 pm

Hu · Moderator Joined: 06 Mar 2007 Posts: 21651

I was not involved in the discussion, nor in the subsequent vote. My take on it is that:

Regarding copyright, most countries have copyright laws that are poorly suited to software, at best. For an organization that cannot afford to spend millions defending against a copyright lawsuit, extreme caution seems like the right approach to me. Once major jurisdictions have clear statutory or case law holding that AI output is not subject to the copyright of the input training data, this will be less of a concern. I am not aware of any jurisdictions that have said that it is subject, or is not subject, so the cautious approach is to assume a court might decide that the copyright on the input training data does transfer to the AI output. Most, if not all, AI output currently has poor attribution, so it is difficult to determine whether the input training data was subject to an enforceable copyright, and if it was, then whether the input had an acceptable license (such as BSD/MIT for any purpose, or GPL for projects that accept GPL contributions) that would permit its use even if a court does rule that the copyright is passed down.
Regarding quality, my opinion is that the submission's quality should be judged independent of the source. If the submitter's work is of good quality, whether because the submitter is a good author, because the submitter took poor quality AI output and manually improved it, or because the submitter used a high quality AI that produces naturally good output, does not matter. However, at present, AI output is often not of a quality that it can be submitted verbatim, and I would not want reviewers (whose time is often very precious) to waste their time picking through poor quality AI output. Establishing a rule that summarily rejects AI output is heavy-handed, but effective. Once submitters provide output that is not obviously poor quality, this point will become difficult to enforce, but also unnecessary.
Regarding the other concerns, I have not read enough to form an opinion.

spica · Apprentice Joined: 04 Jun 2021 Posts: 288

AI can enhance existing texts by providing a means for authors to have their work reviewed and improved, either by seeking feedback from other individuals or utilizing AI tools for refinement. This practice, when employed by knowledgeable individuals seeking to refine their work, is undoubtedly beneficial. It is also beneficial for this forum, because AI helps to make text more understandable and clear.

However, a significant concern arises when individuals lacking expertise rely solely on AI to generate complex content, such as code. In such cases, if an inexperienced user utilizes AI-generated content and distributes it to others, who then must expend valuable time deciphering and correcting the nonsensical output? This practice obviously must be prohibited.

One critical issue surrounding this practice of AI usage pertains to licensing. It remains unclear how the responsibility for the generated content is attributed to the AI authors. Most AI tools are accessed via non-GUI APIs, which may not inherently include licensing information. To address this, it is imperative to tightly integrate licensing information with the output provided by AI. Without this cohesive integration, establishing a clear connection between the final product, crafted by a skilled engineer and submitted to AI for verification, becomes very challenging.

In the end, Google's search results are generated using AI algorithms. Does Michał's message mean we must avoid using Google?

Absolutely, defining clear boundaries for AI usage is important for ensuring its ethical and responsible usage. Without such boundaries, AI could become little more than aimless noise, lacking direction and purpose. By establishing well-defined limits and guidelines, we can harness the potential of AI while mitigating potential risks and ensuring that its impact aligns with our values and objectives.