GO Gentoo! Awesome!
I guess there will be some push back.
The other thing is, I guess it will be kind of hard to differentiate AI-generated from human-generated content.

- https://marc.info/?l=gentoo-dev&m=171324172908553&w=2

Hello,
On 2024-04-14, the Gentoo Council has unanimously approved the new AI
policy. The original wording from the mailing list thread was approved:
"""
It is expressly forbidden to contribute to Gentoo any content that has
been created with the assistance of Natural Language Processing
artificial intelligence tools. This motion can be revisited, should
a case been made over such a tool that does not pose copyright, ethical
and quality concerns.
"""
I have started drafting a Wiki page detailing this at [1]. We will also
look into how best provide this new information to our contributors.
[1] https://wiki.gentoo.org/wiki/Project:Council/AI_policy
--
Best regards,
Michał Górny

It has to be seen, as he said (I agree), as an automation tool.

Looking ahead, Hohndel said, we must talk about "artificial intelligence large language models (LLM). I typically say artificial intelligence is autocorrect on steroids. Because all a large language model does is it predicts what's the most likely next word that you're going to use, and then it extrapolates from there, so not really very intelligent, but obviously, the impact that it has on our lives and the reality we live in is significant. Do you think we will see LLM written code that is submitted to you?"
Torvalds replied, "I'm convinced it's gonna happen. And it may well be happening already, maybe on a smaller scale where people use it more to help write code." But, unlike many people, Torvalds isn't too worried about AI. "It's clearly something where automation has always helped people write code. This is not anything new at all."
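Hohndel's "autocorrect on steroids" quip can be made concrete with a toy sketch: a greedy bigram model that, like an LLM in miniature, always emits the statistically most likely next word and extrapolates from there. The corpus and function names below are invented purely for illustration; real LLMs are vastly larger and sample probabilistically, but the basic "predict the next word" loop is the same.

```python
from collections import Counter, defaultdict

# Tiny training corpus (invented for illustration).
corpus = "the kernel builds the kernel modules and the kernel boots".split()

# Count which word follows which: a bigram frequency table.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def extrapolate(word, steps):
    """Greedily emit the most likely next word, repeatedly."""
    out = [word]
    for _ in range(steps):
        if word not in follows:
            break  # dead end: this word never had a successor
        word = follows[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(extrapolate("the", 3))  # → the kernel builds the
```

The output is fluent-looking but content-free, which is roughly the quality concern raised in this thread: statistical plausibility is not understanding.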

The funniest part is that AI does a very similar thing to what humans do.

As for the stated copyright concerns, I'm no legal expert, but it seems like the horse has left the barn on this one. Every big tech company I know of already has code that had some sort of AI generation involved in the development process merged deeply into their trunks. I just don't see the entire software industry getting sued over this.
I'll again state my opinion that this is not a universal fact at all anymore. AI has made remarkable progress; it is already *very* powerful at summarizing documents or even internet discussions, including open-source tooling running LLMs in the hands of users who specialized them a little to get short summaries or whatever they wanted. I'd say it would already *increase*, rather than decrease, the availability and quality of documentation, or the understanding of bug reports. That it should not be completely trusted... isn't that the same when working with tired hobbyist humans?

szatox wrote: Quality is a real concern, since AI lacks the depth and understanding.
I wonder: if we feed Gentoo's Wiki to an AI, will it produce better documentation (and summaries)? Would the Gentoo team accept the result?

Rad wrote: I'll again state my opinion that this is not a universal fact at all anymore. AI has made remarkable progress; it is already *very* powerful at summarizing documents or even internet discussions.

szatox wrote: Quality is a real concern, since AI lacks the depth and understanding.


Are you thinking of compilation times, by any chance?

GalaxyNova wrote: I think this decision was a mistake. Gentoo is in no position to talk about energy usage.

I think the most important factor to consider is per-capita energy usage, on which compilation and AI are probably roughly the same. If you've ever tried to run an LLM like LLaMA offline, for example, it maxes out your CPU in the same way as emerging @world.

kgdrenefort wrote: Are you thinking of compilation times, by any chance?

GalaxyNova wrote: I think this decision was a mistake. Gentoo is in no position to talk about energy usage.
If yes, well, sure it takes more time to update; if you don't manage it well, you could have to let your computer run all night compiling, which takes resources, that is true.
At the same time, a simple request to an AI consumes far more resources than you might think, especially when you see some of the goofy stuff it's being used for…
Regards,
GASPARD DE RENEFORT Kévin

Yeah, they are definitely not quite at the level of human intelligence. But to me this doesn't seem like a valid reason to completely ban the usage of such tools in any Gentoo project.

szatox wrote: Ah, the "don't have children to save the planet" argument... Save it for whom, exactly?
Specialized AI is a handy tool when used within the scope of its capabilities. Just like any other tool.
Chatbots are good for chatting. Writing code and reporting bugs use text, but they are engineering. Chatbots suck at engineering. They can organize words into reasonable-looking patterns, but they don't understand depth.
Need proof? Go to YT and search for "ChatGPT plays chess"... and enjoy the mayhem.
A similar thing happens with tensioned straps in generated pictures: they follow both the outside (correct) and the inside curves of elbows, where they should run straight instead.
On quality, I see the following situation. Currently, at some stage a human still has to review the code. If that is done by the submitter, and he can reasonably claim correctness, I would not care whether AI was used in the development process. The possible problem is when the code checks fall on the maintainers, i.e. it is easy to submit AI-generated code and let others deal with whether it works as intended. I see that a lot with student submissions.

Hu wrote: I was not involved in the discussion, nor in the subsequent vote. My take on it is that:
- Regarding copyright, most countries have copyright laws that are poorly suited to software, at best. For an organization that cannot afford to spend millions defending against a copyright lawsuit, extreme caution seems like the right approach to me. Once major jurisdictions have clear statutory or case law holding that AI output is not subject to the copyright of the input training data, this will be less of a concern. I am not aware of any jurisdictions that have said that it is subject, or is not subject, so the cautious approach is to assume a court might decide that the copyright on the input training data does transfer to the AI output. Most, if not all, AI output currently has poor attribution, so it is difficult to determine whether the input training data was subject to an enforceable copyright, and if it was, then whether the input had an acceptable license (such as BSD/MIT for any purpose, or GPL for projects that accept GPL contributions) that would permit its use even if a court does rule that the copyright is passed down.
- Regarding quality, my opinion is that the submission's quality should be judged independent of the source. If the submitter's work is of good quality, whether because the submitter is a good author, because the submitter took poor quality AI output and manually improved it, or because the submitter used a high quality AI that produces naturally good output, does not matter. However, at present, AI output is often not of a quality that it can be submitted verbatim, and I would not want reviewers (whose time is often very precious) to waste their time picking through poor quality AI output. Establishing a rule that summarily rejects AI output is heavy-handed, but effective. Once submitters provide output that is not obviously poor quality, this point will become difficult to enforce, but also unnecessary.
- Regarding the other concerns, I have not read enough to form an opinion.
The intelligence part in AI is a lie. The correct name is "statistics".

"Who cares what tools were used if the code is good... are you going to ban code that was typed on particular keyboards?"
Well, when I asked "who cares what tools were used if the code is good?", my definition of "good code" did not include "garbage".

szatox wrote: The intelligence part in AI is a lie. The correct name is "statistics". "Who cares what tools were used if the code is good... are you going to ban code that was typed on particular keyboards?"
Statistics sucks at generating code, but its time is much cheaper than a human's, and it is capable of producing a volume of garbage big enough to overwhelm and drown any developer on the receiving end of this pipe.
Whatever keyboard you're using, it's driven by a real (hopefully) intelligence, and you still have to literally touch every letter and think about its purpose.
Coding aids can be used to speed up a developer's work, but they cannot and will not replace developers. People, however, are fundamentally lazy, and LLMs make pushing the work onto someone else too easy for such a process to be sustainable.
Change my mind

And this is the biggest problem, IMO. I don't see enough participants in this thread addressing it.

Hu wrote:
- Copyright status on AI generated outputs is unclear at best. Accepting contributions that have a decent chance of becoming a copyright mess later is risky, so the safe path is to refuse AI generated outputs until the situation improves.
Ionen wrote: As a packager, I just don't want things to get messier with weird build systems and multiple-toolchain requirements, though.
Ah, sincere apologies: I did say "I've just skimmed through this thread, so maybe I've missed something important", and clearly I had. The decision still doesn't make sense to me, but this time it's probably more about my ignorance of Gentoo development culture/process.

Hu wrote: psycho: I suggest you read the thread in full before responding to it. I think my prior post addressed your questions. To recap:
- Generative AIs tend to produce low quality output. Reviewer time is too precious to waste cleaning up that output, so anyone who uses a generative AI and submits it directly for review is wasting reviewer's time. If a would-be contributor uses generative AI as a start, then personally cleans up the garbage to a presentable level before submitting it for review, that would be different. As szatox suggests though, that's not likely to be what happens. When the contributor does do a good enough job cleaning up the input that it no longer appears to be AI-generated garbage, then the reviewer will not be able to readily reject it as AI garbage.
- Copyright status on AI generated outputs is unclear at best. Accepting contributions that have a decent chance of becoming a copyright mess later is risky, so the safe path is to refuse AI generated outputs until the situation improves.