Gentoo Forums :: Gentoo Chat
UNIX way, C, LISP et al.
steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Fri Jun 22, 2018 7:01 pm    Post subject: UNIX way, C, LISP et al.

Just continuing a tangential conversation
CasperVector wrote:
The core advantage of Lisp is not functional programming, but homoiconicity (in a way superior to many other homoiconic languages, perhaps except for Mathematica).
Thanks, that was an interesting diversion.
Have you read "The Joy of Clojure" (Fogus & Houser, 2011)? The language is mentioned on that page, and the book is a fascinating read, even if I'd never want to use Clojure personally.
The book has quite a lot to say on the same subject, just without using the same term.

I think the true lesson of functional languages is that all code is serializable; rather obviously, any sequence of actions can be thought of as smaller subsequences, which begin as data in CPU terms.
It is that which makes them so well suited to the optimisation of higher-order languages. ("higher-order" in the Chomsky-hierarchy sense, in case the term offends. ;)
steveL wrote:
Yes, I like rc; it's aimed explicitly at the scripting side of it, which is ofc much less to worry about.
Most of the "baggage" in shell comes from it having to cope with random user input at the terminal. (It helps to bear that in mind when scripting it, ime; it makes it much easier to relax into quoting, for a start.)
CasperVector wrote:
I would be glad to know, specifically, which of the historical baggage that came from "having to cope with random user input" is still necessary with scsh or rc(1).
None; that's why rc can ditch a lot of baggage: precisely because it is purely a scripting language, not intended for interactive usage (from my reading of the project pages many years ago.)
My point was more about how people dismiss shell, without understanding what it is about: making it easy for a user at an interactive terminal to run commands, while doing the wildcarding so no command, utility or other program, ever has to worry about it.
Which is why argc and argv are the execution interface in Standard C; like it or not (and I do like it), that standardizes the UNIX approach as much as possible, leaving it to POSIX to fill in the middle layers, and more UNIX goodness than a language standard can mandate.

In any event, it helps when learning sh, to remember that its main purpose is to process fumbling user input and turn it into a command with arguments; that's why everything is a string, and that's why quoting is important, when it comes to canning those sequences and staying robust. And that's why POSIX has so much to say about interactive use.
Everything else, like command status and macro-processing, which is central to every shell, is aimed at that purpose.
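For instance, a minimal sketch of what quoting buys you (the filename is hypothetical):
Code:
f='my file.txt'
printf '%s\n' $f      # unquoted: word-split into 'my' and 'file.txt', two arguments
printf '%s\n' "$f"    # quoted: one argument, exactly the string the user meant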

Pipelines are a wonderful metaphor too. "Software Tools" is enlightening on those; some of the exercises seem to have inspired later developments; eg: <(process substitution) in bash.
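A quick sketch of the latter (the filenames are hypothetical):
Code:
# each <(...) pipeline appears to diff as a readable /dev/fd path
diff <(sort a.txt) <(sort b.txt)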
CasperVector wrote:
I do not want to force the Lisp / Unix (the latter might be Plan 9 + djb-esque design) reconciliation on anyone. However, I am convinced that Lisp (at least Scheme) is, in many aspects, more Unix-ish than C.
Here I must disagree. Sure, you can find "aspects", but the truth is that UNIX and C were developed in parallel.

I can see what you're saying wrt computing and computability, but that's just the sequencing I referred to above, which derives from LISP's first purpose, as a thought-experiment on computing.

I'd counter with "assembler" on at least two architectures, to grok what a computer is all about. That is what C builds on, and formalises in its memory model. And that is what LISP is about, imo, as are all code languages, ultimately: how to make something happen on a digital computer, with a register file, a program-counter, and a runtime control-stack in memory (also where instructions reside.)

Additionally, I don't think "the UNIX way" is about any language: it is about modularity, high-cohesion and low-coupling, ie: basic fundamentals of Computer Science, with cheap processes (certainly by comparison to every other system when it was conceived and developed as a model, in the 1970s) and a clean namespace, which are what keep things modular.
CasperVector wrote:
(BTW, I do not consider Scheme to be the Ultimate Language: see the last paragraph of this post for a glimpse of what I personally imagined.)
I believe the reconciliation will result in a system of significantly lower total implementation complexity, yet much higher flexibility: IOW, better adherence to the Unix philosophy.
I couldn't see much about a language in there; regardless, as I said, UNIX is not about any one language. In fact, if you read "Software Tools" (the Ratfor edition, 1976), "The UNIX Programming Environment", "The AWK Programming Language" and "The Practice of Programming", you will see it is all about combining the best approaches for each part of the problem, using whatever utility, language or mental-model is best for each part.

In that sense, Perl is the most "UNIXy" language, with its catchphrase Tim Toady: There's More Than One Way to Do It.

Nonetheless, if you get hung up on one "ultimate language", you're missing the point, which is to allow people with wildly-differing mental models and conceptions to work together. Firstly and most simply, by using standard input and output, settling on text data streams since those are the easiest to debug and maintain, as well as being the most useful outside the closed system which so many "developers" like to focus on -- to the detriment of the end-result for the user, which is why we're even here.
CasperVector wrote:
And the Unix philosophy is, by the way, the only technical means by which we can get rid of control by big companies like the one which attempts to push systemd everywhere.
You cannot solve sociopolitical problems via technical solutions to something completely different.

You're right in that, whatever the marketing hype, it always has to come face-to-face with reality at some point, and then all the lies in the world won't make water flow uphill, or modularity any less important than it has always been, on a level with hygiene in cooking and medicine both.

But then, that's not a problem for the kleptos, since none of that is the reason we have such stupidity on a global scale.
They'll just keep peddling the same old line ("systemdbust is just the messenger"), and people will believe it, because:
Leonard Schapiro wrote:
The true object of propaganda is neither to convince nor even to persuade, but to produce a uniform pattern of public utterances in which the first trace of unorthodox thought reveals itself as a jarring dissonance.
That was on Stalin, but it's exactly the same playbook, with the neologism "cognitive dissonance" which is essentially just: lazy thinking.
Since that is what the indoctrination^W education system and the media (including the anti-social media) foster, people are lazy in their thinking, and since most everyone is a week or two from starvation, "No-one ever got fired for.." is all they worry about, quite reasonably.

Oh, wrt computing, languages and assembler, a really great book if you can find a copy, is "Digital Computer Fundamentals" (Bartee, 1972.) That's the 3rd edition, summing up after a decade more of hardware implementation.
You don't need anything more than "grade-school" electronics theory to understand it, which any "Starting Electronics" book will give.
That's the one which will really ground any work you do in assembler, or indeed any other language, in what the CPU actually gets up to. Wonderful description of Booth's algorithm, in a section subtitled "the Use of Algorithms in Computer Design," which seems apposite.
Dr.Willy
Guru

Joined: 15 Jul 2007
Posts: 547
Location: NRW, Germany

PostPosted: Sat Jun 23, 2018 1:35 pm    Post subject: Re: UNIX way, C, LISP et al.

steveL wrote:
In any event, it helps when learning sh, to remember that its main purpose is to process fumbling user input and turn it into a command with arguments; that's why everything is a string, and that's why quoting is important, when it comes to canning those sequences and staying robust.

Well, in hindsight they really (reeeeally) should've disallowed '\t\n' in filenames …

steveL wrote:
In that sense, Perl is the most "UNIXy" language, with its catchphrase Tim Toady: There's More Than One Way to Do It.

And in the sense that it "does everything and many things badly" it is the least UNIXy.
Offering many different ways to do the same thing is beneficial if (and only if) the underlying system is modular.
In the case of UNIX I can say 'grep' should support parallelism, write an implementation that does, and replace the current grep with it.
In this scenario providing many different ways to do the same thing is helpful, because it offers you different interfaces at different levels where you can plug your implementation.
This is not true for programming languages, because for the most part, they are not modular.
For instance, the ability to use C-style and Haskell-style function headers [int foo(int)] and [foo :: int -> int] within the same language has no value outside of "Well I think this one looks better".
It does however make the language more complex, which raises the barrier to write a different compiler for that language.
Hu
Moderator

Joined: 06 Mar 2007
Posts: 21490

PostPosted: Sat Jun 23, 2018 3:26 pm    Post subject:

Why disallow tab and newline, but still permit blank (0x20) and carriage return (\r)? We could argue extensively about which characters need to be disallowed, but I think the better approach would have been to say that there is a shell mode (preferably the default for scripts, if not for interactive use) where variable references are required to declare what semantics they want in all cases, rather than only when dealing with "dangerous" values.

To me, the true problem with dangerous values is that it is easy to write a script that happens to work correctly so long as only safe inputs are used, but then fails horribly when dangerous inputs are used. (The classic script of rm -rf $var/, where there is no check that $var is not empty, comes to mind. This works great as long as $var points to the directory you wanted to delete, but deletes everything accessible to the user if $var is empty.)
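A sketch of that failure mode, for the avoidance of doubt (illustration only; do not run this):
Code:
var=            # never assigned, or emptied by an earlier failure
rm -rf $var/    # expands to: rm -rf /   and deletes everything the user can reach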

If instead the interpreter screamed the first time you used the "wrong" type of expansion, even when the expansion happened to produce a safe string, authors would be required to do the right thing to get the script to work at all. In the context of quoting dangerous filenames, that is hard to enforce, but a first step might have been that rather than having $VAR (unsafe) and "$VAR" (safe), you have $expand_wordsplit(VAR) and $expand_literal(VAR). The author must pick which mode to use. This doesn't force him to pick the right one (which would be nice, but very hard to enforce in code), but by forcing him to pick at all, he has a better chance of doing it right. If he still does it wrong, at least it's easier to audit. A reviewer could quickly grep for $expand_wordsplit and insist on a justification for each use of it, since improper use of that one is dangerous.

You could extend this design by having multiple syntaxes for assigning the variable, where the syntax chosen is a declaration of how the variable should be used. For example, $assign_filename(VAR, expr) replaces VAR=expr, but now the shell has been informed that this is a filename, not just some arbitrary string. This then allows better checking later that it is not used in ways that are meaningless for a filename, such as trying to word-split a filename (rather than a list of filenames or a list of strings).
steveL

PostPosted: Sat Jun 23, 2018 7:22 pm    Post subject: Witter, witter..

Dr.Willy wrote:
Well, in hindsight they really (reeeeally) should've disallowed '\t\n' in filenames …
Heh, I have to disagree with you here.
While there is a case for disallowing controls less than ' ' (space char), no-one is suggesting that we disallow the space char itself.
At which point, the whole argument breaks down, because you must use quoting, quite apart from processing filename expansions correctly.

If you want to take it further, consider the set "?*[]" - you would not want to ban '?' from filenames (consider a book/song/chapter title), so again, quoting is needed (they're all glob chars; same applies to any punctuation/statement terminator like '&', ';' and '|'.)

Having said that, I wouldn't mind a filesystem that banned newlines (only '\n' on Unix); could be quite useful.
It would never stop the need to quote parameter expansions, but it would make it possible to use find robustly in POSIX sh.
Again, though it comes up against the idea of robust scripting: you can never guarantee you won't get a newline, on randomFS somewhere out there. So just write it robustly, once, and be able to rely on it; checking your inputs is basic.
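For instance, a sketch that is robust only under that hypothetical no-newline guarantee:
Code:
# POSIX sh; safe if and only if no filename can contain '\n'
find . -type f | while IFS= read -r f; do
    printf 'found: %s\n' "$f"
done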

Quoting in some form or another is needed in every language (or we could not use strings); it is only optional in shell, because as I said, everything is a string, since it is all about processing user-typed arguments and macros.

"Every shell is a macro-processor at heart," is a quote from Chet Ramey's site, that really helped me understand sh better. (I think it's gone now, but still.)

It also explains why I think m4 is a stupid idea for ./configure; but then so did the author of autoconf, as he commented a year or so after he wrote it ("but it was just an undergrad project." Bloody students.. ;)
steveL wrote:
In that sense, Perl is the most "UNIXy" language, with its catchphrase Tim Toady: There's More Than One Way to Do It.
Dr.Willy wrote:
And in the sense that it "does everything and many things badly" it is the least UNIXy.
Offering many different ways to do the same thing is beneficial if (and only if) the underlying system is modular.
In the case of UNIX I can say 'grep' should support parallelism, write an implementation that does, and replace the current grep with it.
In this scenario providing many different ways to do the same thing is helpful, because it offers you different interfaces at different levels where you can plug your implementation.
This is not true for programming languages, because for the most part, they are not modular.
For instance, the ability to use C-style and Haskell-style function headers [int foo(int)] and [foo :: int -> int] within the same language has no value outside of "Well I think this one looks better".
It does however make the language more complex, which raises the barrier to write a different compiler for that language.
Hmm interesting points.
Thing is with perl, the underlying system is modular, since mostly what it's doing is exposing the C userland.
The same applies more self-evidently with awk (a beautiful language.)

And this I think is a critical point: software is all about as thin a layer as possible on top of other code (or hardware, but we're discussing application-programming, not at kernel-level.)

The less it does, the quicker it runs.

The simpler it is, the more likely it is to be correct; and the quicker it runs, since it does less, not more.

It took me ages to really grok that. The default tendency is to presume that "more is better"; the more work I do, the more I earn, etc.

Computing is completely the opposite: the more you do, the more likely it is you are getting lost in "complexity of one's own making." (Dijkstra)
And the more likely it is that you are writing a pig of a program.

This is why the code that survives is the clean, apparently "simple" code; so newbs think "I can grok that, it's easy", "off I go", and off they go, spinning out castles in the air.
In fact, that "simple" code is the sophisticate in the room. Actual programmers (from back in the days of the "data-processing industry") have firstly done their domain research, and winnowed things down to as simple as possible, including trying out all the crazy schemes and backends (it's called "exploring the domain"), til just that thin layer is left, fulfilling the functions required, as efficiently as possible.

I agree with you that providing two forms of function specifier is not necessary, and does not add value.
A functional programmer, however, would insist that there is a critical difference, in that the functional form is designed for currying (roughly akin to "partial evaluation".)
Not that you can't curry (provide parms) with the other form; just that it's not an obvious thing to do, and when it is, you get other idioms around it, none of which tend to occur to programmers in other languages, especially those who never tried functional programming (so didn't complete a CS education, clearly.)

A good analogy would be what we call "first-order strings" when we are forced to use a term; ie strings as in sh, or awk (or snobol, or js, or..), where the coder simply does not worry about their allocation, and they are first-order as functions are first-order in FP: they can be passed around, or returned, and instantiated at will.
Effectively "first-order" means "builtin", or in my head at least: "can be treated as scalar." (Pick me up on that if you like.)

When you have first-order strings, as when you have first-order functions, different idioms open up, just because you can think at a higher-level.
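A trivial sketch of that in sh (the names are made up):
Code:
greet() { printf 'hello, %s' "$1"; }
msg=$(greet "world")    # a string returned from a function; no allocation in sight
set -- "$msg" "$msg"    # passed around and duplicated as cheaply as any scalar
printf '%s; %s\n' "$@"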

WRT language complexity, though, the biggest barrier to writing different compilers is when the language designers lose sense, and pretend that it does not matter if it can never be parsed with yacc.

This is the old "context-free grammar" argument; it should be noted, however, that that is a term which applies to parse-theory, and no language is truly context-free; certainly not one with declarations (which are essential, imo.)

As you might guess, this is one of the reasons I dislike C++; Stroustrup just hacked a front-end on to C, completely ignoring the work that Thompson, Aho, Ullman, Kernighan and Ritchie (to name a few) had done in the 1970s.
To my mind that was unforgivable, especially since he knew them (iirc.) Even if he didn't: do your research, ffs.

Fast forward 30 years, and C++ is finally sort of reinventing LISP, badly, and arrogant nubs act as if it's somehow a sign of sophistication that C++ is unparseable by automated tools, and not the rank amateurism everyone with half a clue knows it to be.

I think I agree with you overall, in terms of language clarity; certainly I don't think python's myriad ever-changing idioms are at all helpful. (More like: a reason for the cluebat.)

If that doesn't make sense, consider how you read a file line-by-line. Contrast the various approaches that have been "the best" over the years depending on which series of python is under discussion, with shell's ever-reliable:
Code:
while read -r line; do sth with "$line"; done < "$file"
I might grumble about how that's not optimised, and cannot be according to the mksh maintainer and POSIX, but not with the idiom. (bash's mapfile does it anyhow.) [1]
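For completeness, the mapfile version (bash 4+; "sth" a stand-in as before):
Code:
mapfile -t lines < "$file"    # one builtin call; trailing newlines stripped
for line in "${lines[@]}"; do sth with "$line"; done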

Shell is not where you optimise, except algorithmically, which is ofc the only way to optimise: everything else flows from there; never the other way round.
Take a look at Bentley's "Programming Pearls" (the first one) and "The Practice of Programming" (Kernighan & Pike) for more insight into this (if it is of interest.)

--
[1] Yes, I am aware of "$REPLY" and where it's needed, but I don't think we need to go there to get the idea. It isn't in fact needed for most things I do in shell, and implicit trim usually is. Getting filenames from find -print0 into a bash array is a notable exception; in fact the only one I can recall right now, in ten years of scripting.
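For the record, that exception looks like this in bash:
Code:
files=()
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(find . -type f -print0)    # NUL-delimited, so any filename survives intact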
steveL

PostPosted: Sat Jun 23, 2018 8:05 pm    Post subject:

Hu wrote:
To me, the true problem with dangerous values is that it is easy to write a script that happens to work correctly so long as only safe inputs are used, but then fails horribly when dangerous inputs are used.
IDK, that doesn't sound like much of an argument to me. It reads akin to: "everything was fine until I told it to do something dumb, and it did."

The "easy to write" part is due to what I discussed above: sh is designed to be forgiving of a harried user at the terminal.
Quote:
The classic script of rm -rf $var/, where there is no check that $var is not empty comes to mind. This works great as long as $var points to the directory you wanted to delete, but deletes everything accessible to the user if $var is empty.
No it does not "work great". It is truly crap, as greycat and others would point out in a heartbeat in #bash.
USE MORE QUOTES! And for God's sake CHECK YOUR INPUTS. Even: "${dir:?BUG}" would be better.
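To spell that out (a sketch; "dir" as above):
Code:
# ${dir:?msg} prints msg to stderr and aborts a non-interactive shell
# when dir is unset or empty, so the rm never runs on garbage:
rm -rf -- "${dir:?BUG: dir unset or empty}"/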

When did "I'm writing in shell" become an excuse for "I can write a turd if I want, and blame it on the language"?

Would you really accept (or even conceive of) that kind of nonsense in discussion about any other implementation language?

As an aside, 'rm' never had to try and remove rootfs, according to POSIX, and thankfully GNU bods have finally picked up on that.
Hu wrote:
A first step might have been that rather than having $VAR (unsafe) and "$VAR" (safe), you have $expand_wordsplit(VAR) and $expand_literal(VAR). The author must pick which mode to use. This doesn't force him to pick the right one (which would be nice, but very hard to enforce in code), but by forcing him to pick at all, he has a better chance of doing it right. If he still does it wrong, at least it's easier to audit. A reviewer could quickly grep for $expand_wordsplit and insist on a justification for each use of it, since improper use of that one is dangerous.
Right, so we have a "reviewer" but not a developer who has actually spent any time learning the implementation language to any notable degree (like spending a year or two in #bash until such travesties would make one twitch at sight.)

Kind of missing the point of a shell designed to process user input at the terminal, wouldn't you say?

Not that I demur from your basis: specific expansion is cleaner. Just your approach, and the idea that we should have formal review processes of utter rubbish, rather than implementors who care about their work.
The former means nothing without the latter, and I'll take the latter alone anyday over armchair-reviewers with no ability, from a pool of people who thought it was shell's fault they couldn't write it.
Quote:
You could extend this design with having multiple syntaxes for assigning the variable, where the syntax chosen is a declaration of how the variable should be used. For example, $assign_filename(VAR, expr) replaces VAR=expr, but now the shell has been informed that this is a filename, not just some arbitrary string. This then allows better checking later that it is not used in ways that are meaningless for a filename, such as trying to word-split a filename (rather than a list of filenames or a list of strings).
Good Lord, that really is overly complexifying things, simply to avoid telling people to learn how to code shell, already.

FWIW I had similar ideas to the former, wrt make expansions, so I am not unsympathetic. WRT make, I'd do it by name, so *FLAGS are always split, and nothing else is, until we say otherwise. Then I realised that misses the point of make, which is basically doubled macro-expansion, or eval via a much saner interface.
$(CC) is deliberately vague about what it contains; since this is build-process, we have more degrees of freedom than trying to be robust in a sh script. For a start, our target user is an admin, not an end-user. Secondly, we control the directory tree.

On the wider issue, you are not going to change POSIX sh. Just bite the bullet, and tell others to learn it properly, instead of making excuses for turds.
Yes, it can be embarrassing, but everyone has been there with shell. Instead of being unassuming, and making allowances, get them past it, so they can be useful and not YAF hindrance.
Direct them to #bash and let the 900 or so people in there teach them.
Hu

PostPosted: Sat Jun 23, 2018 11:47 pm    Post subject:

Of course it works great. It runs exactly as intended when given good inputs. Quotes would not help in that example. Your suggestion of :? goes exactly to my point. The language makes it easy to do things wrong and hard to do them right, because you need to use a non-default, and indeed non-obvious, construct to express the statement that in practice almost everybody intended to use, but didn't know enough to actually use, so they instead used an easy and wrong path. Worse, that wrong path isn't even wrong enough to punish them the first time they run the script.

You will never break all bad authors of all bad habits. The best you can hope for is that the people who know the craft can review code to catch mistakes before they hit production. The more readily the code can be reviewed, the more likely it will be reviewed well (and at all). A language that requires explicit easy-to-see constructs for potentially dangerous operations is easier to review than one that requires the reviewer to have a full language state machine memorized.

I think it was fundamentally a mistake to expose the full "forgiving" interactive parser to shell scripts. It was a further mistake not to provide a way for scripts to opt in to a strict mode that disables all those "helpful" do-what-I-mean guesswork items that users love in interactive mode.

[Edit: disabled bogus smiley.]


steveL

PostPosted: Sun Jun 24, 2018 11:52 pm    Post subject:

Hu wrote:
Of course it works great. It runs exactly as intended when given good inputs.
This is like a newb C coder saying "well I don't get UB when I run it."

I should not need to say any more.
Quote:
Quotes would not help in that example. Your suggestion of :? goes exactly to my point. The language makes it easy to do things wrong and hard to do them right, because you need to use a non-default, and indeed non-obvious, construct to express the statement that in practice almost everybody intended to use, but didn't know enough to actually use, so they instead used an easy and wrong path.
IOW: "Thanks steve, I never knew about ":?", it's good to learn about, very handy for precisely this situation."

Really you should have read the manpages for bash PARAMETER EXPANSION several times already, and compared with the POSIX sh spec, to see what the common-ground is, before you start criticising sh for not "being obvious". I know I'd done exactly that several times within a few months of first joining #bash.
Please bear in mind I am actually quite surprised that you haven't; my experience of you is that you are competent and thorough.

"in practice almost everybody intended to use," is hilarious though, you must admit. Try running that line past ##c wrt some UB you've triggered.
You don't seem to have absorbed what GIGO really means, yet.
Quote:
Worse, that wrong path isn't even wrong enough to punish them the first time they run the script.
Yes, and quite often C coders get away with UB because it doesn't trigger on their architecture, or with a specific set of data, yadda-yadda.
Quote:
You will never break all bad authors of all bad habits.
I have no interest in doing so; I only work with coders who take responsibility, rather than blame their implementation language.
Only a bad workman blames the tools. (and it takes a man to blame his tool.. ;)
Quote:
The best you can hope for is that the people who know the craft can review code to catch mistakes before they hit production.
No, the best I can hope for is that at least some of the "modern" generation bother to do their groundwork, so that they never even develop "bad habits" especially when it comes to sh scripting, which is so critical for portable build-systems as well as portable most everything else, and is in any case a central part of UNIX, whether nubs like it or not.
Quote:
The more readily the code can be reviewed, the more likely it will be reviewed well (and at all). A language that requires explicit easy-to-see constructs for potentially dangerous operations is easier to review than one that requires the reviewer to have a full language state machine memorized.
Oh please, if you aren't already keeping track of bracketing in other languages, then you're not actually coding anything. String quoting is just the same in sh, which is why excess ${braces} that aren't needed are such a PITA, and as greycat stated quite a while back, anyone using them is either pretentious or ignorant. My experience is that in the Gentoo "leet developer" case: they're both, the latter wilfully so. (as if this is their sulky revenge on how shell has made them look dumb in the past, to write it like crap and justify their bleating, or at least make it look harder than it has to be.)

If you want review, all you have to do is pastebin the script, and ask #bash for comments; they even give you pastebin links in the /topic. Many people do that, as it's a valuable resource, that does not cost a dime.
So again, there is no real problem here beyond a PEBKAC, afaic.
Quote:
I think it was fundamentally a mistake to expose the full "forgiving" interactive parser to shell scripts. It was a further mistake not to provide a way for scripts to opt in to a strict mode that disables all those "helpful" do-what-I-mean guesswork items that users love in interactive mode.
By all means patch mksh so that "declare -f" or whatever does what you want wrt filenames; then you can add "declare -s" to state "this variable is meant to be word-split."

But honestly from where I'm sitting, all the verbiage reads exactly like god-knows how many people I've heard trying to justify various reasons for why their borked sh script should be okay, or really it's shell's fault, etc. ad nauseam.

I have zero sympathy, because I took the time to spend 6 months learning from #bash, during the first 3 of which I never said a word, before I even felt capable of putting a script together properly; thereafter I spent hours every week in there for years, during which time I kept up with the language, and more importantly absorbed a shit-tonne of lessons about the UNIX userland: across vendors, not just one niche OS, with "developers" on so many distros who think it's okay not to know their implementation language.
I cannot stress enough how wrong that is on so many levels, to me and others I know, who care about their craft.

AFAIC they should hang their heads in shame, and hand their notices in, because all they got us was decades of wasted machine and human-hours on Poettering's land-grab on behalf of RedHatGooglePlex.
Especially the "nice-guys" who keep promulgating that awful "narrative" of apparatchik bulshytt, that "users are the problem."

No: idiot "developers" who cba to do their research, nor their groundwork, but are happy to spend hours every day getting a dopamine fix by criticising others in their little bubble of poo-slinging and confirmation bias, are the problem. Most especially because they think that by burying their heads in the sand wrt sociopolitical matters, they're somehow "above the fray" rather than the mopes at the bottom; and all the while they're absolute marks (or less often, willing fifth-columnist apparatchik-wannabes) for the corporations who are busy turning their users into a commodity, and could not give a damn about any of us; hell, they don't even care for human-beings.
Hu

PostPosted: Mon Jun 25, 2018 12:25 am    Post subject:

I've never personally written that particular mistake, but I've seen news stories of that exact thing going horribly wrong, because, as I stated repeatedly, it's not broken for good inputs, so it gets past whatever cursory testing the author bothers to do. You seem determined to believe that I approve of any undefined construct as long as it works in the good case. I don't. I condemn undefined constructs that fail horribly in the bad case and produce the user's desired result in the good case, because if the undefined construct doesn't fail early and punish the user during acceptance testing, then the user won't go back and do it right until after something truly bad happens in production.

It's clear you just want to yell at me because you think I'm a bad shell coder. I stay out of shell as much as I can because the need to spawn a new process for anything interesting is a major burden for the type of work I want to do. Even with all bash extensions turned on, I can't do what I want without helper processes because bash just can't express some of what I need. If I need a helper in C, Python, Perl, etc., it's ultimately less trouble to do the entire job in the other language, especially when it can be expressed in a more powerful scripting language.

My point, which you seem determined to miss, is that entrusting users with powerful tools that are easily misused is a terrible idea. Given that, the choices are to make the tool less powerful, to make it safer, or not to let novice users use it. I don't want an even less powerful language; it already can't do what I want. I'm not going to be able to stop inexperienced users from using it poorly. That leaves making it safer, or at least condemning the choices of people who didn't make it safer when they had the chance.
Dr.Willy

PostPosted: Tue Jun 26, 2018 12:12 pm    Post subject: Re: Witter, witter..

steveL wrote:
Dr.Willy wrote:
Well, in hindsight they really (reeeeally) should've disallowed '\t\n' in filenames …
While there is a case for disallowing controls less than ' ' (space char), no-one is suggesting that we disallow the space char itself.

noone? ;-)

steveL wrote:
Having said that, I wouldn't mind a filesystem that banned newlines (only '\n' on Unix); could be quite useful.
It would never stop the need to quote parameter expansions, but it would make it possible to use find robustly in POSIX sh.
Again, though it comes up against the idea of robust scripting: you can never guarantee you won't get a newline, on randomFS somewhere out there. So just write it robustly, once, and be able to rely on it; checking your inputs is basic.

Well, you pretty much nailed the reason I want to ban '\t\n' specifically.
Unix uses plain text as an interface for good reasons, but as it stands much of the unix userland is unequipped to handle file names.
Some tools offer a '-0' flag to fix the most pressing issues, but this reveals the fundamental problem:
Filenames in Unix are not plain text, they are binary data.
This in turn breaks usage of grep, it breaks usage of sed, it breaks usage of awk and all other line-oriented unix tools.
Afaik plan9 moved away from line-orientation towards structural regexes, which allow much more complex text formats to be processed, but even that wouldn't fix the problem of filename processing. Allowing control characters in filenames is just insanity.

I absolutely agree that checking your input is basic. We just disagree on where that check should occur.
As far as I am concerned it should be checked at system-call level.

steveL wrote:
Hmm interesting points.
Thing is with perl, the underlying system is modular, since mostly what it's doing is exposing the C userland.
The same applies more self-evidently with awk (a beautiful language.)

And this I think is a critical point: software is all about as thin a layer as possible on top of other code (or hardware, but we're discussing application-programming, not at kernel-level.)

Wait, are we talking about the programming language (= the grammar) or its standard library?
steveL

PostPosted: Tue Jun 26, 2018 1:02 pm    Post subject:

Hu wrote:
I've never personally written that particular mistake, but I've seen news stories of that exact thing going horribly wrong, because, as I stated repeatedly, it's not broken for good inputs, so it gets past whatever cursory testing the author bothers to do. You seem determined to believe that I approve of any undefined construct as long as it works in the good case. I don't. I condemn undefined constructs that fail horribly in the bad case and produce the user's desired result in the good case, because if the undefined construct doesn't fail early and punish the user during acceptance testing, then the user won't go back and do it right until after something truly bad happens in production.
You are missing out the coder, and the sysadmin from this analysis, so it's very flawed from conception.
Quote:
It's clear you just want to yell at me because you think I'm a bad shell coder.
Clearly I am coming across wrong, and for that I must apologise.
The rant about Linux "developers" who cannot code shell was not aimed at you; sorry about that.
Quote:
I stay out of shell as much as I can because the need to spawn a new process for anything interesting is a major burden for the type of work I want to do.
Yes, but shell is the glue, or the batch logic, that can often be eliminated in a later stage. More often, there is no need to.
Quote:
Even with all bash extensions turned on, I can't do what I want without helper processes because bash just can't express some of what I need. If I need a helper in C, Python, Perl, etc., it's ultimately less trouble to do the entire job in the other language, especially when it can be expressed in a more powerful scripting language.
I was wondering why you don't just use perl, wrt much of your syntax questions. Though I have a feeling you need to explore #awk more, if you haven't worked the awkbook yet.
Quote:
My point, which you seem determined to miss, is that entrusting users with powerful tools that are easily misused is a terrible idea.
Again, your analysis is completely flawed, imo, and leads you down the boring path of blaming end-users for implementor incompetence.
Quote:
Given that, the choices are to make the tool less powerful, to make it safer, or not to let novice users use it. I don't want an even less powerful language; it already can't do what I want. I'm not going to be able to stop inexperienced users from using it poorly. That leaves making it safer, or at least condemning the choices of people who didn't make it safer when they had the chance.
Or accepting that your expectations are completely misplaced, since your analysis is so terribly full of holes at the fundamental level.

There is a difference between a professional programmer, a professional sysadmin, and Joe EndUser, which you completely ignore, despite the discussion being entirely about scripting in an implementation language, just as we use C, or for web-output we must use javascript, CSS and HTML, whatever our personal feelings on each.
Now you've moved the goalposts to what a random user does, and suddenly this is the criterion for assessing a code language. (This is simply bad form, imo.)

Condemning the people who developed all this in the 1970s, at a time when 64KB of RAM was a pipedream, seems rather churlish, especially when you are completely dependent on the end-product of all their work for everything you do.

Still, it's my fault the conversation got overheated, so again, my apologies, and for any unintended offence caused.

One thought to leave you with: shell is much more functional (as in: functional programming) than people realise.
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

PostPosted: Tue Jun 26, 2018 1:30 pm    Post subject:

steveL wrote:
... This is like a newb C coder saying "well I don't get UB when I run it." ...
Pardon me if I've missed some context, but what does "UB" stand for?

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
khayyam
Watchman

Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

PostPosted: Tue Jun 26, 2018 1:40 pm    Post subject:

steveL wrote:
... This is like a newb C coder saying "well I don't get UB when I run it." ...

John R. Graham wrote:
Pardon me if I've missed some context, but what does "UB" stand for?

John ... I could be wrong but I read it as "unacceptable behavior", which seems to make sense ... though part of me is hoping the intended meaning is "uncle bill" :)

best ... khay
steveL

PostPosted: Tue Jun 26, 2018 2:00 pm    Post subject: Re: Witter, witter..

steveL wrote:
While there is a case for disallowing controls less than ' ' (space char), no-one is suggesting that we disallow the space char itself.
Dr.Willy wrote:
noone? ;-)
Lul, been a while since I've seen that one; good reference. (I thought you were going to quote the proposal to disallow controls.)
That is what strictly-conforming applications must conform to in their outputs (under a user path like "$HOME"); it is not at all about the _implementation_ filepath (which is where the proposal on controls was relevant) nor about arguments the user might pass to a command.
Dr.Willy wrote:
Unix uses plain text as an interface for good reasons, but as it stands the much of the unix userland is unequipped to handle file names.
No, this is simply untrue; shell scripts are perfectly able to deal with all filenames, when correctly written.
Globs are perfectly namespace-safe, for instance:
Code:
 for i in *; do sth with "$i"; done
will handle any filepath that happens to match the glob: and the path will be passed correctly as a parameter.
(I'm leaving out nullglob and [ -e "$i" ] check, as context-specific, often in the called function.)
You learn to do the same with find, using sh -c with exec and +, and again, all file paths are handled correctly (as arguments to the sh command.)
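That pattern, sketched ("sth" again a stand-in):
Code:
# every matched path arrives as a real argument to the inner sh, newlines and all:
find . -type f -exec sh -c '
    for f do
        sth with "$f"
    done
' sh {} +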
Dr.Willy wrote:
Some tools offer a '-0' flag to fix the most pressing issues, but this reveals the fundamental problem:
Filenames in Unix are not plain text, they are binary data.
Again, this is incorrect; filepaths in UNIX are C strings (that is, a '\0'-terminated sequence of bytes or multi-byte chars.)
That is why UTF-8 is the way it is; its original name was FSS-UTF (File System Safe UCS Transformation Format), with the emphasis very much on "Filesystem-safe", so ASCII paths (the "portable" encoding on everything apart from IBM machines) are exactly equivalent, and processed without change, while wide-paths do not break ASCII processing (such as on space, tab and newline.)

Binary data is allowed to contain NUL-bytes; the principal restriction on a text file is that it does not contain NUL-bytes.

So the -0 flag, such as -print0 in POSIX find, is a recognition of that; but it's incorrect to say that filepaths are binary data, since they are C strings and handled correctly by argv.
The only other restriction on filenames, as opposed to paths, is that they cannot contain the '/' directory separator, and possibly other implementation-specific ones (like '\\' on Windoze.)
Dr.Willy wrote:
This in turn breaks usage of grep, it breaks usage of sed, it breaks usage of awk and all other line-oriented unix tools.
No, it does not.
They only break when you try to use them to process filenames, which you should never do, just as you should never read or pipe the output from ls.
Or when you don't know how to use sh properly (because you haven't paid your dues in #bash) and consequently do not quote properly, or use ls instead of globs, and thus your scripts are fragile, not robust as they could be (and would be corrected instantly, as would your approach, if you just asked #bash.)
Quote:
I absolutely agree that checking your input is basic. We just disagree on where that check should occur.
As far as I am concerned it should be checked at system-call level.
They already are; that's what errno is for in C.
OFC if you think it's not your job to check the return from syscalls, then you, your users and employer, have bigger problems.
Chances are, you think it's not your job to quote properly in shell scripts, either (and I don't want to use anything you output. ;)

Input checking happens in code, too, or it is not robust, ime, and just spreads the garbage further into the system or pipeline, when it should fail early (and fail fast, as my boss likes to say.)
That really is the only way to limit the impact of GIGO, which is never going to change as the underlying principle of all computing, imo.
steveL wrote:
Thing is with perl, the underlying system is modular, since mostly what it's doing is exposing the C userland.
The same applies more self-evidently with awk (a beautiful language.)

And this I think is a critical point: software is all about as thin a layer as possible on top of other code (or hardware, but we're discussing application-programming, not at kernel-level.)
Dr.Willy wrote:
Wait, are we talking about the programming language (= the grammar) or its standard library?
That latter point is about all software, which is just control-logic for a CPU, ultimately. (Best not to get too pretentious about it.)
The former wrt modularity in perl, which I contend is inherent in terms of what actually happens at runtime: mostly someone else does the work, or you need to tune your algorithms and consider what optimisation is all about: trying to do as little as possible, preferably not even on each element of the dataset, but as few as possible.

OFC correctness comes first; you can make something correct work faster, which isn't even usually needed; but you'll have a hard time making something broken-for-speed-or-other-wacky-design-idea work correctly (and it's usually easier to start from scratch, as it is when you're done exploring the domain.)
Whatever happens, if you don't know the domain, you are only at the start of the whole lifecycle of a programmer working on that project.
steveL

PostPosted: Tue Jun 26, 2018 2:14 pm    Post subject:

steveL wrote:
... This is like a newb C coder saying "well I don't get UB when I run it." ...
John R. Graham wrote:
Pardon me if I've missed some context, but what does "UB" stand for?
khayyam wrote:
John ... I could be wrong but I read it as "unacceptable behavior", which seems to make sense ... though part of me is hoping the intended meaning is "uncle bill" :)
Lul.

It means "Undefined behaviour", jrg, which is what you get when an assembler-program goes out of wack. It overwrites whatever it feels like, tromping control-structures etc, and there are no guarantees whatsoever as to what will happen to the machine, nor its data, including the filesystem.

Avoiding UB in C is the equivalent of quoting correctly in sh, ime.
It's important to know why you are quoting (what the background of sh is, helps here) and what the ramifications of not quoting could be.
Similarly it helps to understand what C wraps (assembler, essentially) and the various forms of architecture are part of that history; but UB is much more frequently triggered by inappropriate use of a system-level interface, or standard function.

Quite often nothing bad happens, at least on a specific machine, or operating-system, so people develop bad-habits; again, a little time in ##c can cure them of that laxity, and they can then begin to learn properly: without the attitude that it's someone else's responsibility if their code goes awry.

Or indeed the ludicrous notion that they're somehow "better" than the rest of us, if they use a cross-platform language to write code that will only ever work on one platform, by inherent limitation of implementor.

If you're lucky, UB crashes your program immediately. That's why NULL ptr dereferencing should always SEGV. (I think Windoze allowed it, for years, on the same logic of pandering to stupidity rather than firmly correcting it so people learn.)
John R. Graham

PostPosted: Tue Jun 26, 2018 3:54 pm    Post subject:

steveL wrote:
steveL wrote:
... This is like a newb C coder saying "well I don't get UB when I run it." ...
John R. Graham wrote:
Pardon me if I've missed some context, but what does "UB" stand for?
khayyam wrote:
John ... I could be wrong but I read it as "unacceptable behavior", which seems to make sense ... though part of me is hoping the intended meaning is "uncle bill" :)
Lul.

It means "Undefined behaviour", jrg, which is what you get when an assembler-program goes out of wack. It overwrites whatever it feels like, tromping control-structures etc, and there are no guarantees whatsoever as to what will happen to the machine, nor its data, including the filesystem.
Okay, thanks. I can't resist being the pedant, though. I think that essentially nothing is undefined at the assembly language level because engineers do not (at least intentionally) design CPU hardware with non-deterministic behavior. Perhaps you meant "unintended behavior".

- John
Dr.Willy

PostPosted: Tue Jun 26, 2018 10:30 pm    Post subject: Re: Witter, witter..

steveL wrote:
Dr.Willy wrote:
Some tools offer a '-0' flag to fix the most pressing issues, but this reveals the fundamental problem:
Filenames in Unix are not plain text, they are binary data.
Again, this is incorrect; filepaths in UNIX are C strings (that is, a '\0'-terminated sequence of bytes or multi-byte chars.)
That is why UTF-8 is the way it is; its original name was FSS-UTF (File System Safe UCS Transformation Format), with the emphasis very much on "Filesystem-safe", so ASCII paths (the "portable" encoding on everything apart from IBM machines) are exactly equivalent, and processed without change, while wide-paths do not break ASCII processing (such as on space, tab and newline.)

Binary data is allowed to contain NUL-bytes; the principal restriction on a text file is that it does not contain NUL-bytes.

So the -0 flag, such as -print0 in POSIX find, is a recognition of that; but it's incorrect to say that filepaths are binary data, since they are C strings and handled correctly by argv.
The only other restriction on filenames, as opposed to paths, is that they cannot contain the '/' directory separator, and possibly other implementation-specific ones (like '\\' on Windoze.)

What I meant to say is that the -0 flag produces/reads binary data.

steveL wrote:
Dr.Willy wrote:
This in turn breaks usage of grep, it breaks usage of sed, it breaks usage of awk and all other line-oriented unix tools.
No, it does not.
They only break when you try to use them to process filenames, which you should never do, just as you should never read or pipe the output from ls.

Why yes, if you don't use them they don't break.
The point is that the only reason you cannot use them is their lack of support for NUL terminators.

steveL wrote:
Quote:
I absolutely agree that checking your input is basic. We just disagree on where that check should occur.
As far as I am concerned it should be checked at system-call level.
They already are; that's what errno is for in C.
OFC if you think it's not your job to check the return from syscalls, then you, your users and employer, have bigger problems.
Chances are, you think it's not your job to quote properly in shell scripts, either (and I don't want to use anything you output. ;)

Oh trust me, I'd be more than happy to check for EBADNAME.
steveL

PostPosted: Fri Jun 29, 2018 11:58 pm    Post subject:

steveL wrote:
It means "Undefined behaviour", jrg, which is what you get when an assembler-program goes out of wack. It overwrites whatever it feels like, tromping control-structures etc, and there are no guarantees whatsoever as to what will happen to the machine, nor its data, including the filesystem.
John R. Graham wrote:
Okay, thanks. I can't resist being the pedant, though. I think that essentially nothing is undefined at the assembly language level because engineers do not (at least intentionally) design CPU hardware with non-deterministic behavior. Perhaps you meant "unintended behavior".
Lul; you are welcome to argue with WG14 and POSIX about the term; it is not mine to "mean", only hopefully to explain when asked about usage, and point people at ##c for a fuller discussion.

WRT the overall point, I'd only say that behaviour is defined when you know what you are doing, just as every filename or command-line sequence of NUL-terminated bytes is perfectly acceptable to a sh-script, in the same way it is to C code, when you write sh correctly.
This is not always the same as "the user will be happy with the result", ofc. That's what return codes are for.

Both are just implementation languages, as javascript/ECMAscript is the implementation language for client-side code on the web.

You don't hear people complaining about being forced to use HTML anywhere near as much as you hear people berating sh for their mistakes.
And it's not like you never used to: it's just the younger generation see HTML as a given, in the same way that most programmers see every implementation language or format (make is a file-format, for example, not a language) as a given: that's the specification, work to it, and expect to be picked up on laxity, as that's how we keep things improving, rather than regressing.

It's not a personal issue; nor does ignorance shame the person: it's just a cue to go and read up on whatever.

That's my real problem with the den of nub-vipers (yakking in their cargo-cult confirmation bubble about the smell) that so many linux-distros seem to have become, following the dumbass Eric Raymond "master"-student approach: expertise becomes a tool of social control, rather than knowledge as a source of pleasure in the sharing, and the synchrony that every human seeks, especially from others who share their love of whichever craft means the most to them.

Thankfully I have come to realise that the good thing about being at half a century, is that I am allowed to be a grumpy old man. ;-)
As such, I won't ever stop berating from the peanut-gallery, whenever I see that same insidious trend toward delusions of control, rather than coding to serve the end-user.

Better to be written off as a ranting has-been, than fall into that trap. (I see it like a plumber thinking he controls the world, just because everyone uses water.)
Tony0945
Watchman

Joined: 25 Jul 2006
Posts: 5127
Location: Illinois, USA

PostPosted: Sat Jun 30, 2018 12:20 am    Post subject: Re: UNIX way, C, LISP et al.

steveL wrote:
Sure you can find "aspects" but the truth is that UNIX and C were developed in parallel.

More than that. They were fraternal twins.
Back to top
View user's profile Send private message
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Sat Jun 30, 2018 12:29 am    Post subject: Re: Witter, witter.. Reply with quote

Dr.Willy wrote:
Filenames in Unix are not plain text, they are binary data.
steveL wrote:
Again, this is incorrect; filepaths in UNIX are C strings (that is, a '\0'-terminated sequence of byte or multi-byte chars.)
.. Binary data is allowed to contain NUL-bytes; and the principal restriction on a text file is that it does not contain NUL-bytes.
Dr.Willy wrote:
What I meant to say is that the -0 flag produces/reads binary data.
Well that is true, in that it contains NUL-bytes. But (being the pedant, now) it is not binary data, it is a NUL-separated (and terminated) sequence of filenames, each of which is guaranteed to be a C string.
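And it's trivial to consume; a quick sketch (in bash, since read -d is a bashism; the paths and printf are only illustrative):
Code:
find . -type f -print0 |
while IFS= read -rd '' f; do   # NUL-delimited, so embedded newlines arrive intact
    printf 'got: %s\n' "$f"
done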

Besides which you've just asked the tool to produce that output, so presumably you know how to handle it (or wtf are you doing?)

I don't see what that has to do with your consequent point:
Dr.Willy wrote:
This in turn breaks usage of grep, it breaks usage of sed, it breaks usage of awk and all other line-oriented unix tools.
steveL wrote:
No, it does not.
They only break when you try to use them to process filenames, which you should never do, just as you should never read or pipe the output from ls.
Dr.Willy wrote:
Why yes, if you don't use them they don't break.
The point is that the only reason you cannot use them is the lack of non-NUL terminators.
No: if you use them incorrectly, on data that no-one ever said would exclude newlines (by definition of what a "filepath" consists of, going back 50 years), and expect that not to break, in stark disavowal of everything #bash teaches, then yes, your borked usage will break; which is why we keep telling people not to bloody well do that.

GIGO begins with what we code, first and foremost. That's why we have revision-control (or version-control) and code review etc.
It is never the implementation language's fault. UPE, "Software Tools", the awkbook, "The Practice of Programming", all have this in common: do not confine yourself to one language, nor to any one modality. There is no One True Way.
Dr.Willy wrote:
Oh trust me, I'd be more than happy to check for EBADNAME.
Agreed. I have no issue with it, as proposed to POSIX as an implementation error, just like there can be any other implementation-defined error at present, and since forever.
After all, perror(fpath); exit|return 2; is basic for a quick sketch in C.

The problem was with the definition (which, again, the impl is free to decide on already): originally '\n', then people started talking about banning all controls below space; but it always comes back to the original point: you will never be able to guarantee that paths are okay to use incorrectly in sh, which is the main "cause" of the complaints. And they can already be used correctly.

And the implementation (or the filesystem) can ban whatever it likes already. So can your script. (That's what case is for.)
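As a sketch of that (the policy here, banning embedded newlines, is only an example; pick your own):
Code:
nl='
'
case $1 in
*"$nl"*)
    printf '%s\n' "refusing path with embedded newline: $1" >&2
    exit 2 ;;
esac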

Flags like -print0 are already in place to handle the main area of concern, and one can trivially use find and a script to process them, or indeed an embedded sh or bash -c line. People spit out that kind of thing in #bash all the time, especially greycat. (He is also a goldmine wrt commands like gzip and tar over ssh.)
Consider that for i; do handles any argument, and "$@" will always pass on all (remaining after shift) arguments to the called function or command, correctly.
Remember: everything is a C string (possibly wide-char) already. (It is not binary data; it can be processed textually, by both sh and any C code. sh just happens to do so much of it so well.)
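A sketch tying those together (findsh is just a throwaway name for $0): find does the traversal, the embedded sh gets each name as a real argument, and for f do iterates "$@" correctly, whatever the names contain.
Code:
find . -type f -exec sh -c '
    for f do                       # iterates "$@": one filename per argument
        printf "found: %s\n" "$f"  # each one a complete C string, bytes intact
    done
' findsh {} +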

And consider using find more.
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Sat Jun 30, 2018 12:35 am    Post subject: Re: UNIX way, C, LISP et al. Reply with quote

steveL wrote:
Sure you can find "aspects" but the truth is that UNIX and C were developed in parallel.
Tony0945 wrote:
More than that. They were fraternal twins.
++
They definitely owe their success to each other (or perhaps to the non-compete AT&T had to operate under during the decades of its original development); as well, ofc, as to the clear-headed thinking of Thompson & Ritchie.
Akkara
Bodhisattva


Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

PostPosted: Sat Jun 30, 2018 6:57 am    Post subject: Reply with quote

Interesting reading, all this.

It reminded me of something I've occasionally wondered about but never got around to testing for myself.

What happens if one were to edit the disk-image of an (unmounted) filesystem, change a character in a directory entry's filename field to a '/', and then mount it? I'm thinking not much should happen, at least on the kernel side. Probably you just end up with an inaccessible file, since the strcmp can never match after the pathname has been split on '/'. But how badly would receiving such a name mess up the userland tools? Anyone know if this possibility has been considered already? Or might there be latent vulnerabilities that need to be fixed?

Also, as per suggestion, I tried lurking in #bash a few times. Wonderful resource; hadn't known about it till not too long ago (first learned of it in another thread, but this one pushed me to visit). Although I don't know what it says about a language if one needs to spend 6 months on an IRC channel before being qualified to use it properly :)

I saw they have a "shbot" on the channel, that appears to execute shell commands upon request... by anybody? I haven't tried it yet, but I have to wonder how they protect it against the many ways in which something like that could go wrong. This has got to be the granddaddy counterpoint of all "do not use eval on unfiltered untrusted user input". (And now I'm imagining the response, "Oh, it's all good, we don't use eval at all. We pipe it straight into /bin/bash!")
_________________
Many think that Dilbert is a comic. Unfortunately it is a documentary.
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Sat Jun 30, 2018 1:49 pm    Post subject: Reply with quote

Akkara wrote:
What happens if one were to edit the disk-image of a (unmounted) filesystem, change a character in the inode's filename field to a '/', and then mount it? I'm thinking not much should happen, at least from the kernel-side. Probably just end up with an inaccessible file since the strcmp can't ever match after the pathname has been split on '/'. But how badly would receiving such a name mess up the userland tools? Anyone know if this possibility has been considered already? Or might there be latent vulnerabilities that need to be fixed?
Not tried it personally; it comes under the (much) broader heading of "filesystem corruption" in my mind. wrt code: SEP. ;)
Quote:
Also, as per suggestion, I tried lurking in #bash a few times. Wonderful resource, hadn't known about it till not too long ago (first learned of in another thread but this one pushed me to visit). Although I don't know what it says about a language if one needs to spend 6 months on an IRC channel before they are qualified enough to use it properly :)
Good Lord, man, you need to spend a lot longer than that to be halfway competent in C, and indeed in every language I've ever used to any degree.

The main "problem" with sh is the one mentioned repeatedly (because once it's in your head, it makes shell a lot more comfortable, ime): that it is designed to accept fumbling user input at the terminal and attempt to do something with it.
So people naturally assume that they've "got it right" when "it works", and resent being told "that's crap" by shell coders, until they've spent a bit of time watching corrected script fly by on a routine basis, and they see it's okay to make mistakes: just don't pretend they are not mistakes (as we don't have time, never mind capability, to correct your upbringing; only your code.)
Quote:
I saw they have a "shbot" on the channel, that appears to execute shell commands upon request... by anybody? I haven't tried it yet, but I have to wonder how do they protect it against the many ways in which something like that could go wrong? This has got to be the granddaddy counterpoint of all "do not use eval on unfiltered untrusted user input". (And now I'm imagining the response, "Oh, it's all good, we don't use eval at all. We pipe it straight into /bin/bash!")
Lul, you are allowed just to ask this sort of question in-channel, y'know? ;)
It runs in a virtual machine, a separate instance of which is spun up for every execution. That's why you can only write to a file and read from it, in the same command sequence.

Gotta love simply wrapping commands, and only optimising later, should it even prove necessary.
Correctness first, and last.

"You can always make a correctly-working program faster. It is an awful lot harder to make a 'fast', but incorrect, program, work correctly."
CasperVector
Apprentice


Joined: 03 Apr 2012
Posts: 156

PostPosted: Sun Jul 01, 2018 12:26 pm    Post subject: Re: UNIX way, C, LISP et al. Reply with quote

steveL wrote:
I think the true lesson of functional languages is that all code is serializable; rather obviously, any sequence of actions can be thought of as smaller subsequences, which begin as data in CPU terms.
It is that which makes them so suitable to optimisation of higher-order languages.

Which quite satisfactorily explains the badness of C++: if C used S-expressions, it would be much easier to implement structural macros, and then C with classes / templates / zero-cost abstractions would probably just be gymnastics with the macros; however, this is not the case, and now we end up with C++ and its ever-increasing burdens^Wfeaturesets.

steveL wrote:
None; that's why rc can ditch a lot of baggage: precisely because it is purely a scripting language, not intended for interactive usage (from my reading of the project pages many years ago.) My point was more about how people dismiss shell, without understanding what it is about: making it easy for a user at an interactive terminal to run commands, while doing the wildcarding so no command, utility or other program, ever has to worry about it. Which is why argc and argv are the execution interface in Standard C, which like it or not, and I do, standardizes the UNIX approach as much as possible, leaving it to POSIX to fill in the middle layers, and more UNIX goodness than a language standard can mandate.

So this is quite unrelated (just like my post which you linked to) to the topic in the original thread: the implementation language of ebuilds, which is almost exclusively a non-interactive use case.
And by the way, rc(1) is quite suitable for interactive usage in Plan 9 because of the design of the latter, which seems to factor out a lot of implementation cost for interactiveness.
And I am inclined to think that it is not impossible to make a thin yet reasonably comfortable interactive layer on something like scsh, though it certainly requires some careful design (as with Plan 9, s6, etc).

steveL wrote:
Here I must disagree. Sure you can find "aspects" but the truth is that UNIX and C were developed in parallel. I can see what you're saying wrt computing and computability, but that's just the sequencing I referred to above, which derives from LISP's first purpose, as a thought-experiment on computing.

Yes, but Carnot's theorem was originally based on the caloric theory, and early humans made do with stone tools; neither origin, I think, explains much.
I personally believe that the true spirit of the Unix philosophy is the minimisation of total complexity of the implementation while providing most necessary functionalities (note that this is also compatible with "worse is better").
While Lisp perhaps originated as a thought experiment, its core concepts proved vital to computer science, both in theory and in practice; see the analysis of C++ above for an example. S-expressions can easily be made to emulate nearly every programming language, and the importance of this is as practical as it is theoretical.

steveL wrote:
Additionally, I don't think "the UNIX way" is about any language: it is about modularity, high-cohesion and low-coupling, ie: basic fundamentals of Computer Science, with cheap processes (certainly by comparison to every other system when it was conceived and developed as a model, in the 1970s) and a clean namespace, which are what keep things modular. I couldn't see much about a language in there; regardless, as I said, UNIX is not about any one language. In fact if you read "Software Tools" (the Ratfor edition, 1976), "The UNIX Programming Environment", "The AWK Programming Language" and "The Practice of Programming", you will see it is all about combining the best approaches for each part of the problem, using whatever utility, language or mental-model is best for each part.

Programming languages are also specifications for computer programs, and compilers/interpreters are also computer programs, so the Unix philosophy does apply.
And regarding books, may I recommend the publications by Dan Friedman, and the Revised^5 Report on the Algorithmic Language Scheme?
(Additionally, note the length of R5RS, especially in comparison with the ISO specification of C; also note the first sentence from the introduction to the former:)
RnRS wrote:
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary.


steveL wrote:
You cannot solve sociopolitical problems via technical solutions to something completely different.

In "the only technical means by which we can get rid of control by big companies", by "we" I meant, roughly, those who have already been fed up with abominations like systemd (after all, most computer users on this planet use M$ Windows). If they take the pain to adapt to the Unix philosophy (cf. my definition above), what they get in return is not only a cleaner yet more powerful system, but also freedom from these abominations, since a system adherent to the philosophy is very immune to systemd-like infections (eg. imagine how systemd would invade Alpine Linux?).
And by the way, the free software movement was, apart from political, also very technical; unfortunately, I am vastly less talented than Stallman, and perhaps something like an Elegant Software Foundation would only be a wet dream of mine for a very long time. Nevertheless, If the above-mentioned organisation comes to exist some day, the definition of the above-mentioned "we" would be able to be broadened.

EDIT: I should have said "perhaps the only practical means by which we can get rid of technical control by big companies", and avoided the political aspects wherever possible.
_________________
My current OpenPGP key:
RSA4096/0x227E8CAAB7AA186C (expires: 2020.10.19)
7077 7781 B859 5166 AE07 0286 227E 8CAA B7AA 186C


Last edited by CasperVector on Mon Sep 16, 2019 4:15 pm; edited 13 times in total
pun_guin
Apprentice


Joined: 06 Feb 2018
Posts: 204

PostPosted: Sun Jul 01, 2018 1:10 pm    Post subject: Re: UNIX way, C, LISP et al. Reply with quote

steveL wrote:
I'd counter "assembler" in at least two architectures, to grok what a computer is all about. That is what C builds on, and formalises in its memory model.


You should have used the past tense for this. While it is true that C was "designed" (= copied from BCPL, with features removed until it fit the PDP-7) for better portability, you should keep in mind that today's computers have a vastly different memory model. Your computer is not a fast PDP-11. The main reason C applications are still among the fastest solutions (although Lisp applications can be faster) is not that they are close to bare-metal development: C compilers just have awesome optimizations these days, and the runtime carries very little overhead.
_________________
I already use the new Genthree.
steveL
Watchman


Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

PostPosted: Wed Jul 04, 2018 1:12 pm    Post subject: Re: UNIX way, C, LISP et al. Reply with quote

steveL wrote:
I'd counter "assembler" in at least two architectures, to grok what a computer is all about. That is what C builds on, and formalises in its memory model.
pun_guin wrote:
You should have used the past tense for this. While it is true that C was "designed" (= copied from BCPL, with features removed until it fit the PDP-7) for better portability, you should keep in mind that today's computers have a vastly different memory model. Your computer is not a fast PDP-11.
It is not at all about what it was "copied" from, but the subsequent 40 years of standardisation across architectures, based on experience from implementation and usage.

So, no "the past tense" does not apply; computers are still the same basic design as 40 or 50 years ago, there is just an awful lot more going on (which makes for fun times with synchronisation); at much faster clock rates.
They still use transistors, last time I checked.

They still have a Program Counter reading from RAM (now icache) on a cyclical basis, they still have T-states and a Stack Pointer for machine control, they still use 2's complement arithmetic; and so on.

As for clock rates, computers were always faster than humans for the tasks given; that was the whole point of developing "calculating machines" in the first place.
We were working at microsecond level before; now it's nanosecond. That just means we can do more, and deliver better graphics and audio.

Nothing has changed in terms of actually programming a CPU: it's still the same underlying machine (with a register file, and so on.)
You still get on and do your job in as few cycles as possible, using the "virtual machine" model of multiprocessing designed to support precisely that approach by making it appear as if you are the only program running.

It just makes sense to work in C, because compilers are better at selecting instructions, especially since Bottom-Up Rewriting became a thing. So you don't have to worry about code portability, and it's an awful lot easier than writing asm, as you don't have to keep the stack in your head while you step through the code.

Synchronisation, and multi-threading, are only an issue when you explicitly code to use them. By default, C and POSIX both supply your code with a lovely environment, wherein if you just leave everything alone (instead of calling into a bloated stack of a "framework"), the system takes care of your process, and you can get on and do your job in as few cycles as possible.

You don't have to worry about a whole heap of things that you would have had to worry about before it became the dominant approach, by virtue of the results. (Read "Software Tools" for more perspective on this; the 1976 first-edition on Fortran/RATFOR, if you can get it.)
Quote:
The main reason C applications are still among the fastest solutions (although Lisp applications can be faster) is not that they are close to bare-metal development: C compilers just have awesome optimizations these days, and the runtime carries very little overhead.
Lul; that is precisely because the decades of implementation have been via writing asm output, for real CPUs, not some theoretical framework that only exists as castles in the air of someone's mind.