finalturismo Guru

Joined: 06 Jan 2020 Posts: 410
|
Posted: Sun Nov 13, 2022 7:48 pm Post subject: sed remove comma after double quotes from list |
|
|
So what I am trying to do is remove the comma from "1,779.51 TB" while also removing the quotes. I need to do this to a list of files, but I'm a bit stuck on this one, and it's messing up my CSV format for about 10,000 certificates.
Code: | echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS" | sed 's&"\(.*\)"&\1&gI' |
So far this is as far as I can get. I am still left with the comma after removing the "" from the 1,779.51 TB |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3594
|
Posted: Sun Nov 13, 2022 9:21 pm Post subject: |
|
|
Code: | echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS" | This is your first problem.
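A minimal sketch of why that command is a problem, using a shortened, made-up sample line rather than the full record. Inside a double-quoted shell string, the inner double quotes end and restart the string, so they are consumed by the shell and never reach sed. Single quotes pass them through untouched.

```shell
# Shortened, hypothetical sample line; the real record has more fields.

# Double-quoted: the inner quotes close and reopen the shell string,
# so they are removed by the shell before echo runs.
echo "88 days,"1,779.51 TB",HP"
# prints: 88 days,1,779.51 TB,HP

# Single-quoted: the inner double quotes are passed through literally.
printf '%s\n' '88 days,"1,779.51 TB",HP'
# prints: 88 days,"1,779.51 TB",HP
```

When the data comes from a file instead of an echo, this shell quoting issue does not arise at all.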
"1,779.51 TB" means "close quotation, 1,779.51 TB, open quotation". Not what you expected, is it?
Regarding the pattern, doing it right is quite tricky. It may be a good case for using sed's hold space, or nested substring capture combined with negated character classes.
E.g. [^,]* will match "all characters until the first comma". You can try something along the lines of:
\("\([^,"]*,\)*[^,"]*"\)
\1 should return the whole quoted value, while \2 should return one comma-terminated chunk within the previously opened quotation (for a repeated group, only the last repetition is kept). Haven't tested, typing from memory.
Using those tricks with the global flag, you should be able to end up with an expression that drops commas within all quoted fields and ignores the commas separating the fields.
Alternatively, hold space allows you to process a string bit by bit. You can take advantage of the newline at the end of the buffer to separate input from output during processing, and chew the quoted values one bite at a time.
For example: copy the buffer to hold space, remove everything after the second quote, remove the commas after the first quote, append to hold space, copy hold space back to pattern space, remove everything up to the second quote, and repeat until your buffer starts with a newline, at which point you drop the newline and print the result.
Yes, sed allows you to run scripts, including conditional execution in a simple form of if-pattern-matched then goto label.
Check man sed for details.
Now, if you can guarantee there is only one field with a comma inside, this may or may not simplify the code. Either way, relying on this property of the input data will give you a buggy script which will probably break at some point down the line, so while it may be tempting, I discourage that.
BTW, if you need to only do that once and don't care about whatever problems might arise in the future, simply don't write any script at all and just import that csv into calc. It understands quoted fields and will help you do the conversion. |
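One possible way to realise the "if pattern matched, then goto label" idea is sketched below. It uses a branch loop rather than hold space, so it is a swapped-in variant and not the exact procedure described above. The function name strip_quoted_commas and the shortened sample line are made up for illustration. The sketch assumes quotes are balanced on every line and that fields contain no escaped ("") quotes or embedded newlines.

```shell
# strip_quoted_commas is a hypothetical helper name, not from the thread.
# Assumptions: balanced quotes per line, no escaped ("") quotes,
# no newlines inside fields.
strip_quoted_commas() {
    # :a      label marking the start of the loop
    # s/.../  skip any number of complete "..." pairs from the start of
    #         the line, step into the next opening quote, and delete one
    #         comma found before its closing quote
    # ta      jump back to :a if the substitution changed something
    # s/"//g  once no quoted commas remain, strip the quotes
    sed -e ':a' \
        -e 's/^\(\([^"]*"[^"]*"\)*[^"]*"[^",]*\),/\1/' \
        -e 'ta' \
        -e 's/"//g'
}

printf '%s\n' '88 days,"1,779.51 TB",HP,"1,787.98 GB",done' | strip_quoted_commas
# prints: 88 days,1779.51 TB,HP,1787.98 GB,done
```

Anchoring at the start of the line and consuming quotes in pairs is what keeps the commas between fields safe: the quote the pattern steps into is always an opening one.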
finalturismo Guru

Joined: 06 Jan 2020 Posts: 410
|
Posted: Sun Nov 13, 2022 9:52 pm Post subject: |
|
|
As a backup plan, how do I remove the first comma after the first " using sed?
Like, search for a character after the first " |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3594
|
Posted: Sun Nov 13, 2022 10:08 pm Post subject: |
|
|
s/^\([^"]*"[^,"]*\),/\1/
Starting at the beginning of the buffer, capture any number of non-quote characters, a quote, then any number of characters that are neither a comma nor a quote, and stop the capture before a comma.
Since that final comma is part of the matched pattern but not part of the captured group, it is stripped from the buffer by the match and not returned by \1 in the substitution.
Note: this will ONLY remove the first comma between the first and the second quote. |
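A sketch of this single substitution applied to a shortened, made-up sample line. The replacement here is just \1, so the comma is dropped and nothing is put in its place. The second expression, which strips the remaining quotes, is an addition for illustration.

```shell
# Sample line shortened for illustration; held in single quotes so the
# inner double quotes survive the shell.
line='88 days,"1,779.51 TB",HP VO1920JEUQQ'

# First expression: drop the first comma inside the first quoted field.
# Second expression: strip the quotes that are left over.
printf '%s\n' "$line" | sed -e 's/^\([^"]*"[^,"]*\),/\1/' -e 's/"//g'
# prints: 88 days,1779.51 TB,HP VO1920JEUQQ
```

A value with two commas inside the quotes, or a second quoted field on the same line, would still keep its remaining commas.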
Genone Retired Dev


Joined: 14 Mar 2003 Posts: 9631 Location: beyond the rim
|
Posted: Mon Nov 14, 2022 9:35 am Post subject: |
|
|
You may want to use a proper CSV parser and some other programming language for this. Or try to change the formatting of the output in the generating program.
The quotes are likely only there because of the digit-group separator comma, which may not always be present. So I wouldn't rely on the output format staying constant, and that is not a good starting point for using sed. |
finalturismo Guru

Joined: 06 Jan 2020 Posts: 410
|
Posted: Tue Nov 15, 2022 4:33 am Post subject: |
|
|
True that. His solution worked perfectly for me, though, and I thought it was a great idea. I used Python with pandas (imported as pd) and then exported to text to process with bash.
I used it to convert about 10,000 certs with ImageMagick. It worked great!
Thanks! |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23296
|
Posted: Tue Nov 15, 2022 1:25 pm Post subject: |
|
|
If you already have access to Python, then exporting it for processing with shell is going backwards. Python's text processing is at least as good as, and probably better than, anything you can easily build with bash+sed. |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3594
|
Posted: Tue Nov 15, 2022 2:33 pm Post subject: |
|
|
Python... Well, Python actually has a csv module in its standard library, so you should be able to open this file directly. |