Gentoo Forums
sed remove comma after double quotes from list
finalturismo
Guru


Joined: 06 Jan 2020
Posts: 410

Posted: Sun Nov 13, 2022 7:48 pm    Post subject: sed remove comma after double quotes from list

What I am trying to do is remove the comma from "1,779.51 TB" while also removing the quotes. I need to do this for a list of files, but I am a bit stuck on this one, and it is messing up my CSV format for about 10,000 certificates.

Code:
echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP      VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS" |  sed 's&"\(.*\)"&\1&gI'



This is as far as I can get. I am still left with the comma after removing the quotes from "1,779.51 TB".
szatox
Advocate
Advocate


Joined: 27 Aug 2013
Posts: 3594

Posted: Sun Nov 13, 2022 9:21 pm

Code:
echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP      VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS"
This is your first problem.
Inside the outer double quotes, "1,779.51 TB" means "close quotation, 1,779.51 TB, open quotation", so the shell strips those inner quotes before sed ever sees them. Not what you expected, is it?

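A minimal sketch of the quoting issue, using a shortened sample line (the variable names `broken` and `fixed` are only for illustration). Wrapping the test string in single quotes is one way to keep the inner double quotes literal:

```shell
# Inner double quotes close and reopen the outer string, so the shell removes them.
broken=$(echo "88 days,"1,779.51 TB",HP")

# Single quotes keep the inner double quotes as literal characters.
fixed=$(echo '88 days,"1,779.51 TB",HP')

printf '%s\n' "$broken"   # 88 days,1,779.51 TB,HP
printf '%s\n' "$fixed"    # 88 days,"1,779.51 TB",HP
```

When the data comes from a file rather than an echo test, the quotes reach sed untouched and this problem does not arise.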

Regarding the pattern, doing it right is quite tricky. It may be a good case for using sed's hold space, or nested substring capture combined with negated character classes.
E.g. [^,]* will match "all characters until the first comma". You can try something along the lines of:
\("\([^,"]*,\)*"\)
\1 should return "quoted value with all commas removed", while \2 should return the value up to the first comma within the previously opened quotation. Haven't tested, typing from memory.
Using those tricks with the global flag, you should be able to end up with an expression that drops commas within all quoted fields and ignores the commas separating the fields.

Alternatively, hold space allows you to process a string bit by bit. You can take advantage of the newline at the end of the buffer to separate input from output during processing, and chew the quoted values one bite at a time.
That is: copy the buffer to hold space, remove everything after the second quote, remove commas after the first quote, append to hold, copy hold to pattern space, remove everything up to the second quote, and repeat until your buffer starts with a newline, at which point you drop the newline and print the result.
Yes, sed allows you to run scripts, including conditional execution in the simple form of "if pattern matched, then go to label".
Check man sed for details.
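The patterns above were typed from memory and untested, so here is a hedged sketch of one variant that does run. It swaps in a plain `t` loop (the "if pattern matched, go to label" branching) instead of the hold space, and it assumes quotes are balanced and never escaped inside a field. The sample line and the variable names are made up for illustration:

```shell
# Hypothetical sample line with two quoted fields (not from the thread).
line='a,"1,779.51 TB",b,"2,000,000",c'

# :a      label to jump back to
# s/.../  from the start of the line, skip any complete "quoted" pairs, enter
#         the next opening quote, and delete the first comma found before
#         its closing quote
# ta      jump back to :a if the s command changed anything
# s/"//g  once no quoted comma is left, drop the quotes themselves
cleaned=$(printf '%s\n' "$line" | sed \
  -e ':a' \
  -e 's/^\(\([^"]*"[^"]*"\)*[^"]*"[^",]*\),/\1/' \
  -e 'ta' \
  -e 's/"//g')

printf '%s\n' "$cleaned"   # a,1779.51 TB,b,2000000,c
```

Anchoring at the start of the line is what keeps track of whether a given quote opens or closes a field, which is why the commas between fields are left alone.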


Now, if you can guarantee there is only one field with a comma inside, this may or may not simplify the code. Either way, relying on this property of the input data will give you a buggy script which will probably break at some point down the line, so while it may be tempting, I discourage that.
BTW, if you only need to do that once and don't care about whatever problems might arise in the future, simply don't write any script at all and just import that csv into calc. It understands quoted fields and will help you do the conversion.
finalturismo
Guru


Joined: 06 Jan 2020
Posts: 410

Posted: Sun Nov 13, 2022 9:52 pm

As a backup plan, how do I remove the first , after the first " using sed?

Like, search for a character after the first "
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3594

Posted: Sun Nov 13, 2022 10:08 pm

s/^\([^"]*"[^,"]*\),/\1/

Starting at the beginning of the buffer, capture a non-quote character repeated any number of times, a quote, a non-comma/non-quote character repeated any number of times, and stop the capture before a comma.
Since that final comma is part of the matched pattern but not part of the captured group, it will be stripped from the buffer by the match and not returned by \1 in the substitution.

Note: this will ONLY remove the first comma between the first and the second quote.
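A quick check of this substitution on a shortened sample line, with the replacement written as `\1` alone so that nothing is put back in place of the comma. The second expression, which drops the quotes, is an addition for illustration and not part of the post above:

```shell
# Shortened sample line with a single quoted field.
line='88 days,"1,779.51 TB",HP,done'

# First expression: remove the first comma after the first quote.
# Second expression: remove all double quotes.
out=$(printf '%s\n' "$line" | sed \
  -e 's/^\([^"]*"[^,"]*\),/\1/' \
  -e 's/"//g')

printf '%s\n' "$out"   # 88 days,1779.51 TB,HP,done
```

If the first quoted field contains no comma, the first expression simply does not match and the line passes through with only its quotes removed.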
Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9631
Location: beyond the rim

Posted: Mon Nov 14, 2022 9:35 am

You may want to use a proper CSV parser and some other programming language for this. Or try to change the formatting of the output in the generating program.
The quotes are likely only there because of the group separator comma, which may not always be present. So I wouldn't rely on the output format staying constant, and that is not a good starting point for using sed.
finalturismo
Guru


Joined: 06 Jan 2020
Posts: 410

Posted: Tue Nov 15, 2022 4:33 am

True that. His solution worked perfectly for me, though, and I thought it was a great idea. I used Python with pandas (imported as pd) and then exported to text to process with bash.

I used it to convert about 10,000 certs with ImageMagick. It worked great!

Thanks!
Hu
Administrator


Joined: 06 Mar 2007
Posts: 23296

Posted: Tue Nov 15, 2022 1:25 pm

If you already have access to Python, then exporting it for processing with shell is going backwards. Python's text processing is at least as good as, and probably better than, anything you can easily build with bash+sed.
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3594

Posted: Tue Nov 15, 2022 2:33 pm

Python... Well, Python actually has a csv module in its standard library, so you should be able to open this file directly.
All times are GMT