sed remove comma after double quotes from list

Post by finalturismo » Sun Nov 13, 2022 7:48 pm

What I am trying to do is remove the comma from "1,779.51 TB" while also removing the quotes. I need to do this to a list of files, but I am a bit stuck on this one, and it is messing up my CSV format for about 10,000 certificates.

echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP      VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS" |  sed 's&"\(.*\)"&\1&gI'

This is as far as I can get. I am still left with the comma after removing the "" from the "1,779.51 TB".

Post by szatox » Sun Nov 13, 2022 9:21 pm

echo "10625,0SWXAEYA,8465,2022-10-13 03:35:57,/dev/sdx,30 °C,30 °C,88 %,43608 Hours,88 days,"1,779.51 TB",HP      VO1920JEUQQ,0SWXAEYA,1787.98GB,done,NIST.SP.800-88(1 Pass),SAS"
This is your first problem.
Inside a double-quoted string, "1,779.51 TB" means "close quotation, 1,779.51 TB, open quotation", so the shell consumes those quotes before sed ever sees them. Not what you expected, is it?

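To make the quoting problem concrete, here is a small sketch using a shortened, made-up row (only three of the original fields). It compares what the shell passes on when the row is wrapped in double quotes with what it passes on when the row is wrapped in single quotes.

```shell
#!/bin/sh
# Shortened, hypothetical sample row; the real rows have many more fields.

# Double quotes outside: the inner quotes close and reopen the string,
# so the shell removes them before any other command sees the text.
broken=$(echo "88 days,"1,779.51 TB",HP")
printf '%s\n' "$broken"   # prints: 88 days,1,779.51 TB,HP

# Single quotes outside: the inner double quotes stay literal.
fixed='88 days,"1,779.51 TB",HP'
printf '%s\n' "$fixed"    # prints: 88 days,"1,779.51 TB",HP
```

When the rows come from a file rather than from echo, this problem does not arise, because sed reads the quotes straight from the file.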

Regarding the pattern: doing it right is quite tricky. It may be a good case for using sed's hold space, or nested substring capture combined with negated character classes.
E.g. [^,]* will match "all characters until the first comma". You can try something along the lines of:
\("\([^,"]*,\)*"\)
The intent is that \1 captures the quoted value, while \2 captures the value up to a comma within the previously opened quotation. I haven't tested this and I'm typing from memory, so treat it as a starting point.
Using those tricks with the global flag, you should be able to end up with an expression that drops commas within all quoted fields and ignores the commas separating the fields.
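A minimal sketch of the "drop commas only inside quoted fields" idea, not the exact pattern above. Instead of the global flag it uses a label and sed's t command (branch if a substitution was made), removing one in-quote comma per pass. The function name strip_quoted_commas and the sample rows are made up for illustration, and it assumes that no field spans more than one line.

```shell
#!/bin/sh
# Hypothetical helper: reads CSV lines on stdin, removes commas that sit
# inside double-quoted fields, then removes the quotes themselves.
strip_quoted_commas() {
    # Pattern, anchored at the start of the line:
    #   \([^"]*"[^"]*"\)*   any number of complete "..." pairs, skipped over
    #   [^"]*"              text up to the next opening quote
    #   [^",]*              text inside that field, before its first comma
    #   ,                   the comma to drop (outside the \1 capture)
    # 'ta' jumps back to label a for as long as a comma was removed.
    sed -e ':a' \
        -e 's/^\(\([^"]*"[^"]*"\)*[^"]*"[^",]*\),/\1/' \
        -e 'ta' \
        -e 's/"//g'
}

printf '%s\n' '88 days,"1,779.51 TB",HP' | strip_quoted_commas
# prints: 88 days,1779.51 TB,HP
printf '%s\n' 'a,"1,779.51 TB",b,"2,000,000",c' | strip_quoted_commas
# prints: a,1779.51 TB,b,2000000,c
```

Because the pattern counts quotes in pairs from the start of the line, the commas that separate fields are never touched.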

Alternatively, hold space allows you to process a string bit by bit. You can take advantage of the newline at the end of the buffer to separate input from output during processing, and chew through the quoted values one bite at a time.
For example: copy the buffer to hold space, remove everything after the second quote, remove the commas after the first quote, append to hold, copy hold to pattern space, remove everything up to the second quote, and repeat until your buffer starts with a newline, at which point you drop the newline and print the result.
Yes, sed allows you to run scripts, including conditional execution in the simple form of "if the pattern matched, then go to label".
Check man sed for details.


Now, if you can guarantee there is only one field with a comma inside, this may or may not simplify the code. Either way, relying on this property of the input data will give you a buggy script that will probably break at some point down the line, so while it may be tempting, I discourage it.
BTW, if you only need to do this once and don't care about whatever problems might arise in the future, simply don't write any script at all and just import that CSV into Calc. It understands quoted fields and will help you do the conversion.

Post by finalturismo » Sun Nov 13, 2022 9:52 pm

As a backup plan, how do I remove the first , after the first " using sed?

That is, search for a character after the first ".

Post by szatox » Sun Nov 13, 2022 10:08 pm

s/^\([^"]*"[^,"]*\),/\1/

Starting at the beginning of the buffer, capture any number of non-quote characters, a quote, and any number of characters that are neither a comma nor a quote, then stop the capture just before a comma.
Since that final comma is part of the matched pattern but not part of the captured group, the match strips it from the buffer and \1 does not return it in the substitution.

Note: this will ONLY remove the first comma between the first and the second quote.
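A quick way to try this substitution, using \1 alone as the replacement, as the explanation describes. The row is a shortened, made-up sample, and the second sed call that drops the quotes is an added assumption about what the next step would be.

```shell
#!/bin/sh
# Hypothetical shortened row; the real rows have many more fields.
line='88 days,"1,779.51 TB",HP'

# Remove only the first comma inside the first quoted field.
step1=$(printf '%s\n' "$line" | sed 's/^\([^"]*"[^,"]*\),/\1/')
printf '%s\n' "$step1"    # prints: 88 days,"1779.51 TB",HP

# With that comma gone, the quotes can be dropped as well.
step2=$(printf '%s\n' "$step1" | sed 's/"//g')
printf '%s\n' "$step2"    # prints: 88 days,1779.51 TB,HP
```

A value with two group separators, such as "1,234,779.51 TB", would keep its second comma, which is exactly the limitation the note above warns about.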

Post by Genone » Mon Nov 14, 2022 9:35 am

You may want to use a proper CSV parser and some other programming language for this, or try to change the formatting of the output in the generating program.
The quotes are likely only there because of the group-separator comma, which may not always be present. So I wouldn't rely on the output format staying constant, and that is not a good starting point for using sed.

Post by finalturismo » Tue Nov 15, 2022 4:33 am

True that. His solution worked perfectly for me though, and I thought it was a great idea. I used Python to import pandas as pd and then exported to text to process with bash.

I used it to convert about 10,000 certs with ImageMagick. It worked great!

Thanks!

Post by Hu » Tue Nov 15, 2022 1:25 pm

If you already have access to Python, then exporting it for processing with shell is going backwards. Python's text processing is at least as good as, and probably better than, anything you can easily build with bash+sed.

Post by szatox » Tue Nov 15, 2022 2:33 pm

Python... Well, Python actually has a csv library, so you should be able to open this file directly.