Gentoo Forums
removing blank lines with sed
generalgherk
n00b


Joined: 22 May 2003
Posts: 58

Posted: Sun Jan 25, 2004 7:15 pm    Post subject: removing blank lines with sed

Hi.
I am trying to strip down the file available from:
http://news.bbc.co.uk/text_only.stm

I've worked out how to do most of what I want with sed, except for deleting a lot of the blank lines. I've read many different FAQs, and they all say that to remove blank lines I should run sed "/^$/d" file
This works on test files I make myself, but not on this web page. The file seems to contain a lot of tabs and spaces on the blank lines, so I've also tried "s/[\t]*//" and "s/ *//", which removed some but not all of them. I've also made sure the file is converted to Unix line endings with dos2unix. A load of blank lines still remain. What are they, and how do I strip them?
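
To make the symptom concrete, here is a small reproduction. The sample text is made up for illustration (it is not the BBC page), and sample is a hypothetical helper name: the input has one truly empty line and one line that holds only a space and a tab.

```shell
# Hypothetical sample: "one", an empty line, a space+tab line, "two".
sample() { printf 'one\n\n \t\ntwo\n'; }

# /^$/ matches only lines with nothing at all between start and end,
# so the empty line is deleted but the space+tab line survives.
sample | sed '/^$/d' | wc -l    # counts 3 lines, not 2
```

The line that looks blank on screen is still there because, as far as sed is concerned, it is not empty.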

Thanks, gg
_________________
Join the adopt an unanswered post initiative today
JensLH
n00b


Joined: 13 Aug 2003
Posts: 46

Posted: Sun Jan 25, 2004 9:42 pm

Now, I'm no sed expert, but try "s/[ \t]*//"; that is, include both a space and a tab in the []'s.

Hope it helps.
grant.mcdorman
Apprentice


Joined: 29 Jan 2003
Posts: 295
Location: Toronto, ON, Canada

Posted: Sun Jan 25, 2004 9:55 pm

JensLH wrote:
Now, I'm no sed expert, but try "s/[ \t]*//"; that is, include both a space and a tab in the []'s.

Hope it helps.

That only strips the leading run of spaces and tabs from each line; it doesn't delete the blank lines. Blank lines (containing only spaces/tabs) and empty lines (containing nothing at all) can be deleted in one step with:
Code:
sed -e '/^[ \t]*$/d'
Even better, in Linux, is
Code:
sed -e '/^[[:space:]]*$/d'
(see the Sed FAQ section on extensions to regular expressions at sed.sourceforge.net for details.)
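
As a quick check of the second command, the sketch below runs it over a made-up sample that mixes an empty line, a space-and-tab line, and a line holding only a carriage return (the kind of thing a DOS-style line ending can leave behind). The names sample and strip_blank are hypothetical helpers, not anything from the thread. As far as I know, [[:space:]] is a POSIX character class that also covers carriage returns, form feeds and vertical tabs, whereas \t inside a bracket expression relies on a GNU sed extension.

```shell
# Hypothetical sample: text, empty line, space+tab line, lone CR line, text.
sample() { printf 'one\n\n \t\n\r\ntwo\n'; }

# Delete every line that is empty or made up solely of whitespace.
strip_blank() { sed -e '/^[[:space:]]*$/d'; }

sample | strip_blank
```

Only the lines "one" and "two" should come out the other end.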
BitJam
Advocate


Joined: 12 Aug 2003
Posts: 2508
Location: Silver City, NM

Posted: Sun Jan 25, 2004 10:03 pm

try:

sed "/^[\t ]*$/d" file

or:

perl -n -e "/\S/ and print" infile > outfile

or to edit in place:

perl -ni -e "/\S/ and print" file
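
For comparison, a sed counterpart to the in-place Perl one-liner might look like the sketch below. It assumes a sed that accepts -i with a backup suffix (GNU sed does; the option is not part of POSIX), and strip_in_place is a hypothetical helper name. The demo works on a throwaway file with made-up contents so nothing real is touched.

```shell
# Remove blank and whitespace-only lines from the file named in $1,
# keeping the untouched original next to it as "$1.bak".
strip_in_place() { sed -i.bak -e '/^[[:space:]]*$/d' "$1"; }

# Demo on a temporary file.
demo=$(mktemp)
printf 'one\n \t\n\ntwo\n' > "$demo"
strip_in_place "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

Keeping the .bak copy is a cheap safety net while the pattern is still being tuned.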
generalgherk
n00b


Joined: 22 May 2003
Posts: 58

Posted: Tue Jan 27, 2004 2:04 pm

Thanks a lot to all who helped; sed -e '/^[[:space:]]*$/d' was especially useful :)
jesterspet
Apprentice


Joined: 05 Feb 2003
Posts: 215
Location: Atlanta

Posted: Tue Jan 27, 2004 8:32 pm

Normally when I have this issue I switch over to writing sed scripts that use multiple lines.

For a quick & dirty way of delimiting paragraphs I use
Code:
# Strip a leading slash from the line, if there is one.
# Append each non-empty line to the hold space and print nothing yet.
# Repeat until an empty line is encountered.
# When the empty line is encountered, join the held lines with spaces and
# output them as a single line wrapped in <p>...</p>, then the empty line.
# (A final paragraph is printed only if the input ends with an empty line.)
s/^\///
/^$/!{
     H
     d
     }
/^$/{
        x
        s/^\n/<p>/      #DEBUG: denotes beginning of paragraph
        s/\n/ /g
        s/$/<\/p>/      #DEBUG: denotes end of paragraph
        G
        }


That should show you what is truly a blank line & what is not. Your output should be all single line paragraphs with only one blank line between them. From that you should be able to tweak your output accordingly.
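
Another way to see what a seemingly blank line really contains is sed's l command, which prints each line with non-printing characters written as escapes and a $ marking the end of the line. A minimal sketch on made-up input (sample and show_invisible are hypothetical helper names):

```shell
# Hypothetical sample: text, a space+tab+CR line, a truly empty line.
sample() { printf 'one\n \t\r\n\n'; }

# -n suppresses the normal output; l prints each line unambiguously.
show_invisible() { sed -n l; }

sample | show_invisible
```

A line that shows up as just $ is truly empty; anything printed before the $ (such as \t or \r) is what keeps /^$/ from matching.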
_________________
(X) Yes! I am a brain damaged lemur on crack, and would like to buy your software package for $499.95
All times are GMT