generalgherk n00b
Joined: 22 May 2003 Posts: 58
Posted: Sun Jan 25, 2004 7:15 pm Post subject: removing blank lines with sed
Hi.
I am trying to strip down the file available from:
http://news.bbc.co.uk/text_only.stm
I've worked out how to do most of what I want with sed, except for deleting a lot of the blank lines. I've read many different FAQs, and they all say that to remove blank lines I should run sed "/^$/d" file
This works on test files I make myself, but not on this web page. The file seems to contain a lot of tabs and spaces on the blank lines, so I've also tried "s/[\t]*//" and "s/ *//", which removed some but not all. I've also made sure the file is converted to Unix line endings with dos2unix. A load of blank lines still remain. What are they, and how do I strip them?
Thanks, gg
JensLH n00b
Joined: 13 Aug 2003 Posts: 46
Posted: Sun Jan 25, 2004 9:42 pm
Now, I'm no sed expert, but try "s/[ \t]*//"; that is, include both a space and a tab in the []'s.
Hope it helps.
grant.mcdorman Apprentice
Joined: 29 Jan 2003 Posts: 295 Location: Toronto, ON, Canada
Posted: Sun Jan 25, 2004 9:55 pm
JensLH wrote: | Now, I'm no sed expert, but try "s/[ \t]*//"; that is, include both a space and a tab in the []'s.
Hope it helps. |
That just strips the leading spaces and tabs from each line; it doesn't delete the blank lines themselves. Blank (i.e. just spaces/tabs) and empty (no characters at all) lines can be deleted in one step by:
Code:
sed -e '/^[ \t]*$/d'
Even better, in Linux, is:
Code:
sed -e '/^[[:space:]]*$/d'
(See the Sed FAQ section on extensions to regular expressions at sed.sourceforge.net for details.)
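As a rough check of the one-step delete, here is a small sketch. The file name sample.txt, its contents, and the helper name strip_blank are made up for illustration; the real page may contain other characters entirely.

```shell
# Hypothetical sample: an empty line, a spaces-only line, a tab-only line,
# and a line holding only a carriage return, between two lines of text.
printf 'first\n\n   \n\t\n\r\nlast\n' > sample.txt

# Delete every line that is empty or consists solely of whitespace.
# strip_blank is an illustrative helper name, not a standard tool.
strip_blank() {
    sed -e '/^[[:space:]]*$/d' "$1"
}

strip_blank sample.txt
```

On this sample only the lines "first" and "last" should be left. The character class also covers a stray carriage return, which a plain [ \t] bracket would miss.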
BitJam Advocate
Joined: 12 Aug 2003 Posts: 2508 Location: Silver City, NM
Posted: Sun Jan 25, 2004 10:03 pm
try:
sed '/^[\t ]*$/d' file
or:
perl -n -e '/\S/ and print' infile > outfile
or to edit in place:
perl -ni -e '/\S/ and print' file
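If some "blank" lines survive all of these, it may help to look at what they really contain before guessing at another pattern. A minimal sketch, assuming the culprit is some byte that does not count as a space or tab; the non-breaking space in the sample is only a guess for illustration, not something confirmed about the BBC page, and odd.txt and show_hidden are made-up names.

```shell
# Hypothetical sample: one "blank" line holds a UTF-8 non-breaking space
# (bytes 0xC2 0xA0), another holds only a carriage return.
printf 'text\n\302\240\n\r\nmore\n' > odd.txt

# Print each line unambiguously with sed's l command: tabs appear as \t,
# carriage returns as \r, other non-printing bytes as octal escapes,
# and a $ marks the end of every line.
show_hidden() {
    LC_ALL=C sed -n 'l' "$1"
}

show_hidden odd.txt
```

Whatever shows up between the start of the line and the closing $ is what the delete pattern has to match.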
generalgherk n00b
Joined: 22 May 2003 Posts: 58
jesterspet Apprentice
Joined: 05 Feb 2003 Posts: 215 Location: Atlanta
Posted: Tue Jan 27, 2004 8:32 pm
Normally when I have this issue I switch over to writing sed scripts that work across multiple lines.
For a quick & dirty way of delimiting paragraphs I use
Code:
# Collect each run of non-blank lines in the hold space; when a blank
# line arrives, print the run joined into one line, wrapped in <p>...</p>.
# Assumption: the first command as posted (s/^\///g) looks garbled, so
# whitespace-only lines are emptied here to make them count as blank.
s/^[ \t]*$//
/^$/!{
H
d
}
/^$/{
x
# DEBUG: marks the beginning of the paragraph
s/^\n/<p>/
# join the collected lines with single spaces
s/\n/ /g
# DEBUG: marks the end of the paragraph
s/$/<\/p>/
G
}
That should show you what is truly a blank line and what is not. Your output should be all single-line paragraphs with only one blank line between them. From that you should be able to tweak your output accordingly.