alex.blackbit Advocate

Joined: 26 Jul 2005 Posts: 2288
Posted: Fri Feb 10, 2012 9:11 am Post subject: extract URLs from IRClogs [SOLVED]
Hi,
I use irssi for IRC and log the channels that I am in.
There are a few channels that I don't find particularly interesting as a whole; I only care about the URLs posted in them.
So what I would like to do is run tailf on the log file and pipe its output to sed or awk, keeping only the lines that contain URLs. From those lines I want to extract the URL itself and discard the rest of the line.
I have no clue about sed or awk beyond 's/foo/goo/' and '{ print $3 }'.
Would somebody please give me a hand?
Last edited by alex.blackbit on Fri Feb 10, 2012 2:11 pm; edited 1 time in total
katfish Tux's lil' helper

Joined: 14 Nov 2011 Posts: 84
Posted: Fri Feb 10, 2012 11:26 am Post subject:
Hi, I'm a beginner in scripting.
However, wouldn't a simple "echo xxx | grep http" do it perfectly?
alex.blackbit Advocate

Joined: 26 Jul 2005 Posts: 2288
Posted: Fri Feb 10, 2012 12:53 pm Post subject:
katfish: that solves the first part, but not the second.
The result is whole lines that contain URLs.
What I want is only the URLs, with nothing leading and nothing trailing.
That's why I suggested sed, awk, or even perl for the task.
Thanks for your reply anyway.
tomk Administrator


Joined: 23 Sep 2003 Posts: 6793 Location: Sat in front of my computer
Posted: Fri Feb 10, 2012 1:28 pm Post subject:
Something like this:
Code:
tail -f logfile | grep -o 'https\?://[^[:space:]]*' > links
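To see what the one-liner above does without waiting for live traffic, here is a minimal sketch that runs the same pattern over a few made-up log lines. The function name `extract_urls` and the sample lines are assumptions for illustration only; `--line-buffered` is a GNU grep option added here so that each match is flushed as soon as it is found.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print every http(s) URL
# found on stdin, one per line.
extract_urls() {
    # -o prints only the matched part of each line.
    # --line-buffered (GNU grep) flushes output after every line.
    grep -o --line-buffered 'https\?://[^[:space:]]*'
}

# Made-up irssi-style lines, used only to demonstrate the pattern.
printf '%s\n' \
    '12:01 <alice> look at http://example.com/a and https://example.org/b?x=1' \
    '12:02 <bob> no link here' \
    '12:03 <carol> ftp://ignored.example/ but http://example.net/c' |
    extract_urls
# prints:
# http://example.com/a
# https://example.org/b?x=1
# http://example.net/c
```

For live use, the same function could sit in the pipeline as `tail -f logfile | extract_urls >> links`. When grep writes to a file rather than a terminal its output is normally block-buffered, so without `--line-buffered` new URLs may show up in `links` only after a delay.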
alex.blackbit Advocate

Joined: 26 Jul 2005 Posts: 2288
Posted: Fri Feb 10, 2012 2:11 pm Post subject:
Uh, thanks for your answer, tomk.
Code:
$ grep --help | grep "\-o,"
-o, --only-matching show only the part of a line matching PATTERN
$
I wasn't aware of this sweet option.
That's what I was searching for.
The regex can probably be improved, but basically that does the trick.
Adding [SOLVED].
Thanks again.
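One way the regex could be tightened is sketched below; this is an assumption about what "improved" might mean, not something proposed in the thread. URLs pasted into chat often pick up trailing punctuation such as "." or "),", so this version stops at angle brackets and double quotes, then trims common trailing punctuation with sed. The function name `extract_clean_urls` and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical refinement (not from the thread): extract http(s) URLs
# and strip punctuation that usually belongs to the sentence, not the URL.
extract_clean_urls() {
    # -E enables extended regex, so "s?" and "+" need no backslashes.
    # The match stops at whitespace, "<", ">" and double quotes.
    grep -oE 'https?://[^[:space:]<>"]+' |
        # Remove any run of . , ; : ! ? ) at the end of each URL.
        sed -e 's/[.,;:!?)]*$//'
}

# Made-up chat lines for demonstration.
printf '%s\n' \
    '12:10 <dave> see (https://example.org/page), it is good' \
    '12:11 <erin> docs at http://example.com/a.html.' |
    extract_clean_urls
# prints:
# https://example.org/page
# http://example.com/a.html
```

The trade-off is that a URL which legitimately ends in ")" would lose that character, so the trimming set is a judgment call rather than a complete solution.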