Gentoo Forums
extract URLs from IRClogs [SOLVED]
alex.blackbit
Advocate


Joined: 26 Jul 2005
Posts: 2288

Posted: Fri Feb 10, 2012 9:11 am    Post subject: extract URLs from IRClogs [SOLVED]

Hi,

I use irssi for IRC and log the channels that I am in.
There are a few channels where I don't find the conversation as a whole particularly interesting, only the URLs that get posted.
So what I would like to do is run tailf on the log file and pipe its output to sed or awk, keeping only the lines that contain URLs and, from those lines, extracting the URL itself and discarding the rest.
I have no clue about sed or awk beyond 's/foo/goo/' and '{ print $3 }'.
Would somebody please give me a hand?


Last edited by alex.blackbit on Fri Feb 10, 2012 2:11 pm; edited 1 time in total
katfish
Tux's lil' helper


Joined: 14 Nov 2011
Posts: 84

Posted: Fri Feb 10, 2012 11:26 am

Hi, I'm a beginner in scripting.
However, wouldn't a simple "echo xxx | grep http" do it perfectly?
alex.blackbit
Advocate


Joined: 26 Jul 2005
Posts: 2288

Posted: Fri Feb 10, 2012 12:53 pm

katfish: that solved the first part, but not the second.
The result is only the lines that contain URLs.
What I want are only the URLs, nothing leading, nothing trailing.
That's why I suggested sed, awk, or even perl for that task.
Thanks for your reply anyway.
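
For comparison, the sed route mentioned here could look roughly like the sketch below. It assumes a sed that supports `-E` (GNU and BSD sed both do); `url_per_line` is a hypothetical helper name, not something from the thread.

```shell
# Hypothetical helper: for each input line containing a URL, print only the URL.
# -n suppresses normal output; the trailing `p` prints only lines where the
# substitution succeeded. Because the leading `.*` is greedy, a line with
# several URLs yields only the last one.
url_per_line() {
    sed -nE 's|.*(https?://[^[:space:]]*).*|\1|p'
}

# Usage:
#   url_per_line < logfile
```

The one-URL-per-line limitation is the main drawback of this approach compared with a tool that can print every match.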
tomk
Administrator


Joined: 23 Sep 2003
Posts: 6793
Location: Sat in front of my computer

Posted: Fri Feb 10, 2012 1:28 pm

Something like this:

Code:
tail -f logfile | grep -o 'https\?://[^[:space:]]*' > links
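
One caveat with a pipeline like this: when grep writes to a file rather than a terminal, it normally buffers its output, so new URLs may reach `links` only after a delay. A minimal sketch of a workaround, assuming GNU grep (its `--line-buffered` option flushes after every line); `extract_urls` is a hypothetical helper name, and `logfile` and `links` are the placeholder names from the post:

```shell
# Hypothetical helper: print every http/https URL found on stdin, one per line.
# --line-buffered (GNU grep) flushes each match immediately, which matters
# when the input comes from a never-ending `tail -f`.
extract_urls() {
    grep -o --line-buffered 'https\?://[^[:space:]]*'
}

# Usage (runs until interrupted with Ctrl-C):
#   tail -f logfile | extract_urls >> links
```

Unlike a sed substitution, `grep -o` prints every match, so a line with two URLs produces two output lines.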

alex.blackbit
Advocate
Advocate


Joined: 26 Jul 2005
Posts: 2288

Posted: Fri Feb 10, 2012 2:11 pm

Uh, thanks for your answer, tomk.
Code:
$ grep --help | grep "\-o,"
  -o, --only-matching       show only the part of a line matching PATTERN
$

I wasn't aware of this sweet option.
That's what I was searching for.
The regex can be improved I guess, but basically that does the trick.
Adding [SOLVED].
Thanks again.
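
As one possible way to improve the regex, the sketch below stops at characters that often wrap URLs in chat, trims trailing punctuation, and suppresses repeats. It assumes GNU grep and GNU sed (for `--line-buffered` and `-u`) and an awk that provides `fflush()`; `extract_urls_clean` is a hypothetical helper name.

```shell
# Hypothetical helper: like the grep -o approach, but
#  - uses -E so that `?` and `+` need no backslashes,
#  - stops at characters that commonly surround URLs in chat (< > "),
#  - strips trailing punctuation such as "." or ")" with sed,
#  - drops URLs that were already printed (awk keeps them in memory).
extract_urls_clean() {
    grep -oE --line-buffered 'https?://[^[:space:]<>"]+' |
        sed -u 's/[.,;:!?)]*$//' |
        awk '!seen[$0]++ { print; fflush() }'
}

# Usage (runs until interrupted with Ctrl-C):
#   tail -f logfile | extract_urls_clean >> links
```

Stripping a trailing `)` is a trade-off: it cleans up URLs quoted in parentheses but also truncates the rare URL that legitimately ends with one.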
All times are GMT