Gentoo Forums
how to cut a pattern from log file? 15 million lines approx.
_______0
Guru

Joined: 15 Oct 2012
Posts: 521

Posted: Sat May 04, 2013 12:49 pm    Post subject: how to cut a pattern from log file? 15 million lines approx.

hi,

While I was testing a GTX 660, the box suddenly locked up with distorted graphics. It was possible to restart X, but it was extremely slow and showed the same glitched graphics.

Upon rebooting I noticed that /var/log/messages, weighing 1.7GB, has around 15 million lines like the following:

Code:
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000


It's practically impossible to sanely navigate to the beginning of the error.

I am not sure whether there's any point in preserving this block. Grepping the log for a single second, let's say second 44, and piping the result to wc returns:

Code:
wc 39565  553960 4668775


Around 40000 lines per second.

How can I delete this pattern from /var/log/messages? Open the file in vim and delete the pattern? I would like to know how to do this from the command line.

thanks

miket
Guru

Joined: 28 Apr 2007
Posts: 483
Location: Gainesville, FL, USA

Posted: Sat May 04, 2013 1:17 pm    Post subject: Re: how to cut a pattern from log file? 15 million lines app

_______0 wrote:
nouveau E[ PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000
nouveau E[ PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000

I love vim, but this is not a good application for it. It's grep to the rescue. Decide how much of that line you need to match in order not to match lines you do not want to delete. Maybe "nouveau...TRAP" is good enough. So now exclude lines with a regex which follows that pattern:
Code:
egrep -v '^nouveau.*TRAP' /var/log/messages >some_temp_file

Let that run and now check to be sure you didn't exclude too much. When you're happy with the result, rename the temporary file to /var/log/messages.

EDIT: made it an anchored pattern for speed and lowered likelihood of false positives.
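
One way to do the "check you didn't exclude too much" step, sketched on a small throwaway sample rather than the real log. The sample lines and the temporary directory are made up for illustration; the idea is simply that the lines kept plus the lines matched should add up to the original line count.

```shell
# Work on a throwaway sample, not the real /var/log/messages.
tmpdir=$(mktemp -d)
log="$tmpdir/messages"
printf '%s\n' \
    'kernel: usb 1-1: new high-speed USB device' \
    'nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000' \
    'nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000' \
    'sshd[1234]: Accepted publickey for user' > "$log"

# Same filter as above, written to a temporary file.
grep -Ev '^nouveau.*TRAP' "$log" > "$tmpdir/some_temp_file"

# Count original lines, kept lines, and lines the pattern matched.
total=$(wc -l < "$log")
kept=$(wc -l < "$tmpdir/some_temp_file")
dropped=$(grep -Ec '^nouveau.*TRAP' "$log")

# kept + dropped should equal total before you rename anything.
echo "total=$((total)) kept=$((kept)) dropped=$((dropped))"
```

If the numbers don't add up, or dropped is 0 on a log you know is full of these lines, the pattern is not matching what you think it is.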

ppurka
Advocate

Joined: 26 Dec 2004
Posts: 3256

Posted: Sat May 04, 2013 4:22 pm

Code:
egrep -v '^nouveau.*TRAP' >/var/log/messages >some_temp_file
Be careful there! That's an output redirection to your /var/log/messages. Although, the egrep command will probably fail to run, or it may appear hung, doing nothing ;)
_________________
emerge --quiet redefined | E17 vids: I, II | Now using kde5 | e is unstable :-/

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Sat May 04, 2013 4:50 pm

Corrected typos in advice before they could cause (admittedly minor) mayhem. Thanks to ppurka for pointing them out.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 21490

Posted: Sat May 04, 2013 4:55 pm

miket may have meant to use < redirection to put the log on stdin and mistyped it as >. Either way, the fixed version from John should work. The original version would have truncated both messages and the temporary file, then stopped, waiting for searchable text on stdin. However, even if the user realized why egrep was waiting, the files would already have been truncated.

_______0
Guru

Joined: 15 Oct 2012
Posts: 521

Posted: Sat May 04, 2013 5:18 pm

Didn't work. It simply copies the entire file with its 15 million lines:

Code:
nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000


As a side note, it's pretty remarkable that the card generated such a large amount of data in such a short time. I'm pretty sure that would be a good thing if it were some useful calculation instead.

Reviewing the logs, the line varies within the TRAP part. I tried to run:

Code:
grep PGRAPH /var/log/messages | uniq


to retrieve the variations of "TRAP ch -1", because there appear to be others, such as "TRAP ch 4". But since each line has its own timestamp, e.g. [427658.822182], uniq sees each line as unique :/.

Searching online doesn't reveal much about the meaning of this error.

Is there a way to remove everything preceding the nouveau keyword?

Dunno whether this would be efficient, but I could grep PGRAPH, redirect to a new file, then use vim's visual block mode to delete the columns preceding the nouveau keyword.

:/
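
A rough sketch of doing both things from the command line: stripping everything before the nouveau keyword so the variants can be counted, and filtering with an unanchored pattern so the timestamp prefix no longer defeats the match. The prefix format in the sample (date, hostname "box", kernel timestamp) is an assumption, since the thread only shows the bracketed timestamp; adjust to whatever the real lines look like.

```shell
tmpdir=$(mktemp -d)
log="$tmpdir/messages"
# Hypothetical sample; the date, hostname and timestamps are invented.
printf '%s\n' \
    'May  4 12:30:01 box kernel: [427658.822182] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000' \
    'May  4 12:30:01 box kernel: [427658.822190] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000' \
    'May  4 12:30:01 box kernel: [427658.822201] nouveau E[  PGRAPH][0000:01:00.0] TRAP ch 4 [0x000027f9a6] 0x01000000' \
    'May  4 12:30:02 box sshd[1234]: Accepted publickey for user' > "$log"

# Keep only PGRAPH lines, drop everything before "nouveau E[",
# then sort so identical messages are adjacent and count each variant.
summary=$(grep -F 'PGRAPH' "$log" | sed 's/^.*nouveau E\[/nouveau E[/' | sort | uniq -c)
printf '%s\n' "$summary"

# No ^ anchor here, so the pattern matches even with a prefix in front.
grep -v 'nouveau E\[.*TRAP' "$log" > "$tmpdir/messages.trim"
```

The sort is there because uniq only collapses adjacent duplicates. On the sample this gives one count per distinct TRAP message and a messages.trim containing only the unrelated line.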

ppurka
Advocate

Joined: 26 Dec 2004
Posts: 3256

Posted: Sat May 04, 2013 6:07 pm

By the way, you can use sed to do the cutting. Like
Code:
sed -i "/^nouveau /d" filename
which will remove *all* the lines starting with the string "nouveau ".

miket
Guru

Joined: 28 Apr 2007
Posts: 483
Location: Gainesville, FL, USA

Posted: Sat May 04, 2013 6:26 pm

ppurka wrote:
Code:
egrep -v '^nouveau.*TRAP' >/var/log/messages >some_temp_file
Be careful there! That's an output redirection to your /var/log/messages. Although, the egrep command will probably fail to run, or it may appear hung, doing nothing ;)
Yikes! That's what happens when I post a message on the phone!

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Sat May 04, 2013 8:28 pm

ppurka wrote:
By the way, you can use sed to do the cutting. Like
Code:
sed -i "/^nouveau /d" filename
which will remove *all* the lines starting with the string "nouveau ".

I agree with using sed, but not sed -i: it's not as safe as it could be, and for something like this I don't think it's wise.
Code:
sed '/^nouveau E\[/d' /var/log/messages > messages.trim
will cut out all the nouveau errors, and put the rest in a new file in cwd, for review.

___o: note that you have to backslash-escape [ if you want it treated as a literal (same as * and .); a ] on its own is already literal.
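
To see why the backslash matters, here is a small sketch on two sample lines. The second line ("nouveau EP ...") is invented purely to show the false match. Unescaped, [  PGRAPH] is a bracket expression that matches a single character from the set, so the pattern hits the wrong line and misses the real one.

```shell
real='nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000'
fake='nouveau EP invented line, only here to show the false match'

# Escaped: \[ is a literal bracket, so only the real error line matches.
escaped_match=$(printf '%s\n' "$real" "$fake" | grep '^nouveau E\[')

# Unescaped: [  PGRAPH] matches ONE character out of space, P, G, R, A, H,
# so the line with "EP" matches and the real error line does not.
unescaped_match=$(printf '%s\n' "$real" "$fake" | grep '^nouveau E[  PGRAPH]')

printf 'escaped:   %s\n' "$escaped_match"
printf 'unescaped: %s\n' "$unescaped_match"
```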

Goverp
Veteran

Joined: 07 Mar 2007
Posts: 1972

Posted: Sun May 05, 2013 8:02 am

If you're using syslog-ng, I think it has a "suppress" option that will eat duplicate messages and leave "Last message repeated N times" as a placeholder. Which would lighten your I/O load as well as making the logs easier to read.
_________________
Greybeard

truc
Advocate

Joined: 25 Jul 2005
Posts: 3199

Posted: Sun May 05, 2013 8:21 am

ppurka wrote:
By the way, you can use sed to do the cutting. Like
Code:
sed -i "/^nouveau /d" filename
which will remove *all* the lines starting with the string "nouveau ".



sed will create a temporary file and then rename it to /var/log/messages, which is not a good idea since the logging will still happen in the old file (on the old file descriptor).

You can edit a file in place with 'ed' with something like:
Code:
ed -s /var/log/messages <<< $'g/nouveau.*TRAP/d\nw'



EDIT: or..... you can also reload your logger :lol:
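
Another way to keep the same file (and so the same descriptor the logger already holds) without loading everything into memory, as a sketch on a throwaway sample: filter into a temporary file, then copy the contents back over the original with cat instead of renaming. The sample lines and file names are made up. This assumes the logger appends to the file, and anything logged while the copy runs can be lost, so it is not a substitute for reloading the logger.

```shell
tmpdir=$(mktemp -d)
log="$tmpdir/messages"
printf '%s\n' \
    'nouveau E[  PGRAPH][0000:01:00.0] TRAP ch -1 [0x000027f9a6] 0x01000000' \
    'sshd[1234]: Accepted publickey for user' > "$log"

# Remember which inode the log lives in.
inode_before=$(ls -i "$log" | awk '{print $1}')

# Filter into a temporary file, then copy the contents back over the original.
# "cat > file" truncates and rewrites the existing file instead of replacing it.
grep -v 'nouveau.*TRAP' "$log" > "$tmpdir/filtered" && cat "$tmpdir/filtered" > "$log"

inode_after=$(ls -i "$log" | awk '{print $1}')
echo "inode before: $inode_before, after: $inode_after"
```

The inode number stays the same, which is the difference from sed -i or a plain mv of the temporary file.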
_________________
The End of the Internet!

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Sun May 05, 2013 12:50 pm

I like ed too but it's a bad choice for a 15 million line file, since it loads the entire thing into memory first.

ppurka
Advocate

Joined: 26 Dec 2004
Posts: 3256

Posted: Sun May 05, 2013 1:26 pm

truc wrote:
ppurka wrote:
By the way, you can use sed to do the cutting. Like
Code:
sed -i "/^nouveau /d" filename
which will remove *all* the lines starting with the string "nouveau ".



sed will create a temporary file and then rename it to /var/log/messages, which is not a good idea since the logging will still happen in the old file (on the old file descriptor).
Indeed, sed creates a temporary file, like what steveL wrote, but it does that automatically. The fact that logging will still happen in the old file matters only if something (syslog?) keeps a file handle open to the old file permanently until it is stopped. Is that how syslog works? I doubt that.

Hu
Moderator

Joined: 06 Mar 2007
Posts: 21490

Posted: Sun May 05, 2013 4:51 pm

On my system, syslog-ng has descriptors open for each of the logs it writes. As far as I know, this is fairly typical for system logger implementations, which is why logrotate must signal the logger after the log is rotated.
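
A small sketch of what an open descriptor does when the file is replaced by rename, using the shell itself as a stand-in for the logger. Nothing here touches syslog-ng; the file names are made up and descriptor 3 is just a free descriptor number.

```shell
tmpdir=$(mktemp -d)
log="$tmpdir/messages"
echo 'first line' > "$log"

# Open the log for appending on descriptor 3, as a long-running logger would.
exec 3>> "$log"

# Replace the file by rename, which is what sed -i or "mv temp messages" does.
echo 'replacement file' > "$tmpdir/new"
mv "$log" "$tmpdir/messages.old"
mv "$tmpdir/new" "$log"

# This write follows the descriptor to the old inode, now named messages.old.
echo 'written after the rename' >&3
exec 3>&-

old_contents=$(cat "$tmpdir/messages.old")
new_contents=$(cat "$log")
printf 'messages.old:\n%s\n\nmessages:\n%s\n' "$old_contents" "$new_contents"
```

The line written after the rename lands in messages.old, not in the new messages file. With a real logger the old file is usually deleted rather than renamed, so those writes go to a file with no name until the logger reopens its logs.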

truc
Advocate

Joined: 25 Jul 2005
Posts: 3199

Posted: Sun May 05, 2013 6:04 pm

steveL wrote:
I like ed too but it's a bad choice for a 15 million line file, since it loads the entire thing into memory first.


oh, I did not know that! :o So, just for the sake of learning something, is there a tool which we could use instead (non-interactive usage, does not load the whole file in memory and able to edit file in-place)?

platojones
Veteran

Joined: 23 Oct 2002
Posts: 1602
Location: Just over the horizon

Posted: Sun May 05, 2013 6:30 pm

truc wrote:
steveL wrote:
I like ed too but it's a bad choice for a 15 million line file, since it loads the entire thing into memory first.


oh, I did not know that! :o So, just for the sake of learning something, is there a tool which we could use instead (non-interactive usage, does not load the whole file in memory and able to edit file in-place)?


This is Linux...there are quite a few...sed, awk, perl...all can do that (actually, there are more, but that's just 3 seconds off the top of my head!)

Well, not 'in-place', but for stream editing, there are a lot.

_______0
Guru

Joined: 15 Oct 2012
Posts: 521

Posted: Tue May 07, 2013 2:35 pm

Hu wrote:
On my system, syslog-ng has descriptors open for each of the logs it writes. As far as I know, this is fairly typical for system logger implementations, which is why logrotate must signal the logger after the log is rotated.


Care to give an example?

I finally gave up because, with a second crash, /var/log/messages grew 6GB in the couple of minutes or so that the crash lasted o_O

steveL
Watchman

Joined: 13 Sep 2006
Posts: 5153
Location: The Peanut Gallery

Posted: Tue May 07, 2013 6:44 pm

truc wrote:
is there a tool which we could use instead (non-interactive usage, does not load the whole file in memory and able to edit file in-place)?

As platojones said: sed, awk and perl, in order of complexity. sed and awk are both POSIX utils. sed is not in-place unless you use GNU sed -i (or, in BSD, -i '' which is not recommended). Historically sed -i has been flaky if the process terminates early for some reason. ed is safer in this regard: as a file editor it's very careful about not losing data.

So the simplest robust solution is to use sed followed by mv, which is effectively what ed does. You can always wrap that in a shell function, e.g.:
Code:
# sedit -n '/foo/p' file..
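sedit() {
   local f b e=0 c opts
   # Collect leading options (anything starting with -) until -- or the script.
   while case $1 in --) shift; false;; -*) opts="$opts $1";; *) false;; esac
   do shift; done
   # The first non-option argument is the sed script; the rest are files.
   c=$1; shift
   for f; do
      b=$f.bak
      # Write to file.bak first; replace the original only if sed succeeded.
      echo > "$b" && sed $opts "$c" "$f" > "$b" && mv "$b" "$f" || e=$((e + 1))
   done
   # Return the number of files that failed, capped at 99.
   [ $e -gt 99 ] && e=99
   return $e
}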

This is pretty similar to what sed -i '.bak' does under BSD, afaik, but should work with any sed or shell. (It doesn't handle options which take args.)

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Tue May 07, 2013 6:46 pm

Followed by a SIGHUP to the syslog-ng processes to force them to close and reopen the log file. I believe
Code:
logrotate --force /etc/logrotate.conf
will handle this automatically.

- John