urcindalo l33t
Joined: 08 Feb 2005 Posts: 623 Location: Almeria, Spain
|
Posted: Thu Jan 17, 2013 10:03 am Post subject: How to split this ASCII file into chunks? Programmer needed |
|
|
Hi and thanks for helping me out.
I need a very simple script, preferably in bash, so that I can split an ASCII file into chunks.
The file itself shows the following structure: Code: | ZINC02384989
-OEChem-01171301283D
42 43 0 1 0 0 0 0 0999 V2000
9.1303 3.5000 3.8395 C 0 0 0 0 0 0 0 0 0 0 0 0
10.2989 3.9671 3.2534 C 0 0 0 0 0 0 0 0 0 0 0 0
10.3583 4.2061 1.8981 C 0 0 0 0 0 0 0 0 0 0 0 0
9.2372 3.9779 1.1064 C 0 0 0 0 0 0 0 0 0 0 0 0
8.0556 3.5046 1.7052 C 0 0 0 0 0 0 0 0 0 0 0 0
... -> many lines here like the ones above and below
23 40 1 0 0 0 0
37 41 1 0 0 0 0
38 41 1 0 0 0 0
41 42 1 0 0 0 0
M CHG 3 39 -1 40 -1 41 1
M END
$$$$
ZINC04899456
-OEChem-01171301283D
65 66 0 1 0 0 0 0 0999 V2000
4.5113 4.3431 3.0084 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5217 2.9765 3.6963 C 0 0 0 0 0 0 0 0 0 0 0 0
5.3676 3.0525 4.9689 C 0 0 0 0 0 0 0 0 0 0 0 0
3.0899 2.5770 4.0587 C 0 0 0 0 0 0 0 0 0 0 0 0
....
....
$$$$ |
I want every chunk of text between the $$$$ marks to be written out to an individual text file in the same directory, the name of which will be the first entry in every chunk: ZINC02384989 for the first file, ZINC04899456 for the second... The file extension should be .sdf
In other words, the first individual ZINC02384989.sdf file will begin with ZINC02384989 and will end with $$$$, the second one will begin with ZINC04899456 and will end with another $$$$, and so on.
The original file to be split ends with a final $$$$ mark but does not begin with one, as shown above. Its file name also has the .sdf extension, but both the name and the extension can be changed freely if the script needs it.
In the worst scenario there could be literally thousands of those chunks, so manually generating the individual files is out of the question.
Thanks very much in advance. |
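Before splitting, it can help to check how many chunks the file holds and which names it contains. The following is only a sketch under assumptions: the function names `count_records` and `record_names` are made up for illustration, and it assumes every chunk ends with a line consisting solely of $$$$, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of any existing tool.

# count_records FILE: print the number of lines that consist solely of $$$$,
# i.e. the number of chunks the split should produce.
count_records() {
    grep -c '^\$\$\$\$$' "$1"
}

# record_names FILE: print the first line of every chunk (the entry name).
record_names() {
    awk 'BEGIN { first = 1 }            # the file starts with a name line
         first { print; first = 0 }     # print the name, then skip the body
         /^\$\$\$\$$/ { first = 1 }     # the line after $$$$ is the next name
        ' "$1"
}
```

With an input called all.sdf (a placeholder name), `count_records all.sdf` gives the number of files to expect, and `record_names all.sdf | sort | uniq -d` lists the names that occur more than once.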
urcindalo l33t
Joined: 08 Feb 2005 Posts: 623 Location: Almeria, Spain
|
Posted: Thu Jan 17, 2013 10:21 am Post subject: |
|
|
I forgot to mention that sometimes the same entry name may appear more than once, since they correspond to different molecule conformations.
So the script needs some kind of check so that it does not overwrite previously written files; in these cases it should add something like "-2", "-3"... to the filenames. |
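The naming rule described here could be sketched as a small shell function. This is an illustration only: `next_free_name` is a hypothetical helper, and it assumes the first file keeps the plain name while later ones get "-2", "-3" and so on.

```shell
#!/usr/bin/env bash
# next_free_name BASE: print BASE.sdf if no such file exists in the current
# directory, otherwise the first unused name among BASE-2.sdf, BASE-3.sdf, ...
# (hypothetical helper, numbering scheme assumed from the post above)
next_free_name() {
    local base="$1"
    local n=2
    local candidate="$base.sdf"
    while [ -e "$candidate" ]; do
        candidate="$base-$n.sdf"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

Called as `next_free_name ZINC02384989`, it only prints a name; creating the file is left to the caller, so two processes running at once could still pick the same name.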
tomk Bodhisattva
Joined: 23 Sep 2003 Posts: 7221 Location: Sat in front of my computer
|
Posted: Thu Jan 17, 2013 11:38 am Post subject: |
|
|
I had something similar that I'd written in Perl, which I've modified to suit your needs. Save it as split.pl, then run: perl split.pl filename
split.pl: | #!/usr/bin/perl
use warnings;
use strict;
$| = 1;                     # unbuffered output so progress shows immediately
my $match = '\$\$\$\$';     # record delimiter: a line consisting of $$$$
my $line;
my $outfile;
my $snapfile;
my $suffix;
my @sorted;
if ($#ARGV != 0) {
    print STDERR "you must specify an input file\n";
    exit(1);
} else {
    $snapfile = $ARGV[0];
}
open(SNAP, '<', $snapfile) || die("couldn't open $snapfile: $!");
while ($line = <SNAP>) {
    if ($line =~ m/^$match$/) {
        # end of a record: write the delimiter and close the current file
        if (defined $outfile) {
            print OUTFILE $line;
            print "writing to $outfile\n";
            close(OUTFILE);
            undef $outfile;
        }
    } else {
        if (! defined $outfile) {
            # the first line of a record is the entry name
            chomp ($outfile = $line);
            if (-e "$outfile.sdf") {
                # name already used: find the highest existing -N suffix;
                # the first duplicate gets -1, the next -2, and so on
                @sorted = sort { $b <=> $a }
                          map { /^\Q$outfile\E-(\d+)\.sdf$/ ? $1 : () }
                          glob("$outfile-*.sdf");
                $suffix = @sorted ? $sorted[0] + 1 : 1;
                $outfile .= "-$suffix";
            }
            $outfile .= ".sdf";
            open(OUTFILE, '>', $outfile) || die("couldn't open $outfile: $!");
        }
        print OUTFILE $line;
    }
}
close(OUTFILE) if (defined $outfile);
close(SNAP); |
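Since the original request preferred bash, here is a rough shell sketch of the same idea. It is not the Perl script above, only an illustration: `split_sdf` is a hypothetical function name, duplicates are numbered -1, -2, ... as in the Perl version, and it assumes entry names contain no slashes.

```shell
#!/usr/bin/env bash
# split_sdf FILE: write each $$$$-terminated record of FILE to NAME.sdf in the
# current directory, where NAME is the first line of the record.
# (hypothetical helper, sketch only)
split_sdf() {
    local infile="$1"
    local line base n
    local out=""
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$out" ]; then
            # First line of a record: derive an unused file name from it.
            base="$line"
            out="$base.sdf"
            n=1
            while [ -e "$out" ]; do
                out="$base-$n.sdf"
                n=$((n + 1))
            done
            : > "$out"      # create the file so later duplicates see it
        fi
        printf '%s\n' "$line" >> "$out"
        # A line consisting only of $$$$ ends the record.
        if [ "$line" = '$$$$' ]; then
            echo "wrote $out"
            out=""
        fi
    done < "$infile"
}
```

After sourcing the file, `split_sdf all.sdf` (all.sdf being a placeholder name) writes the chunks next to each other in the current directory. It reopens the output file for every line, so on very large inputs the Perl script is likely to be noticeably faster.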
Last edited by tomk on Thu Jan 17, 2013 3:55 pm; edited 1 time in total |
urcindalo l33t
Joined: 08 Feb 2005 Posts: 623 Location: Almeria, Spain
|
Posted: Thu Jan 17, 2013 11:56 am Post subject: |
|
|
I can hardly express my gratitude. The script works like a charm. Thanks very much indeed.
This is the reason why Gentoo has the best Linux user community out there |