Gentoo Forums
[SOLVED] Output files stop receiving data from program
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 4:25 am    Post subject: [SOLVED] Output files stop receiving data from program

Hi. I'm having a problem with the program I wrote for my math thesis. It is running on a Gentoo cluster and I think the problem may have something to do with the OS' buffering of output files (or something like that). But I haven't any idea how to resolve it. I am pretty new to this type of programming (I've done scripting, before, but not "real" programming).

The problem is that after the program runs for a long time, the output files stop receiving data the program writes. That is, in the middle - or what should be the middle - of a file, it will stop getting any new data. New output files are created, but they also receive no data - they are all size 0.

I have (somehow) attached to the program using gdb and found that it is running correctly and sending the correct output to the files. But the files never receive the output.

Some details:
The program is written in C and uses an ODE solver called CVODE (part of SUNDIALS suite of ODE solvers) and MPICH Version 1.2.7 (which I can't find any documentation about - but that's another issue). It simulates a 2 dimensional physical system. I output two file types: time data and position data. To keep the file sizes reasonable, I close them and start new ones every so often (right now every 10,000 lines).

Bigger systems have more data on each line, so 10,000 lines of a big system is larger than 10,000 lines of a small system. For example, the position file of an 8x8 system has 128 floating point numbers on each line; a 16x16 system will have 512 fl pt numbers per line.

The problem starts at varying times (IOW, it doesn't occur every time at, say, 3000 lines). But the larger the system, the sooner the problem occurs.

At any point in my program, both files have the exact same number of output lines. Both files are printed from the same node (that is, all the nodes send their data to the first node, which prints the data). The two files are written on (more or less) adjacent lines of the code. But when the problem occurs, the two files do not have the same number of lines.

Sometimes I compile my program so that it creates a log file. The log file has many more lines than either data file, but each line is smaller. The log file never has the same problem. It is created on a different device. The data files are created on a device called /raid, the log files are created on whatever device holds all the other directories (which is too small to contain all of my output data).

This all leads me to believe that OS buffering could be the issue, like maybe the buffer is overfilling, or the files are too large, etc.

So then I tried making the output files smaller - breaking at every 1000 and every 100 lines. But the problem continued to occur.

That's all the useful information I have. Code related to the output is below. Any ideas what could be wrong? Is it system related? If not, where could I get help for this?

Remember, please answer using small words (and speak slowly). And be patient with my replies (it takes me time to figure out what you mean and how to do what you instruct - plus I have a job).

I realize this may not be the best forum to ask such a question, but it's the best forum I know which is related. If you know of a good place to post this type of question, please let me know.

Note, also, that this is a math thesis, not a programming one, so helping me with the computer problems is not "doing my homework for me".

Following is code related to creating the output files (and a log file). The files named t_slots and u1 are the files in question. The most important lines are near the bottom of the listing, between fprintf(logFile, "printing..."); and fprintf( logFile, "printed.");. You may not even have to read the lines between the #if DELAY; #endifs.
Code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <time.h>

#define FILEBREAK 100

FILE *t_slot, *u1;   // Used in PrintData

#if DELAY
FILE *logFile;
char logFn[100] = "logs/fpuLog_";
#endif

int main(int argc, char *argv[])
{

...

#if DELAY
  char thisLog[100] = "";
  time_t now;
  struct tm *today;
  char date[25];

  int i;
  for (i=1; i<argc; i++) {
    strcat(logFn, argv[i]);
    strcat(logFn, "_");
  }

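  // Append in place: passing logFn as both the destination and a "%s" source is undefined behavior.
  sprintf(logFn + strlen(logFn), "_rank=%02d__", my_pe);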

  time(&now);
  today = localtime(&now);
  strftime(date, 25, "%Y.%m.%d_%H.%M", today);
  strcat(logFn, date);
  strcat(logFn, "__t=");

  sprintf(thisLog, "%s%s", logFn, "0000000.log");

  printf("Log file: %s\n", thisLog);
  logFile = fopen(thisLog, "w");

  for (i=0; i<argc; i++) {
    fprintf( logFile, "%s", argv[i]);
    fputc( i==0 ? '(' : i<argc-1 ? ',' : ')', logFile);
    }
  fprintf( logFile, "\n");
  fprintf( logFile, "MPI Initialized...\n");
#endif

...
  PrintData(t, u, data);

  for (iout=1, tout=T1; iout <= NOUT; iout++, tout += DTOUT) {
...
    PrintData(t, u, data);
  }

#if DELAY
  fclose (logFile);
#endif
  return(0);
}

static void PrintData(realtype t, N_Vector u, UserData data)
{
  char *fmt_spec = "";
  char strFmt[50] = "";

#if DELAY
  // Create new logfile (if necessary)
  char thisLog[100] = "";
  if ((int)t % FILEBREAK == 0) {
    if (t != 0.0L) {
      // t=0 logfile is created in mainline.
      fclose(logFile);
      sprintf(strFmt, "%%s%%07%sd%%s", fmt_spec);
      sprintf(thisLog, strFmt, logFn, (int)(t/FILEBREAK), ".log");
      printf("Log file: %s\n", thisLog);
      logFile = fopen(thisLog, "w");

    }
  }
#endif

#if DELAY
  fprintf( logFile, "PrintData(%f):\t", t);
#endif

... Receive data from other processes ...

    if ((int)t % FILEBREAK == 0) {
      if (t != 0.0L) {
        fclose(t_slot);
        fclose(u1);
        //#if DELAY
        //      fclose(logFile);
        //#endif
      }
      sprintf( strFmt, "t = %%05.%sf. ", fmt_spec);
      //      printf( strFmt, t);

      sprintf( strFmt, "N=%%03dx%%03d_Beta=%%.2%sf_tol=%%1.0%se_t=%%07d", fmt_spec, fmt_spec);
      sprintf( strFnFmt, strFmt, N, N, beta, ATOL, (int)(t/FILEBREAK));
      //      sprintf( strFmt, "_t=%%07d_N=%%02dx%%02d_Beta=%%.2%sf_tol=%%.1%se.csv", fmt_spec, fmt_spec);
      //      sprintf( strFnFmt, strFmt, (int)t, N, N, beta, ATOL);
      //      sprintf( strFmt, "_t=%%07d_N=%%02dx%%02d_tol=%%.2%se_npes=%%d.csv", fmt_spec);
      //      sprintf( strFnFmt, strFmt, (int)t, N, N, ATOL, npes);
      //      sprintf( strFmt, "_t=%%07d_N=%%02dx%%02d_tol=%%.2%se.csv", fmt_spec);
      //      sprintf( strFnFmt, strFmt, (int)t, N, N, ATOL);

      printf("\nOverwriting files: ");
      sprintf(fn, "/raid/fpuData/");
      strcat(fn, strFnFmt);
      strcat(fn, "_T.csv");
      t_slot = fopen(fn, "w");
      printf("%s,\t", fn);

      sprintf(fn, "/raid/fpuData/");
      strcat(fn, strFnFmt);
      strcat(fn, "_U.csv");
      u1 = fopen(fn, "w");
      printf("%s\n", fn);

      printf("t=");
    }

    /*
    if ((int)t % 100 == 0) {
      printf("%g,", t);
    }
    */

#if DELAY
    fprintf(logFile, "printing...");
#endif
    // Print the time.
    //    sprintf( strFmt, "%%%sg", fmt_spec);
    fprintf(t_slot, "%d", (int)t);
    fputc('\n', t_slot);

    if ((int)t % 100 == 0) {
      printf("%d,", (int)t);
    }

    // Print the position and velocity data
    sprintf( strFmt, "%%.25%sg", fmt_spec);
    for( n=0; n<numElts; n++) {
      fprintf( u1, strFmt, z[n]);
      fputc( n<numElts-1 ? ',' : '\n', u1);
    }
#if DELAY
    fprintf( logFile, "printed.");
#endif
  }
#if DELAY
  fprintf( logFile, "\n");
#endif
  return;
}

_________________
Desperately needs help learning Gentoo Linux in order to use a 32-node cluster for my master's thesis in mathematics.


Last edited by odeSolver on Sun Jun 30, 2013 8:43 pm; edited 1 time in total
Akkara
Bodhisattva


Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Sun Jun 30, 2013 5:00 am

Hi!

A few things to try that come to mind:

What does the ulimit command return? Run it on the cluster where your program runs, as the same user that runs the program. If it reports anything other than unlimited, there are restrictions on how much CPU time, I/O, and so on the program may use. These would have to be lifted.
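A side note for anyone repeating this check: in bash, a bare ulimit reports only the file-size limit (the same as ulimit -f), while ulimit -a lists every limit. The file-size limit is the one most directly tied to output files that stop growing. Below is a minimal sketch of reading that same limit from inside a C program with getrlimit. The function name describe_fsize_limit is made up for this illustration and is not part of the posted program.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: writes the soft file-size limit into buf,
   either "unlimited" or a byte count.
   Returns 0 on success, -1 if getrlimit fails. */
int describe_fsize_limit(char *buf, size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu bytes", (unsigned long long)rl.rlim_cur);
    return 0;
}
```

Calling it once at startup and printing the result to the log would record the limit that was actually in effect for that run.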

Another line of thought wonders whether there's stray pointers or other errors that are causing the variables related to output to get overwritten. There's a lot of fixed-size array stuff going on. The line char strFmt[50] = ""; seems especially questionable, given that you sprintf into it. Are you certain the length never exceeds 49 characters (the 50th is needed for the ending null)? Perhaps try increasing that to 200 or even 500 just to be sure. The same goes for char thisLog[100] = "" and its uses.
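Rather than guessing a safe array size, the length check can be made explicit with snprintf, which never writes past the size it is given and returns the length the full string would have needed. This is only a sketch of that idea. The helper name build_name and its parameters are invented for the example, and the path pieces are just sample values modeled on the ones in the listing.

```c
#include <stdio.h>

/* Hypothetical helper: builds "<dir><stem><suffix>" into dst.
   Returns 0 if the whole name fit, -1 if it failed or would have
   been truncated (dst is still null-terminated in that case). */
int build_name(char *dst, size_t len,
               const char *dir, const char *stem, const char *suffix)
{
    int needed = snprintf(dst, len, "%s%s%s", dir, stem, suffix);

    if (needed < 0 || (size_t)needed >= len)
        return -1;
    return 0;
}
```

With this shape, a name that is too long becomes a reported error instead of a silent overwrite of whatever sits next to the array in memory.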

Also, in gdb, print out t_slot, u1, and logFile early in the program, and again later after the problem happens. They are pointers, which will show as hexadecimal numbers. Don't worry about the specific value, however make sure it isn't mysteriously changing as the program runs. If it changes when it shouldn't, it could indicate that you are having memory corruption issues due to some array overrunning its boundaries.

You can try compiling the program using -fmudflap. This instruments the program to help find stuff like going past the end of arrays. valgrind is another excellent tool to find memory corruption issues.

Also, in general, it is considered a bad idea to use dynamically changeable format strings with the *printf* family of functions. The compiler can help catch a lot of trouble spots, but that only works when the format string is a regular constant string. It's a better idea to use several printfs if that's what it takes, choosing the one to use with if or switch statements and looping as necessary (or, more advanced, placed in separate functions and selected with a pointer-to-function).

Good luck, hope this helps!
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 5:58 am

Wow, fast reply! And I was just going to go to bed after posting that.
Akkara wrote:
Hi!

A few things to try that come to mind:

What does the ulimit command return? Run it on the cluster where your program runs, as the same user that runs the program. If it reports anything other than unlimited, there are restrictions on how much CPU time, I/O, and so on the program may use. These would have to be lifted.

It reports "unlimited"
Code:
~>ulimit
unlimited



Akkara wrote:
Another line of thought wonders whether there's stray pointers or other errors that are causing the variables related to output to get overwritten. There's a lot of fixed-size array stuff going on. The line char strFmt[50] = ""; seems especially questionable, given that you sprintf into it. Are you certain the length never exceeds 49 characters (the 50th is needed for the ending null)? Perhaps try increasing that to 200 or even 500 just to be sure. The same goes for char thisLog[100] = "" and its uses.

I counted most of those out before realizing that they don't have to be the exact length necessary. I made the following changes
Code:
char logFn[1000] = "/logs/fpuLog_";
char thisLog[1000] = "";
char strFmt[1000] = "";
char thisLog[1000] = "";



Akkara wrote:
Also, in gdb, print out t_slot, u1, and logFile early in the program, and again later after the problem happens. They are pointers, which will show as hexadecimal numbers. Don't worry about the specific value, however make sure it isn't mysteriously changing as the program runs. If it changes when it shouldn't, it could indicate that you are having memory corruption issues due to some array overrunning its boundaries.

This idea will have to wait until morning. But one question: since I create new files every 10000 lines, wouldn't those file pointers change anyway? I have no reliable way of attaching to the process during the file when the problem starts since I never know when it will start. But I can look for a way which attaches every 1000 loops, or something similar, and just watch it.


Akkara wrote:
You can try compiling the program using -fmudflap. This instruments the program to help find stuff like going past the end of arrays. valgrind is another excellent tool to find memory corruption issues.

This will also have to wait until morning. But a couple of questions: I don't see -fmudflap in the help for gcc. Nor do I see anything for -f, -m, etc. Do I have a different version of the compiler or something? Also, my program is compiled using a different compiler called mpicc and a make file which was given to me. I'll have to figure this one out.

How do I invoke valgrind? Command valgrind wasn't found. If it's not free, I won't have it.


Akkara wrote:
Also, in general, it is considered a bad idea to use dynamically changeable format strings with the *printf* family of functions. The compiler can help catch a lot of trouble spots, but that only works when the format string is a regular constant string. It's a better idea to use several printfs if that's what it takes, choosing the one to use with if or switch statements and looping as necessary (or, more advanced, placed in separate functions and selected with a pointer-to-function).

I think you're referring to the strings where I use double percent signs, like this one:
Code:
sprintf(strFmt, "%%s%%07%sd%%s", fmt_spec);

I did that, at great time cost, because package CVODE has its own data type, called realtype, which can be different on different machines. So I don't know what format specifier to use to output realtype data. Instead, I select the format specifier using compiler commands
Code:
#if defined(SUNDIALS_EXTENDED_PRECISION)
  char *fmt_spec = "L";
#elif defined(SUNDIALS_DOUBLE_PRECISION)
  char *fmt_spec = "";
#else
  char *fmt_spec = "";
#endif

Then I create my output string with the correct format specifier for realtype
Code:
sprintf(strFmt, "%%s%%07%sd%%s", fmt_spec);

And finally, I can print the data
Code:
sprintf(thisLog, strFmt, logFn, (int)(t/FILEBREAK), ".log");


Is that what you're talking about? I probably will not be able to permanently change that. But I can change it temporarily to try to isolate the issue.



Akkara wrote:
Good luck, hope this helps!

Thanks for your suggestions.
dmpogo
Advocate


Joined: 02 Sep 2004
Posts: 3264
Location: Canada

Posted: Sun Jun 30, 2013 6:13 am

Am I right that you are using a parallelized implementation with MPI? MPI I/O can be tricky to set up. Could you run the serial version as a test (for example, by setting the number of nodes to 1)?
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 7:44 am

dmpogo wrote:
Am I right that you are using a parallelized implementation with MPI? MPI I/O can be tricky to set up. Could you run the serial version as a test (for example, by setting the number of nodes to 1)?

All of the output is done from 1 node, even when running on multiple nodes.
When running on only 1 node - what you called "serial version" - the problem still occurs.


Last edited by odeSolver on Sun Jun 30, 2013 6:42 pm; edited 1 time in total
Akkara
Bodhisattva


Joined: 28 Mar 2006
Posts: 6702
Location: &akkara

Posted: Sun Jun 30, 2013 7:59 am

odeSolver wrote:
I think you're referring to the strings where I use double percent signs, like this one:
Code:
sprintf(strFmt, "%%s%%07%sd%%s", fmt_spec);

I did that, at great time cost, because package CVODE has its own data type, called realtype, which can be different on different machines. So I don't know what format specifier to use to output realtype data. Instead, I select the format specifier using compiler commands
Code:
#if defined(SUNDIALS_EXTENDED_PRECISION)
  char *fmt_spec = "L";
#elif defined(SUNDIALS_DOUBLE_PRECISION)
  char *fmt_spec = "";
#else
  char *fmt_spec = "";
#endif

Then I create my output string with the correct format specifier for realtype
Code:
sprintf(strFmt, "%%s%%07%sd%%s", fmt_spec);

And finally, I can print the data
Code:
sprintf(thisLog, strFmt, logFn, (int)(t/FILEBREAK), ".log");


Is that what you're talking about? I probably will not be able to permanently change that. But I can change it temporarily to try to isolate the issue.


There's a better way of doing this that's no less general than what you have.

But first, a little tidbit about C you might not be familiar with:

A string constant is the familiar text-in-quotes such as the often-seen "Hello! World\n".

It is less well-known that you can have several such strings adjacent to the others. It creates a string-constant that is the concatenation of the strings. For example,
Code:
printf("He" "llo! Wo" "rld\n");


Using this idea, you can:
Code:
#if defined(SUNDIALS_EXTENDED_PRECISION)
#   define FORMAT_FOR_SUNDIALS "%07Ld"
#elif defined(SUNDIALS_DOUBLE_PRECISION)
#   define FORMAT_FOR_SUNDIALS "%07d"
#else
#   define FORMAT_FOR_SUNDIALS "%07d"
#endif


And then in your code, use
Code:
printf("%s" FORMAT_FOR_SUNDIALS "%s", ....);

That strange-looking format string resolves to a string constant, and the compiler can check it for you. And you avoid worrying about creating the right string and making sure your array is big enough for it.

There's variations on this idea. You can put only the "L" and "" part in the FORMAT_FOR_SUNDIALS macro, and put the "%07" and the "d" part together with the rest of the print line.

Or you can make the macro take an argument that fills in the "07" part, if you need to make that adjustable. Such a macro looks like this:
Code:
#   define FORMAT_FOR_SUNDIALS(n) "%" #n "d"
...
printf("%s" FORMAT_FOR_SUNDIALS(07) "%s", ....);


The # character in a preprocessor macro means "expand the next thing as a string". Note: this is just pure text substitution and stringification. So you'll have to use digits here. Passing "x" to that macro results in a literal 'x' appearing in the string, not the runtime value that x might have. (If you need runtime-variable field widths, look into the '*' format character.)
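One detail worth adding: in standard C the L length modifier is defined only for the floating-point conversions (such as %Lg for long double), so it belongs on the realtype values themselves rather than on anything cast to int and printed with d. Here is a minimal sketch of the string-concatenation idea applied to the "%.25g" output from the original listing. The typedef is only a stand-in so the example compiles without the SUNDIALS headers, and the names REAL_MOD and format_real are invented for this illustration.

```c
#include <stdio.h>

/* Stand-in for SUNDIALS' realtype so this sketch builds on its own.
   In the real program the typedef comes from the SUNDIALS headers,
   and only the REAL_MOD definitions would be needed. */
#if defined(SUNDIALS_EXTENDED_PRECISION)
typedef long double realtype;
#  define REAL_MOD "L"   /* long double needs the L length modifier */
#else
typedef double realtype;
#  define REAL_MOD ""    /* double (and float, after promotion) needs none */
#endif

/* Formats x with up to 25 significant digits into buf.
   Returns what snprintf returns: the length the text needs. */
int format_real(char *buf, size_t len, realtype x)
{
    /* Adjacent literals join into one constant format string,
       so the compiler can check it against the argument. */
    return snprintf(buf, len, "%.25" REAL_MOD "g", x);
}
```

Because the format is a compile-time constant, building with warnings enabled (for example gcc -Wall) lets the compiler flag a mismatch between the format and the argument type.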
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 2:00 pm

Akkara wrote:
...Also, in gdb, print out t_slot, u1, and logFile early in the program, and again later after the problem happens. They are pointers, which will show as hexadecimal numbers. Don't worry about the specific value, however make sure it isn't mysteriously changing as the program runs. If it changes when it shouldn't, it could indicate that you are having memory corruption issues due to some array overrunning its boundaries...

How do you print out a file handle in gdb, or using a printf statement?

In gdb, p t_slot and p u1 returned pointers. p *t_slot and p *u1 printed several fields which had no meaning to me. p *(int)t_slot, p*(int)u1, p (int)*t_slot and p (int)*u1 all returned the same value, which was just the value of the first field in each FILE. (I understand pointers when I read tutorials, but when I try to put them in practice, I never get what I expect.)

Anything I try in a printf statement fails to compile. For example, the line printf("Time file handle = %d\n", (int)*t_slot, (int)*u1); generates "error: aggregate value used where an integer was expected"
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10587
Location: Somewhere over Atlanta, Georgia

Posted: Sun Jun 30, 2013 2:39 pm

t_slot is of type "FILE *" (pointer to FILE) where FILE is a structure. Dereferencing it gets you the bare structure which cannot be cast into an integer. Akkara was suggesting printing out the pointer. Try
Code:
printf("Time file handle = %p\n", t_slot);
Duplicate that line, more or less, for the other file handle.
Code:
man 3 printf
will give you more details on the construction of format strings.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 7:37 pm

Akkara wrote:
There's a better way of doing this that's no less general than what you have.

But first, a little tidbit about C you might not be familiar with:

A string constant is the familiar text-in-quotes such as the often-seen "Hello! World\n".

It is less well-known that you can have several such strings adjacent to the others. It creates a string-constant that is the concatenation of the strings. For example,
Code:
printf("He" "llo! Wo" "rld\n");


Using this idea, you can:
Code:
#if defined(SUNDIALS_EXTENDED_PRECISION)
#   define FORMAT_FOR_SUNDIALS "%07Ld"
#elif defined(SUNDIALS_DOUBLE_PRECISION)
#   define FORMAT_FOR_SUNDIALS "%07d"
#else
#   define FORMAT_FOR_SUNDIALS "%07d"
#endif


And then in your code, use
Code:
printf("%s" FORMAT_FOR_SUNDIALS "%s", ....);

That strange-looking format string resolves to a string constant, and the compiler can check it for you. And you avoid worrying about creating the right string and making sure your array is big enough for it.

There's variations on this idea. You can put only the "L" and "" part in the FORMAT_FOR_SUNDIALS macro, and put the "%07" and the "d" part together with the rest of the print line.

Or you can make the macro take an argument that fills in the "07" part, if you need to make that adjustable. Such a macro looks like this:
Code:
#   define FORMAT_FOR_SUNDIALS(n) "%" #n "d"
...
printf("%s" FORMAT_FOR_SUNDIALS(07) "%s", ....);


The # character in a preprocessor macro means "expand the next thing as a string". Note: this is just pure text substitution and stringification. So you'll have to use digits here. Passing "x" to that macro results in a literal 'x' appearing in the string, not the runtime value that x might have. (If you need runtime-variable field widths, look into the '*' format character.)


I see where this is a much better way of doing this. I had tried that method, but did not get it to work. One day I will implement your FORMAT_FOR_SUNDIALS example, which will make life much easier. But for now I am going to stick with what I have due to time constraints.

John R. Graham wrote:
t_slot is of type "FILE *" (pointer to FILE) where FILE is a structure. Dereferencing it gets you the bare structure which cannot be cast into an integer. Akkara was suggesting printing out the pointer. Try
Code:
printf("Time file handle = %p\n", t_slot);
Duplicate that line, more or less, for the other file handle.
Code:
man 3 printf
will give you more details on the construction of format strings.

- John

Thanks, John. But I think this is printing the pointer value. For example, I have printf("%s (%p),\t", fn, t_slot);, and I'm getting the output /raid/fpuData/N=128x128_Beta=0.00_tol=1e-07_t=0000000_T.csv (0x1943be0). Is that a pointer value in parentheses?


I'm working on most all of everyone's suggestions now.
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 8:02 pm

OK. I've been experimenting using all of your ideas and I have determined that the problem is related to the /raid device.

I printed out the file handle pointer values immediately after each file is created and immediately before they are closed (see the first printf on the third line, and the last printfs in each of the last two paragraphs). The pointers' values do not change.

Code:
    if ((int)t % FILEBREAK == 0) {
      if (t != 0.0L) {
        printf("Closing Time file handle = %p and Position file handle = %p\n", t_slot, u1);  // Did the file handle's pointer value change?
        fclose(t_slot);
        fclose(u1);
     }

      sprintf(strFmt, "\nt = %%05.%sf. ", fmt_spec);
      printf(strFmt, t);

      sprintf(strFmt, "N=%%03dx%%03d_Beta=%%.2%sf_tol=%%1.0%se_t=%%07d", fmt_spec, fmt_spec);
      sprintf(strFnFmt, strFmt, N, N, beta, ATOL, (int)(t/FILEBREAK));

      printf("Overwriting files: ");
      sprintf(fn, "/raid/fpuData/");
      strcat(fn, strFnFmt);
      strcat(fn, "_T.csv");
      t_slot = fopen(fn, "w");
      printf("%s (%p),\t", fn, t_slot);    // Time file handler's initial value.

      sprintf(fn, "/raid/fpuData/");
      strcat(fn, strFnFmt);
      strcat(fn, "_U.csv");
      u1 = fopen(fn, "w");
      printf("%s (%p)\n", fn, u1);    // Position file handler's initial value.
    }



Note the two lines which read sprintf(fn, "/raid/fpuData/");. My program typically creates the output files on /raid. If I change those lines to a subdirectory of my working directory (like this: sprintf(fn, "data/");), the problem goes away. If I change the code back to create the files on /raid, the problem instantly recurs.

So there is a difference in writing to the two different devices. I need this to work on /raid. How do I troubleshoot and fix this?

Note that the system administrator is a math professor who is not a computer expert and who is exceedingly reluctant to make changes to the OS.
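When one mount point behaves differently from another, a cheap first check that needs no administrator rights is how much free space each one has. From a shell that is df -h /raid. From inside the program, the POSIX statvfs call gives the same information. A minimal sketch, where the helper name free_bytes is an assumption made for this example:

```c
#include <sys/statvfs.h>

/* Hypothetical helper: stores the number of bytes still available to
   ordinary (non-root) users on the filesystem that holds `path`.
   Returns 0 on success, -1 if statvfs fails (e.g. path does not exist). */
int free_bytes(const char *path, unsigned long long *out)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;

    /* f_bavail counts free blocks in units of f_frsize bytes. */
    *out = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return 0;
}
```

Calling something like this on the output directory just before each new pair of files is opened, and logging the result, would show whether the device is running low at the moment the data stops.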
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Sun Jun 30, 2013 8:28 pm

So, did you know that if you save enough data for enough long simulations that you can completely fill a 7.3 terabyte RAID device?

My data files were getting no new output because there were less than 1000 bytes of free space on /raid.
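This also explains why the failure was silent. fprintf and fputc write into a stdio buffer first, so a full device may only be reported when that buffer is flushed or the file is closed, and the posted code does not check any of those return values. Below is a minimal sketch of a checked close using only standard C. The helper name close_checked is invented for this illustration, and the error text is only a best-effort hint taken from errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: flushes and closes fp, reporting lost output.
   Returns 0 if all buffered data was accepted by the OS, -1 otherwise.
   On a full device the reported error is typically ENOSPC
   ("No space left on device"). */
int close_checked(FILE *fp, const char *name)
{
    int failed = ferror(fp);   /* nonzero if an earlier write already failed */

    if (fflush(fp) == EOF)     /* push buffered data to the OS */
        failed = 1;
    if (fclose(fp) == EOF)
        failed = 1;

    if (failed) {
        fprintf(stderr, "%s: output was lost (last error: %s)\n",
                name, strerror(errno));
        return -1;
    }
    return 0;
}
```

Using a call like close_checked(t_slot, fn) in place of the bare fclose(t_slot) would have printed a message the first time a file could not be written in full.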
dmpogo
Advocate


Joined: 02 Sep 2004
Posts: 3264
Location: Canada

Posted: Sun Jun 30, 2013 10:36 pm

odeSolver wrote:
So, did you know that if you save enough data for enough long simulations that you can completely fill a 7.3 terabyte RAID device?

Sure, I do that for a living :) Imagine 4096x4096x4096 3D simulations (not the very largest being done) with 10 floating-point values per grid cell/particle. One time slice is 5 terabytes.
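For the curious, that figure can be checked with a few lines of arithmetic. The sketch below assumes each value is stored as an 8-byte double, which the post does not state; with 4-byte floats the result would be half as large. The function name slice_bytes is made up for the example.

```c
/* Hypothetical helper: bytes needed for one snapshot of an
   n x n x n grid holding values_per_cell numbers per cell. */
unsigned long long slice_bytes(unsigned long long n,
                               unsigned long long values_per_cell,
                               unsigned long long bytes_per_value)
{
    return n * n * n * values_per_cell * bytes_per_value;
}
```

Under that assumption, slice_bytes(4096, 10, 8) is 5,497,558,138,880 bytes, which is exactly 5 x 2^40, or 5 terabytes in binary units.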
odeSolver
Tux's lil' helper


Joined: 11 Jul 2010
Posts: 84
Location: NJ, USA

Posted: Tue Jul 02, 2013 12:26 am

dmpogo wrote:
odeSolver wrote:
So, did you know that if you save enough data for enough long simulations that you can completely fill a 7.3 terabyte RAID device?

Sure, I do that for a living :) Imagine 4096x4096x4096 3D simulations (not the very largest being done) with 10 floating-point values per grid cell/particle. One time slice is 5 terabytes.


Sounds fun. Got any job openings for a recent math grad? :D