gabrielg Tux's lil' helper
Joined: 16 Nov 2012 Posts: 134
Posted: Tue Nov 20, 2012 3:58 pm
FWIW, I only have noatime.
Also - did you check the health of your RAID arrays? It would be nice to see the output of /proc/mdstat to find out the superblock version and so on - hopefully the jump in kernels doesn't require you to do anything with mdadm.
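A minimal sketch of such a health check, assuming the classic /proc/mdstat layout in which each array's "blocks" line ends in a bracketed member map such as [UU] (an underscore marks a missing member). The function name degraded_arrays is made up for illustration:

```sh
# Read /proc/mdstat-style text on stdin and print the name of every array
# whose member map contains "_" (i.e. at least one member is missing).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /blocks/ && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
        }
    '
}

# Usage: degraded_arrays < /proc/mdstat
```

No output means every listed member is up. For the superblock version, mdadm --detail /dev/mdX should report it on its "Version" line.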
Tambor n00b
Joined: 07 Apr 2005 Posts: 53 Location: Girona (CAT)
Posted: Tue Nov 20, 2012 5:01 pm
Code: |
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sdb1[0] sda1[1]
      256896 blocks [2/2] [UU]

md3 : active raid1 sdb3[0] sda3[1]
      50010240 blocks [2/2] [UU]

md5 : active raid1 sdb5[0] sda5[1]
      50010240 blocks [2/2] [UU]

md6 : active raid1 sdb6[0] sda6[1]
      25005056 blocks [2/2] [UU]

md7 : active raid1 sdb7[0] sda7[1]
      50010240 blocks [2/2] [UU]

md8 : active raid1 sdb8[0] sda8[1]
      107731776 blocks [2/2] [UU]

md2 : active raid1 sdb2[0] sda2[1]
      10008384 blocks [2/2] [UU]

unused devices: <none>
|
DaggyStyle Watchman
Joined: 22 Mar 2006 Posts: 5909
Posted: Tue Nov 20, 2012 9:18 pm
I find it strange to have a separate raid1 per partition; the logical thing to do, imho, is to build one RAID array out of everything and use LVM on top of it.
Here is my fstab:
Code: |
/dev/md0                  /boot              ext3      noauto,noatime,defaults   1 2
/dev/md2                  /                  reiserfs  noatime                   0 1
/dev/extra/swap           none               swap      sw                        0 0
/dev/dvdrw                /mnt/dvdrw         auto      noauto,rw                 0 0
/dev/md1p3                /var               reiserfs  noatime                   0 0
/dev/md1p4                /opt               reiserfs  noatime                   0 0
/dev/md1p5                /usr               reiserfs  noatime                   0 0
/dev/md1p2                /usr/portage-tree  ext2      noatime                   0 0
/dev/md1p1                /usr/portage-bins  reiserfs  noatime                   0 0
/dev/md1p6                /home              reiserfs  defaults                  0 0
/dev/md1p7                /mnt/storage       xfs       defaults,rw               0 0
/dev/extra/share          /mnt/share         vfat      defaults,rw,users         0 0
/dev/extra/dev_and_utils  /mnt/extra         reiserfs  defaults,rw               0 0
/dev/sdf1                 /mnt/usb           auto      defaults,rw,users,noauto  0 0
|
Same here, my root (raid1) has only noatime.
_________________
Only two things are infinite, the universe and human stupidity and I'm not sure about the former - Albert Einstein
gabrielg Tux's lil' helper
Joined: 16 Nov 2012 Posts: 134
Posted: Wed Nov 21, 2012 11:51 am
In fairness, nodev and nosuid shouldn't be part of the problem; in fact, setting them on /var (and /home, and /usr/local, and...) should make the server a little more secure.
Now, back to the problem - the RAID arrays seem to be healthy enough, and quite frankly I've run out of ideas.
My understanding is that the first (and perhaps main) impediment is that you can't write to /var, hence you don't get much logging, which is rather unfortunate.
Have you considered booting from a CD and diagnosing from there? Basically:
- Boot up from a CD
- Mount your /dev/md3 somewhere
- Try to write something (touch test or what DaggyStyle suggested)
- See what happens in your /var/log
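The write test in the steps above could be sketched like this. The mount point /mnt/var and the helper name write_test are assumptions for illustration, and the commented commands need root on the live CD:

```sh
# Assumed setup on the live CD (as root); /mnt/var is a hypothetical mount point:
#   mkdir -p /mnt/var && mount /dev/md3 /mnt/var

# Try to create and then remove a probe file in the given directory.
write_test() {
    dir=$1
    probe="$dir/.write_test.$$"
    if touch "$probe" 2>/dev/null && rm -f "$probe"; then
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Usage: write_test /mnt/var
```

If the test fails, dmesg on the live system and the files under /mnt/var/log are the next places to look.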
If you are in a hurry, you can probably even set up a new /var somewhere else:
- Boot up from a CD
- Create a large enough partition somewhere (or even use /)
- rsync your current /var from /dev/md3 into the new /var
- Modify your fstab to point /var to the new device (or comment it out if you're using root)
- Reboot and see what happens.
Needless to say, this "CD" has to be a Linux one.
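For the "comment it out" variant of the fstab step, a rough sketch follows. The mount points under /mnt, the <root-device> placeholder and the helper name comment_out_mount are assumptions for illustration, not details from this thread:

```sh
# Assumed preparation on the live CD (as root); mount points are hypothetical:
#   mount /dev/md3 /mnt/oldvar
#   mount <root-device> /mnt/root
#   rsync -a /mnt/oldvar/ /mnt/root/var/

# Print an fstab (read from stdin) with the entry for the given mount point
# commented out; lines that are already comments are left untouched.
comment_out_mount() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp { print "#" $0; next }
        { print }
    '
}

# Usage: comment_out_mount /var < /mnt/root/etc/fstab > /tmp/fstab.new
```

Writing to a separate file first leaves the original fstab intact until the result has been reviewed.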
Tambor n00b
Joined: 07 Apr 2005 Posts: 53 Location: Girona (CAT)
Posted: Wed Nov 21, 2012 2:01 pm
It seems clear that the problem is /var. Because you cannot write to the partition, you cannot log in, create new logs, ...
The problem does not appear when you boot the machine, but only after some hours or a few days. Right after booting, the logs are generated and you can create files on /var.
Because of that, and looking at the "ps" output, I noticed that the first processes to become "defunct" are syslog and cron. Yesterday I rebooted the machine again with syslog-ng and vixie-cron disabled. The worst thing now is that I don't have any feedback about what is happening on the machine. But people are working and the machine seems to be OK, in situations that crashed it before.
Let's see if things keep going OK, in order to be sure that the problem is caused by these two services.
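To catch the next occurrence early, the "ps" check described above could be reduced to a small filter. This is only a sketch: it assumes a ps that accepts -eo pid,stat,comm (as procps does), and the function name stuck_processes is made up. State Z is what ps labels "defunct"; state D (uninterruptible sleep, typically waiting on I/O) is included as an assumption about what may be worth watching here:

```sh
# Read "ps -eo pid,stat,comm" output on stdin and keep only processes whose
# state starts with Z (zombie/defunct) or D (uninterruptible sleep).
stuck_processes() {
    awk '$2 ~ /^[ZD]/'
}

# Usage: ps -eo pid,stat,comm | stuck_processes
```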
DaggyStyle Watchman
Joined: 22 Mar 2006 Posts: 5909
Posted: Wed Nov 21, 2012 2:12 pm
Tambor wrote: | It seems clear that the problem is /var. Because you cannot write to the partition, you cannot log in, create new logs, ...
The problem does not appear when you boot the machine, but only after some hours or a few days. Right after booting, the logs are generated and you can create files on /var.
Because of that, and looking at the "ps" output, I noticed that the first processes to become "defunct" are syslog and cron. Yesterday I rebooted the machine again with syslog-ng and vixie-cron disabled. The worst thing now is that I don't have any feedback about what is happening on the machine. But people are working and the machine seems to be OK, in situations that crashed it before.
Let's see if things keep going OK, in order to be sure that the problem is caused by these two services. |
Maybe an HD failure in one of the two?
gabrielg Tux's lil' helper
Joined: 16 Nov 2012 Posts: 134
Posted: Wed Nov 21, 2012 2:30 pm
Tambor wrote: | The problem does not appear when you boot the machine, but only after some hours or a few days. |
Sorry... I didn't realize this.
So... another thing you can do is check SMART on the hard drives, in case it is an HD failure like DaggyStyle suggests. smartctl -a /dev/sda (and then sdb) should tell you something, although SMART has been known not to tell enough, depending on how good the HD manufacturer is.
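As a rough sketch of what to look for in that output: the filter below assumes the usual ATA attribute table printed by smartctl -A, where the attribute name is the second column and the raw value the tenth. The function name suspicious_smart_attrs and the choice of these three attributes are assumptions, not a complete health check:

```sh
# Read "smartctl -A" output on stdin and print the sector-related attributes
# whose raw value (10th column) is greater than zero.
suspicious_smart_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) print $2 "=" $10
        }
    '
}

# Usage (as root): smartctl -A /dev/sda | suspicious_smart_attrs
```

Empty output does not prove the disk is fine; it only means these particular counters are zero.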
Stopping syslog-ng shouldn't harm you if it isn't the problem, but you won't learn much if you run into the issue again.
Perhaps try to mount /var/log elsewhere, away from /dev/md3? The general idea is to keep logging going, so you can tell whether that partition is the issue.
Good luck!
Tambor n00b
Joined: 07 Apr 2005 Posts: 53 Location: Girona (CAT)
Posted: Wed Nov 21, 2012 2:39 pm
In theory, since the partition is a RAID 1, if one of the two disks fails the other should keep working without any problem.
Also, we ran fsck.reiserfs on all the partitions and the filesystems were OK.
I can try running "smartctl --all" on both hard drives. But the system always has the smartd daemon running, and we haven't had any problems with these hard drives.