Gentoo Forums
Samba + sleeping drives. Can we wake up faster? Solved

Gentoo Forums Forum Index → Networking & Security
DingbatCA
Guru


Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Tue Aug 05, 2014 6:13 pm    Post subject: Samba + sleeping drives. Can we wake up faster? Solved

Success!!

Posting this summary at the beginning of the thread to guide all those who follow.

Why?
I wanted to put my DM software-based RAID 6 to sleep when not in use. At 10 watts per drive, it adds up! But I did not want to wait 10 seconds per drive, in series, for the array to come to life, and I was tired of my Windows desktop hanging while waiting for a simple directory lookup on my NAS.

Disclaimer:
Do not come crying to me when you destroy a hard drive, lose all your data, fry a power supply, or cause a small country to be erased from the face of the Earth.

The key points covered below:
Drive Controller
Bcache
Inotify

Drive Controller
My server/NAS was running 3X LSI SAS 1068e controllers to drive my 7-drive RAID 6. It turns out that the cards are hard-coded to spin drives up in series. There is no way to get around it; it just is. This applies to ANY card running the LSI 1068e chipset, such as a Dell Perc 6/i or HP P400, and may even apply to all LSI-based cards. To make matters worse, the cards are smart and will only spin up one drive at a time across all 3 cards. My 7-disk RAID 6 was taking 50 seconds to spin up (10 seconds per drive). That dropped to 40 seconds when I moved one drive to the onboard SATA controller. That was my first clue. Thanks to the linux-raid mailing list for the help isolating this one.

So I was on the Internets looking for a new, cheap, 12~16 port SATA II controller card. I found a very strange card on eBay: a "Ciprico Inc. RAIDCore" 16-port card. I can't even find any good pictures or links to add to this post so you can see it. It is basically 4 Marvell controllers and a PCIe bridge strapped onto a single card. No brains, no nothing. Just a pure, dumb controller without any spin-up stupidity. Same chipset (88SE6445) found on some RocketRAID cards. It was EXACTLY what I was looking for, and at a cost of $60 I was thrilled. In Linux it shows up as a bridge plus controller chips:
Code:
07:00.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:02.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:03.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:04.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
08:05.0 PCI bridge: Integrated Device Technology, Inc. PES24T6 PCI Express Switch (rev 0d)
09:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0a:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0b:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)
0c:00.0 SCSI storage controller: Marvell Technology Group Ltd. 88SE6440 SAS/SATA PCIe controller (rev 02)

Bcache https://www.kernel.org/doc/Documentation/bcache.txt
Now that I had the total spin-up time down from 50 seconds ((number_of_drives * 10) - 2) to 10 seconds, I was able to address the remaining 10 seconds with caching; in this case, bcache. My operating system disks are 2X OCZ Deneva 240GB SSDs set up in a basic mirror. I partitioned these drives out and used 24GB as a caching device for my RAID. I quickly found out that bcache is unstable on the 3.16 kernel and was forced back to the 3.14 LTS kernel. After I landed on the 3.14.15 kernel everything ran great. The basic bcache settings work, but I wanted more:
Code:
#Setup bcache just the way I like it, hun-hun, hun-hun
#Get involved in read and write activities
echo "writeback" > /sys/block/bcache0/bcache/cache_mode

#Allow the bcache to put data in the cache, but get it out as fast as possible
echo "0" > /sys/block/bcache0/bcache/writeback_percent
echo "0" > /sys/block/bcache0/bcache/writeback_delay
echo $((16*1024)) > /sys/block/bcache0/bcache/writeback_rate

#Clean up jerky read performance on files that have never been cached.
echo "16M" > /sys/block/bcache0/bcache/readahead

I put all the above code in rc.local so my system picks them up on boot. Writes still need to wake the array, but reads from cache don't even wake up the drives.
Code:
root@nas:/data# time (dd if=/dev/zero of=foo.dd bs=4096k count=16 ; sync)
16+0 records in
16+0 records out
67108864 bytes (67 MB) copied, 0.0963405 s, 697 MB/s

real    0m10.656s  #######Array spin up time#########
user    0m0.000s
sys     0m0.128s

root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
...
/dev/sdd standby
root@nas:/data#  time (dd if=foo.dd of=/dev/null iflag=direct)
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 0.118975 s, 564 MB/s

real    0m0.121s  ########Array never even woke up#########
user    0m0.024s
sys     0m0.096s
root@nas:~# ./sleeping_raid_status.sh
/dev/sdc standby
/dev/sdj standby
...

Inotify
Wait... the array did not spin up because the read came from cache?! Not good, but it is working exactly as expected. I have the file metadata in cache, but what happens when I want to read the file itself... 10 seconds later... Normally when I find a media file, I want to read/watch/listen to it right away. Since I just accessed the metadata, why not a preemptive spin-up? Time for a fun script using inotify.

I actually took this script one step further than just preemptive spin-up and had it do all drive power management. It turns out different drive manufacturers interpret `hdparm -S 84 $DRIVE` (go to sleep in 7m) differently, and this whole NAS was built on the cheap, so I have 4 different types of drives in my array.
Code:
#!/bin/bash
WATCH_PATH="/data"
ARRAY_NAME="data"
SLEEPING_TIME_S="600"

#Resolve /dev/md/data to its mdXXX name, then list member devices (sdb1 -> sdb)
ARRAY=$(basename "$(readlink -f "/dev/md/$ARRAY_NAME")")
PARTS=$(ls "/sys/block/$ARRAY/slaves" | sed 's/[0-9]*$//')

set -m

while true; do
  #Block until something touches the array, or time out after SLEEPING_TIME_S
  inotifywait "$WATCH_PATH" -qq -t "$SLEEPING_TIME_S"
  if [ $? -eq 0 ]; then
    #echo -n "Start waking: "
    for i in $PARTS; do
      (hdparm -S 0 "/dev/$i") &   #Wake every member in parallel
    done
    #echo "Done"
  else
    #echo -n "Make go sleep: "
    for i in $PARTS; do
      STATE=$(hdparm -C "/dev/$i" | awk '/drive state is/ {print $4}')
      #Really should check that the array is not doing something block
      #related, like a check or rebuild
      if [ "$STATE" != "standby" ]; then
        hdparm -y "/dev/$i" > /dev/null 2>&1
      fi
    done
    #echo "Done"
  fi
  sleep 1s
done


A few other key points have been addressed in this thread; there is much greater detail in the posts below:
Spinning drives up/down puts wear on them, but it is more cost effective to sleep the drives and wear them out than it is to pay for the power.
Spinning up X drives at once puts a huge load on the PSU (Power Supply Unit). According to Western Digital, their 7200RPM drives spike at 30 watts during spin-up. You have been warned.
Warning: formatting a drive for bcache will remove ALL your data. There is no way to remove bcache without reformatting the device.
5400RPM drives take about 10 seconds to spin up; 7200RPM drives take about 14 seconds.

####Original starting post####
Everything is working as expected, which is really frustrating.

I have a home NAS with 6X 2TB drives in a software RAID 6 configuration. The array is formatted with XFS and holds all my media, such as movies and music. After 7m all my drives fall asleep. I don't want to run 6X drives at 10W each 24/7.

Code:
#Sleep timeout in increments of 5s: (84*5)/60 = 7m
for disk in /dev/sd?; do
  hdparm -S 84 "$disk"
done
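As a side note (my addition, not from the original post): for `-S` values in the 1–240 range each unit means 5 seconds, so the conversion from minutes is simple arithmetic. A tiny helper, valid only for timeouts up to 20 minutes:

```shell
# Convert minutes to an `hdparm -S` count, assuming the 1-240 range
# where each unit is 5 seconds (values 241-251 switch to 30-minute units).
minutes_to_S() {
    echo $(( $1 * 60 / 5 ))
}

minutes_to_S 7    # the 7-minute timeout used above -> 84
```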

I share all my media out to my Windows and Android systems with Samba. When I first go to access my media, everything hangs (in Windows or Android) for about 15~30s while the drives spin up.
Code:
# Global parameters
[global]
log file = /var/log/samba/log.%m
server string = nas
workgroup = lan
max log size = 50
read raw = yes
write raw = yes

#Showing up on the network
local master = yes
os level = 255
preferred master = yes

[Media]
path = /home/public/Media
mangled names = no
read only = no


Is there any way to mitigate, mask, or cache the drives so the spin up time does not seem so painful?


Last edited by DingbatCA on Thu Aug 28, 2014 8:55 pm; edited 1 time in total
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Sun Aug 10, 2014 6:15 pm

I think you're pretty much asking for two conflicting things. If the data you want is not in cache, the array has to spin up the disks, which means you wait. So if you don't want to wait, keep the disks spinning, or keep the data you access frequently on a disk that stays spinning.

(If your PSU is very hefty, I don't know if there's a way to get mdraid to simultaneously spin up all disks, as currently it will stagger spin - which is much less wear and tear on your system.)

I end up having to keep my 4x500GB RAID5 spun up 24/7 since it's used randomly, albeit lightly - the spin up/down would get annoying as well as start eating into the lifetime of the disks. Which may or may not be the case for you...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Sun Aug 10, 2014 8:31 pm

I very much so have two conflicting desires.

My PSU has the power. I am running an Ablecom SP762-TS, which is a 3-way redundant power supply. My whole system is a re-purposed server.

I was not aware that mdraid did a staggered spin up, by default. I will hunt around and see if I can find out how to disable/adjust that.
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Sun Aug 10, 2014 9:50 pm

This is just strange. So I wrote a simple script to look at the state of my drives as they spin up:
Code:
while true; do
  date
  for disk in sdb1 sdc1 sdd1 sde1 sdf1 sdj1 sdi1; do
    hdparm -C "/dev/$disk" | grep "drive state"
  done
  sleep 0.1
done


But it looks like hdparm blocks while the drives are spinning up. Nothing in the logs about it.
Code:
Sun Aug 10 14:34:34 PDT 2014
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
Sun Aug 10 14:34:34 PDT 2014
 drive state is:  standby
 drive state is:  active/idle
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
Sun Aug 10 14:34:44 PDT 2014
 drive state is:  standby
 drive state is:  active/idle
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  standby
 drive state is:  active/idle
Sun Aug 10 14:35:01 PDT 2014
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
Sun Aug 10 14:35:20 PDT 2014
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle
 drive state is:  active/idle


I might have to take this question over to the mdraid guys for help.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Sun Aug 10, 2014 11:43 pm

I think the IDE commands are serialized, so yes, they will stall when there's an outstanding request to spin a disk up...
Also, it is possible for two disks to be spinning up at once, but eventually all of them need to be spun up.

I have to say it's not "staggered" but rather "serialized" - the kernel fetches from the disks as needed, but this has the effect of a staggered startup, since getting all the requests out at the same time isn't likely...

Also keep in mind "server quality" means "24/7 99.999% availability" not "spin up spin down as needed" - so you are still using it in an unintended manner :D
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Sun Aug 10, 2014 11:54 pm

Wow. If it is truly "serialized", then this is going to become a big problem as I add more drives to grow the array. Is there any way of caching the file system's metadata? I am trying to give the drives time to spin up in the background without completely hanging the client's request.
Cyker
Veteran

Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Aug 11, 2014 12:27 pm

The problem I found was that even if you had gigantic caches, everything would still hang as soon as you requested something outside the cache, as requests that hit the cache don't necessarily wake the disks up.

I have yet to find a nice way around what you describe.

In the end I just got some low-RPM WD Greens and let them stay spinning! They automatically park the heads when not in use but keep the platters spinning, so you save some power while idling - not as much as a fully sleeping drive, but the recovery is a lot faster!

(On a slight tangent, I recently switched to the newer Reds; They run 15C cooler and draw slightly less power vs the 1st gen Greens!)
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Mon Aug 11, 2014 2:47 pm

Time to have some fun! This is Linux, we can solve this.

My main array runs 7X 2TB Western Digital Caviar Green drives. I have two other arrays in the same system: the OS array is a mirror of 2X OCZ Deneva 240GB SSDs, and the archive array is a mirror of 2X Hitachi Deskstar 7K500s with btrfs and compression.

What type of gigantic caches were you able to put in place? Here is my idea: set up inotify to watch the cache. When it is accessed, start all disks in the array. This falls apart if the cache can't be watched by inotify, like the generic system cache, or if the cache is global and not per array. In the worst case, this trick could be employed against the array itself to start all the drives in parallel, but that would only save a few seconds.
Cyker
Veteran

Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Aug 11, 2014 5:50 pm

That's the spirit! :D

Well 'gigantic' was about 2GB on my old server :lol:. I haven't played with it much on my new one (Currently the cache is 12GB :lol: ) since all the disks just spin perpetually (I find running a torrent server with 400+ seeds keeps it busy and random enough that it never gets to sleep!)

One thing to watch out for is that the IO system tends to block while it waits for the disk to spin up. I know the Explorer threads on my Windows machines would lock up until any sleeping disks woke up and started doing Samba's bidding.

I just had a thought tho' - IIRC Linux 'recently' added the ability to use other devices as an intermediary cache; I wonder if you could set up a small fast SSD as an intermediary cache - Theoretically it would be easier to monitor that for access than the cache in RAM? - and then use that to trigger the disk wakeup?
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Mon Aug 11, 2014 6:03 pm

I have my OS on 2X 240GB SSD's. There are lots of ways I can cut a chunk of SSD out for an intermediary cache. I think, in this case, you are referring to bcache (http://en.wikipedia.org/wiki/Bcache).

A RAM-based cache also works, as long as it is treated as read-only. I don't want a power outage causing data loss or corruption.

I have the RAM, or the SSD storage. I would rather use a non-persistent RAM cache. Something like a cache in tmpfs.

Using the SSD mirror works, but kinda defeats the point of my RAID 6.

Ideas?
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Mon Aug 11, 2014 6:14 pm

I'd just say keep the disks spinning and at least allow the heads to unload; you'll get some savings there. The I/O blocking is indeed very annoying during interactive use.

No matter how big your cache, chances are, you'll always be fetching something that's not in cache (why would you be reading the same thing over and over again?)...

(As a side issue, I hate my raid5, IOPS is awful for some reason or another... the drives I have are not blacks or reds, I have two WD "blue" and three Hitachi disks in my 4+1 hotspare system and it bogs down badly during nfs use...)
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Mon Aug 11, 2014 6:36 pm

Don't want to be burning 70 watts of power 24/7. Or at least that is the power draw of my 7 disks when spinning, according to my power strip.

The cache would only be in place to serve read requests; I am really only after caching the FS metadata. This comes into play when I walk the file system from Windows: I need to get to the correct directory before I can watch a movie or listen to music, and I am tired of Windows Explorer hanging until the drives spin up. For this I think a cache of 16MB would be plenty! But I can fling GBs at it.

RAID5 performance: with Linux software RAID you really need 4 drives to get the equivalent of one solo drive's speed, due to the lack of write-through cache capabilities. RAID 6 requires 5 drives before you get the equivalent performance of one standalone drive. Most of the time the drives are not even the problem; people like to run too many drives on a slow PCI interface. There is also one basic tweak that most n00bs forget to set: stripe cache size. If it is left at the default, your system will run like trash.
Code:
root@nas:~# cd /sys/block/md125/md
root@nas:/sys/block/md125/md# cat stripe_cache_size
256
root@nas:/sys/block/md125/md# echo $((16*1024)) > stripe_cache_size
root@nas:/sys/block/md125/md# cat stripe_cache_size
16384

And if you really want to have fun, watch the cache during a big write.
Code:
root@nas:~# cd /sys/block/md125/md
watch -n 0.1  cat stripe_cache_active

But Linux software RAID performance is a very large subject that should be on a different thread.
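[Editor's note: one caveat worth flagging with the stripe_cache_size tweak above — the md documentation counts it in pages (4 KiB) per member device, so the RAM cost of a setting can be estimated before committing to it. A quick sketch of the arithmetic:]

```shell
# Approximate RAM consumed by the md stripe cache, assuming 4 KiB pages:
# stripe_cache_size * page_size * number_of_member_drives.
stripe_cache_mib() {    # usage: stripe_cache_mib <stripe_cache_size> <num_drives>
    echo $(( $1 * 4096 * $2 / 1024 / 1024 ))
}

stripe_cache_mib 256 7      # the default, for a 7-drive array -> 7 MiB
stripe_cache_mib 16384 7    # the tuned value above -> 448 MiB
```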
Cyker
Veteran

Joined: 15 Jun 2006
Posts: 1746

Posted: Mon Aug 11, 2014 9:12 pm

Yea, I remember messing around with a bunch of settings to try and speed up my old mdadm RAID5.
I had stuff like this in my local.start for a while :lol:
Code:

blockdev --setra 8192 /dev/md0
blockdev --setra 2048 /dev/sda /dev/sdb /dev/sdc /dev/sdd
echo 8192 > /sys/block/md0/md/stripe_cache_size


btrfs RAID5 speed seems to be pretty good; I can hit 100MB/s (!!?!) on each RAID element, whereas before I'd be lucky to get 150MB/s off the whole array! A beefier CPU and faster bus probably help, but I also suspect btrfs isn't actually doing real RAID5 at the moment... :(


I forgot about tmpfs; That should work!

I wonder if caching the metadata will be enough tho', if the goal is to avoid pausing in Windows - Windows doesn't just pull directory-table data; like the bloatier Linux DEs, it reads the contents of a lot of the files it touches to generate previews and thumbnails.
That said, I think they split that off into a worker thread in Vista+, so you might be able to get away with it...

Come to think about it, doesn't Linux already prioritise caching the file tables?


Maybe it'd be easier to just set the spindown for like, an hour or two, then it'll spin down when you aren't using it, but stay spinning when you are?
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Mon Aug 11, 2014 9:23 pm

As a rule, when I am using my array it does not spin down. The primary job of the array is media (music and movies), and in the case of a movie there is almost always disk IO going on. So whether I set the spin-down to 7m or 2 hours won't really matter.

I am good with tmpfs and building the inotify script, but I don't know how to build the metadata cache. Can you point me in the right direction?

I wish btrfs RAID5/6 was more stable. :-(
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Mon Aug 11, 2014 11:40 pm

Just adding some more info. Spin-up takes about 9.6 seconds per drive. I need at least 5 of the 7 drives spinning to access data: 9.6 x 5 = 48 seconds. I need to find a fix for this... When I fill my drive cage with 15 drives, the spin-up time will be about 125s. OUCH!!!
Code:
root@nas:/data# smartctl -a /dev/sdd | grep Spin_Up
  3 Spin_Up_Time            0x0027   150   137   021    Pre-fail  Always       -       9608

root@nas:/data# time (touch foo ; sync)

real    0m49.004s
user    0m0.000s
sys     0m0.004s
root@nas:/data# time (touch foo ; sync)

real    0m50.647s
user    0m0.000s
sys     0m0.008s
root@nas:/data# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/md125      9.1T  3.8T  5.4T  42% /data
root@nas:/data# mdadm -D /dev/md125
/dev/md125:
        Version : 1.2
  Creation Time : Wed Jun 18 07:54:38 2014
     Raid Level : raid6
     Array Size : 9766909440 (9314.45 GiB 10001.32 GB)
  Used Dev Size : 1953381888 (1862.89 GiB 2000.26 GB)
   Raid Devices : 7
  Total Devices : 7
    Persistence : Superblock is persistent

    Update Time : Mon Aug 11 16:30:16 2014
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           Name : nas:data  (local to host nas)
           UUID : 74f9ce7a:df1c2698:c8ec7259:5fdb2618
         Events : 1038642

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       3       8       49        2      active sync   /dev/sdd1
       4       8       65        3      active sync   /dev/sde1
       5       8       81        4      active sync   /dev/sdf1
       7       8      145        5      active sync   /dev/sdj1
       6       8      129        6      active sync   /dev/sdi1
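The scaling worry above reduces to simple arithmetic: a RAID 6 needs (n − 2) members up before data is reachable, at roughly 9.6 s per drive when spin-up is serialized. A sketch (the per-drive time and the n − 2 rule are taken from the posts; the helper itself is just illustrative):

```shell
# Estimated serial spin-up time for an n-drive RAID 6: data is reachable
# once (n - 2) members are spinning, at ~9.6 s per drive, one at a time.
spinup_secs() {    # usage: spinup_secs <total_drives>
    awk -v n="$1" 'BEGIN { printf "%.0f\n", (n - 2) * 9.6 }'
}

spinup_secs 7     # the current 7-drive array -> ~48 s
spinup_secs 15    # a full 15-drive cage -> ~125 s
```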
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Tue Aug 12, 2014 1:26 am

I don't find the power draw a big deal; then again, I only have four disks, and service requests are not only local, so I can't control who powers the disks up. I've been running a RAID5 for quite a while now, though I was running an Athlon as the server CPU; now I'm running a Core2 Quad, mostly because this machine is a shell box/VM server/webserver/mailserver. I have another machine that far exceeds the power draw of these disks... and another whose GPU alone eats more power than the HDDs.

The problem with any cache is that it's still LRU and if you use the cache enough it will discard from the cache. I don't think there is a metadata-only cache available... that would be interesting but potentially wasteful.

Perhaps something easier is just to monitor the network, if you see a SMB packet come by and the disks are sleeping, go ahead and try to spin all of the disks up?

Maybe another way is to break up your raid so you don't have to pay the penalty for spinning up all disks when you only need to use one volume? Then again this complicates other things...

All of my RAID members are on an ICH10 onboard 3Gbit SATA controller. Sequential disk reads are fine on the server - on the order of 2-3x a single disk's speed (around 150 MB/sec) - but random I/O over NFS is awful, even NFS to a VM on the same machine. And yes, I was setting the readahead and stripe cache larger. The readahead and stripe size (64K) may actually be hurting small-file performance - I recall my 32K-stripe setup being marginally better than the 64K-stripe one, though the larger values definitely helped hdparm -t /dev/md1 speeds...
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Tue Aug 12, 2014 2:09 am

Still trying to find a good solution for getting access to the data faster. I think I am getting close to an acceptable solution. I asked for help from the linux-raid mailing list and Larkin was kind enough to give me the idea of writing a daemon that controls all sleeping/waking of the array.

So I am currently just playing with ideas.
Code:
root@nas:~# inotifywait /data/
Setting up watches.
Watches established.
/data/ OPEN foo

This worked. The second I touch a file on the array, inotifywait responds, even though the array itself does not respond for 50 seconds. In the morning I will roll this into a script/daemon that keeps track of the array's activity and, most importantly, issues the sleep/wake commands in parallel, NOT in series.
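The parallel part is plain shell job control: launch each wake command in a background subshell, then wait for all of them. A pattern sketch (here `sleep 1` stands in for the real `hdparm -S 0 /dev/$d` call, so the timing is observable without hardware):

```shell
# Wake all array members concurrently instead of one after another.
wake_all() {
    for d in "$@"; do
        ( sleep 1 ) &    # real version: ( hdparm -S 0 "/dev/$d" ) &
    done
    wait                 # returns once the slowest "drive" is awake
}

t0=$(date +%s)
wake_all sdb sdc sdd sde sdf
ELAPSED=$(( $(date +%s) - t0 ))
echo "woke 5 drives in ~${ELAPSED}s"    # roughly 1 s total, not 5 s
```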
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Tue Aug 12, 2014 3:27 pm

As far as wear and tear on the disks: yes, starting and stopping the drives shortens their life span. I don't trust my disks regardless of starting/stopping; that is why I run RAID 6.

Let's say I use my NAS with its 7 disks for 2 hours a day, 7 days a week, at 10 watts per drive. The current price for power in my area is $0.11 per kilowatt-hour. That comes out to $5.62 per year to run my drives 2 hours daily, but if I ran them 24/7 it would cost $67.45/year - an extra $61.83/year. The 2TB 5400RPM SATA drives I have been picking up from local surplus and auction websites cost me $40~$50 including shipping and tax. In other words, I could buy a new disk every 8~10 months to replace failures and it would cost the same. Drives don't fail that fast, even if I start/stop them 10 times daily. This also completely ignores the fact that drive prices are falling. Sorry to disappoint, but I am going to spin down my array and save some money.
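The cost figures above check out; here is the arithmetic as a small helper (drive count, wattage, and rate are the ones quoted in the post):

```shell
# Yearly power cost: drives * watts * hours/day * 365 days / 1000 W-per-kW * $/kWh
yearly_cost() {    # usage: yearly_cost <drives> <watts_each> <hours_per_day> <usd_per_kwh>
    awk -v d="$1" -v w="$2" -v h="$3" -v r="$4" \
        'BEGIN { printf "%.2f\n", d * w * h * 365 / 1000 * r }'
}

yearly_cost 7 10 2 0.11     # 2 hours/day -> 5.62
yearly_cost 7 10 24 0.11    # 24/7 -> 67.45
```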
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 7130
Location: almost Mile High in the USA

Posted: Wed Aug 13, 2014 1:15 am

But is it worth ripping your hairs out getting annoyed at waiting for the disks? :D

It's a quality-of-life issue, really: replace a disk every year, or never be annoyed by disk spin-up because the array is always available.

I think it's the same cost either way really. Well, for me at least as I don't have as many disks.
Cyker
Veteran

Joined: 15 Jun 2006
Posts: 1746

Posted: Wed Aug 13, 2014 1:29 pm

Well it's definitely not worth the zots required to do this, but it is a fun little experiment :)

Who knows, we might see a paper on The DingbatCA Early Pre-emptive Midline-Storage Wakeup Algorithm in the future :D

It'll be cool to see what you come up with and how well it performs!

The inotify thingy looks to be a good start; the next tricky bit will be caching enough stuff to give the disks time to spin up.
I wonder if you can cache the filesystem metadata entirely, but also have some sort of learning predictor cache that tries to spot access patterns, in order to cache enough relevant stuff to give the array time to spin up.


This really is the sort of thing that a Linux hacker should be doing for a final year project or something :lol:
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Wed Aug 13, 2014 3:34 pm

I am good with waiting 10 seconds, and with a little bit of caching I could mitigate even that - if I can get the array to spin up as one unit. But I agree with eccerr0r that my quality of life is not worth waiting a minute every single time I want to use the array. Most of the media devices connected to the array will give up before waiting that long.

So, back to hacking, and my latest problem. Inotify works perfectly and responds within 0.01 seconds of my array being accessed (watching the mount point /data). But I cannot get the disks to spin up in parallel.
Code:
root@nas:~# hdparm -C /dev/sdh /dev/sdg                                         
/dev/sdh:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

#Two terminal windows dd'ing sdg and sdh.
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.371 s, 0.3 kB/s

real   0m28.139s  ############# WHY?! ################
user   0m0.000s
sys   0m0.000s

#A single drive spin-up
root@nas:~/dm_drive_sleeper# time dd if=/dev/sdh of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 14.4212 s, 0.3 kB/s

real   0m14.424s
user   0m0.000s
sys   0m0.000s


I need a way to spin up the drives with a raw ATA command, not through the Linux block layer. This is starting to feel like a problem with the kernel itself?!
Cyker
Veteran

Joined: 15 Jun 2006
Posts: 1746

Posted: Wed Aug 13, 2014 6:42 pm

Possibly relevant?

http://linux.slashdot.org/story/14/04/12/1833244/linux-315-will-suspend-resume-much-faster


Also, what's your PSU like? HDD spinups, esp. 3.5" disks, have a surprisingly high amp draw and I'm slightly concerned your PSU might blow if it gets repeatedly spiked like that...!
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Wed Aug 13, 2014 9:35 pm

Well. Thanks for the tip Cyker!
Quote:
The Linux 3.15 kernel ... ensured the kernel is no longer blocked by waiting for ATA devices to resume.

Power is not an issue.
Quote:
I am running a Ablecom SP762-TS which is a 3 way redundant power supply. My whole system is a re-purposed server.

Off to play with a new kernel. I will report back soon.
DingbatCA
Guru

Joined: 07 Jul 2004
Posts: 384
Location: Portland Or

Posted: Wed Aug 13, 2014 10:12 pm

Running the shiny new 3.16 kernel.
Code:
root@nas:~# time dd if=/dev/sdg of=/dev/null bs=4096 count=1 iflag=direct
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 13.8612 s, 0.3 kB/s

real   0m27.819s  #################Still blocking###########
user   0m0.000s
sys   0m0.000s

I also ran the same test against my 7-disk array.
Code:
#Single direct read from the md array
root@nas:~# time dd if=/dev/md125 of=/dev/null bs=512k count=7 iflag=direct
7+0 records in
7+0 records out
3670016 bytes (3.7 MB) copied, 47.8668 s, 76.7 kB/s

real   0m47.869s
user   0m0.004s
sys   0m0.000s


Failure is always an option? :-(
John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10261
Location: Somewhere over Atlanta, Georgia

Posted: Wed Aug 13, 2014 10:49 pm

There should be a non-blocking ioctl that could be issued against all drives to spin them up. Let me do some experimentation on my 4-drive RAID5 setup.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.