Gentoo Forums
BTRFS: The SSD killer
devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Tue Jun 29, 2010 5:41 am    Post subject: BTRFS: The SSD killer

So, recently I have been noticing that the erase cycle count on my SSD was going up very rapidly. It's a measure of how much data is being written to it. What I noticed was that something was writing about 1-2MB of data to sda every minute, sometimes even more, even when nothing was going on in the system. And these were random IOs. During a 10 hour period it had written around 2GB of data just idling and used up about 2 erase cycles on a 120GB disk (i.e. the firmware on the SSD is bad at combining smaller random writes and is erasing about 240GB for 2GB of real data written by the OS, a write amplification of about 120... :x Looks like a firmware bug).

Anyhow, to debug this further, I shut down X, stopped all services, and unmounted all other FSs: almost like single user mode. Still writing about 1-2MB every minute. Then I had the bright idea. I moved the stuff to another partition on the same disk and formatted that one with ext4. Booted back in and noted the writes: 650KB written in 10 mins (**), i.e. around 1KB per second, compared to anywhere between 15 and 400KB per second with BTRFS (averaged over long periods, i.e. it's not necessarily writing every second).

What the heck is it writing MBs of data for on an idle system?

I have now officially abandoned BTRFS for my root as well as other data I had on it. No BTRFS for me!

(**) I do wanna know what the heck is ext4 writing this much data for on an idle system if someone knows?
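
Editor's sketch: one way to reproduce this kind of measurement is to sample the kernel's per-device write counter twice and take the difference. This is an assumption about method, not necessarily what devsk ran. It relies on the Linux sysfs layout, where the seventh field of /sys/block/<dev>/stat is the number of 512-byte sectors written since boot. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: report how many KB were written to a block device over an interval.
# Assumes field 7 of /sys/block/<dev>/stat is the count of 512-byte sectors
# written since boot (Linux sysfs layout).

# sectors_written FILE: print field 7 of a sysfs "stat" file.
sectors_written() {
        awk '{print $7}' "$1"
}

# kb_between START END: convert a difference in 512-byte sectors to KB.
kb_between() {
        echo $(( ($2 - $1) * 512 / 1024 ))
}

# measure_writes DEV SECONDS: sample the counter twice and print the delta.
measure_writes() {
        stat_file="/sys/block/$1/stat"
        before=$(sectors_written "$stat_file")
        sleep "$2"
        after=$(sectors_written "$stat_file")
        echo "$1: $(kb_between "$before" "$after") KB written in $2 s"
}
```

Calling `measure_writes sda 600` on an otherwise idle system would give a figure comparable to the 10 minute numbers above. It counts what the OS sends to the device, not what the firmware erases internally.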

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Tue Jun 29, 2010 6:06 am

BTW, this is not the only reason for quitting BTRFS. I got bit by silent corruptions, which were reported in the other BTRFS thread a couple of times.

max_power
n00b
Joined: 01 Aug 2004
Posts: 48
Location: /dev/bed

Posted: Tue Jun 29, 2010 9:33 am

how do you read out the cycle count of the ssd?

mbar
Veteran
Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Tue Jun 29, 2010 11:34 am

Yeah, how do I check such statistics? I also have an SSD (Samsung with TRIM) with compressed BTRFS on root and home, and would like to find out if something wrong is going on...
Also this is not really encouraging: http://lkml.org/lkml/2010/6/3/313
Quote:
Unbound(?) internal fragmentation in Btrfs

d2_racing
Bodhisattva
Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

Posted: Tue Jun 29, 2010 11:58 am

It's still in heavy development, so maybe wait a couple of months and then retry.

P.Kosunen
Guru
Joined: 21 Nov 2005
Posts: 309
Location: Finland

Posted: Tue Jun 29, 2010 12:09 pm

max_power wrote:
how do you read out the cycle count of the ssd?


Smartmontools/smartctl i think.

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Tue Jun 29, 2010 4:17 pm

P.Kosunen wrote:
max_power wrote:
how do you read out the cycle count of the ssd?


Smartmontools/smartctl i think.
Yes, 'smartctl -a /dev/sda'. Use the latest smartmontools package and the attribute names are self-explanatory.

max_power
n00b
Joined: 01 Aug 2004
Posts: 48
Location: /dev/bed

Posted: Tue Jun 29, 2010 6:26 pm

and which scheduler do you use with your ssd? i set mine to noop, but i am not sure if this is the optimum. but at least the system should not read or write on the drive if the fifo stack is empty.

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Tue Jun 29, 2010 6:37 pm

max_power wrote:
and which scheduler do you use with your ssd? i set mine to noop, but i am not sure if this is the optimum. but at least the system should not read or write on the drive if the fifo stack is empty.
I use deadline. Plain and simple. No mickey mouse CFQ!
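
Editor's sketch: for anyone comparing schedulers, the kernel lists the available ones in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. The helper names below are hypothetical, and the sample string in the comment is only an illustration of the format.

```shell
#!/bin/sh
# active_scheduler LINE: print the bracketed entry from a scheduler list
# such as "noop deadline [cfq]".
active_scheduler() {
        echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# show_scheduler DEV: print the active scheduler for a device such as sda.
show_scheduler() {
        active_scheduler "$(cat "/sys/block/$1/queue/scheduler")"
}

# Switching is done by writing a name back to the same file (needs root):
#   echo deadline > /sys/block/sda/queue/scheduler
```

A change made this way lasts until reboot. The scheduler only orders and merges requests that already exist, so it should not by itself explain extra writes on an idle system.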

mbar
Veteran
Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Tue Jun 29, 2010 7:53 pm

I use SIO, so let's compare results :) Here are mine, I'm going to give you a full dump -- me going to sleep in a few minutes (I did Secure ATA erase before installing gentoo approx 2 months ago, maybe less):

Code:
gentoo-xps64 ~ # smartctl -a /dev/sda
smartctl 5.39.1 2010-01-28 r3054 [x86_64-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG SSD PM800 2.5" 256GB
Serial Number:    YF11700953SY953B3844
Firmware Version: VBM24D1Q
User Capacity:    256,060,514,304 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Tue Jun 29 21:50:27 2010 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
               was never started.
               Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)   The previous self-test routine completed
               without error or no self-test has ever
               been run.
Total time to complete Offline
data collection:        ( 720) seconds.
Offline data collection
capabilities:           (0x53) SMART execute Offline immediate.
               Auto Offline data collection on/off support.
               Suspend Offline collection upon new
               command.
               No Offline surface scan supported.
               Self-test supported.
               No Conveyance Self-test supported.
               Selective Self-test supported.
SMART capabilities:            (0x0003)   Saves SMART data before entering
               power-saving mode.
               Supports SMART auto save timer.
Error logging capability:        (0x01)   Error logging supported.
               General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  12) minutes.
Extended self-test routine
recommended polling time:     (  72) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       790
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       286
175 Program_Fail_Count_Chip 0x0032   099   099   011    Old_age   Always       -       1
176 Erase_Fail_Count_Chip   0x0032   100   100   011    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0013   099   099   017    Pre-fail  Always       -       17
178 Used_Rsvd_Blk_Cnt_Chip  0x0013   077   077   011    Pre-fail  Always       -       28
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   093   093   010    Pre-fail  Always       -       538
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   093   093   010    Pre-fail  Always       -       7398
181 Program_Fail_Cnt_Total  0x0032   099   099   010    Old_age   Always       -       2
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   099   099   010    Pre-fail  Always       -       2
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   253   253   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0013   077   077   011    Pre-fail  Always       -       96
233 Media_Wearout_Indicator 0x0032   099   099   000    Old_age   Always       -       2660

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       472         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

bollucks
l33t
Joined: 27 Oct 2004
Posts: 606

Posted: Wed Jun 30, 2010 1:19 am    Post subject: Re: BTRFS: The SSD killer

devsk wrote:
(**) I do wanna know what the heck is ext4 writing this much data for on an idle system if someone knows?

All journalled file systems will write out a certain amount of data to the journal every certain time period. By default that time period is 5 seconds. If you want a truly idle filesystem, try booting ext4 without the journal enabled (nolog), but of course you'll lose the filesystem safety of journalling if you do this.
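
Editor's sketch: the 5 second period mentioned above corresponds to ext4's `commit=` mount option. The helper below (a hypothetical name) pulls that value out of a mount option string and falls back to 5 when it is absent. The commands in the trailing comments are, as far as I know, the usual ext4 ways to lengthen the interval or drop the journal; I am not aware of ext4 accepting `nolog`, so treat that spelling with caution. `/dev/sdXN` is a placeholder.

```shell
#!/bin/sh
# commit_interval OPTIONS: print the commit= value from a comma-separated
# mount option string, or 5 (the ext4 default) if none is given.
commit_interval() {
        val=$(echo "$1" | tr ',' '\n' | sed -n 's/^commit=//p' | head -n 1)
        echo "${val:-5}"
}

# Raising the interval on a mounted filesystem (needs root):
#   mount -o remount,commit=60 /
# Removing the journal entirely is done on an unmounted filesystem:
#   tune2fs -O ^has_journal /dev/sdXN
```

A longer commit interval means fewer journal writes, at the cost of losing more recent changes after a crash.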

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Wed Jun 30, 2010 2:20 am    Post subject: Re: BTRFS: The SSD killer

bollucks wrote:
devsk wrote:
(**) I do wanna know what the heck is ext4 writing this much data for on an idle system if someone knows?

All journalled file systems will write out a certain amount of data to the journal every certain time period. By default that time period is 5 seconds. If you want a truly idle filesystem, try booting ext4 without the journal enabled (nolog), but of course you'll lose the filesystem safety of journalling if you do this.
I do know they need to write the journal every 5 (or "commit") seconds, but if the journal size is greater than the data written in an idle system (hence making overall write a multiple of useful data written), the FS is doing something wrong, which is the main problem described in this thread.

I think ext4 is also writing more metadata than the data that's being written. The question can be rephrased like this: How can the system write 1MB of log files in an idle system in an hour but write 5MB of journal? Those numbers are not real but they are meant to illustrate the question.

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Wed Jun 30, 2010 2:22 am

@mbar: your Samsung disk is different from my OCZ Vertex and SMART data is completely different. I don't know what's what. You will have to get help from Samsung forums/techs about what that data really means.

mbar
Veteran
Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Wed Jun 30, 2010 6:06 am

ok, could you please post yours?

haarp
Guru
Joined: 31 Oct 2007
Posts: 535

Posted: Wed Jun 30, 2010 6:12 am

What brand of SSD do you use? I have an Intel one and now I'm afraid I'll have to get rid of btrfs as well if this is indeed the case :/

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Wed Jun 30, 2010 6:28 am

mbar wrote:
ok, could you please post yours?
Code:
# ssd-stats sdb

Drive sdb:
184     Initial_Bad_Block_Count  44
195     Program_Failure_Blk_Ct   0
196     Erase_Failure_Blk_Ct     0
197     Read_Failure_Blk_Ct      0
198     Read_Sectors_Tot_Ct      5448667150
199     Write_Sectors_Tot_Ct     3942772932
200     Read_Commands_Tot_Ct     128687675
201     Write_Commands_Tot_Ct    29873046
202     Error_Bits_Flash_Tot_Ct  1886205
203     Corr_Read_Errors_Tot_Ct  1788537
204     Bad_Block_Full_Flag      0
205     Max_PE_Count_Spec        5000
206     Min_Erase_Count          4
207     Max_Erase_Count          3847
208     Average_Erase_Count      134
209     Remaining_Lifetime_Perc  98
It looks nicer on a terminal than here.

ssd-stats is a script wrapper around smartctl.

Code:
$ cat ssd-stats
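#!/bin/sh
# Print selected SMART attributes (ID, name, raw value) for each named drive.
if [ $# -eq 0 ]
then
        echo "Usage: $0 <device>..."
        echo "       $0 sda sdb"
        echo ""
        exit 1
fi
for i in "$@"
do
        echo ""
        echo "Drive $i:"
        # Resolve symlinks to the real device node; fall back to the plain path.
        drv=$(readlink -f "/dev/$i")
        [ -z "$drv" ] && drv="/dev/$i"
        # Keep attribute lines whose ID starts with 1 or 2 followed by 0, 8 or 9
        # (184 and 195-209 on this drive) and print ID, name and raw value.
        smartctl -a "$drv" | grep "^[12][089]" | awk '{print $1"\t"$2"\t"$10}'
        echo ""
done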
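
Editor's sketch: two quick calculations on the attribute values posted above. Both rest on assumptions that the drive output does not confirm: that wear can be approximated as Average_Erase_Count divided by Max_PE_Count_Spec, and that Write_Sectors_Tot_Ct counts 512-byte sectors. The function names are hypothetical.

```shell
#!/bin/sh
# wear_percent AVG_ERASE MAX_PE: average erase count as a percentage of the
# rated program/erase cycles.
wear_percent() {
        awk -v avg="$1" -v max="$2" 'BEGIN { printf "%.2f\n", avg / max * 100 }'
}

# host_gib_written SECTORS: convert a sector count to GiB, assuming the
# counter is in 512-byte sectors (an assumption; the drive does not say).
host_gib_written() {
        awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 512 / (1024 * 1024 * 1024) }'
}
```

With the values above, `wear_percent 134 5000` prints 2.68, which is close to what the drive's own Remaining_Lifetime_Perc of 98 suggests, and `host_gib_written 3942772932` prints 1880. The firmware's actual lifetime formula is not known here.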

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Wed Jun 30, 2010 6:33 am

haarp wrote:
What brand of SSD do you use? I have an Intel one and now I'm afraid I'll have to get rid of btrfs as well if this is indeed the case :/
Mine is an OCZ Vertex. The controller is different and the firmware is different, so I don't know how much write amplification matters with Intel's controller. But the Indilinx firmware is pretty bad with write amplification. I just posted on the OCZ forums about writing 242MB of data in 23 hours and using up 5 erase cycles, i.e. 600GB of data erased by the firmware for 242MB of data written by the OS. That's a write amplification of about 2500! Unheard of! Something really screwy is going on with the 1.6 firmware on Indilinx drives.

Also, it seems like BTRFS writes are small in size and random in placement.
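
Editor's sketch: the write amplification figures in this thread come from simple arithmetic, namely erase cycles consumed times drive capacity, divided by the data the OS wrote. The helper below (a hypothetical name) reproduces it with decimal units, taking the 120GB capacity from the first post as an assumption about this drive.

```shell
#!/bin/sh
# write_amp CYCLES CAPACITY_GB HOST_MB: estimate write amplification as
# (erase cycles x drive capacity) / data written by the host.
# Uses decimal units (1 GB = 1000 MB).
write_amp() {
        awk -v c="$1" -v cap="$2" -v mb="$3" \
            'BEGIN { printf "%.0f\n", c * cap * 1000 / mb }'
}
```

`write_amp 5 120 242` prints 2479, matching the "about 2500" above, and `write_amp 2 120 2000` prints 120, matching the first post. The estimate treats every erase cycle as a full-drive erase, so it is an upper-level approximation rather than a measured value.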

mbar
Veteran
Joined: 19 Jan 2005
Posts: 1990
Location: Poland

Posted: Wed Jun 30, 2010 6:40 am

Then I think I may hope that Samsung firmware is better in that respect.
BTW, if someone is interested, here's Samsung PM800 datasheet (with some SMART information): http://www.samsung.com/global/business/semiconductor/products/flash/ssd/2008/down/pm800_25_inch.pdf

Ant P.
Watchman
Joined: 18 Apr 2009
Posts: 6920

Posted: Tue Jul 06, 2010 8:03 pm

Don't want to cause a panic here, but it seems btrfs may be FUBAR by design.

devsk
Advocate
Joined: 24 Oct 2003
Posts: 2995
Location: Bay Area, CA

Posted: Tue Jul 06, 2010 8:10 pm

Ant_P wrote:
Don't want to cause a panic here, but it seems btrfs may be FUBAR by design.
That was already discussed here on Gentoo forums.

cach0rr0
Bodhisattva
Joined: 13 Nov 2008
Posts: 4123
Location: Houston, Republic of Texas

Posted: Wed Jul 07, 2010 12:16 am

devsk wrote:
Ant_P wrote:
Don't want to cause a panic here, but it seems btrfs may be FUBAR by design.
That was already discussed here on Gentoo forums.


++

and it's already resulting in a patch

d2_racing
Bodhisattva
Joined: 25 Apr 2005
Posts: 13047
Location: Ste-Foy,Canada

Posted: Wed Jul 07, 2010 11:39 am

I hope that they resolve that kind of problem, because it seems that BTRFS may become the next standard, like EXT2/EXT3 was a couple of years ago.

DigitalCorpus
Apprentice
Joined: 30 Jul 2007
Posts: 283

Posted: Thu Jul 08, 2010 10:53 pm

I have to ask since this was never mentioned, but did you use the ssd mount option for BTRFS?

Shining Arcanine
Veteran
Joined: 24 Sep 2009
Posts: 1110

Posted: Thu Jul 08, 2010 10:59 pm

max_power wrote:
and which scheduler do you use with your ssd? i set mine to noop, but i am not sure if this is the optimum. but at least the system should not read or write on the drive if the fifo stack is empty.


It is only optimal on a single core system. On a multicore system, CFQ can do optimizations on requests such that multiple requests to the same region of the virtual address space can be merged, increasing performance.

DestroyFX
n00b
Joined: 05 Dec 2005
Posts: 44

Posted: Sat Jul 10, 2010 2:58 pm

For SSD, you must:

  • Use NOOP scheduler
  • align partition with HDD blocks and use the same size of sectors if possible
  • use noatime, compress, ssd_spread and nodiratime mount options


The *atime options are useful for not writing the access time of everything....

I have a cheapo Patriot Warp 2 128GB SSD and the only FS working without stuttering is btrfs+ssd options.

Also, I recommend using TMPFS for
  • /tmp
  • /var/tmp
  • /var/log
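
Editor's sketch: the recommendations above could translate into /etc/fstab entries along these lines. Everything specific here is an assumption for illustration: `/dev/sdXN` is a placeholder device, and the tmpfs sizes are arbitrary. The function only prints the lines for review and does not touch /etc/fstab. `nodiratime` is left out because `noatime` already covers directories on Linux.

```shell
#!/bin/sh
# print_fstab_sketch: print candidate /etc/fstab lines for review.
# /dev/sdXN is a placeholder, not a real device; sizes are arbitrary examples.
print_fstab_sketch() {
        cat <<'EOF'
/dev/sdXN  /         btrfs  noatime,compress,ssd_spread  0 0
tmpfs      /tmp      tmpfs  size=1g,mode=1777            0 0
tmpfs      /var/tmp  tmpfs  size=1g,mode=1777            0 0
tmpfs      /var/log  tmpfs  size=64m,mode=0755           0 0
EOF
}
```

Keep in mind that anything on tmpfs is lost at reboot, so putting /var/log there trades fewer SSD writes for losing logs after a crash.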

All times are GMT
Page 1 of 2