aaront
User

75 posts

Posted on 8 October 2014 @ 20:23, edited 20:25 43s
1. My Samsung 840 EVO showed no temperature. It turns out it doesn't report 194 Temperature_Celsius, only 190 Airflow_Temperature_Cel. My Intel drives have both, but in case there are other drives with only 190, I added the following line to /usr/local/www/zfsguru/includes/disk.php
(insert after line 143):
if (!$temp) {$temp = (int)@$smart['data'][190]['raw'];}

I'm sure this can be written better, but it worked for me.
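
For readers who want the fallback logic spelled out, here is a minimal sketch in Python (not zfsguru's PHP). It assumes a hypothetical dict keyed by attribute ID, loosely modelled on the `$smart['data']` array in the line above. The function name and dict layout are made up for illustration.

```python
def pick_temperature(smart_data):
    """Return the drive temperature in Celsius, or None if it is unavailable.

    smart_data is a hypothetical dict: attribute ID -> {"raw": ...}.
    Attribute 194 is preferred; 190 is used only when 194 is missing
    or its raw value does not start with a number.
    """
    for attr_id in (194, 190):
        entry = smart_data.get(attr_id)
        if entry is None:
            continue
        parts = str(entry.get("raw", "")).split()
        # The raw value may look like "21 (Min/Max 18/34)";
        # only the leading integer is the temperature.
        if parts and parts[0].isdigit():
            return int(parts[0])
    return None
```

Unlike the one-liner, this also copes with raw values that carry extra text after the number, as the Intel attribute 190 does.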

I think it would be a good idea to add support for some of the more common SSDs and for SSD-specific attributes such as wear leveling. Unfortunately these seem to differ across brands: Samsung looks like it uses 177, whereas Intel looks like it uses 233.

Here is the full dump of a Samsung 840 EVO:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       5675
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       7
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       6
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   071   057   000    Old_age   Always       -       29
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       6
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       27188662909

Here is the full dump of an Intel S3700:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5649
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       11
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       2
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       628 (32 6638)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   079   079   000    Old_age   Always       -       21 (Min/Max 18/34)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       2
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       21
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       725810
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       51
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       36878
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       725810
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       211
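
Both dumps share the same ten-column layout, so they can be read with a plain whitespace split. The following is a rough Python sketch of that idea, not code from zfsguru. The function name and the dict keys are invented for illustration, and it assumes every attribute row has all ten columns with numeric VALUE, WORST and THRESH fields.

```python
def parse_ata_attributes(text):
    """Parse ATA attribute rows (as printed by smartctl -A) into a dict by ID."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 10 fields so a RAW_VALUE such as
        # "21 (Min/Max 18/34)" stays in one piece.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or something that is not a row
        attrs[int(fields[0])] = {
            "name": fields[1],
            "flag": fields[2],
            "value": int(fields[3]),   # normalised value
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "updated": fields[7],
            "when_failed": fields[8],
            "raw": fields[9].strip(),
        }
    return attrs


sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count 0x0013 099 099 000 Pre-fail Always - 6
190 Temperature_Case 0x0022 079 079 000 Old_age Always - 21 (Min/Max 18/34)
"""
attrs = parse_ata_attributes(sample)
print(attrs[177]["value"], attrs[190]["raw"])
```

Keying the result by attribute ID rather than by name sidesteps the naming differences between vendors visible in the two dumps (for example 190 is Airflow_Temperature_Cel on the Samsung and Temperature_Case on the Intel).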

aaront
User

75 posts

Posted on 8 October 2014 @ 20:25, edited 20:27 48s
2. The Seagate Constellation ST4000NM0023 (SAS), and I suppose all other SCSI/SAS devices, don't show the lovely numbered ATA SMART attributes. For better or worse, it looks like this is just how it is, and it comes down to text parsing to get any information out of them.

---smartctl -A /dev/da2
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature: 31 C
Drive Trip Temperature: 68 C

Manufactured in week of year 20#
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 86
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 99
Elements in grown defect list: 0

Vendor (Seagate) cache information
Blocks sent to initiator = 3613555035
Blocks received from initiator = 923439551
Blocks read from cache and sent to initiator = 470213542
Number of read and write commands whose size <= segment size = 3397539
Number of read and write commands whose size > segment size = 2495

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 5676.35
number of minutes until next internal SMART test = 30


---smartctl -l error /dev/da2
=== START OF READ SMART DATA SECTION ===
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3391838666        0         0  3391838666          0      10646.307           0
write:         0        0         0         0          0        473.569           0

Non-medium error count: 0


---
You can get all of the SCSI info with -a (instead of -A), or request the sections individually. I'm not about to write a parser for this and try to shove it into zfsguru, but maybe someone else has the skills to do so.
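
As a starting point for anyone who wants to try, here is a rough Python sketch of the text parsing described above. It only covers the fields visible in the two outputs pasted in this post. The function names, result keys and column names are invented for illustration, and other SCSI/SAS drives may word these lines differently.

```python
import re

# Patterns for a few lines from the `smartctl -A` output above.
# The keys on the left are made-up names for this sketch.
SCSI_PATTERNS = {
    "health": (r"SMART Health Status:\s+(\S+)", str),
    "temperature_c": (r"Current Drive Temperature:\s+(\d+)\s+C", int),
    "trip_temperature_c": (r"Drive Trip Temperature:\s+(\d+)\s+C", int),
    "grown_defects": (r"Elements in grown defect list:\s+(\d+)", int),
    "power_on_hours": (r"number of hours powered up\s+=\s+([\d.]+)", float),
}

# Column order of the rows in the `smartctl -l error` counter log.
ERROR_COLUMNS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
    "algorithm_invocations", "gigabytes_processed", "total_uncorrected",
)


def parse_scsi_summary(text):
    """Pick out the known 'label: value' and 'label = value' lines."""
    result = {}
    for key, (pattern, convert) in SCSI_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            result[key] = convert(match.group(1))
    return result


def parse_error_counters(text):
    """Parse the read/write/verify rows of the error counter log."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*(read|write|verify):\s+(.+)", line)
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(ERROR_COLUMNS):
            continue  # unexpected layout, skip rather than guess
        rows[match.group(1)] = {
            name: float(v) if "." in v else int(v)
            for name, v in zip(ERROR_COLUMNS, values)
        }
    return rows


sample = """\
SMART Health Status: OK
Current Drive Temperature: 31 C
Drive Trip Temperature: 68 C
Elements in grown defect list: 0
number of hours powered up = 5676.35
read: 3391838666 0 0 3391838666 0 10646.307 0
write: 0 0 0 0 0 473.569 0
"""
summary = parse_scsi_summary(sample)
errors = parse_error_counters(sample)
print(summary["temperature_c"], errors["read"]["total_uncorrected"])
```

Matching a short list of known labels, rather than splitting every line on ":" or "=", avoids tripping over lines such as "Number of read and write commands whose size <= segment size = 3397539".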
CiPHER
Developer

1199 posts

Posted on 8 October 2014 @ 21:17
Thanks for your report!

aaront wrote: 1. My Samsung 840 EVO showed no temp, turns out they don't have 194 Temperature_Celsius, only 190 Airflow_Temperature_Cel. My Intel drives have both, but in case there are drives with only 190
I think I can change that...
Just added some additional logic to detect 190 in case 194 is missing. Not sure when Jason will release it, though.

aaront wrote: I would think it would be a good idea to add support for some of the more common SSDs and SSD-specific codes for things like wear leveling, unfortunately they seem to be different across brands. Samsung looks like 177 whereas Intel looks like 233
Well, wear leveling is not the same as MWI. Your Intel does give proper information in 233 Media_Wearout_Indicator. The normalised value of 100 means you still have 100% of MWI life left, which means you are still within the first percent of lifetime writes. Once you have eaten up 1%, the counter goes to 99. The raw value may be binary encoded or have a meaning that I do not know, but the normalised values are key here.

I have not added any logic to the web interface for this, but I can look at it sometime. Probably a new column to give you a percentage of lifetime (MWI) writes. That would be nice!
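
A column like that could be derived along these lines. This is only a Python sketch of the arithmetic described above, not anything in the zfsguru web interface. It assumes a hypothetical dict keyed by attribute ID with the normalised value under "value", and it only looks at 233, since other brands may not count down the same way.

```python
def mwi_life(attrs):
    """Return (percent_left, percent_used) from attribute 233, or None.

    attrs is a hypothetical dict: attribute ID -> {"value": normalised value}.
    A normalised value of 100 means 100% of MWI life is left; 99 means
    1% of the lifetime writes has been used up.
    """
    entry = attrs.get(233)
    if entry is None:
        return None  # drive does not report Media_Wearout_Indicator
    left = max(0, min(100, int(entry["value"])))  # clamp to 0..100
    return left, 100 - left
```

For the Intel S3700 dump earlier in the thread (233 at a normalised value of 100), this gives 100% left and 0% used.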

It is a shame, though, that SMART is not better standardised. The attributes, and also the way the data is stored and formatted, can change quite a lot between brands or even between models of the same brand.