Unexpected NFS Auto-unmount?


Re: Kernel Fault?

Postby kayasaman » Sat Feb 09, 2013 18:08

don't run "df -h " but simply "mount" to show all mount points
Code: Select all
# mount


Fine, but the bottom line is that the data is still inaccessible.

Re-mounting is the only option I've got left, and that shouldn't be needed!

But even then, df still reports the file system usage of *all* mounted directories. Just using "mount" to 'show' what is mounted doesn't solve the issue that the mount points are now empty dirs. "cd'ing" into them even confirms it :(
kayasaman
Young Hen
 
Posts: 29
Joined: Wed Feb 06, 2013 5:14


Re: Kernel Fault?

Postby batvink » Sat Feb 09, 2013 20:39

I'm not sure, but since it's network related:
could it be caused by temporary network downtime?
Or a network cable that is not properly seated in the socket,
or the router somehow got reset, asking for a new IP from the ISP, etc.
Steps to Reproduce:
1. mount via NFS a share on eth0 network
2. watch -n1 "cat /proc/mounts"
3. unplug eth1 cable
result:
There is a blackout in nfs mount

Also, are you by any chance running a program on your server that might
reset the modification time on various directories of your server
(rsync with the -a option is an example of this.)

And about the mount points in fstab:
can you try whether defining them with the soft or hard option helps?
batvink
Old Dear Hen
 
Posts: 495
Joined: Sun Oct 28, 2007 11:44
Location: the netherlands

Re: Kernel Fault?

Postby sabayonino » Sat Feb 09, 2013 20:55

batvink wrote:I'm not sure, but since it's network related:
could it be caused by temporary network downtime?
Or a network cable that is not properly seated in the socket,
or the router somehow got reset, asking for a new IP from the ISP, etc.
Steps to Reproduce:
1. mount via NFS a share on eth0 network
2. watch -n1 "cat /proc/mounts"
3. unplug eth1 cable
result:
There is a blackout in nfs mount

Also, are you by any chance running a program on your server that might
reset the modification time on various directories of your server
(rsync with the -a option is an example of this.)

And about the mount points in fstab:
can you try whether defining them with the soft or hard option helps?



I also have two Ethernet interfaces.

"3. unplug eth1 cable"

In my case, if I unplug eth1 (or eth0), connections switch over to eth0 (or eth1). My default (or primary) interface is eth1.

My /etc/exports on my own NFS server is:
Code: Select all
/usr/portage/distfiles/packages  192.168.0.[n](async,no_subtree_check,rw,no_root_squash)
[...]
/usr/portage/distfiles/packages  192.168.0.7(async,no_subtree_check,rw,no_root_squash)
[...]
/usr/portage/distfiles/packages  192.168.0.[n](async,no_subtree_check,rw,no_root_squash)


Check whether a DHCP service is enabled in your runlevels:
Code: Select all
# rc-update show | grep dhcp


I think your eth[n] may be getting shut down by power saving (but I'm not sure).
sabayonino
Sagely Hen
 
Posts: 2576
Joined: Sun Sep 21, 2008 1:12
Location: Italy

Re: Kernel Fault?

Postby kayasaman » Sun Feb 10, 2013 4:15

Hi,

I lost the mount points again on both my laptop and desktop (both run Sabayon) :(

I have been SSH'ed into the desktop from the laptop all this time and the connection was fine!

Desktop dmesg output:

Code: Select all
[   14.053377] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
[   16.022798] fbcondecor: console 1 using theme 'sabayon'
[   16.176279] fbcondecor: switched decor state to 'on' on console 1
[   16.335929] fbcondecor: console 2 using theme 'sabayon'
[   16.489608] fbcondecor: switched decor state to 'on' on console 2
[   16.649986] fbcondecor: console 3 using theme 'sabayon'
[   16.803361] fbcondecor: switched decor state to 'on' on console 3
[   16.963121] fbcondecor: console 4 using theme 'sabayon'
[   17.116838] fbcondecor: switched decor state to 'on' on console 4
[   17.277395] fbcondecor: console 5 using theme 'sabayon'
[   17.430715] fbcondecor: switched decor state to 'on' on console 5
[   17.590445] fbcondecor: console 6 using theme 'sabayon'
[   17.744282] fbcondecor: switched decor state to 'on' on console 6
[   17.905719] fbcondecor: console 7 using theme 'sabayon'
[   18.059621] fbcondecor: switched decor state to 'on' on console 7
[   18.220065] fbcondecor: console 8 using theme 'sabayon'
[   18.373563] fbcondecor: switched decor state to 'on' on console 8
[   18.534198] fbcondecor: console 9 using theme 'sabayon'
[   18.688221] fbcondecor: switched decor state to 'on' on console 9
[   18.847985] fbcondecor: console 10 using theme 'sabayon'
[   19.001841] fbcondecor: switched decor state to 'on' on console 10
[   19.162287] fbcondecor: console 11 using theme 'sabayon'
[   19.315737] fbcondecor: switched decor state to 'on' on console 11
[   19.822637] ip_tables: (C) 2000-2006 Netfilter Core Team
[   19.867676] NET: Registered protocol family 10
[   21.924994] fbcondecor: switched decor state to 'off' on console 6
[   22.092003] fbcondecor: console 6 using theme 'sabayon'
[   22.096264] fbcondecor: switched decor state to 'on' on console 6
[   22.513577] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   22.539446] ip6_tables: (C) 2000-2006 Netfilter Core Team
[   22.576587] NVRM: GPU at 0000:03:00: GPU-dfcc001a-fe9b-060e-899a-632e665be76b
[   23.909157] 8021q: 802.1Q VLAN Support v1.8
[   24.047294] r8169 0000:04:00.0: eth0: link down
[   24.048732] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   25.545820] r8169 0000:04:00.0: eth0: link up
[   25.547228] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   27.452446] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[   27.454967] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[   27.457462] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[   27.459938] ondemand governor failed, too long transition latency of HW, fallback to performance governor
[   28.519129] netlink: 12 bytes leftover after parsing attributes.
[69412.202279] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.


Laptop dmesg output:

Code: Select all
[14147.187126] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda pop-up
[14147.187135] cdrom: Uniform CD-ROM driver Revision: 3.20
[14147.187604] sr 8:0:0:0: Attached scsi CD-ROM sr0
[14147.187929] sr 8:0:0:0: Attached scsi generic sg1 type 5
[14190.148733] ISO 9660 Extensions: Microsoft Joliet Level 1
[14190.206340] ISOFS: changing to secondary root
[14429.682276] CPU2: Package power limit notification (total events = 479)
[14429.682281] CPU3: Package power limit notification (total events = 479)
[14429.682289] CPU1: Package power limit notification (total events = 479)
[14429.682294] CPU0: Package power limit notification (total events = 479)
[14429.686255] CPU2: Package power limit normal
[14429.686260] CPU3: Package power limit normal
[14429.686265] CPU1: Package power limit normal
[14429.686269] CPU0: Package power limit normal
[14730.616418] usb 2-1.2: USB disconnect, device number 4
[32501.591671] cfg80211: Calling CRDA to update world regulatory domain
[32504.845339] wlp3s0: authenticate with 10:bd:18:c8:a6:a0
[32504.848404] wlp3s0: send auth to 10:bd:18:c8:a6:a0 (try 1/3)
[32505.048000] wlp3s0: send auth to 10:bd:18:c8:a6:a0 (try 2/3)
[32505.048867] wlp3s0: authenticated
[32505.050003] wlp3s0: associate with 10:bd:18:c8:a6:a0 (try 1/3)
[32505.051401] wlp3s0: RX AssocResp from 10:bd:18:c8:a6:a0 (capab=0x111 status=0 aid=1)
[32505.055582] wlp3s0: associated
[32556.378965] wlp3s0: authenticate with 10:bd:18:ce:8a:60
[32556.381245] wlp3s0: direct probe to 10:bd:18:ce:8a:60 (try 1/3)
[32556.580967] wlp3s0: direct probe to 10:bd:18:ce:8a:60 (try 2/3)
[32556.781462] wlp3s0: send auth to 10:bd:18:ce:8a:60 (try 3/3)
[32556.783321] wlp3s0: authenticated
[32556.784452] wlp3s0: associate with 10:bd:18:ce:8a:60 (try 1/3)
[32556.788802] wlp3s0: RX AssocResp from 10:bd:18:ce:8a:60 (capab=0x431 status=0 aid=4)
[32556.797267] wlp3s0: associated
[45325.956967] cfg80211: Calling CRDA to update world regulatory domain
[45329.250593] wlp3s0: authenticate with 10:bd:18:c8:a6:a0
[45329.255077] wlp3s0: send auth to 10:bd:18:c8:a6:a0 (try 1/3)
[45329.455183] wlp3s0: send auth to 10:bd:18:c8:a6:a0 (try 2/3)
[45329.456081] wlp3s0: authenticated
[45329.457127] wlp3s0: associate with 10:bd:18:c8:a6:a0 (try 1/3)
[45329.458496] wlp3s0: RX AssocResp from 10:bd:18:c8:a6:a0 (capab=0x111 status=0 aid=1)
[45329.461633] wlp3s0: associated
[45364.842478] wlp3s0: authenticate with 10:bd:18:ce:8a:60
[45364.844654] wlp3s0: direct probe to 10:bd:18:ce:8a:60 (try 1/3)
[45365.044338] wlp3s0: direct probe to 10:bd:18:ce:8a:60 (try 2/3)
[45365.244859] wlp3s0: send auth to 10:bd:18:ce:8a:60 (try 3/3)
[45365.246770] wlp3s0: authenticated
[45365.247881] wlp3s0: associate with 10:bd:18:ce:8a:60 (try 1/3)
[45365.251576] wlp3s0: RX AssocResp from 10:bd:18:ce:8a:60 (capab=0x431 status=0 aid=4)
[45365.261595] wlp3s0: associated


batvink wrote:I'm not sure, but since it's network related:
could it be caused by temporary network downtime?


If this is the case then why did my previous Fedora build work continuously without issues even when up for months?

Am actually graphing desktop by snmp and so far reliability is 100%

Also desktop connected to Cisco 2950 switch and laptop connected to Cisco 1142P access point, reliability is 100%. - as far as I'm aware!

batvink wrote:Or a network cable that is not properly seated in the socket,


No, my desktop is 19" rack mounted and has been working fine for over a year with nfs up all the time running Fedora.

batvink wrote:the router somehow got reset, asking for a new IP from the ISP, etc.


My router is a Sun Fire V210 running OpenBSD... there is no chance of it resetting or things going funny; it is 100% reliable, and I have SNMP confirmation of that too.

- Additionally, I am running another Fire V210 with NetBSD that stays mounted to NFS continuously with no dropouts, just like when I was running Fedora on my desktop and laptop previously. All are going through the same network infrastructure!

batvink wrote:Steps to Reproduce:
1. mount via NFS a share on eth0 network
2. watch -n1 "cat /proc/mounts"
3. unplug eth1 cable
result:
There is a blackout in nfs mount


"watch -n1 "cat /proc/mounts" "

Does this run until stopped manually by an interrupt? Also, does the output get shown in dmesg? This might be the key to debugging the issue. Perhaps it's the kernel or just buggy firmware for the NICs?
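For reference, watch keeps running until it is interrupted (Ctrl-C) and only prints to the terminal; nothing it shows ends up in dmesg. To keep a record that can be compared with the system log afterwards, something along these lines might do. It is only a sketch: the function names and the log path /tmp/nfs-mounts.log are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names and the log path are made up for illustration.

# Print device, mount point and options of every NFS entry in a mounts table.
# Field 3 of /proc/mounts is the filesystem type (nfs, nfs4).
list_nfs_mounts() {
    awk '$3 ~ /^nfs4?$/ { print $1, $2, $4 }' "${1:-/proc/mounts}"
}

# Append a timestamped snapshot of the NFS mounts to a log file.
log_nfs_mounts() {
    logfile="${1:-/tmp/nfs-mounts.log}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        list_nfs_mounts "${2:-/proc/mounts}"
    } >> "$logfile"
}
```

Running `while sleep 60; do log_nfs_mounts; done` in a spare terminal would take one snapshot per minute. Because the options column is printed too, the log also shows whether hard or soft is actually in effect.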

batvink wrote:Also, are you by any chance running a program on your server that might
reset the modification time on various directories of your server
(rsync with the -a option is an example of this.)


Am not running "rsync -a" on server but on clients I do transfer information using:

Code: Select all
rsync -avcr --inplace --progress <src> <dst>


- quite often, but in the past this has never led to any issues!

batvink wrote:And about the mount points in fstab:
can you try whether defining them with the soft or hard option helps?

I can try that as perhaps something is funny somewhere??


This is all really strange!!! If I could just get NFS to be reliable, then these machines that I'm running Sabayon on are going to be great :)

Here is output of NetBSD system which hasn't unmounted at all:

Code: Select all
NetBSD 6.0 (GENERIC)

Welcome to NetBSD!

$ df -h
Filesystem                    Size       Used      Avail %Cap Mounted on
/dev/sd0a                      31G       7.0G        22G  23% /
kernfs                        1.0K       1.0K         0B 100% /kern
ptyfs                         1.0K       1.0K         0B 100% /dev/pts
procfs                        8.0K       8.0K         0B 100% /proc
192.168.1.120:/mnt/zfs        3.6T       3.6T        16G  99% /zr
192.168.1.120:/mnt/zfs2       3.5T       3.2T        51G  98% /zr_1


$ uptime
 3:12AM  up 1 day,  8:41, 1 user, load averages: 0.00, 0.06, 0.11


.... this is the behaviour I expect from NFS!

P.S. Thanks for everybody's help so far. I'm really enjoying getting to know and running Sabayon, as it's stable and reliable, not to mention lightweight and really fast :)

Re: Kernel Fault?

Postby kayasaman » Sun Feb 10, 2013 5:21

Well.... I don't see any NFS mounts in:

Code: Select all
watch -n1 "cat /proc/mounts"


though the command is currently running on both my Sabayon systems.

In the meantime I devised a shell script to detect if NFS goes down:

Code: Select all

#!/bin/sh
# Report when no mount point matching "zr" shows up in the df output.
HOSTNM=$(hostname)
DATE=$(date +"%d-%b-%y %H:%M")

# grep -q prints nothing and exits non-zero when no line matches
if ! df -k | grep -q zr; then
    echo 'Alert Mount Points Offline!' "$HOSTNM" "$DATE"
fi


I guess I could just add a line saying:

Code: Select all
if output ==1
mount <mnt_points>


Might be a quick and dirty fix for now....
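Spelled out, that quick fix could look roughly like the following. This is a sketch under assumptions: the function names are made up, each mount point is assumed to have an entry in /etc/fstab so that a bare "mount <dir>" works, and the script has to run as root.

```shell
#!/bin/sh
# Sketch only: function names are made up for illustration.

# Return 0 if the given directory is listed as a mount point in the mounts table.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Check each mount point given as an argument; report the missing ones
# and try to mount them again via their /etc/fstab entry (needs root).
check_mounts() {
    for mp in "$@"; do
        if ! is_mounted "$mp"; then
            echo "Alert: $mp offline on $(hostname) at $(date +'%d-%b-%y %H:%M')"
            mount "$mp" || echo "Remount of $mp failed"
        fi
    done
}
```

Called from cron as, for example, `check_mounts /zr /zr_1` (substitute your own mount points), this would only hide the symptom. It does not explain why the mounts disappear in the first place.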

Re: Kernel Fault?

Postby kayasaman » Sun Feb 10, 2013 7:42

Bingo, I think I found my answer :)

My script emailed me and this is the result:

Code: Select all
Feb 10 06:27:24 dhcpcd[9150]: eth0: renewing lease of <ip>
Feb 10 06:27:24 dhcpcd[9150]: eth0: acknowledged <ip> from <ip>
Feb 10 06:27:24 dhcpcd[9150]: eth0: leased <ip> for 43200 seconds
Feb 10 06:27:24 NetworkManager[9111]: <info> (eth0): DHCPv4 state changed renew -> renew
Feb 10 06:27:24 NetworkManager[9111]: <info>   address <ip>
Feb 10 06:27:24 NetworkManager[9111]: <info>   prefix 26 (255.255.255.192)
Feb 10 06:27:24 NetworkManager[9111]: <info>   gateway <ip>
Feb 10 06:27:24 NetworkManager[9111]: <info>   nameserver '<ip>'
Feb 10 06:27:24 NetworkManager[9111]: <info>   nameserver '<ip>'
Feb 10 06:27:24 NetworkManager[9111]: <info>   domain name '<domain>'
Feb 10 06:27:24 dbus[8091]: [system] Activating service name='org.freedesktop.nm_dispatcher' (using servicehelper)
Feb 10 06:27:24 dbus[8091]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'


Basically, it seems the renewal of the DHCP lease is causing the issue.

At present I have setup my laptop with the 'hard' option so I will also test that with the desktop and see if that helps.

Re: Kernel Fault?

Postby sabayonino » Sun Feb 10, 2013 11:24

"Feb 10 06:27:24 NetworkManager[9111]: <info> (eth0): DHCPv4 state changed renew -> renew"


Does your router assign LAN IP addresses dynamically?

While NFS is working, check the IP:
Code: Select all
# ifconfig


When you resume (or turn on) your PC, check it again:
Code: Select all
# ifconfig


If for some reason your IP changes, maybe the NFS server can't assign the same IP.

Re: Kernel Fault?

Postby kayasaman » Sun Feb 10, 2013 11:38

sabayonino wrote:Does your router assign LAN IP addresses dynamically?


My clients are all running DHCP. Only servers have static IPs.... In my network I have around 10 servers and 10 client (desktop/laptop/cell phone) systems.

sabayonino wrote:While NFS is working, check the IP:
Code: Select all
# ifconfig


IP doesn't change. It stays the same on all machines. The dhcp renew is just because the lease expires; ie. it's not infinite.

sabayonino wrote:When you resume (or turn on) your PC, check it again:
Code: Select all
# ifconfig


If for some reason your IP changes, maybe the NFS server can't assign the same IP.


Hmm... no, that's not it, as the IP isn't changing.

To me it looks like NFS is tied into NetworkManager, so when the DHCP lease gets renewed, NetworkManager restarts the interface or something within the protocol stack, causing NFS to unmount. This would happen if the RPC service is restarted or if a network packet is causing the "odd" behaviour.

I tried the 'hard' option in fstab, which makes the client retry requests to the server indefinitely, but unfortunately the same thing happened again :(
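One way to test the NetworkManager theory would be a dispatcher script that logs how many NFS mounts exist each time NetworkManager reports an event. NetworkManager runs the executable, root-owned scripts in /etc/NetworkManager/dispatcher.d/ with the interface as the first argument and the action (up, down, dhcp4-change, ...) as the second. The sketch below assumes that interface; the file name, the function names and the "nfs-watch" log tag are made up for illustration.

```shell
#!/bin/sh
# Hypothetical dispatcher script, e.g. /etc/NetworkManager/dispatcher.d/50-nfs-watch
# Called by NetworkManager as: <script> <interface> <action>

# Return 0 for the events worth correlating with NFS dropouts.
is_relevant_action() {
    case "$1" in
        up|down|dhcp4-change) return 0 ;;
        *) return 1 ;;
    esac
}

# Count the NFS entries in a mounts table (default: /proc/mounts).
count_nfs_mounts() {
    awk '$3 ~ /^nfs4?$/ { n++ } END { print n + 0 }' "${1:-/proc/mounts}"
}

# Only log when invoked with both arguments, as the dispatcher does.
if [ $# -ge 2 ] && is_relevant_action "$2"; then
    logger -t nfs-watch "iface=$1 action=$2 nfs_mounts=$(count_nfs_mounts)"
fi
```

If the count in the system log drops to 0 right after a dhcp4-change entry, that would point at the lease renewal path rather than at the NIC or the kernel. It is a diagnostic aid only and changes nothing about the mounts themselves.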
