Softpanorama

Broadcom NetXtreme Ethernet card random disconnects on Linux
bnx2: eth0 NIC Copper Link is Down

Suse 11 SP 1 and other Linux flavors running on servers with Broadcom gigabit Ethernet cards have an interesting bug: the network connection periodically (but randomly) experiences outages that last several seconds. In rare cases connectivity disappears completely and is never restored. In our case the card was the four-port version:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

The frequency can vary. In our case it was approximately once or twice a week. A typical syslog fragment looks like this:

Case 1:

Jan 29 12:37:24 box1 kernel: [6003080.641907] do_IRQ: 3.172 No irq handler for vector (irq -1)
Jan 29 12:37:32 box1 kernel: [6003089.013055] bnx2: eth0 DEBUG: intr_sem[0]
Jan 29 12:37:32 box1 kernel: [6003089.013061] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[40000088]
Jan 29 12:37:32 box1 kernel: [6003089.013067] bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Jan 29 12:37:32 box1 kernel: [6003089.013070] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01f70008]
Jan 29 12:37:32 box1 kernel: [6003089.013073] bnx2: eth0 DEBUG: PBA[00000000]
Jan 29 12:37:32 box1 kernel: [6003089.071200] bnx2: eth0 NIC Copper Link is Down
Jan 29 12:37:35 box1 kernel: [6003092.181218] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON

Case 2:

Jan 18 15:44:04 box1 kernel: [5065630.877480] do_IRQ: 0.82 No irq handler for vector (irq -1)
Jan 18 15:44:04 box1 kernel: klogd 1.4.1, ---------- state change ----------
Jan 18 15:44:11 box1 kernel: [5065638.250502] bnx2: eth0 DEBUG: intr_sem[0]
Jan 18 15:44:11 box1 kernel: [5065638.250507] bnx2: eth0 DEBUG: EMAC_TX_STATUS[00000008] RPM_MGMT_PKT_CTRL[40000088]
Jan 18 15:44:11 box1 kernel: [5065638.250513] bnx2: eth0 DEBUG: MCP_STATE_P0[0003610e] MCP_STATE_P1[0003610e]
Jan 18 15:44:11 box1 kernel: [5065638.250516] bnx2: eth0 DEBUG: HC_STATS_INTERRUPT_STATUS[01bf0040]
Jan 18 15:44:11 box1 kernel: [5065638.250519] bnx2: eth0 DEBUG: PBA[00000000]
Jan 18 15:44:11 box1 kernel: [5065638.308650] bnx2: eth0 NIC Copper Link is Down
Jan 18 15:44:14 box1 kernel: [5065641.580332] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
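
The length of each outage can be read off the kernel uptime stamps in square brackets. The following is a minimal sketch, not a polished tool: the function name link_outage_seconds is made up for this illustration, and it assumes the input contains messages for a single interface (filter with grep first if several NICs log to the same file).

```shell
#!/bin/sh
# Sketch: pair each "Link is Down" line with the next "Link is Up" line and
# print how long the link stayed down, using the kernel uptime stamp in [...].
# Assumption: the input holds messages for one interface only.
link_outage_seconds() {
    awk '
        # Return the bracketed kernel timestamp as a number, or -1 if absent.
        function stamp(line) {
            if (match(line, /\[ *[0-9]+\.[0-9]+\]/) == 0)
                return -1
            return substr(line, RSTART + 1, RLENGTH - 2) + 0
        }
        /NIC Copper Link is Down/ {
            down = stamp($0)
            have_down = (down >= 0)
            next
        }
        /NIC Copper Link is Up/ && have_down {
            up = stamp($0)
            # Skip pairs that straddle a reboot (uptime counter restarted).
            if (up >= down)
                printf "%s %s %s %.2f\n", $1, $2, $3, up - down
            have_down = 0
        }
    '
}
```

Typical use would be something like "grep bnx2 /var/log/messages | link_outage_seconds". For the two cases above the link was down for about three seconds each time.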

We have two servers on which this is observed. On one the outages are infrequent (just two instances in the last 6 months), but on the other they are really frequent (roughly every week):

# bzip2 -cd mes*bz2 | grep "Link is Down"
Feb  7 08:54:47 box1 kernel: [1176597.882294] bnx2: eth0 NIC Copper Link is Down
Feb 16 18:22:43 box1 kernel: [1988273.802087] bnx2: eth0 NIC Copper Link is Down
Apr 10 09:18:31 box1 kernel: [6617621.854161] bnx2: eth0 NIC Copper Link is Down
Apr 19 12:38:39 box1 kernel: [7407229.802312] bnx2: eth0 NIC Copper Link is Down
Apr 22 09:24:58 box1 kernel: [7654808.874319] bnx2: eth0 NIC Copper Link is Down
Apr 22 22:39:16 box1 kernel: [7702466.802227] bnx2: eth0 NIC Copper Link is Down
Apr 24 19:00:59 box1 kernel: [7862169.814121] bnx2: eth0 NIC Copper Link is Down
May  7 07:12:33 box1 kernel: [8942863.790287] bnx2: eth0 NIC Copper Link is Down
May  8 15:22:26 box1 kernel: [9058656.814135] bnx2: eth0 NIC Copper Link is Down
May  9 16:55:49 box1 kernel: [9150659.802288] bnx2: eth0 NIC Copper Link is Down
May 21 14:01:58 box1 kernel: [10177028.797703] bnx2: eth0 NIC Copper Link is Down
May 24 14:17:06 box1 kernel: [10437136.862092] bnx2: eth0 NIC Copper Link is Down
Jun  5 15:49:14 box1 kernel: [11479464.790159] bnx2: eth0 NIC Copper Link is Down
Jun 12 10:30:12 box1 kernel: [12065122.790354] bnx2: eth0 NIC Copper Link is Down
Jul 10 08:21:39 box1 kernel: [14476610.813667] bnx2: eth0 NIC Copper Link is Down
Jul 13 10:12:47 box1 kernel: [14742478.790192] bnx2: eth0 NIC Copper Link is Down
Jul 18 16:07:05 box1 kernel: [15195736.814217] bnx2: eth0 NIC Copper Link is Down
Jul 31 19:48:53 box1 kernel: [16332244.802367] bnx2: eth0 NIC Copper Link is Down
Aug  6 15:21:51 box1 kernel: [16834622.814217] bnx2: eth0 NIC Copper Link is Down
Aug 15 12:37:19 box1 kernel: [17602350.789994] bnx2: eth0 NIC Copper Link is Down
Aug 23 11:23:42 box1 kernel: [18289133.790364] bnx2: eth0 NIC Copper Link is Down
Aug 23 14:55:30 box1 kernel: [18301841.802322] bnx2: eth0 NIC Copper Link is Down
Aug 28 14:44:04 box1 kernel: [18733155.790356] bnx2: eth0 NIC Copper Link is Down
Aug 29 14:26:47 box1 kernel: [18818518.882215] bnx2: eth0 NIC Copper Link is Down
Sep 13 15:42:50 box1 kernel: [20119081.874202] bnx2: eth0 NIC Copper Link is Down
Sep 17 13:15:29 box1 kernel: [20455840.814129] bnx2: eth0 NIC Copper Link is Down
Oct  5 14:47:36 box1 kernel: [75141.760185] bnx2: eth0 NIC Copper Link is Down
Oct 10 09:35:14 box1 kernel: [487629.769073] bnx2: eth0 NIC Copper Link is Down
Nov  5 11:26:38 box1 kernel: [2740108.822741] bnx2: eth0 NIC Copper Link is Down
Nov  8 13:50:31 box1 kernel: [3007442.753450] bnx2: eth0 NIC Copper Link is Down
Nov 28 10:23:03 box1 kernel: [648217.869074] bnx2: eth0 NIC Copper Link is Down
Dec  4 15:59:51 box1 kernel: [1185822.121589] bnx2: eth0 NIC Copper Link is Down
Dec 10 06:09:24 box1 kernel: [1667895.065491] bnx2: eth0 NIC Copper Link is Down
Dec 14 15:25:08 box1 kernel: [2046132.868741] bnx2: eth0 NIC Copper Link is Down
Dec 20 10:12:17 box1 kernel: [2544830.746232] bnx2: eth0 NIC Copper Link is Down
Jan  3 10:36:05 box1 kernel: [3753601.877507] bnx2: eth0 NIC Copper Link is Down
Jan  7 16:40:24 box1 kernel: [4120376.092928] bnx2: eth0 NIC Copper Link is Down
Jan  8 14:16:02 box1 kernel: [4197969.222614] bnx2: eth0 NIC Copper Link is Down
Jan  9 12:59:10 box1 kernel: [4279604.816098] bnx2: eth0 NIC Copper Link is Down
Jan  9 17:01:29 box1 kernel: [4294116.717905] bnx2: eth0 NIC Copper Link is Down
Jan 11 09:31:13 box1 kernel: [4439629.059153] bnx2: eth0 NIC Copper Link is Down
Jan 11 15:18:16 box1 kernel: [4460413.248058] bnx2: eth0 NIC Copper Link is Down
Jan 14 14:12:59 box1 kernel: [4715220.529843] bnx2: eth0 NIC Copper Link is Down
Jan 17 16:04:17 box1 kernel: [4980603.066416] bnx2: eth0 NIC Copper Link is Down
# grep "Link is Down" messages
Jan 18 15:44:11 box1 kernel: [5065638.308650] bnx2: eth0 NIC Copper Link is Down
Jan 27 17:13:45 box1 kernel: [5847153.265359] bnx2: eth0 NIC Copper Link is Down
Jan 28 08:14:54 box1 kernel: [5901121.437566] bnx2: eth0 NIC Copper Link is Down
Jan 29 12:37:32 box1 kernel: [6003089.071200] bnx2: eth0 NIC Copper Link is Down
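
To see the trend rather than a raw list, the same grep output can be tallied per month. This is only a sketch under simple assumptions: the helper name link_down_per_month is invented here, and because classic syslog lines carry no year, the same month from two different years is counted together.

```shell
#!/bin/sh
# Sketch: count bnx2 "Link is Down" events per month from syslog text on stdin.
# Months are printed in order of first appearance in the input.
# Caveat: syslog lines have no year, so January of two different years merges.
link_down_per_month() {
    grep 'bnx2: .* NIC Copper Link is Down' |
        awk '
            {
                if (!($1 in count))
                    order[++n] = $1     # remember first-seen order
                count[$1]++
            }
            END {
                for (i = 1; i <= n; i++)
                    printf "%s %d\n", order[i], count[order[i]]
            }
        '
}
```

It can be fed from the rotated archives and from the current log in the same way as above, for example "bzip2 -cd mes*bz2 | link_down_per_month" or "link_down_per_month < messages".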

The culprit is the version of the bnx2 driver used on Suse 11 SP 1. The installed version is pretty old: Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.4 (Mar 03, 2010).

# cd /lib/modules/2.6.32.59-0.7-default/kernel/drivers/net
# ll bnx*
-rw-r--r-- 1 root root 118840 2012-07-14 13:58 bnx2.ko
-rw-r--r-- 1 root root 376944 2012-07-14 13:58 bnx2x.ko

Hardware-wise we have:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
03:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
04:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)

In our case bnx2x.ko is not needed, as we do not have a 10 Gigabit card, only a one-gigabit card.
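
To confirm which bnx2 version a box is actually running, the output of "ethtool -i" can be parsed and compared with a target version. The two helpers below are a sketch; their names (driver_version, version_older_than) are made up for this example, and the comparison deliberately looks only at the first three numeric components, ignoring letter suffixes such as the "c" in 2.0.7c.

```shell
#!/bin/sh
# Sketch: print the driver version from `ethtool -i <iface>` output on stdin.
driver_version() {
    awk -F': *' '$1 == "version" { print $2; exit }'
}

# Sketch: succeed (exit 0) if dotted version $1 is older than $2.
# Only the first three numeric components are compared; a trailing
# letter (for example 2.0.7c) is ignored by the numeric conversion.
version_older_than() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, ".")
        split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1      # equal versions are not "older"
    }'
}
```

A possible use: v=$(ethtool -i eth0 | driver_version); version_older_than "$v" 7.4.27 && echo "bnx2 $v is older than 7.4.27". The version of the module file on disk can be checked separately with "modinfo -F version bnx2".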

One way to deal with this situation is to disable MSI; the other is to update the driver to version 7.4.27, available from the Broadcom NetXtreme II 1 Gigabit Drivers page.

Disable MSI solution

One optional parameter "disable_msi" can be supplied as a command line argument to the modprobe command for bnx2. This parameter is used to disable Message Signaled Interrupts (MSI) and MSI-X. The parameter is only valid on 2.6/3.x kernels that support MSI/MSI-X.  By default, the driver will enable MSI or MSI-X if it is supported by the kernel. MSI-X is only supported on 5709 devices.

The driver will run an interrupt test during initialization to determine if MSI/MSI-X is working. If the test passes, the driver will enable MSI/MSI-X. Otherwise, it will use legacy INTx mode.
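
Which mode the driver ended up in can usually be seen in /proc/interrupts. The helper below is a rough sketch with an invented name (irq_mode). It assumes the interface name is the last field on its interrupt line (eth0 for a single vector, eth0-0, eth0-1, ... for MSI-X vectors) and that MSI lines contain the string "MSI"; the exact labels differ between kernel versions, and shared legacy IRQs that list another device last will be missed.

```shell
#!/bin/sh
# Sketch: report "msi", "intx" or "unknown" for an interface, reading
# /proc/interrupts-style text on stdin.
# Assumptions: the handler name is the last field of the line, and any
# MSI or MSI-X vector has "MSI" somewhere in its description.
irq_mode() {
    iface=$1
    awk -v iface="$iface" '
        $NF == iface || index($NF, iface "-") == 1 {
            if ($0 ~ /MSI/)
                mode = "msi"
            else if (mode == "")
                mode = "intx"
        }
        END { print (mode == "" ? "unknown" : mode) }
    '
}
```

Example: irq_mode eth0 < /proc/interrupts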

Set the "disable_msi" parameter to 1 as shown below to always disable MSI/MSI-X on all NetXtreme II NICs in the system.

modprobe bnx2 disable_msi=1

The parameter can also be set in modprobe.conf:

options bnx2 disable_msi=1

In Suse the preferred place is modprobe.conf.local, not modprobe.conf. Entries made in modprobe.conf can and will be overwritten during an RPM installation, which means you could lose the setting when, for example, the kernel is updated. Entries in modprobe.conf.local override those in modprobe.conf and are never overwritten by an update or installation.
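
Adding the option by hand is fine for one box; for several servers a small idempotent helper avoids duplicate lines. This is a sketch only: the function name ensure_disable_msi is invented, and the default path /etc/modprobe.conf.local is an assumption about where the file lives on your release.

```shell
#!/bin/sh
# Sketch: make sure "options bnx2 disable_msi=1" is present exactly once.
# Assumption: the config file is /etc/modprobe.conf.local unless a path is given.
ensure_disable_msi() {
    conf=${1:-/etc/modprobe.conf.local}
    line='options bnx2 disable_msi=1'
    # Already present as an exact line: nothing to do.
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}
```

Run it as root with no argument to use the default path. The option takes effect only the next time the bnx2 module is loaded, so a module reload or a reboot is still needed.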

Update driver solution

Broadcom package includes

The current versions of the drivers have been tested on all 2.6.x kernels. The driver may not compile on kernels older than 2.4.24. Testing was performed mainly on i386 and x86_64 architectures. Only limited testing has been done on some other architectures.

Minor changes to some source files and Makefile may be needed on some kernels.

Additionally, the Makefile will not compile the cnic driver on kernels older than 2.6.16. iSCSI offload is only supported on 2.6.16 and newer kernels. FCoE offload is only supported on 2.6.32 and newer kernels.

The driver is released in two packaging formats: source RPM and compressed tar formats. The structure of the file name for the source RPM is:

netxtreme2-<version>.src.rpm

The file name for the tar archive is:

netxtreme2-<version>.tar.gz

Identical source files to build the drivers are included in both packages.

Following is a list of files included

Installing Source RPM Package

The following are general guidelines for installing the driver.

1. Install the source RPM package:

rpm -ivh netxtreme2-<version>.src.rpm

2. Change to the RPM path and build the binary driver for your kernel:

cd /usr/src/{redhat,OpenLinux,turbo,packages,rpm ..}

(For RHEL 6.0 and above, cd ~/rpmbuild)

rpm -bb SPECS/netxtreme2.spec

or

rpmbuild -bb SPECS/netxtreme2.spec (for RPM version 4.x.x)

Note that the RPM path is different for different Linux distributions. The driver will be compiled for the running kernel by default. To build the driver for a kernel different than the running one, specify the kernel by defining it in KVER:

rpmbuild -bb SPECS/netxtreme2.spec --define "KVER <kernel version>"

where <kernel version>, in the form 2.x.y-z, is the version of another kernel that is installed on the system.

3. Install the newly built package (driver and man page):

rpm -ivh RPMS/<arch>/netxtreme2-<version>.<arch>.rpm

where <arch> is the machine architecture, such as i386:

rpm -ivh RPMS/i386/netxtreme2-<version>.i386.rpm

Note that the --force option may be needed on some Linux distributions if conflicts are reported.

The drivers will be installed in the following paths:

2.6.16 and newer kernels:

/lib/modules/<kernel_version>/kernel/drivers/net/bnx2.ko
/lib/modules/<kernel_version>/kernel/drivers/net/bnx2x.ko
/lib/modules/<kernel_version>/kernel/drivers/net/cnic.ko

Newer RHEL and SLES distros:

/lib/modules/<kernel_version>/updates/bnx2.ko
/lib/modules/<kernel_version>/updates/cnic.ko
/lib/modules/<kernel_version>/updates/bnx2x.ko
/lib/modules/<kernel_version>/updates/bnx2i.ko
/lib/modules/<kernel_version>/updates/bnx2fc.ko

4. Unload the existing driver if necessary:

rmmod bnx2
rmmod bnx2x

If the cnic driver is loaded, it should also be unloaded along with its dependent drivers:

rmmod bnx2fc
rmmod bnx2i
rmmod cnic

5. Load the bnx2 driver for the BCM5706/BCM5708/5709/5716 devices:

insmod bnx2.ko

or

modprobe bnx2

To load the bnx2x driver for the BCM57710/BCM57711/BCM57711E/BCM57712 devices:

modprobe bnx2x

a) Reboot the server, OR
b) If already loaded, unload the "in the box" bnx2, bnx2x and cnic drivers and load the newly installed version from the netxtreme2-foce package using 'modprobe <DRV-NAME>'.

NOTES:

6. To configure the network protocol and address, refer to the relevant Linux documentation.
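
The steps above can be collected into a dry-run helper that only prints the commands for review and executes nothing. This is a sketch, not Broadcom's procedure verbatim: the function name bnx2_upgrade_plan is invented, it covers only the bnx2 (1 gigabit) case, and it lists the cnic-dependent modules before bnx2 on the assumption that dependent modules must be removed first.

```shell
#!/bin/sh
# Sketch: print (do not run) the command sequence for a bnx2 driver upgrade.
# Arguments: <version> <arch> [kernel-version]
# Assumptions: run from the RPM build directory; bnx2 only (no bnx2x);
# dependent modules (bnx2fc, bnx2i, cnic) are unloaded before bnx2.
bnx2_upgrade_plan() {
    version=$1
    arch=$2
    kver=${3:-}
    if [ -z "$version" ] || [ -z "$arch" ]; then
        echo "usage: bnx2_upgrade_plan <version> <arch> [kernel-version]" >&2
        return 2
    fi
    echo "rpm -ivh netxtreme2-$version.src.rpm"
    if [ -n "$kver" ]; then
        # Build for a kernel other than the running one.
        echo "rpmbuild -bb SPECS/netxtreme2.spec --define \"KVER $kver\""
    else
        echo "rpmbuild -bb SPECS/netxtreme2.spec"
    fi
    echo "rpm -ivh RPMS/$arch/netxtreme2-$version.$arch.rpm"
    for mod in bnx2fc bnx2i cnic bnx2; do
        echo "rmmod $mod"
    done
    echo "modprobe bnx2"
}
```

For example, "bnx2_upgrade_plan 7.4.27 x86_64" prints the eight commands in order. rmmod will complain about modules that are not loaded, which is harmless, but unloading bnx2 drops every connection that goes through the card, so the unload and reload part should be run from the console rather than over SSH.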


Old News ;-)

[Jan 31, 2013] VMware KB ESX-ESXi host loses network connectivity with a Broadcom bnx2 driver FTQ dump

Jan 21, 2013

This article describes a specific condition. If you observe an FTQ Dump without the Tx Ring Full error also being logged, the workaround and fix described may not be applicable. If you observe a loss of network connectivity to an ESX/ESXi host without these symptoms, see Determining why an host is labeled as Not Responding and multiple virtual machines are labeled as Disconnected (1019082).

Resolution

This issue occurs when the IRQ balancer disables the Message Signaled Interrupt vector (MSI-X) during a chip reset.

The MSI-X vector gets remapped at the beginning of the Base Address Register (BAR). The driver attempts to disable the MSI, but the memory access bit is disabled instead.

The Broadcom bnx2 driver did not complete a chip reset correctly after some condition (e.g., a transmission timeout). This results in corruption of the PCI configuration space, which can cause invalid address references (such as 0xffffffff), also seen in dumps and logs.

This issue has been observed in bnx2 driver version 2.0.7c.

This issue is resolved in the following asynchronous Broadcom driver releases:

To resolve this issue, ensure that your ESX/ESXi host has one of these driver versions installed. To download the latest Broadcom NetXtreme II Ethernet Network Controller driver version, see the VMware Download Center.

To work around this issue, disable MSI support in the Broadcom bnx2 driver. This causes the driver to fall back to the PIN-IRQ assertion method of raising an interrupt.

To disable MSI:
  1. Log into the ESX/ESXi host's terminal directly or via SSH. For additional information, see Connecting to an ESX host using a SSH client (1019852).
  2. Reconfigure the driver module using the command:

    esxcfg-module -s 'disable_msi=1' bnx2

  3. Reboot the server. The changes are loaded next time the module loads.
  4. After the ESX/ESXi host has finished booting, verify that disable_msi is set by running the command:

    esxcfg-module -g bnx2


Recommended Links

NetXtreme II 1 Gigabit Drivers Broadcom

VMware KB ESX-ESXi host loses network connectivity with a Broadcom bnx2 driver FTQ dump

linux - bnx2 and e1000e drivers on RHEL 5.3 detects repeated link

Linux Kernel Documentation PCI MSI-HOWTO.txt




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March 12, 2019