Hi all,
We have one master server and four media servers. Backups run fine on three of the media servers, but on the fourth every backup fails with status 82 (media manager killed by signal).
Activity Monitor job details:
May 31, 2013 12:01:15 PM - requesting resource cvgsolbkpp001-tld-2-cbtsedl2b
May 31, 2013 12:01:15 PM - requesting resource nasolbkp200.NBU_CLIENT.MAXJOBS.cvgrhesosp004-bka
May 31, 2013 12:01:15 PM - requesting resource nasolbkp200.NBU_POLICY.MAXJOBS.fs.all.prd.200
May 31, 2013 12:01:52 PM - granted resource nasolbkp200.NBU_CLIENT.MAXJOBS.cvgrhesosp004-bka
May 31, 2013 12:01:52 PM - granted resource nasolbkp200.NBU_POLICY.MAXJOBS.fs.all.prd.200
May 31, 2013 12:01:52 PM - granted resource V25082
May 31, 2013 12:01:52 PM - granted resource CBTSEDL2B_DR010
May 31, 2013 12:01:52 PM - granted resource cvgsolbkpp001-tld-2-cbtsedl2b
May 31, 2013 12:01:52 PM - estimated 72281241 kbytes needed
May 31, 2013 12:01:55 PM - started process bpbrm (pid=20363)
May 31, 2013 12:01:56 PM - connecting
May 31, 2013 12:01:57 PM - connected; connect time: 0:00:00
May 31, 2013 12:01:59 PM - mounting V25082
May 31, 2013 12:02:09 PM - mounted V25082; mount time: 0:00:10
May 31, 2013 12:02:09 PM - positioning V25082 to file 1
May 31, 2013 12:02:09 PM - positioned V25082; position time: 0:00:00
May 31, 2013 12:02:09 PM - begin writing
May 31, 2013 12:02:39 PM - Info bpbrm (pid=20363) from client cvgrhesosp004-bka: TRV - /opt/app is in a different file system from /opt. Skipping.
May 31, 2013 12:10:44 PM - Error bptm (pid=20379) media manager terminated by parent process
May 31, 2013 12:10:46 PM - end writing; write time: 0:08:37
media manager killed by signal (82)
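To see how far into the write the job died, the job details can be split on their timestamps. This is a minimal Python sketch, assuming the Activity Monitor text is pasted in as one string in the English `Mon DD, YYYY HH:MM:SS AM/PM - ` format shown above; `split_entries` and `seconds_between` are hypothetical helper names, not NetBackup tools.

```python
import re
from datetime import datetime

# Each entry begins with a timestamp such as "May 31, 2013 12:01:15 PM - "
ENTRY_RE = re.compile(r"([A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) - ")
TS_FORMAT = "%b %d, %Y %I:%M:%S %p"


def split_entries(text):
    """Split run-together job details into a list of (datetime, message) pairs."""
    parts = ENTRY_RE.split(text)
    # With one capturing group, re.split yields [prefix, ts1, msg1, ts2, msg2, ...]
    return [
        (datetime.strptime(ts, TS_FORMAT), msg.strip())
        for ts, msg in zip(parts[1::2], parts[2::2])
    ]


def seconds_between(entries, start_text, end_text):
    """Seconds from the first message containing start_text to the first containing end_text."""
    start = next(t for t, m in entries if start_text in m)
    end = next(t for t, m in entries if end_text in m)
    return int((end - start).total_seconds())
```

For the job above, `seconds_between(entries, "begin writing", "end writing")` gives 517 seconds, which agrees with the reported write time of 0:08:37; the bptm error is logged two seconds before the write ends.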
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
After a detailed analysis of the bptm, bpbrm, bpcd, bpbkar and nbsu logs, Symantec support advised us to upgrade the kernel on this server. We have the same kernel on all our servers, so why do we need to upgrade only this one?
Below is Symantec's advice. Do we really need to upgrade the kernel for this issue?
I was able to download the logs (bptm and bpbrm), and the bptm log clearly shows the following socket errors:
*********************************************
main: Setting mud from bp.conf
11:30:07.817 [16777] <2> nbjm_media_request: Passing job control to NBJM, type WRITE/9
11:30:07.817 [16777] <2> nbjm_media_request: old_media_id = , media_id = NULL
11:30:07.818 [16777] <2> RequestInitialResources: starting
11:30:07.818 [16777] <2> RequestInitialResources: started
11:30:07.821 [16777] <2> Orb::init: Created anon service name: NB_16777_-216338603(Orb.cpp:630)
11:30:07.821 [16777] <2> Orb::init: endpointvalue is : pbxiop://1556:NB_16777_-216338603(Orb.cpp:648)
11:30:07.821 [16777] <2> Orb::init: initializing ORB Default_DAEMON_Orb with: Unknown -ORBSvcConfDirective "-ORBDottedDecimalAddresses 0" -ORBSvcConfDirective "static PBXIOP_Factory '-enable_keepalive'" -ORBSvcConfDirective "static EndpointSelectorFactory ''" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory PBXIOP_Factory'" -ORBSvcConfDirective "static Resource_Factory '-ORBProtocolFactory IIOP_Factory'" -ORBSvcConfDirective "static PBXIOP_Evaluator_Factory '-orb Default_DAEMON_Orb'" -ORBSvcConfDirective "static Resource_Factory '-ORBConnectionCacheMax 1024 '" -ORBEndpoint pbxiop://1556:NB_16777_-216338603 -ORBSvcConf /dev/null -ORBSvcConfDirective "static Server_Strategy_Factory '-ORBMaxRecvGIOPPayloadSize 268435456'"(Orb.cpp:759)
11:30:07.846 [16777] <32> Orb::activate: Failed to initialize ORB: check to see if PBX is running or if service has permissions to connect to PBX. Check PBX logs for details
11:30:07.859 [16777] <8> Orb::init: CORBA exception: system exception, ID 'IDL:omg.org/CORBA/BAD_PARAM:1.0'
TAO exception, minor code = 5 (endpoint initialization failure in Acceptor Registry; ECONNREFUSED), completed = NO during orb activation
11:30:07.860 [16777] <16> initializeJmComm: RequestInitialResources : failed to initialize ORB: [BAD_PARAM]. Verify PBX is running and caller has permissions to connect to PBX. See PBX logs for details
11:30:07.860 [16777] <2> RequestInitialResources: retVal = 25 emmStatus = 3000000
11:30:07.860 [16777] <2> RequestInitialResources: returning
11:30:07.860 [16777] <4> nbjm_media_request: Error from RequestMultipleResources, Master nasolbkp200, error 25, resourceAllocated 0
11:30:07.861 [16777] <2> set_job_details: Tfile (1468172): LOG 1370014207 16 bptm 16777 nbjm_media_request() failed: 25, cannot continue with copy 1
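The ORB error means bptm got ECONNREFUSED while registering its endpoint with PBX (Symantec Private Branch Exchange), which listens on port 1556, the port in the `pbxiop://1556:...` endpoint above. As a quick check on the media server, a sketch like the following tests whether anything accepts TCP connections on that port. It is only a probe: a `listening` result shows the port is open, not that PBX is healthy, and `probe_port` is a hypothetical name.

```python
import socket

PBX_PORT = 1556  # from the endpoint "pbxiop://1556:..." in the bptm log


def probe_port(host="127.0.0.1", port=PBX_PORT, timeout=3.0):
    """Classify a TCP connection attempt as 'listening', 'refused' or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # The same ECONNREFUSED condition the ORB reported in the bptm log
        return "refused"
    except OSError:
        # Timeouts, no route to host, name resolution failures, ...
        return "unreachable"


if __name__ == "__main__":
    print(f"PBX port {PBX_PORT} on localhost: {probe_port()}")
```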
The media server runs Solaris 10, but with a very old OS patch level:
5.10 Generic_142900-03 sun4v sparc SUNW,SPARC-Enterprise-T5220
You will need to update the OS patches because of the socket-related errors that bptm logged for two different connection failures to master server daemons.
Do the following:
1)
Solaris servers need the Sun kernel update from June 2011 or later. Note the kernel version on the problem media server/client:
# uname -a
142900-03 = Release Date: Dec/09/2009
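Checking the kernel level against the June 2011 cutoff can be scripted. The Python sketch below pulls the `Generic_NNNNNN-NN` kernel patch out of `uname -a` output and compares its release date with June 2011. The date table is an assumption: it holds only the 142900-03 date quoted above, and any other kernel patch must be added from Oracle's patch documentation, otherwise the check returns `None`. The function names are hypothetical.

```python
import re
from datetime import date

# Only the date quoted in the support note; extend from Oracle's patch READMEs.
KERNEL_PATCH_DATES = {
    "142900-03": date(2009, 12, 9),
}
MINIMUM_DATE = date(2011, 6, 1)  # "June 2011 or later"


def kernel_patch(uname_output):
    """Extract the kernel patch (e.g. '142900-03') from 'uname -a' output."""
    m = re.search(r"Generic_(\d{6}-\d{2})", uname_output)
    if m is None:
        raise ValueError("no Generic_NNNNNN-NN kernel patch in uname output")
    return m.group(1)


def needs_kernel_update(uname_output, dates=KERNEL_PATCH_DATES, minimum=MINIMUM_DATE):
    """True if the kernel patch predates `minimum`; None if its date is unknown."""
    released = dates.get(kernel_patch(uname_output))
    if released is None:
        return None
    return released < minimum
```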
Oracle/Sun and Symantec have identified a minimum Solaris kernel patch level that avoids system socket-management issues and NetBackup daemon problems with server sockets.
Symantec recommends that you download the patch set dated June 2011 (or newer) from the Oracle Support website.
The patch set contains the following minimum recommended patches:
- 118777-17 (SunOS 5.10: Sun GigaSwift Ethernet 1.0 driver patch)
- 139555-08 (Kernel patch with C++ library updates)
- 142394-01 (Internet Control Message Protocol (ICMP) patch)
- 143513-02 (Data Link Admin command for Solaris (DLADM) patch)
- 141562-02 (Address Resolution Protocol (ARP) patch)
The following patches are recommended for Solaris 10 SPARC with NXGE cards:
- 142909-17 (SunOS 5.10: nxge patch)
- 143897-03 (Distributed Link Software patch)
- 143135-03 (Aggregation patch)
- 119963-21 (Change Request ID - 6815915)
- 139555-08 (Change Request ID - 6723423)
Reference
Solaris nxge driver and NetBackup communication errors
http://www.symantec.com/business/support/index?page=content&id=TECH128953
Reference
SUN BUG 119963-21 - SunOS 5.10: Shared library patch for C++. This bug describes lock contention in the DTrace area when short-lived processes that use the C++ runtime library libCrun.so are frequently forked or exit. Symptoms are high kernel load and delays in the fork() and exit() system calls.
NetBackup support now documents this problem in the latest release information. Reference: NetBackup 7.5 Release Notes, page 65, "Solaris Patches"
http://www.symantec.com/docs/DOC5041
2)
Reduce media server socket usage.
Move NetBackup's internal socket connections on the server to the loopback interface (localhost) instead of routing them through the vnetd daemon. Add the following line to /usr/openv/netbackup/bp.conf:
CONNECT_OPTIONS = localhost 1 0 2
No restart is needed.
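If the bp.conf change has to be rolled out to several media servers, it can be applied idempotently. A minimal Python sketch, assuming bp.conf uses `KEY = value` lines with spaces around `=`; `ensure_connect_options` and `update_bp_conf` are hypothetical names, and the NetBackup Administrator's Guide remains the authority on what the three numeric fields mean.

```python
from pathlib import Path

# The entry recommended by Symantec support for this media server
CONNECT_LINE = "CONNECT_OPTIONS = localhost 1 0 2"


def ensure_connect_options(text, line=CONNECT_LINE):
    """Return bp.conf text containing exactly one CONNECT_OPTIONS entry for localhost."""
    out, written = [], False
    for existing in text.splitlines():
        # Assumes "KEY = value" spacing, as in the line above
        if existing.split()[:3] == ["CONNECT_OPTIONS", "=", "localhost"]:
            if not written:
                out.append(line)  # replace the first existing entry in place
                written = True
            continue  # drop any duplicate localhost entries
        out.append(existing)
    if not written:
        out.append(line)
    return "\n".join(out) + "\n"


def update_bp_conf(path="/usr/openv/netbackup/bp.conf"):
    """Apply the change to a file, keeping a .bak copy; return True if it changed."""
    p = Path(path)
    original = p.read_text()
    updated = ensure_connect_options(original)
    if updated == original:
        return False
    p.with_name(p.name + ".bak").write_text(original)
    p.write_text(updated)
    return True
```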