Friday 4 November 2011

Replacing a faulty tape drive

The scenario: you have Veritas NetBackup Enterprise running on a *nix server connected to a tape library. Unfortunately for you, media have been freezing constantly, you are unable to label new media, and your drives are asking to be cleaned every hour. An engineer from your backup support provider arrives and replaces the faulty drive. What do you do next?

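The first thing is to find out which drives you have and what their paths are. Before the engineer inserts the new drive, make sure you have the serial number of the faulty one. Once the new drive is in, list the discrepancies between the configured drives and the devices actually detected: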
 
[root@backup1:/]# tpautoconf -report_disc
======================= Missing Device (Drive) =======================
 Drive Name = HP.ULTRIUM4-SCSI.001
 Drive Path = /dev/rmt/0cbn
 Inquiry = "HP      Ultrium 4-SCSI  H21Z"
 Serial Number = AABBCCDDEE
 TLD(0) definition Drive = 1
 Hosts configured for this device:
  Host = backup1
======================= Missing Device (Drive) =======================
 Drive Name = HP.ULTRIUM4-SCSI.002
 Drive Path = /dev/rmt/7cbn
 Inquiry = "HP      Ultrium 4-SCSI  H58W"
 Serial Number = FFGGHHIIJJ
 TLD(1) definition Drive = 2
 Hosts configured for this device:
  Host = backup1
======================= New Device (Drive) =======================
 Inquiry = "HP      Ultrium 4-SCSI  H44W"
 Serial Number = KKLLMMNNOO
 Drive Path = /dev/rmt/7cbn
 Found as TLD(1), Drive = 2


Here we note that our faulty drive is drive 2 (serial number FFGGHHIIJJ) and that the newly inserted drive has also been found as drive 2 (serial number KKLLMMNNOO). Both drives have the same path (/dev/rmt/7cbn), but NetBackup still does not recognise the new drive, since the configured entry still refers to the old serial number. The following command replaces the drive:

[root@backup1:/]# tpautoconf -replace_drive HP.ULTRIUM4-SCSI.002 -path /dev/rmt/7cbn
Found a matching device in global DB, HP.ULTRIUM4-SCSI.002 on host backup1
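
Matching a missing drive to a new device by path is easy to get wrong by eye when the report is long. The snippet below is a minimal sketch of that matching, not an official NetBackup tool. It assumes the report looks exactly like the output shown above; the function name suggest_replace is made up for this example. It only prints the replace command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper: reads `tpautoconf -report_disc` output on stdin and
# prints a suggested replace command for every missing drive whose path is
# also reported for a new device. Assumes the report layout shown in this post.
suggest_replace() {
  awk '
    # Section headers tell us whether the following lines describe a
    # missing (configured but not found) or a new (found but not configured) drive.
    /^=+ Missing Device \(Drive\) =+/ { section = "missing"; name = ""; next }
    /^=+ New Device \(Drive\) =+/     { section = "new"; next }

    # Only missing drives carry a configured name.
    /Drive Name = / {
      name = $0
      sub(/^.*Drive Name = /, "", name)
      sub(/[ \t\r]+$/, "", name)
    }

    # Record the path under the section it appeared in.
    /Drive Path = / {
      path = $0
      sub(/^.*Drive Path = /, "", path)
      sub(/[ \t\r]+$/, "", path)
      if (section == "missing")  missing[path] = name
      else if (section == "new") fresh[path] = 1
    }

    END {
      for (p in fresh)
        if (p in missing)
          printf "tpautoconf -replace_drive %s -path %s\n", missing[p], p
    }
  '
}
```

Usage would be along the lines of `tpautoconf -report_disc | suggest_replace`. If two missing drives share a path, only the last one listed is kept, so check the result against the serial numbers you noted down.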

After this we restart the NetBackup daemons and test some backups. If you get an error when attempting to UP a drive, e.g. "IPC Error: Daemon may not be running", it may be because a faulty drive is still in the library and not operational, and NetBackup cannot seem to work around it. In that case it is best to delete the drive and restart NetBackup:

[root@backup1:/]# tpconfig -delete -drive ID
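
The delete-and-restart step can be wrapped in a small dry-run script so you can see what will be executed first. This is only a sketch under a few assumptions: that ID above is the drive index as listed by tpconfig -d, that the Media Manager binaries live in /usr/openv/volmgr/bin, and that bouncing the device daemon with stopltid and ltid is enough on your system (the post restarts NetBackup as a whole, so substitute your usual restart procedure if not). The names run and remove_drive are made up for this example.

```shell
#!/bin/sh
# Assumed default install location of the Media Manager binaries.
VOLMGR_BIN="${VOLMGR_BIN:-/usr/openv/volmgr/bin}"
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: delete a drive by index, then restart the device daemon.
remove_drive() {
  drive_index="$1"
  # Refuse anything that is not a plain non-negative integer.
  case "$drive_index" in
    ''|*[!0-9]*)
      echo "usage: remove_drive <drive index>" >&2
      return 2
      ;;
  esac
  run "$VOLMGR_BIN/tpconfig" -delete -drive "$drive_index"
  run "$VOLMGR_BIN/stopltid"
  run "$VOLMGR_BIN/ltid"
}
```

Calling `remove_drive 2` in dry-run mode prints the three commands in order. When running for real, you may need to wait for the device daemons to finish stopping before starting ltid again.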