HowTo DELETE & RECREATE a Tape Library in Spectrum Protect

Sometimes we need to delete and recreate all references to a tape library in Spectrum Protect: perhaps because we have replaced the hardware (even with the same library model), or because we are running a Disaster Recovery test,
or, as in my particular case, because I had a logically partitioned tape library, removed the partition, and assigned all the tape library resources to my Spectrum Protect server.

The process is neither extremely complex nor trivial, so I will post the steps needed to complete the full change.

My setup is a Spectrum Protect server v8.1.5 running in an LPAR with AIX v7.2, and my tape library is an IBM (now Lenovo) TS3200 with 4 LTO tape drives.
Since I had the TS3200 partitioned into 2 logical libraries, the tape library was named TS3200_LL2 in my Spectrum Protect server (called spectre), and had 2 LTO tape drives assigned (DRIVE3 & DRIVE4).
After the change, the library under Spectrum Protect will be called TS3200, and will have all 4 drives assigned (DRIVE1 to DRIVE4).

I have included an easy-to-follow, step-by-step index. Each entry first indicates where the action is performed: SP for Spectrum Protect, AIX for the OS, TS3200 for the physical library's GUI, and MANUAL for, hmm, pen & paper!

INDEX
1.- SP – DELETE TAPE DEVS
2.- SP – DEFINE LIB SP
3.- AIX – DELETE TAPE DEVICES
4.- AIX – RECREATE TAPE DEVICES
5.- AIX – Get the WWNs from the AIX DEVs
6.- TS3200 – Get the WWNs from the TS3200
7.- MANUAL – CORRELATE WWNs & DEVs
8.- AIX – RENAME TAPEDEVs to follow HW’s ORDER
9.- AIX – CHECK DEVs
10.- SP – DEFINE LIBRARY’s CONTROL PATH
11.- SP – DEFINE LIBRARY’s DRIVES
12.- SP – DEFINE LIBRARY’s PATHS
13.- SP – VERIFY (LOGICAL)
14.- SP – REVISE DEVCLASSES
15.- SP – REVISE SCRIPTS
16.- SP – FINAL VERIFY (PHYSICAL)
16.1.- Try the freshly modified scripts
16.2.- Try to use all the tape drives
16.3.- Check Tape Library HW
16.4.- Check and Backup SP Tape Library Definitions

NOTES
A.- What is a Tape Library Control Path
B.- Final Thoughts


1.- SP – DELETE TAPE DEVS

First, we delete all references to the old devices. We find out what we have by issuing <query path>, <query drive> and <query library>, and then delete the old objects: the drive paths first, then the library path, then the drives, and finally the library:

Protect: SERVER1>delete path SERVER1 DRIVE4 srctype=server desttype=drive library=TS3200_LL2
ANR1721I A path from SERVER1 to TS3200_LL2 DRIVE4 has been deleted.
Protect: SERVER1>delete path SERVER1 DRIVE3 srctype=server desttype=drive library=TS3200_LL2
ANR1721I A path from SERVER1 to TS3200_LL2 DRIVE3 has been deleted.
Protect: SERVER1>delete path SERVER1 TS3200_LL2 srctype=server desttype=library
ANR1721I A path from SERVER1 to TS3200_LL2 has been deleted.
Protect: SERVER1>delete drive TS3200_LL2 DRIVE4
ANR8412I Drive DRIVE4 deleted from library TS3200_LL2.
Protect: SERVER1>delete drive TS3200_LL2 DRIVE3
ANR8412I Drive DRIVE3 deleted from library TS3200_LL2.
Protect: SERVER1>delete library TS3200_LL2
ANR8410I Library TS3200_LL2 deleted.
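
The order matters here: SP will not delete a drive that still has a path, nor a library that still has drives or a path. If you want to script this part, a minimal sketch in Python could generate the commands in the right order. The helper name teardown_commands is made up for illustration, and the server, library and drive names are the ones from my setup:

```python
def teardown_commands(server, library, drives):
    """Return SP admin commands that remove a library, children first."""
    cmds = []
    # 1. Paths from the server to each drive.
    for drive in drives:
        cmds.append(
            f"delete path {server} {drive} srctype=server desttype=drive library={library}"
        )
    # 2. Path from the server to the library (the control path).
    cmds.append(f"delete path {server} {library} srctype=server desttype=library")
    # 3. The drive objects themselves.
    for drive in drives:
        cmds.append(f"delete drive {library} {drive}")
    # 4. Finally, the library.
    cmds.append(f"delete library {library}")
    return cmds


if __name__ == "__main__":
    for line in teardown_commands("SERVER1", "TS3200_LL2", ["DRIVE4", "DRIVE3"]):
        print(line)
```

The printed lines are the same six commands issued above; review them before pasting them into an administrative session.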

2.- SP – DEFINE LIB SP

Then we define the new library name. It is only a high-level object: it does not actually link to the hardware until we define the Control Path (if you don’t know what a control path is, see NOTES section A.- What is a Tape Library Control Path at the bottom of this article).

Protect: SERVER1>define library TS3200 libtype=scsi serial=autodetect RESETDrives=yes shared=yes
ANR8400I Library TS3200 defined.

3.- AIX – DELETE TAPE DEVICES

[root@spectre:/]cfgmgr

[root@spectre:/]lsdev -c tape
rmt0 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt4 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
smc0 Available 14-T1-01 IBM 3573 Tape Medium Changer (FCP)
smc1 Available 14-T1-01 IBM 3573 Tape Medium Changer (FCP)

[root@spectre:/]rmdev -Rdl rmt0
rmt0 deleted
[root@spectre:/]rmdev -Rdl rmt1
rmt1 deleted
[root@spectre:/]rmdev -Rdl rmt3
rmt3 deleted
[root@spectre:/]rmdev -Rdl rmt4
rmt4 deleted
[root@spectre:/]rmdev -Rdl smc0
smc0 deleted
[root@spectre:/]rmdev -Rdl smc1
smc1 deleted

[root@spectre:/]lsdev -c tape
[root@spectre:/]

4.- AIX – RECREATE TAPE DEVICES

[root@spectre:/]cfgmgr

[root@spectre:/]lsdev -c tape
rmt0 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
smc0 Available 14-T1-01 IBM 3573 Tape Medium Changer (FCP)

5.- AIX – Get the WWNs from the AIX DEVs

[root@spectre:/]lsdev -c tape -F "name class location physloc description"
rmt0 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W2005000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W200B000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2008000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
smc0 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L1000000000000 IBM 3573 Tape Medium Changer (FCP)

6.- TS3200 – Get the WWNs from the TS3200

DEVICE  WWNN              WWPN
DRIVE1  2001000E1115B46F  2002000E1115B46F
DRIVE2  2004000E1115B46F  2005000E1115B46F
DRIVE3  2007000E1115B46F  2008000E1115B46F
DRIVE4  200A000E1115B46F  200B000E1115B46F

7.- MANUAL – CORRELATE WWNs & DEVs

rmt0 - drive2
rmt1 - drive4
rmt2 - drive1
rmt3 - drive3, OK
smc0 - drive1, OK: it's the drive with the Control Path; it shares drive1's WWPN, appears with LUN L1000... and is identified as a Tape Medium Changer.
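
With 4 drives, pen & paper is enough, but the same correlation can be sketched in a few lines of Python. This is only an illustration: it assumes the lsdev output from step 5 and the WWPN column from step 6 have been pasted in as text, and the function name correlate is made up.

```python
import re

# Output of: lsdev -c tape -F "name class location physloc description"
LSDEV = """\
rmt0 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W2005000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt1 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W200B000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2008000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
smc0 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L1000000000000 IBM 3573 Tape Medium Changer (FCP)
"""

# WWPN of each drive, as shown by the TS3200 GUI.
LIBRARY_WWPN = {
    "DRIVE1": "2002000E1115B46F",
    "DRIVE2": "2005000E1115B46F",
    "DRIVE3": "2008000E1115B46F",
    "DRIVE4": "200B000E1115B46F",
}


def correlate(lsdev_output, library_wwpn):
    """Map each AIX device name to the library drive that owns its WWPN."""
    drive_by_wwpn = {wwpn.upper(): drive for drive, wwpn in library_wwpn.items()}
    mapping = {}
    for line in lsdev_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, physloc = fields[0], fields[3]
        # The WWPN sits between "-W" and "-L" in the physical location code.
        match = re.search(r"-W([0-9A-Fa-f]{16})-L", physloc)
        if match:
            mapping[name] = drive_by_wwpn.get(match.group(1).upper(), "UNKNOWN")
    return mapping


if __name__ == "__main__":
    for device, drive in sorted(correlate(LSDEV, LIBRARY_WWPN).items()):
        print(device, "-", drive)
```

Note that smc0 resolves to DRIVE1 as well, because the medium changer is reached through that drive.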

8.- AIX – RENAME TAPEDEVs to follow HW’s ORDER

Call me finicky, but I cannot stand to have a device called rmt2 in AIX and DRIVE4 in the tape library.

This doesn’t usually happen when you deploy a brand new tape library, since the serials & WWNs should be sequential. However, a couple of years down the line, after a couple of hardware replacements, the serials/WWNs are no longer sequential, so cfgmgr creates the devices in an order that is not what we need or want. Most people will leave them as they are, but I cannot: it gives me a severe itch ;o)

Looking at the mapping we built in the previous step:

smc0 = OK (smc0 is fine: we only have 1 drive with a control path, so we leave it as-is)
rmt2 -> rmt1 / drive1  (we need to rename rmt2 to rmt1)
rmt0 -> rmt2 / drive2  (rmt0 to rmt2)
rmt3 = OK (Bonus! One of the devices matches the right drive by pure chance!)
rmt1 -> rmt4 / drive4  (and finally, rmt1 to rmt4)

The renames have to run in an order that keeps every target name free: rmt1 moves to rmt4 first, which frees rmt1 for rmt2, which in turn frees rmt2 for rmt0.

[root@spectre:/]chdev -l rmt1 -a new_name=rmt4
rmt1 changed
[root@spectre:/]chdev -l rmt2 -a new_name=rmt1
rmt2 changed
[root@spectre:/]chdev -l rmt0 -a new_name=rmt2
rmt0 changed
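
The only tricky part of the renaming is that a target name must not be in use at the moment of the rename. A small Python sketch of that ordering logic follows; it only builds the chdev command lines used above and does not run them, and the function name plan_renames is hypothetical:

```python
def plan_renames(current, wanted):
    """Order renames so each target name is free when it is needed.

    current: device names that exist now, e.g. ["rmt0", "rmt1", ...]
    wanted:  {old_name: new_name} for the devices that must change
    """
    in_use = set(current)
    pending = {old: new for old, new in wanted.items() if old != new}
    steps = []
    while pending:
        # Pick a rename whose target name is not taken by any device.
        ready = [old for old, new in sorted(pending.items()) if new not in in_use]
        if not ready:
            # e.g. rmt0 <-> rmt1: one device must go to a temporary name first.
            raise ValueError("rename cycle: free one name with a temporary rename first")
        old = ready[0]
        new = pending.pop(old)
        in_use.remove(old)
        in_use.add(new)
        steps.append(f"chdev -l {old} -a new_name={new}")
    return steps


if __name__ == "__main__":
    devices = ["rmt0", "rmt1", "rmt2", "rmt3", "smc0"]
    for step in plan_renames(devices, {"rmt2": "rmt1", "rmt0": "rmt2", "rmt1": "rmt4"}):
        print(step)
```

For the mapping in this article it produces the same three commands, in the same order, as the ones typed by hand.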

9.- AIX – CHECK DEVs

We will check that the renaming of the devices matches the hardware descriptions:

[root@spectre:/]lsdev -c tape
rmt1 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt4 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
smc0 Available 14-T1-01 IBM 3573 Tape Medium Changer (FCP)

[root@spectre:/]lsdev -c tape -F "name class location physloc description"
rmt1 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W2005000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2008000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
rmt4 tape 13-T1-01 U9009.42A.7803790-V5-C13-T1-W200B000E1115B46F-L0 IBM 3580 Ultrium Tape Drive (FCP)
smc0 tape 14-T1-01 U9009.42A.7803790-V5-C14-T1-W2002000E1115B46F-L1000000000000 IBM 3573 Tape Medium Changer (FCP)

10.- SP – DEFINE LIBRARY’s CONTROL PATH

Protect: SERVER1>define path SERVER1 TS3200 srctype=server desttype=library device=/dev/smc0 online=yes autodetect=yes
ANR1720I A path from SERVER1 to TS3200 has been defined.

11.- SP – DEFINE LIBRARY’s DRIVES

Spectrum Protect treats a drive as just a logical object for a device; it is not until you create the PATHs that a physical tape device gets associated with a drive.

Protect: SERVER1>define drive TS3200 DRIVE1
ANR8404I Drive DRIVE1 defined in library TS3200.
Protect: SERVER1>define drive TS3200 DRIVE2
ANR8404I Drive DRIVE2 defined in library TS3200.
Protect: SERVER1>define drive TS3200 DRIVE3
ANR8404I Drive DRIVE3 defined in library TS3200.
Protect: SERVER1>define drive TS3200 DRIVE4
ANR8404I Drive DRIVE4 defined in library TS3200.

12.- SP – DEFINE LIBRARY’s PATHS

Here is where we associate the OS tape devices with the SP drive objects:

Protect: SERVER1>define path SERVER1 DRIVE1 srctype=server desttype=drive library=TS3200 online=yes device=/dev/rmt1 autodetect=yes
ANR1720I A path from SERVER1 to TS3200 DRIVE1 has been defined.
Protect: SERVER1>define path SERVER1 DRIVE2 srctype=server desttype=drive library=TS3200 online=yes device=/dev/rmt2 autodetect=yes
ANR1720I A path from SERVER1 to TS3200 DRIVE2 has been defined.
Protect: SERVER1>define path SERVER1 DRIVE3 srctype=server desttype=drive library=TS3200 online=yes device=/dev/rmt3 autodetect=yes
ANR1720I A path from SERVER1 to TS3200 DRIVE3 has been defined.
Protect: SERVER1>define path SERVER1 DRIVE4 srctype=server desttype=drive library=TS3200 online=yes device=/dev/rmt4 autodetect=yes
ANR1720I A path from SERVER1 to TS3200 DRIVE4 has been defined.

13.- SP – VERIFY (LOGICAL)

Verify that everything looks OK from the logical perspective

Protect: SERVER1>q path
Source Name  Source Type  Destination Name  Destination Type  On-Line
-----------  -----------  ----------------  ----------------  -------
SERVER1      SERVER       TS3200            LIBRARY           Yes
SERVER1      SERVER       DRIVE1            DRIVE             Yes
SERVER1      SERVER       DRIVE2            DRIVE             Yes
SERVER1      SERVER       DRIVE3            DRIVE             Yes
SERVER1      SERVER       DRIVE4            DRIVE             Yes

Protect: SERVER1>q drive
Library Name  Drive Name  Device Type  On-Line
------------  ----------  -----------  -------
TS3200        DRIVE1      LTO          Yes
TS3200        DRIVE2      LTO          Yes
TS3200        DRIVE3      LTO          Yes
TS3200        DRIVE4      LTO          Yes

Protect: SERVER1>q library
Library Name  Library Type  Shared
TS3200        SCSI          Yes

14.- SP – REVISE DEVCLASSES

Protect: SERVER1>q devclass
Device Class Name  Device Access Strategy  Storage Pool Count  Device Type  Format  Est/Max Capacity (MB)  Mount Limit
-----------------  ----------------------  ------------------  -----------  ------  ---------------------  -----------
DBBACK_FILEDEV     Sequential              0                   FILE         DRIVE   51,200.0               32
DISK               Random                  1
LTO_6              Sequential              3                   LTO          DRIVE                          DRIVES

Protect: SERVER1>q devclass LTO_6 f=d
Device Class Name: LTO_6
Device Access Strategy: Sequential
Storage Pool Count: 3
Device Type: LTO
Format: DRIVE
...
Library: TS3200_LL2
Directory:
Server Name:
...

Protect: SERVER1>update devclass LTO_6 library=TS3200
ANR2205I Device class LTO_6 updated.

15.- SP – REVISE SCRIPTS

Protect: SERVER1>q scr
Name            Description                     Managing profile
--------------- ------------------------------------- ----------
AUDIT_LIB       SP - Syncro Tape inventory with Tape library
BACKUP_DB       SP - BACKUP DB & Config
CHECKIN_ALL_LIB SP - CHECKIN ALL Tapes in the Library
CHECKIN_PRIVATE SP - CHECKIN Private Tapes
CHECKIN_SCRATCH SP - CHECKIN Scratch Tapes
CONTAINER_COPY  Run container copy pool operation
CONTAINER_RECL  Run container-copy reclamation
LABEL_TAPES     SP - LABEL New Tapes
PATHS_DOWN      SP - Bring DOWN PATHS & DRIVES of the Tape Library
PATHS_UP        SP - Bring UP PATHS & DRIVES of the Tape Library

As usual, we have a good number of scripts that perform actions on tapes, and since SP forces us to specify the library name in each command, we will have to change a few of these scripts to point to the new library.

Also, as we now have 4 drives instead of 2, we will need to modify a couple of scripts to account for the extra tape drives. So, let’s go ahead and change three as an example:

Protect: SERVER1>q scr AUDIT_LIB f=d
...
Name: AUDIT_LIB
Line Number: 5
Command: audit library TS3200_LL2 checklabel=barcode refresh=yes
Last Update by (administrator): CIJALBA
Last Update Date/Time: 06/09/17 10:18:54
Protect: SERVER1>upd scr AUDIT_LIB "audit library TS3200 checklabel=barcode refresh=yes" line=5
ANR1456I UPDATE SCRIPT: Command script AUDIT_LIB updated.

Protect: SERVER1>q scr CHECKIN_ALL_LIB f=d
...
Name: CHECKIN_ALL_LIB
Line Number: 10
Command: checkin libvolume TS3200_LL2 status=scratch search=yes checklabel=barcode
Name: CHECKIN_ALL_LIB
Line Number: 20
Command: checkin libvolume TS3200_LL2 status=private search=yes checklabel=barcode
Protect: SERVER1>upd scr CHECKIN_ALL_LIB "checkin libvolume TS3200 status=scratch search=yes checklabel=barcode" line=10
ANR1456I UPDATE SCRIPT: Command script CHECKIN_ALL_LIB updated.
Protect: SERVER1>upd scr CHECKIN_ALL_LIB "checkin libvolume TS3200 status=private search=yes checklabel=barcode" line=20
ANR1456I UPDATE SCRIPT: Command script CHECKIN_ALL_LIB updated.

Protect: SERVER1>q scr PATHS_DOWN f=d
...
Name: PATHS_DOWN
Line Number: 1
Command: upd path SERVER1 DRIVE3 srcty=server destt=drive library=TS3200_LL2 online=no
Name: PATHS_DOWN
Line Number: 5
Command: upd path SERVER1 DRIVE4 srcty=server destt=drive library=TS3200_LL2 online=no
Name: PATHS_DOWN
Line Number: 10
Command: upd drive TS3200_LL2 DRIVE3 online=no
Name: PATHS_DOWN
Line Number: 15
Command: upd drive TS3200_LL2 DRIVE4 online=no
Protect: SERVER1>upd scr PATHS_DOWN "upd drive TS3200 DRIVE4 online=no" line=40
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd drive TS3200 DRIVE3 online=no" line=35
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd drive TS3200 DRIVE2 online=no" line=30
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd drive TS3200 DRIVE1 online=no" line=25
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd path SERVER1 DRIVE4 srcty=server destt=drive library=TS3200 online=no" line=20
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd path SERVER1 DRIVE3 srcty=server destt=drive library=TS3200 online=no" line=15
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd path SERVER1 DRIVE2 srcty=server destt=drive library=TS3200 online=no" line=10
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd path SERVER1 DRIVE1 srcty=server destt=drive library=TS3200 online=no" line=5
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.
Protect: SERVER1>upd scr PATHS_DOWN "upd path SERVER1 TS3200 srcty=server destt=library online=no" line=1
ANR1456I UPDATE SCRIPT: Command script PATHS_DOWN updated.

In the end I only had to change a few scripts, but if you want to save yourself some time, or have many more scripts than I do, it is more efficient to redirect all the scripts to a text file and work on it from the OS. (In fact, this is a best practice I recommend doing from time to time: export your SP scripts out of SP.)

This is easily done with:

Protect: SERVER1>q scr * f=d > /tmp/scripts.txt
Output of command redirected to file '/tmp/scripts.txt'

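Then just grep from the OS. You can check for the old name and the new name until you have modified all the scripts (grep -c counts matching lines, and TS3200 also matches TS3200_LL2, so the count for the old name is the one that must reach 0):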

[root@spectre:/tmp]grep -c TS3200 scripts.txt
23
[root@spectre:/tmp]grep -c TS3200_LL2 scripts.txt
0
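
If you prefer a check that does not confuse TS3200 with TS3200_LL2, here is a minimal Python sketch that matches the library names as whole tokens. The function name count_library_refs and the three sample lines are only for illustration; in practice the text would be the contents of /tmp/scripts.txt.

```python
import re


def count_library_refs(text, old_name, new_name):
    """Count the lines that mention the old and the new library name.

    A plain substring search for "TS3200" also matches "TS3200_LL2",
    so each name is matched only when it is not part of a longer word.
    """
    def lines_with(name):
        pattern = re.compile(
            rf"(?<![A-Za-z0-9_]){re.escape(name)}(?![A-Za-z0-9_])"
        )
        return sum(1 for line in text.splitlines() if pattern.search(line))

    return {"old": lines_with(old_name), "new": lines_with(new_name)}


# Three lines in the style of the exported script listing.
SAMPLE = """\
Command: audit library TS3200 checklabel=barcode refresh=yes
Command: checkin libvolume TS3200_LL2 status=scratch search=yes checklabel=barcode
Command: upd path SERVER1 DRIVE1 srcty=server destt=drive library=TS3200 online=no
"""

if __name__ == "__main__":
    print(count_library_refs(SAMPLE, "TS3200_LL2", "TS3200"))
```

For the sample, one line still references the old library, so that script would still need an update.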

16.- SP – FINAL VERIFY (PHYSICAL)

16.1.- Try the freshly modified scripts:

Protect: SERVER1>run PATHS_DOWN
ANR1722I A path from SERVER1 to TS3200 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE1 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE2 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE3 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE4 has been updated.
ANR8467I Drive DRIVE1 in library TS3200 updated.
ANR8467I Drive DRIVE2 in library TS3200 updated.
ANR8467I Drive DRIVE3 in library TS3200 updated.
ANR8467I Drive DRIVE4 in library TS3200 updated.
ANR1462I RUN: Command script PATHS_DOWN completed successfully.
Protect: SERVER1>q path
Source Name Source Type Destination Destination On-Line
Name Type
----------- ----------- ----------- ----------- ----------
SERVER1 SERVER TS3200 LIBRARY No
SERVER1 SERVER DRIVE1 DRIVE No
SERVER1 SERVER DRIVE2 DRIVE No
SERVER1 SERVER DRIVE3 DRIVE No
SERVER1 SERVER DRIVE4 DRIVE No
Protect: SERVER1>q drive
Library Name Drive Name Device Type On-Line
------------ ------------ ----------- -------------------
TS3200 DRIVE1 LTO No
TS3200 DRIVE2 LTO No
TS3200 DRIVE3 LTO No
TS3200 DRIVE4 LTO No
Protect: SERVER1>run PATHS_UP
ANR1722I A path from SERVER1 to TS3200 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE1 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE2 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE3 has been updated.
ANR1722I A path from SERVER1 to TS3200 DRIVE4 has been updated.
ANR8467I Drive DRIVE1 in library TS3200 updated.
ANR8467I Drive DRIVE2 in library TS3200 updated.
ANR8467I Drive DRIVE3 in library TS3200 updated.
ANR8467I Drive DRIVE4 in library TS3200 updated.
ANR1462I RUN: Command script PATHS_UP completed successfully.
Protect: SERVER1>q path
Source Name Source Type Destination Destination On-Line
Name Type
----------- ----------- ----------- ----------- ----------
SERVER1 SERVER TS3200 LIBRARY Yes
SERVER1 SERVER DRIVE1 DRIVE Yes
SERVER1 SERVER DRIVE2 DRIVE Yes
SERVER1 SERVER DRIVE3 DRIVE Yes
SERVER1 SERVER DRIVE4 DRIVE Yes
Protect: SERVER1>q drive
Library Name Drive Name Device Type On-Line
------------ ------------ ----------- -------------------
TS3200 DRIVE1 LTO Yes
TS3200 DRIVE2 LTO Yes
TS3200 DRIVE3 LTO Yes
TS3200 DRIVE4 LTO Yes
Protect: SERVER1>run AUDIT_LIB
ANR1462I RUN: Command script AUDIT_LIB completed successfully.
Protect: SERVER1>q libv
ANR2034E QUERY LIBVOLUME: No match found using this criteria.
ANS8001I Return code 11.
Protect: SERVER1>run CHECKIN_ALL_LIB
ANR1462I RUN: Command script CHECKIN_ALL_LIB completed successfully.
Protect: SERVER1>q libv
Library Name  Volume Name  Status   Owner    Last Use  Home Element  Device Type
------------  -----------  -------  -------  --------  ------------  -----------
TS3200        000001L6     Private  SERVER1            4,118         LTO
TS3200        000004L6     Private  SERVER1            4,123         LTO
TS3200        000006L6     Private  SERVER1            4,119         LTO
TS3200        000007L6     Private  SERVER1            4,125         LTO
TS3200        000009L6     Private  SERVER1            4,136         LTO
TS3200        000010L6     Scratch                     4,102         LTO
TS3200        000011L6     Private  SERVER1            4,139         LTO
...

The scripts work OK.
OK!!!!!!!!!!

16.2.- Try to use all the tape drives:

If we are lucky, SP might launch a Space Reclamation process, which uses 2 drives. Otherwise, a MOVE DATA command uses 2 tape drives at the same time, one to READ and another to WRITE, so by issuing a couple of MOVE DATAs we exercise all 4 tape drives at once.

Protect: SERVER1>q vol stg=tapepool
Volume Name  Storage Pool Name  Device Class Name  Estimated Capacity  Pct Util  Volume Status
-----------  -----------------  -----------------  ------------------  --------  -------------
000002L6     TAPEPOOL           LTO_6              8.9 T               10.7      Filling
000009L6     TAPEPOOL           LTO_6              5.7 T               0.4       Filling
000014L6     TAPEPOOL           LTO_6              9.5 T               0.0       Full
000019L6     TAPEPOOL           LTO_6              5.7 T               0.0       Filling
000024L6     TAPEPOOL           LTO_6              5.7 T               8.2       Filling
000031L6     TAPEPOOL           LTO_6              5.7 T               37.9      Filling
...
Protect: SERVER1>move data 000014L6
ANR2232W This command will move all of the data stored on volume 000014L6 to other volumes within the same storage pool; the data
will be inaccessible to users until the operation completes.
Do you wish to proceed? (Yes (Y)/No (N)) Y
ANS8003I Process number 5 started.
Protect: SERVER1>q pr
Process Process Description Process Status
Number
-------- -------------------- -----------------------------------------------
5 Move Data Volume 000014L6 (storage pool TAPEPOOL), Target Pool TAPEPOOL, Moved Files: 0, 
Moved Bytes: 0 bytes, Deduplicated Bytes: 0 bytes, Unreadable Files: 0, Unreadable Bytes: 0
 bytes. Current Physical File (bytes): 2,033 bytes Waiting for mount of scratch volume (1 seconds).
Protect: SERVER1>q req
ANR8352I Requests outstanding:
ANR8308I 001: LTO volume 000014L6 is required for use in library TS3200; CHECKIN LIBVOLUME required within 20 minutes.
Protect: SERVER1>reply 1
ANR8499I Command accepted.

Update the status of the tapes in the library to READWRITE. Be careful: in the following example I have made all my tapes READW, which might not be wise for your system if you have a big tape library or several tape libraries. A better example would have updated each volume individually, but I am pressed for time ;o)

Protect: SERVER1>upd vol * access=readw
ANR2207I Volume 000001L6 updated.
ANR2207I Volume 000002L6 updated.
ANR2207I Volume 000003L6 updated.
...
ANR2207I Volume 000061L6 updated.
ANR2207I Volume 000062L6 updated.
ANR2207I Volume 000063L6 updated.
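
For the more careful, volume-by-volume approach, a minimal Python sketch could turn the rows of the <q vol> listing into one command per volume, restricted to a single storage pool. The function name readwrite_commands is made up, and the sketch assumes the first two whitespace-separated fields of each row are the volume name and the storage pool name, as in the listing above:

```python
# A few data rows copied from the "q vol stg=tapepool" output.
Q_VOL = """\
000002L6 TAPEPOOL LTO_6 8.9 T 10.7 Filling
000009L6 TAPEPOOL LTO_6 5.7 T 0.4  Filling
000014L6 TAPEPOOL LTO_6 9.5 T 0.0  Full
"""


def readwrite_commands(q_vol_rows, pool):
    """Build one 'upd vol <name> access=readw' per volume of the given pool."""
    cmds = []
    for line in q_vol_rows.splitlines():
        fields = line.split()
        # Field 0 is the volume name, field 1 the storage pool name.
        if len(fields) >= 2 and fields[1] == pool:
            cmds.append(f"upd vol {fields[0]} access=readw")
    return cmds


if __name__ == "__main__":
    for cmd in readwrite_commands(Q_VOL, "TAPEPOOL"):
        print(cmd)
```

This way only the volumes you listed are touched, instead of everything that matches the wildcard.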

After a while, all 4 drives had a tape mounted and were doing operations, so: the drives work fine.
OK!!!!!!!!!!

16.3.- Check Tape Library HW

Bad tapes or barcode problems can be checked with the undocumented SHOW SLOTS command:

Protect: SERVER1>show slots ts3200
PVR slot information for library TS3200.
Library : TS3200
Product Id : 3573-TL
Support module : 2
Mount count : 1
Drives : 4
Slots : 44
Changers : 1
Import/Exports : 3
.
Device : /dev/smc0
.
Drive 0, element 256
Drive 1, element 257
Drive 2, element 258
Drive 3, element 259
.
Changer 0, element 1
.
ImpExp 0, element number 16
ImpExp 1, element number 17
ImpExp 2, element number 18
Slot 0, status Allocated, element number 4096, barcode present, barcode value , devT=LTO, mediaT=436, elemT=ANY
Slot 1, status Allocated, element number 4097, barcode present, barcode value , devT=LTO, mediaT=436, elemT=ANY
Slot 2, status Allocated, element number 4098, barcode present, barcode value , devT=LTO, mediaT=436, elemT=ANY
...
Slot 42, status Allocated, element number 4138, barcode present, barcode value , devT=LTO, mediaT=436, elemT=ANY
Slot 43, status Allocated, element number 4139, barcode present, barcode value , devT=LTO, mediaT=436, elemT=ANY
.
slot element range 4096 - 4139

No problems in tapes or barcodes found.
OK!!!!!!!!!!

16.4.- Check and Backup SP Tape Library Definitions:

Now that we have redefined the tape library configuration and loaded the tapes, issue a BACKUP DEVCONFIG and a BACKUP VOLHIST.

Protect: SERVER1> BACKUP VOLHISTORY
ANR2463I BACKUP VOLHISTORY: Server sequential volume history information was written to all configured history files.

Protect: SERVER1> BACKUP DEVCONFIG
ANR2394I BACKUP DEVCONFIG: Server device configuration information was written to all device configuration files.

We should go to Spectrum Protect's installation directory (by default /home/tsminst1) and look at the devconfig file (devconf.dat):

[root@spectre:/home/tsminst1]cat devconf.dat
/* Device Configuration */
DEFINE DEVCLASS DBBACK_FILEDEV DEVT=FILE FORMAT=DRIVE SHARE=NO MAXCAP=52428800K MOUNTL=32 DIR=/tsminst1/TSMbkup00,/tsminst1/TSMbkup01
DEFINE DEVCLASS LTO_6 DEVT=LTO FORMAT=DRIVE MOUNTL=DRIVES MOUNTWAIT=20 MOUNTRETENTION=5 PREFIX=ADSM LIBRARY=TS3200 WORM=NO DRIVEENCRYPTION=ALLOW LBPROTECT=NO
DEFINE SERVER SPECTRE COMMMETHOD=TCPIP HLADDRESS=10.1.1.207 LLADDRESS=1500
SET SERVERNAME SERVER1
DEFINE LIBRARY TS3200 LIBTYPE=SCSI WWN="2001000E1115B46F" SERIAL="A0L4U78W5927_LL0" SHARED=YES AUTOLABEL=NO RESETDRIVE=YES
DEFINE DRIVE TS3200 DRIVE1 ELEMENT=256 ONLINE=Yes WWN="2001000E1115B46F" SERIAL="A0WT025496"
DEFINE DRIVE TS3200 DRIVE2 ELEMENT=257 ONLINE=Yes WWN="2004000E1115B46F" SERIAL="A0WT038765"
DEFINE DRIVE TS3200 DRIVE3 ELEMENT=258 ONLINE=Yes WWN="2007000E1115B46F" SERIAL="A0WT046112"
DEFINE DRIVE TS3200 DRIVE4 ELEMENT=259 ONLINE=Yes WWN="200A000E1115B46F" SERIAL="A0WT045812"
/* LIBRARYINVENTORY SCSI TS3200 000001L6 4118 101*/
/* LIBRARYINVENTORY SCSI TS3200 000004L6 4123 101*/
...
/* LIBRARYINVENTORY SCSI TS3200 000062L6 4121 101*/
/* LIBRARYINVENTORY SCSI TS3200 000063L6 4105 101*/
DEFINE PATH SERVER1 TS3200 SRCTYPE=SERVER DESTTYPE=LIBRARY DEVICE=/dev/smc0 ONLINE=YES
DEFINE PATH SERVER1 DRIVE1 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt1 ONLINE=YES
DEFINE PATH SERVER1 DRIVE2 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt2 ONLINE=YES
DEFINE PATH SERVER1 DRIVE3 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt3 ONLINE=YES
DEFINE PATH SERVER1 DRIVE4 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt4 ONLINE=YES
SERVERBACKUPNODEID 1
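
Since the whole point of step 8 was to make DRIVEn use /dev/rmtn, the DEFINE PATH lines of devconf.dat can be checked automatically. The following Python sketch is only an illustration of that check: the function names are made up, and the "same number on both sides" rule is my own naming convention, not something SP requires.

```python
import re

# DEFINE PATH lines copied from devconf.dat.
DEVCONF = """\
DEFINE PATH SERVER1 TS3200 SRCTYPE=SERVER DESTTYPE=LIBRARY DEVICE=/dev/smc0 ONLINE=YES
DEFINE PATH SERVER1 DRIVE1 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt1 ONLINE=YES
DEFINE PATH SERVER1 DRIVE2 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt2 ONLINE=YES
DEFINE PATH SERVER1 DRIVE3 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt3 ONLINE=YES
DEFINE PATH SERVER1 DRIVE4 SRCTYPE=SERVER DESTTYPE=DRIVE LIBRARY=TS3200 DEVICE=/dev/rmt4 ONLINE=YES
"""


def drive_paths(devconf_text):
    """Return {drive_name: device} for every DESTTYPE=DRIVE path."""
    paths = {}
    for line in devconf_text.splitlines():
        words = line.split()
        if words[:2] != ["DEFINE", "PATH"] or "DESTTYPE=DRIVE" not in words:
            continue
        # Words after the source and destination names are KEY=VALUE options.
        options = dict(w.split("=", 1) for w in words[4:] if "=" in w)
        paths[words[3]] = options.get("DEVICE", "")
    return paths


def mismatched(paths):
    """List the drives whose number differs from their rmt device number."""
    bad = []
    for drive, device in sorted(paths.items()):
        drive_no = re.sub(r"\D", "", drive)
        device_no = re.sub(r"\D", "", device.rsplit("/", 1)[-1])
        if drive_no != device_no:
            bad.append((drive, device))
    return bad


if __name__ == "__main__":
    print(mismatched(drive_paths(DEVCONF)) or "all drive paths follow the hardware order")
```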

OK!!!!!!!!!!

NOTES

A.- What is a Tape Library Control Path

A Tape Library Control Path is the logical path over which the server sends SCSI Medium Changer commands to the library.

Each tape library has at least one control path. On AIX, for example, the tape drive that carries the control path shows up as 2 devices, one for the Tape Drive and one for the Tape Medium Changer (in this example rmt1 and smc0 are really the same physical device):

[root@spectre:/tmp]lsdev -c tape
rmt1 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt2 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt3 Available 14-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
rmt4 Available 13-T1-01 IBM 3580 Ultrium Tape Drive (FCP)
smc0 Available 14-T1-01 IBM 3573 Tape Medium Changer (FCP)

There is a catch here: if the tape drive with the control path goes down, all the drives in the tape library will stop working, as the communication channel with the library is down.

In that case, and while we replace/repair the hardware, we will have to change the control path over to another drive, and perhaps reconfigure the device in AIX and Spectrum Protect.

We can have more than one control path in a library to eliminate this single point of failure (a clear SPOF in a multi-drive tape library). However, it comes at a price: at least in IBM libraries, an extra licence must be purchased to enable Control Path Failover (CPF). In some cases, having CPF also enables Data Path Failover (DPF), which includes load balancing across the HBAs.

B.- Final Thoughts

Well, phew, that was a bit of a long ride, wasn’t it? It’s not actually complex, it’s just a matter of order, and if done in the right sequence (and after having devoured a few Redbooks and technical guides) it’s pretty straightforward.

Just try not to perform this procedure very often, as it takes a few hours of work, and while it is in progress Spectrum Protect cannot use the library (you can do it quicker if you script the lot, of course, and for Disaster Recovery that is recommended, because one or two hours saved might make a huge difference).

I hope you have enjoyed the procedure, and any comments or steps which can be done better are always welcome, so if you have suggestions, post them here to <<Give the sysadmin a shout!>>


New Nagios/NagiosXI plugin – Check Kaspersky Security for Linux Mail Server (KLMS)

A few days ago I submitted to Nagios Exchange a new plugin to check KLMS health, so if you use Kaspersky Security for Linux Mail Server, it might be of use to you.

The plugin is a bash shell script that reports the following statuses:

OK:       All KLMS Databases are Up to Date, KLMS running, LDAP connected.
WARNING:  Database Outdated: [ AntiVirus | AntiSPAM | AntiPhishing ].
CRITICAL: Database Obsolete: [ AntiVirus | AntiSPAM | AntiPhishing ], KLMS not running, LDAP not connected.
Error:    KLMS couldn't be contacted, or not installed (check your PATH or install KLMS software).
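
Nagios decides the state of a service from the plugin's exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The following is only a sketch, in Python, of how the four statuses above could map onto those codes. It is not the code of check_klms.sh (which is a bash script): the function name and its arguments are made up for illustration, and treating the Error status as UNKNOWN is an assumption.

```python
# Standard Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def klms_state(reachable, running, ldap_connected, outdated, obsolete):
    """Pick the worst applicable state.

    outdated / obsolete: lists of database names, e.g. ["AntiSPAM"].
    """
    if not reachable:
        # Assumption: the plugin's "Error" status is reported as UNKNOWN.
        return "UNKNOWN", "KLMS couldn't be contacted, or not installed"
    problems = []
    if not running:
        problems.append("KLMS not running")
    if not ldap_connected:
        problems.append("LDAP not connected")
    if obsolete:
        problems.append("Database Obsolete: " + ", ".join(obsolete))
    if problems:
        return "CRITICAL", "; ".join(problems)
    if outdated:
        return "WARNING", "Database Outdated: " + ", ".join(outdated)
    return "OK", "All KLMS Databases are Up to Date, KLMS running, LDAP connected"


if __name__ == "__main__":
    state, message = klms_state(True, True, True, ["AntiSPAM"], [])
    print(f"{state}: {message} (exit code {EXIT_CODES[state]})")
```

A real plugin would print the message and then exit with the matching code, which is what NRPE passes back to NagiosXI.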

NagiosXI installation instructions follow:

1.- Change your command definition in the nrpe.cfg

   nagios@pmimta:/usr/local/nagios/etc$ sudo cp -p nrpe.cfg nrpe.cfg.20180808 	<-- always make a backup first!
   nagios@pmimta:/usr/local/nagios/etc$ sudo vi nrpe.cfg 			<-- edit your nrpe.cfg
      Add:
	command[check_klms]=/usr/local/nagios/libexec/check_klms.sh status

2.- Edit sudoers file:

   sudo visudo

3.- Add permissions for the klms-control binary to nagios:

   Defaults:nagios !requiretty
   nagios ALL=NOPASSWD: /opt/kaspersky/klms/bin/klms-control

4.- Restart the nrpe daemon:

   nagios@pmimta:/usr/local/nagios/etc$ ps -ef |grep nrpe
   nagios 1476 1 0 Aug05 ? 00:00:02 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
   nagios@pmimta:/usr/local/nagios/etc$ sudo kill -9 1476
   nagios@pmimta:/usr/local/nagios/etc$ sudo /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
   nagios@pmimta:/usr/local/nagios/etc$ ps -ef |grep nrpe
   nagios 31928 1 0 12:11 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d

or, if nrpe is under xinetd:

   service xinetd restart

5.- Verify nrpe log:

   nagios@pmimta:/usr/local/nagios/etc$ journalctl --since=today | grep nrpe
   Aug 08 12:11:41 pmimta sudo[31926]: sistemas : TTY=pts/0 ; PWD=***** ; USER=nagios ; COMMAND=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
   Aug 08 12:11:41 pmimta nrpe[31928]: Starting up daemon
   Aug 08 12:11:41 pmimta nrpe[31928]: Server listening on 0.0.0.0 port 5666.
   Aug 08 12:11:41 pmimta nrpe[31928]: Server listening on :: port 5666.
   Aug 08 12:11:41 pmimta nrpe[31928]: Warning: Daemon is configured to accept command arguments from clients!
   Aug 08 12:11:41 pmimta nrpe[31928]: Listening for connections on port 0
   Aug 08 12:11:41 pmimta nrpe[31928]: Allowing connections from: 127.0.0.1, nagiosxiserver

Now we only need to define a new command in NagiosXI and a service that uses that command, and we will have the check working in our NagiosXI.

The plugin can be downloaded from Nagios Exchange.

New Plugin for Nagios – Check Kaspersky Security for Linux Mail Server (KLMS)

A few days ago I uploaded a plugin to Nagios Exchange that checks the status of KLMS, so if you have KLMS installed, it may be quite useful to you.

It is a bash shell script that reports the following statuses:

OK:       All KLMS Databases are Up to Date, KLMS running, LDAP connected.
WARNING:  Database Outdated: [ AntiVirus | AntiSPAM | AntiPhishing ].
CRITICAL: Database Obsolete: [ AntiVirus | AntiSPAM | AntiPhishing ], KLMS not running, LDAP not connected.
Error:    KLMS couldn’t be contacted, or not installed (check your PATH or install KLMS software).

The installation instructions for NagiosXI are as follows:

1.- Download the Nagios script (check_klms.sh) into the Nagios plugins path (by default /usr/local/nagios/libexec), and add the definition of a new command to the nrpe.cfg file on the machine we want to monitor:

nagios@pmimta:/usr/local/nagios/etc$ sudo cp -p nrpe.cfg nrpe.cfg.20180822    <-- always make a backup first!
nagios@pmimta:/usr/local/nagios/etc$ sudo vi nrpe.cfg                         <-- edit your nrpe.cfg
 Add:
   command[check_klms]=/usr/local/nagios/libexec/check_klms.sh status

2.- Edit the sudo permissions file:

nagios@pmimta:/home/nagios$ sudo visudo

3.- Give nagios permission to run klms-control (the KLMS admin binary):

Defaults:nagios !requiretty 
nagios ALL=NOPASSWD: /opt/kaspersky/klms/bin/klms-control

4.- Restart the NRPE daemon:

nagios@pmimta:/usr/local/nagios/etc$ ps -ef |grep nrpe
nagios 1476 1 0 Aug05 ? 00:00:02 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d 
nagios@pmimta:/usr/local/nagios/etc$ sudo kill -9 1476
nagios@pmimta:/usr/local/nagios/etc$ sudo /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
nagios@pmimta:/usr/local/nagios/etc$ ps -ef |grep nrpe
nagios 31928 1 0 12:11 ? 00:00:00 /usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d

or, if our nrpe is running under xinetd:   service xinetd restart

5.- Check the nrpe log to verify that everything works OK:

nagios@pmimta:/usr/local/nagios/etc$ journalctl --since=today | grep nrpe
Aug 08 12:11:41 pmimta sudo[31926]: sistemas : TTY=pts/0 ; PWD=***** ; USER=nagios ; COMMAND=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d
Aug 08 12:11:41 pmimta nrpe[31928]: Starting up daemon
Aug 08 12:11:41 pmimta nrpe[31928]: Server listening on 0.0.0.0 port 5666.
Aug 08 12:11:41 pmimta nrpe[31928]: Server listening on :: port 5666.
Aug 08 12:11:41 pmimta nrpe[31928]: Warning: Daemon is configured to accept command arguments from clients!
Aug 08 12:11:41 pmimta nrpe[31928]: Listening for connections on port 0
Aug 08 12:11:41 pmimta nrpe[31928]: Allowing connections from: 127.0.0.1, nagiosxiserver

Once we have put the check and the relevant permissions in place on the server to be monitored, all that is left is to define a new command in NagiosXI and a service that uses that command, and we will have the check working in our NagiosXI.

You can download the script from the Nagios Exchange website.

 

 

Reviewing Vulnerabilities in AIX

IBM has a tool for reporting vulnerabilities in its products, called the Fix Level Recommendation Tool (FLRT).

https://www-304.ibm.com/support/customercare/flrt/

For AIX in particular, we have the Security APAR Information, or the Security Bulletin information for AIX 7.2, 7.1, 6.1, 5.3, and VIOS

https://www-304.ibm.com/webapp/set2/flrt/doc?page=security

To make checking our systems easier, we have the Korn shell script flrtvc.ksh, which produces reports in several formats (CSV for importing into Excel or other tools, custom, compact, detailed, etc.).

The prerequisites of this script are the following:

1.- access to internet to retrieve the latest vulnerability CSV listing (aparCSV)
2.- wget
3.- curl

Points 2 and 3 are easy to obtain if we have configured yum on our AIX (yum install wget curl).

Some examples of running the flrtvc script:

[root@aixtest:/home/admin]./flrtvc.ksh | cut -c 1-110
Fileset|Current Version|Type|EFix Installed|Abstract|Unsafe Versions|APARs|Bulletin URL|Download URL
bos.acct|7.2.1.0|sec||NOT FIXED - (caccelstat) Vulnerabilities in bellmail / caccelstat / iostat / l
bos.acct|7.2.1.0|sec||NOT FIXED - (iostat) Vulnerabilities in bellmail / caccelstat / iostat / lquer
bos.acct|7.2.1.0|sec||NOT FIXED - (vmstat) Vulnerabilities in bellmail / caccelstat / iostat / lquer
bos.cluster.rte|7.2.1.0|hiper||NOT FIXED - CAA:SLOW GOSSIP RECEIPT ON BOOT MAY CAUSE PARTITIONED CLU
bos.mp64|7.2.1.1|hiper||NOT FIXED - getsockname() returns incorrect NameLength|7.2.1.0-7.2.1.1|IV914
bos.mp64|7.2.1.1|hiper||NOT FIXED - PROBLEMS CAN OCCUR WITH THREAD_CPUTIME AND THREAD_CPUTIME_FAST|7
bos.mp64|7.2.1.1|hiper||NOT FIXED - CRASH OR POTENTIAL DATA LOSS AFTER REMOVING LARGE JFS2 FILES ON
bos.mp64|7.2.1.1|hiper||NOT FIXED - SYSTEM CRASH WHEN USING PROCFS FOR PROCESSES CLOSING MANY FILES|
bos.mp64|7.2.1.1|sec||NOT FIXED - IBM has released AIX and VIOS iFixes in response to the vulnerabil
bos.net.tcp.bind_utils|7.2.1.1|sec||NOT FIXED - There is a vulnerability in BIND that impacts AIX.|7
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - There is a vulnerability in bellmail that impacts A
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - Vulnerabilities in BIND impact AIX|7.2.1.0|CVE-2016
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - There are two vulnerabilities in BIND that impact A
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - Vulnerability in bellmail affects AIX|7.2.1.0-7.2.1
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - (bellmail) Vulnerabilities in bellmail / caccelstat
bos.net.tcp.ntp|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in NTPv3 and NTPv4 that
bos.net.tcp.ntpd|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in NTPv3 and NTPv4 that
bos.net.tcp.tcpdump|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in tcpdump that impa
bos.rte.archive|7.2.1.0|sec||NOT FIXED - (restbyinode) Vulnerabilities in bellmail / caccelstat / io
bos.rte.lvm|7.2.1.0|sec||NOT FIXED - (lquerypv) Vulnerabilities in bellmail / caccelstat / iostat /
devices.fcp.disk.rte|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERRORS WITH CERTA
devices.pci.77102224.com|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERRORS WITH C
devices.pciex.df1060e214103404.com|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERR
devices.vdevice.ibm.l-lan.rte|7.2.1.0|hiper||NOT FIXED - CRASH IN VIOENT_INIT_LS_TIMER WHEN POLL_UPL
devices.vdevice.ibm.vfc-client.rte|7.2.1.0|hiper||NOT FIXED - Potential data loss using Virtual FC w
java7_64.jre|7.0.0.370|sec||NOT FIXED - There are multiple vulnerabilities in IBM SDK Java Technolog
java7_64.sdk|7.0.0.370|sec||NOT FIXED - Multiple vulnerabilities in IBM Java SDK affect AIX|<7.0.0.4
java7_64.sdk|7.0.0.370|sec||NOT FIXED - Multiple vulnerabilities in IBM Java SDK affect AIX|<7.0.0.5
java7_64.sdk|7.0.0.370|sec||NOT FIXED - There are multiple vulnerabilities in IBM SDK Java Technolog
openssh.base.client|6.0.0.6201|sec||NOT FIXED - AIX OpenSSH Vulnerability|4.0.0.5200-6.0.0.6201|CVE-
openssh.base.client|6.0.0.6201|sec||NOT FIXED - Vulnerabilities in OpenSSH affect AIX|4.0.0.5200-6.0
openssl.base|1.0.2.800|sec||NOT FIXED - There is a vulnerability in OpenSSL used by AIX|1.0.2.500-1.
openssl.base|1.0.2.800|sec||NOT FIXED - Vulnerability in OpenSSL affects AIX|1.0.2.500-1.0.2.1100|CV
...
[root@aixtest:/home/admin]./flrtvc.ksh -v | pg
////////////////////////////////////////////////////////////
// IBM FLRTVC (v0.7.3) Report
// Server: aixtest
// Date: Fri Feb 9 10
// Report by: root
// Vulnerable Filesets: 22
// Total Vulnerabilities: 54
// Total Fixes (not shown): 22
////////////////////////////////////////////////////////////

--------------------------------------------------------------------------------
bos.acct - 7.2.1.0 - Vulnerabilities (3)
--------------------------------------------------------------------------------

(1) NOT FIXED - (caccelstat) Vulnerabilities in bellmail / caccelstat / iostat / lquerypv / restbyinode / vmstat affect AIX (CVE-2017-1692)

Type: sec
Score: CVE-2017-1692:8.4
Versions: 7.2.1.0-7.2.1.0
APARs/CVEs: IV97811
Last Update: 02/05/2018
Bulletin: http://aix.software.ibm.com/aix/efixes/security/suid_advisory.asc
Download: ftp://aix.software.ibm.com/aix/efixes/security/suid_fix.tar
Fixed In: 7200-01-04

(2) NOT FIXED - (iostat) Vulnerabilities in bellmail / caccelstat / iostat / lquerypv / restbyinode / vmstat affect AIX (CVE-2017-1692)
Type: sec
Score: CVE-2017-1692:8.4
Versions: 7.2.1.0-7.2.1.1
APARs/CVEs: IV97898
Last Update: 02/05/2018
Bulletin: http://aix.software.ibm.com/aix/efixes/security/suid_advisory.asc
Download: ftp://aix.software.ibm.com/aix/efixes/security/suid_fix.tar
Fixed In: 7200-01-04
...

Frankly, I find it a fantastic tool that can save us a lot of time when we need to run a vulnerability check on any of our systems.

The complete list of parameters of the tool is the following:

Usage flrtvc: Change delimiter for compact reporting
 ./flrtvc.ksh -d '||'

Usage flrtvc: Generate full reporting (verbose mode)
 ./flrtvc.ksh -v

Usage flrtvc: Choose custom apar.csv file to use
 ./flrtvc.ksh -f myfile.csv

Usage flrtvc: Only show specific filesets in verbose mode
 ./flrtvc.ksh -vg printers

Usage flrtvc: Show only hiper results
 ./flrtvc.ksh -t hiper

Usage flrtvc: Custom lslpp and emgr outputs
 ./flrtvc.ksh -l lslpp.txt -e emgr.txt

Flags:

-d = Change delimiter for compact reporting
-f = Enter a custom aparCSV file in local filesystem
-q = Quiet mode, hide compact reporting header
-s = Skip download and locate 'apar.csv' filename in current directory
-v = Verbose, full report (for piping to email)
-g = Filter filesets for specific phrase, useful for verbose mode
-t = Type of APAR [hiper | sec]
-l = Enter a custom LSLPP output file, must match lslpp -Lqc
-e = Enter a custom EMGR output file, must match emgr -lv3
-x = Skip EFix processing
-a = Show all fixed and non-fixed HIPER/Security vulnerabilities.

SuSe VM Migrated from VMware

SUSE VMs migrated from VMware to other platforms (OVM, Xen, KVM, Ravello) check for compliance on startup, and give the following message (or similar):

The profile does not allow you to run the products on this system.

Hypervisor used current value: 'xen' must be one of: 'vmware'.

This forces you to go into the console and press a key to acknowledge the warning that the current setup is not supported by SUSE.

OVM - Suse VMware to OVM migrations

While this might be good to know the first time you migrate a VM, it’s a disguised “nagging screen” that limits the use of these VMs in a production setup, as reboots require manual intervention. And as we know, sometimes it is not that easy to recreate the VM from scratch in a supported setup.

To bypass this compliance check, we should find out the script which is giving us the error, so we can decide how to bypass it:

1.- First, we check the release version of our troubled system, so we can write down the solution for this particular version (in case different versions have different solutions).

SUSETEST:~ # cat /etc/*release*
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 4
LSB_VERSION="core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64"
cat: /etc/lsb-release.d: Is a directory
NAME="SLES"
VERSION="11.4"
VERSION_ID="11.4"
PRETTY_NAME="SUSE Linux Enterprise Server 11 SP4"
ID="sles"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:11:4"

2.- Second, we will try to find the script that gives us the nasty message, by launching a brute-force grep on /usr/bin:

SUSETEST:~ # grep "profile does not allow you" /usr/bin/*
isCompliant: print __("The profile does not allow you to run the products on this system.\n");

3.- Third, we have found one file called isCompliant in /usr/bin, so let’s find out more about it:

SUSETEST:~# file /usr/bin/isCompliant
/usr/bin/isCompliant: a /usr/bin/perl -w script text

It is a Perl script; we can check whether it has a manual page:

SUSETEST:~# man isCompliant

ISCOMPLIANT(1) ISCOMPLIANT(1)

NAME
isCompliant

SYNOPSIS
isCompliant [options]

DESCRIPTION
"isCompliant" check a profile with the system where products gets installed.

OPTIONS
--quiet -q
Do not print messages on stdout or stderr. Only the exit code indicate if the compliance check was successfull or
not.

--debug -d
Turn on the debug mode.

AUTHORS and CONTRIBUTORS
Duncan Mac-Vicar Prett, Michael Calmer

LICENSE
Copyright (c) 2011 SUSE LINUX Products GmbH, Nuernberg, Germany.

Or Help info:

SUSETEST:~# /usr/bin/isCompliant -h
usage: /usr/bin/isCompliant [--quiet|-q]
/usr/bin/isCompliant [--debug|-d]
/usr/bin/isCompliant [--help|-h|-?]
Options:
-h -? [--help] show this help
-q [--quiet] print no messages
-d [--debug] print debug messages
isCompliant check a profile with the system where products gets installed.
isCompliant exit with 0 if the system is compliant to the product profile.
Otherwise it exist with 1

Looks like it’s just a script that checks certain compliance requirements and returns our infamous message, and the help says that it returns RC=0 when a system is compliant, and RC=1 when it’s not.

We also see that, when run with the -q flag, it will only return an RC, without displaying messages on screen.

OK, knowing this info, let’s invoke it, and check it out:

SUSETEST:~# /usr/bin/isCompliant
The profile does not allow you to run the products on this system.

Proceeding to run this installation will leave you in an
unsupported state and might impact your compliance requirements.

The following requirements are not fulfilled on this system:
* Hypervisor used (current value: 'xen'); must be one of: 'vmware'

SUSETEST:~# echo $?
1

SUSETEST:~# /usr/bin/isCompliant -q
SUSETEST:~# echo $?
1

Bingo! This is definitely our boot-pausing friend. It gives the error message and returns RC=1. But it is not the one that forces us to press a key when the system is not compliant, so it must be invoked by another script at boot time.

Now we could do one of two things: one is replacing this script with our own “compliance checker”, and the other is finding the script that invokes isCompliant and changing it there, just in case replacing the original isCompliant breaks something else further down the line.

My recommended method is Solution 2, but the fastest to implement is Solution 1.

Solution 1.- Replace isCompliant with our own brew.

Before replacing the original script, we will keep the original, just in case it’s needed further on, and to have a rollback option.

SUSETEST:~# cp -p /usr/bin/isCompliant /usr/bin/isCompliant.orig

Now we will just replace the script with a simple “exit 0”, which will always return OK in any case.

SUSETEST:~# echo "exit 0" > /usr/bin/isCompliant
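
Because the one-liner above writes a file with no interpreter line, a slightly more explicit variant can be sketched as follows. This is only a sketch: the helper name write_compliance_stub and the scratch path under /tmp are hypothetical, chosen so that it can be tried safely before touching /usr/bin/isCompliant:

```shell
#!/bin/sh
# Hypothetical helper: write a stub script that always reports "compliant".
# The target path is a parameter, so it can be tried on a scratch file first.
write_compliance_stub() {
  target="$1"
  printf '%s\n' '#!/bin/sh' \
    '# Stub replacing the original compliance check: always exit 0' \
    'exit 0' > "$target" || return 1
  chmod 755 "$target"
}

# Try it on a scratch file (assumed path) and show the return code.
stub="/tmp/isCompliant.stub.$$"
write_compliance_stub "$stub"
sh "$stub" -q
echo "RC=$?"
rm -f "$stub"
```

Once happy with the result, the same function could be pointed at /usr/bin/isCompliant, after taking the backup shown above.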

 

Solution 2.- Modify isCompliant boot invoking script.

First we need to find out the name of the script that invokes isCompliant, so we launch another brute-force grep, this time on the /etc/init.d folder:

SUSETEST:~# grep "/usr/bin/isCompliant" /etc/init.d/*
/etc/init.d/boot.compliance: MSG=`/usr/bin/isCompliant`

And we find it, it’s called boot.compliance (so it really was pretty well self-documented).

We check the startup script, and we find the start sequence:

case "$1" in
start|restart|force-reload)
# Check if we're running in inst-sys or stage 2 of the installation - if so,
# the check must not be performed as there's already a check in YaST
if test -f /var/lib/YaST2/runme_at_boot ; then
exit
fi
echo -n "Check if the profiles matches the system"

MSG=`/usr/bin/isCompliant`
CODE=$?
if [ "$CODE" != "0" ]; then
splash=""
# Switch bootsplash to verbose mode to make text messages visible.
if test -f /proc/splash ; then
read splash < /proc/splash
echo "verbose" > /proc/splash
fi

clear

echo -e "\n"
echo -e "=========================================================================="
echo -e "\n"
echo -e "$MSG"
echo -e "\n"
echo -e "Press any key to proceed with booting."
echo -e "\n"
echo -e "=========================================================================="

read -n 1

clear
[[ "$splash" =~ silent ]] && echo silent > /proc/splash
fi

As we can see, the variable MSG runs isCompliant and captures its output, and the variable CODE collects its return code.
Then an “if” statement checks the compliance result and, if it is different from 0 (Not Compliant), shows the screen with the MSG and waits for any key to be pressed before continuing.

As usual, we will make a backup of the original config file before making any changes into the working configuration:

SUSETEST:~# cp -p /etc/init.d/boot.compliance /etc/init.d/boot.compliance.orig

So, our change is going to be replacing the “read -n 1” with a “sleep 6”, because we are going to leave the “Not Compliant” message on screen for 6 seconds at boot time, to remind us that this system is not supported by SUSE in its current configuration.

We will also comment out the “Press any key…” message, to allow for automatic rebooting of the system.

SUSETEST:~# vi /etc/init.d/boot.compliance

It will end up as follows:

case "$1" in
start|restart|force-reload)
# Check if we're running in inst-sys or stage 2 of the installation - if so,
# the check must not be performed as there's already a check in YaST
if test -f /var/lib/YaST2/runme_at_boot ; then
exit
fi
echo -n "Check if the profiles matches the system"

MSG=`/usr/bin/isCompliant`
CODE=$?
if [ "$CODE" != "0" ]; then
splash=""
# Switch bootsplash to verbose mode to make text messages visible.
if test -f /proc/splash ; then
read splash < /proc/splash
echo "verbose" > /proc/splash
fi

clear

echo -e "\n"
echo -e "=========================================================================="
echo -e "\n"
echo -e "$MSG"
echo -e "\n"
#echo -e "Press any key to proceed with booting."
echo -e "\n"
echo -e "=========================================================================="

#read -n 1
sleep 6

clear
[[ "$splash" =~ silent ]] && echo silent > /proc/splash
fi

We can save the modified file and reboot the system to check that the warning is shown for 6 seconds and that the system then carries on booting as usual.
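
If the same change has to be applied to several migrated VMs, the two edits can also be scripted instead of done by hand in vi. A minimal sketch, assuming the lines look exactly like the ones in the listing above; patch_boot_compliance is a hypothetical helper that reads one file and writes the patched copy to another, so the original is never modified in place:

```shell
#!/bin/sh
# Hypothetical helper: apply the two edits described above to a copy of the
# boot script, without opening an editor. Reads "$1" and writes "$2".
patch_boot_compliance() {
  src="$1"
  dst="$2"
  # 1st expression: comment out "read -n 1" and add "sleep 6" after it
  # 2nd expression: comment out the "Press any key" message
  sed -e 's/^\([[:space:]]*\)read -n 1[[:space:]]*$/\1#read -n 1\
\1sleep 6/' \
      -e 's/^\([[:space:]]*\)\(echo -e "Press any key\)/\1#\2/' \
      "$src" > "$dst"
}

# Example on a two-line sample file (scratch path, not the real boot script).
sample="/tmp/boot.compliance.sample.$$"
printf '%s\n' 'echo -e "Press any key to proceed with booting."' 'read -n 1' > "$sample"
patch_boot_compliance "$sample" "$sample.new"
cat "$sample.new"
rm -f "$sample" "$sample.new"
```

The patched copy can be compared with the original using diff before it replaces /etc/init.d/boot.compliance.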

So, with a little digging around the system we have found how to bypass the nagging screen and convert back our migrated VM into a workable system.

 

AIX Vulnerabilities

IBM has a tool to track and report vulnerabilities in its products, called the Fix Level Recommendation Tool (FLRT).

https://www-304.ibm.com/support/customercare/flrt/

Particularly for AIX, it has the Security APAR Information, or Security Bulletin information for AIX 7.2, 7.1, 6.1, 5.3, and VIOS

https://www-304.ibm.com/webapp/set2/flrt/doc?page=security

And to check our systems, IBM provides the flrtvc.ksh script, which produces an awesome output, in different formats.
As prerequisites, it needs:

1.- access to internet to retrieve the latest vulnerability CSV listing (aparCSV)
2.- wget
3.- curl

Points 2 & 3 are easily done if we have set up yum in our AIX system (yum install wget curl).

Some examples of flrtvc script execution:

[root@aixtest:/home/admin]./flrtvc.ksh | cut -c 1-110
Fileset|Current Version|Type|EFix Installed|Abstract|Unsafe Versions|APARs|Bulletin URL|Download URL
bos.acct|7.2.1.0|sec||NOT FIXED - (caccelstat) Vulnerabilities in bellmail / caccelstat / iostat / l
bos.acct|7.2.1.0|sec||NOT FIXED - (iostat) Vulnerabilities in bellmail / caccelstat / iostat / lquer
bos.acct|7.2.1.0|sec||NOT FIXED - (vmstat) Vulnerabilities in bellmail / caccelstat / iostat / lquer
bos.cluster.rte|7.2.1.0|hiper||NOT FIXED - CAA:SLOW GOSSIP RECEIPT ON BOOT MAY CAUSE PARTITIONED CLU
bos.mp64|7.2.1.1|hiper||NOT FIXED - getsockname() returns incorrect NameLength|7.2.1.0-7.2.1.1|IV914
bos.mp64|7.2.1.1|hiper||NOT FIXED - PROBLEMS CAN OCCUR WITH THREAD_CPUTIME AND THREAD_CPUTIME_FAST|7
bos.mp64|7.2.1.1|hiper||NOT FIXED - CRASH OR POTENTIAL DATA LOSS AFTER REMOVING LARGE JFS2 FILES ON
bos.mp64|7.2.1.1|hiper||NOT FIXED - SYSTEM CRASH WHEN USING PROCFS FOR PROCESSES CLOSING MANY FILES|
bos.mp64|7.2.1.1|sec||NOT FIXED - IBM has released AIX and VIOS iFixes in response to the vulnerabil
bos.net.tcp.bind_utils|7.2.1.1|sec||NOT FIXED - There is a vulnerability in BIND that impacts AIX.|7
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - There is a vulnerability in bellmail that impacts A
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - Vulnerabilities in BIND impact AIX|7.2.1.0|CVE-2016
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - There are two vulnerabilities in BIND that impact A
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - Vulnerability in bellmail affects AIX|7.2.1.0-7.2.1
bos.net.tcp.client_core|7.2.1.0|sec||NOT FIXED - (bellmail) Vulnerabilities in bellmail / caccelstat
bos.net.tcp.ntp|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in NTPv3 and NTPv4 that
bos.net.tcp.ntpd|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in NTPv3 and NTPv4 that
bos.net.tcp.tcpdump|7.2.1.0|sec||NOT FIXED - There are multiple vulnerabilities in tcpdump that impa
bos.rte.archive|7.2.1.0|sec||NOT FIXED - (restbyinode) Vulnerabilities in bellmail / caccelstat / io
bos.rte.lvm|7.2.1.0|sec||NOT FIXED - (lquerypv) Vulnerabilities in bellmail / caccelstat / iostat /
devices.fcp.disk.rte|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERRORS WITH CERTA
devices.pci.77102224.com|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERRORS WITH C
devices.pciex.df1060e214103404.com|7.2.1.0|hiper||NOT FIXED - UNDETECTED DATA LOSS AFTER STORAGE ERR
devices.vdevice.ibm.l-lan.rte|7.2.1.0|hiper||NOT FIXED - CRASH IN VIOENT_INIT_LS_TIMER WHEN POLL_UPL
devices.vdevice.ibm.vfc-client.rte|7.2.1.0|hiper||NOT FIXED - Potential data loss using Virtual FC w
java7_64.jre|7.0.0.370|sec||NOT FIXED - There are multiple vulnerabilities in IBM SDK Java Technolog
java7_64.sdk|7.0.0.370|sec||NOT FIXED - Multiple vulnerabilities in IBM Java SDK affect AIX|<7.0.0.4
java7_64.sdk|7.0.0.370|sec||NOT FIXED - Multiple vulnerabilities in IBM Java SDK affect AIX|<7.0.0.5
java7_64.sdk|7.0.0.370|sec||NOT FIXED - There are multiple vulnerabilities in IBM SDK Java Technolog
openssh.base.client|6.0.0.6201|sec||NOT FIXED - AIX OpenSSH Vulnerability|4.0.0.5200-6.0.0.6201|CVE-
openssh.base.client|6.0.0.6201|sec||NOT FIXED - Vulnerabilities in OpenSSH affect AIX|4.0.0.5200-6.0
openssl.base|1.0.2.800|sec||NOT FIXED - There is a vulnerability in OpenSSL used by AIX|1.0.2.500-1.
openssl.base|1.0.2.800|sec||NOT FIXED - Vulnerability in OpenSSL affects AIX|1.0.2.500-1.0.2.1100|CV
...
[root@aixtest:/home/admin]./flrtvc.ksh -v | pg
////////////////////////////////////////////////////////////
// IBM FLRTVC (v0.7.3) Report
// Server: aixtest
// Date: Fri Feb 9 10
// Report by: root
// Vulnerable Filesets: 22
// Total Vulnerabilities: 54
// Total Fixes (not shown): 22
////////////////////////////////////////////////////////////

--------------------------------------------------------------------------------
bos.acct - 7.2.1.0 - Vulnerabilities (3)
--------------------------------------------------------------------------------

(1) NOT FIXED - (caccelstat) Vulnerabilities in bellmail / caccelstat / iostat / lquerypv / restbyinode / vmstat affect AIX (CVE-2017-1692)

Type: sec
Score: CVE-2017-1692:8.4
Versions: 7.2.1.0-7.2.1.0
APARs/CVEs: IV97811
Last Update: 02/05/2018
Bulletin: http://aix.software.ibm.com/aix/efixes/security/suid_advisory.asc
Download: ftp://aix.software.ibm.com/aix/efixes/security/suid_fix.tar
Fixed In: 7200-01-04

(2) NOT FIXED - (iostat) Vulnerabilities in bellmail / caccelstat / iostat / lquerypv / restbyinode / vmstat affect AIX (CVE-2017-1692)
Type: sec
Score: CVE-2017-1692:8.4
Versions: 7.2.1.0-7.2.1.1
APARs/CVEs: IV97898
Last Update: 02/05/2018
Bulletin: http://aix.software.ibm.com/aix/efixes/security/suid_advisory.asc
Download: ftp://aix.software.ibm.com/aix/efixes/security/suid_fix.tar
Fixed In: 7200-01-04
...

It really is a great tool, that can save us a lot of time when a vulnerability check is needed in our systems.

For a full usage of the tool:

Usage flrtvc: Change delimiter for compact reporting
 ./flrtvc.ksh -d '||'

Usage flrtvc: Generate full reporting (verbose mode)
 ./flrtvc.ksh -v

Usage flrtvc: Choose custom apar.csv file to use
 ./flrtvc.ksh -f myfile.csv

Usage flrtvc: Only show specific filesets in verbose mode
 ./flrtvc.ksh -vg printers

Usage flrtvc: Show only hiper results
 ./flrtvc.ksh -t hiper

Usage flrtvc: Custom lslpp and emgr outputs
 ./flrtvc.ksh -l lslpp.txt -e emgr.txt

Flags:

-d = Change delimiter for compact reporting
-f = Enter a custom aparCSV file in local filesystem
-q = Quiet mode, hide compact reporting header
-s = Skip download and locate 'apar.csv' filename in current directory
-v = Verbose, full report (for piping to email)
-g = Filter filesets for specific phrase, useful for verbose mode
-t = Type of APAR [hiper | sec]
-l = Enter a custom LSLPP output file, must match lslpp -Lqc
-e = Enter a custom EMGR output file, must match emgr -lv3
-x = Skip EFix processing
-a = Show all fixed and non-fixed HIPER/Security vulnerabilities.
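
Since the compact report is plain delimited text, it is easy to post-process. A minimal sketch, assuming the default '|' delimiter and the column order shown in the report header (Type is the third field). The function name count_by_type is hypothetical, and the sample lines are abbreviated versions of the report lines shown earlier:

```shell
#!/bin/sh
# Hypothetical helper: count report lines per APAR type (sec / hiper).
# Reads the compact flrtvc report on stdin and skips the header line,
# so it assumes the report was NOT generated with -q.
count_by_type() {
  awk -F'|' 'NR > 1 && NF >= 3 { n[$3]++ }
             END { for (t in n) print t, n[t] }' | sort
}

# Example with a few abbreviated lines in the same format as the report.
count_by_type <<'EOF'
Fileset|Current Version|Type|EFix Installed|Abstract|Unsafe Versions|APARs|Bulletin URL|Download URL
bos.acct|7.2.1.0|sec||NOT FIXED - (iostat) Vulnerabilities in bellmail
bos.mp64|7.2.1.1|hiper||NOT FIXED - getsockname() returns incorrect NameLength
bos.mp64|7.2.1.1|sec||NOT FIXED - IBM has released AIX and VIOS iFixes
EOF
```

On a real system the report would be piped in directly, for instance: ./flrtvc.ksh | count_by_type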

6.- Advanced logrotate for AIX

The most powerful facilities provided by logrotate are prerotate, postrotate & endscript, and we can make good use of these facilities to implement “complex” log rotation schedules.

We will do a logrotate setup to perform log rotation following IBM recommendations for AIX v7.2, and for that we can resort to the following KBs in IBM Support Knowledgecenter:

/ (root) overflow
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.osdevice/fsrootover.htm

Resolving overflows in the /var file system
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.osdevice/fsvarover.htm

6.1.- logrotate failedlogin

IBM states in the official documentation that failedlogin is a binary log, and therefore the who utility must be used to see its entries, as follows:

[root@aix72:/]who /etc/security/failedlogin
root vty0 Oct 20 18:03
UNKNOWN_ vty0 Oct 20 18:03
UNKNOWN_ ssh Oct 26 10:22 (10.1.15.12)
root ssh Oct 26 10:22 (10.1.15.12)
root ssh Nov 03 10:38 (10.20.30.129)
root ssh Nov 05 11:49 (srv.dom.myanet)
root ssh Nov 12 11:03 (tsmsrv)
root vty0 Nov 19 07:47
root ssh Nov 21 10:25 (10.20.120.72)
root ssh Jan 25 05:37 (10.20.130.229)
tsminst1 ssh Feb 13 10:03 (loopback)
tsminst1 ssh Feb 13 10:05 (loopback)
root ssh Feb 27 16:45 (10.1.15.214)
root pts/2 Mar 07 15:11 (10.1.165.159)
UNKNOWN_ pts/2 Mar 07 15:11 (10.1.165.159)
...

So we check the file and its permissions:

[root@aix72:/]ls -l /etc/security/failedlogin
-rw-rw---- 1 root system 14256 Mar 07 15:15 /etc/security/failedlogin

And we take note of the 660 access rights and user root group system.

Well, we can see that this is an interesting log to keep (and also a log that can grow large in size if there is a problem and we have a lot of terminal users), so using logrotate we can just rotate it by size, say 5 MB each log, keep 3 copies (keep the original log and rotate 2 more versions) and also keep it compressed, so we write the following file: /etc/logrotate.d/failedlogin

# logrotate config for failedlogin which logs failed login sessions in binary form, can be used to detect brute-force attacks. Read with "who".
/etc/security/failedlogin {
  size 5M
  compress
  rotate 2
  create 660 root system
}

And now we check that we don’t have any typos or problems on the logrotate config of the file just created:

[root@aix72:/]/usr/sbin/logrotate -vf /etc/logrotate.d/failedlogin
reading config file /etc/logrotate.d/failedlogin

Handling 1 logs

rotating pattern: /etc/security/failedlogin forced from command line (2 rotations)
empty log files are rotated, old logs are removed
considering log /etc/security/failedlogin
log does not need rotating

NOTE: If we want to force this rotation, then we could change the size 5M for size 10k, in which case the file will be rotated as it is 14k in size.
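
The size test can be approximated by hand, which is handy for predicting whether a given threshold would trigger a rotation. A rough sketch, where needs_rotation is a hypothetical helper, the threshold is given in bytes, and a scratch file of the same size as our failedlogin stands in for the real log:

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) when file "$1" is larger than
# "$2" bytes, roughly what a "size" threshold means to logrotate.
needs_rotation() {
  [ -r "$1" ] || return 2
  bytes=$(wc -c < "$1" | tr -d ' ')
  [ "$bytes" -gt "$2" ]
}

# Example: a 14256-byte scratch file against a 5M and a 10k threshold.
f="/tmp/failedlogin.sample.$$"
dd if=/dev/zero of="$f" bs=14256 count=1 2>/dev/null
if needs_rotation "$f" $((5 * 1024 * 1024)); then echo "5M: rotate"; else echo "5M: keep"; fi
if needs_rotation "$f" $((10 * 1024)); then echo "10k: rotate"; else echo "10k: keep"; fi
rm -f "$f"
```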

6.2.- logrotate wtmp

IBM states in the official documentation, that wtmp is a binary log, and therefore IBM recommends using the utility fwtmp to convert the binary log to an ASCII log, as follows:

Export the wtmp log to an ASCII copy called /tmp/wtmp-delete.me:

[root@aix72:/]/usr/sbin/acct/fwtmp < /var/adm/wtmp > /tmp/wtmp-delete.me

Now we can just read the file /tmp/wtmp-delete.me with vi, emacs, nano, or whichever editor we fancy:

[root@aix72:/]cat /tmp/wtmp-delete.me
clcomd   clcomd                       5 7733518 0000 0000 1488218371                                  Mon Feb 27 11:59:31 CST 2017
xmdaily  xmdaily                      5 8192296 0000 0000 1488218371                                  Mon Feb 27 11:59:31 CST 2017
ctrmc    ctrmc                        5 10944850 0000 0000 1488218371                                 Mon Feb 27 11:59:31 CST 2017
spectrum spectrum                     5 11075926 0000 0000 1488218371                                 Mon Feb 27 11:59:31 CST 2017
tsmcc    tsmcc                        5 11141464 0000 0000 1488218371                                 Mon Feb 27 11:59:31 CST 2017
ha_star  ha_star                      5 11010388 0000 0000 1488218371                                 Mon Feb 27 11:59:31 CST 2017
         pts/1          pts/1         8 8454482 0000 0000 1488245570                                  Mon Feb 27 19:32:50 CST 2017
root     pts/0          pts/0         7 15532358 0000 0000 1488305060 10.1.165.159                    Tue Feb 28 12:04:20 CST 2017
root     pts/1          pts/1         7 15860100 0000 0000 1488305690 10.1.165.159                    Tue Feb 28 12:14:50 CST 2017
root     pts/2          pts/2         7 15663394 0000 0000 1488323898 10.1.165.159                    Tue Feb 28 17:18:18 CST 2017
 ...

This log records all valid terminal logins (logins, rlogins and telnet), with date and time, the terminal used and the remote IP, so it is also an interesting log to keep. (It can also be read with who; try “who /var/adm/wtmp”.)

Now to check the file and its permissions:

[root@aix72:/]ls -l /var/adm/wtmp
 -rw-rw-r-- 1 adm adm 846288 Mar 07 15:12 /var/adm/wtmp

Aha, this one has different attributes from the one on the step 6.1, we take note of the 664 access rights and user adm group adm.

NOTE: If we wanted to process wtmp log, say to keep the last 1000 lines, and truncating the rest, we could process the ASCII log and convert back to the /var/adm/wtmp binary log using the utility fwtmp like this:

/usr/bin/tail -1000 /tmp/wtmp-delete.me | /usr/sbin/acct/fwtmp -ic > /var/adm/wtmp

Not forgetting to cleanup the temp file afterwards:

rm /tmp/wtmp-delete.me

This used to be a common practice to keep the wtmp log under control in AIX, but it is not really needed anymore: using logrotate we can just rotate wtmp by size, say 5 MB each log, and keep 2 copies (the original log plus 1 rotated version). So we write the following file, /etc/logrotate.d/wtmp:

# logrotate config for wtmp which logs all logins, rlogins and telnet sessions in binary form
 /var/adm/wtmp {
 size 5M
 rotate 1
 create 664 adm adm
 }

And now we check that we don’t have any typos or problems on the logrotate config of the file just created:

[root@aix72:/]/usr/sbin/logrotate -vf /etc/logrotate.d/wtmp
 reading config file /etc/logrotate.d/wtmp

Handling 1 logs

rotating pattern: /var/adm/wtmp forced from command line (1 rotation)
 empty log files are rotated, old logs are removed
 considering log /var/adm/wtmp
 error: skipping "/var/adm/wtmp" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation.

Hey!!! This time we have an error! This is because the parent directory is writable by a group other than root, so logrotate expects to be told which user and group to use through the “su” directive. We check the directory, change our config file accordingly, and try again:

[root@aix72:/home/admin]ls -la /var/adm
 total 4960
 drwxrwxr-x 15 root adm 4096 Mar 03 19:25 . <-- it's owned by root/adm

Add the su directive to the config file:

# logrotate config for wtmp which logs all logins, rlogins and telnet sessions in binary form
 /var/adm/wtmp {
 su root adm
 size 5M
 rotate 1
 create 664 adm adm
 }
[root@aix72:/]/usr/sbin/logrotate -vf /etc/logrotate.d/wtmp
 reading config file /etc/logrotate.d/wtmp

Handling 1 logs

rotating pattern: /var/adm/wtmp forced from command line (1 rotation)
 empty log files are rotated, old logs are removed
 switching euid to 4 and egid to 4
 considering log /var/adm/wtmp
 log does not need rotating
 switching euid to 0 and egid to 0

This time, it has worked fine, and also it tells us that it has switched user to userid 0 / groupid 4 and then back to the original (0/0). So now we know more things about logrotate’s output. Good.

Now we also know that we need to check the owners on the log’s parent directory as well in case we need to add the su directive, and we will from now on.
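
That check of the parent directory can be scripted. The sketch below mirrors the wording of the error message (world writable, or writable by a group that is not root); parent_dir_is_insecure is a hypothetical helper, and it only approximates what logrotate does internally:

```shell
#!/bin/sh
# Hypothetical helper: succeed (return 0) when the parent directory of the
# log "$1" is world writable, or group writable by a group other than
# root (gid 0), i.e. when a "su" directive is likely to be needed.
parent_dir_is_insecure() {
  dir=$(dirname "$1")
  # "ls -ldn" prints numeric ids: field 1 = mode string, field 4 = gid
  info=$(ls -ldn "$dir" | awk '{ print $1, $4 }')
  mode=${info% *}
  gid=${info#* }
  group_w=$(printf '%s' "$mode" | cut -c6)   # group write bit
  other_w=$(printf '%s' "$mode" | cut -c9)   # world write bit
  [ "$other_w" = "w" ] && return 0
  [ "$group_w" = "w" ] && [ "$gid" -ne 0 ] && return 0
  return 1
}

# Example on a scratch directory (assumed path), first 777 and then 755.
d="/tmp/logdir.sample.$$"
mkdir "$d" && chmod 777 "$d"
parent_dir_is_insecure "$d/wtmp" && echo "su directive needed"
chmod 755 "$d"
parent_dir_is_insecure "$d/wtmp" || echo "no su directive needed"
rmdir "$d"
```

On the AIX host it would be called with the real log path, for instance: parent_dir_is_insecure /var/adm/wtmp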

6.3.- logrotate errlog

Errlog is a binary circular log, and therefore cannot really be rotated. So managing this log in particular with logrotate is not recommended.

By default, root’s crontab comes with the following 2 entries to trim old error log entries:

0 11 * * * /usr/bin/errclear -d S,O 30 <-- delete Software & errlogger msgs (Other) older than 30 days
0 12 * * * /usr/bin/errclear -d H 90   <-- delete Hardware errors older than 90 days

Error messages get overwritten by new ones as they are generated, and there are system admins who do not even use errclear to trim the log, since they prefer to keep a history of all the HW errors ever generated on a server.

However, nowadays, with virtualized hardware, it is not that relevant to keep all errors ever registered, and I find it good practice to enable the default errclear entries in crontab, as well as skulker to clean old log & temp system files.

The errlog size can, however, be increased if need be. To see the current errlog size:

[root@aix72:/]/usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                1048576 bytes				<-- 1MB default value
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging       off
PureScale Logstream     CentralizedRAS/Errlog

To increase the size:

[root@aix72:/]/usr/lib/errdemon -s 2097152
[root@aix72:/]/usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                /var/adm/ras/errlog
Log Size                2097152 bytes				<-- increased to 2MB
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging       off
PureScale Logstream     CentralizedRAS/Errlog
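
If we want to script the resize, the current size can be read from the errdemon -l output shown above. This is a minimal sketch that only prints the command it would run; the function names errlog_size and suggest_double are hypothetical, and doubling is just an example policy:

```shell
# Hypothetical helper: reads "errdemon -l" output on stdin and prints
# the current Log Size in bytes.
errlog_size() {
    awk '/^Log Size/ { print $3 }'
}

# Hypothetical helper: prints (does NOT run) the errdemon command that
# would double the current log size.
suggest_double() {
    size=$(errlog_size)
    [ -n "$size" ] || return 1      # no "Log Size" line found on stdin
    echo "/usr/lib/errdemon -s $((size * 2))"
}

# Usage on the server:
#   /usr/lib/errdemon -l | suggest_double
```

Printing the command instead of running it leaves the final decision (and the copy & paste) to the admin.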

NOTE: The errlog daemon ( /usr/lib/errdemon ) can be stopped using the command /usr/lib/errstop. One VERY important thing to take into account is that the errlog must not be zeroed (a procedure often used to clear other logs), otherwise the daemon will not start. So, let’s try it to see what happens :o)

[root@aix72:/var/adm/ras]cp -p /var/adm/ras/errlog /var/adm/ras/errlog.bak <-- we do a backup of it first!!!

[root@aix72:/var/adm/ras]> /var/adm/ras/errlog       <-- we zero the file

[root@aix72:/var/adm/ras]/usr/lib/errdemon	     <-- and we try to start the errdemon...
0315-180 logread: UNEXPECTED EOF		     <-- the only entry in the log is the End Of File (it is a zero file)
0315-171 Unable to process the error log file /var/adm/ras/errlog.
errdemon:
0315-001 Failure to open the logfile '/var/adm/ras/errlog' for writing.
Possible causes are:
1. The file exists but the invoking process does not have write
   permission.
2. The file exists but the directory '/var/adm/ras' does not have write
   permission.
3. The file exists but it is not a valid error logfile.  Remove

4. The file does exist and the directory '/var/adm/ras' does not have enough
space available. The minimum logfile size is 8192 bytes.

[root@aix72:/var/adm/ras]cp -p /var/adm/ras/errlog.bak /var/adm/ras/errlog <-- restore the backup to be able to start the errdemon again
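
A small pre-flight check before starting the errdemon can catch a zeroed errlog. This is only a sketch, and check_errlog is a made-up name; it tests nothing more than "the file exists and is not empty", which is the failure reproduced above:

```shell
# Hypothetical pre-flight check: warn if the errlog is missing or zero-length.
# To empty the log the supported way, errclear is the tool
# (errclear 0 deletes all entries), not "> errlog".
check_errlog() {
    log=${1:-/var/adm/ras/errlog}
    if [ -s "$log" ]; then
        echo "OK: $log is not empty"
    else
        echo "WARNING: $log is missing or zero-length; restore it from a backup"
        return 1
    fi
}

# Usage on the server:
#   check_errlog && /usr/lib/errdemon
```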

6.4.- logrotate sulog

Now we will set up rotation for the su command’s log ( sulog ):

/var/adm/sulog

We do the file, permission & parent directory checks:

[root@aix72:/]ls -l /var/adm/sulog
-rw-------    1 root     system         4693 Mar 03 18:24 /var/adm/sulog
[root@aix72:/]ls -la /var/adm/.
total 5208
drwxrwxr-x   15 root     adm            4096 Mar 20 22:36 .

The file has 600 access rights and belongs to user root, group system. Its parent directory belongs to user root, group adm, and is group-writable, so we will have to use the “su” directive:

# logrotate config for su's log (sulog).
/var/adm/sulog {
  su root adm
  size 5M
  compress
  rotate 2
  create 600 root system
}

[root@aix72:/]/usr/sbin/logrotate -vf /etc/logrotate.d/sulog
reading config file /etc/logrotate.d/sulog
Handling 1 logs
rotating pattern: /var/adm/sulog  forced from command line (2 rotations)
empty log files are rotated, old logs are removed
switching euid to 0 and egid to 4
considering log /var/adm/sulog 
 log needs rotating
rotating log /var/adm/sulog, log->rotateCount is 2
dateext suffix '-20170320'
glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
renaming /var/adm/sulog.2.gz to /var/adm/sulog.3.gz (rotatecount 2, logstart 1, i 2),
old log /var/adm/sulog.2.gz does not exist
renaming /var/adm/sulog.1.gz to /var/adm/sulog.2.gz (rotatecount 2, logstart 1, i 1),
old log /var/adm/sulog.1.gz does not exist
renaming /var/adm/sulog.0.gz to /var/adm/sulog.1.gz (rotatecount 2, logstart 1, i 0),
old log /var/adm/sulog.0.gz does not exist
log /var/adm/sulog.3.gz doesn't exist -- won't try to dispose of it
renaming /var/adm/sulog to /var/adm/sulog.1
creating new /var/adm/sulog mode = 0600 uid = 0 gid = 0
compressing log with: /bin/gzip
switching uid to 0 and gid to 4
switching euid to 0 and egid to 0
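
The create line in the stanza above has to match what ls -l reported for the log (mode 600, root, system). As a sketch, the create directive can be derived from the ls -l fields. The function names are hypothetical, and special bits (setuid, setgid, sticky) are not carried over to a leading octal digit:

```shell
# Hypothetical helper: convert an ls-style permission string (e.g. -rw-------)
# into its three-digit octal form (e.g. 600).
perms_to_octal() {
    p=$1
    result=""
    for start in 2 5 8; do
        triplet=$(printf '%s' "$p" | cut -c"$start"-"$((start + 2))")
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac
        case $triplet in ??x|??s|??t) digit=$((digit + 1)) ;; esac
        result="$result$digit"
    done
    echo "$result"
}

# Hypothetical helper: takes the first fields of an "ls -l" line
# (perms links owner group ...) and prints a matching create directive.
create_directive() {
    echo "create $(perms_to_octal "$1") $3 $4"
}

# Usage on the server:
#   create_directive $(ls -l /var/adm/sulog)
```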

6.5.- logrotate syslog

Syslog is an AIX classic, and the only log producer that got upgraded a while ago: it has had its own “decent” log rotation & compression configuration since at least AIX version 6.1. Therefore the recommended log rotation method is to use syslog’s own configuration file and not logrotate’s.

For the official documentation on syslog from AIX v7.2, you can refer to:
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.cmds5/syslogd.htm

To apply a weekly log rotation, with 5 log versions and compression enabled, just edit the /etc/syslog.conf file and change the *.info entry to the following:

 *.info /var/log/syslog.log rotate time 1w files 5 compress     #weekly rotation, 5 files, compressed

Just remember to refresh the daemon after making the modifications in the config file:

[root@aix72:/] refresh -s syslogd
0513-095 The request for subsystem refresh was completed successfully.
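
Before the refresh, it is worth checking that the destination files named in /etc/syslog.conf exist, since as far as I know the AIX syslogd expects the file to be there already and will not create it. A rough sketch (syslog_files and missing_syslog_files are made-up names; it only looks at entries whose destination is an absolute path):

```shell
# Hypothetical helper: reads syslog.conf text on stdin and prints the
# destinations that are local files (second field starting with "/").
syslog_files() {
    awk '!/^[[:space:]]*#/ && $2 ~ /^\// { print $2 }'
}

# Hypothetical helper: reports the destinations that do not exist yet.
missing_syslog_files() {
    syslog_files | while read -r f; do
        [ -e "$f" ] || echo "missing: $f"
    done
}

# Usage on the server:
#   missing_syslog_files < /etc/syslog.conf
#   touch /var/log/syslog.log      # for each file reported as missing
```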

Well chaps, I think that is enough logrotate for AIX for now. There are examples of common logrotate configs that I would like to post in the future, but for the moment I think that this is a good end of series for Logrotate.

As always: Thanx for reading!

Logrotate 4 & 5.- Support & Common Errors

NOTE:  This is a follow-up, from the previous post: Logrotate 3.- Logrotate checks

4.- Logrotate Support

Disclaimer (IBM Unsupported):  IBM’s stance on open source utilities is that they are not directly supported by IBM. This is IBM Support’s page for logrotate (dated 06 June 2011):

http://www-01.ibm.com/support/docview.wss?uid=isg3T1012796

So, IBM will not provide any PMR Support for Open Source Software (and this is completely logical, as it’s not an IBM product), but you can still get community-based support in the developerWorks forums. For this forum-based support, you can go to:

IBMDeveloperWorks: Forum Directory > dW > AIX and UNIX > Forum: AIX Open Source Software

And in that forum, there is a specific YUM topic:

IBMDeveloperWorks: Forum Directory > dW > AIX and UNIX > Forum: AIX Open Source Software > Topic: yum for AIX Toolbox

5.- Fixing logrotate errors

5.1.- config file logrotate.conf errors

[root@aix72:/home/admin]logrotate -vf /etc/logrotate.conf
error: cannot stat /etc/logrotate.conf: A file or directory in the path name does not exist.

Cannot stat means that the config file is NOT FOUND, so check that /etc/logrotate.conf exists and has the right access rights (if it doesn’t exist, look for it in /opt/freeware/etc and copy it to /etc)

5.2.- config directory logrotate.d errors

[root@aix72:/home/admin]logrotate -vf /etc/logrotate.conf
reading config file /etc/logrotate.conf
including /etc/logrotate.d
error: cannot stat /etc/logrotate.d: A file or directory in the path name does not exist.
removing last 0 log configs

Cannot stat means that the config directory is NOT FOUND, so check that /etc/logrotate.d exists and has the right access rights (if it doesn’t exist, look for it in /opt/freeware/etc and copy it to /etc)
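
Both "cannot stat" fixes (5.1 and 5.2) follow the same pattern, so they can be sketched with one small function. restore_if_missing is a hypothetical name; it never overwrites something that already exists in the destination:

```shell
# Hypothetical helper: copy a config file or directory into place
# only if the destination does not exist yet.
restore_if_missing() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "present: $dst"
    elif [ -e "$src" ]; then
        # -p keeps owner/mode/times, -R also handles the logrotate.d directory
        cp -pR "$src" "$dst" && echo "copied: $src -> $dst"
    else
        echo "not found: $src" >&2
        return 1
    fi
}

# Usage on the server (as root):
#   restore_if_missing /opt/freeware/etc/logrotate.conf /etc/logrotate.conf
#   restore_if_missing /opt/freeware/etc/logrotate.d    /etc/logrotate.d
```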

5.3.- files in directory logrotate.d errors

[root@aix72:/opt/freeware/etc]logrotate -v /etc/logrotate.conf
reading config file /etc/logrotate.conf
including /etc/logrotate.d
reading config file yum
error: yum:6 unknown group 'root'
error: found error in /var/log/yum.log , skipping
removing last 1 log configs
error: /etc/logrotate.conf:23 unknown group 'utmp'
error: found error in /var/log/wtmp , skipping
removing last 1 log configs
error: /etc/logrotate.conf:31 unknown group 'utmp'
error: found error in /var/log/btmp , skipping
removing last 1 log configs

Handling 0 logs

These errors usually appear right after installing logrotate on AIX, since the default config files require some modifications:

error: yum:6 unknown group 'root'
error: found error in /var/log/yum.log , skipping

It complains about line 6 of the /etc/logrotate.d/yum file, since in AIX there isn’t a “root” group (it is “system”), so change the file from:

/var/log/yum.log {  
  missingok 
  notifempty 
  size 30k 
  yearly 
  create 0600 root root 
}

to:

/var/log/yum.log {
  missingok
  notifempty
  size 30k
  yearly
  create 0600 root system
}

error: /etc/logrotate.conf:23 unknown group 'utmp'
error: found error in /var/log/wtmp , skipping

It complains about line 23 of the /etc/logrotate.conf file, since in AIX there isn’t a “utmp” group. In fact, wtmp is not located in /var/log/wtmp but in /var/adm/wtmp. In any case, we can just refer to the steps in 2.1 and fix it by deleting the wtmp lines in /etc/logrotate.conf.

error: /etc/logrotate.conf:31 unknown group 'utmp'
error: found error in /var/log/btmp , skipping

It complains about line 31 of the /etc/logrotate.conf file, since in AIX there isn’t a “utmp” group, and in fact AIX does not have a btmp at all, so we can just refer to the steps in 2.1 and fix it by deleting the btmp lines in /etc/logrotate.conf.
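
These "unknown group" errors can also be spotted before running logrotate. The sketch below looks for create directives whose group is not in /etc/group. The function name check_create_groups is made up, and it only understands the plain "create mode owner group" form on a line of its own:

```shell
# Hypothetical helper: list "create mode owner group" directives in a
# logrotate config whose group is not defined in the group file.
check_create_groups() {
    conf=$1
    groupfile=${2:-/etc/group}
    awk '$1 == "create" && NF >= 4 { print NR, $4 }' "$conf" |
    while read -r lineno group; do
        if ! grep -q "^${group}:" "$groupfile"; then
            echo "$conf:$lineno unknown group '$group'"
        fi
    done
}

# Usage on the server:
#   for f in /etc/logrotate.conf /etc/logrotate.d/*; do check_create_groups "$f"; done
```

It reads /etc/group directly, so it will miss groups that only exist in LDAP or another remote registry.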

 

That covers the most common Logrotate config errors in AIX. I’m sure that you will find some more obscure ones to entertain yourself with, as it is often the case!

In the next post, it will be time for step 6.- Advanced Logrotate for AIX.  See you then, and thanks for reading!

Logrotate 3.- Logrotate checks

NOTE:  This is a follow-up, from the previous post:  Logrotate 2.- Configure logrotate for AIX

To check that logrotate is configured and working OK, all we need to do is call logrotate from the command line, telling it to be verbose about its internal checks ( -v ) and which config file to check ( /etc/logrotate.conf ), like the following:

[root@aix72:/home/admin]/usr/sbin/logrotate -v /etc/logrotate.conf
reading config file /etc/logrotate.conf
including /etc/logrotate.d      
reading config file failedlogin 
reading config file sysadmin    
reading config file wtmp        
reading config file yum         

Handling 6 logs

rotating pattern: /etc/security/failedlogin 5242880 bytes (2 rotations)
empty log files are rotated, old logs are removed
considering log /etc/security/failedlogin
 log does not need rotating        

rotating pattern: /home/admin/log/check_all.log 1048576 bytes (2 rotations)
empty log files are rotated, old logs are removed
considering log /home/admin/log/check_all.log 
 log does not need rotating         

rotating pattern: /var/adm/wtmp 5242880 bytes (2 rotations)
empty log files are rotated, old logs are removed
switching euid to 4 and egid to 4
considering log /var/adm/wtmp 
 log does not need rotating   
switching euid to 0 and egid to 0

rotating pattern: /var/log/yum.log yearly (4 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/yum.log 
 log does not need rotating      

Once we have checked that the config is OK, we can check the rotation itself by forcing it with the -f or --force flag:

[root@aix72:/etc/logrotate.d]logrotate -vf /etc/logrotate.conf
 reading config file /etc/logrotate.conf
 including /etc/logrotate.d
 reading config file failedlogin
 reading config file sysadmin
 reading config file wtmp
 reading config file yum
 Handling 6 logs

rotating pattern: /home/admin/log/check_all.log forced from command line (2 rotations)
 empty log files are rotated, old logs are removed
 considering log /home/admin/log/check_all.log
 log does not need rotating

rotating pattern: /home/admin/log/start_all.log forced from command line (1 rotations)
 empty log files are rotated, old logs are removed
 considering log /home/admin/log/start_all.log
 log does not need rotating

rotating pattern: /home/admin/log/stop_all.log forced from command line (1 rotations)
 empty log files are rotated, old logs are removed
 considering log /home/admin/log/stop_all.log
 log does not need rotating

rotating pattern: /var/log/yum.log forced from command line (4 rotations)
 empty log files are not rotated, old logs are removed
 considering log /var/log/yum.log log needs rotating rotateCount is 4
 dateext suffix '-20170226'
 glob pattern '-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
 glob finding old rotated logs failed
 renaming /var/log/yum.log to /var/log/yum.log-20170226
 creating new /var/log/yum.log mode = 0600 uid = 0 gid = 0

Logrotate is configured OK and it seems to work fine, so if it’s not executing properly, we will have to check its schedule in the crontab.

NOTE: When the rotation is configured on size, the --force option may not force the rotation (see the “log does not need rotating” lines above). To force rotation on stanzas where size has been used, just lower the size attribute temporarily (size 10k instead of 5M, for example).
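
Instead of editing the real config by hand, the size can be lowered on a temporary copy. This is only a sketch: lower_size is a hypothetical name, it rewrites every "size" line it finds, and running logrotate against the copy really does rotate the logs:

```shell
# Hypothetical helper: print a copy of a logrotate config with every
# "size ..." line replaced by the given value (e.g. 1k).
lower_size() {
    sed 's/^\([[:space:]]*size[[:space:]][[:space:]]*\).*$/\1'"$2"'/' "$1"
}

# Usage on the server:
#   lower_size /etc/logrotate.d/sulog 1k > /tmp/sulog.test
#   /usr/sbin/logrotate -vf /tmp/sulog.test
#   rm /tmp/sulog.test
```

The original file in /etc/logrotate.d is never touched, so there is nothing to remember to change back.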

3.1.- Logrotate individual files/logs check

To check logrotate’s config for a particular file, we will have to identify it first in the /etc/logrotate.d directory, for example to check the config for yum’s logs:

[root@aix72:/etc/logrotate.d]logrotate -vf /etc/logrotate.d/yum
reading config file /etc/logrotate.d/yum

Handling 1 logs

rotating pattern: /var/log/yum.log forced from command line (no old logs will be kept)
empty log files are not rotated, old logs are removed
considering log /var/log/yum.log
 log does not need rotating

If we want to check the config for a specific log, but there is no logrotate file named after it in /etc/logrotate.d, we will have to dig it out (for example, let’s look for start_all.log):

[root@aix72:/home/admin]grep start_all.log /etc/logrotate.d/*
/etc/logrotate.d/sysadmin:/home/admin/log/start_all.log

OK, so it looks like the logrotate config for start_all.log resides in the /etc/logrotate.d/sysadmin file, so now we can check it:

[root@aix72:/etc/logrotate.d]logrotate -vf /etc/logrotate.d/sysadmin
reading config file /etc/logrotate.d/sysadmin

Handling 3 logs

rotating pattern: /home/admin/log/check_all.log forced from command line (2 rotations)
empty log files are rotated, old logs are removed
considering log /home/admin/log/check_all.log
 log does not need rotating

rotating pattern: /home/admin/log/start_all.log forced from command line (1 rotations) 
empty log files are rotated, old logs are removed
considering log /home/admin/log/start_all.log
 log does not need rotating

rotating pattern: /home/admin/log/stop_all.log forced from command line (1 rotations)
empty log files are rotated, old logs are removed
considering log /home/admin/log/stop_all.log
 log does not need rotating

So, as always, an important part of a configuration (the most important, actually) is to check that our new config works just as we expect.

And now we have seen how to check the whole logrotate config, how to force the log rotation, and how to check individual logrotate config files, so with these three checks we should be able to perform config-test-change-retest until our friend logrotate does what we expect it to.

In step 4, I will talk about logrotate documentation & support, and step 5 will show how to fix common logrotate errors. See you soon.
