Cannot create a failover image for group on vSphere Replication Server ‘vsra01.corp.ad’ (address ‘127.0.0.1’)

Greetings, guys.

Well, I am back with another interesting issue today. Again, it's related to vSphere Replication.

At a customer site, an application team wanted to recover their VMs at the DR site. These VMs were being replicated with vSphere Replication.

The requirement was simple. The user checked the replication status, which was OK, and ran Sync Now to sync the latest data. The status changed from Incremental Sync back to OK.

The next step was to recover the VM at the DR site. The user logged in to the DR site vSphere Replication UI, selected the VM and ran Recovery.

To his surprise, the operation failed with the error “Cannot create a failover image for group ‘GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce’ on vSphere Replication Server ‘virtualMI01-1-vsra01.corp.ad.sbi’ (address ‘127.0.0.1’)”.

I had seen this error in the past in cases where the VM files recorded in the vSphere Replication database did not match the files in the destination datastore.

To verify this, I logged on to the relevant HBR (vSphere Replication Server) appliance. I listed the files with the command below and compared them with the files in the destination datastore.

root@virtualMI01-1-vsra01 [ ~ ]# /usr/bin/hbrsrv-replicainfo.sh -p -d /etc/vmware/hbrsrv.85.db | grep #########_MANAGER_DB01_PROD
15|#########_MANAGER_DB01_PROD_9.vmdk|RDID-4df1ec0b-f733-4a85-aab4-3828f20be5c9||26|5|1|0|0|1|0|2147483647|133674|255|63|0|0|0|0|0||0|0
16|#########_MANAGER_DB01_PROD_6.vmdk|RDID-3bcc25fb-a4bc-4f8f-af17-54ca6630c3f9||27|5|1|0|1876705280|53|0|2147483647|3916|255|63|0|0|0|0|0||0|0
17|#########_MANAGER_DB01_PROD_11.vmdk|RDID-e73b7a03-329b-4323-8d87-b473b2e030f4||28|5|1|0|10674176|3|0|2147483647|13054|255|63|0|0|0|0|0||0|0
18|#########_MANAGER_DB01_PROD_1.vmdk|RDID-da635056-b3b0-48a2-b7d4-1d194e1062cf||29|5|1|0|1400832|3|0|2147483647|26108|255|63|0|0|0|0|0||0|0
19|#########_MANAGER_DB01_PROD_2.vmdk|RDID-88f0c156-a1f1-4b84-b854-79bc6cffc896||30|5|1|0|4446511104|161|0|2147483647|267349|255|63|0|0|0|0|0||0|0
20|#########_MANAGER_DB01_PROD_3.vmdk|RDID-f1bb2563-a3df-491b-b191-b2cfcba14c5e||31|5|1|0|2564644864|121|0|2147483647|267349|255|63|0|0|0|0|0||0|0
21|#########_MANAGER_DB01_PROD_4.vmdk|RDID-bb80acb7-7f1c-4ba4-938f-92f499aa6abc||32|5|1|0|589824|1|0|2147483647|267349|255|63|0|0|0|0|0||0|0
22|#########_MANAGER_DB01_PROD.vmdk|RDID-11e7c3ce-ef80-44da-bfd4-ab86737a3d0d||33|5|1|0|1813274624|52|0|2147483647|133674|255|63|0|0|0|0|0||0|0
23|#########_MANAGER_DB01_PROD.vmdk|RDID-1897ec1d-d19d-40aa-8ed2-5cde4e9e6be8||34|5|1|0|8192|2|0|2147483647|267349|255|63|0|0|0|0|0||0|0
24|#########_MANAGER_DB01_PROD_5.vmdk|RDID-31179b39-4912-4ce1-89ce-80be9f7bb502||35|5|1|0|81920|2|0|2147483647|267349|255|63|0|0|0|0|0||0|0
25|#########_MANAGER_DB01_PROD_7.vmdk|RDID-65ca9e4b-db9c-420d-83b5-e012162c44fe||36|5|1|0|1876705280|53|0|2147483647|3916|255|63|0|0|0|0|0||0|0
RDID-4df1ec0b-f733-4a85-aab4-3828f20be5c9|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|5ec4757e-da465d24-9ea8-0671ab80009b|#########_MANAGER_DB01_PROD|15|39||01b9249e2e62bfe128bb967d36223b74|12131994294265684|0|1|||615429177344|0|0|||
RDID-3bcc25fb-a4bc-4f8f-af17-54ca6630c3f9|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|16|48||4a6939243530d5818db1044b6b3fc3f9|12131994294265684|0|1|||16291405824|0|0|||
RDID-e73b7a03-329b-4323-8d87-b473b2e030f4|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|17|44||9df972cd93bc3657d578d4d95c7b2695|12131994294265684|0|1|||107375230976|0|0|||
RDID-da635056-b3b0-48a2-b7d4-1d194e1062cf|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|18|43||8ee60cfee2b7727ab9c5ba3860483bf1|12131994294265684|0|1|||50186944512|0|0|||
RDID-88f0c156-a1f1-4b84-b854-79bc6cffc896|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|19|30||db42e4984ad027a5c8246c0b62bbf7b7|12131994294265684|0|1|||2202293329920|0|0|||
RDID-f1bb2563-a3df-491b-b191-b2cfcba14c5e|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d5d-73eae522-757e-0017a4773460|#########_MANAGER_DB01_PROD|20|31||3e7ae1a59f497e1c8a977ee362a10f03|12131994294265684|0|1|||2201123078144|0|0|||
RDID-bb80acb7-7f1c-4ba4-938f-92f499aa6abc|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|5f3946ec-4187ab50-6d75-0671ab80009b|#########_MANAGER_DB01_PROD|21|40||d6e74d455bf6817788ff0220c8e8f544|12131994294265684|0|1|||868182130688|0|0|||
RDID-11e7c3ce-ef80-44da-bfd4-ab86737a3d0d|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|5ec4757e-da465d24-9ea8-0671ab80009b|#########_MANAGER_DB01_PROD|22|33||fcb9a54340df2e0b2d9402311733ff3c|12131994294265684|0|1|||1100914622464|0|0|||
RDID-1897ec1d-d19d-40aa-8ed2-5cde4e9e6be8|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|5ec47598-bec2f980-4ef1-0671ab800039|#########_MANAGER_DB01_PROD|23|41||19bf9b9812d49ecf8007160e319467d7|12131994294265684|0|1|||2199009624064|0|0|||
RDID-31179b39-4912-4ce1-89ce-80be9f7bb502|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|5ecfcc74-9b0561c4-d0c3-0671ab80009b|#########_MANAGER_DB01_PROD|24|42||e1f7a3fced03ad66db72f756e9773c28|12131994294265684|0|1|||2198085828608|0|0|||
RDID-65ca9e4b-db9c-420d-83b5-e012162c44fe|GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|25|47||4cf29142e9061133a462e49b6b0835b9|12131994294265684|0|1|||16292782080|0|0|||
9|1|#########_MANAGER_DB01_PROD.vmxf|hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.vmxf.63704|1|1621593672 0||0|1|d36bbe777b0aa7ad64b407f519dede15
10|2|#########_MANAGER_DB01_PROD.nvram|hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.nvram.63705|1|1621593672 0||0|1|7b413f1bd0ee5096e3ab16372f6ec6d0
11|0|#########_MANAGER_DB01_PROD.vmx|hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2130071.vmx.64318|1|1621593672 0||0|1|59334839158e63bd8b7b002f9c3286f5
12|0|#########_MANAGER_DB01_PROD.vmx|hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.7.vmx.12|1|1621594310 0|1621594310 0|109983|1|42469421564459c40873237c1b899ae7
GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce|4|2594461774|7|3|0|0|609f6d2f-8a552036-54a2-0017a4773450|#########_MANAGER_DB01_PROD|30|0||||||1|9378f698-00c8-4544-9892-ac3047d3b38e|30|0||

The files in the destination datastore matched the HBR database, so a file mismatch was not the cause this time.
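If you want to script this comparison instead of checking it by eye, a minimal Python sketch follows. It relies on the field layout visible in the sample output above. The disk rows carry the file name in the second field, and the config rows carry the hbrcfg.* name in the fourth field. I inferred this layout from the dump; it does not come from a documented schema. You collect the datastore listing yourself, for example with ls in the replica folders. The RDID rows above reference several datastore UUIDs, so pass the union of the file names from all of those folders.

```python
from __future__ import annotations

# ASSUMPTION: field positions are inferred from the sample dump in this post,
# not from a documented schema. Disk rows look like
#     <key>|<name>.vmdk|RDID-...|...
# and config-file rows look like
#     <key>|<type>|<identifier>|hbrcfg.GID-...|...


def referenced_files(dump: str) -> set[str]:
    """Return the file names the HBR database expects on the target datastore(s)."""
    names: set[str] = set()
    for line in dump.splitlines():
        fields = line.strip().split("|")
        if len(fields) < 4:
            continue  # prompt lines, blank lines and short rows
        if fields[1].endswith(".vmdk") and fields[2].startswith("RDID-"):
            # Disk row: the second field is the replica disk name.
            names.add(fields[1])
        elif fields[3].startswith("hbrcfg."):
            # Config row: the fourth field is the stored hbrcfg.* copy.
            names.add(fields[3])
    return names


def missing_files(dump: str, datastore_listing: set[str]) -> set[str]:
    """File names referenced in the HBR database but absent from the listing."""
    return referenced_files(dump) - datastore_listing
```

An empty result from missing_files corresponds to the situation here: every referenced file was present.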

Next, I examined the HBR server logs for further clues. I found an interesting entry in the error lines at 11:04:12 below: they report that a config file exceeds the maximum supported size.


2021-05-21T11:04:02.948Z verbose hbrsrv[50071] [Originator@6876 sub=Misc opID=hsl-872d9ab0] Disk instance removed (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (diskID=RDID-88f0c156-a1f1-4b84-b854-79bc6cffc896) (diskInstanceKey=19) after 174 seconds.
2021-05-21T11:04:02.948Z verbose hbrsrv[50083] [Originator@6876 sub=PropertyProvider opID=hsl-872d9ab0] RecordOp ASSIGN: syncPruneTask, Hbr.Replica.Group.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce. Applied change to temp map.
2021-05-21T11:04:02.948Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Removing (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=9) (groupInstanceKey=5) (type=vmxf) (identifier=#########_MANAGER_DB01_PROD.vmxf) (path=/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.vmxf.63704) (complete=TRUE)
2021-05-21T11:04:02.949Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Completed file remove (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=9) after 0 seconds.
2021-05-21T11:04:02.949Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Removing (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=10) (groupInstanceKey=5) (type=nvram) (identifier=#########_MANAGER_DB01_PROD.nvram) (path=/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.nvram.63705) (complete=TRUE)
2021-05-21T11:04:02.950Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Completed file remove (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=10) after 0 seconds.
2021-05-21T11:04:02.950Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Removing (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=11) (groupInstanceKey=5) (type=vmx) (identifier=#########_MANAGER_DB01_PROD.vmx) (path=/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2130071.vmx.64318) (complete=TRUE)
2021-05-21T11:04:02.961Z info hbrsrv[50083] [Originator@6876 sub=PersistentCleanup opID=hsl-872d9ab0] The file '/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2130071.vmx.64318' (key=5) was cleaned up successfully.
2021-05-21T11:04:02.961Z verbose hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Completed file remove (GroupID=GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) (key=11) after 0 seconds.
2021-05-21T11:04:02.961Z info hbrsrv[50083] [Originator@6876 sub=Delta opID=hsl-872d9ab0] Completed instance prune (key 5, group GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce) after 174 seconds.
2021-05-21T11:04:03.208Z info hbrsrv[50083] [Originator@6876 sub=StorageManager opID=hsl-872d9ab0] RemoteFile: Opened path (/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrgrp.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.txt)
2021-05-21T11:04:03.212Z verbose hbrsrv[50038] [Originator@6876 sub=PropertyProvider opID=hsl-872d9ab0] RecordOp ASSIGN: groupStats, Hbr.Replica.Group.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce. Applied change to temp map.
2021-05-21T11:04:03.213Z info hbrsrv[50083] [Originator@6876 sub=Recover opID=hsl-872d9ab0] Wrote group GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce recovery state to /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrgrp.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.txt
2021-05-21T11:04:03.214Z verbose hbrsrv[44533] [Originator@6876 sub=Delta opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5] Prune check group GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce (has 1 consistent instances)
2021-05-21T11:04:03.214Z verbose hbrsrv[44533] [Originator@6876 sub=PropertyProvider opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5] RecordOp ASSIGN: state, Hbr.Replica.Group.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce. Applied change to temp map.
2021-05-21T11:04:03.216Z verbose hbrsrv[44533] [Originator@6876 sub=PropertyProvider opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5] RecordOp ASSIGN: state, Hbr.Replica.Group.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce. Applied change to temp map.
2021-05-21T11:04:03.216Z info hbrsrv[44533] [Originator@6876 sub=Image opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5:hs-ef22] Creating  image from group GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce, instance 7, in #########_MANAGER_DB01_PROD
2021-05-21T11:04:08.876Z info hbrsrv[44533] [Originator@6876 sub=Image opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5:hs-ef22] Copying cfg /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.vmxf.63704 to /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/#########_MANAGER_DB01_PROD.vmxf
2021-05-21T11:04:08.890Z info hbrsrv[44533] [Originator@6876 sub=Image opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5:hs-ef22] Copying cfg /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.2114928.nvram.63705 to /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/#########_MANAGER_DB01_PROD.nvram
2021-05-21T11:04:08.905Z info hbrsrv[44533] [Originator@6876 sub=Image opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5:hs-ef22] Copying cfg /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.7.vmx.12 to /vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/#########_MANAGER_DB01_PROD.vmx
2021-05-21T11:04:09.385Z info hbrsrv[44533] [Originator@6876 sub=StorageManager opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648:hs-2fb5:hs-ef22] RemoteFile: Opened path (/vmfs/volumes/609f6d2f-8a552036-54a2-0017a4773450/#########_MANAGER_DB01_PROD/hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.7.vmx.12)
2021-05-21T11:04:12.221Z error hbrsrv[44533] [Originator@6876 sub=Main opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648]    [0] Config file hbrcfg.GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce.7.vmx.12 size is 109983, maximum supported size is 65536
2021-05-21T11:04:12.221Z error hbrsrv[44533] [Originator@6876 sub=Main opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648]    [2] Creating image of instance of GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce
2021-05-21T11:04:12.221Z error hbrsrv[44533] [Originator@6876 sub=Main opID=2114f4ee-eba3-4b1b-9731-a5e087d1718a-HMS-7648]    [3] Creating fail-over image of GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce
2021-05-21T11:04:29.844Z info hbrsrv[33503] [Originator@6876 sub=StorageMap] Get VM object GID-e0b91df5-35a5-4a00-b66f-9aeeb36fe9ce for dsPath 609f6d2f-8a552036-54a2-0017a4773450. Sending PC updates for singleHost.
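You can also pull these error lines out of a large log with a regular expression. In the sketch below, the pattern is modeled only on the message quoted above, and other hbrsrv versions may word it differently. The log location is also an assumption. I take it to be /var/log/vmware/hbrsrv.log on the appliance, but verify this on your build.

```python
import re

# Pattern modeled on the hbrsrv error line quoted in this post:
#   Config file <name> size is <n>, maximum supported size is <m>
CONFIG_SIZE_ERROR = re.compile(
    r"Config file (?P<name>\S+) size is (?P<size>\d+), "
    r"maximum supported size is (?P<limit>\d+)"
)


def oversized_configs(log_lines):
    """Yield (file name, size, limit) for every config-size error in the lines."""
    for line in log_lines:
        match = CONFIG_SIZE_ERROR.search(line)
        if match:
            yield match["name"], int(match["size"]), int(match["limit"])
```

The function accepts any iterable of lines, so you can pass it an open log file directly, e.g. list(oversized_configs(open(path))).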

To confirm my suspicion, I checked both the HBR server configuration file (/etc/vmware/hbrsrv.xml) and the size of the reported VM's vmx file.

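The default maximum config file size is only 64 KB (65536 bytes). In hbrsrv.xml the maxConfigFileSize entry is commented out, so this default applies: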

root@virtualMI01-1-vsra01 [ ~ ]# cat /etc/vmware/hbrsrv.xml | grep -i maxconfig
      <!-- <maxConfigFileSize>65536</maxConfigFileSize> -->

The vmx file of the failed VM, however, is 109983 bytes (about 107 KB) according to the log. This is well above the 64 KB limit.
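The same check can be run against the replica files themselves. The sketch below flags hbrcfg.*.vmx.* copies that are larger than the 65536-byte default. It also suggests a new value by rounding the size up to the next power of two, which turns 109983 into 131072. Two assumptions apply here. First, the naming pattern is taken only from the files seen in this post. Second, oversized_in_folder works only where the datastore folder is reachable as a local path. The appliance itself appears to access datastores remotely, as the RemoteFile log entries suggest.

```python
import fnmatch
import os

# 64 KB default that hbrsrv reported in the log above.
DEFAULT_MAX_CONFIG_FILE_SIZE = 65536
# ASSUMPTION: replica vmx copies follow the naming seen in this post,
# e.g. hbrcfg.GID-<uuid>.7.vmx.12 (the .vmxf and .nvram copies do not match).
VMX_PATTERN = "hbrcfg.*.vmx.*"


def oversized(entries, limit=DEFAULT_MAX_CONFIG_FILE_SIZE):
    """Given (name, size) pairs, return {name: size} for vmx copies above limit."""
    return {
        name: size
        for name, size in entries
        if fnmatch.fnmatch(name, VMX_PATTERN) and size > limit
    }


def oversized_in_folder(folder, limit=DEFAULT_MAX_CONFIG_FILE_SIZE):
    """Same check against a replica folder reachable as a local path."""
    with os.scandir(folder) as it:
        entries = [(e.name, e.stat().st_size) for e in it if e.is_file()]
    return oversized(entries, limit)


def suggested_limit(size, current=DEFAULT_MAX_CONFIG_FILE_SIZE):
    """Smallest power of two >= size, never lower than the current limit.

    Rounding up to a power of two is only a convention; it happens to give
    the 131072 used in this post for a 109983-byte file.
    """
    return max(1 << max(size - 1, 0).bit_length(), current)
```

Any value at or above the file size should fit. The power of two is simply a round number with some headroom for future vmx growth.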

So, following the VMware KB article, I added an active (uncommented) maxConfigFileSize entry with a value above the vmx file size: 131072 bytes (128 KB). I then restarted the HBR service using systemctl stop hbrsrv; systemctl start hbrsrv.

root@virtualMI01-1-vsra01 [ /etc/vmware ]# cat /etc/vmware/hbrsrv.xml | grep -i maxconfig
      <!-- <maxConfigFileSize>65536</maxConfigFileSize> -->
        <maxConfigFileSize>131072</maxConfigFileSize>
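The grep output above shows both the commented default and the new entry, which can be confusing at a glance. The sketch below reads the value that hbrsrv would see, assuming hbrsrv ignores XML comments the way a standard XML parser does. Python's ElementTree drops comments by default. The function falls back to the 65536 default when no active entry exists. The parent element of maxConfigFileSize is not shown in this post, so the function searches the whole tree. If several active entries exist, it returns the first one; how hbrsrv itself resolves duplicates is unknown here.

```python
import xml.etree.ElementTree as ET

# Default limit reported by hbrsrv in the log above.
DEFAULT_MAX_CONFIG_FILE_SIZE = 65536


def effective_max_config_file_size(xml_text):
    """Return the active maxConfigFileSize, ignoring commented-out entries."""
    root = ET.fromstring(xml_text)  # the default parser discards <!-- comments -->
    for element in root.iter("maxConfigFileSize"):
        if element.text and element.text.strip():
            return int(element.text.strip())
    return DEFAULT_MAX_CONFIG_FILE_SIZE
```

Usage: effective_max_config_file_size(open("/etc/vmware/hbrsrv.xml").read()) should return 131072 after the change above.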

It took a couple of minutes for the service to come up. Once it was up, we retried the recovery and it succeeded.
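The service needed a couple of minutes to come back. Instead of checking it manually, you can follow the restart with a polling loop. The sketch below polls systemctl is-active --quiet hbrsrv until it succeeds or a timeout expires. The timeout and interval values are arbitrary assumptions. Also, an active unit state may not mean hbrsrv has finished initializing, so confirm in the UI before retrying recovery.

```python
import subprocess
import time


def wait_until_active(unit="hbrsrv", timeout=600, interval=10, check=None):
    """Poll until the unit reports active; return False if the timeout expires.

    `check` can be replaced (e.g. in tests); by default it runs
    `systemctl is-active --quiet <unit>`, which exits 0 only when active.
    """
    if check is None:
        def check():
            result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
            return result.returncode == 0

    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The check is injectable so that the loop logic can be tested without systemd. On the appliance, you would call it with no check argument.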