
Guest Introspection (GI) NSX Appliance Gets Deleted Automatically after vCenter Disconnect



I was working in an environment where NSX was configured with Trend Micro Deep Security to protect VDI desktops. We had two vCenters set up this way, and one of them was showing odd symptoms: NSX Manager was behaving strangely in that vCenter.

I would lose the connection to vCenter from the fat client, while the web client stayed connected and did not kick me off. Around the same time, the Guest Introspection (GI) appliance VMs, which are managed by NSX, were being deleted automatically from random hosts in the cluster protected by NSX. When I checked the Windows logs (vCenter was running on a Windows server), this is what I saw:




I opened a case with VMware support. They reviewed the NSX logs and concluded that this was not an NSX issue: vCenter itself was deleting those VMs. vCenter gets disconnected for a few seconds and then reconnects. After reconnecting, it does not recognize the GI VMs and starts deleting them, as this log excerpt shows:

 ERROR | 2018-08-16 13:31:15,856 | eam-0 | VcListener.java | 351 | Lost connection to vCenter server - trying to reconnect
 INFO | 2018-08-16 13:31:15,858 | eam-0 | VcConnection.java | 128 | Connecting to vCenter as the com.vmware.vim.eam extension
 INFO | 2018-08-16 13:31:15,859 | eam-0 | VcConnection.java | 224 | Connecting to https://sdkTunnel:8089/sdk/vimService via vCenter proxy http://localhost:80
ERROR | 2018-08-16 13:31:16,865 | eam-0 | VcConnection.java | 267 | Error communicating with the vCenter server Cause: org.apache.http.conn.HttpHostConnectException: Connection to http://localhost:80 refused
ERROR | 2018-08-16 13:31:16,865 | eam-0 | VcListener.java | 310 | No connection to vCenter server - retrying in 10 seconds
 INFO | 2018-08-16 13:31:26,866 | eam-0 | VcConnection.java | 128 | Connecting to vCenter as the com.vmware.vim.eam extension
 INFO | 2018-08-16 13:31:26,866 | eam-0 | VcConnection.java | 224 | Connecting to https://sdkTunnel:8089/sdk/vimService via vCenter proxy http://localhost:80
ERROR | 2018-08-16 13:31:27,883 | eam-0 | VcConnection.java | 267 | Error communicating with the vCenter server Cause: org.apache.http.conn.HttpHostConnectException: Connection to http://localhost:80 refused
ERROR | 2018-08-16 13:31:27,883 | eam-0 | VcListener.java | 310 | No connection to vCenter server - retrying in 10 seconds
 INFO | 2018-08-16 13:31:37,884 | eam-0 | VcConnection.java | 128 | Connecting to vCenter as the com.vmware.vim.eam extension
 INFO | 2018-08-16 13:31:37,884 | eam-0 | VcConnection.java | 224 | Connecting to https://sdkTunnel:8089/sdk/vimService via vCenter proxy http://localhost:80
 INFO | 2018-08-16 13:31:43,379 | eam-0 | VcConnection.java | 134 | Logged in with logical user session ID 91886152
 INFO | 2018-08-16 13:31:43,379 | eam-0 | VcConnection.java | 136 | Logged in with physical session cookie 4218B1AB
 INFO | 2018-08-16 13:31:43,379 | eam-0 | VcListener.java | 301 | Connected to vCenter server
 INFO | 2018-08-16 13:31:43,379 | eam-0 | VcListener.java | 327 | vCenter running - registering EAM
 INFO | 2018-08-16 13:31:43,567 | eam-0 | AgencyImpl.java | 736 | Updating agency configuration: agency-0
 INFO | 2018-08-16 13:31:43,714 | eam-0 | VcComputeResource.java | 519 | VcClusterComputeResource(domain-c5616) setting required agent count in VC to 2
 INFO | 2018-08-16 13:31:44,636 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-117
 INFO | 2018-08-16 13:31:44,636 | eam-0 | AgentImpl.java | 421 | Goal state changed from enabled to uninstalled (agent-117)
 INFO | 2018-08-16 13:31:44,669 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-120
 INFO | 2018-08-16 13:31:44,669 | eam-0 | AgentImpl.java | 421 | Goal state changed from enabled to uninstalled (agent-120)
 INFO | 2018-08-16 13:31:44,684 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-123
 INFO | 2018-08-16 13:31:44,685 | eam-0 | AgentImpl.java | 421 | Goal state changed from enabled to uninstalled (agent-123)
 INFO | 2018-08-16 13:31:44,697 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-126
 INFO | 2018-08-16 13:31:44,698 | eam-0 | AgentImpl.java | 421 | Goal state changed from enabled to uninstalled (agent-126)
 INFO | 2018-08-16 13:31:44,707 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-129
 INFO | 2018-08-16 13:31:44,708 | eam-0 | AgentImpl.java | 421 | Goal state changed from enabled to uninstalled (agent-129)
 INFO | 2018-08-16 13:31:44,722 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-132
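
To spot this pattern quickly in a longer log, a small script can help. The following is a minimal Python sketch, assuming the pipe-delimited line format shown in the excerpt above. The function name `summarize` and the result keys are my own illustration and not part of any VMware tool. It reports how long the disconnect lasted and which agents were scheduled for removal after the reconnect.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# LEVEL | timestamp | thread | source file | line number | message
LINE_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s*\|\s*(?P<ts>[\d-]+ [\d:,]+)\s*\|"
    r"\s*(?P<thread>[^|]+?)\s*\|\s*(?P<source>[^|]+?)\s*\|"
    r"\s*(?P<line>\d+)\s*\|\s*(?P<msg>.*)$"
)
REMOVAL_RE = re.compile(r"Scheduling agent for removal: (agent-\d+)")


def summarize(lines):
    """Return the first outage duration and the agents removed after it."""
    lost_at = None
    reconnected_at = None
    removed = []
    for raw in lines:
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip blank lines or anything in another format
        ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f")
        msg = match["msg"]
        if msg.startswith("Lost connection to vCenter server"):
            if lost_at is None:
                lost_at = ts
        elif msg.startswith("Connected to vCenter server"):
            if lost_at is not None and reconnected_at is None:
                reconnected_at = ts
        else:
            hit = REMOVAL_RE.search(msg)
            # Only count removals that happen after the reconnect.
            if hit and reconnected_at is not None:
                removed.append(hit.group(1))
    outage = None
    if lost_at is not None and reconnected_at is not None:
        outage = (reconnected_at - lost_at).total_seconds()
    return {"outage_seconds": outage, "agents_scheduled_for_removal": removed}


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = """\
ERROR | 2018-08-16 13:31:15,856 | eam-0 | VcListener.java | 351 | Lost connection to vCenter server - trying to reconnect
 INFO | 2018-08-16 13:31:43,379 | eam-0 | VcListener.java | 301 | Connected to vCenter server
 INFO | 2018-08-16 13:31:44,636 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-117
 INFO | 2018-08-16 13:31:44,669 | eam-0 | AgentImpl.java | 1229 | Scheduling agent for removal: agent-120
""".splitlines()

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a real log, pass an open file object to `summarize` instead of `SAMPLE`. It only looks at the first disconnect it finds, which was enough for my case.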


They dug further and found that the vCenter failure was caused by vCenter disconnecting from its database. This is a known issue with VMware vCenter Server, and these exact disconnects are fixed in 5.5 Update 3i.

If you don't want to upgrade vCenter, you can mitigate the issue by cleaning up the database. The biggest factor slowing the database down is the events tables. I recommend taking a backup (for auditing purposes, in case you need to go back and look at a specific event) and then truncating the events tables using the steps below.

1. Stop the vCenter Server service (must do).
2. Take a full backup of the database (must do).
3. Execute the following SQL queries:

alter table VPX_EVENT_ARG drop constraint FK_VPX_EVENT_ARG_REF_EVENT, FK_VPX_EVENT_ARG_REF_ENTITY

alter table VPX_ENTITY_LAST_EVENT drop constraint FK_VPX_LAST_EVENT_EVENT

truncate table VPX_ENTITY_LAST_EVENT

truncate table VPX_EVENT

truncate table VPX_EVENT_ARG

alter table VPX_EVENT_ARG add constraint FK_VPX_EVENT_ARG_REF_EVENT foreign key(EVENT_ID) references VPX_EVENT (EVENT_ID) on delete cascade, constraint FK_VPX_EVENT_ARG_REF_ENTITY foreign key (OBJ_TYPE) references VPX_OBJECT_TYPE (ID)

alter table VPX_ENTITY_LAST_EVENT add constraint FK_VPX_LAST_EVENT_EVENT foreign key(LAST_EVENT_ID) references VPX_EVENT (EVENT_ID) on delete cascade
  
4. Start the vCenter Server service again.



