Windows 10 Always On VPN clients regularly drop VPN connection with error 829

John Perkins 1 Reputation point
2020-12-02T21:54:32.453+00:00

Since the November 2020 patch updates, we have a number of Windows 10 1909 (64-bit) laptop clients that periodically drop their Always On VPN device tunnels. The Windows 10 clients report a RasClient error 829 in most situations, although we sometimes see error 828.

There is no sign of the wireless connections failing when this occurs.

The Remote Access server runs Server 2016. Tunnels are certificate-based IPsec VPN links.

There was a known issue with Windows 10 2004 clients that should have been resolved with the September 2020 monthly update. Given that we're a few months after that patch update and on a different Windows 10 build, it seems unlikely to be the same cause.

Any suggestions for what might be causing this or how to clear up the issue?

Windows 10 Network
Windows Server

10 answers

  1. Marco Hald 1 Reputation point
    2022-02-28T07:59:45.787+00:00

    Hi Gary,

    Thank you for all the information. We can keep the thread in English so that anybody can use the knowledge we gain about the drops.

    I have deployed a script to our client machines that runs the netsh trace command to generate the ETL files.
    I hope to capture a few cases in the ETL files so that we can find out what the problem is.

    I generated one on a virtual machine, and it contains only these providers:

    Microsoft-Windows-TCPIP
    Microsoft-Windows-NDIS-PacketCapture
    Microsoft-Windows-WFP
    Microsoft-Windows-WebIO
    Microsoft-Windows-RRAS
    Microsoft-Windows-Ras-NdisWanPacketCapture

    I used the Command " netsh trace start provider=Microsoft-Windows-RRAS provider=Microsoft-Windows-TCPIP provider=Microsoft-Windows-WFP provider=Microsoft-Windows-Ras-NdisWanPacketCapture provider=Microsoft-Windows-RasSstp provider=Microsoft-Windows-WebIO provider={106B464D-8043-46B1-8CB8-E92A0CD7A560} keywords=0xFFFFFFFFFFFFFFFF level=255 Ethernet.Type=(IPv4,IPv6,0) Wifi.Type=Data capture=yes report=disabled correlation=disabled overwrite=yes tracefile=vpn-prob.etl" from your Blog.

    The file is 500 MB and contains only 20 seconds' worth of data.
    I will try to generate another one without the Microsoft-Windows-TCPIP provider, which seems to produce most of the data.
    I stop the capture via Task Scheduler when the event is logged. However, the connection could be dead before the event is triggered, and by then the relevant data in the ETL file may already have been overwritten.
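To put the overwrite concern in numbers, here is a minimal Python sketch of the arithmetic. It assumes the trace file behaves as a fixed-size buffer that wraps around once full, as the overwriting described above suggests, and that the data rate stays roughly constant. The function names are illustrative only and not part of any tool.

```python
def capture_rate_mb_per_s(file_size_mb, seconds_covered):
    """Average data rate of a trace, in MB per second."""
    return file_size_mb / seconds_covered


def retention_seconds(buffer_size_mb, rate_mb_per_s):
    """How many seconds of history a wrap-around buffer of this size can hold."""
    return buffer_size_mb / rate_mb_per_s


# Figures from the post: a 500 MB file held about 20 seconds of data.
rate = capture_rate_mb_per_s(500, 20)
print(f"rate: {rate:.0f} MB/s")

# Hypothetical larger buffer sizes, to see how little extra history they buy.
for size_mb in (500, 2048, 4096):
    print(f"{size_mb} MB buffer keeps about {retention_seconds(size_mb, rate):.0f} s")
```

At this rate, even a buffer of several gigabytes only covers a few minutes, so lowering the data rate (fewer providers or narrower keywords) matters more than enlarging the file.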

    The output of your logman query does not even contain the provider "Microsoft-Windows-RasSstp", so maybe it is no longer in use.
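The comparison between the providers requested on the command line and those that actually appear in the ETL file can be made explicit with a small Python sketch. The two sets are copied from the post above; the helper name diff_providers is made up for illustration. A provider requested by GUID might show up under a friendly name, so a "missing" GUID is only a hint, not proof.

```python
# Providers named in the "netsh trace start" command quoted in the post.
requested = {
    "Microsoft-Windows-RRAS",
    "Microsoft-Windows-TCPIP",
    "Microsoft-Windows-WFP",
    "Microsoft-Windows-Ras-NdisWanPacketCapture",
    "Microsoft-Windows-RasSstp",
    "Microsoft-Windows-WebIO",
    "{106B464D-8043-46B1-8CB8-E92A0CD7A560}",
}

# Providers reported as present in the ETL file from the virtual machine.
captured = {
    "Microsoft-Windows-TCPIP",
    "Microsoft-Windows-NDIS-PacketCapture",
    "Microsoft-Windows-WFP",
    "Microsoft-Windows-WebIO",
    "Microsoft-Windows-RRAS",
    "Microsoft-Windows-Ras-NdisWanPacketCapture",
}


def diff_providers(requested, captured):
    """Return (requested but not captured, captured but not requested), sorted."""
    return sorted(requested - captured), sorted(captured - requested)


missing, extra = diff_providers(requested, captured)
print("requested but absent:", missing)
print("present but not requested:", extra)
```

The only provider present without being requested is Microsoft-Windows-NDIS-PacketCapture, which is presumably added by capture=yes. The absent ones are Microsoft-Windows-RasSstp and the GUID-named provider.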


  2. Gary Nebbett 6,211 Reputation points
    2022-02-28T09:31:09.73+00:00

    Hello Marco,

    My approach to troubleshooting would be to first get a very rough understanding of what is happening (any error messages, any reproducible behaviour to trigger the problem, frequency of occurrence, etc.) and then, if appropriate, use event tracing.

    If the problem affects both SSTP and IKEv2 (or L2TP/IPsec) then it is unlikely to be a protocol specific problem (and all of the detailed protocol specific trace data will just be a distraction).

    The amount of output produced by trace providers can be reduced by judicious choice of provider specific "keywords" (and sometimes also other filtering mechanisms).

    One can get a rough idea of the type of events generated by some providers by using a command like wevtutil gp Microsoft-Windows-RasSstp /ge:true /gm:true. In this case (RasSstp), all of the events are error events likely to only occur during the establishment of the SSTP connection (not once it is established).

    All that we currently know about your problem is that you have a "similar Problem with disconnecting VPN Session". With such a broad problem description, a simple network trace (created via pktmon, "netsh trace", Wireshark or anything else of that type) would probably be the best starting point.

    Gary


  3. Marco Hald 1 Reputation point
    2022-02-28T10:23:33.94+00:00

    Today I had 9 different users with disconnects and 16 disconnects in total across 24 currently active VPN connections (gathered directly from the clients via a script).
    The server is a Windows Server 2019 VM on ESXi and offers only SSTP.
    I'm setting up a Windows Server 2022 machine, will enable IKEv2 on it, and will try it with a few problematic clients.

    The problem is that most users use their VPN over their home Wi-Fi connection, so we can't say what the problem really is.
    On the user side, only error 829 is logged in the event log. Most users do not even report the problems directly to us.
    [Image: 178340-image.png]
    The time between disconnects also varies, and not all users are disconnected at the same time.


  4. Gary Nebbett 6,211 Reputation points
    2022-02-28T12:02:46.1+00:00

    Hello Marco,

    There is not "one" right way of starting this troubleshooting and there are lots of "external" influences (frequency of occurrence, helpfulness and IT know-how of users, storage/processing difficulties with large trace files, etc.).

    If some clients are more inclined to exhibit the problem than others, then I would try using "pktmon" to capture the first few bytes of the SSTP traffic over a long period.

    For example:

    pktmon filter add SSTP -p 443 -i 2a00:1450:400a:801::2004
    pktmon start --capture --comp nics --flags 0x10 --file-name why.etl
    [wait for a day or so]
    pktmon stop

    Obviously, replace the address in the filter with the IPv4/IPv6 address of your VPN server. This will capture the first 128 bytes of the SSTP packets.

    The resulting trace data won't help very much, but it would be a start. One would see if the connection was "forcibly" disconnected (e.g. with a TCP RST) or whether packets just stop flowing over the connection (no response from VPN server).

    That might not sound like much reward for the effort, but the results would help when thinking about the next step.
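The distinction drawn above (forcible disconnect versus packets simply ceasing to flow) could be sketched in Python as follows. This assumes the packets from the end of one connection have already been extracted from the trace into simple records. The Packet type, the classify_ending function, and the 30-second threshold are all hypothetical and not part of pktmon or any other tool.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    time: float          # seconds since the start of the capture
    from_server: bool    # True if the VPN server sent the packet
    rst: bool = False    # TCP RST flag set
    fin: bool = False    # TCP FIN flag set


def classify_ending(packets, silence_threshold=30.0):
    """Roughly classify how a captured TCP connection ended.

    `packets` should cover only the final part of a single connection.
    """
    if not packets:
        return "no-data"
    if any(p.rst for p in packets):
        return "reset"            # forcibly torn down by one side
    if any(p.fin for p in packets):
        return "closed"           # orderly shutdown
    server_times = [p.time for p in packets if p.from_server]
    client_times = [p.time for p in packets if not p.from_server]
    # The client kept sending long after the server's last packet.
    if client_times and (
        not server_times
        or max(client_times) - max(server_times) >= silence_threshold
    ):
        return "server-silent"
    return "unknown"


example = [
    Packet(0.0, False),
    Packet(1.0, True),
    Packet(10.0, False),
    Packet(40.0, False),
]
print(classify_ending(example))
```

A "reset" result points at something actively terminating the session, while "server-silent" suggests looking at the path between client and server, or at the server itself.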

    Gary


  5. Marco Hald 1 Reputation point
    2022-03-23T07:20:35.947+00:00

    Hi Gary,

    Sorry for the late response. I tried several things to troubleshoot the SSTP connection,
    but I didn't find anything useful in the generated logs, and no user really reported a problem, even when we saw multiple disconnects per day from one user.

    The solution for us was to offer a connection that only does L2TP, with SSTP as a fallback connection.
    In my testing, L2TP seems to be far more robust against short connection losses. Even disconnecting the NIC from the VM I was testing with did not kill the connection.
    It began to work again after I reconnected the NIC.

    However, we continue to log the connections, and I will post an update if we find the root cause.
    For now, we won't continue troubleshooting as long as no user files a ticket.

    Here are some scripts that might be useful for others who need to troubleshoot such a problem:
    https://github.com/marcohald/SSTP-Troubleshooting

