Opened 5 years ago
Last modified 5 years ago
#19295 new defect
VM reset ("HeartbeatFlatlinedTimer") under load
| Reported by: | Timothe Litt | Owned by: | |
|---|---|---|---|
| Component: | other | Version: | VirtualBox 6.0.16 |
| Keywords: | | Cc: | |
| Guest type: | Linux | Host type: | Linux |
Description
Since upgrading to 6.1.2, I'm seeing reboots during guest backup runs.
The guest is Linux 2.6.22.14-100 #1 SMP Wed Apr 8 18:07:54 EDT 2015 i686 i686 i386 GNU/Linux
Prior to the upgrade, this was never seen.
The backup is: tar (some excludes) -czpf (NFS mountpoint) | tee (log) | grep -v (noise)
A full backup is about 75GB; the failing runs write between 3 and 40GB before the guest resets.
Log file attached.
The relevant log lines appear to be:
90:00:58.137957 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
90:01:56.653151 Reset initiated by keyboard controller
90:01:56.682588 Changing the VM state from 'RUNNING' to 'RESETTING'
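For anyone comparing these entries, the gap between the flatline warning and the reset can be worked out from the leading timestamps. The following is only a minimal Python sketch: it assumes the first field of each log line is an elapsed-time stamp of the form hours:minutes:seconds.microseconds, and the helper name `stamp_to_seconds` is hypothetical, not part of any VirtualBox tooling.

```python
import re
from decimal import Decimal

# Assumed stamp layout: hours:minutes:seconds.microseconds.
# The hours field is allowed to exceed 24 (e.g. "90:00:58.137957").
STAMP = re.compile(r"^(\d+):(\d{2}):(\d{2}\.\d+)")


def stamp_to_seconds(line):
    """Convert the leading timestamp of a log line to seconds (hypothetical helper)."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no leading timestamp in: {line!r}")
    hours, minutes, seconds = match.groups()
    # Decimal keeps the microsecond digits exact when subtracting.
    return int(hours) * 3600 + int(minutes) * 60 + Decimal(seconds)


# The two entries quoted in the description.
flatline = ("90:00:58.137957 VMMDev: vmmDevHeartbeatFlatlinedTimer: "
            "Guest seems to be unresponsive. Last heartbeat received 4 seconds ago")
reset = "90:01:56.653151 Reset initiated by keyboard controller"

gap = stamp_to_seconds(reset) - stamp_to_seconds(flatline)
print(f"{gap} seconds between flatline warning and reset")
```

For the two entries quoted here, this gives a gap of about 58.5 seconds between the flatline warning and the keyboard-controller reset.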
Let me know if you need more info.
Is there a way to disable either the detection or the reset action until a fix is available?
I'd like to be able to complete a backup!
Attachments (1)
Change History (5)
by , 5 years ago
| Attachment: | vbox_flatline.log.zip added |
|---|
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Yes, I understand that the guest appears unresponsive. It tends to have a very high load average during backup - tar is reading the filesystem, a gzip process is doing the compression, plus its usual (modest) load.
I never saw this issue with previous VirtualBox versions, which have run this machine for years. So I think it's a false detection. Perhaps VBox has become more sensitive. If I can turn off the detection, we can determine where the issue is. If the backup completes, it's a false detection. If the guest ends up hung, it's not...
The host and NAS are connected on a (switched) gigabit Ethernet.
Fedora Core release 6 (Zod), kernel built for 100Hz
Ran with one CPU. Saw the load average move up to about 2.75, but it may have gone higher, as the backup takes a very long time and I worked on something else.
Same keyboard controller initiated reset after 2.6GB.
What is generating the guest side of the heartbeat? Can it be stopped? Priority increased?
comment:3 by , 5 years ago
I tried stopping the vboxadd and vboxadd-service services.
Still got this reset:
135:18:50.898442 VMMDev: Guest Log: 02:09:17.155667 control Guest control service stopped
135:18:50.939824 VMMDev: Guest Log: 02:09:17.194566 control Guest control worker returned with rc=VINF_TRY_AGAIN
135:18:50.942123 VMMDev: Guest Log: 02:09:17.199284 main Session 0 is about to close ...
135:18:50.942290 VMMDev: Guest Log: 02:09:17.199521 main Stopping all guest processes ...
135:18:50.942480 VMMDev: Guest Log: 02:09:17.199671 main Closing all guest files ...
135:18:51.007628 VMMDev: Guest Log: 02:09:17.265187 main Ended.
163:23:12.713790 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
163:24:10.179844 Reset initiated by keyboard controller
Not clear what to try next...
comment:4 by , 5 years ago
I updated VirtualBox to 6.1.4 r136177, no change.
Is there any additional data that would help debug this?


This looks like something within the guest is stuck rather than in the VMM. The heartbeat flatline is an indication that the guest has been unresponsive for 4 seconds.