VirtualBox

Opened 5 years ago

Last modified 5 years ago

#19295 new defect

VM reset ("HeartbeatFlatlinedTimer") under load

Reported by: Timothe Litt Owned by:
Component: other Version: VirtualBox 6.0.16
Keywords: Cc:
Guest type: Linux Host type: Linux

Description

Since upgrading to 6.1.2, I'm seeing reboots during guest backup runs.

The guest is Linux 2.6.22.14-100 #1 SMP Wed Apr 8 18:07:54 EDT 2015 i686 i686 i386 GNU/Linux

Prior to the upgrade, this was never seen.

The backup command is: tar (some excludes) -czpf (NFS mountpoint) | tee (log) | grep -v (noise)

A full backup is about 75 GB; the backup writes between 3 and 40 GB before the guest resets.

Log file attached.

The relevant log lines appear to be:

90:00:58.137957 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
90:01:56.653151 Reset initiated by keyboard controller
90:01:56.682588 Changing the VM state from 'RUNNING' to 'RESETTING'
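
For anyone checking their own VBox.log for the same pattern, the following is a minimal sketch that pairs each heartbeat flatline warning with the next reset and reports the gap between them. It assumes the timestamp layout and message text shown in the excerpt above; the function names are illustrative and not part of VirtualBox. On the excerpt above it reports a gap of roughly 58.5 seconds.

```python
import re

# Matches log timestamps of the form H:MM:SS.ffffff at the start of a line,
# as seen in the excerpt above (hours may have any number of digits).
TS = re.compile(r"^(\d+):(\d{2}):(\d{2})\.(\d+)\s+(.*)$")


def to_seconds(line):
    """Return (seconds_since_vm_start, message), or None if the line has no timestamp."""
    m = TS.match(line.strip())
    if not m:
        return None
    h, mi, s, frac, msg = m.groups()
    return int(h) * 3600 + int(mi) * 60 + int(s) + float("0." + frac), msg


def flatline_to_reset_gaps(lines):
    """Pair each flatline warning with the next reset; return the gaps in seconds."""
    gaps = []
    flatline_at = None
    for line in lines:
        parsed = to_seconds(line)
        if parsed is None:
            continue
        t, msg = parsed
        if "vmmDevHeartbeatFlatlinedTimer" in msg:
            flatline_at = t
        elif "Reset initiated" in msg and flatline_at is not None:
            gaps.append(t - flatline_at)
            flatline_at = None
    return gaps


# The three lines quoted in the ticket description.
sample = [
    "90:00:58.137957 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be "
    "unresponsive. Last heartbeat received 4 seconds ago",
    "90:01:56.653151 Reset initiated by keyboard controller",
    "90:01:56.682588 Changing the VM state from 'RUNNING' to 'RESETTING'",
]
print(flatline_to_reset_gaps(sample))
```

To run it against a real log, pass an open file instead of the sample list, for example `flatline_to_reset_gaps(open("VBox.log"))`.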

Let me know if you need more info.

Is there a way to disable either the detection or the reset action until a fix is available?

I'd like to be able to complete a backup!

Attachments (1)

vbox_flatline.log.zip (20.2 KB ) - added by Timothe Litt 5 years ago.

Change History (5)

by Timothe Litt, 5 years ago

Attachment: vbox_flatline.log.zip added

comment:1 by Ramshankar Venkataraman, 5 years ago

This looks like something within the guest is stuck rather than VMM. The heartbeat flatline is an indication that the guest has gone unresponsive for 4 seconds.

  1. Let us know which Fedora guest this is exactly, so we can download the ISO and try to reproduce the problem here.
  2. Does changing the number of VCPUs to 1 (from 4) have any effect on the problem?

comment:2 by Timothe Litt, 5 years ago

Yes, I understand that the guest appears unresponsive. It tends to have a very high load average during backup - tar is reading the filesystem, a gzip process is doing the compression, plus its usual (modest) load.

I never saw this issue with previous VirtualBox versions - which have run this machine for years. So I think it's a false detection. Perhaps VBox has become more sensitive. If I can turn off the detection, we can determine where the issue is. If the backup completes, it's a false detection. If the guest ends up hung, it's not...

The host and NAS are connected over (switched) gigabit Ethernet.

Fedora Core release 6 (Zod), kernel built for 100Hz

I ran with one CPU. I saw the load average rise to about 2.75, but it may have gone higher, as the backup takes a very long time and I was working on something else.

The same keyboard-controller-initiated reset occurred after 2.6 GB.

What is generating the guest side of the heartbeat? Can it be stopped? Can its priority be increased?
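
Since the peak load was not observed directly, one option is to record it unattended during the backup. The following is a rough sketch, assuming a Python 3 interpreter is available in the guest (an old distribution like this one may need it adapted) and that /proc/loadavg has its usual layout; the function names and the log path in the usage note are hypothetical.

```python
import os
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def peak_one_minute(samples):
    """Return the highest 1-minute load average among the given samples."""
    return max(s[0] for s in samples)


def record(log_path, interval_s=10.0, count=6, source="/proc/loadavg"):
    """Append `count` timestamped samples to log_path, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        with open(source) as f:
            sample = parse_loadavg(f.read())
        samples.append(sample)
        # Reopen and sync on every sample so the data has the best chance
        # of surviving an abrupt VM reset.
        with open(log_path, "a") as out:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write("%s %.2f %.2f %.2f\n" % ((stamp,) + sample))
            out.flush()
            os.fsync(out.fileno())
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

A call such as `record("/var/tmp/loadavg.log", interval_s=10, count=8640)` would cover about a day; after a reset, the last lines of the file show the load just before the guest went down.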

comment:3 by Timothe Litt, 5 years ago

I tried stopping the vboxadd and vboxadd-service services.

Still got this reset:

135:18:50.898442 VMMDev: Guest Log: 02:09:17.155667 control  Guest control service stopped
135:18:50.939824 VMMDev: Guest Log: 02:09:17.194566 control  Guest control worker returned with rc=VINF_TRY_AGAIN
135:18:50.942123 VMMDev: Guest Log: 02:09:17.199284 main     Session 0 is about to close ...
135:18:50.942290 VMMDev: Guest Log: 02:09:17.199521 main     Stopping all guest processes ...
135:18:50.942480 VMMDev: Guest Log: 02:09:17.199671 main     Closing all guest files ...
135:18:51.007628 VMMDev: Guest Log: 02:09:17.265187 main     Ended.
163:23:12.713790 VMMDev: vmmDevHeartbeatFlatlinedTimer: Guest seems to be unresponsive. Last heartbeat received 4 seconds ago
163:24:10.179844 Reset initiated by keyboard controller

Not clear what to try next...

comment:4 by Timothe Litt, 5 years ago

I updated VirtualBox to 6.1.4 r136177, no change.

Is there any additional data that would help debug this?
