Whisper models (current and all previous) are being backed up despite the documentation saying they are not. #3598

Open
donburch888 opened this issue May 13, 2024 · 2 comments

Comments

@donburch888

Describe the issue you are experiencing

I just created a Full Backup (System > Backups > Create backup > Full Backup), which takes up 3840.1 MB. I downloaded it to my PC and opened it to see:
Screenshot from 2024-05-13 20-43-03

Note the last entry: core_whisper.tar.gz is 2.8 GB (out of the 3.8 GB file). Opening that file shows that it contains the "medium.en" Whisper model I am currently using, plus the files for previously used models.

The Whisper documentation says:

**Backups**

Whisper model files can be quite large, so they are automatically excluded from backups.

However, not only is the current Whisper model included in each Full Backup, wasting limited disk space, but previously used models are also kept and included in each Full Backup. Given the size of these models, and that storage on a Raspberry Pi is very limited, I am surprised this has been outstanding for 11 months. If it is not considered important, the documentation should be corrected.

The other question is: how do I remove the no-longer-used Whisper models on a HAOS system?

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

Whisper

What is the version of the add-on?

2.0.0

Steps to reproduce the issue

  1. Use a HAOS system with the Whisper add-on installed, preferably with a large model
  2. System > Backups > Create backup > Full Backup
  3. Copy the Full Backup file to a system where it can be opened
  4. Open the Full Backup file and check the contents of the core_whisper.tar.gz file
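
Steps 3 and 4 can also be done from a script instead of an archive viewer. The following is a minimal sketch, assuming the downloaded backup is an unencrypted tar archive whose add-on entries (such as core_whisper.tar.gz) are themselves readable tar archives. The function names and the command-line usage are hypothetical and not part of Home Assistant.

```python
import sys
import tarfile


def list_members(archive_path):
    """Return (name, size_in_bytes) for each regular file, largest first."""
    # "r:*" lets tarfile detect the compression (plain tar, gzip, ...) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        entries = [(m.name, m.size) for m in archive.getmembers() if m.isfile()]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def print_report(archive_path):
    """Print one line per file so the largest entries stand out."""
    for name, size in list_members(archive_path):
        print(f"{size / 1_000_000:10.1f} MB  {name}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python list_backup.py <path-to-backup-or-addon-archive>
    print_report(sys.argv[1])
```

Running it first on the full backup and then on the extracted core_whisper.tar.gz should show which files account for the size.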

System Health information

System Information

version core-2024.5.3
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.2
os_name Linux
os_version 6.6.29-haos
arch x86_64
timezone Australia/Sydney
config_dir /config
Home Assistant Community Store
GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 4995
Installed Version 1.34.0
Stage running
Available Repositories 1462
Downloaded Repositories 15
Home Assistant Cloud
logged_in true
subscription_expiration 3 June 2024 at 10:00 am
relayer_connected true
relayer_region ap-southeast-1
remote_enabled true
remote_connected true
alexa_enabled false
google_enabled false
remote_server ap-southeast-1-1.ui.nabu.casa
certificate_status ready
instance_id 5ed6a2304786423a8e093715817678d7
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok
Home Assistant Supervisor
host_os Home Assistant OS 12.3
update_channel stable
supervisor_version supervisor-2024.05.1
agent_version 1.6.0
docker_version 25.0.5
disk_total 38.7 GB
disk_used 37.5 GB
healthy true
supported true
board ova
supervisor_api ok
version_api ok
installed_addons File editor (5.8.0), Samba share (12.3.1), Terminal & SSH (9.14.0), Mosquitto broker (6.4.0), Node-RED (17.0.12), Rhasspy Assistant (2.5.11), openWakeWord (1.10.0), Whisper (2.0.0), Piper (1.5.0), ESPHome (2024.4.2), Music Assistant BETA (2.0.2)
Dashboards
dashboards 2
resources 8
views 7
mode storage
Recorder
oldest_recorder_run 3 May 2024 at 3:55 am
current_recorder_run 12 May 2024 at 9:51 am
estimated_db_size 414.31 MiB
database_engine sqlite
database_version 3.44.2

Anything in the Supervisor logs that might be useful for us?

2024-05-13 19:48:35.673 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_samba
2024-05-13 19:48:35.676 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_samba
2024-05-13 19:48:35.688 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_ssh
2024-05-13 19:48:35.695 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_ssh
2024-05-13 19:48:35.707 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_mosquitto
2024-05-13 19:48:35.710 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_mosquitto
2024-05-13 19:48:35.721 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on a0d7b954_nodered
2024-05-13 19:48:36.966 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon a0d7b954_nodered
2024-05-13 19:50:37.106 INFO (SyncWorker_4) [supervisor.docker.manager] Export image <generator object APIClient._stream_raw_result at 0x7efd5e211b70> to /data/tmp/tmpmk2owapi/image.tar
2024-05-13 19:51:48.968 INFO (SyncWorker_4) [supervisor.docker.manager] Export image <generator object APIClient._stream_raw_result at 0x7efd5e211b70> done
2024-05-13 19:52:05.759 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on 47701997_rhasspy
2024-05-13 19:52:12.213 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon 47701997_rhasspy
2024-05-13 19:52:12.255 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_openwakeword
2024-05-13 19:52:12.261 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_openwakeword
2024-05-13 19:52:12.278 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_whisper
2024-05-13 19:53:39.298 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-05-13 19:54:21.959 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_whisper
2024-05-13 19:54:26.285 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on core_piper
2024-05-13 19:54:32.236 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon core_piper
2024-05-13 19:54:33.526 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on 5c53de3b_esphome
2024-05-13 19:54:33.528 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon 5c53de3b_esphome
2024-05-13 19:54:33.570 INFO (MainThread) [supervisor.addons.addon] Building backup for add-on d5369777_music_assistant_beta
2024-05-13 19:54:34.690 INFO (MainThread) [supervisor.addons.addon] Finish backup for addon d5369777_music_assistant_beta
2024-05-13 19:54:34.690 INFO (MainThread) [supervisor.backups.manager] Backup cd6c24bb starting stage home_assistant
2024-05-13 19:54:43.360 INFO (MainThread) [supervisor.homeassistant.module] Backing up Home Assistant Core config folder
2024-05-13 19:54:48.729 INFO (MainThread) [supervisor.homeassistant.module] Backup Home Assistant Core config folder done
2024-05-13 19:54:48.733 INFO (MainThread) [supervisor.backups.manager] Backup cd6c24bb starting stage folders
2024-05-13 19:54:48.733 INFO (SyncWorker_0) [supervisor.backups.backup] Backing up folder share
2024-05-13 19:54:50.940 INFO (SyncWorker_0) [supervisor.backups.backup] Backup folder share done
2024-05-13 19:54:50.942 INFO (SyncWorker_4) [supervisor.backups.backup] Backing up folder addons/local
2024-05-13 19:54:50.944 INFO (SyncWorker_4) [supervisor.backups.backup] Backup folder addons/local done
2024-05-13 19:54:50.944 INFO (SyncWorker_1) [supervisor.backups.backup] Backing up folder ssl
2024-05-13 19:54:50.947 INFO (SyncWorker_1) [supervisor.backups.backup] Backup folder ssl done
2024-05-13 19:54:50.948 INFO (SyncWorker_2) [supervisor.backups.backup] Backing up folder media
2024-05-13 19:54:50.990 INFO (SyncWorker_2) [supervisor.backups.backup] Backup folder media done
2024-05-13 19:54:50.990 INFO (MainThread) [supervisor.backups.manager] Backup cd6c24bb starting stage finishing_file
2024-05-13 19:54:50.992 INFO (MainThread) [supervisor.backups.manager] Creating full backup with slug cd6c24bb completed
2024-05-13 19:54:51.004 INFO (MainThread) [supervisor.backups.manager] Found 16 backup files
2024-05-13 19:54:51.019 ERROR (MainThread) [supervisor.backups.backup] Can't read backup tarfile /data/backup/d95b95c7.tar: "filename './backup.json' not found"
2024-05-13 20:04:43.454 INFO (MainThread) [supervisor.api.backups] Downloading backup cd6c24bb
2024-05-13 20:23:42.524 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-05-13 20:23:59.033 INFO (MainThread) [supervisor.backups.manager] Found 16 backup files
2024-05-13 20:23:59.049 ERROR (MainThread) [supervisor.backups.backup] Can't read backup tarfile /data/backup/d95b95c7.tar: "filename './backup.json' not found"
2024-05-13 20:23:59.057 INFO (MainThread) [supervisor.backups.manager] Found 16 backup files
2024-05-13 20:23:59.074 ERROR (MainThread) [supervisor.backups.backup] Can't read backup tarfile /data/backup/d95b95c7.tar: "filename './backup.json' not found"
2024-05-13 20:51:18.817 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state running
2024-05-13 20:51:18.817 INFO (MainThread) [supervisor.resolution.checks.base] Run check for free_space/system
2024-05-13 20:51:18.817 INFO (MainThread) [supervisor.resolution.module] Create new suggestion clear_full_backup - system / None
2024-05-13 20:51:18.818 INFO (MainThread) [supervisor.resolution.module] Create new issue free_space - system / None
2024-05-13 20:51:18.818 INFO (MainThread) [supervisor.resolution.checks.base] Run check for trust/supervisor
2024-05-13 20:51:18.823 INFO (MainThread) [supervisor.resolution.checks.base] Run check for multiple_data_disks/system
2024-05-13 20:51:18.823 INFO (MainThread) [supervisor.resolution.checks.base] Run check for security/core
2024-05-13 20:51:18.824 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_failed/dns_server
2024-05-13 20:51:18.888 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_ipv6_error/dns_server
2024-05-13 20:51:18.888 INFO (MainThread) [supervisor.resolution.checks.base] Run check for docker_config/system
2024-05-13 20:51:18.888 INFO (MainThread) [supervisor.resolution.checks.base] Run check for pwned/addon
2024-05-13 20:51:18.889 INFO (MainThread) [supervisor.resolution.checks.base] Run check for disabled_data_disk/system
2024-05-13 20:51:18.889 INFO (MainThread) [supervisor.resolution.checks.base] Run check for ipv4_connection_problem/system
2024-05-13 20:51:18.889 INFO (MainThread) [supervisor.resolution.checks.base] Run check for no_current_backup/system
2024-05-13 20:51:18.889 INFO (MainThread) [supervisor.resolution.check] System checks complete
2024-05-13 20:51:18.889 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
2024-05-13 20:51:18.954 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
2024-05-13 20:51:18.954 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state running
2024-05-13 20:51:18.954 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
2024-05-13 20:53:45.619 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-05-13 21:23:48.730 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2024-05-13 21:46:58.728 INFO (MainThread) [supervisor.host.info] Updating local host information
2024-05-13 21:46:59.158 INFO (MainThread) [supervisor.host.services] Updating service information
2024-05-13 21:46:59.161 INFO (MainThread) [supervisor.host.network] Updating local network information
2024-05-13 21:46:59.221 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
2024-05-13 21:46:59.226 INFO (MainThread) [supervisor.host.manager] Host information reload completed
2024-05-13 21:51:18.964 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state running
2024-05-13 21:51:18.964 INFO (MainThread) [supervisor.resolution.checks.base] Run check for trust/supervisor
2024-05-13 21:51:18.968 INFO (MainThread) [supervisor.resolution.checks.base] Run check for multiple_data_disks/system
2024-05-13 21:51:18.968 INFO (MainThread) [supervisor.resolution.checks.base] Run check for security/core
2024-05-13 21:51:18.969 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_failed/dns_server
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for dns_server_ipv6_error/dns_server
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for docker_config/system
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for pwned/addon
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for disabled_data_disk/system
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for ipv4_connection_problem/system
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.checks.base] Run check for no_current_backup/system
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.check] System checks complete
2024-05-13 21:51:19.999 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
2024-05-13 21:51:19.061 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
2024-05-13 21:51:19.062 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state running
2024-05-13 21:51:19.062 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
2024-05-13 21:53:31.875 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
2024-05-13 21:53:47.690 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device
2024-05-13 21:53:47.690 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device
2024-05-13 21:53:47.691 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device
2024-05-13 21:53:47.691 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device
2024-05-13 21:53:47.692 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device
2024-05-13 21:53:47.692 ERROR (MainThread) [supervisor.store] Could not reload repository d5369777 due to StoreJobError("'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device")
2024-05-13 21:53:47.692 ERROR (MainThread) [supervisor.store] Could not reload repository 47701997 due to StoreJobError("'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device")
2024-05-13 21:53:47.692 ERROR (MainThread) [supervisor.store] Could not reload repository 5c53de3b due to StoreJobError("'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device")
2024-05-13 21:53:47.692 ERROR (MainThread) [supervisor.store] Could not reload repository core due to StoreJobError("'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device")
2024-05-13 21:53:47.692 ERROR (MainThread) [supervisor.store] Could not reload repository a0d7b954 due to StoreJobError("'GitRepo.pull' blocked from execution, not enough free space (0.0GB) left on the device")
2024-05-13 21:53:48.039 INFO (MainThread) [supervisor.store] Loading add-ons from store: 88 all - 0 new - 0 remove
2024-05-13 21:53:48.039 INFO (MainThread) [supervisor.store] Loading add-ons from store: 88 all - 0 new - 0 remove
2024-05-13 21:53:51.827 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
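
The Supervisor log above also shows how long each add-on backup took: core_whisper runs from 19:52:12.278 to 19:54:21.959, roughly 130 seconds, while most other add-ons finish in well under a second. A minimal sketch for pulling those durations out of a saved log follows. It assumes the log format shown above, and the function name `backup_durations` is hypothetical.

```python
import re
from datetime import datetime

# Matches the "Building backup" / "Finish backup" lines as they appear in the log above.
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"(Building backup for add-on|Finish backup for addon) (\S+)$"
)


def backup_durations(log_text):
    """Return {addon_slug: seconds} for each add-on with a start and a finish line."""
    started = {}
    durations = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        action, slug = match.group(2), match.group(3)
        if action.startswith("Building"):
            started[slug] = stamp
        elif slug in started:
            durations[slug] = (stamp - started.pop(slug)).total_seconds()
    return durations
```

Sorting the result by value gives a quick view of which add-on dominates the backup time.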

Anything in the add-on logs that might be useful for us?

[09:49:54] INFO: Service exited with code 256 (by signal 15)
s6-rc: info: service whisper successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
[2024-05-12 09:53:11.852] [ctranslate2] [thread 55] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
INFO:__main__:Ready
[09:53:12] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
INFO:faster_whisper:Processing audio with duration 00:01.350
INFO:wyoming_faster_whisper.handler: Happy light on!
INFO:faster_whisper:Processing audio with duration 00:02.170
INFO:wyoming_faster_whisper.handler: Study light on.
INFO:faster_whisper:Processing audio with duration 00:15.000
INFO:wyoming_faster_whisper.handler: Yeah. Yeah. Yeah. Mhm.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:31> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 41, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 95, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 29, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 131, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost
INFO:faster_whisper:Processing audio with duration 00:02.660
INFO:wyoming_faster_whisper.handler: So I had to spend two weeks in Tanzania...
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:31> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 41, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 95, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 29, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 131, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost
INFO:faster_whisper:Processing audio with duration 00:01.620
INFO:wyoming_faster_whisper.handler: Study light off.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:31> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 41, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 95, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 29, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 131, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost
INFO:faster_whisper:Processing audio with duration 00:01.410
INFO:wyoming_faster_whisper.handler: set the light on
INFO:faster_whisper:Processing audio with duration 00:02.140
INFO:wyoming_faster_whisper.handler: Turn steady, light on
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:31> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 41, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 95, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 29, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 131, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

Additional information

No response

@donburch888
Author

On second thoughts, the folders for the previously used models are tiny, so I guess only their configurations are kept, though I don't think they should be if the models are no longer in use.

@rwjack

rwjack commented May 24, 2024

#3577 related
