Introduce post_progress_to_api() in misc/api.func — a non-blocking, fire-and-forget curl ping (gated by DIAGNOSTICS and RANDOM_UUID) that updates telemetry status to "configuring". Wire this progress ping into multiple scripts (alpine-install.func, install.func, build.func, core.func) at key milestones (container start, network ready, customization, creation, cleanup) and replace/deduplicate some earlier post_to_api calls. Also update error_handler.func to always report failures immediately via post_update_to_api to ensure failures are captured even before/after container lifecycle.
Prevent hangs when pulling logs from containers by wrapping pct pull calls with timeout (8s) and running ensure_log_on_host under timeout (10s). Always send telemetry (post_update_to_api) before attempting best-effort log collection so status is reported even if log retrieval blocks. Update EXIT/ERR/SIGHUP/SIGINT/SIGTERM traps and consolidate error/interrupt handlers to use the new timeouted log collection. Changes in misc/build.func and misc/error_handler.func.
* fix: send telemetry BEFORE log collection in signal handlers
- Swap ensure_log_on_host/post_update_to_api order in on_interrupt, on_terminate, api_exit_script, and inline SIGHUP/SIGINT/SIGTERM traps
- For signal exits (>128): send telemetry immediately, then best-effort log collection
- Add 2>/dev/null || true to all I/O in signal handlers to prevent SIGPIPE
- Fix on_exit: exit_code=0 now reports 'done' instead of 'failed 1'
- Root cause: pct pull hangs on dying containers blocked telemetry updates, leaving 595+ records stuck in 'installing' daily
* feat: add execution_id to all telemetry payloads
- Generate EXECUTION_ID from RANDOM_UUID in variables()
- Export EXECUTION_ID to container environment
- Add execution_id field to all 8 API payloads in api.func
- Add execution_id to post_progress_to_api in install.func and alpine-install.func
- Fallback to RANDOM_UUID when EXECUTION_ID not set (backward compat)
* fix: correct telemetry type values for PVE and addon scripts
- PVE scripts (tools/pve/*): change type 'tool' -> 'pve'
- Addon scripts (tools/addon/*): fix 4 scripts that wrongly used 'tool' -> 'addon'
(netdata, add-tailscale-lxc, add-netbird-lxc, all-templates)
- api.func: post_tool_to_api sends type='pve', default fallback 'pve'
- Aligns with PocketBase categories: lxc, vm, pve, addon
* fix: persist diagnostics opt-in inside containers for addon telemetry
- install.func + alpine-install.func: create /usr/local/community-scripts/diagnostics
inside the container when DIAGNOSTICS=yes (from build.func export)
- Enables addon scripts running later inside containers to find the opt-in
- Update init_tool_telemetry default type from 'tool' to 'pve'
* refactor: clean up diagnostics/telemetry opt-in system
- diagnostics_check(): deduplicate heredoc (was 2x 22 lines), improve whiptail
text with clear what/what-not collected, add telemetry + privacy links
- diagnostics_menu(): better UX with current status, clear enable/disable
buttons, note about existing containers
- variables(): change DIAGNOSTICS default from 'yes' to 'no' (safe: no
telemetry before user consents via diagnostics_check)
- install.func + alpine-install.func: persist BOTH yes AND no in container
so opt-out is explicit (not just missing file = no)
- Fix typo 'menue' -> 'menu' in config file comments
* fix: no pre-selection in telemetry dialog, link to telemetry-service README
- Add --defaultno so 'No, opt out' is focused by default (user must Tab to Yes)
- Change privacy link from discussions/1836 to telemetry-service#privacy--compliance
* fix: use radiolist for telemetry dialog (no pre-selection)
- Replace --yesno with --radiolist: user must actively SPACE-select an option
- Both options start as OFF (no pre-selection)
- Cancel/Exit defaults to 'no' (opt-out)
* simplify: inline telemetry dialog text like other whiptail dialogs
* improve: telemetry dialog with more detail, link to PRIVACY.md
- Add what we collect / don't collect sections back to dialog
- Link to telemetry-service/docs/PRIVACY.md instead of README anchor
- Update config file comment with same link
* Ensure API update is sent on script exit
Add exit-time telemetry handling across scripts to avoid orphaned "installing" records. Introduce local exit_code capture in api_exit_script and cleanup handlers and, when POST_TO_API_DONE is true but POST_UPDATE_DONE is not, post a final status (marking failures on non-zero exit codes, or marking done/failed in VM cleanups based on exit code). Changes touch misc/build.func, misc/vm-core.func and various vm/*-vm.sh cleanup functions to reliably send post_update_to_api on normal or early exits.
* Update api.func
* fix(telemetry): add missing exit codes to explain_exit_code()
- Add curl error codes: 4, 5, 8, 23, 25, 30, 56, 78
- Add code 10: Docker/privileged mode required (used in ~15 scripts)
- Add code 75: Temporary failure (retry later)
- Add BSD sysexits.h codes: 64-77
- Sync error_handler.func fallback with canonical api.func
* fix(telemetry): improve error reporting with structured error strings and better categorization
- Add build_error_string() that creates structured format:
'exit_code=N | description\n---\n<last 20 log lines>'
- Fix categorize_error() to map ALL known exit codes:
- Added: shell(1,2), proxmox(200-231), service(150-154),
database(170-193), runtime(243-249), signal(139,141,143)
- Split timeout from network (28 was in both)
- Added DPKG(255) to dependency category
- Update all API functions to use build_error_string():
post_update_to_api, post_update_to_api_extended,
post_tool_to_api, post_addon_to_api
- Add ensure_log_on_host() calls to on_exit, on_interrupt,
on_terminate handlers to prevent race condition where
telemetry reports before container log is pulled to host
* fix(ui): improve error output formatting and remove redundant log paths
- error_handler: Use msg_info/msg_ok/msg_warn for container cleanup
instead of raw echo with manual ANSI codes
- error_handler: Add ❓ icon before 'Remove broken container?' prompt
- error_handler: Indent log output with TAB for visual consistency
- build.func: Use msg_custom for installation log path display
- build.func: Use msg_info → msg_ok for container removal flow
- build.func: Use msg_warn for 'kept for debugging' message
- core.func/vm-core.func: Remove redundant container-internal log
path display (📋 View full log) since combined log on host is
the canonical location shown after failure
Enhance post_update_to_api to support a "force" mode and robust retry logic: add a 3rd-arg bypass to duplicate suppression, capture a short error summary, and perform up to three POST attempts (full payload, shortened error payload, minimal payload) with HTTP code checks and small backoffs. Mark POST_UPDATE_DONE on success (or after three attempts) to avoid infinite retries. Also invoke post_update_to_api with the "force" flag from cleanup paths in build.func and error_handler.func so a final status update is attempted after cleanup.
The catch_errors() function in CT scripts overrides the API telemetry
traps set by build.func. This caused on_exit, on_interrupt, and
on_terminate to never call post_update_to_api, leaving telemetry
records permanently stuck on 'installing'.
Changes:
- on_exit: Report orphaned 'installing' records on ANY exit where
post_to_api was called but post_update_to_api was not
- on_interrupt: Call post_update_to_api('failed', '130') before exit
- on_terminate: Call post_update_to_api('failed', '143') before exit
All calls are guarded by POST_UPDATE_DONE flag to prevent duplicates.
* Remove Go API and extend misc/api.func
Delete the Go-based API (api/main.go, api/go.mod, api/go.sum, api/.env.example) and significantly enhance misc/api.func. The shell telemetry file now includes telemetry configuration, repo source detection, GPU/CPU/RAM detection, expanded explain_exit_code mappings, and refactored post_to_api/post_to_api_vm to send non-blocking telemetry to telemetry.community-scripts.org while respecting DIAGNOSTICS/DEV_MODE and adding richer metadata (cpu/gpu/ram/repo_source). Also updates header/author info and improves privacy/robustness and error handling.
* Start install timer and refine error reporting
Call start_install_timer during build startup and overhaul exit/error reporting.
Changes:
- Invoke start_install_timer early in misc/build.func to track install duration.
- Update api_exit_script comments to reference PocketBase/api.func and adjust ERR/SIGINT/SIGTERM traps to post numeric exit codes (use $? / 130 / 143) instead of command strings.
- Replace the previous explain_exit_code implementation with a conditional fallback: only define explain_exit_code if not already provided (api.func is the canonical source). Expanded and reorganized exit code mappings (curl, timeout, systemd, Node/Python/Postgres/MySQL/MongoDB, Proxmox, etc.).
- In error_handler: stop echoing the container log path (host shows combined log), and post a "failed" update to the API with the exit code before offering container cleanup.
Rationale: these changes make telemetry more consistent and robust (numeric codes), provide a safe fallback for exit descriptions when api.func isn't loaded, and ensure failures are reported to the API prior to any automatic cleanup.
* Report install start/failure to telemetry API
Add telemetry hooks in misc/build.func: call post_to_api at installation start to capture early or immediately-failing installs, and call post_update_to_api with status "failed" and the install exit code when a container installation fails. This improves visibility into install failures for monitoring/telemetry.
* core: enhance storage type validation and error codes
Improve storage validation for LXC container creation with
explicit checks for unsupported storage types.
Changes:
- Add validation for storage types that don't support containers:
- iscsidirect (exit 212) - VMs only
- iscsi/zfs (exit 213) - no rootdir support
- cephfs (exit 219) - use RBD instead
- pbs (exit 224) - backups only
- Add connectivity check for network storage (linstor, rbd, nfs, cifs)
- Simplify storage content validation using pvesm status
- Reorganize Proxmox error codes (200-231) for consistency
- Update error messages to be more descriptive and actionable
This helps users identify storage compatibility issues early
before container creation fails with cryptic errors.
* Update build.func
* Refactor Core
Refactored misc/alpine-install.func to improve error handling, network checks, and MOTD setup. Added misc/alpine-tools.func and misc/error_handler.func for modular tool installation and error management. Enhanced misc/api.func with detailed exit code explanations and telemetry functions. Updated misc/core.func for better initialization, validation, and execution helpers. Removed misc/create_lxc.sh as part of cleanup.
* Delete config-file.func
* Update install.func
* Refactor stop_all_services function and variable names
Refactor service stopping logic and improve variable handling
* Refactor installation script and update copyright
Updated copyright information and adjusted package installation commands. Enhanced IPv6 disabling logic and improved container customization process.
* Update install.func
* Update license comment format in install.func
* Refactor IPv6 handling and enhance MOTD and SSH
Refactor IPv6 handling and update OS function. Enhance MOTD with additional details and configure SSH settings.
* big core refactor
* Enhance IPv6 configuration menu options
Updated IPv6 Address Management menu options for clarity and added a new option for fully disabling IPv6.
* Update default Node.js version to 24 LTS
* Update misc/alpine-tools.func
Co-authored-by: Michel Roegl-Brunner <73236783+michelroegl-brunner@users.noreply.github.com>
* indention
* remove debugf and duplicate codes
* Update whiptail backtitles and error codes
Removed '[dev]' from whiptail --backtitle strings for consistency. Refactored custom exit codes in build.func and error_handler.func: updated Proxmox error codes, shifted MySQL/MariaDB codes to 260-263, and removed unused MongoDB code. Updated error descriptions to match new codes.
* comments
* Refactor error handling and clean up debug comments
Standardized bash variable checks, removed unnecessary debug and commented code, and clarified error handling logic in container build and setup scripts. These changes improve code readability and maintainability without altering functional behavior.
* Update build.func
* feat: Improve LXC network checks and LINSTOR storage handling
Enhanced LXC container network setup to check for both IPv4 and IPv6 addresses, added connectivity (ping) tests, and provided troubleshooting tips on failure. Updated storage validation to support LINSTOR, including cluster connectivity checks and special handling for LINSTOR template storage.
---------
Co-authored-by: Michel Roegl-Brunner <73236783+michelroegl-brunner@users.noreply.github.com>