mirror of
https://github.com/community-scripts/ProxmoxVE.git
synced 2026-02-18 19:15:55 +01:00
core: smart recovery for failed installs | extend exit_codes (#11221)
* feat(build.func): smart error recovery menu for failed installations Replace simple Y/n removal prompt with interactive recovery menu: - Option 1: Remove container and exit (default, auto after 60s timeout) - Option 2: Keep container for debugging - Option 3: Retry installation with verbose mode enabled - Option 4: Retry with 1.5x RAM and +1 CPU core (OOM errors only) Improvements: - Detect OOM errors (exit codes 137, 243) and offer resource increase - Show human-readable error explanation using explain_exit_code() - Recursive rebuild preserves ALL settings from advanced/app.vars/default.vars - Settings preserved: Network (IP, Gateway, VLAN, MTU, Bridge), Features (Nesting, FUSE, TUN, GPU), Storage, SSH keys, Tags, Hostname, etc. - Show rebuild summary before retry (old→new CTID, resources, network) - New container ID generated automatically for rebuilds This helps users recover from transient failures without re-running the entire script manually. * fix(api.func): fix duplicate exit codes and add missing error codes Exit code fixes: - Remove duplicate definitions for codes 243, 254 (Node.js vs DB) - Reassign MySQL/MariaDB to 240-242, 244 (was 241-244) - Reassign MongoDB to 250-253 (was 251-254) New exit codes added (based on GitHub issues analysis): - 6: curl couldn't resolve host (DNS failure) - 7: curl failed to connect (network unreachable) - 22: curl HTTP error (404, 429 rate limit, 500) - 28: curl timeout (very common in download failures) - 35: curl SSL error - 102: APT lock held by another process - 124: Command timeout - 141: SIGPIPE (broken pipe) Also update OOM detection to include exit code 134 (SIGABRT) which is commonly seen in Node.js heap overflow issues. Fixes based on analysis of ~500 GitHub issues. * fix(exit-codes): sync error_handler.func and api.func with conflict-free code ranges - Add curl error codes (6, 7, 22, 28, 35) - Add APT lock code (102), timeout (124), signals (134, 141) - Move Python codes: 210-212 → 160-162 (avoid Proxmox conflict) - Move PostgreSQL codes: 231-234 → 170-173 - Move MySQL/MariaDB codes: 241-244 → 180-183 - Move MongoDB codes: 251-254 → 190-193 - Keep Node.js at 243-249, Proxmox at 200-231 - Both files now synchronized with identical mappings * feat(exit-codes): add systemd and build error codes (150-154) - 150: Systemd service failed to start - 151: Systemd service unit not found - 152: Permission denied (EACCES) - 153: Build/compile failed (make/gcc/cmake) - 154: Node.js native addon build failed (node-gyp) Based on issue analysis: 57 service failures, 25 build failures, 22 node-gyp issues * fix(build): restore smart recovery and add OOM/DNS retry paths * feat(build): APT in-place repair, exit 1 subclassification, new exit codes - Add APT/DPKG in-place recovery: detects exit 100/101/102/255 and exit 1 with APT log patterns, offers to repair dpkg state and re-run install script without destroying the container - Add exit 1 subclassification: analyzes combined log to identify root cause (APT, OOM, network, command-not-found) and routes to appropriate recovery option - Add exit 10 hint: shows privileged mode / nesting suggestion - Add exit 127 hint: extracts missing command name from logs - Refactor recovery menu: use named option variables (APT_OPTION, OOM_OPTION, DNS_OPTION) instead of hardcoded option numbers, supports up to 6 dynamic options cleanly - Map missing exit codes in api.func: curl 27/36/45/47/55, signals 129 (SIGHUP) / 131 (SIGQUIT), npm 239 * feat(api+build): map 25 more exit codes, add SIGHUP trap, network/perm hints api.func: - Map 25+ new exit codes that were showing as 'Unknown' in telemetry: curl: 3, 16, 18, 24, 26, 32-34, 39, 44, 46, 48, 51, 52, 57, 59, 61, 63, 79, 92, 95; signals: 125, 132, 144, 146 - Update code 8 description (FTP + apk untrusted key) - Update header comment with full supported ranges build.func: - Add SIGHUP trap: reports 'failed/129' to API when terminal is closed, should significantly reduce the 2841 stuck 'installing' records - Add exit 52 (empty reply) and 57 (poll error) to network issue detection for DNS override recovery option - Add exit 125/126 hint: suggests privileged mode for permission errors * fix: sync error_handler fallback, Alpine APK repair, retry limit error_handler.func: - Sync fallback explain_exit_code() with api.func: add 25+ codes that were missing (curl 16/18/24/26/27/32-34/36/39/44-48/51/52/55/57/59/ 61/63/79/92/95, signals 125/129/131/132/144/146, npm 239, code 3/8) - Ensures consistent error descriptions even when api.func isn't loaded build.func: - Alpine APK repair: detect var_os=alpine and run 'apk fix && apk cache clean && apk update' instead of apt-get/dpkg commands - Show 'Repair APK state' instead of 'APT/DPKG' in menu for Alpine - Retry safety counter: OOM x2 retry limited to max 2 attempts (prevents infinite RAM doubling via RECOVERY_ATTEMPT env var) - Show attempt count in rebuild summary * fix(build): preserve exit code in ERR trap to prevent false exit_code=0 The ERR trap called ensure_log_on_host before post_update_to_api, which reset \True to 0 (success). This caused ~15-20 records/day to be reported as 'failed' with exit_code=0 instead of the actual error code. Root cause chain: 1. Command fails with exit code N → ERR trap fires (\True = N) 2. ensure_log_on_host succeeds → \True becomes 0 3. post_update_to_api 'failed' '\True' → sends 'failed/0' (wrong!) 4. POST_UPDATE_DONE=true → EXIT trap skips the correct code Fix: capture \True into _ERR_CODE before ensure_log_on_host runs. * Implement telemetry settings and repo source detection Add telemetry configuration and repository source detection function.
This commit is contained in:
committed by
GitHub
parent
d274a269b5
commit
c9ecb1ccca
@@ -37,21 +37,47 @@ if ! declare -f explain_exit_code &>/dev/null; then
|
||||
case "$code" in
|
||||
1) echo "General error / Operation not permitted" ;;
|
||||
2) echo "Misuse of shell builtins (e.g. syntax error)" ;;
|
||||
3) echo "General syntax or argument error" ;;
|
||||
10) echo "Docker / privileged mode required (unsupported environment)" ;;
|
||||
4) echo "curl: Feature not supported or protocol error" ;;
|
||||
5) echo "curl: Could not resolve proxy" ;;
|
||||
6) echo "curl: DNS resolution failed (could not resolve host)" ;;
|
||||
7) echo "curl: Failed to connect (network unreachable / host down)" ;;
|
||||
8) echo "curl: FTP server reply error" ;;
|
||||
8) echo "curl: Server reply error (FTP/SFTP or apk untrusted key)" ;;
|
||||
16) echo "curl: HTTP/2 framing layer error" ;;
|
||||
18) echo "curl: Partial file (transfer not completed)" ;;
|
||||
22) echo "curl: HTTP error returned (404, 429, 500+)" ;;
|
||||
23) echo "curl: Write error (disk full or permissions)" ;;
|
||||
24) echo "curl: Write to local file failed" ;;
|
||||
25) echo "curl: Upload failed" ;;
|
||||
26) echo "curl: Read error on local file (I/O)" ;;
|
||||
27) echo "curl: Out of memory (memory allocation failed)" ;;
|
||||
28) echo "curl: Operation timeout (network slow or server not responding)" ;;
|
||||
30) echo "curl: FTP port command failed" ;;
|
||||
32) echo "curl: FTP SIZE command failed" ;;
|
||||
33) echo "curl: HTTP range error" ;;
|
||||
34) echo "curl: HTTP post error" ;;
|
||||
35) echo "curl: SSL/TLS handshake failed (certificate error)" ;;
|
||||
36) echo "curl: FTP bad download resume" ;;
|
||||
39) echo "curl: LDAP search failed" ;;
|
||||
44) echo "curl: Internal error (bad function call order)" ;;
|
||||
45) echo "curl: Interface error (failed to bind to specified interface)" ;;
|
||||
46) echo "curl: Bad password entered" ;;
|
||||
47) echo "curl: Too many redirects" ;;
|
||||
48) echo "curl: Unknown command line option specified" ;;
|
||||
51) echo "curl: SSL peer certificate or SSH host key verification failed" ;;
|
||||
52) echo "curl: Empty reply from server (got nothing)" ;;
|
||||
55) echo "curl: Failed sending network data" ;;
|
||||
56) echo "curl: Receive error (connection reset by peer)" ;;
|
||||
57) echo "curl: Unrecoverable poll/select error (system I/O failure)" ;;
|
||||
59) echo "curl: Couldn't use specified SSL cipher" ;;
|
||||
61) echo "curl: Bad/unrecognized transfer encoding" ;;
|
||||
63) echo "curl: Maximum file size exceeded" ;;
|
||||
75) echo "Temporary failure (retry later)" ;;
|
||||
78) echo "curl: Remote file not found (404 on FTP/file)" ;;
|
||||
79) echo "curl: SSH session error (key exchange/auth failed)" ;;
|
||||
92) echo "curl: HTTP/2 stream error (protocol violation)" ;;
|
||||
95) echo "curl: HTTP/3 layer error" ;;
|
||||
64) echo "Usage error (wrong arguments)" ;;
|
||||
65) echo "Data format error (bad input data)" ;;
|
||||
66) echo "Input file not found (cannot open input)" ;;
|
||||
@@ -69,15 +95,21 @@ if ! declare -f explain_exit_code &>/dev/null; then
|
||||
101) echo "APT: Configuration error (bad sources.list, malformed config)" ;;
|
||||
102) echo "APT: Lock held by another process (dpkg/apt still running)" ;;
|
||||
124) echo "Command timed out (timeout command)" ;;
|
||||
125) echo "Command failed to start (Docker daemon or execution error)" ;;
|
||||
126) echo "Command invoked cannot execute (permission problem?)" ;;
|
||||
127) echo "Command not found" ;;
|
||||
128) echo "Invalid argument to exit" ;;
|
||||
130) echo "Terminated by Ctrl+C (SIGINT)" ;;
|
||||
129) echo "Killed by SIGHUP (terminal closed / hangup)" ;;
|
||||
130) echo "Aborted by user (SIGINT)" ;;
|
||||
131) echo "Killed by SIGQUIT (core dumped)" ;;
|
||||
132) echo "Killed by SIGILL (illegal CPU instruction)" ;;
|
||||
134) echo "Process aborted (SIGABRT - possibly Node.js heap overflow)" ;;
|
||||
137) echo "Killed (SIGKILL / Out of memory?)" ;;
|
||||
139) echo "Segmentation fault (core dumped)" ;;
|
||||
141) echo "Broken pipe (SIGPIPE - output closed prematurely)" ;;
|
||||
143) echo "Terminated (SIGTERM)" ;;
|
||||
144) echo "Killed by signal 16 (SIGUSR1 / SIGSTKFLT)" ;;
|
||||
146) echo "Killed by signal 18 (SIGTSTP)" ;;
|
||||
150) echo "Systemd: Service failed to start" ;;
|
||||
151) echo "Systemd: Service unit not found" ;;
|
||||
152) echo "Permission denied (EACCES)" ;;
|
||||
@@ -123,6 +155,7 @@ if ! declare -f explain_exit_code &>/dev/null; then
|
||||
224) echo "Proxmox: PBS storage is for backups only" ;;
|
||||
225) echo "Proxmox: No template available for OS/Version" ;;
|
||||
231) echo "Proxmox: LXC stack upgrade failed" ;;
|
||||
239) echo "npm/Node.js: Unexpected runtime error or dependency failure" ;;
|
||||
243) echo "Node.js: Out of memory (JavaScript heap out of memory)" ;;
|
||||
245) echo "Node.js: Invalid command-line option" ;;
|
||||
246) echo "Node.js: Internal JavaScript Parse Error" ;;
|
||||
|
||||
Reference in New Issue
Block a user