Problem: While collecting PureDisk logs to provide to support for engineering to troubleshoot an issue we were having, the log collection script filled the root volume with ~60 GB of data, causing the box to stop functioning correctly.
Expected Results: The root volume should not be filled by the log collection process.
Actual Results: 72 GB of log files were collected, filling the root volume; many processes crashed, and the crontab file that performs cleanup may even have been wiped out.
Suggested Potential Resolutions.
Idea A. Require a flag of either local or remote. If remote is chosen, have the user enter the iosupport password and the case-specific file location, then grab the files one at a time, gzip them, and push them to the FTP server, instead of collecting them all and then attempting to gzip them space permitting. That minimizes what is written to disk (see the first sketch below). If local is chosen, it writes to the local disks.
Idea B. Have it gzip any files that are not already gzipped before staging them into /tmp, or at least gzip each one immediately and remove the unzipped copy before moving on to the next, so the staging area stays small (see the second sketch below). Yes, that is a lot more overhead from invoking gzip so often.
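A rough Python sketch of Idea A's remote mode, assuming the script can be handed the FTP host, iosupport password, and case directory as parameters (all placeholders here, not the real support FTP details); each log is compressed into a memory buffer and pushed straight to the server, so nothing extra is written to disk:

    import ftplib
    import gzip
    import io
    import os

    def upload_logs_remote(log_paths, host, user, password, case_dir):
        """Gzip each log in memory and push it to the support FTP one at a time."""
        ftp = ftplib.FTP(host)
        ftp.login(user, password)
        ftp.cwd(case_dir)                  # case-specific upload location
        try:
            for path in log_paths:
                buf = io.BytesIO()         # compressed data never touches disk
                with open(path, "rb") as src, \
                     gzip.GzipFile(fileobj=buf, mode="wb") as gz:
                    for chunk in iter(lambda: src.read(64 * 1024), b""):
                        gz.write(chunk)
                buf.seek(0)
                # very large logs would need true streaming instead of one buffer
                ftp.storbinary("STOR %s.gz" % os.path.basename(path), buf)
        finally:
            ftp.quit()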
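And a matching sketch of Idea B, compressing straight into the staging area so an uncompressed copy never exists there (the staging directory name is illustrative):

    import gzip
    import os
    import shutil

    def stage_compressed(log_paths, staging_dir="/tmp/logstage"):
        """Stage every log gzipped; no raw copy is ever written to /tmp."""
        os.makedirs(staging_dir, exist_ok=True)
        for path in log_paths:
            staged = os.path.join(staging_dir, os.path.basename(path))
            if path.endswith(".gz"):
                shutil.copy2(path, staged)           # already compressed
                continue
            with open(path, "rb") as src, gzip.open(staged + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)         # compress during the copy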
Idea C. If it doesn't already collect only the last so many days of logs by default, make it default to collecting only the last 1 day unless a number-of-days flag and value are explicitly given (sketched below).
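A minimal sketch of Idea C's default window; the --days flag name is an assumption, not the script's real option:

    import argparse
    import os
    import time

    def parse_args():
        parser = argparse.ArgumentParser(description="PureDisk log collector")
        parser.add_argument("--days", type=int, default=1,
                            help="collect logs modified in the last N days (default: 1)")
        return parser.parse_args()

    def within_window(path, days):
        """True if the file was modified inside the requested window."""
        return os.path.getmtime(path) >= time.time() - days * 86400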
Idea D. Have the script clean up sooner and more thoroughly while it is running, and also detect when it runs out of disk space on the volume it is writing to and clean up immediately, instead of leaving a bunch of stuff behind and causing the appliance to stop operating (sketched below).
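A sketch of Idea D's disk-space guard; the 1 GB floor is an illustrative threshold, not a measured one:

    import os
    import shutil

    MIN_FREE_BYTES = 1 * 1024**3    # abort before less than ~1 GB would remain

    def copy_with_guard(log_paths, staging_dir):
        """Copy logs into staging, wiping the staging area the moment space runs low."""
        for path in log_paths:
            free = shutil.disk_usage(staging_dir).free
            if free - os.path.getsize(path) < MIN_FREE_BYTES:
                shutil.rmtree(staging_dir, ignore_errors=True)   # clean up immediately
                raise RuntimeError("staging volume nearly full, aborting collection")
            shutil.copy2(path, staging_dir)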
Idea E. Have it write its files to /Storage first if it is available and has space, so you don't fill up the boot disk when there is room on /Storage (sketched below). There already seems to be precedent for that with cores, patches, etc.
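A sketch of Idea E's volume selection; the 10 GB requirement and the fallback to /tmp are illustrative assumptions:

    import os
    import shutil

    def pick_staging_dir(needed_bytes=10 * 1024**3):
        """Prefer /Storage when it exists and has room; fall back to the boot disk."""
        for base in ("/Storage", "/tmp"):
            if os.path.isdir(base) and shutil.disk_usage(base).free >= needed_bytes:
                return os.path.join(base, "logstage")
        raise RuntimeError("no volume has enough free space for log staging")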