Experimental systray temperature utility for Nvidia GPUs...

Posted: Wed Sep 18, 2024 2:01 pm
by mikewalsh

Afternoon, gang.

Following development of the "on-demand" NvidiaTrayTemp utility for monitoring the core temperature of an Nvidia GPU:-

https://forum.puppylinux.com/viewtopic.php?t=9943

.....ozsouth and I, between us, have developed an "always-visible" Nvidia GPU systray monitor, giving a continuous readout of your Nvidia card's core temperature. Poorergputemp is based on Oz's own poorercputemp, which is itself based on a 'hack' of the source code for Micko's original pmcputemp (with gracious assent, I believe, from Micko himself).

Many of us are familiar with Micko's original, since it's been included with plenty of Pups/Puplets OOTB over the years.

=======================

The utility consists of two items in /usr/bin, plus another two under /root/Startup.

In /usr/bin live the poorergputemp binary, which generates the icons you see, and a script that directs poorergputemp to read values from a pre-determined location.....which leads us to /root/Startup.

Here we have the detection script. An Nvidia card's temperature reading will always come from one of two drivers: the official Nvidia driver or the open-source "nouveau" kernel module. The detection script (a modified version of the one used for NvidiaTrayTemp) auto-checks for both.

  • If the official nvidia.ko kernel module is found at /lib/modules/[version]/kernel/drivers/video, the temperature value is grepped & processed from the output of /usr/bin/nvidia-smi (the Nvidia CLI client; if nvidia.ko exists then so will this, since both are component parts of the official driver package). The result is written to a temporary location under /tmp.

  • If the nvidia.ko module is NOT found, the script instead looks for the readout generated by the "nouveau" kernel module, usually to be found at /sys/class/hwmon/hwmon0/temp1_input, and again writes it to the aforementioned location in /tmp. This nouveau location MAY vary, depending on kernel/hardware differences.....and that's why I'm still classing this as 'experimental'.
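For anyone wanting to follow the logic, the two-way check can be sketched in Python roughly as below. This is an illustrative sketch under assumptions, NOT the shell script shipped in the package: the function names are made up, the `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits` call is swapped in for the grep-and-process step, and the hwmon0 path is simply the usual nouveau location mentioned above.

```python
import os
import platform
import subprocess

# ASSUMPTION: a Python sketch of the two-way check described above,
# not the shell script actually shipped in the package.
NOUVEAU_INPUT = "/sys/class/hwmon/hwmon0/temp1_input"  # the "usual" spot; MAY vary


def nvidia_module_present(modules_root: str = "/lib/modules") -> bool:
    """True if nvidia.ko sits under <modules_root>/<kernel version>/kernel/drivers/video."""
    video_dir = os.path.join(modules_root, platform.release(),
                             "kernel", "drivers", "video")
    return os.path.isfile(os.path.join(video_dir, "nvidia.ko"))


def read_temp_celsius(modules_root: str = "/lib/modules",
                      nouveau_input: str = NOUVEAU_INPUT) -> int:
    """Return the GPU core temperature in whole degrees Celsius."""
    if nvidia_module_present(modules_root):
        # Official driver: ask nvidia-smi for the bare number rather than
        # grepping its full report (a substitution, not the original method).
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=temperature.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.splitlines()[0])  # one line per GPU; take the first
    # nouveau: hwmon reports millidegrees Celsius, e.g. "45000" -> 45.
    with open(nouveau_input) as f:
        return int(f.read().strip()) // 1000
```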

The above detection script is wrapped in a repeating 5-second loop, so as to continuously update the current value. This value is written to the pre-determined location which poorergputemp is directed to get its input FROM. Oz's help & advice with all this stuff has been invaluable, as always.
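The 5-second loop can likewise be sketched as follows. The output file /tmp/gputemp is a hypothetical name (the post only says "a temporary location under /tmp"), `run_loop` and `write_value` are made-up names, and the write-then-rename step is an added precaution rather than necessarily what the real script does; it simply means poorergputemp can never pick up a half-written value.

```python
import os
import tempfile
import time

# HYPOTHETICAL file name: the real script only uses "a location under /tmp".
OUTPUT_PATH = "/tmp/gputemp"
INTERVAL_S = 5  # the 5-second refresh described above


def write_value(path: str, value: int) -> None:
    """Write the reading to a temp file, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(f"{value}\n")
    os.replace(tmp, path)  # atomic rename within the same filesystem


def run_loop(read_temp, path: str = OUTPUT_PATH,
             interval: float = INTERVAL_S, iterations=None) -> None:
    """Call read_temp() every `interval` seconds; iterations=None loops forever."""
    done = 0
    while iterations is None or done < iterations:
        write_value(path, read_temp())
        done += 1
        time.sleep(interval)
```

Pass it whatever function does the detection, e.g. `run_loop(my_read_fn)`; with `iterations=None` it keeps going until killed, which is how a startup loop like this would normally run.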

The remaining item in /root/Startup is a simple delayed launcher for poorergputemp, which allows time for the detection script to do its thing and to start writing values to the location in /tmp before poorergputemp begins to display them.
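As for the delayed launcher, one alternative to a fixed delay is to wait until the first value has actually appeared. The sketch below assumes the same hypothetical /tmp/gputemp file; `wait_for_value` and `launch_when_ready` are illustrative names, and the 30-second timeout is an arbitrary choice.

```python
import os
import subprocess
import time

VALUE_PATH = "/tmp/gputemp"  # HYPOTHETICAL; must match wherever the loop writes


def wait_for_value(path: str, timeout: float = 30.0, poll: float = 0.5) -> bool:
    """Return True once `path` exists and is non-empty, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not created yet
        time.sleep(poll)
    return False


def launch_when_ready(cmd, path: str = VALUE_PATH, timeout: float = 30.0):
    """Start the tray program only after the first reading exists; None on timeout."""
    if not wait_for_value(path, timeout):
        return None
    return subprocess.Popen(cmd)
```

Something like `launch_when_ready(["poorergputemp"])` would then start the tray icon, substituting whatever command the real launcher actually runs. Polling this way means the icon appears as soon as there is something to show, and doesn't start at all if detection never produces a value.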

It will be fairly obvious which temp display is which, since poorergputemp's colours are brighter and its font larger & easier to read.

=====================================

As it stands, I'm very happy with it. However - as with anything new and/or experimental - there ARE caveats.

Auto-detection for GPUs is nowhere near as simple as that for CPUs; the latter tend to use a number of 'fixed' architecture classes that are simple enough to check for. GPU manufacturers, however, have developed so many different generations of unique architectures over the years that even the team at kernel.org would be hard-pressed to keep up with them.

I've only been able to build this for Nvidia cards, since both my machines run one. In theory, it should be possible to modify the detection script for AMD cards as well.....IF anybody has one and can locate either the CLI client that comes with the official AMD driver (I'm almost certain it provides one) OR the readout from the in-kernel 'radeon' module. This, I'm afraid, will need to be left to others, since I have no way of taking this side of things any further myself.

In theory, too, it MAY be possible to include a temp readout for built-in Intel graphics. I have no knowledge of this side of things, so again others would need to help research stuff for modifying the detection script.

There is also the issue of changing standards within the kernel itself; older kernels expose much of this information at different locations from those used by current kernels.
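One way of coping with the temp readout moving about between kernels is to stop assuming hwmon0, and instead find the hwmon device by the driver name the kernel reports in its `name` file (hwmonN numbers are handed out in the order devices register, so they aren't guaranteed to stay the same). A minimal sketch follows; `find_hwmon_temp` is a hypothetical helper, and the fallback to a `device/` subdirectory reflects an assumption about older kernels rather than anything tested here.

```python
import glob
import os


def find_hwmon_temp(driver: str = "nouveau", root: str = "/sys/class/hwmon"):
    """Return the temp1_input path of the first hwmon device named `driver`, or None."""
    for dev in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        # ASSUMPTION: on some older kernels the attributes live one level
        # down, under hwmonN/device/, so check both places.
        for base in (dev, os.path.join(dev, "device")):
            temp_file = os.path.join(base, "temp1_input")
            try:
                with open(os.path.join(base, "name")) as f:
                    name = f.read().strip()
            except OSError:
                continue  # no name file at this level
            if name == driver and os.path.exists(temp_file):
                return temp_file
    return None
```

In principle the same lookup could be pointed at "radeon" for the AMD case mentioned above, though that is untested.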

NOTE:- At present, this works fine with Tahrpup64, Xenialpup64, Bionicpup64 and Fossapup64 (and derivatives thereof). It DOES NOT work under Bookwormpup64 (I've just tried it), and I'm unsure why. Bookwormpup64 uses a different location for the "nouveau" driver's temp output, and poorergputemp refuses to read its config file. All dependencies are met and nothing is missing, yet even after adjusting for the different temp location it still refuses to display anything.

============================

With the above caveats in mind, please understand that if you're interested in trying this you will be in much the same boat as myself.....still learning about all this stuff for the first time. Please don't post in, complaining that it doesn't work and/or do loads of other fancy stuff; this was only ever intended to be an 'always-visible' readout of an Nvidia GPU's core temperature.......nothing more, nothing less.

Please also be aware that if you happen to be using peebee's "64-bit compatibility SFS", which permits running modern 64-bit browsers within an otherwise stock 32-bit Puppy (by means of a 64-bit kernel and a special SFS package), this will NOT work.....neither the 32-bit NOR the 64-bit package will behave itself. Both attempt to write the config file at /root/.config/poorergputemp/poorergputemprc but immediately delete it again. This repeats half-a-dozen times or so, then aborts with the message "Your processor is not supported. Giving up..."

My guess is that the binary is confused by the mix of architectures in use, since the compatibility SFS IS something of a "hack", even though it actually works very well. (I can testify to this, since I've been using this trick for some time now. It DOES work very well indeed.)

==============================

I've attached the packages below; both 32-bit and 64-bit are available. After installation, a restart of the graphical server will be required in order to kickstart the scripts in /root/Startup into life.

Use entirely at your own discretion. These will either work for you, or.....they won't. Just because these work for MY hardware is NO "guarantee" they will work as intended for others, but.....we shall see what we shall see.

Feedback and/or sensible discussion will be appreciated. Hope they're useful to some of you! :D

Mike. ;)