Monday, March 18, 2019

An NVidia GTX 1050 Ti in that PowerEdge T440 without the GPU Kit.

Having recently upgraded the GPU in my Dell Precision T7910, I found myself with this card lying around (ASUS ROG STRIX 1050 Ti Gaming):

So I had this crazy thought: what if I tried to use that card in my PowerEdge T440? This would provide a decent (and silent) upgrade over the MSI GTX 1030 currently in the server.

One problem: the card required external GPU power, and I had ordered my T440 without the GPU kit, which can only be installed at point of sale and cannot be retrofitted afterwards.
I tried using the 1050 Ti with just the power provided by the x16 PCI-E slot but the server failed to recognize the GPU.

So I went looking into my T440 (that GPU Kit must draw power from somewhere, right?) and found a white connector on the PCB directly attached to the PSU cage:

A close inspection using my phone revealed something looking almost like an 8-pin GPU connector, with an informative label on the side:

Did you read "GPU_PWR" too? I surely did, but that white connector was a little different from what I was used to seeing as far as GPU power connectors go.

Then I remembered I had seen similar connectors very recently... in my Dell Precision T7910!
(Why change a good design when you've got one?)

Luckily, the Precision T7910, with its 1300W PSU, has plenty of GPU power cables (enough for two power-hungry 6-pin or 8-pin GPUs) and I was pretty sure I'd never use more than one GPU in my T7910. A nice GTX 1660 was good enough for me.
So I went ahead and pulled one of the two GPU cables from the Precision T7910. Unfortunately, the 8-pin end from the T7910 didn't fit the T440 PSU connector due to mismatched diameters on the two bottom-right slots.

After trying to make those 'fingers' thinner with a cutter, I realized that those two slots didn't even carry electrical wiring, so I just cut them off:

Once this (not so) delicate surgery was done, the GPU cable fit perfectly into the GPU power connector on the PSU PCB of my T440. The other end of the cable (6-pin) made its way to the GPU card and I powered up the server, which came up perfectly.

Such cables can be ordered on eBay for about USD 10.

Here are a few pictures of the finished assembly:

In conclusion, I'll state that although I like Dell PowerEdge and Precision hardware, I very much dislike the FUD surrounding those systems:
- I didn't buy my T440 with a GPU Kit (my own mistake) and Dell wasn't able to help retrofit a kit afterwards (no such solution exists).
- I still managed to power up that GTX card using an extra cable I borrowed from another system without Dell's help.
- Notwithstanding Dell's desire to sell me Platinum 795W or 1100W PSU units, my complete T440 system still idles at 88 Watts and never seems to exceed 200W. IMHO that 495W PSU might be just fine.

# ipmitool sdr list full|grep Watt
Pwr Consumption  | 88 Watts          | ok
Complete Specs here:
  • Two Xeon Silver 4110 CPUs (8C/16T)
  • 96GB (3 x 32GB) RAM
  • Two Samsung 860 EVO SSDs
  • One WD Red 8TB drive
  • One H730P HBA
  • One i350-4 Quad Gigabit NIC
  • One 495W PSU
This server is probably one of the best workstations I've ever had!

Friday, March 8, 2019

Some Tips about PowerEdge as Workstation (Revisited for 14th Gen servers)

A new computer: Dell PowerEdge T440 server.

As much as I consider it a very fine machine now, the road wasn't easy. Some of the previous 12th Gen and 13th Gen tips didn't apply and had to be revisited.

Also, because I was unaware of some of the 'quirks', I ran into some issues after purchase and it took me a while to add the extra hardware that makes the T440 experience more enjoyable.

3rd Party PCI Fan response

There is no longer a one-size-fits-all 3rd party PCI fan response.
Instead, this is now configured per slot. Look for it in the iDRAC GUI under
'Hardware Settings':

Or, for the CLI-minded:
/admin1-> racadm get System.PCIESlotLFM.1
/admin1-> racadm set System.PCIESlotLFM.1.LFMMode 2
Object value modified successfully
/admin1-> racadm get System.PCIESlotLFM.1 
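With more than one slot populated, the same setting has to be applied per slot. Here's a dry-run sketch that only prints the commands; the slot list and the LFMMode value of 2 are assumptions lifted from the example above, so check 'racadm help System.PCIESlotLFM' on your own iDRAC first:

```shell
# Dry-run sketch: emit the racadm commands that would pin the LFM mode
# on a list of PCIe slots. The slot numbers and the LFMMode value (2)
# are assumptions; review 'racadm help System.PCIESlotLFM' first.
lfm_cmds() {
    for slot in "$@"; do
        printf 'racadm set System.PCIESlotLFM.%s.LFMMode 2\n' "$slot"
    done
}

lfm_cmds 1 2 3
```

Pipe the output to the iDRAC over SSH (or run each line through racadm) once you're happy with it.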

Fan Speed

The server is -very- picky about component health (there are a lot more sensors). At one point it was pushing the fans to 100% because of the lack of a temperature sensor on the 850 EVO SSD sitting behind the H730P. Upgrading to 860 EVOs solved the problem. The PowerEdge T130, which had both the H730P and the same SSD, never had a single issue with that.

I decided I liked it better when the fans stayed around 1080 RPM, so I added a script to my RHEL7 system:

# gmake install
chkconfig --add dellfanctl
(II) -------
(II) -------
You have new mail in /var/spool/mail/root
# systemctl -al|grep dellf
  dellfanctl.service                                                                                             loaded    active     exited    SYSV: Enables manual IPMI Dell Fan control after boot
# crontab -l|grep dellf
*/35 * * * * /etc/init.d/dellfanctl start > /dev/null 2>&1
# /etc/init.d/dellfanctl status
(II) MAX T: 65C, Current T: 30C, Fan: 1080 (+/- 120) RPM   [  OK  ]
# /etc/init.d/dellfanctl start
(II) Enabled Manual fan Control on host daltigoth          [  OK  ]
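The core of such a script is small. Below is a sketch of the duty-cycle-to-hex conversion feeding the community-reported Dell iDRAC raw fan command; the 0x30 0x30... byte sequence is an assumption based on community documentation, not on anything Dell publishes, so the sketch only prints the command instead of running it:

```shell
# Sketch: build the (community-reported, unofficial) Dell iDRAC raw IPMI
# fan command for a given duty-cycle percentage. The 0x30 0x30 bytes are
# an assumption from community docs, so we only print the command here.
pct_to_hex() {
    printf '0x%02x' "$1"
}

fan_cmd() {
    printf 'ipmitool raw 0x30 0x30 0x02 0xff %s\n' "$(pct_to_hex "$1")"
}

fan_cmd 20    # prints: ipmitool raw 0x30 0x30 0x02 0xff 0x14
```

Manual control also requires disabling the automatic fan response first (another raw command), which is presumably the kind of thing the downloadable script takes care of.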

This script can be downloaded here (adapt script for your hostnames):

GPU cards

The single x16 slot only gets enabled for GPUs when two Xeons are installed, not one.
If you want a GPU that draws more power than the PCI slot provides, remember to order your server with the GPU kit, as it cannot be retrofitted/ordered afterwards. I am planning to research this soon.

I'm currently using an MSI Geforce GTX 1030 single width card in the machine.

Power Draw

With one Xeon Silver 4110, my T440 idled around 66 Watts. With two CPUs it idles around 88 Watts. That's quite decent.
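For the curious, 88 Watts around the clock is easy to translate into yearly consumption:

```shell
# Back-of-the-envelope yearly energy use at idle (integer arithmetic).
watts=88
hours_per_year=8760                       # 24 * 365
kwh=$(( watts * hours_per_year / 1000 ))
echo "${kwh} kWh/year"                    # prints: 770 kWh/year
```

At a hypothetical 0.10 USD/kWh, that's about 77 USD a year at idle.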

Wednesday, March 22, 2017

LVM2 bootdisk encapsulation on RHEL7/Centos7


Hi everyone,
Life on overcloud nodes was simple back then, and everybody loved that single 'root' partition on the (at the time, less than 2TB) boot disk. This gave us overcloud nodes partitioned like this:

[root@msccld2-l-rh-cmp-12 ~]# df -h -t xfs 
 Filesystem Size Used Avail Use% Mounted on 
/dev/sda2 1.1T 4.6G 1.1T 1% /

The problem with this approach is that anything filling up any subdirectory of the boot disk will cause services to fail. This story is almost 30 years old.
For that reason, most security policies (think SCAP) insist that /var, /tmp and /home be separate logical volumes, and that every disk use LVM2 to allow additional logical volumes.

To solve this problem, whole-disk image support is coming to Ironic. It landed in 5.6.0 (See [1] ) but missed the OSP10 release. With whole-disk image support in Ironic, we could easily change overcloud-full.qcow2 to be a full-disk image with LVM and separate volumes. This work is a tremendous advance, thanks to Yolanda Robla. I hope it gets backported to stable/Newton (OSP10, our first LTS release).

I wanted to solve this issue for OSP10 (and maybe for previous versions too) and started working on some tool to 'encapsulate' the existing overcloud partition into LVM2 during deployment. This is now working reliably and I wanted to present the result here so this could be re-used for other purposes.

Resulting configuration

The resulting config is fully configurable and automated. It will carve an arbitrary number of logical volumes out of your freshly deployed overcloud node's boot disk.
Here's an example for a compute node with a 64GB boot disk and an 8TB secondary disk:

[root@krynn-cmpt-1 ~]# df -t xfs
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/rootdg-lv_root 16766976 3157044 13609932 19% /
/dev/mapper/rootdg-lv_tmp 2086912 33052 2053860 2% /tmp
/dev/mapper/rootdg-lv_var 33538048 428144 33109904 2% /var
/dev/mapper/rootdg-lv_home 2086912 33056 2053856 2% /home

[root@krynn-cmpt-1 ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/sda2 rootdg lvm2 a-- 63.99g 11.99g

[root@krynn-cmpt-1 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
rootdg 1 4 0 wz--n- 63.99g 11.99g


The tool (mostly a big fat shell script) will come into action at the end of firstboot and use a temporary disk to create the LVM2 structures and volumes. It will then set the root to this newly-created LV and will reboot the system.

When the system boots, it wipes clean the partition the system was originally installed on, then mirrors the LVs and VG back to that single partition. Once finished, everything is back to where it was before, except for the temporary disk, which has been wiped clean too.
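The broad strokes of the encapsulation can be sketched as a dry run that only echoes the commands (device names and sizes are examples; the real script also has to rewrite GRUB, the initramfs and /etc/fstab, which is where most of the complexity lives):

```shell
# Dry-run sketch of the encapsulation steps (echo only). /dev/sdc and
# the sizes are examples; the real script also rewrites GRUB, the
# initramfs and /etc/fstab, which is where most of the work happens.
run() { echo "$@"; }                # change the body to "$@" to execute

run pvcreate /dev/sdc
run vgcreate rootdg /dev/sdc
run lvcreate -L 16g -n lv_root rootdg
run lvcreate -L 32g -n lv_var rootdg
run mkfs.xfs /dev/rootdg/lv_root
run rsync -aAXH / /mnt/newroot/     # copy the live root into the new LV
```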

Logs of all actions are kept on the nodes themselves:

root@krynn-cmpt-1 ~]# ls -lrt /var/log/ospd/*root*log
-rw-r--r--. 1 root root 15835 Mar 20 16:53 /var/log/ospd/firstboot-encapsulate_rootvol.log
-rw-r--r--. 1 root root 2645 Mar 20 17:02 /var/log/ospd/firstboot-lvmroot-relocate.log

The first log details the execution of the initial part of the encapsulation: creating the VG, the LV's, setting up GRUB, injecting the boot run-once service, etc..
The second log details the execution of the run-once service that mirrors back the Volumes to the original partition carved by tripleo during a deploy.

It is called by the global multi-FirstBoot template here:

Which we called from the main environment file:


The tool lets you change the name of the Volume Group, how many volumes are needed, what size they shall be, etc... To change these, edit your copy of the script and modify the lines marked 'EDITABLE' at the top. E.g:

boot_dg=rootdg                                 # EDITABLE
boot_lv=lv_root                                # EDITABLE
# ${temp_disk} is the target disk. This disk will be wiped clean, be careful.
temp_disk=/dev/sdc                             # EDITABLE
# Size the volume
declare -A boot_vols
boot_vols["${boot_lv}"]="16g"                   # EDITABLE
boot_vols["lv_var"]="32g"                       # EDITABLE
boot_vols["lv_home"]="2g"                       # EDITABLE
boot_vols["lv_tmp"]="2g"                        # EDITABLE
declare -A vol_mounts
vol_mounts["lv_var"]="/var"                     # EDITABLE
vol_mounts["lv_home"]="/home"                   # EDITABLE
vol_mounts["lv_tmp"]="/tmp"                     # EDITABLE

All of the fields marked 'EDITABLE' can be changed. A new LV can be added by inserting a matching entry in both boot_vols and vol_mounts.
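For example, adding a hypothetical lv_log volume and sanity-checking that every mount has a matching size entry could look like this (bash, since the script uses associative arrays):

```shell
#!/bin/bash
# Hypothetical example: add an lv_log volume, then verify that every
# mount point in vol_mounts has a matching size entry in boot_vols.
declare -A boot_vols vol_mounts
boot_vols["lv_var"]="32g";  vol_mounts["lv_var"]="/var"
boot_vols["lv_log"]="4g";   vol_mounts["lv_log"]="/var/log"   # new entry

for lv in "${!vol_mounts[@]}"; do
    if [ -z "${boot_vols[$lv]}" ]; then
        echo "missing size for ${lv}" >&2
        exit 1
    fi
done
echo "config consistent"
```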

Warnings, Caveats and Limitations

Please be aware of the following warnings
  • The tool will WIPE/ERASE/DESTROY whatever temporary disk you give it. (I use /dev/sdc because /dev/sdb is used for something else). This is less than ideal but I haven't found something better yet.
  • This tool has only been used on RHEL7.3 and above. It should work fine on Centos7.
  • The tool -REQUIRES- a temporary disk. It will not function without it. It will WIPE THAT DISK.
  • This tool can be used outside of OSP-Director. In fact this is how I developed this script but you still REQUIRE a temporary disk. 
  • This tool can be used with OSP-Director but it MUST be invoked in firstboot and it MUST execute last. One way to do this is to make it 'depend' on all of the previous first boot scripts. For my templates, it involved doing the following:
  • It lengthens your deployment time and causes an I/O storm on your machines as the data blocks are copied back and forth. For virtual environments, I have added 'rootdelay' and 'scsi_mod.scan=sync' to help the nodes find their 'root' after reboot. If some nodes complain that they couldn't mount 'root' on unknown(0,0), this is likely caused by that issue and resetting the node manually should get everything back on track.
  • The resulting final configuration is fully RHEL-supported, nothing specific there.

  • THIS IS A WORK IN PROGRESS, feel free to report back success and/or failure.

Tuesday, November 15, 2016

Some Tips about running a Dell PowerEdge Tower Server as your workstation

Some use workstations as servers.
I'm using servers as workstations.

Over the years, I've changed computing gear on quite a few occasions. I've been using Tower Servers for the past 5 years and would like to share some tips to help others.

  • But why would anyone want to do that??

- Servers are well integrated systems and are usually seriously designed and tested.

- They offer greater expandability (6x3.5" hotswap bays in my previous T410, now 8x3.5" in my T430).

- They usually include some kind of Remote Access Card (RAC) which is great for remote'ing in when all else has failed.

- I can get tons of server equipment on ebay that will be compatible with that system.

- Where else can I get 192Gb of ECC DDR4 RDIMM, dual 6-core Xeons and 8 hotswap bays?

  • Tip #1 : Choose your chassis with care.
Not all servers are created equal:

- Rack servers are usually thin and noisy (those 8k RPM fans have the job of cooling that 2U enclosure). It is not uncommon for them to be in the 60-70dBA range.

- Tower servers are much bigger and less noisy. They are also more expensive -but- you get an electricity bill that's lower than a comparable rack server's, so the price difference will shrink after a few months. And having a server that makes less noise and draws less power is more environment-friendly!

- Most pre-2011 tower servers from Dell and HP (before Dell's 11th Gen and before HP's Gen8) are less quiet than their modern counterparts.

In 2016, I'd recommend getting a 12th or 13th Gen from Dell. If you are into HP gear, get a Gen8 or a Gen9. I've never used Lenovo or Cisco gear, so I can't help there.

- Most modern towers from Dell feature a single 120mm PWM fan to cool the entire chassis. That's the T410, T420 or T430. I assume the T310, T320 and T330 are similar since they feature the same chassis.

- The environmental ratings for current and past servers can usually be obtained from the manufacturer. Check the specs carefully. I found the spec for most recent Dell Tower servers here:

Dell-13G PowerEdge Acoustical Performance and Dependencies

  • Tip #2 : Choose your components carefully.

Now that you've selected the system, let's pick the components.

- CPUs

- Most recent tower servers feature PWM (4-pin) fans that are controlled by the iDRAC/iLO controller. The sensors on these systems feed the controller with information which it uses to drive the speed of the fans.

- Consequently, even if you want plenty of Xeon cores, you probably don't want one of their 145W 12-core monsters. Such a chip (or a pair of them) will increase the thermal response under load, which will result in increased fan speed. On the other hand, lower-wattage Xeons usually have a low core frequency that might make the user experience in interactive sessions oh-not-so-great.

I usually pick Xeons in the 65W-85W range. These typically feature decent punch while keeping heat (and noise) tolerable.

Wikipedia has a great list of all Xeon processors with Wattage, Cores, etc.. here:
List of Intel Xeon Microprocessors
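Once you have such a list, narrowing it down is a one-liner. Here's a sketch over a few sample E5 v4 rows (the model names and TDP figures below are illustrative samples only; feed it the real list):

```shell
# Filter a model/TDP table for the 65W-85W sweet spot. The sample rows
# are illustrative only; point this at the real list instead.
awk '$2 >= 65 && $2 <= 85 { print $1, $2 "W" }' <<'EOF'
E5-2620v4 85
E5-2650v4 105
E5-2603v4 85
E5-2699v4 145
EOF
```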

- Graphics!

The bundled graphics adapter in your server will not let you run much else than a 2D environment. This can be solved by adding a PCI-E GPU which will give you decent 3D performance.

Forget about the latest Radeon or NVidia monster; it's not going to work at all.
When I tried my NVidia Quadro K2000 (a 65W card) in my Dell PowerEdge T130, the system simply refused to boot, telling me that the card was drawing too much power for it to power on all components.

GPUs will usually work fine if they are in the 45W-or-below range. I've used NVidia Quadro K620 and K600 cards with great success in my PowerEdge. Passive GeForce cards from previous gens (GT730, etc...) can also be used successfully.

Here's my Poweredge T130:
# lspci |grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 620] (rev a1)
That card was replaced by a GT 730:
lspci |grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1)

And here's the Poweredge T430:
# lspci |grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GF108GL [Quadro 600] (rev a1)

- Sound

Servers don't have sound cards... but I've used USB audio adapters with much success to get sound from videos and games on my Linux servers/workstations.
These can usually be obtained for about USD 10 on Amazon or eBay:

  • Tip #3 : Use the right settings
Dell servers need some parameters passed to the iDrac in order to keep noise to a minimum even when using 3rd Party PCI-E cards.

Disable the PCI-E 3rd party thermal response (this can also be done from the iDrac submenu of the BIOS GUI):

Here's a 13th Gen server. I highlighted the most important fields.

/admin1-> racadm get System.ThermalSettings[Key=System.Embedded.1#ThermalSettings.1]

ThermalProfile=Minimum Power

Some of these can be modified by using the iDrac CLI:
/admin1-> racadm set  System.ThermalSettings.FanSpeedOffset Off      
/admin1-> racadm set  System.ThermalSettings.ThirdPartyPCIFanResponse 0
Object value modified successfully

To be continued...

Thursday, July 28, 2016

Of Samsung SSD's, LSI HBA's and SSD firmwares.


Everyone loves SSDs, no doubt about that, but when it comes to updating the oh-so-very-important firmware, things become complicated quickly:

First, I don't run Windows or OSX on my most important hardware (the hypervisors/workstations).

Second, I run most of my SSD's behind an LSI HBA (to benefit from the cache and from the processing power of the LSI card).

Here's how I did -without- Windows or OSX and -without- taking my SSD's out of my boxes.

A few words of warning

This post is not a full "Method Of Procedure" and is very specific.
It is -NOT- for the average user and it requires some good Linux knowledge.
If things go wrong, it may brick/destroy your SSD, your PC and even your HD TV.
It will most likely require adaptation in most cases unless you have a very similar setup.
The purpose of this post is to show that there are alternative ways available and that you do NOT have to tear up your gear only to update firmware.

Chances are that this might work for SSD's from Intel, Crucial and others but you won't know until you've found a way to decrypt/un-obfuscate their firmwares.

Please do not attempt to go through these steps unless you know what you are doing.

Hardware Setup

- Dell Precision T5610 (128GB RAM, dual 6-core Xeons, LSI MegaRAID 9271-8i 1GB)
- RHEL 6.8.z
- Samsung SSD 840 EVO mSATA with a StarTech 2.5-inch SATA to mini-SATA SSD adapter

Here's a view from the LSI HBA of the above setup (SSD firmware highlighted blue):

# megaclisas-status
-- Controller information --
-- ID | H/W Model                | RAM    | Temp | BBU    | Firmware
c0    | LSI MegaRAID SAS 9271-8i | 1024MB | 55C  | Absent | FW: 23.34.0-0005
-- Array information --
-- ID | Type   |    Size |  Strpsz |   Flags | DskCache |   Status |  OS Path | CacheCade |InProgress
c0u0  | RAID-0 |    931G |  256 KB | ADRA,WT |  Enabled |  Optimal | /dev/sda | None      |None
c0u1  | RAID-0 |   3637G |  256 KB | ADRA,WB |  Default |  Optimal | /dev/sdb | None      |None
-- Disk information --
-- ID  | Type | Drive Model                                            | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u0p0 | SSD  | S1KRNEAFB01520M Samsung SSD 840 EVO 1TB mSATA EXT42B6Q | 931.0 GB | Online, Spun Up | 6.0Gb/s  | 36C  | [252:0]  | 1
c0u1p0 | HDD  | WD-WCC4E0346870WDC WD40EFRX-68WT0N0 80.00A80           | 3.637 TB | Online, Spun Up | 6.0Gb/s  | 36C  | [252:1]  | 0

In short, deep within that Precision T5610 running RHEL6.8, there's a Samsung mSATA SSD inside a small SATA enclosure connected to an LSI HBA.
Thankfully, MegaCli can upgrade arbitrary firmware on drives connected to an LSI HBA.

Technical Procedure

Obtain firmware from Samsung

If you have a Samsung SSD from before the 850 era, you're in luck because you don't have to go through the Windows-only Samsung Magician application. For 850s and later, the firmware is included in Samsung Magician and is no longer available as a separate download.

Download your firmware from Samsung at: Samsung SSD Tools Download
For that SSD, it came in the form of a Win/Mac ISO image:

# ls -l *iso
-rw-r--r-- 1 root root 3117056 May  2 06:47 Samsung_SSD_840_EVO_mSATA_EXT43B6Q_Win_Mac.iso

Extract firmware payload from download

The firmware image is deep within the download. Let's extract it:

# mount -o loop $(pwd)/Samsung_SSD_840_EVO_mSATA_EXT43B6Q_Win_Mac.iso /mnt
# losetup /dev/loop4 /mnt/isolinux/btdsk.img
# mkdir /tmp/2
# mount /dev/loop4 /tmp/2
mount: block device /dev/loop4 is write-protected, mounting read-only
# cp -afv /tmp/2/samsung/DSRD/FW .
`/tmp/2/samsung/DSRD/FW' -> `./FW'
`/tmp/2/samsung/DSRD/FW/EXT43B6Q' -> `./FW/EXT43B6Q'
`/tmp/2/samsung/DSRD/FW/EXT43B6Q/EXT43B6Q.enc' -> `./FW/EXT43B6Q/EXT43B6Q.enc'

Here's the firmware image (note that its name is related to the actual firmware version):

# ls -l
-rwxr-xr-x 1 root root 1048576 Sep 23  2015 EXT43B6Q.enc

Decrypt firmware

Several posts on the net mentioned that the above image was encrypted. It surely did look like binary garbage and not like a firmware of any kind.

Then I found this post where someone had posted a simple Python script to decrypt the encrypted payload downloaded from Samsung's site.

Since I wasn't sure the drive would accept an encrypted firmware through MegaCli, I decided to decrypt the firmware first. I placed a copy of the Python script here.

# python ./ EXT43B6Q.enc
# ls -lrt
-rwxr-xr-x   1 root root 1048576 Jul 22 11:03 EXT43B6Q.enc
-rw-r--r--   1 root root 1048576 Jul 22 11:03 EXT43B6Q.enc.decoded
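Before flashing, a cheap sanity check is to confirm that the decoded file has the same size as the original but different contents. A small helper sketch:

```shell
# Sanity-check a decoded firmware image before flashing: it should be
# the same size as the encrypted original, with different contents.
check_fw() {
    enc=$1 dec=$2
    [ "$(stat -c%s "$enc")" -eq "$(stat -c%s "$dec")" ] \
        || { echo "size mismatch"; return 1; }
    cmp -s "$enc" "$dec" \
        && { echo "identical: decode did nothing?"; return 1; }
    echo "looks plausible"
}

# usage: check_fw EXT43B6Q.enc EXT43B6Q.enc.decoded
```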

Find the PhysDrv ID and apply firmware

# megaclisas-status
-- Controller information --
-- ID | H/W Model                | RAM    | Temp | BBU    | Firmware     
c0    | LSI MegaRAID SAS 9271-8i | 1024MB | 55C  | Absent | FW: 23.34.0-0005 
-- Array information --
-- ID | Type   |    Size |  Strpsz |   Flags | DskCache |   Status |  OS Path | CacheCade |InProgress   
c0u0  | RAID-0 |    931G |  256 KB | ADRA,WT |  Enabled |  Optimal | /dev/sda | None      |None         
c0u1  | RAID-0 |   3637G |  256 KB | ADRA,WB |  Default |  Optimal | /dev/sdb | None      |None         
-- Disk information --
-- ID  | Type | Drive Model                                            | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u0p0 | SSD  | S1KRNEAFB01520M Samsung SSD 840 EVO 1TB mSATA EXT42B6Q | 931.0 GB | Online, Spun Up | 6.0Gb/s  | 36C  | [252:0]  | 1       
c0u1p0 | HDD  | WD-WCC4E0346870WDC WD40EFRX-68WT0N0 80.00A80           | 3.637 TB | Online, Spun Up | 6.0Gb/s  | 36C  | [252:1]  | 0       

# MegaCli -pdfwdownload -physdrv[252:0] -f EXT43B6Q.enc.decoded -a0

Flashing firmware image size 0x8000 (0x0 0x0 0xa0). Please wait...
Flashing firmware image size 0x8000 (0x0 0x0 0xa0). Please wait...
Flashing firmware image size 0x8000 (0x0 0x0 0xa0). Please wait...
Flashing firmware image size 0x8000 (0x0 0x0 0xa0). Please wait...
Flashing firmware image size 0x8000 (0x0 0x0 0xa0). Please wait...
Flashing firmware image size 0x100000 (0x0 0x0 0xa0). Please wait...

Exit Code: 0x00

Reboot and Power Cycle (just to be sure)

Immediately after applying firmware, the drive showed up with the new firmware:

# megaclisas-status
-- Controller information --
-- ID | H/W Model                | RAM    | Temp | BBU    | Firmware     
c0    | LSI MegaRAID SAS 9271-8i | 1024MB | 57C  | Absent | FW: 23.34.0-0005 
-- Array information --
-- ID | Type   |    Size |  Strpsz |   Flags | DskCache |   Status |  OS Path | CacheCade |InProgress   
c0u0  | RAID-0 |    931G |  256 KB | ADRA,WT |  Enabled |  Optimal | /dev/sda | None      |None         
c0u1  | RAID-0 |   3637G |  256 KB | ADRA,WB |  Default |  Optimal | /dev/sdb | None      |None         
-- Disk information --
-- ID  | Type | Drive Model                                            | Size     | Status          | Speed    | Temp | Slot ID  | LSI Device ID
c0u0p0 | SSD  | S1KRNEAFB01520M Samsung SSD 840 EVO 1TB mSATA EXT43B6Q | 931.0 GB | Online, Spun Up | 6.0Gb/s  | 45C  | [252:0]  | 1       
c0u1p0 | HDD  | WD-WCC4E0346870WDC WD40EFRX-68WT0N0 80.00A80           | 3.637 TB | Online, Spun Up | 6.0Gb/s  | 38C  | [252:1]  | 0       

Since I didn't want to take risks, I rebooted the workstation and verified that it came back cleanly.

Once it did, I powered off the system and brought it back on. Again, it came back properly.

Now if I could extract 850 firmware from Samsung Magician app, I'd be a happy camper.

Friday, June 17, 2016

Monitoring for OpenStack - A practical HOWTO with Sensu

ATTENTION: This playbook is in the process of being fully released on github and may not work as-is.
It is also very RHEL-centric, although it will likely work on Centos/RDO as well. Please bear with me as I make this more usable. Thank you.



  • Graeme Gillis
  • Gaetan Trellu
  • Alexandre Maumene
  • Cyril Lopez
  • Gaël Lambert
  • Guillaume coré



As of OSP7/OSP8, RHEL OSP uses the tripleo upstream code to deploy Openstack using a minimal (but critical) Openstack called the 'undercloud'. [1]
I won't go into the specifics of this kind of deployment, but suffice it to say that even the simplest OSP setup instantly becomes, well... quite 'convoluted'.

At the same time, all of the different subsystems, nodes and endpoints are deployed without alerting, monitoring and graphing, leaving it to
the customer to deploy his/her monitoring framework on top of OpenStack.

Some simple alerting and monitoring based on 'sensu' is scheduled to find its way into OSP10.

Also, Graeme Gillis from the Openstack Operations Engineering Team [2] was nice enough to put some great resources for those wanting to deploy
alerting, monitoring and graphing on OSP7. [3], [4], [6] and [7]

This small project [5] aims to build upon these tools and procedures to provide an out-of-the-box alerting framework for an OpenStack cloud.
Please remember that it is a work in progress and subject to change without notice. All comments/improvements/questions are thus welcomed.

Logical Architecture

Sensu was selected for this project because it:
1) is already popular within the OpenStack community.
2) is already selected for OSP8 as the alerting framework. ([6] and [7])

Here's a diagram describing the logical architecture (Thanks Graeme Gillis):

The logical architecture deployed by the tooling described in this document includes a single Sensu server for the entire Undercloud and Overcloud.
While it might be feasible to deploy an HA-enabled Sensu configuration or a redundant Sensu architecture with several Sensu servers, that is outside the scope of this document.

Technical Architecture

The Sensu server may be a Virtual Machine (KVM, VirtualBox, etc..) or a physical machine. We'll be only describing the KVM-Based Virtual machine setup within this document.
While the most obvious requirements are that the Sensu server run RHEL7.x and have at least 8GB RAM, the most important prerequisite is the network: your Sensu server -must- have access to the heart of your OverCloud.
This means: the control-plane, the provisioning network AND the OOB network (to monitor your IPMI access points into your overcloud nodes).

Therefore, it makes sense to build your Sensu server and your Undercloud machine alike.
If your undercloud machine is a KVM guest, it makes sense to create your Sensu Server as a KVM guest using the exact same bridges/networks on the same Hypervisor.
This setup is described here: a KVM Hypervisor with two KVM guests : the undercloud and the Sensu server.

If your undercloud machine is another type of VM (VBox, VMware, etc..), you'll have to do some network planning prior to installing your Sensu server and figure out the networks by yourself.

Here's an example of an OSP7 cloud after OSP-D installation (Thanks to Dimitri Savinea for the original Dia):

And here is the same OSP7 cloud with the Sensu Server added as a VM on the same Hypervisor as the Undercloud
(notice the pink box underneath the undercloud in the top-right corner)

Some Screenshots

The uchiwa dashboard is the Operator's interface to the Sensu server.

The Operator is first prompted to login to uchiwa using the credentials from the playbook (more on this later)

After logging in, the 'Events' dashboard is displayed (note the buttons to the left to navigate the views).
We notice a warning on the 'over_ceilometer_api' check out of 72 checks configured on the Cloud.

Clicking on 'Clients' brings up the list of registered clients and their keepalive status. Here we have 18 clients and 72 different check types.
Clicking on a Client brings up the detailed view of the checks being performed on this client (here with sensu 0.16.0)

Sensu is a work in progress. Features are added and bugs fixed as new versions are released. Here's the same Client with Sensu 0.20.6:

Installation Howto

Create the Sensu VM on the appropriate Hypervisor
We will be creating a Sensu server on the same hypervisor as the undercloud and we will copy the network configuration from the latter.
This will happen once your entire cloud is deployed as we need the services to be up in order to check them.
For a quick start, you could also 'clone' the undercloud VM and uninstall its OSP packages (with the VM's network down, of course).

Setup the skeleton for the Sensu server VM

1. Download a RHEL guest image (See [6]):

ls -la images/rhel-guest-image-7.2-20151102.0.x86_64.qcow2
-rw-r-----. 1 vcojot vcojot 474909696 Jan  7 11:55 images/rhel-guest-image-7.2-20151102.0.x86_64.qcow2

2. Customize your RHEL guest image on your RHEL7 or Fedora box

(Adapt the sample script provided with the ansible role to fit your needs)
Replace my SSH pub key with yours. Also replace the UNIX password for the admin account with the one generated at the previous step.

The following script produces a RHEL7.2 guest image which includes most of the requirements; it is meant to be run on Fedora.

a) Modify the rhel7.2 image on Fedora (Fedora Only, for RHEL7 please see below).

First, create your credentials file:
cat .sm_creds
STACK_SSH_KEY="ssh-rsa AAAAB3NzaC1yc2EA...... user@mymachine"

Next, run the provided/adapted script:
./ansible/ansible/tools/ line 21: [: too many arguments
‘images/rhel-guest-image-7.2-20151102.0.x86_64.qcow2’ -> ‘rhel-7.2-guest-sensu.x86_64.qcow2’
Image resized.
‘rhel-7.2-guest-sensu.x86_64.qcow2’ -> ‘_tmpdisk.qcow2’
[   0.0] Examining rhel-7.2-guest-sensu.x86_64.qcow2
Summary of changes:
/dev/sda1: This partition will be resized from 6.0G to 128.0G.  The
filesystem xfs on /dev/sda1 will be expanded using the 'xfs_growfs' method.
[   4.1] Setting up initial partition table on _tmpdisk.qcow2
[   4.2] Copying /dev/sda1
Please note that the above script is only provided as a convenience and should only be used if there isn't a ready-to-use image available.

You'll also need to setup RH subscription on the sensu VMs.

b) If using RHEL7, copy the RHEL7.2 image and run the required ansible playbook.

If using this method, then simply download the rhel-7.2 guest image, and proceed further when it's ready to be installed.
That VM will need to be subscribed to the proper CDN channels.

c) Check your results.

If all goes well, this should provide you with a ready-to-use QCOW, which we'll use later.
ls -la rhel-7.2-guest-sensu.x86_64.qcow2
-rw-r-----. 1 vcojot vcojot 2009989120 Jan 21 13:00 rhel-7.2-guest-sensu.x86_64.qcow2

Alternatively, you could also deploy any RHEL7 VM and use Gaetan's ansible playbook to perform the above tasks [8]

4. Copy the guest image to your Hypervisor

Upload this file to your KVM host and place it under /var/lib/libvirt/images.

Integrate the sensu VM with your cloud (network and credentials)

We will be copying the network configuration from the instack VM since we are deploying a 'sibling' VM.
WARNING: The actual network configuration of the instack and sensu VM's varies from deployment to deployment.
The walkthrough below will probably give you a rough idea, and you will have to adapt this to your actual network configuration.

1. List the undercloud's network config:

Let's become root on the hypervisor and see what we have:
[root@kvm1 ~]# virsh list --all
Id    Name                           State
2     sc-instack                     running
sc-instack is the 'undercloud' VM; we want to copy its configuration to the new Sensu VM.
Let's look at the network configuration:
[root@kvm1 ~]# virsh domiflist 2
Interface  Type       Source     Model       MAC
vnet0      bridge     br3115   virtio      52:54:00:27:b6:f4
vnet1      bridge     br2320   virtio      52:54:00:5b:b5:fb
vnet2      bridge     brpxe    virtio      52:54:00:85:7a:01

So we have 'br3115', 'br2320' and 'brpxe', in that order. These will be 'eth0', 'eth1' and 'eth2'.
You'll have to create/pick/compute 3 new MAC addresses, as we'll be adding 3 network interfaces.
Let's use all that with our newly created QCOW image.
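A quick way to generate those MAC addresses in the QEMU/KVM locally-administered 52:54:00 prefix (a convenience sketch, not part of the original procedure):

```shell
# Print a random MAC in the QEMU/KVM locally-administered range (52:54:00:xx:xx:xx).
gen_mac() {
  od -An -N3 -tu1 /dev/urandom | \
    awk '{ printf "52:54:00:%02x:%02x:%02x\n", $1, $2, $3 }'
}

# One address per new interface (eth0, eth1, eth2):
gen_mac
gen_mac
gen_mac
```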

2. Install and boot your Sensu VM

[root@kvm1 ~]# virt-install --boot hd --connect qemu:///system --name sensu01 --ram 16384 --vcpus=8 --cpu kvm64 --virt-type kvm --hvm --accelerate --noautoconsole \
--network=bridge:br3115,model=virtio \
--network=bridge:br2320,model=virtio \
--network=bridge:brpxe,model=virtio \
--serial pty --os-type linux \
--disk path=/var/lib/libvirt/images/sensu.x86_64.qcow2,format=qcow2,bus=virtio,cache=none

[root@kvm1 ~]# virsh console sensu01
Connected to domain sensu01
Escape character is ^]
Employee SKU
Kernel 3.10.0-327.el7.x86_64 on an x86_64
sensu01 login:

3. Reserve some IPs for the Sensu server on both your undercloud and overcloud

Login to your undercloud and source the proper rc files
(one for the undercloud, one for the overcloud).
Identify the 'internal_api' and the 'ctlplane' networks; they correspond to two of the bridges we identified earlier ('br2320' and 'brpxe', respectively).
These will be mapped to your Sensu server on 'eth1' and 'eth2', eth0 being the outside network interface.

[stack@sc-instack ~]$ . stackrc
[stack@sc-instack ~]$ neutron net-list
| id                                   | name         | subnets                              |
| 175c21a7-9858-412a-bb7a-6763bf6d84ee | storage_mgmt | 967dcecb-73e4-476f-ba21-eba91d551823 |
| 44bb7c18-2ba6-49ab-b344-7d644bb3110f | internal_api | fc3ec57c-ff10-40b6-9b63-d6293bfe6ee1 |
| 75cbd5c2-aee3-47be-a4eb-b355d1edb281 | storage      | cb738311-89f0-4543-a850-b1258c1a6d6c |
| 207a3108-e341-4360-b433-bfd6007cc59d | ctlplane     | 7e1e052f-b4eb-4f3b-8b1c-ba298cbe530f |
| bf45910d-36e9-43f7-9802-1545d7182608 | tenant       | 9ad085d8-e185-4d73-8721-8a2ef0ce5e87 |
| c7f74ecb-ff08-49da-9d8b-f3070fbcbcee | external     | 8d915163-43ec-431c-81ab-841750682475 |

Now it's time to get some IPs on these two subnets (hint: use 'neutron port-list | grep <subnet_id>').
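To see which addresses are already taken on a subnet, you can filter the port list for the subnet id; a sketch of that idea, fed a fabricated sample row since the real output depends on your cloud (192.0.2.5 is a made-up documentation address):

```shell
# Hypothetical helper: extract the already-allocated IPs for a given subnet
# from 'neutron port-list'-style output (pipe the real command into it).
used_ips() {
  grep "$1" | grep -o '"ip_address": "[^"]*"' | cut -d'"' -f4
}

# Example with a fabricated port-list row; in real use:
#   neutron port-list | used_ips <subnet_id>
used_ips fc3ec57c <<'EOF'
| b6e0bdd9 |  | 52:54:00:65:7e:b9 | {"subnet_id": "fc3ec57c", "ip_address": "192.0.2.5"} |
EOF
```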
Reserve an unused IP on the 'internal_api' network (I picked IP <subnet>.42 because it was available).
[stack@sc-instack ~]$ neutron port-create --fixed-ip ip_address= 44bb7c18-2ba6-49ab-b344-7d644bb3110f (internal_api)
Created a new port:
| Field                 | Value                                                                               |
| admin_state_up        | True                                                                                |
| allowed_address_pairs |                                                                                     |
| binding:host_id       |                                                                                     |
| binding:profile       | {}                                                                                  |
| binding:vif_details   | {}                                                                                  |
| binding:vif_type      | unbound                                                                             |
| binding:vnic_type     | normal                                                                              |
| device_id             |                                                                                     |
| device_owner          |                                                                                     |
| fixed_ips             | {"subnet_id": "fc3ec57c-ff10-40b6-9b63-d6293bfe6ee1", "ip_address": ""} |
| id                    | b6e0bdd9-aac4-4689-8f31-a0c0bf2c1324                                                |
| mac_address           | 52:54:00:65:7e:b9                                                                   |
| name                  |                                                                                     |
| network_id            | 44bb7c18-2ba6-49ab-b344-7d644bb3110f                                                |
| security_groups       | 92c4d34a-2b9c-4a85-b309-d3425214eca1                                                |
| status                | DOWN                                                                                |
| tenant_id             | fae58cc4e36440b3aa9c9844e54f968d                                                    |

Do the same with the 'ctlplane' network (IP <subnet>.42 was free there too):
[stack@sc-instack ~]$ neutron port-create --fixed-ip ip_address= 207a3108-e341-4360-b433-bfd6007cc59d (ctlplane)
Created a new port:
| Field                 | Value                                                                               |
| admin_state_up        | True                                                                                |
| allowed_address_pairs |                                                                                     |
| binding:host_id       |                                                                                     |
| binding:profile       | {}                                                                                  |
| binding:vif_details   | {}                                                                                  |
| binding:vif_type      | unbound                                                                             |
| binding:vnic_type     | normal                                                                              |
| device_id             |                                                                                     |
| device_owner          |                                                                                     |
| fixed_ips             | {"subnet_id": "7e1e052f-b4eb-4f3b-8b1c-ba298cbe530f", "ip_address": ""} |
| id                    | 64ece3ea-5df4-4840-93ae-fcd844e8cc29                                                |
| mac_address           | 52:54:00:91:45:b3                                                                   |
| name                  |                                                                                     |
| network_id            | 207a3108-e341-4360-b433-bfd6007cc59d                                                |
| security_groups       | 92c4d34a-2b9c-4a85-b309-d3425214eca1                                                |
| status                | DOWN                                                                                |
| tenant_id             | fae58cc4e36440b3aa9c9844e54f968d                                                    |

4. Configure the reserved IPs on your Sensu server.

Of course, now that the IPs are reserved we could just enable DHCP on 'eth1' and 'eth2', but that would make the Sensu VM rely on the cloud's DHCP infrastructure,
so we will simply use static IPv4 addresses.
As usual, adapt for your network.

[admin@sensu01 ~]$ sudo su -
[root@sensu01 admin]# nmcli con mod "System eth0" eth0
[root@sensu01 admin]# nmcli con mod "System eth1" eth1
[root@sensu01 admin]# nmcli con mod "System eth2" eth2
[root@sensu01 admin]# nmcli con mod eth1 ipv4.addresses
[root@sensu01 admin]# nmcli con mod eth1 ipv4.gateway
[root@sensu01 admin]# nmcli con mod eth1 ipv4.method manual
[root@sensu01 admin]# nmcli con up eth1
[root@sensu01 admin]# nmcli con mod eth2 ipv4.addresses
[root@sensu01 admin]# nmcli con mod eth2 ipv4.method manual
[root@sensu01 admin]# nmcli con up eth2
## or, if you don't want to use NetworkManager:
cat /etc/sysconfig/network-scripts/ifcfg-eth1
cat /etc/sysconfig/network-scripts/ifcfg-eth1.20
ifup eth1
ifup eth1.20
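For that non-NetworkManager route, the ifcfg file would look something like this (hypothetical values; the address is the one you reserved above, and the VLAN variant gets a matching ifcfg-eth1.20 with VLAN=yes):

```
# /etc/sysconfig/network-scripts/ifcfg-eth1 (example; adapt device/address to your network)
DEVICE=eth1
ONBOOT=yes
BOOTPROTO=none
IPADDR=<internal_api_subnet>.42
PREFIX=24
```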

5. Create a monitoring user on both the undercloud and overcloud

In order to perform checks against the Openstack API of the undercloud and of the overcloud, we'll need a tenant and a user in those two databases.
Note that I am using 'monitoring' and 'sensu/sensu'. Change these as you see fit, but remember the values as we'll need them during the ansible part.
[stack@sc-instack ~]$ . stackrc
[stack@sc-instack ~]$ keystone tenant-create --name monitoring --enabled true --description 'Tenant used by the OSP monitoring framework'
|   Property  |                    Value                    |
| description | Tenant used by the OSP monitoring framework |
|   enabled   |                     True                    |
|      id     |       cc95c4d9a9654c469b2b352895109c5d      |
|     name    |                  monitoring                 |
[stack@sc-instack ~]$ keystone user-create --name sensu --tenant monitoring --pass sensu --email --enabled true
| Property |              Value               |
|  email   |         |
| enabled  |               True               |
|    id    | 4cd0578ee84740538283de84940cd737 |
|   name   |              sensu               |
| tenantId | cc95c4d9a9654c469b2b352895109c5d |
| username |              sensu               |
[stack@sc-instack ~]$ . overcloudrc
[stack@sc-instack ~]$ keystone tenant-create --name monitoring --enabled true --description 'Tenant used by the OSP monitoring framework'
|   Property  |                    Value                    |
| description | Tenant used by the OSP monitoring framework |
|   enabled   |                     True                    |
|      id     |       499b5edd1c724d37b4c6573ed15d9a85      |
|     name    |                  monitoring                 |
[stack@sc-instack ~]$ keystone user-create --name sensu --tenant monitoring --pass sensu --email --enabled true
| Property |              Value               |
|  email   |         |
| enabled  |               True               |
|    id    | 6f8c07c1c8e045698eb31e2187e9fc59 |
|   name   |              sensu               |
| tenantId | 499b5edd1c724d37b4c6573ed15d9a85 |
| username |              sensu               |

6. Run ansible

You can run ansible from the undercloud (or sensu); just make sure the SSH key of the machine from which you run ansible is installed everywhere.

In our case, we run it from the undercloud with the user 'heat-admin' because it's already present on most machines.

Obtain the playbook and adapt to your environment

1. Pull down the GIT repository on the sensu VM (or copy it from elsewhere)

[stack@sensu01 ~]$ mkdir mycloud
[stack@sensu01 ~]$ cd mycloud
[stack@sensu01 mycloud]$ git clone 
Cloning into 'ansible-sensu-for-openstack'...
remote: Counting objects: 473, done.
remote: Compressing objects: 100% (451/451), done.
remote: Total 473 (delta 242), reused 0 (delta 0)
Receiving objects: 100% (473/473), 75.73 MiB | 1.70 MiB/s, done.
Resolving deltas: 100% (242/242), done.
Checking connectivity... done.

2. Install ansible version >2

[root@sensu01 ~]# easy_install pip
[root@sensu01 ~]# pip install ansible

3. Create the inventory file and the playbook.

(Look inside the ansible role and copy/paste.) Adapt the IPs and credentials to your environment, of course.

If you followed the previous steps you can now use a small tool to generate your inventory.
This tool works by contacting the undercloud machine, so it requires a working network configuration.
It will build an inventory file with all of your hosts, including the IPMI IP addresses (as configured in Nova).
Redirect the script's output to an inventory file.

[admin@sensu01 mycloud]$ ./ansible/ansible/tools/ stack@
# Collecting information from Nova............Done!
sc-cmpt00 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-cmpt01 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-cmpt02 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-cmpt03 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-cmpt04 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ctrl00 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ctrl01 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ctrl02 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ceph00 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ceph01 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ceph02 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ceph03 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-ceph04 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-strg00 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-strg01 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sc-strg02 ansible_ssh_host= ansible_user=heat-admin ipmi_lan_addr=
sensu01 ansible_ssh_host= ansible_user=admin
instack ansible_ssh_host= ansible_user=stack
[admin@sensu01 mycloud]$ ./ansible-sensu-for-openstack/tools/ stack@ > hosts
Next, create files in group_vars/ to customize the IPs, logins and passwords to match those found in your infrastructure.

You'll need the API URLs for your undercloud and overcloud. You can start from the .sample files:

[stack@sensu01 ansible-sensu-for-openstack]$ cat group_vars/all.sample > group_vars/all
[stack@sensu01 ansible-sensu-for-openstack]$ cat group_vars/sensu_server.sample > group_vars/sensu_server
for example:

[stack@sensu01 ansible-sensu-for-openstack]$ cat group_vars/all
# Put this in your playbook at group_vars/all
#sensu_use_local_repo: false
#sensu_use_upstream_version: false
#sensu_api_ssl: false
sensu_server_rabbitmq_hostname: ""
[stack@sensu01 ansible-sensu-for-openstack]$ cat group_vars/sensu_server
#sensu_server_dashboard_user: uchiwa
#sensu_server_dashboard_password: mypassword
sensu_smtp_from: ""
sensu_smtp_to: ""
sensu_smtp_relay: ""
#sensu_handlers:
#  email:
#    type: pipe
#    command: "mail -S smtp={{ sensu_smtp_relay }} -s 'Sensu alert' -r {{ sensu_smtp_from }} {{ sensu_smtp_to }}"
#over_os_username: sensu
#over_os_password: sensu
#over_os_tenant_name: monitoring
over_os_auth_url:
#under_os_username: sensu
#under_os_password: sensu
#under_os_tenant_name: monitoring
under_os_auth_url:

4. Execute the role with ansible to deploy sensu and uchiwa

When your config is ready, you will want to execute the playbook and check for errors (if any).
A good way to test if your host list is fine and if your SSH keys are imported is to run the following ansible CLI before launching the playbook itself:

admin@sensu01$ ansible -m ping -i hosts all
When ready, launch the playbook with the following CLI:

admin@sensu01$ ansible-playbook -i hosts playbook/sensu.yml

If all goes well, you should see output similar to the samples included below.

Most of the IPs and config settings can be overridden in the playbook, in the group_vars, or by editing the <playbook_dir>/defaults/main.yml file.
Should your servers be unable to reach the Internet and/or contact CDN, it is possible to set 'sensu_use_local_repo: true' to install the local set of RPMs provided with the GIT repo.
This should only be done if you have valid RHEL and OSP subscriptions but cannot download software from the Internet on your OSP nodes.
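For such an offline install, the relevant group_vars entries would look like this (a sketch using the variable names from the samples above):

```
# group_vars/all -- offline installation using the RPMs bundled in the repo
sensu_use_local_repo: true
sensu_use_upstream_version: false
```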

5. Sample outputs

Fig. 1 (Starting the playbook)

Fig. 2 (Playbook finished)

Verify proper deployment
Once the playbook has run successfully, you will be able to log into your uchiwa interface to check the current status of your OSP.

Known Issues

Raw list of Sensu checks included with this playbook (to be completed).

The following is a work in progress listing the checks that are currently implemented:
| Check Name | Implementation | Subscribers | Purpose |
| ceph_health | /usr/bin/sudo .../checks/oschecks-check_ceph_health | ceph | |
| ceph_disk_free | /usr/bin/sudo .../checks/oschecks-check_ceph_df | ceph | |
| nova-compute | sudo .../checks/oschecks-check_amqp nova-compute | cmpt | |
| proc_nova_compute | .../plugins/ nova-compute 1 100 | cmpt | |
| proc_ceilometer-agent-compute | .../plugins/ ceilometer-agent-compute 1 1 | cmpt | Looks for 1 (up to 100) nova-compute process(es) |
| rabbitmq_status | /usr/bin/sudo /usr/sbin/rabbitmqctl status | ctrl | Rabbitmqctl Status |
| rabbitmq_cluster_status | /usr/bin/sudo /usr/sbin/rabbitmqctl cluster_status | ctrl | Rabbitmqctl Cluster Status |
| pacemaker_status | /usr/bin/sudo /usr/sbin/crm_mon -s | ctrl | Pacemaker Cluster Status |
| proc_keystone_all | .../plugins/ keystone-all 3 100 | ctrl, instack | Looks for 3 (up to 100) keystone-all process(es) |
| proc_httpd | .../plugins/ httpd 3 100 | ctrl, instack | |
| proc_mongod | .../plugins/ mongod 1 1 | ctrl | |
| proc_nova_api | .../plugins/ nova-api 1 100 | ctrl, instack | |
| proc_glance_api | .../plugins/ glance-api 1 100 | ctrl, instack | |
| proc_glance_registry | .../plugins/ glance-registry 1 100 | ctrl, instack | |
| proc_nova_conductor | .../plugins/ nova-conductor 1 100 | ctrl, instack | |
| proc_nova_consoleauth | .../plugins/ nova-consoleauth 1 1 | ctrl, instack | |
| proc_nova_novncproxy | .../plugins/ nova-novncproxy 1 1 | ctrl | |
| proc_neutron-server | .../plugins/ neutron-server 1 100 | ctrl | |
| proc_neutron-l3-agent | .../plugins/ neutron-l3-agent 1 1 | ctrl | |
| proc_neutron-dhcp-agent | .../plugins/ neutron-dhcp-agent 1 1 | ctrl | |
| proc_neutron-openvswitch-agent | .../plugins/ neutron-openvswitch-agent 1 1 | ctrl | |
| proc_neutron-metadata-agent | .../plugins/ neutron-metadata-agent 1 100 | ctrl | |
| proc_neutron-ns-metadata-proxy | .../plugins/ neutron-ns-metadata-proxy 1 100 | ctrl | |
| ceilometer-collector | sudo .../checks/oschecks-check_amqp ceilometer-collector | ctrl, instack | |
| ceilometer-agent-notification | sudo .../checks/oschecks-check_amqp ceilometer-agent-notification | ctrl, instack | |
| ceilometer-alarm-notifier | sudo .../checks/oschecks-check_amqp ceilometer-alarm-notifier | ctrl, instack | |
| cinder-scheduler | sudo .../checks/oschecks-check_amqp cinder-scheduler | ctrl | |
| nova-consoleauth | sudo .../checks/oschecks-check_amqp nova-consoleauth | ctrl, instack | |
| nova-conductor | sudo .../checks/oschecks-check_amqp nova-conductor | ctrl, instack | |
| nova-scheduler | sudo .../checks/oschecks-check_amqp nova-scheduler | ctrl, instack | |
| neutron-server | sudo .../checks/oschecks-check_amqp neutron-server | ctrl, instack | |
| neutron-l3-agent | sudo .../checks/oschecks-check_amqp neutron-l3-agent | ctrl | |
| neutron-lbaas-agent | sudo .../checks/oschecks-check_amqp neutron-lbaas-agent | lbaas | |
| neutron-dhcp-agent | sudo .../checks/oschecks-check_amqp neutron-dhcp-agent | ctrl, instack | |
| heat-engine | sudo .../checks/oschecks-check_amqp heat-engine | ctrl, instack | |
| heat_service_list | /usr/bin/sudo /usr/bin/heat-manage service list | ctrl, instack | |
| proc_chronyd | .../plugins/ chronyd 1 1 | instack | |
| over_ceilometer_api | .../checks/oschecks-check_ceilometer_api <OS_ARGS> | openstack_over_api | |
| over_cinder_volume | .../checks/oschecks-check_cinder_volume <OS_ARGS> | openstack_over_api | |
| over_glance_api | .../checks/oschecks-check_glance_api <OS_ARGS> | openstack_over_api | |
| over_glance_image_exists | .../checks/oschecks-check_glance_image_exists <OS_ARGS> | openstack_over_api | |
| over_glance_upload | .../checks/oschecks-check_glance_upload <OS_ARGS> | openstack_over_api | |
| over_keystone_api | .../checks/oschecks-check_keystone_api <OS_ARGS> | openstack_over_api | |
| over_neutron_api | .../checks/oschecks-check_neutron_api <OS_ARGS> | openstack_over_api | |
| over_neutron_floating_ip | .../checks/oschecks-check_neutron_floating_ip <OS_ARGS> | openstack_over_api | |
| over_nova_api | .../checks/oschecks-check_nova_api <OS_ARGS> | openstack_over_api | |
| over_nova_instance | .../checks/oschecks-check_nova_instance <OS_ARGS> | openstack_over_api | |
| instack_glance_api | .../checks/oschecks-check_glance_api <OS_ARGS> | openstack_under_api | |
| instack_glance_image_exists | .../checks/oschecks-check_glance_image_exists <OS_ARGS> | openstack_under_api | |
| instack_glance_upload | .../checks/oschecks-check_glance_upload <OS_ARGS> | openstack_under_api | |
| instack_keystone_api | .../checks/oschecks-check_keystone_api <OS_ARGS> | openstack_under_api | |
| instack_nova_api | .../checks/oschecks-check_nova_api <OS_ARGS> | openstack_under_api | |
| proc_ntpd | .../plugins/ ntpd 1 1 | osp_generic | |
| proc_xinetd | .../plugins/ xinetd 1 1 | osp_generic | |
| proc_ntpd | .../plugins/ ntpd 1 1 | overcld_generic | |
| proc_xinetd | .../plugins/ xinetd 1 1 | overcld_generic | |
| proc_redis-server | .../plugins/ redis-server 1 1 | server, ctrl | |
| proc_rabbitmq | .../plugins/ beam.smp 1 1 | server | |
| sensu_api | .../plugins/ sensu-api 1 1 | server | |
| sensu_server | .../plugins/ sensu-server 1 1 | server | |
| proc_swift-object-server | .../plugins/ swift-object-server 1 2 | strg | |
| proc_swift-account-server | .../plugins/ swift-account-server 1 2 | strg | |
| proc_swift-container-server | .../plugins/ swift-container-server 1 2 | strg | |
| proc_swift-object-replicator | .../plugins/ swift-object-replicator 1 2 | strg | |
| proc_swift-account-replicator | .../plugins/ swift-account-replicator 1 2 | strg | |
| proc_swift-container-replicator | .../plugins/ swift-container-replicator 1 2 | strg | |
| proc_swift-object-auditor | .../plugins/ swift-object-auditor 1 3 | strg | |
| proc_swift-account-auditor | .../plugins/ swift-account-auditor 1 2 | strg | |
| proc_swift-container-auditor | .../plugins/ swift-container-auditor 1 2 | strg | |
| LSI_PERC_status | sudo .../plugins/megaclisas-status --nagios | system | |
| sensu_client | .../plugins/ sensu-client 1 5 | system | |
| system_memory | .../plugins/ | system | Checks system memory usage |
| system_FS_root | .../plugins/ -c 90 -w 80 -d / | system | Checks root FS available space |
| system_FS_root_inodes | .../plugins/check_disk_inodes -w 80 -c 90 -p / | system | Checks root FS available inodes |
| proc_crond | .../plugins/ crond 1 1 | system | Looks for 1 crond process |
| proc_systemd | .../plugins/ systemd 1 100 | system | Looks for 1 systemd process |
| proc_sshd | .../plugins/ sshd 1 100 | system | Looks for 1 (up to 100) sshd process(es) |
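Most of the proc_* checks above take a process name plus a minimum and maximum expected count; their pass/fail logic boils down to something like the following (a sketch, not the actual plugin):

```shell
# Return 0 (OK) if the count falls within [min, max], 2 (CRITICAL) otherwise,
# mirroring Nagios/Sensu exit-code conventions.
check_count() {
  count=$1; min=$2; max=$3
  if [ "$count" -ge "$min" ] && [ "$count" -le "$max" ]; then
    echo "OK: $count process(es) running"
    return 0
  fi
  echo "CRITICAL: $count process(es), expected between $min and $max"
  return 2
}

# In real use the count would come from something like: pgrep -c -f sshd
check_count 3 1 100
```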

The WIP document listing all of the checks is found on Google Docs [9]

Related links

[2] N/A
[3] N/A
[4] N/A
[6] N/A
[7] N/A
[8] N/A
[9] N/A
