Recommendations

Ubuntu 16.10

Kernel recommendation

Two critical bugs affect the Ubuntu 16.10 kernel on PowerVM and KVM virtualization.

The fixes for both were released in kernel 4.8.0-23.25. We strongly recommend that all 16.10 users run this kernel version or a newer one.
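One way to check a 16.10 system against this recommendation is to compare the running kernel's package version with 4.8.0-23.25. The following is only a sketch: version_at_least is a hypothetical helper name, the ordering relies on GNU sort -V, and the layout of /proc/version_signature (second field such as "4.8.0-23.25-generic") is an assumption about Ubuntu kernels.

```shell
#!/bin/sh
# Succeeds when version $1 is the same as or newer than version $2.
# Relies on "sort -V" (GNU coreutils) for version ordering.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Assumption: the second field looks like "4.8.0-23.25-generic".
if [ -r /proc/version_signature ]; then
    running=$(cut -d' ' -f2 /proc/version_signature)
    running=${running%-*}    # drop the flavour suffix, e.g. "-generic"
    if version_at_least "$running" 4.8.0-23.25; then
        echo "kernel $running is at or above 4.8.0-23.25"
    else
        echo "kernel $running is older than 4.8.0-23.25"
    fi
fi
```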


Ubuntu 16.04

Btrfs on Ubuntu

We would caution anyone against relying on Btrfs for production use. There is no userspace fsck tool for btrfs; checking is done in-kernel. If that does not function correctly, your data is not repairable. Colin King of the kernel team also did some thorough testing of btrfs about a year ago and deemed it still experimental quality. The test matrix from a year ago still shows some bad failures on various tests: http://kernel.ubuntu.com/~cking/btrfs-testing/jan-9-2015/

Max pids on Ubuntu 16.04

If you create a lot of processes on Ubuntu, a systemd cgroup security mechanism will limit them. It was defined by this proposal.

This is the case when you receive messages like make: fork: Resource temporarily unavailable or Cannot fork.

You can increase the maximum number of pids, or even disable this security feature, by changing the file /etc/systemd/system.conf.

If you want to increase the number of processes, raise the value on this line:

      DefaultTasksMax=512

In order to remove the limit completely, set the value to infinity. With task accounting left enabled:

      DefaultTasksAccounting=yes

the limit line becomes:

      DefaultTasksMax=infinity
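To see which value is currently configured, a small helper can read the last active DefaultTasksMax line from the file. This is a sketch only: tasks_max_from_conf is a hypothetical name, and the notes in the comments about how the change takes effect are assumptions to verify on your system.

```shell
#!/bin/sh
# Print the last active (uncommented) DefaultTasksMax value in a config file.
# Empty output means no active line was found.
tasks_max_from_conf() {
    sed -n 's/^[[:space:]]*DefaultTasksMax=//p' "$1" | tail -n 1
}

conf=/etc/systemd/system.conf
if [ -r "$conf" ]; then
    echo "configured DefaultTasksMax: $(tasks_max_from_conf "$conf")"
fi

# On a systemd host, the value in effect can be queried with:
#   systemctl show --property=DefaultTasksMax
# Assumption: after editing the file, the manager has to be re-executed
# (systemctl daemon-reexec) or the system rebooted for the change to apply.
```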

Ubuntu 14.04

Samba on 14.04

Samba on 14.04 does not work properly on Ubuntu/ppc64el. The problem doesn't happen on 14.10 and later releases. For more information about this bug, see https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1472584

RAS features

powerpc-utils package

To get the powerpc utilities on Ubuntu, install the package named powerpc-ibm-utils instead of powerpc-utils.

The powerpc-utils package is aimed at old Apple PowerPC machines and shouldn't be used on IBM POWER servers.

To install the right package, run:

# apt-get install powerpc-ibm-utils

Crash Kernel recommendations

The following are the recommended crashkernel values for different memory ranges. These values were arrived at after testing different scenarios:

  • For memory between 2G through 4G, reserve 320M
  • For memory between 4G through 32G, reserve 512M
  • For memory between 32G through 64G, reserve 1024M
  • For memory between 64G through 128G, reserve 2048M
  • For memory above 128G, reserve 4096M

These values can be passed as a single conditional crashkernel= parameter keyed on the total system memory, so that it works irrespective of the system memory size, as below:

crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M

The above parameter effectively converts to crashkernel=320M on a system with 2G through 4G of memory, to crashkernel=512M on a system with 4G through 32G of memory, and so on. Also, if Out of Memory issues are seen in the kdump kernel, try increasing the memory reserved for crashkernel.
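To illustrate how a range-based value resolves, the sketch below walks the comma-separated entries and prints the reservation for a given amount of memory. It is not the kernel's parser: resolve_crashkernel is a hypothetical helper, it only understands "G" suffixes on the range bounds, it ignores any @offset, and it assumes each range includes its start and excludes its end.

```shell
#!/bin/sh
# Print the reservation a range-based crashkernel= value selects for a
# system with $2 GiB of memory. Returns 1 when no range matches.
resolve_crashkernel() {
    rc_spec=${1%@*}      # ignore an optional @offset suffix
    rc_mem=$2
    rc_old_ifs=$IFS
    IFS=,
    for rc_entry in $rc_spec; do
        rc_range=${rc_entry%%:*}     # e.g. "4G-32G" or "128G-"
        rc_size=${rc_entry#*:}       # e.g. "512M"
        rc_start=${rc_range%%-*}
        rc_end=${rc_range#*-}
        rc_start=${rc_start%G}
        rc_end=${rc_end%G}
        # Assumed semantics: start inclusive, end exclusive, empty end = no upper bound.
        if [ "$rc_mem" -ge "$rc_start" ] &&
           { [ -z "$rc_end" ] || [ "$rc_mem" -lt "$rc_end" ]; }; then
            IFS=$rc_old_ifs
            echo "$rc_size"
            return 0
        fi
    done
    IFS=$rc_old_ifs
    return 1
}

ranges='2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M'
resolve_crashkernel "$ranges" 16    # prints 512M
```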

Note: If kdump kernel fails to boot, reserve memory at an offset of 32M like below:

crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@32M

kdump kernel boot issues

It is a good idea to check whether kdump is working while configuring a system. If the kdump kernel fails to boot, try passing maxcpus=1 instead of the nr_cpus=1 parameter and try again. This has been observed to help in some cases (such as when SMT is off).

Kdump not generating crash dump file

Kdump might not be generating a log file into /var/crash after a crash.

This could be due either to a missing parameter on the kernel command line or to the target device being on NVMe. Currently kdump targets on NVMe are not supported.

To correct the missing parameter, add "noirqdistrib" either to the variable "KDUMP_CMDLINE_APPEND" in /etc/default/kdump-tools or to the kernel command line itself. Solution:

- Add "noirqdistrib" parameter to KDUMP_CMDLINE_APPEND in /etc/default/kdump-tools file.

- Run "systemctl restart kdump-tools.service" to reload kdump kernel with this parameter.

Once this is applied, the directory /var/crash should be populated with the kdump logs after a crash.
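The edit to KDUMP_CMDLINE_APPEND can be scripted. The sketch below is a hedged illustration: add_kdump_param is a hypothetical helper, it assumes GNU sed (for -i) and an active, uncommented KDUMP_CMDLINE_APPEND="..." line, and the demo runs on a temporary copy rather than on /etc/default/kdump-tools.

```shell
#!/bin/sh
# Append a parameter to the active KDUMP_CMDLINE_APPEND="..." line of a
# file, unless it is already there. Returns 1 if there is no active line.
add_kdump_param() {
    conf_file=$1
    param=$2
    if ! grep -q '^KDUMP_CMDLINE_APPEND="' "$conf_file"; then
        echo "no active KDUMP_CMDLINE_APPEND line in $conf_file" >&2
        return 1
    fi
    # Already present as a whole word: nothing to do.
    if grep '^KDUMP_CMDLINE_APPEND="' "$conf_file" | grep -qw -- "$param"; then
        return 0
    fi
    sed -i 's/^\(KDUMP_CMDLINE_APPEND="[^"]*\)"/\1 '"$param"'"/' "$conf_file"
}

# Demo on a scratch file; the initial line is the example value used
# elsewhere on this page.
demo=$(mktemp)
printf '%s\n' 'KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 systemd.unit=kdump-tools.service"' > "$demo"
add_kdump_param "$demo" noirqdistrib
cat "$demo"
rm -f "$demo"
```

On a real system the same call would target /etc/default/kdump-tools as root, followed by the "systemctl restart kdump-tools.service" step described above.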

Kdump and USB

Writing the vmcore to a filesystem on a USB-attached disk works if the disk or partition is specified by UUID or by-path in the /etc/fstab file.

The key issue is: if any disk listed in /etc/fstab is a USB-attached disk, then the kdump kernel must support USB. Otherwise the kdump startup process will drop to an emergency shell. It may not be obvious that the "nousb" option is a default on the kdump kernel command line.

To override the nousb default for the kdump kernel, use the following line in "/etc/default/kdump-tools":

KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 systemd.unit=kdump-tools.service"

Note that the nousb option is not present.

Configuring Dump Capturing Support (KDump/FADump)

Since the memory required to boot the capture kernel is a moving target that depends on many factors, such as the hardware attached to the system, the kernel and modules in use, the packages installed and the services enabled, there is no one-size-fits-all value. So please take the above recommendations with a pinch of salt, and try capturing a dump a few times to confirm that the system is configured successfully with dump capturing support. Remember to retry dump capturing whenever:

  • the kernel is updated
  • new packages are installed
  • new services are enabled
  • boot/sysctl parameters are changed

Note that the above recommendation applies to both KDump and FADump dump capturing mechanisms.

Firmware Assisted Dump

Firmware Assisted Dump (fadump) is an alternative to the kdump crash dumping mechanism, available on the powerpc architecture. To understand how fadump works, please refer to the kernel documentation below:

Three steps are needed to use fadump as the crash dumping mechanism.

First, enable fadump by passing "fadump=on" to the kernel.

Second, reserve some memory for fadump.

Third, register fadump by echoing 1 to /sys/kernel/fadump_registered.

1. To enable fadump:

  • Add "fadump=on" to GRUB_CMDLINE_LINUX in /etc/default/grub file.
  • Rebuild grub config (you may do this once at the end of step 2)

        # grub-mkconfig -o /boot/grub/grub.cfg

2. To reserve memory for fadump:

  • Add "fadump_reserve_mem=<mem>" to GRUB_CMDLINE_LINUX in /etc/default/grub file.

  • Rebuild grub config

        # grub-mkconfig -o /boot/grub/grub.cfg

NOTE: For Ubuntu kernels that are based on kernel v4.12 or above, use 'crashkernel=' instead of 'fadump_reserve_mem=', to specify the memory to be reserved for fadump.

3. To register fadump:

  • Reboot the system for the changes from steps 1 & 2 to take effect.

  • kdump-tools, the set of scripts and tools for automating kdump, has been updated to be fadump-aware.
  • When fadump is enabled, kdump-tools registers fadump as the crash dumping mechanism by echoing 1 to /sys/kernel/fadump_registered. For more help, see:

        # kdump-config help

NOTE: If fadump fails to collect a dump with an Out Of Memory error, increase the memory reserved for the firmware-assisted dump (see step 2).
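After the reboot, the outcome of the three steps can be checked from sysfs. The sketch below is a hedged illustration: fadump_status is a hypothetical helper, and it assumes that /sys/kernel/fadump_enabled sits next to the /sys/kernel/fadump_registered file mentioned above and that both contain 0 or 1. The optional argument only exists so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Report the fadump state from the sysfs files under $1 (default /sys/kernel).
fadump_status() {
    sys_kernel=${1:-/sys/kernel}
    if [ ! -r "$sys_kernel/fadump_enabled" ]; then
        echo "unsupported"                 # no fadump files on this kernel/platform
    elif [ "$(cat "$sys_kernel/fadump_enabled")" != "1" ]; then
        echo "disabled"                    # fadump=on was not passed (step 1)
    elif [ "$(cat "$sys_kernel/fadump_registered" 2>/dev/null)" = "1" ]; then
        echo "registered"                  # step 3 done, ready to capture
    else
        echo "enabled, not registered"     # steps 1 and 2 done, step 3 pending
    fi
}

fadump_status
```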

Kernel Config

Possible network interface name change after upgrading kernel

Linux kernel 4.4 started using a mechanism called Network Predictable Naming for the network interfaces. It means the names of network interfaces are based on the PCI addresses of the network adapters. For example, an adapter with PCI address 0003:01:00.0 would have a mapped network interface called enP3p1s0f0.
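The mapping in that example can be sketched as a small shell function. The pattern used here, en[P<domain>]p<bus>s<slot>f<function> with the hexadecimal PCI fields printed in decimal, is an assumption inferred from the example above; the real naming rules have more cases (for instance, the function suffix may be omitted on single-function devices), and pci_to_ifname is a hypothetical helper name.

```shell
#!/bin/sh
# Derive a predictable-style interface name from a PCI address of the
# form DDDD:BB:SS.F (all fields hexadecimal).
pci_to_ifname() {
    addr=$1
    domain=${addr%%:*}
    rest=${addr#*:}
    bus=${rest%%:*}
    rest=${rest#*:}
    slot=${rest%%.*}
    func=${rest#*.}
    name=en
    # Assumption: the P<domain> part only appears for a non-zero domain.
    if [ "$((0x$domain))" -gt 0 ]; then
        name="${name}P$((0x$domain))"
    fi
    echo "${name}p$((0x$bus))s$((0x$slot))f$func"
}

pci_to_ifname 0003:01:00.0    # prints enP3p1s0f0
```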

Due to recent changes in the Linux kernel to better accommodate Network Predictable Naming on the ppc64el architecture, users may see their network interface names change when upgrading the Ubuntu kernel to version 4.4.0-36 or later.

The solution is to change the network interface name in the file /etc/network/interfaces to match the new name the interface got after the kernel upgrade. After this, there will be no more naming changes for any subsequent kernel version >= 4.4.0-36.

Notice, however, that booting an older kernel after changing the interface name will bring the same issue back!

This issue appears when upgrading from an older kernel to 4.4.0-36 or later. It happens on Ubuntu 16.04 and 14.04.5.

Hint: to show all network interfaces on your system, just run ls -l /sys/class/net. It shows all the interfaces currently available and the symbolic links to their PCI devices.

Network interface names changes after PCI addition/removal

With the ifname approach to network device naming and the retirement of /lib/udev/rules.d/75-persistent-net-generator.rules in Ubuntu 16.04, there are cases where the addition/removal of PCI devices could cause network names to slip for ppc64le based systems.

To avoid the potential for name slippage in automatically generated network device names, you can create your own udev rules file to revert to the previous behavior of pinning the network names based on MAC address via /etc/udev/rules.d/70-persistent-net.rules. For example:

# cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:c8", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:c9", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:ca", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:cb", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en3"

Note: Avoid using the auto-assigned names of eth* or enP*. Be sure to update your /etc/network/interfaces* files with the new network device names chosen.
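Rules in this format can be generated rather than typed by hand. The sketch below prints one rule per MAC address and name pair, using the same match keys as the example above; persistent_net_rule is a hypothetical helper, and the output is only written to stdout so it can be reviewed before being saved to /etc/udev/rules.d/70-persistent-net.rules.

```shell
#!/bin/sh
# Print a udev rule that pins the interface with MAC address $1 to name $2.
persistent_net_rule() {
    # sysfs reports MAC addresses in lower case, so normalise the input.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="%s"\n' "$mac" "$2"
}

persistent_net_rule 40:F2:E9:5B:F6:C8 en0
persistent_net_rule 40:f2:e9:5b:f6:c9 en1
```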

Min_free_kbytes kernel configuration

The default value of min_free_kbytes may be too low: when running with the default, lots of "interrupt: 501 at plpar_hcall_norets+0x1c/0x2" messages are generated. Increasing min_free_kbytes eliminates the fault.

If you encounter these types of problems, try doubling the value of vm.min_free_kbytes up to a maximum of either 65536 or 5% of total memory (whichever is lower).
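The doubling rule with its cap can be written down as a small calculation. This is a sketch under the rule stated above; next_min_free_kbytes is a hypothetical helper, both arguments are in kB, and the function never suggests a value lower than the current one.

```shell
#!/bin/sh
# Print the next vm.min_free_kbytes value to try: double the current
# value ($1), capped at the lower of 65536 and 5% of total memory ($2, in kB).
next_min_free_kbytes() {
    current=$1
    mem_total_kb=$2
    cap=$((mem_total_kb * 5 / 100))
    if [ "$cap" -gt 65536 ]; then cap=65536; fi
    next=$((current * 2))
    if [ "$next" -gt "$cap" ]; then next=$cap; fi
    # Do not go below a value that is already configured.
    if [ "$next" -lt "$current" ]; then next=$current; fi
    echo "$next"
}

if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/min_free_kbytes ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    cur=$(cat /proc/sys/vm/min_free_kbytes)
    echo "current: $cur, next value to try: $(next_min_free_kbytes "$cur" "$mem_kb")"
fi
# To apply a value (as root): sysctl -w vm.min_free_kbytes=<value>
```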

Netboot installation over IPMI

The Ubuntu installer is loaded via kexec when netboot installation is used. By default, kexec appends the Petitboot (the bootloader) command line to the Ubuntu installer kernel's command line. The Ubuntu installer parses this command line and, if it has one or more "console=" entries, displays the installer on the last console by default. We experienced situations in which the last console was tty1 and, from the IPMI console perspective, the installer seemed hung because the output was on another terminal.

That said, our recommendation for netboot installation over IPMI is to manually append your preferred console in Ubuntu's kexec procedure, so that the installer output is shown on the right console. Example:

kexec -l vmlinux --initrd initrd.gz --append="console=hvc0"
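To see which console the installer would pick, you can look for the last "console=" entry on a command line. The sketch below mirrors the behaviour described above (the last entry wins); last_console is a hypothetical helper and the sample command line is made up for illustration.

```shell
#!/bin/sh
# Print the value of the last console= entry on a kernel command line,
# or nothing if there is none.
last_console() {
    last=
    for word in $1; do
        case $word in
            console=*) last=${word#console=} ;;
        esac
    done
    echo "$last"
}

last_console 'root=/dev/sda2 console=hvc0 console=tty1 quiet'    # prints tty1

# The same check against the command line of the running kernel:
if [ -r /proc/cmdline ]; then
    last_console "$(cat /proc/cmdline)"
fi
```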

In band firmware update

There is a known issue with OPAL firmware regarding in-band firmware update with "update_flash" (from powerpc-utils 1.3.1-1). In-band update is not recommended on Ubuntu 16.04 until the problem is fixed.

Firmware updates should be done using the out-of-band method instead.

Graphics support for Ubuntu 14.04.3

Installation of required packages

  • In order to have a properly functional Graphical User Interface (GUI), depending on the graphics card in use, you need to install a package that is currently not pulled in automatically as a dependency of the GUI meta packages.

    For the AMD FirePro 2270 adapter (Feature Code: EC41 [1] or EC42 [2]), use the following command prior to installing the desired GUI meta package:

    # apt-get install xserver-xorg-video-ati

    For the ASPEED (AST) graphics adapter [3], use the following command prior to installing the desired GUI meta package:

    # apt-get install xserver-xorg-video-modesetting

Unity and Gnome support

  • Unity and Gnome are not supported on Power for video cards lacking 3D hardware acceleration, because llvmpipe is currently disabled for Power in the distro. This is also affected by gnome-session, which currently doesn't set the flag "--disable-acceleration-check" on start.

    So the ASPEED (AST) adapter [3] won't work with these environments. As an alternative, other desktop environments such as xfce (xubuntu-desktop meta package) or KDE Plasma (kubuntu-desktop meta package) can be used.

    The AMD FirePro 2270 adapter (Feature Code: EC41 [1] or EC42 [2]) supports Unity and Gnome since it has 3D hardware acceleration.

[1] http://www-01.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec41.htm
[2] http://www-01.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec42.htm
[3] Found in the following Machines Type and Model (MTM): 8335-GCA, 8335-GTA, and 8348-21C.

PowerVM recommendations

vNIC support

There are some known bugs on Ubuntu 16.10 and 16.04 at this moment affecting vNIC support. Use it with caution until we have all the bugs solved.

LPAR migration

LPAR migration may fail when memory is overcommitted. Please consider turning off memory overcommit if you are performing migrations in high memory-stress situations. Please see the following for further information on Linux overcommit handling:

https://www.kernel.org/doc/Documentation/sysctl/vm.txt
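The current overcommit policy can be read from /proc/sys/vm/overcommit_memory. The sketch below only reports it; describe_overcommit is a hypothetical helper, the mode descriptions paraphrase the kernel document linked above, and treating mode 2 as "overcommit turned off" is an interpretation of the advice in this section.

```shell
#!/bin/sh
# Describe a vm.overcommit_memory mode value.
describe_overcommit() {
    case $1 in
        0) echo "heuristic overcommit" ;;
        1) echo "always overcommit" ;;
        2) echo "no overcommit (strict accounting)" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

if [ -r /proc/sys/vm/overcommit_memory ]; then
    describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)"
fi

# To switch to strict accounting before a migration (as root):
#   sysctl -w vm.overcommit_memory=2
# Allocations can then fail once the commit limit is reached, so test
# the workload before relying on this setting.
```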

Live Partition Migration

When running Ubuntu on a PowerVM LPAR and doing Live Partition Migration (LPM), the LPAR might fail to migrate if the memory is under heavy utilization.

mlx4 adapter DLPAR

There is a known problem when hot-unplugging adapters that use the mlx4 driver on PowerVM through Dynamic LPAR (DLPAR) on Ubuntu 16.04 with the Mellanox OFED (MOFED) drivers.

This dynamic removal might trigger EEH (Enhanced Error Handling) errors in the kernel log, as shown below:

[19684.385410] mlx4_core 002b:01:00.0: device is going to be reset
[19684.385412] mlx4_core 002b:01:00.0: device was reset successfully
[19684.385416] mlx4_core 002b:01:00.0: MAP_ICM_AUX command failed, aborting
[19684.385741] mlx4_core: probe of 002b:01:00.0 failed with error -5
[19684.385776] EEH: Notify device driver to resume
[19684.385784] EEH: Detected PCI bus error on PHB#43-PE#10000
[19684.385793] EEH: PHB#43-PE#10000 has failed 6 times in the
[19684.385793] last hour and has been permanently disabled.
[19684.386803] EEH: of node=002b:01:00:0
[19684.386874] EEH: PCI device/vendor: 100315b3
[19684.386945] EEH: PCI cmd/status register: 00100142
[19684.386946] EEH: PCI-E capabilities and status follow:
[19684.387304] EEH: PCI-E 00: 0002c010 00008e02 0000504e 0843f483
[19684.387591] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
[19684.387593] EEH: PCI-E 20: 00000000
[19684.387594] EEH: PCI-E AER capability register set follows:
[19684.387955] EEH: PCI-E AER 00: 18c20001 00000000 00000000 00062010
[19684.388242] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[19684.388529] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[19684.388601] EEH: PCI-E AER 30: 00000000 00000000

KVM recommendations

Memory recommendations

If you create a VM that has more than 64 vCPUs, give it more than 2 GB of memory; otherwise you will face OOM killer invocations, for example:

[  315.431587] Out of memory: Kill process 4844 (systemd-cgroups) score 2 or sacrifice child
[  315.431794] Killed process 4844 (systemd-cgroups) total-vm:10624kB, anon-rss:0kB, file-rss:576kB, shmem-rss:0kB
[  315.438311] Out of memory: Kill process 4059 (rtas_errd) score 2 or sacrifice child
[  315.438746] Killed process 4059 (rtas_errd) total-vm:7104kB, anon-rss:0kB, file-rss:1472kB, shmem-rss:0kB
[  323.092263] Out of memory: Kill process 4331 (iscsid) score 3 or sacrifice child
[  323.094412] Killed process 4331 (iscsid) total-vm:4160kB, anon-rss:0kB, file-rss:1920kB, shmem-rss:192kB
[  323.166873] Out of memory: Kill process 4295 (rpc.mountd) score 2 or sacrifice child
[  323.167112] Killed process 4295 (rpc.mountd) total-vm:8384kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
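The sizing rule above is easy to check before defining a guest. The sketch below is a plain calculation, not a libvirt query: check_guest_sizing is a hypothetical helper, memory is given in MiB, and 2 GB is taken to mean 2048 MiB.

```shell
#!/bin/sh
# Warn when a guest with more than 64 vCPUs ($1) has 2048 MiB of memory
# or less ($2, in MiB). Returns 1 in that case, 0 otherwise.
check_guest_sizing() {
    vcpus=$1
    mem_mib=$2
    if [ "$vcpus" -gt 64 ] && [ "$mem_mib" -le 2048 ]; then
        echo "warning: $vcpus vCPUs with $mem_mib MiB, use more than 2048 MiB"
        return 1
    fi
    echo "ok"
}

check_guest_sizing 96 4096    # prints ok
```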

Migration between kvm hosts

There is a known bug in libvirt that causes migration of a guest between KVM hosts on ppc64el to fail.

It fails with errors like these:

Migration: [100 %]error: internal error: early end of file from monitor, possible problem: 
2016-12-26T11:09:42.030260Z qemu-system-ppc64: Unknown savevm section or instance 'pci@800000020000000:06.0/ohci' 0
2016-12-26T11:09:42.031574Z qemu-system-ppc64: load of migration failed: Invalid argument

error: monitor socket did not show up: No such file or directory

Migration: [100 %]error: operation failed: job: unexpectedly failed

The workaround to avoid this kind of migration problem is to set the USB controller model to 'pci-ohci' in the guest's XML.

Look for the definition of "controller type='usb'". If it looks like this, for instance:

<devices>
...
  <controller type='usb' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
...
</devices>

add model='pci-ohci':

<devices>
...
  <controller type='usb' model='pci-ohci' index='0'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
...
</devices>

That should make the migration finally work.
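For several guests, the same change can be applied with a text filter instead of editing each definition by hand. The sketch below is a hedged illustration: add_ohci_model is a hypothetical helper that only rewrites a USB controller line with no model attribute, written exactly as in the example above, and GUEST in the comments is a placeholder name. Review the result before redefining a guest.

```shell
#!/bin/sh
# Read domain XML on stdin and add model='pci-ohci' to USB controller
# lines of the form <controller type='usb' index='N'>.
add_ohci_model() {
    sed "s/<controller type='usb' index='\([0-9]*\)'>/<controller type='usb' model='pci-ohci' index='\1'>/"
}

printf '%s\n' "  <controller type='usb' index='0'>" | add_ohci_model

# Possible use with virsh (not run here; GUEST is a placeholder):
#   virsh dumpxml GUEST | add_ohci_model > guest.xml
#   virsh define guest.xml
```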

ppc64el/Recommendations (last edited 2020-01-23 06:08:44 by fheimes)