Recommendations

Revision 27 as of 2016-04-27 13:11:49

Max pids on Ubuntu 16.04

If you create a large number of processes on Ubuntu 16.04, a systemd cgroup security mechanism limits how many tasks can exist. It was defined by this proposal.

You are hitting this limit when you receive messages like make: fork: Resource temporarily unavailable or Cannot fork.

You can increase the maximum number of pids, or even disable this security feature, by editing the file /etc/systemd/system.conf.

To increase the number of processes allowed, raise the value on this line:

      DefaultTasksMax=512

To disable the limit completely, keep task accounting enabled and set the maximum to infinity:

      DefaultTasksAccounting=yes
      DefaultTasksMax=infinity
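
To confirm which value the file carries after editing, a small helper along these lines can be used. This is a minimal sketch under stated assumptions: the function name tasks_max_setting is hypothetical, it only reads the single file passed to it (drop-in directories are not considered), and it assumes the last uncommented assignment is the one that takes effect.

```shell
#!/bin/sh
# Print the DefaultTasksMax value found in a systemd manager config file.
# Lines starting with '#' are ignored; if the setting appears more than
# once, the last assignment is printed (assumed to be the effective one).
# Usage: tasks_max_setting [path]   (path defaults to /etc/systemd/system.conf)
tasks_max_setting() {
    conf="${1:-/etc/systemd/system.conf}"
    sed -n 's/^[[:space:]]*DefaultTasksMax=\(.*\)$/\1/p' "$conf" | tail -n 1
}
```

The function prints nothing when the setting is absent or commented out, which means the built-in default applies. After changing the file, a reboot is the simplest way to make sure the new value is in use.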

Btrfs on Ubuntu

We would caution anyone against relying on Btrfs for production use. There is no userspace fsck tool for Btrfs; checking is done in-kernel. If that goes wrong, your data is unfixable. Colin King of the kernel team also did some thorough testing of Btrfs about a year ago and deemed it still experimental quality. The test matrix from a year ago still demonstrates some bad failures on various tests: http://kernel.ubuntu.com/~cking/btrfs-testing/jan-9-2015/

Hot Plug on Ubuntu 15.04

If you are running Ubuntu 15.04 or newer, there is a cgroup hotplug issue to consider when changing SMT modes. In summary, if the system was previously set to a lower SMT mode and a user changes the system to a higher SMT mode, this cgroup hotplug issue may prevent tasks from running on those CPUs that were brought online to switch the processor cores to a higher SMT mode.

This topic is covered in the SMT and cgroup cpusets documentation.

Samba on 14.04

Samba on 14.04 does not work properly on Ubuntu/ppc64el. The problem doesn't happen on 14.10 and later releases. For more information about this bug, check https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1472584

Hot plug

As soon as you install Ubuntu on a POWER system, it is recommended to install the ppc64-diag package in order to enable RAS features such as hot plug completion, firmware log dump, etc.

# apt-get install ppc64-diag

powerpc-utils package

To get the powerpc utilities on Ubuntu, install the package named powerpc-ibm-utils rather than powerpc-utils.

The powerpc-utils package targets the old Apple POWER machines and shouldn't be used on IBM POWER servers.

To install it, run:

# apt-get install powerpc-ibm-utils

Crash Kernel recommendations

The following are the recommended crashkernel values for different memory ranges. These values were arrived at after testing different scenarios:

  • For memory between 2G through 4G, reserve 320M
  • For memory between 4G through 32G, reserve 512M
  • For memory between 32G through 64G, reserve 1024M
  • For memory between 64G through 128G, reserve 2048M
  • For memory above 128G, reserve 4096M
  • Reserve memory at an offset of 32M

These values can be passed as a single range-based crashkernel= parameter that selects the reservation from the total system memory, so that it works irrespective of system memory size:

        crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@32M

The above parameter effectively converts to crashkernel=320M on a system with 2G through 4G of memory, crashkernel=512M on a system with 4G through 32G of memory, and so on. If Out of Memory issues are seen in the kdump kernel, try increasing the memory reserved for crashkernel.
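
To see which reservation a given machine would get from such a range-based value, the selection can be reproduced in a few lines of shell. This is only an illustrative sketch, not the kernel's parser: the helper names to_mib and crashkernel_pick are hypothetical, only M and G suffixes are handled, and it assumes that the start of each range is inclusive and the end is exclusive.

```shell
#!/bin/sh
# Convert a size such as 512M or 4G to MiB. Only M and G suffixes are handled.
to_mib() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# Print the reservation that a range-based crashkernel= value selects
# for a system with the given amount of memory (in MiB).
# Usage: crashkernel_pick <spec> <total-memory-MiB>
crashkernel_pick() {
    spec="${1%@*}"              # drop the optional @offset suffix
    mem_mib="$2"
    old_ifs="$IFS"; IFS=','     # entries are comma-separated
    for entry in $spec; do
        range="${entry%%:*}"; size="${entry#*:}"
        start="${range%%-*}"; end="${range#*-}"
        start_mib=$(to_mib "$start") || continue
        if [ -n "$end" ]; then
            end_mib=$(to_mib "$end") || continue
        fi
        # Assumed semantics: start inclusive, end exclusive, empty end = no upper bound.
        if [ "$mem_mib" -ge "$start_mib" ] &&
           { [ -z "$end" ] || [ "$mem_mib" -lt "$end_mib" ]; }; then
            IFS="$old_ifs"
            echo "$size"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1                    # no range matched, nothing would be reserved
}

# Example: an 8 GiB system (8192 MiB) falls in the 4G-32G range.
# crashkernel_pick '2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@32M' 8192
# prints 512M
```

The memory size can be taken from MemTotal in /proc/meminfo (divided by 1024), keeping in mind that MemTotal is slightly below the installed RAM, so results close to a range boundary may differ from what the kernel chooses.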

Firmware Assisted Dump

Firmware Assisted Dump (fadump) is an alternative to the kdump crash dumping mechanism, available on the powerpc architecture. To understand how fadump works, please refer to the kernel documentation.

Two steps are needed to use fadump as the crash dumping mechanism. First, enable fadump by passing "fadump=on" to the kernel. Second, register fadump by echoing 1 to /sys/kernel/fadump_registered.

1. To enable fadump:

  • Add "fadump=on" to GRUB_CMDLINE_LINUX in /etc/default/grub file.
  • Rebuild grub config

        # grub-mkconfig -o /boot/grub/grub.cfg
  • Reboot

2. To register fadump:

  • kdump-tools, a set of scripts and tools for automating kdump, has been updated to be fadump-aware.

    When fadump is enabled, kdump-tools registers fadump as the crash dumping mechanism by echoing 1 to /sys/kernel/fadump_registered. For more help, see:

        # kdump-config help

NOTE: If fadump fails to collect a dump because of an Out Of Memory error, use the "fadump_reserve_mem=" parameter to increase the memory reserved for firmware-assisted dump.
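
After the reboot, both steps can be verified from sysfs. The sketch below is a hypothetical helper (fadump_status is not part of kdump-tools). It assumes the kernel also exposes /sys/kernel/fadump_enabled next to the fadump_registered file mentioned above, and it takes the directory as an optional argument so it can be tried against a copy.

```shell
#!/bin/sh
# Report the fadump state by reading the kernel's sysfs files.
# Usage: fadump_status [dir]   (dir defaults to /sys/kernel)
fadump_status() {
    dir="${1:-/sys/kernel}"
    # No fadump_enabled file: kernel or platform without fadump support (assumption).
    if [ ! -r "$dir/fadump_enabled" ]; then
        echo "unsupported"
        return 0
    fi
    if [ "$(cat "$dir/fadump_enabled")" != "1" ]; then
        # fadump=on was not passed on the kernel command line
        echo "disabled"
    elif [ "$(cat "$dir/fadump_registered" 2>/dev/null)" = "1" ]; then
        # enabled and registered, for example by kdump-tools
        echo "registered"
    else
        echo "enabled-not-registered"
    fi
}
```

A result of "enabled-not-registered" suggests that step 1 worked but the registration in step 2 did not happen.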

Min_free_kbytes kernel configuration

The default value of min_free_kbytes can be too low: running with it produces many "interrupt: 501 at plpar_hcall_norets+0x1c/0x2" messages. Increasing min_free_kbytes eliminates the fault.

If you encounter these types of problems, try doubling the value of vm.min_free_kbytes up to a maximum of either 65536 or 5% of total memory (whichever is lower).
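
The doubling rule with its cap can be written down as a short calculation. This is a sketch of the arithmetic only; the function name next_min_free_kbytes is hypothetical and both arguments are expected in kB.

```shell
#!/bin/sh
# Compute the next vm.min_free_kbytes value to try: double the current
# value, but never exceed the lower of 65536 kB and 5% of total memory.
# Usage: next_min_free_kbytes <current-kB> <total-memory-kB>
next_min_free_kbytes() {
    current="$1"
    total="$2"
    cap=$(( total * 5 / 100 ))          # 5% of total memory, in kB
    if [ "$cap" -gt 65536 ]; then
        cap=65536                       # absolute upper limit
    fi
    next=$(( current * 2 ))
    if [ "$next" -gt "$cap" ]; then
        next="$cap"
    fi
    echo "$next"
}
```

The current value can be read with sysctl -n vm.min_free_kbytes, the total memory from MemTotal in /proc/meminfo, and the result applied with sysctl -w vm.min_free_kbytes=VALUE. If the current value is already above the cap, the function returns the cap itself.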

Graphics support for Ubuntu 14.04.3

Installation of required packages

  • For a properly functioning Graphical User Interface (GUI), depending on the graphics card in use, you need to install a package that is not currently pulled in automatically as a dependency of the GUI meta packages.


    For the AMD FirePro 2270 adapter (Feature Code: EC41 [1] or EC42 [2]), run the following command before installing the desired GUI meta package:

    # apt-get install xserver-xorg-video-ati

    For the ASPEED (AST) graphics adapter [3], run the following command before installing the desired GUI meta package:

    # apt-get install xserver-xorg-video-modesetting

Unity and Gnome support

  • Unity and Gnome are not supported on Power for video cards lacking 3D hardware acceleration, because llvmpipe is currently disabled for Power in the distro. gnome-session also contributes, since it currently does not set the "--disable-acceleration-check" flag on start.


    The ASPEED (AST) adapter [3] therefore won't work with these environments. As an alternative, other desktop environments such as xfce (xubuntu-desktop meta package) or KDE Plasma (kubuntu-desktop meta package) can be used.


    The AMD FirePro 2270 adapter (Feature Code: EC41 [1] or EC42 [2]) supports Unity and Gnome since it has 3D hardware acceleration.

[1] http://www-01.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec41.htm
[2] http://www-01.ibm.com/support/knowledgecenter/POWER8/p8hcd/fcec42.htm
[3] Found in the following Machines Type and Model (MTM): 8335-GCA, 8335-GTA, and 8348-21C.

mlx4 adapter DLPAR

There is a known problem when hot unplugging adapters that use the mlx4 driver on PowerVM through Dynamic LPAR (DLPAR) on Ubuntu 16.04 with the Mellanox OFED (MOFED) drivers.

This dynamic remove might cause an EEH (Enhanced Error Handling) stack in the kernel, as shown below:

[19684.385410] mlx4_core 002b:01:00.0: device is going to be reset
[19684.385412] mlx4_core 002b:01:00.0: device was reset successfully
[19684.385416] mlx4_core 002b:01:00.0: MAP_ICM_AUX command failed, aborting
[19684.385741] mlx4_core: probe of 002b:01:00.0 failed with error -5
[19684.385776] EEH: Notify device driver to resume
[19684.385784] EEH: Detected PCI bus error on PHB#43-PE#10000
[19684.385793] EEH: PHB#43-PE#10000 has failed 6 times in the
[19684.385793] last hour and has been permanently disabled.
[19684.386803] EEH: of node=002b:01:00:0
[19684.386874] EEH: PCI device/vendor: 100315b3
[19684.386945] EEH: PCI cmd/status register: 00100142
[19684.386946] EEH: PCI-E capabilities and status follow:
[19684.387304] EEH: PCI-E 00: 0002c010 00008e02 0000504e 0843f483
[19684.387591] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
[19684.387593] EEH: PCI-E 20: 00000000
[19684.387594] EEH: PCI-E AER capability register set follows:
[19684.387955] EEH: PCI-E AER 00: 18c20001 00000000 00000000 00062010
[19684.388242] EEH: PCI-E AER 10: 00000000 00002000 000001e0 00000000
[19684.388529] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[19684.388601] EEH: PCI-E AER 30: 00000000 00000000

Network interface name changes after PCI addition/removal

With the ifname approach to network device naming and the retirement of /lib/udev/rules.d/75-persistent-net-generator.rules in Ubuntu 16.04, there are cases where the addition/removal of PCI devices could cause network names to slip for ppc64le based systems.

To avoid the potential for name slippage in automatically generated network device names, you can create your own udev rules file to revert to the previous behavior of pinning the network names based on MAC address via /etc/udev/rules.d/70-persistent-net.rules. For example:

# cat /etc/udev/rules.d/70-persistent-net.rules
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:c8", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:c9", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:ca", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="40:f2:e9:5b:f6:cb", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="en3"

Note: Avoid using the auto-assigned names of eth* or enP*. Be sure to update your /etc/network/interfaces* files with the new network device names chosen.
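
Rules of this form are repetitive, so they can be generated rather than typed by hand. The helper below is a hypothetical sketch (persistent_net_rule is not an existing tool) that prints one rule in the same format as the example above. It lowercases the MAC address on the assumption that sysfs reports addresses in lowercase and that the match is case-sensitive.

```shell
#!/bin/sh
# Print one udev rule that pins a network device name to a MAC address,
# in the same format as the 70-persistent-net.rules example.
# Usage: persistent_net_rule <mac-address> <name>
persistent_net_rule() {
    # Normalise the MAC to lowercase (assumed to match the sysfs 'address' attribute).
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="?*", NAME="%s"\n' "$mac" "$2"
}
```

For example, persistent_net_rule "$(cat /sys/class/net/IFACE/address)" en0 prints a rule for the interface currently named IFACE (a placeholder). Review the output before writing it to /etc/udev/rules.d/70-persistent-net.rules.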

Firmware update

In band firmware update

There is a known issue with OPAL firmware regarding in-band firmware update with "update_flash" (from powerpc-utils 1.3.1-1). In-band update is not recommended on Ubuntu 16.04 until the problem is fixed.

Firmware updates should be done using the out-of-band method instead.