ACPI Tricks and Tips
The ACPI driver
The ACPI specification is large and unwieldy; hence the Linux ACPI driver is a very large and complex software component. There are many varieties of ACPI tables, but they break into two main categories: configuration data and ACPI Machine Language (AML) byte code. The driver is responsible for locating and extracting data from the ACPI tables and interpreting it in a way that conforms to the specification. In many cases the driver must also behave the way Windows interprets the specification. The driver contains workarounds for firmware, Embedded Controller and Southbridge bugs, which clearly adds to the complexity of the code.
The driver doesn't just use configuration data in the ACPI tables; it also has to interpret the ACPI AML byte code, which adds further complexity. The driver also contains an OS abstraction layer that maps onto the Linux kernel's way of accessing memory and I/O regions, which adds another layer of indirection. All in all, ACPI is a heavily over-engineered solution; however, it is powerful because it gives any operating system an abstract view of how to control machine specific hardware. The downside of ACPI is that it requires a driver that correctly implements the ACPI specification and also works around non-conforming machines. The major downside is that most firmware vendors engineer ACPI to match a very Microsoft-centric implementation, which at times is a little lax and does not fully conform to the ACPI specification.
ACPI bugs
Bugs fall into several areas:
ACPI AML code bugs
The AML code allows firmware vendors to interface with the underlying hardware (e.g. Southbridge, Embedded Controller, I/O ports, etc.) in a machine specific way while presenting the host operating system with a standardised interface that has been carefully abstracted and defined fairly clearly in the ACPI spec. For example, to query the current backlight brightness level, the vendor writes an AML method called _BQC (Brightness Query Current level) which returns the level. The method could be implemented in many different ways: it might just read a value from the Embedded Controller's memory, or it might jump into the BIOS to fetch information using a System Management Mode service. The kernel has no idea how the method is implemented - it just executes the AML byte code, which has full access to I/O regions and memory to allow it to interact with the hardware.
Unfortunately the AML code is normally hand crafted and, being software, can have bugs. Here are just a few:
- Timing issues - e.g. interacting with the Embedded Controller, mis-timed read/writes. AML may execute correctly in Windows but not in Linux because different host operating systems execute AML operations at different speeds.
- Race conditions - e.g. poor locking on regions that need atomic locking
- Mishandling of time-outs - e.g. not checking mutex lock time-out errors
- Busy loops - e.g. busy loops checking Embedded Controller port status changes, and looping forever in a tight loop.
- Methods returning incorrect values, or wrong data types
- Methods having multiple return paths where some return values, others don't.
- Semantic bugs - methods expected to behave according to the specification but don't.
- Syntactic bugs - AML compiles correctly on the Microsoft AML compiler but won't with the stricter Intel iasl compiler.
- Access of data outside predefined I/O or memory regions. Indexes that fall outside a region lead to undefined behaviour.
- Recursion - methods that recurse deeper than 255 levels are broken; the interpreter aborts them to stop the stack overflowing.
- Missing methods - e.g. brightness controls missing, or misspelt (such as _BQC spelled as _BCQ).
In fact the list is endless since it's just like program code.
The AML code is found in two types of tables: the DSDT and the SSDT. There should be only one DSDT but there can be one or more SSDTs. To view the code, extract the tables and disassemble the AML byte code into ACPI Source Language (ASL):
sudo acpidump > acpidata.dat
acpixtract -sSSDT acpidata.dat
acpixtract -sDSDT acpidata.dat
iasl -d DSDT.dat SSDT*.dat
Then look at the .dsl files, which contain the disassembled code.
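Before disassembling, it can help to confirm that an extracted .dat file really is the table one expects. The following is a minimal sketch that reads the 36-byte header which, per the ACPI specification, starts tables such as the DSDT and SSDT. The function name parse_table_header and the sample bytes are illustrative only and are not part of any ACPI tool.

```python
import struct

# Standard ACPI table header (36 bytes, little-endian): signature, length,
# revision, checksum, OEM ID, OEM table ID, OEM revision, creator ID,
# creator revision.
HEADER = struct.Struct("<4sIBB6s8sI4sI")


def parse_table_header(data):
    """Return selected header fields of a raw ACPI table as a dict."""
    if len(data) < HEADER.size:
        raise ValueError("table is shorter than the 36-byte header")
    (sig, length, rev, csum, oem_id, oem_table_id,
     _oem_rev, _creator_id, _creator_rev) = HEADER.unpack_from(data, 0)
    return {
        "signature": sig.decode("ascii", "replace"),
        "length": length,
        "revision": rev,
        "checksum": csum,
        "oem_id": oem_id.decode("ascii", "replace").strip(),
        "oem_table_id": oem_table_id.decode("ascii", "replace").strip(),
    }


if __name__ == "__main__":
    # Made-up header for illustration. On a real system one would instead use
    #   data = open("DSDT.dat", "rb").read()
    sample = HEADER.pack(b"DSDT", 36, 2, 0, b"OEMID ", b"OEMTABLE",
                         1, b"INTL", 1)
    print(parse_table_header(sample))
```

If the signature is not the expected DSDT or SSDT, or the length does not match the file size, the extraction step probably went wrong.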
Mis-configured Tables
Tables contain data that defines how a system is configured and behaves. ACPI is very feature rich (probably too feature rich) and defines many aspects of system configuration. For example, CPU frequency levels can be defined in the ACPI tables, and it has been known for the maximum frequency level on some Atom netbooks to be omitted from the _PSS (Performance Supported States) object and hence the user cannot run their machine at the maximum frequency. The _PSS object is in fact in the DSDT or SSDT as AML code, so one needs to extract the table and disassemble this.
However, some tables are just pure configuration data structures, such as the FACP. To look at the individual fields one needs to extract the table and disassemble it. One can use:
sudo acpidump > acpidata.dat
acpixtract -sFACP acpidata.dat
iasl -d FACP.dat
Then look at the FACP.dsl file and check each field against the ACPI specification, which can be found at https://uefi.org/specifications
The development version of the firmware test suite (for 11.04) will contain the ability to dump and annotate the ACPI tables as follows:
sudo fwts acpidump -
which is a little easier.
Interaction with BIOS
This is where things are just plain evil. ACPI AML byte code can trigger System Management Interrupts by writing a magic number to a magic port. This then triggers a non-maskable interrupt out of kernel context straight into the BIOS context. The BIOS can then do anything it wants in an SMI, and return to the kernel context at some arbitrary point in the future. The BIOS is at liberty to twiddle with any I/O ports or memory region and it also messes up the CPU cache during the SMI. So expect weird issues when SMIs are being used.
Incidentally, the FACP contains information concerning the SMI - the word at 0x2e contains the SCI interrupt number (which should show up as the acpi interrupt in /proc/interrupts) and the 4 byte integer at offset 0x30 in the FACP contains the SMI command port address. One writes a magic value to this command port to generate an SMI.
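The two offsets mentioned above can be read straight out of an extracted FACP.dat file. The following is a minimal sketch, assuming the table was extracted with acpixtract as shown earlier; the function name read_sci_and_smi and the values in the synthetic table are made up for illustration.

```python
import struct

SCI_INT_OFFSET = 0x2E   # 16-bit SCI interrupt number
SMI_CMD_OFFSET = 0x30   # 32-bit SMI command port address


def read_sci_and_smi(fadt):
    """Return (sci_interrupt, smi_command_port) from raw FACP table bytes."""
    if len(fadt) < SMI_CMD_OFFSET + 4:
        raise ValueError("FACP data too short")
    (sci_int,) = struct.unpack_from("<H", fadt, SCI_INT_OFFSET)
    (smi_cmd,) = struct.unpack_from("<I", fadt, SMI_CMD_OFFSET)
    return sci_int, smi_cmd


if __name__ == "__main__":
    # Synthetic table with made-up values; on a real system one would use
    #   fadt = open("FACP.dat", "rb").read()
    fake = bytearray(0x40)
    struct.pack_into("<H", fake, SCI_INT_OFFSET, 9)
    struct.pack_into("<I", fake, SMI_CMD_OFFSET, 0xB2)
    sci, smi = read_sci_and_smi(bytes(fake))
    print("SCI interrupt: %d, SMI command port: 0x%x" % (sci, smi))
```

The SCI interrupt number printed here should match the acpi line in /proc/interrupts.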
Interaction with the Embedded Controller
This is where issues that are really difficult to diagnose occur. The Embedded Controller (EC) contains proprietary code to control peripherals. For example, the EC may be directly connected to the sleep button, so that when a user presses the sleep key, the EC is interrupted and it then pokes the Southbridge, which generates a General Purpose Event (GPE) that the ACPI driver handles. The ACPI driver then attempts to execute an AML method to handle this GPE. The naming convention is as follows:
_Lxx for level triggered GPE, _Exx for edge triggered GPE.
The xx is the hexadecimal number of the GPE, e.g. level triggered GPE 0x1e requires a Method called _L1E in the DSDT to handle the event.
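The naming convention is mechanical, so when grepping a disassembled DSDT it can be handy to generate the expected method name from a GPE number. A minimal sketch follows; the helper name gpe_method_name is hypothetical.

```python
def gpe_method_name(gpe, level_triggered):
    """Return the control method name for a GPE number (0x00-0xff)."""
    if not 0 <= gpe <= 0xFF:
        raise ValueError("GPE number must fit in two hex digits")
    prefix = "_L" if level_triggered else "_E"
    # Two upper-case hexadecimal digits, zero padded.
    return "%s%02X" % (prefix, gpe)


if __name__ == "__main__":
    print(gpe_method_name(0x1E, level_triggered=True))
    print(gpe_method_name(0x1C, level_triggered=False))
```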
Another way the EC is interacted with is by reading/writing to the EC memory. This memory is made accessible to AML using OperationRegion() definitions; for example, on a Lenovo 3000N200 this is declared as follows:
OperationRegion (ERAM, EmbeddedControl, Zero, 0xFF)
Field (ERAM, ByteAcc, Lock, Preserve)
{
    Offset (0x60), // offset 0x60 from start of ERAM
    SMPR, 8,
    SMST, 8,
    SMAD, 8,
    .... etc.. ....
    GAU1, 8,
    CYC1, 8,
    BPC1, 16,
    BAC1, 16,
    BAT1, 8,
    BTW1, 16
}
Note how a symbolic name is mapped onto 8 or 16 bit fields in the embedded controller's RAM. Methods can then refer to the symbols to read or write EC RAM as if using straightforward load/store instructions. However, the ACPI driver knows that these regions are EmbeddedControl addresses and maps the loads and stores into data read/write commands that are issued over the EC command and data ports.
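To work out which EC RAM byte a symbol refers to, one can walk the Field declaration and add up the field widths from the last Offset(). The sketch below does this for the first three fields of the Lenovo example only, since the elided part of the declaration is unknown; the helper name field_offsets is hypothetical.

```python
def field_offsets(start_byte, fields):
    """Map each (name, width_in_bits) to its (byte_offset, bit_offset)."""
    layout = {}
    bit_pos = start_byte * 8
    for name, width in fields:
        layout[name] = (bit_pos // 8, bit_pos % 8)
        bit_pos += width
    return layout


if __name__ == "__main__":
    # Fields that directly follow Offset (0x60) in the declaration above.
    first_fields = [("SMPR", 8), ("SMST", 8), ("SMAD", 8)]
    for name, (byte, bit) in field_offsets(0x60, first_fields).items():
        print("%s is at EC offset 0x%02x, bit %d" % (name, byte, bit))
```

Knowing the byte offset makes it easier to relate a symbol in an AML method to the EC data reads and writes the driver performs.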
The EC command port location can be found by using:
cat /proc/acpi/embedded_controller/*/info
gpe: 0x1c
ports: 0x66, 0x62
use global lock: no
The convention is that the command/status port is listed first (0x66), then the data port (0x62). The EC signals the kernel via the acpi interrupt and generates GPE 0x1c. The global lock flag indicates whether the EC uses a global lock to protect it when doing read/write transactions. The driver for the EC is found in drivers/acpi/ec.c
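When collecting this information from several machines it may be convenient to parse the info file into named values. The following is a minimal sketch based only on the sample output shown above; the function name parse_ec_info is hypothetical, and treating the lock flag as "yes"/"no" is an assumption.

```python
def parse_ec_info(text):
    """Parse the EC info text into a dict of integers and a boolean."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "gpe":
            info["gpe"] = int(value, 16)
        elif key == "ports":
            # Command/status port is listed first, then the data port.
            cmd, data = [int(p, 16) for p in value.split(",")]
            info["command_port"] = cmd
            info["data_port"] = data
        elif key == "use global lock":
            info["global_lock"] = (value == "yes")  # assumed yes/no values
    return info


if __name__ == "__main__":
    sample = "gpe: 0x1c\nports: 0x66, 0x62\nuse global lock: no\n"
    print(parse_ec_info(sample))
```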
Generally, most Embedded Controllers share common features in their memory locations, with minor vendor specific changes from machine to machine. Hence most Lenovos share the same basic functionality mappings onto the EC memory, which enables one to figure out which memory location may be used for specific features. Since these fields are proprietary there is scant documentation, and a lot of the debugging at this level is down to intelligent guesswork and reverse engineering the AML methods to figure out how the EC works.
Section 5.6.4.1.1 of version 4.0 of the ACPI specification, "Queuing the Matching Control Method for Execution", explains another aspect of EC and ACPI interaction - the _Qxx embedded controller event methods.
General-purpose events can be raised from a GPE bit tied to an embedded controller. When these occur, the event is handled by acpi_ec_gpe_handler() which ultimately calls acpi_ec_sync_query() - this queries the EC for an 8 bit event code (via acpi_ec_query_unlocked()). The 8 bit event code indicates which _Qxx method is to be called (xx is the hexadecimal number of the query). Note that query event code 0x00 is reserved - it indicates that there are no outstanding events.
The following examples of _Qxx events handle event codes 0xba and 0xbb for LID close and LID open events. Note how they store lid state and then generate a LID notification.
Method (_QBA, 0, NotSerialized)
{
    Store (Zero, ^^^^LID0.LIDS)
    Notify (LID0, 0x80)
}
Method (_QBB, 0, NotSerialized)
{
    Store (One, ^^^^LID0.LIDS)
    Notify (LID0, 0x80)
}
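The mapping from an EC event code to the method that should exist in the DSDT can be sketched as below, which is useful when matching an event code seen in the kernel log against the disassembled tables. The helper name query_method_name is hypothetical.

```python
def query_method_name(event_code):
    """Return the _Qxx method name for an 8 bit EC event code.

    Event code 0x00 is reserved (no outstanding events), so None is
    returned for it.
    """
    if not 0 <= event_code <= 0xFF:
        raise ValueError("EC event codes are 8 bits wide")
    if event_code == 0x00:
        return None
    return "_Q%02X" % event_code


if __name__ == "__main__":
    for code in (0xBA, 0xBB, 0x00):
        print("0x%02x -> %s" % (code, query_method_name(code)))
```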
To debug this, build a kernel with dynamic debug enabled (CONFIG_DYNAMIC_DEBUG=y) and boot it. Then enable debug as follows:
echo -n 'file ec.c +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
and messages appear in the kernel log. The following messages are of interest:
- "~~~> interrupt" - an EC GPE has occurred.
- "push query execution (0xXX) on queue" - XX is the hexadecimal EC event code.
- "push gpe query to the queue" - shows an EC SCI has occurred and an EC query is being pushed.
- "---> status = 0xXX" - EC read of status, XX is the hex status value.
- "---> data = 0xXX" - EC read of data, XX is the hex data value.
- "<--- command = 0xXX" - EC command, XX is the hex command.
- "<--- data = 0xXX" - EC data write, XX is the hex data value.
How to Debug
If you have got this far and are not feeling completely put off, then well done! The next step in figuring out bugs in the ACPI domain is to tweak the ACPI driver debug code effectively, to get the required information out of the driver at run time.
The first step is to enable the ACPI debug code. This is a compile time kernel option. Enable CONFIG_ACPI_DEBUG and build a debug kernel.
Next, install the kernel and increase the internal kernel printk() circular buffer to ~8-16M as one can generate a lot of debug messages with the ACPI debug enabled. Use the kernel parameter:
log_buf_len=16M
Note that the size must be a power of 2 to work correctly.
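A quick way to check a candidate buffer size before rebooting is sketched below. It assumes the usual K, M and G suffixes; the helper names parse_size and is_power_of_two are hypothetical.

```python
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '16M' or '65536' into a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def is_power_of_two(n):
    """True if n is a positive integer with exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


if __name__ == "__main__":
    for candidate in ("8M", "12M", "16M"):
        size = parse_size(candidate)
        verdict = "ok" if is_power_of_two(size) else "not a power of 2"
        print("log_buf_len=%s (%d bytes): %s" % (candidate, size, verdict))
```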
Next we need to select the appropriate debug bit masks. These come in two forms: debug_layer, which controls which components of the ACPI driver can generate debug output, and debug_level, which selects the types of message to emit (e.g. initialisation, method execution, info messages, etc.).
You can either set these at boot time using kernel parameters, e.g.:
acpi.debug_layer=0x8400082 acpi.debug_level=0x31000200
or at run time by echoing the hexadecimal values into the /sys/module/acpi/parameters files as root:
echo 0x8400082 > /sys/module/acpi/parameters/debug_layer
echo 0x31000200 > /sys/module/acpi/parameters/debug_level
Below is a program to calculate the bit masks to enable specific debug features:
#include <stdio.h>

/* debug_layer bits: which ACPI driver components may emit debug */
#define ACPI_UTILITIES                  0x00000001
#define ACPI_HARDWARE                   0x00000002
#define ACPI_EVENTS                     0x00000004
#define ACPI_TABLES                     0x00000008
#define ACPI_NAMESPACE                  0x00000010
#define ACPI_PARSER                     0x00000020
#define ACPI_DISPATCHER                 0x00000040
#define ACPI_EXECUTER                   0x00000080
#define ACPI_RESOURCES                  0x00000100
#define ACPI_CA_DEBUGGER                0x00000200
#define ACPI_OS_SERVICES                0x00000400
#define ACPI_CA_DISASSEMBLER            0x00000800
#define ACPI_COMPILER                   0x00001000
#define ACPI_TOOLS                      0x00002000
#define ACPI_BUS_COMPONENT              0x00010000
#define ACPI_AC_COMPONENT               0x00020000
#define ACPI_BATTERY_COMPONENT          0x00040000
#define ACPI_BUTTON_COMPONENT           0x00080000
#define ACPI_SBS_COMPONENT              0x00100000
#define ACPI_FAN_COMPONENT              0x00200000
#define ACPI_PCI_COMPONENT              0x00400000
#define ACPI_POWER_COMPONENT            0x00800000
#define ACPI_CONTAINER_COMPONENT        0x01000000
#define ACPI_SYSTEM_COMPONENT           0x02000000
#define ACPI_THERMAL_COMPONENT          0x04000000
#define ACPI_MEMORY_DEVICE_COMPONENT    0x08000000
#define ACPI_VIDEO_COMPONENT            0x10000000
#define ACPI_PROCESSOR_COMPONENT        0x20000000

/* debug_level bits: which types of message are emitted */
#define ACPI_LV_INIT                    0x00000001
#define ACPI_LV_DEBUG_OBJECT            0x00000002
#define ACPI_LV_INFO                    0x00000004
#define ACPI_LV_INIT_NAMES              0x00000020
#define ACPI_LV_PARSE                   0x00000040
#define ACPI_LV_LOAD                    0x00000080
#define ACPI_LV_DISPATCH                0x00000100
#define ACPI_LV_EXEC                    0x00000200
#define ACPI_LV_NAMES                   0x00000400
#define ACPI_LV_OPREGION                0x00000800
#define ACPI_LV_BFIELD                  0x00001000
#define ACPI_LV_TABLES                  0x00002000
#define ACPI_LV_VALUES                  0x00004000
#define ACPI_LV_OBJECTS                 0x00008000
#define ACPI_LV_RESOURCES               0x00010000
#define ACPI_LV_USER_REQUESTS           0x00020000
#define ACPI_LV_PACKAGE                 0x00040000
#define ACPI_LV_ALLOCATIONS             0x00100000
#define ACPI_LV_FUNCTIONS               0x00200000
#define ACPI_LV_OPTIMIZATIONS           0x00400000
#define ACPI_LV_MUTEX                   0x01000000
#define ACPI_LV_THREADS                 0x02000000
#define ACPI_LV_IO                      0x04000000
#define ACPI_LV_INTERRUPTS              0x08000000
#define ACPI_LV_AML_DISASSEMBLE         0x10000000
#define ACPI_LV_VERBOSE_INFO            0x20000000
#define ACPI_LV_FULL_TABLES             0x40000000
#define ACPI_LV_EVENTS                  0x80000000

int main(int argc, char **argv)
{
    /* OR together the components and message types of interest */
    unsigned long debug_layer = ACPI_HARDWARE | ACPI_EXECUTER |
                                ACPI_PCI_COMPONENT |
                                ACPI_MEMORY_DEVICE_COMPONENT;
    unsigned long debug_level = ACPI_LV_EXEC | ACPI_LV_MUTEX |
                                ACPI_LV_AML_DISASSEMBLE |
                                ACPI_LV_VERBOSE_INFO;

    /* Print ready-to-run commands that set the masks at run time */
    printf("echo 0x%lx > /sys/module/acpi/parameters/debug_layer\n", debug_layer);
    printf("echo 0x%lx > /sys/module/acpi/parameters/debug_level\n", debug_level);
    return 0;
}
In the example above, I wanted to debug the ACPI AML code being executed and to observe possible race conditions around the EC, hence I wanted to look at the way mutexes were being used; so I enabled ACPI_LV_EXEC, ACPI_LV_MUTEX, ACPI_LV_AML_DISASSEMBLE and ACPI_LV_VERBOSE_INFO. I also wanted to trace inside the driver's hardware component to observe reads and writes to the EC memory space, and to debug all PCI and memory operations as well as see how the AML executer was functioning.
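Going the other way, from a hexadecimal mask found in a bug report back to flag names, is sketched below. The tables are deliberately partial: they hold only the bits used in the example above, copied from the #define list in the C program, and would need extending for general use. The helper name decode_mask is hypothetical.

```python
# Partial tables: only the bits used in the example above.
LAYER_BITS = {
    0x00000002: "ACPI_HARDWARE",
    0x00000080: "ACPI_EXECUTER",
    0x00400000: "ACPI_PCI_COMPONENT",
    0x08000000: "ACPI_MEMORY_DEVICE_COMPONENT",
}
LEVEL_BITS = {
    0x00000200: "ACPI_LV_EXEC",
    0x01000000: "ACPI_LV_MUTEX",
    0x10000000: "ACPI_LV_AML_DISASSEMBLE",
    0x20000000: "ACPI_LV_VERBOSE_INFO",
}


def decode_mask(mask, names):
    """Return (known flag names, leftover bits that are not in the table)."""
    found = []
    leftover = mask
    for bit in sorted(names):
        if mask & bit:
            found.append(names[bit])
            leftover &= ~bit
    return found, leftover


if __name__ == "__main__":
    for label, mask, table in (("debug_layer", 0x8400082, LAYER_BITS),
                               ("debug_level", 0x31000200, LEVEL_BITS)):
        flags, unknown = decode_mask(mask, table)
        print("%s 0x%x: %s (unknown bits: 0x%x)"
              % (label, mask, " | ".join(flags), unknown))
```

A non-zero leftover value simply means the mask uses bits that have not been added to the partial tables.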
As you can see, there is a lot of fine control. Be prepared to see lots of output and be patient - one needs to look at how the AML bytecode is being executed and compare that to the AML inside the DSDT and SSDTs. Needless to say, it is a very time consuming exercise. There is plenty of good documentation available in the Linux kernel, in document: Documentation/acpi/debug.txt
Sanity Checking ACPI Tables
Fortunately we have tools to sanity check the ACPI tables.
fwts
This is the firmware test suite, designed to interrogate and look at BIOS + ACPI firmware bugs. Several tests are available in the latest development version:
- Get general ACPI information (shallow test)
sudo fwts acpiinfo -
- Check for one instance of APIC defined in the tables:
sudo fwts apicinstance -
- Checksum the ACPI tables
sudo fwts checksum -
- FADT SCI_EN enabled check.
sudo fwts fadt -
- MCFG PCI Express* memory mapped config space.
sudo fwts mcfg -
- Re-assemble the DSDT to find syntax errors and warnings, and perform some semantic checks of the AML code.
sudo fwts syntaxcheck -
- Dump and annotate ACPI tables.
sudo fwts acpidump -
- Test suspend/resume
sudo fwts s3 -
- Test hibernate/resume
sudo fwts s4 -
- Interactive tests, hotkeys, lid, battery, etc..
sudo fwts --interactive
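As an illustration of what the checksum test in the list above verifies: the ACPI specification requires that all bytes of a table, including its checksum byte, sum to zero modulo 256. The following minimal sketch shows that rule only; it is not how fwts implements the test. The function names are hypothetical, and offset 9 is the checksum byte in the standard table header.

```python
def checksum_ok(table):
    """True if all bytes of the table sum to zero modulo 256."""
    return sum(table) % 256 == 0


def checksum_byte(table, checksum_offset=9):
    """Value the checksum byte must hold for the table to sum to zero."""
    total = sum(table) - table[checksum_offset]
    return (-total) % 256


if __name__ == "__main__":
    # Synthetic 44-byte table with made-up contents; on a real system one
    # would read an extracted file such as DSDT.dat in binary mode.
    table = bytearray(b"TEST" + bytes(range(40)))
    table[9] = checksum_byte(table)
    print("checksum valid:", checksum_ok(table))
    table[20] ^= 0xFF  # corrupt one byte
    print("after corruption:", checksum_ok(table))
```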
acpiexec
As yet, this tool is not packaged in Debian or Ubuntu, so one needs to build it oneself. It allows one to load the ACPI tables into an emulator and then interrogate and probe the methods interactively. The downside is that it cannot emulate ACPI/BIOS and ACPI/EC interactions.
The acpiexec tool is an AML emulator that allows one to execute and interactively debug ACPI AML code from your BIOS. The tarball can be downloaded from the ACPICA website and built as follows:
- Unzip and untar the acpica-unix-20100304.tar.gz tarball from http://www.acpica.org/downloads/unix_source_code.php
- cd into tools/acpiexec
- run make
This should build acpiexec. Now for the fun part - executing your ACPI inside the emulator. To do this grab your ACPI tables and extract them using:
sudo acpidump > acpi.info && acpixtract -a acpi.info
Now load these tables into the emulator and run with verbose mode:
./acpiexec -v *.dat
Inside the emulator you can type help to navigate around the help system. It may take a little bit of work to get familiar with all the commands available.
As a quick introduction, here is how to execute the battery information _BIF method.
1. Get a list of all the available methods, type:
methods
2. On a Lenovo 3000N200 laptop the battery information method is labelled \_SB_.PCI0.LPCB.BAT1._BIF, so to execute this method one uses:
execute \_SB_.PCI0.LPCB.BAT1._BIF
Executing \_SB_.PCI0.LPCB.BAT1._BIF
Execution of \_SB_.PCI0.LPCB.BAT1._BIF returned object 0x19669d0 Buflen 178
[Package] Contains 13 Elements:
  [Integer] = 0000000000000001
  [Integer] = 0000000000000FA0
  [Integer] = 0000000000000FA0
  [Integer] = 0000000000000001
  [Integer] = 0000000000002B5C
  [Integer] = 00000000000001A4
  [Integer] = 000000000000009C
  [Integer] = 0000000000000108
  [Integer] = 0000000000000EC4
  [String] Length 08 = PA3465U
  [String] Length 05 = 3658Q
  [String] Length 06 = Li-Ion
  [String] Length 07 = COMPAL
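The 13 elements of the returned package are easier to read with names attached. The sketch below labels them in the order the ACPI specification gives for the _BIF package and converts the hexadecimal integers to decimal. The dictionary keys are informal names chosen here, and the values are the ones from the output above.

```python
# Field order of the _BIF package as defined in the ACPI specification.
BIF_FIELDS = [
    "power_unit", "design_capacity", "last_full_charge_capacity",
    "battery_technology", "design_voltage", "design_capacity_warning",
    "design_capacity_low", "capacity_granularity_1",
    "capacity_granularity_2", "model_number", "serial_number",
    "battery_type", "oem_information",
]


def decode_bif(package):
    """Attach field names to the 13 elements of a _BIF package."""
    if len(package) != len(BIF_FIELDS):
        raise ValueError("_BIF package must contain 13 elements")
    return dict(zip(BIF_FIELDS, package))


if __name__ == "__main__":
    package = [0x1, 0xFA0, 0xFA0, 0x1, 0x2B5C, 0x1A4, 0x9C, 0x108, 0xEC4,
               "PA3465U", "3658Q", "Li-Ion", "COMPAL"]
    bif = decode_bif(package)
    # Power unit 1 means capacities are in mAh, 0 means mWh.
    unit = "mAh" if bif["power_unit"] == 1 else "mWh"
    print("Design capacity: %d %s" % (bif["design_capacity"], unit))
    print("Design voltage: %d mV" % bif["design_voltage"])
    print("Model: %s, OEM: %s" % (bif["model_number"],
                                  bif["oem_information"]))
```

Values that look implausible once decoded (for example a design voltage of zero) are a hint that the method is reading the wrong EC field.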
To single-step through the code, use the debug command on the method as follows:
debug \_SB_.PCI0.LPCB.BAT1._BIF
At each % prompt, one can press enter to step to the next instruction. If the method requires arguments, these can be passed by specifying them after the method name in the debug command.
To see any local variables used during execution, use the locals command. The list command lists the current AML instructions. The set command allows one to set method data and interact with the debugging process.
Hopefully this gives one a taste of what the emulator can do. The internal help is enough to get one up and running, but one does generally need the current ACPI specification to figure out what is happening in one's ACPI tables.
Serialized Code
ACPI in a multi processor environment can be a headache - a subtle one at that. The Differentiated System Description Table (DSDT) contains AML bytecode that gets interpreted by the Linux Kernel ACPI driver. The DSDT varies from machine to machine as it is totally hardware specific. Sometimes an AML method is declared as NotSerialized when in fact it should be Serialized to prevent multiple threads executing it simultaneously. To fix this, one could re-write the DSDT (not exactly user friendly), or ask for the BIOS to be fixed.
Fortunately the Linux kernel has a workaround - the acpi_serialize boot flag. Boot the kernel with the acpi_serialize kernel boot flag and hopefully this will resolve this kind of issue.
If acpi_serialize fixes an issue then one should identify which Methods need to be declared as Serialized and then recommend a BIOS fix to the vendor.
Kernel/Reference/ACPITricksAndTips (last edited 2021-09-09 22:18:15 by andika)