There are a number of ways MariaDB can fail on you as a user. Gaining a clear understanding of the problem is the best way to produce a bug report that gives developers what they need to fix it.
As the MariaDB Community Bug Reporting page says, start by searching for an existing bug. If none is found, or it is not 100% clear that it applies to you, gather the additional information described below and include it in a new (or perhaps an existing) bug report.
This page is licensed: CC BY-SA / Gnu FDL
There are a number of causes of MariaDB crashes. Most occur when MariaDB encounters a situation where it cannot safely proceed. On Linux/Unix systems, crashes appear as fatal signals; on Windows they appear as similar-looking exceptions.
These are indicated by the following text in the error log:
```
[ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
```
For this it is essential to obtain a full backtrace of [all the running threads](how-to-produce-a-full-stack-trace-for-mysqld/#getting-full-backtraces-for-all-threads-from-a-running-mariadbd-process).
These look like:

```
[ERROR] mariadbd got signal 6
```

There might be associated text in the error log like:

```
InnoDB: Assertion failure in file ./storage/innobase/os/os0file.cc line 3540
InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
```

On OSX it might show up as `Exception Type: EXC_CRASH (SIGABRT)`.
There may be a stack trace, and perhaps a user SQL query (depending on the location of the failure). These are usually programming errors: MariaDB reached a state the developers did not expect. Occasionally, as above, there are operating-system factors the developers have not written handling routines for. In all cases this should be reported as a bug.
A SEGV, or segmentation violation, is an operating system concept; in this case MariaDB accessed a location in memory it doesn't own. It will appear in the logs like:

```
[ERROR] mysqld got signal 11 ;
```
A full stack trace from the [core file generated](how-to-produce-a-full-stack-trace-for-mysqld/#analyzing-a-core-file-with-gdb-on-linux) is needed to resolve this form of error.
If there is an SQL query in the log file, include it in the bug report, along with `EXPLAIN` output for the query and `SHOW CREATE TABLE tblname` for the tables involved, to enable a test case for this crash.
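As a sketch, if the error log showed a query touching a table `t1` (a hypothetical name; substitute the query and tables from your own log), the supporting information could be gathered like this:

```sql
-- Hypothetical query and table name; use the ones from your error log.
EXPLAIN SELECT * FROM t1 WHERE id = 1;
SHOW CREATE TABLE t1;
```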
This occurs when the alignment of memory doesn't correspond to what the code accessing it expects.

```
[ERROR] mysqld got signal 7
```

Treat the same way as a SEGV.
This indicates that your MariaDB binary was compiled for a CPU hardware feature that isn't available on the machine running it.

One cause is that distributions occasionally target a minimum CPU version; if you are running on an earlier CPU, it will crash in this way.

MariaDB does use optimizations in a number of critical paths, guarded by feature detection. It is possible these checks aren't comprehensive for your hardware. Very precise reporting of the CPU model is required when reporting these bugs.
This is an indication that a user or the OS killed MariaDB. A user-invoked SIGKILL can be indirect, for example via `podman kill {container}`.
The OS will kill MariaDB in an out-of-memory scenario in an attempt to regain memory. As MariaDB is typically a large memory user, it is usually fairly high on the list of processes the OS chooses to kill.
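To confirm an OOM kill on Linux, check the kernel log (`journalctl -k` or `dmesg` as root on the affected host). A minimal sketch, matching against a sample kernel message since the real log is host-specific:

```shell
# On the real host you would pipe kernel messages in, e.g.:
#   journalctl -k | grep -i -E 'out of memory|oom-kill|killed process'
# The sample line below illustrates what the OOM killer logs.
sample='Out of memory: Killed process 1234 (mariadbd) total-vm:10485760kB'
echo "$sample" | grep -i -E 'out of memory|oom-kill|killed process'
```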
If MariaDB is shutting down, some service managers will send a SIGKILL if the shutdown takes too long. The default MariaDB systemd service has `SendSIGKILL=No`.
Galera SST scripts also contain a `kill -9`.
Generally the SIGKILL itself won't be a bug; however, the slowness that led a service manager to take this drastic step might be.
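If the shutdown is legitimately slow (for example, a large buffer pool flush), the systemd stop timeout can be raised rather than letting the service manager escalate. A sketch of an override created with `systemctl edit mariadb.service`; the 900-second value is illustrative:

```ini
[Service]
TimeoutStopSec=900
```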
This can be disk corruption, where MariaDB reads back data that isn't what was written. This will result in MariaDB doing unexpected things, including potentially accessing unallocated memory.

RAM corruption can also result in undefined behaviour that is unfixable in software.
MariaDB has a large number of error messages. If the meaning of one is not clear, use the following table to guide your next actions.
```
[ERROR] mysqld: Can't lock aria control file '/var/lib/mysql/aria_log_control' for exclusive use, error: 11. Will retry for 30 seconds
```
Error 11 (`perror 11`) is 'Resource temporarily unavailable'. This indicates that an operating system lock couldn't be obtained on the aria_log_control file, which is a strong indication that an existing MariaDB server is already running on the same datadir.
Look at the operating system process list and identify whether an existing mysqld/mariadbd process is shutting down (look at the logs, and check whether it is still listening for new connections; MariaDB stops accepting new SQL connections as soon as shutdown starts).
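A sketch of how to check; the datadir path `/var/lib/mysql` is an assumption, and the fallback messages are only for illustration on hosts without a running server:

```shell
# List any running MariaDB/MySQL server processes.
ps -ef | grep -E '[m]ariadbd|[m]ysqld' || echo "no server process found"
# Show which process, if any, holds the aria control file open.
fuser -v /var/lib/mysql/aria_log_control 2>/dev/null \
  || echo "no holder found (or file absent)"
```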
If it is shutting down, it is best to let it continue. Forcibly killing the process will cause crash recovery on the next restart. If you are about to upgrade, there is a possibility your upgraded version will not complete the crash recovery, and you will need the current version to complete the log recovery first.
If it is not completing the shutdown, take two [backtraces](how-to-produce-a-full-stack-trace-for-mysqld/#getting-full-backtraces-for-all-threads-from-a-running-mariadbd-process) 30 seconds to a minute apart.
```
[ERROR] mariadbd: Can't create/write to file '/var/run/mariadb/mariadb.pid' (Errcode: 2 "No such file or directory")
```
The errcode can be looked up with [perror](https://app.gitbook.com/s/SsmexDFPv2xG2OTyO5yV/clients-and-utilities/perror) `X` to return a text description of the code, if one isn't already shown. In many cases these are the same as operating system error codes. Common ones are:
Error 2 - ENOENT, no such file or directory. MariaDB expects to read/write/create a file somewhere and it doesn't exist; if creating, the parent directory doesn't exist.

Error 13 - EACCES, permission denied. Either file permissions or an access-control mechanism like SELinux, AppArmor, or systemd is preventing access to this location. Alternatively, you are running MariaDB as the wrong user for the data it is trying to access.

Error 28 - ENOSPC, no space left on device. The filesystem being written to is full and cannot fit any more data.
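For example, looking up error 2. The `python3` fallback is an assumption for systems without perror installed; `os.strerror` returns the same OS error text:

```shell
# perror prints a description for an error code; errno 2 is ENOENT.
perror 2 2>/dev/null || python3 -c 'import os; print(os.strerror(2))'
```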
This indicates that MariaDB is being started on a MySQL 8.0 data directory. Other flag values are possible if a different page size or different features were used (see MDEV-27882).

MariaDB cannot start on a MySQL 8.0 data directory. MySQL should be run against the MySQL data directory; if the objective is to migrate, take logical dumps from MySQL before recreating a MariaDB database, perhaps in a different location, or after the MySQL data directory has been moved.
Operating system memory is first assigned, giving MariaDB a virtual address space, and then allocated (paged in) once MariaDB starts to use it.

When looking at something that might be a memory leak, pay particular attention to the difference between the virtual size and the resident size.

When MariaDB keeps the same virtual size but the resident size increases over time, this is a good indication that the memory buffers allocated to MariaDB are slowly being used. In that case it is unlikely to be a memory leak.
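Virtual (VSZ) and resident (RSS) sizes can be sampled with `ps`; a minimal sketch, which falls back to the current shell's PID for illustration when no mariadbd is running:

```shell
# Print virtual size (VSZ) and resident size (RSS), in KiB.
# Run this repeatedly over time and compare the two columns.
pid=$(pidof mariadbd 2>/dev/null || echo $$)
ps -o vsz=,rss= -p "$pid"
```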
Because memory leaks are hard to track down, documenting a known-good version and a version that leaks memory makes it easier to examine the code changes between them against other evidence, such as that obtained from the sections below, to identify the cause.
The memory instrumentation of the performance schema can be enabled with the following settings:

```
performance_schema=on
performance-schema-instrument='memory/%=ON'
```
This is available since MariaDB 10.5.2. With this enabled, the `performance_schema.memory_summary_global_by_event_name` table can start to show where the memory leak is occurring.
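With the instrumentation on, a query such as the following sketch lists the largest current consumers:

```sql
-- Top 10 instrumented events by current memory use.
SELECT event_name, current_number_of_bytes_used
FROM performance_schema.memory_summary_global_by_event_name
ORDER BY current_number_of_bytes_used DESC
LIMIT 10;
```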
`memleak` is a BPF program for Linux that is frequently packaged in bcc-tools or similar (bpfcc-tools on Debian).
The upstream bcc documentation provides usage examples.
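A sketch of a typical invocation against mariadbd; the binary name and path vary by distribution, and the `600` interval (seconds) is an assumption matching the advice below. The attach command itself needs root and a running mariadbd, so the runnable part here only locates the tool:

```shell
# Attaching (run as root; hypothetical interval of 600 seconds):
#   memleak-bpfcc -p "$(pidof mariadbd)" 600
# Locate whichever packaging of the tool is installed, if any.
command -v memleak-bpfcc || command -v memleak \
  || command -v /usr/share/bcc/tools/memleak || echo "memleak not installed"
```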
The important aspect of measurement is to let MariaDB start up and commence a normal workload before measuring. Starting too early, before the caches are populated, is likely to falsely record memory allocations that will eventually be released.
Increasing the time interval of `memleak` so it runs longer than the usual query is important. If your memory is leaking over hours, a long interval such as 10 minutes will filter out memory temporarily allocated for a query. Recording for a longer time, or multiple times, should provide enough information to narrow down the location of the leak.
Note: Very old Linux kernels, such as those in CentOS/RHEL 7, might not have sufficient hooks to measure this correctly.