Safety is the primary design consideration throughout all of Skeema’s features and functionality. There are several types of guardrails throughout the system.
- Schema design safety: linter engine with many configurable rules for enforcing company policies and best practices
- Unsafe change detection: automatic detection and prevention of generated DDL which is destructive or highly risky
- Operational safety: numerous options for fine-tuning how DDL is generated and executed
- Internal correctness: functionality for confirming that Skeema has introspected all tables correctly, and empirically verifying that all generated DDL has the expected effect
Schema linting
Skeema’s linter provides a set of configurable rules for detecting schema design problems, and enforcing company policies on use of various database features.
To check for problems in all CREATE
statements in your *.sql files, use the skeema lint
command. By default, this command also reformats statements to their canonical format, but this behavior may be disabled with the skip-format option.
The skeema diff
and skeema push
commands also perform linting on the new definitions of all modified objects in the diff. This behavior is enabled by default, but can be disabled with the skip-lint option.
Each linter rule has a corresponding enum option, which can be configured to generate an error, a warning, or be ignored entirely. For example, the lint-pk option specifies how to handle tables that lack a primary key; it can be set to either “error” (emit a fatal error), “warning” (log a non-fatal warning), or “ignore” (disable this linter rule). This is sometimes referred to as configuring the severity of the linter rule.
Linter rule enum options treat skip-
prefixes as “ignore”, so for example skip-lint-pk
is equivalent to lint-pk=ignore
. This can be especially useful as an ad-hoc command-line override, for example skeema push --skip-lint-pk
, if you are confident that a failing linter check can be bypassed safely for a specific operation or situation.
Some linter rules also have supplemental configuration options, such as an allow-list. For example, the lint-charset option checks for acceptable character sets, using the allow-list specified by allow-charset. Together, this pair of options controls character set linting: the allow-charset option is set to a comma-separated list specifying which character sets to allow, while the lint-charset option indicates the severity (error/warning/ignore) for non-allowed character sets.
Linter option | Default | What it checks/flags |
---|---|---|
lint-pk | warning | Tables that lack a primary key |
lint-dupe-index | warning | Duplicate secondary indexes |
lint-charset | warning | Column character sets, using allow-list allow-charset |
lint-engine | warning | Table storage engines, using allow-list allow-engine |
lint-fk-parent | warning | Foreign key parent-side table missing or lacks unique index |
lint-reserved-word | warning | Names of objects which conflict with server reserved words |
lint-name-case | ignore | Tables or views that have uppercase letters in their names |
lint-zero-date | warning | date , datetime , or timestamp column default values with 00 in day/month/year |
lint-auto-inc | warning | auto_increment column data types, using allow-list allow-auto-inc |
lint-pk-type | ignore | Primary key column data types, using allow-list allow-pk-type |
lint-compression | warning | Table compression options, using allow-list allow-compression |
lint-display-width | warning | Integer column types with nonstandard display width modifiers |
lint-definer | error | Definer users for stored objects, using allow-list allow-definer |
lint-has-enum | ignore | Columns using enum or set data types |
lint-has-float | ignore | Columns using float or double data types |
lint-has-time | ignore | Columns using timestamp , datetime , or time data types |
lint-has-fk | ignore | Any use of foreign keys constraints |
lint-has-routine | ignore | Any use of stored procedures or functions |
lint-has-view | ignore | Any use of views (Premium Edition only) |
lint-has-trigger | ignore | Any use of triggers (Premium Edition only) |
lint-has-event | ignore | Any use of events (Premium Edition v1.12+ only) |
Supplemental allow-list options include the following:
Allow-list | Default value | Notes |
---|---|---|
allow-auto-inc | int unsigned, bigint unsigned | |
allow-charset | latin1, utf8mb4 | |
allow-compression | none, 4kb, 8kb | |
allow-definer | %@% | Default is permissive (allows any user) |
allow-engine | innodb | |
allow-pk-type | n/a | Must be explicitly configured if using lint-pk-type |
Unsafe change detection
By default, skeema push
refuses to execute DDL on a schema if any of the requested operations are “unsafe” – that is, they have the potential to destroy data, or a high risk of causing subtle problems for application workloads. Similarly, skeema diff
also returns a fatal error in this case. Even though skeema diff
never executes DDL anyway, it serves as an accurate “dry run” for skeema push
and therefore aborts in the same fashion.
The following operations are considered unsafe, as of the most recent version of Skeema:
- Dropping a table
- Altering a table to drop a normal column or stored (non-virtual) generated column
- Altering a table to modify the data type of an existing column in a way that potentially causes data loss, truncation, corruption, or reduction in precision
- Altering a table to modify the character set of an existing column
- Altering a table to modify the collation of an existing column, if that column is already part of a uniqueness constraint
- Altering a table to change its storage engine
- Dropping a stored procedure or function
- Dropping and re-creating a stored procedure or function to change its parameters or return type
- Dropping and re-creating a MySQL stored procedure or function to change its body, DEFINER, DETERMINISTIC property, or creation-time metadata
- This is only unsafe in MySQL, which lacks atomic
CREATE OR REPLACE
syntax. In MariaDB, this operation safe.
- This is only unsafe in MySQL, which lacks atomic
- Dropping a view (only supported in premium Skeema products)
- Dropping a trigger (only supported in premium Skeema products)
- Dropping an event, unless it is already disabled/expired (only supported in premium Skeema products)
In order to proceed with unsafe changes, you must enable the allow-unsafe option. For example, run skeema diff --allow-unsafe
to first preview the full set of changes with unsafe operations permitted, potentially followed by skeema push --allow-unsafe
to execute the operations.
It is not recommended to permanently enable allow-unsafe in an option file or wrapper script, especially in the production environment. It is safer to require users to supply it manually on the command-line on an as-needed basis, to serve as a confirmation step for unsafe operations.
You may also configure Skeema to conditionally permit unsafe operations on tables below a certain size (in bytes), or always permit unsafe operations on tables that have no rows. This behavior is configured through the safe-below-size option. For example, you could configure safe-below-size=10M
to automatically permit unsafe operations on tables under 10 megabytes in size, or safe-below-size=1
to automatically permit unsafe operations only for completely empty tables.
Operational safety
Even if your desired schema change is non-destructive and passes all configured linter checks, there can be additional safety concerns around operational execution. These risks depend on your company’s specific database environment, and often require human involvement, configuration, and/or coordination.
Dry run
Before running skeema push
, be sure to always review the generated DDL diff by first running skeema diff
. Under the hood, these commands share the exact same underlying code paths, with only two major behavioral differences:
skeema diff
displays generated DDL without actually executing it on the target database server, whereasskeema push
displays and then executes the generated DDL.skeema diff
has different process exit code behavior thanskeema push
, as described in each commands' documentation.
Running skeema diff
is actually completely equivalent to using skeema push --dry-run
. In terms of the implementation, skeema diff
is just a shim around skeema push --dry-run
with slightly different command help/usage text.
ALTER TABLE
operational safety
By default, altering a large table can be disruptive to application queries, depending on your database server version and the type of alteration being performed. Refer to the online schema change documentation for in-depth coverage on how to configure non-disruptive table alteration. The bottom of that page also covers metadata locking problems, which can cause unexpected query pile-ups.
DROP TABLE
operational safety
Prior to MySQL 8.0.23, whenever innodb_buffer_pool_size
is large, DROP TABLE
can potentially impact database performance even if the table being dropped is small or empty.
Regardless of database server version, DROP TABLE
performance is improved when innodb_adaptive_hash_index
is disabled on the server. Additionally, if you self-host your databases (rather than using e.g. RDS or CloudSQL), choice of Linux filesystem can also greatly affect DROP TABLE
performance for large tables. As an example, XFS can unlink files quickly regardless of file size, whereas EXT3/EXT4 cannot. Unlink performance can also depend on the specific filesystem mount options.
Foreign key constraint validation
By default, skeema push
executes DDL in a session with foreign key checks disabled. When adding a new foreign key to an existing table, no immediate integrity check is performed on existing data. This results in faster ALTER TABLE
execution, avoids order-of-operation issues with circular dependencies, and eliminates one possible failure vector for the DDL.
This behavior may be overridden by enabling the foreign-key-checks boolean option. When enabled, skeema push
enables foreign key checks for any ALTER TABLE
that adds one or more foreign keys to an existing table. This means the server will validate existing data’s referential integrity for new foreign keys, and the ALTER TABLE
will fail with a fatal error if the constraint is not met for all rows.
Virtual column validation
When adding a new VIRTUAL
column, by default MySQL servers will not confirm that the column generation expression is valid for all existing rows of the table. To perform this validation in MySQL, enable Skeema’s alter-validate-virtual boolean option, which will add a WITH VALIDATION
clause to any generated ALTER TABLE
which affects virtual columns. However, this validation can cause slower ALTER TABLE
performance. Refer to the MySQL manual for more information.
In contrast, on a MariaDB server, validation of virtual columns occurs only when an ALGORITHM=COPY
clause is used. To force validation on a MariaDB server, you must use Skeema’s alter-algorithm=copy option setting, which may cause ALTER TABLE
to be significantly slower, and potentially disruptive to writes.
When an external online schema change tool is being used, such tools typically rebuild an entire table, in which case the database server will inherently perform virtual column validation as rows are bulk-copied into the new table. No special syntax is required in that situation.
Workspace safety
In order to accurately introspect the schemas represented in your filesystem’s *.sql files, Skeema actually runs your CREATE
statements in a temporary location, called a workspace. By default, Skeema creates, uses, and then carefully drops a schema called _skeema_tmp
on each database server it interacts with. However, this behavior is highly configurable.
When operating on schemas containing hundreds of table definitions, workspace activity can potentially be disruptive to busy production database servers, especially on older versions of MySQL with large InnoDB buffer pools. Be sure to carefully read the workspace documentation, which describes how to configure offloading workspace activity to either a local Docker container (any edition of Skeema) or a non-production database server of your choice (Skeema Premium only).
Internal correctness guardrails
Skeema verifies the correctness of its own behavior in several ways.
Generated ALTER TABLE
verification
By default, whenever Skeema generates ALTER TABLE
statements, it will automatically verify their correctness by testing them in a workspace. An empty copy of the old (live) table definition is created in the workspace schema, and then the auto-generated ALTER TABLE
is run against it. Skeema then verifies that the resulting table definition correctly matches the desired state from the filesystem .sql file. If verification fails, Skeema aborts with a fatal error.
If desired, diff verification can be skipped by disabling the verify option. This can improve diff performance slightly. It can also be disabled as an ad-hoc mechanism to bypass situations where the verification is failing due to an unimportant edge case. In practice, this is almost never necessary though; situations causing diff verification failures have been extremely rare throughout Skeema’s history.
Table introspection validation
To form an in-memory representation of your database objects, Skeema queries the database server’s information_schema
, as well as running various SHOW CREATE
commands. Table introspection is somewhat complicated, due to table metadata being spread out among many places in information_schema
, some of which contain subtle edge-cases in different database server versions. A few rarely-used table features are not supported, due to the necessary information_schema
introspection code not being implemented yet.
In order to validate that a table has been correctly introspected and doesn’t use any rare unsupported table features, Skeema internally attempts to reconstruct the expected output of SHOW CREATE TABLE
, and compares this to the actual output of that command. If an unexpected mismatch is found, the table will be considered unsupported for diff operations: Skeema will be able to create the table, and lint its supported features, but it cannot generate an ALTER TABLE
for the table. In this situation, skeema diff
and skeema push
will skip the affected table and log a warning message.
You can still alter these tables externally from Skeema (e.g., direct invocation of ALTER TABLE
or pt-online-schema-change
). Afterwards, you can update your schema repo using skeema pull
, which will work properly even on these tables.
Beginning with Skeema v1.10.1, Skeema can often generate ALTER TABLE
statements for pre-existing tables that already use unsupported table features, as long as the new modification to the table does not involve adding or modifying those unsupported features. Internally, Skeema leverages its diff verification logic to empirically confirm the correctness of the generated ALTER TABLE
. Even if the verify option is disabled, this verification step still occurs specifically for tables that use unsupported features.
Integration test suite
Skeema’s codebase has an extensive suite of unit, integration, and end-to-end functional tests. Every commit goes through automated testing against the most commonly-used versions of MySQL and MariaDB. For new Skeema releases, the test suite is run using every supported version of MySQL, MariaDB, and Percona Server. Skeema Premium is additionally tested against every supported version of Amazon Aurora.
No reliance on SQL parsing
Skeema’s behavior does not rely on parsing SQL DDL. Parsing SQL is brittle across various MySQL and MariaDB versions, which have subtle differences in features, functionality, and syntax. Instead, Skeema introspects your database objects by querying information_schema
and using SHOW CREATE
commands, to obtain metadata directly from the database itself. See the workspace documentation to learn how this works for interpreting your CREATE
statements from the filesystem.
When operating on a workspace, Skeema halts immediately if any workspace table is detected to be non-empty (contains any rows). This prevents disaster if someone accidentally misconfigures Skeema’s workspace-related options to point to a real/live database schema.