Docs: Features: Workspaces

Unlike some other declarative tools, Skeema’s behavior does not primarily rely on parsing SQL DDL. Parsing SQL is brittle across various MySQL and MariaDB versions, which have subtle differences in features, functionality, and syntax. Instead, Skeema introspects your database objects by querying information_schema and running SHOW commands, obtaining metadata directly from the database itself.

In order to accurately introspect the definitions in your filesystem’s *.sql files, Skeema actually runs these CREATE statements in a temporary location, called a workspace. By default, Skeema creates, uses, and then carefully drops a schema called _skeema_tmp on each database server it interacts with. This behavior is highly configurable, as described in sections below. In situations involving non-trivial network latency between Skeema and your database server, tuning workspace settings can substantially improve Skeema’s performance.

Workspaces are used by every command that needs to evaluate the CREATE statements in your *.sql files, including skeema diff, skeema push, and skeema lint.

The workspace option controls where workspaces are created: either as a temporary schema on each real database server (workspace=temp-schema), or inside a dedicated ephemeral Docker container on your local machine (workspace=docker).
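
For example, this line in a .skeema file selects Docker workspaces for all commands run in that directory; like most options, it may also be supplied as a command-line flag (e.g. --workspace=docker) to affect just a single invocation:

workspace=docker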

temp-schema workspaces

With the default value of workspace=temp-schema, the workspace schema is created on each database server that Skeema interacts with. For example, skeema diff staging will connect to your configured staging database server, run your *.sql CREATEs in a temporary workspace schema there, and then compare the resulting objects to the corresponding “real” schema on that same server.

The workspace schema name defaults to “_skeema_tmp”, but it may be configured by setting the temp-schema option to any arbitrary name. If any table in the temp-schema workspace is non-empty (contains any rows), Skeema exits with an error, preventing disaster if this option is misconfigured.
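
For example, to give the workspace schema a more distinctive name (the name below is purely illustrative):

temp-schema=_skeema_scratch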

temp-schema performance and load tuning

By default, Skeema uses concurrency and batching during workspace creation and cleanup. This improves performance, but with a potential trade-off of introducing spiky load on the database server. This behavior can be tuned using the temp-schema-mode option in Skeema v1.13+, or using temp-schema-threads in prior versions.
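
Both knobs are set like any other option; the values below are illustrative only (temp-schema-mode=extreme also appears in the offloading example later on this page):

# Skeema v1.13+: pick a heavier or lighter workspace mode
temp-schema-mode=extreme

# Skeema v1.12 and older: tune workspace concurrency directly
temp-schema-threads=10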

When running Skeema against a database server in a remote datacenter, network latency will reduce Skeema’s workspace execution speed, especially in directories containing hundreds or thousands of CREATE statements. Using a heavier mode can mitigate these issues, but be sure to test carefully when tuning this in a production environment. In some situations, using a lighter mode is necessary to avoid disrupting other production workloads.

General recommendations when tuning Skeema’s workspace behaviors as of Skeema v1.13:

  • Be sure to upgrade to the latest release of Skeema. Performance improvements are present in nearly every release.
  • Enable the debug option to log workspace performance times, making it easier to compare the effect of each configuration change.
  • Try different temp-schema-mode values and observe the effect on performance. When doing this on a production server, be sure to carefully monitor performance of the database server itself, as well as any applications or services.
  • If possible, give Skeema’s user sufficient privileges to skip session binary logging. Skeema will auto-detect that it has such permissions, and will use them to speed up temp-schema workspace behaviors; see the sketch just after this list.
  • Consider moving workspaces away from busy production servers, to a non-production server on the local network. In Skeema Premium, the temp-schema offloading feature allows you to configure an alternative location for the temp-schema workspace; this can also make it totally safe to use a heavier temp-schema-mode.
  • Alternatively, in any Skeema edition, the Docker workspace feature allows you to use an automatically-managed ephemeral database server container on localhost. This eliminates the need for further workspace tuning, but requires a Docker installation on every machine that runs Skeema.
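
As a rough sketch of the binary-logging point above: on MySQL 8, the SESSION_VARIABLES_ADMIN dynamic privilege is typically sufficient for a session to disable binary logging via sql_log_bin. Exact privilege requirements vary by server version and flavor, so verify against your own environment:

-- adjust the account name to match Skeema's configured user
GRANT SESSION_VARIABLES_ADMIN ON *.* TO 'skeema'@'%';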

temp-schema offloading

In Skeema Premium, the temp-schema-environment option may be used to configure an alternative database server for the temp-schema workspace, instead of placing it on each database server that Skeema interacts with.

This is primarily beneficial for isolation. On busy production database servers with high-volume OLTP workloads, Skeema’s workspace setup and cleanup can negatively impact overall database server performance. Offloading the workspace behavior to a different server ensures that your real production database server(s) are not impacted.

It can also be beneficial for avoiding network latency: if your real production databases are only accessible over a WAN or VPN, you can improve Skeema’s performance by offloading the temp-schema workspace to a database on localhost, or elsewhere on your LAN.

To configure offloading, the temp-schema-environment option should be set to an environment name (.skeema file section) to use for configuring the workspace host location. You may either re-use another environment (for example, “run all workspaces on the staging database server”), or define a new environment dedicated to workspaces. For example, consider this .skeema file snippet:

temp-schema-environment=scratch

# since workspaces are on dedicated scratch DB,
# it's safe to ramp up their load and improve perf
temp-schema-mode=extreme

[production]
host=db-prod.example.com
user=produser

[staging]
host=db-stage.example.com
user=stageuser

[scratch]
host=db-scratch.example.com
user=scratchuser

[localdev]
host=localhost
user=root
socket=/var/lib/mysql/mysql.sock
skip-temp-schema-environment

With the above configuration, Skeema will perform temp-schema workspace operations on the dedicated “scratch” database server, instead of the normal target database host and user, when running skeema against any environment except localdev. (Because temp-schema-environment is configured at the top of the file in this example, it applies to all environments, except ones that explicitly override it.)

Alternatively, to re-use an existing environment for workspaces, the above example could have used temp-schema-environment=staging (run all workspaces on the staging DB, since it presumably matches production’s configuration closely) or perhaps temp-schema-environment=localdev (run all workspaces locally, to avoid network latency). In either case, the temp-schema workspace would be placed in _skeema_tmp (or whatever name the temp-schema option is set to), and it is not a problem if there are other “real” schemas on the same database server.

Only connectivity-related options in the specified environment will take effect – options that specify which host to connect to, and how to connect to it, such as host, port, socket, user, password, host-wrapper, and connect-options.

For any of these options that are not configured in the temp-schema’s environment, the original configuration (from the command-line, target environment, and/or “sectionless” lines at the top of a .skeema file) will be used instead, as usual. For example, if both hosts should have the same user or password, there is no need to specify these options in the temp-schema environment if they are already configured in another location.

Note that the above option list deliberately excludes options relating to schema names (schema, temp-schema) or workspace configuration (workspace, temp-schema-mode, etc). To configure these options, you must do so in the original target environment, not the temp-schema environment. This design permits environment re-use and avoids ambiguous/conflicting workspace configurations.

If the hosts require different password values, consider using environment variables to avoid placing plaintext passwords in .skeema files. Alternatively, you can indicate that the temp-schema environment should interactively re-prompt on STDIN for a different password by including a bare “password” line (with no =value) in the environment’s configuration.
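
For example, building on the [scratch] environment from the earlier snippet (the environment variable name here is hypothetical):

[scratch]
host=db-scratch.example.com
user=scratchuser
# read the password from an environment variable...
password=$SCRATCH_DB_PASSWORD
# ...or instead use a bare "password" line to prompt interactively:
# password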

If multiple hosts are configured by the temp-schema environment – either via a comma-separated host value, or a host-wrapper script which emits multiple hosts – then the first successfully-reachable host will be used for the temp-schema. This functionality may be leveraged as a mechanism for high availability of the workspace database servers.
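
For example, this hypothetical [scratch] section lists two candidate workspace servers; the first one that is reachable will be used:

[scratch]
host=db-scratch1.example.com,db-scratch2.example.com
user=scratchuser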

You should ensure that the database server for the temp-schema environment closely matches the real target environment, in terms of vendor (MySQL vs MariaDB), version series, and server options that affect DDL such as sql_mode and innodb_strict_mode. Skeema will emit a non-fatal warning if the servers differ in flavor, or a fatal error if the servers differ in lower_case_table_names, but no other validations are performed. If needed, you can override any differing session-scoped variables by setting connect-options in the temp-schema environment’s configuration.
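
For example, if the real servers run with a relaxed sql_mode and innodb_strict_mode disabled, the workspace environment’s sessions could be aligned to match (the values below are illustrative):

[scratch]
host=db-scratch.example.com
user=scratchuser
connect-options=sql_mode='NO_ENGINE_SUBSTITUTION',innodb_strict_mode=0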

Docker workspaces

Instead of placing workspace schemas on real database servers, you can configure Skeema to use a Docker container on localhost via the workspace=docker setting. Skeema automatically selects an appropriate Docker image (based on your real server’s flavor) and manages the container lifecycle. This option is available in both the Community and Premium editions of Skeema.

Using workspace=docker requires Docker or a drop-in compatible equivalent. Skeema v1.11+ simply shells out to the docker CLI binary, which must be in your PATH. (Older Skeema versions used the Docker API, but that approach required a large dependency tree which was prone to CVE false-positives by security scanners.)
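
A quick way to verify this prerequisite, confirming both that the CLI is in your PATH and that it can reach a running engine:

docker version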

Compared to temp-schema, Docker workspaces can have substantial benefits for performance. By running on localhost, they completely eliminate network latency, and they can be configured to use an in-memory tmpfs volume to eliminate disk latency as well. These benefits are significant when operating on schemas with hundreds of tables or stored routines, especially when running Skeema from a different region than your database server.

Docker workspaces are also beneficial in terms of isolation (no workspace impact to the workload on real database servers), and security (reducing the number of privilege grants for Skeema’s user on real database servers).

However, there are some potential downsides to Docker workspaces in specific situations:

  • Kubernetes / managed containerized environments: If the skeema binary itself is running inside a container, Docker workspaces require complicated configurations to allow Skeema to interact with a Docker engine. You must either bind-mount the host’s Docker socket into Skeema’s container and install a docker CLI in the container; or use a privileged Docker-in-Docker (dind) image.

  • MySQL 5.x on Mac or other arm64 systems: Docker images for MySQL 5.x are only available for Intel/AMD (amd64) CPUs. When using workspace=docker on an ARM system (modern Macs, AWS Graviton EC2 instances, etc), a MySQL 8.0 image is automatically substituted, but this may cause compatibility problems.

  • AWS Aurora compatibility: With Aurora flavors, vanilla MySQL images are substituted automatically. For example, with flavor=aurora:8.0, Docker image “mysql:8.0” will be used instead. This can cause subtle problems if your schema definitions rely on Aurora-specific features. For example, AWS backported MySQL 8’s utf8mb4_0900_ai_ci collation into Aurora 5.7, whereas standard MySQL 5.7 does not support it.

  • Cross-schema views: In Skeema Premium, Docker workspaces cannot be used if views in one schema refer to tables in a different schema, unless you configure Skeema to ignore these views entirely.

If the drawbacks of Docker workspaces are too problematic in your environment, consider the more flexible temp-schema offloading Premium feature as an alternative.

Docker workspace containers

The containers managed by workspace=docker have the following properties:

  • The container image will be based on the flavor option specified for the corresponding database server, ensuring that the workspace behavior matches the live database. For example, when interacting with a live database running MySQL 8.4 (flavor=mysql:8.4), the local container will use image “mysql:8.4”.
    • See the previous section for important notes on cases where a different image must be substituted, causing potential compatibility problems.
  • The container name follows a template based on the image. For example, with flavor=percona:5.7, the container will be called “skeema-percona-5.7”.
  • The containerized DB server will only listen on the 127.0.0.1 localhost loopback interface, to ensure that external machines cannot communicate with it.
  • The containerized DB server will have an empty root password.
  • As of Skeema v1.10.1, your real database server’s sql_mode will automatically be applied to sessions on the containerized DB server. This behavior avoids common problems with strict sql_mode mismatches: Docker images for all modern MySQL and MariaDB versions default to using a strict sql_mode, but managed platforms such as RDS often use a non-strict sql_mode instead.
  • Aside from sql_mode, the vast majority of global variables of the containerized DB server are left at their defaults for the corresponding version/flavor. This may cause divergent behavior if your live databases have unusual overrides to global settings that affect DDL, such as innodb_strict_mode, explicit_defaults_for_timestamp, innodb_large_prefix, or innodb_file_format. If needed, most of these may be set at the session level via connect-options; such settings will be used for Skeema’s sessions on the workspace container as well as live databases.

Skeema dynamically manages containers and images as needed. If a container with the expected name does not already exist in the local Docker engine, it will be created on-the-fly. If the necessary image tag is not already present in the local Docker engine either, the image will be fetched from DockerHub automatically. These steps may take some extra time upon first use of workspace=docker.

If needed, you can circumvent some of these automatic behaviors by taking Docker-related actions outside of Skeema. For example, if your company uses a different container registry than DockerHub, you could pre-fetch or manually tag another image with the expected name.
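
For example, assuming a hypothetical internal registry mirror, the following pre-fetches an image and re-tags it under the name Skeema expects for flavor=mysql:8.4; relatedly, docker ps -a --filter name=skeema- lists any containers Skeema has previously created:

docker pull registry.example.com/mirror/mysql:8.4
docker tag registry.example.com/mirror/mysql:8.4 mysql:8.4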

Container lifecycle

When using a Docker workspace, the docker-cleanup option controls cleanup behavior of dynamically-managed Docker containers right before Skeema exits.

  • docker-cleanup=none (default in Skeema v1): Containers are left in the running state. This allows subsequent Skeema invocations to perform well, since no time is wasted recreating or restarting local database containers. However, the running containers may consume some resources on your local machine.

  • docker-cleanup=stop (planned default in Skeema v2): Containers are stopped, but not removed. Subsequent invocations of Skeema will need to restart the containers, which can take a few seconds, but is still faster than completely recreating the containers from scratch. The stopped containers won’t consume CPU or memory on your local machine, but they may consume disk space.

  • docker-cleanup=destroy: Containers (and their volumes) are deleted upon Skeema shutdown. Each invocation of Skeema will need to recreate the containers, which may take several seconds. This option avoids most resource consumption on your local machine, aside from image storage.

On recent versions of Skeema, setting this option to “destroy” also automatically enables use of an in-memory tmpfs volume for the containerized database’s data directory. This greatly improves performance by preventing disk writes. This optimization is enabled when running Skeema v1.11.1+ on a Linux system, or Skeema v1.12.1+ on macOS or Windows. In some edge cases (especially Percona Server images), tmpfs is only available with Skeema v1.13.0+.
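
For example, a short-lived CI job is a natural fit for combining Docker workspaces with full cleanup; a minimal .skeema sketch:

workspace=docker
docker-cleanup=destroy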

Regardless of the option used here, Skeema does not automatically delete or un-tag images. You may need to periodically perform prune operations in Docker itself to completely avoid any storage impact.