Docs: Features: Workspaces

Unlike some other declarative tools, Skeema’s behavior does not primarily rely on parsing SQL DDL. Parsing SQL is brittle across the various MySQL and MariaDB versions, which have subtle differences in features, functionality, and syntax. Instead, Skeema introspects your database objects by querying information_schema and running SHOW commands, obtaining metadata directly from the database itself.

In order to accurately introspect the definitions in your filesystem’s *.sql files, Skeema actually runs these CREATE statements in a temporary location, called a workspace. By default, Skeema creates, uses, and then carefully drops a schema called _skeema_tmp on each database server it interacts with. This behavior is highly configurable, as described in sections below.

Any command that needs to evaluate the definitions in your filesystem’s *.sql files uses a workspace, including skeema diff, skeema push, skeema lint, and skeema format.

When operating on a workspace, Skeema halts immediately if any workspace table is detected to be non-empty (contains any rows). This prevents disaster if someone accidentally misconfigures Skeema’s workspace-related options.

temp-schema workspaces

The workspace option controls where workspaces are created: either as a temporary schema on each real database server (workspace=temp-schema), or inside of a dedicated ephemeral Docker container (workspace=docker).

With the default value of workspace=temp-schema, the workspace schema is created on each database server that Skeema interacts with. For example, skeema diff staging will connect to your configured staging database server, run your *.sql CREATEs in a temporary workspace schema there, and then compare the resulting objects to the corresponding “real” schema on that same server.

temp-schema tuning

Schema name: The workspace schema name defaults to “_skeema_tmp”, but this may be configured by setting the temp-schema option to any other arbitrary name.

Concurrency: Skeema can run some CREATE queries concurrently during the temp-schema workspace setup step, and likewise for DROP queries during the cleanup step. This improves performance by keeping multiple operations in flight at once, rather than waiting on a network round-trip for each one. The level of concurrency is controlled by the temp-schema-threads option, which defaults to 5 threads. Raising this value may improve performance in some situations; in others, concurrency may need to be disabled entirely to avoid mutex contention. Refer to the temp-schema-threads documentation for more information.

Replication: If sufficient user privileges are available, Skeema will automatically attempt to skip binary logging for temp-schema workspace queries, since there is no benefit to replicating them. See the temp-schema-binlog option to configure this behavior.
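
For example, all three tuning options can be set together in a .skeema file. This is only a sketch; the values below are arbitrary, shown purely for illustration:

# use a custom workspace schema name instead of _skeema_tmp
temp-schema=_deploy_scratch
# raise workspace setup/cleanup concurrency above the default of 5
temp-schema-threads=10
# never write workspace DDL to the binary log (requires sufficient privileges)
temp-schema-binlog=off

As with any Skeema option, these can also be supplied on the command line, for example --temp-schema-threads=1 to disable workspace concurrency entirely.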

temp-schema offloading

In Skeema Premium, the temp-schema-environment option may be used to configure an alternative database server for the temp-schema workspace, instead of placing it on each database server that Skeema interacts with.

This is primarily beneficial for isolation. On busy production database servers with high-volume OLTP workloads, Skeema’s workspace setup and cleanup steps can negatively impact overall server performance. Offloading the workspace to a different server ensures that your real production database server(s) are not impacted.

It can also be beneficial for avoiding network latency: if your real production databases are only accessible over a WAN or VPN, you can improve Skeema’s performance by offloading the temp-schema workspace to a database on localhost, or elsewhere on your LAN.

To configure offloading, the temp-schema-environment option should be set to an environment name (.skeema file section) to use for configuring the workspace host location. You may either re-use another environment (for example, “run all workspaces on the staging database server”), or define a new environment dedicated to workspaces. For example, consider this .skeema file snippet:

temp-schema-environment=scratch

[production]
host=db-prod.example.com
user=produser

[staging]
host=db-stage.example.com
user=stageuser

[scratch]
host=db-scratch.example.com
user=scratchuser

[localdev]
host=localhost
user=root
socket=/var/lib/mysql/mysql.sock
skip-temp-schema-environment

With the above configuration, Skeema will perform temp-schema workspace operations on the dedicated “scratch” database server, instead of the normal target database host and user, when running skeema against any environment except localdev. (Because temp-schema-environment is configured at the top of the file in this example, it applies to all environments, except ones that explicitly override it.)

Alternatively, to re-use an existing environment for workspaces, the above example could have used temp-schema-environment=staging (run all workspaces on the staging DB, since it presumably matches production’s configuration closely) or perhaps temp-schema-environment=localdev (run all workspaces locally, to avoid network latency). In either case, the temp-schema workspace would be placed in _skeema_tmp (or whatever name the temp-schema option is set to), and it is not a problem if there are other “real” schemas on the same database server.

Only connectivity-related options in the specified environment will take effect, meaning options that control which host to connect to and how to connect to it: host, port, socket, user, password, ssl-mode, connect-options, and host-wrapper.

For any of these options that are not configured in the temp-schema’s environment, the original configuration (from the command-line, target environment, and/or “sectionless” lines at the top of a .skeema file) will be used instead, as usual. For example, if both hosts should have the same user or password, there is no need to specify these options in the temp-schema environment if they are already configured in another location.

Note that the above option list deliberately excludes options relating to schema names (schema, temp-schema) or workspace configuration (workspace, temp-schema-threads, etc). To configure these options, you must do so in the original target environment, not the temp-schema environment. This design permits environment re-use and avoids ambiguous/conflicting workspace configurations.

If the hosts require different password values, consider using environment variables to avoid placing plaintext passwords in .skeema files. Alternatively, you can indicate that the temp-schema environment needs to re-prompt on STDIN interactively for a different password by including a bare “password” line (with no =value) in the environment’s configuration.
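
For example, the scratch environment from the earlier snippet could pull its password from an environment variable; the variable name here is hypothetical:

[scratch]
host=db-scratch.example.com
user=scratchuser
# resolved from the environment at runtime, keeping the plaintext value out of the file
password=$SCRATCH_DB_PASSWORD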

If multiple hosts are configured by the temp-schema environment – either via a comma-separated host value, or a host-wrapper script which emits multiple hosts – then the first successfully-reachable host will be used for the temp-schema. This functionality may be leveraged as a mechanism for high availability of the workspace database servers.
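
For example, a workspace environment listing two interchangeable scratch servers might look like this (hostnames are hypothetical):

[scratch]
host=db-scratch1.example.com,db-scratch2.example.com
user=scratchuser

If the first host is unreachable, Skeema will place the workspace on the second one instead.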

You should generally ensure that the database server for the temp-schema environment closely matches the real target environment, in terms of vendor (MySQL vs MariaDB), version series, and server options that affect DDL such as sql_mode and innodb_strict_mode. Skeema will emit a non-fatal warning if the servers differ in flavor, or a fatal error if the servers differ in lower_case_table_names, but no other validations are performed. If needed, you can override any differing session-scoped variables by setting connect-options in the temp-schema environment’s configuration.
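
For example, if the scratch server’s global defaults differ from production’s, the relevant session-scoped variables can be pinned in the workspace environment’s configuration. The specific values below are illustrative only:

[scratch]
host=db-scratch.example.com
user=scratchuser
connect-options=sql_mode='NO_ENGINE_SUBSTITUTION',innodb_strict_mode=0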

Docker workspaces

Instead of placing workspace schemas on real database servers, you can configure Skeema to use an automatically-managed Docker container on localhost via the workspace=docker setting. This option is available in both the Community and Premium editions of Skeema.

Using workspace=docker requires Docker or a drop-in compatible equivalent. Skeema v1.11+ simply shells out to the docker CLI binary, which must be in your PATH. (Older versions of Skeema bundle a built-in Docker client and communicate directly with a Docker Engine daemon reachable at /var/run/docker.sock on the same machine as Skeema. We’ve moved away from this approach due to the huge dependency tree of the Golang Docker client packages.)
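
Enabling Docker workspaces is typically a one-line change in a .skeema file; the host and flavor values below are placeholders:

[production]
host=db-prod.example.com
flavor=mysql:8.0
workspace=docker

The flavor value determines which image the workspace container runs, as described under “Docker workspace containers” below.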

Docker workspaces offer various trade-offs relative to temp-schema workspaces:

  • Pro: No network latency
    • If the machine you’re running Skeema from is in a different region/datacenter than your database, interacting with a container on localhost will avoid the network latency penalty, especially when your schemas contain a large number of tables.
  • Pro: Isolation
    • Docker workspaces avoid risk of the workspace activity impacting performance on a “real” database server. For example, prior to MySQL 8.0.23, whenever innodb_buffer_pool_size is large, DROP TABLE can impact database performance even if the table being dropped is empty. This means the cleanup step of workspace=temp-schema can impact other queries, especially if the number of tables is high.
  • Pro: Reduced privileges
    • Docker workspaces can reduce the number of privilege grants that Skeema’s user requires on real database servers.
  • Con: Kubernetes and other containerized environments
    • If the skeema binary itself is running inside a container, Docker workspaces require complicated configurations to allow Skeema to interact with a Docker engine. You must either bind-mount the host’s Docker socket into Skeema’s container and install a docker CLI in the container; or use a privileged Docker-in-Docker (dind) image. Each choice has trade-offs involving operational complexity and security.
  • Con: Older MySQL with newer Macs or other arm64 systems
    • Recent Macs use ARM processors, but ARM images for MySQL 5.7 (or older) are not available, so workspace=docker must substitute a MySQL 8 image in this situation. This can cause compatibility problems, especially with charsets and collations prior to Skeema v1.11.2.
    • This problem also affects other arm64 systems, such as when running Skeema on AWS Graviton EC2 instances.
  • Con: Aurora compatibility
    • With Aurora flavors, vanilla MySQL images will be substituted automatically. For example, with flavor=aurora:8.0, Docker image “mysql:8.0” will be used instead.
    • This can cause problems if your schema definitions rely on Aurora-specific features. For example, AWS backported MySQL 8’s utf8mb4_0900_ai_ci collation into Aurora 5.7, whereas standard MySQL 5.7 does not support it.
  • Con: Configuration drift
    • If your real database servers use unusual non-default values for settings which affect DDL, then the workspace behavior may not match that of your real database servers. See the next section.
  • Con: Cross-schema views
    • View definitions may reference tables or views in other schemas. The workspace container only contains the single workspace schema, so such definitions cannot be executed there, whereas temp-schema workspaces live on a database server where the referenced schemas are present.

If the drawbacks of Docker workspaces are too problematic in your environment, as an alternative consider using the temp-schema offloading Premium feature, which is more flexible.

Docker workspace containers

The containers managed by workspace=docker have the following properties:

  • The container image will be based on the flavor option specified for the corresponding database server, ensuring that the workspace behavior matches the live database. For example, when interacting with a live database running Percona Server 5.7 (flavor=percona:5.7), the local container will use image “percona:5.7”.
    • See the previous section for important notes on cases where a different image must be substituted, causing potential compatibility problems.
  • The container name follows a template based on the image. For example, with flavor=percona:5.7, the container will be called “skeema-percona-5.7”.
  • The containerized DB server will only listen on the 127.0.0.1 localhost loopback interface, to ensure that external machines cannot communicate with it.
  • The containerized DB server will have an empty root password.
  • The vast majority of global variables of the containerized DB server are left at their defaults for the corresponding version/flavor. This may cause divergent behavior if your live databases configure non-default global settings that affect DDL, such as innodb_strict_mode, explicit_defaults_for_timestamp, innodb_large_prefix, innodb_file_format, among others. Many of these may be set at the session level via connect-options; such settings will be used for Skeema’s sessions on the workspace container as well as live databases.
    • As of Skeema v1.10.1, your real database server’s sql_mode will automatically be applied to sessions on the containerized DB server. This behavior avoids common problems with strict sql_mode mismatches: Docker images for all modern MySQL and MariaDB versions default to using a strict sql_mode, but database-as-a-service platforms such as RDS / Aurora use a non-strict / blank sql_mode instead.
    • In older versions of Skeema, if your real database server uses a different sql_mode than the default listed in your database server’s documentation, you must manually configure sql_mode in connect-options to match that of your real database server.

Skeema dynamically manages containers and images as needed. If a container with the expected name does not already exist in the local Docker engine, it will be created on-the-fly. If the necessary image tag is not already present either, the image will be fetched from DockerHub automatically. These steps may take some extra time upon first use of workspace=docker.

If needed, you can circumvent some of these automatic behaviors by taking Docker-related actions outside of Skeema. For example, if your company uses a different container registry than DockerHub, you could pre-fetch or manually tag another image with the expected name.
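
For example, if your company mirrors images internally, you could pre-seed the image Skeema expects for flavor=percona:5.7 using standard Docker commands; the registry hostname here is hypothetical:

# fetch from the internal mirror, then tag with the name Skeema looks for
docker pull registry.example.com/mirrors/percona:5.7
docker tag registry.example.com/mirrors/percona:5.7 percona:5.7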

Container lifecycle

When using a Docker workspace, the docker-cleanup option controls cleanup behavior of dynamically-managed Docker containers right before Skeema exits.

  • docker-cleanup=none (default): Containers are left in the running state. This allows subsequent Skeema invocations to perform well, since no time is wasted recreating or restarting local database containers. However, the running containers may consume some resources on your local machine.

  • docker-cleanup=stop: Containers are stopped, but not removed. Subsequent invocations of Skeema will need to restart the containers, which can take a few seconds, but is still faster than completely recreating the containers from scratch. The stopped containers won’t consume CPU or memory on your local machine, but they may consume disk space.

  • docker-cleanup=destroy: Containers (and their volumes) are deleted upon Skeema shutdown. Each invocation of Skeema will need to recreate the containers, which can take 10-20 seconds. This option avoids most resource consumption on your local machine, aside from image storage.

When using Skeema v1.11.1+ on a Linux system, setting this option to “destroy” also automatically enables use of an in-memory tmpfs volume for the containerized database’s data directory, improving performance by avoiding disk writes. However, some images do not support this behavior, including Percona Server 8.1+ images (any architecture) and arm64 Percona Server 8.0 images; in these cases tmpfs is not used.
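
For example, since any Skeema option may also be supplied as a command-line flag, a CI job that must leave no containers running afterwards could invoke:

skeema lint --workspace=docker --docker-cleanup=destroy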

Regardless of the option used here, Skeema does not automatically delete or un-tag images. You may need to periodically perform prune operations in Docker itself to completely avoid any storage impact.
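
For example, the following standard Docker command removes all images that are not used by at least one container (it will prompt for confirmation):

docker image prune -a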