Create a git repo of CREATE statements
Use the skeema init
command to generate a “schema dump”: a directory tree of CREATE files (CREATE TABLE, CREATE PROCEDURE, CREATE FUNCTION) for all tables and routines found on a DB server. A subdirectory will be created for each schema, containing .sql files corresponding to all objects in that schema.
Connectivity options for skeema init
are identical to those for the standard MySQL and MariaDB command-line clients:
- TCP/IP connection to default port 3306:
skeema init -h <host> -u <user> -p[password]
- TCP/IP connection to custom port:
skeema init -h <host> -P <port> -u <user> -p[password]
- Localhost unix domain socket:
skeema init -h localhost -S <socketpath> -u <user> -p[password]
There are several ways to supply a password:
- If you have an existing
~/.my.cnf
MySQL client configuration file, Skeema will parse it to obtain user and password. - To interactively prompt for password on STDIN, supply
-p
without a value, just like when using themysql
command-line client. - To supply a password directly on the command-line, for example “foo”, use
-pfoo
or--password=foo
; do not use a space between the flag and value. (This functionality matches how the standardmysql
command-line client takes passwords.) - You may also supply a password using the
$MYSQL_PWD
environment variable, instead of using the password option on the command-line.
The -d
(--dir
) option tells skeema init
what to call the host-level directory. If omitted, it will default to the hostname, but a descriptive name is usually preferable:
skeema init -h my.prod.db.hostname -u root -p -d schemas
Skeema saves the host information in a configuration file called .skeema in the top-level created directory. Each database schema subdirectory also has its own .skeema file, defining the schema name and character set defaults. Configuration options in .skeema files “cascade” down to subdirectories, allowing you to define options that only affect particular hosts or schemas.
If your DB server only has one relevant database schema, you may use skeema init
’s --schema
option to create a “flat” single-directory layout. In this case, the generated .skeema file will contain DB host information and the schema information.
The above example assumes the host information corresponds to the production environment. Multi-environment examples are found later in this document.
This example also assumes you only have a single primary DB cluster. If you have several distinct clusters (that is, masters with completely different sets of schemas/tables), run skeema init
once per cluster, supplying a different --dir
each time reflecting the cluster name.
Skeema is not opinionated about your repo layout. You can choose to have a single repo dedicated to Skeema files, or multiple dedicated repos (e.g. one per database cluster), or a “mono-repo” which places Skeema files in a subdirectory alongside application code. In any case, simply cd
to the desired location before running subsequent skeema
commands.
cd schemas
All subsequent examples assume you are already in the directory created by skeema init
. Skeema’s other commands all operate recursively from the current working directory. You can manipulate all schemas on a database server by invoking commands from a host-level directory, or just a single schema by invoking commands from a schema-level subdirectory.
All examples below also omit password handling. If you supplied a password to skeema init
on the command-line, you’ll need to repeat that for all other skeema
commands below. Again, you can avoid this by configuring a password in ~/.my.cnf
, or by setting the $MYSQL_PWD
environment variable.
Alter a table
Skeema is a declarative system: your repo is a set of CREATE
statements, representing a desired state. You never need to write ALTER
statements or include them in your repo. The system figures out automatically how to transition a live database into the desired state represented by the repo.
To make a schema change to an existing table, first simply edit the table’s CREATE TABLE
definition in its .sql file. Then, run skeema diff
to preview the auto-generated ALTER TABLE
DDL which corresponds to your change. skeema diff
displays the generated DDL, but does not actually run it. To execute the ALTER, use skeema push
.
vi product/users.sql
skeema diff
skeema push
Ordinarily, in between skeema diff
and skeema push
, you would want to make a commit to a new branch, open a pull request, get a coworker to review, and merge the pull request. These steps have been elided here for brevity.
If you have large tables, you may want to configure Skeema to use an external OSC tool. Otherwise, altering a large table can be a disruptive operation.
Create or drop a table
To create a new table, simply add a new .sql file containing the desired CREATE TABLE
statement. The output of skeema diff
will include the new table, and running skeema push
will actually create it.
Similarly, to drop a table, delete its corresponding .sql file and then run skeema diff
to preview or skeema push
to execute. However, since a drop is a destructive action, it will be prevented unless the --allow-unsafe
option is used.
vi product/comments.sql
skeema diff
rm product/tags.sql
skeema diff
skeema diff --allow-unsafe
skeema push --allow-unsafe
To aid in rapid development, you can configure Skeema to always allow dropping empty tables or small tables with the safe-below-size option. For example, putting safe-below-size=10m
in ~/schemas/.skeema will remove the requirement of specifying --allow-unsafe
when dropping any table under 10 megabytes in size. Or use safe-below-size=1
to only loosen safeties for tables that have no rows. (Skeema always treats zero-row tables as size 0 bytes, as a special-case.)
Check table definitions for problems
Skeema’s linter checks the CREATE statements in *.sql files for common problems, including SQL syntax errors, undesirable storage engine or character set usage, lack of primary key, and more.
skeema lint
By default, this will rewrite all of the CREATE statements in the *.sql files to match the canonical format shown by the database server’s SHOW CREATE, but this behavior may be disabled via --skip-format
. Conversely, if you only want to reformat statements, see the skeema format
command.
Update .sql files with changes made manually / outside of Skeema
If you make changes outside of Skeema – either due to use of a language-specific migration tool, or to perform an operation unsupported by Skeema – you can use skeema pull
to update the filesystem to match the database. Essentially, this is the opposite of skeema push
.
skeema pull
By default, this also normalizes file format like skeema format
, but you can skip that behavior with the --skip-format
option (or equivalently set as --format=0
or --format=false
).
Keep dev and prod in-sync
Let’s assume each engineer has a dev database server on their local dev server. This example shows how to add an environment named “development”, using the --socket
(-S
) option to reach a database server on localhost.
As a one-time setup to configure this new environment, use skeema add-environment
from the host directory previously created by init:
skeema add-environment development -h localhost -S /var/lib/mysql/mysql.sock -u root
This automatically added a new section to the .skeema file, configuring the dev environment’s connection information. You could also do this by hand instead of using skeema add-environment
.
In general, you can always manually add any other config directives to any .skeema file. If you put options at the top of the file (outside of any [section]
), they’ll affect all environments. Or put them in an environment section to only affect that environment.
Once your additional environments are configured, you may now interact with them by supplying the environment name as a positional arg to most Skeema commands, such as skeema diff development
. As an example, here is a more advanced team workflow for schema changes:
# make the schema change in dev using any preferred method
# this is for Django, just as an example
python manage.py migrate
# pull the changes from dev into the repo
cd ~/schemas
skeema pull development
# diff the changes against production, to safety check
skeema diff production
# commit using any desired git flow
git checkout -b my-new-branch
git commit -a -m 'Updating schema files from Django migration'
git push -u origin my-new-branch
# make a pull request (or whatever your normal code workflow is)
# get coworker review / sign-off before merging
# after PR is merged, push the changes to production
# diff'ing first is not strictly necessary, but recommended
skeema diff production
skeema push production
Advanced configuration
This example shows how to configure Skeema to use the following set of rules:
- Development
- Database servers are located on each engineer’s local dev box
- Be fully permissive about dropping tables or columns in dev, regardless of table size
- Just use standard ALTERs, no online DDL or external OSC tool in dev
- Any environment EXCEPT development
- Only automatically permit unsafe changes for tables that have no rows. For any table with at least one row, force the user to supply
--allow-unsafe
on the command-line to confirm when needed. - When ALTERing any table 1GB or larger, use the
pt-online-schema-change
external OSC tool - When ALTERing any table below 1GB, use the server’s built-in online DDL. (Some specific ALTERs will fail due to requiring offline DDL; in this case the user can supply
--skip-alter-algorithm
and--skip-alter-lock
as needed.)
- Only automatically permit unsafe changes for tables that have no rows. For any table with at least one row, force the user to supply
- Staging
- Has its own database server, reached via TCP/IP, on a nonstandard port
- Production
- Has its own database server, reached via TCP/IP, on the standard 3306 port
alter-wrapper="/usr/local/bin/pt-online-schema-change --execute --alter {CLAUSES} D={SCHEMA},t={TABLE},h={HOST},P={PORT},u={USER},p={PASSWORDX}"
alter-wrapper-min-size=1g
alter-algorithm=inplace
alter-lock=none
safe-below-size=1
[development]
host=localhost
socket=/var/lib/mysql/mysql.sock
allow-unsafe
skip-alter-wrapper
skip-alter-algorithm
skip-alter-lock
[staging]
host=staging-db.mycompany.com
port=3333
[production]
host=prod-db.mycompany.com
Note that the lines at the top of the file (prior to any named section) will apply to all environments. But then the [development] environment overrides a few settings, and these overrides take precedence when using the environment.
The above example is assuming a .skeema file located in a host-level directory. Alternatively, you could apply this configuration at an individual schema level by putting it in a schema directory’s .skeema file, minus all the host
, port
, and socket
lines. This permits you to have a different configuration for specific schemas that need special-case logic.
Or if you have multiple distinct database clusters, and you want them to all follow this logic, you could put this configuration at a global level (/etc/skeema, /usr/local/etc/skeema, or ~/.skeema), again minus all the host
, port
, and socket
lines. You can still override specific settings on a per-host and/or per-schema basis, in the .skeema file corresponding to their directories.