As I wrap up the 3.3.x changes and look ahead at the changes and features people are asking for, and at what I would like to see improved, I think it is time to look at larger changes for a Liquibase 4.0 release.
My most likely overly ambitious goal for Liquibase 4.0 is a major housecleaning of the Liquibase codebase to simplify it while increasing testability and test coverage. Scope and requirements have changed over the last 7 years, and the codebase is getting a bit over-complex and haphazard. That makes it harder for me to maintain and creates a barrier to entry for contributors.
Previously I thought about trying to break this into a few smaller
blocks/themes and focus on one for 3.4, one for 3.5, etc. until they are
done and have 4.0 be just the final cleanup “no more changes” release.
After doing some work with the testing and the snapshot logic, however,
I think they all bleed into each other enough to make it much easier to
do a single major release that just breaks everything.
The jump to 4.0 will signify compatibility-breaking API changes, but
there should be no changes to the changelog format: 4.0 should be a
drop-in replacement for anyone just using a changelog and not writing
extensions etc.
The major themes/changes I’m looking to make are:
Improvement of how state is managed and high-level functions are called
Currently there is the liquibase.Liquibase façade object that wraps
many common functions with method parameters, a variety of
“configuration” objects (such as DiffOutputControl), and a heavy use of
singletons. Ant/maven/command line etc. call out to the Liquibase façade
as best they can.
The problems with this setup include:
- The Liquibase façade is getting overly large and complex
- It is difficult to add/override standard command functionality in extensions
- There is a lot of duplicated setup/validation logic in Ant/Maven/command line/etc. before calling the façade objects
- Singletons are not always cleaned up between calls and/or run into each other
- It is difficult or impossible for extensions to rely on configuration that does not happen to be in the method parameters passed along
- Method signatures get long but still don’t have all the parameters we sometimes need
Planned changes:
- Switch to using “Command” objects in favor of a monolithic Liquibase façade
  - Allows command logic, including validation and setup, to be encapsulated and more easily extended
  - Allows new commands to be written within extensions and exposed through the command line, Maven, etc.
- Create a hierarchical “Scope” object
  - Works similarly to AngularJS $scope, where a root container object is created and passed along to sub-methods
  - Along the call chain, new attributes can be added that are only visible to methods further down the call chain
  - The root Scope object is created as part of the Command execution, and everything builds from there
  - Replace the use of singletons with objects added to the Scope
  - Ideally we can access the scope object without needing to include it in every method signature, but I am not yet sure of a good implementation
  - Configuration objects available from the Scope object
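A minimal sketch of how the Command and Scope ideas could fit together. It assumes a parent-linked attribute map for lookups and uses a ThreadLocal as one possible way to reach the current scope without adding it to every method signature. Scope, Command, child(), and current() are hypothetical names for illustration, not existing Liquibase API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical hierarchical scope: each scope sees its own attributes plus its ancestors'.
final class Scope {
    // One candidate answer to "access the scope without passing it everywhere": a ThreadLocal.
    private static final ThreadLocal<Scope> CURRENT = new ThreadLocal<>();

    private final Scope parent;
    private final Map<String, Object> values = new HashMap<>();

    private Scope(Scope parent) {
        this.parent = parent;
    }

    static Scope root() {
        return new Scope(null);
    }

    static Scope current() {
        return CURRENT.get();
    }

    // Look the key up in this scope first, then walk up toward the root.
    Object get(String key) {
        for (Scope s = this; s != null; s = s.parent) {
            if (s.values.containsKey(key)) {
                return s.values.get(key);
            }
        }
        return null;
    }

    // Run work in a child scope; attributes added here are visible only further down the call chain.
    <T> T child(Map<String, ?> extra, Supplier<T> work) {
        Scope child = new Scope(this);
        child.values.putAll(extra);
        Scope previous = CURRENT.get();
        CURRENT.set(child);
        try {
            return work.get();
        } finally {
            CURRENT.set(previous);  // restore, so nothing leaks into the next call
        }
    }
}

// Hypothetical command: execution creates the root scope and everything builds from there.
interface Command<T> {
    T run();

    default T execute(Map<String, ?> configuration) {
        return Scope.root().child(configuration, this::run);
    }
}
```

Because child() restores the previous scope in a finally block, attributes added further down one call chain are no longer visible once that call returns. That is the kind of cleanup the current singletons do not always get. A ThreadLocal does not follow work handed off to other threads, so that choice would need more thought.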
Improved Testing/Testability
Liquibase currently doesn’t have a great way to handle testing of the
interaction with the database. Traditionally, I’ve had the
liquibase-integration-tests module which mainly uses a set of changeSets
with example changes and scenarios and preconditions to test that they
ran successfully.
Problems with this setup include:
- Slow to execute
- Contributors need a special database setup to test, so they normally cannot run the tests to validate their changes
- Preconditions are not designed to be an assertion library
- No structure to the tests to show what is tested and what is not
Beyond the database interaction tests, there is some unit test coverage, but not enough.
Planned changes:
- Fully implement my new “VerifiedTest” framework. The idea is that each test creates a simple text description of the interaction (such as the SQL string to execute) and a way to validate that the test passed. The text descriptions and results from previous test runs are stored in a markdown-formatted file. On each test run, the text description is compared to the last run, and if they are the same (most of the time) the test is just marked as passed. If they are different, the validation code is used to make sure the new version is still correct, and then the markdown file is updated. If the validation fails, the test fails. If the validation cannot run (for example, the database is unavailable), the markdown file is updated but marked as “not validated”.
  - The hope is to allow database tests in a more standard test framework, let most integration tests run at unit-test speed, let contributors know when they have potentially broken things even if they don’t have the database to test against, and give me a way to see how the interactions have changed from within the pull request system
- Finish the move to Spock testing
- Use TDD to develop new and changed 4.0 functionality to increase general coverage
- Use generative testing to ensure all permutations are tested, with both standard unit tests and “VerifiedTest”
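The VerifiedTest decision flow can be sketched as follows. This assumes the previous run's descriptions have already been read into a map keyed by test name (reading and writing the markdown file is left out). It models "validation cannot run" as a hypothetical CannotValidateException. VerifiedTestRunner, Validator, and Outcome are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical outcome of one VerifiedTest run.
enum Outcome { PASSED_UNCHANGED, PASSED_VALIDATED, FAILED, NOT_VALIDATED }

// Thrown when validation cannot run at all, e.g. the database is unavailable.
class CannotValidateException extends Exception {
    CannotValidateException(String message) {
        super(message);
    }
}

interface Validator {
    /** Returns true if the interaction is still correct; throws if it cannot be checked. */
    boolean validate() throws CannotValidateException;
}

final class VerifiedTestRunner {
    // Stand-in for the markdown file: test name -> description accepted on an earlier run.
    final Map<String, String> accepted = new HashMap<>();
    // Descriptions recorded but flagged "not validated".
    final Map<String, String> unvalidated = new HashMap<>();

    Outcome run(String testName, String description, Validator validator) {
        if (description.equals(accepted.get(testName))) {
            return Outcome.PASSED_UNCHANGED;  // common fast path: no database needed
        }
        try {
            if (!validator.validate()) {
                return Outcome.FAILED;  // new description is wrong; the accepted record is kept
            }
        } catch (CannotValidateException e) {
            unvalidated.put(testName, description);  // record it, flagged as not validated
            return Outcome.NOT_VALIDATED;
        }
        accepted.put(testName, description);  // validated: update the stored description
        unvalidated.remove(testName);
        return Outcome.PASSED_VALIDATED;
    }
}
```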
Use more 3rd party libraries
In the past, I’ve tried to avoid 3rd party libraries in order to avoid jar-hell for people using Liquibase. I think there are a few places where there has been enough convergence on a “standard”, or where the use case is isolated enough, that I should introduce some 3rd party libraries to simplify my codebase. In particular:
- SLF4J instead of the custom logging wrapper over java.util.logging
- Apache Commons CLI: only really needed when running the command-line version, where jar-hell doesn’t really matter since it’s more of a packaged application
- Considering but not decided (need to research more; I don’t want to cause issues for users with different versions or technologies)
  - Serialize/deserialize logic? Need to research options for XML and/or YAML/JSON
  - Dependency injection/class finding/classloading logic? Maybe Spring? Maybe OSGi?
Improve database snapshot functionality
Database snapshot support was never really central to Liquibase, but it has become more and more used within Liquibase. The current implementation is overly complex, and the way it abstracts the logic for extension doesn’t really fit how extensions are written in real life, which leads to performance issues and excessive code. Furthermore, testing it is slow and ranges from difficult to impossible.
Planned changes:
- Change the snapshot algorithm from starting with a single object (schema, table, etc.) and then recursively finding related objects to a process where we first fetch all objects in the database and then connect those objects up in memory if needed. The base object to snapshot (a schema, a table, a column, etc.) is still passed to each of the fetch methods, which can limit what is read from the database if they so choose, but it will be a much less convoluted process
- Ensure the snapshot interfaces and base classes do not make JDBC or even RDBMS assumptions. The snapshot process should be able to handle non-traditional “databases” such as Hibernate mappings, changelog files, MongoDB and other NoSQL databases, etc. Ideally it would even be able to snapshot non-databases such as server configurations, although that is less important.
- Ability (or at least API hooks) to support data diff
- Add a way to specify subsets of items to not snapshot. For example, don’t include tables that match the name “ADM_.*”
  - Needs to be able to do something like “snapshot all objects, but only diff the data in .*_lookup tables and only include tablespace information for .*_lob tables”
- Use the VerifiedTest framework to ensure good testing of the snapshot functionality
- Better model the connection between primary keys, foreign keys, unique constraints, and indexes
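A minimal sketch of the fetch-everything-then-connect approach, combined with an exclude pattern like “ADM_.*”. It assumes each fetcher returns a flat list of one object type in a single bulk read (stubbed here with in-memory lists). It also assumes children refer to their parent by name. DbObject and SnapshotSketch are hypothetical, database-agnostic stand-ins, not the Liquibase snapshot classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical snapshot object with no JDBC assumptions: a type, a name, an optional parent name.
final class DbObject {
    final String type;
    final String name;
    final String parentName;  // e.g. the table a column belongs to; null for top-level objects
    final List<DbObject> children = new ArrayList<>();

    DbObject(String type, String name, String parentName) {
        this.type = type;
        this.name = name;
        this.parentName = parentName;
    }
}

final class SnapshotSketch {
    /**
     * Phase 1: take everything the fetchers returned (one bulk read per object type).
     * Phase 2: drop objects whose owning top-level name matches the exclude pattern.
     * Phase 3: connect children to parents in memory, with no further database reads.
     */
    static Map<String, DbObject> snapshot(List<List<DbObject>> fetchedByType, Pattern exclude) {
        Map<String, DbObject> topLevel = new HashMap<>();
        List<DbObject> kept = new ArrayList<>();
        for (List<DbObject> batch : fetchedByType) {
            for (DbObject o : batch) {
                String owner = o.parentName == null ? o.name : o.parentName;
                if (exclude != null && exclude.matcher(owner).matches()) {
                    continue;  // e.g. skip ADM_ tables and everything belonging to them
                }
                kept.add(o);
                if (o.parentName == null) {
                    topLevel.put(o.name, o);
                }
            }
        }
        for (DbObject o : kept) {
            if (o.parentName != null) {
                DbObject parent = topLevel.get(o.parentName);
                if (parent != null) {
                    parent.children.add(o);
                }
            }
        }
        return topLevel;
    }
}
```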
Improve Change and SQL Generation logic
Currently we have Change classes which represent what can be in a
changeLog file. These generate one or more Statement objects which are a
lower-level logical database change. The Statement objects are then fed
to SqlGenerator objects which create the actual SQL based on the
Statement and the Database.
Planned changes:
- Remove the Change/Statement distinction in favor of a more general-purpose Action class. The current Change and Statement objects are mainly duplicates of each other, and there doesn’t really need to be a distinction.
  - The new Action classes will also include things that are currently outside the scope of the Change/Statement objects, such as the metadata lookup. Bringing the metadata lookup into the same “Action” framework will allow us to have just one code path for all “I want to do X against this database” logic
- Change most SqlGenerator logic from building up SQL strings programmatically to using simple text files with SQL templates that can be filled in
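Here is a small sketch of the template idea. It assumes a hypothetical ${name} placeholder syntax, per-database templates with a “default” fallback, and an Action that exposes its attributes as a map. Action, CreateIndexAction, and SqlTemplates are illustrative names. The templates are inline strings rather than files, and values are inserted verbatim with no identifier quoting or escaping, which real generation would need.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical general-purpose action replacing the Change/Statement pair.
interface Action {
    String describe();

    Map<String, String> attributes();
}

final class CreateIndexAction implements Action {
    private final String index;
    private final String table;
    private final String columns;

    CreateIndexAction(String index, String table, String columns) {
        this.index = index;
        this.table = table;
        this.columns = columns;
    }

    public String describe() {
        return "create index " + index + " on " + table;
    }

    public Map<String, String> attributes() {
        Map<String, String> m = new HashMap<>();
        m.put("index", index);
        m.put("table", table);
        m.put("columns", columns);
        return m;
    }
}

final class SqlTemplates {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");
    // database name -> template text; "default" is used when no database-specific one exists
    private final Map<String, String> templates = new HashMap<>();

    SqlTemplates register(String database, String template) {
        templates.put(database, template);
        return this;
    }

    String render(String database, Action action) {
        String template = templates.getOrDefault(database, templates.get("default"));
        if (template == null) {
            throw new IllegalStateException("No template for " + database);
        }
        Map<String, String> values = action.attributes();
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing value for " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Only the template text differs per database here, so a dialect-specific variant like the MSSQL one in the test below is a data change rather than a new SqlGenerator subclass.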
Improve Cross-database Logic
Both update and snapshot logic currently have issues with
cross-database functionality.
Data types are the major problem:
On update, sometimes people want to be able to specify a simple type like “text” and have that mean “clob” on one database and “nvarchar(max)” on another. Other times, people want “text” to mean “clob” on one database but “text” (not nvarchar(max)) on another. Or “int” should be the database default “int” on all but Oracle, where it should be number(23). Then there are boolean types: some databases don’t support Boolean, so you need to use “bit”, but you also sometimes need to specify an actual “bit” type that isn’t used as a boolean and so should be tinyint on a database that supports Boolean but not bit.
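One way to express these per-database rules is as data: a lookup from a logical type to a per-database concrete type, with a wildcard fallback. The sketch assumes that structure, and the mappings in the test come from the examples above. TypeMapper and the “*” wildcard are hypothetical, not Liquibase's current data type classes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical mapping of logical types ("text", "int", ...) to database-specific types.
final class TypeMapper {
    static final String ANY = "*";  // fallback entry for databases without a specific rule

    // logical type -> (database -> concrete type)
    private final Map<String, Map<String, String>> rules = new HashMap<>();

    TypeMapper map(String logicalType, String database, String concreteType) {
        rules.computeIfAbsent(logicalType.toLowerCase(Locale.ROOT), k -> new HashMap<>())
             .put(database, concreteType);  // a later call for the same pair overrides the earlier one
        return this;
    }

    String resolve(String logicalType, String database) {
        Map<String, String> perDb = rules.get(logicalType.toLowerCase(Locale.ROOT));
        if (perDb == null) {
            return logicalType;  // unknown types pass through unchanged
        }
        String concrete = perDb.get(database);
        return concrete != null ? concrete : perDb.getOrDefault(ANY, logicalType);
    }
}
```

Because the rules are plain data, the “text should stay text on MSSQL” case is just a later map("text", "mssql", "text") call that replaces the earlier entry, rather than new code.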
On snapshot, if you are comparing an MSSQL and a MySQL database, should you mark the columns as different if their data types are int vs. integer? What about nvarchar(10) vs. varchar(10) when one doesn’t support nvarchar? Bit vs. Boolean? Text vs. nvarchar(max)? Text vs. text (when MSSQL’s nvarchar(max) is more like MySQL’s text)?
On generateChangeLog, do you generate generic types or
database-specific types?
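For the snapshot comparison questions, one option is to canonicalize each type through an alias table before comparing and to treat the alias table itself as the policy. The sketch assumes that approach. TypeComparator is a hypothetical name, and whether int/integer or bit/boolean should count as equal is left to whoever builds the comparator, since that is exactly the open question.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical comparator: two types are "the same" if they canonicalize to the same name.
final class TypeComparator {
    private final Map<String, String> aliases = new HashMap<>();

    // Declare that 'type' should be treated as 'canonical' when comparing.
    TypeComparator alias(String type, String canonical) {
        aliases.put(normalize(type), normalize(canonical));
        return this;
    }

    boolean same(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    private String canonical(String type) {
        String n = normalize(type);
        return aliases.getOrDefault(n, n);
    }

    // Case-insensitive and whitespace-insensitive, so "NVARCHAR (10)" equals "nvarchar(10)".
    private static String normalize(String type) {
        return type.trim().toLowerCase(Locale.ROOT).replace(" ", "");
    }
}
```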
Case handling is the other major problem:
How should differences in case be handled in comparisons? When should
case sensitivity be preserved and when should it not matter?
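One common convention, assumed in this sketch, is that quoted identifiers are compared exactly, while unquoted ones are first folded to the database's default case (Oracle folds unquoted names to upper case, PostgreSQL to lower case). CaseFolding and identifiersMatch() are hypothetical names. Databases such as MySQL or MSSQL, where case sensitivity depends on platform, configuration, or collation, would need more than a single folding rule.

```java
import java.util.Locale;

// Hypothetical per-database rule for how unquoted identifiers are stored.
enum CaseFolding {
    UPPER, LOWER, PRESERVE;

    // Quoted identifiers keep their exact case (quotes stripped); unquoted ones are folded.
    String fold(String identifier) {
        boolean quoted = identifier.length() >= 2
                && identifier.startsWith("\"") && identifier.endsWith("\"");
        if (quoted) {
            return identifier.substring(1, identifier.length() - 1);
        }
        switch (this) {
            case UPPER:
                return identifier.toUpperCase(Locale.ROOT);
            case LOWER:
                return identifier.toLowerCase(Locale.ROOT);
            default:
                return identifier;
        }
    }

    boolean identifiersMatch(String a, String b) {
        return fold(a).equals(fold(b));
    }
}
```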
There are other issues too:
- Auto-generated names vary; how do we best handle those?
- If MySQL has an index on an FK column but Oracle doesn’t, is that a difference to fix, given that MySQL auto-generates the index?
- If you try to create a sequence on a database that doesn’t support sequences, should that be an error? Or should it be expected to fail and be skipped?
  - Are sequences different from other non-supported features like non-clustered PKs, full text indexes, etc.?
General Code Improvements
- Handle multiple active connections
  - ChangeSets/Actions can target different connections
  - Allows multiple databases to be updated in concert
- Want to further reduce code duplication between the XML and YAML/JSON parsers and the serialize/deserialize logic
- Simple SQL parser
  - Enough to be able to distinguish strings vs. keywords vs. objects
  - May be helpful with the new “Action Template” functionality
  - May be helpful with <sql>, <createView>, etc. validation and checksums
- More granularity on checksum versioning
  - Currently there is a “version” as part of the checksum tag for when I need to make a change to the logic that affects how checksums are generated, but most often the changes affect just individual tags or certain scenarios of certain tags. We need a better way to handle this to make updates more seamless for everyone.
- UTF-8 / other charsets
  - I need to better understand charset handling and ensure we are handling files correctly.
- There are some placeholder hooks and naming in place to support non-Java uses of the code. In particular, I was hoping to be able to use IKVM to recompile most of the Liquibase logic for .NET and just plug in particular classes to make it integrate better (.NET-native connections, XML parsers, etc.). There has never been any traction on this, and I think it should be pulled out to simplify things
- Improved prepared statement logic: sometimes you need to use prepared statements, not simple statements. Liquibase has tried to avoid prepared statements, so the places where they are needed are badly wedged in.
- Safer modifySql logic: currently modifySql just does a simple string replacement on the SQL, but if/when the SQL generation logic changes, that can silently break previously working modifySql. We need a way to make this safer.
- Better OSGi support: I don’t really know OSGi well enough to
know if what we have is good or not
- Separate SQL logging: currently most SQL goes through DEBUG level logging, but people often want SQL logged without the other debug info and/or logged to a separate location. Ensure all SQL is logged and handled separately
- The tag table structure currently doesn’t support multiple tags at the same point and doesn’t always track all the changes in a tag well. We probably need a separate DATABASECHANGELOGTAG table
- Refactor the Database API: it is currently a mix of “dialect logic”, connection handling, and more. Some logic should be split out, and other dialect logic scattered throughout the code should be brought into the Database class
- Refactor the ResourceAccessor API: I made some changes in 3.3, but need to ensure the APIs cover what is needed
- Clean up multi-schema support: there is some support for managing multiple schemas, but it is not consistently used and supported.
- Move non-core database support to extensions
  - What are core databases? I would suggest MySQL, PostgreSQL, Oracle, MSSQL, and DB2
- We should not be using “database instanceof MysqlDatabase” etc.; we should be using subclassing instead.
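The “Simple SQL parser” item only needs to tell strings from keywords from object names, which a small tokenizer can do. This sketch assumes single-quoted string literals with '' as the escaped quote, double-quoted identifiers, and a tiny illustrative keyword list. SqlTokenizer and Token are hypothetical. Comments, bracket/backtick quoting, and non-integer numbers are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical minimal SQL tokenizer: enough to tell strings, keywords, and object names apart.
final class SqlTokenizer {
    enum Kind { KEYWORD, IDENTIFIER, STRING, NUMBER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + ":" + text;
        }
    }

    // Deliberately tiny keyword list; a real one would be per dialect.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "SELECT", "FROM", "WHERE", "AND", "OR", "INSERT", "INTO", "VALUES",
            "UPDATE", "SET", "CREATE", "VIEW", "AS"));

    static List<Token> tokenize(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '\'') {
                // String literal: '' inside it stands for a single quote.
                StringBuilder sb = new StringBuilder();
                i++;
                while (i < n) {
                    char d = sql.charAt(i);
                    if (d == '\'' && i + 1 < n && sql.charAt(i + 1) == '\'') {
                        sb.append('\'');
                        i += 2;
                    } else if (d == '\'') {
                        i++;  // closing quote
                        break;
                    } else {
                        sb.append(d);
                        i++;
                    }
                }
                tokens.add(new Token(Kind.STRING, sb.toString()));
            } else if (c == '"') {
                // Quoted identifier: text kept exactly; an unterminated one runs to the end.
                int end = sql.indexOf('"', i + 1);
                if (end < 0) {
                    end = n;
                }
                tokens.add(new Token(Kind.IDENTIFIER, sql.substring(i + 1, end)));
                i = Math.min(end + 1, n);
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) {
                    i++;
                }
                String word = sql.substring(start, i);
                boolean keyword = KEYWORDS.contains(word.toUpperCase(Locale.ROOT));
                tokens.add(new Token(keyword ? Kind.KEYWORD : Kind.IDENTIFIER, word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < n && Character.isDigit(sql.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, sql.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }
}
```

A keyword such as FROM inside a string literal comes back as part of a STRING token. Checksum or modifySql logic built on tokens could therefore ignore text inside literals, which plain string matching cannot do.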
New Features
- Postconditions: like preconditions, but run before committing the changeSet
- updateReference, rollbackReference, and other *Reference commands that perform the same logic as the normal versions (update, rollback, etc.) but against the “reference” database
- Improved DBDoc with an updated skin and new features
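Postconditions could work roughly like this: apply the changeSet's statements inside a transaction, evaluate the postconditions against the uncommitted state, and commit only if all of them hold. Transaction, Postcondition, and ChangeSetRunner are hypothetical interfaces; a JDBC version would map them onto Connection.setAutoCommit(false), commit(), and rollback(). The sketch ignores one caveat: some databases, MySQL and Oracle among them, implicitly commit DDL, so a failed postcondition cannot always be rolled back.

```java
import java.util.List;

// Hypothetical transaction abstraction; a JDBC version would wrap java.sql.Connection.
interface Transaction {
    void execute(String sql) throws Exception;

    void commit() throws Exception;

    void rollback() throws Exception;
}

// Hypothetical postcondition: checked after the changes run but before commit.
interface Postcondition {
    boolean holds(Transaction tx) throws Exception;
}

final class ChangeSetRunner {
    /** Returns true if committed, false if a postcondition failed and the changeSet was rolled back. */
    static boolean run(Transaction tx, List<String> statements, List<Postcondition> postconditions)
            throws Exception {
        boolean ok = true;
        try {
            for (String sql : statements) {
                tx.execute(sql);
            }
            for (Postcondition p : postconditions) {
                if (!p.holds(tx)) {
                    ok = false;
                    break;
                }
            }
        } catch (Exception e) {
            tx.rollback();  // an error while applying or checking also undoes the changeSet
            throw e;
        }
        if (ok) {
            tx.commit();
        } else {
            tx.rollback();
        }
        return ok;
    }
}
```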
Infrastructure Improvements:
- Split the SDK from the main Liquibase code and improve the SDK
- Improve the extension portal
- Improve generation of docs for the website
- Improve Javadoc
- Consider Gradle vs. Maven
- Testing of Liquibase on Java 7 and 8
  - Not yet ready to drop Java 6 support
- Liquibase 3 compatibility layer: Is it possible? Is it needed?
- Move the Liquibase blog to GitHub Pages
- Vet all classes with extensions and subclassing in mind
What are your thoughts on the 4.0 feature list? Anything you
think should be added or removed?
Nathan