OpenDDL Specification
The current version of the specification is v1alpha
.
1. Introduction
This document specifies the OpenDDL API, a unified interface for Object-Relational Mapping (ORM) frameworks to export an application's database schema and metadata about it. The objective is to foster interoperability with various tools involved in schema management, database compliance, data catalogs, and data pipelines.
2. Conventions used in this document
The keyword "SHOULD" is used to indicate that among several possibilities, one is recommended as particularly suitable, without implying a requirement. Optional features are described using "MAY" and required features with "MUST".
3. Overview
The OpenDDL API enables ORMs to directly expose the database schema, leveraging the database's Data Definition Language (DDL). This facilitates direct and efficient consumption of schema metadata by downstream tools, maintaining accuracy and relevance.
4. Architecture
The OpenDDL API is designed to be a simple, lightweight, and extensible interface for exporting database schema metadata. It is not a standalone tool, but rather a specification the ORMs or other tools can implement to generate OpenDDL documents. This section outlines the different components involving the OpenDDL API.
4.1 ORMs (and other Data Access Libraries)
ORMs (and other data access libraries) are software components that help software develoeprs define their data model in the native programming language and interact with the database without directly writing SQL queries. This data model is always translated into a specific database schema that is required by the application to function.
4.2 Target Databases
ORMs try to abstract the database schema from the application developer, but in practice, during runtime the application always runs against a concrete target database. The target database is the database that the application is designed to run against. Certain applications utilize ORMs in order to build software that can run against multiple database engine types, in which case they are seen by OpenDDL as having multiple target databases.
4.3 Application Schemas
The application schema is the data model defined by the ORM in the native programming language when translated into the target database schema. As databases vary in their DDL (data definition language) implementations and syntax, the schema is specified in the DDL of the target database.
4.4 Schema Loaders
Schema Loaders are the key components within the OpenDDL architecture. Their primary function is to introspect the application data model as it is natively defined within an ORM-backed application's source code. Following this introspection, they are responsible for generating and emitting a valid JSON document that adheres to the OpenDDL JSONSchema specification. This document represents the application's database schema in a standardized format, facilitating further actions by downstream tools and processes.
The OpenDDL JSON document schema is described in Section 5.
4.5 Schema Consumers
Schema Consumers act upon the output generated by Schema Loaders, interpreting and parsing the JSON document for various
purposes. These may include, but are not limited to, validating schema correctness, integrating with data cataloging
services, or facilitating schema migrations. Consumers must possess the capability to understand and process the
spec.ddl
field, which contains the Application Schema in the Target Database's DDL,
ensuring the accuracy and integrity of the interpreted schema.
4.5 Annotations
Annotations within the OpenDDL framework serve as a means to enrich database schema definitions with additional metadata, surpassing the limitations of standard SQL annotation capabilities. These annotations cater to a wide array of needs, including but not limited to, data governance and compliance, security protocols, code ownership, and lineage tracking.
The specification defines a set of standard annotations to foster uniformity and facilitate interoperability across different tools and platforms within the ecosystem. Schema Loader authors SHOULD employ these standardized annotations wherever applicable.
Annotations can be provided in two forms:
- Document-level - annotations that describe the Application Schema as a whole. This may include, the source ORM, an identification of the Schema Loader generating the document or the team owning the schema.
- Resource-level - annotations that describe attributes a specific resource in the Application Schema. This may include, annotations describing certain fields as containing Personally Identifiable Information (PII) or sensitive data that should be treated with caution.
5. OpenDDL Schema
The OpenDDL JSONSchema is a specification that defines the structure of the JSON document output by the OpenDDL tool. The full schema definition is available in Appendix A. This section is relevant for authors of Schema Loaders who are responsible for generating OpenDDL JSON documents as well as Schema Consumers who need to interpret and process the output.
5.1 Schema Structure
The OpenDDL JSON document comprises the following key components:
apiVersion
: The version of the OpenDDL specification adhered to by the document.kind
: The type of resource described in the document.metadata
: Metadata about the schema, including the database dialect and top-level annotations.spec
: The schema specification, including the DDL and annotations.
5.1 apiVersion
and kind
The apiVersion
field specifies the version of the OpenDDL specification that the document adheres to. The kind
field
indicates the type of resource described in the document. Both fields are required and MUST be present in the output.
5.2 metadata
The metadata
field contains information about the schema, including the target database dialect and top-level
annotations. The dialect
field specifies the database dialect the schema is designed for. It is a required field and
MUST be included in the output.
5.3.1 dialect
The dialect
must be one of the following list:
mysql
mariadb
postgresql
sqlite
sqlserver
clickhouse
redshift
oracle
libsql
snowflake
cockroachdb
tidb
5.3.2 annotations
The annotations
field contains Document-Level annotations, and is an object comprised of key-value pairs of
annotations describing the entire OpenDDL document.
Annotations keys MUST adhere to the following format: namespace/key
. Namespaces MUST be formatted as a FQDN, and keys
MUST be alphanumeric with underscores and hyphens allowed. The OpenDDL specification reserves the openddl.io
namespace
for standard annotations.
For a complete list of standard annotations, refer to Appendix B.
5.3 spec
The spec
field contains the schema specification, including the DDL and annotations.
5.3.1 ddl
The ddl
field is a string containing the Application Schema as expressed in the DDL of the target database.
It is a required field and MUST be included in the output.
5.3.2 annotations
The annotations
field is an array of objects, each containing a selector
and properties
. The selector
object
specifies the scope of the annotation, and the properties
object contains key-value pairs of annotations.
5.3.2.1 selector
To support the diversity of data models existing between different database implementations, the selector
object is
flexible and MAY contain any number of key-value pairs. However, Schema Loader authors SHOULD adhere to the following
conventions when defining selectors keys where applicable:
schema
: Many database systems support namespacing resources into multiple schemas in a single database. Theschema
name should be specified if the . If the described application defines all resources in a single database schema, this field MAY be omitted.table
: The name of the table to which the annotation applies.column
: The name of the column to which the annotation applies. If the annotation applies to the entire table, this field MAY be omitted. If thecolumn
field is provided, thetable
field SHOULD also be provided.
The values of the selector
keys are strings and SHOULD adhere to the following conventions:
- The value of the field should be the exact name of the schema, table, column or other resource in the target database.
- The value SHOULD be case-sensitive and match the exact name of the resource in the target database.
- Wildcards ("*" character) are NOT supported in the
selector
values in the current version of the OpenDDL specification.
6. Security Considerations
Implementers SHOULD be aware of the security implications of exposing database schema metadata. Access to the OpenDDL tool and its output SHOULD be restricted to prevent unauthorized disclosure of potentially sensitive information.
Appendix A: OpenDDL JSONSchema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://openddl.io/schema.json",
"title": "OpenDDL Document Schema",
"type": "object",
"properties": {
"apiVersion": {
"type": "string",
"description": "The API version of the document",
"const": "openddl.io/v1alpha"
},
"kind": {
"type": "string",
"kind": "The kind of resource described in the document",
"const": "Schema"
},
"metadata": {
"type": "object",
"properties": {
"dialect": {
"type": "string",
"enum": [
"mysql",
"postgresql",
"sqlserver",
"sqlite",
"oracle",
"other"
]
},
"annotations": {
"$ref": "#/$defs/schemaAnnotations"
}
},
"required": [
"dialect"
],
"additionalProperties": false
},
"spec": {
"type": "object",
"properties": {
"ddl": {
"type": "string"
},
"annotations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"selector": {
"$ref": "#/$defs/selector"
},
"properties": {
"$ref": "#/$defs/objectAnnotations"
}
},
"required": [
"selector",
"properties"
],
"additionalProperties": false
}
}
},
"required": [
"ddl",
"annotations"
],
"additionalProperties": false
}
},
"required": [
"apiVersion",
"kind",
"metadata",
"spec"
],
"additionalProperties": false,
"$defs": {
"annotationKey": {
"patternProperties": {
"^[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
}
},
"schemaAnnotations": {
"type": "object",
"patternProperties": {
"^openddl\\.io/(owner|orm|schema-loader)$": {
"type": "string"
},
"^(?!openddl\\.io/)[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
},
"additionalProperties": false
},
"objectAnnotations": {
"type": "object",
"patternProperties": {
"^openddl\\.io/(owner|pii|sensitive)$": {
"type": "string"
},
"^(?!openddl\\.io/)[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
},
"additionalProperties": false
},
"selector": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"minProperties": 1
}
}
}
Appendix B: Standard Annotations
Document-level annotations
openddl.io/orm
- An identifier for the source ORM of the Application Schema. Schema Loader authors SHOULD use the Project Repository URL without the scheme where applicable. For example, For example:github.com/ent/ent
openddl.io/schema-loader
- An identifier for the tool generating this document. Schema Loader authors SHOULD use the Project Repository URL (without the scheme) where applicable. For example:github.com/openddl/schema-loader-django
If the schema-loader is built-in to the ORM/framework authors SHOULD usebuiltin
.openddl.io/owner
- Organization that assign specific code owners to schemas may use this annotation to denote the owner of the Application Schema.
Resource-level annotations
openddl.io/owner
- Organization that assign specific code owners to schemas may use this annotation to denote the owner of the Application Schema.openddl.io/pii
- This annotation is used to denote that a specific resource contains Personally Identifiable Information (PII). Schema Loaders authors SHOULD use the string"true"
when the field contains PII and the string"false"
to explicitly denote that the field does not contain PII. This key MAY be omitted in all other cases.openddl.io/sensitive
- Used to mark that a resource contains sensitive information and should be handled with care. For instance, downstream consumers, may opt to mask any value contained in this resource when printing logs. Schema Loaders authors SHOULD use the string"true"
when the field contains such information and the string"false"
to explicitly denote that the field does not contain it. This key MAY be omitted in all other cases.