Skip to main content

OpenDDL Specification

The current version of the specification is v1alpha.

1. Introduction

This document specifies the OpenDDL API, a unified interface for Object-Relational Mapping (ORM) frameworks to export an application's database schema and metadata about it. The objective is to foster interoperability with various tools involved in schema management, database compliance, data catalogs, and data pipelines.

2. Conventions used in this document

The keyword "SHOULD" is used to indicate that among several possibilities, one is recommended as particularly suitable, without implying a requirement. Optional features are described using "MAY" and required features with "MUST".

3. Overview

The OpenDDL API enables ORMs to directly expose the database schema, leveraging the database's Data Definition Language (DDL). This facilitates direct and efficient consumption of schema metadata by downstream tools, maintaining accuracy and relevance.

4. Architecture

The OpenDDL API is designed to be a simple, lightweight, and extensible interface for exporting database schema metadata. It is not a standalone tool, but rather a specification the ORMs or other tools can implement to generate OpenDDL documents. This section outlines the different components involving the OpenDDL API.

4.1 ORMs (and other Data Access Libraries)

ORMs (and other data access libraries) are software components that help software develoeprs define their data model in the native programming language and interact with the database without directly writing SQL queries. This data model is always translated into a specific database schema that is required by the application to function.

4.2 Target Databases

ORMs try to abstract the database schema from the application developer, but in practice, during runtime the application always runs against a concrete target database. The target database is the database that the application is designed to run against. Certain applications utilize ORMs in order to build software that can run against multiple database engine types, in which case they are seen by OpenDDL as having multiple target databases.

4.3 Application Schemas

The application schema is the data model defined by the ORM in the native programming language when translated into the target database schema. As databases vary in their DDL (data definition language) implementations and syntax, the schema is specified in the DDL of the target database.

4.4 Schema Loaders

Schema Loaders are the key components within the OpenDDL architecture. Their primary function is to introspect the application data model as it is natively defined within an ORM-backed application's source code. Following this introspection, they are responsible for generating and emitting a valid JSON document that adheres to the OpenDDL JSONSchema specification. This document represents the application's database schema in a standardized format, facilitating further actions by downstream tools and processes.

The OpenDDL JSON document schema is described in Section 5.

4.5 Schema Consumers

Schema Consumers act upon the output generated by Schema Loaders, interpreting and parsing the JSON document for various purposes. These may include, but are not limited to, validating schema correctness, integrating with data cataloging services, or facilitating schema migrations. Consumers must possess the capability to understand and process the spec.ddl field, which contains the Application Schema in the Target Database's DDL, ensuring the accuracy and integrity of the interpreted schema.

4.5 Annotations

Annotations within the OpenDDL framework serve as a means to enrich database schema definitions with additional metadata, surpassing the limitations of standard SQL annotation capabilities. These annotations cater to a wide array of needs, including but not limited to, data governance and compliance, security protocols, code ownership, and lineage tracking.

The specification defines a set of standard annotations to foster uniformity and facilitate interoperability across different tools and platforms within the ecosystem. Schema Loader authors SHOULD employ these standardized annotations wherever applicable.

Annotations can be provided in two forms:

  • Document-level - annotations that describe the Application Schema as a whole. This may include, the source ORM, an identification of the Schema Loader generating the document or the team owning the schema.
  • Resource-level - annotations that describe attributes a specific resource in the Application Schema. This may include, annotations describing certain fields as containing Personally Identifiable Information (PII) or sensitive data that should be treated with caution.

5. OpenDDL Schema

The OpenDDL JSONSchema is a specification that defines the structure of the JSON document output by the OpenDDL tool. The full schema definition is available in Appendix A. This section is relevant for authors of Schema Loaders who are responsible for generating OpenDDL JSON documents as well as Schema Consumers who need to interpret and process the output.

5.1 Schema Structure

The OpenDDL JSON document comprises the following key components:

  • apiVersion: The version of the OpenDDL specification adhered to by the document.
  • kind: The type of resource described in the document.
  • metadata: Metadata about the schema, including the database dialect and top-level annotations.
  • spec: The schema specification, including the DDL and annotations.

5.1 apiVersion and kind

The apiVersion field specifies the version of the OpenDDL specification that the document adheres to. The kind field indicates the type of resource described in the document. Both fields are required and MUST be present in the output.

5.2 metadata

The metadata field contains information about the schema, including the target database dialect and top-level annotations. The dialect field specifies the database dialect the schema is designed for. It is a required field and MUST be included in the output.

5.3.1 dialect

The dialect must be one of the following list:

  • mysql
  • mariadb
  • postgresql
  • sqlite
  • sqlserver
  • clickhouse
  • redshift
  • oracle
  • libsql
  • snowflake
  • cockroachdb
  • tidb

5.3.2 annotations

The annotations field contains Document-Level annotations, and is an object comprised of key-value pairs of annotations describing the entire OpenDDL document.

Annotations keys MUST adhere to the following format: namespace/key. Namespaces MUST be formatted as a FQDN, and keys MUST be alphanumeric with underscores and hyphens allowed. The OpenDDL specification reserves the openddl.io namespace for standard annotations.

For a complete list of standard annotations, refer to Appendix B.

5.3 spec

The spec field contains the schema specification, including the DDL and annotations.

5.3.1 ddl

The ddl field is a string containing the Application Schema as expressed in the DDL of the target database. It is a required field and MUST be included in the output.

5.3.2 annotations

The annotations field is an array of objects, each containing a selector and properties. The selector object specifies the scope of the annotation, and the properties object contains key-value pairs of annotations.

5.3.2.1 selector

To support the diversity of data models existing between different database implementations, the selector object is flexible and MAY contain any number of key-value pairs. However, Schema Loader authors SHOULD adhere to the following conventions when defining selectors keys where applicable:

  • schema: Many database systems support namespacing resources into multiple schemas in a single database. The schema name should be specified if the . If the described application defines all resources in a single database schema, this field MAY be omitted.
  • table: The name of the table to which the annotation applies.
  • column: The name of the column to which the annotation applies. If the annotation applies to the entire table, this field MAY be omitted. If the column field is provided, the table field SHOULD also be provided.

The values of the selector keys are strings and SHOULD adhere to the following conventions:

  • The value of the field should be the exact name of the schema, table, column or other resource in the target database.
  • The value SHOULD be case-sensitive and match the exact name of the resource in the target database.
  • Wildcards ("*" character) are NOT supported in the selector values in the current version of the OpenDDL specification.

6. Security Considerations

Implementers SHOULD be aware of the security implications of exposing database schema metadata. Access to the OpenDDL tool and its output SHOULD be restricted to prevent unauthorized disclosure of potentially sensitive information.

Appendix A: OpenDDL JSONSchema

{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://openddl.io/schema.json",
"title": "OpenDDL Document Schema",
"type": "object",
"properties": {
"apiVersion": {
"type": "string",
"description": "The API version of the document",
"const": "openddl.io/v1alpha"
},
"kind": {
"type": "string",
"kind": "The kind of resource described in the document",
"const": "Schema"
},
"metadata": {
"type": "object",
"properties": {
"dialect": {
"type": "string",
"enum": [
"mysql",
"postgresql",
"sqlserver",
"sqlite",
"oracle",
"other"
]
},
"annotations": {
"$ref": "#/$defs/schemaAnnotations"
}
},
"required": [
"dialect"
],
"additionalProperties": false
},
"spec": {
"type": "object",
"properties": {
"ddl": {
"type": "string"
},
"annotations": {
"type": "array",
"items": {
"type": "object",
"properties": {
"selector": {
"$ref": "#/$defs/selector"
},
"properties": {
"$ref": "#/$defs/objectAnnotations"
}
},
"required": [
"selector",
"properties"
],
"additionalProperties": false
}
}
},
"required": [
"ddl",
"annotations"
],
"additionalProperties": false
}
},
"required": [
"apiVersion",
"kind",
"metadata",
"spec"
],
"additionalProperties": false,
"$defs": {
"annotationKey": {
"patternProperties": {
"^[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
}
},
"schemaAnnotations": {
"type": "object",
"patternProperties": {
"^openddl\\.io/(owner|orm|schema-loader)$": {
"type": "string"
},
"^(?!openddl\\.io/)[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
},
"additionalProperties": false
},
"objectAnnotations": {
"type": "object",
"patternProperties": {
"^openddl\\.io/(owner|pii|sensitive)$": {
"type": "string"
},
"^(?!openddl\\.io/)[a-zA-Z0-9.-]+/[a-zA-Z0-9_-]+$": {
"type": "string"
}
},
"additionalProperties": false
},
"selector": {
"type": "object",
"additionalProperties": {
"type": "string"
},
"minProperties": 1
}
}
}

Appendix B: Standard Annotations

Document-level annotations

  • openddl.io/orm - An identifier for the source ORM of the Application Schema. Schema Loader authors SHOULD use the Project Repository URL without the scheme where applicable. For example, For example: github.com/ent/ent
  • openddl.io/schema-loader - An identifier for the tool generating this document. Schema Loader authors SHOULD use the Project Repository URL (without the scheme) where applicable. For example: github.com/openddl/schema-loader-django If the schema-loader is built-in to the ORM/framework authors SHOULD use builtin.
  • openddl.io/owner - Organization that assign specific code owners to schemas may use this annotation to denote the owner of the Application Schema.

Resource-level annotations

  • openddl.io/owner - Organization that assign specific code owners to schemas may use this annotation to denote the owner of the Application Schema.
  • openddl.io/pii - This annotation is used to denote that a specific resource contains Personally Identifiable Information (PII). Schema Loaders authors SHOULD use the string "true" when the field contains PII and the string "false" to explicitly denote that the field does not contain PII. This key MAY be omitted in all other cases.
  • openddl.io/sensitive - Used to mark that a resource contains sensitive information and should be handled with care. For instance, downstream consumers, may opt to mask any value contained in this resource when printing logs. Schema Loaders authors SHOULD use the string "true" when the field contains such information and the string "false" to explicitly denote that the field does not contain it. This key MAY be omitted in all other cases.