Business Glossary
This plugin ingests business glossary metadata from a YAML-formatted file. An example file is available in the examples directory.
CLI based Ingestion
Install the Plugin
pip install 'acryl-datahub[datahub-business-glossary]'
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: datahub-business-glossary
  config:
    # Coordinates
    file: /path/to/business_glossary_yaml
    enable_auto_id: true # recommended to set to true so datahub will auto-generate guids from your term names
# sink configs if needed
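To make the starter recipe complete, a sink section can be added. The following is a minimal sketch that assumes a DataHub instance reachable over REST; the server address is a placeholder for a local deployment, not a value taken from this guide.

```yaml
source:
  type: datahub-business-glossary
  config:
    file: /path/to/business_glossary_yaml
    enable_auto_id: true

sink:
  type: datahub-rest
  config:
    # Assumption: DataHub is running locally; replace with your own endpoint.
    server: "http://localhost:8080"
```

Assuming the recipe is saved as recipe.yml (a hypothetical filename), it can be run with `datahub ingest -c recipe.yml`.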
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| file | One of string, string(path) | Yes | | File path or URL to business glossary file to ingest. |
| enable_auto_id | boolean | No | False | Generate guid urns instead of a plaintext path urn with the node/term's hierarchy. |
The JSONSchema for this configuration is inlined below.
{
  "title": "BusinessGlossarySourceConfig",
  "type": "object",
  "properties": {
    "file": {
      "title": "File",
      "description": "File path or URL to business glossary file to ingest.",
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "string",
          "format": "path"
        }
      ]
    },
    "enable_auto_id": {
      "title": "Enable Auto Id",
      "description": "Generate guid urns instead of a plaintext path urn with the node/term's hierarchy.",
      "default": false,
      "type": "boolean"
    }
  },
  "required": [
    "file"
  ],
  "additionalProperties": false
}
Business Glossary File Format
The business glossary source file should be a .yml file with the following top-level keys:
Glossary: the top level keys of the business glossary file
- version: the version of the business glossary file format that the file conforms to. Currently the only released version is 1.
- source: the source format of the terms. Currently only DataHub is supported.
- owners: owners contains two nested fields
  - users: (optional) a list of user ids
  - groups: (optional) a list of group ids
- url: (optional) external url pointing to where the glossary is defined externally, if applicable.
- nodes: (optional) list of child GlossaryNode objects
- terms: (optional) list of child GlossaryTerm objects
 
GlossaryNode: a container of GlossaryNode and GlossaryTerm objects
- name: name of the node
- description: description of the node
- id: (optional) identifier of the node (normally inferred from the name, see enable_auto_id config. Use this if you need a stable identifier)
- owners: (optional) owners contains two nested fields
  - users: (optional) a list of user ids
  - groups: (optional) a list of group ids
- terms: (optional) list of child GlossaryTerm objects
- nodes: (optional) list of child GlossaryNode objects
 
GlossaryTerm: a term in your business glossary
- name: name of the term
- description: description of the term
- id: (optional) identifier of the term (normally inferred from the name, see enable_auto_id config. Use this if you need a stable identifier)
- owners: (optional) owners contains two nested fields
  - users: (optional) a list of user ids
  - groups: (optional) a list of group ids
- term_source: One of EXTERNAL or INTERNAL. Whether the term is coming from an external glossary or one defined in your organization.
- source_ref: (optional) If external, what is the name of the source the glossary term is coming from?
- source_url: (optional) If external, what is the url of the source definition?
- inherits: (optional) List of GlossaryTerm that this term inherits from
- contains: (optional) List of GlossaryTerm that this term contains
- custom_properties: A map of key/value pairs of arbitrary custom properties
- domain: (optional) domain name or domain urn
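The keys described above can be combined into a small glossary file. The following is an illustrative sketch, not the checked-in example: all names, ids, urls, and property values are hypothetical, and referencing an inherited term by its dotted node path (Classification.Sensitive) is an assumption about how term references are resolved.

```yaml
version: 1
source: DataHub
owners:
  users:
    - jdoe                               # hypothetical user id
url: "https://example.com/glossary"      # hypothetical external url
nodes:
  - name: Classification
    description: Terms related to data classification
    terms:
      - name: Sensitive
        description: Data that must be handled with care
        term_source: INTERNAL
        custom_properties:
          review_cycle: annual           # arbitrary key/value pair
  - name: PersonalInformation
    description: Terms related to personal information
    owners:
      groups:
        - privacy-team                   # hypothetical group id
    terms:
      - name: Email
        description: An individual's email address
        term_source: INTERNAL
        inherits:
          - Classification.Sensitive     # assumption: dotted path to the parent term
        domain: Marketing                # hypothetical domain name
      - name: Account
        description: A record of a business arrangement with a customer
        term_source: EXTERNAL
        source_ref: ExampleStandard      # hypothetical external source name
        source_url: "https://example.com/standard/account"
```

Here the two top-level nodes act as containers, and each term sits under exactly one node. The optional id key is left out so that identifiers are inferred from the names.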
 
You can also view an example business glossary file checked in here.
Compatibility
Compatible with version 1 of the business glossary format. The source will evolve as newer versions of this format are published.
Code Coordinates
- Class Name: datahub.ingestion.source.metadata.business_glossary.BusinessGlossaryFileSource
- Browse on GitHub
 
Questions
If you've got any questions on configuring ingestion for Business Glossary, feel free to ping us on our Slack.