Source Block
The source block defines external systems that provide data to your platform. Sources represent the origin points in your data architecture: the databases, APIs, streams, and files that contain raw data.
Purpose
Sources help you document:
- Which external systems provide data
- Where data originates
- System types and access patterns
- Connection details and characteristics
- Which entities map to which tables or endpoints
Syntax
source "SourceName" {
  type = "postgres" | "mysql" | "api" | "file" | "kafka" | "s3" | ...
  description = "Source description"
  # Connection details (optional)
  host = "hostname"
  database = "database_name"
  schema = "schema_name"
  # Metadata
  owner = "team-name"
  refresh_frequency = "hourly" | "daily" | "weekly" | "realtime"
  # Entity mappings (inline)
  entity "Domain.Entity" {
    table = "table_name" # For databases
    notes = "Optional notes"
  }
}
Source Types
Sources are categorized by their technology stack and access method.
Database Sources
Relational or NoSQL databases.
Common types: "postgres", "mysql", "mongodb", "snowflake", "bigquery"
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production PostgreSQL database"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"
  entity "Sales.Customer" {
    table = "users"
    notes = "Maps to users table with email field"
  }
  entity "Sales.Order" {
    table = "orders"
  }
}
API Sources
REST, GraphQL, or other web service endpoints.
Common types: "api", "rest", "graphql"
source "Stripe_API" {
  type = "api"
  description = "Stripe payment processing API"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"
  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment transactions from Stripe"
  }
  entity "Platform.Subscription" {
    endpoint = "subscriptions"
  }
}
Stream Sources
Event streams or message queues.
Common types: "kafka", "kinesis", "pubsub", "eventhub"
source "Events_Kafka" {
  type = "kafka"
  description = "User behavior events via Kafka"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"
  entity "Analytics.Event" {
    topic = "user_events"
    notes = "Real-time user behavior events"
  }
}
File Sources
Batch files or object storage.
Common types: "s3", "gcs", "file", "sftp"
source "Legacy_Exports" {
  type = "s3"
  description = "Daily CSV exports from legacy system"
  host = "s3://company-data-lake/legacy-exports/"
  owner = "legacy-team"
  refresh_frequency = "daily"
  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Legacy customer enrichment data"
  }
}
Attributes
Required Attributes
type
The type of source system.
- Type: String
- Required: Yes
- Common values: "postgres", "mysql", "api", "kafka", "s3", etc.
source "PostgreSQL_Production" {
  type = "postgres"
}
Optional Attributes
description
A clear description of what the source system is and what data it contains.
- Type: String
- Required: No (but highly recommended)
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database with transactional data"
}
host
Host or connection endpoint for the source system.
- Type: String
- Required: No
source "PostgreSQL_Production" {
  type = "postgres"
  host = "prod-db.company.com"
}
database
Database name (for database sources).
- Type: String
- Required: No
source "PostgreSQL_Production" {
  type = "postgres"
  database = "app_production"
}
schema
Schema name (for database sources).
- Type: String
- Required: No
source "PostgreSQL_Production" {
  type = "postgres"
  schema = "public"
}
owner
The team or individual responsible for the source system.
- Type: String
- Required: No (but highly recommended)
source "PostgreSQL_Production" {
  type = "postgres"
  owner = "platform-team"
}
refresh_frequency
How often the source data is updated or refreshed.
- Type: String
- Required: No
- Common values: "realtime", "hourly", "daily", "weekly"
source "PostgreSQL_Production" {
  type = "postgres"
  refresh_frequency = "hourly"
}
Entity Mappings
Sources contain entity mappings that connect conceptual entities to their physical location in the source system. Mappings are defined inline within source blocks.
Syntax:
entity "Domain.Entity" {
  table = "table_name" # For databases
  endpoint = "endpoint" # For APIs
  topic = "topic_name" # For streams
  path = "file_path" # For files
  notes = "Optional notes"
}
Example:
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database"
  entity "Sales.Customer" {
    table = "users"
    notes = "Maps to users table with email_address field"
  }
  entity "Sales.Order" {
    table = "orders"
    notes = "Order transactions with line items"
  }
}
Examples
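The examples that follow do not combine schema with entity mappings. A minimal sketch of a warehouse source that does, using only attributes documented above. The source name and the host, database, and schema values are hypothetical, and this page does not specify how table is resolved against schema, so treat the pairing as documentation rather than a lookup rule.

```mml
# Hypothetical warehouse source; all names and values are illustrative
source "Snowflake_Analytics" {
  type = "snowflake"
  description = "Analytics warehouse with curated sales tables"
  host = "company.snowflakecomputing.com"
  database = "analytics"
  schema = "sales"
  owner = "analytics-team"
  refresh_frequency = "daily"
  entity "Sales.Order" {
    table = "orders"
    notes = "Assumed to live in the sales schema"
  }
}
```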
Database Source
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production PostgreSQL database with transactional data"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"
  entity "Platform.Customer" {
    table = "users"
  }
  entity "Sales.Order" {
    table = "orders"
  }
  entity "Sales.OrderLine" {
    table = "order_items"
  }
}
API Source
source "Stripe_API" {
  type = "api"
  description = "Stripe payment processing API"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"
  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment transactions"
  }
  entity "Platform.Subscription" {
    endpoint = "subscriptions"
    notes = "Customer subscription data"
  }
  entity "Platform.Customer" {
    endpoint = "customers"
    notes = "Stripe customer records"
  }
}
Stream Source
source "Segment_Events" {
  type = "kafka"
  description = "User behavior events via Segment/Kafka"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"
  entity "Analytics.Event" {
    topic = "user_events"
    notes = "Real-time user behavior events"
  }
  entity "Analytics.PageView" {
    topic = "page_views"
    notes = "Page view tracking"
  }
}
File Source
source "Legacy_CRM" {
  type = "s3"
  description = "Daily customer data exports from legacy CRM"
  host = "s3://company-data/legacy-crm/"
  owner = "legacy-team"
  refresh_frequency = "daily"
  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Legacy customer enrichment data"
  }
  entity "Sales.Lead" {
    path = "leads/*.csv"
    notes = "Sales lead data"
  }
}
Complete Example
# Define conceptual model
domain "Sales" {
  color = "#e74c3c"
  description = "Customer sales and orders"
  entity "Customer" {
    type = "entity"
    description = "A customer who can place orders"
    pii = true
    attribute "email" {
      type = "string"
      required = true
      pii = true
    }
    attribute "lifetime_value" {
      type = "number"
      description = "Total revenue from this customer"
    }
    has-many "Sales.Order"
  }
  entity "Order" {
    type = "entity"
    description = "A customer order"
    attribute "amount" {
      type = "number"
      required = true
    }
    attribute "placed_at" {
      type = "date"
      required = true
    }
    belongs-to "Sales.Customer"
  }
}
# Map entities to their physical sources
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"
  entity "Sales.Customer" {
    table = "users"
    notes = "Primary customer data from users table"
  }
  entity "Sales.Order" {
    table = "orders"
    notes = "Order transactions"
  }
}
source "Stripe_API" {
  type = "api"
  description = "Stripe API for payment data"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"
  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment data from Stripe"
  }
}
source "Segment_Stream" {
  type = "kafka"
  description = "User behavior events via Segment"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"
  entity "Analytics.Event" {
    topic = "user_events"
    notes = "User behavior events"
  }
}
Best Practices
Document all sources: Track every external system providing data
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Main production database with transactional data"
}
Specify ownership: Assign clear responsibility for each source
source "PostgreSQL_Production" {
  type = "postgres"
  owner = "platform-team"
}
Include connection details: Document how to access sources (without secrets)
source "PostgreSQL_Production" {
  type = "postgres"
  host = "prod-db.company.com"
  database = "app_production"
}
Never include credentials: Keep secrets out of source definitions
// Good
source "PostgreSQL_Production" {
  host = "prod-db.company.com"
}
// Bad - DO NOT DO THIS
source "PostgreSQL_Production" {
  connection_string = "postgresql://user:password@prod-db.company.com"
}
Map entities inline: Use inline entity blocks to show table/endpoint mappings
source "PostgreSQL_Production" {
  type = "postgres"
  entity "Sales.Customer" {
    table = "users"
  }
  entity "Sales.Order" {
    table = "orders"
  }
}
Document refresh patterns: Specify data freshness and update frequency
source "PostgreSQL_Production" {
  type = "postgres"
  refresh_frequency = "hourly"
}
source "Legacy_Exports" {
  type = "s3"
  refresh_frequency = "daily"
}
Use appropriate types: Choose the source type that matches the technology
// Use postgres for PostgreSQL databases
source "Production_DB" {
  type = "postgres"
}
// Use api for REST/GraphQL services
source "Stripe" {
  type = "api"
}
// Use kafka for event streams
source "Events" {
  type = "kafka"
}
// Use s3 for file storage
source "Data_Lake" {
  type = "s3"
}
Add notes to mappings: Explain anything non-obvious about the mapping
entity "Sales.Customer" {
  table = "users"
  notes = "Maps to users table with email_address field, requires deduplication"
}
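Notes matter most when one entity is mapped from more than one source, as Sales.Customer is on this page (a database table and a set of CSV exports). A minimal sketch of that case, reusing source names from the earlier examples. The notes text is hypothetical, and this page does not say whether the tool applies any precedence between the two mappings, so the notes are the only place that role is recorded.

```mml
# Same conceptual entity mapped from two sources; notes text is illustrative
source "PostgreSQL_Production" {
  type = "postgres"
  entity "Sales.Customer" {
    table = "users"
    notes = "Primary customer record"
  }
}
source "Legacy_Exports" {
  type = "s3"
  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Enrichment data only, not the primary record"
  }
}
```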
