Source Block

The source block defines the external systems that provide data to your platform. Sources represent the origin points in your data architecture: the databases, APIs, streams, and files that contain raw data.

Purpose

Sources help you document:

  • What external systems provide data
  • Where data originates
  • System types and access patterns
  • Connection details and characteristics
  • Which entities map to which tables or endpoints

Syntax

mml
source "SourceName" {
  type = "postgres" | "mysql" | "api" | "file" | "kafka" | "s3" | ...
  description = "Source description"

  # Connection details (optional)
  host = "hostname"
  database = "database_name"
  schema = "schema_name"

  # Metadata
  owner = "team-name"
  refresh_frequency = "hourly" | "daily" | "weekly" | "realtime"

  # Entity mappings (inline)
  entity "Domain.Entity" {
    table = "table_name"  # For databases
    notes = "Optional notes"
  }
}

Source Types

Sources are categorized by their technology stack and access method.

Database Sources

Relational databases, NoSQL databases, and cloud data warehouses.

Common types: "postgres", "mysql", "mongodb", "snowflake", "bigquery"

mml
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production PostgreSQL database"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"

  entity "Sales.Customer" {
    table = "users"
    notes = "Maps to users table with email field"
  }

  entity "Sales.Order" {
    table = "orders"
  }
}

API Sources

REST, GraphQL, or other web service endpoints.

Common types: "api", "rest", "graphql"

mml
source "Stripe_API" {
  type = "api"
  description = "Stripe payment processing API"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"

  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment transactions from Stripe"
  }

  entity "Platform.Subscription" {
    endpoint = "subscriptions"
  }
}
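
The list of common types also includes "graphql", which none of the examples on this page exercise. A minimal sketch, assuming a GraphQL source is declared the same way as an "api" source and that `endpoint` is an acceptable mapping key for it. The source name, host, and endpoint value are hypothetical:

```mml
# Hypothetical GraphQL source; name and host are illustrative only
source "Storefront_GraphQL" {
  type = "graphql"
  description = "Storefront order data exposed over GraphQL"
  host = "graphql.company.com"
  owner = "platform-team"
  refresh_frequency = "hourly"

  entity "Sales.Order" {
    endpoint = "orders"  # Assumed: endpoint names the queried resource
    notes = "Orders fetched through the GraphQL service"
  }
}
```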

Stream Sources

Event streams or message queues.

Common types: "kafka", "kinesis", "pubsub", "eventhub"

mml
source "Events_Kafka" {
  type = "kafka"
  description = "User behavior events via Kafka"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"

  entity "Analytics.Event" {
    topic = "user_events"
    notes = "Real-time user behavior events"
  }
}

File Sources

Batch files or object storage.

Common types: "s3", "gcs", "file", "sftp"

mml
source "Legacy_Exports" {
  type = "s3"
  description = "Daily CSV exports from legacy system"
  host = "s3://company-data-lake/legacy-exports/"
  owner = "legacy-team"
  refresh_frequency = "daily"

  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Legacy customer enrichment data"
  }
}

Attributes

Required Attributes

type

The type of source system.

  • Type: String
  • Required: Yes
  • Common values: "postgres", "mysql", "api", "kafka", "s3", etc.

mml
source "PostgreSQL_Production" {
  type = "postgres"
}

Optional Attributes

description

A clear description of what the source system is and what data it contains.

  • Type: String
  • Required: No (but highly recommended)

mml
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database with transactional data"
}

host

Host or connection endpoint for the source system, such as a hostname, a host:port pair, or a bucket URI. Do not include credentials in this value.

  • Type: String
  • Required: No

mml
source "PostgreSQL_Production" {
  type = "postgres"
  host = "prod-db.company.com"
}

database

Database name (for database sources).

  • Type: String
  • Required: No

mml
source "PostgreSQL_Production" {
  type = "postgres"
  database = "app_production"
}

schema

Schema name (for database sources).

  • Type: String
  • Required: No

mml
source "PostgreSQL_Production" {
  type = "postgres"
  schema = "public"
}
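
Because a schema qualifies tables within a database, `database` and `schema` are usually set together. The sketch below combines the two with an entity mapping. It assumes, which this reference does not state, that `table` is resolved relative to the declared schema:

```mml
source "PostgreSQL_Production" {
  type = "postgres"
  host = "prod-db.company.com"
  database = "app_production"
  schema = "public"

  entity "Sales.Customer" {
    table = "users"  # Assumed to resolve to public.users
  }
}
```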

owner

The team or individual responsible for the source system.

  • Type: String
  • Required: No (but highly recommended)

mml
source "PostgreSQL_Production" {
  type = "postgres"
  owner = "platform-team"
}

refresh_frequency

How often the source data is updated or refreshed.

  • Type: String
  • Required: No
  • Common values: "realtime", "hourly", "daily", "weekly"

mml
source "PostgreSQL_Production" {
  type = "postgres"
  refresh_frequency = "hourly"
}

Entity Mappings

Sources contain entity mappings that connect conceptual entities to their physical location in the source system. Mappings are defined inline within source blocks.

Syntax:

mml
entity "Domain.Entity" {
  table = "table_name"      # For databases
  endpoint = "endpoint"     # For APIs
  topic = "topic_name"      # For streams
  path = "file_path"        # For files
  notes = "Optional notes"
}

Example:

mml
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database"

  entity "Sales.Customer" {
    table = "users"
    notes = "Maps to users table with email_address field"
  }

  entity "Sales.Order" {
    table = "orders"
    notes = "Order transactions with line items"
  }
}
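
A single conceptual entity can appear in more than one source, as `Sales.Customer` does in the database and file examples on this page. The sketch below shows that pattern using names from those examples. The mapping itself does not say how the two sources are reconciled, so the `notes` fields carry that context:

```mml
source "PostgreSQL_Production" {
  type = "postgres"

  entity "Sales.Customer" {
    table = "users"
    notes = "Primary customer records"
  }
}

source "Legacy_Exports" {
  type = "s3"

  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Legacy enrichment data; supplements the users table"
  }
}
```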

Examples

Database Source

mml
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production PostgreSQL database with transactional data"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"

  entity "Platform.Customer" {
    table = "users"
  }

  entity "Sales.Order" {
    table = "orders"
  }

  entity "Sales.OrderLine" {
    table = "order_items"
  }
}

API Source

mml
source "Stripe_API" {
  type = "api"
  description = "Stripe payment processing API"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"

  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment transactions"
  }

  entity "Platform.Subscription" {
    endpoint = "subscriptions"
    notes = "Customer subscription data"
  }

  entity "Platform.Customer" {
    endpoint = "customers"
    notes = "Stripe customer records"
  }
}

Stream Source

mml
source "Segment_Events" {
  type = "kafka"
  description = "User behavior events via Segment/Kafka"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"

  entity "Analytics.Event" {
    topic = "user_events"
    notes = "Real-time user behavior events"
  }

  entity "Analytics.PageView" {
    topic = "page_views"
    notes = "Page view tracking"
  }
}

File Source

mml
source "Legacy_CRM" {
  type = "s3"
  description = "Daily customer data exports from legacy CRM"
  host = "s3://company-data/legacy-crm/"
  owner = "legacy-team"
  refresh_frequency = "daily"

  entity "Sales.Customer" {
    path = "customers/*.csv"
    notes = "Legacy customer enrichment data"
  }

  entity "Sales.Lead" {
    path = "leads/*.csv"
    notes = "Sales lead data"
  }
}

Complete Example

mml
# Define conceptual model
domain "Sales" {
  color = "#e74c3c"
  description = "Customer sales and orders"

  entity "Customer" {
    type = "entity"
    description = "A customer who can place orders"
    pii = true

    attribute "email" {
      type = "string"
      required = true
      pii = true
    }

    attribute "lifetime_value" {
      type = "number"
      description = "Total revenue from this customer"
    }

    has-many "Sales.Order"
  }

  entity "Order" {
    type = "entity"
    description = "A customer order"

    attribute "amount" {
      type = "number"
      required = true
    }

    attribute "placed_at" {
      type = "date"
      required = true
    }

    belongs-to "Sales.Customer"
  }
}

# Map conceptual entities to their physical sources
source "PostgreSQL_Production" {
  type = "postgres"
  description = "Production database"
  host = "prod-db.company.com"
  database = "app_production"
  owner = "platform-team"
  refresh_frequency = "hourly"

  entity "Sales.Customer" {
    table = "users"
    notes = "Primary customer data from users table"
  }

  entity "Sales.Order" {
    table = "orders"
    notes = "Order transactions"
  }
}

source "Stripe_API" {
  type = "api"
  description = "Stripe API for payment data"
  host = "api.stripe.com"
  owner = "finance-team"
  refresh_frequency = "hourly"

  entity "Sales.Payment" {
    endpoint = "charges"
    notes = "Payment data from Stripe"
  }
}

source "Segment_Stream" {
  type = "kafka"
  description = "User behavior events via Segment"
  host = "events.company.com:9092"
  owner = "analytics-team"
  refresh_frequency = "realtime"

  entity "Analytics.Event" {
    topic = "user_events"
    notes = "User behavior events"
  }
}
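
The Stripe and Kafka sources in the example above map `Sales.Payment` and `Analytics.Event`, which the `domain "Sales"` block does not define. Assuming every mapped entity is expected to exist in the conceptual model, the missing definitions might be sketched as follows. The attribute names, descriptions, and color value are hypothetical:

```mml
# Hypothetical additions to complete the conceptual model.
# In a real file, the Payment entity would go inside the existing
# domain "Sales" block; it is wrapped here only for context.
domain "Sales" {
  entity "Payment" {
    type = "entity"
    description = "A payment received for an order"

    attribute "amount" {
      type = "number"
      required = true
    }
  }
}

domain "Analytics" {
  color = "#3498db"
  description = "User behavior tracking"

  entity "Event" {
    type = "entity"
    description = "A user behavior event"
  }
}
```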

Best Practices

  1. Document all sources: Track every external system providing data

    mml
    source "PostgreSQL_Production" {
      type = "postgres"
      description = "Main production database with transactional data"
    }
  2. Specify ownership: Assign clear responsibility for each source

    mml
    source "PostgreSQL_Production" {
      type = "postgres"
      owner = "platform-team"
    }
  3. Include connection details: Document how to access sources (without secrets)

    mml
    source "PostgreSQL_Production" {
      type = "postgres"
      host = "prod-db.company.com"
      database = "app_production"
    }
  4. Never include credentials: Keep secrets out of source definitions

    mml
    # Good
    source "PostgreSQL_Production" {
      type = "postgres"
      host = "prod-db.company.com"
    }
    
    # Bad - DO NOT DO THIS
    source "PostgreSQL_Production" {
      type = "postgres"
      connection_string = "postgresql://user:password@prod-db.company.com"
    }
  5. Map entities inline: Use inline entity blocks to show table/endpoint mappings

    mml
    source "PostgreSQL_Production" {
      type = "postgres"
    
      entity "Sales.Customer" {
        table = "users"
      }
    
      entity "Sales.Order" {
        table = "orders"
      }
    }
  6. Document refresh patterns: Specify data freshness and update frequency

    mml
    source "PostgreSQL_Production" {
      type = "postgres"
      refresh_frequency = "hourly"
    }
    
    source "Legacy_Exports" {
      type = "s3"
      refresh_frequency = "daily"
    }
  7. Use appropriate types: Choose the source type that matches the technology

    mml
    # Use postgres for PostgreSQL databases
    source "Production_DB" { type = "postgres" }
    
    # Use api for REST/GraphQL services
    source "Stripe" { type = "api" }
    
    # Use kafka for event streams
    source "Events" { type = "kafka" }
    
    # Use s3 for file storage
    source "Data_Lake" { type = "s3" }
  8. Add notes to mappings: Explain anything non-obvious about the mapping

    mml
    entity "Sales.Customer" {
      table = "users"
      notes = "Maps to users table with email_address field, requires deduplication"
    }

Released under the MIT License.