Skip to content

Logical Modeling

The logical layer bridges the gap between your conceptual business model and physical data sources. It defines where data comes from and how it maps to your conceptual entities.

Core Principles

1. Source-First Mapping

Logical models connect conceptual entities to real data sources:

  • Define each data source (database, API, stream, file)
  • Map source tables/endpoints to conceptual entities
  • Document transformation logic and business rules
  • Handle multiple sources for the same entity

2. Independence from Physical Schema

Focus on logical connections, not implementation details:

  • Abstract away database-specific syntax
  • Define mappings in terms of business concepts
  • Allow flexibility in physical implementation
  • Support multiple source systems for one entity

3. Data Lineage

Track data flow from source to consumption:

  • Document origin of each field
  • Capture transformation steps
  • Enable impact analysis
  • Support data governance

Components

Sources

A source represents an external system providing data to your platform.

Source Types:

  1. Database - Relational or NoSQL databases

    • Example: PostgreSQL, MySQL, MongoDB
    • Characteristics: Structured, queryable, transactional
    • Access pattern: SQL queries or database-specific APIs
  2. API - REST, GraphQL, or other web services

    • Example: Stripe API, Salesforce API, custom microservices
    • Characteristics: Request/response, rate-limited, versioned
    • Access pattern: HTTP requests with authentication
  3. Stream - Event streams or message queues

    • Example: Kafka, Kinesis, Pub/Sub
    • Characteristics: Real-time, append-only, high-volume
    • Access pattern: Consumer subscriptions
  4. File - Batch files or object storage

    • Example: S3, GCS, SFTP
    • Characteristics: Batch-oriented, scheduled, varied formats
    • Access pattern: File readers, periodic ingestion

MML Syntax:

mml
logical_model "my_platform" {
  source "production_db" {
    type = "database"
    description = "Main PostgreSQL production database"
  }

  source "stripe_api" {
    type = "api"
    description = "Stripe payment processing API"
  }

  source "events_stream" {
    type = "stream"
    description = "User behavior events from Kafka"
  }

  source "csv_exports" {
    type = "file"
    description = "Daily CSV exports from legacy system"
  }
}

Entity Mappings

Entity mappings connect conceptual entities to source tables or endpoints.

Purpose:

  • Link conceptual entities to physical data
  • Define which source provides each entity's data
  • Support multiple sources for a single entity (entity resolution)
  • Document table/endpoint names

MML Syntax:

mml
logical_model "e_commerce" {
  source "production_db" {
    type = "database"
    description = "PostgreSQL production database"
  }

  source "crm_api" {
    type = "api"
    description = "Salesforce CRM API"
  }

  # Map conceptual entity to database table
  entity_mapping "platform.customer" {
    source = "production_db"
    table = "users"
    description = "Customer data from main users table"
  }

  # Map same entity to additional source (entity resolution)
  entity_mapping "platform.customer" {
    source = "crm_api"
    endpoint = "contacts"
    description = "Additional customer data from CRM"
  }

  # Map order entity
  entity_mapping "sales.order" {
    source = "production_db"
    table = "orders"
    description = "Order transactions"
  }
}

Attributes and Transformations

Attributes define how fields map from source to conceptual entity, including transformations.

Transformation Types:

  1. Direct mapping - Field exists in source as-is

    mml
    attribute "email" {
      source_field = "email_address"
    }
  2. Type conversion - Change data type

    mml
    attribute "created_at" {
      source_field = "created_timestamp"
      transformation = "CAST(created_timestamp AS TIMESTAMP)"
    }
  3. Calculation - Derive from multiple fields

    mml
    attribute "full_name" {
      source_fields = ["first_name", "last_name"]
      transformation = "CONCAT(first_name, ' ', last_name)"
    }
  4. Lookup/Join - Reference data from another source

    mml
    attribute "country_name" {
      source_field = "country_code"
      transformation = "JOIN countries ON country_code = countries.code"
    }

MML Syntax:

mml
entity_mapping "platform.customer" {
  source = "production_db"
  table = "users"

  attribute "customer_id" {
    source_field = "id"
    description = "Primary key"
  }

  attribute "email" {
    source_field = "email_address"
    description = "User's email"
  }

  attribute "full_name" {
    source_fields = ["first_name", "last_name"]
    transformation = "CONCAT(first_name, ' ', last_name)"
    description = "Concatenated full name"
  }

  attribute "created_at" {
    source_field = "created_timestamp"
    transformation = "CAST(created_timestamp AS TIMESTAMP)"
    description = "Account creation timestamp"
  }

  attribute "lifetime_value" {
    source_field = "ltv_cents"
    transformation = "ltv_cents / 100.0"
    description = "LTV in dollars (converted from cents)"
  }
}

Relationships

Logical relationships define foreign keys and join conditions between source tables.

Relationship Patterns:

  1. Simple foreign key

    mml
    relationship "order_to_customer" {
      from = "sales.order"
      to = "platform.customer"
      foreign_key = "customer_id"
      description = "Each order belongs to one customer"
    }
  2. Composite key

    mml
    relationship "line_item_to_order" {
      from = "sales.line_item"
      to = "sales.order"
      foreign_keys = ["order_id", "order_version"]
      description = "Line items reference versioned orders"
    }
  3. Cross-source relationship

    mml
    relationship "customer_to_crm_contact" {
      from = "platform.customer" # from production_db
      to = "platform.customer"   # to crm_api
      join_condition = "users.email = contacts.email"
      description = "Match customers across systems via email"
    }

MML Syntax:

mml
logical_model "e_commerce" {
  # ... sources and mappings ...

  relationship "order_to_customer" {
    from = "sales.order"
    to = "platform.customer"
    foreign_key = "customer_id"
    description = "FK: orders.customer_id -> users.id"
  }

  relationship "payment_to_order" {
    from = "sales.payment"
    to = "sales.order"
    foreign_key = "order_id"
    description = "FK: payments.order_id -> orders.id"
  }
}

Best Practices

1. Document Source Details

Provide context for each data source:

  • Source system name and type
  • Connection details (without credentials!)
  • Update frequency and latency
  • Data quality expectations
  • Source owner/team

2. Use Clear Naming

  • Source names: Match system names (production_db, stripe_api)
  • Table/endpoint names: Use exact names from source
  • Attribute names: Use conceptual entity field names (not source field names)

3. Handle Entity Resolution

When one conceptual entity comes from multiple sources:

  • Create separate entity_mapping for each source
  • Document the "golden source" (primary/authoritative)
  • Define merge logic for conflicting values
  • Use relationships to link source records

4. Abstract Transformations

Keep logical layer transformations simple:

  • Basic type conversions: OK
  • Simple calculations: OK
  • Complex business logic: Move to physical/product layer
  • Source-specific syntax: Abstract when possible

5. Track Data Lineage

Document the flow from source to entity:

  • Which source provides which attributes
  • What transformations are applied
  • Dependencies between entities
  • Update patterns and refresh logic

Example: E-commerce Platform

mml
logical_model "e_commerce_platform" {
  # Define data sources
  source "production_db" {
    type = "database"
    description = "PostgreSQL production database with transactional data"
  }

  source "stripe_api" {
    type = "api"
    description = "Stripe API for payment and subscription data"
  }

  source "segment_stream" {
    type = "stream"
    description = "User behavior events via Segment/Kafka"
  }

  # Map Customer entity to production database
  entity_mapping "platform.customer" {
    source = "production_db"
    table = "users"
    description = "Primary customer data from users table"

    attribute "customer_id" {
      source_field = "id"
    }

    attribute "email" {
      source_field = "email"
    }

    attribute "full_name" {
      source_fields = ["first_name", "last_name"]
      transformation = "CONCAT(first_name, ' ', last_name)"
    }

    attribute "created_at" {
      source_field = "created_at"
    }

    attribute "account_status" {
      source_field = "status"
      transformation = "CASE WHEN status = 1 THEN 'active' WHEN status = 0 THEN 'inactive' END"
    }
  }

  # Map Subscription entity to Stripe API
  entity_mapping "platform.subscription" {
    source = "stripe_api"
    endpoint = "subscriptions"
    description = "Subscription data from Stripe"

    attribute "subscription_id" {
      source_field = "id"
    }

    attribute "customer_id" {
      source_field = "customer"
      description = "Links to Stripe customer ID"
    }

    attribute "plan_id" {
      source_field = "items.data[0].plan.id"
      description = "First plan item in subscription"
    }

    attribute "status" {
      source_field = "status"
    }

    attribute "current_period_start" {
      source_field = "current_period_start"
      transformation = "TIMESTAMP_SECONDS(current_period_start)"
    }

    attribute "mrr" {
      source_field = "plan.amount"
      transformation = "plan.amount / 100.0"
      description = "Monthly recurring revenue in dollars"
    }
  }

  # Map Order entity to production database
  entity_mapping "sales.order" {
    source = "production_db"
    table = "orders"
    description = "Order transactions"

    attribute "order_id" {
      source_field = "id"
    }

    attribute "customer_id" {
      source_field = "user_id"
      description = "FK to users table"
    }

    attribute "order_date" {
      source_field = "created_at"
    }

    attribute "total_amount" {
      source_field = "total_cents"
      transformation = "total_cents / 100.0"
    }

    attribute "status" {
      source_field = "order_status"
    }
  }

  # Map Payment entity to Stripe API
  entity_mapping "sales.payment" {
    source = "stripe_api"
    endpoint = "charges"
    description = "Payment transactions from Stripe"

    attribute "payment_id" {
      source_field = "id"
    }

    attribute "amount" {
      source_field = "amount"
      transformation = "amount / 100.0"
    }

    attribute "currency" {
      source_field = "currency"
      transformation = "UPPER(currency)"
    }

    attribute "status" {
      source_field = "status"
    }

    attribute "created_at" {
      source_field = "created"
      transformation = "TIMESTAMP_SECONDS(created)"
    }
  }

  # Map Event entity to stream
  entity_mapping "analytics.event" {
    source = "segment_stream"
    topic = "user_events"
    description = "User behavior events from Segment"

    attribute "event_id" {
      source_field = "messageId"
    }

    attribute "user_id" {
      source_field = "userId"
    }

    attribute "event_name" {
      source_field = "event"
    }

    attribute "timestamp" {
      source_field = "timestamp"
      transformation = "PARSE_TIMESTAMP_ISO8601(timestamp)"
    }

    attribute "properties" {
      source_field = "properties"
      description = "JSON blob of event properties"
    }
  }

  # Define relationships
  relationship "order_to_customer" {
    from = "sales.order"
    to = "platform.customer"
    foreign_key = "customer_id"
    description = "FK: orders.user_id -> users.id"
  }

  relationship "subscription_to_customer" {
    from = "platform.subscription"
    to = "platform.customer"
    foreign_key = "customer_id"
    description = "Links Stripe subscription to platform customer"
  }

  relationship "payment_to_order" {
    from = "sales.payment"
    to = "sales.order"
    join_condition = "charges.metadata.order_id = orders.id"
    description = "Links Stripe payment to order via metadata"
  }

  relationship "event_to_customer" {
    from = "analytics.event"
    to = "platform.customer"
    foreign_key = "user_id"
    description = "Links events to users"
  }
}

Common Patterns

Pattern 1: Multi-Source Entity Resolution

When a single conceptual entity comes from multiple sources:

mml
logical_model "customer_resolution" {
  source "production_db" {
    type = "database"
  }

  source "crm_api" {
    type = "api"
  }

  # Primary source
  entity_mapping "platform.customer" {
    source = "production_db"
    table = "users"
    description = "Primary customer data (golden source)"

    attribute "customer_id" { source_field = "id" }
    attribute "email" { source_field = "email" }
    attribute "created_at" { source_field = "created_at" }
  }

  # Secondary source for enrichment
  entity_mapping "platform.customer" {
    source = "crm_api"
    endpoint = "contacts"
    description = "Additional customer attributes from CRM"

    attribute "customer_id" {
      source_field = "external_id"
      description = "Maps to production DB ID"
    }
    attribute "company" { source_field = "company_name" }
    attribute "industry" { source_field = "industry" }
    attribute "deal_size" { source_field = "annual_contract_value" }
  }

  # Define resolution relationship
  relationship "customer_to_crm" {
    from = "platform.customer"  # production_db
    to = "platform.customer"    # crm_api
    join_condition = "users.id = contacts.external_id"
    description = "Resolve customers across systems"
  }
}

Pattern 2: Derived Metrics from Multiple Sources

mml
logical_model "revenue_metrics" {
  source "orders_db" {
    type = "database"
  }

  source "stripe_api" {
    type = "api"
  }

  # Map base order entity
  entity_mapping "sales.order" {
    source = "orders_db"
    table = "orders"
  }

  # Map payment entity
  entity_mapping "sales.payment" {
    source = "stripe_api"
    endpoint = "charges"
  }

  # Revenue metric derived from both
  entity_mapping "sales.revenue" {
    description = "Calculated revenue metric"

    attribute "total_revenue" {
      sources = ["sales.order", "sales.payment"]
      transformation = """
        SELECT
          SUM(orders.total_amount) as order_revenue,
          SUM(charges.amount) as payment_revenue,
          -- Reconcile differences
          SUM(orders.total_amount) - SUM(charges.amount) as variance
        FROM orders
        LEFT JOIN charges ON charges.metadata.order_id = orders.id
      """
    }
  }
}

Pattern 3: Slowly Changing Dimensions

Track historical changes to dimension attributes:

mml
logical_model "product_history" {
  source "production_db" {
    type = "database"
  }

  entity_mapping "product.product" {
    source = "production_db"
    table = "products"
    description = "Current product catalog"

    attribute "product_id" { source_field = "id" }
    attribute "name" { source_field = "name" }
    attribute "price" { source_field = "current_price" }
  }

  # Historical version tracking
  entity_mapping "product.product_history" {
    source = "production_db"
    table = "product_history"
    description = "SCD Type 2: Historical product changes"

    attribute "product_id" { source_field = "product_id" }
    attribute "name" { source_field = "name" }
    attribute "price" { source_field = "price" }
    attribute "valid_from" { source_field = "valid_from" }
    attribute "valid_to" { source_field = "valid_to" }
    attribute "is_current" { source_field = "is_current" }
  }
}

Pattern 4: Event Stream Processing

mml
logical_model "event_processing" {
  source "kafka_stream" {
    type = "stream"
    description = "Real-time user events"
  }

  # Raw events
  entity_mapping "analytics.event" {
    source = "kafka_stream"
    topic = "events"

    attribute "event_id" { source_field = "id" }
    attribute "user_id" { source_field = "user_id" }
    attribute "event_type" { source_field = "type" }
    attribute "timestamp" { source_field = "timestamp" }
  }

  # Sessionized events (derived)
  entity_mapping "analytics.session" {
    source = "kafka_stream"
    topic = "events"
    description = "Derived sessions from event stream"

    attribute "session_id" {
      transformation = """
        SESSION_WINDOW(user_id, timestamp, INTERVAL '30' MINUTE)
      """
    }

    attribute "user_id" { source_field = "user_id" }

    attribute "session_start" {
      transformation = "MIN(timestamp) OVER (PARTITION BY session_id)"
    }

    attribute "session_end" {
      transformation = "MAX(timestamp) OVER (PARTITION BY session_id)"
    }

    attribute "event_count" {
      transformation = "COUNT(*) OVER (PARTITION BY session_id)"
    }
  }
}

Physical Model Generation

Modality automatically generates a physical model from your logical model's entity-source mappings. When you export your model to Git, the physical model is created with:

Auto-Generated Elements

  1. Tables: One table per entity-source mapping
  2. Primary Keys: Auto-generated {entity}_id field for each table
  3. Foreign Keys: Created from conceptual model relationships
  4. Descriptions: Pulled from conceptual entity descriptions

Example Flow

Conceptual Model:

mml
domain "Sales" {
  entity "Customer" {
    type = "entity"
    description = "Customer master data"
    has-many "Sales.Order"
  }

  entity "Order" {
    type = "entity"
    description = "Customer orders"
    belongs-to "Sales.Customer"
  }
}

Logical Model:

mml
source "PostgreSQL_Production" {
  type = "postgres"

  entity "Sales.Customer" {
    table = "users"
  }

  entity "Sales.Order" {
    table = "orders"
  }
}

Auto-Generated Physical Model:

mml
physical_model {
  table "customer" {
    entity_group = "sales"
    entity = "entity_customer"
    source = "source_postgresql_production"
    description = "Customer master data"

    field "entity_customer_id" {
      type = "uuid"
      nullable = false
      primary_key = true
    }
  }

  table "order" {
    entity_group = "sales"
    entity = "entity_order"
    source = "source_postgresql_production"
    description = "Customer orders"

    field "entity_order_id" {
      type = "uuid"
      nullable = false
      primary_key = true
    }

    # Auto-generated from belongs-to relationship
    field "customer_id" {
      type = "uuid"
      nullable = true
      description = "Foreign key to customer"
      foreign_key {
        table = "customer"
        field = "entity_customer_id"
        on_delete = "CASCADE"
      }
    }
  }
}

You can then enhance the physical model by adding:

  • Additional fields not captured in conceptual/logical models
  • Indexes for performance optimization
  • Physical attributes (partitioning, retention policies)
  • PII and sensitivity markings

See the Physical Model Block documentation for details.

Next Steps

Released under the MIT License.