Logical Modeling
The logical layer bridges the gap between your conceptual business model and physical data sources. It defines where data comes from and how it maps to your conceptual entities.
Core Principles
1. Source-First Mapping
Logical models connect conceptual entities to real data sources:
- Define each data source (database, API, stream, file)
- Map source tables/endpoints to conceptual entities
- Document transformation logic and business rules
- Handle multiple sources for the same entity
2. Independence from Physical Schema
Focus on logical connections, not implementation details:
- Abstract away database-specific syntax
- Define mappings in terms of business concepts
- Allow flexibility in physical implementation
- Support multiple source systems for one entity
3. Data Lineage
Track data flow from source to consumption:
- Document origin of each field
- Capture transformation steps
- Enable impact analysis
- Support data governance
Components
Sources
A source represents an external system providing data to your platform.
Source Types:
Database - Relational or NoSQL databases
- Example: PostgreSQL, MySQL, MongoDB
- Characteristics: Structured, queryable, transactional
- Access pattern: SQL queries or database-specific APIs
API - REST, GraphQL, or other web services
- Example: Stripe API, Salesforce API, custom microservices
- Characteristics: Request/response, rate-limited, versioned
- Access pattern: HTTP requests with authentication
Stream - Event streams or message queues
- Example: Kafka, Kinesis, Pub/Sub
- Characteristics: Real-time, append-only, high-volume
- Access pattern: Consumer subscriptions
File - Batch files or object storage
- Example: S3, GCS, SFTP
- Characteristics: Batch-oriented, scheduled, varied formats
- Access pattern: File readers, periodic ingestion
MML Syntax:
logical_model "my_platform" {
source "production_db" {
type = "database"
description = "Main PostgreSQL production database"
}
source "stripe_api" {
type = "api"
description = "Stripe payment processing API"
}
source "events_stream" {
type = "stream"
description = "User behavior events from Kafka"
}
source "csv_exports" {
type = "file"
description = "Daily CSV exports from legacy system"
}
}Entity Mappings
Entity mappings connect conceptual entities to source tables or endpoints.
Purpose:
- Link conceptual entities to physical data
- Define which source provides each entity's data
- Support multiple sources for a single entity (entity resolution)
- Document table/endpoint names
MML Syntax:
logical_model "e_commerce" {
source "production_db" {
type = "database"
description = "PostgreSQL production database"
}
source "crm_api" {
type = "api"
description = "Salesforce CRM API"
}
# Map conceptual entity to database table
entity_mapping "platform.customer" {
source = "production_db"
table = "users"
description = "Customer data from main users table"
}
# Map same entity to additional source (entity resolution)
entity_mapping "platform.customer" {
source = "crm_api"
endpoint = "contacts"
description = "Additional customer data from CRM"
}
# Map order entity
entity_mapping "sales.order" {
source = "production_db"
table = "orders"
description = "Order transactions"
}
}Attributes and Transformations
Attributes define how fields map from source to conceptual entity, including transformations.
Transformation Types:
Direct mapping - Field exists in source as-is
mmlattribute "email" { source_field = "email_address" }Type conversion - Change data type
mmlattribute "created_at" { source_field = "created_timestamp" transformation = "CAST(created_timestamp AS TIMESTAMP)" }Calculation - Derive from multiple fields
mmlattribute "full_name" { source_fields = ["first_name", "last_name"] transformation = "CONCAT(first_name, ' ', last_name)" }Lookup/Join - Reference data from another source
mmlattribute "country_name" { source_field = "country_code" transformation = "JOIN countries ON country_code = countries.code" }
MML Syntax:
entity_mapping "platform.customer" {
source = "production_db"
table = "users"
attribute "customer_id" {
source_field = "id"
description = "Primary key"
}
attribute "email" {
source_field = "email_address"
description = "User's email"
}
attribute "full_name" {
source_fields = ["first_name", "last_name"]
transformation = "CONCAT(first_name, ' ', last_name)"
description = "Concatenated full name"
}
attribute "created_at" {
source_field = "created_timestamp"
transformation = "CAST(created_timestamp AS TIMESTAMP)"
description = "Account creation timestamp"
}
attribute "lifetime_value" {
source_field = "ltv_cents"
transformation = "ltv_cents / 100.0"
description = "LTV in dollars (converted from cents)"
}
}Relationships
Logical relationships define foreign keys and join conditions between source tables.
Relationship Patterns:
Simple foreign key
mmlrelationship "order_to_customer" { from = "sales.order" to = "platform.customer" foreign_key = "customer_id" description = "Each order belongs to one customer" }Composite key
mmlrelationship "line_item_to_order" { from = "sales.line_item" to = "sales.order" foreign_keys = ["order_id", "order_version"] description = "Line items reference versioned orders" }Cross-source relationship
mmlrelationship "customer_to_crm_contact" { from = "platform.customer" # from production_db to = "platform.customer" # to crm_api join_condition = "users.email = contacts.email" description = "Match customers across systems via email" }
MML Syntax:
logical_model "e_commerce" {
# ... sources and mappings ...
relationship "order_to_customer" {
from = "sales.order"
to = "platform.customer"
foreign_key = "customer_id"
description = "FK: orders.customer_id -> users.id"
}
relationship "payment_to_order" {
from = "sales.payment"
to = "sales.order"
foreign_key = "order_id"
description = "FK: payments.order_id -> orders.id"
}
}Best Practices
1. Document Source Details
Provide context for each data source:
- Source system name and type
- Connection details (without credentials!)
- Update frequency and latency
- Data quality expectations
- Source owner/team
2. Use Clear Naming
- Source names: Match system names (production_db, stripe_api)
- Table/endpoint names: Use exact names from source
- Attribute names: Use conceptual entity field names (not source field names)
3. Handle Entity Resolution
When one conceptual entity comes from multiple sources:
- Create separate entity_mapping for each source
- Document the "golden source" (primary/authoritative)
- Define merge logic for conflicting values
- Use relationships to link source records
4. Abstract Transformations
Keep logical layer transformations simple:
- Basic type conversions: OK
- Simple calculations: OK
- Complex business logic: Move to physical/product layer
- Source-specific syntax: Abstract when possible
5. Track Data Lineage
Document the flow from source to entity:
- Which source provides which attributes
- What transformations are applied
- Dependencies between entities
- Update patterns and refresh logic
Example: E-commerce Platform
logical_model "e_commerce_platform" {
# Define data sources
source "production_db" {
type = "database"
description = "PostgreSQL production database with transactional data"
}
source "stripe_api" {
type = "api"
description = "Stripe API for payment and subscription data"
}
source "segment_stream" {
type = "stream"
description = "User behavior events via Segment/Kafka"
}
# Map Customer entity to production database
entity_mapping "platform.customer" {
source = "production_db"
table = "users"
description = "Primary customer data from users table"
attribute "customer_id" {
source_field = "id"
}
attribute "email" {
source_field = "email"
}
attribute "full_name" {
source_fields = ["first_name", "last_name"]
transformation = "CONCAT(first_name, ' ', last_name)"
}
attribute "created_at" {
source_field = "created_at"
}
attribute "account_status" {
source_field = "status"
transformation = "CASE WHEN status = 1 THEN 'active' WHEN status = 0 THEN 'inactive' END"
}
}
# Map Subscription entity to Stripe API
entity_mapping "platform.subscription" {
source = "stripe_api"
endpoint = "subscriptions"
description = "Subscription data from Stripe"
attribute "subscription_id" {
source_field = "id"
}
attribute "customer_id" {
source_field = "customer"
description = "Links to Stripe customer ID"
}
attribute "plan_id" {
source_field = "items.data[0].plan.id"
description = "First plan item in subscription"
}
attribute "status" {
source_field = "status"
}
attribute "current_period_start" {
source_field = "current_period_start"
transformation = "TIMESTAMP_SECONDS(current_period_start)"
}
attribute "mrr" {
source_field = "plan.amount"
transformation = "plan.amount / 100.0"
description = "Monthly recurring revenue in dollars"
}
}
# Map Order entity to production database
entity_mapping "sales.order" {
source = "production_db"
table = "orders"
description = "Order transactions"
attribute "order_id" {
source_field = "id"
}
attribute "customer_id" {
source_field = "user_id"
description = "FK to users table"
}
attribute "order_date" {
source_field = "created_at"
}
attribute "total_amount" {
source_field = "total_cents"
transformation = "total_cents / 100.0"
}
attribute "status" {
source_field = "order_status"
}
}
# Map Payment entity to Stripe API
entity_mapping "sales.payment" {
source = "stripe_api"
endpoint = "charges"
description = "Payment transactions from Stripe"
attribute "payment_id" {
source_field = "id"
}
attribute "amount" {
source_field = "amount"
transformation = "amount / 100.0"
}
attribute "currency" {
source_field = "currency"
transformation = "UPPER(currency)"
}
attribute "status" {
source_field = "status"
}
attribute "created_at" {
source_field = "created"
transformation = "TIMESTAMP_SECONDS(created)"
}
}
# Map Event entity to stream
entity_mapping "analytics.event" {
source = "segment_stream"
topic = "user_events"
description = "User behavior events from Segment"
attribute "event_id" {
source_field = "messageId"
}
attribute "user_id" {
source_field = "userId"
}
attribute "event_name" {
source_field = "event"
}
attribute "timestamp" {
source_field = "timestamp"
transformation = "PARSE_TIMESTAMP_ISO8601(timestamp)"
}
attribute "properties" {
source_field = "properties"
description = "JSON blob of event properties"
}
}
# Define relationships
relationship "order_to_customer" {
from = "sales.order"
to = "platform.customer"
foreign_key = "customer_id"
description = "FK: orders.user_id -> users.id"
}
relationship "subscription_to_customer" {
from = "platform.subscription"
to = "platform.customer"
foreign_key = "customer_id"
description = "Links Stripe subscription to platform customer"
}
relationship "payment_to_order" {
from = "sales.payment"
to = "sales.order"
join_condition = "charges.metadata.order_id = orders.id"
description = "Links Stripe payment to order via metadata"
}
relationship "event_to_customer" {
from = "analytics.event"
to = "platform.customer"
foreign_key = "user_id"
description = "Links events to users"
}
}Common Patterns
Pattern 1: Multi-Source Entity Resolution
When a single conceptual entity comes from multiple sources:
logical_model "customer_resolution" {
source "production_db" {
type = "database"
}
source "crm_api" {
type = "api"
}
# Primary source
entity_mapping "platform.customer" {
source = "production_db"
table = "users"
description = "Primary customer data (golden source)"
attribute "customer_id" { source_field = "id" }
attribute "email" { source_field = "email" }
attribute "created_at" { source_field = "created_at" }
}
# Secondary source for enrichment
entity_mapping "platform.customer" {
source = "crm_api"
endpoint = "contacts"
description = "Additional customer attributes from CRM"
attribute "customer_id" {
source_field = "external_id"
description = "Maps to production DB ID"
}
attribute "company" { source_field = "company_name" }
attribute "industry" { source_field = "industry" }
attribute "deal_size" { source_field = "annual_contract_value" }
}
# Define resolution relationship
relationship "customer_to_crm" {
from = "platform.customer" # production_db
to = "platform.customer" # crm_api
join_condition = "users.id = contacts.external_id"
description = "Resolve customers across systems"
}
}Pattern 2: Derived Metrics from Multiple Sources
logical_model "revenue_metrics" {
source "orders_db" {
type = "database"
}
source "stripe_api" {
type = "api"
}
# Map base order entity
entity_mapping "sales.order" {
source = "orders_db"
table = "orders"
}
# Map payment entity
entity_mapping "sales.payment" {
source = "stripe_api"
endpoint = "charges"
}
# Revenue metric derived from both
entity_mapping "sales.revenue" {
description = "Calculated revenue metric"
attribute "total_revenue" {
sources = ["sales.order", "sales.payment"]
transformation = """
SELECT
SUM(orders.total_amount) as order_revenue,
SUM(charges.amount) as payment_revenue,
-- Reconcile differences
SUM(orders.total_amount) - SUM(charges.amount) as variance
FROM orders
LEFT JOIN charges ON charges.metadata.order_id = orders.id
"""
}
}
}Pattern 3: Slowly Changing Dimensions
Track historical changes to dimension attributes:
logical_model "product_history" {
source "production_db" {
type = "database"
}
entity_mapping "product.product" {
source = "production_db"
table = "products"
description = "Current product catalog"
attribute "product_id" { source_field = "id" }
attribute "name" { source_field = "name" }
attribute "price" { source_field = "current_price" }
}
# Historical version tracking
entity_mapping "product.product_history" {
source = "production_db"
table = "product_history"
description = "SCD Type 2: Historical product changes"
attribute "product_id" { source_field = "product_id" }
attribute "name" { source_field = "name" }
attribute "price" { source_field = "price" }
attribute "valid_from" { source_field = "valid_from" }
attribute "valid_to" { source_field = "valid_to" }
attribute "is_current" { source_field = "is_current" }
}
}Pattern 4: Event Stream Processing
logical_model "event_processing" {
source "kafka_stream" {
type = "stream"
description = "Real-time user events"
}
# Raw events
entity_mapping "analytics.event" {
source = "kafka_stream"
topic = "events"
attribute "event_id" { source_field = "id" }
attribute "user_id" { source_field = "user_id" }
attribute "event_type" { source_field = "type" }
attribute "timestamp" { source_field = "timestamp" }
}
# Sessionized events (derived)
entity_mapping "analytics.session" {
source = "kafka_stream"
topic = "events"
description = "Derived sessions from event stream"
attribute "session_id" {
transformation = """
SESSION_WINDOW(user_id, timestamp, INTERVAL '30' MINUTE)
"""
}
attribute "user_id" { source_field = "user_id" }
attribute "session_start" {
transformation = "MIN(timestamp) OVER (PARTITION BY session_id)"
}
attribute "session_end" {
transformation = "MAX(timestamp) OVER (PARTITION BY session_id)"
}
attribute "event_count" {
transformation = "COUNT(*) OVER (PARTITION BY session_id)"
}
}
}Physical Model Generation
Modality automatically generates a physical model from your logical model's entity-source mappings. When you export your model to Git, the physical model is created with:
Auto-Generated Elements
- Tables: One table per entity-source mapping
- Primary Keys: Auto-generated
{entity}_idfield for each table - Foreign Keys: Created from conceptual model relationships
- Descriptions: Pulled from conceptual entity descriptions
Example Flow
Conceptual Model:
domain "Sales" {
entity "Customer" {
type = "entity"
description = "Customer master data"
has-many "Sales.Order"
}
entity "Order" {
type = "entity"
description = "Customer orders"
belongs-to "Sales.Customer"
}
}Logical Model:
source "PostgreSQL_Production" {
type = "postgres"
entity "Sales.Customer" {
table = "users"
}
entity "Sales.Order" {
table = "orders"
}
}Auto-Generated Physical Model:
physical_model {
table "customer" {
entity_group = "sales"
entity = "entity_customer"
source = "source_postgresql_production"
description = "Customer master data"
field "entity_customer_id" {
type = "uuid"
nullable = false
primary_key = true
}
}
table "order" {
entity_group = "sales"
entity = "entity_order"
source = "source_postgresql_production"
description = "Customer orders"
field "entity_order_id" {
type = "uuid"
nullable = false
primary_key = true
}
# Auto-generated from belongs-to relationship
field "customer_id" {
type = "uuid"
nullable = true
description = "Foreign key to customer"
foreign_key {
table = "customer"
field = "entity_customer_id"
on_delete = "CASCADE"
}
}
}
}You can then enhance the physical model by adding:
- Additional fields not captured in conceptual/logical models
- Indexes for performance optimization
- Physical attributes (partitioning, retention policies)
- PII and sensitivity markings
See the Physical Model Block documentation for details.
Next Steps
- Conceptual Modeling - Define business entities and domains
- Physical Model Block - Understanding the auto-generated physical schema
- Data Product Modeling - Create consumption-ready data products
- MML Reference - Complete syntax documentation
