
Estimating

Accurate estimation is critical for planning data platform development. Modality's estimation framework helps you forecast development effort for building entities, integrating sources, and creating reports.

Overview

Modality provides three types of estimation:

  1. Entity Estimates: Effort to build data entities (tables, views, models)
  2. Source Estimates: Effort to integrate data sources
  3. Report Estimates: Effort to build dashboards and reports

Each estimate automatically calculates development hours based on technical attributes, with optional manual overrides.

Entity Estimation

Entity estimates calculate the effort required to build a data entity based on data quality, volume, complexity, and transformation requirements.

Attributes

Required Attributes

  • data_quality: Quality of source data

    • low: Significant cleaning and validation needed (+24 hours)
    • medium: Some cleanup required (+8 hours)
    • high: Clean, well-structured data (+0 hours)
  • data_volume: Amount of data to process

    • small: < 1M records (+0 hours)
    • medium: 1M - 10M records (+8 hours)
    • large: 10M - 100M records (+16 hours)
    • very-large: > 100M records (+32 hours)
  • scd_type: Slowly changing dimension pattern

    • type-0: No history tracking (+0 hours)
    • type-1: Overwrite changes (+8 hours)
    • type-2: Full history tracking (+24 hours)
    • type-3: Limited history (+16 hours)
  • business_rules_complexity: Business logic complexity (base hours)

    • simple: Straightforward transformations (40 hours base)
    • moderate: Some complex rules (80 hours base)
    • complex: Extensive business logic (120 hours base)
  • requires_resolution: Multi-source entity resolution needed

    • true: Entity resolution required (+16 hours)
    • false: Single source or no resolution needed (+0 hours)
  • derivation_count: Number of derived/calculated fields

    • Each derivation adds 4 hours
  • kpi_count: Number of KPIs calculated from this entity

    • Each KPI adds 4 hours
  • testing_effort: Hours allocated for testing

    • User-defined value added to total

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Estimation assumptions and notes

Estimation Formula

total_hours =
  base_hours (from business_rules_complexity) +
  data_quality_hours +
  data_volume_hours +
  scd_type_hours +
  (requires_resolution ? 16 : 0) +
  (derivation_count * 4) +
  (kpi_count * 4) +
  testing_effort
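
The formula can be expressed as a short Python sketch (the function name and lookup tables are illustrative; the hour values come from the attribute list above):

```python
# Hour values documented in the Entity Estimation attribute list.
DATA_QUALITY_HOURS = {"low": 24, "medium": 8, "high": 0}
DATA_VOLUME_HOURS = {"small": 0, "medium": 8, "large": 16, "very-large": 32}
SCD_TYPE_HOURS = {"type-0": 0, "type-1": 8, "type-2": 24, "type-3": 16}
BASE_HOURS = {"simple": 40, "moderate": 80, "complex": 120}

def entity_estimate_hours(data_quality, data_volume, scd_type,
                          business_rules_complexity, requires_resolution,
                          derivation_count, kpi_count, testing_effort):
    """Total development hours for one entity estimate."""
    return (BASE_HOURS[business_rules_complexity]
            + DATA_QUALITY_HOURS[data_quality]
            + DATA_VOLUME_HOURS[data_volume]
            + SCD_TYPE_HOURS[scd_type]
            + (16 if requires_resolution else 0)
            + derivation_count * 4
            + kpi_count * 4
            + testing_effort)
```

For example, a clean, medium-volume, type-1 entity with simple rules, 2 derivations, 1 KPI, and 8 testing hours comes out to 76 hours.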

Complexity Scoring

Modality also calculates a complexity score (low/medium/high) based on weighted points:

Factor                      Points
Data quality: low           3
Data quality: medium        2
Data quality: high          1
Data volume: very-large     3
Data volume: large          2
Data volume: medium         1
SCD type-2                  3
SCD type-3                  2
SCD type-1                  1
Business rules: complex     3
Business rules: moderate    2
Business rules: simple      1
Requires resolution         2
Derivations > 5             2
Derivations 2-5             1
KPIs > 5                    2
KPIs 2-5                    1

Complexity levels:

  • Score ≤ 6: low
  • Score 7-12: medium
  • Score ≥ 13: high
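
A Python sketch of the scoring logic (names are illustrative; values not listed in the table above, such as small volume and type-0, are assumed to score 0):

```python
def entity_complexity(data_quality, data_volume, scd_type,
                      business_rules_complexity, requires_resolution,
                      derivation_count, kpi_count):
    """Return (level, score) per the weighted-points table."""
    score = 0
    score += {"low": 3, "medium": 2, "high": 1}[data_quality]
    score += {"very-large": 3, "large": 2, "medium": 1, "small": 0}[data_volume]
    score += {"type-2": 3, "type-3": 2, "type-1": 1, "type-0": 0}[scd_type]
    score += {"complex": 3, "moderate": 2, "simple": 1}[business_rules_complexity]
    score += 2 if requires_resolution else 0
    score += 2 if derivation_count > 5 else (1 if derivation_count >= 2 else 0)
    score += 2 if kpi_count > 5 else (1 if kpi_count >= 2 else 0)
    if score <= 6:
        return ("low", score)
    if score <= 12:
        return ("medium", score)
    return ("high", score)
```

Scoring the "Complex Entity" example on this page (low quality, very-large volume, type-2, complex rules, resolution, 15 derivations, 8 KPIs) yields a score of 18, i.e. high.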

MML Syntax

mml
entity_estimate "Domain.Entity" {
  data_quality = "low" | "medium" | "high"
  data_volume = "small" | "medium" | "large" | "very-large"
  scd_type = "type-0" | "type-1" | "type-2" | "type-3"
  business_rules_complexity = "simple" | "moderate" | "complex"
  requires_resolution = true | false
  derivation_count = 5
  kpi_count = 3
  testing_effort = 8

  # Optional
  manual_estimate_hours = 150  # Override calculation
  notes = "Estimation assumptions..."
}

Examples

Simple Entity

mml
entity_estimate "Sales.Customer" {
  data_quality = "high"           # +0 hours
  data_volume = "medium"          # +8 hours
  scd_type = "type-1"             # +8 hours
  business_rules_complexity = "simple"  # 40 hours base
  requires_resolution = false     # +0 hours
  derivation_count = 2            # +8 hours (2 * 4)
  kpi_count = 1                   # +4 hours
  testing_effort = 8              # +8 hours

  # Total: 40 + 0 + 8 + 8 + 0 + 8 + 4 + 8 = 76 hours
}

Complex Entity

mml
entity_estimate "Analytics.Customer360" {
  data_quality = "low"            # +24 hours (needs cleaning)
  data_volume = "very-large"      # +32 hours (>100M records)
  scd_type = "type-2"             # +24 hours (full history)
  business_rules_complexity = "complex"  # 120 hours base
  requires_resolution = true      # +16 hours (multi-source)
  derivation_count = 15           # +60 hours (15 * 4)
  kpi_count = 8                   # +32 hours (8 * 4)
  testing_effort = 16             # +16 hours

  notes = "Requires resolution across CRM, billing, and support systems"

  # Total: 120 + 24 + 32 + 24 + 16 + 60 + 32 + 16 = 324 hours
  # Complexity: high (score 18)
}

Source Estimation

Source estimates calculate the effort required to integrate a data source based on connectivity, transformation, and refresh requirements.

Attributes

Required Attributes

  • source_complexity: Technical integration complexity (base hours)

    • simple: Direct database connection, standard schema (4 hours base)
    • moderate: API integration, some transformation needed (8 hours base)
    • complex: Custom connectors, complex protocols (16 hours base)
  • data_volume: Amount of data to ingest

    • small: < 1GB (+1 hour)
    • medium: 1GB - 100GB (+2 hours)
    • large: 100GB - 1TB (+4 hours)
    • very-large: > 1TB (+8 hours)
  • refresh_frequency: How often data is refreshed

    • monthly: (+1 hour)
    • weekly: (+1 hour)
    • daily: (+2 hours)
    • hourly: (+4 hours)
    • real-time: Streaming setup (+12 hours)
  • requires_auth: Authentication/authorization needed

    • true: OAuth, API keys, complex auth (+4 hours)
    • false: Direct access (+0 hours)
  • requires_transformation: Significant data transformation needed

    • true: Schema mapping, denormalization (+6 hours)
    • false: Direct copy (+0 hours)

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Integration assumptions and notes

Estimation Formula

total_hours =
  base_hours (from source_complexity) +
  data_volume_hours +
  refresh_frequency_hours +
  (requires_auth ? 4 : 0) +
  (requires_transformation ? 6 : 0)
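
The same formula as a Python sketch (names are illustrative; hour values follow the attribute list above — note that even a small volume adds 1 hour):

```python
def source_estimate_hours(source_complexity, data_volume, refresh_frequency,
                          requires_auth, requires_transformation):
    """Total integration hours for one source estimate."""
    base = {"simple": 4, "moderate": 8, "complex": 16}[source_complexity]
    volume = {"small": 1, "medium": 2, "large": 4, "very-large": 8}[data_volume]
    refresh = {"monthly": 1, "weekly": 1, "daily": 2,
               "hourly": 4, "real-time": 12}[refresh_frequency]
    return (base + volume + refresh
            + (4 if requires_auth else 0)
            + (6 if requires_transformation else 0))
```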

MML Syntax

mml
source_estimate "SourceName" {
  source_complexity = "simple" | "moderate" | "complex"
  data_volume = "small" | "medium" | "large" | "very-large"
  refresh_frequency = "real-time" | "hourly" | "daily" | "weekly" | "monthly"
  requires_auth = true | false
  requires_transformation = true | false

  # Optional
  manual_estimate_hours = 30
  notes = "Estimation assumptions..."
}

Examples

Simple Database Source

mml
source_estimate "PostgreSQL_Production" {
  source_complexity = "simple"     # 4 hours base
  data_volume = "medium"           # +2 hours
  refresh_frequency = "daily"      # +2 hours
  requires_auth = false            # +0 hours
  requires_transformation = false  # +0 hours

  # Total: 4 + 2 + 2 + 0 + 0 = 8 hours
}

Complex API Source

mml
source_estimate "Salesforce_API" {
  source_complexity = "complex"    # 16 hours base (custom connector)
  data_volume = "large"            # +4 hours
  refresh_frequency = "hourly"     # +4 hours
  requires_auth = true             # +4 hours (OAuth)
  requires_transformation = true   # +6 hours (schema mapping)

  notes = "OAuth refresh token management, rate limiting, pagination"

  # Total: 16 + 4 + 4 + 4 + 6 = 34 hours
}

Real-time Streaming Source

mml
source_estimate "Kafka_Events" {
  source_complexity = "moderate"   # 8 hours base
  data_volume = "very-large"       # +8 hours
  refresh_frequency = "real-time"  # +12 hours (streaming)
  requires_auth = true             # +4 hours
  requires_transformation = true   # +6 hours

  notes = "Kafka consumer setup, schema registry, dead letter queue"

  # Total: 8 + 8 + 12 + 4 + 6 = 38 hours
}

Report Estimation

Report estimates calculate the effort required to build dashboards and reports based on complexity, visualizations, and interactivity.

Attributes

Required Attributes

  • report_complexity: Overall report complexity (base hours)

    • simple: Single-page, few metrics (40 hours / 1 week)
    • moderate: Multi-page, standard visualizations (80 hours / 2 weeks)
    • complex: Advanced features, custom views (120 hours / 3 weeks)
  • visualization_count: Number of charts/visualizations

    • Each visualization adds 4 hours
  • interactivity_level: User interaction complexity

    • static: No interaction (+0 hours)
    • basic: Filters and drill-downs (+8 hours)
    • advanced: Dynamic parameters, cross-filtering (+24 hours)
  • requires_custom_design: Custom UI/UX design needed

    • true: Brand-specific design (+16 hours)
    • false: Standard templates (+0 hours)

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Design and feature assumptions

Estimation Formula

total_hours =
  base_hours (from report_complexity) +
  (visualization_count * 4) +
  interactivity_hours +
  (requires_custom_design ? 16 : 0)
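
As a Python sketch (illustrative names; hour values from the attribute list above):

```python
def report_estimate_hours(report_complexity, visualization_count,
                          interactivity_level, requires_custom_design):
    """Total build hours for one report estimate."""
    base = {"simple": 40, "moderate": 80, "complex": 120}[report_complexity]
    interactivity = {"static": 0, "basic": 8, "advanced": 24}[interactivity_level]
    return (base + visualization_count * 4 + interactivity
            + (16 if requires_custom_design else 0))
```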

MML Syntax

mml
report_estimate "ProductName.ReportName" {
  report_complexity = "simple" | "moderate" | "complex"
  visualization_count = 8
  interactivity_level = "static" | "basic" | "advanced"
  requires_custom_design = true | false

  # Optional
  manual_estimate_hours = 120
  notes = "Design assumptions..."
}

Examples

Simple Dashboard

mml
report_estimate "Sales Analytics.Revenue Dashboard" {
  report_complexity = "simple"     # 40 hours base
  visualization_count = 5          # +20 hours (5 * 4)
  interactivity_level = "basic"    # +8 hours
  requires_custom_design = false   # +0 hours

  # Total: 40 + 20 + 8 + 0 = 68 hours
}

Complex Executive Dashboard

mml
report_estimate "Executive Dashboard.Company KPIs" {
  report_complexity = "complex"    # 120 hours base
  visualization_count = 15         # +60 hours (15 * 4)
  interactivity_level = "advanced" # +24 hours
  requires_custom_design = true    # +16 hours

  notes = "Custom branding, real-time updates, mobile responsive"

  # Total: 120 + 60 + 24 + 16 = 220 hours
}

Complete Example

mml
# Conceptual entities
domain "Sales" {
  entity "Customer" {
    type = "entity"
    description = "Customer master data"
  }

  entity "Order" {
    type = "entity"
    description = "Sales orders"
  }
}

# Entity estimates
entity_estimate "Sales.Customer" {
  data_quality = "medium"
  data_volume = "large"
  scd_type = "type-2"
  business_rules_complexity = "moderate"
  requires_resolution = true
  derivation_count = 8
  kpi_count = 5
  testing_effort = 16

  notes = "Resolution across CRM and billing systems"
}

entity_estimate "Sales.Order" {
  data_quality = "high"
  data_volume = "very-large"
  scd_type = "type-1"
  business_rules_complexity = "moderate"
  requires_resolution = false
  derivation_count = 5
  kpi_count = 3
  testing_effort = 8
}

# Source estimates
source_estimate "PostgreSQL_Production" {
  source_complexity = "simple"
  data_volume = "large"
  refresh_frequency = "hourly"
  requires_auth = false
  requires_transformation = false
}

source_estimate "Salesforce_API" {
  source_complexity = "complex"
  data_volume = "medium"
  refresh_frequency = "daily"
  requires_auth = true
  requires_transformation = true

  notes = "OAuth setup, rate limiting considerations"
}

# Data product
data_product "Sales Analytics" {
  owner = "sales-team"

  report "Revenue Dashboard" {
    metric "Total Revenue" {
      calculation = "SUM(Order.amount)"
      uses "Sales.Order"
    }
  }
}

# Report estimate
report_estimate "Sales Analytics.Revenue Dashboard" {
  report_complexity = "moderate"
  visualization_count = 10
  interactivity_level = "advanced"
  requires_custom_design = false
}

Best Practices

1. Estimate Early and Often

Create estimates during planning, before development starts:

mml
# Estimate during architecture design
entity_estimate "Planned.NewEntity" {
  business_rules_complexity = "moderate"
  data_quality = "medium"
  # ... other attributes
}

2. Use Manual Overrides for Known Complexity

If you have prior experience, override the formula:

mml
entity_estimate "Sales.Customer" {
  # Automatic calculation
  data_quality = "high"
  data_volume = "large"
  # ... results in 84 hours

  # But based on prior experience:
  manual_estimate_hours = 120
  notes = "Previous similar entity took 120 hours due to legacy system integration"
}

3. Document Assumptions

Always explain your estimation reasoning:

mml
source_estimate "Legacy_Mainframe" {
  source_complexity = "complex"
  manual_estimate_hours = 200

  notes = """
    Complex COBOL extraction required.
    Need to build custom connector.
    Data quality issues discovered in analysis.
    Requires coordination with mainframe team.
  """
}

4. Track Actuals vs. Estimates

Use notes to record actual time spent:

mml
entity_estimate "Sales.Customer" {
  manual_estimate_hours = 120

  notes = """
    Initial estimate: 120 hours
    Actual time: 145 hours
    Variance: +25 hours due to unexpected data quality issues
  """
}

5. Review and Refine

Update estimates as you learn more:

mml
entity_estimate "Analytics.Customer360" {
  # Initial estimate
  business_rules_complexity = "moderate"  # 80 hours

  notes = """
    Week 1: Discovered additional business rules
    Updated complexity to 'complex' (120 hours base)
    Updated derivation_count from 8 to 15
    New estimate: 250 hours
  """
}

Summary Statistics

Modality automatically calculates project-level statistics:

  • Total hours: Sum of all estimates
  • Entity breakdown: Hours by entity
  • Source breakdown: Hours by source
  • Report breakdown: Hours by report
  • Complexity distribution: % low/medium/high
  • Coverage: % of model estimated
  • Incomplete estimates: Items missing estimates

These summaries help with:

  • Sprint planning
  • Team capacity planning
  • Budget forecasting
  • Risk identification (high complexity items)
  • Coverage tracking (what's not estimated)
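
A minimal sketch of how such rollups could be derived from a list of estimates; the record structure and field names are assumptions, with hours taken from the complete example's computed totals:

```python
from collections import Counter

# Hypothetical estimate records; complexity labels apply to entities only.
estimates = [
    {"kind": "entity", "name": "Sales.Customer", "hours": 212, "complexity": "high"},
    {"kind": "entity", "name": "Sales.Order", "hours": 160, "complexity": "medium"},
    {"kind": "source", "name": "PostgreSQL_Production", "hours": 16},
    {"kind": "source", "name": "Salesforce_API", "hours": 30},
    {"kind": "report", "name": "Sales Analytics.Revenue Dashboard", "hours": 144},
]

total_hours = sum(e["hours"] for e in estimates)  # 562
hours_by_kind = Counter()
for e in estimates:
    hours_by_kind[e["kind"]] += e["hours"]
complexity_distribution = Counter(
    e["complexity"] for e in estimates if "complexity" in e
)
```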

Released under the MIT License.