
Estimating

Accurate estimation is critical for planning data platform development. Modality's estimation framework helps you forecast development effort for building entities, integrating sources, and creating reports.

Overview

Modality provides three types of estimation:

  1. Entity Estimates: Effort to build data entities (tables, views, models)
  2. Source Estimates: Effort to integrate data sources
  3. Report Estimates: Effort to build dashboards and reports

Each estimate automatically calculates development hours based on technical attributes, with optional manual overrides.

Entity Estimation

Entity estimates calculate the effort required to build a data entity based on data quality, volume, complexity, and transformation requirements.

Attributes

Required Attributes

  • data_quality: Quality of source data

    • low: Significant cleaning and validation needed (+24 hours)
    • medium: Some cleanup required (+8 hours)
    • high: Clean, well-structured data (+0 hours)
  • data_volume: Amount of data to process

    • small: < 1M records (+0 hours)
    • medium: 1M - 10M records (+8 hours)
    • large: 10M - 100M records (+16 hours)
    • very-large: > 100M records (+32 hours)
  • scd_type: Slowly changing dimension pattern

    • type-0: No history tracking (+0 hours)
    • type-1: Overwrite changes (+8 hours)
    • type-2: Full history tracking (+24 hours)
    • type-3: Limited history (+16 hours)
  • business_rules_complexity: Business logic complexity (base hours)

    • simple: Straightforward transformations (40 hours base)
    • moderate: Some complex rules (80 hours base)
    • complex: Extensive business logic (120 hours base)
  • requires_resolution: Multi-source entity resolution needed

    • true: Entity resolution required (+16 hours)
    • false: Single source or no resolution needed (+0 hours)
  • derivation_count: Number of derived/calculated fields

    • Each derivation adds 4 hours
  • kpi_count: Number of KPIs calculated from this entity

    • Each KPI adds 4 hours
  • testing_effort: Hours allocated for testing

    • User-defined value added to total

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Estimation assumptions and notes

Estimation Formula

total_hours =
  base_hours (from business_rules_complexity) +
  data_quality_hours +
  data_volume_hours +
  scd_type_hours +
  (requires_resolution ? 16 : 0) +
  (derivation_count * 4) +
  (kpi_count * 4) +
  testing_effort
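
The formula can be expressed as a short Python sketch (the function name and lookup tables are illustrative; the hour values come from the attribute list above):

```python
# Hour values documented in the Entity Estimation attribute list.
DATA_QUALITY_HOURS = {"low": 24, "medium": 8, "high": 0}
DATA_VOLUME_HOURS = {"small": 0, "medium": 8, "large": 16, "very-large": 32}
SCD_TYPE_HOURS = {"type-0": 0, "type-1": 8, "type-2": 24, "type-3": 16}
BASE_HOURS = {"simple": 40, "moderate": 80, "complex": 120}

def entity_estimate_hours(data_quality, data_volume, scd_type,
                          business_rules_complexity, requires_resolution,
                          derivation_count, kpi_count, testing_effort):
    """Total development hours for one entity estimate."""
    return (BASE_HOURS[business_rules_complexity]
            + DATA_QUALITY_HOURS[data_quality]
            + DATA_VOLUME_HOURS[data_volume]
            + SCD_TYPE_HOURS[scd_type]
            + (16 if requires_resolution else 0)
            + derivation_count * 4
            + kpi_count * 4
            + testing_effort)
```

For example, a clean, medium-volume, type-1 entity with simple rules, 2 derivations, 1 KPI, and 8 testing hours comes out to 76 hours.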

Complexity Scoring

Modality also calculates a complexity score (low/medium/high) based on weighted points:

Factor                      Points
Data quality: low           3
Data quality: medium        2
Data quality: high          1
Data volume: very-large     3
Data volume: large          2
Data volume: medium         1
SCD type-2                  3
SCD type-3                  2
SCD type-1                  1
Business rules: complex     3
Business rules: moderate    2
Business rules: simple      1
Requires resolution         2
Derivations > 5             2
Derivations 2-5             1
KPIs > 5                    2
KPIs 2-5                    1

Complexity levels:

  • Score ≤ 6: low
  • Score 7-12: medium
  • Score ≥ 13: high
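
A Python sketch of the scoring logic (names are illustrative; values not listed in the table above, such as small volume and type-0, are assumed to score 0):

```python
def entity_complexity(data_quality, data_volume, scd_type,
                      business_rules_complexity, requires_resolution,
                      derivation_count, kpi_count):
    """Return (level, score) per the weighted-points table."""
    score = 0
    score += {"low": 3, "medium": 2, "high": 1}[data_quality]
    score += {"very-large": 3, "large": 2, "medium": 1, "small": 0}[data_volume]
    score += {"type-2": 3, "type-3": 2, "type-1": 1, "type-0": 0}[scd_type]
    score += {"complex": 3, "moderate": 2, "simple": 1}[business_rules_complexity]
    score += 2 if requires_resolution else 0
    score += 2 if derivation_count > 5 else (1 if derivation_count >= 2 else 0)
    score += 2 if kpi_count > 5 else (1 if kpi_count >= 2 else 0)
    if score <= 6:
        return ("low", score)
    if score <= 12:
        return ("medium", score)
    return ("high", score)
```

Scoring the "Complex Entity" example on this page (low quality, very-large volume, type-2, complex rules, resolution, 15 derivations, 8 KPIs) yields a score of 18, i.e. high.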

MML Syntax

mml
entity_estimate "Domain.Entity" {
  data_quality = "low" | "medium" | "high"
  data_volume = "small" | "medium" | "large" | "very-large"
  scd_type = "type-0" | "type-1" | "type-2" | "type-3"
  business_rules_complexity = "simple" | "moderate" | "complex"
  requires_resolution = true | false
  derivation_count = 5
  kpi_count = 3
  testing_effort = 8

  # Optional
  manual_estimate_hours = 150  # Override calculation
  notes = "Estimation assumptions..."
}

Examples

Simple Entity

mml
entity_estimate "Sales.Customer" {
  data_quality = "high"           # +0 hours
  data_volume = "medium"          # +8 hours
  scd_type = "type-1"             # +8 hours
  business_rules_complexity = "simple"  # 40 hours base
  requires_resolution = false     # +0 hours
  derivation_count = 2            # +8 hours (2 * 4)
  kpi_count = 1                   # +4 hours
  testing_effort = 8              # +8 hours

  # Total: 40 + 0 + 8 + 8 + 0 + 8 + 4 + 8 = 76 hours
}

Complex Entity

mml
entity_estimate "Analytics.Customer360" {
  data_quality = "low"            # +24 hours (needs cleaning)
  data_volume = "very-large"      # +32 hours (>100M records)
  scd_type = "type-2"             # +24 hours (full history)
  business_rules_complexity = "complex"  # 120 hours base
  requires_resolution = true      # +16 hours (multi-source)
  derivation_count = 15           # +60 hours (15 * 4)
  kpi_count = 8                   # +32 hours (8 * 4)
  testing_effort = 16             # +16 hours

  notes = "Requires resolution across CRM, billing, and support systems"

  # Total: 120 + 24 + 32 + 24 + 16 + 60 + 32 + 16 = 324 hours
  # Complexity: high (score 18)
}

Source Estimation

Source estimates calculate the effort required to integrate a data source based on connectivity, transformation, and refresh requirements.

Attributes

Required Attributes

  • source_complexity: Technical integration complexity (base hours)

    • simple: Direct database connection, standard schema (4 hours base)
    • moderate: API integration, some transformation needed (8 hours base)
    • complex: Custom connectors, complex protocols (16 hours base)
  • data_volume: Amount of data to ingest

    • small: < 1GB (+1 hour)
    • medium: 1GB - 100GB (+2 hours)
    • large: 100GB - 1TB (+4 hours)
    • very-large: > 1TB (+8 hours)
  • refresh_frequency: How often data is refreshed

    • monthly: (+1 hour)
    • weekly: (+1 hour)
    • daily: (+2 hours)
    • hourly: (+4 hours)
    • real-time: Streaming setup (+12 hours)
  • requires_auth: Authentication/authorization needed

    • true: OAuth, API keys, complex auth (+4 hours)
    • false: Direct access (+0 hours)
  • requires_transformation: Significant data transformation needed

    • true: Schema mapping, denormalization (+6 hours)
    • false: Direct copy (+0 hours)

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Integration assumptions and notes

Estimation Formula

total_hours =
  base_hours (from source_complexity) +
  data_volume_hours +
  refresh_frequency_hours +
  (requires_auth ? 4 : 0) +
  (requires_transformation ? 6 : 0)
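
The same formula as a Python sketch (names are illustrative; hour values follow the attribute list above — note that even a small volume adds 1 hour):

```python
def source_estimate_hours(source_complexity, data_volume, refresh_frequency,
                          requires_auth, requires_transformation):
    """Total integration hours for one source estimate."""
    base = {"simple": 4, "moderate": 8, "complex": 16}[source_complexity]
    volume = {"small": 1, "medium": 2, "large": 4, "very-large": 8}[data_volume]
    refresh = {"monthly": 1, "weekly": 1, "daily": 2,
               "hourly": 4, "real-time": 12}[refresh_frequency]
    return (base + volume + refresh
            + (4 if requires_auth else 0)
            + (6 if requires_transformation else 0))
```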

MML Syntax

mml
source_estimate "SourceName" {
  source_complexity = "simple" | "moderate" | "complex"
  data_volume = "small" | "medium" | "large" | "very-large"
  refresh_frequency = "real-time" | "hourly" | "daily" | "weekly" | "monthly"
  requires_auth = true | false
  requires_transformation = true | false

  # Optional
  manual_estimate_hours = 30
  notes = "Estimation assumptions..."
}

Examples

Simple Database Source

mml
source_estimate "PostgreSQL_Production" {
  source_complexity = "simple"     # 4 hours base
  data_volume = "medium"           # +2 hours
  refresh_frequency = "daily"      # +2 hours
  requires_auth = false            # +0 hours
  requires_transformation = false  # +0 hours

  # Total: 4 + 2 + 2 + 0 + 0 = 8 hours
}

Complex API Source

mml
source_estimate "Salesforce_API" {
  source_complexity = "complex"    # 16 hours base (custom connector)
  data_volume = "large"            # +4 hours
  refresh_frequency = "hourly"     # +4 hours
  requires_auth = true             # +4 hours (OAuth)
  requires_transformation = true   # +6 hours (schema mapping)

  notes = "OAuth refresh token management, rate limiting, pagination"

  # Total: 16 + 4 + 4 + 4 + 6 = 34 hours
}

Real-time Streaming Source

mml
source_estimate "Kafka_Events" {
  source_complexity = "moderate"   # 8 hours base
  data_volume = "very-large"       # +8 hours
  refresh_frequency = "real-time"  # +12 hours (streaming)
  requires_auth = true             # +4 hours
  requires_transformation = true   # +6 hours

  notes = "Kafka consumer setup, schema registry, dead letter queue"

  # Total: 8 + 8 + 12 + 4 + 6 = 38 hours
}

Report Estimation

Report estimates calculate the effort required to build dashboards and reports based on complexity, visualizations, and interactivity.

Attributes

Required Attributes

  • report_complexity: Overall report complexity (base hours)

    • simple: Single-page, few metrics (40 hours / 1 week)
    • moderate: Multi-page, standard visualizations (80 hours / 2 weeks)
    • complex: Advanced features, custom views (120 hours / 3 weeks)
  • visualization_count: Number of charts/visualizations

    • Each visualization adds 4 hours
  • interactivity_level: User interaction complexity

    • static: No interaction (+0 hours)
    • basic: Filters and drill-downs (+8 hours)
    • advanced: Dynamic parameters, cross-filtering (+24 hours)
  • requires_custom_design: Custom UI/UX design needed

    • true: Brand-specific design (+16 hours)
    • false: Standard templates (+0 hours)

Optional Attributes

  • manual_estimate_hours: Override automatic calculation
  • notes: Design and feature assumptions

Estimation Formula

total_hours =
  base_hours (from report_complexity) +
  (visualization_count * 4) +
  interactivity_hours +
  (requires_custom_design ? 16 : 0)
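
As a Python sketch (illustrative names; hour values from the attribute list above):

```python
def report_estimate_hours(report_complexity, visualization_count,
                          interactivity_level, requires_custom_design):
    """Total build hours for one report estimate."""
    base = {"simple": 40, "moderate": 80, "complex": 120}[report_complexity]
    interactivity = {"static": 0, "basic": 8, "advanced": 24}[interactivity_level]
    return (base + visualization_count * 4 + interactivity
            + (16 if requires_custom_design else 0))
```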

MML Syntax

mml
report_estimate "ProductName.ReportName" {
  report_complexity = "simple" | "moderate" | "complex"
  visualization_count = 8
  interactivity_level = "static" | "basic" | "advanced"
  requires_custom_design = true | false

  # Optional
  manual_estimate_hours = 120
  notes = "Design assumptions..."
}

Examples

Simple Dashboard

mml
report_estimate "Sales Analytics.Revenue Dashboard" {
  report_complexity = "simple"     # 40 hours base
  visualization_count = 5          # +20 hours (5 * 4)
  interactivity_level = "basic"    # +8 hours
  requires_custom_design = false   # +0 hours

  # Total: 40 + 20 + 8 + 0 = 68 hours
}

Complex Executive Dashboard

mml
report_estimate "Executive Dashboard.Company KPIs" {
  report_complexity = "complex"    # 120 hours base
  visualization_count = 15         # +60 hours (15 * 4)
  interactivity_level = "advanced" # +24 hours
  requires_custom_design = true    # +16 hours

  notes = "Custom branding, real-time updates, mobile responsive"

  # Total: 120 + 60 + 24 + 16 = 220 hours
}

Complete Example

mml
# Conceptual entities
domain "Sales" {
  entity "Customer" {
    type = "entity"
    description = "Customer master data"
  }

  entity "Order" {
    type = "entity"
    description = "Sales orders"
  }
}

# Entity estimates
entity_estimate "Sales.Customer" {
  data_quality = "medium"
  data_volume = "large"
  scd_type = "type-2"
  business_rules_complexity = "moderate"
  requires_resolution = true
  derivation_count = 8
  kpi_count = 5
  testing_effort = 16

  notes = "Resolution across CRM and billing systems"
}

entity_estimate "Sales.Order" {
  data_quality = "high"
  data_volume = "very-large"
  scd_type = "type-1"
  business_rules_complexity = "moderate"
  requires_resolution = false
  derivation_count = 5
  kpi_count = 3
  testing_effort = 8
}

# Source estimates
source_estimate "PostgreSQL_Production" {
  source_complexity = "simple"
  data_volume = "large"
  refresh_frequency = "hourly"
  requires_auth = false
  requires_transformation = false
}

source_estimate "Salesforce_API" {
  source_complexity = "complex"
  data_volume = "medium"
  refresh_frequency = "daily"
  requires_auth = true
  requires_transformation = true

  notes = "OAuth setup, rate limiting considerations"
}

# Data product
data_product "Sales Analytics" {
  owner = "sales-team"

  report "Revenue Dashboard" {
    metric "Total Revenue" {
      calculation = "SUM(Order.amount)"
      uses "Sales.Order"
    }
  }
}

# Report estimate
report_estimate "Sales Analytics.Revenue Dashboard" {
  report_complexity = "moderate"
  visualization_count = 10
  interactivity_level = "advanced"
  requires_custom_design = false
}

Best Practices

1. Estimate Early and Often

Create estimates during planning, before development starts:

mml
# Estimate during architecture design
entity_estimate "Planned.NewEntity" {
  business_rules_complexity = "moderate"
  data_quality = "medium"
  # ... other attributes
}

2. Use Manual Overrides for Known Complexity

If you have prior experience, override the formula:

mml
entity_estimate "Sales.Customer" {
  # Automatic calculation
  data_quality = "high"
  data_volume = "large"
  # ... results in 84 hours

  # But based on prior experience:
  manual_estimate_hours = 120
  notes = "Previous similar entity took 120 hours due to legacy system integration"
}

3. Document Assumptions

Always explain your estimation reasoning:

mml
source_estimate "Legacy_Mainframe" {
  source_complexity = "complex"
  manual_estimate_hours = 200

  notes = """
    Complex COBOL extraction required.
    Need to build custom connector.
    Data quality issues discovered in analysis.
    Requires coordination with mainframe team.
  """
}

4. Track Actuals vs. Estimates

Use notes to record actual time spent:

mml
entity_estimate "Sales.Customer" {
  manual_estimate_hours = 120

  notes = """
    Initial estimate: 120 hours
    Actual time: 145 hours
    Variance: +25 hours due to unexpected data quality issues
  """
}

5. Review and Refine

Update estimates as you learn more:

mml
entity_estimate "Analytics.Customer360" {
  # Initial estimate
  business_rules_complexity = "moderate"  # 80 hours

  notes = """
    Week 1: Discovered additional business rules
    Updated complexity to 'complex' (120 hours base)
    Updated derivation_count from 8 to 15
    New estimate: 250 hours
  """
}

Summary Statistics

Modality automatically calculates project-level statistics:

  • Total hours: Sum of all estimates
  • Entity breakdown: Hours by entity
  • Source breakdown: Hours by source
  • Report breakdown: Hours by report
  • Complexity distribution: % low/medium/high
  • Coverage: % of model estimated
  • Incomplete estimates: Items missing estimates

These summaries help with:

  • Sprint planning
  • Team capacity planning
  • Budget forecasting
  • Risk identification (high complexity items)
  • Coverage tracking (what's not estimated)
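
A minimal sketch of how such rollups could be derived from a list of estimates; the record structure and field names are assumptions, with hours taken from the complete example's computed totals:

```python
from collections import Counter

# Hypothetical estimate records; complexity labels apply to entities only.
estimates = [
    {"kind": "entity", "name": "Sales.Customer", "hours": 212, "complexity": "high"},
    {"kind": "entity", "name": "Sales.Order", "hours": 160, "complexity": "medium"},
    {"kind": "source", "name": "PostgreSQL_Production", "hours": 16},
    {"kind": "source", "name": "Salesforce_API", "hours": 30},
    {"kind": "report", "name": "Sales Analytics.Revenue Dashboard", "hours": 144},
]

total_hours = sum(e["hours"] for e in estimates)  # 562
hours_by_kind = Counter()
for e in estimates:
    hours_by_kind[e["kind"]] += e["hours"]
complexity_distribution = Counter(
    e["complexity"] for e in estimates if "complexity" in e
)
```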

Released under the MIT License.