Estimating
Accurate estimation is critical for planning data platform development. Modality's estimation framework helps you forecast development effort for building entities, integrating sources, and creating reports.
Overview
Modality provides three types of estimation:
- Entity Estimates: Effort to build data entities (tables, views, models)
- Source Estimates: Effort to integrate data sources
- Report Estimates: Effort to build dashboards and reports
Each estimate automatically calculates development hours based on technical attributes, with optional manual overrides.
Entity Estimation
Entity estimates calculate the effort required to build a data entity based on data quality, volume, complexity, and transformation requirements.
Attributes
Required Attributes
- data_quality: Quality of source data
  - low: Significant cleaning and validation needed (+24 hours)
  - medium: Some cleanup required (+8 hours)
  - high: Clean, well-structured data (+0 hours)
- data_volume: Amount of data to process
  - small: < 1M records (+0 hours)
  - medium: 1M - 10M records (+8 hours)
  - large: 10M - 100M records (+16 hours)
  - very-large: > 100M records (+32 hours)
- scd_type: Slowly changing dimension pattern
  - type-0: No history tracking (+0 hours)
  - type-1: Overwrite changes (+8 hours)
  - type-2: Full history tracking (+24 hours)
  - type-3: Limited history (+16 hours)
- business_rules_complexity: Business logic complexity (base hours)
  - simple: Straightforward transformations (40 hours base)
  - moderate: Some complex rules (80 hours base)
  - complex: Extensive business logic (120 hours base)
- requires_resolution: Multi-source entity resolution needed
  - true: Entity resolution required (+16 hours)
  - false: Single source or no resolution needed (+0 hours)
- derivation_count: Number of derived/calculated fields
  - Each derivation adds 4 hours
- kpi_count: Number of KPIs calculated from this entity
  - Each KPI adds 4 hours
- testing_effort: Hours allocated for testing
  - User-defined value added to total
Optional Attributes
- manual_estimate_hours: Override automatic calculation
- notes: Estimation assumptions and notes
Estimation Formula
total_hours =
  base_hours (from business_rules_complexity) +
  data_quality_hours +
  data_volume_hours +
  scd_type_hours +
  (requires_resolution ? 16 : 0) +
  (derivation_count * 4) +
  (kpi_count * 4) +
  testing_effort
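For reference, the calculation can be expressed as a short function. The following is an illustrative Python sketch of the formula above, not Modality's actual implementation; the function name and signature are hypothetical.

```python
# Illustrative sketch of the entity estimation formula; hour values
# mirror the attribute tables above. Not Modality's implementation.
BASE_HOURS = {"simple": 40, "moderate": 80, "complex": 120}
DATA_QUALITY_HOURS = {"low": 24, "medium": 8, "high": 0}
DATA_VOLUME_HOURS = {"small": 0, "medium": 8, "large": 16, "very-large": 32}
SCD_TYPE_HOURS = {"type-0": 0, "type-1": 8, "type-2": 24, "type-3": 16}

def entity_estimate_hours(data_quality, data_volume, scd_type,
                          business_rules_complexity, requires_resolution,
                          derivation_count, kpi_count, testing_effort):
    return (BASE_HOURS[business_rules_complexity]
            + DATA_QUALITY_HOURS[data_quality]
            + DATA_VOLUME_HOURS[data_volume]
            + SCD_TYPE_HOURS[scd_type]
            + (16 if requires_resolution else 0)
            + derivation_count * 4
            + kpi_count * 4
            + testing_effort)

# Matches the "Simple Entity" example below: 40 + 0 + 8 + 8 + 0 + 8 + 4 + 8
assert entity_estimate_hours("high", "medium", "type-1", "simple",
                             False, 2, 1, 8) == 76
```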
Complexity Scoring
Modality also calculates a complexity score (low/medium/high) based on weighted points:
| Factor | Points |
|---|---|
| Data quality: low | 3 |
| Data quality: medium | 2 |
| Data quality: high | 1 |
| Data volume: very-large | 3 |
| Data volume: large | 2 |
| Data volume: medium | 1 |
| SCD type-2 | 3 |
| SCD type-3 | 2 |
| SCD type-1 | 1 |
| Business rules: complex | 3 |
| Business rules: moderate | 2 |
| Business rules: simple | 1 |
| Requires resolution | 2 |
| Derivations > 5 | 2 |
| Derivations 2-5 | 1 |
| KPIs > 5 | 2 |
| KPIs 2-5 | 1 |
Complexity levels:
- Score ≤ 6: low
- Score 7-12: medium
- Score ≥ 13: high
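The scoring can be sketched the same way. The helper below is hypothetical and assumes that values absent from the points table (small data volume, SCD type-0, fewer than 2 derivations or KPIs) contribute 0 points.

```python
# Hypothetical sketch of complexity scoring; points mirror the table
# above, and unlisted values are assumed to score 0.
def complexity_score(data_quality, data_volume, scd_type,
                     business_rules_complexity, requires_resolution,
                     derivation_count, kpi_count):
    score = {"low": 3, "medium": 2, "high": 1}[data_quality]
    score += {"very-large": 3, "large": 2, "medium": 1, "small": 0}[data_volume]
    score += {"type-2": 3, "type-3": 2, "type-1": 1, "type-0": 0}[scd_type]
    score += {"complex": 3, "moderate": 2, "simple": 1}[business_rules_complexity]
    score += 2 if requires_resolution else 0
    score += 2 if derivation_count > 5 else (1 if derivation_count >= 2 else 0)
    score += 2 if kpi_count > 5 else (1 if kpi_count >= 2 else 0)
    return score

def complexity_level(score):
    if score <= 6:
        return "low"
    return "medium" if score <= 12 else "high"

# The "Complex Entity" example below scores 3+3+3+3+2+2+2 = 18 -> high
assert complexity_level(complexity_score("low", "very-large", "type-2",
                                         "complex", True, 15, 8)) == "high"
```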
MML Syntax
entity_estimate "Domain.Entity" {
data_quality = "low" | "medium" | "high"
data_volume = "small" | "medium" | "large" | "very-large"
scd_type = "type-0" | "type-1" | "type-2" | "type-3"
business_rules_complexity = "simple" | "moderate" | "complex"
requires_resolution = true | false
derivation_count = 5
kpi_count = 3
testing_effort = 8
# Optional
manual_estimate_hours = 150 # Override calculation
notes = "Estimation assumptions..."
}

Examples
Simple Entity
entity_estimate "Sales.Customer" {
data_quality = "high" # +0 hours
data_volume = "medium" # +8 hours
scd_type = "type-1" # +8 hours
business_rules_complexity = "simple" # 40 hours base
requires_resolution = false # +0 hours
derivation_count = 2 # +8 hours (2 * 4)
kpi_count = 1 # +4 hours
testing_effort = 8 # +8 hours
# Total: 40 + 0 + 8 + 8 + 0 + 8 + 4 + 8 = 76 hours
}

Complex Entity
entity_estimate "Analytics.Customer360" {
data_quality = "low" # +24 hours (needs cleaning)
data_volume = "very-large" # +32 hours (>100M records)
scd_type = "type-2" # +24 hours (full history)
business_rules_complexity = "complex" # 120 hours base
requires_resolution = true # +16 hours (multi-source)
derivation_count = 15 # +60 hours (15 * 4)
kpi_count = 8 # +32 hours (8 * 4)
testing_effort = 16 # +16 hours
notes = "Requires resolution across CRM, billing, and support systems"
# Total: 120 + 24 + 32 + 24 + 16 + 60 + 32 + 16 = 324 hours
# Complexity: high (score 18)
}

Source Estimation
Source estimates calculate the effort required to integrate a data source based on connectivity, transformation, and refresh requirements.
Attributes
Required Attributes
- source_complexity: Technical integration complexity (base hours)
  - simple: Direct database connection, standard schema (4 hours base)
  - moderate: API integration, some transformation needed (8 hours base)
  - complex: Custom connectors, complex protocols (16 hours base)
- data_volume: Amount of data to ingest
  - small: < 1GB (+1 hour)
  - medium: 1GB - 100GB (+2 hours)
  - large: 100GB - 1TB (+4 hours)
  - very-large: > 1TB (+8 hours)
- refresh_frequency: How often data is refreshed
  - monthly: (+1 hour)
  - weekly: (+1 hour)
  - daily: (+2 hours)
  - hourly: (+4 hours)
  - real-time: Streaming setup (+12 hours)
- requires_auth: Authentication/authorization needed
  - true: OAuth, API keys, complex auth (+4 hours)
  - false: Direct access (+0 hours)
- requires_transformation: Significant data transformation needed
  - true: Schema mapping, denormalization (+6 hours)
  - false: Direct copy (+0 hours)
Optional Attributes
- manual_estimate_hours: Override automatic calculation
- notes: Integration assumptions and notes
Estimation Formula
total_hours =
  base_hours (from source_complexity) +
  data_volume_hours +
  refresh_frequency_hours +
  (requires_auth ? 4 : 0) +
  (requires_transformation ? 6 : 0)
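The source formula follows the same additive pattern; here is a minimal Python sketch (hypothetical helper, not Modality's implementation):

```python
# Hypothetical sketch of the source estimation formula; hour values
# mirror the attribute tables above.
SOURCE_BASE_HOURS = {"simple": 4, "moderate": 8, "complex": 16}
SOURCE_VOLUME_HOURS = {"small": 1, "medium": 2, "large": 4, "very-large": 8}
REFRESH_HOURS = {"monthly": 1, "weekly": 1, "daily": 2, "hourly": 4,
                 "real-time": 12}

def source_estimate_hours(source_complexity, data_volume, refresh_frequency,
                          requires_auth, requires_transformation):
    return (SOURCE_BASE_HOURS[source_complexity]
            + SOURCE_VOLUME_HOURS[data_volume]
            + REFRESH_HOURS[refresh_frequency]
            + (4 if requires_auth else 0)
            + (6 if requires_transformation else 0))

# Matches the "Complex API Source" example below: 16 + 4 + 4 + 4 + 6
assert source_estimate_hours("complex", "large", "hourly", True, True) == 34
```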
MML Syntax
source_estimate "SourceName" {
source_complexity = "simple" | "moderate" | "complex"
data_volume = "small" | "medium" | "large" | "very-large"
refresh_frequency = "real-time" | "hourly" | "daily" | "weekly" | "monthly"
requires_auth = true | false
requires_transformation = true | false
# Optional
manual_estimate_hours = 30
notes = "Estimation assumptions..."
}

Examples
Simple Database Source
source_estimate "PostgreSQL_Production" {
source_complexity = "simple" # 4 hours base
data_volume = "medium" # +2 hours
refresh_frequency = "daily" # +2 hours
requires_auth = false # +0 hours
requires_transformation = false # +0 hours
# Total: 4 + 2 + 2 + 0 + 0 = 8 hours
}

Complex API Source
source_estimate "Salesforce_API" {
source_complexity = "complex" # 16 hours base (custom connector)
data_volume = "large" # +4 hours
refresh_frequency = "hourly" # +4 hours
requires_auth = true # +4 hours (OAuth)
requires_transformation = true # +6 hours (schema mapping)
notes = "OAuth refresh token management, rate limiting, pagination"
# Total: 16 + 4 + 4 + 4 + 6 = 34 hours
}

Real-time Streaming Source
source_estimate "Kafka_Events" {
source_complexity = "moderate" # 8 hours base
data_volume = "very-large" # +8 hours
refresh_frequency = "real-time" # +12 hours (streaming)
requires_auth = true # +4 hours
requires_transformation = true # +6 hours
notes = "Kafka consumer setup, schema registry, dead letter queue"
# Total: 8 + 8 + 12 + 4 + 6 = 38 hours
}

Report Estimation
Report estimates calculate the effort required to build dashboards and reports based on complexity, visualizations, and interactivity.
Attributes
Required Attributes
- report_complexity: Overall report complexity (base hours)
  - simple: Single-page, few metrics (40 hours / 1 week)
  - moderate: Multi-page, standard visualizations (80 hours / 2 weeks)
  - complex: Advanced features, custom views (120 hours / 3 weeks)
- visualization_count: Number of charts/visualizations
  - Each visualization adds 4 hours
- interactivity_level: User interaction complexity
  - static: No interaction (+0 hours)
  - basic: Filters and drill-downs (+8 hours)
  - advanced: Dynamic parameters, cross-filtering (+24 hours)
- requires_custom_design: Custom UI/UX design needed
  - true: Brand-specific design (+16 hours)
  - false: Standard templates (+0 hours)
Optional Attributes
- manual_estimate_hours: Override automatic calculation
- notes: Design and feature assumptions
Estimation Formula
total_hours =
  base_hours (from report_complexity) +
  (visualization_count * 4) +
  interactivity_hours +
  (requires_custom_design ? 16 : 0)
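And a corresponding sketch for reports (again a hypothetical helper, not Modality's implementation):

```python
# Hypothetical sketch of the report estimation formula.
REPORT_BASE_HOURS = {"simple": 40, "moderate": 80, "complex": 120}
INTERACTIVITY_HOURS = {"static": 0, "basic": 8, "advanced": 24}

def report_estimate_hours(report_complexity, visualization_count,
                          interactivity_level, requires_custom_design):
    return (REPORT_BASE_HOURS[report_complexity]
            + visualization_count * 4
            + INTERACTIVITY_HOURS[interactivity_level]
            + (16 if requires_custom_design else 0))

# Matches the "Simple Dashboard" example below: 40 + 20 + 8 + 0
assert report_estimate_hours("simple", 5, "basic", False) == 68
```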
MML Syntax
report_estimate "ProductName.ReportName" {
report_complexity = "simple" | "moderate" | "complex"
visualization_count = 8
interactivity_level = "static" | "basic" | "advanced"
requires_custom_design = true | false
# Optional
manual_estimate_hours = 120
notes = "Design assumptions..."
}

Examples
Simple Dashboard
report_estimate "Sales Analytics.Revenue Dashboard" {
report_complexity = "simple" # 40 hours base
visualization_count = 5 # +20 hours (5 * 4)
interactivity_level = "basic" # +8 hours
requires_custom_design = false # +0 hours
# Total: 40 + 20 + 8 + 0 = 68 hours
}

Complex Executive Dashboard
report_estimate "Executive Dashboard.Company KPIs" {
report_complexity = "complex" # 120 hours base
visualization_count = 15 # +60 hours (15 * 4)
interactivity_level = "advanced" # +24 hours
requires_custom_design = true # +16 hours
notes = "Custom branding, real-time updates, mobile responsive"
# Total: 120 + 60 + 24 + 16 = 220 hours
}

Complete Example
# Conceptual entities
domain "Sales" {
entity "Customer" {
type = "entity"
description = "Customer master data"
}
entity "Order" {
type = "entity"
description = "Sales orders"
}
}
# Entity estimates
entity_estimate "Sales.Customer" {
data_quality = "medium"
data_volume = "large"
scd_type = "type-2"
business_rules_complexity = "moderate"
requires_resolution = true
derivation_count = 8
kpi_count = 5
testing_effort = 16
notes = "Resolution across CRM and billing systems"
}
entity_estimate "Sales.Order" {
data_quality = "high"
data_volume = "very-large"
scd_type = "type-1"
business_rules_complexity = "moderate"
requires_resolution = false
derivation_count = 5
kpi_count = 3
testing_effort = 8
}
# Source estimates
source_estimate "PostgreSQL_Production" {
source_complexity = "simple"
data_volume = "large"
refresh_frequency = "hourly"
requires_auth = false
requires_transformation = false
}
source_estimate "Salesforce_API" {
source_complexity = "complex"
data_volume = "medium"
refresh_frequency = "daily"
requires_auth = true
requires_transformation = true
notes = "OAuth setup, rate limiting considerations"
}
# Data product
data_product "Sales Analytics" {
owner = "sales-team"
report "Revenue Dashboard" {
metric "Total Revenue" {
calculation = "SUM(Order.amount)"
uses "Sales.Order"
}
}
}
# Report estimate
report_estimate "Sales Analytics.Revenue Dashboard" {
report_complexity = "moderate"
visualization_count = 10
interactivity_level = "advanced"
requires_custom_design = false
}

Best Practices
1. Estimate Early and Often
Create estimates during planning, before development starts:
# Estimate during architecture design
entity_estimate "Planned.NewEntity" {
business_rules_complexity = "moderate"
data_quality = "medium"
# ... other attributes
}

2. Use Manual Overrides for Known Complexity
If you have prior experience, override the formula:
entity_estimate "Sales.Customer" {
# Automatic calculation
data_quality = "high"
data_volume = "large"
# ... results in 84 hours
# But based on prior experience:
manual_estimate_hours = 120
notes = "Previous similar entity took 120 hours due to legacy system integration"
}

3. Document Assumptions
Always explain your estimation reasoning:
source_estimate "Legacy_Mainframe" {
source_complexity = "complex"
manual_estimate_hours = 200
notes = """
Complex COBOL extraction required.
Need to build custom connector.
Data quality issues discovered in analysis.
Requires coordination with mainframe team.
"""
}

4. Track Actuals vs. Estimates
Use notes to record actual time spent:
entity_estimate "Sales.Customer" {
manual_estimate_hours = 120
notes = """
Initial estimate: 120 hours
Actual time: 145 hours
Variance: +25 hours due to unexpected data quality issues
"""
}

5. Review and Refine
Update estimates as you learn more:
entity_estimate "Analytics.Customer360" {
# Initial estimate
business_rules_complexity = "moderate" # 80 hours
notes = """
Week 1: Discovered additional business rules
Updated complexity to 'complex' (120 hours base)
Updated derivation_count from 8 to 15
New estimate: 250 hours
"""
}

Summary Statistics
Modality automatically calculates project-level statistics (a sketch of these rollups follows at the end of this section):
- Total hours: Sum of all estimates
- Entity breakdown: Hours by entity
- Source breakdown: Hours by source
- Report breakdown: Hours by report
- Complexity distribution: % low/medium/high
- Coverage: % of model estimated
- Incomplete estimates: Items missing estimates
These summaries help with:
- Sprint planning
- Team capacity planning
- Budget forecasting
- Risk identification (high complexity items)
- Coverage tracking (what's not estimated)
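To make the rollups concrete, the sketch below aggregates a hand-built list of estimate results, with hours and complexity taken from the worked examples above. The record structure is an assumption for illustration, not Modality's output format.

```python
# Hypothetical aggregation over estimate results; the dict layout is an
# assumption, and the figures come from the examples on this page.
from collections import Counter

estimates = [
    {"kind": "entity", "name": "Sales.Customer", "hours": 76, "complexity": "low"},
    {"kind": "entity", "name": "Analytics.Customer360", "hours": 324, "complexity": "high"},
    {"kind": "source", "name": "Salesforce_API", "hours": 34, "complexity": None},
]

total_hours = sum(e["hours"] for e in estimates)  # 434
hours_by_kind = Counter()                         # entity/source/report breakdown
for e in estimates:
    hours_by_kind[e["kind"]] += e["hours"]
complexity_dist = Counter(e["complexity"] for e in estimates
                          if e["complexity"] is not None)  # low/medium/high counts

print(f"Total hours: {total_hours}")
print(f"Breakdown by kind: {dict(hours_by_kind)}")
print(f"Complexity distribution: {dict(complexity_dist)}")
```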
