Growth Metrics Made Simple Via DSL

Business analysts often need to compare current performance with historical data to understand trends and growth patterns. Questions like “How did our sales this month compare to last month?” or “What was our year-over-year revenue growth?” are fundamental to business intelligence.

 

However, writing the SQL queries to answer these questions proves to be surprisingly complex. A simple year-over-year comparison requires joining a table with itself, shifting dates, and handling edge cases like missing data or division by zero. This complexity creates a barrier between analysts and the insights they need.

 

This article demonstrates how a comprehensive DSL system addresses such analytical challenges through a detailed implementation of time-based comparisons. The focus is on the critical translation process from a higher-level, business-oriented DSL to a lower-level, execution-ready DSL that Apache Calcite can efficiently translate to database-specific SQL.

DSL Translation Architecture: High-Level to Low-Level Transformation

 

This implementation showcases a multi-layered DSL translation system that transforms business-oriented configurations into execution-ready representations. The time-based analytics module demonstrates the translation process from an intuitive, high-level DSL to a lower-level DSL that Apache Calcite can optimize and convert to database-specific SQL queries. Instead of analysts writing complex SQL with multiple joins and date calculations, they work with intuitive business configurations that the system automatically transforms through multiple DSL layers before reaching Apache Calcite for final SQL generation.

Core DSL System Capabilities Demonstrated

 

The time-based analytics module illustrates several key capabilities of the broader DSL system:

 

Year-over-Year (YoY): Compares current period values with the same period last year, for example: Compare Q3 2024 revenue with Q3 2023 revenue

 

Month-over-Month (MoM): Compares current period values with the previous month, for example: Compare September sales with August sales

 

Lookback: Retrieves historical values for comparison without calculating percentages, for example: Show both current revenue and last year’s revenue side by side

 

The system calculates percentage changes, absolute differences, and handles common edge cases automatically.

 

Key Capabilities

 

Flexible Time Periods: The system supports different aggregation levels (monthly, quarterly, yearly) and can truncate data appropriately. For instance, when comparing yearly data, it can group daily transactions by month for meaningful comparisons.

 

Mixed Metric Support: Users can combine both aggregated metrics (like SUM of sales) and non-aggregated metrics (like pre-calculated values from data warehouses) in the same query.

 

Robust Error Handling: The system automatically handles missing data, prevents division by zero errors, and manages null values gracefully.

 

Performance Optimization: For queries with multiple time-based metrics, the system intelligently combines database operations to minimize query complexity and execution time.

 

How It Works

 

The Multi-Layer Translation Process

 

The DSL translation system operates through distinct transformation layers, each serving a specific purpose in converting business intent to executable queries:

 

  1. Business DSL Analysis: The system interprets high-level business configurations that describe analytical operations in domain terms

 

  1. Intermediate DSL Generation: Business requirements are converted into a structured intermediate representation that captures analytical logic while remaining database-agnostic

 

  1. Query Structure Planning: The system decomposes complex operations into manageable sub-queries with proper relationships and dependencies

 

  1. Calcite-Compatible DSL: The intermediate representation is transformed into a lower-level DSL that Apache Calcite can understand and optimize

 

  1. SQL Generation: Apache Calcite processes the low-level DSL to produce database-specific optimized SQL

 

This layered approach allows the same business logic to be executed efficiently across different database platforms while maintaining separation between business concerns and execution details.

 

Input: Business-Oriented High-Level DSL

 

The system accepts business-focused configurations that express analytical intent without requiring knowledge of SQL complexity:

 

{
  "datasource": {
    "connId": "demo_datasource",
    "name": "demo_datasource",
    "tableType": "MYSQL"
  },
  "dimensions": [
    {
      "name": "order_date",
      "type": "COLUMN",
      "ref": "order_date"
    }
  ],
  "metrics": [
    {
      "name": "order_money",
      "type": "COLUMN",
      "ref": "order_money",
      "operator": "SUM",
      "derive": "YOY",
      "extra": {
        "dateColumn": "order_date",
        "timeAdd": -1,
        "timeAddUnit": "year",
        "truncUnit": "month",
        "srcType": "datetime",
        "destType": "datetime"
      }
    }
  ],
  "table": {
    "name": "t1",
    "ref": "demo.t_order",
    "type": "table"
  }
}

 

This high-level DSL configuration captures business intent (“compare order money year-over-year, grouped by month”) in a declarative format that abstracts away all SQL complexity.

 

Output: Executable SQL Query

 

The transformation process produces optimized SQL that handles all the complexity:

 

SELECT DATE_TRUNC('month', "t1"."order_date") AS "order_date", 
       IF("t4"."order_money" IS NULL OR "t4"."order_money" = 0, NULL, 
          ("t1"."order_money" - "t4"."order_money") / "t4"."order_money") AS "order_money"
FROM (SELECT DATE_TRUNC('month', "order_date") AS "order_date", 
             SUM("order_money") AS "order_money"
      FROM "demo"."t_order"
      GROUP BY DATE_TRUNC('month', "order_date")) AS "t1"
LEFT JOIN (SELECT DATE_TRUNC('month', DATE_ADD('year', -1, "order_date")) AS "order_date", 
                  SUM("order_money") AS "order_money"
           FROM "demo"."t_order"
           GROUP BY DATE_TRUNC('month', DATE_ADD('year', -1, "order_date"))) AS "t4" 
     ON "t1"."order_date" = "t4"."order_date"

DSL Translation Implementation Details

 

Core Translation Logic

 

The heart of the system manages the transformation from business-oriented DSL to Calcite-compatible low-level DSL:

 

private static DSLConfigNode yoyDslConvert(
    DSLConfigNode dslConfigNode, List<OperatorExpression> yoyMetrics, boolean isAgg) {
  OperatorExpression firstYoyMetric = yoyMetrics.get(0);
  YoyExtra yoyExtra = YoyExtra.getYoyExtra(firstYoyMetric);
  DSLConfigNode currentPeriodDsl = createYearDsl(
      dslConfigNode, CURRENT_PERIOD_TABLE_ALIAS, 0, yoyExtra, yoyMetrics, isAgg);
  List<DSLConfigNode> previousDsls = new ArrayList<>();
  for (OperatorExpression yoyMetric : yoyMetrics) {
    YoyExtra metricYoyExtra = YoyExtra.getYoyExtra(yoyMetric);
    DSLConfigNode previousPeriodDsl = createYearDsl(
        dslConfigNode, getTableAlias(yoyMetric),
        metricYoyExtra.getTimeAdd(), metricYoyExtra, yoyMetrics, isAgg);
    previousDsls.add(previousPeriodDsl);
  }
  return createYoyDsl(yoyMetrics, currentPeriodDsl, previousDsls, isAgg);
}

 

Formula Implementation

 

Different comparison types require different calculation formulas. The system implements these automatically:

 

private static String getRef(
    OperatorExpression yoyMetric, String metricName, String previousPeriodTableAlias) {
  Operator operator = getOperator(yoyMetric);
  if (Objects.equals(operator, Operator.YOY)
      || Objects.equals(operator, Operator.MOM)) {
    return String.format(
        "IF(%s.%s IS NULL OR %s.%s = 0, NULL, (%s.%s - %s.%s) / %s.%s)",
        previousPeriodTableAlias, metricName,
        previousPeriodTableAlias, metricName,
        CURRENT_PERIOD_TABLE_ALIAS, metricName,
        previousPeriodTableAlias, metricName,
        previousPeriodTableAlias, metricName
    );
  }
  if (Objects.equals(operator, Operator.YOY_DIFF)
      || Objects.equals(operator, Operator.MOM_DIFF)) {
    return String.format("%s.%s - %s.%s",
        CURRENT_PERIOD_TABLE_ALIAS, metricName,
        previousPeriodTableAlias, metricName
    );
  }
  if (Objects.equals(operator, Operator.LOOKBACK)) {
    return String.format("%s.%s",
        previousPeriodTableAlias, metricName
    );
  }
  return null;
}

 

This code demonstrates how the system:

  • For YOY/MOM: Calculates percentage change with proper null handling
  • For YOY_DIFF/MOM_DIFF: Computes absolute differences
  • For LOOKBACK: Simply returns the historical value

 

Technical Challenges Solved

 

Complex Date Handling

Working with time-based data involves numerous edge cases: leap years, missing data periods, different date formats, and timezone considerations. The DSL abstracts away these complexities while ensuring accurate calculations.

 

Query Optimization

When users request multiple time-based metrics in a single report, a naive approach would generate multiple separate queries. Instead, the system intelligently combines operations, reuses common table expressions, and minimizes database round trips.

 

Mixed Aggregation Scenarios

Real-world data warehouses often contain both raw transactional data and pre-aggregated summary tables. The DSL handles both scenarios and prevents users from accidentally mixing incompatible aggregation levels.

 

The system performs checks preventing the mixing of aggregated and non-aggregated metrics, ensuring data integrity while providing flexibility for different use cases.

 

Performance at Scale

For large datasets, time-based comparisons can become expensive operations. The system includes several optimization strategies:

  • Reusing common date calculations across multiple metrics
  • Combining similar joins to reduce query complexity
  • Leveraging database-specific optimizations where available

 

For multiple metrics, the system combines joins that reference the same table, extracts fields from combined joins, and rewrites join conditions to minimize computational overhead.

 

Intermediate DSL Structure

 

The system uses a carefully designed intermediate DSL representation that bridges business requirements and execution planning. This structure contains all the information needed for Apache Calcite to generate optimal SQL:

 

{
  "dslType": "NO_AGG",
  "name": "yoyQuery",
  "projections": [
    {
      "name": "order_date",
      "type": "UNIFY_CAST",
      "ref": "order_date",
      "refExtra": {
        "srcType": "datetime",
        "destType": "datetime",
        "timeAddUnit": "year",
        "timeAdd": 0,
        "truncUnit": "month"
      },
      "operator": "IDENTIFIER"
    },
    {
      "name": "order_money",
      "type": "NEW_EXPRESSION",
      "ref": "IF(t_yoy_previous_order_money_1_year.order_money IS NULL OR t_yoy_previous_order_money_1_year.order_money = 0, NULL, (t_yoy_current.order_money - t_yoy_previous_order_money_1_year.order_money) / t_yoy_previous_order_money_1_year.order_money)",
      "operator": "IDENTIFIER"
    }
  ],
  "join": {
    "joinTypes": ["left"],
    "tables": [
      {
        "name": "t_yoy_current",
        "type": "QUERY",
        "ref": "t_yoy_current"
      },
      {
        "name": "t_yoy_previous_order_money_1_year",
        "type": "QUERY",
        "ref": "t_yoy_previous_order_money_1_year"
      }
    ],
    "onFilters": [
      [
        {
          "type": "COLUMN",
          "ref": "t_yoy_current.order_date",
          "operator": "EQ",
          "value": "t_yoy_previous_order_money_1_year.order_date",
          "valueType": "COLUMN"
        }
      ]
    ]
  },
  "children": [
    {
      "dslType": "AGG",
      "name": "t_yoy_current",
      "metrics": [
        {
          "name": "order_money",
          "type": "COLUMN",
          "ref": "order_money",
          "operator": "SUM"
        }
      ],
      "table": {
        "name": "t1",
        "type": "TABLE",
        "ref": "demo.t_order"
      },
      "dimensions": [
        {
          "name": "order_date",
          "type": "UNIFY_CAST",
          "ref": "order_date",
          "refExtra": {
            "timeAdd": 0,
            "truncUnit": "month"
          }
        }
      ]
    },
    {
      "dslType": "AGG", 
      "name": "t_yoy_previous_order_money_1_year",
      "metrics": [
        {
          "name": "order_money",
          "type": "COLUMN", 
          "ref": "order_money",
          "operator": "SUM"
        }
      ],
      "table": {
        "name": "t1",
        "type": "TABLE",
        "ref": "demo.t_order"
      },
      "dimensions": [
        {
          "name": "order_date",
          "type": "UNIFY_CAST",
          "ref": "order_date", 
          "refExtra": {
            "timeAdd": -1,
            "truncUnit": "month"
          }
        }
      ]
    }
  ]
}

 

This intermediate DSL demonstrates the system’s ability to decompose complex analytical operations into structured components that maintain semantic clarity while being ready for SQL generation by Apache Calcite.

 

Impact and Results

 

This DSL translation system dramatically reduces the complexity of implementing sophisticated analytical capabilities. The clear separation between business-oriented high-level DSL and execution-ready low-level DSL allows for independent evolution of business logic and optimization strategies.

 

The modular architecture makes it easy to extend the system with new analytical modules beyond time-based comparisons. Each module follows the same translation pattern: business DSL → intermediate processing → Calcite-compatible DSL → optimized SQL.

 

By establishing this translation framework, the system enables rapid development of new analytical capabilities while leveraging Apache Calcite’s powerful optimization engine for consistent, high-performance SQL generation across diverse analytical use cases.