Growth Metrics Made Simple Via DSL
Business analysts often need to compare current performance with historical data to understand trends and growth patterns. Questions like “How did our sales this month compare to last month?” or “What was our year-over-year revenue growth?” are fundamental to business intelligence.
However, writing the SQL queries to answer these questions proves to be surprisingly complex. A simple year-over-year comparison requires joining a table with itself, shifting dates, and handling edge cases like missing data or division by zero. This complexity creates a barrier between analysts and the insights they need.
This article demonstrates how a comprehensive DSL system addresses such analytical challenges through a detailed implementation of time-based comparisons. The focus is on the critical translation process from a higher-level, business-oriented DSL to a lower-level, execution-ready DSL that Apache Calcite can efficiently translate to database-specific SQL.
DSL Translation Architecture: High-Level to Low-Level Transformation
This implementation showcases a multi-layered DSL translation system that transforms business-oriented configurations into execution-ready representations. The time-based analytics module demonstrates the translation process from an intuitive, high-level DSL to a lower-level DSL that Apache Calcite can optimize and convert to database-specific SQL queries. Instead of analysts writing complex SQL with multiple joins and date calculations, they work with intuitive business configurations that the system automatically transforms through multiple DSL layers before reaching Apache Calcite for final SQL generation.
Core DSL System Capabilities Demonstrated
The time-based analytics module illustrates several key capabilities of the broader DSL system:
Year-over-Year (YoY): Compares current period values with the same period last year, for example: Compare Q3 2024 revenue with Q3 2023 revenue
Month-over-Month (MoM): Compares current period values with the previous month, for example: Compare September sales with August sales
Lookback: Retrieves historical values for comparison without calculating percentages, for example: Show both current revenue and last year’s revenue side by side
The system calculates percentage changes, absolute differences, and handles common edge cases automatically.
Key Capabilities
Flexible Time Periods: The system supports different aggregation levels (monthly, quarterly, yearly) and can truncate data appropriately. For instance, when comparing yearly data, it can group daily transactions by month for meaningful comparisons.
Mixed Metric Support: Users can combine both aggregated metrics (like SUM of sales) and non-aggregated metrics (like pre-calculated values from data warehouses) in the same query.
Robust Error Handling: The system automatically handles missing data, prevents division by zero errors, and manages null values gracefully.
Performance Optimization: For queries with multiple time-based metrics, the system intelligently combines database operations to minimize query complexity and execution time.
How It Works
The Multi-Layer Translation Process
The DSL translation system operates through distinct transformation layers, each serving a specific purpose in converting business intent to executable queries:
- Business DSL Analysis: The system interprets high-level business configurations that describe analytical operations in domain terms
- Intermediate DSL Generation: Business requirements are converted into a structured intermediate representation that captures analytical logic while remaining database-agnostic
- Query Structure Planning: The system decomposes complex operations into manageable sub-queries with proper relationships and dependencies
- Calcite-Compatible DSL: The intermediate representation is transformed into a lower-level DSL that Apache Calcite can understand and optimize
- SQL Generation: Apache Calcite processes the low-level DSL to produce database-specific optimized SQL
This layered approach allows the same business logic to be executed efficiently across different database platforms while maintaining separation between business concerns and execution details.
Input: Business-Oriented High-Level DSL
The system accepts business-focused configurations that express analytical intent without requiring knowledge of SQL complexity:
{
"datasource": {
"connId": "demo_datasource",
"name": "demo_datasource",
"tableType": "MYSQL"
},
"dimensions": [
{
"name": "order_date",
"type": "COLUMN",
"ref": "order_date"
}
],
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"operator": "SUM",
"derive": "YOY",
"extra": {
"dateColumn": "order_date",
"timeAdd": -1,
"timeAddUnit": "year",
"truncUnit": "month",
"srcType": "datetime",
"destType": "datetime"
}
}
],
"table": {
"name": "t1",
"ref": "demo.t_order",
"type": "table"
}
}
This high-level DSL configuration captures business intent (“compare order money year-over-year, grouped by month”) in a declarative format that abstracts away all SQL complexity.
Output: Executable SQL Query
The transformation process produces optimized SQL that handles all the complexity:
SELECT DATE_TRUNC('month', "t1"."order_date") AS "order_date",
IF("t4"."order_money" IS NULL OR "t4"."order_money" = 0, NULL,
("t1"."order_money" - "t4"."order_money") / "t4"."order_money") AS "order_money"
FROM (SELECT DATE_TRUNC('month', "order_date") AS "order_date",
SUM("order_money") AS "order_money"
FROM "demo"."t_order"
GROUP BY DATE_TRUNC('month', "order_date")) AS "t1"
LEFT JOIN (SELECT DATE_TRUNC('month', DATE_ADD('year', -1, "order_date")) AS "order_date",
SUM("order_money") AS "order_money"
FROM "demo"."t_order"
GROUP BY DATE_TRUNC('month', DATE_ADD('year', -1, "order_date"))) AS "t4"
ON "t1"."order_date" = "t4"."order_date"
DSL Translation Implementation Details
Core Translation Logic
The heart of the system manages the transformation from business-oriented DSL to Calcite-compatible low-level DSL:
private static DSLConfigNode yoyDslConvert(
DSLConfigNode dslConfigNode, List<OperatorExpression> yoyMetrics, boolean isAgg) {
OperatorExpression firstYoyMetric = yoyMetrics.get(0);
YoyExtra yoyExtra = YoyExtra.getYoyExtra(firstYoyMetric);
DSLConfigNode currentPeriodDsl = createYearDsl(
dslConfigNode, CURRENT_PERIOD_TABLE_ALIAS, 0, yoyExtra, yoyMetrics, isAgg);
List<DSLConfigNode> previousDsls = new ArrayList<>();
for (OperatorExpression yoyMetric : yoyMetrics) {
YoyExtra metricYoyExtra = YoyExtra.getYoyExtra(yoyMetric);
DSLConfigNode previousPeriodDsl = createYearDsl(
dslConfigNode, getTableAlias(yoyMetric),
metricYoyExtra.getTimeAdd(), metricYoyExtra, yoyMetrics, isAgg);
previousDsls.add(previousPeriodDsl);
}
return createYoyDsl(yoyMetrics, currentPeriodDsl, previousDsls, isAgg);
}
Formula Implementation
Different comparison types require different calculation formulas. The system implements these automatically:
private static String getRef(
OperatorExpression yoyMetric, String metricName, String previousPeriodTableAlias) {
Operator operator = getOperator(yoyMetric);
if (Objects.equals(operator, Operator.YOY)
|| Objects.equals(operator, Operator.MOM)) {
return String.format(
"IF(%s.%s IS NULL OR %s.%s = 0, NULL, (%s.%s - %s.%s) / %s.%s)",
previousPeriodTableAlias, metricName,
previousPeriodTableAlias, metricName,
CURRENT_PERIOD_TABLE_ALIAS, metricName,
previousPeriodTableAlias, metricName,
previousPeriodTableAlias, metricName
);
}
if (Objects.equals(operator, Operator.YOY_DIFF)
|| Objects.equals(operator, Operator.MOM_DIFF)) {
return String.format("%s.%s - %s.%s",
CURRENT_PERIOD_TABLE_ALIAS, metricName,
previousPeriodTableAlias, metricName
);
}
if (Objects.equals(operator, Operator.LOOKBACK)) {
return String.format("%s.%s",
previousPeriodTableAlias, metricName
);
}
return null;
}
This code demonstrates how the system:
- For
YOY/MOM: Calculates percentage change with proper null handling - For
YOY_DIFF/MOM_DIFF: Computes absolute differences - For
LOOKBACK: Simply returns the historical value
Technical Challenges Solved
Complex Date Handling
Working with time-based data involves numerous edge cases: leap years, missing data periods, different date formats, and timezone considerations. The DSL abstracts away these complexities while ensuring accurate calculations.
Query Optimization
When users request multiple time-based metrics in a single report, a naive approach would generate multiple separate queries. Instead, the system intelligently combines operations, reuses common table expressions, and minimizes database round trips.
Mixed Aggregation Scenarios
Real-world data warehouses often contain both raw transactional data and pre-aggregated summary tables. The DSL handles both scenarios and prevents users from accidentally mixing incompatible aggregation levels.
The system performs checks preventing the mixing of aggregated and non-aggregated metrics, ensuring data integrity while providing flexibility for different use cases.
Performance at Scale
For large datasets, time-based comparisons can become expensive operations. The system includes several optimization strategies:
- Reusing common date calculations across multiple metrics
- Combining similar joins to reduce query complexity
- Leveraging database-specific optimizations where available
For multiple metrics, the system combines joins that reference the same table, extracts fields from combined joins, and rewrites join conditions to minimize computational overhead.
Intermediate DSL Structure
The system uses a carefully designed intermediate DSL representation that bridges business requirements and execution planning. This structure contains all the information needed for Apache Calcite to generate optimal SQL:
{
"dslType": "NO_AGG",
"name": "yoyQuery",
"projections": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"srcType": "datetime",
"destType": "datetime",
"timeAddUnit": "year",
"timeAdd": 0,
"truncUnit": "month"
},
"operator": "IDENTIFIER"
},
{
"name": "order_money",
"type": "NEW_EXPRESSION",
"ref": "IF(t_yoy_previous_order_money_1_year.order_money IS NULL OR t_yoy_previous_order_money_1_year.order_money = 0, NULL, (t_yoy_current.order_money - t_yoy_previous_order_money_1_year.order_money) / t_yoy_previous_order_money_1_year.order_money)",
"operator": "IDENTIFIER"
}
],
"join": {
"joinTypes": ["left"],
"tables": [
{
"name": "t_yoy_current",
"type": "QUERY",
"ref": "t_yoy_current"
},
{
"name": "t_yoy_previous_order_money_1_year",
"type": "QUERY",
"ref": "t_yoy_previous_order_money_1_year"
}
],
"onFilters": [
[
{
"type": "COLUMN",
"ref": "t_yoy_current.order_date",
"operator": "EQ",
"value": "t_yoy_previous_order_money_1_year.order_date",
"valueType": "COLUMN"
}
]
]
},
"children": [
{
"dslType": "AGG",
"name": "t_yoy_current",
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"operator": "SUM"
}
],
"table": {
"name": "t1",
"type": "TABLE",
"ref": "demo.t_order"
},
"dimensions": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"timeAdd": 0,
"truncUnit": "month"
}
}
]
},
{
"dslType": "AGG",
"name": "t_yoy_previous_order_money_1_year",
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"operator": "SUM"
}
],
"table": {
"name": "t1",
"type": "TABLE",
"ref": "demo.t_order"
},
"dimensions": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"timeAdd": -1,
"truncUnit": "month"
}
}
]
}
]
}
This intermediate DSL demonstrates the system’s ability to decompose complex analytical operations into structured components that maintain semantic clarity while being ready for SQL generation by Apache Calcite.
Impact and Results
This DSL translation system dramatically reduces the complexity of implementing sophisticated analytical capabilities. The clear separation between business-oriented high-level DSL and execution-ready low-level DSL allows for independent evolution of business logic and optimization strategies.
The modular architecture makes it easy to extend the system with new analytical modules beyond time-based comparisons. Each module follows the same translation pattern: business DSL → intermediate processing → Calcite-compatible DSL → optimized SQL.
By establishing this translation framework, the system enables rapid development of new analytical capabilities while leveraging Apache Calcite’s powerful optimization engine for consistent, high-performance SQL generation across diverse analytical use cases.