DSL
YoY
Intention
We provide a series of frequently used financial comparison functions such as Year-over-year (YoY), for looking at two or more measurable events on an annualized basis.
- YoY: Year-over-year
- MoM: Month-over-month
- Lookback: looking back certain value over a certain period of time.
It allows user to calculate relative changes in metrics over time periods by comparing current values with historical values (previous year or month).
Reference: https://help.sigmacomputing.com/docs/datelookback
Solution
This is implemented through a DSL converter that transforms high-level metric definitions into SQL queries.
This implementation allows users to easily define time-based comparative analytics without writing complex SQL joins and calculations manually.
Key Features
- Support both aggregated and non-aggregated metrics
- Handles multiple YoY/MoM metrics in the same query
- Supports data truncation (e.g., by month)
- Handles different date formats and types
- Performs null checks and division by zero protection
Main Components
YoYDslConverter
: The main class that handles the conversionYoYExtra
: Contains configuration for time-based comparisonsDSLConfigNode
: Represents the structure of quires
Workflow
Input Processing
The system accepts DSL configurations where metrics can be marked with special operators:
- YOY: Year-over-Year percentage change
- MOM: Month-over-Month percentage change
- YOY_DIFF: Difference between years
- MOM_DIFF: Difference between months
- LOOKBACK: just fetch the historical value
The example can be found in the appendix.
Conversion Process
The converter:
- Identified YoY/MoM metrics in the input
- Creates two queries:
- One for current period data
- One for historical period data (shifted by year/month)
- Join these queries together
- Applies the appropriate calculation formula
The core conversion logic can be found in the appendix.
Formula Implementation
The system implements different formulas based on the operator:
- For YOY/MOM:
(current - previous) / previous
- For YOY_DIFF/MOM_DIFF:
current - previous
- For LOOKBACK: just returns the previous value
The formula implementation can be found in the appendix
Output
The converter first converts it into low-level DSL that can be interpreted by SQL converter.
Then it finally converts into SQL queries.
Both examples can be found in the appendix.
Challenges
I handle these challenges through:
- Modular code structure
- Clear separation of concerns between DSL parsing and SQL generation
- Flexible DSL configuration
- Unit tests (30+)
- Strong validation checks
- Comprehensive error handling
Multiple and Nested Conversion in a Single Query
- Complex JOIN conditions
- Supporting multiple YoY/MoM metrics simultaneously
- Ensuring consistent date column usage across metrics
Non-aggregation Support
We provide aggregated (with GROUP BY) YoY and MoM operations by default. However, for pre-calculated tables that are already aggregated. A non-aggregation operation is needed for such a scenario.
Challenges
- Aggregated metrics
- Non-aggregated metrics (raw data)
Handle - Perform checks preventing the mixing of aggregated and non-aggregated metrics
Query Optimization
Query performance can be degraded if multiple YoY/MoM metrics in a single query.
For multiple metrics:
- Combine joints that reference the same table
- Extract fields from combined joints
- Rewrite join conditions
For nested queries:
- Convert the underlying view into the result table
- Rewrite the reference into the result table
Edge cases
- Missing data handling
- Divided by zero handling
- Null value propagation
- String date format parsing
Appendix
DSL and SQL
Example of High Level DSL
{
"datasource": {
"connId": "demo_datasource",
"name": "demo_datasource",
"tableType": "MYSQL"
},
"dimensions": [
{
"name": "order_date",
"type": "COLUMN",
"ref": "order_date"
}
],
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"operator": "SUM",
"derive": "YOY",
"extra": {
"dateColumn": "order_date",
"timeAdd": -1,
"timeAddUnit": "year",
"truncUnit": "month",
"srcType": "datetime",
"srcFormat": null,
"destType": "datetime",
"destFormat": null
}
}
],
"table": {
"name": "t1",
"ref": "demo.t_order",
"type": "table",
"refExtra": {
"metadata": {
"columns": [
"order_date",
"order_money"
],
"columnTypes": [
"datetime",
"decimal"
]
}
}
}
}
Example of low level DSL
{
"dslType": "NO_AGG",
"name": "yoyQuery",
"projections": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"srcType": "datetime",
"srcFormat": null,
"destType": "datetime",
"destFormat": null,
"timeAddUnit": "year",
"timeAdd": 0,
"truncUnit": "month"
},
"evict": false,
"operator": "IDENTIFIER",
"nodeName": ""
},
{
"name": "order_money",
"type": "NEW_EXPRESSION",
"ref": "IF(t_yoy_previous_order_money_1_year.order_money IS NULL OR t_yoy_previous_order_money_1_year.order_money = 0, NULL, (t_yoy_current.order_money - t_yoy_previous_order_money_1_year.order_money) / t_yoy_previous_order_money_1_year.order_money)",
"evict": false,
"operator": "IDENTIFIER",
"nodeName": ""
}
],
"join": {
"joinTypes": [
"left"
],
"tables": [
{
"name": "t_yoy_current",
"type": "QUERY",
"ref": "t_yoy_current",
"evict": false,
"nodeName": ""
},
{
"name": "t_yoy_previous_order_money_1_year",
"type": "QUERY",
"ref": "t_yoy_previous_order_money_1_year",
"evict": false,
"nodeName": ""
}
],
"on": null,
"onFilters": [
[
{
"type": "COLUMN",
"ref": "t_yoy_current.order_date",
"evict": false,
"operator": "EQ",
"value": "t_yoy_previous_order_money_1_year.order_date",
"valueType": "COLUMN",
"nodeName": ""
},
{
"type": "COLUMN",
"ref": "t_yoy_current.order_date",
"evict": false,
"operator": "EQ",
"value": "t_yoy_previous_order_money_1_year.order_date",
"valueType": "COLUMN",
"nodeName": ""
}
]
],
"nodeName": "",
"pos": null
},
"children": [
{
"dslType": "AGG",
"name": "t_yoy_current",
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"evict": false,
"operator": "SUM",
"nodeName": ""
}
],
"metricsCombine": [],
"datasource": {
"name": "demo_datasource",
"connId": "demo_datasource",
"tableType": "MYSQL",
"defaultDb": null,
"nodeName": "",
"pos": null
},
"table": {
"name": "t1",
"type": "TABLE",
"ref": "demo.t_order",
"refExtra": {
"metadata": {
"columns": [
"order_date",
"order_money"
],
"columnTypes": [
"datetime",
"decimal"
]
}
},
"evict": false,
"nodeName": ""
},
"dimensions": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"srcType": "datetime",
"srcFormat": null,
"destType": "datetime",
"destFormat": null,
"timeAddUnit": "year",
"timeAdd": 0,
"truncUnit": "month"
},
"evict": false,
"operator": "IDENTIFIER",
"nodeName": ""
}
],
"limit": 0,
"offset": 0,
"nodeName": ""
},
{
"dslType": "AGG",
"name": "t_yoy_previous_order_money_1_year",
"metrics": [
{
"name": "order_money",
"type": "COLUMN",
"ref": "order_money",
"evict": false,
"operator": "SUM",
"nodeName": ""
}
],
"metricsCombine": [],
"datasource": {
"name": "demo_datasource",
"connId": "demo_datasource",
"tableType": "MYSQL",
"defaultDb": null,
"nodeName": "",
"pos": null
},
"table": {
"name": "t1",
"type": "TABLE",
"ref": "demo.t_order",
"refExtra": {
"metadata": {
"columns": [
"order_date",
"order_money"
],
"columnTypes": [
"datetime",
"decimal"
]
}
},
"evict": false,
"nodeName": ""
},
"dimensions": [
{
"name": "order_date",
"type": "UNIFY_CAST",
"ref": "order_date",
"refExtra": {
"srcType": "datetime",
"srcFormat": null,
"destType": "datetime",
"destFormat": null,
"timeAddUnit": "year",
"timeAdd": -1,
"truncUnit": "month"
},
"evict": false,
"operator": "IDENTIFIER",
"nodeName": ""
}
],
"limit": 0,
"offset": 0,
"nodeName": ""
}
],
"limit": 0,
"offset": 0,
"nodeName": ""
}
Example of output SQL
SELECT DATE_TRUNC('month', "t1"."order_date") AS "order_date", IF("t4"."order_money" IS NULL OR "t4"."order_money" = 0, NULL, ("t1"."order_money" - "t4"."order_money") / "t4"."order_money") AS "order_money"
FROM (SELECT DATE_TRUNC('month', "order_date") AS "order_date", SUM("order_money") AS "order_money"
FROM "demo"."t_order"
GROUP BY DATE_TRUNC('month', "order_date")) AS "t1"
LEFT JOIN (SELECT DATE_TRUNC('month', DATE_ADD('year', -1, "order_date")) AS "order_date", SUM("order_money") AS "order_money"
FROM "demo"."t_order"
GROUP BY DATE_TRUNC('month', DATE_ADD('year', -1, "order_date"))) AS "t4" ON "t1"."order_date" = "t4"."order_date"
Core conversion logic
private static DSLConfigNode yoyDslConvert(
DSLConfigNode dslConfigNode, List<OperatorExpression> yoyMetrics, boolean isAgg) {
// 使用第一个指标获取基础 YoyExtra 配置
OperatorExpression firstYoyMetric = yoyMetrics.get(0);
YoyExtra yoyExtra = YoyExtra.getYoyExtra(firstYoyMetric);
// 创建当前周期查询
DSLConfigNode currentPeriodDsl = createYearDsl(
dslConfigNode, CURRENT_PERIOD_TABLE_ALIAS, 0, yoyExtra, yoyMetrics, isAgg);
// 为每个 YoY 指标创建对应的历史查询
List<DSLConfigNode> previousDsls = new ArrayList<>();
for (OperatorExpression yoyMetric : yoyMetrics) {
YoyExtra metricYoyExtra = YoyExtra.getYoyExtra(yoyMetric);
DSLConfigNode previousPeriodDsl = createYearDsl(
dslConfigNode, getTableAlias(yoyMetric),
metricYoyExtra.getTimeAdd(), metricYoyExtra, yoyMetrics, isAgg);
previousDsls.add(previousPeriodDsl);
}
return createYoyDsl(yoyMetrics, currentPeriodDsl, previousDsls, isAgg);
}
Formula implementation
private static String getRef(
OperatorExpression yoyMetric, String metricName, String previousPeriodTableAlias) {
Operator operator = getOperator(yoyMetric);
if (Objects.equals(operator, Operator.YOY)
|| Objects.equals(operator, Operator.MOM)) {
return String.format(
"IF(%s.%s IS NULL OR %s.%s = 0, NULL, (%s.%s - %s.%s) / %s.%s)",
previousPeriodTableAlias, metricName,
previousPeriodTableAlias, metricName,
CURRENT_PERIOD_TABLE_ALIAS, metricName,
previousPeriodTableAlias, metricName,
previousPeriodTableAlias, metricName
);
}
if (Objects.equals(operator, Operator.YOY_DIFF)
|| Objects.equals(operator, Operator.MOM_DIFF)) {
return String.format("%s.%s - %s.%s",
CURRENT_PERIOD_TABLE_ALIAS, metricName,
previousPeriodTableAlias, metricName
);
}
if (Objects.equals(operator, Operator.LOOKBACK)) {
return String.format("%s.%s",
previousPeriodTableAlias, metricName
);
}
return null;
}