Formula Engine

Abstract

The Formula Engine is a set of tools that provides to evaluate formulas. This document describes the architecture and general behavior of the Formula Engine and the different components that compose it, and the design choices that where made as well as the limitations of the current implementation.

How to read this document

Graph topology

There is a multiple of graph's topology, this section will list them for futur QA purposes. We will consider a formula as a tree graph. Real life graphs are more complex, but we can consider them as trees for demo purposes. An edge can be backward or forward, but not both. It can also be cross or self referring.

Loops are formed by either a backward edge or several cross edges

simple tree

mermaid

graph LR;
    A((A)) --> B;
    B((B)) --> D((D));
    B((B)) --> E((E));
    A((A)) --> C((C));
    C((C)) --> F((F));

forward edge

mermaid

graph LR;
    A((A)) --> B;
    B((B)) --> D((D));
    B((B)) --> E((E));
    A((A)) --> C((C));
    C((C)) --> F((F));
    A((A)) --> E((E));

backward edge

mermaid

graph LR;
    A((A)) --> B;
    B((B)) --> D((D));
    B((B)) --> E((E));
    A((A)) --> C((C));
    C((C)) --> F((F));
    F((F)) --> A((A));

cross edge

mermaid

graph LR;
    A((A)) --> B;
    B((B)) --> D((D));
    B((B)) --> E((E));
    A((A)) --> C((C));
    C((C)) --> F((F));
    E((E)) --> F((F));

self referring edge

mermaid

graph LR;
    A((A)) --> B;
    B((B)) --> D((D));
    B((B)) --> E((E));
    A((A)) --> C((C));
    C((C)) --> F((F));
    D((D)) --> D((D));

How to read the formula graphs

The graphs provided as example respect the following convention:

mermaid

graph LR;
    A((A)) --> B((B));

A and B are 2 formulas (on fields A and B), formula A needs the formula B to be evaluated. In other world we need to evaluate formula B before formula A.

mermaid

graph LR;
    A((A))
    B((B));
    subgraph rule#1
      copyB{{copy field B}}
    end
      A --> copyB
    copyB --> B

A is a formula with 1 rule (rule#1) that copies field B

mermaid

graph LR;
    A((A))
      A --> ifB
    subgraph rule#1
      ifB{if field B is empty}
      ifB -- false --> copy{{copy field B}} 
      ifB -- true --> set{{set val}} 
    end
    copy-->B((B));

A is a formula with 1 rule (rule#1) that if field B is empty, set a value, otherwise copy field B.

Design choices

Granularity

The granularity of the formula engine is the formula itself. This means that the formula engine is not able to evaluate or optimize rules, actions or conditions.

In some cases, if the granularity had been at rule or action/condition level, it would have been possible to run some rules.

Action / condition granularity

In this example we set up a formula on A and B fields

mermaid

flowchart LR;
   A((A))
   copyA{{copy field A}}
    setValB{{ set value }}
    B((B))
    Bif{field B is empty}
      A-->copyB 
    subgraph "rule #1 (target A)"
      copyB{{copy field B}}
    end
    subgraph "rule #2 (target B)"
        Bif -- true --> copyA
        Bif -- false --> setValB
        
    end
    B --> Bif;
     copyA-->A
     copyB-->B

If at run time when evaluating formula on field A (rule#1), we need to evaluate field B first (rule#2), when evaluating field B two scenarios are possible:

Field B is empty, and we evaluate B = A (back to rule#1), creating a loop (invalid)
Field B is not empty, and we evaluate 'set value', which is not a loop (valid)

Rules granularity

In this example we set up a formula on A and B fields

mermaid

flowchart LR;
   A((A))
   B((B))
   rule1[rule#1]
   rule2[rule#2]
   rule3[rule#3]
  A-->rule1
  A-->rule2
  rule1-->B
  B-->rule3
  rule3-->A

At run time, we would be able to evaluate rule 2 but not rule 1 or 3.

Formula granularity

Both condition/action and rule granularity were rejected because it would have been too complex to implement (would have needed a DFS at runtime), maintain, and use by the final user. The formula granularity with the same setup as above exclude both A and B formula from evaluation. Following the same use case as above, the effective dependency graph would be:

mermaid

graph RL;
    B((B)) --> A;
    A((A)) --> B;

This solution is also more secure because it is recursion-less, preventing infinite loops, and faster because it allows some parallelization.

Loop prevention

Circular dependencies are not allowed. If a formula depends on another formula, we are not able to know which one will be executed first.

mermaid

graph RL;
    B((B)) --> A;
    A((A)) --> B;

In the above example we need field A to evaluate field B, and field B to evaluate field A

In order to avoid circular dependencies, we have to evaluate the formulas graph topological order. If the topological sort has leftovers, it means there is at least one loop in the graph.

Topological sort

The topological sort is done via Khan's algorithm. It is a linear time algorithm that is able to detect loops in the graph. It is however unable to list the different loops

The main idea is to add to a stack and remove all the nodes that have no outgoing edges (ie leaves), from the graph. If the graph has new leaves after the removal, we repeat the operation until the graph does not have more leaves. Any node that is still in the graph is either in a loop or depends directly or indirectly on a loop, they will be called leftovers.

mermaid

graph LR;
    A((A))
    B((B))
    C((C))
    D((D))
    A-->B
    A-->C
    C-->D

In the above example, the topological sort would be: [B,D], [C], [A] with no leftovers.

mermaid

graph LR;
    A((A))
    B((B))
    C((C))
    D((D))
    A-->B
    A-->C
    C-->D
    D-->C

In the above example, the topological sort would be: [B] with leftovers [A,C,D], because C and D are dependent on each other, and A is dependent on C.

mermaid

graph LR;
    A((A))
    B((B))
    C((C))
    D((D))
    E((E))
    F((F))
    G((G))
    H((H))
    A-->C
    A-->B
    A-->D
    A-->E
    B-->D
    B-->E
    E-->F
    C-->F
    F-->G
    F-->H
    H-->D

In the above example, the topological sort would be: [G, D], [H], [F], [C,E], [B], [A] with no leftovers.

Self reference exception

If a formula has a self reference, it will not be considered as a dependency. This behavior is intended to allow field initialization.

For example, if we have a formula on A

mermaid

graph RL
        A((A)) --> B;
        D --> A;
    subgraph rule #1
         B{if A is empty};
        B -- true --> D{{set value}};
    end

The formula engine will not consider A as a dependency, and will evaluate it. The value of A in the condition will be whatever the item.values.A is at runtime

General behavior

Import

After the main process of the import is completed, a formula graph is generated from the database with only not-archived formulas and rules. The formula graph is then explored using Khan algorithm to check if there is a circular dependency. If there is one or multiple circular dependencies, then each formula in the cycle is marked as invalid, and each formula depending on a loop is marked as invalid. Then the topological order is saved on each valid formula.

In case of an error, additional info can be exported in the logs (details about the nodes, the loops etc).

Evaluation

The evaluation of the formula is done in 3 steps:

Load the formulas structures in the topological order, with the execution tooling, and evaluate them item by item. Formula marked as invalid are skipped.
Store change in a file
The file is then read and its data imported in the database When evaluating a self reference formula, the value of the field is the one in the item.values at runtime.

Security considerations

When running a formula we should consider the issue of infinite loops, and its potential DOS attack. The formula engine is designed to prevent infinite loops, and to be able to detect them. The absence of any recursion prevents them.

Performance considerations

The table import has to do post import checks, load all formulas in memory and graph them. This is both memory and CPU heavy but had no impact on evaluation. The algorithm to graph and explore the formulas itself is O(E+V) where E is the number of edges and V the number of vertices (nodes).

When evaluating the formulas the impact on performance is minimal because the topological order is already known. The implementation of khan algorithm outputs steps, each step containing N formulas to evaluate. The formulas are calculated in parallel, and the steps are calculated sequentially.

Limitations

Topological sort maximum size

Maximum number of steps in the topological sort. Default is 50.

Formula Engine ​

Abstract ​

How to read this document ​

Graph topology ​

How to read the formula graphs ​

Design choices ​

Granularity ​

Action / condition granularity ​

Rules granularity ​

Formula granularity ​

Loop prevention ​

Topological sort ​

Self reference exception ​

General behavior ​

Import ​

Evaluation ​

Security considerations ​

Performance considerations ​

Limitations ​

Topological sort maximum size ​

Goings further ​

Formula Engine

Abstract

How to read this document

Graph topology

How to read the formula graphs

Design choices

Granularity

Action / condition granularity

Rules granularity

Formula granularity

Loop prevention

Topological sort

Self reference exception

General behavior

Import

Evaluation

Security considerations

Performance considerations

Limitations

Topological sort maximum size

Goings further